1 Introduction
1.1 Motivation
Backgammon differs from other common games (checkers, chess, etc.) because
there is a formal way to increase the stakes as the game progresses. It is
important to know one’s current chances of winning in order to decide whether
or not to increase the stakes. This paper focuses on ways to estimate the
current chances of winning using only mental calculations.
The doubling cube, which was invented in the 1920s, is how the stakes may
be increased during the game. The cube starts in the middle of the board with
a value of 1. When one player (call him A) decides that his chances of winning
are just right, he gives the cube to his opponent (B) before A’s turn to roll.
The opponent can either accept or reject the cube. If B rejects, the game is
over and he must pay up. If he accepts, the two players play for twice as much
money as before. Then, B possesses the cube, and A cannot use it until A is
doubled and accepts.
Incorrectly estimating the chances of winning can result in resigning too
soon, wagering too much on a weak position, or not increasing the wager on
a strong position. So, it is important to be able to accurately estimate the
chances of winning, often to within 1 percentage point. We will abbreviate
the chances of winning as WP, or the Winning Percentage or Probability.
268 Andrew M. Ross, Arthur Benjamin, and Michael Munson
There are approximately 300 trillion possible racing games, when both players’
positions are considered. Our approach to estimating WPs is to develop a
way to “measure” a position to get an idea of how good it is. From the
measurements of the two players’ positions, we estimate the WP by computing
a function or looking something up in a (memorized) table. When we compute
WP values, we use only the behavior of the dice and checkers; we do not
include the possibility of winning or losing when a double is offered or declined.
If we name the player on roll A, and the opponent B, we will have the system
for estimating a WP shown in Figure 1.
We will compute the WPs from the perspective of player A, the player on roll.
These calculations were first published by Zadeh and Kobliska (1977), though
they had also been presented by Thorp (1975).

Estimating Winning Probabilities in Backgammon Races 271

Let WP(X, Y) be the WP when the player on roll, A, has X pips to go, and B has Y pips to go. Denote
the probability of rolling n pips as p(n). Immediately after the roll, the two
players swap situations, but player A has moved some number of pips. So, we
sum over all possible rolls:
\[
\mathrm{WP}(X, Y) = 1 - \sum_{n=3}^{24} p(n)\, \mathrm{WP}(Y, X - n). \tag{1}
\]
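Equation (1) can be evaluated directly with memoized recursion. The sketch below is our own Python illustration, not the authors' code: it uses the single-checker pip distribution (doubles move four times the die value) and treats the opponent as finished once their pip count reaches zero or below.

```python
# Exact win probability in the pure pip-count race, via the recursion in (1).
from collections import defaultdict
from functools import lru_cache

def pip_distribution():
    """p(n): probability of moving n pips in one roll (doubles move 4x)."""
    dist = defaultdict(float)
    for a in range(1, 7):
        for b in range(1, 7):
            dist[4 * a if a == b else a + b] += 1 / 36
    return dict(dist)

P = pip_distribution()

@lru_cache(maxsize=None)
def wp(x, y):
    """Probability that the player on roll (x pips to go) wins; opponent has y."""
    if y <= 0:
        return 0.0  # opponent has already borne off: the player on roll lost
    # WP(X, Y) = 1 - sum_n p(n) WP(Y, X - n); WP(Y, X - n) = 0 once X - n <= 0
    return 1.0 - sum(p * wp(y, x - n) for n, p in P.items())
```

For example, `wp(3, y)` is 1 for any positive `y`, since even the minimum roll of 1-2 moves three pips and finishes the race.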
[Figure: probability of player A winning versus Y, the RPC for player B, with curves for X = 70, 80, ..., 130.]

The 50% point is not at the point where Y = X; instead,
it is near Y = X − 4. It is not exactly there, since the value of being ahead is
not exactly 4. Also, we see that a constant difference has less of an effect as
both RPCs grow. For example, the points where Y = X + 10 tend downward.
At this point, we must stop and recall that we are approximating a discrete
distribution with a continuous one. When we consider P(nY − nX ≥ 0), we
will approximate a finite sum with an infinite integral. So, we must perform a
continuity correction. This is usually done by integrating to a point halfway
between two of the integer points. In this case, we want to include 0 in the
integral, and we are integrating toward positive infinity. Therefore, we will
start at −0.5.
Given this standard normal distribution, we may calculate the probability
of the player on roll winning:
\[
P(n_Y - n_X \ge -0.5) = P\left( Z \ge -\left( \frac{Y - X}{\mu} + \frac{1}{2} \right) \sqrt{\frac{\mu^3}{\sigma^2 (Y + X)}} \right) = \Phi\left( \frac{Y - X + \mu/2}{\sqrt{Y + X}} \cdot \frac{\sqrt{\mu}}{\sigma} \right), \tag{6}
\]
where Φ() is the Cumulative Distribution Function (CDF) of the standard
normal. Now we define D ≡ Y − (X − µ/2) and S ≡ Y + X. Note that
D has an extra term in it, while S does not. This extra term results from
the continuity correction, but it also can be interpreted as adjusting for the
µ/2 advantage that the player on roll has. Simplifying the above with this
expression, we get
\[
\Phi\left( \frac{D}{\sqrt{S}} \cdot \frac{\sqrt{\mu}}{\sigma} \right). \tag{7}
\]
By this, we see that if A is behind in the race (X − µ/2 > Y ), D will be
negative and A’s probability of winning will be low, as expected.
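As a sketch (our own Python, not the authors' code), the approximation in (7) is direct to evaluate with the standard-normal CDF; the constants µ = 49/6 ≈ 8.167 and σ ≈ 4.30 are the mean and standard deviation of pips moved per roll, with doubles counted four times.

```python
from math import erf, sqrt

MU = 49 / 6                        # mean pips per roll (doubles count 4x)
SIGMA = sqrt(3066 / 36 - MU ** 2)  # std dev of pips per roll, about 4.30

def wp_normal(x, y):
    """CLT approximation (7): x pips for the player on roll, y for the opponent."""
    d = y - (x - MU / 2)           # D = Y - (X - mu/2)
    s = y + x                      # S = Y + X
    z = (d / sqrt(s)) * (sqrt(MU) / SIGMA)
    return 0.5 * (1 + erf(z / sqrt(2)))  # Phi(z)
```

With equal pip counts the player on roll is favored: `wp_normal(100, 100)` is above 1/2, and the 50% point sits near `y = x - 4`, as discussed above.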
How does the formula in (7) become D²/S? The way most people would
compute that formula is by computing its argument, and then looking up the
result in a memorized table. But, extracting square roots is not an easy mental
calculation. So, we square the whole argument to get
\[
\frac{D^2}{S} \cdot \frac{\mu}{\sigma^2} \tag{8}
\]
and adjust the table accordingly. From there, divide the constant out and
again adjust the table. These adjustments are fairly transparent to those who
want to use the D²/S method, since it does not really matter what numbers
must be memorized.
Kleinman makes an adjustment to the constant µ/σ² based on an observation about dice rolls. We will investigate this below.
and the offset for the variance is fairly complicated (it is given by Tijms 1994).
Substituting our values for µ and σ, we get an offset of −0.3615 (rounded).
This then needs two adjustments: Ordinary renewal theory does not count
a renewal at time zero, while we count it as the first roll, so we add 1 to
compensate, getting 0.6384. Also, ordinary renewal theory is in continuous
time, while our system is discrete, so we subtract 1/2 pip as a continuity
correction. This means subtracting (1/2)/µ = 0.061224 rolls, for a final offset
of 198/343 ≈ 0.5772, which we will define as δµ . This value has been verified
by formulating a discrete-time Markov chain that models the current RPC
and how it decreases with the dice rolls. The Markov chain is detailed below.
To find the variance offset, it was simpler to use the Markov chain’s results
than to apply continuity corrections to the usual renewal process variance
offset. We ultimately found a variance offset of δσ² ≡ −0.0277945.
Now we know what the asymptotic values are for the mean and variance of n_X:
\[
E[n_X] = \frac{X}{\mu} + \delta_\mu, \tag{11}
\]
\[
\mathrm{Var}[n_X] = \frac{\sigma^2}{\mu^3} X + \delta_{\sigma^2}. \tag{12}
\]
Using the same logic as before, we end up with
\[
\Phi\left( \frac{D}{\sqrt{\sigma^2 S / \mu + 2\mu^2 \delta_{\sigma^2}}} \right). \tag{13}
\]
Notice that since each player has a δµ term, they cancel each other. If we
apply this adjustment, we see (Figure 3) that the error in the approximation
is reduced by a factor of 2 or so.
[Fig. 3. Adjusting the D²/S method: approximation error versus Y (RPC) for Φ((Y − (X − µ/2))/√(2.262 S)) and for the adjusted Φ((Y − (X − µ/2))/√(2.262 S − 3.708211)).]
good as landing on it directly (they both end the game). Figure 5 shows how
some of the arrows are re-routed. Once we hit zero, we move directly to a
“done” state, where we stay. This is known as an absorbing state, and it helps
keep track of exactly when we finished.
[Figure 5: RPC states 0 through 9 with arrows re-routed to the absorbing “done” state.]
Our matrix M has some nice properties. One is that M(i, i) = 0 (except for the Done state), since we can’t stay in place when we roll the dice. Also, M(i, j) = 0 for i < j, since we can’t move backwards. So, we have a lower-triangular matrix (with the exception of the Done state). For any row i ≥ 24, we have M(i, j) = p(i − j), as shown in Figure 4. For the rows i < 24, we have
\[
M(i, 0) = \sum_{j=i}^{24} p(j). \tag{14}
\]
From the above description, we compile our matrix M . Its first row and
column correspond to the “done” state. Its second row and column correspond
to the 0 RPC state; its third row and column correspond to an RPC of 1, etc.
With the one exception of the upper left-hand corner, this matrix is strictly
lower triangular. This matrix can be expressed schematically as in Figure 6.
The trapezoid represents the RPCs that cannot bear off in one turn. The
triangle represents those RPCs near zero where special summing must be
done. The rectangular block shows entries that take care of moving to the
“done” state.
Fig. 6. A schematic view of the Markov matrix.
Once the matrix is formed, we can compute the mean and variance of
the time to finish in two ways. One way is to compute the full distribution
of the time to finish by raising the matrix to higher and higher powers: the
probability of moving from state i to state j in exactly k turns is the (i, j)
component of M k . So, to figure out the distribution of the number of turns
to bear off from X pips, we use
\[
P(n_X = k) = M^k(X, 0). \tag{15}
\]
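This matrix-power computation can be sketched without forming M explicitly: propagate the distribution over live RPC states one roll at a time, collecting the probability absorbed into “done” at each step. This is our own Python illustration; the per-roll probabilities p(n) are those of the single-checker model, with doubles moving four times the die value.

```python
# First-passage distribution P(n_X = k) for the single-checker model.
from collections import defaultdict

def pip_distribution():
    """p(n): probability of moving n pips in one roll (doubles move 4x)."""
    dist = defaultdict(float)
    for a in range(1, 7):
        for b in range(1, 7):
            dist[4 * a if a == b else a + b] += 1 / 36
    return dict(dist)

def rolls_to_finish(x):
    """Return {k: P(n_X = k)} for starting RPC x."""
    p = pip_distribution()
    live = {x: 1.0}               # current distribution over unfinished RPCs
    out = {}
    k = 0
    while live:
        k += 1
        nxt = defaultdict(float)
        done = 0.0
        for rpc, pr in live.items():
            for n, pn in p.items():
                if n >= rpc:
                    done += pr * pn        # hit or overshot zero: absorbed
                else:
                    nxt[rpc - n] += pr * pn
        out[k] = done
        live = dict(nxt)
    return out

dist = rolls_to_finish(100)
mean = sum(k * pk for k, pk in dist.items())
```

With X = 100 this gives a mean of about 12.82 rolls, matching the asymptotic value X/µ + δµ = 100/(49/6) + 198/343.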
The mean and variance are also available without computing the full distribution, using results from the theory of first passage times. Now that we have a fuller understanding of how to adjust the D²/S method using mean and variance offsets from the SCM, we turn to more-realistic simulations of racing games.
6-point, but such a position would not happen very often in practice. We want
positions that satisfy the following criteria:
• Realistic home board positions,
• Realistic outer board positions,
• Realistic numbers of crossovers (moving from one board to the next),
• Won’t interfere with each other, and
• Two positions with close RPCs are only different by a few checkers.
By “interfere,” we mean that if we try to match two of these positions against
each other, they won’t call for both players to occupy any one point. That
way, we can use only one position for each RPC, and then generate games for
each RPC pair by just using those two positions. The last constraint means
that, ideally, we will change from 90 to 91 raw pips by moving one checker by
one pip, instead of shuffling around multiple checkers. This will not always be
possible, since we will have to jump over opponent’s pieces on our 11 and 12
points (the noninterference constraint).
Positions matching these constraints were generated. Figure 7 shows the 70
position matched against the 90 position, and Figure 8 shows the 130 position
matched against the 160 position. To test the sensitivity of the rollouts to these
initial positions, other sets of positions were generated. One set had an extra
crossover in each position, another set had two extra crossovers, and a third
had a gap (no checkers) on the 5 point.
When we propose a strategy for the computer to use, the question of
optimality immediately arises. How do we know that this is the best strategy?
We don’t. However, the strategy proposed below is good enough that real
players use it, and we are interested in predicting results for games played by
people. Also, both computerized players will use exactly the same strategy, so
the results will not be biased. In the future, we might change the strategies
slightly and have a small set of them to play against each other.
Bearing in
Our strategy for bearing in is based on the principle of doing crossovers (mov-
ing a checker from one table to another) as soon as possible. It consists of two
components, called H and I. Component H considers the larger die first. It
looks for a checker which crosses over a table boundary and is the farthest one
from home that does so. If it finds such a checker, it moves it, then tries to
do the same with the smaller die. Of course, if doubles were rolled, the sizes
of the rolls are irrelevant, and H goes through its scanning four times.
If H left more than one die to use, the remaining dice are summed, and
H is run again. When all ways of calling H have failed (no more checkers can
cross over with the dice remaining), I is called. It looks for the farthest checker
back that lands on a point not occupied by any other pieces, whether those
pieces belong to the moving player or the opponent. This will avoid stacking
a player’s checkers before bearoff. Failing that, I looks for the farthest checker back that doesn’t land on an opponent.

Fig. 7. 70 (black) versus 90 (white).

Fig. 8. 130 (black) versus 160 (white).
Once all pieces are in the home board, we switch strategies entirely.
Bearing off
A roll of 1-2 is the same as 2-1, and likewise for any nondoubles roll, so we compute
those moves only once, and then weight them twice as much as the doubles
rolls.
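This weighting can be sketched in one comprehension (our own illustration): the 21 distinct rolls, with nondoubles weighted twice, account for all 36 equally likely ordered dice outcomes.

```python
# The 21 distinct rolls (a <= b), weighted by how many of the 36 ordered
# outcomes each represents: doubles once, nondoubles twice.
rolls = [(a, b, 1 if a == b else 2) for a in range(1, 7) for b in range(a, 7)]

assert len(rolls) == 21
assert sum(w for _, _, w in rolls) == 36
```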
How many possible single-player bearoff positions are there? There are at
most 15 checkers to distribute among 6 points; if Ki denotes the number of
checkers on the i-th point, then it must be the case that K1 + K2 + · · · + K6 ≤
15. If we add a “dummy” point, point 0, that counts the number of checkers
off the board, we have K0 + K1 + · · · + K6 = 15. The number of ways to do
this is (from Feller 1968)
\[
\binom{7 + 15 - 1}{15} = \frac{(7 + 15 - 1)!}{15!\,(7 - 1)!} = 54{,}264. \tag{16}
\]
To store the EPC for each bearoff position, we need an array of size 54,264
and a method to translate from a backgammon position into an array position
(an integer) and back. An algorithm or function that reduces a relatively
complex situation (e.g., a backgammon position) into an integer is called a
hash. If it is an injective function (so that no two positions map to the same
integer), it is called a perfect hash (Cormen, Leiserson, Rivest, and Stein
2001, Section 11.5). As an example, taking the RPC of a position is a hash
but it is not perfect, because many different positions can map to the same
RPC. A perfect hash was developed by Benjamin and Ross (1997) to map
a single player’s bearoff position (number of checkers on each point) into an
integer between 0 and 54,263, along with a reverse-hash to convert integers
into racing positions. This hash is also minimal, in that none of the integers
in the range are “wasted”—each integer maps back into a meaningful bearoff
position. This hash is also extendable to more board points than just the 6
points of the home board.
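The exact hash of Benjamin and Ross (1997) is not reproduced in this excerpt; the sketch below is a generic minimal perfect hash of the same kind, ranking the vectors (K0, ..., K6) with K0 + · · · + K6 = 15 in lexicographic order, so that every integer in [0, 54264) corresponds to exactly one bearoff position.

```python
# A minimal perfect hash for bearoff positions: lexicographic rank of
# compositions of 15 into 7 nonnegative parts (point 0 = checkers off).
from math import comb

TOTAL, PARTS = 15, 7

def rank(pos):
    """Map (K0, ..., K6) with sum TOTAL to an integer in [0, C(21, 15))."""
    r, remaining = 0, TOTAL
    for i, k in enumerate(pos[:-1]):
        tail = PARTS - i - 1        # parts still to the right of position i
        for smaller in range(k):
            # count positions identical so far but with K_i = smaller
            r += comb(remaining - smaller + tail - 1, tail - 1)
        remaining -= k
    return r

def unrank(r):
    """Inverse of rank: recover the position vector from its integer."""
    pos, remaining = [], TOTAL
    for i in range(PARTS - 1):
        tail = PARTS - i - 1
        k = 0
        while True:
            block = comb(remaining - k + tail - 1, tail - 1)
            if r < block:
                break
            r -= block
            k += 1
        pos.append(k)
        remaining -= k
    pos.append(remaining)
    return tuple(pos)
```

The integer assigned to a given position by this ordering need not agree with the published hash (so position numbers such as 41258 below refer to the authors' enumeration), but it has the same minimal-perfect property: `rank` and `unrank` are mutual inverses over all 54,264 positions.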
This enumeration makes calculating EPCs fairly easy. We create an array
of 54,264 real numbers, and initialize the locations corresponding to the six
positions for which we know the EPCs. Then, we march up the integers, gen-
erating rolls and moves for each position and storing the EPCs as we calculate
them. Conveniently, each legal move from the current position produces a po-
sition with a smaller integer representation, so we’ve already computed the
EPC for it. Therefore, we don’t have to call the EPC-calculating program
recursively, since we are being careful about the order in which we calculate
positions.
At this point, we have EPCs for each of the 54,264 bearoff positions. This
makes brute-force analysis of certain moves feasible. For example, consider
the hypothesis “The best possible move is one that bears off two or four
checkers (depending on doubles) if it is possible to do so.” This could be called
a greedy algorithm, since it concentrates on getting checkers off the board
at the expense of having a smooth home board. The question is, is such
smoothing overrated? It turns out that the greedy approach is not always the
best, but it is not a bad alternative. There are 1668 pairs of position and dice
roll that result in the greedy choice being sub-optimal. However, the worst one
differs by only 1.35 expected pips, which is about one-sixth of an expected
turn. The position with this difference is position 41258, which is, in vector
form, (0, 5, 1, 8, 1, 0, 0), with a roll of 2-4. This position is obviously in need
of some smoothing. The greedy move would leave five checkers on the 1-point
and eight on the 3-point. The optimal move is 4-2, 3-off. This helps fill a gap
on the 2-point. However, the greedy and optimal strategies do not produce
results that are drastically different, so it might be easier to just go with the
“2-or-4” strategy over the board.
Another interesting hypothesis is that it is always best (in terms of EPC)
to use an ace to bear off from the 1-point if it is possible. An exhaustive search
shows this is true.
Now that we have a strategy for moving checkers, we can compare the SCM to
real backgammon. We will start by using the one-player versions of each, to see
if the number of turns is still approximately normal. This one player will move
simulated backgammon pieces (as opposed to the SCM), but the opponent’s
checkers will not be on the 12 and 11 points. So, the main difference between
the SCM and the single-player rollout is the bearoff rules.
In Figure 10, we plot the mean number of turns from the rollout data and
from the unadjusted CLTRP. On the left edge of the graph, the difference is
roughly 1.1 raw pips, and on the right edge it is 1.0 raw pips. However, our
calculations above show that the differences in the means end up canceling,
so we will not investigate it further.
Next, we look at the variance of the number of turns to finish. Above, we
mentioned that Kleinman has proposed a modification of the variance term in some of the formulas. His suggestion is that, in real games, the standard deviation of the dice rolls is closer to 4 than 4.3. Figure 11 shows that, indeed,
the rollout variance systematically differs from what the CLTRP predicts for
the SCM. It is closer to what Kleinman said. However, it is lower for the low
RPCs, and higher for large RPCs. A best-fit line is also plotted, showing that
the slope of the observed line is closer to the CLTRP value than to Kleinman’s
line, but the line is translated by roughly 13 pips to the right.
Fig. 10. The observed mean in a single-player rollout exceeds the CLTRP value.

[Figure 11: variance of the number of rolls to finish, versus RPC.]
Having looked at how well the SCM matches ordinary single-player rollouts
in terms of mean and variance of number of rolls, we next turn to our overall
goal of calculating WPs in two-player games. For each position, we aim for
a standard error in our WP value of 0.001, or 0.1%. This requires at least
250,000 games per initial position pair when WP is near 1/2.
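The 250,000 figure follows from the binomial standard error formula; a quick check (our own arithmetic, not from the paper):

```python
# Standard error of an estimated proportion: SE = sqrt(p*(1-p)/n).
# Solving for n at SE = 0.001, worst case p = 1/2, gives 250,000 games.
def games_needed(se, p=0.5):
    return p * (1 - p) / se ** 2

n = games_needed(0.001)
```

Away from p = 1/2 the variance p(1 − p) shrinks, so fewer games suffice for the same standard error.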
Rather than running all of our created positions against each other, we
focus on stepping X in increments of 10 from 60 to 130, and stepping Y from
X − 10 to X + 30 in increments of 1. Figure 12 shows how our rollout data
compare to the SCM predictions. We see that the SCM underpredicts the
284 Andrew M. Ross, Arthur Benjamin, and Michael Munson
WP values when they are above roughly 1/2, and overpredicts when the WP
values are lower. Below, we will adjust the D²/S model for this fact.
[Figure 12: probability of A winning versus RPC, rollout data compared with SCM predictions.]

[Figure: rollout WPs for positions with no gaps versus positions where both players have gaps.]

[Figure: probability of player A winning versus RPC.]
\[
\Phi\left( \frac{\frac{1}{2} + u(Y - X)}{\sqrt{v(Y + X) + 2\delta_{\sigma^2}}} \right). \tag{19}
\]
Now we square the argument of Φ above, to avoid taking the square root:
\[
\frac{\left( Y - \left( X - \frac{1}{2u} \right) \right)^2}{v(Y + X)/u^2 + 2\delta_{\sigma^2}/u^2}. \tag{22}
\]
Now we will use ε to make the top a little bit more accurate. Usually, we would round 1/(2u) to just plain 4, but if we expand it beforehand, we can get a slightly better estimate:
\[
\left( Y - \left( X - \frac{1}{2u} \right) \right)^2 = (Y - (X - 4) + \varepsilon)^2 = (Y - (X - 4))^2 + 2\varepsilon (Y - (X - 4)) + \varepsilon^2. \tag{24}
\]
From player A’s position we measure the RPC X, and from player B’s position the RPC Y. Then
\[
\Delta \equiv Y - (X - 4), \qquad S \equiv Y + X, \tag{25}
\]
\[
\frac{\Delta^2 + \Delta/7}{S - 25} \;\longrightarrow\; \text{Table 1} \;\longrightarrow\; \Pi. \tag{26}
\]
Now if the player on roll is behind and Π is above 50%, subtract Π from
100% to get the proper probability of winning. Our table gives much more
precision than would be used mentally, in case it is useful in future studies.
If the numerator is rounded to an integer, and −25 is used in the denominator instead of −24.7, an error of up to 2.5% can occur. However, this maximum error occurs right near the 50% point, where large errors are not normally important. If the player needs to figure out who is ahead when the race is close, the sign of ∆ should be enough.
As an example, suppose X = 70 and Y = 80. We compute ∆ = 80 − (70 − 4) = 14 and S = 70 + 80 = 150. Then, we compute the numerator, ∆² + ∆/7 = 14² + 14/7 = 196 + 2 = 198, and the denominator, S − 25 = 125.
Dividing, we get 198/125 = 1.584; looking this up in the table, we find we are
between 0.80 and 0.81, but closer to 0.80.
A less tidy example starts with X = 70 and Y = 90, so ∆ = 24 and S = 160. The numerator is then 24² + 24/7 = 576 + 3.43 = 579.43, and the denominator is S − 25 = 135. Dividing, we get 4.292, and looking this up in the table gives a value between 0.91 and 0.92, but closer to 0.92. If we go back and round the numerator to 579, our resulting division gives 4.289; even if we rounded up to 580, our division gives 4.296. In either case, our ultimate decision is not affected.
Our last example has the player on roll behind: X = 90 and Y = 70. We get ∆ = −16 and S = 160. The numerator is (−16)² + (−16)/7 = 256 − 2.286 = 253.714, and the denominator is 135 as before. Our ratio is 1.879, which the table says is a WP between 0.82 and 0.83. However, we know that the player on roll is behind, so these WP values apply to the other player instead; the player on roll has a WP between 0.17 and 0.18.
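Table 1 itself is not reproduced in this excerpt. For the worked examples above, the table behaves as if it maps the ratio r = (∆² + ∆/7)/(S − 25) to Π = Φ(√(r/2.262)), where 2.262 ≈ σ²/µ is the variance constant from the D²/S derivation; that mapping is our assumption, not the authors' statement. Under it, the whole mental method can be sketched as:

```python
# D^2/S mental estimate, with the Table 1 lookup replaced by the normal CDF.
# Assumption: Table 1 encodes Pi = Phi(sqrt(r / 2.262)), 2.262 ~ sigma^2/mu.
from math import erf, sqrt

def wp_estimate(x, y):
    """x = RPC of the player on roll, y = RPC of the opponent."""
    delta = y - (x - 4)
    s = y + x
    r = (delta ** 2 + delta / 7) / (s - 25)   # valid when |delta| is not tiny
    pi = 0.5 * (1 + erf(sqrt(r / 2.262) / sqrt(2)))   # Phi(sqrt(r / 2.262))
    # if the player on roll is behind, the table value belongs to the opponent
    return pi if delta >= 0 else 1 - pi
```

This reproduces all three examples: `wp_estimate(70, 80)` is about 0.80, `wp_estimate(70, 90)` about 0.92, and `wp_estimate(90, 70)` about 0.18.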
References
1. Benjamin, Arthur and Ross, Andrew M. (1997) Enumerating backgammon po-
sitions: the perfect hash. Interface: Undergraduate Research at Harvey Mudd
College 16 (1) 3–10.
2. Buro, Michael (1999) Efficient approximation of backgammon race equities. In-
ternational Computer Chess Association Journal 22 (3) 133–142.
3. Cormen, Thomas H., Leiserson, Charles E., Rivest, Ronald L., and Stein, Clif-
ford (2001) Introduction to Algorithms, 2nd Ed. MIT Press and McGraw-Hill.
4. Feller, William (1968) An Introduction to Probability Theory and Its Applica-
tions, Vol. 1. Wiley, New York.
5. Keeler, Emmett B. and Spencer, Joel (1975) Optimal doubling in backgammon.
Operations Research 23 (6) 1063–1071.
6. Kleinman, Danny (1980) Vision Laughs at Counting. Los Angeles.
7. Orth, P. J. (1976) A comment on “Optimal Doubling in Backgammon.” Oper-
ations Research 24 (6) 1179.
8. Tesauro, Gerald (2002) Programming backgammon using self-teaching neural
nets. Artificial Intelligence 134 (1–2) 181–199.
9. Thorp, Edward O. (1975) Backgammon: part I, the optimal strategy for the pure
running game. Presentation at the Second Annual Conference on Gambling,
South Lake Tahoe, NV.
10. Thorp, Edward O. (2007) Backgammon: the optimal strategy for the pure run-
ning game. In: Ethier, Stewart N. and Eadington, William R. (eds.) Optimal
Play: Mathematical Studies of Games and Gambling. Institute for the Study of
Gambling and Commercial Gaming, University of Nevada, Reno, 237–265.
11. Tijms, Henk C. (1994) Stochastic Models: An Algorithmic Approach. Wiley, New
York.
12. Zadeh, Norman (1977) On doubling in tournament backgammon. Management
Science 23 (9) 986–993.
13. Zadeh, Norman and Kobliska, Gary (1977) On optimal doubling in backgam-
mon. Management Science 23 (8) 853–858.