
TD-Gammon

TD-Gammon is a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's
Thomas J. Watson Research Center. Its name comes from the fact that it is an artificial neural net trained
by a form of temporal-difference learning, specifically TD-lambda.

The final version of TD-Gammon (2.1) was trained with 1.5 million games of self-play, and achieved a
level of play just slightly below that of the top human backgammon players of the time. It explored
strategies that humans had not pursued and led to advances in the theory of correct backgammon play.

In 1998, the world champion defeated it in a 100-game series by a margin of only 8 points. Its
unconventional assessment of some opening strategies has since been accepted and adopted by expert
players.[1]

Algorithm for play and learning


During play, TD-Gammon examines on each turn all possible legal moves and all their possible responses
(two-ply lookahead), feeds each resulting board position into its evaluation function, and chooses the
move that leads to the board position with the highest score. In this respect, TD-Gammon is no
different from almost any other computer board-game program. Its innovation was in how it learned its
evaluation function.
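
As a rough illustration of that selection step (not Tesauro's code), the loop below picks the legal move whose resulting position the network scores highest. The helpers legal_moves, apply_move, and evaluate are hypothetical stand-ins, and only the one-ply greedy choice is shown; TD-Gammon's actual search also expanded the opponent's replies for two ply.

def choose_move(board, dice, player, legal_moves, apply_move, evaluate):
    """Return the legal move whose resulting position scores highest."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board, dice, player):
        next_board = apply_move(board, move, player)
        # Network's estimate of the expected outcome from this position.
        score = evaluate(next_board, player)
        if score > best_score:
            best_move, best_score = move, score
    return best_move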

TD-Gammon's learning algorithm consists of updating the weights in its neural net after each turn to
reduce the difference between its evaluation of previous turns' board positions and its evaluation of the
present turn's board position—hence "temporal-difference learning". The score of any board position is a
set of four numbers reflecting the program's estimate of the likelihood of each possible game result:
White wins normally, Black wins normally, White wins a gammon, Black wins a gammon. For the final
board position of the game, the algorithm compares with the actual result of the game rather than its own
evaluation of the board position.[2]

After each turn, the learning algorithm updates each weight in the neural net according to the following
rule:

Δw_t = α (Y_{t+1} − Y_t) Σ_{k=1}^{t} λ^{t−k} ∇_w Y_k

where:

Δw_t is the amount to change the weight from its value on the previous turn.
Y_{t+1} − Y_t is the difference between the current and previous turn's board evaluations.
α is a "learning rate" parameter.
λ is a parameter that affects how much the present difference in board evaluations should feed back to
previous estimates. λ = 0 makes the program correct only the previous turn's estimate; λ = 1 makes the
program attempt to correct the estimates on all previous turns; and values of λ between 0 and 1 specify
different rates at which the importance of older estimates should "decay" with time.
∇_w Y_k is the gradient of neural-network output with respect to weights: that is, how
much changing the weight affects the output.[2]
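
This rule is commonly implemented incrementally with an eligibility trace that accumulates the λ-decayed gradients, so each turn's temporal-difference error can be applied in a single pass. The sketch below shows that formulation under illustrative names; it is not Tesauro's implementation.

def td_lambda_update(weights, trace, grad_prev, y_prev, y_curr, alpha, lam):
    """One TD(lambda) step.

    weights   -- current network weights (array)
    trace     -- eligibility trace, same shape as weights
    grad_prev -- gradient of the previous turn's evaluation Y_t w.r.t. the weights
    y_prev    -- previous turn's evaluation Y_t
    y_curr    -- current turn's evaluation Y_{t+1} (the actual result at game end)
    """
    delta = y_curr - y_prev              # temporal-difference error
    trace = lam * trace + grad_prev      # lambda-decayed sum of past gradients
    weights = weights + alpha * delta * trace
    return weights, trace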

Experiments and stages of training


Unlike previous neural-net backgammon programs such as Neurogammon (also written by Tesauro),
where an expert trained the program by supplying the "correct" evaluation of each position, TD-Gammon
was at first programmed "knowledge-free".[2] In early experimentation, using only a raw board encoding
with no human-designed features, TD-Gammon reached a level of play comparable to Neurogammon:
that of an intermediate-level human backgammon player.

Even though TD-Gammon discovered insightful features on its own, Tesauro wondered if its play could
be improved by using hand-designed features like Neurogammon's. Indeed, the self-training TD-
Gammon with expert-designed features soon surpassed all previous computer backgammon programs. It
stopped improving after about 1,500,000 games (self-play) using a three-layered neural network, with
198 input units encoding expert-designed features, 80 hidden units, and one output unit representing
predicted probability of winning.[3]
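
For concreteness, a forward pass through a network of the reported shape (198 inputs, 80 hidden units, one output) might look like the sketch below. The sigmoid activations follow Tesauro's description, while the random weights and helper names are placeholders, not the trained network.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# 198 board-feature inputs -> 80 hidden units -> 1 output (win probability).
W1, b1 = rng.normal(scale=0.1, size=(80, 198)), np.zeros(80)
W2, b2 = rng.normal(scale=0.1, size=(1, 80)), np.zeros(1)

def evaluate(features):
    """Map a 198-dimensional board encoding to a predicted win probability."""
    hidden = sigmoid(W1 @ features + b1)
    return sigmoid(W2 @ hidden + b2)[0]

win_prob = evaluate(rng.random(198))   # placeholder encoding, for illustration only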

Advances in backgammon theory


TD-Gammon's exclusive training through self-play (rather than tutelage) enabled it to explore strategies
that humans previously had not considered or had ruled out erroneously. Its success with unorthodox
strategies had a significant impact on the backgammon community.[2]

For example, on the opening play, the conventional wisdom was that given a roll of 2-1, 4-1, or 5-1,
White should move a single checker from point 6 to point 5. Known as "slotting", this technique trades
the risk of a hit for the opportunity to develop an aggressive position. TD-Gammon found that the more
conservative play of 24-23 was superior. Tournament players began experimenting with TD-Gammon's
move, and found success. Within a few years, slotting had disappeared from tournament play, though in
2006 it made a reappearance for 2-1.[4]

Backgammon expert Kit Woolsey found that TD-Gammon's positional judgement, especially its weighing
of risk against safety, was superior to his own or any human's.[2]

TD-Gammon's excellent positional play was undercut by occasional poor endgame play. The endgame
requires a more analytical approach, sometimes with extensive lookahead. TD-Gammon's limitation to
two-ply lookahead put a ceiling on what it could achieve in this part of the game. Its
strengths and weaknesses were the opposite of those of symbolic artificial intelligence programs and of
most computer software in general: it was good at matters that require an intuitive "feel" but bad at
systematic analysis.

See also

World Backgammon Federation

References
1. Sammut, Claude; Webb, Geoffrey I., eds. (2010), "TD-Gammon", Encyclopedia of Machine Learning,
Boston, MA: Springer US, pp. 955–956, doi:10.1007/978-0-387-30164-8_813,
ISBN 978-0-387-30164-8, retrieved 2023-12-25.
2. Tesauro (1995)
3. Sutton & Barto (2018), 11.1.
4. "Backgammon: How to Play the Opening Rolls", http://www.bkgm.com/openings.html.

Works cited
Sutton, Richard S.; Barto, Andrew G. (2018). "11.1 TD-Gammon". Reinforcement Learning: An
Introduction (2nd ed.). Cambridge, MA: MIT Press. http://www.incompleteideas.net/book/11/node2.html
Tesauro, Gerald (March 1995). "Temporal Difference Learning and TD-Gammon". Communications of
the ACM. 38 (3): 58–68. doi:10.1145/203330.203343. http://www.bkgm.com/articles/tesauro/tdl.html

External links
TD-Gammon (https://researcher.watson.ibm.com/researcher/view_page.php?id=6853) at
IBM
TD-Gammon (https://github.com/dellalibera/td-gammon) on GitHub
