
TD-Gammon

TD-Gammon is a computer backgammon program developed in 1992 by Gerald Tesauro at IBM's
Thomas J. Watson Research Center. Its name comes from the fact that it is an artificial neural net trained
by a form of temporal-difference learning, specifically TD-lambda.

The final version of TD-Gammon (2.1) was trained with 1.5 million games of self-play, and achieved a
level of play just slightly below that of the top human backgammon players of the time. It explored
strategies that humans had not pursued and led to advances in the theory of correct backgammon play.

In 1998, the world champion defeated it in a 100-game series by a margin of only 8 points. Its
unconventional assessment of some opening strategies has since been accepted and adopted by expert
players.[1]

Algorithm for play and learning


During play, TD-Gammon examines on each turn all possible legal moves and all their possible responses
(two-ply lookahead), feeds each resulting board position into its evaluation function, and chooses the
move that leads to the board position with the highest score. In this respect, TD-Gammon is no
different from almost any other computer board-game program. Its innovation was in how it learned its
evaluation function.
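
As a rough illustration of that selection step (not Tesauro's code), the loop below picks the legal move whose resulting position the network scores highest. The helpers legal_moves, apply_move, and evaluate are hypothetical stand-ins, and only the one-ply greedy choice is shown; TD-Gammon's actual search also expanded the opponent's replies for two ply.

def choose_move(board, dice, player, legal_moves, apply_move, evaluate):
    """Return the legal move whose resulting position scores highest."""
    best_move, best_score = None, float("-inf")
    for move in legal_moves(board, dice, player):
        next_board = apply_move(board, move, player)
        # Network's estimate of the expected outcome from this position.
        score = evaluate(next_board, player)
        if score > best_score:
            best_move, best_score = move, score
    return best_move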

TD-Gammon's learning algorithm consists of updating the weights in its neural net after each turn to
reduce the difference between its evaluation of previous turns' board positions and its evaluation of the
present turn's board position—hence "temporal-difference learning". The score of any board position is a
set of four numbers reflecting the program's estimate of the likelihood of each possible game result:
White wins normally, Black wins normally, White wins a gammon, Black wins a gammon. For the final
board position of the game, the algorithm compares with the actual result of the game rather than its own
evaluation of the board position.[2]

After each turn, the learning algorithm updates each weight in the neural net according to the following
rule:

Δw_t = α (Y_{t+1} − Y_t) Σ_{k=1}^{t} λ^{t−k} ∇_w Y_k

where:

Δw_t is the amount to change the weight from its value on the previous turn.
Y_{t+1} − Y_t is the difference between the current and previous turn's board evaluations.
α is a "learning rate" parameter.
λ is a parameter that affects how much the present difference in board evaluations should feed back to
previous estimates. λ = 0 makes the program correct only the previous turn's estimate; λ = 1 makes the
program attempt to correct the estimates on all previous turns; and values of λ between 0 and 1 specify
different rates at which the importance of older estimates should "decay" with time.
∇_w Y_k is the gradient of neural-network output with respect to weights: that is, how
much changing the weight affects the output.[2]
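
This rule is commonly implemented incrementally with an eligibility trace that accumulates the λ-decayed gradients, so each turn's temporal-difference error can be applied in a single pass. The sketch below shows that formulation under illustrative names; it is not Tesauro's implementation.

def td_lambda_update(weights, trace, grad_prev, y_prev, y_curr, alpha, lam):
    """One TD(lambda) step.

    weights   -- current network weights (array)
    trace     -- eligibility trace, same shape as weights
    grad_prev -- gradient of the previous turn's evaluation Y_t w.r.t. the weights
    y_prev    -- previous turn's evaluation Y_t
    y_curr    -- current turn's evaluation Y_{t+1} (the actual result at game end)
    """
    delta = y_curr - y_prev              # temporal-difference error
    trace = lam * trace + grad_prev      # lambda-decayed sum of past gradients
    weights = weights + alpha * delta * trace
    return weights, trace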

Experiments and stages of training


Unlike previous neural-net backgammon programs such as Neurogammon (also written by Tesauro),
where an expert trained the program by supplying the "correct" evaluation of each position, TD-Gammon
was at first programmed "knowledge-free".[2] In early experimentation, using only a raw board encoding
with no human-designed features, TD-Gammon reached a level of play comparable to Neurogammon:
that of an intermediate-level human backgammon player.

Even though TD-Gammon discovered insightful features on its own, Tesauro wondered if its play could
be improved by using hand-designed features like Neurogammon's. Indeed, the self-training TD-
Gammon with expert-designed features soon surpassed all previous computer backgammon programs. It
stopped improving after about 1,500,000 games (self-play) using a three-layered neural network, with
198 input units encoding expert-designed features, 80 hidden units, and one output unit representing
predicted probability of winning.[3]
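
For concreteness, a forward pass through a network of the reported shape (198 inputs, 80 hidden units, one output) might look like the sketch below. The sigmoid activations follow Tesauro's description, while the random weights and helper names are placeholders, not the trained network.

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
# 198 board-feature inputs -> 80 hidden units -> 1 output (win probability).
W1, b1 = rng.normal(scale=0.1, size=(80, 198)), np.zeros(80)
W2, b2 = rng.normal(scale=0.1, size=(1, 80)), np.zeros(1)

def evaluate(features):
    """Map a 198-dimensional board encoding to a predicted win probability."""
    hidden = sigmoid(W1 @ features + b1)
    return sigmoid(W2 @ hidden + b2)[0]

win_prob = evaluate(rng.random(198))   # placeholder encoding, for illustration only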

Advances in backgammon theory


TD-Gammon's exclusive training through self-play (rather than tutelage) enabled it to explore strategies
that humans previously had not considered or had ruled out erroneously. Its success with unorthodox
strategies had a significant impact on the backgammon community.[2]

For example, on the opening play, the conventional wisdom was that given a roll of 2-1, 4-1, or 5-1,
White should move a single checker from point 6 to point 5. Known as "slotting", this technique trades
the risk of a hit for the opportunity to develop an aggressive position. TD-Gammon found that the more
conservative play of 24-23 was superior. Tournament players began experimenting with TD-Gammon's
move, and found success. Within a few years, slotting had disappeared from tournament play, though in
2006 it made a reappearance for 2-1.[4]

Backgammon expert Kit Woolsey found that TD-Gammon's positional judgement, especially its weighing
of risk against safety, was superior to his own or any human's.[2]

TD-Gammon's excellent positional play was undercut by occasional poor endgame play. The endgame
requires a more analytical approach, sometimes with extensive lookahead. TD-Gammon's limitation to
two-ply lookahead put a ceiling on what it could achieve in this part of the game. Its
strengths and weaknesses were the opposite of those of symbolic artificial intelligence programs and of
most computer software in general: it was good at matters that require an intuitive "feel" but bad at
systematic analysis.

See also

World Backgammon Federation

References
1. Sammut, Claude; Webb, Geoffrey I., eds. (2010), "TD-Gammon", Encyclopedia of Machine Learning,
Boston, MA: Springer US, pp. 955–956, doi:10.1007/978-0-387-30164-8_813,
ISBN 978-0-387-30164-8, retrieved 2023-12-25.
2. Tesauro (1995)
3. Sutton & Barto (2018), 11.1.
4. "Backgammon: How to Play the Opening Rolls", http://www.bkgm.com/openings.html.

Works cited
Sutton, Richard S.; Barto, Andrew G. (2018). "11.1 TD-Gammon". Reinforcement Learning: An
Introduction (2nd ed.). Cambridge, MA: MIT Press. http://www.incompleteideas.net/book/11/node2.html
Tesauro, Gerald (March 1995). "Temporal Difference Learning and TD-Gammon". Communications of
the ACM. 38 (3): 58–68. doi:10.1145/203330.203343. http://www.bkgm.com/articles/tesauro/tdl.html

External links
TD-Gammon (https://researcher.watson.ibm.com/researcher/view_page.php?id=6853) at
IBM
TD-Gammon (https://github.com/dellalibera/td-gammon) on GitHub
