This article is from the Backgammon FAQ, by Mark Damish damish@ll.mit.edu with numerous contributions by others.
From wbitting@crl.com Sat Sep 16 20:40:09 PDT 1995
Article: 9552 of rec.games.backgammon
From: wbitting@crl.com (William C. Bitting)
Newsgroups: rec.games.backgammon
Subject: TD-Gammon & IBM Family FunPak
Date: 14 Sep 1995 22:41:22 -0700
Excerpted from:
OS/2 Warp Monthly Newsletter September 95 (starting at p86
of 176, ascii version)
by Jeri Dube
(This section is presented out of sequence; in the original article it comes last.)
Although playing backgammon on a computer that plays as well as a
world class master seems somewhat awe-inspiring, you can work up to
it. The game comes with five skill settings, where each higher setting
uses an increasingly larger and more complex neural network as its
underlying engine. If you want to use TD-Gammon to improve your
backgammon skills, it is quite good as a learning device. Not only do
you get feedback from the results of your playing but the system is
quite supportive of you. It gives a modest `I win' message when you
lose and a hearty `Congratulations, you win!' when the computer
loses.
To embody this expert backgammon-playing neural network into an OS/2
game, IBM Research hired Keith Weiner, a professional PC game
developer, to add a front end written for OS/2's Presentation Manager.
TD-Gammon is fully 32-bit and takes full advantage of OS/2 Warp's
multi-threading capabilities. Like all Presentation Manager programs,
TD-Gammon comes with a settings notebook where you can set things
such as the background color and the animation speed.
Given the success of the TD-Gammon game, I asked Gerry what his next
neural network game would be. He told me that researchers have used
other games such as Chess, Othello, and Go with varying degrees of
success to study neural network learning. None have been as successful
as backgammon. Gerry theorizes that the stochastic element of
backgammon (i.e. throwing the dice) is what makes backgammon so useful
in modeling the self-learning process. With that in mind, Gerry's next
venture into self-learning is with financial time series analysis. If
that project is as successful at learning as the backgammon game, then
I'm really looking forward to that program.
For more information on Gerry's work, you may want to read his article
``Temporal Difference Learning and TD-Gammon'' published in
Communications of the ACM, volume 38, number 3, pp. 58-68 (March
1995).
(The newsletter article starts here and ends with the above 4
paragraphs.)
When most people think of IBM Research, they tend to think of
fractals, scanning-tunneling-electron microscopes, or high
temperature superconductivity. Games are not usually one of the
thoughts that come to mind. However, the TD-Gammon game included in
the IBM Family FunPak for OS/2 Warp was developed by IBM Research.
By virtue of being created at such an auspicious place, you would
think that this version of backgammon is quite special. Well, to be
quite honest and not so humble, it is! TD-Gammon is the most advanced
computer version of backgammon. It can play at the most advanced
levels. If the system were a human, it would be rated as a World Class
Master.
TD-Gammon was developed by IBM Research Staff Member, Gerry Tesauro.
Gerry is not a game developer; rather, he is a theoretical physicist
who has been working in the area of neural networks and artificial
intelligence for several years. He did not initially intend to develop
an OS/2 game for the Family FunPak. All he wanted to develop was a
basic research project to study learning algorithms that would enable
a computer to teach itself a task.
Gerry chose backgammon as the task because it appeared to be a good
domain in which a neural network might work well. At this point you
may be wondering, now that I've mentioned it twice: what is a neural
network? Well, in short, it's a model of interconnected neurons (also
known as nodes) that was inspired by the biological neurons in the human
nervous system. Each connection between neurons has a particular
weight value associated with it.
In the case of backgammon, the state of the backgammon board is fed
into input neurons that have connections to hidden neurons (or units).
These hidden neurons in turn connect to an output layer that holds the
value of the state (that is, the chances of winning from that
particular state). The computation between the input neurons and the
hidden neurons is a weighted linear summation of all the input
neurons. The result of the summation is put through a thresholding
function. This function compresses the value to lie within a certain
range of probabilities. (In case it ever comes up in conversation, the
function is known as a squashing function.) The squashing function is
a non-linear function. The non-linearity allows a system to learn
more complex functions.
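
As a rough illustration of the computation just described, here is a
minimal Python sketch of a network with one hidden layer. It is not
TD-Gammon's code. The layer sizes, the random weights, the made-up
board encoding, and the names squash and evaluate are all assumptions
chosen for the example, and applying the squashing function to the
output unit as well is an assumption the article does not spell out.

```python
import math
import random

def squash(x):
    """Logistic squashing function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def evaluate(board, w_hidden, w_out):
    """Estimate the chance of winning from an encoded board state."""
    # Each hidden unit takes a weighted linear sum of all the inputs,
    # then passes that sum through the squashing function.
    hidden = [squash(sum(w * x for w, x in zip(row, board)))
              for row in w_hidden]
    # The output unit does the same over the hidden activations
    # (an assumption made for this sketch).
    return squash(sum(w * h for w, h in zip(w_out, hidden)))

random.seed(0)
N_INPUTS, N_HIDDEN = 8, 4  # toy sizes, for illustration only

# Random initial weights, one row of input weights per hidden unit.
w_hidden = [[random.uniform(-0.5, 0.5) for _ in range(N_INPUTS)]
            for _ in range(N_HIDDEN)]
w_out = [random.uniform(-0.5, 0.5) for _ in range(N_HIDDEN)]

board = [1, 0, 0, 1, 0, 1, 0, 0]  # hypothetical board encoding
win_chance = evaluate(board, w_hidden, w_out)
print(win_chance)
```

Because the last step is a squashing function, the printed value
always lies strictly between 0 and 1, so it can be read as a
probability.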
To use this model to teach a system backgammon, all the initial
weights between the neurons are randomly set. The neural network
starts from the opening backgammon position and plays both sides until
one of the sides wins. The outcome of the game is used as a reward
signal for reinforcement learning. That is, the neural network takes
the outcome of the game and adjusts the weights accordingly. The
adjustments improve the network's ability to evaluate board states for
subsequent plays of the game.
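
The weight adjustment can be sketched in Python along the lines of the
TD(lambda) algorithm that Tesauro's abstract (later in this FAQ) names
as the basis of TD-Gammon. This is a simplified sketch under stated
assumptions, not the program's actual code: it uses a single-layer
evaluator instead of a network with hidden units, it leaves out move
selection and self-play entirely, and the feature vectors, the
learning rate alpha, and the decay lam are made-up illustrative
values.

```python
import math

def squash(x):
    """Logistic squashing function: maps any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))

def predict(weights, features):
    """Predicted chance of winning for one encoded position."""
    return squash(sum(w * f for w, f in zip(weights, features)))

def td_lambda_episode(weights, positions, outcome, alpha=0.1, lam=0.7):
    """Adjust weights in place after one finished game.

    positions: feature vectors for the successive positions of the game
    outcome:   1.0 if the side being evaluated won, 0.0 if it lost
    """
    traces = [0.0] * len(weights)  # eligibility traces, one per weight
    for t, features in enumerate(positions):
        y = predict(weights, features)
        # Gradient of squash(w . x) with respect to w_i is y*(1-y)*x_i.
        grad = [y * (1.0 - y) * f for f in features]
        # Decay the old traces and add the current gradient.
        traces = [lam * e + g for e, g in zip(traces, grad)]
        # The target is the next prediction, or the real outcome
        # once the game is over.
        if t + 1 < len(positions):
            target = predict(weights, positions[t + 1])
        else:
            target = outcome
        delta = target - y
        for i in range(len(weights)):
            weights[i] += alpha * delta * traces[i]
    return weights

# Hypothetical three-position game that ends in a win.
weights = [0.0, 0.0]
game = [[1.0, 0.0], [1.0, 0.5], [1.0, 1.0]]
before = predict(weights, game[-1])
for _ in range(500):
    td_lambda_episode(weights, game, outcome=1.0)
after = predict(weights, game[-1])
print(before, after)
```

In this toy run every game ends in a win, so the predicted chance of
winning from the final position rises from its initial 0.5 toward 1.
In real self-play the outcomes vary, and the same update pushes the
prediction for each position toward the results that tend to follow
it.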
This learning process is repeated hundreds of thousands of times.
Using an RS/6000 computer, the learning actually took about two weeks.
Gerry and his colleagues were amazed at how well the neural network
learned to play backgammon. The system kept getting better and better
until it reached the world class master status. Actually, the neural
network could improve its play even more with further training and a
larger network.
TD-Gammon is available on the new IBM Family FunPak for OS/2. The
FunPak may be purchased from Indelible Blue [add url] or from a number
of other mail order software houses.
From: jiml@teleport.com (Jim Little)
Newsgroups: comp.os.os2.games,rec.games.backgammon
Subject: TD-Gammon available for free download
Date: 9 Nov 1995 23:52:45 -0800
IBM has made TD-Gammon, their supposedly groundbreaking neural
network-based version of Backgammon, available for free download. It
seems to be part of an attempt to promote their IBM Family FunPak.
You can get it by surfing to
http://www.austin.ibm.com/pspinfo/funtdgammon.htm and following the
"Read the license information" link. You will have to fill out a form
with your name, address, etc. (But nothing forces you to enter valid
information. ;) )
From their web page: "TD Gammon requires OS/2 2.1 or higher, an Intel
386-SX or higher, with Advanced and Expert levels requiring a 486-DX
33MHz or higher, and a minimum of 6 meg of memory is recommended."
-Jim Little (jiml@teleport.com)
[md] The original article is available at:
http://www.austin.ibm.com/psinfo/m4bakgam.htm
_________________________________________________________________
Newsgroups: rec.games.backgammon
From: tesauro@watson.ibm.com (Gerry Tesauro)
Subject: TD-Gammon paper available by FTP
Sender: Gerald Tesauro (tesauro@watson.ibm.com)
Date: Wed, 2 Jun 1993 18:06:35 GMT
Disclaimer: This posting represents the poster's views, not necessarily
those of IBM.
The following paper, which has been accepted for publication in Neural
Computation, has been placed in the neuroprose archive at Ohio State.
Instructions for retrieving the paper by anonymous ftp are appended
below.
---------------------------------------------------------------
TD-Gammon, A Self-Teaching Backgammon Program,
Achieves Master-Level Play
Gerald Tesauro
IBM Thomas J. Watson Research Center
P. O. Box 704
Yorktown Heights, NY 10598
(tesauro@watson.ibm.com)
Abstract:
TD-Gammon is a neural network that is able to teach itself to play
backgammon solely by playing against itself and learning from the
results, based on the TD(lambda) reinforcement learning algorithm
(Sutton, 1988). Despite starting from random initial weights (and
hence random initial strategy), TD-Gammon achieves a surprisingly
strong level of play. With zero knowledge built in at the start of
learning (i.e. given only a ``raw'' description of the board state),
the network learns to play at a strong intermediate level.
Furthermore, when a set of hand-crafted features is added to the
network's input representation, the result is a truly staggering level
of performance: the latest version of TD-Gammon is now estimated to
play at a strong master level that is extremely close to the world's
best human players.
---------------------------------------------------------------
FTP INSTRUCTIONS
unix% ftp archive.cis.ohio-state.edu (or 128.146.8.52)
Name: anonymous
Password: (use your e-mail address)
ftp> cd pub/neuroprose
ftp> binary
ftp> get tesauro.tdgammon.ps.Z
ftp> bye
unix% uncompress tesauro.tdgammon.ps
unix% lpr tesauro.tdgammon.ps
For a list of articles written by Gerry Tesauro, check out: A List of
Backgammon Articles in Science and Business
 