Games like chess and checkers (droughts), and even the lowly tic-tac-toe (naughts and crosses) have a similar structure. There is board and game play alternates between two players who decide on (what they perceive) is their best move and take it. To do this they must each evaluate the board and find a move that makes the evaluation more favorable for them. There is no rolling of dice or other element of chance. Each game has rules for the possible moves and rules for when the game ends along with who is the winner (if either).
Consider the following snapshot (what we'll call a game-node) of a chess game. These are found once a week in our local newspaper. It is white's turn to play and white can checkmate black in two moves (with just one move in between for black).
In evaluating the board manually you can see that white is already in a vastly superior position. Two bishops, a rook and pawn in just additional material strength. White also has an enormous number of possible moves. The rook alone can make 7 different moves. Most of these moves however would not benefit white and would benefit the black king, who has only two moves. It is very confined and in danger of being completely trapped.
You might want to say that there is no way for black to win anyway so why not simply resign. But to win white must put the black king in check from which there is no escape. If the black king can manover to a postion where is it not in check but cannot move without then being in check, then the game ends in a draw. White needs to avoid this.
Lookahead is the process of considering a possible move, then considering possible responses, and perhaps for each of these even your own counter responses. Lookahead can be carried to a depth that either ends the game or exhausts your resources. But the whole point is to simply find that best first move.
This minimax algorithm can be applied to the chess problem above. The computer creates a game node and takes an initial evaluation. It will then compute which moves are possible. It will eventually choose the best of these moves (with the maximum value for white).
Then, the process repeats for each of the dozens of child game nodes. The computer will choose the worst of these (for white) since black is playing and best for black is worst for white. So here, we minimize the value for white.
Finally, the grandchild game nodes are generated and evaluated and at this point the solution to the puzzle above pops out as the maximum for white. And many thousands of game nodes will be generate, connected in a tree. In this example the lookahead search is three levels deep.
This is not quite accurate. What I have described is a breadth first search but actually a depth first is done which makes possible an alpha-beta cutoff that we'll talk about later. The use of a maximum search depth makes this practical.
Once the maximum found, its value is propagated back up the tree to parent and grandparent. Back at the original game node, this best move is chosen for actual play. Then the opponent moves and unless it is the one we also calculated as his or her best, we must start the process all over again. Actually, we always do.
It used to be thought that a computer playing chess simply calculated lookahead to the end of the game but, as we can easily see above, that is just not possible.
To keep things so that we can easily watch what happens we will look at two much simpler games. The first is the classic childs game of tic-tac-toe which I assume everyone knows. If not google. The second is Kalah. I found it (along with SIR) in Shapiro's book "Techniques of Artificial Intelligence'. It is commerically available as a nice board game and goes by the name Mancala. I completely recoded the algorithm in object-oriented Python and used a common function to handle the minimax (later with alpha-beta cuttoff) for any game node instance with the expected set of functions.
In both of the games the number of different moves that can be made from a game node is limited (maximum of 9 for tic-tac-toe and 6 for Kalah).
As the first of two concrete examples, we shall look at game of tic-tac-toe.
Game nodes will be instances of the class TicTacToe which holds all information about the node including its path to an optimal value.
The attributes of each game node are as follows. For now ignore the attributes alpha and beta.
- player: Either X or O
- board: A nine character string, each characteris "X", "O" or "."
- depth: of this node from the start of the minimax search
- ivalue: the initial value assigned to this game-node (evaluation)
- value: the optimal value (so far) returned from successors
- next: pointer to the chosen child node
- bestmove: the one that generated "next"
- tag: an id tag to track how many game-nodes are produced.
The methods of the TicTacToe class are
- legalMoves - set moves attribute to tuple of legal moves from this node
- eval: set value and ivalue to an integer representing value to X
- over: return boolean if this node is an end point (win/loss/draw)
- move: take the move, modify the board, switch players, reevaluate, recalc moves
- maximizing: return boolean if this is a maximizing node (else minimizing)
- printMe: print a representation of the node to the screen
Let's play with this a bit
>>> from ticTacToe import TicTacToe
>>> gnode = TicTacToe('|X..|..O|...|','X')
>>> gnode.board
'X....O...'
>>> gnode.player
'X'
>>> gnode.moves
(1, 2, 3, 4, 6, 7, 8)
>>> gnode.printMe()
|X..| tag=1 depth=0 X to play
|..O| moves (1, 2, 3, 4, 6, 7, 8)
|...| ival=0 fval=0 A=-9999 B=9999
So, we have a game node gnode whose tag is 1. At depth 0 with X to play. Squares to play are numbered 0-8 and all moves except 0 and 5 are available. The initial value and final value are 0 indicating no advantage to either X or O. Ignore the A and B values for now.
Let's make a move and see how the game node changes.
>>> gnode.move(2)
>>> gnode.printMe()
|X.X| tag=1 depth=0 O to play
|..O| moves (1, 3, 4, 6, 7, 8)
|...| ival=30 fval=30 A=-9999 B=9999
Now the node value is 30 (X's favor) since it has one possible win. At this point this constrains O's response to square one. Let's do that now.
>>> gnode.move(1)
>>> gnode.printMe()
|XOX| tag=1 depth=0 X to play
|..O| moves (3, 4, 6, 7, 8)
|...| ival=0 fval=0 A=-9999 B=9999
That plugged the hole but now X has a killer move.
>>> gnode.move(6)
>>> gnode.printMe()
|XOX| tag=1 depth=0 O to play
|..O| moves (3, 4, 7, 8)
|X..| ival=60 fval=60 A=-9999 B=9999
Now O can make either defensive move bringing the ival back to 30, but it can't make them both. And X, of course will win with the following move.
Notice that the ival is useful but not actually any exact figure. The eval method is really just a heuristic to guide the search toward a win or to prevent the opponent from winning.
The minimax algorithm itself is not part of the TicTacToe class but is in a special function chooseMove found here Source for minimax.py .
After the walk through above with the chess example the code of the chooseMove function should be easy to follow. A NodeClass to generate new game nodes, a starting node and a maximum depth to search are passed in. From the starting node children nodes are generated for each possible move and then, through the recursive call, taken down to the depth allowed. Unless we arrive at an end node which is simply passed through
Let's walk through an example the same situation we did by hand above.
>>> from minimax import chooseMove
>>> from ticTacToe import TicTacToe
>>> gnode = TicTacToe('|X..|..O|...|','X')
>>> gnode = chooseMove(TicTacToe, gnode, maxDepth=12)
>>> gnode.printMe()
|X..| tag=1 depth=0 X to play
|..O| moves (1, 2, 3, 4, 6, 7, 8)
|...| ival=0 fval=100 A=-9999 B=9999
Next move is 2 giving us...
|X.X| tag=797 depth=1 O to play
|..O| moves (1, 3, 4, 6, 7, 8)
|...| ival=30 fval=100 A=-9999 B=9999
Next move is 1 giving us...
|XOX| tag=798 depth=2 X to play
|..O| moves (3, 4, 6, 7, 8)
|...| ival=0 fval=100 A=-9999 B=9999
Next move is 4 giving us...
|XOX| tag=850 depth=3 O to play
|.XO| moves (3, 6, 7, 8)
|...| ival=60 fval=100 A=-9999 B=9999
Next move is 3 giving us...
|XOX| tag=851 depth=4 X to play
|OXO| moves (6, 7, 8)
|...| ival=60 fval=100 A=-9999 B=9999
Next move is 6 giving us...
|XOX| tag=852 depth=5 O to play
|OXO| moves (7, 8)
|X..| ival=100 fval=100 A=-9999 B=9999
So, the solution found differs at the end from the one we worked out manually. But it's just as effective. Notice how the ival (initial value) field changes as we go down the tree. But fval (final value) is propagated back up the tree and even our starting node is recognized as a "winner".
We can do these runs from the command line with the advantage that it's very easy to get timing or the runs and to test with and without the alpha-beta enhancement which is coming right up. Running ticTacToe.py as a script invokes the timeSearch function at the bottom. This gathers arguments from the command line for the board, player, alphabeta and depth. It makes it easy to use command line editing to play various games. Source for timeSearch.py
$ python ticTacToe.py '|X..|..O|...|','X' 'X' 'N' 12
Find optimal move: AlphaBeta=False Depth=12
Initial game configuration is
|X..| tag=1 depth=0 X to play
|..O| moves (1, 2, 3, 4, 6, 7, 8)
|...| ival=0 fval=100 A=-9999 B=9999
Next move is 2 giving us...
|X.X| tag=797 depth=1 O to play
|..O| moves (1, 3, 4, 6, 7, 8)
|...| ival=30 fval=100 A=-9999 B=9999
Next move is 1 giving us...
|XOX| tag=798 depth=2 X to play
|..O| moves (3, 4, 6, 7, 8)
|...| ival=0 fval=100 A=-9999 B=9999
Next move is 4 giving us...
|XOX| tag=850 depth=3 O to play
|.XO| moves (3, 6, 7, 8)
|...| ival=60 fval=100 A=-9999 B=9999
Next move is 3 giving us...
|XOX| tag=851 depth=4 X to play
|OXO| moves (6, 7, 8)
|...| ival=60 fval=100 A=-9999 B=9999
Next move is 6 giving us...
|XOX| tag=852 depth=5 O to play
|OXO| moves (7, 8)
|X..| ival=100 fval=100 A=-9999 B=9999
Time: 0.258681774139 sec
This section is a little tricky. Each node carrys an alpha and a beta that it inherits from above. Alpha is a "best so far" for "X" and if things are going well it will be a positive number. Beta plays the same role for "O" but with the opposite sign.
As nodes for "X" are expanded the alpha values can only increase as we progress down the tree. And beta values can only decrease.
If, when expanding a node for X, its alpha value is greater than the beta value of its ancestors, then this node is bad news for 'O'. 'O' already has a better choices and will NOT choose this path. And that means we can abandon further expansion of this node for 'X' and simply return the value found. And symetrically, the same applys to expansion of 'O' nodes but with things flipped. See the code in the alphaBeta.chooseMove function.
This saves a lot of unneeded computation. Here is the same run above with alpha-beta turned on. The 'Y' in the second to last argument makes testGame import the function chooseMove from alphaBeta.py. Compare it with the same function in minimax.py and you'll see just a few extra lines tracking the alpha and beta values and using them to trim needless expansions.
$ python ticTacToe.py '|X..|..O|...|' X Y 12
Find optimal move: AlphaBeta=True Depth=12
Initial game configuration is
|X..| tag=1 depth=0 X to play
|..O| moves (1, 2, 3, 4, 6, 7, 8)
|...| ival=0 fval=100 A=-9999 B=9999
Next move is 2 giving us...
|X.X| tag=99 depth=1 O to play
|..O| moves (1, 3, 4, 6, 7, 8)
|...| ival=30 fval=100 A=-100 B=9999
Next move is 1 giving us...
|XOX| tag=100 depth=2 X to play
|..O| moves (3, 4, 6, 7, 8)
|...| ival=0 fval=100 A=-100 B=9999
Next move is 4 giving us...
|XOX| tag=134 depth=3 O to play
|.XO| moves (3, 6, 7, 8)
|...| ival=60 fval=100 A=0 B=9999
Next move is 3 giving us...
|XOX| tag=135 depth=4 X to play
|OXO| moves (6, 7, 8)
|...| ival=60 fval=100 A=0 B=9999
Next move is 6 giving us...
|XOX| tag=136 depth=5 O to play
|OXO| moves (7, 8)
|X..| ival=100 fval=100 A=0 B=9999
Time: 0.0150029659271 sec
This run took 1/40th the time and generated only about 1/6 the nodes.
You can play with this on the command line but the next game is much more interesting.
Like tic-tac-toe Kalah has limited moves for each player but is much more interesting to play. At the start of the game the board might look like
A 3 3 3 3 3 3
0 0
B 3 3 3 3 3 3
Here there are 14 pots. The two with 0 are the "Kalahs". The A pots are the top row and the B pots are the bottom row. In the middle row the A Kalah is on the far left and the B Kalah is on the far right.
One player plays A and the player other B. Each move consists of picking up the pebbles from one of the players 6 pots and distributing them counter-clockwise one per pot including the players Kalah but skipping the opponents Kalah. The winner is the player to first get a majority of pebbles into his own Kalah. There are some complications.
If the last pebble from the pot is dropped into the players Kalah, the player plays again. If the last pebble dropped is into one of the players pots that was empty, then that pebble and all the pebbles in the pot directly across go into the players Kalah.
Finally, if when the player has dropped the last pebble, his 6 pots are empty, then all of the pebbles in the 6 pots of the opponent are gathered and dropped into the players Kalah. That pretty much will end the game.
The Kalah class definition in kalah.py is quite similar to the TicTacToe class. The same attribute names are used in the same way by the chooseMove function. The differences are
- a board of a game node is represented by a 14 element list of integers. It works like a circular list.
- The move method is more complex do to handle the complexities of the moves, including the case where a player gets a second move.
- The legalMoves functions is simple as just non-empty pots for a player need to be found.
- The eval method uses a simple heuristic, either won/lost or how far ahead A is in filling his kalah.
- The printMe method of course is quite different
Let's play with this a bit as we did with TicTacToe
>>> from kalah import Kalah
>>> gnode = Kalah([0,4,4,4,4,4,4,0,4,4,4,4,4,4],"A")
>>> gnode.printMe("starting")
==============
starting
==============
Node=1 Stones=48 Depth=0
A to play. ival=0 value=0 depth=0 moves=(8, 9, 10, 11, 12, 13)
A 4 4 4 4 4 4
0 0
B 4 4 4 4 4 4
>>> gnode.move(10)
>>> gnode.printMe("Last pebble to kalah. A moves again")
==============
Last pebble to kalah. A moves again
==============
Node=1 Stones=48 Depth=0
A to play. ival=1 value=1 depth=0 moves=(8, 9, 11, 12, 13)
A 5 5 5 0 4 4
1 0
B 4 4 4 4 4 4
>>> gnode.move(13)
>>> gnode.printMe("Made 13 empty for later capture")
==============
Made 13 empty for a later capture
==============
Node=1 Stones=48 Depth=0
B to play. ival=2 value=2 depth=0 moves=(1, 2, 3, 4, 5, 6)
A 0 5 5 0 4 4
2 0
B 5 5 5 5 4 4
>>> gnode.move(2)
>>> gnode.printMe("B plays 2 to get a repeat move")
==============
B plays 2 to get a repeat move
==============
Node=1 Stones=48 Depth=0
B to play. ival=1 value=1 depth=0 moves=(1, 3, 4, 5, 6)
A 0 5 5 0 4 4
2 1
B 5 0 6 6 5 5
>>> gnode.move(3)
>>> gnode.printMe("B just made a dumb move")
==============
B just made a dumb move
==============
Node=1 Stones=48 Depth=0
A to play. ival=0 value=0 depth=0 moves=(8, 9, 11, 12)
A 0 5 5 0 5 5
2 2
B 5 0 0 7 6 6
>>> gnode.move(8)
>>> gnode.printMe("place last pebble in empty 13. 6 more to kalah")
==============
place last pebble in empty 13. 6 more to kalah
==============
Node=1 Stones=48 Depth=0
B to play. ival=6 value=6 depth=0 moves=(4, 5, 6)
A 0 6 6 1 6 0
8 2
B 0 0 0 7 6 6
>>>
And so on. That was more to test the correctness of the move method and to give you a flavor of how the game is played. Notice too how the ival changed as pebbles dropped into both kalahs.
The module kalah.py can be run as a script just like ticTacToe.py and when you do that it will run the function test which gathers arguments from the command line and calls timeSearch which does a lookahead alpha-beta search for the best move. Here is an example.
$ python kalah.py "[0,3,3,3,3,3,3,0,3,3,3,3,3,3]" A 4
Find optimal move: AlphaBeta=True Depth=4
==============
Initial game configuration is
==============
Node=1 Stones=36 Depth=0
A to play. ival=0 value=1 depth=0 moves=(8, 9, 10, 11, 12, 13)
A 3 3 3 3 3 3
0 0
B 3 3 3 3 3 3
==============
Next move is 12 giving us...
==============
Node=316 Stones=36 Depth=1
B to play. ival=1 value=1 depth=1 moves=(1, 2, 3, 4, 5, 6)
A 4 0 3 3 3 3
1 0
B 4 3 3 3 3 3
==============
Next move is 4 giving us...
==============
Node=368 Stones=36 Depth=2
B to play. ival=0 value=1 depth=2 moves=(1, 2, 3, 5, 6)
A 4 0 3 3 3 3
1 1
B 4 3 3 0 4 4
==============
Next move is 2 giving us...
==============
Node=372 Stones=36 Depth=3
A to play. ival=0 value=1 depth=3 moves=(8, 9, 10, 11, 13)
A 4 0 3 3 3 3
1 1
B 4 0 4 1 5 4
==============
Next move is 11 giving us...
==============
Node=376 Stones=36 Depth=4
A to play. ival=1 value=1 depth=4 moves=(8, 9, 10, 12, 13)
A 5 1 0 3 3 3
2 1
B 4 0 4 1 5 4
Time: 0.0099470615387 sec
The module playKalah.py will have the computer play A and query the user for moves for B. The code is quite straightforward. Here is a sample run
$ python playKalah.py "[0,3,3,3,3,3,3,0,3,3,3,3,3,3]" A 7
==============
Game start
==============
Node=1 Stones=36 Depth=0
A to play. ival=0 value=0 depth=0 moves=(8, 9, 10, 11, 12, 13)
A 3 3 3 3 3 3
0 0
B 3 3 3 3 3 3
Computer will play move 13
Hit return to continue:
==============
Board is now
==============
Node=1 Stones=36 Depth=0
B to play. ival=1 value=1 depth=0 moves=(1, 2, 3, 4, 5, 6)
A 0 3 3 3 3 3
1 0
B 4 4 3 3 3 3
Your moves are (1, 2, 3, 4, 5, 6)
Choose a move for B (1, 2, 3, 4, 5, 6):2
==============
Board is now
==============
Node=1 Stones=36 Depth=0
A to play. ival=1 value=1 depth=0 moves=(8, 9, 10, 11, 12)
A 0 3 3 3 3 3
1 0
B 4 0 4 4 4 4
Computer will play move 11
Hit return to continue:
==============
Board is now
==============
Node=1 Stones=36 Depth=0
A to play. ival=2 value=2 depth=0 moves=(8, 9, 10, 12, 13)
A 1 4 0 3 3 3
2 0
B 4 0 4 4 4 4
Computer will play move 8
Hit return to continue:
==============
Board is now
==============
Node=1 Stones=36 Depth=0
B to play. ival=7 value=7 depth=0 moves=(1, 4, 5, 6)
A 1 4 0 4 4 0
7 0
B 4 0 0 4 4 4
Your moves are (1, 4, 5, 6)
Choose a move for B (1, 4, 5, 6):
It seems the computer is off to a flying start. Notice we set the maximum search depth to 7. Try 4 or so for a more relaxing game.
All of the code files can be found in minimax.zip
If you have comments or suggestions You can email me at mail me
Copyright © 2014-2021 Chris Meyers and Fred Obermann
…