# Algorithmic Squava Players

I have four algorithms in my second attempt at writing algorithmic “players” for the game of Squava:

- Alpha-beta minimaxing
- Alpha-beta minimaxing with a better static evaluation function
- Monte Carlo Tree Search
- Monte Carlo Tree Search with UCT child selection

At this point, all but plain MCTS (#3 above) regularly beat me. How do my programmatic squava players do against each other?

I ran a round-robin tournament to find out which player is best. Four “players”, each one as player 1 and as player 2, to determine if there’s a first-player-advantage. 16 match-ups, 400 games in each match-up, for 6400 games total.

### Most Games Won

Ignoring match-ups where the first and second players are the same, there’s 4800 total games. Here’s who won the most games:

Player | Games won |
---|---|

MCTS/UCB1 | 1648 |

A/B+Avoid | 1506 |

AlphaBeta | 1273 |

MCTS/Plain | 373 |

I agree with this ranking. MCTS/UCB1 is the best player, Alpha-Beta minimaxing with the better static valuation function is almost as good. Plain MCTS is easily the worst of the four players.

I think the Alpha-beta minimaxing players aren’t as good because I haven’t discovered the best static valuation function. Additionally, I limited the number of moves an Alpha-beta player looks ahead at the beginning of the game. There’s so many response moves at the beginning of the game that a large look ahead takes a very long time. For the first 4 moves, I limit the look ahead to 8 further moves. Between 5 and 10 moves into the game, I limit look ahead to 10, and after 10 moves into the game, look ahead limited to 12 further moves. Alpha-beta minimaxing players absolutely do not look ahead to the end of the game until maybe the 8th move of a game. When playing against an Alpha-beta minimaxing player, humans will notice a large increase in computer computation time at the 5th move.

### More nuanced view

First | Second | First wins | Second wins | Mean moves | Median moves | Min | Max | |
---|---|---|---|---|---|---|---|---|

A | A | 189 | 211 | 19.52 | 21 | 13 | 24 | |

A | G | 146 | 254 | 19.48 | 20 | 13 | 24 | |

A | M | 344 | 56 | 14.18 | 13 | 7 | 25 | |

A | U | 207 | 193 | 17.59 | 19 | 9 | 25 | |

G | A | 239 | 161 | 19.13 | 21 | 11 | 24 | |

G | G | 195 | 205 | 19.30 | 21 | 12 | 24 | |

G | M | 358 | 42 | 13.85 | 13 | 7 | 25 | |

G | U | 213 | 187 | 17.63 | 18.5 | 9 | 25 | |

M | A | 93 | 307 | 15.60 | 14 | 8 | 24 | |

M | G | 82 | 318 | 15.15 | 14 | 8 | 24 | |

M | M | 274 | 126 | 12.21 | 11 | 7 | 24 | |

M | U | 79 | 321 | 14.65 | 14 | 8 | 24 | |

U | A | 292 | 108 | 17.77 | 18 | 10 | 24 | |

U | G | 276 | 124 | 18.61 | 19 | 10 | 24 | |

U | M | 379 | 21 | 11.99 | 11 | 7 | 24 | |

U | U | 188 | 212 | 17.34 | 19 | 11 | 23 |

- A: Alpha-beta minimaxing
- G: Alpha-beta minimaxing with a somewhat better static evaluation function
- M: Monte Carlo Tree Search
- U: Monte Carlo Tree Search with UCT child node selection

### Comparing self-competing players

I consider the A-A, G-G and U-U lines in the table above to be 50-50. The M-M line is not Why isn’t the M vs M competition 50:50?

Plain Monte Carlo Tree Search does only a single playout per new child node. It’s common for it to find a single losing playout on a child node that if examined further, would end up as a good move. The MCTS algorithm doesn’t allow further exploration of consequences of a move that it (randomly!) finds a loss for. It ends up picking some bad moves. I think this explains why plain Monte Carlo Tree Search scores the worst in every match-up. Monte Carlo Tree Search with UCT child selection remedies this by giving a higher score (for search) to some moves that have relatively unexplored consequences.

### Does first or second player have an advantage?

Player | wins moving 1st | wins moving 2nd |
---|---|---|

MCTS/UCB1 | 947 | 701 |

A/B+Avoid | 810 | 696 |

AlphaBeta | 697 | 576 |

MCTS/Plain | 254 | 119 |

Looks like maybe the first player has an advantage, although that advantage isn’t overwhelming.

The lines for A-A, G-G, M-A and M-G in the “More Nuanced View” table above could fool you into thinking that the Alpha-beta minimaxing algorithms have a second player advantage. I don’t think that’s true, I believe the scores are an artifact of randomness for the A-A and G-G cases, and the fact that plain MCTS is the worst player for the M-A and M-G cases.

### Do players win outright, or force the other player to lose?

Squava the game has outright wins (getting 4-in-a-row) and outright losses (getting 3-in-a-row, without 4-in-a-row). Do players win outright or force the other player into a loss?

First | Second | First won | Second lost | Second won | First lost |
---|---|---|---|---|---|

U | A | 157 | 135 | 94 | 14 |

U | G | 103 | 173 | 95 | 29 |

U | M | 359 | 20 | 13 | 8 |

G | A | 153 | 86 | 31 | 130 |

G | M | 351 | 7 | 2 | 40 |

G | U | 208 | 5 | 16 | 171 |

A | G | 95 | 51 | 72 | 182 |

A | M | 338 | 6 | 1 | 55 |

A | U | 203 | 4 | 20 | 173 |

M | A | 38 | 55 | 283 | 24 |

M | G | 38 | 44 | 296 | 22 |

M | U | 41 | 38 | 286 | 35 |

Looks like the Alpha-beta minimaxing players tend to win outright, while the MCTS/UCB1 player mixes up wins and forcing losses. The plain MCTS player lets other players win outright. It kind of sucks.

### Game length

Here’s a histogram of the length of games in my tournament.

13- and 21-move games predominate.

The first player won outright all of the 13-move games, but all 4 players won some 21-move games.

Of the 785 21-move games, the first player lost outright 705 games, while the first player won outright 80 games. The vast bulk of the 21-move games ended up with the players avoiding 3-in-a-row outright losses.

My thought is that 13-move games are about the practical limit of getting an outright win by setting up an outright loss if the losing player were to block the win. 21-move games are the practical limit of avoiding-a-loss end games.

The plot above shows that Squava does have openings, a middle game, and an end game. Before the 18th move of a game, it’s almost impossible to force an outright loss. If you force a loss on your opponent on move 16 or 17, you’re very wily. Between moves 17 and 20, you can win by outright getting 4-in-a-row, or by cleverly forcing a 3-in-a-row loss on your opponent. After the 21st move, it’s difficult to win, you have to avoid outright losses. An end game begins at about the 21st move.

#### 7 move games

My tournament included 63, 7-move games, the Squava equivalent of chess’ Fool’s Mate. All 64 7-move games were won outright by the first player (of course). The first player was all 4 of the algorithms. The second player was always plain ol’ MCTS. This is more an illustration of plain MCTS lucking into ways to not make obvious blocks than anything else.

#### 25 move games

There’s 25 slots for player’s to fill in on a Squava board. The maximum game length is 25 moves. 21, 25-move games appear in the tournament results, but only 5 unique final boards exist after considering rotations and mirroring. The first player loses all of these games outright, with a forced 3-in-a-row. All of the 25-move games have an Alpha-beta minimaxing player moving first, and an MCTS player moving second. There’s got to be something hidden in that, but I can’t figure it out.

### Analysis of first moves

Here’s a count of all the opening moves made during the tournament, in the location in which they’re made:

```
0 1 2 3 4
0 133 190 127 189 132
1 157 478 187 503 196
2 118 175 1167 201 119
3 189 476 204 476 175
4 149 208 123 192 136
```

For all intents and purposes, the opening moves are symmetric with respect to reflections.

Folding them all together via reflections, I get this count:

```
0 1 2 3 4
0 550 779 250 0 0
1 717 1933 391 0 0
2 237 376 1167 0 0
3 0 0 0 0 0
4 0 0 0 0 0
```

I’m not surprised that 1,1 is the favorite opening move,
but 2,2 is confounding.
Peiyan Yang’s full game analysis
has 2,0; 2,1; 2;2 and 0,2; 1,2 as winning moves for the **second**
player.

The choice of 2,2 as an opening move is down almost exclusively to the MCTS with UCB1 player:

```
0 1 2 3 4
0 0 0 0 0 0
1 0 190 0 200 0
2 0 0 842 0 0
3 0 185 0 183 0
4 0 0 0 0 0
```

About half of the time MCTS/UCB1 chose 2,2 as an opening move. Notice the symmetry of it’s opening moves, I think this aligns with the MCTS/UCB1 code being correct. Of those 842 games with 2,2 opening moves, MCTS/UCB1 won outright 290 games, won by forcing a loss on 147 games, and lost 405 games. MCTS/UCB1 won just over half the games it opened with 2,2.

The most common response by far to a 1,1; 1,3; 3,1; or 3,3 opening move is 2,2.

The most common response to a 2,2 opening is one of 1,1; 1,3; 3,1 or 3,3.

### Source Code

Github repo of source code for players used in this article.