
Abstract
Given the recent interest in analyzing LLMs’ reasoning abilities, we tested several LLMs by having them play Connect 4. We used two smaller models, DeepSeek-R1-Distill-Qwen-32B and DeepSeek-R1-Distill-Llama-70B, and two larger models, o3-mini and DeepSeek-R1. The games were played in two rounds:
Round 1: DeepSeek-R1-Distill-Qwen-32B vs. DeepSeek-R1-Distill-Llama-70B;
Round 2: o3-mini vs. DeepSeek-R1.
Our results suggest that the smaller models sometimes failed to interpret the board correctly and were unable to identify potential threats, while the larger models showed no difficulty understanding the board (their reasoning focused more on next-move strategy) but did not account for the opponent’s next move. The full results, including the models’ reasoning, can be found on our Hugging Face page. One surprising result was the 0% win rate of the o3-mini model (0 of 6 games). We suspect this may be due to model randomness and the relatively small number of games played.
Moving forward, we plan to run comparisons with a larger number of rounds to reduce the impact of random outliers or occasional hallucinations on the final results.
Setup
The game was played in two settings: normal gameplay and with the pie rule applied.
The pie rule is designed to keep the first move from being too advantageous.
Here’s how the Pie Rule works:
1. Player 1 makes their first move.
2. Player 2 then has two options:
• Swap roles: Player 2 can choose to switch positions with Player 1, effectively taking over Player 1’s move and continuing the game as the first player.
• Continue without swapping: If Player 2 believes the first move was not overly advantageous, they can simply proceed as Player 2 and play their turn as normal.
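The swap step above can be sketched in a few lines. This is a minimal illustration and not the harness we ran: the function name `apply_pie_rule`, the `players` dictionary, and the exact-match check on the reply are all assumptions made for the example.

```python
def apply_pie_rule(players, reply):
    """Decide seat assignment after Player 1's first move.

    players: dict mapping seat number (1 or 2) to a model name (hypothetical layout).
    reply:   Player 2's raw answer to the pie-rule prompt.
    Returns (new_players, swapped).
    """
    # Assumption: a swap is signalled by the bare word 'swap', as the prompt requests.
    if reply.strip().strip("'\"").lower() == "swap":
        # The model that was Player 2 takes over seat 1 and the disc already on the board.
        return {1: players[2], 2: players[1]}, True
    # No swap: seats stay as they were and Player 2 simply plays a normal move.
    return dict(players), False


seats, swapped = apply_pie_rule({1: "qwen-32b", 2: "llama-70b"}, "swap")
print(seats, swapped)
```

After a swap, the model now sitting in seat 2 makes the next move, since seat 1 has already played.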
Why Connect 4?
We chose Connect 4 because its state space is far smaller than that of chess or Go and the game has been solved for optimal play, which makes it a good benchmark. In chess or Go the game tree is far too large to search exhaustively, whereas Connect 4 has a manageable number of possible positions, allowing for evaluations within a reasonable amount of time.
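For a rough sense of scale, the snippet below computes a deliberately loose upper bound on the number of board configurations: each of the 42 cells is empty, Player 1, or Player 2. This is only back-of-the-envelope arithmetic (the helper name `naive_upper_bound` is ours), and it overcounts heavily because it ignores gravity, turn order, and finished games.

```python
def naive_upper_bound(rows, cols):
    """Loose bound: every cell independently takes one of 3 values (0, 1, 2)."""
    return 3 ** (rows * cols)


bound = naive_upper_bound(6, 7)
print(f"{bound:.3e}")  # order of 10^20; the number of legal positions is far smaller
```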
Prompt
The prompt we are using is relatively simple. It’s a zero-shot prompt that instructs the model on:
1. Which player it is playing.
2. What the game rules are.
3. How the output should be formatted.
The only difference between the Normal Game prompt and the Pie Rule prompt is that the latter tells both models that the pie rule is in effect and gives the model playing as Player 2 the option to swap.
Normal Gameplay:
You are an AI playing Connect 4. You are Player {player}. Connect 4 is a two-player connection game in which the players take turns dropping coloured discs into a 7-column, 6-row vertically suspended grid.
The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own discs.
Rules:
1. The game is played on a 7x6 grid.
2. Players take turns dropping one disc into a column of their choice.
3. The disc falls to the lowest available space in the selected column.
4. The first player to connect four of their discs in a row (horizontally, vertically, or diagonally) wins.
5. If the grid is completely filled without either player achieving four in a row, the game ends in a draw.
1 represents Player 1's pieces and 2 represents Player 2's pieces. 0 represents empty space.
Respond with the format: 'Reasoning: [your reasoning]. My choice is [your choice]'. Your choice should be a single integer from 0 to 6 (0 indicating column 1, 1 indicating column 2 so on).
Always return your move even though you have won the game. Keep your reasoning simplified.
Pie Rule Applied:
You are an AI playing Connect 4. You are Player {player}. Connect 4 is a two-player connection game in which the players take turns dropping colored discs into a 7-column, 6-row vertically suspended grid.
The objective of the game is to be the first to form a horizontal, vertical, or diagonal line of four of one's own discs.
Rules:
1. The game is played on a 7x6 grid.
2. Players take turns dropping one disc into a column of their choice.
3. The disc falls to the lowest available space in the selected column.
4. The first player to connect four of their discs in a row (horizontally, vertically, or diagonally) wins.
5. If the grid is completely filled without either player achieving four in a row, the game ends in a draw.
6. Pie rule is applied to this game, meaning player 2 has the option to swap with player 1 on the first move. If you are player 2 and you wish to swap, just type 'swap'.
1 represents Player 1's pieces and 2 represents Player 2's pieces. 0 represents empty space.
Respond with the format: 'Reasoning: [your reasoning]. My choice is [your choice]'. Your choice should be a single integer from 0 to 6 (0 indicating column 1, 1 indicating column 2 so on).
Always return your move even though you have won the game. Keep your reasoning simplified.
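Both prompts ask for the answer in the form 'My choice is [your choice]'. A minimal sketch of how such a reply could be parsed is shown below; the regular expression and the name `parse_choice` are assumptions for illustration, not the parser used in our runs.

```python
import re

# Accepts "My choice is 3", "My choice is [3]" and similar; only columns 0-6 are valid.
CHOICE_RE = re.compile(r"My choice is\s*\[?\s*([0-6])(?!\d)", re.IGNORECASE)


def parse_choice(response):
    """Return the chosen column (0-6) or None if no valid choice is found."""
    matches = CHOICE_RE.findall(response)
    if not matches:
        return None
    # Reasoning models may restate their choice; take the last occurrence.
    return int(matches[-1])


print(parse_choice("Reasoning: I choose the center column. My choice is 3."))
```

Returning `None` instead of raising lets the caller decide whether to re-prompt the model or forfeit the move.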
Results

Round 1
Player 1 = deepseek-r1-distill-qwen-32b
Player 2 = deepseek-r1-distill-llama-70b
Game | Pie rule applied? | Replay (GIF) | Winner |
---|---|---|---|
1 | No | ![]() | Player 1 (32b) |
2 | No | ![]() | Player 1 (32b) |
3 | No | ![]() | Player 2 (70b) |
4 | Yes – swapped | ![]() | Player 1 (70b) |
5 | Yes – swapped | ![]() | Player 1 (70b) |
6 | Yes – swapped | ![]() | Player 1 (70b) |
The GIFs show that at some points both models failed to block each other’s potential wins. Moreover, the model that eventually won sometimes missed its own immediate wins. Let’s look at the models’ reasoning to understand why this happened.
In Game 1 and Game 2, Player 2 failed to block Player 1.
In Game 1:
Reasoning: Alright, so I'm trying to figure out my next move in this Connect 4 game as Player 2. Let me look at the current board state to understand what's going on.
The board is a 7x6 grid, right? Each row represents a row from top to bottom, and each column is from left to right. The numbers represent the players: 1 for Player 1 and 2 for Player 2. Zeros are empty spots.
Looking at the board:
Row 0: 0 0 0 0 0 0 0
Row 1: 0 0 0 2 1 0 0
Row 2: 0 0 0 1 1 0 0
Row 3: 0 0 0 2 1 0 0
Row 4: 0 0 0 2 2 0 0
Row 5: 0 0 0 1 1 2 0
So, starting from the top (row 0) to the bottom (row 5). I need to figure out where to place my disc (which is 2) next.
First, I should check which columns are available. Each column can only have up to 6 discs, so I need to see which columns are not full. Looking at each column:
Column 0: All zeros except row 5 has a 1. So, it's not full.
Column 1: Same as column 0.
Column 2: Same as column 0.
Column 3: Has discs in rows 1,2,3,4,5. So, the next available spot is row 0.
Column 4: Has discs in rows 1,2,3,4,5. Next available is row 0.
Column 5: All zeros except row 5 has a 2. So, available from row 0 to 4.
Column 6: All zeros except row 5 has a 0. Wait, row 5, column 6 is 0, so the column is empty except the bottom row is 0? Wait, no, row 5 is the bottom row, so if it's 0, that means the column is empty. So column 6 is completely empty.
...
In Game 2:
Wait, perhaps I should look for where Player 1 is about to win and block them.
Looking at Player 1's pieces (1s):
Row 0, column 3:1
Row 2, column 4:1
Row 3, column 4:2? Wait, no, row 3, column 4 is 2, which is mine. Wait, row 3, column 4 is 2, so that's mine. So Player 1 has 1s in row 0, column 3; row 2, column 4; row 3, column 3; row 4, column 4; row 5, columns 0 and 2 and 3.
Wait, in row 5, columns 0,2,3 are 1s. So Player 1 has three in a row in row 5, columns 0,2,3. Wait, no, columns 0,2,3 are 1s, but they're not consecutive. So they're not in a straight line. So maybe they're not a threat yet.
But looking at column 3, Player 1 has a 1 in row 0, and row 3, and row 5. So that's three in column 3, but they're not consecutive. So maybe not a vertical threat.
Alternatively, maybe Player 1 is trying to make a diagonal. For example, from row 5, column 0 (1) to row 3, column 3 (1) to row 1, column 4 (2, which is mine), so that's not possible.
Alternatively, maybe Player 1 can make a horizontal line in row 5, columns 0,2,3, but there's a gap in column 1. So they can't connect four there.
So perhaps I don't need to block them immediately.
In Game 4 (where Player 1 failed to identify the immediate win):

The board is a 7-column, 6-row grid, and it's currently in this state:
Row 0: 0 0 0 0 0 0 0
Row 1: 0 0 0 0 0 0 0
Row 2: 0 0 0 1 0 0 0
Row 3: 0 0 0 2 0 0 0
Row 4: 0 0 0 2 2 0 0
Row 5: 0 1 1 1 2 0 0
I'm Player 1, so I'm using '1's, and Player 2 is using '2's. My last move was placing in column 1, which gave me three in a row at the bottom. Now I need to decide where to go next.
First, I should check if I can win immediately. I have three '1's in row 5, columns 1, 2, and 3. If I place another '1' in column 4 of row 5, that would make four in a row horizontally. But wait, can I even place there? Let me see. Column 4 in row 5 is currently a '2', so I can't place there. That spot is taken by Player 2.
Hmm, maybe I should look for other opportunities. Let me check vertically. In column 3, I have a '1' in row 2, and in row 5, column 3 is also a '1'. So if I place another '1' in column 3, that would stack on top of the existing '1's. But I need to check if the column is full. Column 3 has discs in rows 2, 3, 4, and 5. Wait, row 0 and 1 are empty, so I can place another disc in column 3. If I do that, it would be in row 1, since row 0 is the top and it's empty. So placing in column 3 would give me two '1's vertically, but not four yet. Maybe that's not the best move.
The model first checked for an immediate win, which is the right instinct, but it only considered the right end of the row (column 4, 0-indexed) and not the left end (column 0), which was open. It then jumped straight to checking for a vertical win.
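The check the model skipped is easy to state programmatically. The sketch below (helper names are ours, and this is not part of the evaluation harness) tries every column, finds where a disc would land, and counts matching discs in both directions along each line. It is run on the Game 4 board quoted above.

```python
ROWS, COLS = 6, 7
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals


def landing_row(board, col):
    """Lowest empty row in a column (row 5 is the bottom), or None if the column is full."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == 0:
            return row
    return None


def completes_four(board, row, col, player):
    """Would a disc of `player` placed at (row, col) form a line of four or more?"""
    for d_row, d_col in DIRECTIONS:
        count = 1  # the disc being placed
        for sign in (1, -1):  # walk both ways along the line
            r, c = row + sign * d_row, col + sign * d_col
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * d_row, c + sign * d_col
        if count >= 4:
            return True
    return False


def immediate_wins(board, player):
    """Columns where `player` wins by dropping a single disc."""
    wins = []
    for col in range(COLS):
        row = landing_row(board, col)
        if row is not None and completes_four(board, row, col, player):
            wins.append(col)
    return wins


game4 = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0, 0, 0],
    [0, 0, 0, 2, 0, 0, 0],
    [0, 0, 0, 2, 2, 0, 0],
    [0, 1, 1, 1, 2, 0, 0],
]
print(immediate_wins(game4, 1))  # prints [0]: column 0 completes the bottom row
```

Counting in both directions from the landing cell is what catches the left-hand completion that the model's one-sided check missed.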
Findings
To summarise, in Round 1 the model misread the board: it stated that columns 0, 1 and 2 were “All zeros except row 5 has a 1. So, it’s not full.”, although those columns were in fact completely empty.
Second, it failed to detect a potential vertical threat from Player 1.
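Both findings can be checked mechanically against the Game 1 board quoted earlier. The sketch below is illustrative only (the helper names `column_heights` and `vertical_threats` are ours): it counts the discs in each column and flags columns whose top three discs belong to one player with a free cell above them.

```python
def column_heights(board):
    """Number of discs in each column, left to right."""
    return [sum(1 for row in board if row[col] != 0) for col in range(len(board[0]))]


def vertical_threats(board, player):
    """Columns where the top three discs are `player`'s and one more would make four."""
    threats = []
    for col in range(len(board[0])):
        column = [row[col] for row in board]      # top to bottom
        discs = [v for v in column if v != 0]     # gravity means no gaps below the top disc
        if len(discs) < len(column) and discs[:3] == [player] * 3:
            threats.append(col)
    return threats


game1 = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 2, 1, 0, 0],
    [0, 0, 0, 1, 1, 0, 0],
    [0, 0, 0, 2, 1, 0, 0],
    [0, 0, 0, 2, 2, 0, 0],
    [0, 0, 0, 1, 1, 2, 0],
]
print(column_heights(game1))       # prints [0, 0, 0, 5, 5, 1, 0]
print(vertical_threats(game1, 1))  # prints [4]
```

On this board columns 0 to 2 hold no discs at all, and Player 1 already has three stacked discs in column 4 with the top cell open.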
This made us wonder how the latest reasoning models would perform, which is the motivation for Round 2.
Turn | Initial Board | Player | Reasoning | Board |
---|---|---|---|---|
1 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 1 | Okay, so I’m trying to figure out my move in this Connect 4 game. Let me start by understanding the current board state. The board is a 7×6 grid, and all the cells are 0, which means it’s completely empty. So, it’s the very beginning of the game. Since the board is empty, my first move can’t really affect any potential connections because there’s nothing to connect yet. But I remember that in Connect 4, the best strategy is usually to start in the center column because it offers the most opportunities for creating lines. The center column is column 3 (since columns are 0-indexed, so 0 to 6). Let me think about why the center is better. If I drop my piece in column 3, it will land in the bottom row, which is the 5th index (since rows are 0-indexed from the top). From there, I can potentially create lines in multiple directions—up, down, left, right, and diagonally. Starting in the center gives me the most flexibility for future moves. Also, starting in the center might make it harder for the opponent to block all possible connections. If I start on the edge, the opponent can more easily block the limited directions. But in the center, there are more directions to work with, so it’s harder for the opponent to block all of them. Another thing to consider is that the opponent will likely respond by trying to mirror my moves or block my potential lines. But since the board is empty, there’s no immediate threat, so the best move is to set up the foundation for future moves. I don’t see any immediate threats of losing if I choose any column, but the center is the optimal starting point. So, I’ll go with column 3. | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 |
2 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 | 2 | Okay, so I’m trying to figure out my move in this Connect 4 game. Let me start by understanding the current board state. The board is a 6-row by 7-column grid. Right now, all the rows are empty except for the fifth row, where there’s a 1 in the fourth column. So, the board looks like this: Row 0: 0 0 0 0 0 0 0 Row 1: 0 0 0 0 0 0 0 Row 2: 0 0 0 0 0 0 0 Row 3: 0 0 0 0 0 0 0 Row 4: 0 0 0 0 0 0 0 Row 5: 0 0 0 1 0 0 0 Wait, actually, I think I might have misread that. The way the board is presented is from top to bottom, so row 0 is the top, and row 5 is the bottom. So the 1 is in column 3 (since columns are 0-6) of row 5. That means the bottom row has a disc from Player 1 in the middle column. Since I’m Player 2, my goal is to connect four of my discs (2s) in a row, either horizontally, vertically, or diagonally. Right now, the board is mostly empty except for that one disc from Player 1. My first move should be strategic to set myself up for future possibilities. I remember that in Connect 4, the center columns are the most important because they give more options for connecting in both directions. So, dropping my disc in the middle column (column 3) would be a good idea. However, since Player 1 already has a disc in column 3 of row 5, if I also drop in column 3, my disc will land in row 4, just above theirs. That could potentially set me up for a vertical connection if I place more discs there in the future. Alternatively, I might consider other columns, but since the board is empty otherwise, the middle is the best starting point. Placing in column 3 will allow me to build both upwards and possibly create a horizontal or diagonal line later. It also blocks Player 1 from easily creating a vertical line in that column since I’m right above them. Wait, but if I place in column 3, I’m only one space above Player 1’s disc. 
If they place another disc in column 3, they could potentially create a vertical line of two, but I can always respond by placing another disc there to block them. Alternatively, maybe I should spread out to other columns to create multiple threats, but since it’s the first move, it’s better to take the center. So, I think the best move is to drop my disc in column 3. That way, I’m in the center, which gives me the most flexibility for future moves. It also puts me directly above Player 1’s disc, which could lead to a vertical connection if I can place more discs there. Wait, but let me double-check. If I place in column 3, my disc will be in row 4. Then, if Player 1 places another disc in column 3, it’ll be in row 3, and so on. But since I’m Player 2, I want to maximize my chances. Maybe I should consider other columns, but since the board is empty, the center is the strongest move. Another thought: maybe I should place in column 2 or 4 to create symmetry and have more options on both sides. But since Player 1 is in column 3, taking the center would be better because it’s the most influential spot. So I’ll stick with column 3. | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 0 |
3 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 0 0 0 | 1 | Alright, let’s see. I’m playing Connect 4 as Player 1. The board is a 7×6 grid, and I need to make a move. First, I should look at the current state of the board to understand where the pieces are and what moves are possible. Looking at the board, the bottom row (row 6) has Player 2’s piece in column 4, and Player 1’s piece in column 3. The row above that (row 5) is all zeros except for column 4, which is also a 2. Rows 1 to 4 are completely empty. Since the pieces fall to the lowest available space, I need to check which columns are still available for dropping a disc. Columns 0 to 6 are all possible, but I should avoid columns that are already full. In this case, none of the columns are full because the board isn’t filled yet. Now, I should think about the strategy. The goal is to connect four in a row. I need to look for potential winning moves or ways to block Player 2 from winning. Also, I should consider creating multiple threats so that Player 2 can’t block all of them. Looking at the current pieces, Player 1 has a piece in column 3, row 6, and Player 2 has a piece in column 4, rows 5 and 6. That means there’s a potential vertical line for Player 2 in column 4, but only two pieces. Similarly, Player 1 has only one piece in column 3. I should check for possible horizontal, vertical, or diagonal connections. For Player 1, placing a piece in column 4 would create a diagonal from (6,3) to (5,4), but that’s only two pieces. Maybe I can extend that. Alternatively, placing a piece in column 2 or 4 could create a horizontal line. Let’s see: if I place in column 4, row 6 is already taken by Player 2, so the next available row would be row 5. So placing in column 4 would give me (5,4). Then, I’d have pieces at (6,3) and (5,4), which is a diagonal. But that’s only two pieces, so not a win yet. Wait, maybe I should check if Player 2 has any immediate threats. 
Player 2 has two pieces in column 4. If they place another piece in column 4, they could get three in a row, but not four yet. So maybe I should block them by placing in column 4. Alternatively, I can try to build my own connections. If I place in column 4, I can start building a vertical line. But since the column isn’t full, I can place there. Another consideration is the center columns, as they offer more opportunities for connections. So columns 3 and 4 are good options. I should also check if placing in column 4 would create any immediate threats. Let me visualize: if I place in column 4, row 5, then I have (5,4). Then, looking for any possible lines of four, I don’t see any yet. But it sets up potential for future moves. Alternatively, placing in column 3 again could help build a vertical line, but I already have one there. Wait, maybe I should think about the possible lines. If I place in column 4, I can create a diagonal with my existing piece in column 3, row 6. So from (6,3) to (5,4), that’s a diagonal of two. If I place another piece in (4,5), that would make a diagonal of three, and then another in (3,6) would make four. But that’s several moves ahead. Alternatively, if I place in column 2, I can start building a horizontal line. But that’s less central, so maybe not the best. Considering all this, placing in column 4 seems like a good move. It’s central, and it can help build towards a diagonal or vertical connection. I should also check if placing in column 4 is possible. Looking at the board, column 4 has two pieces in rows 5 and 6. So the next available row is row 4. So placing in column 4 would place the piece at (4,4). Wait, but in the board state provided, row 6 has 0 0 0 1 0 0 0, and row 5 has 0 0 0 2 0 0 0. So column 4 has a 2 in row 5 and a 0 in row 6. Wait, no, row 6 is the bottom row. So in column 4, row 6 is 0, row 5 is 2. So the next available space in column 4 is row 6. 
Wait, no, because in Connect 4, pieces fall to the lowest available space. So in column 4, the lowest available is row 6, which is empty. So placing in column 4 would put the piece at (6,4). But in the given board, row 6 has 0 in column 4, so it’s possible. Wait, I’m getting a bit confused. Let me re-examine the board. The board is given with row 1 at the top and row 6 at the bottom. So the bottom row (row 6) has 0 0 0 1 0 0 0, meaning Player 1 has a piece in column 4, row 6. Player 2 has a piece in column 4, row 5. So if I place in column 4 now, the piece would go to row 4, since row 5 and 6 are occupied. Wait, no, because the pieces fall to the lowest available space. So in column 4, the lowest available is row 4, because row 5 and 6 are already filled. Wait, but in the board state provided, row 6 has 0 in column 4, so actually, the lowest available is row 6. Let me check again. Wait, the board is represented with row 1 at the top and row 6 at the bottom. So the bottom row (row 6) is the last row. So in column 4, row 6 is 0, row 5 is 2, and row 4 is 0. So the lowest available space in column 4 is row 6. Wait, no, because in Connect 4, the pieces are dropped from the top, so the lowest available row is the first empty spot from the bottom up. So in column 4, row 6 is empty, so the piece would go there. Wait, but in the given board, row 6 has 0 in column 4, so it’s available. So placing in column 4 would put the piece at row 6, column 4. But in the current state, Player 1 has a piece at (6,3) and Player 2 has a piece at (5,4). So placing in column 4 would be at (6,4). That would give me a piece next to my existing piece in column 3, row 6. So that could set up a horizontal or diagonal connection. Alternatively, placing in column 3 again could help build a vertical line, but I only have one piece there. I think placing in column 4 is a good move because it’s central and can lead to multiple connection possibilities. 
Plus, it’s adjacent to my existing piece, which could help in forming a horizontal line. So, after considering all this, I decide to place my piece in column 4. | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2 0 0 0 0 0 0 1 1 0 0 |
For the full results, please visit our Hugging Face page.
Round 2
Player 1 = o3-mini
Player 2 = DeepSeek-R1
Game | Pie rule applied? | Replay (GIF) | Winner |
---|---|---|---|
1 | No | ![]() | Player 2 (R1) |
2 | No | ![]() | Player 2 (R1) |
3 | No | ![]() | Player 2 (R1) |
4 | Yes (hard coded) – swapped | ![]() | Player 1 (R1) |
5 | Yes (hard coded) – swapped | ![]() | Player 1 (R1) |
6 | Yes (hard coded) – swapped | ![]() | Player 1 (R1) |
Reasoning excerpts:

So yes, Player2 placed in column4, row4. Because Player1 must have made a move in row5, column1 perhaps? Wait, after the initial board: initial board had row5 [0,0,1,1,2,0,0]. Then Player2's previous move was column4 (which is now row4, column4).
But now, in the current state given, the user shows:
rows:
0,1,2: all zeros.
row3: col3 is 2.
row4: col3 is1, col4 is2.
row5: col1:1, col2:1, col3:1, col4:2.
So Player1 must have made a move in row5, column1 (since in the initial board, row5's first non-zero was in col3:1). Wait, perhaps the last Player1 move was column1. Let's assume that Player2 placed in column4 (adding to row4, col4 as 2). Then Player1 placed in column1, row5 (making row5 column1:1). Then it's now Player2's turn.
The key point here is that Player1 now has three in a row in row5's columns1-3 (1,1,1). But adjacent to that, column4 is occupied by 2. So Player1 can't complete four in a row there. But then, in column0, row5 is empty. If Player1 places in column0, they would have columns0-3 all 1s in row5. So that's a horizontal four. So that's a critical threat. So as Player2, I need to block column0 to prevent Player1 from winning.
Thus, my move should be column0.
But wait, column0 is currently completely empty (the bottom row is 0), so placing there would put my piece in row5, column0, which blocks Player1's potential move to get four in a row. Alternatively, if I ignore and Player1 plays there next, they win.
So my reasoning is: Block Player1's potential horizontal four by placing in column0.

"Reasoning: I will drop my piece in column 3 (board index 2) to build a vertical threat in that column and set up future winning opportunities. My choice is 2."
Player 1 failed to see that placing a disc in column 2 (0-indexed) would allow Player 2 to win.

Player 2:
"Reasoning: Need to block Player 1's potential vertical three in column 3. My choice is 3."

Player 2:
"Reasoning: I will continue with central pressure. Column 3 is still a strong move even though it's partly occupied; dropping there lands at row 3 and helps build both vertical and diagonal opportunities. My choice is 3."
Findings
Although the models no longer struggled to understand the board (their reasoning focused on which move to make), they were only able to identify a threat once there were three consecutive pieces.
Another noticeable behaviour is that the models do not look one step beyond the current move: after placing a disc, a model may create an opportunity for the opponent to win.
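The missing step is a one-ply lookahead: before committing to a column, check whether the opponent would have an immediate win afterwards. The sketch below is a minimal illustration with helper names of our own choosing. The board is a hypothetical position constructed for this example, not one taken from our games.

```python
ROWS, COLS = 6, 7
DIRECTIONS = [(0, 1), (1, 0), (1, 1), (1, -1)]  # horizontal, vertical, two diagonals


def landing_row(board, col):
    """Lowest empty row in a column (row 5 is the bottom), or None if full."""
    for row in range(ROWS - 1, -1, -1):
        if board[row][col] == 0:
            return row
    return None


def completes_four(board, row, col, player):
    """Would a disc of `player` at (row, col) form a line of four or more?"""
    for d_row, d_col in DIRECTIONS:
        count = 1
        for sign in (1, -1):
            r, c = row + sign * d_row, col + sign * d_col
            while 0 <= r < ROWS and 0 <= c < COLS and board[r][c] == player:
                count += 1
                r, c = r + sign * d_row, c + sign * d_col
        if count >= 4:
            return True
    return False


def immediate_wins(board, player):
    """Columns where `player` wins with a single disc."""
    return [
        col for col in range(COLS)
        if (row := landing_row(board, col)) is not None
        and completes_four(board, row, col, player)
    ]


def safe_moves(board, player):
    """Legal columns that do not hand the opponent an immediate win."""
    opponent = 3 - player
    safe = []
    for col in range(COLS):
        row = landing_row(board, col)
        if row is None:
            continue
        after = [r[:] for r in board]  # copy so the input board is untouched
        after[row][col] = player
        if not immediate_wins(after, opponent):
            safe.append(col)
    return safe


# Hypothetical position: Player 2 holds a diagonal (5,0), (4,1), (3,2) and needs (2,3).
board = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0],
    [0, 0, 2, 0, 0, 0, 0],
    [0, 2, 1, 1, 0, 0, 0],
    [2, 1, 2, 1, 0, 0, 0],
]
print(safe_moves(board, 1))  # prints [0, 1, 2, 4, 5, 6]
```

Column 3 looks attractive for Player 1 because it stacks a third disc vertically, but it fills the cell directly beneath Player 2's winning square, which is the kind of trap described above.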
Turn | Initial Board | Player | Reasoning | Board |
---|---|---|---|---|
1 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 | 1 | Reasoning: I choose the center column to maximize my opportunities and flexibility for future moves. My choice is 3. | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 |
2 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 | 2 | Okay, so I’m looking at this Connect 4 board. The current state shows all zeros except for the bottom row, where the fourth column has a 1. That means Player 1 just placed their disc in column 4 (since the bottom row is filled first). Now it’s my turn as Player 2. First, I need to choose a column between 0 and 6. Let me think about the possible moves. Since the board is almost empty, I should focus on both offense and defense. But since it’s the first move by Player 1, there’s no immediate threat yet. Maybe the best strategy here is to start building my own line. Placing my disc in the center column (column 3) is usually a good idea because it offers more opportunities for creating multiple lines. However, Player 1 has already placed in column 3 (since the bottom row is 0-based, the fourth position is index 3… wait, wait, actually no. Wait, the bottom row is the sixth row in the array, which is shown as the last row here. Wait, let’s clarify. The board is represented as a 6-row grid. The rows are from top to bottom, so the bottom row is the sixth one. Looking at the given board: Row 0: 0 0 0 0 0 0 0 Row 1: 0 0 0 0 0 0 0 Row 2: 0 0 0 0 0 0 0 Row 3: 0 0 0 0 0 0 0 Row 4: 0 0 0 0 0 0 0 Row 5: 0 0 0 1 0 0 0 Wait, row indices start at 0, so row 5 is the bottom row. In that row, column 3 has a 1. So Player 1 dropped a disc in column 3, which is the fourth column (since columns are 0-6). Now, as Player 2, I need to decide where to place my disc. Common strategies for the first few moves are to control the center because it offers the most connections. Column 3 is already taken in the bottom row, so maybe the next best options are columns 2 or 4, adjacent to the center. Alternatively, starting on the other side. Alternatively, sometimes players go for the sides, but center control is more strategic. Let me check if column 3 is possible. 
Since row 5, column 3 is filled, the next available spot in column 3 would be row 4. But since Player 2 is going second, maybe I should block any potential vertical or horizontal from Player 1. Player 1 has one disc in column 3. To block a vertical, I just need to place in column 3 again, but that would mean building their potential vertical. Alternatively, maybe place in column 2 or 4 to start building horizontally. Wait, Player 1 has a disc at column 3, row 5. If I place in column 2 or 4, I could start a horizontal line, or at least prevent Player 1 from expanding. For example, if Player 1 places another in column 4 next turn, that could make a horizontal pair. Maybe blocking that now would be good. Alternatively, create my own potential line. Since it’s early in the game, maybe the best move is to take column 3 (since the column isn’t filled yet), but placing in the same column as Player 1. Wait, no—each time you place in a column, it stacks on top. So if Player 1 placed in column 3 (the bottom row, row 5), then if I place in column 3 next, my disc would go to row 4. That creates a vertical possibility for myself. But Player 1 might then block that. Alternatively, maybe play another column. Alternatively, placing in column 2. That way, if Player 1 tries to build a horizontal line to the left, I can block. Or creating my own potential horizontal or diagonal. Another possibility is to choose column 3. Since that’s the center, even though Player 1 has a disc there, stacking on top can help in the center control. But I’m not sure if that’s the best move. Alternatively, maybe the next best column is column 4. If I place my disc there, then in row 5, column 4. Then, Player 1 could place in column 2, forming a potential horizontal line. Hmm. Alternatively, starting my own line in a different column. For example, column 0 or 6. But those are edges and offer fewer connection opportunities. Alternatively, maybe choose column 3 (the same as Player 1) but in row 4. 
Since starting a vertical line could be possible, but they might block it. Alternatively, I need to weigh each option. Wait, but since it’s the first move by Player 1, there’s no immediate threat, so the best strategic move is probably to take the center. But since Player 1 is already in column 3 (their disc is in row 5 column 3), then I can place in column 3 as well, but in row 4. That would give me a vertical potential. Alternatively, since it’s the second move, maybe take an adjacent column. Alternatively, the best opening after the center is taken might be columns 2 or 4. Let me think. For example, putting in column 4. It’s adjacent to the center, allowing for a horizontal line. If I place there, then Player 1 might place in column 5 next turn, but that could spread out. Alternatively, perhaps column 2. Then, a potential horizontal if I can place more in that area. I’m a bit torn. Let’s recall that the center columns (columns 2,3,4) are considered more valuable because they offer more opportunities for creating lines. Since column 3 is already taken, maybe go for column 2 or 4. Alternatively, maybe mirroring. Alternatively, some strategies suggest that after the first player takes the center, the second player takes one of the central columns adjacent to prevent the first player from expanding. For example, placing in column 2 or 4. Hmm. So maybe selecting column 3 isn’t the best now, because the next move in that column would require multiple discs to make a vertical four. Alternatively, starting in column 2 or 4 might allow me to build horizontally or diagonally. Suppose I choose column 4. That places a disc next to Player 1’s in column 3, potentially creating a horizontal start. But Plater 1 could block. Alternatively, place in column 2, mirroring. Alternatively, column 1 or 5, but those are less central. Another approach: in Connect 4, the first player often has an advantage, so the second player needs to be more defensive. 
But in this early stage, Player 1’s only move is column 3. To block any potential center dominance, maybe take the adjacent columns. Possibly, column 2 and 4. But since I can only choose one, maybe column 4. If I pick column 4, then the next available spot is row 5 (since it’s the bottom of column 4). Wait, no: The board’s bottom row is row 5, and the columns are 0-6. So the columns are laid out such that when you place a disc, it falls to the lowest empty spot. Wait, the current board shows that column 3 (which is the fourth column) in row 5 has a 1. So Player 1’s disc is at column 3, row 5. If I want to place in column 4, row 5 is empty, so my disc would go there. So yes, then in column 4, row 5 would now be 2. By placing in column 4, I start building a horizontal line next to Player 1’s disc. Then, if I can get another disc in column 5 or column 6, I can make a row. Alternatively, Player 1 might block. Alternatively, placing in column 2. That way, if Player 1 wants to build a horizontal line to the left, they can’t. Alternatively, creating my own possible horizontal in columns 2,3,4,5… Uncertain. I think the best initial moves are center-oriented. So maybe column 4 is the right choice. So placing there to create adjacency and start a potential horizontal line. Therefore, my choice is column 4 (which is index 4). | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 |
3 | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 | 1 | Reasoning: I choose column 3 as it remains the central, strategic column allowing me to build vertical, horizontal, or diagonal connections. My choice is 3. | 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 1 2 0 0 |
For the full results, please visit our HuggingFace.
Conclusions
We can see that smaller models, although capable of understanding the game rules, struggle to interpret the board and identify potential threats from the other player. In contrast, larger models do not struggle with understanding the board. Their reasoning focuses on identifying the best next move rather than analysing the current board state. Although larger models demonstrate the ability to understand the board state and recognise the opponent’s threats, their reasoning does not account for the opponent’s next move after the current one.
Notably, the 0% win rate observed in the o3-mini model was unexpected. We believe this may be attributed to model randomness and the limited number of rounds conducted.
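To make the small-sample caveat concrete, the sketch below computes a 95% Wilson score interval for a win rate. The choice of the Wilson interval and the function name are ours, added for illustration; the calculation also treats games as independent trials, which is an assumption rather than something we verified.

```python
import math


def wilson_interval(wins, games, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for about 95%)."""
    if games <= 0:
        raise ValueError("games must be positive")
    p = wins / games
    denom = 1 + z * z / games
    center = (p + z * z / (2 * games)) / denom
    half = z * math.sqrt(p * (1 - p) / games + z * z / (4 * games * games)) / denom
    return max(0.0, center - half), min(1.0, center + half)


low, high = wilson_interval(0, 6)
print(f"{low:.2f} to {high:.2f}")
```

For 0 wins in 6 games the upper end of the interval is roughly 0.39, so this record alone cannot rule out a substantial true win rate.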
In future work, we aim to increase the number of testing rounds to minimise the influence of random variation or isolated hallucinations, ensuring more robust and reliable conclusions.