Close-up view of someone playing the board game Stratego.

DeepNash taught itself to play the online version of the board game Stratego, and now ranks among the platform's top players.

Another game has fallen to artificial intelligence (AI): an AI called DeepNash has matched expert human players at Stratego, a board game that requires long-term strategic thinking in the face of imperfect information.

The achievement, described in Science on 1 December [1], comes hot on the heels of a study reporting an AI that can play Diplomacy [2], a game in which players must negotiate as they compete.

Both games pose challenges notably different from those of games for which analogous milestones have already been reached, says Michael Wellman, a computer scientist at the University of Michigan who studies strategic reasoning and game theory.

Imperfect information

Stratego has characteristics that make it more complicated than chess, Go or poker, all of which have been mastered by AIs. In Stratego, two players each place 40 pieces on a board, but cannot see the identity of their opponent's pieces. The goal is to eliminate the opponent's pieces and capture a flag. The game tree of Stratego, the graph of all possible ways in which the game can play out, has 10^535 states. In terms of imperfect information at the start of a game, Stratego has 10^66 possible private positions, dwarfing the 10^6 such starting situations in two-player Texas hold 'em poker.
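The 10^66 figure can be sanity-checked with a short multinomial count. The sketch below assumes the classic Stratego piece mix (one flag, six bombs, one spy, eight scouts, five miners, four each of sergeants, lieutenants and captains, three majors, two colonels, one general and one marshal); squaring one player's possible deployments, to account for both hidden set-ups, lands at roughly 10^66.

```python
from math import factorial, prod

# Classic Stratego piece counts for one player's 40 pieces:
# flag, bombs, spy, scouts, miners, sergeants, lieutenants,
# captains, majors, colonels, general, marshal.
counts = [1, 6, 1, 8, 5, 4, 4, 4, 3, 2, 1, 1]
assert sum(counts) == 40

# Distinct deployments of one player's pieces over their
# 40 starting squares: the multinomial coefficient
# 40! / (1! * 6! * ... * 1!).
per_player = factorial(40) // prod(factorial(c) for c in counts)

print(f"one player:   {per_player:.2e}")     # ~1.4e33
print(f"both players: {per_player**2:.2e}")  # ~2.0e66, i.e. ~10^66
```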

The sheer complexity of the number of possible outcomes in Stratego means that algorithms that perform well on perfect-information games, and even those that work for poker, do not work here, according to Julien Perolat, a DeepMind researcher based in Paris.


So Perolat and his colleagues developed DeepNash. The AI's name is a nod to the US mathematician John Nash, whose work gave rise to the term Nash equilibrium: a stable set of strategies, one per player, such that no player benefits by changing strategy on their own. Games can have one or many Nash equilibria.
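To make the definition concrete, here is a minimal sketch; the `is_nash` helper and the example game are invented for illustration and come from neither paper. It tests a candidate strategy pair by checking every unilateral deviation:

```python
import numpy as np

# Invented helper (not from either paper): checks the definition of
# a Nash equilibrium in a two-player game with payoff matrices
# A (row player) and B (column player). A pair of mixed strategies
# (x, y) is an equilibrium if no unilateral deviation pays better.

def is_nash(A, B, x, y, tol=1e-9):
    row_ok = (A @ y).max() <= x @ A @ y + tol   # row player cannot gain alone
    col_ok = (x @ B).max() <= x @ B @ y + tol   # column player cannot gain alone
    return row_ok and col_ok

# Matching pennies, a zero-sum game: the only equilibrium is 50/50 mixing.
A = np.array([[1, -1], [-1, 1]])
B = -A
half = np.array([0.5, 0.5])
print(is_nash(A, B, half, half))                  # True: stable for both
print(is_nash(A, B, np.array([1.0, 0.0]), half))  # False: column player deviates
```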

DeepNash combines a deep neural network with reinforcement learning to find a Nash equilibrium. Reinforcement learning involves finding the best policy, a rule that dictates what action to take in every state of the game. DeepNash learnt its policy by playing billions of games against itself. When one side wins, it is rewarded and the other is penalized, and the parameters of the neural network that represent the policy are adjusted accordingly. Eventually, DeepNash converges on an approximate Nash equilibrium. Unlike earlier game-playing AIs, DeepNash does not search through the game tree to find ways to improve.
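As a toy illustration of how pure self-play can converge on a Nash equilibrium without tree search, the sketch below runs regret matching on rock-paper-scissors. This is not DeepNash's actual algorithm (the paper uses a deep network trained with a method called R-NaD), but it shows the same core loop: play against yourself, compare each move's payoff with what would have paid better, and shift the policy accordingly.

```python
import numpy as np

# Toy self-play loop: regret matching on rock-paper-scissors.
# Illustration only; DeepNash itself uses a deep network trained
# with a different algorithm (R-NaD), not a lookup table.

PAYOFF = np.array([          # row player's payoff
    [ 0, -1,  1],            # rock     vs rock/paper/scissors
    [ 1,  0, -1],            # paper
    [-1,  1,  0],            # scissors
])

def policy_from_regret(cum_regret):
    """Play actions in proportion to their positive cumulative regret."""
    pos = np.maximum(cum_regret, 0.0)
    return pos / pos.sum() if pos.sum() > 0 else np.full(3, 1 / 3)

regret = [np.zeros(3), np.zeros(3)]       # cumulative regrets per player
policy_sum = [np.zeros(3), np.zeros(3)]   # for the time-average policy
rng = np.random.default_rng(0)

for _ in range(200_000):
    pols = [policy_from_regret(r) for r in regret]
    acts = [rng.choice(3, p=p) for p in pols]
    reward = PAYOFF[acts[0], acts[1]]     # zero-sum: player 1 gets -reward
    for i, r in enumerate([reward, -reward]):
        # Counterfactual payoffs of all actions vs the opponent's move.
        cf = PAYOFF[:, acts[1]] if i == 0 else -PAYOFF[acts[0], :]
        regret[i] += cf - r
        policy_sum[i] += pols[i]

# The average policy approaches the Nash equilibrium (1/3, 1/3, 1/3).
print(policy_sum[0] / policy_sum[0].sum())
```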

In April, DeepNash competed with human players on the online game platform Gravon. After 50 matches, it was ranked third among Gravon's Stratego players. The work shows that a game as complex as Stratego, with its imperfect information, can be solved without search techniques, says team member Karl Tuyls, a DeepMind researcher based in Paris, who calls the result a big step forward for AI.

Noam Brown, a researcher at Meta AI in New York City, agrees that the results are impressive.

Diplomacy machine

Brown and his colleagues at Meta sought to build an AI that could play Diplomacy, a game with up to seven players, each representing a major European power in the years before the First World War. The goal is to gain control of supply centres by moving units across a map. Crucially, and unlike two-player games such as Stratego, Diplomacy requires private communication and active cooperation between players.

When you go beyond two-player zero-sum games, Brown says, the idea of a Nash equilibrium is no longer as useful for playing well with humans.


The team trained its AI, named Cicero, on data from 125,261 games of webDiplomacy, an online version of the game. Combining these with self-play data, Cicero's strategic reasoning module (SRM) learnt to predict, for a given state of the game and its accumulated messages, the likely policies of the other players. On each turn, the SRM chose an optimal action and signalled its intent to Cicero's dialogue module.

The dialogue module was built using text from the Internet and messages from Diplomacy games; given the SRM's intent, it generates a message to another player. Cicero, representing England, might ask France: "Do you want to support my convoy to Belgium?"
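A heavily simplified sketch of that pipeline might look like the following. Every class and method name below is invented for illustration (the real system learns both modules from data and is far larger); it only shows the flow described above: predict the other players' likely policies, choose an action, and hand the resulting intent to a dialogue model.

```python
from dataclasses import dataclass

@dataclass
class Intent:
    action: str    # the move the agent intends to make
    partner: str   # the player the message is addressed to

class StrategicReasoningModule:
    """Stand-in for the learnt SRM: predicts the other players'
    likely policies, then picks an action scoring well against them."""

    def predict_policies(self, state: dict) -> dict:
        # Placeholder: uniform guesses instead of a learnt model.
        return {p: {"hold": 0.5, "attack": 0.5} for p in state["others"]}

    def choose_action(self, state: dict) -> Intent:
        policies = self.predict_policies(state)
        # Toy decision rule standing in for the paper's planning.
        risky = any(pol["attack"] > 0.6 for pol in policies.values())
        action = "hold position" if risky else "convoy my army to Belgium"
        return Intent(action=action, partner="France")

class DialogueModule:
    """Stand-in for the fine-tuned language model: turns the SRM's
    intent into a human-readable message."""

    def generate_message(self, intent: Intent) -> str:
        return f"{intent.partner}, do you want to support my plan to {intent.action}?"

srm, dialogue = StrategicReasoningModule(), DialogueModule()
state = {"others": ["France", "Germany", "Russia"]}
intent = srm.choose_action(state)
print(dialogue.generate_message(intent))
# -> "France, do you want to support my plan to convoy my army to Belgium?"
```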

According to the Science paper [2], Cicero achieved more than double the average score of the human players, and ranked in the top 10% of participants who played more than one game.

Real-world behaviour

Brown believes that game-playing AIs capable of interacting with humans could pave the way for real-world applications. If you are building a self-driving car, he notes, you do not want to assume that all the other drivers are rational. Cicero, he says, is a step in this direction: "We have one foot in the game world and one foot in the real world."

Wellman agrees, but says that more work is needed. Many of these techniques are relevant to real-world applications, he notes, but at some point the leading AI research labs will need to figure out how to measure scientific progress on the real-world "games" that we actually care about.

doi: https://doi.org/10.1038/d41586-022-04246-7