Artificial intelligence research group OpenAI said it has created software capable of beating teams of five skilled human players in the videogame Dota 2, a milestone in computer science.
The achievement puts San Francisco-based OpenAI, whose backers include billionaire Elon Musk, ahead of other artificial intelligence researchers in developing software that can master complex games combining fast, real-time action, longer-term strategy, imperfect information and team play.
The ability to learn these kinds of videogames at human or superhuman levels is important for the advancement of AI because they more closely approximate the uncertainties and complexity of the real world than games such as chess, which IBM’s software mastered in the late 1990s, or Go, which was conquered in 2016 with software created by DeepMind, the London-based AI company owned by Alphabet.
Dota 2 is a multiplayer fantasy videogame created by Bellevue, Washington-based Valve. The tournament version pits two five-player teams against each other. Each team is assigned a base on opposing ends of a map, and each side can see only the areas around its own units, so opponents’ movements must be learned through exploration. Each player controls a separate character with unique powers and weapons. Each team must battle to reach the opposing team’s territory and destroy a structure called an Ancient.
The game, with more than a million active players, also is one of the most popular and lucrative in professional e-sports. The International, the game’s premier pro tournament, last year had a prize pool of more than US$24-million, the biggest for any e-sport to date.
OpenAI said its software in mid-June beat a semi-professional team that is ranked among the top 1% of Dota 2 players and an amateur team ranked in the top 10% — both times winning two games to one in a best-of-three series. Earlier in the month, OpenAI’s bot crushed three amateur teams.
Dota 2 is many times more complicated than chess or Go, where players take turns and have complete information about the state of the game. At any given moment, a player in Dota 2 must choose from an average of about a thousand valid possible actions, compared with 250 in Go and just 35 in chess. The state of the videogame is also represented by about 20 000 data points, compared with 400 in Go and 70 in chess.
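Those per-move counts compound quickly. The short calculation below (the five-move lookahead is purely illustrative, not a figure from the research) shows how the gap between the games explodes over even a handful of decisions:

```python
# Approximate number of valid choices per move, as reported above.
branching = {"chess": 35, "Go": 250, "Dota 2": 1000}
moves = 5  # an illustrative five-decision lookahead

for game, b in branching.items():
    # b ** moves = distinct sequences of `moves` consecutive choices
    print(f"{game}: {b ** moves:,} possible five-move sequences")
```

Chess yields about 52.5 million five-move sequences, Go nearly a trillion, and Dota 2 a full quadrillion.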
Learnt through trial and error
OpenAI’s software learnt solely through trial and error while playing against itself. This technique, known as reinforcement learning, is often compared to the way infants learn, and it was also used by DeepMind to create its Go-playing AI. The software starts by taking random actions and must learn, through a series of rewards (usually points in a game environment), how to play successfully. Games are often used in reinforcement learning research because they have points that can serve as interim rewards and a clear winner or loser.
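To make that trial-and-error loop concrete, here is a minimal sketch of one classic reinforcement learning method, tabular Q-learning, on a made-up toy environment; everything in it (the corridor, the reward values, the hyperparameters) is illustrative and far simpler than OpenAI’s actual setup:

```python
import random

# Toy environment: a five-cell corridor. The agent starts in cell 0 and
# receives a reward of +1 only upon reaching cell 4. (Entirely made up.)
N_STATES, GOAL = 5, 4
ACTIONS = [-1, +1]  # step left or step right

def step(state, action):
    next_state = min(max(state + action, 0), GOAL)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

# Q-table: estimated future reward for each (state, action) pair.
q = {(s, a): 0.0 for s in range(N_STATES) for a in ACTIONS}
alpha, gamma, epsilon = 0.5, 0.9, 0.1  # learning rate, discount, exploration

for episode in range(500):
    state, done = 0, False
    while not done:
        # Mostly exploit what already seems to work; occasionally explore.
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: q[(state, a)])
        next_state, reward, done = step(state, action)
        # Nudge the estimate toward the reward plus discounted future value.
        best_next = max(q[(next_state, a)] for a in ACTIONS)
        q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
        state = next_state

# After training, the learnt policy marches right toward the reward.
print({s: max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(N_STATES)})
```

The agent begins with no knowledge, stumbles around at random, and gradually learns which action in each state leads toward the reward: the same loop, at vastly larger scale, that OpenAI ran for its Dota 2 bots.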
In this case, OpenAI used a relatively simple reinforcement learning algorithm it released last year that encourages the artificial intelligence to try new things without drifting too far from whatever it is currently doing that seems to be working. Throughout training, the researchers also gradually extended the horizon over which the bot weighed future rewards, encouraging it, once it had learnt the game’s basics, to think more about longer-term strategy and ultimate victory as opposed to shorter-term payoffs.
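That description matches OpenAI’s proximal policy optimisation (PPO) algorithm, published in 2017. Its central trick is a “clipped” objective that caps how much a single update can change the policy. Below is a minimal numpy sketch of that objective; the probabilities and advantage values are invented for illustration:

```python
import numpy as np

def ppo_clip_objective(new_probs, old_probs, advantages, clip_eps=0.2):
    """Clipped surrogate objective in the style of PPO (Schulman et al., 2017).

    The ratio new/old measures how far a proposed policy update has moved;
    clipping it to [1 - eps, 1 + eps] removes any incentive to drift
    further than that from whatever is currently working.
    """
    ratio = new_probs / old_probs
    unclipped = ratio * advantages
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps) * advantages
    # Taking the element-wise minimum keeps the estimate pessimistic, so
    # large, risky policy changes never look better than modest ones.
    return np.minimum(unclipped, clipped).mean()

# Invented action probabilities and advantage estimates, for illustration only.
old = np.array([0.20, 0.50, 0.30])   # probabilities under the current policy
new = np.array([0.35, 0.45, 0.20])   # probabilities under a proposed update
adv = np.array([1.0, -0.5, 0.2])     # how much better each action did than expected
print(ppo_clip_objective(new, old, adv))
```

In this framework, extending the reward horizon mentioned above broadly corresponds to raising the discount factor, so that rewards far in the future weigh more heavily in each update.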
These techniques could point to big advances in training robots, self-driving cars, stock-trading systems, or anything else that can be reliably simulated, Greg Brockman, OpenAI’s co-founder and chief technology officer, said in an interview. “What Dota does is show that today’s algorithms can go a lot further toward being able to solve those real-world challenges than people realised,” he said.
The sort of reinforcement learning OpenAI used could be promising for solving real-world situations, particularly those that could be couched as games — whether that is military war games or those meant to simulate politics or business, said Jonathan Schaeffer, an expert on AI and games at the University of Alberta in Edmonton, Canada.
But Schaeffer said the amount of data and computing power required to use the technique effectively limited its applications. “Humans have the ability to learn with very few examples,” he said. “Humans also have the ability to generalise and learn at a higher level of abstraction than what we currently see being done by computer programs.”
To train its Dota 2 software, OpenAI used 128 000 computing cores — the central processing unit in your laptop might have just four cores — as well as 256 graphics processing units, a powerful type of computer chip originally invented to render visuals for videogames and animation. During that training, the software played the equivalent of 180 years of games against itself every day throughout a 19-day training cycle.
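A quick back-of-the-envelope calculation shows what that throughput implies:

```python
# Rough check on the reported self-play throughput (figures from the article).
years_of_play_per_day = 180
days_per_year = 365
training_days = 19

speedup = years_of_play_per_day * days_per_year  # game-days per real day
total_years = years_of_play_per_day * training_days

print(f"self-play ran roughly {speedup:,}x faster than real time")  # ~65,700x
print(f"about {total_years:,} years of Dota 2 played in total")     # ~3,420 years
```

In other words, the bots packed several millennia of play into under three weeks.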
Founded in late 2015 by Musk and Sam Altman, the president of the Silicon Valley technology incubator Y Combinator, along with a group of other prominent technology investors and researchers, OpenAI is a nonprofit company dedicated to creating what it calls “safe” artificial general intelligence and distributing it “as widely and evenly as possible”. Artificial general intelligence refers to software that would have the flexibility to equal or surpass human intellectual abilities across a wide variety of tasks, much like the androids depicted in science-fiction movies.
OpenAI said it would challenge the top-ranked North American professional Dota 2 team to a match, which it will livestream, on 28 July. Then it will try to take on the world’s highest-ranked pros at The International, which is scheduled for 20 to 25 August in Vancouver, Canada.
DeepMind and Facebook’s artificial intelligence research unit have ongoing efforts to create software to play StarCraft and StarCraft II, science-fiction real-time strategy videogames produced by Activision Blizzard, but so far neither has publicly demonstrated software that can beat good human players.
OpenAI’s claim to have mastered the five-against-five version of Dota 2 follows the AI research shop’s attempt to conquer a simpler one-on-one version last year. In that effort, OpenAI created software that beat one of the world’s top players in a formal demonstration. But within a few days, the researchers were embarrassed when amateur players discovered ways to defeat the software easily, confusing it with unusual tactics that humans typically didn’t employ in real competition.
Schaeffer said reinforcement learning is likely to play a part in getting the field closer to artificial general intelligence. For now, most AI systems are “idiot savants” that can solve only one problem. This is true of OpenAI’s Dota 2 bots, too. They can play Dota 2 very well, but, once trained, cannot transfer any knowledge about strategy or tactics to other games that are conceptually similar. — Reported by Jeremy Kahn, (c) 2018 Bloomberg LP