Vasudevan Mukunth is the science editor at The Wire.
What happened
An artificially intelligent (AI) gaming system built by a company co-led by Elon Musk took on some of the best human players at a game more complex than chess or Go last week – and lost. An AI losing is not something you often hear about in the media, because a loss doesn’t lend itself to excited discussion of how much better the machine was. This loss is worth discussing, however, partly because the game in question, Defence of the Ancients (DotA) 2, is like little else that AI has attempted to master.
In late June, the AI, named Five, played DotA 2 against human opponents. Five was built by OpenAI, a California-based non-profit organisation helmed by Musk and Sam Altman, and was created just to play DotA 2. This is a popular battle-arena video game in which two teams engage in combat to take down each other’s base.
A team has five players (hence the AI’s name), each of whom plays a ‘hero’ selected from over 100 options before the game starts. Each hero in turn has unique abilities, developed over the course of the game, to attack, defend, support other players, etc. When a hero kills a (playable or non-playable) character during the game, she earns gold that can be spent on items that enhance her abilities. A game typically lasts around 45 minutes.
In the three June games, Five demolished its semi-professional human opponents. However, three factors enabled this outcome. First, the benchmark games were heavily rigged: many features of regular human gameplay were disabled. This was because the benchmarks were the first games Five had played against humans. Until then, including when OpenAI taught it to play DotA 2, Five had only ever played against itself.
Second, as a result of the limited featureset, the human players couldn’t deploy the strategies they were used to. As one Reddit user put it, “right now, it is sorta like the two teams are trained in two different games – but the match is being played in the game in which the bots have the most experience.” Third, features aside, Five – by virtue of being a sophisticated computer – sported faster reaction times than its human opponents, an advantage no human player can ever match.
So what did the June games benchmark? Mostly, Five’s preparedness to play professional human DotA 2 players at The International, the game’s equivalent of the football World Cup. These ‘regular’ games had far fewer restrictions – several of the major limitations from June were removed – and featured human players considered to be among the best in the world. And this time, Five lost both games it played.
Five’s first loss came on August 5 against paiN Gaming, a team from Brazil ranked in the world’s top 20 but eliminated early at The International. Its second loss was inflicted later in August by an ensemble of Chinese ‘superstar’ players (xiao8, BurNIng, rOtK, Ferrari_430, SanSheng).
According to an OpenAI blog post, Five “maintained a good chance of winning for the first 20-35 minutes of both games.”
These losses don’t present straightforward takeaways. Thanks to the multidimensional nature of DotA 2, there are numerous ways to analyse how Five played – and little victories, as well as little mysteries, to be salvaged from the ruins. Here are four.
Team-play: The five heroes controlled by Five seemed reluctant to split up. The DotA 2 battle arena is very big (in terms of how long any hero takes to traverse it), so maintaining an advantage requires heroes to split up when necessary, undertake patrols, plant wards (stationary items that act like sensors) and frustrate or deter opposing heroes. However, Five’s heroes tended to stay together even when it wasn’t necessary, opening themselves up to attacks that the humans exploited. On the flip side, whenever the moment came for the heroes to play as one, Five’s coordination and execution were flawless. It is possible Five judged unity to be the better overall strategy, considering its heroes (and its opponents’) had been selected by an independent group.
Mortal couriers: Five didn’t seem ready for single couriers. In DotA 2, a courier is a diminutive FedEx-like service that goes between a shop and the hero, carrying items that she might need. The benchmark games in June had five invulnerable couriers that couldn’t be killed by opponent heroes. The games played on the sidelines of The International, however, allowed for one very mortal courier. Five’s heroes seemed unprepared to deal with this (virtual) reality because they constantly played as if they would be able to access the items they needed wherever they were. In a normal game, heroes usually abandon the battlefield and fall back to pick up the items they need if they don’t have couriers.
Reaction time: After the benchmark tests, some players (not involved in the games against Five) had expressed concern that Five might be getting ahead through ‘unnaturally’ fast decision-making, mostly in the form of reaction time. Adult human reaction time ranges between 150 ms and 500 ms depending on the stimulus, physiology and task. According to OpenAI, Five originally had a reaction time of 80 ms, since increased to 200 ms to address these concerns. However, this did not seem to make Five noticeably weaker.
Brain freeze: There were multiple instances in both games where Five made decisions that put it in a weaker position, even as human observers quickly identified alternative courses of action that could have helped the AI maintain its dominance. One notable kind of error was a Five hero using a very strong attack spell against an opponent too weak to deserve it. The stronger the spell a hero uses, the longer she has to wait for it to recharge before she can use it again.
Why you should care
Games like DotA 2 present unique, low-cost opportunities to train AI in multiplayer combat with a colossal number of possible outcomes. However, the bigger gain in the context of Five is the set of techniques that OpenAI is pioneering, which the company says can be deployed in a variety of real-world situations. Chief among them are Rapid, Gym and competitive self-play, each building on the last to deliver sophisticated AI models.
Rapid is the name of a general-purpose reinforcement learning algorithm developed by OpenAI. It uses a technique called proximal policy optimisation (PPO), which in turn is a form of policy gradient method, to make and reward decisions. Policy gradient methods help machines make decisions in environments where they don’t have enough data to choose the best action outright, and where they have limited computational power to work with. OpenAI researchers published a preprint paper in July 2017 describing PPO as a gradient method that “outperformed other online policy gradient methods” while being simpler to implement.
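For the mathematically inclined, the heart of PPO fits in one line. What follows is a simplified rendering of the ‘clipped’ objective from that preprint: r_t(θ) measures how much likelier the updated policy is to repeat an action than the old policy was, Â_t estimates how much better than average that action turned out to be, and ε is a small constant (0.2 in the paper’s experiments):

L^{CLIP}(\theta) = \hat{\mathbb{E}}_t\left[\min\left(r_t(\theta)\,\hat{A}_t,\ \operatorname{clip}\left(r_t(\theta),\,1-\epsilon,\,1+\epsilon\right)\hat{A}_t\right)\right]

The ‘clip’ is the whole trick: it stops any single update from dragging the policy too far from its previous behaviour, which is a large part of why PPO is simpler and more stable than its predecessors.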
The decision-making programs built this way are generally called agents, and their performance is tested as they act within a setting known as the environment. The latter role is played by Gym; that is, Rapid is tested with Gym. To its credit, OpenAI has made the Gym development library freely available to all developers and has made it compatible with any numerical computation library (including Google’s widely used TensorFlow). This means any engineer can build an agent and use Gym to test its performance.
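To give a flavour of how simple this interface is, here is a minimal sketch in Python of the agent-environment loop, using CartPole, one of Gym’s built-in toy environments. The random action-picker below merely stands in for a trained agent; it is illustrative and has nothing to do with OpenAI’s DotA 2 setup.

```python
import gym

# A simple built-in environment: keep a pole balanced on a moving cart.
env = gym.make('CartPole-v1')
observation = env.reset()

for _ in range(1000):
    # A real agent would pick an action based on the observation;
    # here we sample a random one from the environment's action space.
    action = env.action_space.sample()
    observation, reward, done, info = env.step(action)
    # 'reward' is the signal a learning algorithm such as PPO would
    # use to nudge the policy towards better behaviour.
    if done:
        observation = env.reset()

env.close()
```

Every Gym environment exposes this same reset/step loop, which is why the same agent code can be pointed at anything from a toy cart-and-pole to a simulated robotic hand.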
When multiple agents made from the same algorithm compete in a common Gym environment, the result is competitive self-play. This is how Five trained to become better at DotA 2: two agents from the same Rapid mould engaged each other, each in pursuit of its own triumph. To force each version of Five to improve, the researchers only needed to identify the trajectories that led to success and add the rewards and sanctions that would guide the Fives down those paths.
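To see the principle at work in miniature, consider the following toy sketch (hypothetical, and vastly simpler than anything Rapid does): an agent plays rock-paper-scissors against a frozen copy of its own past behaviour and always shifts towards the move that beats that copy. Left to run, it converges on the unexploitable strategy of playing each move a third of the time – an ‘insight’ nobody programmed in.

```python
import collections

# What each move defeats: rock beats scissors, and so on.
BEATS = {'rock': 'scissors', 'paper': 'rock', 'scissors': 'paper'}
# Invert the table to find the move that defeats a given move.
COUNTER = {loser: winner for winner, loser in BEATS.items()}

# The agent's 'policy' is simply the record of everything it has played.
history = collections.Counter({'rock': 1, 'paper': 1, 'scissors': 1})

for _ in range(30000):
    # The opponent is a frozen copy of the agent: its own past play.
    likeliest = max(history, key=history.get)
    # Best response: play the move that beats the opponent's favourite.
    history[COUNTER[likeliest]] += 1

total = sum(history.values())
print({move: round(count / total, 2) for move, count in history.items()})
# Prints roughly {'rock': 0.33, 'paper': 0.33, 'scissors': 0.33}
```

No reward table ever says “mix your moves evenly”; the balanced strategy emerges because any lopsided habit is immediately punished by the agent’s own copy. That, in essence, is what self-play bought Five, at incomparably greater scale.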
DotA 2 may be too complicated to visualise this with, so consider the following video instead, in which an agent needs to step over a line guarded by another agent, who has the same mandate, in order to win. The first two runs produce the same result, with the agents setting off for their respective lines in an unplanned race. From the third run onwards, something else happens…
Competitive self-play is becoming increasingly popular. The other famously successful AI trained this way is AlphaGo, the machine that beat the world’s foremost Go champion last year – in one game developing previously unknown strategies to win. Keep in mind this is a game humans have been playing for about 2,500 years, and one that AlphaGo taught itself in three days. Competitive self-play is especially useful either when engineers want an agent to acquire a skill for which they can’t craft rewards in a given environment, or when they want the agent to develop new ways to solve an old problem.
As a consequence of the way it is designed and trained, OpenAI Five is expected to be even more powerful by The International next year, with some gamers predicting that it will take the crown. But more than disrupting the human kingdom of DotA 2, Five’s education could prepare it for even more complicated tasks. For example, OpenAI has also been developing a robotic hand controlled by AI, with the hope of making it as agile and dexterous as a human hand. From there, it becomes a matter of asking the right questions: “Five, what is the best way to return a serve?”
OpenAI is one of two major billion-dollar non-profit organisations in the West focused on using AI to solve humanity’s problems. The other is the Allen Institute for Artificial Intelligence, founded by Microsoft co-founder Paul Allen with the express intention of making “scientific breakthroughs”. However, OpenAI is in a uniquely sticky situation vis-à-vis its resources. As Miles Brundage, an AI expert recently hired by OpenAI, wrote on his blog in 2015:
With Amazon’s investment and Tesla’s indirect involvement through Musk, it would seem that OpenAI will potentially have access to a lot of the data needed to make deep learning and other AI approaches work well. But these are different sorts of datasets than Google and Facebook have, and may lend themselves to different technical approaches. They also raise the question of proprietary data – how will OpenAI balance its push for openness with this proprietary access? Will it release code but not data? How will its experiments be replicable if they are tailored to particular data streams? How will privacy be addressed? Some of these questions are unique to OpenAI and others aren’t. However, they’re the sorts of questions OpenAI will need to answer.