OpenAI’s Dota 2 defeat is still a win for artificial intelligence

On Aug 29, 2018

Last week, humanity struck back against the machines — sort of.

Actually, we beat them at a video game. In a best-of-three match, two teams of pro gamers overcame a squad of AI bots that were created by the Elon Musk-founded research lab OpenAI. The competitors were playing Dota 2, a phenomenally popular and complex battle arena game. But the match was also something of a litmus test for artificial intelligence: the latest high-profile measure of our ambition to create machines that can out-think us.

In the human-AI scorecard, artificial intelligence has racked up some big wins recently. Most notable was the defeat of the world’s best Go players by DeepMind’s AlphaGo, an achievement that experts thought out of reach for at least a decade. Recently, researchers have turned their attention to video games as the next challenge. Although video games lack the intellectual reputation of Go and chess, they’re actually much harder for computers to play. They withhold information from players; take place in complex, ever-changing environments; and require the sort of strategic thinking that can’t be easily simulated. In other words, they’re closer to the sorts of problems we want AI to tackle in real life.

Dota 2 is a particularly popular testing ground, and OpenAI is thought to have the best Dota 2 bots around. But last week, they lost. So what happened? Have we reached some sort of ceiling in AI’s ability? Is this proof that some skills are just too complex for computers?

The short answers are no and no. This was just a “bump in the road,” says Stephen Merity, a machine learning researcher and Dota 2 fan. Machines will conquer the game eventually, and it’ll likely be OpenAI that cracks the case. But unpacking why humans won last week and what OpenAI managed to achieve — even in defeat — is still useful. It tells us what AI can and can’t do and what’s to come.

First, let’s put last week’s matches in context. The bots were created by OpenAI as part of its broad research remit to develop AI that “benefits all of humanity.” It’s a directive that justifies a lot of different research and has attracted some of the field’s best scientists. By training its team of Dota 2 bots (dubbed the OpenAI Five), the lab says it wants to develop systems that can “handle the complexity and uncertainty of the real world.”

The five bots (which operate independently but were trained using the same algorithms) were taught to play Dota 2 using a technique called reinforcement learning. This is a common training method that’s essentially trial-and-error at a huge scale. (It has its weaknesses, but it also produces incredible results, including AlphaGo.) Instead of being coded with the rules of Dota 2, the bots are thrown into the game and left to figure things out for themselves. OpenAI’s engineers help this process along by rewarding them for completing certain tasks (like killing an opponent or winning a match) but nothing more than that.

This means the bots start out playing completely randomly, and over time, they learn to connect certain behaviors to rewards. As you might guess, this is an extremely inefficient way to learn. As a result, the bots have to play Dota 2 at an accelerated rate, cramming 180 years of training time into each day. As OpenAI’s CTO and co-founder Greg Brockman told The Verge earlier this year, if it takes a human between 12,000 and 20,000 hours of practice to master a certain skill, then the bots burn through “100 human lifetimes of experience every single day.”
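To make the trial-and-error idea concrete, here is a toy sketch in Python. It is not OpenAI's training code: the three "actions," their payoff odds, and the `train_bandit` helper are all invented for illustration. The learner starts with no knowledge, mostly repeats whatever has paid off so far, occasionally tries something at random, and slowly connects actions to rewards.

```python
import random

def train_bandit(reward_probs, episodes=5000, epsilon=0.1, seed=0):
    """Learn which action pays off best purely by trial and error.

    reward_probs is a hypothetical list of payoff odds per action;
    the learner never sees it directly, only the rewards it produces.
    """
    rng = random.Random(seed)
    values = [0.0] * len(reward_probs)  # estimated payoff of each action
    counts = [0] * len(reward_probs)    # how often each action was tried
    for _ in range(episodes):
        if rng.random() < epsilon:
            # Explore: try a random action.
            action = rng.randrange(len(reward_probs))
        else:
            # Exploit: repeat the action that looks best so far.
            action = max(range(len(values)), key=values.__getitem__)
        reward = 1.0 if rng.random() < reward_probs[action] else 0.0
        counts[action] += 1
        # Nudge the running average for this action toward the new reward.
        values[action] += (reward - values[action]) / counts[action]
    return values, counts

values, counts = train_bandit([0.2, 0.5, 0.8])
best_action = max(range(len(values)), key=values.__getitem__)
print("estimated payoffs:", [round(v, 2) for v in values])
print("best action found:", best_action)
```

Even this three-option toy needs thousands of attempts to settle on the best choice, which hints at why a game as large as Dota 2 demands years of simulated play every day.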

Part of the reason it takes so long is that Dota 2 is hugely complex, much more so than a board game. Two teams of five face off against one another on a map that’s filled with non-playable characters, obstacles, and destructible buildings, all of which have an effect on the tide of battle. Heroes have to fight their way to their opponent’s base and destroy it while juggling various mechanics. There are hundreds of items they can pick up or purchase to boost their ability, and each hero (of which there are more than 100) has its own unique moves and attributes. Each game of Dota 2 is like a battle of antiquity played out in miniature, with teams wrangling over territory and struggling to out-maneuver opponents.

Processing all this data so games can be played at a faster-than-life pace is a huge challenge. To train their algorithms, OpenAI had to corral a massive amount of processing power — some 256 GPUs and 128,000 CPU cores. This is why experts often talk about the OpenAI Five as an engineering project as much as a research one: it’s an achievement just to get the system up and running, let alone beat the humans.

“As far as […] showcasing the level of complexity modern data-driven AI approaches can handle, OpenAI Five is far more impressive than either DQN or AlphaGo,” says Andrey Kurenkov, a PhD student at Stanford studying computer science and the editor of AI site Skynet Today. (DQN was DeepMind’s AI system that taught itself to play Atari.) But, notes Kurenkov, while these older projects introduced “significant, novel ideas” at the level of pure research, OpenAI Five is mainly deploying existing structures at a previously undreamt-of scale. Win or lose, that’s still big.

But putting aside engineering, how good can the bots be if they just lost two matches against humans? It’s a fair question, and the answer is: still pretty damn good.

Over the past year, the bots have graduated through progressively harder versions of the game, starting with 1v1, then 5v5 matches with restrictions. However, they have yet to tackle the game’s full complexity. For the matches at The International, a few of these constraints were removed, but not all of them. Most notably, the bots no longer had invulnerable couriers (NPCs that deliver items to heroes). These had previously been an important prop for their style of play, ferrying a reliable stream of healing potions to help them keep up a relentless attack. At The International, they had to worry about their supply lines being picked off.

Although last week’s games are still being analyzed, the early consensus is that the bots played well but not exceptionally so. They weren’t AI savants; they had strengths and weaknesses, which humans could take advantage of as they would against any team.

Both games started very level, with humans first taking the lead, then bots, then humans. But both times, once the humans gained a sizable advantage, the bots found it hard to recover. There was speculation by the game’s commentators that this might be because the AI preferred “to win by 1 point with 90% certainty, than win by 50 points with a 51% certainty.” (This trait was also noticeable in AlphaGo’s game style.) It implies that OpenAI Five was used to grinding out steady but predictable victories. When the bots lost their lead, they were unable to make the more adventurous plays necessary to regain it.
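The commentators' hunch can be put in numbers. The sketch below compares two hypothetical lines of play under two different goals: maximizing the chance of winning, and maximizing the expected score margin. The option names, and the assumption that a loss costs as many points as a win would gain, are illustrative and not taken from the matches.

```python
def win_probability(option):
    """Objective 1: only the chance of winning matters."""
    return option["p_win"]

def expected_margin(option):
    """Objective 2: average point margin.

    Assumption: a loss costs the same number of points a win would gain.
    """
    p = option["p_win"]
    return p * option["margin"] - (1 - p) * option["margin"]

# The two choices described by the commentators.
safe = {"name": "safe", "p_win": 0.90, "margin": 1}
risky = {"name": "risky", "p_win": 0.51, "margin": 50}

pick_by_win_prob = max([safe, risky], key=win_probability)["name"]
pick_by_margin = max([safe, risky], key=expected_margin)["name"]

print("maximize win chance ->", pick_by_win_prob)
print("maximize point margin ->", pick_by_margin)
```

Under these assumptions, an agent rewarded only for winning picks the safe line, while one rewarded for margin picks the risky one. A system trained mostly on the first goal would have little practice with the high-variance plays needed to come back from behind.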

This is just a guess, though. As is usually the case with AI, divining the exact thought process behind the bots’ actions is impossible. What we can say is that they excelled in close quarters but found it trickier to match humans’ long-term strategies.

The OpenAI Five were unerringly precise, aggressively picking off targets with spells and attacks, and generally being a menace to any enemy heroes they came upon. Mike Cook, an AI games researcher at the University of Falmouth and an avid Dota player who live-tweeted the fights, described the bots’ style as “hypnotic.” “[They] act with precision and clarity,” Cook told The Verge. “Often, the humans would win a fight and then let their guard down slightly, expecting the enemy team to retreat and regroup. But the bots don’t do that. If they can see a kill, they take it.”

Where the bots seemed to stumble was in the long game, thinking how matches might develop in 10- or 20-minute spans. In the second of their two bouts against a team of Chinese pro gamers with a fearsome reputation (they were variously referred to by the commentators as “the old legends club” or, more simply, “the gods”), the humans opted for an asymmetric strategy. One player gathered resources to slowly power up his hero, while the other four ran interference for him. The bots didn’t seem to notice what was happening, though, and by the end of the game, team human had a souped-up hero who helped devastate the AI players. “This is a natural style for humans playing Dota,” says Cook. “[But] to bots, it is extreme long-term planning.”

This question of strategy is important not just for OpenAI, but for AI research more generally. The absence of long-term planning is often seen as a major flaw of reinforcement learning because AI systems created using this method often emphasize immediate payoffs rather than long-term rewards. This is because structuring a reward system that works over longer periods of time is difficult. How do you teach a bot to delay the use of a powerful spell until enemies are grouped together if you can’t predict when that will happen? Do you just give it small rewards for not using that spell? What if it decides never to use it as a result? And this is just one basic example. Dota 2 games generally last 30 to 45 minutes, and players have to constantly think through what action will lead to long-term success.

It’s important to stress, though, that the bots weren’t just thoughtless, reward-seeking gremlins. The neural network controlling each hero has a memory component that learns certain strategies. And the way they respond to rewards is shaped so that the bots consider future payoffs as well as those that are more immediate. In fact, OpenAI says its AI agents do this to a far greater degree than any other comparable systems, with a “reward half-life” of 14 minutes (roughly speaking, the length of time the bots can wait for future payoffs).
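One way to read the "reward half-life" is as a discount factor: each decision step, a future reward is multiplied by a number slightly below 1, and the half-life is how long it takes that weight to fall to one half. The sketch below works through the arithmetic. The rate of 7.5 decisions per second is an assumption made for illustration and does not come from this article, as are the function names.

```python
def discount_from_half_life(half_life_seconds, steps_per_second):
    """Per-step discount factor whose weight halves after the given time."""
    steps = half_life_seconds * steps_per_second
    return 0.5 ** (1.0 / steps)

def weight_after(seconds, gamma, steps_per_second):
    """How much a reward arriving `seconds` from now counts today."""
    return gamma ** (seconds * steps_per_second)

STEPS_PER_SECOND = 7.5  # assumed decision rate, not stated in the article
gamma = discount_from_half_life(14 * 60, STEPS_PER_SECOND)

print(f"per-step discount factor: {gamma:.6f}")
for minutes in (1, 14, 28, 45):
    w = weight_after(minutes * 60, gamma, STEPS_PER_SECOND)
    print(f"reward {minutes:>2} min away is weighted {w:.2f}")
```

By construction, a payoff 14 minutes away counts half as much as one available now, and a payoff 28 minutes away counts a quarter as much. That suggests why a plan that only pays off at the end of a 45-minute game can still be hard for such a system to value.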

Kurenkov, who’s written extensively about the limitations of reinforcement learning, said that the matches show that reinforcement learning can handle “far more complexity than most AI researchers might have imagined.” But, he adds, last week’s defeat suggests that new systems are needed specifically to manage long-term thinking. (Unsurprisingly, OpenAI’s chief technology officer disagrees.)

Unlike the outcome of the matches, there’s no obvious conclusion here. Disagreement over the bots’ success mirrors larger, unresolved debates in AI. As researcher Julian Togelius noted on Twitter, how do we even begin to differentiate between long-term strategy and behavior that just looks like it? Does it matter? All we know for now is that in this particular domain, AI can’t out-think humans yet.

Wrangling over the bots’ cleverness is one thing, but OpenAI Five’s Dota 2 matches also raised another, more fundamental question: why do we stage these events at all?

Take the comments of Gary Marcus, a respected critic of the limitations of contemporary AI. In the run-up to OpenAI’s games last week, Marcus pointed out on Twitter that the bots don’t play fairly. Unlike human gamers (or some other AI systems), they don’t actually look at the screen to play. Instead, they use Dota 2’s “bot API” to understand the game. This is a feed of 20,000 numbers that describes what’s going on in numerical form, incorporating information on everything from the location of each hero to their health to the cooldown on individual spells and attacks.
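To picture what "a feed of numbers" means, here is a deliberately tiny, hypothetical encoding. It is not the real Dota 2 bot API: the `HeroState` fields, the `encode` function, and the layout of the output are invented for illustration. Each hero's position, health, and spell cooldowns are flattened into one list of numbers, which is the kind of input a neural network reads in place of pixels.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class HeroState:
    """Hypothetical snapshot of one hero (not the real API schema)."""
    x: float
    y: float
    health: float
    cooldowns: List[float] = field(default_factory=list)

def encode(heroes, max_abilities=4):
    """Flatten hero snapshots into a single fixed-layout list of numbers."""
    features = []
    for hero in heroes:
        # Pad or trim cooldowns so every hero occupies the same number of slots.
        padded = (hero.cooldowns + [0.0] * max_abilities)[:max_abilities]
        features.extend([hero.x, hero.y, hero.health, *padded])
    return features

observation = encode([
    HeroState(x=1.0, y=2.0, health=550.0, cooldowns=[0.0, 12.5]),
    HeroState(x=-3.0, y=4.5, health=120.0),
])
print(len(observation), "numbers:", observation)
```

Two heroes yield 14 numbers in this toy. Scaling the same idea up to every unit, building, and ability on the map shows how a feed could grow to the 20,000 values the article describes.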