Tony Peng
Last year, OpenAI's 1v1 AI defeated Dendi, one of the world's top players, and OpenAI CTO Greg Brockman promised: next year we will return to TI with a 5v5 AI bot. Today they made good on that promise, challenging top human Dota 2 players with the new OpenAI Five. But after a 51-minute game, OpenAI was thoroughly defeated.
According to the organizers, many TI8 teams signed up to play against the AI. OpenAI's first opponent today was paiN from Brazil, the first team eliminated from TI8, yet undeniably still one of the 18 strongest teams in the world. In earlier public matches, OpenAI had defeated Dendi in 1v1 and, in 5v5, beaten a roughly 6,000-MMR team made up of former professional players and casters.
With AlphaGo as a precedent, many people predicted an AI victory before the match. Reality was not so simple: although the OpenAI agents held an edge in reaction speed and mechanical execution, they still fell short of the human team in overall strategy and coordination.
TI8: a rough start
Today's man-machine contest was a single game. The two lineups were as follows:
- OpenAI (Radiant): Gyrocopter, Lich, Death Prophet, Crystal Maiden, Tidehunter
- paiN (Dire): Lion, Necrophos, Witch Doctor, Sniper, Axe
At the start of the game, OpenAI gave itself a 97% win probability, but it got off on the wrong foot: paiN popped a smoke and rushed straight into the Radiant jungle, four players collapsed on the solo Tidehunter, and the humans took first blood. OpenAI also showed its less-than-smart side, repeatedly planting wards under its own tower.
paiN opened strongly; seven and a half minutes in, OpenAI Five trailed by 1,000 gold. OpenAI gradually clawed back, the kill score reached 7:7 around the 10-minute mark, and the game settled into a stalemate. It was clear the bots cared little about kills and focused on pushing towers instead. The AI then seized its chances in fights across two lanes, and by 17 minutes OpenAI's economy had overtaken the human players'.
At 21 and a half minutes the AI took Roshan, the first Roshan kill by the AI in a public match, and the Gyrocopter picked up the Aegis. At 25 minutes, however, the Gyrocopter was caught in the Dire jungle and the Aegis was wasted; the AI made no attempt to rescue it and simply abandoned its carry. At 32 minutes it took Roshan a second time, but surprisingly OpenAI proved quite "selfish": whoever landed the last hit on Roshan took the Aegis, even a support. OpenAI then traded two for four in a fight in the bottom jungle.
The AI's ideas about warding differ from humans'. At one point as many as three sentry wards sat at the entrance to the Roshan pit, and the AI kept another three wards inside its own base, drawing jeers from the casters. Once fully leveled, the AI's Death Prophet kept using her ultimate to clear the jungle, very strong.
The paiN players, though behind on kills and with no edge in hero micro, gradually recovered the tempo and, after taking the bottom-lane towers, held the advantage on the map. At 35 minutes the humans pushed high ground, and OpenAI's predicted win probability dropped to 67%.
By this point, of course, the humans no longer believed the AI could claw anything back. At 37 minutes paiN led by 9,000 gold. At 40 minutes, after killing two human heroes, OpenAI went for a third Roshan, but by then the professional players seemed to have figured out the AI's routines.
By the 49th minute, the AI put its own win probability at only 20%; the outcome was all but settled.
In the end, paiN's players destroyed OpenAI's Ancient, and the first game of the TI8 man-machine war closed with a human victory.
The AI showed three big problems in today's game:
First, in the mid game it neither ganked nor pressed its advantage to take towers. Between 20 and 35 minutes there was a window: paiN's Sniper and Axe had not yet finished their BKBs, a fine opportunity for the AI. Yet apart from warding everywhere and loitering near Roshan, it never organized a proper gank or tower push. Once the enemy's farm came online and the BKBs were out, the game turned one-sided.
Second, it did not allocate resources sensibly. Dota has always had the notion of positions 1 through 5: the position 1 carry gets the best farm while positions 4 and 5 support, and funneling the best resources to the carry is accumulated Dota wisdom. paiN duly fed its resources to the Sniper and the Axe. The AI, by contrast, seemed to run on a principle of equality for all, producing "tactical arrangements" such as the Tidehunter and the Lich taking the Aegis.
Finally, itemization was a big problem. The AI did not seem to understand which items were appropriate, and it wasted a lot of gold on wards.
Jonathan Raiman, a research scientist on the OpenAI Five team, told Synced that the members were not particularly disappointed: "Before the game, most of us put our odds of winning at around 30%-40%. We learned a lot from this game, for instance that the AI killed Roshan several times, and that is worth taking back to our research."
Raiman revealed that the game environment had changed the courier settings (couriers can now be killed), which forced the model to re-adapt and affected a number of things, such as item purchases. The team is also rethinking how it will set reward weights in the future. OpenAI Five has a team-collaboration mechanism (described in detail later), and all rewards are tied to the final victory, but it now appears this setting dampens the AI's motivation to farm and build up its economy in the early game.
This was only OpenAI's first game at TI8, with two more still to play. But from the first public OpenAI Five results in June, to the benchmark in which it cruised past a human team, to today's setback, perhaps we can find some clues in the story that came before.
After AlphaGo, someone has to pick up the baton
Let's rewind to 2016...
Game-playing AI has always been a hot topic in machine learning. Games are designed to be fun and challenging, which makes them ideal testbeds for AI; they offer rich opportunities for human-computer interaction; and because they are so popular, they naturally generate abundant data to feed AI training.
Over the past few years, game research has driven major breakthroughs in machine learning. In 2015, Google DeepMind published a study in Nature: using deep reinforcement learning (specifically the Deep Q-Network), they trained AI players whose performance matched or exceeded human level across a suite of Atari 2600 games.
The following year, DeepMind's AlphaGo burst onto the scene. Built on Monte Carlo tree search and reinforcement learning, it beat the Korean Go master Lee Sedol 4:1. A year after that, AlphaGo evolved into AlphaZero, which relies on no human knowledge, learns almost entirely through self-play, and surpassed human level at chess, shogi, and Go.
Lee Sedol
Board-game AI made waves around the world, but sooner or later the excitement cools. The world needs new stimuli to sustain its curiosity and enthusiasm for AI, and practitioners keep looking for new challenges to probe AI's limits.
Go may have fallen, but in a world of thousands of games the space left to researchers is still vast: card games, first-person shooters, the Atari catalogue, racing games, strategy games, sandbox games... DeepMind and Facebook have both taken aim at StarCraft, widely considered one of the hardest video games, and so far DeepMind's results have been underwhelming, which prompted it to open up the StarCraft II machine learning platform with Blizzard last year.
Against this backdrop, OpenAI's Dota AI project carried some of those hopes.
On November 5, 2016, OpenAI decided to develop an AI agent that could learn Dota 2. The project team was led by OpenAI CTO Greg Brockman.
Greg Brockman
Before that, OpenAI did not know which game to study; it had only rough criteria: the game should be complex enough, very popular, and have a rich API that runs on Linux. They combed through all the games on the US streaming platform Twitch and eventually settled on Dota 2.
DotA, short for Defense of the Ancients, began as a multiplayer online battle arena map incubated inside the Warcraft series. As the name suggests, the victory condition in Dota is to destroy the enemy's Ancient.
In 2005 the first DotA map, version 6.01, was officially released, and the map's core programmer, IceFrog, maintained and updated it over the years. In 2013, IceFrog and the game developer Valve released Dota 2, fully independent of Warcraft and a true standalone competitive title.
Dota 6.67C
Dota 2 met all of OpenAI's requirements:
First, it is very complex. Dota 2 has 115 playable heroes, each with up to 10 skills (Invoker, we're looking at you), over a hundred items, 20 towers, and dozens of NPCs. In 5v5 matches between the Radiant and Dire factions, fought across three lanes, tactics and roles emerge that include laning, jungling, ganking, teamfighting, and warding.
On its official blog, OpenAI compared Dota 2 with board games: Dota 2 averages about 1,000 possible valid actions per tick, versus 35 for chess and 250 for Go. Through the Bot API provided by Valve (the company that operates Dota 2), OpenAI represents the Dota 2 game state as roughly 20,000 numeric values, covering all the information a human can access in the game; a chess position needs about 70 values and a Go position about 400.
Second, Dota 2 is popular. It has tens of millions of players worldwide; fewer than League of Legends or today's PUBG and Fortnite, admittedly, but its relatively long history (DotA dates to 2005) and its roots in the epic Warcraft setting give the game deep heritage and a strong reputation.
Furthermore, Dota 2 has a professional esports scene. Every August, the world's top players travel to North America for The International, the championship hosted by Valve. Last year TI7's prize pool topped $20 million.
At first, OpenAI was not aiming to beat top human players. Simply using today's cutting-edge machine learning algorithms to build an intelligent agent that plays Dota (rather than a scripted bot) would already have counted as a breakthrough. Unexpectedly, the road led further and further.
"We're probably going to fail"
At the beginning of 2017, OpenAI built what it considered its best rule-based scripted bot. Much of the credit goes to Rafal Jozefowicz, then a researcher on the project team and now an SVP at the hedge fund D. E. Shaw Group. Rafal had never played Dota, but he watched replays every day and talked with the other members about hero skills, how to push towers, and what items to buy.
The researchers wrote down every rule they could think of, and the scripted bot could indeed beat some amateur players, but against slightly stronger opponents it stood no chance.
OpenAI decided to go a step further: throw out the hard-coded parts and use machine learning instead. They used reinforcement learning to let the bot learn from scratch. They soon found that the 5v5 environment was out of reach in the short term; it was simply too hard.
So the researchers settled for the next best thing: start with a mini-game and gradually expand the environment. The mini-game was called kiting.
Kiting is a Dota technique, usually used during the laning phase: you attack an enemy unit, then reposition so it chases you, wearing it down as you weave back and forth. Based on Dota 2, OpenAI built a mini-game: on a circular island, the trained bot faces a scripted bot and wins only if it kills the enemy unit without getting hit itself.
Sounds simple enough, doesn't it? In practice it was anything but. OpenAI's bot could never beat human players at kiting. It was always trained along the same trajectories, while humans rarely follow a fixed routine, so the experimental results kept coming up short.
"We're probably going to fail." That was OpenAI's assessment at the time. The project had been running for six months, progress was badly behind schedule, and many researchers were frustrated. OpenAI had to decide where the project was headed, even though the latest research results still had some value.
Then came the turnaround. Researchers began randomizing the training environment: heroes sometimes moved faster, sometimes slower, sometimes stalled outright. The trick worked wonders, and the randomness made the bot's reinforcement learning policy network much more robust. On March 1, 2017, the Drow Ranger OpenAI had trained was able to kill a scripted Earthshaker in the kiting game.
Kiting
They carried the kiting strategy over to Dota 2's 1v1 mode, and it worked there too. The bot began to learn to last-hit, to block creeps, and to use a range of skills. This gave OpenAI great confidence: keep the same algorithm, pile on more compute, and perhaps one day a 5v5 AI would be possible.
Jonas Schneider recalls that as late as May 4, 2017 he could still beat the AI easily, but as OpenAI put more compute into training, its level leapt upward. In early June it beat a 1,500-MMR player. Two months later, Sumail, a member of The International 2015 championship team, also lost to OpenAI in Dota 2 1v1.
Along the way, William "Blitz" Lee, the well-known Korean-American caster, helped OpenAI a great deal. OpenAI sought out Blitz hoping for some guidance. Not every Dota player appreciated what OpenAI was doing; some thought this group of scientists was just fooling around, and others were simply skeptical. But Blitz was drawn to OpenAI's results from the start. As OpenAI researchers recall, after a 1v1 against the bot, Blitz said:
"This will change how Dota players approach 1v1."
The rest of the story everyone knows: in last year's TI7 Dota 2 1v1 exhibition, the bot OpenAI designed defeated Danylo "Dendi" Ishutin, who has earned more than $730,000 in prize money over his career. OpenAI's bot beat Dendi in about 10 minutes in the first game; Dendi conceded the second and declined to play a third.
OpenAI blew up. The star research lab of the machine learning world became a focus of attention and debate around the globe. "AI conquers Dota 1v1" flooded coverage of last year's TI7; an exhibition match stole the limelight from every official game. Most people were excited, surprised, even incredulous; some were doubtful or resentful. Feelings were decidedly mixed.
OpenAI's Google search trend
The 1v1 victory answered many questions for OpenAI, the most important being: does reinforcement learning still work in a game environment this complex, one that demands long-horizon strategy?
No one questions AI's ability to learn a single skill, say last-hitting and denying, or casting an ability; that much is easy. But combining all the skills, the positioning, the laning and everything else in a complex environment, and beating the world's best at 1v1, is without doubt a major breakthrough.
What many people do not know, however, is that humans did beat the 1v1 OpenAI bot. On September 7 last year, the German Dota 2 player Dominik "Black^" Reitmeier finished off the bot at the last moment with a sliver of health left, taking the series 2:1. It was the first human win against the full version of the AI; you can imagine how thrilled Black^ was.
OpenAI is not AlphaGo. At the very least, it is not invincible.
As the matches wrapped up, OpenAI CTO Brockman announced another exciting piece of news at TI7: "The next step is 5v5. See you at TI next year!"
Solving the three core problems of 5v5
The promise was out there, but OpenAI was far from sure it could replicate the 1v1 success in 5v5. Before actually starting to train the bots, the research team did a great deal of preparation:
For example, maximizing CPU and GPU utilization to speed up large-scale training. Time is money: OpenAI ended up using 128,000 CPU cores and 256 GPUs to support the computation, letting the AI play tens of thousands of matches every day, roughly 180 years of accumulated game time per day (there are no screen-time limits for AIs);
They abandoned Kubernetes and built Rapid, a training system designed for reinforcement learning that can quickly replicate trained results and data across the machines of a distributed system and then update the training parameters;
They used Gym as the training environment. Gym is OpenAI's own toolkit for reinforcement learning, and it holds the programs and scaffolding code that OpenAI Five requires.
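Gym exposes a small, uniform interface (reset, step, action_space) that reinforcement learning code is written against. Below is a minimal sketch of that interface using a built-in toy environment and the classic pre-0.26 Gym API; the actual Dota 2 environment is internal to OpenAI and is not shown here:

```python
import gym

# Toy environment standing in for the real (internal) Dota 2 environment.
env = gym.make("CartPole-v1")

obs = env.reset()
done = False
total_reward = 0.0
while not done:
    action = env.action_space.sample()          # a policy network would go here
    obs, reward, done, info = env.step(action)  # classic 4-tuple Gym API
    total_reward += reward
env.close()
print(total_reward)
```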
With the infrastructure in place, OpenAI needed to tackle three core problems: long-horizon play, the reward scheme, and team coordination.
To train each hero, OpenAI used two machine learning techniques: the long short-term memory network (LSTM) and proximal policy optimization (PPO).
Why an LSTM is easy to understand: playing Dota 2 requires memory, since every action an enemy hero takes now shapes what happens later. The LSTM is a kind of recurrent neural network (RNN) that is better suited than an ordinary RNN to handling and predicting important events separated by long intervals and delays in a time series. It has a component called the cell that can judge whether incoming information is useful and whether it needs to be remembered.
Each bot's neural network consists of a single-layer, 1,024-unit LSTM that observes the game state and issues the corresponding actions. An interactive demo on OpenAI's blog shows how each bot constructs its commands from what the Dota 2 API observes.
Take the Viper in the lower right corner of the image as an example: it needs four pieces of information to act: the action itself (move, attack, cast a skill, use an item), the target hero, where to cast the skill, and when to cast it. OpenAI ultimately represents the Dota 2 world as a list of 20,000 numeric values.
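Putting those two paragraphs together, a rough per-hero policy might look like the following. This is a minimal PyTorch sketch, assuming the observation is a flat vector of about 20,000 values; the head sizes and layer choices are illustrative, not OpenAI's actual architecture:

```python
import torch
import torch.nn as nn

class HeroPolicy(nn.Module):
    """Single-layer, 1024-unit LSTM policy with separate action heads,
    loosely following the description above (sizes are illustrative)."""
    def __init__(self, obs_dim=20_000, hidden=1024,
                 n_action_types=30, n_targets=64, n_locations=81):
        super().__init__()
        self.embed = nn.Linear(obs_dim, hidden)
        self.lstm = nn.LSTM(hidden, hidden, num_layers=1, batch_first=True)
        self.action_head = nn.Linear(hidden, n_action_types)   # what to do
        self.target_head = nn.Linear(hidden, n_targets)        # which unit
        self.location_head = nn.Linear(hidden, n_locations)    # where to aim

    def forward(self, obs_seq, state=None):
        # obs_seq: (batch, time, obs_dim)
        x = torch.relu(self.embed(obs_seq))
        x, state = self.lstm(x, state)          # memory carried across ticks
        return (self.action_head(x),
                self.target_head(x),
                self.location_head(x)), state

policy = HeroPolicy()
logits, state = policy(torch.zeros(1, 1, 20_000))
```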
The bots teach themselves through proximal policy optimization, a reinforcement learning algorithm OpenAI proposed in 2017 that has been shown to reach better results with less data and fewer parameters than ordinary policy gradient methods. Both OpenAI Five and the earlier 1v1 bot learn entirely from self-play: they start from random parameters and use no search and no bootstrapping from human replays.
To avoid "strategy collapse", the agent plays 80% of its games against itself and the other 20% against past versions of itself.
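The 80/20 split amounts to a simple opponent-sampling rule. A minimal sketch, assuming snapshots of earlier policies are kept in a list (the names here are hypothetical):

```python
import random

def pick_opponent(current_policy, past_snapshots, p_self=0.8):
    """80% of games: pure self-play against the current policy.
    20% of games: play a randomly chosen past snapshot, so the agent
    does not overfit to (and collapse against) its latest self."""
    if not past_snapshots or random.random() < p_self:
        return current_policy
    return random.choice(past_snapshots)
```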
The reward scheme involves two aspects. The first is the weight each action carries toward the final outcome: for example, a deny is weighted 0.2 and a last hit 0.16; destroying a high-ground tower is weighted 1.0, while the two towers in front of the Ancient are weighted only 0.75, the same as a tier-one tower; and dying carries a negative weight.
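In code, such a shaped-reward table is essentially a dictionary of weights. An illustrative sketch using only the values quoted above; the key names and the magnitude of the death penalty are assumptions, not OpenAI's published table:

```python
# Illustrative shaped-reward weights (values from the paragraph above;
# the death penalty magnitude is a placeholder, only its sign is known).
REWARD_WEIGHTS = {
    "deny": 0.20,
    "last_hit": 0.16,
    "destroy_tier3_tower": 1.00,   # high-ground tower
    "destroy_tier4_tower": 0.75,   # the two towers in front of the Ancient
    "destroy_tier1_tower": 0.75,
    "death": -1.00,                # negative; exact value not given
}

def shaped_reward(events):
    """Sum the weights of the events a hero triggered during one tick."""
    return sum(REWARD_WEIGHTS.get(e, 0.0) for e in events)
```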
The second is that each neural network is trained to maximize the exponentially decayed sum of future rewards, governed by a decay factor gamma. This is a fairly important parameter: it decides whether the bot cares about long-term or short-term rewards. If gamma is too small, the bot chases only immediate gains such as gold; if gamma is too large, it weighs far-off future rewards so heavily that a bot early in training gets nowhere.
OpenAI said on its official blog that it annealed gamma from 0.998 (a reward half-life of 46 seconds) to 0.9997 (a half-life of 5 minutes). For comparison, the longest horizon in OpenAI's proximal policy optimization (PPO) paper corresponds to a half-life of 0.5 seconds, the longest in DeepMind's Rainbow paper to a half-life of 4.4 seconds, and Google Brain's "Observe and Look Further" paper uses a half-life of 46 seconds.
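The half-life follows directly from gamma and how often the agent acts. A small sketch of the conversion, assuming roughly one decision every 0.133 seconds (about 7.5 actions per second; that step length is an assumption, not stated in this article):

```python
import math

def half_life_seconds(gamma: float, seconds_per_step: float = 0.133) -> float:
    """Seconds until a future reward's weight has decayed to one half.
    gamma is applied once per decision step, so the half-life in steps
    is log(0.5) / log(gamma); multiply by the step duration for seconds."""
    return math.log(0.5) / math.log(gamma) * seconds_per_step

print(half_life_seconds(0.998))    # ~46 s
print(half_life_seconds(0.9997))   # ~307 s, i.e. about 5 minutes
```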
How to get five neural networks to cooperate is another thing many people are curious about, and it too rests on the reward scheme. OpenAI created a hyperparameter called "team spirit" with values ranging from 0 to 1: the smaller the value, the more "selfish" each network is; the larger the value, the more it weighs the team's overall interest. In the end, OpenAI found that setting team spirit to 1 wins games.
Early in training, the researchers keep the value quite low so that each AI pays more attention to its own rewards and learns how to last-hit, lane, and earn gold and experience. Only once each network has learned the basic strategy and mechanics do the researchers slowly raise the value.
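Described this way, team spirit is an interpolation between a hero's own reward and the team's average reward. A minimal sketch under that reading; the exact formula OpenAI uses is not spelled out in this article:

```python
def blend_rewards(individual_rewards, team_spirit):
    """Blend each hero's own reward with the team-average reward.
    team_spirit = 0 -> purely selfish; team_spirit = 1 -> every hero
    receives the same, fully shared team reward."""
    team_mean = sum(individual_rewards) / len(individual_rewards)
    return [(1 - team_spirit) * r + team_spirit * team_mean
            for r in individual_rewards]

# Early training: mostly selfish.   Late training: fully shared.
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=0.2))
print(blend_rewards([1.0, 0.0, 0.0, 0.0, 0.0], team_spirit=1.0))
```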
Since all parameters start out random and no human experience is injected, the AI has no concept of positions 1 through 5, does not distinguish supports from carries, and learns its itemization from scratch as well.
In the very first games, heroes wandered aimlessly around the map. After a few hours of training, concepts such as laning, farming, and mid-game fighting emerged. A few days later, the agents had converged on approaches consistent with basic human strategy: stealing farm from opponents, pushing towers to grow, and rotating heroes around the map to gain lane advantages. With further training, they began to learn advanced strategies such as pushing towers as a five.
"ai only took me two days to win.
Jonathan Raiman, who studied at MIT, joined OpenAI last October. Raiman already knew many of his OpenAI colleagues, and after he joined they would often queue as a five-stack on Monday nights; slowly it became an OpenAI tradition.
On a Monday in May (May 15, to be exact), the AI beat OpenAI's in-house team for the first time in a restricted Dota environment (the team sits around 2,500 MMR).
"I remember the humans held out for forty-some minutes in that game," recalled Raiman, who watched from the sidelines. "After that the games got shorter and shorter. I was super excited! I figured we had a 50/50 chance against a professional team."
In fact, a week before that game the AI had already beaten a human team once. But there was a problem with that victory: when the researchers later examined the code, they found that the code running the neural network was buggy. The AI never used the LSTM's memory during the match; it won by pure dumb luck. Before then, the researchers had not noticed anything wrong with the AI.
"A lot of machine learning problems still come down to engineering and fixing bugs in the system," said Susan Zhang, a research scientist at OpenAI. "For example, the AI avoided reaching level 25 for a long stretch because it found that hitting level 25 brought a huge negative reward, so AIs stuck at level 24 simply refused to go out and gain experience."
Susan Zhang
Raiman has crossed swords with the AI himself. The first time, his team won; but after the AI trained for another two days, Raiman was no longer a match for it. "For a player at my level there is maybe a 24-48 hour window; after that the AI crushes you. At first we could hold out for more than 40 minutes, then only 20, then just over 10, and in the end we just hid in the base and never came out."
By June 6, OpenAI could beat teams in the 4,000-6,000 MMR range, but lost to a 5,500-MMR team. In that match the researchers noticed a number of interesting behaviors:
OpenAI Five made a habit of sacrificing its own safe lane (the Dire's top lane, the Radiant's bottom lane) and sending three heroes to pressure the enemy's safe lane, forcing the fighting toward the side the opponent finds harder to defend. The strategy has shown up in professional play in recent years and has become a popular tactic.
Its early-to-mid-game transitions were faster than its opponents': when the human players slipped up, the AI would proactively gank and push towers before the opponents could organize.
The AI funneled gold and experience to its support heroes early on (heroes that normally do not get priority on resources) to boost their damage, building a bigger advantage, winning teamfights, and capitalizing on the opponent's mistakes to close out games quickly.
After nearly a year, OpenAI announced the progress of the OpenAI Five project for the first time and released its OpenAI Five report.
As more details came out, headlines like "180 years of training per day: OpenAI beats human Dota 2 players" and "OpenAI cracks Dota 2" swept the world. Microsoft founder Bill Gates wrote: "AI bots just beat humans at the video game Dota 2. That's a big deal, because their victory required teamwork and collaboration, a huge milestone in advancing AI."
People began to genuinely wonder: will Dota 2 fall to AI, just as Go did?
Only half a Dota.
OpenAI's first-phase results were encouraging, but they did not satisfy many Dota fans, because the restrictions were too heavy. In the June matches, players could only use five fixed heroes, with no warding, no smoke, no Roshan, no invisibility, no scans, and so on... Is that even Dota?
It's not that OpenAI didn't want to lift the restrictions; the AI simply had too much to learn, and time was very limited.
For example, OpenAI tightly restricted the hero pool, and if you look closely you'll find most of them are entry-level Dota 2 heroes: Crystal Maiden, Shadow Fiend, Lich, Witch Doctor and so on. So one of the most common comments on forums and Weibo was: they wouldn't dare let OpenAI play Invoker or Meepo.
The AI can play Invoker, but training it takes a great deal of time. It is really no different with people: you start with beginner heroes and move on to advanced ones once you're comfortable (I still don't play Invoker). The harder the hero, the longer the learning takes.
Invoker's 10 skills
Because all parameters in training start out random, the AI can only discover through endless trial how to use these skills; it does not truly understand them. Some skills are straightforward: Crystal Maiden's ultimate will certainly deal damage. Others are more complicated: Alchemist's second skill, Unstable Concoction, is a double-edged sword. Thrown within 5 seconds it stuns and damages enemy heroes; held past 5.5 seconds it blows up on the Alchemist himself.
That gives the AI a headache: do I throw it or not? Over a long stretch of self-play, the AI simply concluded that Alchemist's second skill was useless. Humans are completely different: nobody abandons the skill just because it can backfire.
Alchemist
Roshan is a similar story. Killing Roshan grants the Aegis of the Immortal, and from the third Roshan onward there is also Cheese, worth some 3,000 HP of healing, but the fight carries a painful price, and a careless team can die in the pit. So for a long time the AI simply chose not to do Roshan.
To solve this, the researchers randomized Roshan's health during training. Sometimes he had only 100 HP, and then the AI would happily go kill him. With this training, the AI now checks Roshan's health every time it walks past the pit.
The way OpenAI's heroes kept checking on Roshan in today's game is precisely the result of that training.
Roshan
Warding was another rather interesting "challenge". For a long time, the AI warded constantly, even planting wards inside its own base. The researchers couldn't understand it: why keep warding the base?! It later turned out that when enemy units were pushing its high ground (the tier-three towers), the AI was dumping wards to free up inventory slots for other items.
To this day, the AI still places wards in some inexplicable spots: under towers, inside the base, sometimes several at once.
Illusions remain off-limits, because OpenAI has not figured out how to get heroes to control illusions. Raiman said they once tried having a hero buy a Manta Style, but the hero would only use the item when defending high ground or towers, since the illusions could soak up some damage (a hero like Chen, who controls summoned creeps, would be even harder to field).
Manta Style
So between June and August, OpenAI set about solving these problems one by one. At the same time it announced the next step: on August 5, a team of (former) professionals ranked above 99.95% of players worldwide would be invited to a benchmark match against the AI bots.
"Even if we end up doing poorly at TI, getting through the benchmark would already make it worthwhile," Zhang said.
The benchmark: humans get crushed
OpenAI's office sits in San Francisco's Mission District, and about a mile away is a bar called Folsom Street Foundry that is popular with locals. It has a huge hall holding 300-400 people for concerts, parties and other events.
OpenAI Five's first public 5v5 matches against top players were held at Folsom Street Foundry. At noon on Sunday, August 5, the bar was packed. The high stools and bar tables in the hall had been cleared away and replaced with rows of seats; five desktop PCs stood at the center of the stage, with a professional casters' desk beside them. Dozens of OpenAI staff, including co-founder Ilya Sutskever and CTO Brockman, turned out to witness the historic moment.
OpenAI Five played four games that day: an exhibition match with audience members and three benchmark matches against the top players. If OpenAI won, it would mean the project had reached a phase milestone.
These matches also lifted many of the environmental restrictions: the fog of war was on, warding was allowed, Roshan was in play, both sides drafted their heroes, and the hero pool grew from 5 to 18.
Before the match, Moonmeander, an active professional then ranked 104th in the world, threw down the gauntlet on Twitter: he had never lost to a bot and wasn't going to start now. Playing alongside Moonmeander were OpenAI's old friends Blitz, Capitalist, Fogged and Merlini; they even brought their own "Humans" team jerseys and took their seats at the center of the stage.
Merlini (left), Blitz (middle), and Moonmeander (right)
How good are these five? Their team name was "99.95th-percentile", meaning they are stronger than 99.95% of the world's players, roughly the top 15,000, corresponding to about Divine 5 under the current medal system, or above 6,000 MMR on the old ladder.
Even so, the crowd on site was not optimistic about the human players before the games. Of at least 10 people interviewed at the venue, more than three quarters believed the AI would win. "Emotionally I'm rooting for the humans, but I don't think they have a chance," said one spectator.
And so it proved.
Normally even a one-sided match lasts a good 30 minutes. The AI, however, won three games in 13 minutes (the warm-up with audience players), 21 minutes, and 25 minutes respectively.
In the first game the humans played Dire: Earthshaker, Necrophos, Crystal Maiden, Razor and Shadow Fiend; OpenAI Five played Radiant: Lich, Gyrocopter, Sniper, Death Prophet and Lion.
In that game the human players never seemed to settle against OpenAI; only around the five-minute mark did Blitz's Shadow Fiend draw first blood. OpenAI's play was quite aggressive, quickly shifting from a 2-1-2 opening to a 3-1-1 with the safe lane pushed out, and by the 10th minute it had shifted again to a 4-1-1, grouping up to push the Dire's offlane tower. That stretch is normally still the laning phase, and the human side never organized a real defense. By the 13th minute the AI led 22:4 in kills.
Over the next 10 minutes the humans produced few highlights; apart from a double kill by the Shadow Fiend, OpenAI ran the show. At 21 minutes it had broken two lanes, wiped the humans 4-0 in a small fight on the high ground, and the humans called GG (good game, i.e. surrender). The final kill count was 8:39.
The humans call GG
In the second game the humans played Radiant: Earthshaker, Shadow Fiend, Witch Doctor, Death Prophet and Riki; OpenAI played Dire, fielding Sniper, Gyrocopter, Crystal Maiden, Lion and Lich.
The human defeat was arguably sealed at the draft: when Blitz picked Shadow Fiend again, OpenAI's predicted win probability jumped from 56% to 72%. The humans were clearly in better form in this game; beyond taking first blood they kept the score close. But after a few waves of human deaths, by 20 minutes OpenAI was leaning on its advantage to take towers, broke all three lanes, spawned mega creeps, and the humans called GG at 12:41.
There were plenty of interesting moments in this game: OpenAI's Crystal Maiden carried a Hand of Midas, an item normally reserved for jungling heroes or greedy late-game carries; the AI had learned to pause the game, though nobody knew why it paused; it was very fond of warding and dewarding, and had learned to use smoke; and after breaking two lanes, where humans would usually go straight for the last two towers outside the Ancient, the AI chose to retreat and start dismantling the third lane from its tier-one tower instead...
Shadow Fiend says this one is not on him.
With the humans unable to fight back against the AI, OpenAI had already got what it came for, so the third game became an entertainment match, with the live audience and Twitch viewers drafting OpenAI Five's heroes. The crowd picked four melee heroes (Slark, Riki, Axe, Sven) plus a not-very-useful Queen of Pain, while the humans took Death Prophet, Necrophos, Lion, Lich and Gyrocopter. OpenAI Five opened with a predicted win probability of just 2.9%, which eventually dropped below 1%. Even so, the AI fought tenaciously: 15 minutes in, the kill score was level at 15:15.
Although the game ended with the human players "OpenAI-ing" OpenAI, closing it out in 35 minutes at 48:22, it gave the researchers plenty of data worth studying. Under human pressure the AI seemed at a loss and could not play from behind: the Slark roamed around on its own, the Sven and the Axe kept mindlessly hitting towers, and when the humans pushed high ground, the AI's five heroes were nowhere to be found on defense.
No AI heroes defending the high ground
"Robots rely on self-confidence to play this game, knowing where everyone is and how much [damage] you have," said capitalist, the player in the contest and the famous commentator, in an interview with Motherboard. It knows exactly how much damage they can inflict between three or four heroes in a driveway, and when you're in the wrong place it pops up immediately. It knows. And I've never played with something like that, which looks amazing. 」
At the end of the three games, CTO Brockman said: "openai's AI system is ready for the next month's TI8 against top professional players! 」
Behind the victory, a hidden danger
OpenAI did not expect that the third, just-for-fun game would foreshadow today's defeat.
In truth, the researchers were under a great deal of pressure after the benchmark. The benchmark's human players sat at around 6,500 MMR, while the professionals at TI play above 9,000; raising the AI's strength that much in just three weeks was a tall order.
Raiman also admitted that the third game had gone badly, and fixing the problems it exposed became OpenAI's top priority.
Zhang felt there was too little time left. "We're trying to put together something impressive for TI, and of course that brings some pressure; mostly it's a matter of time. You need to give the experiments time to run, the training time to converge, and then build something cool at the end. We just don't have that much time right now!"
"There's another problem: the longer a game drags on, the worse it is for the AI, because there are too many factors and variables to account for."
Both remarks were borne out at the TI match two weeks later: the AI had no answer for a long game played from behind.
Still, whatever happens next, OpenAI has achieved complex coordination and long-horizon play in an imperfect-information environment, and that is a huge breakthrough. OpenAI has not invented a groundbreaking new algorithm; rather, it has combined existing cutting-edge algorithms with models and massive compute so that an agent starting from nothing can, through self-play, learn a coherent set of behaviors. That recipe is likely to find use in other AI applications, in robotics, and in games.
TI8 is not OpenAI Five's last stop. They will hold one final match at a date yet to be decided, expected in October or November and possibly as late as early next year. By then, OpenAI hopes to open up the entire hero pool and let the AI play a real, unrestricted game of Dota 2 against human players.
Looking at it now, OpenAI's Dota journey is far from over.
Let's see how the AI fares in tomorrow's second game.