Why DeepMind and OpenAI teach computers to play games with deep reinforcement learning

Source: Internet
Author: User

Have you heard of DeepMind?
You probably have. After all, the company has made headlines twice in recent years:
1. It was acquired by Google.
2. It spent a great deal of resources teaching a computer to play Weiqi (Go), and beat the top known Go players of the day.

You probably also know that in 2013 DeepMind published a paper called "Playing Atari with Deep Reinforcement Learning". The paper describes how DeepMind taught computers to play Atari games.

But what you may not know is why DeepMind wants to teach computers to play games.

Well, you might think that this is simply a very academic company: a team of scientists with strong academic tastes who take in a lot of investment, recruit more academically minded scientists, and publish ever more academic papers.

Coincidentally, there is another team of top machine-learning scientists, called OpenAI, backed by about 1 billion dollars, that also teaches computers to play games. They even released an open source platform called gym and another called universe, so that anyone can use them to teach computers to play games. gym is used for small games such as Atari titles, Flappy Bird, and Snake; universe is used for large 3D games such as GTA5 and racing games.

So what are they trying to do? Train computers into e-sports champions and then live-stream the matches? Or reap an unparalleled sense of achievement by defeating humans at every game?

To better answer the question "what are these companies trying to do?", we used gym to teach a computer to play two games, Flappy Bird and Snake. The machine learning method is DeepMind's deep reinforcement learning algorithm, and the implementation framework is TensorFlow.

Here is how our computer performs when playing both games (the demo is at the end of the article):

Although our computer has not become an invincible player, it clearly plays Flappy Bird and Snake very well after a period of self-training.

Learning Flappy Bird took the computer about a day, and Snake took 9 hours, on an Nvidia GTX 1070 GPU.

After seeing that the seemingly incredible deep reinforcement learning really works, we began to think about how a computer's game-playing skills could make the world a better place.

First, deep reinforcement learning has two very important features:
1. Any problem that can be abstracted into an environment, states, actions, and rewards can in principle be tackled with this algorithm.
2. Raw images can be used as the state, without manually designed rules.

Take chess as an example. The chess board is the environment, the arrangement of pieces on the board is the state, the moves available under the current arrangement are the actions, and the effect of each move on the outcome of the game is the reward.
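The environment, state, action, and reward abstraction can be sketched in a few lines. The following is a minimal illustration, not the setup used for the games above: `NumberLineEnv` and `run_episode` are hypothetical names, and the `reset`/`step` pair only loosely mirrors the convention that gym-style environments follow.

```python
import random


class NumberLineEnv:
    """Toy environment (an assumption for illustration): walk from 0 to `goal`."""

    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def reset(self):
        # The state is simply the current position on the line.
        self.position = 0
        return self.position

    def step(self, action):
        # An action is -1 (step left) or +1 (step right); positions are clamped.
        self.position = max(0, min(self.goal, self.position + action))
        done = self.position == self.goal
        # Reward: a bonus for reaching the goal, a small cost for every other step.
        reward = 1.0 if done else -0.1
        return self.position, reward, done


def run_episode(env, policy, max_steps=1000):
    """Let `policy` (a function state -> action) act until the episode ends."""
    state = env.reset()
    total_reward = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward


if __name__ == "__main__":
    rng = random.Random(0)
    print("always right:", run_episode(NumberLineEnv(), lambda s: 1))
    print("random walk: ", run_episode(NumberLineEnv(), lambda s: rng.choice([-1, 1])))
```

Chess fits the same shape: `reset` would return the starting position, `step` would apply a move, and the reward would reflect the result of the game.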

In other words, if we want the computer to win a game of chess, it should be able to choose the best move for any arrangement of pieces on the board.
The problem is that it is hard to assess which move is best, and doing so takes a lot of hand-written human logic.
Of course, we could use exhaustive search and try every possible action in every state.
That works for simple problems, but it is almost impossible for even a slightly more complex game like Weiqi.
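To get a feel for why exhaustive search breaks down, here is a small sketch that enumerates every possible game of tic-tac-toe (a stand-in chosen for illustration; it is not one of the games discussed here) and compares it with a crude upper bound on Weiqi board configurations. The function names are hypothetical, and the bound of 3 to the power of 361 ignores the rules of Weiqi entirely: it only counts each of the 19 x 19 points as empty, black, or white.

```python
WIN_LINES = [
    (0, 1, 2), (3, 4, 5), (6, 7, 8),  # rows
    (0, 3, 6), (1, 4, 7), (2, 5, 8),  # columns
    (0, 4, 8), (2, 4, 6),             # diagonals
]


def winner(board):
    """Return 'X' or 'O' if a line is complete, otherwise None."""
    for a, b, c in WIN_LINES:
        if board[a] is not None and board[a] == board[b] == board[c]:
            return board[a]
    return None


def count_games(board=None, player="X"):
    """Count every complete tic-tac-toe game by trying all moves in all states."""
    if board is None:
        board = [None] * 9
    if winner(board) is not None or all(cell is not None for cell in board):
        return 1  # the game is over: one finished game
    other = "O" if player == "X" else "X"
    total = 0
    for i in range(9):
        if board[i] is None:
            board[i] = player            # try the move
            total += count_games(board, other)
            board[i] = None              # undo it
    return total


def board_configuration_upper_bound(size=19):
    """Each point is empty, black, or white; legality is ignored."""
    return 3 ** (size * size)


if __name__ == "__main__":
    print("tic-tac-toe games:", count_games())
    digits = len(str(board_configuration_upper_bound()))
    print("19x19 upper bound has", digits, "digits")
```

Tic-tac-toe yields 255,168 complete games, which a laptop enumerates almost instantly. The 19 x 19 bound is a 173-digit number, so trying every action in every state is out of the question.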

So people asked: why not use a deep neural network to evaluate the actions available in each state, and let the neural network make the decisions?
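One common way to let a network "evaluate the actions in each state" is Q-learning: the network outputs one estimated value per action, the agent mostly picks the highest one, and each estimate is nudged toward the reward plus the discounted best value of the next state. The sketch below shows only those two pieces in plain Python. `epsilon_greedy` and `td_target` are illustrative names, the lists of Q-values stand in for the output of a neural network, and none of this is the exact training code behind the results described above.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick a random action with probability epsilon, else the best-valued one.

    `q_values` is a list with one estimated value per action, as a network
    would output for the current state.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda action: q_values[action])


def td_target(reward, next_q_values, done, gamma=0.99):
    """Q-learning target: reward plus discounted best value of the next state.

    When the episode has ended there is no next state, so the target is
    just the reward.
    """
    if done:
        return reward
    return reward + gamma * max(next_q_values)


if __name__ == "__main__":
    q_now = [0.1, 0.9, 0.3]           # pretend network output for this state
    action = epsilon_greedy(q_now, epsilon=0.0)
    target = td_target(reward=1.0, next_q_values=[0.0, 2.0], done=False, gamma=0.5)
    print("chosen action:", action, "training target:", target)
```

During training, the network's prediction for the chosen action would be pushed toward `target`, for example by minimizing the squared difference between the two.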

Although the idea of deep reinforcement learning is fascinating in itself, we are more interested in the fact that it can take image data as the state input.
Otherwise, whether the game is Weiqi, chess, Gobang, or flying chess, we need to design an input format for each game and tell the computer where every piece is. In other words, a human has to be involved.

But we do not want human involvement. We want to put the computer in front of a game, tell it to learn the game, and have it start studying on its own.
It is just like a child: you do not need to explain the rules, just let them watch from the side, and they slowly learn to play (how many people learned to play Warcraft, StarCraft, or DotA this way?).

The deep reinforcement learning method DeepMind published does exactly this: it continuously takes "screenshots" of the game screen and feeds them to the program as the input signal, so the program can learn to play many different games without any human involvement.
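Turning screenshots into a state usually involves some preprocessing before the network sees them. A typical recipe is to convert each frame to grayscale, shrink it, and stack the last few frames so that motion is visible. The sketch below shows that idea with plain Python lists; the function names, the downsampling factor of 2, and the stack size of 4 are assumptions made for illustration, and a real pipeline would use an array library rather than nested lists.

```python
from collections import deque


def to_grayscale(frame):
    """frame: list of rows, each row a list of (r, g, b) tuples in 0..255."""
    return [[(r + g + b) // 3 for (r, g, b) in row] for row in frame]


def downsample(gray, factor):
    """Keep every `factor`-th pixel in both directions."""
    return [row[::factor] for row in gray[::factor]]


class FrameStack:
    """Keep the most recent `size` preprocessed frames as the state."""

    def __init__(self, size=4, factor=2):
        self.factor = factor
        self.frames = deque(maxlen=size)

    def push(self, frame):
        processed = downsample(to_grayscale(frame), self.factor)
        if not self.frames:
            # At the start of an episode, fill the stack with the first frame.
            for _ in range(self.frames.maxlen):
                self.frames.append(processed)
        else:
            self.frames.append(processed)  # the oldest frame drops out
        return list(self.frames)


if __name__ == "__main__":
    screenshot = [[(30, 60, 90)] * 4 for _ in range(4)]  # fake 4x4 RGB frame
    state = FrameStack().push(screenshot)
    print(len(state), "frames of size", len(state[0]), "x", len(state[0][0]))
```

Stacking several frames matters because a single screenshot does not show which way the bird or the snake is moving.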

Now we can talk about what a computer can do once it is able to train itself from visual signals alone.

A real case comes from energy savings at Google's data centers.
Google has huge server clusters. Keeping them running efficiently and stably requires complex power distribution, thermal control, and so on, so the electricity bill is naturally high.
But what if we treat these complex resource allocation problems as a strategy game (call it Power Distribution Tycoon) and then let the computer play it?

The answer is that Google used DeepMind's technology to cut its energy consumption by a reported 15%. How much is 15% worth? Google's data centers use 4,402,836 MWh of electricity a year, and 15% of that is roughly 660,425 MWh. At a price of roughly $30 per MWh, that saves about 19,812,750 dollars a year, and helps protect the environment along the way. And Google spent only 600 million dollars to buy DeepMind.
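The arithmetic behind that estimate can be checked in a few lines. The consumption figure and the 15% come from the paragraph above; the price of 30 dollars per MWh is an assumption inferred from the article's own totals (19,812,750 divided by 660,425), not a quoted electricity tariff.

```python
ANNUAL_CONSUMPTION_MWH = 4_402_836   # yearly data center electricity use (from the text)
SAVINGS_PERCENT = 15                 # share of energy saved (from the text)
PRICE_PER_MWH_USD = 30               # assumption: implied by the article's totals


def estimated_savings(annual_mwh=ANNUAL_CONSUMPTION_MWH,
                      percent=SAVINGS_PERCENT,
                      price_usd=PRICE_PER_MWH_USD):
    """Return (MWh saved per year, dollars saved per year), in whole units."""
    saved_mwh = annual_mwh * percent // 100   # rounded down to whole MWh
    return saved_mwh, saved_mwh * price_usd


if __name__ == "__main__":
    mwh, usd = estimated_savings()
    print(f"{mwh:,} MWh saved, about {usd:,} dollars per year")
```

This is a rough, back-of-the-envelope estimate: it applies the 15% to total consumption and uses a single flat price.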

And the whole process does not require understanding the data center's complex electrical mechanisms. Engineers only need to focus on the resource allocation controls and on collecting the energy consumption results; they do not even need to care exactly how the program trains.
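Framed as a "game", such a control problem needs only three things from the engineer: what the controller can observe, what it can adjust, and a score to maximize. The toy sketch below makes that concrete. Everything in it is hypothetical: `ToyCoolingEnv`, its one-room temperature dynamics, and the reward formula are invented for illustration and say nothing about how Google's system actually works.

```python
class ToyCoolingEnv:
    """Hypothetical one-room data center used only to illustrate the framing."""

    def __init__(self, target_temp=25.0, start_temp=30.0, heat_per_step=1.0):
        self.target_temp = target_temp
        self.start_temp = start_temp
        self.heat_per_step = heat_per_step
        self.temp = start_temp

    def reset(self):
        self.temp = self.start_temp
        return self.temp                      # state: the measured temperature

    def step(self, cooling_level):
        # Action: cooling level 0, 1, or 2. Each level removes one degree
        # per step and costs one unit of energy.
        self.temp += self.heat_per_step - cooling_level
        energy_cost = float(cooling_level)
        overheat_penalty = max(0.0, self.temp - self.target_temp)
        # Reward: less energy used and less overheating is better.
        reward = -(energy_cost + overheat_penalty)
        return self.temp, reward


def threshold_policy(temp, target_temp=25.0):
    """Hand-written baseline: cool hard when too hot, otherwise just hold."""
    return 2 if temp > target_temp else 1


def run(env, policy, steps):
    """Apply `policy` for a fixed number of steps and return the total reward."""
    temp = env.reset()
    total_reward = 0.0
    for _ in range(steps):
        temp, reward = env.step(policy(temp))
        total_reward += reward
    return total_reward


if __name__ == "__main__":
    print("baseline score over 6 steps:", run(ToyCoolingEnv(), threshold_policy, 6))
```

A learning agent would replace `threshold_policy` and try to beat its score, using only the observed temperature and the reward, without being told how the room heats up or cools down.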

Another case also comes from Google. Google has a robotic arm farm, where a number of robotic arms are trained to grasp objects.

Here is the demo video:

In short, teaching computers to play video games is about the future: using robots to solve real-world production problems.
