See how DeepMind plays games with reinforcement learning

Source: Internet
Author: User

Introduction

Among the coolest branches of machine learning are deep learning and reinforcement learning (hereinafter DL and RL). Both are not only impressive in practical applications but also well grounded in machine learning theory. The DeepMind team combined the best of the two and, using the Stella emulator, let a machine teach itself to play seven Atari 2600 games. The results were striking enough to cross the species barrier: the machine not only beat earlier machine players but even surpassed human expert players in three of the games. Oh, and in case you were wondering, the Atari 2600 is an American game console that was popular in the 1980s, so of course you are unlikely to be fond of it now. What are its games like? Think of Flappy Bird, the hit of the moment.

Enough chatter; on to the preparations. First you need an Atari 2600, which the developers presumably dug out of their parents' junk pile. Wait, what's with all the rust? And the battery is beyond words! Stay calm... There is an emulator built on Stella, and even a dedicated contribution to academia, the Arcade Learning Environment, so nobody has to worry about my research any more. The input is the current screen of the emulator; the output is a choice of joystick and button operations ("a-b-b-a--up-down"), or in academic terms, an action from the set of legal actions in the current state. The goal, of course, is to win the game, and the higher the score the better.

Then comes the game playing. As a cool-headed scientist you would never play the games yourself (and, of course, you are also afraid the boss might find out). But if you want a machine to play, you first have to work out how humans play:

First, the game starts and sits at its initial moment. Then the game scene begins to change; the player's eyes capture the changes on screen and transmit the visual signal to the cerebral cortex for processing.

Next, the cortex turns the visual signal into semantic information about the game and, guided by experience, maps that information to the action that should be performed. It then sends the resulting signal to the body, for example as finger movements. Once the action is carried out, the game scene advances to the next frame and the player receives some reward, such as clearing a level or collecting gold coins. The loop repeats until the game is over.

Think about this process: what happens inside the game is not something the player needs to think about, so the player covers only one half of the game loop, namely visual signal in, finger action out. Turning the finger action into the next frame, and handing the player a reward, is the game's internal business.
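The split between the player's half and the game's half of the loop can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `ToyGame`, `play_episode` and `chase_coin` are hypothetical names invented here, and the one-row "screen" stands in for a real emulator frame. It is not the Stella or Arcade Learning Environment interface.

```python
class ToyGame:
    """Hypothetical stand-in for an emulator: a one-row screen with a player and a coin."""
    LEGAL_ACTIONS = ("left", "right")

    def __init__(self, width=5, max_frames=20):
        self.width = width
        self.max_frames = max_frames

    def _screen(self):
        # The screen is all the player ever sees: 'P' = player, 'C' = coin.
        row = ["."] * self.width
        row[self.coin] = "C"
        row[self.player] = "P"
        return row

    def reset(self):
        # Initial moment of the game: player on the left, coin on the right.
        self.player, self.coin, self.frame = 0, self.width - 1, 0
        return self._screen()

    def step(self, action):
        # Game-side half of the loop: advance one frame and compute the reward.
        move = -1 if action == "left" else 1
        self.player = min(max(self.player + move, 0), self.width - 1)
        self.frame += 1
        reward = 0
        if self.player == self.coin:
            reward = 1
            # Respawn the coin at the opposite end of the row.
            self.coin = 0 if self.coin != 0 else self.width - 1
        done = self.frame >= self.max_frames
        return self._screen(), reward, done


def play_episode(game, choose_action):
    """Run the loop until the game is over and return the total score."""
    screen = game.reset()
    total, done = 0, False
    while not done:
        action = choose_action(screen, game.LEGAL_ACTIONS)  # player-side half
        screen, reward, done = game.step(action)            # game-side half
        total += reward
    return total


def chase_coin(screen, legal_actions):
    # A hand-written "player": look at the screen, walk toward the coin.
    return "right" if screen.index("C") > screen.index("P") else "left"


score = play_episode(ToyGame(), chase_coin)
```

Note that `chase_coin` never touches the game's internal state; it only receives the screen and the legal actions, which mirrors the input and output described earlier.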

Now that we understand the human player's process and have separated out the part of the game that actually requires a player, the next step is to replace the human player with a machine. To distinguish the two, the machine player is usually called the agent. Like a human player, the agent is responsible for:

Learning knowledge about how to play the game, that is, experience (which scene calls for which action), from the reward signal of the previous frame

Processing and understanding the visual signal (dimensionality reduction, high-level feature extraction)

Choosing the right action based on experience and the high-level visual features

Feeding the action back to the game, the counterpart of the human player's hand movements
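The four responsibilities above can be laid out as a small Python skeleton. Everything here is an assumption made for illustration: the class name `SketchAgent`, the one-row screen format, the value table, the epsilon-greedy choice and the one-step Q-learning update are stand-ins chosen to keep the example short. They are not a description of DeepMind's actual agent, which the following sections of the article cover.

```python
import random
from collections import defaultdict


class SketchAgent:
    """Hypothetical agent skeleton; one method per responsibility."""

    def __init__(self, epsilon=0.1, alpha=0.5, gamma=0.9, seed=0):
        # "Experience": (features, action) -> estimated long-term value.
        self.q = defaultdict(float)
        self.epsilon = epsilon  # probability of trying a random action
        self.alpha = alpha      # learning rate
        self.gamma = gamma      # discount applied to future reward
        self.rng = random.Random(seed)

    def extract_features(self, screen):
        # Visual processing: reduce the raw screen to a compact description.
        # Assumed here: keep only the positions of non-background cells.
        return tuple((i, cell) for i, cell in enumerate(screen) if cell != ".")

    def choose_action(self, features, legal_actions):
        # Action selection: mostly follow experience, occasionally explore.
        if self.rng.random() < self.epsilon:
            return self.rng.choice(list(legal_actions))
        return max(legal_actions, key=lambda a: self.q[(features, a)])

    def learn(self, features, action, reward, next_features, legal_actions):
        # Learning from the reward signal: one-step Q-learning update (assumed).
        best_next = max(self.q[(next_features, a)] for a in legal_actions)
        target = reward + self.gamma * best_next
        self.q[(features, action)] += self.alpha * (target - self.q[(features, action)])


# Usage on one hand-written transition (no real game involved).
actions = ("left", "right")
agent = SketchAgent(epsilon=0.0)
before = agent.extract_features(["P", ".", "C"])
after = agent.extract_features([".", "P", "C"])
agent.learn(before, "right", 1.0, after, actions)
chosen = agent.choose_action(before, actions)
```

The fourth responsibility, feeding the action back to the game, is left to the caller: whatever loop drives the emulator would pass `chosen` to it and hand the resulting screen and reward back to the agent.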

So the more games are played, the better the play becomes; that holds for human players and for the agent alike. With the process laid out, and given how far DL and RL have developed, implementing it is no longer a problem. Below, we first look at how RL drives the agent's learning. Then we discuss how DL fits sensibly into the RL learning framework and what role it plays. After that, we highlight the difficulties the two face when driving a game agent and how the practical problems are solved. Finally, we see how well the agent plays. The summary closes with some broader reflections on RL.
