This game-playing algorithm is the biggest reason Google acquired DeepMind.
Big Data Digest Subtitle Group
Hello! Popular YouTuber Siraj is back!
This time he explains Deep Q Learning for us, the algorithm that led Google to acquire DeepMind.
Click to watch the video
Duration: 9 minutes
With Chinese subtitles
What does this algorithm do?
The answer is: it is used to play games!
In 2014, Google spent more than $500 million to acquire a small London-based company: DeepMind. Before the acquisition, DeepMind had published a paper on using deep reinforcement learning to play video games, "Playing Atari with Deep Reinforcement Learning," at the NIPS conference in December 2013; its follow-up, "Human-level control through deep reinforcement learning," made the cover of Nature in February 2015. The same deep learning + reinforcement learning approach was later applied to Go, and that is how we got AlphaGo.
Looking back, Deep Q Learning, where DeepMind started, looks like very simple software: an automated program designed specifically for Atari video games. Yet it is regarded as a first attempt at "general intelligence": the paper shows that the same algorithm can be applied to 50 different Atari games, reaching performance at or above human level. This is the Deep Q Learner.
Here is an example using Super Mario. We take video clips from the game as input data, labeled with Mario's movements. The training data is continuous: new video frames keep being produced as the game world unfolds, and we want to learn how to act in that world.
The best approach seems to be trial and error: by repeatedly trying and failing, we come to understand the best way to interact with the game world.
Reinforcement Learning is designed to solve exactly this kind of problem. Every time Mario does something that helps win the game, a positive label appears, but with a delay. Rather than labels, the more accurate name for these signals is "rewards."
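One standard way to handle that delay (an assumption here, since the article does not spell it out) is to credit an action with the *discounted return*: the sum of future rewards, with later rewards weighted less by a factor gamma. A minimal sketch:

```python
# Sketch of the discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
# The reward sequence and gamma value are invented for illustration.
def discounted_return(rewards, gamma=0.9):
    """Sum future rewards, weighting later ones less."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Mario earns a reward of 1 two steps after the jump that caused it:
print(discounted_return([0, 0, 1], gamma=0.9))  # 0.9**2, roughly 0.81
```

The closer the reward is in time to the action, the more credit the action receives.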
The entire game is represented as a sequence of states, actions, and rewards, where the probability of each state depends only on the previous state and the action taken. This is called the Markov property, named after the Russian mathematician Andrey Markov, and the resulting decision-making process is called a Markov Decision Process.
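The Markov property can be sketched as a transition table: the next state is sampled from a distribution that depends only on the current state and action. The states and probabilities below are invented for illustration, not taken from the paper.

```python
import random

# Toy Markov Decision Process: P(next_state | state, action).
# The Markov property means nothing earlier than (state, action) matters.
transitions = {
    ("standing", "jump"): [("in_air", 1.0)],
    ("in_air", "move_right"): [("on_block", 0.8), ("standing", 0.2)],
}

def step(state, action):
    """Sample the next state from P(s' | s, a)."""
    outcomes = transitions[(state, action)]
    states, probs = zip(*outcomes)
    return random.choices(states, weights=probs)[0]

print(step("standing", "jump"))  # always "in_air"
```

A full MDP would also attach a reward to each transition; this sketch shows only the state dynamics.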
If we express the rewards after a given point as a function, its value is the best score achievable by the end of the game. Given a state and an action performed in it, this function measures the quality of that action in that state. This is the Q function, where Q stands for "quality."
When Mario decides which action to take, he chooses the one with the highest Q value. Computing these Q values is the learning process.
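The two steps above, picking the highest-Q action and learning the Q values, can be sketched with tabular Q-learning. Note this is an assumption-laden toy: the states, actions, and parameters are invented, and DeepMind's agent approximates Q with a deep neural network rather than a table.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = defaultdict(float)                  # Q[(state, action)] -> estimated quality

def choose_action(state, actions):
    """Epsilon-greedy: usually take the highest-Q action, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One Q-learning step: nudge Q toward reward + discounted best future Q."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One imagined transition: jumping onto the flag ends the level with reward 10.
update("near_flag", "jump", 10.0, "level_done", ["jump", "right"])
print(Q[("near_flag", "jump")])  # 5.0 after one update (alpha * reward)
```

Repeating this update over many played frames makes the Q values converge, after which `choose_action` mostly plays greedily.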
So how do we go beyond Super Mario and generalize the algorithm to other games? Watch the video above to find out!
Original video (Chinese translation authorized to Big Data Digest):
https://www.youtube.com/watch?v=79pmNdyxEGo
Staff for this issue
Translation: Zhou Yang, IrisW, Gao Shu
Proofreading: Xiaoli
Timeline + post-production: Long Muxue
Supervision: Long Muxue
Recommended course | Machine Learning Engineer
Student review (by Cabbage)
The hands-on course content is very close to real-world work. I learned and practiced the complete machine learning project workflow many times, including data cleaning, data sampling, feature engineering, model selection, optimization, ensembling, and model evaluation. The course projects cover numerical prediction, natural language processing, financial risk control, and recommendation systems, and come with an online lab platform. It is a course that genuinely improves practical machine learning skills.
Volunteer Profile
Reply "Volunteers" to join us
Previous highlights
Click to read
Get on board! After MIT's self-driving course, you can build your own autonomous car (first installment of the Chinese-subtitled video series)
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.