This game-playing algorithm is the main reason Google acquired DeepMind



Big Data Digest subtitle group


Hello! YouTube star Siraj is back!

This time he explains Deep Q-Learning, the algorithm for which Google acquired DeepMind.


Video (9 minutes, with Chinese subtitles)


What does this algorithm do?

The answer is: it is used to play games!



In 2014, Google spent more than $500 million to acquire DeepMind, a small London-based company. Before that, in December 2013, DeepMind had presented a paper at the NIPS conference on using deep reinforcement learning to play video games, "Playing Atari with Deep Reinforcement Learning". A follow-up, "Human-level control through deep reinforcement learning", made the cover of Nature in February 2015. Later, the same combination of deep learning and reinforcement learning was applied to Go, which gave us AlphaGo.



Looking back, Deep Q-Learning, where DeepMind got its start, looks like a very simple piece of software: an automated player built specifically for Atari video games. Yet it is regarded as a first step toward "general intelligence": the paper shows that the same algorithm can be applied to about 50 different Atari games and, on many of them, performs at or above human level. This is the deep Q-learner.



Take Super Mario as an example. We have the game's video frames as input data, and we could label that data with Mario's moves. But this training data is continuous: the game world keeps producing new video frames, and what we want to know is how to act in this world.



The best way, it seems, is to try: keep trying and making mistakes until we understand the best way to interact with the game world.



Reinforcement learning is designed for exactly this kind of problem. Every time Mario does something that helps win the game, a positive label appears, but only after a delay. Rather than a label, the more accurate name for it is a "reward".


The entire game is represented as a sequence of states, actions, and rewards, where the probability of each state depends only on the previous state and the action taken in it. This is called the Markov property, named after the Russian mathematician Andrey Markov, and the resulting decision-making process is called a Markov decision process.
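To make the Markov property concrete, here is a minimal Python sketch of a toy decision process. Everything in it is an assumption for illustration, not something from the video: the five-tile corridor, the `step` and `play_episode` helpers, and the 10% "slip" chance are all hypothetical. The point is only that `step` needs nothing but the current state and action to produce the next state and reward, and that a game is recorded as a sequence of (state, action, reward) entries.

```python
import random

# Hypothetical toy world: Mario stands on one of 5 tiles; tile 4 holds the flag.
GOAL = 4
ACTIONS = ("left", "right")

def step(state, action, rng):
    """Return (next_state, reward, done) from the current state and action only."""
    move = 1 if action == "right" else -1
    # Assumed 10% chance that Mario slips and stays where he is.
    if rng.random() < 0.1:
        move = 0
    next_state = min(max(state + move, 0), GOAL)
    done = next_state == GOAL
    reward = 1.0 if done else 0.0  # the reward only shows up at the flag (delayed)
    return next_state, reward, done

def play_episode(policy, rng, max_steps=50):
    """Play one game and record it as a list of (state, action, reward)."""
    state, trajectory = 0, []
    for _ in range(max_steps):
        action = policy(state)
        next_state, reward, done = step(state, action, rng)
        trajectory.append((state, action, reward))
        state = next_state
        if done:
            break
    return trajectory

rng = random.Random(0)
episode = play_episode(lambda s: "right", rng)  # a policy that always moves right
print(episode)
```

Note that every entry before the last one carries a reward of zero: the delayed reward described earlier is visible directly in the recorded sequence.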


Suppose we express the series of rewards that follow a given point in the game as a function whose value is the best score achievable by the end of the game. Evaluated for a given action taken in a given state, this function measures the quality of that action in that state. This is the Q function, where Q stands for quality.
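Written out, the idea looks roughly like this. The notation follows the DeepMind Atari paper rather than anything stated in this article; in particular the discount factor \gamma (a number between 0 and 1 that makes later rewards count for less) is an addition the text above does not mention.

```latex
% Discounted sum of rewards from time t until the game ends at time T
R_t = \sum_{t'=t}^{T} \gamma^{\,t'-t}\, r_{t'}

% Q function: the best expected return after taking action a in state s
Q^{*}(s, a) = \max_{\pi} \, \mathbb{E}\left[ R_t \mid s_t = s,\; a_t = a,\; \pi \right]

% Bellman equation: the same quantity expressed through the next state s'
Q^{*}(s, a) = \mathbb{E}_{s'}\left[ r + \gamma \max_{a'} Q^{*}(s', a') \;\middle|\; s, a \right]
```

The last line is what makes learning practical: the value of an action now can be estimated from the immediate reward plus the best value available in the next state.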



When Mario decides which action to take, he chooses the action with the highest Q value. Learning is the process of working out those Q values.
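As a rough illustration of "learning the Q values", here is a minimal tabular Q-learning sketch in Python. It is not DeepMind's implementation: their deep Q-learner replaces the table below with a convolutional neural network that reads raw game frames. The five-tile corridor, the `step`, `greedy`, and `train` helpers, and the values of `ALPHA`, `GAMMA`, and `EPSILON` are all assumptions chosen for illustration.

```python
import random

N_STATES, GOAL = 5, 4                    # hypothetical 5-tile corridor, flag on tile 4
ACTIONS = (0, 1)                         # 0 = left, 1 = right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1    # assumed learning rate, discount, exploration rate

def step(state, action):
    """Deterministic toy dynamics: reward 1 only when the flag is reached."""
    next_state = min(max(state + (1 if action == 1 else -1), 0), GOAL)
    done = next_state == GOAL
    return next_state, (1.0 if done else 0.0), done

def greedy(q, state, rng):
    """Pick the action with the highest Q value, breaking ties at random."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])

def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]    # Q table: q[state][action]
    for _ in range(episodes):
        state = 0
        for _ in range(100):                     # cap the length of one game
            # Trial and error: now and then try a random action instead.
            if rng.random() < EPSILON:
                action = rng.choice(ACTIONS)
            else:
                action = greedy(q, state, rng)
            next_state, reward, done = step(state, action)
            # Q-learning update: move Q(s, a) toward reward + discounted best next value.
            target = reward if done else reward + GAMMA * max(q[next_state])
            q[state][action] += ALPHA * (target - q[state][action])
            state = next_state
            if done:
                break
    return q

q = train()
policy = [max(ACTIONS, key=lambda a: q[s][a]) for s in range(GOAL)]
print(policy)   # learned action for tiles 0..3
```

The table starts at zero, so early games are essentially random walks. Once the flag is reached, the reward propagates backward one tile at a time through the update rule, and choosing the highest Q value then steers Mario toward the flag.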



So how do we go beyond Super Mario and generalize the algorithm to other games? Watch the video to learn more!


Original video (Chinese translation authorized for Big Data Digest):

https://www.youtube.com/watch?v=79pmNdyxEGo


Staff for this issue


Translation: Zhou Yang IrisW Gao Shu

Proofreading: Xiaoli

Timeline and post-production: Long muxue

Supervision: Long muxue



