This game-playing algorithm is the biggest reason Google acquired DeepMind.
Big Data Digest Subtitle Group
Hello! Popular YouTuber Siraj is back!
This time he explains Deep Q Learning for us, the algorithm that led Google to acquire DeepMind.
Click to watch the video
Duration: 9 minutes
With Chinese subtitles
What does this algorithm do?
The answer is: it is used to play games!
In 2014, Google spent more than $500 million to acquire a small London-based company: DeepMind. Before the acquisition, DeepMind had published a paper on using deep reinforcement learning to play video games, "Playing Atari with Deep Reinforcement Learning," at the NIPS conference in December 2013; its follow-up, "Human-level control through deep reinforcement learning," made the cover of Nature in February 2015. The same deep learning + reinforcement learning approach was later applied to Go, and that is how we got AlphaGo.
Looking back, Deep Q Learning, where DeepMind started, looks like very simple software: an automated program designed specifically for Atari video games. Yet it is regarded as a first attempt at "general intelligence": the paper shows that the same algorithm can be applied to 50 different Atari games, reaching performance at or above human level. This is the Deep Q Learner.
Here is an example using Super Mario. We take video clips from the game as input data, labeled with Mario's movements. The training data is continuous: new video frames keep being produced as the game world unfolds, and we want to learn how to act in that world.
The best approach seems to be trial and error: by repeatedly trying and failing, we come to understand the best way to interact with the game world.
Reinforcement Learning is designed to solve exactly this kind of problem. Every time Mario does something that helps win the game, a positive label appears, but with a delay. Rather than labels, the more accurate name for these signals is "rewards."
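One standard way to handle that delay (an assumption here, since the article does not spell it out) is to credit an action with the *discounted return*: the sum of future rewards, with later rewards weighted less by a factor gamma. A minimal sketch:

```python
# Sketch of the discounted return G = r_0 + gamma*r_1 + gamma^2*r_2 + ...
# The reward sequence and gamma value are invented for illustration.
def discounted_return(rewards, gamma=0.9):
    """Sum future rewards, weighting later ones less."""
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g

# Mario earns a reward of 1 two steps after the jump that caused it:
print(discounted_return([0, 0, 1], gamma=0.9))  # 0.9**2, roughly 0.81
```

The closer the reward is in time to the action, the more credit the action receives.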
The entire game is represented as a sequence of states, actions, and rewards, where the probability of each state depends only on the previous state and the action taken. This is called the Markov property, named after the Russian mathematician Andrey Markov, and the resulting decision-making process is called a Markov Decision Process.
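The Markov property can be sketched as a transition table: the next state is sampled from a distribution that depends only on the current state and action. The states and probabilities below are invented for illustration, not taken from the paper.

```python
import random

# Toy Markov Decision Process: P(next_state | state, action).
# The Markov property means nothing earlier than (state, action) matters.
transitions = {
    ("standing", "jump"): [("in_air", 1.0)],
    ("in_air", "move_right"): [("on_block", 0.8), ("standing", 0.2)],
}

def step(state, action):
    """Sample the next state from P(s' | s, a)."""
    outcomes = transitions[(state, action)]
    states, probs = zip(*outcomes)
    return random.choices(states, weights=probs)[0]

print(step("standing", "jump"))  # always "in_air"
```

A full MDP would also attach a reward to each transition; this sketch shows only the state dynamics.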
If we express the rewards after a given point as a function, its value is the best score achievable by the end of the game. Given a state and an action performed in it, this function measures the quality of that action in that state. This is the Q function, where Q stands for "quality."
When Mario decides which action to take, he chooses the one with the highest Q value. Computing these Q values is the learning process.
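The two steps above, picking the highest-Q action and learning the Q values, can be sketched with tabular Q-learning. Note this is an assumption-laden toy: the states, actions, and parameters are invented, and DeepMind's agent approximates Q with a deep neural network rather than a table.

```python
import random
from collections import defaultdict

alpha, gamma, epsilon = 0.5, 0.9, 0.1   # learning rate, discount, exploration
Q = defaultdict(float)                  # Q[(state, action)] -> estimated quality

def choose_action(state, actions):
    """Epsilon-greedy: usually take the highest-Q action, sometimes explore."""
    if random.random() < epsilon:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions):
    """One Q-learning step: nudge Q toward reward + discounted best future Q."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])

# One imagined transition: jumping onto the flag ends the level with reward 10.
update("near_flag", "jump", 10.0, "level_done", ["jump", "right"])
print(Q[("near_flag", "jump")])  # 5.0 after one update (alpha * reward)
```

Repeating this update over many played frames makes the Q values converge, after which `choose_action` mostly plays greedily.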
So how do we go beyond Super Mario and generalize the algorithm to other games? Watch the video above to find out!
Original video (Chinese translation authorized to Big Data Digest):
https://www.youtube.com/watch?v=79pmNdyxEGo
Staff for this issue
Translation: Zhou Yang, IrisW, Gao Shu
Proofreading: Xiaoli
Timeline + post-production: Long Muxue
Supervision: Long Muxue
Recommended course | Machine Learning Engineer
Student review (by Cabbage)
The hands-on course content is very close to real-world work. I learned and practiced the complete machine learning project workflow many times, including data cleaning, data sampling, feature engineering, model selection, optimization, ensembling, and model evaluation. The course projects cover numerical prediction, natural language processing, financial risk control, and recommendation systems, and come with an online lab platform. It is a course that genuinely improves practical machine learning skills.
Volunteer Profile
Reply "Volunteers" to join us
Previous highlights
Click to read
Get on board! After MIT's self-driving course, you can build your own autonomous car (first installment of the Chinese-subtitled video series)
Copyright Disclaimer: This article is an original article by the blogger and cannot be reproduced without the permission of the blogger.