Introduction to Deep Learning Series, Part 1: Deep Reinforcement Learning

Source: Internet
Author: User

The preface introduced the basic concepts of machine learning and deep learning, the outline of this series, and the advantages of deep learning.

Striking while the iron is hot, this section starts with deep reinforcement learning.

Among the coolest branches of machine learning are deep learning and reinforcement learning (hereafter DL and RL). Both are impressive in practical applications and also hold up well in machine learning theory. What is deep reinforcement learning? Put simply, it is deep learning plus reinforcement learning. Neither is new, but deep reinforcement learning suddenly became very popular through the work of Google's DeepMind team, in particular their paper in Nature, "Human-level control through deep reinforcement learning", which explains how their stronger game-playing AI was built. In the team's first experiments the machine played 7 Atari 2600 games on the Stella emulator; it not only beat other programs but also surpassed human expert players in 3 of those games. The Nature paper extends the evaluation to 49 games. According to the paper, the AI uses a deep reinforcement learning algorithm whose results are better than those of other algorithms and, on many games, better than those of people.

Published in one of the world's top journals, the paper caught the interest of many people, and I am one of them. I have been thinking about how to apply this algorithm to my company's products but have not yet found a good approach, so for now I am getting familiar with the algorithm and taking notes along the way. (citing http://36kr.com/p/220012.html and http://www.infoq.com/cn/articles/atari-reinforcement-learning)

To understand the deep reinforcement learning algorithm in a little more detail, we first introduce reinforcement learning and deep learning separately, then explain how the two are combined, and finally give a brief summary.


First, reinforcement learning. Baidu Encyclopedia defines it this way: reinforcement learning (also known as reinforcement-stimulus learning or evaluative learning) is an important machine learning method with many applications in areas such as intelligent robot control and analysis and prediction. That sounds rather abstract, so here is an example from everyday life. As children we saw circus dogs that could add and subtract and pigeons that could walk a tightrope (next thing you know, pigs will fly, but I digress). How is that done? Take the pigeon: when it reaches the end of the wire, or some chosen point along the way, the trainer gives it a reward. The reward lets the pigeon "know" that the action it just took was right and worth repeating. Without realizing it, the pigeon learns: as long as I do this, I get a reward (food), so why not? That is the intuitive picture of reinforcement learning. Its essence is learning which decisions are good and which are harmful, so that when a similar situation arises you decide based on past experience.

Now a slightly more academic description: reinforcement learning is a sequential decision-making process. Traditional supervised learning is given labeled data; the labels act as the supervisor, and the goal is to learn a function that makes good decisions on unseen data. But sometimes you do not know what the labels are, that is, you do not know at the outset what "good" means. So RL is not given labels; instead, a reward function determines what result ("good" or "bad") the current state yields. Mathematically it is a Markov decision process, and the ultimate goal is to maximize the overall return accumulated over the course of the decisions. The setup is illustrated below:


[Figures: reinforcement learning setup; hidden Markov model (HMM)]
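The interaction between the learner and its environment can be sketched in a few lines of Python. This is only a minimal illustration: the `LineWorld` environment, its single reward at the right end of the line, and the two policies are hypothetical names invented for the example, not anything taken from the paper.

```python
import random


class LineWorld:
    """Hypothetical toy environment: positions 0..4 on a line, goal at 4."""

    def __init__(self, size=5):
        self.size = size
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action is -1 (left) or +1 (right); the walls clamp the position.
        self.state = max(0, min(self.size - 1, self.state + action))
        done = self.state == self.size - 1
        reward = 1.0 if done else 0.0  # a reward signal, not a label
        return self.state, reward, done


def run_episode(env, policy, max_steps=100):
    """Run one episode and return the total reward collected."""
    state = env.reset()
    total = 0.0
    for _ in range(max_steps):
        action = policy(state)                   # the agent decides
        state, reward, done = env.step(action)   # the environment responds
        total += reward
        if done:
            break
    return total


def always_right(state):
    return 1


def random_policy(state):
    return random.choice([-1, 1])


print(run_episode(LineWorld(), always_right))  # 1.0
random_total = run_episode(LineWorld(), random_policy)
```

Nothing tells the agent which action is correct at each step; it only sees the reward that follows, which is the difference from supervised learning described above.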

The Markov decision process was mentioned above, so what is it? Markov decision processes (MDPs), named after Andrey Markov, provide a mathematical framework for modeling decision making in situations where outcomes are partly random and partly under the control of the decision maker. MDPs are useful for studying a wide range of optimization problems solved through dynamic programming and reinforcement learning. Here is a brief look at Markov decision theory:
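In the standard textbook formulation (the symbols below follow common convention and are an assumption here, not taken from the original illustration), an MDP is a tuple of states, actions, transition probabilities, a reward function, and a discount factor, and the quantity to maximize is the expected discounted return:

```latex
\begin{aligned}
\text{MDP} &= (S, A, P, R, \gamma), \qquad \gamma \in [0, 1) \\
P(s' \mid s, a) &= \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a) \\
G_t &= \sum_{k=0}^{\infty} \gamma^{k}\, r_{t+k+1}
\end{aligned}
```

The Markov property is visible in the second line: the next state depends only on the current state and action, not on the earlier history.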



The following image shows the process:


Once all of this is specified, what remains is to find an optimal policy. A policy is a mapping from states to actions. Our aim is to find the policy that, when the decision process follows it, yields the largest overall return. So the essence of RL is to find the best policy given these reward signals.
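When the transition probabilities and rewards are fully known, one classic way to find such a policy is value iteration, a dynamic programming method. The sketch below assumes a made-up three-state MDP; the state names, action names, probabilities, and the discount factor of 0.9 are all hypothetical.

```python
# Hypothetical MDP: transitions[state][action] = [(prob, next_state, reward), ...]
transitions = {
    "start": {
        "safe": [(1.0, "middle", 0.0)],
        "risky": [(0.8, "goal", 1.0), (0.2, "start", 0.0)],
    },
    "middle": {
        "safe": [(1.0, "goal", 1.0)],
        "risky": [(1.0, "start", 0.0)],
    },
    "goal": {},  # terminal state: no actions available
}
GAMMA = 0.9  # discount factor (assumed)


def q_value(values, state, action):
    """Expected return of taking `action` in `state`, then following `values`."""
    return sum(p * (r + GAMMA * values[s2])
               for p, s2, r in transitions[state][action])


def value_iteration(tol=1e-9):
    """Repeat the Bellman optimality update until the values stop changing."""
    values = {s: 0.0 for s in transitions}
    while True:
        delta = 0.0
        for s, actions in transitions.items():
            if not actions:
                continue  # terminal states keep value 0
            best = max(q_value(values, s, a) for a in actions)
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def greedy_policy(values):
    """The policy: a mapping from each non-terminal state to its best action."""
    return {s: max(actions, key=lambda a: q_value(values, s, a))
            for s, actions in transitions.items() if actions}


values = value_iteration()
policy = greedy_policy(values)
print(policy)
```

Reinforcement learning addresses the harder case where the transition table is not known in advance and the policy has to be learned from sampled experience.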


That covers the concepts and theory of reinforcement learning; it is worth recognizing that reinforcement learning resembles the way humans accumulate experience. Now let's turn to deep reinforcement learning.


If reinforcement learning is already this powerful, why add deep learning? Is it just to ride the deep learning wave? (That is one reason, of course.) But it is not the only one. The very beginning of the Nature paper explains why.

Simply put, although ordinary reinforcement learning has been applied fairly successfully, the features describing the state have to be designed by hand. For complex scenes this is very difficult; it easily leads to the curse of dimensionality, and hand-built features are not very expressive. What is the solution? Recall from the previous article that the essence of deep learning is learning features automatically, which is exactly what reinforcement learning was missing. So Google's experts used deep learning to extract features, fed them into reinforcement learning, and got good results. Google uses a convolutional neural network (CNN) for feature extraction:

CNN + RL = Deep Q-Network (DQN, the deep reinforcement learning algorithm).


So what exactly is a CNN? The plan was to introduce convolutional neural networks first and deep reinforcement learning afterwards, so things are a bit out of order here. I will cover CNNs briefly now; a dedicated chapter will follow later in the series.

A convolutional neural network is a kind of artificial neural network that has become a hot topic in speech analysis and image recognition. Its structure is highly invariant to translation, scaling, skew, and other forms of deformation. The basic structure of a CNN generally consists of two kinds of layers. The first is the feature extraction layer: the input of each neuron is connected to a local receptive field in the previous layer and extracts that local feature; once a local feature is extracted, its positional relationship to other features is also fixed. The second is the feature mapping layer: each computational layer of the network is made up of multiple feature maps, each feature map is a plane, and all neurons on a plane share the same weights. In the illustrated example, a 3×3 convolution kernel is convolved over a 5×5 image. Each convolution is a way of extracting features: like a sieve, it picks out the parts of the image that match a pattern (the larger the activation, the better the match).
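The 3×3 over 5×5 case is small enough to compute by hand. The sketch below slides a kernel over an image with stride 1 and no padding; the pixel and kernel values are made up for illustration, and the function name `convolve2d_valid` is hypothetical. As in most CNN implementations, the kernel is not flipped, so strictly speaking this is a cross-correlation.

```python
def convolve2d_valid(image, kernel):
    """Slide a square kernel over a square image (stride 1, no padding)."""
    k = len(kernel)
    out_size = len(image) - k + 1
    return [
        [
            # Weighted sum of the k x k patch whose top-left corner is (i, j).
            sum(image[i + di][j + dj] * kernel[di][dj]
                for di in range(k) for dj in range(k))
            for j in range(out_size)
        ]
        for i in range(out_size)
    ]


image = [
    [1, 1, 1, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 1, 0, 0],
]
kernel = [
    [1, 0, 1],
    [0, 1, 0],
    [1, 0, 1],
]

feature_map = convolve2d_valid(image, kernel)
print(feature_map)  # [[4, 3, 4], [2, 4, 3], [2, 3, 4]]
```

The 5×5 input shrinks to a 3×3 feature map, and the largest values mark the patches that best match the kernel's X-shaped pattern.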

After convolution comes pooling: compute the average (or maximum) of a particular feature over a region of the image. These summary statistics not only have much lower dimensionality than the full set of extracted features, they also tend to improve results by making the model less prone to overfitting.
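A minimal sketch of pooling over non-overlapping regions follows. The helper name `pool2d`, the 2×2 region size, and the feature-map values are assumptions chosen for the example.

```python
def pool2d(feature_map, size, reduce_fn):
    """Apply reduce_fn to each non-overlapping size x size block."""
    rows, cols = len(feature_map), len(feature_map[0])
    return [
        [
            reduce_fn([feature_map[i + di][j + dj]
                       for di in range(size) for dj in range(size)])
            for j in range(0, cols - size + 1, size)
        ]
        for i in range(0, rows - size + 1, size)
    ]


def mean(values):
    return sum(values) / len(values)


feature_map = [
    [1, 3, 2, 4],
    [5, 7, 6, 8],
    [4, 2, 3, 1],
    [8, 6, 7, 5],
]

max_pooled = pool2d(feature_map, 2, max)    # [[7, 8], [8, 7]]
mean_pooled = pool2d(feature_map, 2, mean)  # [[4.0, 5.0], [5.0, 4.0]]
print(max_pooled, mean_pooled)
```

Sixteen values are reduced to four, which is the drop in dimensionality the paragraph above refers to.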


Let's take a look at the full CNN:


The algorithm Google used to train DQN is described as follows:
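The paper's algorithm combines Q-learning with two stabilizing devices: an experience replay memory and a periodically updated target network. The sketch below follows that outline under heavy simplification. A lookup table stands in for the convolutional network, and the five-state chain environment, every hyperparameter, and all function names are assumptions made for illustration, not values from the paper.

```python
import random
from collections import deque

N_STATES, GOAL = 5, 4                    # chain 0..4, reward on reaching state 4
GAMMA, ALPHA, EPSILON = 0.9, 0.5, 0.1    # assumed hyperparameters


def step(state, action):
    """Action 0 moves left, action 1 moves right; returns (next, reward, done)."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    return nxt, (1.0 if nxt == GOAL else 0.0), nxt == GOAL


def greedy(q_row):
    """Pick a best action, breaking ties at random."""
    best = max(q_row)
    return random.choice([a for a, v in enumerate(q_row) if v == best])


def train(episodes=200, batch_size=16, sync_every=20, seed=0):
    random.seed(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]   # table standing in for the CNN
    target_q = [row[:] for row in q]            # frozen copy used for targets
    memory = deque(maxlen=1000)                 # experience replay memory
    steps = 0
    for _ in range(episodes):
        state = 0
        for _ in range(50):
            # Epsilon-greedy action selection.
            if random.random() < EPSILON:
                action = random.randrange(2)
            else:
                action = greedy(q[state])
            nxt, reward, done = step(state, action)
            memory.append((state, action, reward, nxt, done))
            # Learn from a random minibatch of stored transitions.
            batch = random.sample(list(memory), min(batch_size, len(memory)))
            for s, a, r, s2, d in batch:
                y = r if d else r + GAMMA * max(target_q[s2])
                q[s][a] += ALPHA * (y - q[s][a])
            steps += 1
            if steps % sync_every == 0:
                target_q = [row[:] for row in q]  # refresh the target copy
            state = nxt
            if done:
                break
    return q


q = train()
policy = [q[s].index(max(q[s])) for s in range(GOAL)]
print(policy)
```

Sampling past transitions at random breaks the correlation between consecutive frames, and computing targets from a frozen copy keeps the values being chased from shifting at every update. In the full algorithm the table update is replaced by a gradient step on the network's weights.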


That was a long-winded introduction, and it still only gives a rough picture. To summarize: deep reinforcement learning uses a deep learning network to automatically learn the features of dynamic scenes, and then uses reinforcement learning to learn the sequence of decisions and actions that corresponds to those features.


Note

This article draws on a number of forum posts and on figures from the paper, which are not listed individually; any resemblance to them is to be expected.



