How does reinforcement learning solve problems?

What is a reinforcement learning algorithm, and how far is it from us? In 2016 and 2017, AlphaGo defeated the world Go champions Lee Sedol and Ke Jie in two enormously influential matches, and its core algorithm was reinforcement learning. Many of the people who now want to learn about reinforcement learning, or to switch their research to it, were drawn in one way or another by those two matches. Today reinforcement learning, following deep learning, has become a hot topic in both academia and industry. Judging by its current momentum, reinforcement learning is bearing fruit in all walks of life and promises a bright future. Getting started, however, is hard: beginners can see clearly that the field is a mountain of gold, yet, unable to find a way in, they can only gaze at it and sigh.

What is a reinforcement learning algorithm? To answer this question, we must first answer two others: what problems can reinforcement learning solve, and how does reinforcement learning solve them?

1.2 What problems can reinforcement learning solve

Figure 1.1 shows some successful applications of reinforcement learning algorithms. Panel A is a typical nonlinear double inverted pendulum system, consisting of a cart (the black rectangle) and a two-link pendulum (the red rods); the controllable input is the left-right motion of the cart, and the goal is to stabilize the pendulum in the upright position. The double inverted pendulum is a classical problem in nonlinear systems. In control theory, the standard approach is to build a precise dynamic model of the system and then design a controller based on that model and various nonlinear control techniques. This process is generally very involved and requires deep knowledge of nonlinear control theory; moreover, the modeling step needs quantities such as the masses of the cart and pendulums and the lengths of the links. A reinforcement-learning approach needs neither a model nor a hand-designed controller: one only needs to set up a reinforcement learning algorithm and let the double-pendulum system learn on its own. Once training finishes, the system can balance itself. Panel B of Figure 1.1 shows the trained AlphaGo playing the second game of its match against Ke Jie, and panel C shows a robot in a simulated environment that taught itself to stand up after falling. These three examples show how reinforcement learning algorithms can achieve stunning results in very different areas. Beyond nonlinear control, board games, and robotics, reinforcement learning can also be applied to other fields, such as video games, human-machine dialogue, autonomous driving, machine translation, and text sequence prediction.

Figure 1.1 Reinforcement learning success stories
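
To make the interface concrete before any theory, here is a minimal sketch of an agent-environment loop on a cart-pole balancing task. It assumes the open-source Gymnasium library and its CartPole-v1 environment (a single-pendulum cousin of the system in panel A, not the book's exact setup), and the agent is deliberately the dumbest possible one: it acts at random.

```python
import gymnasium as gym  # assumption: Gymnasium is installed

env = gym.make("CartPole-v1")      # a cart with a single pole to balance
obs, info = env.reset(seed=0)

total_reward = 0.0
for t in range(200):
    action = env.action_space.sample()  # random agent: push left or right
    obs, reward, terminated, truncated, info = env.step(action)
    total_reward += reward              # +1 for every step the pole stays up
    if terminated or truncated:         # the pole fell or time ran out
        break

print(f"episode length: {t + 1}, return: {total_reward}")
env.close()
```

A random agent rarely survives more than a few dozen steps; the whole point of reinforcement learning is to replace env.action_space.sample() with a policy that improves from the reward signal.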

Examples are not enough, you can use a sentence to illustrate the problem that reinforcement learning can solve: Intelligent decision-making problems. More precisely, sequential decision-making issues. What is the sequential decision-making problem? Is the need to continuously make decisions in order to achieve the ultimate goal of the problem. 1.1 In Figure A of the two pendulum problem, it needs to have an intelligent decision in each state (where intelligent decision-making refers to what direction should be applied to the car, how much force), so that the entire system gradually converge to the target point (that is, two pendulum vertical state). The Alphago in Figure B will need to make a decision based on the current game status in order to win the match. In Figure C, the robot needs to get the moment of each joint in its current state so that it can stand up. In a word, the problem of strengthening learning can be solved: sequential decision-making. So, how does reinforcement learning solve this problem?
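
For readers who prefer a formula: the sequential decision-making problem described above is commonly formalized as finding a policy $\pi$ that maximizes the expected cumulative (discounted) reward. The notation below is the field's standard one, not notation introduced by this chapter:

$$\max_{\pi}\;\mathbb{E}_{\pi}\!\left[\sum_{t=0}^{T}\gamma^{t}\, r_{t}\right],\qquad 0<\gamma\le 1,$$

where at each step $t$ the agent observes a state $s_t$, takes an action $a_t = \pi(s_t)$, and receives a reward $r_t$; the discount factor $\gamma$ trades off immediate against future reward.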

1.3 How does reinforcement learning solve problems

Before answering how reinforcement learning solves the sequential decision-making problem, let us first look at how supervised learning solves problems. From the point of view of the problems they solve, supervised learning solves the problem of intelligent perception.

Again we illustrate with a figure. As shown in Figure 1.2, the most typical example of supervised learning is handwritten-digit recognition: given a handwritten digit, the supervised learner must determine which digit it is. In other words, supervised learning must perceive what the current input looks like; once the agent perceives the input, it can classify it. As shown in Figure 1.2, the input handwriting looks like a 4, so the agent can tell that it is a 4. Intelligent perception is really learning what an input looks like (its features) and what that appearance corresponds to one-to-one (its label). The prerequisite for intelligent perception is therefore a large number of inputs with distinct appearances, together with the labels that correspond to them. Thus the way supervised learning solves problems is to feed in a large amount of labeled data and let the agent abstract features from the input and classify it.

Figure 1.2 The difference between reinforcement learning and supervised learning
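
As a counterpoint to the reinforcement-learning loop above, here is a minimal sketch of the supervised recipe just described: labeled inputs in, a classifier out. It assumes scikit-learn and its bundled 8x8 digits dataset; nothing here is prescribed by the book.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)      # inputs (images) and their labels
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000)  # learns the input -> label mapping
clf.fit(X_train, y_train)                # training requires labeled data

print("test accuracy:", clf.score(X_test, y_test))
```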

Reinforcement learning is different. What it solves is the sequential decision-making problem; it does not care what the current input looks like, only what action should be taken under the current input to achieve the final goal. To stress the point again: the action taken now is tied to the final goal. In other words, the current action should make the whole task sequence optimal. How is the whole task sequence optimized? The agent must interact with the environment and keep trying, because at the beginning it simply does not know which action in the current state helps reach the goal. The framework within which reinforcement learning solves problems is shown in Figure 1.3. The agent acts on the environment through an action; the environment returns a reward to the agent; and the agent evaluates the action it took according to that reward: actions conducive to reaching the goal are reinforced, and actions not conducive to it are attenuated. The specific algorithms are introduced later.

To summarize the similarities and differences between reinforcement learning and supervised learning in one sentence: both require a large amount of data for training, but the kinds of data required differ. Supervised learning requires labeled data of all varieties, while reinforcement learning requires interaction data carrying rewards. Because the input data are of a different kind, reinforcement learning algorithms have their own distinctive ways of acquiring and exploiting data. What are those ways? That is the focus of this book. Before going into the details, let us briefly review the development history of reinforcement learning algorithms.

Figure 1.3 The basic framework of reinforcement learning
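
To make Figure 1.3 concrete, here is a minimal, self-contained sketch of the loop on a hypothetical five-state chain task (every name in it is an illustrative assumption, not the book's algorithm). It uses tabular Q-learning, one standard way of reinforcing actions that lead toward the goal while the rest fade by comparison:

```python
import random

# Toy chain: states 0..4; moving right from state 3 reaches the goal (4).
N_STATES, GOAL = 5, 4

def step(state, action):          # action: 0 = left, 1 = right
    next_state = min(state + 1, GOAL) if action == 1 else max(state - 1, 0)
    reward = 1.0 if next_state == GOAL else 0.0
    return next_state, reward, next_state == GOAL

Q = [[0.0, 0.0] for _ in range(N_STATES)]   # value of each (state, action)
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        # mostly act greedily on current estimates, sometimes explore
        if random.random() < epsilon:
            action = random.randrange(2)
        else:
            action = max((0, 1), key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # nudge Q(s, a) toward reward + discounted best future value:
        # actions that lead toward the goal accumulate value
        target = reward + gamma * max(Q[next_state])
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print("greedy action per state:",
      [max((0, 1), key=lambda a: Q[s][a]) for s in range(N_STATES)])
```

After training, the greedy action in every state is "right": the reward from the goal has propagated backwards through the value table, which is exactly the reinforce-and-attenuate loop of Figure 1.3 in miniature.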

We will not delve into the detailed history of reinforcement learning algorithms here, giving only two key points in time. The first is 1998, when Richard S. Sutton and Andrew G. Barto published the first edition of Reinforcement Learning: An Introduction (the second edition is to be published in Chinese by Publishing House of Electronics Industry). The book systematically summarized the progress of reinforcement learning algorithms before 1998, by which time the basic theoretical framework of reinforcement learning had taken shape. Before 1998, the most studied and most developed algorithms were tabular reinforcement learning algorithms, although direct policy search methods were also proposed in this period; for example, in 1992 R. J. Williams proposed the REINFORCE algorithm, which directly estimates the policy gradient. The second key point is 2013, when DeepMind proposed DQN (Deep Q-Network), combining deep networks with reinforcement learning to form deep reinforcement learning. Between 1998 and 2013 scholars were hardly idle, developing a variety of direct policy search methods. After 2013, with the boom in deep learning, deep reinforcement learning attracted more and more attention. In particular, in 2016 and 2017 Google's AlphaGo defeated world Go champions two years in a row, pushing deep reinforcement learning into the spotlight. Today deep reinforcement learning algorithms are developing in full swing; it is an era of a hundred schools of thought contending. Perhaps in a few years the technology will become ever more widespread and mature into more stable and more practical algorithms. We shall see.

1.4 Classification and development trends of reinforcement learning algorithms

There are many kinds of reinforcement learning algorithms, which can be classified according to the following criteria.

(1) According to whether the algorithm depends on a model, reinforcement learning algorithms can be divided into model-based and model-free reinforcement learning algorithms. What the two classes have in common is that both obtain data by interacting with the environment; they differ in how the data are used. A model-based algorithm uses the data to learn a model of the system or environment and then makes sequential decisions based on that model. A model-free algorithm uses the data obtained from interaction directly to improve its own behavior. Each approach has advantages and disadvantages. In general, model-based algorithms are more sample-efficient than model-free ones, because the agent can exploit the model's information while exploring the environment. However, tasks that cannot be modeled at all can only be tackled by model-free algorithms, and since model-free algorithms need no modeling step, they are more universally applicable than model-based ones. The sketch below illustrates the two ways of using the same interaction data.
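
A minimal sketch of that difference, reusing the chain task from Section 1.3 (all names and numbers are illustrative assumptions):

```python
from collections import defaultdict

# Hypothetical interaction data from the chain task:
# (state, action, next_state, reward) tuples collected by acting.
data = [(0, 1, 1, 0.0), (1, 1, 2, 0.0), (2, 1, 3, 0.0), (3, 1, 4, 1.0)]

# Model-free: consume each sample directly to improve a value estimate.
Q = defaultdict(float)
for s, a, s2, r in data:
    best_next = max(Q[(s2, 0)], Q[(s2, 1)])
    Q[(s, a)] += 0.5 * (r + 0.9 * best_next - Q[(s, a)])

# Model-based: first fit a (here deterministic) model from the same samples,
model = defaultdict(dict)
for s, a, s2, r in data:
    model[s][a] = (s2, r)

# ... then plan against the model with no further interaction at all.
V = defaultdict(float)
for _ in range(10):                      # value-iteration-style sweeps
    for s in list(model):
        V[s] = max(r + 0.9 * V[s2] for s2, r in model[s].values())

print("model-free Q:", dict(Q))
print("model-based V:", dict(V))
```

Note that the planner squeezes more out of the same four samples: repeated sweeps over the learned model propagate the goal reward all the way back to state 0, which is the sample-efficiency advantage described above.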

(2) According to how the policy is updated and learned, reinforcement learning algorithms can be divided into value-function-based methods, direct policy search methods, and actor-critic (AC) methods. A value-function-based method learns a value function; the final policy is then greedy with respect to it, meaning that in any state the action that maximizes the value function is the current optimal action. A direct policy search method usually parameterizes the policy and learns the parameter values that optimize the objective. Actor-critic methods combine a value function with direct policy search. The specific algorithms are described later; the sketch below shows the basic contrast between the first two families.
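
A minimal sketch of that contrast (the numbers and the softmax parameterization are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Value-function-based: the policy is implicit -- act greedily on Q(s, .).
q_values = np.array([0.2, 1.3, -0.4])        # hypothetical Q(s, a), 3 actions
greedy_action = int(np.argmax(q_values))

# Direct policy search: the policy itself is parameterized by theta and
# sampled from; learning adjusts theta rather than a value table.
theta = np.array([0.0, 0.5, -0.5])           # illustrative policy parameters
probs = np.exp(theta) / np.exp(theta).sum()  # softmax policy pi(a | s; theta)
sampled_action = int(rng.choice(len(theta), p=probs))

print("greedy:", greedy_action, "| sampled:", sampled_action,
      "| pi(a|s):", probs.round(3))
```

An actor-critic method would keep both objects at once: the parameterized policy (the actor) chooses actions, while a learned value function (the critic) scores them to guide the parameter updates.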

(3) According to whether the environment's reward function is known, reinforcement learning algorithms can be divided into forward reinforcement learning and inverse reinforcement learning. In ordinary (forward) reinforcement learning, the reward function is specified by hand. In many cases, however, the reward cannot be specified by hand, for example when judging the acrobatic flight of a drone; inverse reinforcement learning then uses machine learning methods, typically applied to expert demonstrations, to learn the reward function itself.

To improve the efficiency and practicality of reinforcement learning, scholars have also proposed many other variants, such as hierarchical reinforcement learning, meta-reinforcement learning, multi-agent reinforcement learning, relational reinforcement learning, and transfer reinforcement learning.

Reinforcement learning, and deep reinforcement learning in particular, is developing rapidly. Judging from current papers, its development trends can be summarized as follows.

First, the combination of reinforcement learning algorithms and deep learning will become ever closer.

Machine learning algorithms are often divided into supervised learning, unsupervised learning, and reinforcement learning. The three used to develop largely along their own lines; now they increasingly work together, and one trend in reinforcement learning is that the three families of methods are gradually moving toward unification. Whoever combines them well will make the bigger breakthrough. A representative work in this direction is dialogue generation based on deep reinforcement learning.

Second, reinforcement learning algorithms will be more closely integrated with domain knowledge.

If a general-purpose reinforcement learning algorithm such as Q-learning is dropped directly into a professional domain, it is quite likely not to work. Do not be discouraged when this happens; it is normal. In such cases the domain's expert knowledge must be injected into the reinforcement learning algorithm. There is no unified recipe for this; it varies with the content of each domain. In general, one can reshape the reward function or modify the network structure (a process the community half-jokingly calls "alchemy"). A masterpiece in this direction is the NIPS 2016 best paper, Value Iteration Networks.
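
Reshaping the reward function is the easier of the two to illustrate. One standard, safety-preserving way to do it is potential-based reward shaping (Ng, Harada, and Russell, 1999); the potential function below is an illustrative assumption, not something prescribed by the text:

```python
GAMMA = 0.9

def phi(state):
    """Domain knowledge: states closer to the goal get higher potential."""
    return float(state)      # e.g., progress along the chain task above

def shaped_reward(state, reward, next_state):
    # Adding F = gamma * phi(s') - phi(s) densifies the learning signal
    # while provably leaving the optimal policy unchanged.
    return reward + GAMMA * phi(next_state) - phi(state)

print(shaped_reward(2, 0.0, 3))  # an intermediate step now carries a signal
```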

Third, the theoretical analysis of reinforcement learning algorithms will grow stronger, and the algorithms will become more stable and efficient.

Now that reinforcement learning has caught fire, it is bound to attract a large number of researchers with strong theoretical foundations. These experts want for nothing, pursue perfection, and wield formidable mathematics; in a field whose theory is still nearly a blank slate, they are sure to make their contributions and leave their names in history. Representative works in this direction include policy methods based on deep energy models and the equivalence between value-function and policy-based methods.

Fourth, reinforcement learning algorithms will become more closely connected with brain science, cognitive neuroscience, and the study of memory.

Brain science and cognitive neuroscience have always been a source of inspiration for machine learning, a source that has more than once revolutionized machine learning algorithms. Our knowledge of the brain is still partial, and as brain scientists and cognitive neuroscientists gradually uncover its mysteries, machine learning is bound to benefit again. This school is led by DeepMind and University College London, whose research groups include not only many AI scientists but also many cognitive neuroscientists. Representative works in this direction include DeepMind's series of papers on memory.
