Notes on David Silver's Reinforcement Learning Course

Last Update:2018-08-20 Source: Internet

Author: User

This is a summary of the first lecture of David Silver's public reinforcement learning course. The lecture covers where reinforcement learning appears across different fields, what problem it solves, how it differs from supervised learning, what parts make up the complete algorithmic loop, and what an agent contains. It also introduces some basic reinforcement learning concepts.

When reprinting, please credit the source: Chenrudan.github.io

Lecture video: RL Course by David Silver - Lecture 1: Introduction to Reinforcement Learning.

Lecture slides: http://www0.cs.ucl.ac.uk/staff/d.silver/web/Teaching_files/intro_RL.pdf.

This article summarizes and discusses the course, organized according to my own understanding. My background knowledge is limited and my English listening is not great, so I may have misunderstood some points. Discussion is welcome.

1. What is reinforcement learning?

Reinforcement learning sits at the intersection of many disciplines. At its core it addresses "decision making": learning to make decisions automatically. In computer science it takes the form of machine learning algorithms. In engineering it appears as deciding on a sequence of actions that gives the best result. In neuroscience it relates to understanding how the human brain makes decisions, where the main object of study is the reward system. In psychology it concerns how animals make decisions and what causes their behavior. In economics it appears in game theory research. All of these boil down to one question: why are people able to make optimal decisions, and how do they do it?

Reinforcement learning is a sequential decision making problem: actions must be chosen one after another so that, once they have all been carried out, the outcome is as good as possible. No label tells the algorithm what to do. Instead, it first tries some action and observes the result, judges from that result whether the action was good or bad, and uses this feedback to adjust its behavior. Through repeated adjustment, the algorithm learns which action to choose in which situation to get the best result.
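The try, get feedback, adjust cycle described above can be sketched in a few lines of Python. This is only a minimal illustration under made-up assumptions: the two actions, the `REWARDS` table, and the helper names `try_action` and `learn_by_trial` are hypothetical and do not come from the lecture.

```python
# Hypothetical two-action problem: the actions and their rewards are
# assumptions made up for illustration, not part of the lecture.
REWARDS = {"left": 0.0, "right": 1.0}


def try_action(action):
    """Environment feedback: returns a reward, never a 'correct' label."""
    return REWARDS[action]


def learn_by_trial(actions, steps):
    """Try each action once, then repeat whichever has scored best so far."""
    totals = {a: 0.0 for a in actions}   # summed reward per action
    counts = {a: 0 for a in actions}     # how often each action was tried
    chosen = []
    for _ in range(steps):
        untried = [a for a in actions if counts[a] == 0]
        if untried:
            action = untried[0]          # no feedback yet, so just try it
        else:
            # use the feedback: pick the best average reward seen so far
            action = max(actions, key=lambda a: totals[a] / counts[a])
        reward = try_action(action)
        totals[action] += reward
        counts[action] += 1
        chosen.append(action)
    return chosen


choices = learn_by_trial(["left", "right"], steps=5)
print(choices)  # ['left', 'right', 'right', 'right', 'right']
```

After trying both actions once, the learner settles on "right" purely from the reward feedback, without ever being told which action is correct.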

Reinforcement learning differs from supervised learning in several ways. First, supervised learning has labels that tell the algorithm which output corresponds to which input, whereas reinforcement learning has no label saying which action to take in a given situation. It has only a reward signal, which may arrive only after a series of actions, from which to judge whether the current choice was good or bad. Second, feedback in reinforcement learning is delayed: it may take many steps before it becomes clear whether an earlier choice was good or bad, whereas in supervised learning a bad choice is reported back to the algorithm immediately. Third, the input in reinforcement learning keeps changing and is not independent and identically distributed, as it is in supervised learning: every action the algorithm takes affects the input to its next decision.

2. Components of reinforcement learning

Figure 1: Components of reinforcement learning (image source [1])

The figure above shows the reinforcement learning decision process. You construct an agent (the brain in the figure) that can perform actions, such as deciding which direction a robot should move or where to place a piece on a board. The agent receives an observation of the current environment, for example the scene captured by the robot's camera. It also receives a reward after performing an action. So at step $t$ the agent's workflow is: perform an action $a_t$, receive the observation $o_t$ of the environment after the action, and receive the reward $r_t$ for this action.

The environment is what the agent interacts with, and the agent cannot control it. At the start the agent does not know how the environment will react to different actions. The environment tells the agent its current condition through observations, and it can also give the agent feedback based on the likely final outcome. For example, a chess board is an environment that can estimate each side's chances of winning or losing from the current position and return that as a reward. So at step $t$ the environment's workflow is: receive the action $a_t$, respond to it, and send the resulting observation and reward to the agent.

The reward $r_t$ is a scalar feedback signal that indicates how good or bad the decision made at step $t$ was. The goal of reinforcement learning as a whole is to maximize the cumulative reward. For example, in a shooting game, hitting an enemy aircraft increases the final score, so the reward for that step is positive.

3. Some variables

History is the sequence of all actions, observations, and rewards so far: $H_t = a_1, o_1, r_1, \ldots, a_t, o_t, r_t$
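The interaction loop from section 2 and the history just defined can be sketched together as follows. This is a toy illustration, not code from the course: `CountingEnvironment`, `ForwardAgent`, and `run` are hypothetical names, and the reward rule (1.0 on reaching position 3) is an assumption chosen to keep the example small.

```python
class CountingEnvironment:
    """Hypothetical environment: the state is an integer position on a line.

    The reward is 1.0 when the position reaches 3, otherwise 0.0.
    """

    def __init__(self):
        self.position = 0

    def step(self, action):
        # The action changes the state, so it also changes the next observation.
        self.position += action
        observation = self.position
        reward = 1.0 if self.position == 3 else 0.0
        return observation, reward


class ForwardAgent:
    """Hypothetical agent with a fixed rule: always move one step right."""

    def act(self, last_observation):
        return 1


def run(env, agent, steps):
    history = []          # H_t = a_1, o_1, r_1, ..., a_t, o_t, r_t
    total_reward = 0.0    # cumulative reward, the quantity to maximize
    observation = None    # nothing has been observed before the first action
    for _ in range(steps):
        action = agent.act(observation)          # agent performs a_t
        observation, reward = env.step(action)   # environment returns o_t, r_t
        history.append((action, observation, reward))
        total_reward += reward
    return history, total_reward


history, total_reward = run(CountingEnvironment(), ForwardAgent(), steps=4)
print(history)       # [(1, 1, 0.0), (1, 2, 0.0), (1, 3, 1.0), (1, 4, 0.0)]
print(total_reward)  # 1.0
```

The first two actions earn no reward on their own; the payoff only shows up at the third step, which is a small instance of the delayed feedback discussed in section 1.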

Environment state, $S_t^e$: the current state of the environment, reflecting what has changed in it. Note that the environment's own state and what the environment feeds back to the agent are not necessarily the same. For example, when a robot is walking, the environment state includes its exact position, but its camera captures only the surrounding scene and cannot tell the agent that position. The captured image is an observation of the environment. In other words, the agent does not always know how the environment has changed; it only sees a visible result of the change.
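The gap between the environment state and the observation can be made concrete with a small sketch. The `Corridor` class below is a hypothetical stand-in for the walking robot: it assumes a one-dimensional corridor of five cells and a "camera" that only reports whether a wall is directly to the left or right.

```python
class Corridor:
    """Hypothetical 1-D corridor with cells 0..4 and a wall past each end."""

    LENGTH = 5

    def __init__(self, position):
        self.position = position   # environment state: the exact location

    def observe(self):
        # The "camera" reports only whether a wall is directly to the
        # left or right, not the position itself.
        wall_left = self.position == 0
        wall_right = self.position == self.LENGTH - 1
        return (wall_left, wall_right)


obs_at_1 = Corridor(position=1).observe()
obs_at_3 = Corridor(position=3).observe()
obs_at_0 = Corridor(position=0).observe()
print(obs_at_1, obs_at_3, obs_at_0)
# (False, False) (False, False) (True, False)
```

Positions 1 and 3 are different environment states but produce the same observation, so the agent cannot recover its exact location from a single observation.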

Agent state, $S_t^a$

This article is an English version of an article originally published in Chinese on aliyun.com.
