PS: This article is my reading notes on Zhou Zhihua's "Machine Learning".
Introduction-------Tasks and Rewards
Suppose we want to grow watermelons. The process involves many steps, and we may end up with a good melon, a poor one, or the plant may simply die. If we abstract this melon-growing process and summarize a sequence of good operations into a melon-growing strategy, then the process of learning that strategy is "reinforcement learning."
This is a simple illustration, where:
The machine exists in an environment whose state space is X; for example, the states can be healthy, water shortage, withered, and so on. A lowercase x denotes a single state in the state space X.
The actions the machine can take, for example watering or not watering, together constitute the action set A; a single action is denoted a.
When an action a is taken in a state x, the underlying transition function P moves the environment from the current state to another state with some probability. For example: in the water-shortage state, choosing to water gives some probability of transitioning to the healthy state.
When the environment moves to another state (which may be the same as the original state), it gives the machine a reward according to the underlying "reward" function R. For example: healthy is +1, water shortage is -1, withered is -100.
Together, a reinforcement learning task corresponds to a four-tuple E = <X, A, P, R>,
where P: X × A × X → R specifies the state transition probability, and R: X × A × X → R specifies the reward.
Thinking: what relationship does the × symbol between X and A express?
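The four-tuple can be sketched in code. The concrete states, transition probabilities, and reward values below are illustrative assumptions for the watermelon example, not figures from the book:

```python
import random

# A sketch of the four-tuple E = <X, A, P, R> for the watermelon
# example. States, actions, probabilities, and reward values are
# all illustrative assumptions.
X = ["healthy", "water_shortage", "withered"]   # state space X
A = ["water", "do_not_water"]                   # action set A

# P: for each (state, action), a list of (next_state, probability).
# Each list sums to 1, since some transition must occur.
P = {
    ("water_shortage", "water"):        [("healthy", 0.9), ("water_shortage", 0.1)],
    ("water_shortage", "do_not_water"): [("withered", 0.7), ("water_shortage", 0.3)],
    ("healthy", "water"):               [("healthy", 1.0)],
    ("healthy", "do_not_water"):        [("water_shortage", 0.8), ("healthy", 0.2)],
    ("withered", "water"):              [("withered", 1.0)],
    ("withered", "do_not_water"):       [("withered", 1.0)],
}

# R: X x A x X -> reward; here it depends only on the state reached.
def R(x, a, x_next):
    return {"healthy": 1, "water_shortage": -1, "withered": -100}[x_next]

def step(x, a):
    """Sample a next state from P and return (x_next, reward)."""
    next_states, probs = zip(*P[(x, a)])
    x_next = random.choices(next_states, weights=probs)[0]
    return x_next, R(x, a, x_next)
```

For instance, `step("water_shortage", "water")` moves to the healthy state with probability 0.9 and then returns reward +1.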
The state transitions in the environment and the rewards it returns are not under the machine's control. The machine can influence the environment only by choosing which action to perform, and can perceive the environment only by observing the resulting state and the returned reward.
As an example: examine each state closely, along with the transition probability p and the reward r obtained after taking action a in it.
What the machine must do is learn a "policy" π by repeatedly trying actions in the environment. With this policy, in state x it knows to perform the action a = π(x); for example, on seeing the water-shortage state, it knows to choose the watering action.
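This interaction loop can be sketched as follows, with a hand-written policy standing in for a learned one; all state names, transition probabilities, and rewards are hypothetical:

```python
import random

# A sketch of the machine's interaction loop: it controls only the
# choice of action, then perceives the resulting state and reward.
# States, transitions, and reward values are hypothetical.
P = {
    ("water_shortage", "water"):        [("healthy", 0.9), ("water_shortage", 0.1)],
    ("water_shortage", "do_not_water"): [("withered", 0.7), ("water_shortage", 0.3)],
    ("healthy", "water"):               [("healthy", 1.0)],
    ("healthy", "do_not_water"):        [("water_shortage", 0.8), ("healthy", 0.2)],
    ("withered", "water"):              [("withered", 1.0)],
    ("withered", "do_not_water"):       [("withered", 1.0)],
}
REWARD = {"healthy": 1, "water_shortage": -1, "withered": -100}

def pi(x):
    """A hand-written policy a = pi(x): water when short of water."""
    return "water" if x == "water_shortage" else "do_not_water"

x = "water_shortage"
total_reward = 0
for t in range(5):
    a = pi(x)                                   # the machine picks an action
    nxt, probs = zip(*P[(x, a)])
    x = random.choices(nxt, weights=probs)[0]   # the environment transitions
    total_reward += REWARD[x]                   # the environment returns a reward
```

The machine never sets `x` or the reward directly; it only observes them after choosing `a`.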
There are two ways to represent a policy. One represents the policy as a function π: X → A; deterministic policies are usually expressed this way.
The other is probabilistic: π: X × A → R maps a state-action pair to a probability; stochastic policies are usually expressed this way.
Thinking: what is a deterministic policy, and what is a stochastic policy?
So π(x, a) is the probability of selecting action a in state x. This means that in the water-shortage state, the probabilities of all candidate actions (watering, not watering) sum to 1; each individual π(x, a) is the probability of selecting that particular action.
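A minimal sketch of the two policy representations, using the hypothetical watermelon state and action names:

```python
import random

# Sketch of the two policy representations; state and action names
# are the hypothetical watermelon ones.

# (1) Deterministic policy: a function pi: X -> A.
def pi_det(x):
    return "water" if x == "water_shortage" else "do_not_water"

# (2) Stochastic policy: pi(x, a) is a probability, and for each
# state x the probabilities over all actions sum to 1.
pi_stoch = {
    "water_shortage": {"water": 0.9, "do_not_water": 0.1},
    "healthy":        {"water": 0.2, "do_not_water": 0.8},
}

def sample_action(x):
    """Draw an action a with probability pi_stoch[x][a]."""
    actions = list(pi_stoch[x])
    weights = [pi_stoch[x][a] for a in actions]
    return random.choices(actions, weights=weights)[0]
```

The deterministic form always returns the same action for a given state, while the stochastic form draws an action from the distribution π(x, ·).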
Thinking: P gives the state transition probabilities; why do the transition probabilities after choosing an action in state x also sum to 1? Coincidence, or related?
The goal of learning is to find a policy that maximizes the long-term cumulative reward. There are several ways to compute the long-term cumulative reward; the two commonly used are the "T-step cumulative reward" E[(1/T) Σ_{t=1}^{T} r_t] and the "γ-discounted cumulative reward" E[Σ_{t=0}^{+∞} γ^t r_{t+1}], where r_t denotes the reward obtained at step t, and E denotes the expectation over all random variables.
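The two criteria can be computed on a single sample trajectory of rewards (the expectation E is over many such trajectories); the reward sequence and the helper names `t_step` and `discounted` below are made up for illustration:

```python
# The two cumulative-reward criteria, computed on one sample
# trajectory of rewards r_1, r_2, ... (a made-up sequence; the
# definitions take an expectation E over such trajectories).
rewards = [-1, -1, 1, 1, 1]   # r_1 .. r_5, hypothetical

# T-step cumulative reward: (1/T) * sum_{t=1}^{T} r_t.
def t_step(rewards, T):
    return sum(rewards[:T]) / T

# gamma-discounted cumulative reward: sum_{t>=0} gamma^t * r_{t+1},
# truncated here to the finite sample.
def discounted(rewards, gamma):
    return sum((gamma ** t) * r for t, r in enumerate(rewards))

t_step(rewards, 5)        # -> 0.2
discounted(rewards, 0.9)  # about 0.2951
```

Note how the discount factor γ < 1 weights early rewards more heavily than later ones.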
Differences from supervised learning:
"State" corresponds to the "example" in supervised learning, which is to remove the sample of the marked feature.
"Action" corresponds to "Mark"
"Policy" corresponds to "classifier"
In this sense, reinforcement learning can be regarded as a supervised learning problem with "delayed marking information".
Introduction to Reinforcement Learning----