First, Model-Based RL
Model-free RL: learn value functions (and/or policies) directly from experience.
Model-based RL: learn the MDP model of the environment (the state-transition probabilities P and the reward function R) directly from experience, then plan the value function (and/or policy) from that model. Learning this way can be more efficient and lets us reason about the model's uncertainty, but the disadvantage is that it introduces two sources of error (model learning and planning).
Here there is an important assumption: R and P are conditionally independent given the state and action, i.e., from a state-action pair $(s, a)$ at one moment, the next reward $r \sim \mathcal{R}(\cdot \mid s, a)$ and the next state $s' \sim \mathcal{P}(\cdot \mid s, a)$ are drawn independently. So the first step, learning the model from experience, decomposes into two supervised learning problems (a minimal sketch follows the list):
Regression problem: $s, a \rightarrow r$
Classification problem: $s, a \rightarrow s'$
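For small discrete problems, both of these can be solved with a table-lookup model that simply counts observed transitions. A minimal sketch, assuming integer-indexed states and actions and a `transitions` list of `(s, a, r, s')` tuples (all names here are illustrative, not from the lecture):

```python
import numpy as np

def fit_table_lookup_model(transitions, n_states, n_actions):
    """Estimate P_hat(s'|s,a) and R_hat(s,a) by counting observed transitions."""
    counts = np.zeros((n_states, n_actions, n_states))   # N(s, a, s')
    reward_sum = np.zeros((n_states, n_actions))
    for s, a, r, s_next in transitions:
        counts[s, a, s_next] += 1
        reward_sum[s, a] += r
    n_sa = counts.sum(axis=2)                            # visit counts N(s, a)
    with np.errstate(invalid="ignore"):                  # unvisited (s, a) give 0/0
        P = counts / n_sa[:, :, None]                    # P_hat(s'|s,a) = N(s,a,s') / N(s,a)
        R = reward_sum / n_sa                            # R_hat(s,a) = mean observed reward
    return np.nan_to_num(P), np.nan_to_num(R)
```

Here the regression problem is solved by the empirical mean reward and the transition problem by empirical frequencies; for large or continuous spaces this is where the parametric models mentioned next come in.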
For the model's P and R, Gaussian-process models, linear Gaussian models, and neural-network models are all options. The second step is to use the learned model for planning.
We can use value iteration, policy iteration, tree-search methods, and more. Alternatively, we can sample directly from the learned model and apply the model-free methods from the previous lectures, such as Q-learning, Sarsa, and Monte-Carlo control, to the sampled experience (sample-based planning).
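A hedged sketch of sample-based planning, reusing the `P`, `R` tables from the sketch above and assuming every state-action pair has been visited so each `P[s, a]` is a valid distribution:

```python
import numpy as np

def sample_based_planning(P, R, n_updates=10_000, gamma=0.9, alpha=0.1, seed=0):
    """Q-learning on transitions sampled from the learned model (P_hat, R_hat)."""
    rng = np.random.default_rng(seed)
    n_states, n_actions = R.shape
    Q = np.zeros((n_states, n_actions))
    for _ in range(n_updates):
        s = rng.integers(n_states)                 # pick a state-action pair to back up
        a = rng.integers(n_actions)
        s_next = rng.choice(n_states, p=P[s, a])   # sample s' ~ P_hat(.|s, a)
        # standard Q-learning update on the simulated transition
        Q[s, a] += alpha * (R[s, a] + gamma * Q[s_next].max() - Q[s, a])
    return Q
```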
Second, Integrated Architectures

Dyna: learn and plan the value function (and/or policy) from both real experience and simulated experience, where the latter consists of samples produced by the learned (imprecise) MDP model.
In terms of the algorithm's flow: at each step, use the sample from the real environment to update Q and to update the model, then use n samples produced by the model to update Q another n times, as in the sketch below.
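A minimal Dyna-Q sketch of that per-step loop; the `env` interface (`reset`, `step`, `actions`) and the deterministic dictionary model are assumptions for illustration:

```python
import random
from collections import defaultdict

def dyna_q(env, n_steps, n_planning=5, gamma=0.95, alpha=0.1, eps=0.1):
    """Dyna-Q: each real step updates Q and the model, then replays n simulated steps."""
    Q = defaultdict(float)            # Q[(s, a)]
    model = {}                        # deterministic model: (s, a) -> (r, s')
    s = env.reset()
    for _ in range(n_steps):
        # epsilon-greedy action in the real environment
        if random.random() < eps:
            a = random.choice(env.actions)
        else:
            a = max(env.actions, key=lambda a_: Q[(s, a_)])
        r, s_next = env.step(a)
        # (1) direct RL: Q-learning on the real transition
        best_next = max(Q[(s_next, a_)] for a_ in env.actions)
        Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
        # (2) model learning: remember the observed transition
        model[(s, a)] = (r, s_next)
        # (3) planning: n Q-learning updates on transitions sampled from the model
        for _ in range(n_planning):
            (ps, pa), (pr, ps_next) = random.choice(list(model.items()))
            best = max(Q[(ps_next, a_)] for a_ in env.actions)
            Q[(ps, pa)] += alpha * (pr + gamma * best - Q[(ps, pa)])
        s = s_next
    return Q
```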
Third, Simulation-Based Search

Forward search focuses on the current state: it builds a search tree with the current state $s_t$ as its root.
Simulation-based search: starting from the current state, use the model to simulate K episodes, then apply model-free methods to the simulated experience for learning and planning.
The policy used during simulation: if the required state is already contained in the constructed tree, choose the action that maximizes Q; otherwise select an action at random (exploration). A sketch follows.
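A sketch of that simulation policy inside a simple Monte-Carlo tree-search loop; the `model(s, a) -> (r, s_next, done)` interface, the episode length, and the expansion rule (one new state per episode) are assumptions for illustration:

```python
import random
from collections import defaultdict

def simulation_based_search(model, s_root, actions, n_episodes=1000,
                            max_depth=50, gamma=1.0):
    """MC tree search: greedy w.r.t. Q inside the tree, random outside it."""
    Q = defaultdict(float)       # Q[(s, a)], estimated as a running mean of returns
    N = defaultdict(int)         # visit counts
    tree = {s_root}              # states already expanded into the search tree
    for _ in range(n_episodes):
        s, path, rewards, new_state = s_root, [], [], None
        for _ in range(max_depth):
            if s in tree:        # tree policy: maximize current Q
                a = max(actions, key=lambda a_: Q[(s, a_)])
            else:                # default policy: random action (exploration)
                if new_state is None:
                    new_state = s
                a = random.choice(actions)
            r, s_next, done = model(s, a)   # one simulated step from the learned model
            path.append((s, a))
            rewards.append(r)
            s = s_next
            if done:
                break
        if new_state is not None:
            tree.add(new_state)  # expand the tree by one state per episode
        G = 0.0                  # back up the discounted return along the path
        for (ps, pa), pr in zip(reversed(path), reversed(rewards)):
            G = pr + gamma * G
            N[(ps, pa)] += 1
            Q[(ps, pa)] += (G - Q[(ps, pa)]) / N[(ps, pa)]
    return max(actions, key=lambda a_: Q[(s_root, a_)])  # greedy action at the root
```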
Dyna-2: use real experience to learn long-term memory, and simulated experience to learn short-term memory.

Original address: http://cairohy.github.io/2017/09/11/deeplearning/%E3%80%8ADavid%20Silver%E5%BC%BA%E5%8C%96%E5%AD%A6%E4%B9%a0%e5%85%ac%e5%bc%80%e8%af%be%e3%80%8b-8%ef%bc%9aintegrating%20learning%20and%20planning/