David Silver Reinforcement Learning 3: Dynamic Programming

Source: Internet
Author: User
First, some concepts

Dynamic programming addresses two planning problems for an MDP:

Prediction: given an MDP and a policy $\pi$, find the value function $v_\pi$.

Control: given an MDP, find the optimal value function $v_*$ and the optimal policy $\pi_*$.

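Policy Evaluation:

Given a policy $\pi$, start from an arbitrary state value function $v_0$ and compute a sequence $v_0, v_1, \dots$ that converges to $v_\pi$. At step $k$, the value function for step $k+1$ is obtained from the Bellman expectation equation: $v_{k+1}(s) = \sum_{a \in \mathcal{A}} \pi(a \mid s) \left[ R_s^a + \gamma \sum_{s' \in \mathcal{S}} P_{ss'}^a v_k(s') \right]$. Repeating this update until the state value function converges completes the evaluation of the policy $\pi$.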
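The iterative update described above can be sketched in Python. This is a minimal illustration, not code from the lecture: the dictionary layout for `P` and `R`, the function name `policy_evaluation`, the stopping tolerance, and the two-state MDP are all assumptions made for the example.

```python
def policy_evaluation(P, R, policy, gamma=0.9, tol=1e-8, max_iters=10_000):
    """Iterative policy evaluation with synchronous backups.

    P[s][a]   : list of (probability, next_state) pairs (assumed layout)
    R[s][a]   : expected immediate reward for taking action a in state s
    policy[s] : dict mapping action -> probability pi(a|s)
    """
    n_states = len(P)
    v = [0.0] * n_states  # v_0: arbitrary starting point
    for _ in range(max_iters):
        v_new = []
        for s in range(n_states):
            total = 0.0
            for a, pi_sa in policy[s].items():
                # Bellman expectation backup for one (s, a) pair
                expected_next = sum(p * v[s2] for p, s2 in P[s][a])
                total += pi_sa * (R[s][a] + gamma * expected_next)
            v_new.append(total)
        delta = max(abs(x - y) for x, y in zip(v, v_new))
        v = v_new
        if delta < tol:  # successive value functions have stopped changing
            break
    return v


# Hypothetical two-state MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [
    {0: [(1.0, 0)], 1: [(1.0, 1)]},  # transitions from state 0
    {0: [(1.0, 0)], 1: [(1.0, 1)]},  # transitions from state 1
]
R = [
    {0: 0.0, 1: 1.0},  # rewards in state 0
    {0: 0.0, 1: 2.0},  # rewards in state 1
]
uniform = [{0: 0.5, 1: 0.5}, {0: 0.5, 1: 0.5}]

v_pi = policy_evaluation(P, R, uniform, gamma=0.5)
print(v_pi)  # approximately [1.25, 1.75]
```

For this toy problem, solving the two linear Bellman equations by hand gives the same values, which is a useful sanity check on the loop.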

Policy Iteration:

1. Evaluate the policy: update the value function using policy evaluation.
2. Improve the policy: act greedily with respect to the value function from the previous step.
3. Repeat the two steps until the policy no longer changes. At that point the policy is the optimal policy $\pi_*$ and its value function is the optimal value function $v_*$.
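The evaluate-then-improve loop can be sketched as follows. This is a rough illustration under assumptions: the helper names (`q_value`, `evaluate`, `policy_iteration`), the data layout, and the two-state MDP are hypothetical, and policies are restricted to deterministic ones for brevity.

```python
def q_value(P, R, v, s, a, gamma):
    """One-step lookahead: R_s^a + gamma * sum_s' P_ss'^a v(s')."""
    return R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])


def evaluate(P, R, policy, gamma, tol=1e-10):
    """Evaluate a deterministic policy, where policy[s] is an action."""
    v = [0.0] * len(P)
    while True:
        v_new = [q_value(P, R, v, s, policy[s], gamma) for s in range(len(P))]
        if max(abs(x - y) for x, y in zip(v, v_new)) < tol:
            return v_new
        v = v_new


def policy_iteration(P, R, gamma=0.9):
    policy = [min(P[s]) for s in range(len(P))]  # arbitrary initial policy
    while True:
        v = evaluate(P, R, policy, gamma)            # step 1: evaluation
        new_policy = []
        for s in range(len(P)):                      # step 2: greedy improvement
            best_a = policy[s]
            best_q = q_value(P, R, v, s, best_a, gamma)
            for a in P[s]:
                q = q_value(P, R, v, s, a, gamma)
                if q > best_q + 1e-12:  # switch only on a strict improvement
                    best_a, best_q = a, q
            new_policy.append(best_a)
        if new_policy == policy:                     # step 3: stop when stable
            return policy, v
        policy = new_policy


# Hypothetical two-state MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [
    {0: [(1.0, 0)], 1: [(1.0, 1)]},
    {0: [(1.0, 0)], 1: [(1.0, 1)]},
]
R = [
    {0: 0.0, 1: 1.0},
    {0: 0.0, 1: 2.0},
]

pi_star, v_star = policy_iteration(P, R, gamma=0.5)
print(pi_star, v_star)  # policy [1, 1], values approximately [3.0, 4.0]
```

Keeping the current action unless another is strictly better is a design choice in this sketch: it avoids flipping between equally good actions because of floating-point noise.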

Value Iteration:

Each sweep updates the value function using the Bellman optimality equation. No explicit policy is maintained; the greedy choice is built directly into the max of the optimality equation:

$v_{k+1}(s) = \max_{a \in \mathcal{A}} \left[ R_s^a + \gamma \sum_{s' \in \mathcal{S}} P_{ss'}^a v_k(s') \right]$
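A minimal Python sketch of this update is shown below. As before, the data layout, the function names `backup` and `value_iteration`, the tolerance, and the two-state MDP are assumptions for illustration only. A greedy policy is read off once at the end, since none is kept during the sweeps.

```python
def backup(P, R, v, s, a, gamma):
    """R_s^a + gamma * sum_s' P_ss'^a v(s') for one state-action pair."""
    return R[s][a] + gamma * sum(p * v[s2] for p, s2 in P[s][a])


def value_iteration(P, R, gamma=0.9, tol=1e-10, max_iters=10_000):
    n_states = len(P)
    v = [0.0] * n_states
    for _ in range(max_iters):
        # Bellman optimality backup: take the max over actions in every state
        v_new = [max(backup(P, R, v, s, a, gamma) for a in P[s])
                 for s in range(n_states)]
        delta = max(abs(x - y) for x, y in zip(v, v_new))
        v = v_new
        if delta < tol:
            break
    # Extract a greedy policy from the final value function
    greedy = []
    for s in range(n_states):
        best_a = max(P[s], key=lambda a: backup(P, R, v, s, a, gamma))
        greedy.append(best_a)
    return v, greedy


# Hypothetical two-state MDP: action 0 moves to state 0, action 1 moves to state 1.
P = [
    {0: [(1.0, 0)], 1: [(1.0, 1)]},
    {0: [(1.0, 0)], 1: [(1.0, 1)]},
]
R = [
    {0: 0.0, 1: 1.0},
    {0: 0.0, 1: 2.0},
]

v_opt, pi_opt = value_iteration(P, R, gamma=0.5)
print(v_opt, pi_opt)  # values approximately [3.0, 4.0], policy [1, 1]
```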
