"RL Series": The Basic Structure of the SARSA Algorithm

Source: Internet
Author: User

Strictly speaking, SARSA is the on-policy form of TD(0) applied to estimating the state-action value function, so its basic architecture differs little from the on-policy TD algorithm for estimating $v_{\pi}$ and is not described separately here. Instead, this article applies SARSA to two simple examples in order to master and summarize its procedure and basic structure.
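For reference, the relationship between the two algorithms can be written out with the standard textbook update rules. The step size $\alpha$, the discount factor $\gamma$, and the time-indexed symbols $S_t$, $A_t$, $R_{t+1}$ are notation assumed here, not taken from the original article. TD(0) estimation of $v_{\pi}$ updates a state value:

$$V(S_t) \leftarrow V(S_t) + \alpha \left[ R_{t+1} + \gamma V(S_{t+1}) - V(S_t) \right]$$

SARSA applies the same rule to a state-action value, where $A_{t+1}$ is chosen by the same policy that is being improved:

$$Q(S_t, A_t) \leftarrow Q(S_t, A_t) + \alpha \left[ R_{t+1} + \gamma Q(S_{t+1}, A_{t+1}) - Q(S_t, A_t) \right]$$

The name SARSA comes from the quintuple $(S_t, A_t, R_{t+1}, S_{t+1}, A_{t+1})$ that each update consumes.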

The statistical methods of reinforcement learning (including Monte Carlo and TD), when applied to episodic tasks, all share the same basic two-level loop structure, without exception. If we think of each episode as a game, then every game has a beginning and an end, and these methods play game after game and summarize the optimal policy from that experience. The difference between Monte Carlo and TD is that Monte Carlo summarizes once after a game is finished, while TD summarizes as it plays. In this two-level structure, the outer loop iterates over the games (episodes), and the inner loop iterates over the steps within a single game.
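The two-level loop and the difference in when each method "summarizes" can be sketched as follows. This is a minimal illustration rather than the article's own code: the 5-state random-walk task, the function names `step`, `monte_carlo`, and `td0`, and the parameter values are all assumptions made for the example, and the Monte Carlo variant shown is the constant-step-size, every-visit form.

```python
import random

# Hypothetical toy task: a 5-state random walk. States 0 and 4 are terminal,
# and only the transition into state 4 yields reward 1.
N_STATES, START = 5, 2

def step(state, rng):
    """Move left or right with equal probability; return (next_state, reward, done)."""
    next_state = state + rng.choice((-1, 1))
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    done = next_state in (0, N_STATES - 1)
    return next_state, reward, done

def monte_carlo(n_episodes, alpha=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    v = [0.0] * N_STATES
    for _ in range(n_episodes):               # outer loop: one pass per game
        state, done, trajectory = START, False, []
        while not done:                       # inner loop: play the game
            next_state, reward, done = step(state, rng)
            trajectory.append((state, reward))
            state = next_state
        g = 0.0
        for s, r in reversed(trajectory):     # summarize only after the game ends
            g = r + gamma * g
            v[s] += alpha * (g - v[s])
    return v

def td0(n_episodes, alpha=0.1, gamma=1.0, seed=0):
    rng = random.Random(seed)
    v = [0.0] * N_STATES
    for _ in range(n_episodes):               # outer loop: one pass per game
        state, done = START, False
        while not done:                       # inner loop: play the game
            next_state, reward, done = step(state, rng)
            target = reward + (0.0 if done else gamma * v[next_state])
            v[state] += alpha * (target - v[state])   # summarize while playing
            state = next_state
    return v
```

Both functions share the same outer and inner loops; the only structural difference is whether the value update sits after the inner loop (Monte Carlo) or inside it (TD).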

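As the on-policy control algorithm in the TD family, SARSA only needs to update the action-value function and the policy while the game is being played. Its inner loop is therefore a refinement of the TD inner loop: at every step it takes the current action, observes the reward and the next state, selects the next action with the current policy, and updates $Q(S, A)$ before moving on.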
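A minimal sketch of that structure is given below, under stated assumptions: the corridor task, the $\epsilon$-greedy policy, the function names `step`, `epsilon_greedy`, and `sarsa`, and all parameter values are hypothetical choices for illustration, not the article's own examples.

```python
import random

# Hypothetical toy task: a corridor of 6 cells. The agent starts in cell 0,
# every move costs -1, and the episode ends when it reaches cell 5.
N_STATES, GOAL = 6, 5
ACTIONS = (-1, +1)  # action index 0 = left, 1 = right

def step(state, action):
    """Apply an action index; return (next_state, reward, done)."""
    next_state = min(max(state + ACTIONS[action], 0), GOAL)
    return next_state, -1.0, next_state == GOAL

def epsilon_greedy(q, state, epsilon, rng):
    """Act greedily with respect to q, exploring with probability epsilon."""
    if rng.random() < epsilon:
        return rng.randrange(len(ACTIONS))
    best = max(q[state])
    return rng.choice([a for a, value in enumerate(q[state]) if value == best])

def sarsa(n_episodes=500, alpha=0.5, gamma=1.0, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0] * len(ACTIONS) for _ in range(N_STATES)]
    for _ in range(n_episodes):                           # outer loop: episodes
        state = 0
        action = epsilon_greedy(q, state, epsilon, rng)   # choose A from S
        done = False
        while not done:                                   # inner loop: steps
            next_state, reward, done = step(state, action)             # observe R, S'
            next_action = epsilon_greedy(q, next_state, epsilon, rng)  # choose A' from S'
            target = reward + (0.0 if done else gamma * q[next_state][next_action])
            q[state][action] += alpha * (target - q[state][action])    # update Q(S, A)
            state, action = next_state, next_action       # S <- S', A <- A'
    return q
```

Because the policy is derived from `q` itself, every update to the table also changes the policy that chooses the next action, which is what makes the method on-policy. Calling `sarsa()` returns the learned table, from which a greedy policy can be read off by taking the highest-valued action in each state.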
