Reinforcement Learning: The K-Armed Bandit

Source: Internet
Author: User

Exploration and exploitation
In a reinforcement learning task, the final reward is usually observed only after a multi-step sequence of actions, so let's start with the simplest scenario: maximizing the one-step reward, that is, considering a single action only. Even in this simplified case, reinforcement learning differs significantly from supervised learning, because the machine has to try actions to discover what each one yields, and no training data tells the machine which action to take. In short: there are no labels.

To maximize the one-step reward, two things are needed: first, knowing the reward each action brings; second, performing the action with the largest reward.

In fact, the one-step reinforcement learning task corresponds to a theoretical model known as the "K-armed bandit". A K-armed bandit is a slot machine with K arms: the gambler inserts a coin and chooses one arm to pull, and each arm pays out coins with a certain probability that the gambler does not know. The gambler's goal is to find a strategy that yields the greatest payoff for the same cost.
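The model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KArmedBandit`, the payout of exactly one coin per win, and the example probabilities are all hypothetical and do not come from the original article.

```python
import random


class KArmedBandit:
    """Hypothetical model: arm i pays one coin with hidden probability probs[i]."""

    def __init__(self, probs, seed=None):
        self._probs = list(probs)        # unknown to the gambler
        self._rng = random.Random(seed)  # private generator for repeatable runs
        self.k = len(self._probs)        # number of arms

    def pull(self, arm):
        """Spend one coin on `arm`; return 1 if it pays out, otherwise 0."""
        return 1 if self._rng.random() < self._probs[arm] else 0


if __name__ == "__main__":
    # Example probabilities are made up for illustration.
    bandit = KArmedBandit([0.2, 0.5, 0.8], seed=0)
    payouts = [bandit.pull(1) for _ in range(10)]
    print(sum(payouts), "coins won from 10 pulls of arm 1")
```

The gambler only ever sees the return value of `pull`, never the probabilities themselves, which is what makes the choice of arm a learning problem.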

Suppose the gambler has 100 coins to spend. Two extreme options are available. One is "exploration-only": spread the 100 coins evenly over the 5 arms, record how many coins each arm pays out in total, and use that to find out which arm is best. The other is "exploitation-only": put every coin into the arm whose average reward so far is the highest (if several arms are tied for best, pick one of them at random). Obviously, both are flawed. Exploration-only estimates each arm well but wastes many coins on arms that are not the best, while exploitation-only may lock onto an arm before its estimates are reliable and never find the best one. Getting the best average reward requires a balance between the two.
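The two extreme strategies can be sketched as follows. This is a minimal illustration, not the article's own code: the function names, the round-robin way of spreading coins, the one-coin payout, and the example probabilities are assumptions made for the sketch.

```python
import random


def pull(prob, rng):
    """One pull of an arm that pays one coin with probability `prob`."""
    return 1 if rng.random() < prob else 0


def explore_only(probs, coins, rng):
    """Spread the coins evenly over all arms (round-robin); return total payout."""
    total = 0
    for t in range(coins):
        total += pull(probs[t % len(probs)], rng)
    return total


def exploit_only(probs, coins, rng):
    """Always pull the arm with the best average so far; ties are broken at random."""
    k = len(probs)
    counts = [0] * k    # pulls per arm
    means = [0.0] * k   # running average reward per arm
    total = 0
    for _ in range(coins):
        best = max(means)
        arm = rng.choice([i for i in range(k) if means[i] == best])
        reward = pull(probs[arm], rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean
        total += reward
    return total


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical hidden payout probabilities
    print("exploration-only:", explore_only(probs, 100, rng))
    print("exploitation-only:", exploit_only(probs, 100, rng))
```

In this sketch `exploit_only` stays with the first arm that happens to pay out, because that arm's average becomes positive while the others remain at zero. That illustrates why pure exploitation can miss the best arm.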
Two commonly used algorithms for striking this balance are ε-greedy and Softmax.
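Since the article names these two algorithms without describing them, here is a hedged sketch of their usual textbook forms. With probability ε, ε-greedy pulls a random arm and otherwise pulls the arm with the best average so far. Softmax picks each arm with probability proportional to exp(average reward / τ). The function names, the parameters `epsilon` and `tau`, and the example values are assumptions for illustration only.

```python
import math
import random


def _pull(prob, rng):
    """One pull of an arm that pays one coin with probability `prob`."""
    return 1 if rng.random() < prob else 0


def _update(means, counts, arm, reward):
    """Update the running average reward of `arm` in place."""
    counts[arm] += 1
    means[arm] += (reward - means[arm]) / counts[arm]


def epsilon_greedy(probs, coins, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best average."""
    k = len(probs)
    means, counts, total = [0.0] * k, [0] * k, 0
    for _ in range(coins):
        if rng.random() < epsilon:
            arm = rng.randrange(k)  # explore: uniform random arm
        else:
            best = max(means)       # exploit: best arm so far, random tie-break
            arm = rng.choice([i for i in range(k) if means[i] == best])
        reward = _pull(probs[arm], rng)
        _update(means, counts, arm, reward)
        total += reward
    return total, counts


def softmax(probs, coins, tau, rng):
    """Pick arms with probability proportional to exp(mean / tau)."""
    k = len(probs)
    means, counts, total = [0.0] * k, [0] * k, 0
    for _ in range(coins):
        weights = [math.exp(m / tau) for m in means]
        arm = rng.choices(range(k), weights=weights)[0]
        reward = _pull(probs[arm], rng)
        _update(means, counts, arm, reward)
        total += reward
    return total, counts


if __name__ == "__main__":
    rng = random.Random(0)
    probs = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical hidden payout probabilities
    print("epsilon-greedy:", epsilon_greedy(probs, 100, 0.1, rng))
    print("softmax:", softmax(probs, 100, 0.1, rng))
```

A small `epsilon` or a small `tau` leans toward exploitation, while larger values lean toward exploration. Suitable values depend on how many coins are available and how different the arms are.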
