Reinforcement Learning: The K-Armed Bandit

Last Update: 2016-05-31 Source: Internet

Author: User

Exploration and exploitation
In a reinforcement learning task, the final reward is generally observed only after a multi-step sequence of actions. Let us start with the simplest scenario: maximizing the reward of a single step, that is, a task with only one action. Even in this case, reinforcement learning differs significantly from supervised learning, because the machine has to discover the outcome of each action by trying it; no training data tells the machine which action to take. In short, there are no labels.

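To maximize the single-step reward, two things are needed: knowing the reward that each action yields, and performing the action with the largest reward. If every action's reward were a fixed value, trying each action once would settle the question. In general, though, the reward is random, so a single try does not reveal an action's average reward.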

The single-step reinforcement learning task corresponds to a theoretical model known as the "K-armed bandit". A K-armed bandit is a slot machine with K arms: the gambler inserts a coin and chooses an arm to pull, and each arm pays out coins with a certain probability that the gambler does not know. The gambler's goal is to find a strategy that yields the largest total payout for a given number of coins.
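
The model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class name `KArmedBandit`, the payout probabilities, and the rule that a winning pull pays exactly one coin are all hypothetical and do not come from the original article.

```python
import random


class KArmedBandit:
    """A slot machine with K arms (illustrative sketch).

    Assumption: arm i pays out exactly one coin with probability probs[i]
    and nothing otherwise. The gambler is not supposed to read probs.
    """

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.k = len(self.probs)
        self._rng = random.Random(seed)

    def pull(self, arm):
        """Insert one coin, pull the given arm, and return the payout (0 or 1)."""
        return 1 if self._rng.random() < self.probs[arm] else 0


# Hypothetical machine with 3 arms; the third arm is the best one.
bandit = KArmedBandit([0.2, 0.5, 0.8], seed=0)
payouts = [bandit.pull(2) for _ in range(1000)]
print(sum(payouts) / len(payouts))  # empirical payout rate of arm 2
```

The gambler only ever sees the return values of `pull`, so the payout rate of an arm has to be estimated from repeated pulls, each of which costs a coin.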

Suppose the gambler has 100 coins to spend. There are two extreme strategies. The first is "exploration-only": split the 100 coins evenly among the arms (say, 5 of them) and use the payouts observed on each arm to estimate which arm is best. The second is "exploitation-only": put every coin into the arm with the best average payout so far (if several arms are tied for best, pick one of them at random). Both are clearly flawed. Exploration-only estimates every arm well but spends most of the coins on inferior arms, while exploitation-only may never give the truly best arm enough tries to be recognized. Getting the best overall reward requires a balance between the two.
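
The two extreme strategies can be compared with a small simulation. The sketch below assumes that a winning pull pays one coin and uses hypothetical payout probabilities for 5 arms; the function names `explore_only` and `exploit_only` are made up for this illustration.

```python
import random


def pull(probs, arm, rng):
    """Assumed payout rule: one coin with probability probs[arm], else nothing."""
    return 1 if rng.random() < probs[arm] else 0


def explore_only(probs, coins, rng):
    """Spread the coins evenly over the arms (round-robin) and return the total payout."""
    k = len(probs)
    return sum(pull(probs, t % k, rng) for t in range(coins))


def exploit_only(probs, coins, rng):
    """Always pull the arm with the best average payout so far; ties are broken at random."""
    k = len(probs)
    counts = [0] * k    # number of pulls per arm
    means = [0.0] * k   # average payout per arm so far
    total = 0
    for _ in range(coins):
        best = max(means)
        arm = rng.choice([i for i in range(k) if means[i] == best])
        reward = pull(probs, arm, rng)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental average
        total += reward
    return total


rng = random.Random(0)
probs = [0.1, 0.3, 0.5, 0.7, 0.9]  # hypothetical, unknown to the gambler
print(explore_only(probs, 100, rng), exploit_only(probs, 100, rng))
```

In this sketch, exploitation-only tends to lock onto the first arm that happens to pay out, because that arm's average becomes positive while every untried arm stays at zero. Exploration-only, in turn, collects only the average payout over all arms.
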
Two algorithms are then introduced to strike this balance: ε-greedy and softmax.
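
The article names these two algorithms without giving details, so the following is a minimal sketch of their common textbook forms, not necessarily the exact variants the author had in mind. With ε-greedy, a random arm is tried with probability `epsilon` and the arm with the best average so far is pulled otherwise. With softmax, arms are drawn with probability proportional to exp(average / tau). The parameter values, the payout probabilities, and the one-coin payout rule are assumptions made for illustration.

```python
import math
import random


def epsilon_greedy_choice(means, epsilon, rng):
    """With probability epsilon explore a random arm, otherwise exploit the best one."""
    if rng.random() < epsilon:
        return rng.randrange(len(means))
    best = max(means)
    return rng.choice([i for i, m in enumerate(means) if m == best])


def softmax_choice(means, tau, rng):
    """Draw an arm with probability proportional to exp(mean / tau)."""
    top = max(means)  # subtracting the maximum avoids overflow in exp
    weights = [math.exp((m - top) / tau) for m in means]
    return rng.choices(range(len(means)), weights=weights)[0]


def run(probs, coins, choose, rng):
    """Spend the coins one by one with the given selection rule; return the total payout."""
    k = len(probs)
    counts = [0] * k
    means = [0.0] * k
    total = 0
    for _ in range(coins):
        arm = choose(means, rng)
        reward = 1 if rng.random() < probs[arm] else 0  # assumed one-coin payout
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental average
        total += reward
    return total


rng = random.Random(0)
probs = [0.2, 0.5, 0.8]  # hypothetical, unknown to the gambler
eg = run(probs, 100, lambda m, g: epsilon_greedy_choice(m, 0.1, g), rng)
sm = run(probs, 100, lambda m, g: softmax_choice(m, 0.1, g), rng)
print(eg, sm)
```

A small `epsilon` or `tau` leans toward exploitation, and a large one leans toward exploration; suitable values depend on how noisy the payouts are and how many coins are available.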

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
