Reinforcement learning: the ε-greedy algorithm and the Softmax algorithm

Source: Internet
Author: User

(i) ε-greedy algorithm

The ε-greedy algorithm uses a probability ε to trade off exploration against exploitation. On each attempt it explores with probability ε, that is, it selects an arm uniformly at random, and it exploits with probability 1-ε, that is, it selects the arm with the highest current average reward (if several arms tie, one of them is chosen at random).

Here, lowercase k denotes the k-th arm, uppercase K the total number of arms, n the number of attempts, and v_n the reward of the n-th attempt.

The intuitive meaning of Q_{n-1}(k) is the average reward of arm k over the previous n-1 attempts. Multiplied by n-1, it gives the total reward of those n-1 attempts. Adding the n-th reward v_n and dividing by n gives Q_n(k), the average reward over n attempts:

Q_n(k) = ((n-1) * Q_{n-1}(k) + v_n) / n
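The running-average update described above can be checked with a few lines of Python. This is only an illustrative sketch: the function name update_average and the sample rewards are assumptions made for the example, not part of the original text.

```python
def update_average(q_prev, n, v_n):
    """Average over n rewards, given the average q_prev over the first n-1."""
    # (n - 1) * q_prev is the total reward of the first n-1 attempts.
    return ((n - 1) * q_prev + v_n) / n


# Hypothetical rewards observed on one arm.
rewards = [1.0, 0.0, 1.0, 1.0]

q = 0.0
for n, v in enumerate(rewards, start=1):
    q = update_average(q, n, v)

# q now matches the plain average of all rewards seen so far.
print(q, sum(rewards) / len(rewards))
```

The point of the incremental form is that only the previous average and the attempt count need to be stored for each arm, not the full reward history.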

In the algorithm, argmax selects the arm i with the highest Q(i). count(k) starts from 0, so count(k) + 1 equals n at the time of the update, and the updated Q(k) is the average reward of arm k over n attempts.
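The procedure described above might look roughly like the following in Python. This is a sketch under stated assumptions rather than the original listing: the function names, the pull callback, and the two-armed Bernoulli bandit with payout probabilities 0.4 and 0.2 are all hypothetical choices made for illustration.

```python
import random


def epsilon_greedy(pull, num_arms, num_steps, epsilon, rng=None):
    """Run epsilon-greedy; pull(k) is assumed to return the reward of arm k."""
    if rng is None:
        rng = random.Random()
    q = [0.0] * num_arms    # Q(k): average reward of arm k so far
    count = [0] * num_arms  # count(k): number of times arm k was pulled
    total = 0.0
    for _ in range(num_steps):
        if rng.random() < epsilon:
            # Explore: pick an arm uniformly at random.
            k = rng.randrange(num_arms)
        else:
            # Exploit: argmax over Q, breaking ties at random.
            best = max(q)
            k = rng.choice([i for i, value in enumerate(q) if value == best])
        v = pull(k)
        total += v
        # Incremental average: count[k] + 1 plays the role of n.
        q[k] = (q[k] * count[k] + v) / (count[k] + 1)
        count[k] += 1
    return total, q, count


def make_bernoulli_bandit(probs, rng):
    """Hypothetical test bandit: arm k pays 1 with probability probs[k]."""
    def pull(k):
        return 1.0 if rng.random() < probs[k] else 0.0
    return pull


rng = random.Random(0)
pull = make_bernoulli_bandit([0.4, 0.2], rng)
total, q, count = epsilon_greedy(pull, num_arms=2, num_steps=1000,
                                 epsilon=0.1, rng=rng)
print(total, q, count)
```

Setting epsilon to 0 gives pure exploitation and setting it to 1 gives pure exploration, so the single parameter covers both extremes.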

(ii) Softmax algorithm

The Softmax algorithm trades off exploration against exploitation based on the currently known average reward of each arm. If the average rewards of all arms are equal, every arm is selected with equal probability. If the average reward of some arms is significantly higher than that of the others, those arms are selected with significantly higher probability.

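In the ε-greedy algorithm, the value of ε is chosen by the user. In the Softmax algorithm, the probability of selecting each arm follows the Boltzmann distribution:

P(k) = exp(Q(k)/τ) / (exp(Q(1)/τ) + ... + exp(Q(K)/τ))

where Q(k) is the current average reward of arm k and τ > 0 is called the temperature.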
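A minimal sketch of Softmax arm selection follows, assuming the standard Boltzmann form in which arm k is chosen with probability proportional to exp(Q(k)/tau) for a temperature tau > 0. The names softmax_probabilities and softmax_select and the sample Q values are illustrative. Subtracting the maximum before exponentiating is a numerical-stability choice made here, not something the text prescribes.

```python
import math
import random


def softmax_probabilities(q, tau):
    """Boltzmann selection probabilities for average rewards q."""
    # Shifting by max(q) leaves the probabilities unchanged
    # and avoids overflow when tau is small.
    m = max(q)
    weights = [math.exp((value - m) / tau) for value in q]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_select(q, tau, rng):
    """Sample one arm index according to the Boltzmann probabilities."""
    probs = softmax_probabilities(q, tau)
    return rng.choices(range(len(q)), weights=probs, k=1)[0]


rng = random.Random(0)
# Equal average rewards: every arm gets the same probability.
equal = softmax_probabilities([0.3, 0.3, 0.3], tau=0.1)
# Unequal average rewards: the better arm gets the larger share.
skewed = softmax_probabilities([0.4, 0.2], tau=0.1)
arm = softmax_select([0.4, 0.2], tau=0.1, rng=rng)
print(equal, skewed, arm)
```

To turn this into a full bandit loop, the chosen arm would be pulled and Q updated with the same running-average rule used for ε-greedy.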

Is the use of the Boltzmann distribution not apparent from the algorithm?

Which of the two algorithms to choose depends on the actual application. For Softmax, when the temperature τ = 0.01, the curve almost coincides with the "exploitation only" curve.
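A short calculation suggests why such a low temperature behaves like pure exploitation. It assumes the standard Boltzmann form, P(k) proportional to exp(Q(k)/τ), and uses hypothetical average rewards Q(1) = 0.4 and Q(2) = 0.2 chosen only for illustration:

```latex
\frac{P(2)}{P(1)}
  = \exp\!\left(\frac{Q(2) - Q(1)}{\tau}\right)
  = \exp\!\left(\frac{0.2 - 0.4}{0.01}\right)
  = e^{-20}
  \approx 2.1 \times 10^{-9}
```

Under these assumptions the weaker arm is almost never selected, so Softmax picks the arm with the highest average reward nearly every time. As τ grows, the ratio approaches 1 and the selection approaches uniform exploration.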
