Go game program implementation and thinking (2): exploration and exploitation

Last Update: 2018-12-07  Source: Internet

Author: User

The previous article introduced the Monte Carlo position evaluation algorithm. How can this algorithm be used to build a Go-playing program? The most obvious idea is, for a given position, to use Monte Carlo evaluation to assess the position after each possible move and then pick the best move. This works, but can it be optimized?

Suppose you are a CPU: you know the rules of Go but none of the higher-level Go knowledge, and all you can do is simulate random games. Facing such a huge board, how do you choose a move? Do you really simulate 10,000 evaluations for every point, one by one, and then compare the scores to pick the most favorable one...
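The naive approach described above can be sketched as follows. A real Go playout engine is far beyond a short example, so this sketch assumes a hypothetical `random_playout` that simply flips a coin with a hidden, made-up win rate per candidate move; `flat_monte_carlo` and the move labels are likewise illustrative names, not the author's code. Every candidate gets exactly the same number of playouts, no matter how hopeless it looks.

```python
import random


def random_playout(win_rate, rng):
    """Hypothetical stand-in for one random game played after a move.

    A real program would play random legal moves to the end of the game and
    score the board; here the outcome is just a coin flip with a hidden win rate.
    """
    return 1 if rng.random() < win_rate else 0


def flat_monte_carlo(candidate_moves, playouts_per_move, rng):
    """Give every candidate move the same number of playouts; pick the best average.

    candidate_moves maps a move label to the hidden win rate used by random_playout.
    """
    best_move, best_score = None, -1.0
    for move, win_rate in candidate_moves.items():
        wins = sum(random_playout(win_rate, rng) for _ in range(playouts_per_move))
        score = wins / playouts_per_move
        if score > best_score:
            best_move, best_score = move, score
    return best_move, best_score


# Usage with made-up win rates for three candidate moves.
best, score = flat_monte_carlo({"a": 0.2, "b": 0.8, "c": 0.5}, 2000, random.Random(0))

# Cost on an empty 19x19 board at 10,000 playouts per point:
total_playouts = 19 * 19 * 10_000  # 3,610,000 games for one move decision
```

The last line shows why this is expensive: at 10,000 playouts per point, a single decision on an empty 19x19 board costs 3,610,000 random games, many of them spent confirming that bad moves are bad.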

Let's set Go aside for a moment and introduce the "multi-armed bandit problem". It is described as follows: a multi-armed bandit can be thought of as a row of slot machines in a casino. Each slot machine has an unknown rate of return, and the rates of different machines are independent of each other. Given a limited number of attempts, how do you get the highest total return from these machines? This is a classic model of the trade-off between exploration and exploitation in machine learning, and it has been studied carefully in statistics.
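As a minimal sketch of the model, assuming each machine pays either 1 or 0 (a Bernoulli payout, which the original text does not specify), a bandit can be represented like this. `BernoulliBandit` and its methods are hypothetical names chosen for illustration.

```python
import random


class BernoulliBandit:
    """A row of slot machines; machine j pays 1 with probability probs[j], else 0.

    The payout probabilities are hidden from the player and independent of one
    another. Bernoulli (win/lose) payouts are an assumption for illustration.
    """

    def __init__(self, probs, seed=None):
        self._probs = list(probs)
        self._rng = random.Random(seed)
        self.pulls = 0  # attempts used so far out of the limited budget

    @property
    def n_arms(self):
        return len(self._probs)

    def pull(self, j):
        """Play machine j once and return its payout (0 or 1)."""
        self.pulls += 1
        return 1 if self._rng.random() < self._probs[j] else 0

    def best_expected_total(self, budget):
        """Expected return if you already knew the best machine and only played it."""
        return budget * max(self._probs)


bandit = BernoulliBandit([0.1, 0.6, 0.3], seed=1)
first_payouts = [bandit.pull(j) for j in range(bandit.n_arms)]
```

A strategy only sees the payouts returned by `pull`; its goal is to get as close as possible to `best_expected_total(budget)` without ever seeing the hidden probabilities.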

After trying some machines, we naturally want to keep playing the ones that have paid well. However, this easily confines us to past experience instead of exploring further, and we may miss machines with even higher returns. So we should also try the machines that have been tried fewer times, to get more accurate information about them.
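The trap of pure exploitation is easy to reproduce. The sketch below assumes a purely greedy player and uses hypothetical scripted payout streams (via `itertools.cycle`) instead of random machines, so the outcome is deterministic: the better machine simply has an unlucky first pull.

```python
from itertools import cycle


def greedy(payout_streams, budget):
    """Pure exploitation: try each machine once, then always play the best average.

    payout_streams[j] is an iterator over machine j's payouts; the scripted
    streams used below are hypothetical stand-ins for random slot machines.
    Returns how many times each machine was played and the total payout.
    """
    n_arms = len(payout_streams)
    totals = [0.0] * n_arms
    counts = [0] * n_arms
    for t in range(budget):
        if t < n_arms:
            j = t  # initial pass: one try per machine
        else:
            # Always exploit: pick the machine with the highest average so far.
            j = max(range(n_arms), key=lambda k: totals[k] / counts[k])
        totals[j] += next(payout_streams[j])
        counts[j] += 1
    return counts, sum(totals)


# Machine 0 always pays 0.4; machine 1 averages 0.75 but its first payout is 0.
counts, total = greedy([cycle([0.4]), cycle([0, 1, 1, 1])], 100)
```

Here `counts` comes out as `[99, 1]`: after its first payout of 0, machine 1's average sits below machine 0's 0.4, so the greedy player never returns to it, although it averages 0.75 per pull.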

The UCB (Upper Confidence Bound) algorithm tries to strike a balance between these two kinds of action. It takes a machine's current average reward as the base value and adds an adjustment value to it; the sum is the machine's UCB value. Each time, you play the machine with the largest UCB value, and a machine's adjustment value shrinks as the number of attempts on that machine grows. The UCB value is calculated as follows:

$$\mathrm{UCB}_j = \bar{X}_j + \sqrt{\frac{2 \ln n}{T_j(n)}}$$

Here $\bar{X}_j$ is the current average reward of machine j, n is the total number of attempts on all machines, and $T_j(n)$ is the number of attempts on machine j.

The term to the right of the plus sign is the UCB adjustment value. It is easy to see that the fewer times a machine has been tried, the larger its adjustment value, and the more likely it is to be tried next.
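A minimal sketch of the UCB1 rule above, assuming Bernoulli machines with made-up payout probabilities. `ucb_value` computes the average reward plus the adjustment term; `ucb1` tries every machine once first (the formula is undefined while a machine's attempt count is 0) and then always plays the machine with the largest UCB value. Both function names are illustrative.

```python
import math
import random


def ucb_value(mean_reward, n_total, n_j):
    """UCB1 value: average reward plus the adjustment sqrt(2 ln n / T_j(n))."""
    return mean_reward + math.sqrt(2.0 * math.log(n_total) / n_j)


def ucb1(probs, budget, seed=0):
    """Play a Bernoulli bandit with UCB1; return how many times each machine was tried.

    probs are hidden payout probabilities, made up for illustration.
    """
    rng = random.Random(seed)
    k = len(probs)
    totals = [0.0] * k
    counts = [0] * k
    for t in range(budget):
        if t < k:
            j = t  # try every machine once so that T_j(n) > 0
        else:
            # t attempts have been made so far on all machines, so n = t here.
            j = max(range(k), key=lambda a: ucb_value(totals[a] / counts[a], t, counts[a]))
        totals[j] += 1 if rng.random() < probs[j] else 0
        counts[j] += 1
    return counts


counts = ucb1([0.2, 0.5, 0.8], 5000)
```

The adjustment term is what drives exploration: with n = 100, a machine tried only 4 times gets an adjustment of about 1.52, while one tried 50 times gets about 0.43. In this run most of the 5,000 attempts end up on the 0.8 machine.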

As for how this formula was derived, that is beyond my understanding... As an engineer, I am happy to just use the results the scientists have worked out.

Now back to Go and to the earlier assumption: you are a CPU. How should you choose a move? The UCB algorithm is certainly a good answer! Treat each candidate point on the current board as a slot machine. Each time, run a Monte Carlo evaluation on the move with the largest UCB value; this way you can quickly find reliable moves.
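Applied to Go, each candidate move becomes a slot machine and each playout a pull. The sketch below assumes a hypothetical `playout(move)` callback returning 1 for a win and 0 for a loss, simulated here with made-up hidden win rates for four illustrative points. Returning the most-visited move at the end is an assumption of this sketch; choosing the best average win rate is another common option.

```python
import math
import random


def choose_move_ucb(moves, budget, playout):
    """Treat each candidate move as a slot machine and spend playouts with UCB1.

    playout(move) is a hypothetical callable that plays one random game after
    `move` and returns 1 for a win, 0 for a loss. Returns the most-visited move
    and the visit counts per move.
    """
    wins = {m: 0 for m in moves}
    visits = {m: 0 for m in moves}
    for t in range(budget):
        unvisited = [m for m in moves if visits[m] == 0]
        if unvisited:
            move = unvisited[0]  # evaluate every move once first
        else:
            # t playouts done so far in total; pick the largest UCB value.
            move = max(moves, key=lambda m: wins[m] / visits[m]
                       + math.sqrt(2.0 * math.log(t) / visits[m]))
        wins[move] += playout(move)
        visits[move] += 1
    best = max(moves, key=lambda m: visits[m])
    return best, visits


# Made-up hidden win rates for four candidate points on the board.
hidden = {"D4": 0.40, "Q16": 0.50, "K10": 0.75, "A1": 0.10}
rng = random.Random(42)
best, visits = choose_move_ucb(list(hidden), 3000,
                               lambda m: 1 if rng.random() < hidden[m] else 0)
```

Unlike the flat approach, which would split the 3,000 playouts evenly, most of them here go to the strongest move, while clearly weak moves such as the made-up `A1` are sampled only occasionally.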

Wait: is there repeated computation here? Can this be optimized further?

This article is an English version of an article originally published in Chinese on aliyun.com and is provided for information purposes only.
