Go program implementation and thoughts (2) -- exploration and exploitation

Source: Internet
Author: User

The previous article introduced the Monte Carlo position evaluation algorithm. How can this algorithm be used to build a Go program? The most obvious approach is to take a given position, use Monte Carlo evaluation to assess the position after each candidate move, and select the best move. This works, but can it be optimized?

Suppose you are a CPU. You know the rules of Go but none of the higher-level Go knowledge; all you can do is simulate random games. Faced with a huge board, how would you choose a move? You would run 10,000 simulated evaluations for each candidate move, one by one, and finally compare the scores and pick the most favorable move...
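
The brute-force approach just described can be sketched as follows. This is a minimal illustration, not the author's code: a real program would play out random games on a Go board, whereas here the hypothetical `random_playout` simply returns a win (1) or a loss (0) using a hidden, made-up win probability per move (`HIDDEN_WIN_RATE` is also hypothetical), so the selection logic can run on its own.

```python
import random

# Hypothetical stand-in for a real Go playout: a real program would play the
# move, finish the game with random moves and score the board. Here each
# candidate move simply has a hidden win probability.
HIDDEN_WIN_RATE = {"D4": 0.45, "Q16": 0.60, "K10": 0.40, "A1": 0.10}


def random_playout(move, rng):
    """Return 1 if the simulated game after `move` is won, else 0."""
    return 1 if rng.random() < HIDDEN_WIN_RATE[move] else 0


def flat_monte_carlo(moves, playouts_per_move, rng):
    """Give every move the same number of playouts and pick the best win rate."""
    win_rates = {}
    for move in moves:
        wins = sum(random_playout(move, rng) for _ in range(playouts_per_move))
        win_rates[move] = wins / playouts_per_move
    best = max(win_rates, key=win_rates.get)
    return best, win_rates


rng = random.Random(0)
best, rates = flat_monte_carlo(list(HIDDEN_WIN_RATE), 10_000, rng)
print(best, rates)
```

Note the cost: every move gets the same budget, including a move like the hypothetical `A1` that is clearly bad after only a few hundred playouts.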

Let's set Go aside for a moment and introduce the "multi-armed bandit problem". It is described as follows: a multi-armed bandit can be pictured as a row of slot machines in a casino. Each machine has an unknown payout rate, and the payout rates of different machines are independent of each other. Given a limited number of plays, how do you get the highest total return from these machines? This is a classic model of the trade-off between exploration and exploitation in machine learning, and it has been studied carefully in statistics.

After trying some of the machines, the natural thing is to keep playing the ones that have paid well so far. However, this easily traps us in past experience instead of exploring further, and we may miss machines with even higher payouts. So we should also try machines that have been played fewer times, to get more accurate information about them.
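
To make the risk of pure exploitation concrete, here is a minimal sketch (an illustration, not from the original article) of an exploit-only strategy: play each machine once, then always play the one with the highest average so far. The scripted machines are hypothetical, chosen so that the better machine happens to pay nothing on its first play.

```python
from typing import Callable, List


def greedy(machines: List[Callable[[], float]], budget: int) -> List[int]:
    """Exploit-only strategy; returns how many times each machine was played."""
    k = len(machines)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(budget):
        if t < k:
            j = t  # play every machine once
        else:
            # always play the machine with the best average so far
            j = max(range(k), key=lambda i: totals[i] / counts[i])
        totals[j] += machines[j]()
        counts[j] += 1
    return counts


def make_better_machine() -> Callable[[], float]:
    """Hypothetical machine: pays 0 on its first play and 1 on every later play."""
    plays = 0

    def play() -> float:
        nonlocal plays
        plays += 1
        return 0.0 if plays == 1 else 1.0

    return play


def worse_machine() -> float:
    """Hypothetical machine that always pays 0.5."""
    return 0.5


counts = greedy([make_better_machine(), worse_machine], budget=1000)
print(counts)  # [1, 999]: the better machine is never tried again
```

One unlucky first play is enough: the better machine's average is stuck at 0, so the greedy rule never returns to it.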

The UCB algorithm tries to strike a balance between these two actions. For each machine it takes the machine's current average reward as the base value and adds an adjustment to it; the sum is that machine's UCB value. Each round you play the machine with the largest UCB value, and a machine's adjustment shrinks as the number of times it has been played grows. The UCB value is calculated as follows:

$$\bar{X}_j + \sqrt{\frac{2\ln n}{T_j(n)}}$$

Here $\bar{X}_j$ is the current average reward of machine j, $n$ is the total number of plays across all machines, and $T_j(n)$ is the number of times machine j has been played.
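
The formula translates directly into code. The sketch below is an illustration under stated assumptions: the machines are Bernoulli (each pays 1 with a hidden probability, else 0), every machine is played once before the formula is used (the formula divides by $T_j(n)$), $n$ is taken as the number of plays made so far, and the names `ucb_score` and `run_ucb` are hypothetical.

```python
import math
import random


def ucb_score(mean: float, n: int, t_j: int) -> float:
    """UCB value: average reward plus the adjustment sqrt(2 ln n / T_j(n))."""
    return mean + math.sqrt(2.0 * math.log(n) / t_j)


def run_ucb(payout_rates, budget, rng):
    """Play `budget` rounds on Bernoulli machines; return the play counts."""
    k = len(payout_rates)
    counts = [0] * k
    totals = [0.0] * k
    for t in range(budget):
        if t < k:
            j = t  # play each machine once so that T_j(n) > 0
        else:
            n = t  # plays made so far across all machines
            j = max(range(k),
                    key=lambda i: ucb_score(totals[i] / counts[i], n, counts[i]))
        reward = 1.0 if rng.random() < payout_rates[j] else 0.0
        totals[j] += reward
        counts[j] += 1
    return counts


rng = random.Random(1)
counts = run_ucb([0.2, 0.5, 0.7], budget=5000, rng=rng)
print(counts)  # most plays go to the 0.7 machine
```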

The term to the right of the plus sign is the UCB adjustment. It is easy to see that the fewer times a machine has been played, the larger its adjustment value, and so the more likely it is to be tried next.
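
To see this numerically, the sketch below evaluates only the adjustment term for a fixed total of n = 1000 plays and several values of $T_j(n)$. The function name `adjustment` is a hypothetical label for illustration.

```python
import math


def adjustment(n: int, t_j: int) -> float:
    """The UCB adjustment term sqrt(2 ln n / T_j(n))."""
    return math.sqrt(2.0 * math.log(n) / t_j)


n = 1000  # total plays across all machines
for t_j in (1, 10, 100, 1000):
    print(t_j, round(adjustment(n, t_j), 3))
```

It prints approximately 3.717, 1.175, 0.372 and 0.118: the fewer times a machine has been played, the larger its bonus. Because $\ln n$ keeps growing, the bonus of a machine that is left alone slowly creeps back up, so it will eventually be tried again.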

As for how this formula was derived, that is beyond my understanding... As an engineer, I can simply use the scientists' results.

Back to Go, and back to the earlier scenario: you are a CPU, so how do you choose a move? UCB is certainly a good answer! Treat each candidate move in the current position as a slot machine. Each time, run a Monte Carlo evaluation on the move with the largest UCB value; this way you can quickly find a reliable move.
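
Putting the pieces together, here is a minimal sketch of UCB-driven move selection. It is an assumption-laden illustration, not the author's implementation: `random_playout` and `HIDDEN_WIN_RATE` are hypothetical stand-ins for real random games on a Go board, and reporting the move with the most playouts as the final choice is one common convention (picking the highest win rate is another).

```python
import math
import random

# Hypothetical stand-in for a real Go playout: each move has a hidden win rate.
HIDDEN_WIN_RATE = {"D4": 0.45, "Q16": 0.60, "K10": 0.40, "A1": 0.10}


def random_playout(move, rng):
    """Return 1 if a simulated random game after `move` is won, else 0."""
    return 1 if rng.random() < HIDDEN_WIN_RATE[move] else 0


def ucb_choose_move(moves, budget, rng):
    """Spend `budget` playouts, each on the move with the largest UCB value."""
    visits = {m: 0 for m in moves}
    wins = {m: 0 for m in moves}
    for n in range(budget):  # n = playouts made so far
        untried = [m for m in moves if visits[m] == 0]
        if untried:
            move = untried[0]  # every move needs one playout before UCB applies
        else:
            move = max(moves, key=lambda m: wins[m] / visits[m]
                       + math.sqrt(2.0 * math.log(n) / visits[m]))
        wins[move] += random_playout(move, rng)
        visits[move] += 1
    # Assumption: report the most-visited move as the final choice.
    return max(moves, key=visits.get), visits


rng = random.Random(42)
best, visits = ucb_choose_move(list(HIDDEN_WIN_RATE), 20_000, rng)
print(best, visits)
```

Compared with giving every move the same budget, most playouts flow to the promising move, while a clearly bad move such as the hypothetical `A1` receives only a small share.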

Wait, is there repeated computation here? Can this be optimized further?
