Implementation and Considerations of a Go Program (3): The UCT Algorithm

Last Update: 2018-12-07 Source: Internet

Author: User


Continuing from the previous article, we already know that the UCB algorithm can quickly find a reliable point to play. Can we optimize it further?

First, we need to understand why the UCB algorithm converges faster than blind Monte Carlo evaluation. As I understand it, the reason is that while it runs, UCB keeps adjusting its policy based on earlier results to decide which points deserve to be tried first. This is in effect an online machine learning strategy, and it is how UCB solves the multi-armed bandit problem mentioned in the previous article. Compared with plain Monte Carlo evaluation, UCB converges significantly faster, but there is still room for further optimization.
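The article does not spell out the UCB formula. A common choice, and the one assumed here, is UCB1: pick the arm i that maximizes mean_i + c * sqrt(ln N / n_i), where N is the total number of pulls so far, n_i is the number of pulls of arm i, and c = sqrt(2). The minimal sketch below plays a hypothetical three-armed Bernoulli bandit (the win probabilities are made up for illustration; the function name ucb1_bandit is also hypothetical) to show how UCB1 shifts its pulls toward the best slot machine as results come in, whereas blind Monte Carlo sampling would pull every arm equally often.

```python
import math
import random


def ucb1_bandit(probs, pulls, c=math.sqrt(2), seed=0):
    """Play a Bernoulli multi-armed bandit with UCB1 (assumed formula).

    probs[i] is the (hypothetical) win probability of slot machine i.
    Returns how often each arm was pulled and each arm's average reward.
    Assumes pulls >= len(probs) so that every arm is tried at least once.
    """
    rng = random.Random(seed)
    k = len(probs)
    counts = [0] * k    # n_i: number of pulls of arm i
    sums = [0.0] * k    # total reward collected from arm i
    for t in range(pulls):
        if t < k:
            arm = t     # try every arm once before applying the formula
        else:
            # t equals the total number of pulls made so far (N).
            arm = max(range(k), key=lambda i: sums[i] / counts[i]
                      + c * math.sqrt(math.log(t) / counts[i]))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
    return counts, [s / n for s, n in zip(sums, counts)]


counts, means = ucb1_bandit([0.2, 0.5, 0.8], pulls=5000)
print(counts, means)
```

The arm with the highest win probability should end up with the large majority of the pulls, while the weaker arms are still sampled occasionally because their exploration term grows slowly with ln N.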

In the previous article, the points where a stone can be played on the Go board were compared to slot machines. So what is the difference between Go and the bandit problem?

The answer is: the multi-armed bandit has only a single level of slot machines, whereas Go is a game tree made up of many levels of slot machines!

Here, at last, the game tree comes in. The game tree is an essential tool for most board games. Below is a brief introduction to the most basic game-tree search method, minimax search (if you know nothing about game trees, please look them up yourself). In a two-player zero-sum game, every decision either player makes is meant to maximize their own benefit (which goes without saying). Suppose a position in which Black wins has value V and one in which White wins has value -V, with every other outcome lying between -V and V. Black then always tries to make the value as large as possible, and White as small as possible. In the game tree, a layer where Black moves always returns the child with the highest value to the layer above, while a layer where White moves returns the child with the lowest value. That is minimax search.

With this background covered, here is the answer promised in the previous article: a further optimized algorithm, UCT (UCB applied to Trees). The algorithm is described below:

Given a game tree:

1) Start from the root node of the game tree, take it as node A, and go to step 2).

2) If node A has a child node that has never been evaluated, go to step 3); otherwise, go to step 4).

3) Evaluate that child node with the Monte Carlo method to obtain its reward value, then update the average reward of every node on the path from the child node back to the root. Go to step 1).

4) Compute the UCB value of each child node, take the child with the highest UCB value as the new node A, and go to step 2).

5) The algorithm can be stopped at any time, usually after a set amount of time or a set number of iterations.

When it stops, the child of the root node with the highest average reward is the output of the algorithm.
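The steps above can be sketched in Python. Go is far too large for a short example, so this sketch assumes a toy stand-in game: single-pile Nim, where each move removes 1 to 3 stones and whoever takes the last stone wins. All names (Node, uct_search, legal_moves, play, rollout) are hypothetical. Further assumptions: the UCB value is UCB1 with c = sqrt(2); rewards use the 1 (win) / 0 (loss) convention; each node stores its average from the point of view of the player who moved into it, so taking the highest UCB value at every layer mirrors the max/min alternation of minimax; and a terminal node, which has no children, is simply scored as it is (the step list does not cover that case).

```python
import math
import random

# Toy stand-in for Go (an assumption): one pile of stones, a move removes
# 1-3 stones, and the player who takes the last stone wins.
# A state is (stones_left, player_to_move) with players numbered 0 and 1.
def legal_moves(state):
    stones, _ = state
    return [k for k in (1, 2, 3) if k <= stones]


def play(state, move):
    stones, player = state
    return (stones - move, 1 - player)


def rollout(state, rng):
    """Monte Carlo evaluation: play random moves to the end, return the winner."""
    stones, player = state
    while stones > 0:
        stones -= rng.choice(legal_moves((stones, player)))
        player = 1 - player
    return 1 - player  # the player who just took the last stone


class Node:
    def __init__(self, state, parent=None, move=None):
        self.state = state
        self.parent = parent
        self.move = move                   # move that led from parent to here
        self.children = []
        self.untried = legal_moves(state)  # children never evaluated yet
        self.visits = 0
        self.wins = 0.0                    # wins for the player who moved here

    def mean(self):
        return self.wins / self.visits

    def ucb(self, parent_visits, c):
        return self.mean() + c * math.sqrt(math.log(parent_visits) / self.visits)


def uct_search(root_state, iterations=5000, c=math.sqrt(2), seed=0):
    """Assumes root_state is not terminal (at least one legal move)."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root  # step 1: start at the root
        # Steps 2 and 4: while every child has been evaluated,
        # move to the child with the highest UCB value.
        while not node.untried and node.children:
            parent_visits = node.visits
            node = max(node.children, key=lambda ch: ch.ucb(parent_visits, c))
        # Step 3: evaluate one never-evaluated child (a terminal node has
        # none, so it is scored directly by the rollout below).
        if node.untried:
            move = node.untried.pop()
            child = Node(play(node.state, move), parent=node, move=move)
            node.children.append(child)
            node = child
        winner = rollout(node.state, rng)
        # Update the averages on the path back to the root (reward 1 or 0).
        while node is not None:
            node.visits += 1
            if winner == 1 - node.state[1]:
                node.wins += 1.0
            node = node.parent
    # Output: the child of the root with the highest average reward.
    return max(root.children, key=lambda ch: ch.mean()).move


print(uct_search((5, 0)))
```

From a pile of 5, the winning move under perfect play is to take 1 stone and leave a multiple of 4, and that is the move the search is expected to return. Returning to the root after every evaluation (step 3 goes back to step 1) is what lets the statistics near the root keep improving as deeper nodes are refined.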

A few points about this algorithm need explaining:

1) The root node of the game tree is the current board position.

2) The evaluated nodes and their average rewards are stored and updated by the program as it runs.

3) The reward value can be defined as you see fit. As far as I know, MoGo uses 1 for a win and 0 for a loss, while my own program, foolish go, uses the ratio of its own area to the total area.

4) This algorithm is the cornerstone of modern Go programs.

My personal understanding of this algorithm is that it is essentially an iterative DFS.

To better understand how the UCT algorithm converges, you may want to think about this question: how would the algorithm "recognize" where to play the next move?

That covers most of the theory; the next article will move on to practice.

This article is an English version of an article which is originally in the Chinese language on aliyun.com and is provided for information purposes only. This website makes no representation or warranty of any kind, either expressed or implied, as to the accuracy, completeness ownership or reliability of the article or any translations thereof. If you have any concerns or complaints relating to the article, please send an email, providing a detailed description of the concern or complaint, to info-contact@alibabacloud.com. A staff member will contact you within 5 working days. Once verified, infringing content will be removed immediately.
