How AlphaGo Works: Neural Networks and Tree Search

Preface

Recently I read AlphaGo's paper, Mastering the Game of Go with Deep Neural Networks and Tree Search, and was amazed at the creativity of its authors and the power of neural networks: the game of Go (Weiqi) can be played to this level by a machine. Here I write up the paper's method along with my own thinking; this article basically follows the paper, but read from my own perspective, so corrections are welcome where there are errors.

About board games

I remember back in college, our school held an artificial intelligence challenge where the task was to build a Reversi (Othello) AI. I was lucky enough to take first place, but I did not use a very powerful method, mainly search plus an evaluation function. Both Go and Reversi are games of perfect information: for a given state (board position), if all players play perfectly, the next move has an optimal value. However, for a human, calculating every line of play is unrealistic. The situation may be better for a computer, whose raw computation is certainly somewhat stronger than a human's, but in reality exhaustive search is still impossible.

A Reversi board is only 8x8, while a Go board is 19x19; that is not a gap of a few times but of several orders of magnitude. If brute-force search were used for Go, the search tree would contain roughly b^d positions, where b is the breadth of the search (the number of legal moves in each state) and d is the depth of the search, with b ≈ 250 and d ≈ 150. To enumerate every possibility with today's computing power, you would have to wait several lifetimes.
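Just to see the scale, here is a quick back-of-the-envelope calculation (not from the paper) of how many digits b^d has:

```python
# Rough size of the Go game tree b^d,
# with b = 250 (breadth) and d = 150 (depth) as quoted above.
import math

b, d = 250, 150
digits = d * math.log10(b)  # number of decimal digits in b^d
print(f"b^d is roughly 10^{digits:.0f}")  # prints: b^d is roughly 10^360
```

A number with about 360 digits, which is why exhaustive search is out of the question.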

But we can simplify: instead of computing a definite, optimal result, we look for a result that is as good as possible, and that is feasible. First, the depth of the search tree can be reduced with a value function. What is a value function? For a given board state s, a value function v(s) gives an estimate of the final outcome of the game from that state. We can approximate the final result with this value: once the search reaches a certain depth, we evaluate v(s), treat it as the final result, and stop searching deeper, which reduces the depth of the search tree. Second, the breadth of the search can be reduced with a policy function p(a|s). The idea is simple: instead of searching every legal move, we only examine the moves that look most promising, which is exactly how human players pick candidate moves.

Applying Monte Carlo tree search (MCTS) to board games
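Before moving on, the two reductions just described can be sketched as a depth-limited negamax. Here `legal_moves`, `apply_move`, `value_fn`, and `policy_fn` are hypothetical interfaces for illustration, not AlphaGo's actual API:

```python
# Sketch: v(s) caps search depth, p(a|s) prunes search breadth.
# All four function parameters are hypothetical interfaces.

def search(state, depth, legal_moves, apply_move, value_fn, policy_fn, top_k=3):
    moves = legal_moves(state)
    if depth == 0 or not moves:
        # Depth cutoff: trust the value function's estimate of the outcome.
        # value_fn is assumed to score the position for the player to move.
        return value_fn(state)
    # Breadth cutoff: only expand the top_k moves the policy rates highest.
    moves = sorted(moves, key=lambda a: policy_fn(state, a), reverse=True)[:top_k]
    # Negamax: our best score is the worst score we can force on the opponent.
    return max(-search(apply_move(state, a), depth - 1,
                       legal_moves, apply_move, value_fn, policy_fn, top_k)
               for a in moves)
```

The two cutoffs are independent: the value function bounds d, the policy function bounds b, and together they shrink b^d to something tractable.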

First of all, let's talk about the Monte Carlo method, which is essentially a way of solving problems through probability, that is, statistical thinking. The principle is to understand a system through a large number of random samples, and from those samples compute the quantity we want.
It is very powerful and flexible, yet quite simple to understand and easy to implement. For many problems it is often the simplest method of calculation, and sometimes even the only feasible one.

To give a simple example, consider estimating pi. Inscribe a circle in a square, pick n random points (x, y) uniformly in the square, and compute the distance from each point to the center to decide whether it falls inside the circle. By the area formula, the fraction of points landing in the circle approximates the ratio of the circle's area to the square's, from which pi can be estimated.
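The estimate described above can be written in a few lines (a quarter circle in the unit square works the same way as a full circle in a square):

```python
# Monte Carlo estimate of pi: sample points in the unit square and count
# the fraction falling inside the quarter circle of radius 1 centered at
# the origin; that fraction approximates pi/4.
import random

def estimate_pi(n, seed=0):
    rng = random.Random(seed)
    inside = sum(rng.random() ** 2 + rng.random() ** 2 <= 1.0 for _ in range(n))
    return 4.0 * inside / n

print(estimate_pi(100_000))  # close to 3.14159; the error shrinks as n grows
```

The error falls off like 1/sqrt(n), so each extra digit of precision costs about 100x more samples; Monte Carlo buys simplicity, not fast convergence.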

Once you understand the Monte Carlo method, the Monte Carlo tree is simply the same idea moved onto a tree. It, too, works by running a large number of simulations (rollouts) and letting the answer converge onto one branch. Monte Carlo tree search has four steps:
1. Selection: starting from the root node, recursively select the best child node until a leaf node is reached. The best child is chosen by the formula v_i + c * sqrt(ln N / n_i), where v_i is the estimated value of the node, n_i is the number of times the node has been visited, N is the total number of times its parent has been visited, and c is a constant.
2. Expansion: if the current node is not a terminal node (the game is not over) and its visit count exceeds a set threshold, create one or more child nodes.
3. Simulation: starting from the current leaf node, play out moves according to a rollout policy p (often random) until the game ends.
4. Backpropagation: according to the final result of the simulation, update the statistics along the path back up the search tree.
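The four steps above can be sketched as a minimal UCT-style MCTS. The game interface (`legal_moves`, `apply_move`, `is_terminal`, `result`) is hypothetical, and real implementations add expansion thresholds, fast rollout policies, and per-player sign handling:

```python
# Minimal UCT-style MCTS following the four steps above.
import math
import random

class Node:
    def __init__(self, state, parent=None):
        self.state = state
        self.parent = parent
        self.children = []   # expanded child nodes
        self.untried = []    # legal moves not yet expanded
        self.visits = 0      # n_i: times this node was visited
        self.value = 0.0     # v_i: running mean of rollout results

def uct(child, c=1.4):
    # Selection score: v_i + c * sqrt(ln N / n_i)
    return child.value + c * math.sqrt(math.log(child.parent.visits) / child.visits)

def mcts(root_state, game, n_rollouts=1000, seed=0):
    rng = random.Random(seed)
    root = Node(root_state)
    root.untried = list(game.legal_moves(root_state))
    for _ in range(n_rollouts):
        node = root
        # 1. Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = max(node.children, key=uct)
        # 2. Expansion: create one child for an untried move.
        if node.untried:
            move = node.untried.pop()
            node.children.append(Node(game.apply_move(node.state, move), parent=node))
            node = node.children[-1]
            node.untried = list(game.legal_moves(node.state))
        # 3. Simulation: random playout from here to the end of the game.
        state = node.state
        while not game.is_terminal(state):
            state = game.apply_move(state, rng.choice(game.legal_moves(state)))
        outcome = game.result(state)  # assumed from one fixed player's view
        # 4. Backpropagation: update visit counts and mean values upward.
        while node is not None:
            node.visits += 1
            node.value += (outcome - node.value) / node.visits
            node = node.parent
    return max(root.children, key=lambda ch: ch.visits)  # most-visited move
```

Note the role of the formula from step 1: the v_i term exploits branches that have paid off, while the sqrt(ln N / n_i) term forces occasional exploration of rarely visited branches.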

Monte Carlo tree search is only briefly introduced here; interested readers can find more material online. Before AlphaGo appeared, the strongest Go AIs were all based on MCTS. AlphaGo also uses MCTS, optimized with neural networks, and with it finally accomplished the feat of defeating professional human players.

Structure

Before studying the AlphaGo algorithm, it is necessary to have a general understanding of its structure. We need to know what it does with neural networks and what it does with MCTS.
1. A supervised-learning (SL) policy network p_σ: through supervised learning, the neural network learns the moves of professional players. This training provides fast, effective learning updates through immediate feedback and high-quality gradients.
2. A policy network p_π that can make decisions quickly: this allows faster move selection during play; after all, a large network is time-consuming, though p_π's quality is worse than p_σ's.
3. A policy network p_ρ trained with reinforcement learning (RL)
