(Repost) AlphaGo's Principles and Weaknesses, in One Diagram

Source: Internet
Author: User

AlphaGo's Principles and Weaknesses, in One Diagram. 2016-03-23, Yu Zheng and Junbo Zhang, KDD China

Author Profile:

Yu Zheng, PhD, Editor-in-Chief of ACM Transactions on Intelligent Systems and Technology and Secretary General of the ACM KDD China Chapter.

Junbo Zhang, PhD, member of the ACM KDD China Chapter, engaged in research on deep neural networks.

--------------------------------------

Recently, AlphaGo won three straight games against Lee Sedol in their man-machine Go match, a genuine breakthrough for the field of AI, and as workers in artificial intelligence we are deeply gratified. In essence it is a triumph of deep convolutional networks (CNNs) combined with Monte Carlo tree search (MCTS), and therefore a triumph of human intelligence. Yet many self-styled "experts" have begun preaching that machines have conquered humanity, or even that humans will one day be ruled by machines; such ignorant talk is hard to watch. As Go enthusiasts and AI researchers, we think it is time to explain AlphaGo's principles and weaknesses.

We can say quite responsibly that AlphaGo has not completely solved the game of Go. Professional players are not without hope of winning, and it is far too early to say that machines have defeated humans; AlphaGo still has a long road ahead. If a Chinese professional player wants to challenge AlphaGo, we would be willing to assemble an advisory panel of top AI experts (who also understand Go) to help them beat it.

Although there are many technical posts online, no article has yet explained AlphaGo's principles completely and clearly, and the Nature paper itself lacks a diagram of the overall picture (and, being in English, is hard for many students to follow). So Dr. Junbo Zhang and I, after reading the original paper and collecting a great deal of other material, drew a single diagram to explain how AlphaGo works. After reading it, you will see naturally where its weaknesses lie.


Figure 1. AlphaGo schematic (the authors spent a great deal of effort on this figure; copyright belongs to the two authors. Please feel free to forward it, but do not use it without attribution)

AlphaGo's operation broadly comprises two processes: offline learning (upper half of Figure 1) and online play (lower half of Figure 1).

The offline learning process consists of three training stages.

    • Stage 1: Two networks are trained on more than 160,000 professional game records. The first is the policy network, a deep convolutional network (CNN) trained on global board features; given the current board position as input, it outputs, for each empty point on the board, the probability that the next move is played there. The second is the fast rollout policy, trained on local features with a linear model. The policy network is slow but accurate; the rollout policy is the opposite, fast but less accurate.

    • Stage 2: The policy network from round T is played against earlier versions of itself, and reinforcement learning is used to update its parameters, yielding a strengthened policy network. This part has been greatly hyped by the "experts", but in reality it should face a theoretical bottleneck (limited room for improvement). It is like two six-year-old children playing each other at Go endlessly: will their level ever reach professional 9-dan?

    • Stage 3: The first U-1 moves (U is a random variable drawn from [1, 450]) are played by the ordinary policy network, and the move at step U is then chosen by random sampling (to increase the diversity of games and prevent overfitting). From there, the strengthened policy network completes the self-play game to its end. The board position at step U is used as the feature input and the game's outcome as the label to train a value network (value net), which estimates the probability of winning from a given position. The value network is in fact one of AlphaGo's major innovations: the hardest part of Go is judging the final outcome from the current position, something even professional players struggle with. Through massive self-play, AlphaGo generated 30 million games to train the value network. But because the search space of Go is so vast, even 30 million games cannot let AlphaGo conquer this problem completely.
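The Stage 3 data-generation loop can be sketched as follows. This is a minimal illustration only: the toy 5x5 board, the random stand-ins for the two policy networks, and the crude "who holds the center" scoring are all our own simplifications, not AlphaGo's actual code.

```python
import random

BOARD_SIZE = 5                       # toy board; real Go is 19x19
POINTS = BOARD_SIZE * BOARD_SIZE

def legal_moves(board):
    """Indices of empty points."""
    return [i for i, s in enumerate(board) if s == 0]

def policy_move(board):
    """Stand-in for the ordinary (supervised) policy network."""
    return random.choice(legal_moves(board))

def strong_policy_move(board):
    """Stand-in for the RL-strengthened policy network."""
    return random.choice(legal_moves(board))

def outcome(board):
    """Toy scoring: Black (1) wins iff Black holds the center point."""
    return 1 if board[POINTS // 2] == 1 else -1

def generate_value_sample():
    """One (position, outcome) training pair for the value network:
    moves 1..U-1 come from the ordinary policy, move U is sampled at
    random (for diversity, against overfitting), and the strengthened
    policy finishes the game; the position after move U is the feature
    and the final result is the label."""
    board = [0] * POINTS
    u = random.randint(1, POINTS)    # the paper draws U from [1, 450]
    player, feature = 1, None
    for step in range(1, POINTS + 1):
        if step < u:
            mv = policy_move(board)
        elif step == u:
            mv = random.choice(legal_moves(board))
        else:
            mv = strong_policy_move(board)
        board[mv] = player
        player = -player
        if step == u:
            feature = tuple(board)   # snapshot the position at step U
    return feature, outcome(board)
```

In AlphaGo this loop ran on the order of 30 million times; here one call yields one training pair.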

The online play process consists of the following 5 key steps. Its core idea is to embed deep neural networks in Monte Carlo tree search (MCTS) to reduce the search space; AlphaGo has no real ability to think.

    1. Extract the corresponding features from the moves on the current board;

    2. Use the policy network to estimate the move probabilities of the empty points on the board;

    3. Compute a weight for each candidate move from its move probability; the initial weight is the probability itself (e.g., 0.18). In reality the weight may be a function that takes the probability as input; we use the probability directly here for ease of understanding.

    4. Evaluate the position with the value network and the fast rollout policy separately, and sum the two evaluations into the final winning score for the move. Using the fast rollout policy here trades accuracy for speed: starting from the position being judged, it plays rapidly to the end of the game, each playout yields a win-or-loss result, and the win rate for the node is computed from the aggregate statistics. The value network, by contrast, evaluates the final outcome directly from the current position. Each has strengths and weaknesses, and the two complement each other.

    5. Use the score from step 4 to update the weight of the move position just explored (e.g., from 0.18 down to 0.12), then continue searching and updating from the move with the highest remaining weight (e.g., the one at 0.15). These weight updates should run in parallel. When a node's visit count exceeds a certain threshold, the search is expanded one level deeper in the Monte Carlo tree (Figure 2).
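Steps 3-5 above can be sketched as a prior-guided search over candidate moves. This is our own minimal single-level illustration, not DeepMind's code: the PUCT-style selection formula, the c_puct constant, and the simulated value-network/rollout evaluations are all assumptions made for illustration.

```python
import math
import random

class MoveNode:
    """One candidate move: prior from the policy network plus a
    running visit count and accumulated evaluation."""
    def __init__(self, prior):
        self.prior = prior        # move probability, e.g. 0.18
        self.visits = 0
        self.value_sum = 0.0

    def q(self):
        """Mean evaluation of this move so far."""
        return self.value_sum / self.visits if self.visits else 0.0

    def weight(self, total_visits, c_puct=1.0):
        """PUCT-style score: exploit Q, explore in proportion to the
        prior -- the 'weight' that steps 3 and 5 initialize and update."""
        u = c_puct * self.prior * math.sqrt(total_visits) / (1 + self.visits)
        return self.q() + u

def evaluate(win_prob, mix=0.5):
    """Step 4: combine a simulated value-network estimate with a
    simulated fast-rollout result (a single win/loss playout)."""
    value_net = win_prob + random.gauss(0, 0.05)
    rollout = 1.0 if random.random() < win_prob else 0.0
    return mix * value_net + (1 - mix) * rollout

def search(priors, true_win_probs, n_sims=3000):
    """Repeat select -> evaluate -> update, then play the
    most-visited move."""
    nodes = [MoveNode(p) for p in priors]
    for t in range(1, n_sims + 1):
        i = max(range(len(nodes)), key=lambda k: nodes[k].weight(t))
        nodes[i].visits += 1
        nodes[i].value_sum += evaluate(true_win_probs[i])
    return max(range(len(nodes)), key=lambda k: nodes[k].visits)
```

Note how a move with a modest prior but a high actual win rate overtakes a high-prior move as evaluations accumulate; this is exactly the downward weight revision (0.18 to 0.12) described in step 5.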


      Figure 2. MCTS expands the next level of nodes

Where are AlphaGo's weaknesses?

    1. Attack its policy network and inflate the search space. Once the middle game begins, if a professional player can build a sufficiently complex position in which every move affects the fate of many local groups (avoiding isolated groups and purely local fights), then the search space AlphaGo must cover grows sharply, and the accuracy of the solution it can reach in limited time drops considerably. Put plainly: when the variations become so complex that even humans cannot read them out, current computing power is not enough either. This is the meaning of Lee Sedol 9-dan's play in the fourth game. In the area shown, five black and white groups are entangled with one another; after White 1, Black must consider many points at once. Many of them require deeper search in the MCTS tree, and to return a result within the time limit, search accuracy can only be sacrificed.

      Figure 3. The fourth game between Lee Sedol and AlphaGo

    2. Attack its value network with ko fights. The value network greatly improved positional judgment over relying on MCTS alone, but there is still no small gap between it and correct judgment of a Go position. A neural network cannot completely avoid occasional strange (even wrong) judgments, especially since its training samples are far from sufficient. This is why, even with a value network, AlphaGo still relies on fast rollouts to help judge positions. Many people have doubted AlphaGo's ability to handle ko, and some have sensed signs that AlphaGo avoids ko fights. Professor Zhou Zhihua of Nanjing University has written that ko can cause the value network to collapse; we will not repeat the argument here. This is not to say AlphaGo cannot fight ko at all, but that it fears multiple coexisting kos early in the game. A ko started too late does little: the search space has already shrunk, and even if the value network fails, the fast rollout network can compensate. The best time to start a ko is just after entering the middle game (too early, and there are not enough ko threats); it should be kept alive as long as possible, and ideally two or more kos should exist on the board at the same time. Without its value network, AlphaGo's strength is in fact only around professional 3- to 5-dan.

Conclusion

    • AlphaGo has reached the level of top professional players, but it cannot be said to have completely defeated the human race!

    • AlphaGo embodies the progress of human intelligence, but it itself has no thinking or wisdom!

    • Data + computing resources + algorithms together drove this advance of AI in Go!
