DeepMind's paper in the January 28, 2016 issue of Nature, "Mastering the game of Go with deep neural networks and tree search", describes the AlphaGo program in detail. This blog post is a set of reading notes on that article.
AlphaGo's neural network structure
Broadly speaking, AlphaGo is composed of two neural networks; below I refer to them as the "two brains". This is not terminology from the paper, just my own metaphor.
The role of the first brain, the policy network, is to decide where to play next in the current position. It has two learning modes. The first is a simple, supervised mode: the network is trained on game records from KGS (an online Go server). Roughly speaking, this lets the brain learn "standard moves", that is, where humans would typically play in a given position, without any judgment of whether those moves are good or bad. The second is a self-play reinforcement learning mode: the network plays a massive number of games against itself and learns to evaluate moves from the final outcomes. Because the games are self-generated, the amount of training data can grow without limit.
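As a rough illustration of the supervised mode, here is a minimal sketch of one training step in PyTorch. Everything here is illustrative rather than the paper's actual setup: the real policy network is far deeper, and `PolicyNet`, the feature planes, and the placeholder tensors are my own simplifications.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PolicyNet(nn.Module):
    """Toy convolutional policy network: board feature planes in,
    a distribution over the 361 points of a 19x19 board out."""
    def __init__(self, in_planes=4):
        super().__init__()
        self.conv1 = nn.Conv2d(in_planes, 32, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(32, 32, kernel_size=3, padding=1)
        self.head = nn.Conv2d(32, 1, kernel_size=1)   # one logit per board point

    def forward(self, x):                  # x: (batch, planes, 19, 19)
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        return self.head(x).flatten(1)     # (batch, 361) move logits

net = PolicyNet()
opt = torch.optim.SGD(net.parameters(), lr=0.01)

# One supervised step: imitate the human move (an index in 0..360) from a
# KGS record. Random tensors stand in for real positions and moves.
board_features = torch.randn(8, 4, 19, 19)
expert_move = torch.randint(0, 361, (8,))
loss = F.cross_entropy(net(board_features), expert_move)
opt.zero_grad()
loss.backward()
opt.step()
```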
The role of the second brain, the value network, is to learn to evaluate the board position as a whole. It too is trained on a massive number of self-play games (training it on human games alone would fail because there is far too little data).
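In the same toy style, the value network can be sketched as a regression from a position to the eventual game outcome. Again this is a simplification under my own assumptions, not the paper's architecture; `ValueNet` and the placeholder tensors are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ValueNet(nn.Module):
    """Toy convolutional value network: board feature planes in,
    a scalar estimate of the final outcome out."""
    def __init__(self, in_planes=4):
        super().__init__()
        self.conv = nn.Conv2d(in_planes, 32, kernel_size=3, padding=1)
        self.fc = nn.Linear(32 * 19 * 19, 1)

    def forward(self, x):
        x = F.relu(self.conv(x))
        return torch.tanh(self.fc(x.flatten(1)))   # in (-1, 1): predicted result

vnet = ValueNet()
vopt = torch.optim.SGD(vnet.parameters(), lr=0.01)

# One regression step: fit the final self-play outcome z (+1 win, -1 loss).
positions = torch.randn(8, 4, 19, 19)              # placeholder positions
z = torch.randint(0, 2, (8, 1)).float() * 2 - 1    # placeholder outcomes
vloss = F.mse_loss(vnet(positions), z)
vopt.zero_grad()
vloss.backward()
vopt.step()
```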
During play, the two brains cooperate as follows:
The simple (supervised) version of the first brain proposes which moves in the current position are worth considering.
The full version of the first brain then uses Monte Carlo tree search to expand each candidate into many continuations, in effect "reading out" the game, to judge how good each move is. In this calculation, the second brain assists the first by evaluating positions and cutting off large numbers of branches that are not worth exploring further, which greatly improves computational efficiency.
At the same time, the second brain's own evaluation of the position each move leads to provides advice on the next move in its own right.
Finally, the recommendations of the two brains are given equal weight in making the decision (a minimal sketch of this mixing follows below).
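Concretely, the paper evaluates a leaf position in the search tree by mixing the value network's estimate with the result of a fast rollout, using a mixing weight of 0.5. Here is that mixing as a small sketch; `value_net` and `fast_rollout` are illustrative stand-ins, not the paper's code.

```python
def evaluate_leaf(position, value_net, fast_rollout, lam=0.5):
    """Mix the two brains' verdicts on a leaf position, as in the Nature paper:
    V = (1 - lambda) * v + lambda * z, with lambda = 0.5."""
    v = value_net(position)        # second brain: position evaluation in (-1, 1)
    z = fast_rollout(position)     # play the game out quickly: +1 win, -1 loss
    return (1 - lam) * v + lam * z
```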
An interesting conclusion of the paper is that this even average of the two brains performs much better than relying on either of them alone. This is probably the key point of similarity between AlphaGo and human players.
======================================
Both networks take the state of the whole board as input; it is not the case that one focuses on the local and the other on the global. The difference is in function: the policy network directly proposes where to play, while the value network quickly estimates the probability that the current position leads to eventual victory. MCTS, guided by a simplified version of the policy network, plays positions out to the end in real time (more slowly) to estimate that same winning probability. The value network's estimate and the MCTS estimate are combined by a direct weighted average. The final choice of where to play is then based on this weighted average together with the scores the full policy network assigns to each point, but the more often a point has been searched by MCTS (analogous to calculation), the less weight the policy network's result (analogous to experience and intuition) receives. So this step is not a simple weighted average, nor is it a weighted average of local and global judgments. How much each network weighs global versus local factors depends only on the training data and the current board state, not on which network is used.
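The paper's in-search selection rule makes this visit-count discounting explicit: each candidate move a keeps an action value Q(a) from search and a prior P(a) from the policy network, and the search picks the move maximizing Q(a) + u(a), where the bonus u(a) is proportional to P(a) / (1 + N(a)) and N(a) is the visit count. A minimal sketch, with an illustrative exploration constant:

```python
def select_move(moves, Q, N, P, c_puct=5.0):
    """Pick the move maximizing Q(a) + u(a), where u(a) ~ P(a) / (1 + N(a)):
    the policy prior (intuition) fades as the visit count (calculation) grows."""
    def score(a):
        return Q[a] + c_puct * P[a] / (1 + N[a])
    return max(moves, key=score)

moves = ["D4", "Q16", "K10"]
Q = {"D4": 0.52, "Q16": 0.48, "K10": 0.50}   # mean values from search so far
N = {"D4": 120, "Q16": 3, "K10": 40}         # visit counts
P = {"D4": 0.30, "Q16": 0.35, "K10": 0.20}   # policy-network priors
print(select_move(moves, Q, N, P))           # "Q16": barely visited, so its prior dominates
```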
Using convolutional neural networks, the workhorse of image recognition, to read the board may not be the best possible solution, but Go positions, like images, have a degree of translational symmetry, which is exactly the property convolutional networks exploit well. I do not play Go myself, but I would guess that humans recognizing a Go position use something similar to image recognition as well. I have seen that strong players use abstract, fuzzy concepts for fast but imprecise reading; if that turns out to matter, the algorithm may need to incorporate recurrent neural networks for similar functionality, but that should not be a major obstacle. Combined with the machine's raw search speed, I think the March match could still go either way, but AI crushing humans at Go is a matter of a year or two.
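The translational symmetry point can be seen directly: a convolutional filter gives the same response to a local stone pattern no matter where on the board it appears. A small demonstration (my own, not from the paper):

```python
# A 3x3 convolutional filter responds identically to a pattern wherever it sits.
import torch
import torch.nn.functional as F

board = torch.zeros(1, 1, 19, 19)
board[0, 0, 3, 3] = 1.0                                  # a lone stone at (3, 3)
shifted = torch.roll(board, shifts=(7, 7), dims=(2, 3))  # same stone at (10, 10)

kernel = torch.randn(1, 1, 3, 3)                         # any 3x3 pattern detector
r1 = F.conv2d(board, kernel, padding=1)
r2 = F.conv2d(shifted, kernel, padding=1)

# Identical response, just translated along with the stone:
assert torch.allclose(r1[0, 0, 3, 3], r2[0, 0, 10, 10])
```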
On learning efficiency: today's AI learns in a way very different from people, but partly that is because people never truly start from scratch when learning a new field, while models like neural networks are usually trained from random parameters with no prior information at all. If an AI first understood human language, common sense, and everyday concepts, it could learn Go the way a student does, taught the basics step by step by a teacher, rather than first studying a huge number of human games and then playing madly against itself; perhaps it would get started faster. But in the stage where skill improves through massive imitation and practice, AI's learning efficiency is not necessarily worse than a human's.
======================================
A few questions and comments:
First, the training of these neural networks is achieved largely through self-play. This is both an advantage (according to Yuandong Tian, a Facebook AI researcher, tens of millions of self-play games is a staggering amount of data) and, to some extent, a necessity: the total number of recorded human games is so small that it would lead to the overfitting problems common in machine learning.
But could self-play end up being self-limiting, or even self-reinforcing in a harmful way? This touches both on our understanding of how neural networks learn and on our understanding of Go itself. On the one hand, whether a neural network can tolerate a certain degree of "thinking outside the box" depends on the specific algorithm, but it is also a genuinely fundamental puzzle of the neural network approach. On the other hand, since AlphaGo's most basic patterns still come from human games, the question also depends on whether human players have already exhausted all of Go's meaningful basic patterns.
(As an example: in the second game between AlphaGo and Fan Hui, many people noticed that AlphaGo played a nonstandard variation of the large avalanche joseki. Does that mean the AI went wrong, or did it find a better way to play?)
Second, the two brains divide the work in a way very similar to humans: one judges the details, the other takes in the whole. But AlphaGo's eventual combination of the two is fairly simple: let each evaluate the pros and cons of every candidate, then take an average. That is not how humans think.
For humans, the combination of these two modes of thinking is much more complex (and not only in Go). People do not make macro and micro judgments of the situation simultaneously at every moment; sometimes they focus on the whole board, sometimes on a detail. How that attention is allocated depends on the state of the game, and also on the player's emotions, psychology, and subconscious reactions at the time. This is of course a human imperfection, but it is also a source of the richness of human behavior.
AlphaGo certainly embodies a degree of whole-board awareness, but judging from the concrete algorithm, its ability to make local sacrifices for strategic advantage cannot remotely be compared with a human's. AlphaGo's introduction of whole-board evaluation is indeed what makes it better than many other Go AIs, but fundamentally it is only a first step toward giving AI "strategic thinking", and there remains enormous room for improvement.
Finally, like many other Go AIs, AlphaGo learns to judge the board using image-processing techniques, that is, by treating the board as a picture. Technically this is a natural choice, but a Go position is not a pattern in the ordinary visual sense. Might it have characteristics that common image-processing methods are inherently poor at handling?
Application
Why do we want AI to play Go at all? There are many reasons, but to me the most important is that it can give us a deeper understanding of the nature of intelligence.
The leap in neural networks and machine learning over the past decade has genuinely allowed AI to do many things that once only the human brain could do. But this does not mean that AI thinks in a way close to humans. Paradoxically, AI's great advances in computing power obscure how far behind it lags in learning the way humans think.
Take AlphaGo as an example. Compared with Deep Blue, the chess system, AlphaGo has moved much closer to the human. Deep Blue still relied on a value function defined externally by humans, so it was essentially an efficient calculator, whereas AlphaGo acquires its value judgments by itself, which has something of the human in it. Yet, as noted above, AlphaGo's progress relies on an enormous number of self-play games. That is its strength, of course, but it also shows that it has not truly grasped the human capacity for learning. A human player plays at most a few thousand games in a lifetime, yet can master the judgment that AlphaGo needs millions of games to train. This suffices to show that something essential in the human learning process cannot yet be captured by current neural network methods.
(Incidentally, many commentators have suggested that AlphaGo could observe a particular player's games, understand that player's style, and respond accordingly. At least judging from the paper, this is almost certainly impossible: a single player's games are far too few for AlphaGo to train its neural networks on effectively. Observing and summarizing a person's "style" remains an ability where humans hold a decisive advantage; for a computer it is harder than winning the game itself.)
This is certainly not to say that AlphaGo should try to replicate the brain of a human player. But AlphaGo's significance should not show only in its final playing strength. How does it grow? What shape does its growth curve take? How do different parameter settings affect its overall ability? Do different parameters correspond to different styles and personalities? If it plays repeatedly against a different but comparably strong AI, can it "learn" from its opponent in a way that differs from self-play? Studying and answering these questions will probably tell us far more than simply watching whether it can one day surpass human knowledge.
So even if AlphaGo defeats Lee Sedol in March, to me it will be another door opening rather than closing. In fact, even for the development of Go itself, if AlphaGo's two brains, combined in such a simple way, can outplay humans, that only shows how much room for exploration the theory of Go still holds.
In the field of artificial intelligence, AlphaGo, like all neural networks, is essentially a big black box: we can observe the great power it displays, yet we still know very little about how it "thinks". In engineering terms, this is a great triumph. In scientific terms, it is only the first step of a long march.
Resources
AlphaGo Project Home: http://www.deepmind.com/alpha-go.html
Nature paper: http://www.nature.com/nature/journal/v529/n7587/full/nature16961.html
Nature Report: http://www.nature.com/news/google-ai-algorithm-masters-ancient-game-of-go-1.19234
Dan Maas's plain-language summary of the paper: http://www.dcine.com/2016/01/28/alphago/
Reading notes on the AlphaGo paper from the blog "Program Algorithm Art and Practice"