On January 28, 2016, Google DeepMind announced in Nature that its Go-playing AI system, AlphaGo, had achieved a historic victory over a professional human player. The news immediately drew widespread attention in both the Go world and the artificial intelligence community, and the March match between AlphaGo and Lee Sedol will be watched by all of humanity!
What produced this qualitative leap in Go algorithms? Bear in mind that, before AlphaGo, the best Go programs could only reach the level of amateur human players. Has true artificial intelligence really arrived?
Most people know that in 1997 the "Deep Blue" computer defeated the human chess champion Kasparov, yet few would say Deep Blue possessed real intelligence. The reason is simple: in chess (and in Go as well) every move is visible, so under deterministic rules there are only finitely many ways a game can unfold, and among them there must be an optimal one. A basic idea is therefore to look ahead: traverse every continuation until one side wins, back up the winning probability of each candidate move, and finally play the move with the highest probability. Deep Blue did essentially this, brute-forcing vast numbers of continuations and picking the best. It beat the human champion, but it showed no intelligence: the whole procedure was an algorithm designed entirely by hand, with nothing in it one could point to and call intelligent.
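The brute-force look-ahead idea behind Deep Blue can be sketched as a plain minimax search. The toy game tree below is purely illustrative; the real engine adds alpha-beta pruning, handcrafted evaluation functions, opening books, and much more.

```python
# Toy minimax search: the exhaustive look-ahead idea behind Deep Blue.
# A game tree is given as nested lists; integer leaves are final payoffs
# from the maximizing player's point of view.

def minimax(node, maximizing):
    """Exhaustively score a game tree and return the forced outcome."""
    if isinstance(node, int):          # terminal position: return its payoff
        return node
    scores = [minimax(child, not maximizing) for child in node]
    return max(scores) if maximizing else min(scores)

# 2-ply example: the maximizer picks a branch, then the minimizer replies.
# Left branch: the minimizer forces 3; right branch: the minimizer forces 2.
print(minimax([[3, 5], [2, 9]], True))  # -> 3
```

Every continuation is visited exactly once, which is why this approach is only feasible when the game tree is small enough to enumerate.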
In theory Go could be brute-forced the same way, but the problem is that the number of possible Go continuations is so vast that no current hardware can search them exhaustively. This is why Go has long been regarded as a grand challenge for AI.
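To see why brute force fails for Go, a quick back-of-the-envelope calculation helps, using the approximate branching factor b and game length d cited in the AlphaGo paper (b ≈ 35, d ≈ 80 for chess; b ≈ 250, d ≈ 150 for Go):

```python
import math

# Approximate game-tree sizes, using the rough figures from the AlphaGo paper:
# chess has branching factor ~35 and depth ~80; Go has ~250 and ~150.
chess_positions = 35 ** 80
go_positions = 250 ** 150

print(f"chess: ~10^{math.log10(35) * 80:.1f} move sequences")    # ~10^123.5
print(f"go:    ~10^{math.log10(250) * 150:.1f} move sequences")  # ~10^359.7
```

The Go tree is not just bigger; it is larger by a factor of more than 10^230, which puts exhaustive search permanently out of reach.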
To build a Go program that can defeat top human players, nothing short of real artificial intelligence will do!
Anyone who knows Go understands that strong play requires an intuitive grasp of the whole board, and that is precisely what makes the game hard. Only if a computer truly understands the game can it see the big picture and play genuinely good moves!
So, here's the question:
Does AlphaGo have real artificial intelligence?
My answer is:
AlphaGo has true AI, although it is not perfect!
So,
Where does AlphaGo's real AI reside?
My answer is:
It's in the deep neural network.
All of the answers are in Google DeepMind's paper published in Nature:
Mastering the Game of Go with Deep Neural Networks and Tree Search
Paper Links
This article analyzes that Nature paper and tries to decrypt the mystery of AlphaGo's true AI!
What is AlphaGo's "brain"?
Deep neural networks are AlphaGo's "brain". Let's first treat each network as a black box with inputs and outputs, without worrying for now about what happens in between. AlphaGo's "brain" is actually divided into four parts:
- Rollout policy, the fast-perception "brain": quickly scans the Go board to produce reasonable move candidates, similar to a human player's first reaction on seeing a position; its accuracy is not high.
- SL policy network, the deep-imitation "brain": a brain region trained by supervised learning on the game records of 6-9 dan masters. Given a board position, this deep-imitation "brain" produces moves resembling those of a human expert.
- RL policy network, the self-taught "brain": built on top of the deep-imitation "brain", it continually improves its playing strength by training against earlier versions of itself.
- Value network, the global-analysis "brain": trained with the help of the self-taught "brain" to judge an entire board position, enabling a global analysis of the whole game.
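To make the division of labor concrete, here is a hypothetical Python interface for the four "brain regions". All names, shapes, and the random placeholder outputs are illustrative only; this is not DeepMind's code.

```python
# Hypothetical interface for AlphaGo's four networks; names, shapes, and the
# random placeholder outputs are illustrative only.
import random

BOARD_POINTS = 19 * 19  # 361 intersections on a Go board

def rollout_policy(board):
    """Fast-perception 'brain': cheap, shallow move scores; accuracy is low."""
    return [random.random() for _ in range(BOARD_POINTS)]

def sl_policy_network(board):
    """Deep-imitation 'brain': move scores imitating expert human play."""
    return [random.random() for _ in range(BOARD_POINTS)]

def rl_policy_network(board):
    """Self-taught 'brain': the SL policy further improved by self-play."""
    return [random.random() for _ in range(BOARD_POINTS)]

def value_network(board):
    """Global-analysis 'brain': a scalar in [-1, 1] estimating the outcome."""
    return random.uniform(-1.0, 1.0)
```

Note the asymmetry: three of the four networks score candidate moves point by point, while the value network compresses the whole board into a single win/loss estimate.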
So AlphaGo's "brain" actually has four regions, each with a different function; comparing them, these abilities roughly correspond to the different kinds of thinking a human player performs, covering both local calculation and global analysis. The policy networks judge the pros and cons of each move, while the value network judges the overall game.
Crucially for AlphaGo, the first step relies on imitation: self-improvement is built on top of the deep-imitation "brain". This is exactly how humans learn the game, first imitating other players' moves and then gradually developing moves of one's own.
So how well do these different brain regions perform?
- The fast-perception "brain" matches a human master's move choice only 24.2% of the time.
- The deep-imitation "brain" matches a human master's move choice 57% of the time; in other words, more than half the time it chooses the same move a human expert would.
- The self-taught "brain", after continuous self-improvement, wins about 80% of its games against the deep-imitation "brain". This essentially shows that self-play brought a huge jump in playing strength.
- The global-analysis "brain", trained with the help of the self-taught "brain", judges whole-board positions with a mean squared error of about 0.22-0.23, i.e. roughly an 80% chance of judging the situation correctly. This is the key to AlphaGo reaching professional level.
The analysis above shows how powerful AlphaGo's different "brain regions" are. How each of them learns is covered in a later section; first, let's look at how AlphaGo plays once these trained brains are in place.
How does AlphaGo play?
Before analyzing how AlphaGo plays, consider how a human player plays:
- Step 1: Analyze and judge the overall situation.
- Step 2: Analyze local positions and find several candidate points to play.
- Step 3: Look ahead a few moves from each candidate and choose the best point to play.
On top of its powerful neural-network "brain", AlphaGo uses Monte Carlo tree search to pick the best move, and its procedure is essentially close to the human approach.
It starts from the basic idea of Monte Carlo tree search, which is very simple:
Simulate future games many times, then play the move chosen most often in the simulations.
AlphaGo's concrete playing procedure is roughly as follows (ignoring technical details such as expanding leaf nodes):
- Step 1: Use the deep-imitation "brain" to predict the next moves, playing forward until step L.
- Step 2: Evaluate the position at step L in two combined ways: the global-analysis "brain" judges the winning chances, while the fast-perception "brain" plays the game out to the end to obtain a simulated result. Combining the two gives an evaluation of the line predicted out to step L.
- Step 3: Use that evaluation as the value of the candidate next move in the current position, so each candidate proposed at the start is scored by how its future unfolds.
- Step 4: Combine these move values with the deep-imitation "brain" and simulate again; when the same move comes up, average its values (this is where the Monte Carlo idea comes in).
Repeat the steps above n times, then play the move that was selected most often.
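The four steps above can be sketched as a deliberately simplified search loop. Every function here is a random stub standing in for the real networks, and the selection rule is only loosely inspired by the UCT variant described in the paper:

```python
# Simplified sketch of the simulate-evaluate-average loop. All network
# functions are random stubs; names and the toy action space are illustrative.
import random
from collections import defaultdict

LAMBDA = 0.5          # mixing weight between value net and rollout (0.5 in the paper)
N_SIMULATIONS = 200

def policy_prior(state, move):   # stub: deep-imitation "brain" move score
    return random.random()

def value_net(state):            # stub: global-analysis "brain", in [-1, 1]
    return random.uniform(-1.0, 1.0)

def fast_rollout(state):         # stub: fast-perception "brain" playout result
    return random.choice([-1.0, 1.0])

def legal_moves(state):
    return range(9)              # toy action space

def play(state, move):
    return state + (move,)       # toy state transition

def mcts_choose(root):
    visits = defaultdict(int)
    total_value = defaultdict(float)
    for _ in range(N_SIMULATIONS):
        # Step 1: pick a move, biased by the prior and the running averages.
        move = max(legal_moves(root),
                   key=lambda m: total_value[m] / (1 + visits[m])
                                 + policy_prior(root, m) / (1 + visits[m]))
        leaf = play(root, move)
        # Step 2: evaluate by mixing value-net judgment and a rollout result.
        evaluation = (1 - LAMBDA) * value_net(leaf) + LAMBDA * fast_rollout(leaf)
        # Steps 3-4: back the evaluation up into the move's running average.
        visits[move] += 1
        total_value[move] += evaluation
    # Finally, play the move that was selected most often.
    return max(legal_moves(root), key=lambda m: visits[m])

print("chosen move:", mcts_choose(()))
```

The key design point survives even in this sketch: the final decision is made by visit count, not by raw value, which makes the choice robust to noisy individual evaluations.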
It sounds a bit complicated, but put simply it combines global judgment with concrete calculation, simulating future games to find the best next move. The choice of move depends both on the global-analysis "brain" and on the deep-imitation "brain".
With this analysis, one can understand why, in the games between AlphaGo and Fan Hui, some of AlphaGo's moves reflected not only local tactics but also overall strategy.
Knowing how AlphaGo actually plays, we can see that what makes it so strong still lies in its several deep neural networks.
So, let's see how AlphaGo's brain learns.
How does AlphaGo learn?
AlphaGo's learning relies on deep learning and reinforcement learning, which together form deep reinforcement learning. This is in fact a frontier research direction in today's AI field.
This article does not introduce deep learning and reinforcement learning in detail. A deep neural network is a multi-layer network with a huge number of parameters: it takes a certain kind of data as input and produces a result as output, and based on the output error the network's parameters are computed and updated to reduce that error, so that after training, a given input yields the specific desired output.
Take the deep-imitation "brain" as an example. It is actually a 13-layer neural network. Its input is essentially the full 19*19 board (black-stone information, white-stone information, empty points, plus other features related to the rules of Go, 48 feature planes in all), and its required output is the next move. Google DeepMind had 30 million expert moves as a training set, and by propagating the output error it trained the network to 57% accuracy: given a board state, more than half the time the predicted move matches the one a human master actually played. In a sense, this neural network has learned the game itself, and that is why it can reproduce a human master's way of playing.
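The supervised step can be illustrated with a toy stand-in: a single softmax layer trained by gradient descent to predict "expert" moves from board features. The real SL policy network is a deep convolutional network trained on millions of real positions; everything below, including the synthetic data, is a deliberately tiny sketch.

```python
# Toy stand-in for supervised policy learning: a single softmax layer trained
# by gradient descent to predict "expert" moves from synthetic board features.
import numpy as np

rng = np.random.default_rng(0)
N_FEATURES, N_MOVES, N_SAMPLES = 32, 361, 500   # 361 = 19*19 board points

# Synthetic "expert" data: the target move is a linear function of the features.
true_w = rng.normal(size=(N_FEATURES, N_MOVES))
boards = rng.normal(size=(N_SAMPLES, N_FEATURES))
expert_moves = (boards @ true_w).argmax(axis=1)

w = np.zeros((N_FEATURES, N_MOVES))
for _ in range(300):                             # full-batch gradient descent
    logits = boards @ w
    logits -= logits.max(axis=1, keepdims=True)  # for numerical stability
    probs = np.exp(logits)
    probs /= probs.sum(axis=1, keepdims=True)
    probs[np.arange(N_SAMPLES), expert_moves] -= 1.0   # dLoss/dLogits
    w -= 0.5 * boards.T @ probs / N_SAMPLES

accuracy = float(((boards @ w).argmax(axis=1) == expert_moves).mean())
print(f"move-prediction accuracy on the training set: {accuracy:.2f}")
```

The loss here is the same cross-entropy objective used to train the SL policy network; what differs is only the scale of the model and the realism of the data.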
Another way to look at AlphaGo is rather striking: this kind of neural network was originally used for computer vision. The input to the network is the board itself, so AlphaGo is, in a sense, learning the game by looking at the board.
At this point, we can see that the real artificial intelligence comes from the neural network. Why a particular set of network parameters can exhibit intelligence, I'm afraid nobody knows; what intelligence actually is still awaits an answer.
Deep neural networks are the dawn of artificial intelligence!
Decrypting Google DeepMind's AlphaGo algorithm: where does true AI come from?