"Deep learning" refers to the multilayer artificial neural network and the method of training it. A layer of neural network will take a large number of matrix numbers as input, through the Non-linear activation method weights, and then produce another data set as output. This is like the working mechanism of the biological neural brain, through the appropriate number of matrices, multi-layer organization links together, the formation of neural network "brain" for precise and complex processing, just as people recognize objects to mark pictures.
Although neural networks have existed for decades, only recently did they become practical. Training them means finding good numerical values for all those weight matrices, and for early researchers the minimum amount of training needed to get good results far exceeded the computing power and data available. In recent years, teams with access to massive resources have revived neural networks, using "big data" techniques to train them efficiently.
AlphaGo uses two different neural-network "brains" that cooperate to choose its moves. These brains are multilayer networks structurally similar to the ones Google's image-search engine uses to recognize pictures. They start with several hierarchical layers of two-dimensional filters that process the positions on the Go board, just as an image-classifier network processes an image. After the filtering, 13 further network layers produce judgments about the position they see; roughly speaking, these layers perform classification and logical reasoning.
These networks are trained by repeated iteration: check the results, then adjust the parameters so that the next run does better. The process has a large element of randomness, so it is impossible to know exactly how the network is "thinking"; what is known is that more training makes it better.
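The "adjust the parameters" step is, in spirit, gradient descent. A toy sketch with an illustrative loss and learning rate of my own choosing (not AlphaGo's training code):

```python
def sgd_step(w, grad, lr=0.05):
    """One training update: nudge the weight against the loss gradient."""
    return w - lr * grad

# Toy example: minimize f(w) = (w - 3)^2, whose gradient is 2 * (w - 3).
w = 0.0
for _ in range(200):
    w = sgd_step(w, 2 * (w - 3))
print(w)  # converges toward the minimum at w = 3
```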
First brain: the move picker (policy network)
AlphaGo's first neural-network brain is the supervised-learning policy network. It looks at the board position and tries to find the best next move. More precisely, it estimates a probability for every legal next move, and its first guess is the move with the highest probability. You can think of it as a "move picker".
(How the move picker sees the board: the numbers show where the strongest human players would be most likely to play next.)
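The "probability for every legal move, take the top guess" behavior can be sketched as a softmax over network scores; the softmax formulation and the variable names here are my assumptions for illustration:

```python
import numpy as np

def pick_move(logits, legal_mask):
    """Turn raw per-point network scores into move probabilities, pick the best.

    logits: one score per board point (a flat 19*19 array).
    legal_mask: boolean array, True where a move is legal.
    """
    scores = np.where(legal_mask, logits, -np.inf)  # rule out illegal moves
    probs = np.exp(scores - scores.max())
    probs /= probs.sum()                            # softmax over legal moves
    return int(np.argmax(probs)), probs

# Toy example: random scores on an empty 19x19 board.
rng = np.random.default_rng(1)
logits = rng.normal(size=19 * 19)
legal = np.ones(19 * 19, dtype=bool)
best, probs = pick_move(logits, legal)
print(best, probs[best])
```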
The team trained this brain on millions of moves played by the strongest human players on KGS (an online Go server). This is where AlphaGo most resembles a person: its goal is to learn from the best human players, not to win the game but to produce the same next move a human master would. AlphaGo's move picker correctly matches the human master's move 57% of the time. (A mismatch is not necessarily a mistake by AlphaGo; it may be the human who erred.)
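Supervised training here amounts to treating each human move as a classification label. A minimal sketch of the resulting gradient signal (standard softmax cross-entropy, an assumption rather than a quote from the paper):

```python
import numpy as np

def cross_entropy_grad(probs, human_move):
    """Gradient of the cross-entropy loss with respect to the logits:
    push the predicted distribution toward the move the human played."""
    grad = probs.copy()
    grad[human_move] -= 1.0  # d(loss)/d(logit) = probs - one_hot(target)
    return grad

# Toy example: predictions over 3 moves; the human played move 2.
probs = np.array([0.2, 0.3, 0.5])
print(cross_entropy_grad(probs, 2))  # negative at the target move, positive elsewhere
```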
The AlphaGo system actually trained two additional versions of the move-picker brain. One is the reinforcement-learning policy network, trained on millions of additional simulated games; you can call it the stronger version. Where the basic training merely teaches the network to imitate individual human moves, the advanced training rewards whichever move choices end up winning each simulated game. The Silver team distilled millions of training games into a move picker stronger than any previous version.
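The reinforcement step is essentially a policy gradient: scale the same kind of move-probability gradient by the final game result. A sketch under that simplification (plain REINFORCE with a ±1 reward; baselines and other details omitted, and the function names are mine):

```python
import numpy as np

def reinforce_grad(probs, chosen_move, won):
    """Policy-gradient signal for one move of one self-play game:
    make moves from won games more likely, from lost games less likely."""
    reward = 1.0 if won else -1.0
    grad = probs.copy()
    grad[chosen_move] -= 1.0  # same shape as the supervised gradient...
    return reward * grad      # ...but scaled by the game outcome

# Toy example: the network played move 1 and went on to win the game.
probs = np.array([0.1, 0.6, 0.3])
print(reinforce_grad(probs, 1, won=True))
```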
This move picker alone is already a formidable opponent, somewhere around the level of a strong amateur player and comparable to the strongest Go AIs that came before. The key point is that this move picker does no "reading". It simply examines a single board position and proposes a move from it; it does not simulate any future play at all. This shows the surprising power of plain deep-network learning.
Of course, the AlphaGo team did not stop there. Below I will explain how they gave the AI the ability to read ahead. To do so, they needed a much faster version of the move-picker brain: the stronger versions are fast enough to produce one good move, but far too slow for a reading machinery that must check thousands of move possibilities before making a decision.
The Silver team therefore built a simplified move picker as a "fast reading" version, which they call the rollout network. The simple version does not look at the entire 19x19 board; it considers only a smaller window around the opponent's previous move and the newly proposed move. Cutting away part of the move-picker brain costs some strength, but the lightweight version runs roughly 1,000 times faster, which makes the reading machinery feasible.
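The "smaller window" idea can be pictured as cropping a patch around the point of interest; the window size and zero-padding below are illustrative guesses, not the rollout network's actual feature set:

```python
import numpy as np

def local_window(board, center, size=5):
    """Crop a small window around one board point instead of
    looking at the full 19x19 board."""
    r, c = center
    half = size // 2
    padded = np.pad(board, half)  # zero-pad so windows near the edge still fit
    return padded[r:r + size, c:c + size]

# Toy example: a lone stone at (3, 3) sits at the center of its window.
board = np.zeros((19, 19), dtype=int)
board[3, 3] = 1
print(local_window(board, (3, 3)))
```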
Second brain: the position evaluator (value network)
AlphaGo's second brain answers a different question from the move picker. Instead of guessing the specific next move, it estimates each player's chance of winning given a board position. This "position evaluator" is the value network described in the paper, and it assists the move picker by judging the overall position. The judgment is only approximate, but it greatly speeds up reading. By classifying potential future positions as "good" or "bad", AlphaGo can decide how deeply to read along any particular variation: if the position evaluator says a variation is hopeless, the AI skips reading any further moves along that line.
(How the position evaluator sees the board: darker blue marks the points where playing the next stone leads to a more likely win.)
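In terms of inputs and outputs, the position evaluator maps board features to a single win-probability estimate. A drastically simplified one-layer stand-in (the real value network is a deep convolutional net; this only shows the contract):

```python
import numpy as np

def position_value(features, W, b):
    """Map a position's feature vector to a win-probability estimate in (0, 1)."""
    score = W @ features + b
    return 1.0 / (1.0 + np.exp(-score))  # sigmoid squashes the score to a probability

# Toy example with a made-up 8-number feature vector.
rng = np.random.default_rng(2)
features = rng.normal(size=8)
W, b = rng.normal(size=8), 0.0
print(position_value(features, W, b))  # a probability-like score in (0, 1)
```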
The position evaluator was likewise trained on millions of example positions. The Silver team created them by playing copies of the two strongest AlphaGo move pickers against each other and carefully drawing random samples from those games. Here the AI move picker proved invaluable for efficiently building a large-scale data set to train the position evaluator: it can simulate many plausible continuations from any given position to estimate an approximate win probability. Human game records alone would likely not have sufficed for this training.
With three versions of the move-picking brain plus the position-evaluating brain, AlphaGo can efficiently read future moves. As in most Go AIs, reading is done through the Monte Carlo Tree Search (MCTS) algorithm. But AlphaGo is smarter than the others about choosing which variations to explore and how deeply to explore them.
(Monte Carlo Tree Search algorithm)
With infinite computing power, MCTS could in theory compute the best move by exploring every possible continuation of the game. But the search space of future moves in Go is far too large (larger than the number of particles in the known universe), so in practice no AI can explore every variation. What makes MCTS work better than other approaches is how it identifies the promising variations, so the unpromising ones can be skipped.
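To make the algorithm concrete, here is a minimal generic MCTS in Python. To stay self-contained it plays a trivial take-1-or-2 Nim game instead of Go, and it uses plain random rollouts with UCB selection rather than AlphaGo's network-guided variants; everything beyond the four canonical MCTS steps is an illustrative assumption:

```python
import math, random

# Stand-in game: players alternately take 1 or 2 stones from a pile;
# whoever takes the last stone wins. The MCTS logic itself is generic.

def legal_moves(pile):
    return [m for m in (1, 2) if m <= pile]

class Node:
    def __init__(self, pile, parent=None, move=None):
        self.pile, self.parent, self.move = pile, parent, move
        self.children, self.visits, self.wins = [], 0, 0.0

    def ucb(self, c=1.4):  # upper confidence bound: exploitation + exploration
        return (self.wins / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def rollout(pile):
    """Play random moves to the end; return +1 if the player to move
    at the start of the rollout wins, -1 otherwise."""
    to_move = 0
    while pile > 0:
        pile -= random.choice(legal_moves(pile))
        to_move ^= 1
    return 1 if to_move == 1 else -1  # the player who just moved took the last stone

def mcts(root_pile, n_sims=2000):
    root = Node(root_pile)
    for _ in range(n_sims):
        node = root
        # 1. Selection: descend by UCB while the node is fully expanded.
        while node.children and len(node.children) == len(legal_moves(node.pile)):
            node = max(node.children, key=Node.ucb)
        # 2. Expansion: add one untried child move.
        if node.pile > 0:
            tried = {child.move for child in node.children}
            move = next(m for m in legal_moves(node.pile) if m not in tried)
            node = Node(node.pile - move, parent=node, move=move)
            node.parent.children.append(node)
        # 3. Simulation: random playout (a terminal node means the player
        #    to move there has already lost, hence -1).
        result = rollout(node.pile) if node.pile > 0 else -1
        # 4. Backpropagation: `result` is from the view of the player to move
        #    at `node`; each node credits wins to the player who moved into it.
        while node:
            node.visits += 1
            node.wins += (1 - result) / 2
            result = -result  # flip perspective at each level up
            node = node.parent
    return max(root.children, key=lambda child: child.visits).move

print(mcts(7))  # taking 1 leaves a pile of 6, a losing pile for the opponent
```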
The Silver team built AlphaGo's MCTS system as a set of modules, which lets the designers plug in different ways of evaluating variations. The final full-power AlphaGo system uses all of its brains as follows.
1. From the current board position, choose which next moves to consider. Here they use the basic move-picker brain (they tried the stronger version, but it actually made AlphaGo weaker, because its narrower suggestions left MCTS less room to explore). This picker emphasizes moves that look immediately good without reading ahead, rather than lines that might only pay off later.
2. For each candidate move, there are two ways to evaluate its quality: run the position evaluator on the board that results from the move, or run a deeper Monte Carlo simulation (a "rollout") that plays out the move's future, using the fast-reading move picker to speed up the search. AlphaGo weighs the two estimates with a single parameter, the "mixing coefficient". Full-power AlphaGo uses a 50/50 mix, balancing the position evaluator against the simulated rollouts; a sketch of this mixing follows below.
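The 50/50 mix itself is just a weighted average of the two estimates (in the paper's notation, a mixing parameter lambda = 0.5; the variable names below are mine):

```python
def leaf_value(value_net_estimate, rollout_result, mix=0.5):
    """Blend the position evaluator's estimate with the rollout outcome."""
    return (1 - mix) * value_net_estimate + mix * rollout_result

# Toy example: the value net gives a 0.62 win probability and the
# fast rollout from this position ended in a win (1.0).
print(leaf_value(0.62, 1.0))  # 0.81
```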
AlphaGo's strength changes depending on which of these plug-ins are used in the steps above. With each brain operating alone, AlphaGo is roughly as strong as the best earlier computer Go AIs; with the methods combined, it reaches the level of professional human players.
(How AlphaGo's strength varies with which MCTS plug-ins are used.)
Engineering optimizations, such as distributing the computation across networked machines, raise MCTS speed, but none of them change the underlying algorithm, which is exact in some parts and approximate in others. AlphaGo does get stronger with more computing power, but each added unit of computation yields a smaller gain as performance grows.
I think AlphaGo will be very tough in small-scale tactics. It has learned to find the best human responses across many positions and patterns, so it is unlikely to make obvious mistakes within a bounded tactical situation.
However, AlphaGo may have a weakness in global judgment. It sees the board through a pyramid of 5x5 filters, which makes it hard to integrate tactical fragments into an overall strategy, for the same reason that image-classifying networks often confuse one object with another. For example, a joseki in one corner can create a wall or a ladder breaker that drastically changes the value of a position in another corner.
Like other MCTS-based AIs, AlphaGo may struggle with judgments that require very deep reading, such as the life and death of a big dragon, or a ko fight. It may also misjudge deliberately odd-looking but playable positions, such as a tengen opening or other unusual josekis, because so much of its training is based on human game records.
I am still looking forward to the match between AlphaGo and Lee Sedol (9 dan)! My prediction: if Lee sticks to standard josekis, playing as he would against another professional, he may lose; but if he can drag AlphaGo into unfamiliar situations, he may win.