Why AlphaGo Is Really Such a Big Deal

Reposted from: http://www.jianshu.com/p/157a15de47df

By Michael Nielsen. Original article: https://www.quantamagazine.org/20160329-why-alphago-is-really-such-a-big-deal/

The Go program captures elements of human intuition, an advance that could have far-reaching consequences.

In 1997, IBM's Deep Blue system defeated the world chess champion, Garry Kasparov. At the time, the victory was widely regarded as a milestone in the development of AI. But Deep Blue's techniques were useful only for chess; they did not generalize, and they did not revolutionize computer science.

Is there anything special about AlphaGo, which recently defeated Lee Sedol, one of the strongest Go players in history?

I believe the answer is yes, but not for the reasons you may have heard. Many articles point out that Go is harder than chess, which makes the victory seem more impressive. Others say that we did not expect machines to beat humans at Go for another ten years, so this is a major breakthrough. Some articles offer the (correct) observation that there are far more possible positions in Go than in chess, but they do not explain why this should be a harder problem for machines than for humans.

In other words, none of these arguments addresses the core question: do the technical advances behind AlphaGo have broader implications? To answer that, we must recognize how those advances differ qualitatively from the techniques that made Deep Blue successful, and why they matter more.

In chess, beginning players are taught the values of the pieces. In one common system, a knight or a bishop is worth three pawns. A rook, with its greater range of movement, is worth five pawns. The queen, with the greatest range of all, is worth nine pawns. The king has infinite value, since losing the king means losing the game.

You can use these values to assess potential moves. Give up a bishop in order to capture an opponent's rook? That is usually a good idea. Give up a knight and a bishop in exchange for a rook? Not such a good idea.
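
To make the arithmetic concrete, here is a minimal Python sketch of such a material-count evaluation. The piece encoding and the `material_balance` helper are illustrative inventions for this article, not any real engine's code:

```python
# Standard piece values, in units of pawns; the king is omitted because
# its value is effectively infinite.
PIECE_VALUES = {"P": 1, "N": 3, "B": 3, "R": 5, "Q": 9}

def material_balance(pieces):
    """Score a position from White's point of view.

    `pieces` is a list of piece codes: uppercase for White,
    lowercase for Black, e.g. ["R", "n", "p"].
    """
    score = 0
    for piece in pieces:
        value = PIECE_VALUES.get(piece.upper(), 0)
        score += value if piece.isupper() else -value
    return score

# White has a rook, Black has a bishop: White is two pawns ahead.
print(material_balance(["R", "b"]))  # 5 - 3 = 2
```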

This concept of value is crucial in computer chess. Most computer chess programs search through millions or billions of combinations of moves and countermoves. The goal of the program is to find a sequence of moves that maximizes the value of the final board position, no matter what the opponent plays.
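
What does "maximize the final value, no matter what the opponent plays" mean in practice? Here is a plain minimax search, a minimal sketch with a made-up toy game; the game-specific functions are passed in as parameters, and real chess engines add alpha-beta pruning and many refinements:

```python
def minimax(state, depth, maximizing, moves, apply_move, evaluate):
    """Best achievable evaluation from `state`, assuming the opponent
    also plays optimally (minimizing when it is their turn)."""
    options = moves(state)
    if depth == 0 or not options:
        return evaluate(state)
    values = [minimax(apply_move(state, m), depth - 1, not maximizing,
                      moves, apply_move, evaluate)
              for m in options]
    return max(values) if maximizing else min(values)

# Demo on a trivial game: players alternately add 1 or 2 to a counter,
# and the evaluation of the final state is just the counter itself.
best = minimax(0, 3, True,
               moves=lambda s: [1, 2],
               apply_move=lambda s, m: s + m,
               evaluate=lambda s: s)
print(best)  # 5: max picks 2, min picks 1, max picks 2
```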

Early chess programs evaluated board positions using simple rules like those given above, but later programs used much more detailed chess knowledge. Deep Blue combined more than 8,000 different factors in its board-evaluation function. Deep Blue did not simply say that one rook equals five pawns. If a pawn of the same color is ahead of the rook, the pawn restricts the rook's range of movement, making the rook a little less valuable. If, however, the pawn is "levered", meaning it can move out of the rook's way by capturing an enemy pawn, Deep Blue treats the pawn as semi-transparent and does not reduce the rook's value as much.

Relying on detailed knowledge of this kind was crucial to Deep Blue. According to the team's technical report, this notion of the semi-transparent levered pawn played a key role in the second game of the match against Kasparov.

In the end, the Deep Blue team relied on two main ideas. The first was to build an evaluation function that incorporated a great deal of detailed chess knowledge to score any given board position. The second was to use massive computing power to evaluate huge numbers of possible positions, choosing the move that would force the best possible final board position.

So how does this strategy fare in Go?

With this strategy you quickly run into difficulty. The problem lies in how to evaluate a board position. Top Go players rely heavily on intuition to judge the quality of a particular position. They will, for example, make vague statements about a position having "good shape". It is not at all clear how to express such intuition in the simple, well-defined rules that work for chess.

Now, you might think it would merely take a great deal of time and effort to produce good rules for evaluating Go positions. Unfortunately, no obvious route to success like that of chess ever appeared, and Go programs remained relatively weak. Things changed in 2006 with the advent of Monte Carlo Tree Search (MCTS), an algorithm based on a cleverer way of randomly simulating games. But this approach still fell far short of strong human players. It seemed that an intuitive feel for the state of the board was the key to success.
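
To give a flavor of the 2006 idea, here is a minimal Python sketch of the random-playout estimate at the heart of MCTS. Full MCTS also grows a search tree and biases playouts toward promising branches (the UCB rule); this flat version, with a made-up toy game, shows only the core simulation step:

```python
import random

def rollout_value(state, moves, apply_move, score, n_sims=1000):
    """Estimate a position's value by averaging random playouts."""
    total = 0.0
    for _ in range(n_sims):
        s = state
        while moves(s):                       # play random moves to the end
            s = apply_move(s, random.choice(moves(s)))
        total += score(s)                     # outcome of the finished game
    return total / n_sims

# Toy game: add 1 or 2 per turn; landing exactly on 10 scores a win.
value = rollout_value(0,
                      moves=lambda s: [1, 2] if s < 10 else [],
                      apply_move=lambda s, m: s + m,
                      score=lambda s: 1.0 if s == 10 else 0.0)
print(value)  # the fraction of random games that end exactly on 10
```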

What is new and important about AlphaGo is that its designers have figured out a way of capturing something like that intuition.

To explain how it works, let me describe the AlphaGo system as laid out in the paper the AlphaGo team published this January. (Some details of that system differ from the version that played Lee Sedol, but the main principles are the same.)

AlphaGo took 150,000 games played by good human players and used an artificial neural network to discover patterns in them. In particular, it learned to predict, for any given position, the probability that a human player would choose each possible move. AlphaGo's designers then improved this network by playing it repeatedly against earlier versions of itself, adjusting the parameters so that the network gradually became more likely to win.

So how did this policy network learn to predict good moves?

In short, a neural network is a very complicated mathematical model with millions of adjustable parameters that change the model's behavior. When I say the network "learned", I mean that the computer kept making tiny adjustments to the parameters, trying to find a way to make correspondingly tiny improvements in its play. In the first stage of learning, the network tried to raise the probability of making the same move as the human player. In the second stage, it tried to raise the probability of winning a game of self-play. This sounds crazy, repeatedly making tiny tweaks to an enormously complicated function, but with enough training time and enough computing power the network gets quite good. And here is a peculiar thing: nobody really understands why it gets good, because the improvements are the result of billions of tiny automatic adjustments.
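
Here is a heavily simplified numpy sketch of what those tiny adjustments look like in the first stage of training. A single linear softmax layer over a toy feature vector stands in for AlphaGo's deep convolutional policy network; the sizes and data are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_moves = 8, 4                 # toy sizes, nothing like Go's
W = rng.normal(scale=0.1, size=(n_moves, n_features))

def move_probs(x, W):
    """Softmax distribution over candidate moves for position features x."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def sgd_step(x, human_move, W, lr=0.1):
    """One tiny adjustment: make the human's actual move more probable
    (gradient descent on the cross-entropy loss)."""
    p = move_probs(x, W)
    grad = np.outer(p - np.eye(n_moves)[human_move], x)
    return W - lr * grad

x, human_move = rng.normal(size=n_features), 2
print(move_probs(x, W)[human_move])        # before training
for _ in range(200):                       # billions of such steps, in reality
    W = sgd_step(x, human_move, W)
print(move_probs(x, W)[human_move])        # probability has gone up
```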

After these two stages of training, the policy network could play a decent game of Go, perhaps at the level of a good human amateur. But it was still far from professional strength. In a sense, it chose moves without searching through future lines of play and without evaluating the resulting board positions. To get beyond amateur level, AlphaGo needed a way to estimate the value of a board position.

To get over this hurdle, the designers developed AlphaGo's core idea: play the policy network against itself to estimate how likely a given board position is to lead to a win. That probability of winning provides a way of valuing a position. (In practice, AlphaGo uses a somewhat more complex implementation of this idea.) AlphaGo then combined this valuation with a search through many possible lines of play, biasing the search toward moves the policy network considered likely, and picking the move that forced the position with the highest valuation.
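
A greatly simplified one-ply sketch of how the pieces fit together: the policy network supplies prior probabilities that focus the search on plausible moves, and the learned win-probability estimate scores the resulting positions. In the real system this happens inside Monte Carlo Tree Search (the PUCT selection rule); every function here is a hypothetical stand-in:

```python
def choose_move(state, moves, apply_move, prior, win_prob, top_k=3):
    """Search only the moves the policy considers plausible, then pick
    the one whose resulting position has the highest estimated win
    probability."""
    candidates = moves(state)
    shortlist = sorted(candidates, key=lambda m: prior(state, m),
                       reverse=True)[:top_k]
    return max(shortlist, key=lambda m: win_prob(apply_move(state, m)))

# Toy demo with made-up numbers: move "d" has the best value estimate,
# but the policy prior prunes it away before it is ever evaluated.
print(choose_move("start",
                  moves=lambda s: ["a", "b", "c", "d"],
                  apply_move=lambda s, m: m,
                  prior=lambda s, m: {"a": .5, "b": .3, "c": .15, "d": .05}[m],
                  win_prob=lambda p: {"a": .4, "b": .7, "c": .8, "d": .9}[p]))
# -> "c"
```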

Notice that AlphaGo did not begin with an evaluation system built from masses of detailed Go knowledge, the way Deep Blue did for chess. Instead, by analyzing many thousands of prior human games and playing a great many games against itself, AlphaGo built up its policy network through billions of tiny improvements. The policy network, in turn, helped AlphaGo construct a valuation that captures something very similar to what a good human player means by intuition about different board positions.

In this way, AlphaGo is much more radical than Deep Blue. Since the earliest days of computing, computers have been used to search for ways of optimizing known functions. Deep Blue's approach was just that: a search aimed at optimizing a function whose form, although complex, mostly expressed existing chess knowledge. The search was certainly done cleverly, but it was not so different from many programs of the 1960s.

AlphaGo also uses the search-and-optimization idea, though its search is somewhat cleverer. What is new and unusual is the prior stage, in which a neural network learns a function that helps capture what a good board position looks like. It was by combining those two stages that AlphaGo became as strong as it is.

This ability to reproduce intuitive pattern recognition is a big deal. It is also part of a broader trend. In an earlier paper, the same Google DeepMind team built a neural network that learned to play 49 classic Atari 2600 video games, in many cases reaching a level surpassing expert human players. The conservative approach to such a problem would have been Deep Blue's: human programmers analyze each game and work out detailed control strategies for it.

By contrast, DeepMind's neural network simply explored lots of ways of playing. At first it played much like a human beginner, very badly, flailing around blindly. But occasionally the network would hit upon a few clever moves. It learned to recognize good patterns of play, that is, patterns that earn a high score, in much the same way that AlphaGo learned to recognize good board positions. When that happened, the network reinforced the behavior and steadily improved its play.
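
DeepMind's actual Atari system was a deep Q-network; as a minimal sketch of the same learn-from-score principle, here is tabular Q-learning on a made-up two-action "game" in which one action tends to pay off:

```python
import random

random.seed(0)
Q = {0: 0.0, 1: 0.0}            # estimated long-run score of each action

for step in range(500):
    # Mostly exploit the best-looking action, occasionally explore,
    # much like a beginner stumbling onto a clever move.
    if random.random() < 0.1:
        action = random.choice([0, 1])
    else:
        action = max(Q, key=Q.get)
    reward = 1.0 if action == 1 else 0.0     # the score is the only feedback
    Q[action] += 0.1 * (reward - Q[action])  # reinforce what scores well

print(Q)  # Q[1] approaches 1.0: the high-scoring behavior was reinforced
```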

This ability to acquire intuitions and recognize patterns has been used in other contexts as well. In 2015, Leon Gatys, Alexander Ecker and Matthias Bethge posted a paper to the arXiv describing how a neural network can learn an artistic style and then apply that style to other images. The idea is simple: the network is shown many images and acquires the ability to recognize images with similar styles. It can then apply the style information to new images, producing, for example, a rendering of a photograph of the Eiffel Tower in the style of a van Gogh painting.


Figure: a neural-art example

It may not be great art, but it is a striking example of using a neural network to capture an intuition and apply it in other domains.
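
The central quantity in the Gatys et al. method is the Gram matrix of a network layer's feature maps, which records which features tend to occur together and thereby captures style while discarding layout. Here is a numpy sketch; the random arrays stand in for the activations a pretrained convolutional network would actually produce:

```python
import numpy as np

def gram_matrix(features):
    """features: (channels, positions) activations of one network layer."""
    return features @ features.T / features.shape[1]

def style_loss(generated, style):
    """Mean squared difference between Gram matrices. Synthesis works by
    adjusting the generated image's pixels, by gradient descent, until
    this loss (summed over several layers) is small."""
    diff = gram_matrix(generated) - gram_matrix(style)
    return float(np.mean(diff ** 2))

rng = np.random.default_rng(0)
style_feats = rng.normal(size=(16, 100))     # stand-in for painting features
gen_feats = rng.normal(size=(16, 100))       # stand-in for generated image
print(style_loss(gen_feats, style_feats))    # driven toward zero in synthesis
print(style_loss(style_feats, style_feats))  # identical styles score 0.0
```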

Over the past few years, neural networks have been used to capture intuition and recognize patterns in many domains. Many projects using these networks have appeared, tackling tasks such as recognizing artistic style or developing strategies for video games. But there are also striking examples in very different domains, including audio and natural language.

Because of this versatility, I see AlphaGo not as a revolutionary breakthrough in itself, but rather as the leading edge of an extremely important development: the ability to build systems that capture intuition and learn to recognize patterns. Computer scientists attempted this for decades without making much progress. Now the success of neural networks has potentially expanded the range of problems we can attack with computers.

It would be dangerous, amid the cheering, to conclude that general artificial intelligence is just a few years away. Suppose, for the sake of argument, that you divide thinking into two categories: the logical operations computers can already perform, and "intuition". If you view AlphaGo and similar systems as evidence that computers can now mimic intuition, it seems that all the necessary ingredients are in place: computers can do both logic and intuition, so surely general AI cannot be far off!

But there is a rhetorical error here: we lump a great many different mental activities under the word "intuition". Just because neural networks can capture certain kinds of intuition, it does not follow that they work for every kind. Neural networks may turn out to be of little help on other tasks we think of as requiring intuition.

In fact, our understanding of neural networks is lacking in many important ways. For example, a 2014 paper described certain "adversarial examples" that can fool neural networks. The authors began with a well-performing neural network model, one that appeared to have genuine pattern-recognition ability. But they showed that the network could be fooled by making tiny changes to an image. In the example below, the network correctly identifies the puppy on the left, but when the small perturbation shown in the middle is added, it misclassifies the image on the right.


Figure: an adversarial example
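
Here is a toy numpy sketch of how such fooling perturbations can be found, in the spirit of the fast gradient sign method from the adversarial-examples literature: nudge every input dimension slightly in the direction that increases the model's loss. A logistic-regression "network" stands in for a real image classifier:

```python
import numpy as np

rng = np.random.default_rng(1)
w = rng.normal(size=16)                    # a "trained" model's weights

def p_dog(x):
    """The model's confidence that x belongs to class 'dog' (sigmoid)."""
    return 1.0 / (1.0 + np.exp(-w @ x))

x = 0.5 * np.sign(w)                       # an input the model is sure about
print(p_dog(x))                            # high confidence: "dog"

# Gradient-sign step: for this model the loss gradient points along -w,
# so move every coordinate slightly against the evidence. The step size
# is exaggerated so the effect shows up in a 16-dimensional toy; for
# high-dimensional images a visually imperceptible step can suffice.
eps = 0.6
x_adv = x - eps * np.sign(w)
print(p_dog(x_adv))                        # confidence drops below one half
```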

Another limitation of existing systems is that they often need to learn from enormous numbers of human examples. AlphaGo, for instance, learned from 150,000 human games. That is a lot of games! Humans, by contrast, can learn a great deal from far fewer games. Similarly, networks that recognize or manipulate images typically need millions of sample images, each annotated with label information. So an important challenge is to build systems that learn well from smaller amounts of human-supplied data.

Systems such as AlphaGo are genuinely exciting. We have learned to use computers to reproduce at least some forms of human intuition. Great challenges now lie ahead: extending the range of intuitions computers can capture, making the systems more robust, understanding how and why they work, and learning better ways of combining them with existing computer systems. Might we soon learn to capture the intuitive judgment that goes into constructing a mathematical proof, writing a story, or giving a good explanation? These are the most promising of times for artificial intelligence.
