Research on the hardware configuration and algorithms of AlphaGo in the man-machine match


AlphaGo's hardware configuration

The match between AlphaGo and Lee Sedol has recently been in full swing. Lee Sedol's "divine move" in the fourth game is beyond the scope of this article; here we focus on AlphaGo's hardware configuration:

AlphaGo has multiple versions, the strongest of which is the distributed version. According to the paper DeepMind published in Nature in January 2016, AlphaGo Distributed uses 1,202 CPUs and 176 GPUs and can run 40 search threads.

Wikipedia lists the hardware configurations of the various AlphaGo versions:

The last column is the Elo rating, which represents each version's playing strength at the time the paper was submitted (November 2015).

Below is the global rating list released by Goratings on March 14, 2016, in which AlphaGo appears in fourth place; it gives a sense of the strength of the various AlphaGo versions.

Google has not officially clarified the hardware configuration. According to various reports, the distributed AlphaGo that played Lee Sedol used one of the following two configurations:

- 1,920 CPUs + 280 GPUs, with 64 search threads

- 1,202 CPUs + 176 GPUs, with 40 search threads

A Twitter user posted a chart comparing the two sides' hardware. Seen that way, it is understandable that the Korean public protested that the match was unfair to Lee Sedol.

AlphaGo's algorithm structure

This article tries to explain AlphaGo's algorithm as simply as possible, to show how AlphaGo actually plays.

AlphaGo's overall technical architecture, summed up in one sentence: deep convolutional neural networks (CNNs) combined with Monte Carlo tree search (MCTS).

Deep learning is used to train two move-selection policies and one position-evaluation model. The three networks share essentially the same architecture; only their parameters differ.
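
To make the shared structure concrete, below is a minimal sketch, assuming PyTorch (DeepMind has not released its code, so everything here is illustrative): one convolutional trunk that can carry either a policy head, which outputs a probability for each of the 361 board points, or a value head, which outputs a single score. The real networks in the Nature paper are deeper (around 13 convolutional layers over 48 input feature planes); this toy version only shows the shape of the idea.

    # Illustrative sketch only, not DeepMind's code: a shared conv trunk
    # with either a policy head (one probability per board point) or a
    # value head (a single score). Assumes PyTorch.
    import torch
    import torch.nn as nn

    BOARD = 19  # Go is played on a 19x19 board

    class GoNet(nn.Module):
        def __init__(self, planes=48, width=64, policy=True):
            super().__init__()
            # Shared trunk: the SL, RL, and value networks all look
            # roughly like this; only the trained parameters differ.
            self.trunk = nn.Sequential(
                nn.Conv2d(planes, width, 5, padding=2), nn.ReLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
                nn.Conv2d(width, width, 3, padding=1), nn.ReLU(),
                nn.Flatten(),
            )
            self.policy = policy
            if policy:
                # Policy head: a distribution over the 361 possible moves.
                self.head = nn.Linear(width * BOARD * BOARD, BOARD * BOARD)
            else:
                # Value head: one score for the whole position, in (-1, 1).
                self.head = nn.Sequential(
                    nn.Linear(width * BOARD * BOARD, 1), nn.Tanh())

        def forward(self, x):
            out = self.head(self.trunk(x))
            return torch.softmax(out, dim=1) if self.policy else out

    # One position encoded as 48 binary feature planes on a 19x19 board.
    probs = GoNet()(torch.zeros(1, 48, BOARD, BOARD))
    print(probs.shape)  # torch.Size([1, 361]): a distribution over moves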

The two move policies are SL (the supervised learning policy network) and RL (the reinforcement learning policy network).

The SL move policy learns from records of human games: given the current board position, it imitates how a human would choose the next move. This is purely learning from human playing experience, and its training goal is: given a position, where would a human play next? In other words, the SL policy learns to pick the next move the way a human would.

(The numbers indicate the probability that a human player would play at each position.)
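
As a rough illustration of that training signal, the sketch below (reusing the hypothetical GoNet above, with made-up data) performs one supervised step: raise the probability of the move the human actually played in each recorded position.

    # Sketch of one SL training step on toy data (assumes the GoNet
    # sketch above). The target is simply the move a human played.
    import torch
    import torch.nn.functional as F

    sl_net = GoNet()
    opt = torch.optim.SGD(sl_net.parameters(), lr=0.01)

    positions = torch.zeros(8, 48, 19, 19)     # 8 board positions (toy data)
    human_moves = torch.randint(0, 361, (8,))  # index of the human's move

    probs = sl_net(positions)                  # (8, 361) move probabilities
    # Maximize the likelihood of the human's move (cross-entropy).
    loss = F.nll_loss(torch.log(probs + 1e-9), human_moves)
    opt.zero_grad()
    loss.backward()
    opt.step()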

The RL move policy learns through AlphaGo playing against itself. It is an improved model built on the SL policy: its initial parameters are those learned by the SL policy, i.e. it takes the SL policy as its starting point and then evolves a stronger self through self-play. Its training goal is different: instead of merely learning the next move, two copies of AlphaGo keep playing until the game is decided, and the RL policy's parameters are then adjusted according to that outcome. In this way RL learns to find sequences of moves that win the whole game, rather than just predicting the next move as the SL policy does.
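
The update rule this describes is a policy gradient. Here is a minimal sketch, again reusing the hypothetical GoNet and stubbing the self-play game with random data: every move played in a won game gets its probability pushed up, every move from a lost game gets pushed down.

    # Sketch of one RL (policy-gradient) update on a stubbed self-play
    # game (assumes the GoNet sketch above).
    import torch

    rl_net = GoNet()   # in AlphaGo, initialized from the SL policy's weights
    opt = torch.optim.SGD(rl_net.parameters(), lr=0.001)

    # Pretend one self-play game produced 30 (position, move) pairs ...
    positions = torch.zeros(30, 48, 19, 19)
    moves = torch.randint(0, 361, (30,))
    z = 1.0            # ... and ended in a win (+1) or a loss (-1)

    log_probs = torch.log(rl_net(positions) + 1e-9)
    chosen = log_probs[torch.arange(30), moves]  # log-prob of each move played
    loss = -(z * chosen).mean()  # reward-weighted likelihood: win => reinforce
    opt.zero_grad()
    loss.backward()
    opt.step()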

The position evaluator (the value network) uses a similar deep network structure, except that it learns not where to play next but, given a board position, how likely that position is to end in a win. Its input is a board position and its output is a single score: the higher the score, the more likely a win from that position.

(How the value network assesses the board: dark blue indicates the positions where playing next is most likely to win.)
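
Its training signal can be sketched as a plain regression (again with the hypothetical GoNet and toy data): each position is pushed toward the final outcome of the game it came from, +1 for a win and -1 for a loss.

    # Sketch of one value-network training step on toy data (assumes the
    # GoNet sketch above, with the value head instead of the policy head).
    import torch
    import torch.nn.functional as F

    value_net = GoNet(policy=False)
    opt = torch.optim.SGD(value_net.parameters(), lr=0.01)

    positions = torch.zeros(16, 48, 19, 19)  # toy positions
    # Final result of the game each position came from: +1 win, -1 loss.
    outcomes = torch.where(torch.rand(16, 1) > 0.5,
                           torch.tensor(1.0), torch.tensor(-1.0))

    pred = value_net(positions)              # scores in (-1, 1) via tanh
    loss = F.mse_loss(pred, outcomes)        # regress toward the outcome
    opt.zero_grad()
    loss.backward()
    opt.step()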

With these three deep-learning models in hand, AlphaGo plugs them into Monte Carlo tree search. The overall framework is still a Monte Carlo search tree, but several steps of the search now use the learned move policies and the position evaluator.
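
Concretely, the learned models enter the search in two places, sketched below following the selection and leaf-evaluation rules described in the Nature paper (the node bookkeeping and the numbers are toy stand-ins): the SL policy supplies a prior P(s,a) that steers which branch to explore, and each leaf is scored by mixing the value network's judgment with the outcome of a fast rollout.

    # Self-contained sketch of the two formulas that tie the networks
    # into Monte Carlo tree search. Toy data; not DeepMind's code.
    import math
    import random
    from dataclasses import dataclass

    C_PUCT = 5.0   # exploration constant
    LAMBDA = 0.5   # the paper mixes value net and rollout result 50/50

    @dataclass
    class Node:
        prior: float      # P(s,a): the SL policy's probability for this move
        visits: int = 0   # N(s,a): how often the search tried this move
        q: float = 0.0    # Q(s,a): mean evaluation of this subtree

    def select_child(children):
        """Selection: argmax of Q(s,a) + u(s,a), where
        u = C_PUCT * P * sqrt(sum_b N(b)) / (1 + N(a)).
        The prior steers search toward human-plausible moves."""
        total = sum(c.visits for c in children)
        return max(children, key=lambda c:
                   c.q + C_PUCT * c.prior * math.sqrt(total) / (1 + c.visits))

    def evaluate_leaf(value_score, rollout_outcome):
        """Evaluation: blend the value network's score for the leaf
        position with the result z of a fast rollout to game end."""
        return (1 - LAMBDA) * value_score + LAMBDA * rollout_outcome

    # Toy usage: three candidate moves with priors from the policy net.
    children = [Node(prior=0.5, visits=10, q=0.2),
                Node(prior=0.3, visits=2, q=0.1),
                Node(prior=0.2, visits=0, q=0.0)]
    best = select_child(children)
    leaf_value = evaluate_leaf(0.3, random.choice([1.0, -1.0]))
    # The backup step would then update N and Q along the search path.
    print(best.prior, round(leaf_value, 2))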

In the fourth game between AlphaGo and Lee Sedol, when Lee played his "divine move" at move 78, Google DeepMind's Demis Hassabis tweeted:

@demishassabis · 26 minutes ago

Lee Sedol is playing brilliantly! #AlphaGo thought it was doing well, but got confused on move 87. We are in trouble now ...

@demishassabis · 7 minutes ago

Mistake was on move 79, but #AlphaGo only came to that realisation on around move 87

Simply put, AlphaGo did not immediately recognize the threat of move 78; only around move 87 did it find that its win rate had dropped. This means that for several moves AlphaGo did not know its position had already gone bad, and Lee's continuation had not shown up in its earlier search results.

This is not really a bug in AlphaGo; rather, its win-rate estimate fell later than it should have, which indicates that the value network and the policy network have yet to be perfected.

References:

https://www.dcine.com/2016/01/28/alphago/

http://www.afenxi.com/post/8713

http://geek.csdn.net/news/detail/59308

http://blog.csdn.net/malefactor/article/details/50631180

http://www.leiphone.com/news/201603/Q1cWFZjnGl1wc4m1.html
