The Monte Carlo Method in Game (Go) AI

Last updated: 2018-12-03. Source: Internet



Translator's note: I came across a recommendation for this article in the aigamedev.com RoundUp for week 17 of 2008. Because of my interest in Chinese chess and computer game playing, I read it carefully even though I know nothing about Go or Go AI, and I found that its content differs from what I had previously learned about computer game playing. I translated it into Chinese to understand it better. My English is poor, so corrections are welcome. The author also appears to be Chinese, and I hope he will write up his findings in Chinese in the future.

＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝＝

Monte Carlo Method in Game AIs

Friday, April 25th, 2008 12:28 pm
Written by: qqqchn
Translated by: 賴勇浩 (戀花蝶)
Original article: http://expertvoices.nsdl.org/cornell-cs322/2008/04/25/monte-carlo-method-in-game-ais/

As many of my classmates have posted, the Monte Carlo method isn’t a single method but an entire class of methods that take random samples to find a result. An interesting application my partner and I found for the Monte Carlo method was in one of the Go AIs we made for one of our other projects. (Go is an ancient Chinese board game that is still very popular today in East Asia; the rules and details are covered in the citations at the end of this post.)

One of the reasons we chose the Monte Carlo method is that the immense number of possible moves in Go makes the Minimax algorithm far too computationally intensive when looking more than 2 or 3 moves ahead. (Minimax is one of the more common methods for finding the next “best” move in game AIs such as chess programs; it alternately maximizes and minimizes a player's score up to a certain depth.) Looking only 4 moves ahead on a mere 9×9 board takes about 81^4 board evaluations, which is more than 4 million (81^4 = 43,046,721). An interesting quote illustrating the computational intensity of Go on a full 19×19 board is that “the number of possible GO games far exceeds the number of atoms in the universe”. Interesting facts: the lower bound on the number of possible Go games on a 19×19 board is about 10^(10^48), and the upper bound is 10^(10^171).

So another way we evaluated how “good” a move is was to use the Monte Carlo method. To estimate how good or bad a certain move is for a given board position, the method plays “virtual games” showing what would happen if two random AIs (AIs playing completely randomly) played on from that move. It starts from the given board position and plays each viable move in a fixed number of games, with all subsequent moves being completely random. After all the “virtual games” are finished, we average the scores of the games that began with each move and let that average represent the “goodness” of the move. Finally, the Monte Carlo AI chooses the move with the highest average score and plays it in the actual game, on the assumption that moves which score better over a large number of random games are “better” moves in general.

For our project, we let our AI play about 500 virtual games for each move, which can take a while on slower computers, but it is still far faster than using the Minimax algorithm to look just 4 moves ahead (just over 1 million evaluations compared with more than 4 million). In addition, the results of the Monte Carlo AI are pretty good: it can generally defeat most of our other AIs (the Minimax AI looking 2 or 3 moves ahead and the random AIs), and it even put up a decent fight against some beginner human players.
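The procedure described above (try each candidate move, finish the game with random moves, average the scores, pick the best average) can be sketched in a few lines of Python. This is only an illustration under loud assumptions, not the authors' code: the board is a hypothetical 5×5 grid, stones are never captured, each playout stops after `max_moves` random placements, and `score` is a simplified nearest-stone count that leaves ties unowned. All function names are invented for this sketch.

```python
import random
from typing import List, Optional, Tuple

SIZE = 5  # assumption: a tiny board keeps the sketch fast (the post uses 9x9)
Board = List[List[int]]  # 0 = empty, 1 = black, -1 = white
Move = Tuple[int, int]


def empty_points(board: Board) -> List[Move]:
    """All points where a stone can still be placed (no capture or ko rules here)."""
    return [(r, c) for r in range(SIZE) for c in range(SIZE) if board[r][c] == 0]


def score(board: Board, player: int) -> int:
    """Points owned by `player` minus points owned by the opponent.

    A point belongs to whoever has the nearest stone (Manhattan distance).
    Ties are left unowned, which is a simplification made for this sketch.
    """
    stones = [(r, c, board[r][c])
              for r in range(SIZE) for c in range(SIZE) if board[r][c] != 0]
    if not stones:
        return 0
    total = 0
    for r in range(SIZE):
        for c in range(SIZE):
            dists = [(abs(r - sr) + abs(c - sc), owner) for sr, sc, owner in stones]
            nearest = min(d for d, _ in dists)
            owners = {owner for d, owner in dists if d == nearest}
            if len(owners) == 1:
                total += owners.pop() * player  # +1 if ours, -1 if the opponent's
    return total


def random_playout(board: Board, to_move: int, max_moves: int,
                   rng: random.Random) -> Board:
    """Play up to `max_moves` uniformly random placements, alternating colours."""
    board = [row[:] for row in board]  # work on a copy
    for _ in range(max_moves):
        moves = empty_points(board)
        if not moves:
            break
        r, c = rng.choice(moves)
        board[r][c] = to_move
        to_move = -to_move
    return board


def monte_carlo_move(board: Board, player: int, n_games: int = 50,
                     max_moves: int = 10,
                     rng: Optional[random.Random] = None) -> Optional[Move]:
    """Return the move with the highest average playout score, or None if no move exists."""
    rng = rng or random.Random()
    best_move, best_avg = None, float("-inf")
    for move in empty_points(board):
        after = [row[:] for row in board]
        after[move[0]][move[1]] = player
        total = sum(score(random_playout(after, -player, max_moves, rng), player)
                    for _ in range(n_games))
        avg = total / n_games
        if avg > best_avg:
            best_move, best_avg = move, avg
    return best_move
```

For example, `monte_carlo_move([[0] * SIZE for _ in range(SIZE)], 1, rng=random.Random(0))` picks an opening move for black on an empty board. Passing a seeded `random.Random` makes a run repeatable, while the default unseeded generator gives the run-to-run variation the post mentions. Raising `n_games` toward the 500 used in the post reduces that variation at a proportional cost in time.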

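This article was first published on the blog of 賴勇浩 (戀花蝶), http://blog.csdn.net/lanphaday. If you repost it, please keep the full text intact; commercial use without permission is not allowed.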

Worth noting is that one very important factor in how well the Monte Carlo method works here is the scoring function used to decide a player’s score for a given board position. The one we used is very straightforward: it assigns each empty spot to whoever has the closest stones to that spot, with ties broken by the number of stones near it. This isn’t the most accurate or effective scoring method, but it worked well enough for our purposes.

The AI we developed using Monte Carlo methods was one of the better AIs we made, but it is still nowhere near the capabilities of a decently experienced amateur human player. In particular, the AI starts losing out near the end game, when tactics mean a lot more than overall strategy (which Monte Carlo and Minimax seem to do well at). And because we use random moves to play each “virtual game”, we can get very different results each time we play, especially near the end game, where the results of moves really depend on the quality of subsequent moves, which in this case are completely random.

Go is considered by many to be the most complicated game we know of to date, and it is very unlikely that we will come even marginally close to solving the game anytime soon (want to even try writing out 10^(10^48)?). But it seems equally unlikely that people will give up trying anytime soon, as human tenacity in the face of other “insurmountable” odds has shown in the past (landing on the Moon…).

NOTE: when I say “random” in this post, I naturally mean the pseudorandom number generators computers use, which aren’t really random but were more than close enough for our project.

CITATIONS:
http://en.wikipedia.org/wiki/Monte_Carlo_method
http://en.wikipedia.org/wiki/Go_%28board_game%29
http://en.wikipedia.org/wiki/Go_complexity
http://en.wikipedia.org/wiki/Minimax
GO AI Project CS478 (Gordon Briggs, Qin Chen) - unfortunately not finished yet, so we don’t really have any statistics to cite yet

Chinese-language references (added by the translator):
Monte Carlo method: http://baike.baidu.com/view/480343.htm
Go: http://baike.baidu.com/view/1534.htm

