Go AI (2) Board Implementation

Source: Internet
Author: User
My goal is for my AI program to use one and the same algorithm to play Go, Gomoku (wuziqi), and even Reversi (the "black and white" game), which I played as a child. It should not need any game-specific knowledge; you only have to tell it the rules. Do our brain cells know what Go is? They just mechanically carry out their own functions, yet hundreds of millions of cells stacked together let a human play.

The three games above share some features: the board is a grid of n rows and n columns of parallel lines; the pieces come in two colors, black and white, one color per side; the two sides take turns, and each turn places one piece on an empty intersection (Reversi pieces appear to sit inside the squares, but there is no essential difference). Based on these features, we begin to design the structure of the board.

I. Bitboard

I was very keen on using a bitboard for Go. In chess, a 64-bit number is used to describe where one kind of piece stands. The same could be done for Go, for example with one 361-bit number for the black stones and another 361-bit number for the white stones, but I have never seen anyone do it. The traditional approach describes the board with an array in which each element has three states: black, white, and empty.

Why are computers not ternary? I used to wonder whether a ternary computer would describe Go better. Later I found that a point on the board has not three states but four: there is also off_board, a point outside the board. So the board is really quaternary, which fits a binary computer well. How can off_board be a state? Look at the edge of the board: beyond the edge lies off_board.

In Go, a stone normally has four liberties, but on the edge it has three and in the corner two, just as if enemy stones sat outside the border. In Gomoku, when the opponent's line runs into the edge there is no need to block it, as if a stone outside the board already blocked it. With this physical meaning in mind, I assign a binary number to each state:

    Empty      00
    Black      01
    White      10
    Off_board  11
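A minimal sketch of how these two-bit codes could be packed, four neighbors to a byte, so that the byte indexes a 256-entry table of empty-neighbor counts. The packing order (north, east, south, west) and the names pack_nbrs, make_empty_table, and has_color are assumptions made for this illustration, not taken from the original program.

```cpp
#include <array>
#include <cstdint>

// Two-bit point states, matching the table above.
enum State : std::uint8_t { EMPTY = 0, BLACK = 1, WHITE = 2, OFF_BOARD = 3 };

// Pack four neighbor states into one byte, two bits each.
// The order north, east, south, west is an assumption of this sketch.
constexpr std::uint8_t pack_nbrs(State n, State e, State s, State w) {
    return static_cast<std::uint8_t>(n | (e << 2) | (s << 4) | (w << 6));
}

// Count how many of the four packed neighbors are empty.
inline int count_empty(std::uint8_t packed) {
    int empties = 0;
    for (int i = 0; i < 4; ++i) {
        if (((packed >> (2 * i)) & 3) == EMPTY) ++empties;
    }
    return empties;
}

// Build the 256-entry lookup table once; afterwards the number of empty
// neighbors of a point is a single array access.
inline std::array<std::uint8_t, 256> make_empty_table() {
    std::array<std::uint8_t, 256> table{};
    for (int i = 0; i < 256; ++i) {
        table[i] = static_cast<std::uint8_t>(
            count_empty(static_cast<std::uint8_t>(i)));
    }
    return table;
}

// OFF_BOARD (11) contains both color bits, so it tests as black and as white.
inline bool has_color(State s, State color) { return (s & color) != 0; }
```

With this layout, pack_nbrs(EMPTY, BLACK, WHITE, OFF_BOARD) yields 0xE4, and the table entry for that byte is 1, since only the north neighbor is empty.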

In this encoding, empty holds no stone, black and white each hold one stone of their own color, and off_board holds a black stone and a white stone at the same time: whichever color approaches it, it acts as the opponent. The advantage is that a single 8-bit number can describe the neighbors of a point. Eight bits give 256 cases in total, which suits a lookup table very well, so I can read off the liberties of an intersection in any situation.

lib-ego computes the liberties of an intersection in another way: it only incrementally counts how many black, white, and empty points surround the intersection (off_board is counted as both black and white), regardless of how they are arranged. So far I have not found a concrete advantage of my method, but I firmly believe it is better than the one in lib-ego because it stays true to the underlying principle.

It seems that an 8-bit number can store the states of four points. The entire board then requires 56 64-bit numbers, not many more than chess. In the end, however, I did not implement the bitboard idea, because it did not feel natural, and I chose the traditional array instead.

II. Code optimization
Many people have pointed out that optimization should be done late. But once a piece of code has been optimized, parts of it are hard to understand unless you know the optimization techniques that were used.

1. Replace variables with compile-time constants. For example, the size of the board enters into the coordinate calculation for every stone, and the amount of space allocated for some structures also depends on it. To avoid computing these things at run time, we can define them with a macro or a const int:

    const uint board_size = 9;

But we want the program to run on 9, 13, and 19 boards and to be able to switch boards while running, so I use templates. The basic board structure looks roughly like this:
    typedef unsigned int uint;

    template <uint T>
    class Vertex {
        uint idx;
    public:
        static const uint cnt = (T + 2) * (T + 2);
    };

    // Color is the point-state type (empty, black, white, off_board),
    // defined elsewhere.
    template <uint T>
    class Board {
    public:
        static const uint board_size = T;
        Color color_at[Vertex<T>::cnt];
    };
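The (T + 2) * (T + 2) count suggests a one-point off_board border around the playing area, stored in a flat array. The sketch below works under that assumption; BorderedBoard, index, empty_nbrs, and corner_empty_nbrs are hypothetical names for this illustration. It also shows one way a run-time board size can be mapped onto a compile-time template argument.

```cpp
typedef unsigned int uint;

enum Color { empty = 0, black = 1, white = 2, off_board = 3 };

template <uint T>
struct BorderedBoard {
    static const uint stride = T + 2;          // row length including border
    static const uint cnt = stride * stride;   // same value as Vertex<T>::cnt
    Color color_at[cnt];

    // Fill everything with off_board, then clear the T x T playing area.
    BorderedBoard() {
        for (uint i = 0; i < cnt; ++i) color_at[i] = off_board;
        for (uint r = 0; r < T; ++r)
            for (uint c = 0; c < T; ++c) color_at[index(r, c)] = empty;
    }

    // Map a 0-based (row, col) pair to the single integer index.
    static uint index(uint row, uint col) {
        return (row + 1) * stride + (col + 1);
    }

    // Count empty neighbors; the border removes the need for bounds checks.
    uint empty_nbrs(uint v) const {
        const int d[4] = { -int(stride), 1, int(stride), -1 };
        uint n = 0;
        for (int i = 0; i < 4; ++i) {
            if (color_at[int(v) + d[i]] == empty) ++n;
        }
        return n;
    }
};

template <uint T>
uint corner_of() {
    BorderedBoard<T> b;
    return b.empty_nbrs(BorderedBoard<T>::index(0, 0));
}

// Pick the template instantiation from a size known only at run time.
inline uint corner_empty_nbrs(uint size) {
    switch (size) {
        case 9:  return corner_of<9>();
        case 13: return corner_of<13>();
        case 19: return corner_of<19>();
        default: return 0;  // unsupported size
    }
}
```

On an empty board a corner point has two empty neighbors and an edge point three, which matches the remark that the border behaves like hostile stones.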

Vertex represents an intersection of the board. Internally it is not implemented along the lines of class CPoint { int x; int y; }; it uses a single integer as the coordinate, because in many cases a one-dimensional array is faster to process than a two-dimensional one, although in theory they are equivalent.

2. Control loops. You will see macro definitions like this in the code:

    #define color_for_each(col) \
        for (Color col = 0; color::in_range(col); ++col)

C++ users should not be too quick to reject this, or vertex_for_each_all and vertex_for_each_nbr. (I know that C++ has an "elegant" way to implement for_each without macros, and I also know that macros create a dialect.) Please consider first why for_each is needed. First, we do not want a large number of for (;;) statements in the code, because they make the code ugly and hard to modify later. Second, we need to be able to choose, case by case, whether to unroll a loop.

    // Loop unrolling: the normal code is
    for (int i = 0; i < 4; i++) { code; }
    // and the unrolled code is
    i = 0; code;
    i = 1; code;
    i = 2; code;
    i = 3; code;
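A macro can hide the choice between the two forms from the calling code. The sketch below is only an illustration: for_each_nbr, UNROLL_NBR_LOOP, and sum_of_nbrs are hypothetical names, and the real vertex_for_each_nbr macro may be written quite differently.

```cpp
// Visit the four neighbors of index v in a flat array with row length
// `stride`. Define UNROLL_NBR_LOOP when compiling to get the unrolled form.
// The body must be a single statement without top-level commas.
#ifdef UNROLL_NBR_LOOP
#define for_each_nbr(v, nbr, stride, body) \
    do { \
        int nbr; \
        nbr = (v) - (stride); body; \
        nbr = (v) + 1; body; \
        nbr = (v) + (stride); body; \
        nbr = (v) - 1; body; \
    } while (0)
#else
#define for_each_nbr(v, nbr, stride, body) \
    do { \
        const int offsets_[4] = { -(stride), 1, (stride), -1 }; \
        for (int i_ = 0; i_ < 4; ++i_) { \
            int nbr = (v) + offsets_[i_]; \
            body; \
        } \
    } while (0)
#endif

// The calling code is identical for both forms.
inline int sum_of_nbrs(const int* cells, int v, int stride) {
    int sum = 0;
    for_each_nbr(v, n, stride, sum += cells[n]);
    return sum;
}
```

Switching between the two variants is then a matter of one compiler flag, which makes it easy to measure which form is faster for a given loop body.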
The efficiency gain from loop unrolling cannot be generalized; it depends on the length of the loop body and the number of iterations. Macros give us control over it, and I do not know of any other simple way to get that control.

3. Avoid conditional statements, because a mispredicted branch can stall the pipeline and branches can lower the hit rate of the CPU instruction cache. A well-known example of replacing a conditional statement with a bitwise operation is:
    Player other(Player pl) {
        if (pl == black) return white;
        else return black;
    }

The bitwise version looks like this:

    Player other(Player pl) {
        return Player(pl ^ 3);
    }
This assumes that black is 1 and white is 2. If black is 0 and white is 1, the code should be changed to (pl ^ 1). On my CPU, however, this example makes no difference in speed. Until I see a convincing example, I reserve judgment.

4. Control inlining explicitly. Inlining does not necessarily make code faster. As an example, replace the no_inline modifier in front of the play_eye function in the code with all_inline (meaning always inline), recompile, and run again: the running time doubles. Why? The function is called in a context like this:
    if (...) {
        return play_eye(player, v);
    } else ...
In practice play_eye is rarely called. If it is inlined and the preceding if does not take the play_eye branch, the instruction pointer has to jump over a long stretch of code to reach the other branch, and the instruction cache misses. You may say that modern compilers handle this well and that we need not worry about such details. In fact, I only recommend specifying inlining by hand at bottlenecks, where it may bring unexpected performance gains. (Note that the inline keyword only suggests inlining to the compiler, which does not guarantee it; compilers usually provide extra directives that let you control precisely whether a function is inlined.)

5. Do not blindly replace computation with a lookup table. The table usually lives in memory, while your instructions sit in the CPU's instruction cache; if one or two instructions can compute the result, there is no need to go to a table.

III. Class design
Usually the rules and the board are implemented together in a single class. If the rules and the board are separated, application code can create a board and attach whichever rule class it needs, as in the following code:
    Board<T> board;
    board.attach(new GoRule<T>());
    board.play(...);

Looks elegant, right? But before deciding on the class structure, let us look at two performance requirements.

1) No virtual functions. Apart from the space taken by the virtual function table and the few extra machine instructions per call, virtual functions make it hard for the compiler to inline. Virtual functions are bound late, with the target chosen at run time, whereas a C++ compiler generally inlines only at compile time.

2) The board must be quick to copy. Remember, our purpose is to let the board simulate a great many random games. Each random game starts from a copy of the original board, so if copying a board is expensive, simulation will be very inefficient.

So we have to reject the code above, because we cannot new a rule class: doing so would destroy the board's ability to be copied quickly. The fastest board copy I can think of is memcpy, and if the board's data members include pointers, a board produced by memcpy may be broken.

What about inheritance? We could define a Board interface, that is, a pure virtual class, and inherit from it. This is the usual elegant solution, but it uses virtual functions. In addition, single inheritance leads to too many classes. For example, I have a basic BasicBoard class. To add neighbor counting I write NbrCounterBoard, which inherits from BasicBoard, and GoBoard can inherit from NbrCounterBoard. Go also needs a hash value for each position to detect repeated positions, so I write ZobristBoard, which inherits from NbrCounterBoard, and the final GoBoard inherits from ZobristBoard. Reversi (the black and white game) does not need the hash, so it can inherit directly from NbrCounterBoard; Gomoku needs neither feature, so it inherits directly from BasicBoard. Everything sounds perfect, but that is only luck. If some game needs the hash but not neighbor counting, this design breaks down.

Can composition be used instead? Of course. See the following:

    class GoBoard {
    private:
        ZobristBoard zb;
    public:
        BasicBoard bb;
    };
    // If the member is private, many functions have to be forwarded:
    void GoBoard::foo() { return zb.foo(); }
    // If it is public, callers need an awkward call form:
    GoBoard board;
    board.bb.bar();
    // Even worse, if ZobristBoard::foo() needs to call BasicBoard::bar(),
    // would you write code like this?
    void ZobristBoard::foo(GoBoard* pgb) {
        pgb->bb.bar();
    }
    void GoBoard::foo() { return zb.foo(this); }
As we can see, this code looks clumsy. This led me from composition to multiple inheritance:

    class GoBoard :
        public BasicBoard<GoBoard>,
        public ZobristBoard<GoBoard>
    {
    };

Borrowing from the ATL library, I pass GoBoard as a template parameter. This way, when ZobristBoard needs to call a BasicBoard method, it can do this:

    template <typename Derive>
    class ZobristBoard {
    public:
        void foo() {
            Derive* p = static_cast<Derive*>(this);
            p->bar();
        }
    };
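The pieces above can be put together into a small compilable sketch. The member functions place_stone, stone_count, play, hash, and the helper clone are invented for this illustration; only the class names and the static_cast idiom come from the text. The static_assert checks that the combined board can still be copied safely with memcpy, which is the second performance requirement.

```cpp
#include <cstdint>
#include <cstring>
#include <type_traits>

template <typename Derived>
class BasicBoard {
public:
    BasicBoard() : stones_(0) {}
    void place_stone() { ++stones_; }
    int stone_count() const { return stones_; }
private:
    int stones_;
};

template <typename Derived>
class ZobristBoard {
public:
    ZobristBoard() : hash_(0) {}
    // Reaches BasicBoard through the derived type; no virtual call involved.
    void play(std::uint64_t key) {
        Derived* self = static_cast<Derived*>(this);
        self->place_stone();
        hash_ ^= key;
    }
    std::uint64_t hash() const { return hash_; }
private:
    std::uint64_t hash_;
};

class GoBoard : public BasicBoard<GoBoard>, public ZobristBoard<GoBoard> {};

// No pointers and no virtual functions, so a raw byte copy is valid.
static_assert(std::is_trivially_copyable<GoBoard>::value,
              "GoBoard must stay memcpy-safe");

inline GoBoard clone(const GoBoard& src) {
    GoBoard dst;
    std::memcpy(static_cast<void*>(&dst), &src, sizeof(GoBoard));
    return dst;
}
```

Because every call is resolved at compile time, the compiler remains free to inline place_stone into play, which a virtual interface would make difficult.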

IV. Simulated games
A simulated game runs like this: the two sides take turns placing stones as the rules allow; a side with no legal move passes; and the game ends when both sides pass in succession. For Go and Reversi this procedure works as it stands. For Gomoku we need to add a check for a win in mid-game. In fact, Go simulations can also use a mid-game cutoff to speed things up: if one side is clearly ahead, there is no need to play on to the double pass.

First, let us look at how the rules of Go are implemented. The three major rules of Go, namely capture, ko, and superko (no repetition of a whole-board position), create the complexity of the game. Without capture, the result would be the same however the two sides played. Without ko, the two sides could recapture each other forever and the game could not end. Superko is similar: it can be seen as the general form of ko, and it too exists to keep the game from never terminating.

There is one more way a game can fail to end: both sides fill in their own eyes. In theory superko also rules this out, but I cannot wait for the day such a game ends. Besides, a blunder of this kind is not worth considering in a playing program, because I would be glad for the opponent to make it. So in random simulation we add a rule that eyes may not be filled.

The capture rule also has a variant, namely suicide. Suicide is generally not allowed in competition, but it seems to be allowed under the Ing (yingshi) rules.
In simulation, single-stone suicide must be forbidden, because it too can keep the game from ending (as above, superko would rule it out in theory, but only much later). But should multi-stone suicide be forbidden in simulation? lib-ego does not forbid it, but I found that the simulated win rate differs depending on whether it is forbidden. To keep simulated games closer to the real rules, I chose to forbid multi-stone suicide, although this costs more computation.

So the simulation needs five rules: capture, ko, no eye filling, no suicide, and superko. In theory, capture and superko alone would be enough.

1. Superko. To forbid repeated positions, we record a hash value for every position. To reduce the chance of collisions a 64-bit hash is generally used; if the hash after a move equals an earlier hash, the move is undone. A game averages no more than about 1000 moves, so a binary search can detect a duplicate hash quickly. But how do we undo a move? Go has captures: if the move captured stones, they have to be put back when the move is undone. One way is to remember the positions of the captured stones at every move. lib-ego uses a simple but inefficient method: both for checking repetition and for undo, it replays the whole game from the move history. At first I thought this was far too inefficient, but later I understood it: lib-ego only stores the move history and computes the hash. This means that superko is not enforced on the board during simulation and is only used to judge moves in the real game. Going further, I do not think it is even necessary to store the move history or compute the hash during simulation.
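For reference, the hash bookkeeping described above is commonly done with Zobrist hashing, as the ZobristBoard class name suggests. The following is a minimal sketch under that assumption; ZobristTable, PositionHistory, the fixed seed, and the 0/1 color encoding are all choices made for this illustration. Placing and removing a stone both XOR the same key, so undoing a capture restores the earlier hash.

```cpp
#include <algorithm>
#include <cstdint>
#include <random>
#include <vector>

class ZobristTable {
public:
    // One random 64-bit key per (color, point); fixed seed for repeatability.
    explicit ZobristTable(int points, std::uint64_t seed = 12345)
        : keys_(2 * points) {
        std::mt19937_64 rng(seed);
        for (auto& k : keys_) k = rng();
    }
    // color is 0 (black) or 1 (white) in this sketch.
    std::uint64_t key(int color, int point) const {
        return keys_[2 * point + color];
    }
private:
    std::vector<std::uint64_t> keys_;
};

class PositionHistory {
public:
    // Returns false, recording nothing, if the hash was seen before.
    bool add_if_new(std::uint64_t hash) {
        auto it = std::lower_bound(seen_.begin(), seen_.end(), hash);
        if (it != seen_.end() && *it == hash) return false;
        seen_.insert(it, hash);  // keep the vector sorted for binary search
        return true;
    }
private:
    std::vector<std::uint64_t> seen_;
};
```

A move that recreates an earlier position makes add_if_new return false, and the caller would then have to undo the move, which is exactly the expensive part the text goes on to avoid.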
In real games whole-board repetition is rare, and detecting it is too expensive. In simulation we set a maximum number of moves and discard every simulated game that exceeds it, which sidesteps superko.

2. Capture. To detect captures efficiently, I use the "pseudo-liberty" technique. Every empty intersection gives one liberty to each stone adjacent to it, and these are called pseudo-liberties. [Figure: an empty point adjacent to two black stones.] In the figure, the empty point counts as two pseudo-liberties, because it touches two black stones and each of them gets one. Under this scheme, placing a stone takes one pseudo-liberty from each of the four points above, below, left, and right of it, and removing a stone gives one back to each. With pseudo-liberties, captures are much easier to compute: a string whose pseudo-liberty count is 0 is removed from the board.

How do we keep track of strings? We implement each string as a circular linked list. A single stone starts out linked to itself and has a string id (its own position serves as the id). If two stones are adjacent and have different string ids, their strings are merged. Because both are circular lists, merging amounts to cutting the two rings open and joining them into one larger ring, so the result is still a circular linked list.

3. Ko. A simple test is used: if a move played inside an opponent's eye captures exactly one stone, the position of the captured stone is recorded as the ko point, and the ko point is cleared before every move. That is, as soon as a player passes or plays anywhere other than the ko point, the ko point is cleared and the next move may be played there.

4. No eye filling.
Any Go player knows how to tell a real eye from a false one. When the opponent occupies two of the eye's "shoulders" (its diagonal points), or one shoulder when the eye is on the edge or in the corner of the board, the eye is false. In random simulation we do not play inside an eye unless it has been judged false. This produces misjudgments. For example, both of White's eyes in the figure count as false under our rule, yet the white group is alive.

[Figure: a White group with two eyes that the rule judges to be false.]

This does not matter: we forbid eye filling so that the game can finish in most cases, not to stop the computer from killing living groups.

5. Suicide. Single-stone suicide is detected as follows: when a stone is played inside an opponent's eye, the liberties of the strings above, below, left, and right are each reduced by 1. If no string's liberties drop to 0, the move is suicide; we restore the liberties and reject the move. For example, White playing at A or B is suicide. [Figure: points A and B.]

Multi-stone suicide is detected as follows: when a stone is played on a point that has no liberties, first reduce the liberties of the neighboring strings by 1. If this neither brings any opponent string to 0 nor leaves at least one of our own strings above 0, the move is suicide, and we add the liberties back. For example, for Black, playing at A is suicide, while playing at B or C is not. [Figure: points A, B, and C.]

The key point is to make this judgment before merging strings, because a string is hard to split once it has been merged.

Compared with Go, the rules of Gomoku are much easier to implement. Reuse the string-merging algorithm from Go, build strings separately in each of the four directions, and check whether any string in any of the four directions has length 5 or more.
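A sketch of that Gomoku check, using the same ring-merging idea as the Go strings: every stone starts as a ring of one in each direction, and two rings are joined by swapping the next pointers of one stone from each. GomokuBoard and its members are hypothetical names, the color encoding (1 = black, 2 = white) is an assumption, and plain bounds checks stand in for the off_board border to keep the example short.

```cpp
#include <utility>
#include <vector>

class GomokuBoard {
public:
    explicit GomokuBoard(int n) : n_(n), color_(n * n, 0) {
        for (int d = 0; d < 4; ++d) {
            next_[d].assign(n * n, 0);
            id_[d].assign(n * n, 0);
            len_[d].assign(n * n, 0);
        }
    }

    // Places a stone on an empty point and returns true if it completes
    // a line of five or more in any of the four directions.
    bool play(int row, int col, int color) {
        static const int dr[4] = {0, 1, 1, 1};   // horizontal, vertical,
        static const int dc[4] = {1, 0, 1, -1};  // and the two diagonals
        const int v = row * n_ + col;
        color_[v] = color;
        bool win = false;
        for (int d = 0; d < 4; ++d) {
            next_[d][v] = v;  // a new stone is a ring of one
            id_[d][v] = v;    // its position serves as the string id
            len_[d][v] = 1;
            for (int sign = -1; sign <= 1; sign += 2) {
                const int r = row + sign * dr[d];
                const int c = col + sign * dc[d];
                if (r < 0 || r >= n_ || c < 0 || c >= n_) continue;
                const int nbr = r * n_ + c;
                if (color_[nbr] == color && id_[d][nbr] != id_[d][v]) {
                    merge(d, v, nbr);
                }
            }
            if (len_[d][id_[d][v]] >= 5) win = true;
        }
        return win;
    }

private:
    // Relabel b's ring with a's id, add the lengths, then join the rings.
    void merge(int d, int a, int b) {
        const int ia = id_[d][a];
        const int ib = id_[d][b];
        int u = b;
        do {
            id_[d][u] = ia;
            u = next_[d][u];
        } while (u != b);
        len_[d][ia] += len_[d][ib];
        std::swap(next_[d][a], next_[d][b]);  // two rings become one
    }

    int n_;
    std::vector<int> color_;
    std::vector<int> next_[4], id_[4], len_[4];
};
```

For example, with black stones already at columns 0, 1, 3, and 4 of one row, playing column 2 merges the two strings of length two with the new stone and reports a win.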
I do not consider the professional rules of Gomoku, such as forbidden moves, the three-move swap, and the fifth-move two-option rule. After all, strong programs such as Blackstone already exist.

V. Next step
Naturally, the next step is to introduce the UCT algorithm, or UCG, that is, UCB for Graph.
