In the first half of this year, after AlphaGo's repeated victories over human masters, Texas Hold'em finally met its own "wall-breaker": the artificial intelligence Libratus. On January 30, Libratus, an AI built at CMU, defeated top Texas Hold'em players. During the match, however, Libratus's creators were tight-lipped about how it worked, and for six months hardly any technical details were made public. That changed this month: Libratus first won a Best Paper Award at NIPS, and then on the 15th Science published the preprint of its paper, giving a full view of Libratus's technical details. Libratus's creators, Tuomas Sandholm and Noam Brown, also took to Reddit at 9 a.m. US Eastern time on the 18th (10 p.m. Beijing time) for an AMA (Ask Me Anything), fielding netizens' questions about Libratus. The highlights follow.
Tuomas Sandholm
Noam Brown
Both of them quite handsome.
A quick note: the Q&A below runs to roughly 9,400 words and is divided into four parts: Libratus's existing techniques and variants; Libratus compared with other algorithms; Libratus's future development; and the Libratus team itself.
Libratus's existing techniques and variants
1. Question: As far as I know, Claudico lost very thoroughly to the human team in 2013... Compared with Claudico, what improvements or adjustments does the AI have? How did Claudico's failure shape Libratus's new strategy, and where do those improvements show up? (Editor's note: Claudico was the previous generation of Libratus.)
Tuomas Sandholm: Claudico's match against human opponents took place in April and May of 2015, not 2013. Claudico lost to the humans at a rate of about 9 bb/100, while Libratus beat the humans at about 14.7 bb/100.
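(Editor's aside: bb/100 means big blinds won per 100 hands. Here is a minimal sketch of the arithmetic in Python, using the match's widely reported totals as illustrative inputs rather than official figures.)

```python
def bb_per_100(total_won, big_blind, hands):
    """Win rate expressed in big blinds won per 100 hands."""
    return total_won / big_blind / hands * 100

# Widely reported match totals (treat as illustrative):
# ~$1,766,250 won over 120,000 hands at a $100 big blind.
print(bb_per_100(1_766_250, 100, 120_000))  # ≈ 14.7 bb/100
```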
Libratus has new algorithms in three main modules:
1. Before the match, a new and better equilibrium-finding algorithm computes the blueprint strategy.
2. A new safe and nested subgame-solving technique. Claudico's endgame solving was neither safe nor nested.
3. A self-improvement module that computes a closer approximation of the Nash equilibrium in the parts of the state space where opponents have found potential holes in the AI's strategy.
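(Editor's aside: to make that division of labor concrete, here is a toy structural sketch in Python of how the three modules could fit together. It is purely illustrative: every name, number, and simplification below is hypothetical, not Libratus's actual design.)

```python
# A toy sketch of the three-module structure described above. All names
# and numbers are hypothetical stand-ins, not Libratus's actual code.

class Blueprint:
    """Module 1 stand-in: a coarse strategy computed before the match."""
    def __init__(self):
        self.table = {"default": {"fold": 0.2, "call": 0.5, "raise": 0.3}}

    def strategy(self, situation):
        return self.table.get(situation, self.table["default"])

def solve_subgame(blueprint, situation):
    """Module 2 stand-in: safe, nested subgame solving would refine the
    blueprint for the subgame actually reached; this toy just copies it."""
    return dict(blueprint.strategy(situation))

class Agent:
    def __init__(self):
        self.blueprint = Blueprint()
        self.refined = {}  # subgames re-solved in real time during play

    def act(self, situation):
        # Re-solve the current subgame on first visit; "nested" solving
        # would repeat this after every off-tree opponent action.
        if situation not in self.refined:
            self.refined[situation] = solve_subgame(self.blueprint, situation)
        dist = self.refined[situation]
        return max(dist, key=dist.get)  # toy: just pick the modal action

    def self_improve(self, probed_bet_sizes):
        # Module 3 stand-in: between sessions, re-solve the branches
        # (bet sizes) opponents probed and fold them into the blueprint.
        for size in probed_bet_sizes:
            key = "bet_{}".format(size)
            self.blueprint.table[key] = solve_subgame(self.blueprint, "default")

agent = Agent()
print(agent.act("flop_facing_raise"))  # "call" under the toy numbers
agent.self_improve([0.75, 2.5])
```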
2. Question: What about Libratus's poker style did the professionals find most interesting or surprising? I heard Polk mention that it made some very unusual bets, such as frequent but well-balanced overbets.
Noam Brown: Sure, here are a few of the interesting things:
1) The AI uses many different bet sizes and balances effectively across them. Humans usually use only one or two bet sizes.
2) The AI uses mixed strategies (taking different actions with different probabilities), while humans tend toward pure strategies. This makes it hard for humans to estimate the "range" behind an AI bet at key points, especially since the AI's bets can be of arbitrary size. (A minimal illustration of a mixed strategy follows this list.)
3) The AI uses some unusual bet sizes. In particular, it sometimes bets very large, which human players find genuinely hard to handle. I've heard some poker pros say that since the Libratus match this has become more common among top players, largely because of Libratus's success with large bet sizes.
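(Editor's aside: as a minimal illustration of point 2, here is what acting from a mixed strategy looks like in Python. The action menu and probabilities are invented for the example.)

```python
import random

# A hypothetical mixed strategy at one decision point: the same hand is
# played differently with different probabilities, which is what makes
# the AI's range hard to pin down.
mixed_strategy = {"check": 0.5, "bet_half_pot": 0.3, "overbet_2x_pot": 0.2}

def sample_action(strategy):
    actions = list(strategy)
    weights = [strategy[a] for a in actions]
    return random.choices(actions, weights=weights, k=1)[0]

print(sample_action(mixed_strategy))
```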
3. Question: Did you really beat them, or was the win within the range of statistical error?
Tuomas Sandholm: Libratus very clearly defeated the humans; the result is not within the range of statistical error. Specifically, Libratus won with 99.98% statistical significance (i.e., p = 0.0002, a roughly four-sigma result).
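(Editor's aside: for the curious, here is how a significance figure like that is computed under a normal approximation of the per-hand winnings. The win rate and standard error below are placeholders chosen to reproduce the quoted p-value, not the study's actual estimates.)

```python
from math import erf, sqrt

def one_sided_p(win_rate, std_err):
    """One-sided p-value that the true win rate exceeds zero, under a
    normal approximation of the average winnings per hand."""
    z = win_rate / std_err  # number of standard errors above zero
    return z, 0.5 * (1.0 - erf(z / sqrt(2.0)))

# Placeholder numbers: a 14.7 bb/100 win rate with a ~4.2 bb/100 standard
# error gives z ≈ 3.5 and p ≈ 0.0002, the significance level quoted above.
z, p = one_sided_p(14.7, 4.2)
print("z = {:.2f}, p = {:.4f}".format(z, p))
```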
4. Question: If we ran Libratus on something other than a supercomputer (or just a weaker machine), bucketing similar actions together and simplifying the decision tree, do you think we would see a big difference from the results Libratus achieved on the supercomputer? Would the results be much worse, or merely suboptimal?
Noam Brown: Before the match, we didn't know how hard it would be to beat the top players, and we didn't try to predict exactly what resources we would need; we simply made the most of all the resources we could get. Hence the supercomputer. My guess is that, switched to a PC, it could still perform very well. The bb/100 margin shows the supercomputer was definitely more than sufficient. You would indeed have to give up some accuracy and reduce the number of bet sizes, but I don't think the cost would be huge.
I also think that as these techniques improve, the computational cost will fall. We have seen great strides in AI for imperfect-information games, and there is no reason to think those advances will slow in the years ahead. Within five years, I think we will see an AI as strong as Libratus running on a smartphone.
5. Question: What does the word "safe" mean in the paper?
Noam Brown: That the technique is theoretically guaranteed not to make the AI beatable, i.e., it cannot increase the AI's exploitability.
6. Question: If you changed the poker variant, would the AI still win?
Noam Brown: That's a very good question. Based on my research and on discussions with other AI developers in this field, I believe every popular poker variant can now be played by a superhuman AI that humans would find hard to beat. Omaha would be no match for the AI, not even nine-player Omaha. (Editor's note: Omaha is a poker game similar to Texas Hold'em.)
One of the most effective ways to design a game that is hard for AI is to introduce some kind of semi-cooperative element, like the trading in Settlers of Catan or the negotiation in Diplomacy. There might be some element that lets you swap cards with other players, although if the game is still poker, that may not apply. At present there is no genuinely successful, principled way to handle semi-cooperative games. I think it will be a very interesting research direction, and I expect it will take at least a few years before we see AI perform really well in games of this kind.
7. Question: What if you tested the program against a group of players who minimize risk rather than chase the biggest gains?
Noam Brown: Our AI approximates a Nash equilibrium rather than modeling how the opponent plays, so low-risk players could not "confuse" the AI in the way you imagine. I also don't think there would be any significance in watching the AI's win rate move from 50 bb/100 to 100 bb/100.
8. Question: Libratus does not use deep learning. Was that intentional, or did you simply end up not using it? Or did you try it and find it didn't help? Given DeepStack's success, would you consider using deep learning?
Noam Brown: Libratus does not use any deep-learning techniques. We hope this helps people realize that there is more to AI than deep learning, and that deep learning on its own is not enough to play poker well.
That said, the techniques we introduced are not incompatible with deep learning; I would describe them as alternatives to MCTS (Monte Carlo Tree Search). Deep learning is not particularly necessary for a game like poker, but for some other games I think some form of function approximation would be quite useful.
DeepStack uses deep learning, but it's not clear how effective that really was; for one thing, it never defeated any earlier top poker AIs. That said, I think DeepStack did something quite impressive: it uses nested subgame solving, which our two teams developed independently. But that doesn't require deep learning. Libratus uses a more advanced version of nested subgame solving, plus some other good components, and together they produced its strong performance.
9. Question: Why didn't you use reinforcement learning in your model? It seems like a natural fit.
Noam Brown: We used variants of counterfactual regret minimization (CFR) in Libratus. Specifically, we used Monte Carlo CFR to compute the blueprint strategy, and CFR+ in the real-time subgame solving. CFR is a self-play algorithm similar to reinforcement learning, but CFR also considers the payoffs of hypothetical actions that were not chosen during self-play. There is in fact a pure reinforcement-learning variant of CFR, but in practice it takes a long time to find a good strategy.
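(Editor's aside: for readers new to CFR, here is a minimal sketch of its core building block, regret matching, run as self-play on a weighted rock-paper-scissors instead of poker. Full CFR applies this same update at every information set of a game tree; this toy only shows the regret bookkeeping.)

```python
# Minimal regret-matching self-play: the core update inside CFR, shown on
# a one-shot game instead of a poker tree. Weighted rock-paper-scissors:
# winning with rock scores 2, so the equilibrium is (1/4, 1/2, 1/4).
N = 3                                          # rock, paper, scissors
PAYOFF = [[0, -1, 2], [1, 0, -1], [-2, 1, 0]]  # row player's payoff

def current_strategy(regrets):
    # Play each action in proportion to its positive accumulated regret.
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    return [p / total for p in pos] if total > 0 else [1.0 / N] * N

def train(iters=100_000):
    regrets = [[0.0] * N for _ in range(2)]
    strat_sum = [[0.0] * N for _ in range(2)]
    for _ in range(iters):
        strats = [current_strategy(regrets[p]) for p in range(2)]
        for p in range(2):
            opp = strats[1 - p]
            # Value of every action against the opponent's current mix --
            # including the hypothetical actions not actually chosen.
            if p == 0:
                vals = [sum(PAYOFF[a][b] * opp[b] for b in range(N))
                        for a in range(N)]
            else:
                vals = [sum(-PAYOFF[a][b] * opp[a] for a in range(N))
                        for b in range(N)]
            ev = sum(strats[p][a] * vals[a] for a in range(N))
            for a in range(N):
                regrets[p][a] += vals[a] - ev      # accumulate regret
                strat_sum[p][a] += strats[p][a]    # accumulate strategy
    # The *average* strategy approaches the equilibrium (0.25, 0.5, 0.25).
    return [[x / sum(row) for x in row] for row in strat_sum]

print(train())
```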
10. Question: Have you thought about trying six-player Texas Hold'em (6-max games)?
Noam Brown: The short answer: current techniques would already do well in six-player poker. I do think games with three or more players are an interesting scientific challenge, but poker is the wrong domain for studying them; other games are more suitable.
The longer answer: games with more than two players pose many interesting theoretical and practical challenges for existing techniques. Computing even an approximate Nash equilibrium becomes very expensive, and even if you could, it's not clear you would want to play it. In a two-player zero-sum game, a Nash equilibrium guarantees that, in expectation, you won't lose no matter what your opponent does. In games with three or more players, that is no longer an iron law: you can play according to a Nash equilibrium and still end up losing. So we need new techniques for 3+ player games, and we need to decide how to evaluate a model's performance in them.
That said, the techniques we have today would perform well specifically in 3+ player poker, for two main reasons:
1) In games with more than two players, most players fold early, so most of the time the hand quickly reduces to two players.
2) In 3+ player poker there is essentially no opportunity for cooperation. You can't team up with one player to knock out another; doing so would be judged collusion and would violate the rules of the game.
For these reasons, the people I've talked to who develop poker AIs as training tools tell me these techniques also perform superbly in 6-max and in basically every poker variant played online. Running a meaningful 6-max competition is also impractical, because it is hard to guard against collusion among the human players (including unconscious collusion).
11. Question: You say a Nash equilibrium doesn't guarantee you won't lose in games with three or more players. Is that true? Isn't not losing exactly what a Nash equilibrium is defined to guarantee?
Noam Brown: A Nash equilibrium only guarantees that you won't lose in expectation in a two-player zero-sum game.
In games with three or more players, a Nash equilibrium only guarantees that you are doing the best you can if all the other players follow the same Nash equilibrium. So even if everyone is playing the same Nash equilibrium, you can still lose, because your opponents may effectively be teaming up against you.
Similarly, when there are multiple Nash equilibria you can run into the "equilibrium selection problem": you might choose one while the other players choose another. So you can't simply compute a Nash equilibrium and start playing the strategy it gives you, because you don't know whether the others will choose the same equilibrium. In a two-player zero-sum game this problem doesn't arise, because any mixture of Nash equilibrium strategies is itself a Nash equilibrium; in games with three or more players that is generally not true.
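(Editor's aside: the two-player zero-sum guarantee mentioned above is easy to verify numerically on a toy game: an equilibrium strategy's expected payoff is non-negative against every possible opponent response. A minimal check on standard rock-paper-scissors:)

```python
# Verify the zero-sum guarantee on rock-paper-scissors: the equilibrium
# (uniform) strategy cannot lose in expectation, whatever the opponent does.
PAYOFF = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]  # row player's payoff
equilibrium = [1 / 3, 1 / 3, 1 / 3]

# The worst case over all pure opponent responses suffices, since any
# mixed response is a weighted average of the pure ones.
worst = min(sum(equilibrium[a] * PAYOFF[a][b] for a in range(3))
            for b in range(3))
print(worst)  # 0.0: no opponent response wins in expectation
```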
12. Question: Libratus can defeat human players, but in my view it isn't invincible; another bot that can beat Libratus may well appear within a few years. How far is Libratus from being a perfect poker player, one whose probability distribution over actions is optimal for every possible history? Or, to put it another way, is that kind of improvement even possible? Also, once you introduce more players into the equation, the bot has to account for more dynamics. How complicated would it be to solve a three-player game?
Noam Brown: I don't think any mainstream no-limit poker variant will ever be "solved"; the game is simply too big. Whether further improvement is possible is hard to answer. My inclination is that AI is now superhuman in these games, and that as a community we would do better to focus on other games.
I explained earlier (see question 10) why three-player games are a theoretical challenge but not a practical problem in poker.
Libratus compared with other algorithms
13. Question: Can you recommend some similar but smaller-scale, less powerful poker AIs that one could learn from online?
Noam Brown: This site is probably the strongest publicly available poker AI, although it does not perform real-time computation.
14. Question: What is the difference between your software and an AI running the Piosolver simulator on a supercomputer? (Editor's note: Piosolver is a program that quickly computes Nash-equilibrium-based optimal strategies for Texas Hold'em.)
Noam Brown: There is a big difference; Libratus is doing something well beyond Piosolver. There are several reasons you couldn't just use Piosolver for this kind of match. (One caveat up front: my understanding of Piosolver is quite limited, but I'll answer as best I know.)
1) Piosolver requires a person to input the belief distributions of both players, whereas Libratus determines that information entirely on its own.
2) Piosolver can be fooled by actions that have zero probability in equilibrium. For example, if you bet 10% of the pot and Piosolver believes that should never happen, then the belief distribution over your hand becomes undefined and it may give you meaningless answers. I remember Piosolver carries an explicit disclaimer that if the opponent makes a "weird" play, you shouldn't trust the output. Obviously this would be a serious problem against top players who specialize in beating AIs by finding exactly these weaknesses. Libratus has no such weakness: even if you take an action that has zero probability in equilibrium, it responds to it robustly and correctly.
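(Editor's aside: the failure mode in point 2 is visible directly in the belief-update arithmetic: conditioning on an action the assumed strategy never takes makes Bayes' rule divide by zero. A minimal sketch, with invented hand names and probabilities:)

```python
def update_beliefs(prior, strategy, observed_action):
    """Bayes update of beliefs over opponent hands after observing an action.
    strategy[hand][action] is the assumed probability that `hand` takes
    `action`. If that probability is zero for every hand, the normalizer is
    zero and the posterior is undefined: the failure mode described above."""
    z = sum(prior[h] * strategy[h].get(observed_action, 0.0) for h in prior)
    if z == 0.0:
        raise ValueError("zero-probability action observed: beliefs undefined")
    return {h: prior[h] * strategy[h].get(observed_action, 0.0) / z
            for h in prior}

prior = {"AA": 0.5, "72o": 0.5}
strategy = {"AA": {"raise": 1.0}, "72o": {"fold": 1.0}}
print(update_beliefs(prior, strategy, "raise"))   # {'AA': 1.0, '72o': 0.0}
try:
    update_beliefs(prior, strategy, "bet_10_percent")
except ValueError as e:
    print(e)
```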
15. Question: Given the recent buzz around chess, would AlphaZero beat Libratus?
Tuomas Sandholm: No. AlphaZero is not designed for imperfect-information games.
16. Question: In terms of generality, how does AlphaZero compare with Libratus?
Tuomas Sandholm: AlphaZero is mainly for perfect-information games (for example, Go, chess, and shogi), while Libratus is designed for imperfect-information games.