Have to say, Pluribus beating top humans over 10,000 hands isn't the same thing as being superhuman. It's just too small a sample to make that claim.
Further, thousands of the hands that Pluribus played against the human pros are available online in an easy-to-parse format [0]. I've analyzed them. Pluribus has multiple obvious deficiencies in its play that I can describe in detail.
It seems very difficult to set up any kind of properly repeatable, controlled experiment involving something as random as poker. Personally, I would be much more convinced if Pluribus played against real humans online and was highly profitable over a period of several months. That violates the terms of service / rules of many online poker sites, but it seems like the most definitive way to justify terms like "solved" or "superhuman".
Normally 10,000 hands would be too small a sample size, but we used variance-reduction techniques to reduce the luck factor. Think something like all-in EV, but much more powerful. It's described in the paper.
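For anyone unfamiliar with the family of techniques all-in EV belongs to, the core idea is a control variate: subtract a zero-mean estimate of each hand's luck from its result, and the expected win rate is unchanged while the variance drops sharply. Here's a toy sketch of that idea with made-up numbers (not AIVAT itself, which is the much more powerful method the paper describes):

```python
import random
import statistics

random.seed(0)

def simulate_hand():
    # Toy model: a hand's result is a small skill edge plus a large luck term.
    # The luck estimate has expectation zero and is strongly correlated with
    # the actual luck (all-in EV, which swaps the runout for pot equity, is
    # one real-world instance of this).
    skill_edge = 0.05                          # true edge, big blinds per hand
    luck = random.gauss(0, 10)                 # card/runout variance
    observed = skill_edge + luck
    luck_estimate = luck + random.gauss(0, 1)  # imperfect but correlated estimate
    return observed, luck_estimate

hands = [simulate_hand() for _ in range(10_000)]
raw = [obs for obs, _ in hands]
adjusted = [obs - est for obs, est in hands]   # same expectation, far less noise

print("raw:      mean %+.3f  stdev %.2f" % (statistics.mean(raw), statistics.stdev(raw)))
print("adjusted: mean %+.3f  stdev %.2f" % (statistics.mean(adjusted), statistics.stdev(adjusted)))
```

The adjusted results keep the same mean but with a fraction of the spread, so the standard error over 10,000 hands shrinks by the same factor, which is what lets a short match carry real statistical weight.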
> Finally, we tested Libratus against top humans. In January 2017, Libratus played against a team of four top HUNL specialist professionals in a 120,000 hand Brains vs. AI challenge match over 20 days. The participants were Jason Les, Dong Kim, Daniel McCauley, and Jimmy Chou. A prize pool of $200,000 was allocated to the four humans in aggregate. Each human was guaranteed $20,000 of that pool. The remaining $120,000 was divided among them based on how much better the human did against Libratus than the worst-performing of the four humans. Libratus decisively defeated the humans by a margin of 147 mbb/hand, with 99.98% statistical significance and a p-value of 0.0002 (if the hands are treated as independent and identically distributed), see Fig. 3 (57). It also beat each of the humans individually.
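For context on where a figure like "p = 0.0002 if the hands are treated as i.i.d." comes from: it's essentially a one-sided z-test on the per-hand results. A rough sketch with toy inputs (the per-hand spread below is a made-up ballpark, not a number from the paper):

```python
import math
import random
from statistics import NormalDist

def winrate_significance(results_mbb):
    """One-sided z-test that the mean win rate (in mbb/hand) exceeds zero,
    treating hands as independent and identically distributed."""
    n = len(results_mbb)
    mean = sum(results_mbb) / n
    var = sum((x - mean) ** 2 for x in results_mbb) / (n - 1)
    std_err = math.sqrt(var / n)
    z = mean / std_err
    return mean, z, 1 - NormalDist().cdf(z)

# Toy data: 120,000 hands with a modest positive mean and heavy per-hand noise.
random.seed(1)
toy = [random.gauss(147, 15_000) for _ in range(120_000)]
print(winrate_significance(toy))
```

The same arithmetic is why variance reduction matters so much: the number of hands needed to reach a given significance level scales with the square of the per-hand standard deviation.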
> The remaining $120,000 was divided among them based on how much better the human did against Libratus than the worst-performing of the four humans.
Surely the correct strategy here is for the human players to collude to give as much money as possible to a single player and then split the money afterwards, no?
Also, the fact that the players can only gain money without losing anything likely changes their play somewhat. By default I'd assume (and have generally observed) that most players on a freeroll (or better than a freeroll, really) tend to undervalue their position and gamble more than is usually wise.
I'd definitely be interested in seeing a "real" game where the humans are betting their own money.
The four humans were getting $120,000 between them. Their share of that was dependent on how much better they did than the other humans. That means there was no incentive to collude.
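To make the incentive point concrete, here's a toy version of the payout split, assuming (the exact formula isn't spelled out in the quote) that the $120,000 is divided in proportion to each human's margin over the worst performer. However the relative results shift, the aggregate payout is constant, so collusion can't grow the pie, it can only move money between the four of them:

```python
# Hypothetical per-player results vs. the bot (mbb/hand); names and numbers are made up.
results = {"A": -80, "B": -120, "C": -160, "D": -230}

worst = min(results.values())
margins = {p: r - worst for p, r in results.items()}   # margin over the worst performer
total = sum(margins.values())
payouts = {p: 20_000 + 120_000 * m / total for p, m in margins.items()}

print(payouts)                       # shares depend on relative performance...
print(round(sum(payouts.values())))  # ...but the total is always the full $200,000
```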
Top pro poker players understand the value of money. They weren't treating it as a freeroll and anyone that has seen the hand histories can confirm that.
Do you think human players could use the results of this paper to learn how to be better poker players? I'm wondering if it could be an AlphaGo-type situation where players learned different strategies.
[0] http://kevinwang.us/lets-analyze-pluribuss-hands/