No limit: AI poker bot is first to beat professionals at multiplayer game (nature.com)
524 points by Anon84 on July 11, 2019 | 392 comments



One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in the Finnish Air Force. Finnish windsurfing champion. Snowboarder. Professor at Carnegie Mellon. Speaks four European languages, including Swedish. And now at the age of 51, he has created the best AI-powered poker bot.

https://www.cs.cmu.edu/~sandholm/cv.pdf


Not to belittle the man's other achievements but speaking four languages is pretty normal in Europe, except when you're from the UK.


> speaking four languages is pretty normal in Europe

Northern Europe, maybe. French people for instance tend to suck at foreign languages. We rarely go beyond 3 languages (French, English, then German or Spanish. The last two are often forgotten after school.)

I suspect Spain and Italy are similar.


For example Spanish, Catalan, English and French would hardly be an unusual combination.


While Italy, France and Spain are pretty much tied in their English proficiency (Spain might be ahead, but not significantly like Portugal), there are 4 official languages in Spain, and several regions where pretty much everyone is bilingual.

I recall something like a 2.2 average.


Italy 1.8 seems accurate, in my experience most Italians know only Italian, although younger generations are likely to know a bit of English.

The surprising thing for me is Germany having 2. Seems unlikely.


> The surprising thing for me is Germany having 2. Seems unlikely.

Germany is big. I've heard that the proficiency in foreign languages tends to decrease as your country gets bigger. Because the bigger the country, the less likely you are to interact with foreign languages. Bigger countries also tend to have foreign works translated (or dubbed) into their own language more often.

So, no, I'm not surprised.


I was surprised it was not less.


At least a quarter of my city barely speaks german.

But they need to be able to get citizenship afaik... So basically everyone can speak two languages on paper, though their knowledge of the native one is extremely rudimentary

You're also required to learn 2 foreign languages in school if you want to go to university


As an American, I am now going to bang my head into a wall.


>into a wall.

I thought it was somewhat delayed, not paid, yet.


Nothing to do with being American, since you're afforded the luxury to learn other languages for free through public schooling. If anything, bang your head because you chose not to.


The offer is made, but the reason for doing so isn't made clear. I didn't understand it at the time; I availed myself of it in a minimal way. Most don't do that.

Some of that is the accident of geography: it simply wasn't necessary. Today, we are more connected to our Spanish-speaking neighbors, and the value of learning that language is becoming increasingly obvious. I don't know whether the schools are doing a better job of stressing that than they did when I was in school.

I have indeed chosen to learn other languages, several of them. I wish I'd done it in school, at a time when my brain was more open to it. Unfortunately, that was also a time when I didn't know very much and put my priority on other things that ended up making less of a difference in my life.


It's a myth that you learn languages easier earlier in life. Mastering a language takes about 10 years, it's just that when you start at age 6, you could be done by age 16.


Speaking as an American who speaks a handful of languages, very few Americans achieve any proficiency with foreign languages from public school classes. Indeed, I'm willing to round that to zero for classes that don't have an active speaking component (most).


The quality of said language instruction is highly variable, which also has an impact. It simply isn't a priority to a lot of schools.


Eh, not for many, many Americans. My school, and most of the schools in my county, offered only Spanish and my understanding is that four years of it still wouldn't qualify a person for AP credit.

It's hard to find more data beyond my anecdata -- an EdWeek article I found reported that less than 50% of schools report world language enrollment data.

Also, the Europeans who learn three or four languages in school also have the luxury to learn those languages for free* through public schooling, so I'm not sure I understand your point.

I am sure that your implication that every American kid can get a quality free foreign language skill in school is false: just like almost every single other educational outcome in the US, it's generally great in the good (wealthy, suburban) schools and terrible in the bad (poor, rural or urban) schools.


Public schooling is a waste of time and not where people learn foreign languages. I learned my second and third language purely through the Internet. One of them I also had in school, but like I said it was a waste of time. The method is just completely wrong, since in school they do the two things that are the most detrimental to learning a foreign language. Those two things are correcting mistakes (since the emphasis will be on the mistake, which will be remembered) and learning grammar. Grammar is useless overhead when learning. Once you know the language you can bother with grammar, if you care. I never did.


> speaking four languages is pretty normal in Europe

Clearly we have different experiences (Swedish person currently living in Spain), but I haven't met that many people who speak four languages and are from a European country (though I have yet to visit Eastern Europe).

That finns speak swedish is a special case though, as AFAIK, they learn swedish in school and being finn-swedish is a thing too.


I am a classical musician, and in my profession it is quite common. I speak a lowly 3 languages, but many colleagues speak 4+. It is a very international market, and if you leave your home country to study, it is not uncommon to work in yet another country before returning home.

Our solo flute speaks a whopping 6 languages well, and I suspect our harp player knows even more.


I should have mentioned: this is in Sweden.


I'd love to know how you earned those downvotes.


In Iceland it's pretty normal. We know Icelandic (ofc.) and learn English, one Scandinavian language (Danish, Swedish, Norwegian or Finnish), then at 15 one of German, French, Italian or Spanish. We are on an island in the middle of the Atlantic. I'd expect more linguistic pluralities on the mainland.


I feel that's a bit of an overstatement, having studied them a bit is one thing, but most people here cannot comfortably communicate at all in Danish or a 4th language, and cannot read a book in these languages.


Being Swedish I bet you at minimum can understand and communicate proficiently with speakers and writers of Swedish, Norwegian, Danish and English. Probably you learned either Spanish, French, or German in school as well?

Nordic countries are a special case.

Norden er et spesielt tilfelle.

Norden är ett speciellt fall.

Norden er et specielt tilfælde.


A quick google search seems to contradict your statement: https://jakubmarian.com/wp-content/uploads/2014/10/number-of...

Average number of languages spoken: France: 1.8 , Germany: 2.0 , Spain: 1.7 , Portugal: 1.6 , Italy: 1.8 , Greece: 1.8 , Poland: 1.8 , Sweden: 2.5 , Finland: 2.6 , UK: 1.6


That's averages. And 'pretty normal' is not a mathematical thingy it just means: that this isn't rare or noteworthy at all.


Most Norwegians only speak 2 languages. Swedish and Danish are very similar to Norwegian, more like dialects, so they don't count.


I am on holiday in Norway right now and have been super impressed by the english fluency of most people I have spoken with. It goes far beyond basic conversational fluency.


I've met a lot of people while working and travelling around Europe the past couple years, I would say 2-3 is more common.

I rarely met someone who could speak four languages fluently.


Getting in touch with two foreign languages in school is not uncommon, but speaking up to four (including your mother tongue) with any sort of sophistication definitely is not normal, at least in western Europe.


Not uncommon in Scandinavia: if you know one of the languages you can learn the others easily. Some people from Finland have Swedish and Finnish as their mother tongues; the German most likely came from upper secondary school, together with English.


As a Swede I have little issue understanding Norwegian, but I would absolutely not claim I speak it. Yes, the languages are similar enough that we can understand each other, but no Scandinavian will be able to speak another Scandinavian language without practice as there are many differences.


No it's not, what are you talking about? I've met thousands of young Europeans and ones that speak 4 languages are extremely rare. Unless they're from countries where they get 2 languages "for free" like Holland/Belgium/Switzerland. Definitely not "pretty normal".


I am Swiss and the only languages I speak are German and English. I should have learned French as well (and had it for a few years in school), but things tend to not stick if you're being forced to learn them against your will.


French guy here: no.

French people can usually speak basic English, and a third language is common if that person has ties with another country but that's it. At school, we are normally taught two foreign languages. The first one is usually English, few people actually practice their second one.

The situation is completely different in Scandinavian countries. And it is indeed quite normal to speak 4 languages in Finland (usually Finnish, Swedish, English and a 4th one, often German). Because their native language is only spoken by a few, foreign languages are a necessity for international relationships. And as a Finnish friend told me, learning new languages is a popular way to pass time during long winter nights.


If you want to keep your conversation private it is not enough to choose a rare language in Berlin. There is always somebody who understands what you are saying.


No, it's not normal to speak four languages in Europe.


Irish person here.

It's not normal.


it's pretty normal in backwater countries that can't thrive on their own. otherwise not so much.


> Speaks four European languages, including Swedish.

Judging by his name, I'd assume Swedish is his first language, so that particular aspect isn't that surprising to me


Just from the fact that he exists in Finland and is older than 20 it's basically a given that he speaks Swedish because they learn it in school.


Learning it in school and actually speaking it are very different things though. Source: learned Swedish in school in Finland, rarely used it since, have practically forgotten all about it now.


That's stretching it a bit. Many Finns only speak very basic Swedish.


Finland has two official languages - Swedish and Finnish, his name suggests he is Swedish to begin with.

It's not uncommon to speak four languages (often at C2 level in a couple of them) in Northern Europe, esp. the Baltic region.

Like mentioned by sibling (sakarisson), that particular part is not impressive, the rest - sure


CV is also 100+ pages, not bad!


You gotta love resumes that say “founding of companies listed later” and have a dedicated chapter on “EVIDENCE OF EXTERNAL REPUTATION”.


392 published papers



>> Pluribus is also unusual because it costs far less to train and run than other recent AI systems for benchmark games. Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research.

That's the best part in all of this. I'm not convinced by the claim the authors repeatedly make, that this technique will translate well to real-world problems. But I'm hoping that there is going to be more of this kind of result, signalling a shift away from Big Data and huge compute and towards well-designed and efficient algorithms.

In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently.


Yes! I work for a company that does just this: pull big gears on limited data and try to generalise across groups of things to get intelligent results even on small data. In many ways, it absolutely feels like the future.


Interesting, are you using bayesian methods?


Does "Bayesian methods" mean anything specific? Parts of the core algorithms were written before I joined, and they are very improvised in the dog-in-a-lab-coat way. I haven't analysed them to see how closely they follow Bayes theorem and how strictly they define conjugate probabilities etc. (we are also heavily using simple empirical distributions), but the general idea of updating priors with new evidence is what it builds on, yes. I have a hard time imagining doing things any other way and still getting quality results, but that is probably a reflection on my shortcomings rather than a technical fact.
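For readers wondering what "updating priors with new evidence" with a conjugate prior can look like in the small-data case, here is a minimal sketch. It is not the commenter's system; the Beta-Binomial model, the prior values, and the function names `update_beta` and `beta_mean` are all assumptions chosen for illustration.

```python
def update_beta(alpha, beta, successes, failures):
    """Conjugate update of a Beta(alpha, beta) prior with binomial evidence."""
    return alpha + successes, beta + failures


def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)


# Hypothetical prior pooled across a group of similar items:
# it encodes a belief of roughly a 20% success rate.
prior = (2.0, 8.0)

# A new item with very little data of its own: 3 successes in 4 trials.
posterior = update_beta(*prior, successes=3, failures=1)

print("prior mean:    ", beta_mean(*prior))
print("raw frequency: ", 3 / 4)
print("posterior mean:", beta_mean(*posterior))
```

The posterior mean lands between the group-level prior and the item's raw frequency, which is one way of generalising across groups when each individual item has only a handful of observations.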


The FB post is much more detailed and I think the link on this post should be updated to point there.


It's easy to "take away" too much information from this. The focus is that an AI poker bot "did this" and not get too much into other adjacent subjects.

But what's the fun in that?

10,000 hands is an interesting number. If you search the poker forums, you'll see this is the number you'll see people throw out there for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.

In 2019, it's impractical to adapt as a competitive player in live poker. A grinder can see 10,000 hands within a day. The live poker room took 12 days. Another characteristic of online poker is that players can also use data to their advantage.

So, I wouldn't consider 10K hands as long term, even if this was a period of 12 days. Once players get a chance to adapt, then they'll increase their rate of wins against a bot. Once you have a history of hand histories being shared, then it's all over. And again, give these players their own software tools.

Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.

This doesn't take away from the development of the bot. If we learn something from it, then all good.


>10,000 hands is an interesting number. If you search the poker forums, you'll see this is the number you'll see people throw out there for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.

If you read the paper/facebook post[0] (no idea why this worse article is the link here) - you'll see they address this.

>Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.

0. https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...
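The baseline-subtraction idea in the quoted passage can be illustrated with a toy control-variate simulation. This is only a sketch of the general principle, not AIVAT itself: the `simulate_hand` model, the win rate of 0.5 per hand, and the noise levels are all made-up assumptions.

```python
import random
import statistics


def simulate_hand(rng):
    """Toy model of one hand: returns (winnings, luck_baseline).

    `luck` stands in for how good the dealt cards were and has zero mean;
    `skill` is the small true edge we want to measure. All numbers are
    hypothetical.
    """
    luck = rng.gauss(0.0, 10.0)   # large zero-mean swing from the deal
    skill = 0.5                   # assumed true win rate per hand
    noise = rng.gauss(0.0, 1.0)   # remaining randomness
    return skill + luck + noise, luck


def estimate(n_hands, seed=0):
    """Collect raw winnings and baseline-adjusted winnings for n_hands."""
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(n_hands):
        winnings, baseline = simulate_hand(rng)
        raw.append(winnings)
        # Subtracting a zero-mean baseline leaves the expected value
        # unchanged but removes the luck-driven part of the variance.
        adjusted.append(winnings - baseline)
    return raw, adjusted


raw, adjusted = estimate(10_000)
print("raw:      mean=%.3f stdev=%.3f" % (statistics.mean(raw), statistics.stdev(raw)))
print("adjusted: mean=%.3f stdev=%.3f" % (statistics.mean(adjusted), statistics.stdev(adjusted)))
```

Both series estimate the same win rate, but the adjusted one has a much smaller spread, so far fewer hands are needed for the same confidence. The real algorithm has to estimate the baseline from the game state rather than read it off directly as this toy does.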


>Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.

Perhaps more famously, Jungleman compiled hand histories from many different people while he was playing Tom Dwan in the 'durrrr' challenge (which I guess technically isn't over....)


You clearly didn’t read the additional links they posted. They mentioned why they chose 10k (AIVAT), and it goes far beyond any of the variables you mentioned.

For any number of hands, my money is on the bot.


That really doesn't address the point that was raised. It's not that the bot wins through luck and that 10k is too small a sample, it's that a good professional poker player isn't good over 10k hands, they're good over 5 years.

Any good player will have their play analyzed and exploited, and will have to readjust their strategy in response, so there's a feedback loop there. The question is: how does the AI strategy adapt over time to players who know its hand history? That's an extremely important part of being a top level player. To give you an example - if you watch Daniel Negreanu's vlog about his time at the WSOP he actively talks about changing his strategy in response to his analysis of different players' profiles. This is especially important in Sit & Go where at high stakes you'll have regular grinders who build up reputations - less so in tournaments where you're less likely to meet any given player.


This will be interesting to see.

Brown and Sandholm's algorithm aims to play a Nash Equilibrium which by definition _cannot_ be exploited by a single opponent player as long as all players are playing the equilibrium strategy. As they note in the paper this gives you a strong optimality guarantee in the 2-player setting. It was unclear whether this would transfer to real-world winnings in the multi-player case, and while it looks like it does for now (for current strategy-profiles of human players) humans might be able to adapt to the strategy played by the bot. Given the fact that the bot wins against current human strategy-profiles in the n-player setting, it's likely (but not a sure thing) that human players will have to team-up against the bot to exploit it. That seems rather unlikely to me.
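The 2-player guarantee is easiest to see in a tiny zero-sum game. The sketch below uses rock-paper-scissors, which has nothing to do with how Pluribus is implemented; the function name `best_response_value` and the `rock_heavy` strategy are illustrative assumptions.

```python
# Payoff to the player choosing the row action against the column action.
# Index 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],   # rock     vs rock, paper, scissors
    [1, 0, -1],   # paper
    [-1, 1, 0],   # scissors
]


def best_response_value(strategy):
    """Highest expected payoff any opponent can achieve against `strategy`.

    The opponent picks the pure action `a` that maximises its expected
    payoff against our mixed strategy over actions `m`.
    """
    return max(
        sum(p * PAYOFF[a][m] for m, p in enumerate(strategy))
        for a in range(3)
    )


equilibrium = [1 / 3, 1 / 3, 1 / 3]   # the Nash equilibrium of this game
rock_heavy = [0.5, 0.25, 0.25]        # a hypothetical exploitable strategy

print("exploitability of equilibrium:", best_response_value(equilibrium))
print("exploitability of rock-heavy: ", best_response_value(rock_heavy))
```

Against the uniform strategy the best an opponent can do is break even, while the rock-heavy strategy loses to an opponent who always plays paper. With three or more players this kind of worst-case guarantee no longer follows automatically, which is the open question the comment above is pointing at.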


I'm one of the authors of the bot, AMA


What took you so long? I mean not the Pluribus team specifically, but Poker AI researchers in general.

The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.

As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?

Bonus questions in case you have the time and inclination to oblige:

What does this mean for people who like to play on-line Poker for real money?

Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?


I think it took the community a while to come up with the right algorithms. So much of early AI research was focused on beating humans at chess and later Go. But those techniques don't directly carry over to an imperfect-information game like poker. The challenge of hidden information was kind of neglected by the AI community. This line of research really has its origins in the game theory community actually (which is why the notation is completely different from reinforcement learning).

Fortunately, these techniques now work really really well for poker. It's now quite inexpensive to make a superhuman poker bot.


So will this be the end of online poker?


It's pretty easy for good players to recognize other good players. And since the house takes such a large cut, the only way for pro players to have positive expected value online is to seek out games with poor players. So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.

That said, I suppose it would be possible for the bots to become so prevalent that all this sort of opportunity is effectively used up, so the return vs time and risk for a human player is no longer worthwhile. (That already happened long ago for most players, as the initial online poker boom faded and most casual players left.)

On the other hand, all the major platforms have terms prohibiting using bots, so their numbers might be sufficiently limited to prevent that scenario.


It's my understanding the big sites have some pretty sophisticated bot detection systems, so in theory a bot that would be successful at beating online poker couldn't be a huge winner, it'd presumably raise too many red flags. However, if it were a near break-even player, with dozens, if not hundreds, of instances running at any given time, it's going to slowly grind out a substantial figure. You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc. I'm not a coder, but it seems like it'd be a tremendous undertaking to code a bot that would be a substantial threat to players. Then again, maybe I'm naive about the level of scrutiny the poker sites are employing.


One of the professors I used to work with some years ago was involved in stylometry research on human-computer interactions such as keystrokes and mouse input (for example, to determine if a user who had authenticated successfully earlier is not the same person currently typing based on keystroke cadence and pattern analysis - e.g. if someone sat down at an unlocked workstation and started typing, you could detect it and force them to reauthenticate).

It would probably be possible to figure out the types of detection being performed by the poker sites and use adversarial training methods to train a machine learning solution to mimic human input patterns. Or, more pragmatically, have the bot analyse the state of the game and give orders for a human to perform at their own natural pace.


Poker sites mainly detect bots based on their login times, number of tables, time per action, etc.

A successful bot shouldn't get caught for "playing like a bot" because the moment its actions are that predictable it would presumably no longer be effective.

But it will get caught for operating like a bot. So, don't run it 24hrs a day. Sites also randomize things to keep bots at bay, even card imagery.

If your performance and success drops whenever they randomize something that gives the bot false inputs, then you might get caught.

Inputting all of the poker events manually would be really tedious I'd imagine.

Of course, if you're winning millions, they can interview you about your poker history and how you got so good.

It sounds like easy money, but probably not.


Just play as you normally would, with the bot advising moves from the laptop next to you.


Right, but the bot needs to know who is in what position, what the bets are, who folded, etc. Try inputting all of that information manually to the laptop next to you and you'll quickly get frustrated. Online poker is a fast game with lots of data points.


TensorFlow, PyTorch, Caffe, Keras, MXNet, and OpenCV could copy the game if you split the video input for the player and the bot.


Yes, but see my previous comment.

People have tried it and online poker sites know they've tried it, so they'll randomize images and other data. If you take a dive when the randomizations are triggered and outperform otherwise, good luck trying to collect your winnings.


An external camera with Image processor does that


Not to mention, if you get caught, there could be worse consequences than just having your account locked. The site could (and likely would if the scale was significant) sue you for not only all your winnings, but damage to their business. They would likely win (since you're flagrantly breaking their terms of use contract), and bankrupt you.

Edit: In fact, if we're talking worst case, circumventing their anti-bot restrictions would presumably be illegal under the CFAA. So if you're in the US you could even be charged criminally, although I expect in reality that would be less likely.


>You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc.

You might be surprised by the lengths people go to in order to bypass bot-detection just for ordinary games. All of the things you mentioned are pretty standard. Considering there is serious money on the line here, I am positive that plenty of poker bots will be virtually indistinguishable from professional players, if they aren't already.


The same argument of money being on the line applies to the detection. Poker software is already pretty damn impressive with its tracking. The online casinos actually stand to lose more money than the bot creators could make, so the detection has a greater incentive, and is likely to triumph.


They only lose if there are less plays, surely? I assume they take a cut of all winnings, they're not putting up stakes.


Yes, I'm assuming that if bots work their way into everyday online poker that people will stop using it, so there would be less players.


I guess the real threat isn't a "bot" but something in the way of a program that interprets the data on the screen real-time and whose output instructs the player of the "optimal" play, given the circumstances. How the hell would you deter that as a site operator?


No, I think your earlier example of a swarm of just-above-break-even bots would be much more difficult to combat. Even if they can be detected, the anti-detection countermeasures can evolve, turning it into an arms race. Anything you can model in your bot detection algorithm, the bot-maker can model too.

Reaction times ought to be one of the easiest things to fake. All it would take is a bunch of monitoring of large numbers of games to create a nice model of real player reaction times, which in all likelihood are normally distributed anyway.


Not normally distributed, as negative reaction times are unlikely. You could use log-normal, but I believe that a mixture of exponential and gamma tends to be used by reaction time researchers (search ExpGamma).
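A quick simulation shows why a plain normal model is a poor fit for reaction times. The parameters below are arbitrary assumptions, not measurements of real poker players, and the log-normal is used only because it is in the standard library; the exponential-plus-gamma mixture mentioned above is not modelled here.

```python
import random

rng = random.Random(42)
n = 100_000

# Hypothetical reaction-time model: mean 1.5 s, standard deviation 1.0 s.
normal_samples = [rng.gauss(1.5, 1.0) for _ in range(n)]

# Log-normal alternative; mu and sigma of the underlying normal are arbitrary.
lognormal_samples = [rng.lognormvariate(0.0, 0.6) for _ in range(n)]

negative_normal = sum(t < 0 for t in normal_samples)
negative_lognormal = sum(t < 0 for t in lognormal_samples)

print("negative times from normal model:    ", negative_normal)
print("negative times from log-normal model:", negative_lognormal)
```

The normal model produces a noticeable share of impossible negative times, whereas the log-normal is strictly positive and right-skewed, which is closer to what a detector would expect from a human.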


> negative reaction times are unlikely

Oh, right. I was thinking along the lines of 100m dash, where people often do have negative reaction times (which we penalize as false starts).

In poker we don't have much of an incentive to react instantly to any play.


Pretty sure I've read a long time back on 2p2 that large consistent winners on certain sites have been asked to submit camera footage of their play with a clear view of screens and inputs. So this is probably something that companies like Pokerstars have been dealing with for years already.


It would be pretty easy to hide something signalling you on what to play from cameras.


True, but ultimately if they're unsure they'll just ban you from the platform anyway. Consistent, winning players aren't really where they make their money, and they're free to ban anyone they like. (I realize technically they take a cut from all players, but more money gets sloshed around for them to skim off of if winning players aren't removing it from the system.)


That was what I was thinking, the bot augmenting a human's playing ability rather than playing itself.


> So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.

The problem would be that if I were a pro I would rather run 1000 bots than play myself. Which means the only players left are AI and fish. Once the fishies learn of this fact, they will abandon in droves.

It's all gonna go back to live poker soon.


No, having losing odds never stopped anyone from gambling.


That's simply not true... I don't play casino games because of the losing odds. I play poker because of the winning odds. I guess you meant "having losing odds doesn't stop everyone from gambling".


Even with a magical human test, you couldn't know whether it was human + robot performance.


Just bet on bots playing each other.


So... like Wall Street!


I'm sitting here considering the possibility of making my own bot to play low stakes online poker ($1.50 sit n go). Run it on 6 tables at once and I imagine it would be facing really poor opponents and would have a steady flow of cash.


until your bot gets caught (possibly quickly) and then you're banned from the sites.


Even if it is, it means a new live poker boom which is a very good thing


It must be. It is way too hard to prevent humans from using an AI. Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.


>> Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.

Not really. With perfect information you know the correct strict equity plays assuming normal opponents. This doesn't give you the ultimate answer, because a player's reads and inference about another player is definitely an input - especially at the highest level - but it is more than enough to give you a winning/losing player at the small/midstakes.

source: worked for an online poker company that had these tools... and far more available to us


> a player's reads and inference about another player is definitely an input - especially at the highest level

I think player-dependent strategy is more important at lower levels because the players are much further away from what you call "normal opponents", so there's far more opportunity to exploit their mistakes.


>Some chess services try to check if you're playing "too perfect" [...]

That's interesting, could you share an example? Most of my search results are anecdotal Reddit threads about how many people cheat in online chess.


All the major online chess websites have anti-cheat mechanisms. They don't publish details of how they detect cheaters though, and I don't know how good they are.

From what I've read, they work by comparing the player's moves against chess engines, and if the player is picking engine's choice too often in positions where there are multiple roughly equal moves, they get flagged.
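The heuristic described above could be sketched roughly as follows. This is a guess at the general shape, not any site's actual method: the data layout, the `tolerance` threshold, and the function name `engine_match_rate` are all hypothetical, and a real system would need engine output and far more careful statistics.

```python
def engine_match_rate(positions, tolerance=0.2):
    """Fraction of ambiguous positions where the player chose the engine's top move.

    Each position is a dict with:
      "played":     the move the player made
      "candidates": list of (move, engine_eval) pairs sorted best-first,
                    evaluated from the player's point of view
    A position is ambiguous when at least two candidates are within
    `tolerance` of the best evaluation.
    """
    ambiguous = matched = 0
    for pos in positions:
        best_move, best_eval = pos["candidates"][0]
        near_best = [m for m, ev in pos["candidates"] if best_eval - ev <= tolerance]
        if len(near_best) < 2:
            continue  # forced or obvious move: matching the engine proves nothing
        ambiguous += 1
        matched += pos["played"] == best_move
    return matched / ambiguous if ambiguous else 0.0


# Made-up sample data for illustration only.
positions = [
    {"played": "Nf3", "candidates": [("Nf3", 0.30), ("d4", 0.25), ("c4", 0.20)]},
    {"played": "Qxh7", "candidates": [("Qxh7", 9.0), ("Qh5", 0.5)]},
    {"played": "e4", "candidates": [("d4", 0.31), ("e4", 0.30)]},
    {"played": "Bb5", "candidates": [("Bb5", 0.40), ("Bc4", 0.35)]},
]

rate = engine_match_rate(positions)
print("engine match rate in ambiguous positions: %.2f" % rate)
```

Skipping positions with a single clearly best move matters because strong humans find those too; only the choice among near-equal moves says anything about engine use. A flagging threshold on `rate` would have to be calibrated against known-clean games.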


I always found weird that someone would want to cheat in a game like online chess. I mean, what's the point? Does anyone have insight on what's going on in the head of cheaters?


A few reasons come to mind. One is simply that if you have any metrics (ranking, win/loss ratio, greater site access..) it's going to feel nice to see them improve. Another is that losing at anything can be ego-hurting (similar reason good players sometimes sandbag with new accounts / lower ranks they can't possibly lose to, they need to 'win' more). Or reverse sandbagging/trolling with a bot might be amusing. Another is the cheater may justify it as a self-teaching game, and might not always play the strongest move but see if their move is even in consideration or try to improve ability to see the better moves by having them always pointed out -- but why not just play the bot, or save that for post-game analysis? I like to run my go games through gnugo's annotated analysis at the end (as I'm very weak I assume even the weak gnugo can teach me things), it'd be too troublesome to use it in a live game.


Other players justify cheating by convincing themselves that everyone else is cheating.


It's where the enjoyment comes from. Cheaters don't enjoy the game as much as they enjoy seeing their ELO/MMR go up, or in the worst case they're psychopaths who just want to mess with other people's heads.


People enjoy the feeling of having power over others.


Even so most people don't bother with standard games online since it's way too easy to cheat by mirroring the game, and basically undetectable if they are good enough to not play lines that look like "computerish" moves.


>> Computers are better at maths than humans.

OP discussed it but while this is true, it is not necessarily true or straightforward when it comes to games with hidden information like poker. This is more of a game theoretical problem (Economics) than it is a purely mathematical one, which had less support in the AI/ML community, hence the delay.

The lower CPU/GPU/resource use supports that fact as does your intuition. Breaking poker required a lot of manual work and model design over brute force algorithms and reinforcement learning.


The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.

It feels like a more down to earth version of the sci fi super human running impossible differential equations to predict exactly what you will do given knowledge that he knows what you know what he knows... etc. ad Infinitum. But since it doesn’t actually consider the person it’s predicting, it may simply be a really really good approximation of the game theoretic dominant strategy.

At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?


The bot bluffs, and understands that when its opponent bets it might be a bluff. I would consider that to be strategic behavior. The fact that its strategy is determined by a mathematical process doesn't change that in my opinion.


It does bluff, but that’s not my point. My issue is that it bluffs without consideration of its opponent. High level strategic play of most games is about adapting to your opponents play. This bot does not do that. It is secretly a giant lookup table of game state to response.

In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

I’m surprised that you managed to beat pros without adaptability. It’s pretty impressive and says a lot about how we define strategy. If human adaptability is just not as good as machine optimality across all games, we could imagine discovering that an adaptable poker AI can’t outperform this one. It raises a whole lot of interesting questions because lots of criticism towards something like Starcraft AI is that it is strategically stupid and doesn’t adapt. The Starcraft AI is admittedly kind of stupid right now, but we may hit a wall on its creativity simply because creativity is, despite human intuition, a dumb idea.


If you think about it, any AI that's stopped learning and is now efficiently doing pattern matching or pattern completion (assuming memory and attractor states), instead of running a complex search, is arguably a fancy lookup table hashed by similarity. This includes humans. In other words, lookup table isn't the slight most think it is. But the bot does do real time search so it's not "merely doing" a look-up.

Because of how poker is not sub-game solvable (it is not possible to self-locate within the tree), this bot's play has to get into its opponent's mindspace in a sense. To not be exploitable, it essentially has to infer the other players' hidden state and paths from observed actions. This isn't something I've seen in Dota, Starcraft, Chess, Go bots.

It's true that it doesn't learn online to find exploitable patterns of other players, but doing this without also making yourself exploitable in turn is a very difficult other problem. Low exploitable near optimal play according to game theoretic notions is considered strategy.

While you're correct that online learning is powerful and something machines are not currently good at (in complex spaces), you can avoid being exploited without learning if your experience is rich enough and you know how to infer what your opponent is trying to do and anticipate them. I'd argue this lineage of poker bots is the closest to playing that way of the major game playing bots.


I don’t mean look up table as a bad thing. I mean it’s a lookup table on game state, without incorporating any information about the players. But good points


> High level strategic play of most games is about adapting to your opponents play.

Is this true in any meaningful sense?

For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state, this is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project) but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

I'm very impressed by this achievement because I had expected good multi-player poker AI (as opposed to simple colluding bots found online making money today) to be some years away. But I would not expect "adaptability" to ever be a sensible way forward for winning a single strategy game.


Adaptability is certainly not necessary (almost by definition) if you're playing a near to equilibrium strategy but adaptability is a useful skill to have in a general non-stationary world.

That said, for this bot, I wouldn't say it's playing completely independent of the other players' interior state. Pluribus must infer its opponents' strategy profile and, according to the paper, maintains a distribution over possible hole cards and updates its belief according to observed actions. This is part of playing in a minimally exploitable way in such a large space for an imperfect information game.
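As a toy illustration of what "maintains a distribution and updates its belief" means, here's a minimal Bayes-rule sketch. It is not Pluribus's actual code. The three hand buckets and all the probabilities are invented, and the real system works over actual hole-card combinations with the action likelihoods coming from its own strategy.

```python
def update_belief(prior, likelihood, action):
    """Bayes' rule: P(hand | action) is proportional to P(action | hand) * P(hand)."""
    unnormalized = {hand: prior[hand] * likelihood[hand][action] for hand in prior}
    total = sum(unnormalized.values())
    return {hand: p / total for hand, p in unnormalized.items()}


# Hypothetical prior over coarse hand buckets (assumption, for illustration only).
prior = {"strong": 0.2, "medium": 0.3, "weak": 0.5}

# Hypothetical probability that each bucket takes each action.
likelihood = {
    "strong": {"raise": 0.8, "call": 0.2},
    "medium": {"raise": 0.3, "call": 0.7},
    "weak": {"raise": 0.1, "call": 0.9},
}

posterior = update_belief(prior, likelihood, "raise")
for hand, p in posterior.items():
    print(f"{hand}: {p:.3f}")
```

After observing a raise, the belief shifts toward the strong bucket even though nothing about the specific opponent was learned; the likelihoods are the same for every player.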


> Pluribus must infer its opponents' strategy profile

This is what interests me. It doesn’t do this. In fact because it played against itself only, it should be assumed that the only strategy profile it considers is its own.


You're right that it uses itself as a prototype for decisions but the fact that it also maintains a probability distribution over possible hole cards and that it updates according to observed actions is already richer than the local decision only approach taken by almost all other bots. This is sort of forced by the simplicity of poker's action space combined with the large search space and imperfect information. Here, the simplicity ends up making things more difficult! They also use multiple play styles as "continuation strategies" so it's a bit more robust. And to be fair, I suspect much of human play does use themselves and experience as a substitute too.


> For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state, this is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project) but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

In an n-player game, a table can be in a (perhaps unstable) equilibrium which the "optimal" strategy will lose at. This has been demonstrated for something as simple as iterated prisoners' dilemma (tit-for-tat is "best" for most populations, but there are populations that a tit-for-tat player will lose to). I don't play poker but I've definitely experienced that in (riichi) mahjong - if you play for high-value hands the way you would in a pro league, on a table where the other three players are going for the fastest hands possible, you will likely lose.
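A quick sketch of the prisoners' dilemma point, using the conventional textbook payoffs (3 each for mutual cooperation, 1 each for mutual defection, 5 and 0 when one side defects). The payoff numbers and the 10-round length are my assumptions, not anything from the thread.

```python
# (my_move, their_move) -> (my_payoff, their_payoff); conventional textbook values.
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opponent_history[-1] if opponent_history else "C"


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Return the total scores of both strategies over a fixed number of rounds."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(hist_b)
        move_b = strategy_b(hist_a)
        pay_a, pay_b = PAYOFF[(move_a, move_b)]
        score_a += pay_a
        score_b += pay_b
        hist_a.append(move_a)
        hist_b.append(move_b)
    return score_a, score_b


print("TFT vs TFT:", play(tit_for_tat, tit_for_tat))
print("TFT vs always-defect:", play(tit_for_tat, always_defect))
```

Head to head, tit-for-tat never outscores its opponent; it does well overall only when enough of the population cooperates back. At a table of pure defectors it finishes behind every one of them.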


Well in online poker high level players make great use of player tagging, taking notes about players they have played before and what they've done in important hands or their patterns. Software exists to track how opponents behave in any given situation, and if it pops up again you use that.

I would think if professional players are utilising this information, a bot could benefit from it. I don't see how they would ever lose out from this information, even if it only uses situations where the opponent has a history of 100% of the time responding a certain way.

I am impressed by the bot but I have to laugh a bit because years ago I joked with a friend about making an "amnesiac bot" that had no recollection of previous hands, it seemed so useless we obviously didn't make it, we've evidently been proven wrong. (pointless tangent there)


Player tagging just makes you exploitable. I play one way now, you tag me "Haha, fool bet-folds way too much" and then I change it up to exploit you, "Huh, I keep trying to fold him out with worse and he doesn't bite even though my notes say he will".

The theoretically optimal play just skips that meta and meta-meta play and performs optimally anyway. Because poker involves chance the optimal play will be stochastic and so you can stare at the noise and think you see a pattern, that just means you'll play worse against it, because you're trying to beat a ghost.

For example, suppose in a certain situation optimally I should raise $50 10% of the time. It so happens, by chance, that I do so twice in a row, and you, the note-taker, record that I "always" raise $50 here. Bzzt, 90% of the time your note will be wrong next time.


You would be a fool to act based off only 2 instances of seeing a particular behaviour. That's why you have to weigh up how many instances you've seen. Sometimes if it's less than X instances it's not worth considering that particular statistic as valid.

Now say I have thousands of hands viewed against you, and you raise pre-flop 50% of the time. That is pretty significant information about the types of hands you play. If I have only 10 hands I've observed, that same stat means nothing.
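To put numbers on the sample-size point, here's a small sketch using the Wilson score interval for a binomial proportion. Treating each hand as an independent coin flip is a simplification (position and stack sizes matter), and the 5,000-hand figure is just my stand-in for "thousands".

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Approximate 95% Wilson score interval for a binomial proportion."""
    p_hat = successes / trials
    denom = 1 + z ** 2 / trials
    center = (p_hat + z ** 2 / (2 * trials)) / denom
    half_width = (
        z * math.sqrt(p_hat * (1 - p_hat) / trials + z ** 2 / (4 * trials ** 2)) / denom
    )
    return center - half_width, center + half_width


# Same observed 50% pre-flop raise frequency, very different certainty.
for raises, hands in [(5, 10), (2500, 5000)]:
    low, high = wilson_interval(raises, hands)
    print(f"{raises}/{hands} hands: true frequency plausibly {low:.1%} to {high:.1%}")
```

With 10 hands the interval spans roughly a quarter to three quarters, so the stat says almost nothing; with 5,000 it narrows to within a couple of percentage points of 50%.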

The theoretical optimal play depends on who you're playing, as more value could be extracted in certain situations vs certain players.

For example, if I've seen you face a pre-flop 3-bet 1000 times and you've folded 99% of the time. That would be a good opportunity to recognise that 3-bet bluffing this player more often would have value, and be a more optimal play than some default. Contrast playing someone who called pre-flop 3-bets 75% of the time it wouldn't be optimal to 3 bet bluff here. Different opponents, different optimal plays.
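Here's the back-of-envelope EV for that 3-bet bluff. It's a deliberately pessimistic sketch: it assumes the bluff loses its whole 3-bet every time the opponent doesn't fold, which ignores post-flop equity. The pot and bet sizes (4bb in the middle, 9bb risked) are numbers I picked for illustration.

```python
def bluff_ev(fold_freq, pot, risk):
    """EV of a pure bluff: win the pot when they fold, lose the bet otherwise.

    Simplifying assumption: the bluff never wins once called.
    """
    return fold_freq * pot - (1 - fold_freq) * risk


def breakeven_fold_freq(pot, risk):
    """Fold frequency at which the pure bluff has zero EV."""
    return risk / (risk + pot)


POT = 4.0   # hypothetical: blinds plus an open raise, in big blinds
RISK = 9.0  # hypothetical 3-bet size, in big blinds

print(f"breakeven fold frequency: {breakeven_fold_freq(POT, RISK):.1%}")
for label, fold_freq in [("folds 99%", 0.99), ("calls 75%", 0.25)]:
    print(f"opponent {label}: EV = {bluff_ev(fold_freq, POT, RISK):+.2f} bb")
```

Under these assumptions the bluff needs folds about 69% of the time to break even, so it prints money against the first opponent and burns it against the second. This only holds while the opponent keeps behaving as the stats say, which is the counter-exploitation risk raised upthread.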


I think we need to make a distinction between two kinds/styles of play:

1. Coming up with an unexploitable strategy, then scaling it up by playing as many hands as you can, earning the slim expected value each time.

2. Picking a good table / card room / 'scene', and then trying to extract as much value from it as possible.

You most often see 1 online, and 2 live, for obvious reasons.

A skilled human would be a lot more successful, I believe, than a bot in case 2. For 2, important skills are:

1. Be entertaining. You have to play in a way that is entertaining to those playing with you, such that they want to continue playing with you (and losing money to you). Good opponents (i.e. that are bad at poker but want to play high stakes) are hard to find, it is vital that you retain them.

2. Cultivate a table image, then exploit it. Especially important for tournament play, where you have the concept of "key hands" that you really need to win to potentially win the tournament. With the right table image, you may be able to win hands you otherwise wouldn't have won.

3. Exploit the specifics of the players you are playing against. Yes, that also makes you exploitable, but the idea is to stay one step ahead of your opponents.


Note that 1) is only true if your opponent is also not making many mistakes. Which fails to be true for most humans, where the combination of randomization and calculating state appropriate ranges is very difficult. This means that weak players can still lose heavily from mistakes/poor play within a reasonable number of hands, it need not be slim.

Furthermore, you can kind of account for such players by including more random or aggressive profiles in the inference/search stage.


Player tagging is more complicated than a single game, and goes far deeper than playing a few hands one way and then switching it up. You can have player stats based on thousands of hands, you can know things about your opponent even they don't know.

I don't think you play very much, which is fine, but makes this discussion a bit pointless.


> In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

Adaptability is beaten by perfect strategic play in games with clear victory conditions.

My familiarity with optimal control theory is nil but Kydland (1977) applied it to monetary policy to show that the right rules dominate discretion. What the right rules are for monetary policy is still an open question though, because while the victory conditions in economic policy are clearly defined the surrounding environment is very far from static so you deal with out of training set data regularly. Once AI can deal with these kind of out of context problems it seems plausible GAI is a matter of time.

http://www.finnkydland.com/papers/Rules%20Rather%20than%20Di...

> Rules Rather than Discretion: The Inconsistency of Optimal Plans

> Even if there is an agreed-upon, fixed social objective function and policymakers know the timing and magnitude of the effects of their actions, discretionary policy, namely, the selection of that decision which is best, given the current situation and a correct evaluation of the end- of-period position, does not result in the social objective function being maximized. The reason for this apparent paradox is that economic planning is not a game against nature but, rather, a game against rational economic agents. We conclude that there is no way control theory can be made applicable to economic planning when expectations are rational.


"Strategic" is probably the wrong word, but I think there is a valid question here regarding the approach the AI is taking. One of the key things for a good poker player is having the ability to adapt and adjust their strategy depending on how others at the table are playing. Sometimes you can have the exact same cards in the exact same position and in one game it is smart to fold and in another game it is smart to raise. From the description in the article, it doesn't appear that this AI takes those ebbs and flows into consideration. Instead it seems to play "purely mathematically optimally on expected value" that was honed through trillions of simulations.

There is a cliche about how poker is about playing your opponents and not the cards. Is this AI only focusing on its cards and ignoring its opponents?


The AI doesn't adapt to the opponents, and that's still an interesting challenge for AI research. That said, at the end of the day, it was making quite a bit of money playing against elite human pros. I think that suggests the cliche is, at least in part, wrong.


Making "quite a bit of money" still leaves open the possibility that the AI is leaving a lot of money on the table by not taking opponents into consideration.

Also I would be curious to see how it performs against people that aren't "elite human pros". Would this AI win at a higher rate in a game against average recreational players compared to the rate a pro would win?

Lastly it is also possible that the pros simply didn't have enough time to adapt to the AI which would be extra important considering the AI plays unlike humans and therefore is harder to predict.


I think the bot would make a lot of money playing against average recreational players, but it's absolutely true that if you can exploit bad players' weaknesses, then you can make more money than what the bot would earn.

We played 10,000 hands over 12 days in the 5 humans + 1 AI experiment. That's quite a long time, and there's no indication that they even began to uncover any weaknesses in that time period. So I'm fairly confident the AI is robust to exploitation, and I think that's a very important quality to have in any AI system.


That 10,000 total hands number isn't particularly meaningful on the point of adaptability because the humans aren't sharing information with each other. The important number is how many hands each individual human played against the AI. Another question would be whether the pros knew which player was the AI? Because if they didn't, you are basically throwing a modified Turing Test against the pros before they can even begin to try to find tendencies in the AI. Predicting opponents is a huge part of how people play poker. If the AI plays unlike any human, pros are at a huge disadvantage against an AI compared to how they would fare against a similarly skilled but more traditional human player.

None of this is meant to diminish what you all accomplished, I'm just highlighting areas of poker in which this AI would be less successful than humans even if it is more successful overall.


The humans knew the whole time which player was the bot.


There was an interesting IRL poker game a few years ago. The player who was running behind started going all in on every hand without even looking at their hand (with a huge amount of success).

Out of curiosity, how does a bot deal with oddities like this?


This is a solved problem. Open-shoving is a feature of sit-n-gos, so of course people have simulated these and compiled so called "pushbot tables". The parameters are basically pot size and winning probabilities against a random hand.

While this particular bot may not have those programmed in, a more powerful variant eventually will.


The mathematical theory explored in the paper is that if multiplayer poker isn't one of the multiplayer finite state games that pathologically fails to converge to a Nash equilibrium, then it has one, and this strategy should approximate it. Intuitions about adaptability and the advantages thereof aren't applicable in the scenario where the opponent is playing to a Nash equilibrium. You can perform equally well by participating in the other side of the Nash equilibrium, but anything else is a losing strategy. The fact that this approximation converges to a strategy that's actually really good suggests that there is a Nash equilibrium, and that the converged-upon strategy is converging on it.

You can't out-think or adapt to a rock-paper-scissors opponent who selects at random. All you can do is also select at random and accept that the two of you have even odds.
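The rock-paper-scissors claim is easy to check exactly. This sketch scores +1 for a win, -1 for a loss, 0 for a tie (the usual convention, my assumption here) and computes the expected value of any mixed strategy against any other.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_value(my_mix, their_mix):
    """Expected payoff when both players sample moves from their own mix."""
    return sum(
        my_mix[m] * their_mix[t] * payoff(m, t) for m in MOVES for t in MOVES
    )


uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
always_paper = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}

print("always rock vs uniform:", expected_value(always_rock, uniform))
print("always paper vs always rock:", expected_value(always_paper, always_rock))
```

Every strategy scores exactly zero against the uniform mix, so there is nothing to adapt to. The flip side is that the uniform player also earns nothing extra from an opponent who always throws rock, whereas an adaptive player could take every round.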


>> Bots that play purely mathematically optimally on expected value aren’t effective or interesting.

Interesting is up to you, but effective is definitely wrong.

ICM-perfect bots, which do not take opponent behavior into account and merely model the game state, crush small tournaments. The faster the blinds and the smaller the stacks, the better, but even normal structures get killed by these so-called "expected value" only bots.

Game Theory Optimal (GTO) attacks are incredibly effective at all levels of the game. The AI need not incorporate opponent feedback to be a winner. It can make it better, but it is not at all required.


First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).

Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?

[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]

Also, curious how much poker you folks play in the lab for "research".


We're doing cash games in this experiment. At the end of the day, this is about advancing AI, not about making a poker bot. Going from two-player to multi-player has important implications for AI beyond just poker. I don't think the same is true for cash game vs tournament.

There's a cash game almost every night at the FBNY office! I don't usually play though -- I'm not nearly as good as the bot.


> In Tourney play, the top 2 or 3 players get paid out

Or top 2 or 3 thousand... depends on the tournament but it's usually the top 15% ish.


True, I am thinking "sit and go" tournament where you would have 6 players like in this research.


Is there much to do here? ICM bots have this space covered pretty effectively.


But ICM is only a model that helps you evaluate information in the tournament, players will use it often to cap their bets or as a tipping point on a call, but I've never seen it used as a complete basis of play.
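For anyone who hasn't seen ICM, here's a minimal sketch of the standard Malmuth-Harville version: the chance of finishing first is your share of the chips, and lower places are filled recursively from whoever is left. The stack sizes and the 70/30 payout split are made up for illustration, and the code assumes every stack is positive.

```python
def icm_equity(stacks, payouts):
    """Malmuth-Harville ICM: expected prize for each stack.

    P(player finishes next) = player's chips / chips of players still unplaced.
    Assumes all stacks are positive. Cost grows factorially, so this is only
    practical for small tables.
    """
    equity = [0.0] * len(stacks)

    def recurse(remaining, place, prob):
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            p = prob * stacks[i] / total
            equity[i] += p * payouts[place]
            recurse([j for j in remaining if j != i], place + 1, p)

    recurse(list(range(len(stacks))), 0, 1.0)
    return equity


stacks = [5000, 3000, 2000]   # hypothetical chip counts
payouts = [70.0, 30.0]        # hypothetical prizes for 1st and 2nd place
equity = icm_equity(stacks, payouts)
for chips, value in zip(stacks, equity):
    print(f"{chips} chips ({chips / sum(stacks):.0%} of chips) -> {value:.2f} of the prize pool")
```

The chip leader holds half the chips but less than half the prize money, and the short stack is worth more than its chip share. That gap is the "all dollars are not equal" effect, and it says nothing about how anyone actually plays, which is the limitation pointed out above.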


How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donks. Do you think the margin would decrease as players are exposed to these strategies?

Players face mental fatigue and have so over-learned their existing strategies that it takes time to adapt new strategies and even more time for those new strategies to become second-nature.

It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.


In the paper we include a graph of performance over the course of the 10,000-hand 5 humans + 1 AI experiment that was played over 12 days. There's no indication that the bot's performance decreased over time (there is a temporary downward blip in the middle, but that's likely just variance). Based on discussions with pros, it sounds like they didn't find any weaknesses and they didn't seem to think they'd find any given more time.


I think it would be hard for the pros to find exploits against the bot, but they could definitely lose less. When using solvers, pros generally only input a couple of sizings for bets, and avoid 2x+ pot sizings, which from the video it seemed like the bot used at much higher frequencies than other pros.


I'm not great at poker, but I did play a decent amount and I know a lot of my strategy involves probing for other people's weaknesses and shifting my strategy mid game to throw people off.

I feel like a lot of trained ML models have a lot of laughable weaknesses, but perhaps they've been trained on so many games that they're well prepared for any tomfoolery.


The bot is trained to play Game Theory Optimal, aka it's playing to be breakeven at worst, which is why I believe it would be hard for a human to beat it. It's not playing perfectly, but the edges it's giving up are so marginal relative to perfect play that a human is going to lose simply by making a mistake at some point, even if a human were to use a solver to completely optimize their strategy.


I also suspect it would not be able to maintain a ~40bb/100 hand win rate. The thing about human players is, while the best are capable of learning and employing truly balanced GTO strategies, in practice they rarely adhere to these because other humans (even good pros) will still have exploitable flaws in their strategies, and attempting to exploit these will be more profitable than sticking to the unexploitable strategy; of course it also opens the exploiter to counter-exploitation, creating a fluctuating cycle of players trying to exploit, getting exploited, then moving back towards playing unexploitably. That's the normal state of a pro's strategy in a given game - so to switch to a steady state of always playing unexploitably would be a fairly big adjustment even to top tier pros who are capable of it.


Yeah, that is kinda what I was trying to tease out. These 10K hands are nothing compared to the XM of hands these pros have already played. It would be interesting to see how well they did after 1M hands. I'm sure the bot would likely still have an edge but I'd assume the players would adjust their strategy and be less confused by the random sized bets.

I was also confused by the sample videos where everyone had $10K at the start of each of the demo hands. It was unclear to me if that was just the simulation of the hands or actual game play. If everyone starts every hand with $10K, then the feat seems less strong as going all-in has less risk.


Stacks are reset to 10k at the beginning of each hand, so they can use every hand to train a single model with the same starting state.


The fixed stack size doesn't really discount anything to me - it makes sense as an experimental control; and it's a cash game so there's no additional risk to going all in regardless of stack size.

But yea the sample size is definitely too small imo; when they tested the heads-up version of the bot some years ago they had it play a bigger sample (50 or 100k iirc?).
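Rough arithmetic on why sample size matters here. The standard deviation of 100 bb per 100 hands is a ballpark I'm assuming for no-limit hold'em, not a number from the paper, and this is for raw results with no variance reduction applied.

```python
import math


def win_rate_standard_error(num_hands, std_dev_per_100=100.0):
    """Standard error of a measured win rate, in bb/100.

    std_dev_per_100 is an assumed per-100-hands standard deviation;
    hands are treated as independent blocks of 100.
    """
    blocks = num_hands / 100
    return std_dev_per_100 / math.sqrt(blocks)


for hands in (10_000, 100_000, 1_000_000):
    se = win_rate_standard_error(hands)
    print(f"{hands:>9,} hands: standard error about {se:.1f} bb/100")
```

Under that assumption, 10,000 hands pins the win rate down to about plus or minus 10 bb/100 per standard error, so a large measured edge can still be distinguishable from zero while the exact size of the edge stays fuzzy.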


In online poker (at least with 100BB stacks) it's customary to top up between hands if you're below full stack.

The reason is simple: with table stakes, your maximum win for a hand is constrained by your own stack size.
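In code, the table-stakes cap looks like this (stack sizes in big blinds are made-up examples):

```python
def max_win(my_stack, opponent_stacks):
    """Most chips you can win in one hand under table stakes.

    From each opponent you can win at most the smaller of your stack and theirs.
    """
    return sum(min(my_stack, stack) for stack in opponent_stacks)


full = max_win(100, [100, 100])   # topped up to 100 big blinds
short = max_win(40, [100, 100])   # sitting on 40 big blinds
print(f"full stack can win up to {full} bb, short stack only {short} bb")
```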


I remember reading in the mid-to-late aughts that a lot of old-school poker players that used more swagger and intuition were starting to be run out of the game by kids who applied statistical methods.


Could you perhaps speak to some of the engineering details that the paper glosses over. E.g.:

- Are the action and information abstraction procedures hand-engineered or learned in some manner?

- How does it decide how many bets to consider in a particular situation?

- Is there anything interesting going on with how the strategy is compressed in memory?

- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?

- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?

- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?

- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?

- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?


We tried to make the paper as accessible as possible. A lot of these questions are covered in the supplementary material (along with pseudocode).

- Are the action and information abstraction procedures hand-engineered or learned in some manner?

- How does it decide how many bets to consider in a particular situation?

The information abstraction is determined by k-means clustering on certain features. There wasn't much thought put into the action abstraction because it turns out the exact sizes you use don't matter that much as long as the bot has enough options to choose from. We basically just did 0.25x pot, 0.5x pot, 1x pot, etc. The number of sizes varied depending on the situation.

- Is there anything interesting going on with how the strategy is compressed in memory?

Nope.

- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?

We set a threshold at $100.

- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?

In each case, we multiplied the biased action's probability by a factor of 5 and renormalized. In theory it doesn't really matter what the factor is.

- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?

This comes out naturally from our use of Linear Counterfactual Regret Minimization in the search space. It's covered in more detail in the supplementary material.

- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?

I think it's all pretty robust to the choice of parameters, but we didn't do extensive testing to see. While these bots are quite easy to train, the variance is so high in poker that getting meaningful experimental results is relatively quite computationally expensive.

- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?

I think the key is that the search algorithm is picking up so much of the slack that we don't really need to train an amazing precomputed strategy. If we weren't using search, it would probably be infeasible to generate a strong 6-player poker AI. Search was also critical for previous AI benchmark victories like chess and Go.
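For readers following along, the "multiply by 5 and renormalize" step from the answer on biased strategies can be sketched like this. It's a minimal illustration, not the actual implementation: the function name and the base probabilities are invented, and only the factor of 5 comes from the answer above.

```python
def bias_strategy(strategy, biased_action, factor=5.0):
    """Scale one action's probability by `factor`, then renormalize to sum to 1."""
    scaled = {
        action: prob * (factor if action == biased_action else 1.0)
        for action, prob in strategy.items()
    }
    total = sum(scaled.values())
    return {action: prob / total for action, prob in scaled.items()}


# Hypothetical base strategy at some decision point.
base = {"fold": 0.2, "call": 0.5, "raise": 0.3}

for action in base:
    biased = bias_strategy(base, action)
    pretty = ", ".join(f"{a}={p:.3f}" for a, p in biased.items())
    print(f"biased toward {action}: {pretty}")
```

Each biased variant is still a valid probability distribution; it just leans harder on one action than the base strategy does.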


Any chance of the code being released or a cepheus style answer key being provided? http://poker.srv.ualberta.ca/strategy


I don't think the poker world would be happy with us if we did that. Heads-up limit hold'em isn't really played professionally anymore, but six-player no-limit hold'em is very popular.


It depends who you ask. I think it's inevitable that it's released one day. By not releasing you're just delaying it.

All the top high stakes players already have solvers that they've spent a lot of money developing and studying privately. They would definitely be upset with you, but by releasing the code you are democratizing the information to all the midstakes pros who want to study but don't have the resources to pay developers and solve the game privately.


If you're already using programs to help you, I don't see how you can be upset if someone else is cheating better than you are.


Someone watch this guy and see if he buys any fancy watches or nice cars in the next few years. ;)


Doesn't that make it a rather poor candidate for a scientific paper? Chest-thumping without data and code is, well, chest-thumping without data and code.


Have you thought about open sourcing the non-AI pieces? It would be great for other researchers so they wouldn’t have to build the poker pieces from scratch


There is some open-source code in this area, and hopefully there will be more going forward. Here's one example: https://github.com/EricSteinberger/Deep-CFR


a. Is CFR applicable in single player hidden-information games? (e.g. state is initially hidden, gradually revealed to the agent, but there is no adversary)

b. How much more efficient is the improved search algorithm? The $150 number sounds like a couple of orders of magnitude.


a. There was this paper a couple years ago applying CFR to single-agent settings: https://arxiv.org/abs/1710.11424

b. It really depends on the game and the situation. It can be several orders of magnitude in six-player poker. In other games, it can be even more.


Why are you concerned about the happiness of the poker world?


Well if they upset the poker world do you think they would have top pros willing to go on record endorsing them?


Top pros will endorse whatever they're paid to endorse.


This is falsifiable by any number of cases, but Isaac Haxton spurning PokerStars is probably one of the best examples so others see your comment is not universally applicable.

https://upswingpoker.com/isaac-haxton-pokerstars-partypoker/


>However, Haxton isn’t accepting PokerStars’ olive branch as he was among the victims defrauded by the online giant for millions of dollars.

I'm not sure this really provides strong opposition to the GP's claim.


PokerStars offered to make him - and him alone - whole through sponsorship dollars. Haxton used to be their lead pro and is widely considered one of the very best players in the world.


It could just be for ethical reasons. I think anbop has a good reason even for unethical folks: hitting the best players hard in their wallets will definitely make it harder to recruit them for comparisons that validate these experiments. My prediction is that releasing this software will lead to profitable cheating like what people do with Blackjack at casinos.


Why not run the bot, post its proceeds transparently online, and donate everything to charity?

By not releasing it, you're ensuring a higher concentration of money in the hands of a few, IMO.

Anyone with access to this source code could run a bot themselves, or employ someone to do so.

Plus, if you've accomplished this, no doubt someone can replicate it.


By not releasing it, it doesn't validate the experiment. How can we be sure there wasn't human support?


As other commenters have said, I too think you're delaying the inevitable, but releasing now would mean you get credited with the first free solution.


In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loeliger, who was within standard error of even money. Why did Linus not make it into the Science paper?


That took place after the final version of the Science paper was submitted. It would have been nice to include but it takes a while to do those experiments and we didn't feel it was worth delaying the publication process for it.


The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.


The bot handles each hand independently. How the players play in one hand does not affect how the bot plays in future hands at all.

That said, it did train by playing against itself (before the experiment against the humans began).


Interesting. Does this mean that it cannot adjust to human players "switching gears"? Isn't this a huge leak?


It’s not a leak; it just means it can't beat the opponent for the maximum it could win by playing the exploitative counter-strategy against their tendencies. Instead it just plays GTO, which will win against any given non-GTO strategy, though not for as much as the exploitative counter-strategy. Playing an exploitative strategy, however, leaves you open to exploitation, and this goes back and forth until the players converge on GTO, assuming the players are (very) good.

t. former poker pro


Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CR to the list of things to look into later but skimming the paper it was exciting to think advances are being made using causal theory...


The CFR algorithm is actually somewhat similar to Q-learning, but the connection is difficult to see because the algorithms came out of different communities, so the notation is all different.


Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, and I know Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter), but I've never heard of the others, when I do know the likes of LLinusLove (iirc, the king of 6max), Polk and Phil Galfond.

Is 10,000 hands really considered a good enough sample? Most people consider 100k hands w/ a 4bb winrate to be an acceptable sample, other math aside. However, as your opponent and yourself play with equal skill, variance increases to the point where regs refuse to sit each other.


LLinusLove was one of the players. Chris Ferguson was in one of the 5 AI's + 1 Human experiment but not the 5 Humans + 1 AI experiment.

We used AIVAT to reduce variance, which reduces the number of samples we need by roughly a factor of 10: https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat...
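
To put the "factor of 10" in context, here is a rough sketch of the arithmetic, not of AIVAT itself. The standard error of a measured win rate scales as sd / sqrt(n), so cutting the per-hand variance by 10 has the same effect as playing 10 times as many hands. The per-hand standard deviation below is a made-up illustrative number, not a figure from the paper.

```python
import math

def standard_error_bb_per_100(sd_bb_per_hand: float, hands: int) -> float:
    """Standard error of the mean win rate, expressed in bb/100."""
    return 100.0 * sd_bb_per_hand / math.sqrt(hands)

SD_PER_HAND = 10.0   # hypothetical per-hand standard deviation, in big blinds
HANDS = 10_000

# No variance reduction.
raw_se = standard_error_bb_per_100(SD_PER_HAND, HANDS)

# Variance cut by a factor of 10 means the standard deviation shrinks by sqrt(10).
reduced_se = standard_error_bb_per_100(SD_PER_HAND / math.sqrt(10), HANDS)

# Same precision as playing 10x the hands without variance reduction.
equivalent_se = standard_error_bb_per_100(SD_PER_HAND, HANDS * 10)

print(f"raw: {raw_se:.2f} bb/100, reduced: {reduced_se:.2f} bb/100, "
      f"10x hands: {equivalent_se:.2f} bb/100")
```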


What? The pros chosen were definitely highly skilled players. They're fairly well known in the online poker community.

Furthermore, Chris Ferguson, scumbag aside, is absolutely still a very good player by today's standards, and one way higher than the mean participant in a research experiment.

10,000 hands is an effective enough sample at a certain win rate and analysis of variance of play; the n-value alone is not enough to tell you if it was enough hands.


They're credible enough. I'd like the sample sizes to be bigger as well but they're enough to verify that even if the bot got lucky over the sample size, it's close enough that it doesn't really matter. Add a bit more compute, optimize some algorithms a little, and you'd make up the difference. The real point is that they have a technique that scales to 6-max, and whether it's 97% or 99% is kind of immaterial in the grand scheme of things.

FWIW, they did some variance reduction techniques that dramatically reduce the number of hands needed to be confident in your results, so the number of hands may be bigger than you think. e.g. the results of 10k HU hands have much higher variance than the results of 10k HU hands where everyone just collects their EV once they're all in.
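
A small simulation of the "collect your EV once you're all in" idea, as a sketch only. It is the simple all-in adjustment described above, not the AIVAT technique the researchers used, and the pot size and equity range are invented for illustration.

```python
import random
import statistics

def simulate_all_ins(n_hands, pot=200.0, seed=0):
    """Compare actual all-in results with EV-adjusted results for one player."""
    rng = random.Random(seed)
    raw, ev_adjusted = [], []
    for _ in range(n_hands):
        equity = rng.uniform(0.2, 0.8)      # hypothetical chance of winning this all-in
        won = rng.random() < equity
        raw.append(pot if won else 0.0)     # what the player actually collected
        ev_adjusted.append(equity * pot)    # what the player collects "on average"
    return raw, ev_adjusted

raw, ev_adjusted = simulate_all_ins(10_000)
raw_sd = statistics.pstdev(raw)
adjusted_sd = statistics.pstdev(ev_adjusted)
print(f"mean raw: {statistics.mean(raw):.1f}, mean adjusted: {statistics.mean(ev_adjusted):.1f}")
print(f"sd raw: {raw_sd:.1f}, sd adjusted: {adjusted_sd:.1f}")
```

Both series estimate the same average, but the adjusted one removes the luck of the runout, so it needs fewer hands to reach the same confidence.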


Jimmy Chou, Jason Les, Dong Kim are affiliated with Doug Polk.

It is an interesting point that these are pros but their specialities are either tournament or heads up. The current 6 max pros are LLinusLove, Otb_RedBaron, TrueTeller.


I'm very late to this post, so not sure if you're still around.

What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?


Congrats on the bot!

I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.

Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?

Thanks


Why did you optimize for using less cpus? Was it a happy accident or a goal?


A little bit of both. We didn't think we needed the extra computing power. And we really wanted to convey how cheap it is to make a superstrong poker AI with these latest algorithms.


Knowing when to bluff often depends on the psychology of the opponent, but since it trained playing itself it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?


The bot does bluff, and in fact it learns from self-play that bluffing is (sometimes) the optimal thing to do. At the end of the day, bluffing is simply betting when you have a weak hand. The bot learns from experience that when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet. The bot doesn't view it as deceptive or dishonest. It just views it as the action that makes it the most money.

Of course, a key part of bluffing is getting the probabilities right. You can't always bluff and you can't never bluff, because that would make you too predictable. But our self-play and search algorithms are designed to get those probabilities right.
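
For readers who want to see how self-play can "get those probabilities right", here is a minimal sketch using regret matching, the update rule that CFR builds on. It is a toy, not Pluribus's training code. The game is an assumption made up for illustration: the bettor holds a strong or a weak hand with equal probability, always bets the strong hand, and chooses whether to bluff the weak one; the caller chooses to call or fold. The pot and bet sizes are hypothetical.

```python
def regret_matching_strategy(regrets):
    """Play each action in proportion to its positive accumulated regret."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)

def train_bluff_game(iterations=20_000, pot=1.0, bet=0.5):
    # Bettor's expected winnings. Rows: weak-hand action (0=bluff, 1=check).
    # Columns: caller action (0=call, 1=fold). The strong hand always bets.
    payoff = [
        [0.5 * (pot + bet) + 0.5 * (-bet), 0.5 * pot + 0.5 * pot],
        [0.5 * (pot + bet) + 0.5 * 0.0,    0.5 * pot + 0.5 * 0.0],
    ]
    bettor_regret, caller_regret = [0.0, 0.0], [0.0, 0.0]
    bettor_sum, caller_sum = [0.0, 0.0], [0.0, 0.0]
    for _ in range(iterations):
        b = regret_matching_strategy(bettor_regret)
        c = regret_matching_strategy(caller_regret)
        for k in range(2):
            bettor_sum[k] += b[k]
            caller_sum[k] += c[k]
        # Value of each pure action against the opponent's current mix
        # (the caller's payoff is the negative of the bettor's).
        bettor_action_ev = [sum(payoff[i][j] * c[j] for j in range(2)) for i in range(2)]
        caller_action_ev = [-sum(payoff[i][j] * b[i] for i in range(2)) for j in range(2)]
        bettor_ev = sum(b[i] * bettor_action_ev[i] for i in range(2))
        caller_ev = sum(c[j] * caller_action_ev[j] for j in range(2))
        for k in range(2):
            bettor_regret[k] += bettor_action_ev[k] - bettor_ev
            caller_regret[k] += caller_action_ev[k] - caller_ev
    # The average strategy over all iterations is what approaches equilibrium.
    return bettor_sum[0] / iterations, caller_sum[0] / iterations

bluff_freq, call_freq = train_bluff_game()
print(f"bluffs weak hands {bluff_freq:.2f} of the time, calls {call_freq:.2f} of the time")
```

With a half-pot bet the equilibrium for this toy game is to bluff weak hands one third of the time and to call two thirds of the time, and the averages should land close to those values. Neither player is told about bluffing; the frequencies fall out of each side minimizing regret against the other.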


> when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet.

This makes no sense. If I am betting for thin value with a weak hand, then I make less money when my opponent folds. Does the bot not know whether it is bluffing or value betting?


It makes complete sense. There’s a component of value and a component of bluff for a given hand in front of you. They’re related.

Value betting and bluffing aren’t defined by the outcome of a hand — action yet to be completed. Poker is a game of hidden information so betting with “thin value” implies that your component of bluffing is larger. You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.

QQ can get KK to fold based upon board texture, street, and prior action. But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.


> You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.

No, that is simply not true. If I am betting for value, then I want my opponent to call no matter how weak I am or how thin it is.

> But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.

Then it's a value bet. As you said, it's not defined by the outcome.


“Value betting” and “bluffing” are human heuristics to simplify complicated situations.

The bot doesn’t “know” whether it’s value betting or bluffing—it’s not a relevant question. The relevant question is whether to bet, and what amount, in order to maximize value of the particular hand it has, with reference to the board and opponent actions taken.


Right, we agree on that, but the above comment lumps all of what you describe (“betting with a weak hand”) under “bluffing” and says the bot learns that it makes more money when its opponent folds.


Where does your quote say that the bet is a value-bet? I read it as saying that the bot learned to bluff (not value bet) by betting when it has a weak hand (i.e. the bot has a weak hand, so it's getting better hands to fold by betting). The phrase "value bet" was not used.

(This, in addition to what the other comments have said about there being spots where a bet can get better hands to fold with some probability AND get worse hands to call with some probability - see the chapter "The grey area between value betting and bluffing" in Applications of No Limit Hold Em)


"At the end of the day, bluffing is simply betting when you have a weak hand."

I was the one who introduced the term "value betting" to the conversation, applied specifically to weak hands.


I mean, unless only those who interpret it wrong would respond, then I must be the one reading it wrong. Because these responses aren't lining up with how I read it or what I meant.


At the highest levels of play psychological factors are pretty minimal. Before a showdown which cards you actually hold aren't particularly material, as the only information you convey is through your bids. This means if you predict that you're more likely to win a hand by bidding (and inducing a fold) than by calling and going to a showdown it makes mathematical sense to "bluff". I'm sure AIs have no trouble learning that fact.


The issue is that you don't know exactly the probability of your opponent folding.

This is psychology.


The probability of the opponent folding doesn't matter. The goal of bluffing in modern games is so that optimal players are indifferent in their decision (no matter how they play, you can't lose money). And because this is a zero sum game, if you can't lose money then you win if the opponent makes mistake.

You only need to know the probability of the opponent folding so that you can deviate from the theoretical optimal strategy to win even more money if they are a biased player


I'll have to go back and watch Data playing poker on Star Trek NG -- what do sci fi writers think of this.


Are there any ethical considerations relating to the prospect of use of this bot for cheating in real-money games? Either from your internal team or after public replication?


We're really focused on advancing the fundamental AI aspect. We're not here to kill poker. The popular poker sites have quite sophisticated anti-bot measures, but it's true that this is an arms race.


There are no ethical reasons why a game like poker must exist. In fact, poker gives a false sense of hope to the thousands of gambling addicts that enter casinos. It is a fun game, but there are an unlimited potential number of fun games..


1 ethical reason it ‘must exist’ is that it is a man-made game that some people enjoy without causing harm to themselves or others. Not quite sure what you’re suggesting, but “banning poker” is not going to solve the problem of gambling addiction.


I saw people who were going occasionally to casino without problems because nothing makes you lose and tilt so much as poker. I witnessed poker destroying families and people more than other games. There are people who don't like other casino games but lose heaps on poker and before they started poker their lives had more quality and meaning. I don't play other casino games but poker had a really bad influence on my life and the lives of people around me. Also, majority of money from poker comes from the players, not from the viewers and sponsors like in other sports, like football, baseball etc.


Very impressive. If my understanding of how the AI works is correct, it is using a pre-computed strategy developed by playing trillions of hands, but it is not dynamically updating that during game play, nor building any kind of profiles of opponents. I wonder if by playing against it many times, human opponents could discern any tendencies they could exploit. Especially if the pre-computed strategy remains static.


We played 10,000 hands of poker over the course of 12 days in the 5 humans + 1 AI experiment, and 5,000 hands per player in the 1 human + 5 AI's experiment. That's a good amount of time for a player to find a weakness in the system. There's no indication that any of the players found any weaknesses.

In fact, the methods we use are designed from the ground up to minimize exploitability. That's a really important property to have for an AI system that is actually deployed in the real world.


A hearty congratulations, Noam, on finishing another chapter of the story i opened in the early 1990s...

Another person asked "What took you so long?", and i had the same question. :) I really thought this milestone would be achieved fairly soon after i left the field in 2007. However, breakthroughs require a researcher with the right amount of reflectiveness, insight, and determination.

Well done.


The progress you have made in this research field is amazing. What do you think will be the next step, or where do you see the future of your research?


Thanks! I think going beyond two-player/team zero-sum games is really important. This was a first step, but it's definitely not the last. I'm hoping to continue in this direction, and maybe start looking at interactions involving the potential for cooperation in addition to competition.


I haven't finished digging through the paper and the supplement yet, but I'm curious about how many hands were multiway to the flop (and whether the percentages differ significantly between 1H/5AI and 5H/1AI). I'd guess that it's a pretty small fraction of the total hands, and I'm wondering what the performance is like in those particular cases.


I don't have the exact percentages but I think it's less than 10%. It's not really possible to measure the bot's performance just in specific situations, but my feeling is the bot performs relatively well in these situations. Multi-way flops were basically impossible to do in a reasonable amount of time for past AI's. Our new search techniques make these situations feasible to figure out in seconds.


Cheers, thanks. One of the reasons I asked about 1H/5AI vs 5H/1AI is that historically the new bots for a given form of poker have played a bit wider than the accepted wisdom of the time, so I was curious if there were relatively more multiway pots with 5AI than with 5H.


The pros described the bot's preflop strategy as very sensible, so I think it's unlikely there were more multiway pots with 5 AI's.


What table information does the bot take into account? Position? Other player's stack size?

>Regardless of which hand Pluribus is actually holding, it will first calculate how it would act with every possible hand .

Is this information used to form an idea of what other players might be holding based on how the other player acts and how closely that action matches Pluribus's 'what if' action?


No, it's to mask actions. If you bet big with monsters and check with air 100% of the time, your opponent knows when to fold and bet.

iirc, the frequency of bets in that spot is roughly equivalent to the frequency of times you're definitely in front of your opponent in that particular spot, but not always with the hands that are beating your opponent.

The concept is called Game Theory Optimal (GTO) and it's pretty popular in higher stakes games.


Can you share some about what strategies the bot prefers and how these compare with common professional human strategies?


We talk about this a bit in the paper. Based on the feedback from the pros, the bot seems to "donk bet" (call and then bet on the next round) much more than human pros do. It also randomizes between multiple bet sizes, including very large bet sizes, while humans stick to just one or two sizes depending on the situation.


Is there a way to see the EV the bot is calculating when it's deciding between checking and donk betting? When you place these spots in solvers, they actually advocate for a significant amount of donk betting on certain boards, but pros don't do it because the EV is marginal and it's better for pros to simplify their strategy so they make fewer mistakes. If you have a flop donk bet strategy, you also have to develop a corresponding turn and river strategy, which makes it extremely difficult.


When human players donk bet it's almost always a weak player employing an extremely exploitable strategy, whereas pros almost never do it because the metagame has evolved around the presumption that nobody ever donk bets. I'd love to see what the bot's balanced GTO donking strategy looks like.


It's basically been true along every step of the poker bot evolution (HU limit, HU NL, and 6-max NL) that the bots donk a lot more than the humans. 10 years ago you could find pros arguing that donking in any situation is always wrong. That's been shifting for years, but still not to the level that the bots do it.

My personal belief is that the "no-donk" strategy is an adaptation by fallible human minds to reduce the branching on the decision tree to something tractable.


Your personal belief is likely correct. Balancing a donking range is incredibly difficult for humans and doing so perfectly likely yields only a very small EV bonus over just always checking. For humans it makes a whole lot of sense to reduce the branching in a case like that whereas for computers it doesn't really matter.

Another good example is varying continuation betting sizes. A true GTO strategy would mix in a number of different sizings (and I'm sure the bots adapted to do this), but you only sacrifice a very tiny amount of EV by basically betting the same size every time. Doing the latter limits humans' risk of making errors, which is far more valuable than squeezing out .05bb/100 more by varying the sizes.


If true in cash games, it is funny since it is a not uncommon strategy in high-level tournament play to control pot size.


Donk bets exist in the meta, i.e. when the turn is extremely good for your range but horrible for your opponent's. (If you have a flush draw on the flop and it hits on the turn, you can overbet the pot on the turn with your bluffs and flushes, then just go all in on the river.) If they have top pair, it's pretty hard to play against that.


Oh, sure. I more meant flop donk bets; I guess it doesn't specify which street the donking was happening.


The same logic can apply to flop donk bets. Some flops favor the donking player's range more than their opponent.


Yea I'm not saying it's impossible to devise an unexploitable flop donking strategy. I think the reason thinking players generally don't is because of the complexity of adding significantly more branches early in the game tree - basically going from 3 (check-{fold,call,raise}) to 6 (those 3 plus donk-{fold,call,raise}).


The other issue is that increasing the number of branches also decreases the number of hands that go into each bucket, to the point where it might not be effective any more without being able to randomize the branch choice for specific threshold hands. Most pros I know just have a hard cutoff for each branch and don't worry too much if they're slightly out of balance, but smaller bucket sizes could magnify errors. If you have 31 combos for one action when you're supposed to have 30.5, then whatever, but if you have 6 when it should be 5.5, that could become a problem faster.
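
The arithmetic behind that last comparison, as a quick sketch. The combo counts are the ones given above, and the helper name is made up for illustration.

```python
def relative_error(actual_combos: float, target_combos: float) -> float:
    """How far a bucket is from its target size, as a fraction of the target."""
    return abs(actual_combos - target_combos) / target_combos

large_bucket = relative_error(31, 30.5)   # half a combo off in a big bucket
small_bucket = relative_error(6, 5.5)     # half a combo off in a small bucket

print(f"large bucket: {large_bucket:.1%} off, small bucket: {small_bucket:.1%} off")
```

The same half-combo rounding error is more than five times larger in relative terms once the bucket shrinks to about 5.5 combos.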


Noam - super interesting stuff. Couple of questions:

1) What were the reasons for choosing 6-handed play (assuming logistics and costs)? It would be interesting to see how the bot’s strategy would differ in a full ring game.

2) Are there any plans to commercialize the bot as a tool for training human players?


1) The goal was to show convincingly that we could handle multi-player poker. The exact number of players was kind of arbitrary. We chose six-player because that's the most common/popular format. Considering training the 6-player bot would cost less than $150 on a cloud computing service, I think it's safe to say these techniques would all work fine in other formats.

2) I'm quite happy working on fundamental AI research and plan to continue in that direction.


6-handed is a very common format online.


Are any papers available yet?

Is the bot going for game-theory-optimal play, or trying to exploit weaknesses in other players?


The paper is here: https://science.sciencemag.org/content/early/2019/07/10/scie...

It's going for game-theory-optimal play. It doesn't adapt to its opponents' observed weaknesses. But I think it's cool to show that you don't need to adapt to opponent weaknesses to win at poker at the highest levels. You just need to not have any weaknesses yourself.


I thought myself the same. However if players do expose each others weaknesses fast enough it could lead to a chip gain which might be hard to overcome right? Just in theory ofc. :)


This is a great question. I wonder how this bot would do in a game with a couple of pros and a couple of reasonably skilled amateurs?

Still well, I suspect, since straightforward theoretically-correct poker will take money off the amateurs efficiently. But it seems possible that playing to wipe out weaker or less consistent players could provide enough margin to bully the more stable AI player.


This is true in tournament play. In a cash game it doesn't matter since there is no elimination, you can always rebuy.

And it's not like in the movies where if you don't have the money to call a bet, you lose. You simply are considered all in for the main pot and then sidepots that you aren't eligible to win will be created for any bets you can't cover.
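
A minimal sketch of how the main pot and side pots get carved up, assuming nobody has folded and using made-up player names and chip amounts. Each pot is built from the smallest remaining contribution, and only the players who put chips into that layer can win it.

```python
def build_pots(contributions):
    """Split players' total contributions into a main pot and side pots.

    `contributions` maps player name -> chips put in this hand.
    Returns a list of (pot_size, eligible_players), main pot first.
    """
    pots = []
    remaining = dict(contributions)
    while any(chips > 0 for chips in remaining.values()):
        in_layer = [p for p, chips in remaining.items() if chips > 0]
        level = min(remaining[p] for p in in_layer)   # shortest remaining stack sets the layer
        pots.append((level * len(in_layer), sorted(in_layer)))
        for p in in_layer:
            remaining[p] -= level
    return pots

# A is all in for 50; B and C each put in 200.
pots = build_pots({"A": 50, "B": 200, "C": 200})
for size, eligible in pots:
    print(size, eligible)
```

Here A can win only the 150-chip main pot, while B and C also contest a 300-chip side pot. If the final layer has a single eligible player, that amount is simply an uncalled bet that goes back to them.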


Yeah that is a really interesting insight. I presume that also makes optimization much simpler. The rules are fixed. Opponents are not.


> you don't need to adapt to opponent weaknesses to win at poker at the highest levels

That may be true for limit poker, but in a no-limit tournament the best this bot could do is not lose. As the pressure increases with the blinds and the players are forced to bluff and call bluffs, how does this bot avoid folding itself to death from a run of bad cards?

I could see this bot doing well at cashing but I don't see how it could consistently place 1st the way the top human players do.


Optimal play includes bluffing. It's "optimal" according to game theory.

For example, game theory may tell you that in a particular situation, you can't be exploited if you bluff 10% of the time. If the opponent bluffs less than that, you can come out ahead by more often folding when he bets. If the opponent bluffs more than 10%, you can call or reraise when he bets. But if he bluffs the optimal amount, it doesn't matter either way, you can't take advantage of him.

So this bot would bluff at 10% to avoid getting exploited, but wouldn't try to detect whether the opponent is exploitable. (The latter is risky since a crafty opponent can switch up strategies, manipulating you into playing an exploitable strategy.)
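
Here is a sketch of where a number like 10% can come from, under simplifying assumptions that are mine rather than the parent's: the bettor's range is either a hand that always wins or a pure bluff, and the caller's hand beats only the bluffs. The pot and bet sizes are hypothetical, chosen so the indifference point lands at 10%.

```python
def call_ev(bluff_fraction: float, pot: float, bet: float) -> float:
    """Caller's EV of calling, relative to folding (which is worth 0)."""
    win = bluff_fraction * (pot + bet)     # caught a bluff: win the pot plus the bet
    lose = (1.0 - bluff_fraction) * bet    # ran into the real hand: lose the call
    return win - lose

def indifference_bluff_fraction(pot: float, bet: float) -> float:
    """Share of bluffs in the betting range that makes calling and folding equal."""
    return bet / (pot + 2.0 * bet)

POT, BET = 8.0, 1.0    # hypothetical: a bet of one eighth of the pot

print(indifference_bluff_fraction(POT, BET))
for fraction in (0.05, 0.10, 0.20):
    print(f"bluffing {fraction:.0%}: calling is worth {call_ev(fraction, POT, BET):+.2f}")
```

Below the indifference point calling loses money, so folding more is right; above it calling is profitable; at it the caller's choice makes no difference, which is the "can't take advantage of him" case.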


To add onto this, some players that truly abide by the GTO strategy will use a prop, for example a watch, to determine what play to make.

If you want your perceived range to be balanced and make x play 50% of the time and y play the other 50%, you look at the watch and if the second hand is in the first 30 seconds, you make x play, 30-60 seconds, y play.

That's just an example but your point is 100% accurate.
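
The watch trick written out as a sketch. The function name and the generalization to frequencies other than 50% are my own additions for illustration.

```python
def choose_play(second_hand: int, x_frequency: float = 0.5) -> str:
    """Pick play "x" or "y" from the position of a watch's second hand (0-59)."""
    if not 0 <= second_hand < 60:
        raise ValueError("second_hand must be between 0 and 59")
    # The first x_frequency share of the minute maps to play x, the rest to play y.
    return "x" if second_hand < 60 * x_frequency else "y"

print(choose_play(12))         # first half of the minute
print(choose_play(47))         # second half of the minute
print(choose_play(12, 0.25))   # with a 25% mix, x only covers seconds 0-14
```

It works because the moment you happen to glance at the watch is effectively unrelated to your cards, so opponents cannot read anything from which play you make.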


I think this comes down to ambiguity over what "optimal play" means.

There's a poker strategy we might call 'deterministically' optimal play, which consists of precisely assessing each hand's expected value with little to no bluffing. This is already common in online cash games with both bots and players running multiple games at once. And you're right - it's excellent at running net-positive and not losing, but unlikely to win significant tournaments.

Pluribus, though, is playing something close to game-theoretically optimal poker. In playing against itself, it's attempting to develop a takes-all-comers strategy with no exploitable weaknesses. That includes bluffing and calling bluffs - the goal is simply to find a mixed-strategy equilibrium where those moves are made some percentage of the time, in proportion to their expected payoffs. This can involve doing all of the same basic operations as pro players, like valuing button raises differently than donks or attempting to bluff based on how many players remain in the hand. The distinctive limitation is simply that Pluribus plays 'locally' optimal poker with no conception of opponents' identities or behavior in prior hands.


That's a helpful explanation, thank you! I was misunderstanding the statement about Pluribus not modeling its opponents between hands as between rounds - it's definitely modeling its opponents and detecting bluffs by understanding when a bluff is likely strategically based on each opponent's actions so far in the hand; it's just not taking anything it learned into the next hand.

I could see this being an effective strategy in a WSOP, that ability to perfectly forget the previous hand is probably more valuable than anything the way WSOP champions play. I could see it coming down to whether or not the ability to exploit a reliable tell during a pivotal hand matters more than 10% of the time.


I couldn't find it confirmed in the primary or secondary article, but I would bet the bot is just playing cash at a fixed stack depth rather than a tournament; just like in the wild, bots are much more of a problem in online cash than online tournaments. Dynamically adjusting strategies by stack depth, number of players, and pay jumps, would probably be several orders of magnitude more complex.


Smaller stack sizes reduce possibilities and thus reduce complexity. Pay jumps result in chips having different utility to each player which forces some situational playstyles to be more optimal. I would guess that this also reduces the complexity of the game.

Since tournaments don't often spend much time with stacks much deeper than 100bb, I would guess that tournaments would be more easily solved. Though tournaments are much more frequently run with 9-10 players rather than 6 at a table.

https://www.cardplayer.com/poker-news/18226-explain-poker-li...


You're right that a single short stack hand in a vacuum has fewer game tree branches, and that factoring in chip utility is also fairly straightforward. But I strongly disagree that it reduces the overall complexity of the game. The model in the article played every single hand with 100bb; to be an effective tournament player it would have to be able to fluidly adjust strategies between big, medium and short stack play, as well as reasoning about the stack sizes of other players at the table. It's basically 4 different games at >100bb, 50-100bb, 25-50bb, and <25bb, so it would have to develop optimal strategies for each. And even if the shallower stacked games are generally simpler in isolation, there's a meta strategy of knowing which one to apply in a given hand with heterogenous stack sizes. To paraphrase Doug Polk "If cash game play is a science, tournaments are more of an art."


The bot could likely just be trained on the 4 or so different games. You’re likely increasing the complexity by a constant factor, nothing exponential here.


> There were two formats for the experiment: five humans playing with one AI at the table, and one human playing with five copies of the AI at the table. In each case, there were six players at the table with 10,000 chips at the start of each hand. The small blind was 50 chips, and the big blind was 100 chips.

In the fb article linked above.


Ah thanks. As I suspected, cash game with fixed 100bb stacks.


Isn’t this survivorship bias, or do you know which player repeatedly will place 1st beforehand? Granted that poker is pretty popular, there must be quite a few people who always become first place.

Or to turn this around: given enough bots, some bots will place 1st a lot more than others. It’s just unclear which one.


The game actually becomes simpler when you have fewer blinds, to the point where if you have 15 blinds or fewer you just follow a chart and go all in or fold preflop.


The blinds don’t increase; it’s a cash game, not a tournament.


Why would you choose Chris Ferguson to participate? Don't you know his terrible history?


Congrats! As soon as I saw the title I thought “I wonder if this is the project Noam works on...”


Thanks!


Congratulations on the win! Can you recommend any papers, blog(post)s, or books for the interested layman? (I am currently scanning through the facebook post, which is great, but personally I am looking for something more technical).


Do you want to do a Hearthstone / CCG bot? I have an engine and testers for you.


Very interesting results. From the paper it sounds like the algorithms you used are very similar to Libratus (pre-solved blueprint + subgame solving). What change made it so that the computation requirement is much lower now?


There were several improvements but the most important was the depth-limited search. Libratus would always search to the end of the game. But that's not necessarily feasible in a game as complex as six-player poker. With these new algorithms, we don't need to go to the end of the game. Instead, we can stop at some arbitrary depth limit (as is done in chess and Go AI's). That drastically reduces the amount of compute needed.
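
To illustrate the general idea of stopping at a depth limit, here is a sketch of the chess-style version on a toy perfect-information game. It is not the imperfect-information search used in Pluribus. The game is an assumption chosen for brevity: players alternately take 1 to 3 stones and whoever takes the last stone wins. The `evaluate` function is a hypothetical stand-in for whatever value estimate a real system would use at the depth limit; in this toy game it happens to be exact.

```python
def evaluate(stones: int) -> float:
    """Estimated value for the player to move when search stops early."""
    return 1.0 if stones % 4 != 0 else -1.0

def negamax(stones: int, depth: int, counter: list) -> float:
    """Value of the position for the player to move, searching `depth` plies."""
    counter[0] += 1                      # count visited nodes to compare search cost
    if stones == 0:
        return -1.0                      # opponent took the last stone: player to move lost
    if depth == 0:
        return evaluate(stones)          # depth limit reached: use the estimate
    return max(-negamax(stones - take, depth - 1, counter)
               for take in (1, 2, 3) if take <= stones)

full_nodes, limited_nodes = [0], [0]
full_value = negamax(21, 21, full_nodes)       # search to the end of the game
limited_value = negamax(21, 3, limited_nodes)  # stop after 3 plies

print(f"full search: value {full_value}, {full_nodes[0]} nodes")
print(f"depth 3:     value {limited_value}, {limited_nodes[0]} nodes")
```

Both searches agree on the value here, but the depth-limited one visits a tiny fraction of the nodes. The quality of the answer then depends entirely on how good the estimate at the depth limit is.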


Can you share more details about the abstraction? The paper is kind of vague on it. How does it decide if it should use 1 or 14 bet values? Is it a perfect recall abstraction? How many information sets are there?


We give more details on this in the supplementary material.


When do you solve bridge? :)


It is in a way disappointing that this question gets so little attention, and yet, it might be the most significant. If a bot can false-card - if it can discern the strategy that the opponents have in mind, and deliberately mislead them to its own advantage - we have a real world AI. However, skills of computer bridge programs remain at club level standards.


Interesting that the conventional wisdom of never open limping emerged as confirmed through self-play. What other general poker “best practices” were either confirmed or upended through this research?


For someone not in the AI field, can you explain why AI is needed and why elaborate code with conditional blocks is not enough? Where does AI fit in with a poker game?


Conditional blocks would work, but it would be an impossibly detailed and granular tree to set up. The AI component simply helps you arrive at the decisions needed to create that complex tree.


This is super interesting! What steps would you recommend a professional poker player take in order to use AI to improve his/her personal poker skills?


Does it beat poker by reaching a Nash equilibrium (where you can't make a profit and no one can profit from you), or does it exploit opponents' weaknesses to seek profit?


It doesn't exploit its opponents' weaknesses. Its focus was on not having any weaknesses that its opponents could exploit. However, the algorithms are not guaranteed to converge to a Nash equilibrium in this setting because it's not a two-player zero-sum game (and in either case, it's not clear that playing a Nash equilibrium would provide much benefit in this setting).


What sort of defense applications could this sort of technology be used for? The last line of the Facebook blog post sparked curiosity.


Do you expect the human players to play at the best of their ability when they're not playing for actual real money?


There was real money at stake in this experiment. The pros were guaranteed $0.40 per hand just for participating, but that could increase to $1.60 per hand depending on how well they did.

To answer your question, no, I don't think human players would play at their best when not playing for actual money.


Sorry, I meant for way less than they typically play.


Any chance you could put Libratus / Pluribus online for people like me to try to beat it?


Unfortunately we don't have any plans to do that currently.


Are all the hands posted online somewhere for analysis? I would be very interested!


In how many games did the bot beat the same 5 players? And how many games were played?


We played 10,000 hands of poker in the 5 humans + 1 AI experiment. The number of hands won isn't a useful metric in poker. If you win only 10% of your hands and make $1,000 on those hands, while losing only $1 on the other 90% of hands, then you're a winning player. The bot won at a rate of 4.8 bb/100, i.e. 4.8 big blinds per 100 hands ($4.80 per hand if the blinds are $50/$100). This is considered a large win rate by professionals.
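The arithmetic in the comment above can be checked with a few lines. This is just a sketch of the two calculations mentioned (expected profit per hand, and converting bb/100 into dollars per hand); the function names are made up for illustration.

```python
def expected_profit_per_hand(outcomes):
    """Expected profit given (probability, profit) pairs whose probabilities sum to 1."""
    return sum(probability * profit for probability, profit in outcomes)


def dollars_per_hand(bb_per_100, big_blind):
    """Convert a win rate in big blinds per 100 hands into dollars per hand."""
    return bb_per_100 / 100 * big_blind


# Win $1,000 on 10% of hands, lose $1 on the other 90%.
mostly_losing_hands = expected_profit_per_hand([(0.10, 1000.0), (0.90, -1.0)])

# 4.8 bb/100 with a $100 big blind.
bot_rate = dollars_per_hand(4.8, 100.0)
```

`mostly_losing_hands` comes out strongly positive even though 90% of hands lose, which is the point about hands won being the wrong metric.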


> This is considered a large win rate by professionals.

It depends on context. 4.8bb/100 is quite good for high-level online play, but wouldn't be enough to make a living at live poker. The biggest game that runs on a regular basis in most areas is $5/10. At ~33 hands per hour, that's about 1.6bb, or roughly $16, an hour.

And I'd assume there was no rake in your game? That would take a big chunk out of the rate.


For one player...

$16/hr X (VM|microservice thread) could become astronomical profit.


That's why it's a good rate online where you can play multiple tables. Unlimited VMs don't help you in a live casino.
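For anyone who wants to play with the numbers in this sub-thread, a rough sketch of the hourly-rate calculation. The function name and the `tables` parameter are assumptions for illustration, and rake is ignored entirely.

```python
def hourly_rate(bb_per_100, hands_per_hour, big_blind, tables=1):
    """Expected dollars per hour at a given win rate, ignoring rake.

    bb_per_100     -- win rate in big blinds per 100 hands
    hands_per_hour -- hands dealt per hour at one table
    big_blind      -- size of the big blind in dollars
    tables         -- number of tables played at once (1 for live play)
    """
    return bb_per_100 / 100 * hands_per_hour * big_blind * tables


live_5_10 = hourly_rate(4.8, 33, 10.0)             # the live $5/10 example above
multi_table = hourly_rate(4.8, 33, 10.0, tables=8)  # same pace per table, 8 tables at once
```

The rate scales linearly with the number of tables, which is why the same bb/100 is worth far more online than in a live casino.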


Any chance you’ll consider releasing the hand history of the session?


They're in the extra data section of the Science article. The formatting is terrible for importing into hand history viewers, so I'm trying to get a friend to re-format them.


What was the most challenging part about implementing this?


Honestly, probably debugging. Training this thing is very cheap, but the variance in poker is huge (even with the best variance-reduction techniques) so it takes a very long time to tell whether one version is better than another version (or better than a human).


When will you test it with 10 total players in a game?


The number of players is kind of arbitrary given the techniques we're using. We chose 6 because that's the most popular/common format for poker. I don't think there's any scientific value in also doing 10.


I am obviously a human, not a bot, but in my experience playing poker, I'm much more likely to be successful in a 6-player game, whereas in a 10-player game I never seem to do well.


Any plans to make money using this in online games?


No, I don't have any plans to do that. This is really about advancing fundamental AI research.


What are the names of the poker pros the AI beat?


Are all the hands available to the public?


The hand logs from the 5 humans + 1 AI experiment are included in the supplementary material of the Science paper.


They are missing the stack sizes of the players. Would love to have logs that include that info!


will you release the source code?


Our goal is to make the research as accessible as possible to the AI community, so we include descriptions of the algorithms and pseudocode in the supplementary material. However, in part due to the potential negative impact this code could have on online poker, we're not releasing the code itself.


While you are not releasing the code to the general public, some people who worked on it obviously have access to it and someone will likely use it in the wild. The potential profits are astronomical - Rob Reitzen solved limit hold 'em and made what is rumored to be over $100 million hiring women to play online poker using his system from his house in Beverly Hills [1].

Did you guys set any rules as to whether or not members of the team that worked on this are allowed to use it?

[1] https://www.cigaraficionado.com/index.php/article/robotic-po...


Is this publicly available? How can I use it?


What's the name of the bot? Please say it's Poker McPokerface


This is literally in the second sentence of the article

>A superhuman poker-playing bot called Pluribus has beaten top human professionals at six-player no-limit Texas hold’em poker...


It was a joke.


This is fascinating stuff. So do I understand this right: Libratus worked by computing the Nash equilibrium, while the new multiplayer version works using self-play like AlphaGo Zero? Did you run the multiplayer version against the two-player version? If yes, how did it go? Could you recommend a series of books / papers that can take me from zero to being able to reprogram this (I know programming and mathematics, but not much statistics)? And how much computing resources / time did it take to train your bot?


Training was super cheap. It would cost under $150 on cloud computing services.

The training aspect has some improvements but is at its core similar to Libratus. The search algorithm is the biggest difference.

There aren't that many great resources out there for helping new people get caught up to speed on this area. That's something we hope to fix in the future. Maybe this would be a good place to start? http://modelai.gettysburg.edu/2013/cfr/cfr.pdf
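As a taste of what that kind of material covers, here is a minimal sketch of regret matching, the building block of counterfactual regret minimization, applied to rock-paper-scissors in self-play. It is a simplified, deterministic variant (each player updates against the opponent's current mixed strategy rather than a sampled action) and is not the training code behind Libratus or Pluribus. The function names and the lopsided starting regrets are assumptions chosen so the example has something to learn.

```python
# Payoff to a player choosing action a against an opponent choosing action b.
# Actions: 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(regrets):
    """Play each action in proportion to its positive regret (uniform if none)."""
    positive = [max(r, 0.0) for r in regrets]
    total = sum(positive)
    if total > 0:
        return [p / total for p in positive]
    return [1.0 / len(regrets)] * len(regrets)


def train(iterations, initial_regrets=((1.0, 0.0, 0.0), (0.0, 0.0, 0.0))):
    """Run regret-matching self-play and return each player's average strategy."""
    regrets = [list(r) for r in initial_regrets]
    strategy_sums = [[0.0] * 3, [0.0] * 3]
    for _ in range(iterations):
        strategies = [strategy_from_regrets(r) for r in regrets]
        for player in (0, 1):
            mine, theirs = strategies[player], strategies[1 - player]
            # Value of each pure action against the opponent's current mix.
            action_values = [
                sum(PAYOFF[a][b] * theirs[b] for b in range(3)) for a in range(3)
            ]
            expected = sum(mine[a] * action_values[a] for a in range(3))
            for a in range(3):
                # Regret: how much better action a would have done than my mix.
                regrets[player][a] += action_values[a] - expected
                strategy_sums[player][a] += mine[a]
    return [[s / iterations for s in sums] for sums in strategy_sums]


average_strategies = train(10_000)
```

The current strategies keep cycling, but the average strategies drift toward playing each action about a third of the time, which is the equilibrium of rock-paper-scissors.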


Is Oskari Tammelin still working on this stuff? I remember he wrote some very fast CFR optimisations a few years ago


So let me see if I understand this. I don't believe it's hard to write a probabilistic program to play poker. That's enough to win against humans in 2-player.

With one AI and multiple professional human players sitting at a physical table, the humans outperform the probabilistic model because they take advantage of each other's mistakes/styles. Some players crash out faster but the winner gets ahead of the safe probabilistic style of play.

So this bot is better at the current professional player meta than the current players. In a 1v1 against a probabilistic model, it would probably also lose?

Am I understanding this properly? Or is playing the probabilistic model directly enough of a tell that it's also losing strategy? Meaning you need some variation of strategies, strategy detection, or knowledge of the meta to win?


Interesting article. Too bad I don't have a subscription to read the paper.

The bot played like 10 000 hands. There is no way that is enough to prove it's better or worse than the opponents.

More so in no-limit, where some key all-ins can turn the game upside down. The variance is higher than in limit or fixed, right?

I built a heads-up fixed-limit Texas hold'em bot with "counterfactual regret minimization" about 8 years ago from a paper I read. It had to play something like 100 000 hands vs a crappy reference bot to prove it was better.

Strategy detection in such short games is probably worthless.

The edge is probably in seeing who is tired or drunk in paper poker.
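The sample-size worry above can be put into rough numbers. This is a back-of-the-envelope sketch that treats each block of 100 hands as an independent sample and applies no variance reduction; the function name is made up, and the standard deviation you plug in is an assumption you have to supply yourself (the thread gives no figure).

```python
import math


def hands_needed(win_rate_bb100, std_dev_bb100, z=1.96):
    """Hands needed before a win rate is z standard errors away from zero.

    win_rate_bb100 -- true win rate in big blinds per 100 hands
    std_dev_bb100  -- standard deviation of results per 100 hands (assumed input)
    z              -- how many standard errors count as 'significant'
    """
    # The standard error after n blocks is std_dev / sqrt(n); solve for n.
    blocks = (z * std_dev_bb100 / win_rate_bb100) ** 2
    return math.ceil(blocks * 100)


# Purely illustrative: an assumed standard deviation of 100 bb per 100 hands.
illustrative = hands_needed(4.8, 100.0)
```

The required number of hands grows with the square of the ratio between the standard deviation and the win rate, so small edges in a high-variance game need very long samples unless the variance is reduced some other way.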


They mention that they use AIVAT to reduce variance.

> Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT[1] variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.

[1] https://arxiv.org/abs/1612.06915
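The "subtract a baseline" idea in that quote is a control variate. Here is a toy sketch of the general principle only, not AIVAT itself: the simulated game, the size of the luck terms, and the names are all invented for illustration. Each hand's result is a small true edge plus a large luck term we can observe (with known average zero) plus some leftover noise.

```python
import random
import statistics


def simulate_hands(n_hands, true_edge, seed=0):
    """Return (raw results, baseline-corrected results) for a toy game."""
    rng = random.Random(seed)
    raw, corrected = [], []
    for _ in range(n_hands):
        card_luck = rng.uniform(-1.0, 1.0)    # observable luck, known mean 0
        other_noise = rng.uniform(-0.2, 0.2)  # luck the baseline cannot see
        result = true_edge + 10.0 * card_luck + other_noise
        baseline = 10.0 * card_luck           # expected value 0, so no bias added
        raw.append(result)
        corrected.append(result - baseline)
    return raw, corrected


raw, corrected = simulate_hands(10_000, true_edge=0.05)
raw_variance = statistics.pvariance(raw)
corrected_variance = statistics.pvariance(corrected)
```

Both lists estimate the same underlying edge, but the corrected one has far lower variance, so far fewer hands are needed to see whether the edge is positive.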


Hi Noam: I'm intrigued that you trained/tested the bot against strategies that were skewed to raise a lot, fold a lot and check a lot, as well as something resembling GTO. Were there any kinds of table situations where the bot had a harder time making money? Or where the AI crushed it?

I'm thinking in particular of unbalanced tables with an ever-changing mixture of TAG and LAG play. I've changed my mind three times about whether that's humans' best refuge -- or a situation that's a bot's dream.

You've done the work. Insights welcome.


With the advent of AI bots in poker, chess, etc., what happens to the old adage of "Play the player, not the game"? How do modern human players manage when you don't have the psychological aspects of the game to work with?

I see on chess channels that grand masters have to rethink their whole game preparation methodology to cope with the "Alpha Zero" oddities that have now been introduced into this ancient game. They literally have to "throw out the book" of standard openings and middle games and start afresh.


The chess channels you're visiting are grossly overstating Alpha Zero's impact. AFAICT, it hasn't made any impact on opening theory at all. AZ's strength is in the middlegame, where it appears to be slightly better than traditional engines (like Stockfish) at finding material sacrifices for long term piece activity and/or mating attacks.


> what happens to the old adage of "Play the player, not the game". How do modern human players manage when you don't have the psychological aspects of the game to work with?

I would say that it's thoroughly rebounded to play the game not the player in poker and this isn't because of super bots like the one used in this paper.

Ever since game theory invaded poker, players who play in highly visible events such as TV tournaments try as hard as possible to make their game unexploitable.


Like already stated, saying that Alpha Zero has forced the chess world to seriously reconsider the basic principles of chess openings etc. is a bit of a stretch. But interestingly enough, the current world champion (Magnus Carlsen) is having the chess streak of his life as we speak. On the side, he's been openly joking about Alpha Zero being one of his biggest chess idols. It's safe to say the streak is probably mostly related to his preparation from the last world championship match half a year ago carrying over to all the tournaments after.

However, even according to the former world champion (Viswanathan Anand) the run he's been on is something quite shocking: “His results this year is simply [great].... difficult to find words. [It’s been] completely off the charts. I think the chess world is still in a bit of a shock. The rest of the players are struggling to deal with a phenomenon [like him]. Even in 2012-13, his domination was less than it is this year. Everyone is still processing this information.” [1]

Carlsen is basically en route to breaking 2900 Elo - at 2882 Elo with a clear upwards trend - while there are only two other active players even above 2800 Elo, and they are struggling to stay above that threshold. (Elo is the rating system used in chess. Above 1500 Elo is an average player, 2000 Elo is a good player, 2500 Elo is a grandmaster. Anything above 2700 Elo is basically godlike.)

Oddly enough, instead of playing more like a machine, it seems like Carlsen has been playing chess that is much more about the human aspect of the game rather than trying to find the top ranked engine move on every turn. (The current traditional top engine - Stockfish - makes an assumption of each move's validity using a point system, which the chess world has been more or less obsessing over for the past decade. Alpha Zero doesn't have such a point system whatsoever.) He's been playing a drastically more aggressive and dynamic variety of chess compared to what has been seen in a long time at the top tournaments.

He's been playing to create dizzying positions on the board, making a few moves that aren't necessarily liked by the traditional top engines, but still finding himself in a winning position several moves after. It definitely looks like some sort of black magic, but it seems like the big thing Alpha Zero has brought to the general philosophy on how to approach chess at the top level is that it's possible to play aggressive chess, take risks and win in 2019. Magnus Carlsen is the first player to successfully reinvent that style of play, more than likely partly inspired by Alpha Zero. So, I'd say the big thing about Alpha Zero isn't necessarily that it could beat the other top engines, but more importantly that the 'artistic' aspect of its play is something that has never been seen from another chess engine. The fact that it proved that sort of style superior to the play ever before played by another chess engine is just the icing on the cake.

Garry Kasparov on Alpha Zero's chess persona: "I admit that I was pleased to see that AlphaZero had a dynamic, open style like my own. The conventional wisdom was that machines would approach perfection with endless dry maneuvering, usually leading to drawn games. But in my observation, AlphaZero prioritizes piece activity over material, preferring positions that to my eye looked risky and aggressive." [2]

[1] https://sportstar.thehindu.com/chess/viswanathan-anand-on-ma...

[2] https://science.sciencemag.org/content/362/6419/1087
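For a sense of what the Elo gaps mentioned above mean in practice, the standard Elo formula gives the expected score (win = 1, draw = 0.5, loss = 0) of one player against another. A minimal sketch, with the function name chosen for illustration:

```python
def elo_expected_score(rating_a, rating_b):
    """Expected score of player A against player B under the Elo model."""
    return 1.0 / (1.0 + 10.0 ** ((rating_b - rating_a) / 400.0))


# A 2882-rated player against a 2800-rated player.
carlsen_vs_2800 = elo_expected_score(2882, 2800)
```

By this formula a 2882 player is expected to score roughly 0.62 per game against a 2800 player, and a 400-point gap corresponds to an expected score of about 0.91.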


How long until a slightly worse version of this model is reverse engineered and appears at every table in online poker?


Slightly worse versions are already out in the wild. Bots using the published technique will be live in a couple of months, tops.


Colluding bots are the main worry if you play online though.


Plenty of systems already exist that can win against weaker players and/or at limit (especially limit heads-up).


I'm wondering how long until poker games will require a captcha on every round


Captchas are easier than the game


Will not help against human/AI hybrids. Use machine vision to decipher state of the game, and covertly suggest moves via audio or perhaps vibrations.


When PokerStars suspects an account is a bot, it actually requires the player to film 360 degrees around themselves and then play for an hour with a camera focused on the input devices and the screen. They check for differences between current behavior and historic behavior.


This is something I've always wondered: how come bots haven't taken over online poker, considering how much money there is to be made? All you need is to be slightly better than average, right? Is high-level poker really that hard to achieve?


Post Bilzerian's time in poker, the human players in online games use statistics to make insane bets on odds. They play many, many games at once, look for an opening, and make the insane bets when the openings come up. They've done the math and know that those kinds of bets are worth it.


I did this with great success.

For reference, I once folded top boat to quads (he showed) to a river all-in raise in PLO, against a dude who had a 100% "won at showdown when raising the river" stat over several thousand hands. Other stats confirmed he was a nit, so it was an easy fold. Iirc, this was PLO 200 or PLO 400; I never saw anyone that nitty at the PLO 1000 or PLO 2000 tables.

FWIW, I did a lot more than “look for an opening”, although I did a lot of that. I tried to play GTO as much as possible, but I would adjust to people who were exploitable when they called too much, folded too much, or were too aggressive into weakness.

I spent a lot of time away from the table analyzing stats of the regulars to find leaks to exploit. It was worth the time, and it made it much easier to play 8-12 tables of PLO.


do you still play online? I used to railbird FT back then, watching patrick antonius/sahamies/durr/jungleman etc. exciting times. How many hours did you put in for you to be good?


> do you still play online?

No longer online. I quit with UIGEA. I hate Bill Frist.

> I used to railbird FT back then, watching patrick antonius/sahamies/durr/jungleman etc. exciting times.

I never played with those folks. For reference (and maybe I was vague, and maybe I show my age), PLO 200 is 1/2 blinds PLO, and PLO 2000 is 10/20 blinds PLO. My heyday was when 10/20 blinds were the max. When they created the 50/100 and higher limits, they killed the 10/20 blind (former max) games. I never played over 25/50, because I didn’t consider myself properly rolled for it. That said, in retrospect, I should have gone with a modified Kelly criterion and taken shots at the higher stakes; some of those dudes were total donks.

I did play against Huck Seed on FT (nit), and I played against Doyle and Todd on Players Only (I think their room was a skin). They also played tight, but they may have been doing required hours. People donated to them religiously with light calls.

> How many hours did you put in for you to be good?

I would argue that I am still not “good”. There’s a hierarchy in the poker world, and you don’t feel comfortable until you’re at the top — and even that is fleeting.

To answer your question, though, I was profitable at 5/10 and 10/20 after maybe 1000 table hours (usually 4-8 tables) and 500-700 study hours. Note that this was when poker was super soft, and note that I am a specialist in learning (degrees, experience, and whatnot), so I learn things like new games more quickly than most people. My job lent itself to a lot of study away from the table, so I availed myself of that time.

I remember several breakthroughs for my game:

1. I had a dream one night in which I finally understood the bidirectionality of plo8. This was the game that I built my bankroll on (after cashing in a few free rolls). That took me to plo8 100 ($0.50/$1 blinds) in short order. After that, I just grinded to 200 and 400 plo and plo8.

2. I remember getting crushed by a LAG player in plo 400 one night. I went to four plo 50 tables and played 12 hours straight playing with a 55% or so VPIP. I broke even in that session, but it helped me understand LAG players a lot better. In retrospect, that session helped me understand how to exploit LAGs really well, and that paid off a lot at higher stakes. It also helped my SLAG game a lot.

3. The next big leap was realizing that there were three lines to exploit in poker: players who are too weak (fold too much), too passive (call too much), and too aggressive (bet/raise too much). Being able to exploit these tendencies is optimal. Being able to induce these tendencies is insanely profitable. The above is easy to say, but not always so easy to do.

4. The last phase of my development was understanding “gears”. Changing gears is the ability to switch between being passive/aggressive and tight/loose depending on the context. Most people change gears predictably — for example, if they lose a big hand, they tighten up (or some players loosen up). I played my best when I was able to adjust to the texture of the game and play the way that my opponents least expected me to play and/or wanted me to play. It’s a lot of psychology, but when I mastered this, I felt like I owned the table. No one could read me, and I read them like an open book. This is the high that skilled poker players live for, imho.

To close, I twice considered becoming a pro poker player. Once before UIGEA, and once after.

Before UIGEA, I didn’t because I realized that I was only good for about 20 top notch hours per week, and I could play those hours after work. Furthermore, the tables were only juicy for maybe 30 non-consecutive hours, so I didn’t feel like I was missing much. I was also worried about the non-legalization of poker in the US, so I wanted to keep my day job.

After UIGEA, I thought about moving to Thailand or Canada, but I (rightly) thought that games would get much worse without the US market. My earn would have been a solid $100-200k based on some of my former peers, but that’s not terribly exciting money for me. Anyone who can make $100k or more in online poker can make way more than that by being a programmer or by doing some sort of tech business (SaaS, e-commerce, consulting, etc.) or financier.

Ok, that’s a wall of text. Feel free to ask follow ups.


Aw fuck, how could I forget...

I also played against Mike the Mouth (either party or FT). I think that this was when they limited the 5/10 and 10/20 games to two tables each.

I played Mike in both plo8 and plo, and he was supposed to be a specialist. He was an absolute donator in the games I played in. He took really bad lines, and he was a net loser over a statistically insignificant number of hands. That said, if he was at the table, I wanted to play, and I wasn’t leaving until he got up. He was very exploitable.

To be fair, I don’t know what his life situation was like at that time (it was up and down from what I heard). That said, I wanted him at my table 100% of the time.


What does "post-Bilzerian time" mean? I heard Bilzerian's game was not good, and rumor has it he's using poker as a front for how he earned his money.


Anywhere from 3 years ago to 5 years from now.


Pretty incredible that this has scaled down from 100 CPUs (and a couple terabytes of RAM) for their two-player no-limit hold'em bot to just two CPUs for the six-player no-limit bot.


Congrats Noam for the great breakthrough work!

I have a question about collusion. In the 5 humans + 1 AI setting, since the human pros know which player is the AI (per your previous response), is it possible for the human players to collude to beat the AI? And in theory, in a multiplayer game, even if the AI plays the best strategy, is it still possible for it to be beaten by the other players colluding?

Thanks.


So, is this the end of online poker?

Will it just become increasingly sophisticated bots playing each other online?


I'm really confused about why stock for the company that makes PokerStars hasn't moved at all today: https://www.google.com/search?tbm=fin&q=TSE:+TSGI#scso=_wqsn...

The fact that there's a published recipe for a superhuman bot that can be trained for $150 and run on any desktop computer sounds like an existential threat to their business.

The main mitigating factor I can think of is that you'd need to also adversarially train it so it isn't distinguishable from a skilled human. But that doesn't seem like it would be too difficult.


Most people on sites like PokerStars are dumb money who just like to gamble. You are not going to win much money against the Pros anyway. If you create a bot like that yourself, then you are just one more of these Pros, basically. If you don't, then you use some published / commercial bot, and PokerStars will be able to detect it.


You know, now that we're talking about it I'm wondering if someone hasn't already come up with a better bot and has just been silently using it to win money online.

I'm sure the sites have been crawling with bots as long as they've been around, some better than others. As long as it doesn't drive away too many customers I doubt the sites care. They still take a rake on bot games. However better AI could change that as the "dumb money" slowly dries up.


Dumb money has been drying up for years. There have been bots taking millions of dollars out of games for more than a decade. Even bots from 10 years ago were sophisticated enough to win money at mid-stakes poker (up to $2000 buy in 6max no limit games)


Proof? I don't believe this. 2K buy-in has a lot of regs that are pretty good overall in cash games. Plus PokerStars/FT has a pretty good anti-bot policy. If you get caught, bye bye to the $.


https://forumserver.twoplustwo.com/153/high-stakes-pl-omaha/...

There are a bunch of such threads over the years where through statistical analysis, users have identified groups of dozens of bots.

While years ago many of the pros could theoretically beat these bots, it may not have been by enough of a factor to overcome the rake. Of course if the bots are practicing any game selection they can take money out of the economy even if they can't beat pros.

Anti-bot measures are an arms race and the sites aren't always ahead of the game.


What would stop the sites themselves from operating those bots? They wouldn't even need AI, just deal favorable "cards" to their own player.


There's already loads of bots online, this is just another incremental improvement in a very long line. This isn't some unexpected sudden death-knell for online poker.


How do we know that online poker has ever been a fair game? Has anyone ever done a statistical study of verified real players to determine whether their collective historical winnings match what would be expected in a fair game? It seems like it would be much too easy for the operators to skim money in any one of a thousand ways. I've never understood the trust people place in online gambling in general.


What does trust have to do with it, though? When I played online poker for money, I was just content with the fact that I could win slightly more than I lost, and I was mainly only doing it for entertainment anyway. I mean, the whole game revolves around risk management. If you suddenly go against someone much better than you, you can always exit the game.

Thinking about online poker again gives me ideas now that I actually know how to program. I actually thought up and wrote out a good way to subtly steal money from people, but I'm deleting it because I don't want someone else to do it. (And I wouldn't do it myself because I have ethics.)


In theory, you can implement a provably fair poker game on the blockchain.


So Dota 2 doesn't count as a multiplayer game?

OpenAI Five beat the world champions in back-to-back games...


Yes, Dota 2 is not a multiplayer poker game. I agree that the title is ambiguous, but it's not a stretch to imagine that "poker" is implied here.


I don't think it's implied, considering the article compares the poker bot to Go and chess bots (which are the non-multiplayer games the title is referring to).


My guess is that by "multiplayer", they meant "free for all", as opposed to "N vs. N". In other words, multiple opposing factions.


From an AI and game theory standpoint, there isn't much difference between two-team zero-sum and two-player zero-sum if the teammates are trained together. That said, the Dota 2 work is extremely impressive for a variety of other reasons.


There is far more ambiguity when you are competing against five mostly-aligned strategies vs a single shared strategy.


Agreed, the "first multiplayer" claim needs some walking back or caveats.

Cool achievement, but hollow marketing doesn't make it better.


I was really hoping the article would go into more detail on how the AI engaged with the human players.

Was it online? The picture in the article seems to imply IRL.

If IRL, what inputs did it have, simply cards shown or could it read tells? Did those players know they were playing an AI?


It was online. The players were playing from home on their own schedules. The bot did not look at any tells (timing tells or otherwise). The players knew they were playing a bot and knew which player the bot was.


Tells aren't really a thing for top level poker players.


Were the games played with real money? Nobody is going to take fake money games seriously.


From the paper:

"$50,000 was divided among the human participants based on their performance to incentivize them to play their best. Each player was guaranteed a minimum of $0.40 per hand for participating, but this could increase to as much as $1.60 per hand based on performance."

So the humans weren't betting their own money, but they still made more money if they won.


This is the most important question.


I'd love to see high-stakes heads-up bot vs Tom Dwan or Negreanu.

Maybe a bot technically qualifies as an opponent in durrr's challenge [0]? :)

How would bluffing influence the outcome? Both these players who are considered very strong, are known to play all kinds of hands.

[0] - https://en.wikipedia.org/wiki/Tom_Dwan#Full_Tilt_Poker_Durrr...


I don't get this... Poker isn't purely mathematical. It has emotions involved (greed, fear, belief, reading others, manipulation to fool the opponent, and maybe more), and all of these emotions arise differently for different people based on their time, place, worldview, background and history.

Are we now saying that a computer can do all of this in simulation? If so, it's a great breakthrough in human history.


At the nosebleeds, poker hasn't been around those things in a long time.

Poker is about exploitative play against people who base their play off emotions, and imperfect game-theory-optimal (GTO) play against players who don't. The more perfect the GTO play is, the higher the winrate against the latter group, but higher stakes games are built around one or more bad players - pros will literally stop playing as soon as the fish busts.


Isn't it just possible that the bot got lucky? It plays well, maybe really well, but does it play as well as a pro? Would it win 9 WSOP bracelets? Would it make it to day 3 of the World Series of Poker?

Chris Moneymaker got some damn good hands. It's part of the game. It's why this feat is unremarkable and why poker is a crap game for AI. The outcomes are very loose, especially when the reason these guys are pros is partially because of their ability to read.

You are taking away a tool that made these poker players great and then expecting them to be a metric to test the AI. A better test would be to have pro players play a set of 1, 2, 4, 7 basic rule bots and have the AI do the same. Then you compare differences in play. With enough data points you can compare situations that are similar but where the AI did better or worse. This is a fair comparison of skill.

Also, if there are professional players at a multiplayer game, the AI is getting help from the other players. Just like in Civ V, where I get help from the AI attacking itself. I'm sure this AI got help from the players attacking each other (especially if they were doing so and making the pot bigger for the AI to grab up; think of a player reraising another player after the bot does a check all in).


Despite the luck/noise in poker, there are reasonable measures of performance, and while I'm not an expert in this area, the bot seems to be doing very well (see paper for details). Poker is not a "crap game for AI"; it's actually quite a good game. It's a very simple example of a game with a lot of randomness (a feature, not a bug) and hidden information that still admits a wide variety of skill levels (expert play is much better than intermediate play, which is much better than novice play). This is a great accomplishment.

More links for reference: https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-... https://science.sciencemag.org/content/early/2019/07/10/scie...


"In a 12-day session with more than 10,000 hands, it beat 15 top human players."

That's not luck. See also: https://news.ycombinator.com/item?id=20416099

Also, Chris Moneymaker is a good poker player. He's no Phil Ivey or Tom Dwan, but he's still very good and has had decent results after his WSOP win.


I would love to get my hands on the source. Hook it up to an API like https://pokr.live and then basically build a computer vision poker bot.

The trick is how to create natural mouse click movements or keyboard inputs. This is the part that I'm most shaky on, but the pokr.live API works by sending screenshots, which it will translate into player actions at the table.

disclaimer: pokr.live API is a WIP


You do that by letting a human play, informed by the bot.


A year ago I was thinking about using deep reinforcement learning in a poker bot. What stopped me was the impossible amount of data and computation required due to the imperfect-information nature of poker games. If I have the time, I'll try to implement something akin to the search technique described in the paper.

It might pay better than a full time job.


"At each decision point, it compares the state of the game with its blueprint and searches a few moves ahead to see how the action played out. It then decides whether it can improve on it."

That's exactly how the brain operates.


Curious if we'll see human poker pros get much better in the coming years as they incorporate training regimens that involve bots (analogous to chess today vs. 50 years ago). Seems like this will be the trend in almost every game.


As someone who plays both games....I doubt it.

Poker is an incomplete information game with crushingly high variance. The bot's strategy is likely not quantifiable.


Can you expand on this? I'm a novice at both games, but the Facebook blog post mentioned that the bot exhibited some unconventional strategies:

> Pluribus disagrees with the folk wisdom that donk betting (starting a round by betting when one ended the previous betting round with a call) is a mistake; Pluribus does this far more often than professional humans do.

Is it overly simplistic to think that humans could improve their game by incorporating some strategies like this more/less often than they were previously?


Bots have already influenced the poker meta. Libratus showed us how it was optimal to sometimes overbet bluff when you have nut blocker(s). I'm sure when these poker pros do hand reviews, they're not looking at how they can exploit Pluribus, but more so at how they can incorporate some of the lines/strategies it used to beat everyone else.


I wonder what would be the impact of using Counterfactual Regret Minimization instead of training a neural network based on hands played by real players?

Why is using CFR better than training based on real data?


It's not necessarily better, but with CFR you can learn beyond what humans have learned. On the other hand, you don't learn their usual mistakes, which would let you exploit them more easily. Also, this approach needs CFR because at every point you are checking what would have happened if you had picked something else, which is just impossible with a fixed dataset.
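
To make the "what would have happened if you picked something else" part concrete, here is a minimal sketch of regret matching, the update rule that CFR builds on. It is not CFR over a full game tree and not how Pluribus is implemented; the toy game (rock-paper-scissors), the fixed opponent mix, and all function names are assumptions chosen for illustration.

```python
# Payoff to the learner: PAYOFF[my_action][opponent_action].
# Actions are indexed 0 = rock, 1 = paper, 2 = scissors.
PAYOFF = [
    [0, -1, 1],
    [1, 0, -1],
    [-1, 1, 0],
]


def strategy_from_regrets(cum_regrets):
    """Play each action in proportion to its positive cumulative regret."""
    positive = [max(r, 0.0) for r in cum_regrets]
    total = sum(positive)
    n = len(cum_regrets)
    if total == 0:
        # No action has positive regret yet, so fall back to uniform play.
        return [1.0 / n] * n
    return [p / total for p in positive]


def train(opponent_strategy, iterations=1000):
    """Run regret matching against a fixed (hypothetical) opponent mix."""
    n = len(PAYOFF)
    cum_regrets = [0.0] * n
    strategy_sum = [0.0] * n
    for _ in range(iterations):
        strategy = strategy_from_regrets(cum_regrets)
        # Counterfactual value of each action: what we would have earned
        # on average had we always picked it against this opponent.
        action_values = [
            sum(PAYOFF[a][o] * opponent_strategy[o] for o in range(n))
            for a in range(n)
        ]
        expected = sum(strategy[a] * action_values[a] for a in range(n))
        for a in range(n):
            # Regret = value of the alternative minus value of what we did.
            cum_regrets[a] += action_values[a] - expected
            strategy_sum[a] += strategy[a]
    # The average strategy over all iterations is the output of interest.
    return [s / iterations for s in strategy_sum]


# Assumed opponent that overplays rock; the learner should drift to paper.
average_strategy = train([0.5, 0.25, 0.25])
```

The regret update needs the value of every action that was not taken, which is exactly what a fixed log of human hands cannot supply: the log only records the outcome of the action that was actually chosen.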


Would you say it would be hard to expand this to tables with 9 players?


No, it wouldn't be hard. We chose six players because it's the most common/popular form of poker.

Also, as you add more players it becomes harder and harder to evaluate because the bot's involved in fewer hands, we need to have more pros at the table, and we need to coordinate more schedules. Six was logistically pretty tough already.


The most interesting thing about this to me is the lesson it teaches human players about bluffing.


Was this cash or tourney format? How many blinds deep was the bot and the rest of the players at the start?


From the sample hands, it looks as if it's a cash game with stacks equal to 200BB. Plenty of room to play real poker.


Curious, why was 100BB used for six-max? If I recall right, the heads-up experiment was 200BB?


We considered both options but decided to go with 100BB because that is the standard in the poker world. It doesn't make a big difference for these techniques though.


Could you try a training run with an ante included in the pot? I wonder if open-limping would be a viable strategy with some hands. Nobody knows, and it would be really interesting to find out. The ante should be equal to the BB, like it was in the WSOP Main Event.


Does Pluribus care about opponent's stack depth? From the examples, it appears stacks were reset after each hand.


Guessing it's because it's most similar to a regular 6-max game. Also, it should lower the number of possible ways to play a hand: fewer chips means the correct choices are easier, so maybe it's computationally easier.


Yeah, my question for the author is whether stack depth was relevant for this experiment. In heads-up, they exhausted the entire tree; in six-max, they went to a fixed depth.


Are the source code and data available so that others can play against this bot?


Would love to wire this into some kind of device I could play with at a casino.


I have seen fixed limit AI, and here is now no limit AI. Is there a pot limit AI?


Basically, all the online Poker rooms are now rigged and leading to frauds


Congrats Noam for the great breakthrough work!

I have a question about the conspiracy. For the 5 Human + 1 AI setting, since the human pros know which player is AI (read from your previous response), is it possible for human players to conspire to beat the AI?


Time to cross poker off the list?

[0] https://xkcd.com/1002/


The title is misleading - bots have been beating no limit pros in 1v1 matches for quite some time.

This is for 6-man games. The article mentions 10,000 hands - this is a very small sample size to draw any real conclusions, as anyone who has dabbled in online poker for more than a few thousand dollars can attest to. Regardless - it's trivial to write a bot that'll beat 90% of the players, as site runners can all attest to (bots are a serious problem that is not new). What does it matter that a bot can beat 'the best' or 'professionals'? It's enough that it can do better than the vast majority, outside of dystopian woes about robots taking over or being 'superior' to human beings.
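
To put rough numbers on the sample-size point, here is a back-of-the-envelope sketch of how the uncertainty in a measured win rate shrinks with the number of hands. The standard deviation of 90 bb/100 is a hypothetical figure chosen for illustration, not a number from the article or the paper, and the calculation assumes independent hands and ignores any variance-reduction technique an evaluation might apply.

```python
import math


def winrate_standard_error(std_bb_per_100, hands):
    """Standard error of a win rate in bb/100, assuming independent hands."""
    if hands <= 0:
        raise ValueError("hands must be positive")
    # The sample contains hands / 100 blocks of 100 hands each.
    return std_bb_per_100 / math.sqrt(hands / 100)


def confidence_interval_95(winrate_bb_per_100, std_bb_per_100, hands):
    """Approximate 95% interval using the normal approximation."""
    se = winrate_standard_error(std_bb_per_100, hands)
    return (winrate_bb_per_100 - 1.96 * se, winrate_bb_per_100 + 1.96 * se)


ASSUMED_STD = 90.0  # hypothetical standard deviation in bb per 100 hands

se_10k = winrate_standard_error(ASSUMED_STD, 10_000)
se_1m = winrate_standard_error(ASSUMED_STD, 1_000_000)
print(f"standard error after 10,000 hands:    {se_10k:.2f} bb/100")
print(f"standard error after 1,000,000 hands: {se_1m:.2f} bb/100")
```

Under these assumptions, the standard error falls with the square root of the sample size, so 100 times as many hands buys only a 10 times tighter estimate. Whether 10,000 hands is enough therefore depends on how large the measured edge is relative to that error.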

Glossing over all that, I am curious whether this can be used for something other than ruining online poker, which has largely already been ruined by multi-tabling professionals with custom software that gathers statistics on players (data mining), existing bots, the US government, and irresponsible (criminal) site runners (looking at you, Ultimate Bet).



