No limit: AI poker bot is first to beat professionals at multiplayer game

thomasfl · on July 12, 2019

One of the researchers, Tuomas Sandholm, has a real badass CV. Former pilot in the Finnish Air Force. Finnish windsurfing champion. Snowboarder. Professor at Carnegie Mellon. Speaks four European languages, including Swedish. And now, at the age of 51, he has created the best AI-powered poker bot.

https://www.cs.cmu.edu/~sandholm/cv.pdf

jacquesm · on July 12, 2019

Not to belittle the man's other achievements but speaking four languages is pretty normal in Europe, except when you're from the UK.

loup-vaillant · on July 12, 2019

> speaking four languages is pretty normal in Europe

Northern Europe, maybe. French people for instance tend to suck at foreign languages. We rarely go beyond 3 languages (French, English, then German or Spanish. The last two are often forgotten after school.)

I suspect Spain and Italy are similar.

ryanlol · on July 12, 2019

For example Spanish, Catalan, English and French would hardly be an unusual combination.

amval · on July 12, 2019

While Italy, France and Spain are pretty much tied in their English proficiency (Spain might be ahead, but not significantly like Portugal), there are 4 official languages in Spain, and several regions where pretty much everyone is bilingual.

I recall something like a 2.2 average.

bryanrasmussen · on July 12, 2019

Italy 1.8 seems accurate, in my experience most Italians know only Italian, although younger generations are likely to know a bit of English.

The surprising thing for me is Germany having 2. Seems unlikely.

loup-vaillant · on July 12, 2019

> The surprising thing for me is Germany having 2. Seems unlikely.

Germany is big. I've heard that the proficiency in foreign languages tends to decrease as your country gets bigger. Because the bigger the country, the less likely you are to interact with foreign languages. Bigger countries also tend to have foreign works translated (or dubbed) into their own language more often.

So, no, I'm not surprised.

bryanrasmussen · on July 13, 2019

I was surprised it was not less.

y4mi · on July 13, 2019

At least a quarter of my city barely speaks German.

But they need to be able to get citizenship afaik... So basically everyone can speak two languages on paper, though their knowledge of the native one is extremely rudimentary

You're also required to learn 2 foreign languages in school if you want to go to university

jfengel · on July 12, 2019

As an American, I am now going to bang my head into a wall.

xxs · on July 12, 2019

>into a wall.

I thought it was somewhat delayed, not paid, yet.

stronglikedan · on July 12, 2019

Nothing to do with being American, since you're afforded the luxury to learn other languages for free through public schooling. If anything, bang your head because you chose not to.

jfengel · on July 12, 2019

The offer is made, but the reason for doing so isn't made clear. I didn't understand it at the time; I availed myself of it in a minimal way. Most don't do that.

Some of that is the accident of geography: it simply wasn't necessary. Today, we are more connected to our Spanish-speaking neighbors, and the value of learning that language is becoming increasingly obvious. I don't know whether the schools are doing a better job of stressing that than they did when I was in school.

I have indeed chosen to learn other languages, several of them. I wish I'd done it in school, at a time when my brain was more open to it. Unfortunately, that was also a time when I didn't know very much and put my priority on other things that ended up making less of a difference in my life.

Kaiyou · on July 14, 2019

It's a myth that you learn languages easier earlier in life. Mastering a language takes about 10 years, it's just that when you start at age 6, you could be done by age 16.

jchallis · on July 12, 2019

Speaking as an American who speaks a handful of languages, very few Americans achieve any proficiency with foreign languages from public school classes alone. Indeed, I'm willing to put that number at zero for classes that don't have an active speaking component (most of them).

JaimeThompson · on July 12, 2019

The quality of said language is highly variable which also has an impact. It simply isn't a priority to a lot of schools.

objectivetruth · on July 12, 2019

Eh, not for many, many Americans. My school, and most of the schools in my county, offered only Spanish and my understanding is that four years of it still wouldn't qualify a person for AP credit.

It's hard to find more data beyond my anecdata -- an EdWeek article I found reported that less than 50% of schools report world language enrollment data.

Also, the Europeans who learn three or four languages in school also have the luxury to learn those languages for free* through public schooling, so I'm not sure I understand your point.

I am sure that your implication that every American kid can get a quality free foreign language skill in school is false: just like almost every single other educational outcome in the US, it's generally great in the good (wealthy, suburban) schools and terrible in the bad (poor, rural or urban) schools.

Kaiyou · on July 14, 2019

Public schooling is a waste of time and not where people learn foreign languages. I learned my second and third language purely through the Internet. One of them I also had in school, but like I said it was a waste of time. The method is just completely wrong, since in school they do the two things that are the most detrimental to learning a foreign language. Those two things are correcting mistakes (since the emphasis will be on the mistake, which will be remembered) and learning grammar. Grammar is useless overhead when learning. Once you know the language you can bother with grammar, if you care. I never did.

diggan · on July 12, 2019

> speaking four languages is pretty normal in Europe

Clearly we have different experiences (Swedish person currently living in Spain), but I haven't met that many people who speak four languages and are from a European country (though I have yet to be in Eastern Europe).

That Finns speak Swedish is a special case though, as AFAIK they learn Swedish in school, and being Finland-Swedish is a thing too.

bjoli · on July 12, 2019

I am a classical musician, and in my profession it is quite common. I speak a lowly 3 languages, but many colleagues speak 4+. It is a very international market, and if you leave your home country to study, it is not uncommon to work in yet another country before returning home.

Our solo flute speaks a whopping 6 languages well, and I suspect our harp player knows even more.

bjoli · on July 13, 2019

I should have mentioned: this is in Sweden.

jvanderbot · on July 12, 2019

I'd love to know how you earned those downvotes.

Beltiras · on July 12, 2019

In Iceland it's pretty normal. We know Icelandic (ofc.) and learn English, one Scandinavian language (Danish, Swedish, Norwegian or Finnish), then at 15 one of German, French, Italian or Spanish. We are on an island in the middle of the Atlantic. I'd expect more linguistic pluralities on the mainland.

Geimfari · on July 12, 2019

I feel that's a bit of an overstatement, having studied them a bit is one thing, but most people here cannot comfortably communicate at all in Danish or a 4th language, and cannot read a book in these languages.

nso · on July 12, 2019

Being Swedish I bet you at minimum can understand and communicate proficiently with speakers and writers of Swedish, Norwegian, Danish and English. Probably you learned either Spanish, French, or German in school as well?

Nordic countries are a special case.

Norden er et spesielt tilfelle.

Norden är ett speciellt fall.

Norden er et specielt tilfælde.

duchenne · on July 12, 2019

A quick google search seems to contradict your statement: https://jakubmarian.com/wp-content/uploads/2014/10/number-of...

Average number of languages spoken: France: 1.8 , Germany: 2.0 , Spain: 1.7 , Portugal: 1.6 , Italy: 1.8 , Greece: 1.8 , Poland: 1.8 , Sweden: 2.5 , Finland: 2.6 , UK: 1.6

jacquesm · on July 12, 2019

Those are averages. And 'pretty normal' is not a mathematical thingy; it just means that this isn't rare or noteworthy at all.

thomasfl · on July 12, 2019

Most Norwegians only speak 2 languages. Swedish and Danish are very similar to Norwegian, more like dialects, so they don't count.

kyleblarson · on July 12, 2019

I am on holiday in Norway right now and have been super impressed by the English fluency of most people I have spoken with. It goes far beyond basic conversational fluency.

fasicle · on July 12, 2019

I've met a lot of people while working and travelling around Europe the past couple years, I would say 2-3 is more common.

I rarely met someone who could speak four languages fluently.

jstummbillig · on July 12, 2019

Getting in touch with two foreign languages in school is not uncommon, but speaking up to four (including your mother tongue) with any sort of sophistication definitely is not normal, at least in western Europe.

Madzen1 · on July 12, 2019

Not uncommon in Scandinavia: if you know one of the languages, you can learn the others easily. Some people from Finland have Swedish and Finnish as their mother tongues; the German most likely came from upper secondary school, together with English.

Shaanie · on July 12, 2019

As a Swede I have little issue understanding Norwegian, but I would absolutely not claim I speak it. Yes, the languages are similar enough that we can understand each other, but no Scandinavian will be able to speak another Scandinavian language without practice as there are many differences.

kami8845 · on July 12, 2019

No it's not, what are you talking about? I've met thousands of young Europeans and ones that speak 4 languages are extremely rare. Unless they're from countries where they get 2 languages "for free" like Holland/Belgium/Switzerland. Definitely not "pretty normal".

thecatspaw · on July 12, 2019

I am Swiss and the only languages I speak are German and English. I should have learned French as well (and had it for a few years in school), but things tend not to stick if you're being forced to learn them against your will.

GuB-42 · on July 13, 2019

French guy here: no.

French people can usually speak basic English, and a third language is common if that person has ties with another country but that's it. At school, we are normally taught two foreign languages. The first one is usually English, few people actually practice their second one.

The situation is completely different in Scandinavian countries. And it is indeed quite normal to speak 4 languages in Finland (usually Finnish, Swedish, English and a 4th one, often German). Because their native language is only spoken by a few, foreign languages are a necessity for international relationships. And as a Finnish friend told me, learning new languages is a popular way to pass time during long winter nights.

dorgo · on July 12, 2019

If you want to keep your conversation private it is not enough to choose a rare language in Berlin. There is always somebody who understands what you are saying.

world32 · on July 12, 2019

No, it's not normal to speak four languages in Europe.

reitoei · on July 12, 2019

Irish person here.

It's not normal.

blancheneige · on July 12, 2019

it's pretty normal in backwater countries that can't thrive on their own. otherwise not so much.

sakarisson · on July 12, 2019

> Speaks four european languages, including swedish.

Judging by his name, I'd assume Swedish is his first language, so that particular aspect isn't that surprising to me

krageon · on July 12, 2019

Just from the fact that he exists in Finland and is older than 20 it's basically a given that he speaks Swedish because they learn it in school.

Dragory · on July 12, 2019

Learning it in school and actually speaking it are very different things though. Source: learned Swedish in school in Finland, rarely used it since, have practically forgotten all about it now.

estomagordo · on July 12, 2019

That's stretching it a bit. Many Finns only speak very basic Swedish.

xxs · on July 12, 2019

Finland has two official languages, Swedish and Finnish, and his name suggests he is a Swedish speaker to begin with.

It's not uncommon to speak four languages (often at C2 level in a couple of them) in Northern Europe, esp. the Baltic region.

As mentioned by a sibling comment (sakarisson), that particular part is not impressive. The rest, sure.

sails · on July 12, 2019

CV is also 100+ pages, not bad!

sytelus · on July 16, 2019

You gotta love resumes that say “founding of companies listed later” and have a dedicated chapter on “EVIDENCE OF EXTERNAL REPUTATION”.

stevespang · on July 12, 2019

392 published papers

pesenti · on July 11, 2019

Blog post: https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...

Science article: https://science.sciencemag.org/content/early/2019/07/10/scie...

YeGoblynQueenne · on July 12, 2019

>> Pluribus is also unusual because it costs far less to train and run than other recent AI systems for benchmark games. Some experts in the field have worried that future AI research will be dominated by large teams with access to millions of dollars in computing resources. We believe Pluribus is powerful evidence that novel approaches that require only modest resources can drive cutting-edge AI research.

That's the best part in all of this. I'm not convinced by the claim the authors repeatedly make, that this technique will translate well to real-world problems. But I'm hoping that there is going to be more of this kind of result, signalling a shift away from Big Data and huge compute and towards well-designed and efficient algorithms.

In fact, I kind of expect it. The harder it gets to do the kind of machine learning that only large groups like DeepMind and OpenAI can do, the more smaller teams will push the other way and find ways to keep making progress cheaply and efficiently.

kqr · on July 12, 2019

Yes! I work for a company that does just this: pull big gears on limited data and try to generalise across groups of things to get intelligent results even on small data. In many ways, it absolutely feels like the future.

mooneater · on July 12, 2019

Interesting, are you using bayesian methods?

kqr · on July 16, 2019

Does "Bayesian methods" mean anything specific? Parts of the core algorithms were written before I joined, and they are very improvised in the dog-in-a-lab-coat way. I haven't analysed them to see how closely they follow Bayes' theorem and how strictly they define conjugate probabilities etc. (we are also heavily using simple empirical distributions), but the general idea of updating priors with new evidence is what it builds on, yes. I have a hard time imagining doing things any other way and still getting quality results, but that is probably a reflection on my shortcomings rather than a technical fact.
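For readers wondering what "updating priors with new evidence" can look like in its simplest form, here is a minimal sketch using a Beta prior on a success rate. It is only an illustration: the function names and the numbers are made up for this example, and nothing here is claimed to be what the company mentioned above actually runs.

```python
from fractions import Fraction


def update_beta(alpha, beta, successes, failures):
    """Conjugate update: a Beta(alpha, beta) prior combined with
    binomial evidence gives a Beta(alpha + successes, beta + failures)
    posterior."""
    return alpha + successes, beta + failures


def posterior_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution, as an exact fraction."""
    return Fraction(alpha, alpha + beta)


# Hypothetical example: start from a uniform prior Beta(1, 1),
# then observe 3 successes and 1 failure.
alpha, beta = update_beta(1, 1, successes=3, failures=1)
print(alpha, beta, posterior_mean(alpha, beta))
```

With little data the prior still pulls the estimate toward the middle, which is one way to get sensible answers on small samples.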

samfriedman · on July 11, 2019

The FB post is much more detailed and I think the link on this post should be updated to point there.

gexla · on July 12, 2019

It's easy to "take away" too much information from this. The focus should be that an AI poker bot "did this", without getting too far into other adjacent subjects.

But what's the fun in that?

10,000 hands is an interesting number. If you search the poker forums, you'll see this is the number people throw out there for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.

In 2019, it's impractical to adapt as a competitive player in live poker. A grinder can see 10,000 hands within a day. The live poker room took 12 days. Another characteristic of online poker is that players can also use data to their advantage.

So, I wouldn't consider 10K hands as long term, even if this was a period of 12 days. Once players get a chance to adapt, then they'll increase their rate of wins against a bot. Once you have a history of hand histories being shared, then it's all over. And again, give these players their own software tools.

Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.

This doesn't take away from the development of the bot. If we learn something from it, then all good.

Tenoke · on July 12, 2019

>10,000 hands is an interesting number. If you search the poker forums, you'll see this is the number people throw out there for how many hands you need to see before you can analyze your play. You then make adjustments and see another 10,000 hands before you can assess those changes.

If you read the paper/facebook post[0] (no idea why this worse article is the link here) - you'll see they address this.

>Although poker is a game of skill, there is an extremely large luck component as well. It is common for top professionals to lose money even over the course of 10,000 hands of poker simply because of bad luck. To reduce the role of luck, we used a version of the AIVAT variance reduction algorithm, which applies a baseline estimate of the value of each situation to reduce variance while still keeping the samples unbiased. For example, if the bot is dealt a really strong hand, AIVAT will subtract a baseline value from its winnings to counter the good luck. This adjustment allowed us to achieve statistically significant results with roughly 10x fewer hands than would normally be needed.

0. https://ai.facebook.com/blog/pluribus-first-ai-to-beat-pros-...
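The quoted idea (subtract a baseline for luck while keeping the estimate unbiased) is the general control-variate trick. Below is a toy sketch of that trick only, not the AIVAT algorithm itself: the game model, the `skill_edge` value, and the function names are all assumptions invented for illustration.

```python
import random
import statistics


def simulate_hand(rng, skill_edge=0.05):
    """Toy model of one hand: winnings = true edge + luck + other noise.

    'luck' stands in for the strength of the dealt cards. It has mean
    zero and is assumed to be observable after the hand is played.
    """
    luck = rng.gauss(0.0, 1.0)
    noise = rng.gauss(0.0, 0.2)
    return skill_edge + luck + noise, luck


def estimate(n_hands, seed=0):
    """Return raw winnings and luck-adjusted winnings for n_hands."""
    rng = random.Random(seed)
    raw, adjusted = [], []
    for _ in range(n_hands):
        winnings, luck = simulate_hand(rng)
        raw.append(winnings)
        # The baseline has expectation zero, so subtracting it leaves
        # the expected value unchanged while removing most of the variance.
        adjusted.append(winnings - luck)
    return raw, adjusted


raw, adjusted = estimate(2000)
print("raw variance:     ", statistics.variance(raw))
print("adjusted variance:", statistics.variance(adjusted))
```

Since the standard error of a mean scales as sigma / sqrt(n), cutting the variance by a factor of 10 means roughly 10x fewer hands are needed for the same confidence, which is consistent with the figure in the quote.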

crazypyro · on July 12, 2019

>Remember that one of the most exciting events in online poker was the run of isildur1. That run was put to rest when he went bust against players who had studied thousands of his hand histories.

Perhaps more famously, Jungleman compiled hand histories from many different people while he was playing Tom Dwan in the 'durrrr' challenge (which I guess technically isn't over....)

csa · on July 12, 2019

You clearly didn’t read the additional links they posted. They mentioned why they chose 10k (AIVAT), and it goes far beyond any of the variables you mentioned.

For any number of hands, my money is on the bot.

Traster · on July 12, 2019

That really doesn't address the point that was raised. It's not that the bot wins through luck and that 10k is too small a sample, it's that a good professional poker player isn't good over 10k hands, they're good over 5 years.

Any good player will have their play analyzed and exploited, so there's a feedback loop: they have to re-adjust their strategy to respond to exploitative play. The question is: how does the AI strategy adapt over time to players who know the hand history of the AI strategy? That's an extremely important part of being a top-level player. To give you an example: if you watch Daniel Negreanu's vlog about his time at the WSOP, he actively talks about changing his strategy in response to his analysis of different players' profiles. This is especially important in Sit & Go, where at high stakes you'll have regular grinders who build up reputations, and less so in tournaments, where you're less likely to meet any given player.

hdkrgr · on July 12, 2019

This will be interesting to see.

Brown and Sandholm's algorithm aims to play a Nash Equilibrium, which by definition _cannot_ be exploited by a single opponent player as long as all other players are playing the equilibrium strategy. As they note in the paper, this gives you a strong optimality guarantee in the 2-player setting. It was unclear whether this would transfer to real-world winnings in the multi-player case, and while it looks like it does for now (for current strategy-profiles of human players), humans might be able to adapt to the strategy played by the bot. Given the fact that the bot wins against current human strategy-profiles in the n-player setting, it's likely (but not a sure thing) that human players will have to team up against the bot to exploit it. That seems rather unlikely to me.
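To make "cannot be exploited" concrete in the 2-player zero-sum case, here is a small sketch using rock-paper-scissors instead of poker. It has nothing to do with how Pluribus is implemented; the payoff table and function name are just assumptions for the illustration.

```python
from fractions import Fraction

ACTIONS = ("rock", "paper", "scissors")

# PAYOFF[i][j] is the payoff to a player choosing ACTIONS[i]
# against an opponent choosing ACTIONS[j].
PAYOFF = (
    (0, -1, 1),   # rock: ties rock, loses to paper, beats scissors
    (1, 0, -1),   # paper
    (-1, 1, 0),   # scissors
)


def best_response_value(opponent_mix):
    """Best expected payoff any pure action can get against a fixed
    mixed strategy. This is how much that strategy can be exploited."""
    values = [
        sum(PAYOFF[i][j] * opponent_mix[j] for j in range(3))
        for i in range(3)
    ]
    return max(values)


uniform = [Fraction(1, 3)] * 3                                  # equilibrium
rock_heavy = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]   # biased

print(best_response_value(uniform))     # exploitability of the equilibrium
print(best_response_value(rock_heavy))  # exploitability of the biased mix
```

Against the uniform mix no counter-strategy has positive expectation, while the rock-heavy mix can be beaten by always playing paper. With more than two players this guarantee no longer holds in general, which is the open question raised above.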

noambrown · on July 11, 2019

I'm one of the authors of the bot, AMA

n3k5 · on July 11, 2019

What took you so long? I mean not the Pluribus team specifically, but Poker AI researchers in general.

The desire to master this sort of game has inspired the development of entire branches of mathematics. Computers are better at maths than humans. They're less prone to hazardous cognitive biases (gambler's fallacy etc.) and can put on an excellent poker face.

As a layperson who's rather ignorant about both no-limit Texas hold 'em and applicable AI techniques, my intuition would tell me that super-human hold 'em should have been achieved before super-human Go. Apparently your software requires way less CPU power than AlphaGo/AlphaZero, which seems to support my hypothesis. What am I missing?

Bonus questions in case you have the time and inclination to oblige:

What does this mean for people who like to play on-line Poker for real money?

Could you recommend some literature (white papers/books/lecture series/whatever) to someone interested in writing an AI (running on potato-grade hardware) for a niche "draft and pass" card game (e.g. Sushi Go!) as a recreational programming exercise?

noambrown · on July 11, 2019

I think it took the community a while to come up with the right algorithms. So much of early AI research was focused on beating humans at chess and later Go. But those techniques don't directly carry over to an imperfect-information game like poker. The challenge of hidden information was kind of neglected by the AI community. This line of research really has its origins in the game theory community actually (which is why the notation is completely different from reinforcement learning).

Fortunately, these techniques now work really really well for poker. It's now quite inexpensive to make a superhuman poker bot.

amelius · on July 11, 2019

So will this be the end of online poker?

tempestn · on July 12, 2019

It's pretty easy for good players to recognize other good players. And since the house takes such a large cut, the only way for pro players to have positive expected value online is to seek out games with poor players. So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.

That said, I suppose it would be possible for the bots to become so prevalent that all this sort of opportunity is effectively used up, so the return vs time and risk for a human player is no longer worthwhile. (That already happened long ago for most players, as the initial online poker boom faded and most casual players left.)

On the other hand, all the major platforms have terms prohibiting using bots, so their numbers might be sufficiently limited to prevent that scenario.

oppiz · on July 12, 2019

It's my understanding the big sites have some pretty sophisticated bot detection systems, so in theory a bot that would be successful at beating online poker couldn't be a huge winner, it'd presumably raise too many red flags. However, if it were a near break-even player, with dozens, if not hundreds, of instances running at any given time, it's going to slowly grind out a substantial figure. You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc. I'm not a coder, but it seems like it'd be a tremendous undertaking to code a bot that would be a substantial threat to players. Then again, maybe I'm naive about the level of scrutiny the poker sites are employing.

waste_monk · on July 12, 2019

One of the professors I used to work with some years ago was involved in stylometry research on human-computer interactions such as keystrokes and mouse input (for example, to determine if a user who had authenticated successfully earlier is not the same person currently typing based on keystroke cadence and pattern analysis - e.g. if someone sat down at an unlocked workstation and started typing, you could detect it and force them to reauthenticate).

It would probably be possible to figure out the types of detection being performed by the poker sites and use adversarial training methods to train a machine learning solution to mimic human input patterns. Or, more pragmatically, have the bot analyse the state of the game and give orders for a human to perform at their own natural pace.

awb · on July 12, 2019

Poker sites mainly detect bots based on their login times, number of tables, time per action, etc.

A successful bot shouldn't get caught for "playing like a bot", because the moment its actions are that predictable it would presumably no longer be effective.

But it will get caught for operating like a bot. So, don't run it 24hrs a day. Sites also randomize things to keep bots at bay, even card imagery.

If your performance and success drops whenever they randomize something that gives the bot false inputs, then you might get caught.

Inputting all of the poker events manually would be really tedious I'd imagine.

Of course, if you're winning millions, they can interview you about your poker history and how you got so good.

It sounds like easy money, but probably not.

Angostura · on July 12, 2019

Just play as you normally would, with the bot advising moves from the laptop next to you.

awb · on July 12, 2019

Right, but the bot needs to know who is in what position, what the bets are, who folded, etc. Try inputting all of that information manually to the laptop next to you and you'll quickly get frustrated. Online poker is a fast game with lots of data points.

NetBeck · on July 12, 2019

TensorFlow, PyTorch, Caffe, Keras, MXNet, and OpenCV could copy the game if you split the video input for the player and the bot.

awb · on July 13, 2019

Yes, but see my previous comment.

People have tried it and online poker sites know they've tried it, so they'll randomize images and other data. If you take a dive when the randomizations are triggered and outperform otherwise, good luck trying to collect your winnings.

alienallys · on July 12, 2019

An external camera with an image processor does that.

tempestn · on July 12, 2019

Not to mention, if you get caught, there could be worse consequences than just having your account locked. The site could (and likely would if the scale was significant) sue you for not only all your winnings, but damage to their business. They would likely win (since you're flagrantly breaking their terms of use contract), and bankrupt you.

Edit: In fact, if we're talking worst case, circumventing their anti-bot restrictions would presumably be illegal under the CFAA. So if you're in the US you could even be charged criminally, although I expect in reality that would be less likely.

tuesdayrain · on July 12, 2019

>You'd also have to take into account that the sites are monitoring things like reaction times to bets and raises, hand range consistency, etc.

You might be surprised by the lengths people go to in order to bypass bot-detection just for ordinary games. All of the things you mentioned are pretty standard. Considering there is serious money on the line here, I am positive that plenty of poker bots will be virtually indistinguishable from professional players, if they aren't already.

Phillipharryt · on July 12, 2019

The same argument of money being on the line applies to the detection. Poker software is already pretty damn impressive with its tracking. The online casinos actually stand to lose more money than the bot creators could make, so the detection has a greater incentive, and is likely to triumph.

pbhjpbhj · on July 12, 2019

They only lose if there are fewer plays, surely? I assume they take a cut of all winnings; they're not putting up stakes.

Phillipharryt · on July 12, 2019

Yes, I'm assuming that if bots work their way into everyday online poker, people will stop using it, so there would be fewer players.

oppiz · on July 12, 2019

I guess the real threat isn't a "bot" but something in the way of a program that interprets the data on the screen real-time and whose output instructs the player of the "optimal" play, given the circumstances. How the hell would you deter that as a site operator?

chongli · on July 12, 2019

No, I think your earlier example of a swarm of just-above-break-even bots would be much more difficult to combat. Even if they can be detected, the anti-detection countermeasures can evolve, turning it into an arms race. Anything you can model in your bot detection algorithm, the bot-maker can model too.

Reaction times ought to be one of the easiest things to fake. All it would take is a bunch of monitoring of large numbers of games to create a nice model of real player reaction times, which in all likelihood are normally distributed anyway.

disgruntledphd2 · on July 12, 2019

Not normally distributed, as negative reaction times are unlikely. You could use log-normal, but I believe that a mixture of exponential and gamma tends to be used by reaction time researchers (search ExpGamma).
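A quick sketch of the point about negative reaction times, using only the standard library. The parameters below are arbitrary (the normal's spread is deliberately wide to make the effect visible) and are not fitted to any real player data; the log-normal is used because it is the simpler of the two alternatives mentioned.

```python
import random


def sample_normal_times(rng, n, mean=0.5, sd=0.5):
    """Reaction times (seconds) drawn from a normal distribution.
    Nothing stops these samples from being negative."""
    return [rng.gauss(mean, sd) for _ in range(n)]


def sample_lognormal_times(rng, n, mu=-0.7, sigma=0.5):
    """Reaction times drawn from a log-normal distribution, i.e.
    exp(normal sample), so every sample is strictly positive."""
    return [rng.lognormvariate(mu, sigma) for _ in range(n)]


rng = random.Random(42)
normal_times = sample_normal_times(rng, 1000)
lognormal_times = sample_lognormal_times(rng, 1000)

print("negative normal samples:    ", sum(t < 0 for t in normal_times))
print("negative log-normal samples:", sum(t < 0 for t in lognormal_times))
```

A detector that sees impossible (negative or near-zero) delays would have an easy time, so anyone modelling timing would want a distribution with support on positive values only.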

chongli · on July 12, 2019

> negative reaction times are unlikely

Oh, right. I was thinking along the lines of 100m dash, where people often do have negative reaction times (which we penalize as false starts).

In poker we don't have much of an incentive to react instantly to any play.

BigJono · on July 12, 2019

Pretty sure I've read a long time back on 2p2 that large consistent winners on certain sites have been asked to submit camera footage of their play with a clear view of screens and inputs. So this is probably something that companies like Pokerstars have been dealing with for years already.

colordrops · on July 12, 2019

It would be pretty easy to hide something signalling you on what to play from cameras.

tempestn · on July 12, 2019

True, but ultimately if they're unsure they'll just ban you from the platform anyway. Consistent, winning players aren't really where they make their money, and they're free to ban anyone they like. (I realize technically they take a cut from all players, but more money gets sloshed around for them to skim off of if winning players aren't removing it from the system.)

wastedhours · on July 12, 2019

That was what I was thinking, the bot augmenting a human's playing ability rather than playing itself.

RomanBob · on July 12, 2019

> So even if they couldn't recognize the bots as such, they would see them as tough players and avoid them.

The problem would be that if I were a pro, I would rather run 1000 bots than play myself. Which means the only players left are AI and fish. Once the fishies learn of this fact, they will abandon in droves.

It's all gonna go back to live poker soon.

ggggtez · on July 11, 2019

No, having losing odds never stopped anyone from gambling.

olalonde · on July 12, 2019

That's simply not true... I don't play casino games because of the losing odds. I play poker because of the winning odds. I guess you meant "having losing odds doesn't stop everyone from gambling".

traderjane · on July 11, 2019

Even with a magical human test, you couldn't know whether it was human + robot performance.

salty_biscuits · on July 11, 2019

Just bet on bots playing each other.

drjesusphd · on July 12, 2019

So... like Wall Street!

MRD85 · on July 12, 2019

I'm sitting here considering the possibility of making my own bot to play low stakes online poker ($1.50 sit n go). Run it on 6 tables at once and I imagine it would be facing really poor opponents and would have a steady flow of cash.

moate · on July 12, 2019

until your bot gets caught (possibly quickly) and then you're banned from the sites.

kyleblarson · on July 12, 2019

Even if it is, it means a new live poker boom which is a very good thing

joaomacp · on July 11, 2019

It must be. It is way too hard to prevent humans from using an AI. Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.

icelancer · on July 12, 2019

>> Some chess services try to check if you're playing "too perfect", but in poker that's harder to do, and there's way more money on the line.

Not really. With perfect information you know the correct strict equity plays assuming normal opponents. This doesn't give you the ultimate answer, because a player's reads and inference about another player is definitely an input - especially at the highest level - but it is more than enough to give you a winning/losing player at the small/midstakes.

source: worked for an online poker company that had these tools... and far more available to us

badfrog · on July 12, 2019

> a player's reads and inference about another player is definitely an input - especially at the highest level

I think player-dependent strategy is more important at lower levels because the players are much further away from what you call "normal opponents", so there's far more opportunity to exploit their mistakes.

chance_state · on July 11, 2019

>Some chess services try to check if you're playing "too perfect" [...]

That's interesting, could you share an example? Most of my search results are anecdotal Reddit threads about how many people cheat in online chess.

MikeHolman · on July 12, 2019

All the major online chess websites have anti-cheat mechanisms. They don't publish details of how they detect cheaters though, and I don't know how good they are.

From what I've read, they work by comparing the player's moves against chess engines, and if the player is picking the engine's choice too often in positions where there are multiple roughly equal moves, they get flagged.
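For illustration, the heuristic described above could be sketched roughly as follows. This is not how any real site works (they don't publish details): the `Position` type, the evaluation numbers, and the `margin`, `threshold` and `min_positions` values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Position:
    played: str          # move the player actually chose
    engine_evals: dict   # move -> engine evaluation in pawns (hypothetical data)


def is_ambiguous(pos, margin=0.2, min_candidates=3):
    """True when several moves are within `margin` pawns of the best one."""
    best = max(pos.engine_evals.values())
    near_best = [m for m, v in pos.engine_evals.items() if best - v <= margin]
    return len(near_best) >= min_candidates


def top_choice_rate(positions):
    """Share of ambiguous positions where the player picked the engine's first choice."""
    ambiguous = [p for p in positions if is_ambiguous(p)]
    if not ambiguous:
        return 0.0
    hits = sum(
        1 for p in ambiguous
        if p.played == max(p.engine_evals, key=p.engine_evals.get)
    )
    return hits / len(ambiguous)


def flag(positions, threshold=0.9, min_positions=20):
    """Flag only when there is enough evidence and the match rate is suspiciously high."""
    ambiguous = [p for p in positions if is_ambiguous(p)]
    return len(ambiguous) >= min_positions and top_choice_rate(positions) > threshold
```

Forced moves are ignored on purpose: everyone finds the only good move, so only positions with several near-equal options say anything about engine use.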

meruru · on July 12, 2019

I always found it weird that someone would want to cheat in a game like online chess. I mean, what's the point? Does anyone have insight on what's going on in the head of cheaters?

Jach · on July 12, 2019

A few reasons come to mind. One is simply that if you have any metrics (ranking, win/loss ratio, greater site access..) it's going to feel nice to see them improve. Another is that losing at anything can be ego-hurting (similar reason good players sometimes sandbag with new accounts / lower ranks they can't possibly lose to, they need to 'win' more). Or reverse sandbagging/trolling with a bot might be amusing. Another is the cheater may justify it as a self-teaching game, and might not always play the strongest move but see if their move is even in consideration or try to improve ability to see the better moves by having them always pointed out -- but why not just play the bot, or save that for post-game analysis? I like to run my go games through gnugo's annotated analysis at the end (as I'm very weak I assume even the weak gnugo can teach me things), it'd be too troublesome to use it in a live game.

sakarisson · on July 12, 2019

Other players justify cheating by convincing themselves that everyone else is cheating.

baq · on July 12, 2019

it's where the enjoyment comes from. cheaters don't enjoy the game as much as they enjoy seeing their ELO/MMR go up, or in the worst case they're psychopaths who just want to mess with other people's heads.

root_axis · on July 12, 2019

People enjoy the feeling of having power over others.

soup10 · on July 12, 2019

Even so, most people don't bother with standard games online since it's way too easy to cheat by mirroring the game, and it's basically undetectable if they are good enough to not play lines that look like "computerish" moves.

icelancer · on July 12, 2019

>> Computers are better at maths than humans.

OP discussed it but while this is true, it is not necessarily true or straightforward when it comes to games with hidden information like poker. This is more of a game theoretical problem (Economics) than it is a purely mathematical one, which had less support in the AI/ML community, hence the delay.

The lower CPU/GPU/resource use supports that fact as does your intuition. Breaking poker required a lot of manual work and model design over brute force algorithms and reinforcement learning.

b_tterc_p · on July 11, 2019

The bot does not seem to consider previous hands in its decisions. That is to say, it does not consider who it is playing against. Should this affect how we perceive the bot as “strategic” or not? Bots that play purely mathematically optimally on expected value aren’t effective or interesting. But it feels like this is playing on just a much higher order expected value.

It feels like a more down to earth version of the sci fi super human running impossible differential equations to predict exactly what you will do given knowledge that he knows what you know what he knows... etc. ad infinitum. But since it doesn’t actually consider the person it’s predicting, it may simply be a really really good approximation of the game theoretic dominant strategy.

At what complexity of game and hidden information should we feel like the bot can’t win by running a lookup table?

noambrown · on July 11, 2019

The bot bluffs, and understands that when its opponent bets it might be a bluff. I would consider that to be strategic behavior. The fact that its strategy is determined by a mathematical process doesn't change that in my opinion.

b_tterc_p · on July 12, 2019

It does bluff, but that’s not my point. My issue is that it bluffs without consideration of its opponent. High level strategic play of most games is about adapting to your opponent's play. This bot does not do that. It is secretly a giant lookup table of game state to response.

In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

I’m surprised that you managed to beat pros without adaptability. It’s pretty impressive and says a lot about how we define strategy. If human adaptability is just not as good as machine optimality across all games, we could imagine discovering that an adaptable poker AI can’t outperform this one. It raises a whole lot of interesting questions because lots of criticism towards something like Starcraft AI is that it is strategically stupid and doesn’t adapt. Now the Starcraft AI is admittedly kind of stupid now, but we may hit a wall on its creativity simply because creativity is, despite human intuition, a dumb idea.

Cybiote · on July 12, 2019

If you think about it, any AI that's stopped learning and is now efficiently doing pattern matching or pattern completion (assuming memory and attractor states), instead of running a complex search, is arguably a fancy lookup table hashed by similarity. This includes humans. In other words, lookup table isn't the slight most think it is. But the bot does do real time search so it's not "merely doing" a look-up.

Because of how Poker is not sub-game solveable (it is not possible to self-locate within the tree), this bot's play has to get into its opponent's mindspace in a sense. To not be exploitable, it essentially has to infer the other player(s) hidden state and paths from observed actions. This isn't something I've seen in Dota, Starcraft, Chess, Go bots.

It's true that it doesn't learn online to find exploitable patterns of other players, but doing this without also making yourself exploitable in turn is a very difficult other problem. Low-exploitability, near-optimal play according to game-theoretic notions is considered strategy.

While you're correct that online learning is powerful and something machines are not currently good at (in complex spaces), you can avoid being exploited without learning if your experience is rich enough and you know how to infer what your opponent is trying to do and anticipate them. I'd argue this lineage of poker bots are the closest to playing that way of the major game playing bots.

b_tterc_p · on July 12, 2019

I don’t mean look up table as a bad thing. I mean it’s a lookup table on game state, without incorporating any information about the players. But good points

tialaramex · on July 12, 2019

> High level strategic play of most games is about adapting to your opponent's play.

Is this true in any meaningful sense?

For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state. This is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project), but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

I'm very impressed by this achievement because I had expected good multi-player poker AI (as opposed to simple colluding bots found online making money today) to be some years away. But I would not expect "adaptability" to ever be a sensible way forward for winning a single strategy game.

Cybiote · on July 12, 2019

Adaptability is certainly not necessary (almost by definition) if you're playing a near to equilibrium strategy but adaptability is a useful skill to have in a general non-stationary world.

That said, for this bot, I wouldn't say it's playing completely independent of the other players' interior state. Pluribus must infer its opponents' strategy profile and, according to the paper, maintains a distribution over possible hole cards and updates its belief according to observed actions. This is part of playing in a minimally exploitable way in such a large space for an imperfect information game.
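The belief update mentioned above is, at its core, Bayes' rule applied to a range of hands. Here is a minimal sketch under loud assumptions: the three hand buckets, the prior, and the `raise_prob` numbers are invented for illustration and are not taken from the paper.

```python
def update_range(prior, likelihood):
    """Bayes' rule: posterior(hand) is proportional to prior(hand) * P(action | hand)."""
    unnormalized = {hand: prior[hand] * likelihood[hand] for hand in prior}
    total = sum(unnormalized.values())
    return {hand: weight / total for hand, weight in unnormalized.items()}


# Hypothetical belief about an opponent's hole cards before they act.
prior = {"strong": 0.2, "medium": 0.5, "weak": 0.3}

# Hypothetical P(raise | hand) under whatever strategy the opponent is assumed
# to follow. Weak hands raise fairly often here because the assumed strategy bluffs.
raise_prob = {"strong": 0.9, "medium": 0.2, "weak": 0.4}

# Belief after observing a raise.
posterior = update_range(prior, raise_prob)
```

With these numbers a raise moves "strong" from 20% to 45% of the range, while "weak" stays at 30% because the assumed strategy bluffs with it.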

b_tterc_p · on July 12, 2019

> Pluribus must infer its opponents' strategy profile

This is what interests me. It doesn’t do this. In fact, because it played against itself only, it should be assumed that the only strategy profile it considers is its own.

Cybiote · on July 12, 2019

You're right that it uses itself as a prototype for decisions, but the fact that it also maintains a probability distribution over possible hole cards and that it updates according to observed actions is already richer than the local-decision-only approach taken by most other bots. This is sort of forced by the simplicity of poker's action space combined with the large search space and imperfect information. Here, the simplicity ends up making things more difficult! They also use multiple play styles as "continuation strategies" so it's a bit more robust. And to be fair, I suspect much of human play does use themselves and experience as a substitute too.

lmm · on July 12, 2019

> For heavily studied games there's usually a theoretically optimal play independent of the opponent's interior state. This is obviously true for all the "Solved" games, which includes the simpler Heads Up Limit Hold 'Em poker (solved by Alberta's Cepheus project), but it seems pretty clearly true for as-yet unsolved games like Go and Chess too.

In an n-player game, a table can be in a (perhaps unstable) equilibrium which the "optimal" strategy will lose at. This has been demonstrated for something as simple as iterated prisoners' dilemma (tit-for-tat is "best" for most populations, but there are populations that a tit-for-tat player will lose to). I don't play poker but I've definitely experienced that in (riichi) mahjong - if you play for high-value hands the way you would in a pro league, on a table where the other three players are going for the fastest hands possible, you will likely lose.
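The prisoners' dilemma point can be checked with a few lines of simulation. This is a minimal sketch assuming the textbook payoff values (3/3 for mutual cooperation, 1/1 for mutual defection, 5/0 for defecting against a cooperator); the strategy functions and the 10-round match length are illustrative choices.

```python
# (my move, their move) -> (my payoff, their payoff), standard textbook values
PAYOFF = {
    ("C", "C"): (3, 3),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}


def tit_for_tat(opponent_history):
    """Cooperate first, then copy the opponent's previous move."""
    return "C" if not opponent_history else opponent_history[-1]


def always_defect(opponent_history):
    return "D"


def play(strategy_a, strategy_b, rounds=10):
    """Play an iterated match and return the total score of each side."""
    history_a, history_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        move_a = strategy_a(history_b)
        move_b = strategy_b(history_a)
        payoff_a, payoff_b = PAYOFF[(move_a, move_b)]
        score_a += payoff_a
        score_b += payoff_b
        history_a.append(move_a)
        history_b.append(move_b)
    return score_a, score_b
```

Over 10 rounds, tit-for-tat scores 9 against a defector's 14, while two defectors score 10 each. So in a population made up entirely of defectors, the lone tit-for-tat player finishes last even though two tit-for-tat players facing each other would score 30 each.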

Phillipharryt · on July 12, 2019

Well in online poker high level players make great use of player tagging, taking notes about players they have played before and what they've done in important hands or their patterns. Software exists to track how opponents behave in any given situation, and if it pops up again you use that.

I would think if professional players are utilising this information, a bot could benefit from it. I don't see how they would ever lose out from this information, even if it only uses situations where the opponent has a history of 100% of the time responding a certain way.

I am impressed by the bot but I have to laugh a bit because years ago I joked with a friend about making an "amnesiac bot" that had no recollection of previous hands, it seemed so useless we obviously didn't make it, we've evidently been proven wrong. (pointless tangent there)

tialaramex · on July 12, 2019

Player tagging just makes you exploitable. I play one way now, you tag me "Haha, fool bet-folds way too much" and then I change it up to exploit you, "Huh, I keep trying to fold him out with worse and he doesn't bite even though my notes say he will".

The theoretically optimal play just skips that meta and meta-meta play and performs optimally anyway. Because poker involves chance the optimal play will be stochastic and so you can stare at the noise and think you see a pattern, that just means you'll play worse against it, because you're trying to beat a ghost.

For example, suppose in a certain situation optimally I should raise $50 10% of the time. It so happens, by chance, that I do so twice in a row, and you, the note-taker, record that I "always" raise $50 here. Bzzt, 90% of the time your note will be wrong next time.

madog · on July 12, 2019

You would be a fool to act based off only 2 instances of seeing a particular behaviour. That's why you have to weigh up how many instances you've seen. Sometimes if it's less than X instances it's not worth considering that particular statistic as valid.

Now say I have thousands of hands viewed against you, and you raise pre-flop 50% of the time. That is pretty significant information about the types of hands you play. If I have only 10 hands I've observed, that same stat means nothing.
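To put numbers on that, here is a minimal sketch using the Wilson score interval for an observed frequency. The choice of interval and the 95% level (`z=1.96`) are assumptions for illustration; tracking software may do something different.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Approximate 95% confidence interval for a true frequency, given n observations."""
    if n == 0:
        return (0.0, 1.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - half_width, centre + half_width)


# Same observed 50% pre-flop raise rate, very different certainty.
few_hands = wilson_interval(5, 10)         # roughly 24% to 76%
many_hands = wilson_interval(2500, 5000)   # roughly 48.6% to 51.4%
```

After 10 hands the true raise frequency could plausibly be anywhere from a quarter to three quarters; after 5,000 hands it is pinned down to within about a point and a half.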

The theoretical optimal play depends on who you're playing, as more value could be extracted in certain situations vs certain players.

For example, if I've seen you face a pre-flop 3-bet 1000 times and you've folded 99% of the time, that would be a good opportunity to recognise that 3-bet bluffing this player more often would have value, and be a more optimal play than some default. Contrast that with playing someone who called pre-flop 3-bets 75% of the time: it wouldn't be optimal to 3-bet bluff here. Different opponents, different optimal plays.
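A rough worked version of that comparison, as a sketch only. It assumes the simplest possible model: the bluff wins the pot when the opponent folds and loses the whole 3-bet whenever it is called (no equity, no post-flop play). The pot and bet sizes are hypothetical.

```python
def bluff_ev(fold_freq, pot, risk):
    """EV of a pure bluff: win `pot` when they fold, lose `risk` when they don't."""
    return fold_freq * pot - (1 - fold_freq) * risk


def break_even_fold_freq(pot, risk):
    """Fold frequency at which the bluff has zero EV."""
    return risk / (risk + pot)


POT = 4.5    # big blinds already in the middle (hypothetical: blinds plus an open raise)
RISK = 9.0   # size of our 3-bet in big blinds (hypothetical)

ev_vs_folder = bluff_ev(0.99, POT, RISK)   # opponent folds 99% of the time
ev_vs_caller = bluff_ev(0.25, POT, RISK)   # opponent continues 75% of the time
needed = break_even_fold_freq(POT, RISK)
```

Under these assumptions the bluff needs folds about two thirds of the time to break even. It makes roughly +4.4 big blinds per attempt against the 99% folder and loses roughly 5.6 against the player who continues 75% of the time.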

agent008t · on July 12, 2019

I think we need to make a distinction between two kinds/styles of play:

1. Coming up with an unexploitable strategy, then scaling it up by playing as many hands as you can, earning the slim expected value each time.

2. Picking a good table / card room / 'scene', and then trying to extract as much value from it as possible.

You most often see 1 online, and 2 live, for obvious reasons.

A skilled human would be a lot more successful, I believe, than a bot in case 2. For 2, important skills are:

1. Be entertaining. You have to play in a way that is entertaining to those playing with you, such that they want to continue playing with you (and losing money to you). Good opponents (i.e. that are bad at poker but want to play high stakes) are hard to find, it is vital that you retain them.

2. Cultivate a table image, then exploit it. Especially important for tournament play, where you have the concept of "key hands" that you really need to win to potentially win the tournament. With the right table image, you may be able to win hands you otherwise wouldn't have won.

3. Exploit the specifics of the players you are playing against. Yes, that also makes you exploitable, but the idea is to stay one step ahead of your opponents.

Cybiote · on July 12, 2019

Note that 1) is only true if your opponent is also not making many mistakes. Which fails to be true for most humans, where the combination of randomization and calculating state appropriate ranges is very difficult. This means that weak players can still lose heavily from mistakes/poor play within a reasonable number of hands, it need not be slim.

Furthermore, you can kind of account for such players by including more random or aggressive profiles in the inference/search stage.

Phillipharryt · on July 13, 2019

Player tagging is more complicated than a single game, and goes far deeper than playing a few hands one way and then switching it up. You can have player stats based on thousands of hands, you can know things about your opponent even they don't know.

I don't think you play very much, which is fine, but makes this discussion a bit pointless.

barry-cotter · on July 12, 2019

> In the case of poker, it appears that adaptability is not as good as pure mathematical optimization. Humans can adapt their strategy, but it’s basically just worse regardless because this thing has cracked the code.

Adaptability is beaten by perfect strategic play in games with clear victory conditions.

My familiarity with optimal control theory is nil but Kydland (1977) applied it to monetary policy to show that the right rules dominate discretion. What the right rules are for monetary policy is still an open question though, because while the victory conditions in economic policy are clearly defined the surrounding environment is very far from static so you deal with out of training set data regularly. Once AI can deal with these kind of out of context problems it seems plausible GAI is a matter of time.

http://www.finnkydland.com/papers/Rules%20Rather%20than%20Di...

> Rules Rather than Discretion: The Inconsistency of Optimal Plans

> Even if there is an agreed-upon, fixed social objective function and policymakers know the timing and magnitude of the effects of their actions, discretionary policy, namely, the selection of that decision which is best, given the current situation and a correct evaluation of the end- of-period position, does not result in the social objective function being maximized. The reason for this apparent paradox is that economic planning is not a game against nature but, rather, a game against rational economic agents. We conclude that there is no way control theory can be made applicable to economic planning when expectations are rational.

slg · on July 12, 2019

"Strategic" is probably the wrong word, but I think there is a valid question here regarding the approach the AI is taking. One of the key things for a good poker player is having the ability to adapt and adjust their strategy depending on how others at the table are playing. Sometimes you can have the exact same cards in the exact same position and in one game it is smart to fold and in another game it is smart to raise. From the description in the article, it doesn't appear that this AI takes those ebbs and flows into consideration. Instead it seems to play "purely mathematically optimally on expected value" that was honed through trillions of simulations.

There is a cliche about how poker is about playing your opponents and not the cards. Is this AI only focusing on its cards and ignoring its opponents?

noambrown · on July 12, 2019

The AI doesn't adapt to the opponents, and that's still an interesting challenge for AI research. That said, at the end of the day, it was making quite a bit of money playing against elite human pros. I think that suggests the cliche is, at least in part, wrong.

slg · on July 12, 2019

Making "quite a bit of money" still leaves open the possibility that the AI is leaving a lot of money on the table by not taking opponents into consideration.

Also I would be curious to see how it performs against people that aren't "elite human pros". Would this AI win at a higher rate in a game against average recreational players compared to the rate a pro would win?

Lastly it is also possible that the pros simply didn't have enough time to adapt to the AI which would be extra important considering the AI plays unlike humans and therefore is harder to predict.

noambrown · on July 12, 2019

I think the bot would make a lot of money playing against average recreational players, but it's absolutely true that if you can exploit bad players' weaknesses, then you can make more money than what the bot would earn.

We played 10,000 hands over 12 days in the 5 humans + 1 AI experiment. That's quite a long time, and there's no indication that they even began to uncover any weaknesses in that time period. So I'm fairly confident the AI is robust to exploitation, and I think that's a very important quality to have in any AI system.

slg · on July 12, 2019

That 10,000 total hands number isn't particularly meaningful on the point of adaptability because the humans aren't sharing information with each other. The important number is how many hands each individual human played against the AI. Another question would be whether the pros knew which player was the AI. Because if they didn't, you are basically throwing a modified Turing Test against the pros before they can even begin to try to find tendencies in the AI. Predicting opponents is a huge part of how people play poker. If the AI plays unlike any human, pros are at a huge disadvantage against an AI compared to how they would fare against a similarly skilled but more traditional human player.

None of this is meant to diminish what you all accomplished, I'm just highlighting areas of poker in which this AI would be less successful than humans even if it is more successful overall.

noambrown · on July 12, 2019

The humans knew the whole time which player was the bot.

hajile · on July 12, 2019

There was an interesting IRL poker game a few years ago. The player who was running behind started going all in on every hand without even looking at their hand (with a huge amount of success).

Out of curiosity, how does a bot deal with oddities like this?

bostik · on July 12, 2019

This is a solved problem. Open-shoving is a feature of sit-n-gos, so of course people have simulated these and compiled so called "pushbot tables". The parameters are basically pot size and winning probabilities against a random hand.

While this particular bot may not have those programmed in, a more powerful variant eventually will.
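For the specific case of facing someone who shoves blind, the two parameters named above are enough for a pot-odds calculation. This is a minimal sketch of the calling side, not a pushbot table itself; the chip amounts and equities are illustrative and nothing here is computed from real hand matchups.

```python
def call_ev(equity, pot, to_call):
    """Chip EV of calling an all-in: win `pot` with probability `equity`, else lose `to_call`."""
    return equity * pot - (1 - equity) * to_call


def break_even_equity(pot, to_call):
    """Minimum win probability at which calling is not a losing play."""
    return to_call / (pot + to_call)


# Hypothetical spot: 150 in blinds plus a 1000 shove is in the middle, 1000 to call.
POT = 1150
TO_CALL = 1000

needed = break_even_equity(POT, TO_CALL)   # about 46.5%
ev_good_hand = call_ev(0.60, POT, TO_CALL)  # 60% equity vs a random hand (illustrative)
ev_bad_hand = call_ev(0.40, POT, TO_CALL)   # 40% equity vs a random hand (illustrative)
```

Any hand with more than about 46.5% equity against a random hand is a profitable call in this spot: the 60% hand gains 290 chips on average and the 40% hand loses 140. In a tournament, chip EV is not the whole story, since chips are not worth face value there.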

dodobirdlord · on July 12, 2019

The mathematical theory explored in the paper is that if multiplayer poker isn't one of the multiplayer finite state games that pathologically fails to converge to a Nash equilibrium, then it has one, and this strategy should approximate it. Intuitions about adaptability and the advantages thereof aren't applicable in the scenario where the opponent is playing to a Nash equilibrium. You can perform equally well by participating in the other side of the Nash equilibrium, but anything else is a losing strategy. The fact that this approximation converges to a strategy that's actually really good suggests that there is a Nash equilibrium, and that the converged-upon strategy is converging on it.

You can't out-think or adapt to a rock-paper-scissors opponent who selects at random. All you can do is also select at random and accept that the two of you have even odds.
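The rock-paper-scissors claim is easy to verify exactly. A minimal sketch using exact fractions; the helper names are made up for this example.

```python
from fractions import Fraction

MOVES = ("rock", "paper", "scissors")
BEATS = {"rock": "scissors", "paper": "rock", "scissors": "paper"}


def payoff(mine, theirs):
    """+1 for a win, -1 for a loss, 0 for a tie."""
    if mine == theirs:
        return 0
    return 1 if BEATS[mine] == theirs else -1


def expected_payoff(my_strategy, their_strategy):
    """Expected payoff when both players draw moves from the given distributions."""
    return sum(
        my_strategy[a] * their_strategy[b] * payoff(a, b)
        for a in MOVES
        for b in MOVES
    )


uniform = {m: Fraction(1, 3) for m in MOVES}
always_rock = {"rock": Fraction(1), "paper": Fraction(0), "scissors": Fraction(0)}
always_paper = {"rock": Fraction(0), "paper": Fraction(1), "scissors": Fraction(0)}
```

`expected_payoff(always_rock, uniform)` is exactly 0, and so is any other strategy against `uniform`. The flip side is that the random player also earns nothing extra from a predictable opponent: `expected_payoff(uniform, always_rock)` is 0, whereas `expected_payoff(always_paper, always_rock)` is 1.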

icelancer · on July 12, 2019

>> Bots that play purely mathematically optimally on expected value aren’t effective or interesting.

Interesting is up to you, but effective is definitely wrong.

ICM-perfect bots, which do not take opponent behavior into account and merely model the gamestate, crush small tournaments. The faster the blinds and the smaller the stacks, the better, but even normal structures get killed by these so-called "expected value" only bots.

Game Theory Optimal (GTO) attacks are incredibly effective at all levels of the game. The AI need not incorporate opponent feedback to be a winner. It can make it better, but it is not at all required.

bluetwo · on July 11, 2019

First of all, I laughed at the 20-second average per game in self-play, since I ran into the same thing and have been trying to speed up the algorithm but haven't been able to get it faster (without throwing more hardware at it).

Second, I haven't read everything, but I believe you are playing a cash-game and not tournament-style. Is that correct? If that is the case, any chance you will be doing a tourney-style version?

[For those who don't play, in cash, a dollar is a dollar. In Tourney play, the top 2 or 3 players get paid out, so all dollars are not equal, as your strategy changes when you have only a few chips left (avoid risky bets that would knock you out) or when you are chip leader (take risky bets when they are cheap to push around your opponents).]
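The "all dollars are not equal" point is usually quantified with the Independent Chip Model (ICM) that comes up elsewhere in this thread. Here is a minimal sketch of the common Malmuth-Harville formulation, where the chance of finishing next is proportional to stack size among the players not yet placed. The stacks and the 70/30 payout are hypothetical.

```python
def icm(stacks, payouts):
    """Prize-money equity per player; assumes every stack is greater than zero."""
    equity = [0.0] * len(stacks)

    def recurse(remaining, prob, place):
        # Stop once every paid place has been assigned.
        if place >= len(payouts) or not remaining:
            return
        total = sum(stacks[i] for i in remaining)
        for i in remaining:
            # Probability that player i takes this place, given the finishers so far.
            p = prob * stacks[i] / total
            equity[i] += p * payouts[place]
            recurse([j for j in remaining if j != i], p, place + 1)

    recurse(list(range(len(stacks))), 1.0, 0)
    return equity


# Three players left, 100 units of prize money split 70/30 between first and second.
equities = icm([5000, 3000, 2000], [70, 30])
```

The chip leader holds 50% of the chips but only about 45.2 of the 100 prize units, while the short stack holds 20% of the chips and about 22.6 units. That asymmetry is why doubling up is worth less than busting costs.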

Also, curious how much poker you folks play in the lab for "research".

noambrown · on July 11, 2019

We're doing cash games in this experiment. At the end of the day, this is about advancing AI, not about making a poker bot. Going from two-player to multi-player has important implications for AI beyond just poker. I don't think the same is true for cash game vs tournament.

There's a cash game almost every night at the FBNY office! I don't usually play though -- I'm not nearly as good as the bot.

wallawe · on July 11, 2019

> In Tourney play, the top 2 or 3 players get paid out

Or top 2 or 3 thousand... depends on the tournament but it's usually the top 15% ish.

bluetwo · on July 12, 2019

True, I am thinking "sit and go" tournament where you would have 6 players like in this research.

icelancer · on July 12, 2019

Is there much to do here? ICM bots have this space covered pretty effectively.

Phillipharryt · on July 12, 2019

But ICM is only a model that helps you evaluate information in the tournament, players will use it often to cap their bets or as a tipping point on a call, but I've never seen it used as a complete basis of play.

snarf21 · on July 11, 2019

How do you think these same pros would do in a follow-up match? As described in the article, the bot put players off their game with much more varied betting and with donks. Do you think the margin would decrease as players are exposed to these strategies?

Players face mental fatigue and have so over-learned their existing strategies that it takes time to adapt new strategies and even more time for those new strategies to become second-nature.

It reminds me of sports in a way. Teams start running a new wrinkle of offense in the NFL like the wildcat and it takes a few seasons for teams to instinctively know how to play defense correctly against that option.

noambrown · on July 11, 2019

In the paper we include a graph of performance over the course of the 10,000-hand 5 humans + 1 AI experiment that was played over 12 days. There's no indication that the bot's performance decreased over time (there is a temporary downward blip in the middle, but that's likely just variance). Based on discussions with pros, it sounds like they didn't find any weaknesses and they didn't seem to think they'd find any given more time.

TheChosenZygote · on July 11, 2019

I think it would be hard for the pros to find exploits against the bot, but they could definitely lose less. When using solvers, pros generally only input a couple of sizings for bets, and avoid 2x+ pot sizings, which from the video it seemed like the bot used at much higher frequencies than other pros.

asdfman123 · on July 11, 2019

I'm not great at poker, but I did play a decent amount and I know a lot of my strategy involves probing for other people's weaknesses and shifting my strategy mid game to throw people off.

I feel like a lot of trained ML models have a lot of laughable weaknesses, but perhaps they've been trained on so many games that they're well prepared for any tomfoolery.

TheChosenZygote · on July 11, 2019

The bot is trained to play Game Theory Optimal, aka it's playing to be breakeven at worst, which is why I believe it would be hard for a human to beat it. It's not playing perfectly, but the edges it's giving up are so marginal relative to perfect play that a human is going to lose simply by making a mistake at some point, even if a human were to use a solver to completely optimize their strategy.

MFLoon · on July 11, 2019

I also suspect it would not be able to maintain a ~40bb/100 hand win rate. The thing about human players is, while the best are capable of learning and employing truly balanced GTO strategies, in practice they rarely adhere to these because other humans (even good pros) will still have exploitable flaws in their strategies, and attempting to exploit these will be more profitable than sticking to the unexploitable strategy; of course it also opens the exploiter to counter-exploitation, creating a fluctuating cycle of players trying to exploit, getting exploited, then moving back towards playing unexploitably. That's the normal state of a pro's strategy in a given game - so to switch to a steady state of always playing unexploitably would be a fairly big adjustment even to top tier pros who are capable of it.

snarf21 · on July 11, 2019

Yeah, that is kinda what I was trying to tease out. These 10K hands are nothing compared to the millions of hands these pros have already played. It would be interesting to see how well they did after 1M hands. I'm sure the bot would likely still have an edge but I'd assume the players would adjust their strategy and be less confused by the random sized bets.

I was also confused by the sample videos where everyone had $10K at the start of each of the demo hands. It was unclear to me if that was just the simulation of the hands or actual game play. If everyone starts every hand with $10K, then the feat seems less strong as going all-in has less risk.

splonk · on July 11, 2019

Stacks are reset to 10k at the beginning of each hand, so they can use every hand to train a single model with the same starting state.

MFLoon · on July 11, 2019

The fixed stack size doesn't really discount anything to me - it makes sense as an experimental control; and it's a cash game so there's no additional risk to going all in regardless of stack size.

But yea the sample size is definitely too small imo; when they tested the heads up version of the bot some years ago they had it play a bigger sample (50 or 100k iirc?).

bostik · on July 12, 2019

In online poker (at least with 100BB stacks) it's customary to top up between hands if you're below full stack.

The reason is simple: with table stakes, your maximum win for a hand is constrained by your own stack size.

asdfman123 · on July 11, 2019

I remember reading in the mid-to-late aughts that a lot of old-school poker players that used more swagger and intuition were starting to be run out of the game by kids who applied statistical methods.

tc · on July 11, 2019

Could you perhaps speak to some of the engineering details that the paper glosses over. E.g.:

- Are the action and information abstraction procedures hand-engineered or learned in some manner?

- How does it decide how many bets to consider in a particular situation?

- Is there anything interesting going on with how the strategy is compressed in memory?

- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?

- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?

- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?

- In general, how much do these kind of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?

- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?

noambrown · on July 11, 2019

We tried to make the paper as accessible as possible. A lot of these questions are covered in the supplementary material (along with pseudocode).

> - Are the action and information abstraction procedures hand-engineered or learned in some manner?

> - How does it decide how many bets to consider in a particular situation?

The information abstraction is determined by k-means clustering on certain features. There wasn't much thought put into the action abstraction because it turns out the exact sizes you use don't matter that much as long as the bot has enough options to choose from. We basically just did 0.25x pot, 0.5x pot, 1x pot, etc. The number of sizes varied depending on the situation.
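To make the two kinds of abstraction concrete, here is a toy sketch. The answer above only says "k-means clustering on certain features", so the single hand-strength feature, the sample values, and the deterministic initialisation below are all assumptions; the pot fractions are the ones listed in the answer.

```python
def kmeans_1d(values, k, iterations=20):
    """Tiny one-dimensional k-means; returns the k centroids."""
    ordered = sorted(values)
    # Deterministic start: pick centroids spread evenly through the sorted data.
    centroids = [ordered[(2 * i + 1) * len(ordered) // (2 * k)] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda c: abs(v - centroids[c]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [
            sum(c) / len(c) if c else centroids[i] for i, c in enumerate(clusters)
        ]
    return centroids


def bucket_of(value, centroids):
    """Information abstraction: hands in the same bucket are treated identically."""
    return min(range(len(centroids)), key=lambda c: abs(value - centroids[c]))


def bet_sizes(pot, fractions=(0.25, 0.5, 1.0)):
    """Action abstraction: only a few pot-relative bet sizes are considered."""
    return [pot * f for f in fractions]


# Hypothetical hand-strength values for nine hands.
strengths = [0.10, 0.12, 0.15, 0.48, 0.50, 0.52, 0.85, 0.88, 0.90]
centroids = kmeans_1d(strengths, k=3)
```

Hands with strength 0.11 and 0.14 land in the same bucket and so share a strategy, and with a pot of 200 the bot would only weigh bets of 50, 100 and 200.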

- Is there anything interesting going on with how the strategy is compressed in memory?

Nope.

- How do you decide in the first betting round if a bet is far enough off-tree that online search is needed?

We set a threshold at $100.

- When searching beyond leaf nodes, how did you choose how far to bias the strategies toward calling, raising, and folding?

In each case, we multiplied the biased action's probability by a factor of 5 and renormalized. In theory it doesn't really matter what the factor is.
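The multiply-and-renormalize step can be sketched in a few lines. This is only an illustration: the function name `bias_strategy`, the dictionary representation, and the example probabilities are assumptions, and only the factor of 5 comes from the answer above.

```python
def bias_strategy(strategy, biased_action, factor=5.0):
    """Scale one action's probability by `factor`, then renormalize to sum to 1.

    `strategy` maps action names to probabilities (hypothetical representation).
    """
    scaled = {
        action: prob * (factor if action == biased_action else 1.0)
        for action, prob in strategy.items()
    }
    total = sum(scaled.values())
    return {action: prob / total for action, prob in scaled.items()}


base = {"fold": 0.2, "call": 0.5, "raise": 0.3}
fold_biased = bias_strategy(base, "fold")
print({a: round(p, 3) for a, p in fold_biased.items()})
# {'fold': 0.556, 'call': 0.278, 'raise': 0.167}
```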

- After it calculates how it would act with every possible hand, how does it use that to balance its strategy while taking into account the hand it is actually holding?

This comes out naturally from our use of Linear Counterfactual Regret Minimization in the search space. It's covered in more detail in the supplementary material.

- In general, how much do these kinds of engineering details and hyperparameters matter to your results and to the efficiency of training? How much time did you spend on this? Roughly how many lines of code are important for making this work?

I think it's all pretty robust to the choice of parameters, but we didn't do extensive testing to see. While these bots are quite easy to train, the variance is so high in poker that getting meaningful experimental results is relatively computationally expensive.

- Why does this training method work so well on CPUs vs GPUs? Do you think there are any lessons here that might improve training efficiency for 2-player perfect-information systems such as AlphaZero?

I think the key is that the search algorithm is picking up so much of the slack that we don't really need to train an amazing precomputed strategy. If we weren't using search, it would probably be infeasible to generate a strong 6-player poker AI. Search was also critical for previous AI benchmark victories like chess and Go.

andr3w321 · on July 11, 2019

Any chance of the code being released or a Cepheus-style answer key being provided? http://poker.srv.ualberta.ca/strategy

noambrown · on July 11, 2019

I don't think the poker world would be happy with us if we did that. Heads-up limit hold'em isn't really played professionally anymore, but six-player no-limit hold'em is very popular.

andr3w321 · on July 11, 2019

It depends who you ask. I think it's inevitable that it's released one day. By not releasing you're just delaying it.

All the top high stakes players already have solvers that they've spent a lot of money developing and studying privately. They would definitely be upset with you, but by releasing the code you are democratizing the information to all the midstakes pros who want to study but don't have the resources to pay developers and solve the game privately.

floodyberry- · on July 12, 2019

If you're already using programs to help you, I don't see how you can be upset if someone else is cheating better than you are.

asdfman123 · on July 11, 2019

Someone watch this guy and see if he buys any fancy watches or nice cars in the next few years. ;)

CamperBob2 · on July 12, 2019

Doesn't that make it a rather poor candidate for a scientific paper? Chest-thumping without data and code is, well, chest-thumping without data and code.

ewhauser421 · on July 11, 2019

Have you thought about open sourcing the non-AI pieces? It would be great for other researchers so they wouldn’t have to build the poker pieces from scratch

noambrown · on July 11, 2019

There is some open-source code in this area, and hopefully there will be more going forward. Here's one example: https://github.com/EricSteinberger/Deep-CFR

home_project123 · on July 11, 2019

a. Is CFR applicable in single-player hidden-information games? (e.g. state is initially hidden, gradually revealed to the agent, but there is no adversary)

b. How much more efficient is the improved search algorithm? The $150 number sounds like a couple of orders of magnitude.

noambrown · on July 12, 2019

a. There was this paper a couple years ago applying CFR to single-agent settings: https://arxiv.org/abs/1710.11424

b. It really depends on the game and the situation. It can be several orders of magnitude in six-player poker. In other games, it can be even more.

nradov · on July 11, 2019

Why are you concerned about the happiness of the poker world?

anbop · on July 11, 2019

Well if they upset the poker world do you think they would have top pros willing to go on record endorsing them?

nradov · on July 11, 2019

Top pros will endorse whatever they're paid to endorse.

icelancer · on July 12, 2019

This is falsifiable by any number of cases, but Isaac Haxton spurning PokerStars is probably one of the best examples so others see your comment is not universally applicable.

https://upswingpoker.com/isaac-haxton-pokerstars-partypoker/

pbhjpbhj · on July 12, 2019

>However, Haxton isn’t accepting PokerStars’ olive branch as he was among the victims defrauded by the online giant for millions of dollars.

I'm not sure that really provides strong opposition to the GP's claim.

icelancer · on July 12, 2019

PokerStars offered to make him - and him alone - whole through sponsorship dollars. Haxton used to be their lead pro and is widely considered one of the very best players in the world.

nickpsecurity · on July 11, 2019

It could just be for ethical reasons. I think anbop has a good reason even for unethical folks: hitting the best players hard in their wallets will definitely make it harder to recruit them for comparisons that validate these experiments. My prediction is that releasing this software will lead to profitable cheating like what people do with Blackjack at casinos.

unityByFreedom · on July 12, 2019

Why not run the bot, post its proceeds transparently online, and donate everything to charity?

By not releasing it, you're ensuring a higher concentration of money in the hands of a few, IMO.

Anyone with access to this source code could run a bot themselves, or employ someone to do so.

Plus, if you've accomplished this, no doubt someone can replicate it.

wolco · on July 11, 2019

By not releasing it, it doesn't validate the experiment. How can we be sure there wasn't human support?

Avamander · on July 11, 2019

As other commenters have said, I too think you're delaying the inevitable, but releasing now would mean you get credited with the first free solution.

isaacg · on July 11, 2019

In your Science paper, you mention playing 1H-5AI against 2 human players: Chris Ferguson and Darren Elias. In your blog post you also mention playing 1H-5AI against Linus Loeliger, who was within standard error of even money. Why did Linus not make it into the Science paper?

noambrown · on July 11, 2019

That took place after the final version of the Science paper was submitted. It would have been nice to include but it takes a while to do those experiments and we didn't feel it was worth delaying the publication process for it.

spenczar5 · on July 11, 2019

The article makes it sound like the AI is trained by evaluating results of decisions it makes on a per-hand basis. Is there any sense in which the AI learns about strategies that depend upon multiple hands? I’m thinking of bluffing/detecting bluffs and identifying recent patterns, which is something human poker players talk about.

noambrown · on July 11, 2019

The bot handles each hand independently. How the players play in one hand does not affect how the bot plays in future hands at all.

That said, it did train by playing against itself (before the experiment against the humans began).

kyberias · on July 12, 2019

Interesting. Does this mean that it cannot adjust to human players "switching gears"? Isn't this a huge leak?

rjldev · on July 12, 2019

It’s not a leak, it just means it can’t beat the opponent for the maximum it could by playing the exploitative counter-strategy vs their tendencies. Instead it just plays GTO, which will win against any given non-GTO strategy, though not for as much as the exploitative counter-strategy. Playing an exploitative strategy, however, leaves you open to exploitation, and this goes back and forth until the players converge onto GTO, assuming the players are (very) good.

t. former poker pro

Jach · on July 12, 2019

Was Judea Pearl's work relevant for the counterfactual regret minimization, or is there some other basis? I've added CFR to the list of things to look into later, but skimming the paper it was exciting to think advances are being made using causal theory...

noambrown · on July 12, 2019

The CFR algorithm is actually somewhat similar to Q-learning, but the connection is difficult to see because the algorithms came out of different communities, so the notation is all different.

throwamay1241 · on July 11, 2019

Who were the pros? Are they credible endbosses? Seth Davies works at RIO, which deserves respect, and I know of Chris Ferguson, who I doubt is a very good player by today's standards (or human being, for that matter), but I've never heard of the others, whereas I do know the likes of LLinusLove (iirc, the king of 6max), Polk and Phil Galfond.

Is 10,000 hands really considered a good enough sample? Most people consider 100k hands w/ a 4bb winrate to be an acceptable sample, other math aside. However, as your opponent and yourself play with equal skill, variance increases to the point where regs refuse to sit each other.

noambrown · on July 11, 2019

LLinusLove was one of the players. Chris Ferguson was in one of the 5 AI's + 1 Human experiment but not the 5 Humans + 1 AI experiment.

We used AIVAT to reduce variance, which reduces the number of samples we need by roughly a factor of 10: https://poker.cs.ualberta.ca/publications/aaai18-burch-aivat...
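To make the "factor of 10" concrete, here is a rough sketch of how the standard error of a measured win rate scales with the number of hands. It does not implement AIVAT. The 100 bb/100 standard deviation is an illustrative assumption, not a figure from the paper, and the variance reduction is modelled simply as dividing the variance by a constant.

```python
import math


def standard_error_bb_per_100(n_hands, sd_bb_per_100=100.0, variance_reduction=1.0):
    """Standard error of a win-rate estimate, in big blinds per 100 hands.

    sd_bb_per_100: assumed standard deviation per 100 hands (illustrative).
    variance_reduction: factor by which the variance is divided.
    """
    sd = sd_bb_per_100 / math.sqrt(variance_reduction)
    return sd / math.sqrt(n_hands / 100)


print(round(standard_error_bb_per_100(10_000), 2))                         # 10.0
print(round(standard_error_bb_per_100(10_000, variance_reduction=10), 2))  # 3.16
print(round(standard_error_bb_per_100(100_000), 2))                        # 3.16
```

Under these assumptions, 10,000 hands with a tenfold variance reduction give the same standard error as 100,000 raw hands.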

icelancer · on July 12, 2019

What? The pros chosen were definitely highly skilled players. They're fairly well known in the online poker community.

Furthermore, Chris Ferguson, scumbag aside, is absolutely still a very good player by today's standards, and one way higher than the mean participant in a research experiment.

10,000 hands is an effective enough sample at a certain win rate and analysis of variance of play; the n-value alone is not enough to tell you if it was enough hands.

splonk · on July 11, 2019

They're credible enough. I'd like the sample sizes to be bigger as well but they're enough to verify that even if the bot got lucky over the sample size, it's close enough that it doesn't really matter. Add a bit more compute, optimize some algorithms a little, and you'd make up the difference. The real point is that they have a technique that scales to 6-max, and whether it's 97% or 99% is kind of immaterial in the grand scheme of things.

FWIW, they did some variance reduction techniques that dramatically reduce the number of hands needed to be confident in your results, so the number of hands may be bigger than you think. e.g. the results of 10k HU hands have much higher variance than the results of 10k HU hands where everyone just collects their EV once they're all in.
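The "collect your EV once all in" idea is the simplest kind of variance reduction, and a small sketch shows why it helps. This is not AIVAT, which is more involved; the function names and numbers below are made up for illustration.

```python
def actual_result(invested, pot, won):
    """Chips won or lost when the all-in is actually run out."""
    return pot - invested if won else -invested


def all_in_ev(invested, pot, equity):
    """Expected chips won or lost, credited instead of the real outcome."""
    return equity * pot - invested


# A player puts 100 into a 200 pot as an 80% favourite.
print(actual_result(100, 200, won=True))   # 100
print(actual_result(100, 200, won=False))  # -100
print(all_in_ev(100, 200, equity=0.8))     # 60.0
```

Both scoring methods have the same average (0.8 * 100 + 0.2 * -100 = 60), but the EV-adjusted result is the same every time, so the luck of the runout no longer adds noise to the sample.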

ayemeng · on July 11, 2019

Jimmy Chou, Jason Les, Dong Kim are affiliated with Doug Polk.

It is an interesting point that these are pros but their specialities are either tournament or heads up. The current 6 max pros are LLinusLove, Otb_RedBaron, TrueTeller.

kapurs151 · on July 12, 2019

I'm very late to this post, so not sure if you're still around.

What are your thoughts on a poker tournament for bots? Do you think it could turn into a successful product? I've always wanted to build an online poker/chess game that was designed from the ground up for bots (everything being accessible through an API), but have always worried that someone with more computational resources or the best bot would win consistently. Is it an idea you've thought about?

tasubotadas · on July 12, 2019

Congrats on the bot!

I have a few basic questions. I would like to implement my own generic game bot (discrete states). Are there any universal approaches? Is MCMC sampling good enough to start? My initial idea was to do importance sampling on some utility/score function.

Also, I am looking into poker game solvers - what would be a good place to start? What's the simplest algorithm?

Thanks

haburka · on July 11, 2019

Why did you optimize for using fewer CPUs? Was it a happy accident or a goal?

noambrown · on July 11, 2019

A little bit of both. We didn't think we needed the extra computing power. And we really wanted to convey how cheap it is to make a superstrong poker AI with these latest algorithms.

waynecochran · on July 11, 2019

Knowing when to bluff often depends on the psychology of the opponent, but since it trained playing itself it doesn't seem that knowing when to bluff would be learned. Did it bluff very often?

noambrown · on July 11, 2019

The bot does bluff, and in fact it learns from self-play that bluffing is (sometimes) the optimal thing to do. At the end of the day, bluffing is simply betting when you have a weak hand. The bot learns from experience that when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet. The bot doesn't view it as deceptive or dishonest. It just views it as the action that makes it the most money.

Of course, a key part of bluffing is getting the probabilities right. You can't always bluff and you can't never bluff, because that would make you too predictable. But our self-play and search algorithms are designed to get those probabilities right.
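As a rough illustration of "getting the probabilities right", here is the textbook toy calculation for a single river bet where the bettor holds either a sure winner or a pure bluff. It is a standard simplified model and not how the bot computes its strategy; the function names are hypothetical.

```python
def indifference_bluff_fraction(pot, bet):
    """Fraction of the betting range that should be bluffs so that
    the caller's EV of calling is exactly zero (toy one-street model)."""
    return bet / (pot + 2 * bet)


def caller_ev(pot, bet, bluff_fraction):
    """Caller wins pot + bet against a bluff, loses bet against a value hand."""
    return bluff_fraction * (pot + bet) - (1 - bluff_fraction) * bet


for bet in (50, 100, 200):
    x = indifference_bluff_fraction(pot=100, bet=bet)
    print(bet, round(x, 3), round(caller_ev(100, bet, x), 6))
```

For a pot-sized bet the fraction is 1/3: bluff more often than that and calling becomes profitable, bluff less often and folding every time becomes correct, which is the predictability the comment above warns about.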

albedoa · on July 11, 2019

> when it bets with a weak hand, the opponent (another copy of itself) sometimes folds and it makes more money than if it hadn't bet.

This makes no sense. If I am betting for thin value with a weak hand, then I make less money when my opponent folds. Does the bot not know whether it is bluffing or value betting?

Denzel · on July 12, 2019

It makes complete sense. There’s a component of value and a component of bluff for a given hand in front of you. They’re related.

Value betting and bluffing aren’t defined by the outcome of a hand — action yet to be completed. Poker is a game of hidden information so betting with “thin value” implies that your component of bluffing is larger. You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.

QQ can get KK to fold based upon board texture, street, and prior action. But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.

albedoa · on July 12, 2019

> You want your opponent to fold more often than not when you have thin value because more often than not you’re actually beat.

No, that is simply not true. If I am betting for value, then I want my opponent to call no matter how weak I am or how thin it is.

> But you don’t know the other person is holding KK when you’re betting for “thin value” on the river.

Then it's a value bet. As you said, it's not defined by the outcome.

maehwasu · on July 12, 2019

“Value betting” and “bluffing” are human heuristics to simplify complicated situations.

The bot doesn’t “know” whether it’s value betting or bluffing—it’s not a relevant question. The relevant question is whether to bet, and what amount, in order to maximize value of the particular hand it has, with reference to the board and opponent actions taken.

albedoa · on July 12, 2019

Right, we agree on that, but the above comment lumps all of what you describe (“betting with a weak hand”) under “bluffing” and says the bot learns that it makes more money when its opponent folds.

kevinwang · on July 12, 2019

Where does your quote say that the bet is a value-bet? I read it as saying that the bot learned to bluff (not value bet) by betting when it has a weak hand (i.e. the bot has a weak hand, so it's getting better hands to fold by betting). The phrase "value bet" was not used.

(This, in addition to what the other comments have said about there being spots where a bet can get better hands to fold with some probability AND get worse hands to call with some probability - see the chapter "The grey area between value betting and bluffing" in Applications of No Limit Hold Em)

albedoa · on July 14, 2019

"At the end of the day, bluffing is simply betting when you have a weak hand."

I was the one who introduced the term "value betting" to the conversation, applied specifically to weak hands.

albedoa · on July 13, 2019

I mean, unless only those who interpret it wrong would respond, then I must be the one reading it wrong. Because these responses aren't lining up with how I read it or what I meant.