The Prisoner's Dilemma: a summary, and new advances

mistercow · on July 24, 2012

I feel bad for game theorists. Any time an article talks about their field, the first paragraph has to be a frantic explanation of what the field is and why it's interesting, in terms a five-year-old could understand, and then the article always ends up just being about the prisoner's dilemma.

It would be as if every article about physics started out "How stuff moves is a really neat thing to study..." and then invariably ended up talking about the parabolic trajectories of projectiles.

nopassrecover · on July 24, 2012

This is actually a really good article, including easily missed gems like the demos that the author put together (e.g. http://s3.boskent.com/prisoners-dilemma/fixed.html and http://s3.boskent.com/prisoners-dilemma/titfer.html)

stavros · on July 24, 2012

Did anyone else find that article meandering? The description of the game, in particular, struck me as especially pompous wiring.

The metaphor with the elephant and the goldfish had absolutely nothing to offer either, it was just a rehash of the previous sentence.

drostie · on July 24, 2012

The metaphor might have been a little over-the-top but I support this sort of creativity in blog posts, if only because it can ripen into something much more aesthetically pleasing. There are places where it goes way too far -- Gödel, Escher, Bach comes to mind -- but this is only a couple paragraphs.

In this case it does have something to "offer": it offers the insight that the only value that looking at past states can offer is a sort of "strategy tomography" to tell you what the other person's conditional probabilities actually are, but that once you know this, your strategy is first-order Markov, just because their strategy is first-order Markov. (This term, nth-order Markov, just means "depends only on the last n game states.")

It's true, however, that this leads to a very interesting consequence which is not discussed in this context, something quite convoluted: "I will commit to a first-order Markov strategy in order to prevent my opponent from using a higher-order Markov strategy, so that my own analysis of their strategy simplifies." It is a curious statement that your own ignorance forces someone else to be ignorant, which you can then exploit.

This is proven in appendix A of their paper. To state the conclusion in their words, "In iterated play of a fixed game, one might have thought that a player Y with longer memory of past outcomes has the advantage over a more forgetful player X. For example, one might have thought that player Y could devise an intricate strategy that uses X’s last 1,000 plays as input data in a decision algorithm, and that can then beat X’s strategy, conditioned on only the last one iteration. However, that is not the case when the same game (same allowed moves and same payoff matrices) is indefinitely repeated. In fact, for any strategy of the longer-memory player Y, X’s score is exactly the same as if Y had played a certain shorter-memory strategy (roughly, the marginalization of Y’s long-memory strategy), disregarding any history in excess of that shared with X."

sisu · on July 24, 2012

> It's true, however, that this leads to a very interesting consequence which is not discussed in this context, something quite convoluted: "I will commit to a first-order Markov strategy in order to prevent my opponent from using a higher-order Markov strategy, so that my own analysis of their strategy simplifies." It is a curious statement that your own ignorance forces someone else to be ignorant, which you can then exploit.

This is only true if you know the strategy of the enemy beforehand, though. For instance if you play rock-paper-scissors and decide your move only based on the previous move your enemy can easily exploit that after playing for a while. It is true that the enemy doesn't have to remember more than one move after he learns your strategy but he needs to remember many moves to learn it.

drostie · on July 24, 2012

1. No, that statement is still true even if you don't know the higher-order strategy of your opponent: no matter what it is, it has the same payoffs as some lower-order strategy.

2. You would have to define "exploit that," especially with the understanding that this is game theory and probabilistic strategies are certainly encouraged. So for example, you might imagine a genius who can consistently outthink you, knows your entire history and how you like to play Rock-Paper-Scissors and immediately as you throw down Rock, simply is able to guess that this is what you're likely to do, and throws Paper.

You can beat this guy. Or, more precisely, you can equal him. It's very simple: before the day has begun, roll a six-sided die and memorize the sequence. As long as they are not exploiting certain "tells" (as a Japanese robot did in the news a week or two ago) -- as long as they are just making a deduction based upon the sort of person you are, they cannot produce a net win against you and you're safe. Indeed, the Nash equilibrium for RPS is not terribly interesting, it's to choose each of the options with probability 1/3rd -- I don't really have much reason to believe that this changes dramatically in iterated RPS.

sisu · on July 24, 2012

1. I agree with what you say. There exists a low-order strategy that has the same payoff against the low-order strategy your opponent uses, and you can use that if you know your opponent's strategy. However, in many games there aren't low-order strategies that would work well against any low-order strategy so you need to know the opponent's strategy to choose a proper low-order strategy. Alternatively you could use a higher order strategy that learns the opponent's strategy and adapts to it.

2. Well, I was thinking that a strategy would be a mapping from game history to a probability distribution over possible moves. Should this be considered in some other way?

Sure, you can just play according to the Nash equilibrium and you can't be exploited, but probably a more interesting case is when both players try to outsmart the other player and not just aim for a draw.

stavros · on July 24, 2012

I will answer your comment in two sections, because it's really two comments in one:

The metaphor didn't really have anything to offer, before the preceding paragraph made it clear that the strategy is first-order Markov (it did state very plainly that you only remember the last position).

I don't understand the consequence. It seems to me that you aren't forcing the opponent to commit to a first-order Markov strategy themselves, you just make it easier on them to do that, because they don't need any more moves in the past.

Of course, you're doing this to simplify your own life, and because (I guess) first-order strategies work just as well as higher-order ones. Their conclusion seems to verify this, because they state that you really don't need (since you can't do any better than first-order) higher-order strategies, since you can model the opponent effectively with just the last move (which is the exact model the opponent is using).

Taking more moves into account, you would either just be trying to predict randomness (since that's the whole difference, if the opponent only remembers one move), or you would be "decompressing" their strategy into thousands of moves (since you can express a thousand moves of tit-for-tat with just "my move is the opponent's last move").

Literally, knowing any states further than the last gives you exactly zero extra information. Thus, it's useless.

drostie · on July 24, 2012

I think in the second half of what you said you began to understand what you started out saying that you "don't understand."

Think of "the consequence" here in a purely security-theory way. In some sense that's what we're doing, securing ourselves from exploitation.

Press and Dyson are expressing the radical idea that "if I protect myself from some stupid attackers, I protect myself from all smarter ones." How does this work? It works precisely because there is a threshold above which "smarter doesn't matter" -- a smarter attacker has the exact same payoffs as a stupid attacker playing a different strategy. In that sense there are no higher-order strategies for exploiting a sufficiently stupid strategy: sure you can program a smarter attack, but it cannot do better than a different, stupid attack.

So if I intentionally choose my strategy to be stupid and show that, for all of your stupid strategies, the expected payoff is the same -- then for all of your smart strategies, the expected payoff is also the same!

It's not quite "trying to predict randomness", it's "if I dumb myself down I can also assume, without loss of generality, that my adversaries are dumb."

stavros · on July 24, 2012

I think we're saying the same thing, yeah. I was remarking that any clever strategy doesn't confer any advantage to the attacker because there is no advantage to confer. They have perfect knowledge of your strategy just by knowing one previous move, and perfect knowledge is the most you can get.

Unless I'm confused again, it seems to me that there's just no advantage to them remembering more moves, since they know everything you're going to do from that last one. It's not that your strategy being stupid actively forces them to be stupid, they're not losing anything by being cleverer, they just don't gain anything because they already have it all.

scotty79 · on July 24, 2012

I played against extotionist strategy and I guess I got lucky because I'm leading (my score - 1) : (his score - 1) = 2:1 and won't change into 1:3 as advertised because from this step I will always defect.

    25. You defected (1 points) and I defected (1 points)
    24. You defected (1 points) and I defected (1 points)
    23. You defected (1 points) and I defected (1 points)
    22. You defected (1 points) and I defected (1 points)
    21. You defected (1 points) and I defected (1 points)
    20. You defected (1 points) and I defected (1 points)
    19. You defected (1 points) and I defected (1 points)
    18. You defected (1 points) and I defected (1 points)
    17. You defected (1 points) and I defected (1 points)
    16. You defected (1 points) and I defected (1 points)
    15. You defected (1 points) and I defected (1 points)
    14. You defected (1 points) and I defected (1 points)
    13. You defected (1 points) and I defected (1 points)
    12. You defected (1 points) and I defected (1 points)
    11. You cooperated (0 points) and I defected (5 points)
    10. You defected (1 points) and I defected (1 points)
    9. You defected (5 points) and I cooperated (0 points)
    8. You cooperated (3 points) and I cooperated (3 points)
    7. You defected (5 points) and I cooperated (0 points)
    6. You cooperated (0 points) and I defected (5 points)
    5. You defected (1 points) and I defected (1 points)
    4. You defected (5 points) and I cooperated (0 points)
    3. You cooperated (3 points) and I cooperated (3 points)
    2. You defected (5 points) and I cooperated (0 points)
    1. You cooperated (3 points) and I cooperated (3 points)

gjm11 · on July 24, 2012

> won't change into 1:3 as advertised because from this step I will always defect.

From the description of the extortionist strategy in the demo: "The only way you can avoid being taken advantage of is to resign yourself to the meagre rewards of mutual defection, and to defect on every turn."

So: yes, indeed you can do that, but then your long-run score will be rotten because it will look like a vast succession of (D,D) plays, 1 point each. If you let the Extortionist extort you, you will get substantially more points.

scotty79 · on July 24, 2012

"The only way you can avoid being taken advantage of is to resign yourself to the meagre rewards of mutual defection, and to defect on every turn. If you do anything else then I will take advantage of your cooperation and I will do three times better than you, in the sense that – on average over the long run – (my score minus 1) will be thrice (your score minus 1)."

I was refering to that. I did something else than dfecting all the time and got better result than 1:3 but that's ok since 1:3 is average over all strategies.

nopassrecover · on July 24, 2012

I found I could get better than 2:1 against extortionist fairly easily by mixing up my strategy as well.

andyjohnson0 · on July 24, 2012

It's good to see Freeman Dyson still producing new insights in his late eighties. A great mind.

Xcelerate · on July 24, 2012

For the split or steal game (with the golden balls), here is what I what do.

I would not open either ball, and I would tell the other contestant to pick one at random. Then, he MUST choose split for his own ball if he wants any chance of winning money. If my ball happened to say "steal" after opening, then I would split what I won with him.

I think this would guarantee both of us winning unless he was willing to take the chance of winning nothing.

pak · on July 24, 2012

He also has to be willing to trust that you will actually split what you win after stealing the whole pot. Otherwise, he will worry that you could dupe him and walk away with everything, and then choose to steal.

cgag · on July 24, 2012

You could just tell him you're going to choose steal: http://www.youtube.com/watch?v=S0qjK3TWZE8

Xcelerate · on July 24, 2012

I saw that video. That's what made me think of my idea. But mine removes the possibility that I might be holding out on "split" and I think has a better chance of guaranteeing we both split the money.

biot · on July 24, 2012

If I were playing Split or Steal, my strategy would be exactly this: tell the other player that I'm going to choose "steal" and if they choose "split", I'll give them half of the winnings. We would shake on it on national television which forms a nice contract: an offer, consideration for both parties, and acceptance. The beauty of this is that if they're not an idiot they'll choose "split" and we each walk away with half. However, if they are an idiot and they also chose "steal" then nobody gets anything and I can sue them for violation of the contract in the amount of the 50% I would have won.

Sadly, I suspect everyone would then copy that strategy and much like everyone choosing the letters R, S, T, L, N, and E in Wheel of Fortune, it would quickly become a formula and would ruin the suspense the show hopes to create.

redslazer · on July 24, 2012

Here is the video of this happening on the tv show. http://www.youtube.com/watch?v=S0qjK3TWZE8 . It worth a watch, I recommend it.

mmcnickle · on July 24, 2012

There's a twist in this example though. He says he's going to steal (but split after the show) and convinces the other guy to split. At the last minute he changes to split so that the split is governed by the rules of the game.

alexfoo · on July 24, 2012

I deeply wanted "the other guy" to steal leaving that guy with nothing.

mmcnickle · on July 24, 2012

I think that's why the guy proposing the deal ended up splitting. If the other guy did steal to spite him, the fact he split would show that he was acting in good faith and would increase the chance of a split after the show.

mmahemoff · on July 24, 2012

This is a clever variation on what happened:

"A more powerful variation would be to offer the same deal but without looking at the balls at the beginning, stating that you will pick one at random. The other player is then forced to split or risk walking away with nothing."

http://www.schneier.com/blog/archives/2012/04/amazing_round_...

disappointment · on July 24, 2012

The thing that always bugged me about Golden Balls (aside from the terrible name) was the way stealers would act like it was just a game when they had in fact taken thousands of pounds from another person through deception. Certainly it was legal and even encouraged by the format of the show but it was never the right thing to do.

I've seen people immediately regret doing it when they see that their opponent shared. They seem more upset about it than the loser and tend to cite an expectation that the other person would steal as their reason for stealing, even though this makes no logical sense. It was a fascinating but horrible ending to sixty minutes of otherwise tedious viewing.

redslazer · on July 24, 2012

Here is a video of golden balls and instant regret. The regret starts at 3:03 http://www.youtube.com/watch?v=p3Uos2fzIJ0

jsvaughan · on July 24, 2012

awesome

nicholassmith · on July 24, 2012

Logically you'd split every time, it's safer and guaranteed. However the expectation that someone else is going to steal so you steal makes twisted sense in a 'if I can't have they can't have it' way, which is human nature in children through to adults.

gwern · on July 24, 2012

The Hofstadter essays: http://www.gwern.net/docs/1985-hofstadter

CaptainDecisive · on July 24, 2012

The parable about the elephant and the goldfish was wonderful.

I imagined, when the time is right, telling it to my little boys as I tucked them in to bed at night. And then imagined one of them coming in to my bedroom later that night and beginning with "Papa, why did the elephant ...".

zmj · on July 24, 2012

Results and strategy summary of an iterated Prisoner's Dilemma tournament from last year: http://lesswrong.com/lw/7f2/prisoners_dilemma_tournament_res...

jere · on July 24, 2012

The most interesting story on IPD (especially from a hacker perspective) wasn't mentioned:

>Although tit for tat is considered to be the most robust basic strategy, a team from Southampton University in England (led by Professor Nicholas Jennings [1] and consisting of Rajdeep Dash, Sarvapali Ramchurn, Alex Rogers, Perukrishnen Vytelingum) introduced a new strategy at the 20th-anniversary iterated prisoners' dilemma competition, which proved to be more successful than tit for tat. This strategy relied on cooperation between programs to achieve the highest number of points for a single program. The University submitted 60 programs to the competition, which were designed to recognize each other through a series of five to ten moves at the start. Once this recognition was made, one program would always cooperate and the other would always defect, assuring the maximum number of points for the defector. If the program realized that it was playing a non-Southampton player, it would continuously defect in an attempt to minimize the score of the competing program. As a result,[8] this strategy ended up taking the top three positions in the competition, as well as a number of positions towards the bottom.

>This strategy takes advantage of the fact that multiple entries were allowed in this particular competition and that the performance of a team was measured by that of the highest-scoring player (meaning that the use of self-sacrificing players was a form of minmaxing). In a competition where one has control of only a single player, tit for tat is certainly a better strategy. Because of this new rule, this competition also has little theoretical significance when analysing single agent strategies as compared to Axelrod's seminal tournament. However, it provided the framework for analysing how to achieve cooperative strategies in multi-agent frameworks, especially in the presence of noise. In fact, long before this new-rules tournament was played, Richard Dawkins in his book The Selfish Gene pointed out the possibility of such strategies winning if multiple entries were allowed, but he remarked that most probably Axelrod would not have allowed them if they had been submitted. It also relies on circumventing rules about the prisoners' dilemma in that there is no communication allowed between the two players. When the Southampton programs engage in an opening "ten move dance" to recognize one another, this only reinforces just how valuable communication can be in shifting the balance of the game.

http://en.wikipedia.org/wiki/Prisoners_dilemma#Strategy_for_...

By the way, if you haven't read The Selfish Gene, do yourself a favor and read it now. It discusses IPD and coins the word "meme"... what more could you possibly want?

Kronopath · on July 24, 2012

It's fantastic how these people were able to play the metagame like that.

I suspect that if you caught on to this, the best strategy would in fact be to impersonate the Southampton players, mimicking the ten-move dance, and then perpetually defect, assuring the best strategy against either the "always defect" or "always cooperate" players.

jere · on July 24, 2012

Absolutely.

http://en.wikipedia.org/wiki/Cuckoo#Brood_parasitism

robinhouston · on July 24, 2012

I didn’t mean to imply that GTFT was discovered this year. I certainly don’t believe that. I’ll have a look at the wording, thanks.

Another interesting strategy mentioned by a commenter on the post is win-stay lose-shift, which reportedly outperforms GTFT: http://abel.math.harvard.edu/archive/153_fall_04/Additional_...

jere · on July 24, 2012

Yea, I didn't think you were but the wording was slightly unclear. Great article though.

macavity23 · on July 24, 2012

Watch the embedded video, it's a great illustration of the article. :-)

Toenex · on July 24, 2012

Thanks, very enjoyable read.