When Grandmasters Blunder: Even the best make mistakes (medium.com/pachyderm-data)
92 points by jaz46 on Feb 16, 2015 | 43 comments



It seems to me that the article overlooks one glaringly obvious issue: that the two blunders may not be independent events.

In this case, it seems quite likely that the second player's blunder was made much more likely by the fact that the first player had just blundered. To be more specific, white moved the king, which appeared (at first glance) to prevent black from using a check threat to attack white's rook. The blunder was in not realizing that the check threat could still be used to attack white's rook, albeit in a more complicated fashion.

Black responded to this with another "blunder" -- failing to attack the rook and moving elsewhere instead. But this blunder was NOT independent of the first -- it is quite likely (I believe) that black saw the move and assumed white had successfully prevented the attack on the rook. He assumed that such a top-level player would never make such a mistake, and that caused him to not look closely enough at it. The first blunder helped cause the second.

(Thanks to stolio for linking to the game analysis I used here.)

One could test this hypothesis of mine using the same data set. Instead of looking just at single errors, look at error pairs (one error occurring in the move following another error). If the probability of a blunder is significantly higher on the move immediately after a blunder than it is at any other time, then my hypothesis (that the events are not independent, but correlated) is supported.
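A minimal sketch of that test, assuming the data arrives as one list of engine-flagged blunder booleans per game (all names here are illustrative, not the article's actual format):

    def blunder_rates(games):
        # games: one list of booleans per game, one entry per move (ply),
        # True if the engine flagged that move as a blunder.
        total = blunders = after_total = after_blunders = 0
        for moves in games:
            for i, is_blunder in enumerate(moves):
                total += 1
                blunders += is_blunder
                if i > 0 and moves[i - 1]:  # previous ply was a blunder
                    after_total += 1
                    after_blunders += is_blunder
        return blunders / total, after_blunders / max(after_total, 1)

    games = [[False, False, True, True, False],  # toy data: one double blunder
             [False, False, False, True, False]]
    base, cond = blunder_rates(games)
    # If cond is significantly higher than base, consecutive blunders
    # are correlated rather than independent.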


I was expecting the article to focus on this and prove or disprove the double-blunder hypothesis.

Most top players would comment (as Anand and Carlsen did after the game) that double blunders are relatively common in top level play.

Here is one famous case: http://www.chess.com/article/view/the-amazing-chess-illusion

In my personal experience this has been common too: when I mix playing against 2500 players and 1900 players in blitz (I am 2350 FIDE), it is relatively easy to skip over simple hanging pieces for a move or two.

In a regular tournament game it has happened a few times as well (one player committing a gross blunder and the other not noticing).

The big question is whether it is out of the ordinary, statistically speaking.


This is a valid point. We're going to be publishing some data in a few days that will have everything you need to test this hypothesis.


"He assumed that such a top-level player would never make such a mistake"

But if top-level players make such mistakes in about 1% of their moves, that assumption is utterly wrong (1% per move translates, ballpark, to once in every five games that a similarly ranked opponent essentially gives you victory, if you yourself manage not to blunder), so one could call making the assumption a blunder.
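Spelling out the ballpark (assuming roughly 40 moves per player per game):

    p, n = 0.01, 40                       # 1% blunder chance, ~40 moves each
    opp_blunders = 1 - (1 - p) ** n       # ~0.33: opponent blunders at least once
    you_stay_clean = (1 - p) ** n         # ~0.67: you never do
    print(opp_blunders * you_stay_clean)  # ~0.22, i.e. roughly one game in five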


I agree that they are probably not independent, but I have a different idea as to why. From my personal experience of playing many competitive games, when your opponent makes an obvious mistake you might get too excited or carried away about capitalizing on the mistake and make a subpar move. Consider a situation like this:

"Wow he just left his queen right open! I can't believe he did that! I'll take it with my rook."

Rook takes queen, rook is wide open, could have taken queen with some other piece and had it protected.

It's possible this doesn't apply to chess as much, but in more fast-paced games, if you see an opening, you take it, because even if you don't capitalize on it 100% you're better off than not doing anything about the blunder.


Using "number of pawns of evaluation lost" as a proxy for the severity of the blunder has some fundamental problems. The main one is that the relationship between evaluation in "pawns" and expected result (expected value of the game result, from 0 to 1) is not linear. (It couldn't be, since one of them maxes out at one.) It's actually more of a sigmoid curve.

This means that a player may easily make a horrific "3-pawn blunder" reducing his evaluation from +8 to +5, but in fact all he's done is reduce his chance of winning from 99% to 98%. Actually, the +5 move may even be better in practice, in that it might lead to a sure safe win rather than a tricky blowout.
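For illustration, one logistic mapping from eval to expected score makes this concrete (the scale constant of 4 here is an arbitrary choice, not a fitted model):

    def expected_score(pawns):
        # Sigmoid mapping from engine eval (in pawns) to expected result;
        # the divisor 4 is illustrative, not calibrated.
        return 1 / (1 + 10 ** (-pawns / 4))

    print(expected_score(8) - expected_score(5))   # ~0.04: this "3-pawn blunder" barely matters
    print(expected_score(1) - expected_score(-2))  # ~0.40: the same 3 pawns here loses the game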

Even if you changed the definition of blunder from "reduces the evaluation by n pawns" to "reduces the expected result by x", I would have an issue in that it ignores any of the human aspects of blunders. If someone drops a pawn outright for no reason (eval change -1), that is a blunder because it was so trivial to avoid. But if someone, even a grandmaster, makes a move that causes a large drop in eval due to allowing a sacrifice that no human could calculate all the ramifications of, because as far as he (and probably his opponent) could humanly calculate it didn't lose, it is hard to call that a blunder. (Conversely, failing to see some immensely complicated non-forcing winning move may be unfortunate but it's not a blunder.) But that's more a cavil with terminology than a methodological error; the study is still measuring something interesting, just not quite what I think it is claiming to measure.


Somewhat agreed. I'm a decent-ish amateur player, and if I'm playing bullet or blitz chess with very low time left on the clock (<10 seconds) and it's a king and pawn vs. king ending where I'm about to convert my pawn, I will almost always choose to underpromote to a rook instead of promoting to a queen, because I am 100% sure I can mechanically checkmate my opponent without accidentally stalemating (due to a blunder under extreme time pressure) while spending virtually no clock time. I could almost certainly do it with a queen, but it's for my own peace of mind that I use a rook instead. Safe and easy victory, but it'd count as a blunder.


Nope, "number of pawns" is only a notional number: it's a score calculated by a chess engine. Being 1 pawn ahead may just mean one side has an advantage equivalent to a pawn, not necessarily a physical extra pawn. Another aspect is that sometimes you're a physical pawn short, but the evaluation may only show -0.3 pawns against you, meaning you've got positional or counter-play advantages to compensate. Often players will sacrifice pieces for counter-play and activity.

Chess engines also implement a heuristic called 'contempt' where they may make a sacrifice in order to avoid a drawn position, when faced with an inferior opponent.


Your response has nothing to do with the point the parent poster makes; it completely misses it.

He is arguing that "percentage of winning" is not linearly related to "pawn or equivalent advantage". That has got nothing to do with whether those pawns are physical ones or positional advantages that have equivalent value.


Yes, I know how computer evaluations work. The fact that they take positional as well as material considerations into account doesn't change the point, which is that playing a +5 sure-win move instead of a +8 sure-win move is not a horrific magnitude-3 blunder the way that playing a -2 move instead of a +1 move is, whatever your units of magnitude are, because what really matters is the change in the expected result of the game.


Yup, this is another very valid criticism. I think the answer is probably to have a cutoff on the lower bound: basically saying that for a move to be a blunder, it has to leave the evaluation below a certain absolute value, maybe +2 pawns, in addition to being a certain amount below the best possible move.
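As a sketch, that two-part definition would look like this (the 3-pawn drop and +2 floor are just the numbers floated above):

    def is_blunder(best_eval, played_eval, drop=3.0, floor=2.0):
        # Blunder = loses `drop` pawns versus the best move AND leaves the
        # player below `floor`, so a +8 -> +5 slip in a won game is excluded.
        return best_eval - played_eval >= drop and played_eval < floor

    print(is_blunder(8.0, 5.0))   # False: still trivially winning
    print(is_blunder(1.0, -2.0))  # True: threw the game away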


> Due to cost limitations we had to limit crafty to 2 seconds of analysis time per move

A grandmaster with standard time controls could defeat a 2-second limited Crafty. So how do you know you're finding true blunders, and not simply positions that the engine evaluates incorrectly?


This is definitely the biggest limitation of our approach right now and there are certainly some things that we counted as blunders that aren't true blunders. We're working on rectifying this by doing another pass with a better engine and more time to analyze.

That said, we tested this on a smaller set of games by comparing it to results from better engines and found that only a very small number of moves tricked Crafty. It's still generally quite reliable for the majority of moves.


You could just rewrite your article to call these "obvious blunders", i.e. ones that Crafty identifies in 2 seconds or less. Redefine what you're doing so your methodology is correct :) Plus it's still interesting. Probably more interesting than blunders that take longer to identify!

Once you have found the blunders, you can verify them by analyzing the found positions more deeply. (Of course you should also report the number of false positives - ones that appear blunders after 2 seconds but turn out not to be on slightly longer analysis.)
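The deeper verification pass is easy to script with the python-chess library and any UCI engine (a sketch; the engine path, time budget, and 3-pawn threshold are placeholders, not what the article used):

    import chess
    import chess.engine

    def confirm_blunder(fen, move_uci, engine_path="stockfish", secs=30.0):
        # Re-check a candidate blunder from the fast 2-second pass with a
        # longer search; returns True if it still looks like a blunder.
        board = chess.Board(fen)
        mover = board.turn
        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            before = engine.analyse(board, chess.engine.Limit(time=secs))
            board.push_uci(move_uci)
            after = engine.analyse(board, chess.engine.Limit(time=secs))
        best_cp = before["score"].pov(mover).score(mate_score=100000)
        played_cp = after["score"].pov(mover).score(mate_score=100000)
        return best_cp - played_cp >= 300  # 3 pawns, in centipawns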


Thanks for the response -- I really enjoyed your article.

The results of the cross-validation you mentioned would be interesting as well.


Hi guys, author here. I'll be monitoring this thread for the rest of the evening. Happy to answer any questions.


Hi! Did you consider modeling the fact that a blunder in the first ten move-pairs seems far less likely in expert play, given that they're usually playing from book and there are fewer pieces in motion to consider?

(I'm wondering if blunders after move 15 are in fact far more common than your model suggests, and they're just being extremely diluted in your stats by correct opening play almost every game.)
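Once the raw data is out, this is a quick check; assuming the records come as (move_number, is_blunder) pairs pooled across all games:

    from collections import defaultdict

    def blunder_rate_by_move(move_records):
        # move_records: iterable of (move_number, is_blunder) pairs.
        counts = defaultdict(lambda: [0, 0])
        for move_no, is_blunder in move_records:
            counts[move_no][0] += 1
            counts[move_no][1] += is_blunder
        return {n: b / t for n, (t, b) in sorted(counts.items())}

    # A flat overall rate can hide a steep rise after move 15 if the
    # first 15 moves are near-perfect book play.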


You're totally right about this. In general the efficacy of engines differs across the parts of the game. In the opening, players (and engines) are mostly playing from books, so there are far fewer mistakes, and what the engine thinks of as a mistake is often just someone going outside its predefined book. In the midgame the analysis is very effective. In the endgame the engine can often solve the position outright, so it really just judges moves as winning or losing.

Spoiler: we're planning to address a lot of this in an upcoming followup I'm working on right now. We're going to take this into account in our analysis, as well as give people the raw dataset with info about when the blunders occurred, so they can learn from it themselves.


"Assuming Anand and Carlsen's blunders were independent events, what we saw was a 1 in 10,000 occurrence."

Are they independent events though? In a game between mediocre players, if there is only one move to take advantage of a blunder, the computer analysis will repeatedly cry "blunder!" each turn until either that move is played or the initial blunderer defuses the opportunity. As for grandmaster play, I have no clue.


My hypothesis would be that GMs are much less likely to blunder when there's a winning move on the board. They're generally very good at finding such moves.


What time controls were you looking at? I'm half-jokingly wondering if the big dip in correct moves in the upper 2800 range is Nakamura's crazy opening style :) (He's rated in the upper 2800s in blitz and rapid.)


My guess: the only one that has been rated in that bucket in the last year was Caruana, post-Saint Louis. At that point, he suffered (and is still going through) a big decline with several bad blunders.


This was the leading hypothesis on r/chess, and I'm pretty sure it's right; Naka didn't hit this rating in 2014 as far as I know.


The only other candidate would be Carlsen when he dipped to the high 2850s.


One thing I know from playing serious Bridge: An expert player makes far fewer mistakes than the average player. He does not make zero mistakes.

I read one tournament report, where an expert player revoked.

When an expert plays good or average players, he does not need to be brilliant to win. He just has to play competently, and wait for his opponents to make mistakes.


A revoke at a bridge tournament is the closest thing to a crime scene that I've ever seen. I half-expected the tournament directors to cordon off the table in yellow plastic tape while they recreated what happened.


Some of the rules are awkward and confusing to someone who doesn't understand them.

For example, you can convey information to your partner, based on how long you take to bid. Technically, you aren't allowed to have that information.

Me - Left Opponent - Partner - Right Opponent

1NT - x - xx - pass

pass - 2C (after long pause) - x - pass (after long pause)

pass - 2D

Explaining:

I played a weak 12-14 HCP NoTrump opening

opponent on my left doubled, showing a good hand

my partner redoubled, saying he also has a good hand (i.e., we got them now)

Rather than letting us make 1NT redoubled, the opponent on the left ran out to 2 Clubs.

My partner doubled, because he had good clubs (i.e., we got them).

The opponent on the right passed, but he waited a long time before passing, illegally conveying to his partner that he was not sure if they should stay in 2 Clubs or run.

Taking advantage of that (illegally obtained) information, the opponent on the left decided to run to 2 Diamonds.

So, someone who does not understand the concept of unauthorized information would not understand why I would be disadvantaged.

At a local club game, I would let it slide. At a regional tournament, I'd expect the director to get it right.


See also the papers by IM & PhD Kenneth Regan on "Intrinsic Chess Ratings" such as http://www.cse.buffalo.edu/~regan/papers/pdf/ReHa11c.pdf .


Here's commentary on the Carlsen/Anand double-blunder: http://youtu.be/6K86f27uuP0?t=14m36s

It's not easy to see.


These results surprised me. I expected a much wider gap in correct-move % between a 1500 player and a grandmaster. It'd be interesting to see if the slope of the graph is steeper for minor blunders that reduce the evaluation by less than a pawn. These are the more subtle positional errors: weakening a square, not maximizing piece activity, wrecking your pawn structure, etc. Amateur games are filled with these mistakes, but they are much rarer in GM games, and I'd expect the difference to be more than just a few percentage points. But Crafty's not the right engine for this job. You'd want something with a more sophisticated evaluation function, like Stockfish (several hundred Elo points stronger than Crafty).


Give us a few days :p. We'll have exactly the dataset you need to answer these questions. (And we'll be releasing it publicly.)


Great! After thinking about it some more, I think I understand why the graphs are flatter than I expected. There are differing degrees of difficulty in tactical mistakes. When a 1500-player blunders a pawn or piece, it's often resolved by a trivial one-move sequence. GM blunders are more subtle, often requiring a lengthy (say 5-10 ply) sequence to resolve. You could prove this by recording the minimum search depth the engine needs to recognize the blunder. (This is tricky, because search extensions result in many sub-variations being analyzed much deeper than the nominal search depth, but I seem to recall that Crafty has an option for disabling extensions.)
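That measurement could be scripted with the python-chess library (a sketch; the engine path, 3-pawn threshold, and depth cap are placeholders, and it uses plain fixed-depth search rather than disabling extensions):

    import chess
    import chess.engine

    def min_depth_to_see_blunder(fen, move_uci, engine_path="stockfish",
                                 threshold_cp=300, max_depth=20):
        # Find the shallowest fixed-depth search at which the engine already
        # sees the played move losing threshold_cp versus its best line.
        board = chess.Board(fen)
        mover = board.turn
        with chess.engine.SimpleEngine.popen_uci(engine_path) as engine:
            for depth in range(1, max_depth + 1):
                limit = chess.engine.Limit(depth=depth)
                best = engine.analyse(board, limit)["score"].pov(mover)
                board.push_uci(move_uci)
                played = engine.analyse(board, limit)["score"].pov(mover)
                board.pop()
                drop = (best.score(mate_score=100000)
                        - played.score(mate_score=100000))
                if drop >= threshold_cp:
                    return depth
        return None  # not recognized within max_depth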


Grandmasters blunder more often than this. I would venture to say that what correlates with blunders more so than rating is time. Error rate goes way up in Blitz and Rapid.

IMO the more interesting thing about chess skill at the top is just how much better GMs are than everyone else.

To me, ratings at the top feel more like an exponential scale than a linear one. For example, I have beaten International Masters at chess lots of times but have never once beaten a GM.

If I studied or cared (which I don't), I think maybe it would be possible to squeeze out a lucky win once in a while. Aspiring to be a punching bag isn't a very appealing notion though, so you can understand my lack of motivation. GMs are crazy good.


Elo ratings are constructed so that a given rating difference means the same thing across the entire scale. A player facing someone 200 Elo points lower is expected to score about 76% (counting draws as half).
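That 76% comes straight from the standard Elo expectation formula:

    def elo_expected_score(r_a, r_b):
        # Standard Elo expected score for A vs. B (draws count as half).
        return 1 / (1 + 10 ** ((r_b - r_a) / 400))

    print(elo_expected_score(2200, 2000))  # ~0.76, at any absolute level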


> Grandmasters blunder more often than this.

Could this analysis be a lower bound? I'm not familiar with Crafty, but given that all the games were annotated in 6 hours of wall-clock time, this analysis can't be going extremely deep into the game-tree. There may be many more moves which would qualify as blunders if analyzed as deeply as Regan's work in the other comment.


>To me, ratings at the top feel more like an exponential scale than a linear one. For example, I have beaten International Masters at chess lots of times but have never once beaten a GM.

If true this is purely psychological. You are unable to beat a GM because he's a GM and you think you're unable to beat GMs.

The strength difference between IMs and GMs simply isn't that great. Because the GM title is based on results and not ratings there are frequently IMs who are higher rated than GMs.


I think you are citing the exception(s) to the rule. Most GMs are stronger than IMs imo. I don't think it's psychological. I have played players (GM and otherwise, including other untitled players like myself) that I know are so much better than me because they win and I can't even comprehend how they arrived at making the moves that they did.

As an aside, this is kind of an issue I have with chess analysis. A computer can 'verify' that a certain move is good or bad. That's fair enough. But in the past I have seen players (of lower skill level than me) discuss analysis of, for example, a battle between two big-name players.

I have sometimes wondered if these discussions are truly honest, because I have seen moves made by top players where I don't even understand how they arrived at deciding that was the correct move versus the others. Excluding GMs, a human simply cannot prune the game tree at depth like a computer can. So discussing a few tiny branches of the game tree as if one is correct and the others aren't just seems really silly for the rest of us.


Time definitely plays a huge role in blunders. Actually, if you look at the first graph, there's a spike in blunders around 2800 which is entirely due to a single GM (Caruana) playing a string of blitz games this year in which he made several blunders.


Blitz games? Is your data from Blitz games played on an online server?

I like the idea of your research, but blitz games are garbage and online ratings are frequently meaningless due to abuse.

You should also look at replacing Crafty with Stockfish. Stockfish is also open source, and it's around 350 points stronger than Crafty, which is a huge amount at this level.


Elo ratings are on an exponential scale, at all levels.


OK, so they've analyzed 4.9 million moves. How many double blunders did they find in the set? If the hypothesis of independent events is true there should be about (4.9e6 / 10,000) = 490 doubles in the data set. An obvious way to test accuracy of the model is to compare that to the actual number.

Why hasn't that comparison been done/mentioned?
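The expected count under the independence assumption is one line:

    moves_analyzed = 4.9e6
    p_double = 1 / 10_000                # the article's independence estimate
    print(moves_analyzed * p_double)     # ~490 expected double blunders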


What exactly is a blunder? Watch this game: http://www.chessgames.com/perl/chessgame?gid=1032537

Tal sacrificed a knight and his queen.


Not relevant to chess, but in Japanese there is a saying: saru mo ki kara ochiru, which amounts to "Even monkeys fall from trees."



