Evolution of chess: Popularity of openings over time (randalolson.com)
95 points by rhiever on May 26, 2014 | 42 comments



The Pirc spike in the 1850s looks interesting visually, but keep in mind that your data set back then is incredibly small. Looking at my own database, the spike seems to be entirely due to some guy named Mahescandra playing it 57 times against Cochrane. It certainly doesn't have anything to do with the increasing popularity of 1.d4 40 years later.

I've never heard of Mahescandra, but Cochrane is the guy the famous Cochrane gambit in the Petroff is named after, where White sacrifices a piece on move 4 (1.e4 e5 2.Nf3 Nf6 3.Nxe5 d6 4.Nxf7).


I double-checked my data set to look into this, and you're right. Nearly all of the Pircs in that time period were played by a guy named Moheschunder Bannerjee: https://en.wikipedia.org/wiki/Moheschunder_Bannerjee

He contributed to the development of the Indian Defence.


Why do you all have your own datasets of historical chess moves? Is this a popular playground for data analysis, or is it something chess learners study?


Chess compresses very well. You can fit pretty much every meaningful game ever played into 10 GB or so.


It's something chess players study. Check out database.chessbase.com


Chess opening trends are like fashion: some high-profile player, always on the search for new ideas, finds a resource in an unpopular line and suddenly it's all the rage. Everyone is playing it, working out the complications, finding ways to defend or neutralize the lines, then interest wanes until someone uncovers a fresh new plan somewhere else and the cycle repeats. Other times though, new resources aren't found and a line mostly dies out, like the King's Gambit.

So what I would be interested to see from your data set is a relation between opening performance and its popularity. Did people stop playing the Pirc due to sub-par results compared to other openings at the time (like I imagine happened with the Vienna) or did it simply fall out of fashion? It would be interesting to know which lines always had good results, but just stopped being popular for whatever reason. They could be due for a revival.
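
A rough sketch of how that comparison could be set up, assuming each game has already been reduced to a (year, opening name, White's score) record. That record layout, the function name and the sample records in the usage lines are all made up for illustration; they are not taken from the article's data set.

```python
from collections import defaultdict

def opening_stats(records):
    """Per (decade, opening): share of that decade's games and White's average score.

    `records` is an iterable of (year, opening, white_score) tuples, where
    white_score is 1.0 for a White win, 0.5 for a draw and 0.0 for a loss.
    This record format is an assumption made for the sketch.
    """
    games_per_decade = defaultdict(int)
    tally = defaultdict(lambda: [0, 0.0])  # (decade, opening) -> [games, white points]
    for year, opening, white_score in records:
        decade = year // 10 * 10
        games_per_decade[decade] += 1
        entry = tally[(decade, opening)]
        entry[0] += 1
        entry[1] += white_score
    return {
        key: {
            "share": games / games_per_decade[key[0]],  # popularity within the decade
            "white_score": points / games,              # performance from White's side
        }
        for key, (games, points) in tally.items()
    }

# Invented sample records, only to show the call shape.
sample = [(1851, "Pirc", 1.0), (1853, "Pirc", 0.0), (1858, "Sicilian", 0.5)]
for key, stats in sorted(opening_stats(sample).items()):
    print(key, stats)
```

For a Black defence like the Pirc, Black's performance is simply 1 minus the White score. Plotting share against score per decade would show lines that kept scoring well after their popularity dropped.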


With the advent of computer analysis, the serious chess player needs to have a basic understanding of almost every opening, with that understanding going deeper and deeper as you scale the ranking list.

For example, the King's Gambit hasn't really fallen out of favor despite the lack of long grandmaster games. The Falkbeer Countergambit (1.e4 e5 2.f4 d5) in particular has seen a revival as of late with the Nimzowitsch variation (3.exd5 c6!?). You might not see Carlsen and Anand playing it in tournaments (mostly due to the risk-reward factor of having to solve complex positions over the board under time pressure), but they've almost certainly studied it with their team of coaches and researchers.

To comment on the Pirc: statistically, Black fares much worse with the Pirc than with the most successful counter to 1.e4, the Sicilian Defense (1...c5). From a gameplay perspective, Black has to commit much earlier with the Pirc than the Sicilian, since the Sicilian has about a billion variations starting as early as the second move.


Unfortunately, it isn't always obvious from the data which of your interpretations is correct. Take the following two fictional cases:

Case 1: The Limburger Gambit was popular for a few decades, but then grandmasters started playing the Limburger Gambit Deferred instead, so it fell out of fashion, although it is still a perfectly good opening.

Case 2: The Limburger Gambit was popular for a few decades, but then a response was found that refuted the whole line. It was played in one famous game, after which everybody abandoned it.

As far as the data set is concerned, the difference between these two cases is a single game, so it's not going to show up in the stats.


This is true, but I think there are many more case 2s where rather than an entire line being refuted in a single blow, gradually one side was able to find antidotes and the line stopped being a good practical choice. The Vienna is an example. So while it isn't always obvious, I'm hoping enough fall between your two Limburgers to be interesting.


The two biggest examples I can think of of "opening fell out of fashion but then was brought back at the highest levels" are Kasparov reviving the Scotch in the 1990s and Kramnik reviving the Berlin variation of the Ruy Lopez in his world title match with Kasparov in 2000.


As a novice player, it's frustrating to read stories about players beating other players "by the book," i.e. they know more openings than the other player does, so they play some unexpected opening and just destroy the opponent out of the gate. I guess this is possible in just about any game, not just chess, but it seems to reduce the optimal learning process for a novice to "start memorizing things". Bah. Edit: Not that memorization is bad or boring by necessity, but finding a way to make the memorization fun and interesting is in itself frustrating, knowing that you could learn openings really fast just by brute-forcing the various lines into your brain with things like memory palace techniques.


This is an argument that is raised a lot in chess circles. While there's some truth behind it, it tends to be applicable much more in an environment where both players are trying really hard to play "in book".

Before you get too deep into openings, you should have a solid understanding of the strengths of individual pieces. This leads to a better understanding of why you'd want to put your pieces in certain places. It kind of explains the broad strokes of openings, without the little traps here and there. This is known as the Russian school of teaching.

Basically, learning from this direction, you can spend much less time studying openings. You might want to know about a few common traps in common situations that you tend to use, but that's not the focus. And you'd be surprised at how well this works - barring a tiny number of games where you're caught out, your better understanding of the overall game will win out.

Essentially, overspecialising in opening repertoire at the novice level is a kind of arms race which only really works against other similar players. It's not a good long-run strategy.


The "purpose" of the opening is to get yourself into the sort of middlegame position you like to play. (Some players love having white against a French Defense, for example, while others are quite comfortable with black's side of it.) Understanding which types of positions you prefer takes experience, and while you're gaining that experience at the lower levels tactical skill can dominate. So it can seem like knowing the "book" moves for a bunch of openings is important when in fact the problem is that your opponent has better tactical skills than you or perhaps you have trouble judging a position and developing a plan. That's why specializing in a few openings can be helpful -- you get lots of practice with similar middlegame positions at the same time you limit the number of lines you have to deal with.

Now if you like open games such as the King's or Evans Gambits (i.e., fashions of 100+ years ago) things will get tactically bloody pretty quickly and so gaining a knowledge of where the booby traps are will be prudent. But if your taste runs to semi-open games, say, you can limit your opening study to a relatively small number. I happen to like the Dutch Stonewall formation as black for example, and the precise move order for getting into a position I'm comfortable with doesn't require a lot of memorization.

Happily we live in a time when computers can not only entertain us in chess but also help us with learning chess. You can improve your tactical skills by studying a tactical positions card deck in Anki for example. Or you can work on a particular opening by setting the computer to always start games from a particular position. You can use that same starting position trick to train yourself in endgame play. The important thing is to find ways to keep the game fun while you improve your play.


If you are a novice player, you really don't have to worry about this. Memorizing opening lines doesn't become an important part of the game until you are a very good tournament chess player. Until you are much better, it is far, far more important to have a general understanding of opening principles and a good eye for tactics.

This is not to say that opening study is not very important at higher levels, and if that upsets you in principle about the game, I totally understand. But it distresses me to see novices give up on chess because they are under the impression that at their level it's all about memorizing some killer opening shots.

Note that high-level Go play requires a lot of memorization too (joseki, life and death status of common shapes, etc.).


Knowing an opening isn't the same as memorising moves. Knowing the opening is more about understanding the position, its strengths and weaknesses, the immediate tactics available, and the natural flow of pieces towards their strategical aims.

Yes, tactical lines do require more knowledge of concrete lines, because they are complicated to work out over the board. But at that point the memorisation should be a "summary shortcut" of your understanding or assessment of the position.

If it were about memorisation, what happens when both players reach the end of their memorised lines? If it's not mate, or one player being substantially better off (in which case, why did the other player memorise that line too), how do you proceed from there?

And that's where knowing an opening comes to the fore: the important part of the opening isn't the moves themselves, but how the game opens up to the middle-game. Understanding the types and structure of the middle-game that's derived from the opening is a far more valuable investment of time. For instance, knowing how to play with and against an isolated d-pawn - a significant number of opening variations lead to these kinds of structures. The same goes for hedgehog structures, static double-pawn structures, backward pawns, and outposts on an open file.

Knowing how to play a position is long term more valuable than memorising a routine of moves. Understanding over rote-repetition.

Granted, some positions are predominantly tactical, and those you need to calculate - either at the board, or beforehand. Memorisation is a poor substitute for actual experience.

One of my proudest over-the-board achievements was understanding how to exploit the positional drawbacks of the Black side of a Stonewall Dutch, and beating a player 600-700 Elo stronger than me because I understood the White knight manoeuvre from f3-e5-d3, before exposing Black's backward e6-pawn by opening the centre with the f3 + e4 pawn advances. I spent a week trying to understand one model game of that opening. White's plan was logical, and the strategic aim of exposing the backward e6 pawn was easy to understand and attempt.


That happens mostly at the high levels. At novice to intermediate levels, if your adversary plays an unorthodox move in the opening and you play by principles, you have the advantage.


Yes, that is the reason many good chess players switch to the game of Go. For example, Jan Bogaerts in Belgium was about 2350 Elo before switching to Go.


The analysis is interesting. However, I'm not sure it has much practical value due to transpositions. For example, as white I play 1.Nf3 and if black plays d5 I play d4 and we have a d4 opening. If black plays c5 I play c4 and depending on what black does it will transpose into either an English Opening (1. c4) or a Maroczy Sicilian (1. e4) or an Indian Defence (1. d4).

So basically, my opening move would be classed as 'other' but really it is one of 1.d4, 1.e4, 1.c4 in terms of the classifications of this post.


If any of the alternative paths were common enough, they would show up in the charts as well. I didn't limit the analysis to a particular set of moves; I simply counted all of the paths present in the data set and showed the most common ones. This is why two variations of the Indian Defence show up in the "White's second move" chart.

I think it'd be interesting to try to combine all possible paths for an opening into a single count, but that would probably be complicated if multiple openings can be reached through the same path. (e.g., which opening would the shared path be assigned to?)
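
The counting step described here (tally every path that appears, then show the most common ones) might look roughly like this. It assumes each game is already available as a list of moves in standard notation; the function name and that input format are assumptions for the sketch, not the article's actual code.

```python
from collections import Counter

def count_paths(games, n_plies):
    """Count how often each opening path of n_plies half-moves occurs.

    `games` is an iterable of move lists such as ["d4", "Nf6", "c4"].
    Games shorter than n_plies are skipped.
    """
    paths = Counter()
    for moves in games:
        if len(moves) >= n_plies:
            paths[tuple(moves[:n_plies])] += 1
    return paths

# Toy input, only to show the call shape.
games = [["d4", "Nf6", "c4"], ["d4", "Nf6", "Nf3"], ["e4", "c5", "Nf3"]]
for path, n in count_paths(games, 2).most_common(5):
    print(" ".join(path), n)
```

Because the key is the literal move sequence, 1.Nf3 d5 2.d4 and 1.d4 d5 2.Nf3 are counted as different paths even though they reach the same position, which is exactly the transposition issue raised above.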


The traditional way to handle this is by classifying a game according to the last cataloged position that occurs in it. This is how ECO classification works; you can see its catalog of positions at http://www.chessgames.com/chessecohelp.html.

For example, just after White plays 1.Nf3, the game is classified as A04, but after 1...d5 2.d4, it's now officially a D02, over in the Queen's Pawn category, just as it would have been if the game had started 1.d4 d5 2.Nf3.

Databases usually keep track of chess openings played by ECO code rather than by specific moves, exactly so that these transpositions are handled smoothly.
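
To make the "last cataloged position" idea concrete, here is a minimal sketch. It keys the catalog on the position (piece placement plus side to move) rather than on the move order, so transpositions land on the same code. The catalog holds only the two codes mentioned above; a real ECO table is far larger, and the tiny board model here (coordinate moves like "g1f3", no castling, en passant or promotion) is a simplification assumed for illustration.

```python
FILES = "abcdefgh"

def start_position():
    """Map square name -> piece letter (uppercase White, lowercase Black)."""
    board = {}
    for f, piece in zip(FILES, "rnbqkbnr"):
        board[f + "1"] = piece.upper()
        board[f + "2"] = "P"
        board[f + "7"] = "p"
        board[f + "8"] = piece
    return board

def position_key(board, white_to_move):
    """Canonical string for a position: sorted placement plus side to move."""
    placement = ",".join(sq + pc for sq, pc in sorted(board.items()))
    return placement + (" w" if white_to_move else " b")

def play(moves):
    """Yield the position key after each coordinate move (e.g. 'g1f3')."""
    board = start_position()
    white_to_move = True
    for mv in moves:
        board[mv[2:4]] = board.pop(mv[:2])  # plain move or capture only
        white_to_move = not white_to_move
        yield position_key(board, white_to_move)

def build_catalog(entries):
    """entries: (code, defining move list); index by the final position."""
    catalog = {}
    for code, line in entries:
        *_, key = play(line)
        catalog[key] = code
    return catalog

def classify(moves, catalog):
    """Return the code of the last cataloged position reached in the game."""
    code = None
    for key in play(moves):
        code = catalog.get(key, code)
    return code

# Illustrative two-entry catalog using the codes from the comment above.
CATALOG = build_catalog([
    ("A04", ["g1f3"]),                  # 1.Nf3
    ("D02", ["d2d4", "d7d5", "g1f3"]),  # 1.d4 d5 2.Nf3
])

print(classify(["g1f3"], CATALOG))                  # after 1.Nf3
print(classify(["g1f3", "d7d5", "d2d4"], CATALOG))  # after 1.Nf3 d5 2.d4
```

Both move orders that reach the 2.d4/2.Nf3 position produce the same key, so they get the same code without any special handling of transpositions.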


I wish I knew about this earlier! Thanks for explaining it to me though. :-)


It is quite complicated. The examples I gave are quite simple, but in a lot of cases the transpositions can happen many moves into the game. The Maroczy Sicilian example I referred to is one of these cases, where I might not actually play e4 until say move 8-10, but in a general sense it probably should be classified as an e4 opening.

As an overall guide I think what you have done is fine, but I suspect something in the order of 10-20% of games might be subject to transpositions where the opening move isn't an accurate categorization.


And by the way, two of the most important kinds of transposition are:

1. Players who want to play Queen's Pawn openings but avoid certain lines in them start with an English instead.

2. Players who want to play Queen's Pawn or English openings but avoid certain lines in them start with 1. Nf3 instead.


For some openings, the 'paths' do not really matter. You might have 25 transpositions leading up to the same position after 10 moves. A well-known example is the classic isolated d4 pawn position, which might come from the Nimzo-Indian, the Caro-Kann Panov, the QGD, the Slav, the c3 Sicilian, Alekhine's Defence and others.

Popularity doesn't have a lot to do with the openings themselves but with other things, like a good book being published or a popular match (like a World Championship).

Another thing is that some variations are really popular and have good results, but are at one point refuted by a single game. As a result, the variation dies but still has very good statistics.


I noticed that the queen's pawn opening has grown in popularity considerably.

I don't really play chess anymore, but two decades ago, I played quite a bit. Much of the time, it was against a computer. I couldn't win. I finally found a book that really worked for me, and it taught me the "stonewall" approach, where you (playing white) advance the queen's pawn, lock up the game, and start maneuvering very tactically to restrict the movement of black's king and get a mate.

I found that this was the only way I could beat the computer - largely because it allowed me to think along a very few narrow lines (often involving a sacrifice), while the computer would get bogged down in useless searches. This was all in the late 90s; my guess is that it wouldn't work anymore.

So, I'm wondering if the surge in queen's pawn opening is the result of people being conditioned to play against computers? And it brings up another point - I've heard (man, I wish I could find a link to this theory on the web) that chess masters who lament the rise of computers actually aren't particularly bummed that computers can beat the strongest human chess players. They're bummed that the best human chess players are now playing a style that is heavily influenced by years of practice against a very specific style of opponent - a powerful computer that can't really make a "mistake". Young people have opportunities to train that never existed before, but because they don't train against stronger human players as often, some of the style of the game has been lost. I'm not (and never was) good enough at chess to know if this is true, but if anyone has an angle on it, please chime in!

Oh - by the way, a very satisfying ending here. I knew a very brilliant guy who I couldn't beat at chess in high school. After I read this "stonewall" book, I was all ready for him. I advanced queen's pawn, all ready to lock it down and take away his advantage. He simply advanced his king's pawn, basically telling me, go ahead and take it. The wide open game against an inferior opponent was worth far more to him than a pawn.

Good for him, though not so good for me ;) That's the kind of thing you do when you're experienced playing against people rather than computers.


Was that book with the Stonewall Attack "How to Think Ahead in Chess"? http://www.amazon.com/How-Think-Ahead-Chess-Techniques/dp/06...

I don't think that the current popularity of 1.d4 is due to playing against computers. The main reason it is currently more popular than 1.e4 at the highest levels is that Black has found a very effective way to counter 1.e4 (the Berlin variation of the Ruy Lopez) that makes it difficult for White to achieve more than a draw. This high-level popularity then trickles down to lower levels.

It is definitely true that the style of play has changed somewhat due to computer engines, but it's not so much the result of playing against computers (top players generally don't find that very rewarding), but of analyzing with computers. You can play through a game or an opening and explore variations, with the computer constantly telling you what it thinks the best moves are and who is winning by how much. As a result, players become trained to evaluate positions closer to the way a computer would. One example is that computers don't mind grabbing material and defending an unpleasant position for a long time if they don't think that the opponent can break through. This has given players confidence to be more materialistic than they were in the past.


Yes, that was the book!

This also reminds me of a Martin Amis novel, "The Information" (based on the publication date, I'd guess I read it around the same time I was learning the stonewall attack). There's a section where one writer is determined to defeat his friend at everything (tennis, chess, a few others), and he hires people to help him.

The chess teacher, I believe, teaches him something that sounds like the stonewall defense, and reflects that it almost feels like he's teaching someone to cheat at chess rather than play it. The writer isn't cheating of course (at all), but I think it feels this way to the chess master/instructor because the writer is learning the stonewall precisely because he doesn't really want to engage in the actual competition. He wants to sidestep it in order to win.

I'm not trying to knock the queen's pawn defense, more the way I (and this writer) were trying to use it. I'm sure queen's pawn openings can be very creative. But in my case, this is why I considered my buddy's play of the king's pawn a "satisfying ending." I was (to use the tennis analogy) using a pusher's approach. When you watch two people playing a game, and it's clear that one person is trying to win by shutting down the game rather than really playing it, you tend to root for the person going for creative play.


The visualisation I would most like to see with respect to openings is a 'heat map' like diagram, showing how likely a given piece is to occupy each square after n moves.

Are there any squares never used after the first move?

Just what weight is given to the centre of the board after the first few moves?

This sort of information could be communicated really well with a view of the board, and the pieces expected to be in each square.
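
The numbers behind such a heat map could be gathered along these lines. This is only a sketch: it assumes games are given as coordinate moves ("e2e4"), and the board model ignores castling, en passant and promotion, so it is only trustworthy for the first handful of moves unless those are added. The function names are invented for the example.

```python
from collections import Counter, defaultdict

FILES = "abcdefgh"

def start_position():
    """Map square name -> piece letter (uppercase White, lowercase Black)."""
    board = {}
    for f, piece in zip(FILES, "rnbqkbnr"):
        board[f + "1"] = piece.upper()
        board[f + "2"] = "P"
        board[f + "7"] = "p"
        board[f + "8"] = piece
    return board

def occupancy_after(games, n_plies):
    """Return {square: {piece: fraction of games}} after n_plies half-moves.

    Games with fewer than n_plies moves are skipped.
    """
    counts = defaultdict(Counter)
    used = 0
    for moves in games:
        if len(moves) < n_plies:
            continue
        board = start_position()
        for mv in moves[:n_plies]:
            board[mv[2:4]] = board.pop(mv[:2])  # plain move or capture only
        for square, piece in board.items():
            counts[square][piece] += 1
        used += 1
    return {
        square: {piece: c / used for piece, c in ctr.items()}
        for square, ctr in counts.items()
    }

# Toy input, only to show the call shape.
games = [["e2e4", "e7e5"], ["d2d4", "d7d5"], ["e2e4", "c7c5"]]
heat = occupancy_after(games, 2)
print(heat.get("e4"), heat.get("d4"))
```

Squares missing from the result were never occupied in any of the games at that point, which would answer the "squares never used" question directly for a given data set.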


Out of curiosity, is chess solvable yet by computers? Meaning, is it possible to simply brute force every possible legal game up to n moves and determine all the winning and losing move sets? What's this number look like theoretically?

(I'm sure in the general sense games with a very large n aren't as I suppose a game could be played in perpetuity)


No, it is not yet possible to brute force all chess games. The largest extant endgame database has full solutions for seven pieces on the board. Once you get to that point, chess is solved. ;-)


Just to clarify what you alluded to, it's theoretically possible for chess to be solved, and progress is being made slowly, but surely (it took around 7 years for chess to be solved up to 7 pieces). Of course, with each additional piece comes an exponentially larger set of positions, so progress from 7 to 8 pieces should take much longer assuming there aren't any massive breakthroughs in computing speed.


AFAIK, the most complex board game solved to date is checkers: http://en.wikipedia.org/wiki/Chinook_(draughts_player) Chess is also solvable, but researchers are far from solving it yet.


An important distinction here is that checkers is also only weakly solved (Connect Four, by contrast, is strongly solved). https://en.wikipedia.org/wiki/Solved_game

This means that Chinook can play perfectly from the start position against any set of opposing moves, but if you play a move on Chinook's side that it wouldn't play for itself, the resulting position is probably not solved by Chinook.

An arbitrary position is in fact very unlikely to have been solved.


>every possible legal game up to n moves

That's easily above 10^30 positions. 10^30 is a very low lower bound, too.

So, uh, no.


Well, that's why I specified n moves. Entire games can take 2 moves. So for n=2, it's probably pretty easily solvable. Probably for n=3 and maybe 4 as well.

It's also not strictly exponential as n increases: pieces are removed from the board, for example, restricting the number of legal moves as the game progresses.

There are probably interesting solvable subsets up to large numbers of n, like maximum captures as the game progresses, or all games where there are zero captures or some such.

It's not a simple matter of calculating every possible theoretical legal or non-legal state of the board with all the pieces.
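
For a sense of scale at small n, the number of legal move sequences from the starting position is known exactly for the first few plies (half-moves); these are the standard "perft" figures used to test move generators. The sketch below just looks at how fast they grow and does a crude extrapolation that assumes the last observed growth factor keeps holding, which is a simplification. Note that these count move sequences, not distinct positions, and n here is in plies, so "2 moves" by each side is ply 4.

```python
# Legal move sequences from the start position, by ply (standard perft values).
KNOWN_SEQUENCES = {
    1: 20,
    2: 400,
    3: 8_902,
    4: 197_281,
    5: 4_865_609,
    6: 119_060_324,
}

def growth_factors(counts):
    """Ratio of each ply's count to the previous ply's count."""
    plies = sorted(counts)
    return {p: counts[p] / counts[p - 1] for p in plies[1:]}

def plies_to_exceed(target, counts):
    """Crude estimate of the ply at which the count passes `target`.

    Assumes growth continues at the last observed factor (an assumption,
    not a property of chess).
    """
    last = max(counts)
    factor = counts[last] / counts[last - 1]
    total, ply = counts[last], last
    while total < target:
        total *= factor
        ply += 1
    return ply

for ply, factor in growth_factors(KNOWN_SEQUENCES).items():
    print(f"ply {ply}: x{factor:.1f}")
print("estimated ply to pass 10^30 sequences:", plies_to_exceed(10**30, KNOWN_SEQUENCES))
```

So enumerating everything for n of 2, 3 or 4 really is easy; the trouble is that each extra ply multiplies the work by roughly the number of legal moves available.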


>as n increases, pieces are removed from the board for example restricting the number of legal moves //

As the pawns are moved to make way for the back row the number of legal moves increases very quickly. For example if white plays d2-d3 then it frees the King, Queen and Bishop to move; 2 possible moves are removed and 8 are added.


Right, but eventually the number of possible next moves starts to drop. A game with a King, Queen, and Two Bishops has more possible valid next moves than a game with just a King and a Bishop left.


Great blog post. It's interesting to compare my knowledge of opening evolution with the historic data. During the Romantic Era, King's Pawn games were clearly the norm. When Reti and Nimzowitsch introduced hypermodernism in the 1920s, Indian openings became much more popular.


I don't know much about openings, but I played chess frequently when I was younger, sometimes against a computer. It's just a pattern. But to win against the computer, you have to avoid the standard patterns and openings. It's been a very long time since I played the Chessmaster software.


I'm under the impression that introducing the Queen too early (say < 6 moves) usually leads to it being taken and that player losing.

I don't know how true this actually is, though. That would also be a good thing to look into.


Indeed, one possible consequence of developing the queen early is that it can be harried with tempo by the opponent; that is, he gets to make you respond while making developing moves he wanted to make anyway.

However, except in cases of gross incompetence it doesn't lead to your queen being taken, it's just suboptimal. There are also plenty of exceptions to this rule, of course. The most obvious one is the main line of the Scandinavian Defense, 1.e4 d5 2.exd5 Qxd5. This isn't a grade-A opening but it's a solid B+ and occasionally gets used at the highest levels.


Where does one download the data? The first post in the series links to the website, but there don't seem to be any download links anywhere.



