This was an interesting project at Google; it started when I was there, and it was breaking things when I left. It is too bad that Ken Thompson didn't at least get acknowledged for his role in making it happen.
I don't think it will be as influential as the original GFS was, but it's an important piece of work that folks should study.
No, I think it's critical. I worked on one of the first services to ever use Spanner when I was an intern. Lock-free read transactions are a game changer. Short answer -- if your database system can't do lock-free reads, your database is broken. That one feature allows one to do some incredible performance optimizations.
Ken sat near Jeff, Sanjay, and co while they were designing Spanner, and he regularly consults in an informal capacity on people's projects. I wasn't there, but it wouldn't surprise me at all if Ken's unique insight contributed to Spanner's design.
Ken's mind just works in a different way to most people. You explain your problem to him and he'll respond with some question or statement that turns your entire perspective inside out.
"The aggressive use of a small number of abstractions is, I think, the direct result of a very small number of people who interact closely during the implementation."
When I was in the platforms group looking at storage issues, the Spanner requirements had a lot of commentary from Ken in them, so much so that I thought it was his idea/project until someone corrected me a bit later. That was why I was surprised he wasn't acknowledged; from where I sat it seemed like he was one of the architects of the effort. Apparently that wasn't the case.
That's a nice way to put it. That's exactly why these inventions are so interesting: they seem to give insight into problems of another order of magnitude.
Interestingly, the data storage seems similar to Rich Hickey's Datomic: "data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps."
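To make the quoted behavior concrete, here is a toy sketch of a versioned store: every write carries a commit timestamp, reads can ask for the value as of an older timestamp, and old versions can be garbage-collected. The class and method names (VersionedStore, write, read, gc) are made up for illustration; this is not Spanner's or Datomic's API, just the general multi-version idea.

```python
import bisect


class VersionedStore:
    """Toy multi-version key-value store (hypothetical, for illustration only)."""

    def __init__(self):
        # Per key: parallel lists of commit timestamps (sorted) and values.
        self._ts = {}
        self._vals = {}

    def write(self, key, value, commit_ts):
        """Add a new version of key, stamped with its commit time."""
        ts = self._ts.setdefault(key, [])
        vals = self._vals.setdefault(key, [])
        i = bisect.bisect_right(ts, commit_ts)
        ts.insert(i, commit_ts)
        vals.insert(i, value)

    def read(self, key, at_ts):
        """Return the newest value committed at or before at_ts, else None."""
        ts = self._ts.get(key, [])
        i = bisect.bisect_right(ts, at_ts)
        return self._vals[key][i - 1] if i else None

    def gc(self, keep_after_ts):
        """Drop old versions, keeping the one still visible at keep_after_ts."""
        for key, ts in self._ts.items():
            i = bisect.bisect_right(ts, keep_after_ts)
            cut = max(i - 1, 0)
            del ts[:cut]
            del self._vals[key][:cut]


store = VersionedStore()
store.write("x", "a", commit_ts=10)
store.write("x", "b", commit_ts=20)
old_value = store.read("x", at_ts=15)  # sees the version committed at 10
new_value = store.read("x", at_ts=25)  # sees the version committed at 20
```

Reads at a fixed timestamp never need to block writers here, because a later write only appends a newer version and never changes what an older timestamp sees.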
I think the major contribution in this paper is how to do consistent snapshot reads in a distributed system without a common reference clock, i.e. the use of TrueTime.
Many databases use some sort of MVCC, but they operate on a single node or in a closely connected cluster. This paper shows how to achieve the same properties in a system spanning continents.
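As I understand the paper, the trick is that the clock API returns an interval instead of a single instant, and a writer waits out the uncertainty before making a commit visible. Here is a rough sketch of that "commit wait" idea. FakeTrueTime, its fixed epsilon, and the commit helper are all assumptions for illustration; the real system derives its uncertainty bound from GPS and atomic clock references, and this is not its actual interface.

```python
import time
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    earliest: float
    latest: float


class FakeTrueTime:
    """Stand-in for a clock with bounded uncertainty (epsilon is an assumed value)."""

    def __init__(self, epsilon=0.005, clock=time.monotonic):
        self.epsilon = epsilon
        self._clock = clock

    def now(self):
        # The true time is assumed to lie somewhere inside this interval.
        t = self._clock()
        return Interval(t - self.epsilon, t + self.epsilon)

    def after(self, ts):
        # True only once ts has definitely passed, whatever the clock error.
        return self.now().earliest > ts


def commit(tt, apply_write):
    """Pick a commit timestamp, apply the write, then wait out the uncertainty."""
    commit_ts = tt.now().latest
    apply_write(commit_ts)
    # Commit wait: do not report success until commit_ts is surely in the past.
    while not tt.after(commit_ts):
        time.sleep(tt.epsilon / 10)
    return commit_ts


tt = FakeTrueTime(epsilon=0.002)
log = []
first_ts = commit(tt, log.append)
second_ts = commit(tt, log.append)
```

Because each commit waits roughly two epsilons, any transaction that starts after an earlier one has finished is guaranteed a larger timestamp, which is what lets a snapshot read at a given timestamp be consistent across machines. The cost is the extra latency of the wait, so a smaller clock uncertainty directly means faster commits.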
Another observation that struck me when I read this (and after reading the Percolator and Megastore papers) is how there is a convergence of the "traditional" relational DB world and the "new NoSQL" world.
Relational databases are becoming more scalable, partially with new technology, partially by shedding features in some scenarios.
And the NoSQL stores are becoming less "NoSQL" (it was never really about "NoSQL" anyway, but that's a different story). All of these stores have layers or features that bring them closer to the traditional SQL/relational model.
This looks like the High Replication Datastore, which is now the default in App Engine: Paxos replication, a choice between strong and eventual consistency, and tablet sharding. Interesting that they've already built it and it's available for everyone to use.
Even more interesting, "Spanner" is English for "something that spans", as in a database spanning the world.
Maybe it's a bit snarky, but I really don't see how you can read into something like that. It reminds me of the following Jack Handey quote:
Maybe in order to understand mankind, we have to look at the word itself: "Mankind". Basically, it's made up of two separate words - "mank" and "ind". What do these words mean? It's a mystery, and that's why so is mankind. - Jack Handey
Did you read the first page? BigTable has no transactions, and scales, but is a pain for apps that need consistency. Spanner adds transactions for apps that need them, at scale, charging a tax in the form of latency.
Using two different clock technologies per node (GPS and atomic!) and light speed networking helps make this manageable.
We read about this at work at Google a few months ago in a reading group (perks of the job), and we spent almost the entire time talking about the timestamps. It is perhaps the most important piece of this paper. Fault tolerant time is right.
Yeah, I've been doing research in distributed systems and the timestamp part of this paper is incredibly interesting to me. It's awesome that I might actually get to cite something more recent than Lamport.