Spanner: Google's Globally-Distributed Database (research.google.com)
258 points by SriniK on Sept 15, 2012 | 55 comments



This was an interesting project at Google, it started when I was there, and it was breaking things when I left. It is too bad that Ken Thompson didn't get at least acknowledged for his role in making it happen.

I don't think it will be as influential as the original GFS was, but it's an important piece of work that folks should study.


No, I think it's critical. I worked on one of the first services to ever use Spanner when I was an intern. Lock-free read transactions are a game changer. Short answer: if your database system can't do lock-free reads, your database is broken. That one feature allows one to do some incredible performance optimizations.


What _exactly_ is a lock-free read transaction? Is it different to reading in a MVCC system?


>if your database system can't do lock-free reads, your database is broken

Yep.


I know CouchDB doesn't do read locking. What are other ones out there?


Not doing read locking is not a game-changer.

Firebird doesn't do read locking. Neither does Lotus Notes. Both have been around about 20 years.


Not doing read locking alone. Combine it with a planet-scale data storage system...


Postgres and Oracle?


PostgreSQL uses MVCC to ensure ACID compliance without read locks. Uncommitted concurrent transactions are isolated from each other.
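Roughly, a "lock-free read" here means a read-only transaction that picks a snapshot timestamp and returns the versions committed at or before it, so it never waits on writers. A minimal single-node, single-threaded sketch of that MVCC idea; the class and method names are hypothetical, write-write conflicts are ignored, and this is not how PostgreSQL or Spanner actually store versions:

```python
from collections import defaultdict


class MVCCStore:
    """Toy multi-version store: writers append versions, readers pick a snapshot."""

    def __init__(self):
        # key -> list of (commit_ts, value), appended in commit order
        self._versions = defaultdict(list)
        self._clock = 0  # hypothetical logical commit counter for a single node

    def commit(self, writes):
        """Apply a dict of writes at one new commit timestamp."""
        self._clock += 1
        ts = self._clock
        for key, value in writes.items():
            self._versions[key].append((ts, value))
        return ts

    def snapshot(self):
        """A read-only transaction only needs to remember a timestamp."""
        return self._clock

    def read(self, key, snapshot_ts):
        """Newest version with commit_ts <= snapshot_ts; later commits are invisible."""
        for ts, value in reversed(self._versions.get(key, [])):
            if ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.commit({"balance": 100})
snap = store.snapshot()          # reader pins timestamp 1
store.commit({"balance": 50})    # a writer commits at timestamp 2
print(store.read("balance", snap))              # 100
print(store.read("balance", store.snapshot()))  # 50
```

The reader never blocks the writer or vice versa: the new commit simply adds a version that the older snapshot does not see.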


Datomic.


"non-blocking reads in the past"

Sounds like google's finally invented time travel.


Actually, this is exactly how transactions in Oracle work. The difference is one DB server (Oracle) vs. a distributed system (Google).


Oracle doesn't have to be one DB server. Check out Oracle RAC for instance.


I would also be interested in a longer, elaborated answer.


The lessons are pretty much the same as the ones functional programming has been trying to teach us for years: immutability and caching.

Beyond that I would rather not elaborate for reasons of confidentiality.


Value addressing. MVCC.


What is "value addressing"? That's the first time I've seen that term and google doesn't bring anything relevant up.


This is a wild guess, but I guess it might be a synonym for content addressing

http://en.wikipedia.org/wiki/Content-addressable_storage#Con...
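If "value addressing" does mean content addressing (that is only the guess above), the core idea is that a blob's address is a hash of its bytes, so stored values are effectively immutable and safe to cache. A minimal sketch assuming SHA-256 as the hash; `ContentStore` is a hypothetical name:

```python
import hashlib


class ContentStore:
    """Toy content-addressable store: a blob's address is its SHA-256 digest."""

    def __init__(self):
        self._blobs = {}

    def put(self, data: bytes) -> str:
        address = hashlib.sha256(data).hexdigest()
        # Identical content always maps to the same address, so a second put is a no-op.
        self._blobs.setdefault(address, data)
        return address

    def get(self, address: str) -> bytes:
        data = self._blobs[address]
        # The address is derived from the content, so every read can be verified.
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored blob does not match its address")
        return data


store = ContentStore()
a = store.put(b"hello")
print(a == store.put(b"hello"))  # True: same content, same address
print(store.get(a))              # b'hello'
```

"Changing" a value means storing a new blob under a new address; the old address still points at the old bytes.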


+1. Neologism of an autodidact. IMHO it is more correct, regardless.


Elaborate?


> It is too bad that Ken Thompson didn't get at least acknowledged for his role in making it happen.

Interesting, can you say more about this?

Is he not mentioned because officially he is part of the Go team?


Ken sat near Jeff, Sanjay, and co while they were designing Spanner, and he regularly consults in an informal capacity on people's projects. I wasn't there, but it wouldn't surprise me at all if Ken's unique insight contributed to Spanner's design.


I'm familiar with Ken Thompson, so I'm more puzzled than someone who isn't familiar with his work might be.

What exactly is his unique insight? Do you know any specifics or are you just speaking on behalf of the fact that he's a famous programmer?

I say this as someone whose Planetside2 character is named: "KenThompsonHackerExtraordinaire"


Ken's mind just works in a different way to most people. You explain your problem to him and he'll respond with some question or statement that turns your entire perspective inside out.


Enneff works with Ken in the Go team at Google.

As for his particular insight, if you are familiar with his work, that should be enough.

For those not familiar with his work, this interview might be a good starting point:

http://genius.cat-v.org/ken-thompson/interviews/unix-and-bey...


This is golden:

"The aggressive use of a small number of abstractions is, I think, the direct result of a very small number of people who interact closely during the implementation."


It's from 1997, I guess? Seems like he was wrong about Linux and maybe Microsoft :)


When I was in the platforms group looking at storage issues the Spanner requirements had a lot of commentary from Ken in them, so much so that I thought it was his idea/project until someone corrected me a bit later. That was why I was surprised he wasn't acknowledged, from where I sat it seemed like he was one of the architects of the effort. Apparently that wasn't the case.


I work on HBase (the Apache version of BigTable). It makes me sad to see how far ahead Google is compared to the rest of the world. :)

The notion of uncertain time is ingenious.


I think that's more a factor of Google's scaling needs vs the rest of the world. We needed to invent it first so we did.


That's a nice way to put it. That's exactly why these inventions are so interesting: they seem to give insight into problems of another order of magnitude.


At least they understand by sharing this information it moves the technology forward. You don't see a lot of other big companies doing that.


Interestingly, the data storage seems similar to Rich Hickey's Datomic: "data is versioned, and each version is automatically timestamped with its commit time; old versions of data are subject to configurable garbage-collection policies; and applications can read data at old timestamps."


That's exactly like BigTable[1]. It makes sense that they built on top of that.

[1] http://static.googleusercontent.com/external_content/untrust...


But you can mutate Bigtable cells. Datomic seems dramatically different in that respect.


Can you? Or do some apps just always ask for the latest timestamped version when they read?


You could, but it's not enforced. In practice, teams at Google seem to use the time axis in myriad ways, and seldom like Datomic does.

Also, always reading the most recent timestamp doesn't use time like Datomic does. You aren't querying by time and so on.
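To make the "latest version vs. querying by time" distinction concrete, here is a toy Bigtable-style cell that keeps several timestamped values and garbage-collects old ones. The keep-the-newest-N policy is one of the configurable GC policies the quoted passage alludes to; the class, its fields, and the choice of N are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class VersionedCell:
    """Toy cell holding several timestamped values, newest first."""
    max_versions: int = 3  # hypothetical GC policy: keep only the newest N versions
    versions: List[Tuple[int, str]] = field(default_factory=list)

    def put(self, ts: int, value: str) -> None:
        self.versions.append((ts, value))
        self.versions.sort(key=lambda v: v[0], reverse=True)  # newest first
        del self.versions[self.max_versions:]  # garbage-collect the oldest versions

    def latest(self) -> Optional[str]:
        """What an app that 'always asks for the latest version' sees."""
        return self.versions[0][1] if self.versions else None

    def as_of(self, ts: int) -> Optional[str]:
        """Query by time: the newest value written at or before ts."""
        for version_ts, value in self.versions:
            if version_ts <= ts:
                return value
        return None


cell = VersionedCell(max_versions=2)
for ts, v in [(10, "a"), (20, "b"), (30, "c")]:
    cell.put(ts, v)
print(cell.latest())   # c
print(cell.as_of(25))  # b
print(cell.as_of(15))  # None: the version written at ts=10 was garbage-collected
```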


Timestamp versioning is one of the oldest (1978) ideas in distributed systems: http://patricklogan.blogspot.ca/2007/09/naming-and-synchroni...


MVCC has been around for at least thirty years, but it's interesting that we have seen more databases with this feature recently.


Almost all databases use it in one form or another.

PostgreSQL uses it, Oracle uses it, MySQL (innodb) uses it, Apache HBase uses it, the list goes on and on...


I think the major contribution in this paper is how to do consistent snapshot reads in a distributed system without a common reference clock, i.e. the use of TrueTime.

Many databases use some sort of MVCC, but they operate on a single node or in a closely connected cluster. This paper shows how to achieve the same properties in a system spanning continents.


Another observation that struck me when I read this (and after reading the Percolator and Megastore papers) is how there is a convergence of the "traditional" relational DB world and the "new NoSQL" world. Relational databases are becoming more scalable, partially with new technology, partially by shedding features in some scenarios. And the NoSQL stores are becoming less so (it was really about "NoSQL" anyway, but that's a different story). All of these stores have layers or features that bring them closer to the traditional SQL/relational model.

Spanner appears to strike a nice middle ground.


Is Spanner written in C++ or Java?


C++ (cc)


Another research publication from Google that's more than worth reading.

These just pile up, I must find time and get my hands on them...


This looks like the High Replication Datastore, which is now the default in App Engine: Paxos replication, a choice between strong and eventual consistency, and tablet sharding. Interesting that they've already built it and it's available for everyone to use.
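Paxos-style replication commits a value only once a majority of replicas accept it; what makes that safe is that any two majorities share at least one replica, so a later quorum always sees the earlier decision. A brute-force check of that quorum arithmetic, as a sketch only (this is not Paxos itself, and the function names are hypothetical):

```python
from itertools import combinations


def is_majority(acks: int, replicas: int) -> bool:
    """A write counts as accepted once strictly more than half the replicas ack it."""
    return acks > replicas // 2


def majorities_always_intersect(replicas: int) -> bool:
    """Check that every pair of minimal majorities shares at least one replica."""
    size = replicas // 2 + 1
    quorums = [set(q) for q in combinations(range(replicas), size)]
    return all(q1 & q2 for q1 in quorums for q2 in quorums)


print(is_majority(3, 5))               # True
print(is_majority(2, 4))               # False: {0, 1} and {2, 3} could disagree
print(majorities_always_intersect(5))  # True
```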


Fun fact: Spanner means voyeur in German slang.

Anyway, looks like a very exciting project. One could come up with so many applications.


Interestingly, "Spanner" is German for "voyeur". Coming from Google it's almost kind of ironic.


Even more interesting, "Spanner" is English for "something that spans", as in a database spanning the world.

Maybe it's a bit snarky, but I really don't see how you can read into something like that. It reminds me of the following Jack Handey quote:

Maybe in order to understand mankind, we have to look at the word itself: "Mankind". Basically, it's made up of two separate words - "mank" and "ind". What do these words mean? It's a mystery, and that's why so is mankind. - Jack Handey


A spanner is also British English for what North Americans call a wrench.


Also, colloquially, for an idiot.


Transactions don't scale. They really need to use NoSQL.


Did you read the first page? BigTable has no transactions and scales, but is a pain for apps that need consistency. Spanner adds transactions for apps that need them, at scale, charging a tax in the form of latency.

Using two different clock technologies per node (GPS and atomic!) and light-speed networking helps make this manageable.

Fault-tolerant time!
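The paper says its time daemons poll several time masters and apply a variant of Marzullo's algorithm to detect and reject "liars". The classic Marzullo's algorithm (not Google's variant, which the paper does not spell out) finds the smallest interval consistent with the largest number of sources; a sketch, with `find_liars` as a hypothetical helper:

```python
def marzullo(intervals):
    """Return (count, lo, hi): the region covered by the most source intervals."""
    edges = []
    for lo, hi in intervals:
        edges.append((lo, -1))  # interval starts
        edges.append((hi, +1))  # interval ends
    # Sort by offset; at equal offsets starts (-1) sort before ends (+1),
    # so intervals that merely touch still count as overlapping.
    edges.sort()
    best = count = 0
    best_lo = best_hi = None
    for i, (offset, kind) in enumerate(edges):
        count -= kind
        if count > best:
            # count only rises on a start, so a matching end edge follows.
            best, best_lo, best_hi = count, offset, edges[i + 1][0]
    return best, best_lo, best_hi


def find_liars(intervals):
    """Sources whose interval does not overlap the consensus region."""
    _, lo, hi = marzullo(intervals)
    if lo is None:
        return []
    return [iv for iv in intervals if iv[1] < lo or iv[0] > hi]


sources = [(8, 12), (11, 13), (10, 12), (20, 21)]
print(marzullo(sources))    # (3, 11, 12)
print(find_liars(sources))  # [(20, 21)]
```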


We read about this at work at Google a few months ago in a reading group (perks of the job), and we spent almost the entire time talking about the timestamps. It is perhaps the most important piece of this paper. Fault-tolerant time is right.


Yeah, I've been doing research in distributed systems and the timestamp part of this paper is incredibly interesting to me. It's awesome that I might actually get to cite something more recent than Lamport.
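For contrast, Lamport's logical clocks order events consistently with causality using only counters, but the resulting timestamps say nothing about real time; TrueTime's bounded-uncertainty wall-clock time is what lets Spanner's timestamps line up with real-time order. A minimal sketch of Lamport's rules (the class name is hypothetical):

```python
class LamportClock:
    """Lamport logical clock: orders events without synchronized physical clocks."""

    def __init__(self):
        self.time = 0

    def tick(self) -> int:
        """Local event or message send: advance the counter."""
        self.time += 1
        return self.time

    def receive(self, msg_time: int) -> int:
        """Message receipt: jump past both the local and the sender's timestamp."""
        self.time = max(self.time, msg_time) + 1
        return self.time


a, b = LamportClock(), LamportClock()
sent = a.tick()         # a sends a message at time 1
b.tick()
b.tick()                # b has two local events: times 1 and 2
print(b.receive(sent))  # 3: the receipt is ordered after the send
```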



