Cloud Spanner is now production-ready (googleblog.com)
253 points by jonas21 on May 16, 2017 | 151 comments



One line summary: “99.999% availability and strong consistency — without compromising latency”

(I work at G)


5 minutes a year is really impressive.


It is, but the details of the SLA aren't that great.

https://cloud.google.com/spanner/sla

"'Downtime' means, with respect to any Production-Grade Cloud Spanner Instance, more than a five percent Error Rate for the instance. Downtime is measured based on server side Error Rate. 'Downtime Period' means a period of five consecutive minutes of Downtime with a minimum of 60 requests per minute. Intermittent Downtime for a period of less than five minutes will not be counted towards any Downtime Periods."

For monthly uptime between 99.0% and 99.999%, the customer is credited with 10% of the monthly bill. The customer is required to track the uptime and request an SLA credit.
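To make that definition concrete, here's a rough sketch (Python, using hypothetical per-minute request/error counters; this is just one reading of the quoted wording, not Google's actual measurement pipeline) of which minutes would count toward a Downtime Period:

```python
# Hypothetical sketch of the SLA wording above: given per-minute
# (requests, errors) samples, a minute is "Downtime" when the error rate
# exceeds 5%, and only runs of >= 5 consecutive Downtime minutes (each with
# >= 60 requests) count as a "Downtime Period".

def downtime_minutes(samples, min_requests=60, max_error_rate=0.05, min_run=5):
    """samples: list of (requests, errors) tuples, one per minute."""
    total = 0
    run = 0
    for requests, errors in samples:
        down = requests >= min_requests and errors / requests > max_error_rate
        if down:
            run += 1
        else:
            # Intermittent downtime shorter than min_run minutes is discarded.
            if run >= min_run:
                total += run
            run = 0
    if run >= min_run:
        total += run
    return total

# A 3-minute blip counts for nothing; a 6-minute outage counts in full.
blip = [(100, 0)] * 10 + [(100, 50)] * 3 + [(100, 0)] * 10
outage = [(100, 0)] * 10 + [(100, 50)] * 6 + [(100, 0)] * 10
print(downtime_minutes(blip), downtime_minutes(outage))  # 0 6
```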


> Intermittent Downtime for a period of less than five minutes will not be counted towards any Downtime Periods.

That's unfortunate. The most frustrating part about cloud services is exactly this kind of intermittent performance and responsiveness issue.


Part of the problem is that computing the downtime at a higher resolution, in a way that scales from low-traffic users (the quoted minimum of 60 requests per minute is also not a coincidence) to very large ones, is not trivial.

You have to consider the rate at which the raw data is collected from the replicas (and the jitter across all of them!), then the rate at which that data is aggregated, and so on. (The above assumes that Cloud Spanner uses Borgmon for monitoring.) Any hiccups in the collection itself will result in less noise if you look at 5m vs e.g. 1m. You could increase the sampling rate, but then you're making your DB replicas spend more precious cycles on work that is not serving queries or replicating data.


That's like saying 'the most frustrating part about cloud services is that they use the internet.'

You're right, but that's basically 'by-design' as far as the internet is concerned.


It probably won't count network downtime either, since it's measured at the server, not the client. Even if your client is in GAE...


I’m in marketing and not here to pump our own tires, but Spanner is /the/ database beneath so much of Google. We trust and rely on it, so it’s very exciting to me to see other big companies who need mission-critical /and/ global scale performance do the same.


I'd love to use it, but I can't reasonably champion Google's services at work when I'm not allowed to use it for side projects.

When is it going to be available for non-companies in Ireland?


It's about VAT collection. If you sign up and warrant that you do it yourself, you could click the button. However, your note reminded me to kick off a specific email about Ireland as a possibility.

Disclosure: I work on Google Cloud, but I'm definitely not a lawyer, tax lawyer, accountant or any such financial type.


Especially considering that all of AWS's database solutions require a 20-minute maintenance window per week!


The RDS service is literally scripted EC2 instances running MySQL/Postgres/SQL Server etc. Anything you do in the RDS console/API runs (or schedules) scripts on EC2 instances. If a given maintenance task (say, upgrading the DB engine) would cause downtime on a self-managed DB instance, it will cause downtime on RDS too.

Some people get disillusioned at this fact, thinking RDS was something more magical. But AWS does offer the Multi-AZ option, which does the maintenance on the standby, fails over, and then performs the same on the main instance. It's effectively transparent. RDS also makes backups, restores (including point in time), encryption using KMS etc. really easy. If you don't have a full-time DBA, RDS provides for a very easy to use DB with virtually no maintenance required, but Multi-AZ is absolutely required for any kind of production deployment.

AWS's answer to Cloud Spanner is much more likely to be a future cross-region replicating version of DynamoDB or Aurora, not RDS.
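For reference, Multi-AZ is just a flag on the instance; a minimal boto3 sketch (identifiers and values are placeholders, and credentials are assumed to be configured) looks roughly like this:

```python
# Rough boto3 sketch (placeholder names/values): MultiAZ=True asks AWS to
# maintain a synchronous standby in another AZ and fail over to it during
# maintenance or AZ problems.
import boto3

rds = boto3.client("rds", region_name="us-east-1")

rds.create_db_instance(
    DBInstanceIdentifier="example-db",
    Engine="postgres",
    DBInstanceClass="db.m4.large",
    AllocatedStorage=100,                      # GiB
    MasterUsername="dbadmin",
    MasterUserPassword="change-me",
    MultiAZ=True,                              # standby + automatic failover
    BackupRetentionPeriod=7,                   # enables automated backups/PITR
    PreferredMaintenanceWindow="sun:06:00-sun:06:30",
)
```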


As far as Multi-AZ is concerned, it's a bit of a wash. A month ago when the east coast had its issues, Multi-AZ didn't do anything. Every time I need to reboot an instance with failover, it's down/inaccessible for the same amount of time as if I had just rebooted it plainly.

I'm sure there are scenarios where it's worked well for people, but I've never seen it happen.


I see multi-region saving you from many disasters; multi-AZ seems like a waste. What percentage of failures hit only one AZ? The costs are certain (higher latency, bandwidth charges), but the real-world gain of multi-AZ is unclear.


What are they doing during those 20 minutes?

Define "maintenance window" - the DB becomes read-only? Or unreachable? This sounds insane to me.


Actions that require database downtime/restarts can happen during the maintenance window. For example, if you change a DB parameter that requires a database restart, you can tell AWS to restart the DB during the next maintenance window. Same goes for DB engine upgrades and the like.

If you're running anything production-worthy, you'll be using one of AWS's multi-AZ/failover solutions for your production RDS instances. In that case, the database will perform a failover in those instances so there isn't downtime.
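Concretely, instance-level changes such as an engine version upgrade can be queued for the next maintenance window instead of applied right away; a hedged boto3 sketch (placeholder identifier and version):

```python
# Sketch (placeholder values): ApplyImmediately=False defers the change, and
# any restart it requires, to the instance's next maintenance window. Static
# DB parameters are changed via parameter groups and take effect on reboot.
import boto3

rds = boto3.client("rds")

rds.modify_db_instance(
    DBInstanceIdentifier="example-db",
    EngineVersion="9.6.6",        # hypothetical target minor version
    ApplyImmediately=False,       # wait for the maintenance window
)
```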


That depends on the parameter being changed; not all parameter changes are compatible with block-level replication. The MySQL InnoDB log file size, for example, needs to be changed on every instance.


> In that case, the database will perform a failover in those instances so there isn't downtime.

Have you done that? I have and there's been downtime every time (~5m).


Not usually, but it sometimes will go offline for 5 or 10 minutes or more. It fails over essentially.


That doesn't mean they're unavailable for 20 minutes weekly, probably because of the nature of AWS's multi-AZ RDS setup.

At our shop, we haven't had any problems with RDS maintenance at all -- and we would indeed get paged for even 1 minute of database downtime, much less 20 (!). That said, we don't have that sort of monitoring in place for any of the test/staging DBs (which are not multi-AZ).


That's a window when they _can_ do maintenance, it's pretty rare for them to actually do it.


The SLA document [0] says 99.95% so perhaps you meant per month?

0 - https://aws.amazon.com/rds/sla/


You see the trick here: that's 99.95% outside of their regularly scheduled maintenance!


Oh! Now I see. I haven't used RDS for anything critical yet but thought it'd be a no-brainer.

insert mind expanding gif


In practice it works just fine. They just make you give them permission to blow your app up for twenty minutes a week (and wink and say maybe they won't actually do it this week).


With respect to SLAs, the two important things in my opinion are:

* How precisely is it measured?
* What happens if it is not met?

As observed by the other posters, scheduled maintenance doesn't count.


That really should be in the title of the article.


I think people calling this expensive are missing the point: the main use case for this is large organizations that truly need to scale globally, at which point you'd happily sink many times this much money to have that problem magically solved for you.


Not to mention Spanner was first to pull off this combo of performance, consistency, and availability. Of the others claiming similar things, one was pulled off the market and one is just getting started (unproven). It's not just cost: they're either going to have to clone Spanner with a clean-slate project, exhaustively analyze/test CockroachDB to see if it's mission-ready, or invent their own magic that does what most DB vendors couldn't pull off. They'd be spending a lot of money on something that would likely crash and burn when all the odd errors hit it.

Most enterprises that have to maintain global consistency are better off with Spanner. I doubt there are many, though, that really need that, versus splitting things into locales that are consistent within a smaller geographical area served by regular DB clusters.


better off using non-free software for mission-critical services

it may be true that spanner is currently better than any free software option, but you are much better off contributing to a free software project that solves this problem than paying google to keep developing their non-free alternative.


The FOSS and commercial sectors tried and failed to do stuff like this for over a decade. I agree developing it is better in the long term, but most don't have the skill to solve that problem. Better to buy a proven solution, make sure you can exit it (not sure what Spanner is like here), and switch if a FOSS solution emerges.


If they don't need SQL, they could try Cosmos DB from MS. It's not proven outside MS yet either, but I do have more faith in their testing than in most open source solutions.


Didn't know about it. It seems...

"Azure Cosmos DB accounts that are configured to use strong consistency cannot associate more than one Azure region with their Azure Cosmos DB account. "

...it lacks Spanner's ability to offer strong consistency across geographical regions. It's otherwise a full-featured, distributed DB service. Lots of tradeoffs allowed.


For those interested, CockroachDB, whose first stable version was just released, takes the same approach as Spanner and is open source. In fact, it was started by a few folks from Google who worked on Spanner and other related tech.


I know Spanner uses atomic clocks to get tight time bounds on transactions. Since CockroachDB doesn't use atomic clocks they have to use a slightly different approach:

"While Spanner provides linearizability, CockroachDB’s external consistency guarantee is by default only serializability, though with some features that can help bridge the gap in practice."

"A simple statement of the contrast between Spanner and CockroachDB would be: Spanner always waits on writes for a short interval, whereas CockroachDB sometimes waits on reads for a longer interval."

https://www.cockroachlabs.com/blog/living-without-atomic-clo...
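The "waits on writes" part is Spanner's commit wait from the original paper: TrueTime exposes a bounded uncertainty interval, and the leader holds the commit acknowledgement until the chosen timestamp is guaranteed to have passed everywhere. A toy sketch of the idea (not actual Spanner code; the 7 ms uncertainty is just an assumed figure):

```python
# Toy illustration of TrueTime-style commit wait (not real Spanner code).
import time

EPSILON = 0.007  # assumed clock uncertainty, e.g. a few ms with GPS/atomic clocks

def tt_now():
    """TrueTime-style interval: the true time lies within [earliest, latest]."""
    now = time.time()
    return now - EPSILON, now + EPSILON

def commit(apply_mutations):
    _, latest = tt_now()
    commit_ts = latest               # a timestamp no node can have seen yet
    apply_mutations(commit_ts)
    # Commit wait: don't acknowledge until commit_ts is certainly in the past,
    # so any subsequent read anywhere sees a strictly later timestamp.
    while tt_now()[0] < commit_ts:
        time.sleep(0.001)
    return commit_ts
```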


No, they worked on Colossus, aka GFS2, which Spanner uses to store data.


> it was started by a few folks from Google who worked on Spanner and other related tech

Patent lawsuit waiting to happen? SCO Linux all over again?


So, can someone explain why this isn't the RDBMS holy grail? (serious question)


I don't think there's any question at this point that Spanner is really cool. The vast majority of cloud databases out there (even brand-new ones like Azure's Cosmos) are still implicitly trying to convince you that not having ACID transactions is something that won't matter very much, and that scale is by far more important, which isn't true. Google hasn't taken that approach with Spanner, and it's a breath of fresh air.

I'd throw one downside out there though: lock-in. While queries use standard SQL syntax [1], all write operations are performed via a very non-standard gRPC API [2]. I think the team certainly considered this and weighed the trade-offs, but it does mean that once you're on Spanner, you're on Spanner forever.

[1] https://cloud.google.com/spanner/docs/query-syntax

[2] https://cloud.google.com/spanner/docs/reference/rpc/
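To make that concrete, here's roughly what it looks like with the Python client (instance, database and table names are placeholders): reads can go through SQL, but inserts go through the mutation API rather than INSERT statements.

```python
# Sketch using the google-cloud-spanner Python client (placeholder names).
from google.cloud import spanner

client = spanner.Client()
database = client.instance("my-instance").database("my-database")

# Read side: SQL queries are supported.
with database.snapshot() as snapshot:
    for row in snapshot.execute_sql("SELECT SingerId, FirstName FROM Singers"):
        print(row)

# Write side: no INSERT/UPDATE statements; writes are mutations.
with database.batch() as batch:
    batch.insert(
        table="Singers",
        columns=("SingerId", "FirstName", "LastName"),
        values=[(1, "Marc", "Richards")],
    )
```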


We're hoping to add more compatibility layers, but the semantics of reading updates within a transaction are what gave the team pause. If you don't need that semantic, it's relatively straightforward (I assume we'll see this happen either directly from us, or from the community).

Disclosure: I work on Google Cloud, but not on Spanner.


Any chance that Google Cloud will ever expose access to F1?


The point of the SQL layer now merged into Spanner is so that people don't have to run F1 or something like it on top. I'd expect continual improvements there, especially if enough folks say "I'll move if X"


Well, the hope would be that it would allow read-write-read sql transactions, which to my understanding will never be implemented in Spanner.


Any insight into why read-write-read sql transactions will never make it into Spanner?


Not sure if "never", but from the new Spanner paper[1]:

> While we have seen comparatively little demand for this feature internally, supporting such semantics improves compatibility with other SQL systems and their ecosystems and is on our long-term radar.

[1] http://dl.acm.org/authorize?N37621


Thanks for the clarification and the link.


Also important to note that it's a NewSQL system, not a full blown RDBMS. You can't lift and shift your MySQL application and expect full blown horizontal scalability and such.


I'm surprised no one has challenged this yet - if there is anything that cloud spanner does better than most things out there, it's "full-blown horizontal scalability"! There are very few systems out there that have tackled the issue of scaling beyond cross-AZ in a non-bullshit way.


> You can't lift and shift your MySQL

> I'm surprised no one has challenged this yet

Are you saying that a company could lift and shift MySQL then?


For me, this immediately discounts it:

> $0.90 per node per hr

That's $648/mo in pure overhead.

I'm not saying there should be no overhead but it shouldn't be that high.

They also optimistically encourage you to try it with a "free" trial. The "free" trial, assuming you haven't used GCP before, gives you $300 credit, which is enough for less than 2 weeks of a single Spanner node.

Plus according to https://cloud.google.com/spanner/docs/instance-configuration, they recommend a minimum of 3 nodes, or $1800/mo.

I know this is targeted at businesses but it'd be nice if we could get smaller "nodes" or something to lower the barrier to entry. You shouldn't have to go straight from nothing to 10k QPS.


In case people don't do the math, you seem to be suggesting that Spanner should be $9/month. I certainly argued for a "shared Spanner" mode that wouldn't give you a dedicated setup. The benefit is that the minimum "install" would be super low, allowing you to start on a slice and move upwards (like many teams at Google!).

The reality is that $8k/year isn't that much for a company. You can all share it, get quite a bit out of even the smallest deployment, and so on. However, compared to the work to slim down the minimum deployment for Spanner, this was honestly a reasonable outcome (that is, I certainly wouldn't have wanted to see the team delay a year to build a shared multitenancy version).

Additionally, Spanner does things for you that MySQL et al. don't. Having an automagic Regional (and eventually Global if you'd like) database without dealing with sharding is worth $8k/year even to me. So even if it could fit on $10/month of hardware, I don't begrudge them for charging a service fee, rather than saying "This is how much cores, RAM, disk and flash this eats".

Disclosure: I work on Google Cloud and obviously have a vested interest in you paying us :).


I don't know where the $9/month number comes from but as someone else said, I edited my post (added a bunch of things) so maybe I edited out something that could be read that way.

Basically yes, I'd like a shared Spanner. If $648/mo gets you 10k read/2k write, then for ~$9/mo you should be able to get roughly 140 QPS read and 28 QPS write, so long as cost scales roughly linearly. Assuming you can spread this over the month (1000 QPS here, 0 QPS there), this would be super appealing to a lot of smaller projects I think.

I think the way it is, it's exclusively a business tool, but it has the potential to be so much more. Imagine if anyone could set up a distributed, fault-tolerant, scalable WordPress instance using Cloud Spanner, Cloud Functions and Cloud CDN, and pay per uncached pageload. You could even set it up with a single click in the Cloud Console. Running your own scalable cloud service would be no more difficult than setting up a Tumblr blog.


Ha! I calculated the monthly cost as $.9/hr x 730 hrs => $657. You said $648, so that left $9. Sorry for the confusion!

But yes, we already have the per-op model for Cloud Datastore (and it's part of what makes it so attractive!), so you can understand that we're naturally inclined to do just what you're saying. Making Spanner multitenant in a secure, performant manner is real work. But it's what Datastore, BigQuery and our other shared services do. I hope you can appreciate that getting "Dedicated Spanner" out the door was the right first step, though.


Ah, I just used 720h :)

And that's fair enough, I understand dedicated/shared are very different things, it's just that for me, right now, Spanner isn't viable. Change that and you'll be a whole lot more appealing!


This makes sense. But, for many (at least bootstrapping) startups, $1800/mo for the database is not in the realm of the possible. Is not having to deal with (future) sharding headaches "worth" that much? Sure. But that doesn't mean it's possible.

So by pricing us out of the entry level, you will forgo having startups build their platforms on it and get hooked (while it's still at 1-node mysql scale).

When the sharding headache hits and they're scrambling to scale, will they switch to Spanner? Maybe. Probably. They'll want to, anyway. But there is a big market that will be missed.


I think I agree with you. Most databases these days get adopted by being open source, or pretty inexpensive for hobby projects. That lets engineers experiment with them, get comfortable and then recommend them for a major project. I'm not even going to get started with spanner, so I'm not sure how I would build up the confidence to use it in a major deployment.


Really?? How much does a person cost per month? You could avoid lots of fiddling for many people just by using appropriate tools. Applies for everything, not just this.


But that person needs to know the appropriate tool to use, given their experience. They are going to gravitate towards services they're familiar with, and by pricing things on the high end, you'll reduce the number of people who get to play with it.


As I said, that makes sense, and I agree. That does not mean it's financially possible.


I think the underlying problem here is that there's no "mock Spanner" (in the Minio-is-mock-S3 sense) for people to write code against on-and-off for a while, to decide whether they like what Spanner does for them in development workflow terms.

Two node-weeks of Spanner credit, used wisely, might be enough to tell you whether Spanner fits your operations requirements, but it won't be enough to tell you whether you want to commit to creating an application architected around the paradigm it represents.


I think he edited his post so your $9/mo math no longer references anything he said.

But I think the issue is: making the commitment to use Spanner will take more than 2 weeks ($300 in credit) to determine if it's worth using.

Why not set up a "shared Spanner" with millions of pre-populated rows, and let free accounts have unlimited read-only access to it? That would let people fiddle with the technology before sweating bullets during the 2-week trial. (5-day trial if you run the recommended 3-node minimum.)


They could also increase the credit for testing. For example if you are a verified company and want to try it they could give you 2-3 months free or really cheap.


It's understandable and not expensive considering the quality of the offering. However, startups and even teams within large companies prefer, and have gotten used to, seamless scaling up from small accounts, so they can test and come to like a product before committing to it completely, especially one that is built into GCP and not portable.


And what are the specs for this node? CPU, RAM?


According to this article[0] (that someone referenced below):

A Cloud Spanner “node” isn't a physical machine, per se. It is an allocation of compute capacity and disk access throughput on a set of machines with access to the storage API.

(disclosure: I work at Google Cloud but not on Spanner)

[0] https://quizlet.com/blog/quizlet-cloud-spanner


Cloud Bigtable is similar.


They don't appear to document this, save that they handle 10k QPS read or 2k QPS write and can deal with 2TiB of data. They also recommend that you have 3 of them:

https://cloud.google.com/spanner/docs/instance-configuration
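Based on those documented per-node numbers (10k QPS reads, 2k QPS writes, 2 TiB storage, 3-node recommended minimum), a back-of-the-envelope sizing sketch might look like this (a rough heuristic, not an official formula):

```python
# Rough node-count estimate from the documented per-node limits; real sizing
# depends heavily on schema, row sizes and access patterns.
import math

def spanner_nodes(read_qps, write_qps, storage_tib, min_nodes=3):
    return max(
        min_nodes,
        math.ceil(read_qps / 10000),   # ~10k reads/s per node
        math.ceil(write_qps / 2000),   # ~2k writes/s per node
        math.ceil(storage_tib / 2),    # ~2 TiB per node
    )

print(spanner_nodes(read_qps=50000, write_qps=6000, storage_tib=5))  # 5
```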


https://quizlet.com/blog/quizlet-cloud-spanner

For lower QPS, normal RDBMS solutions (like MySQL) have lower latency than Spanner by a decent amount. But Spanner scales way beyond what MySQL is capable of, which is where it starts to shine.


That was a good article, thanks!


Because your locally hosted DB will run circles around it. And for 99.99999% of the shops out there a single, local DB instance is more than enough.


Until the backup doesn't work, or it does but it takes a day to restore, or whatever. DBs aren't free, ever.


[flagged]


Also possible that it wasn't meant to be taken literally.


In addition to the other comments, vendor lock-in. My shop is strictly AWS, so...


I kind of agree...

> including ANSI 2011 SQL support

which kind of eases this concern a bit. If Azure Cosmos' marketing emphasized this more, this wouldn't be as big of a concern for it either.


The SQL support is currently for queries only. DML is not supported (yet), so keep that in mind.

(I work for Google Cloud)

Side note: Cosmos is really confusing to me; it has 5 consistency models, supports 4 different database models (I don't think relational is an option?), multiple different APIs, etc.

For example, Azure advertises the p99 latency but doesn't specify the consistency model. I'm thinking it's for eventual consistency, so in that case, what's the latency for the strong consistency model? Definitely need to learn more.


The lack of DML statements prevents quick migrations.


The simple reason it isn't the RDBMS holy grail is that most projects that rely on RDBMSs don't really care about CAP or strong consistency, and once at scale the whole RDBMS thing just doesn't matter anymore. But things like latency usually do matter, and it does sacrifice latency, despite the bold claims of Googlers.

I think CockroachDB suits the RDBMS world much better. It understands that there is little reason to make it scale beyond a small number of nodes in a couple of nearby datacenters, but it does offer a lot compared to pre-CAP databases, meaning that it will be cheaper to operate and with no cloud lock-in risk.


There are a set of RDBMS use cases that do care about CAP and consistency - specifically applications that need Active-Active or Multi-Master capabilities. These use cases need scale across multiple data centers and low latency transactions.

But CockroachDB suffers from worse latency limitations than Spanner since it relies on NTP/hybrid clock to serialize transactions.

Disclosure - I work for NuoDB.


No SQL insert/update plus hard lock-in via egress charges and now incompatible app changes. Work hard to port your app to get there, work/pay hard to get it out.

Technically stunning for sure though.


It is very good as far as disk-based databases go, but relational databases have moved on to main-memory-only and NVRAM databases. And that's where the current action really is.


Also, how does Spanner get around the CAP theorem and similar distributed systems results? It seems like the only tradeoff is monetary at the moment.



The only important line, which means it's a CP system:

> Even then outages will occur, in which case Spanner chooses consistency over availability.


Thanks!


It doesn't; they're saying they have such good networking that P is very rare, so it's one of the few times you can treat it like it's CA.

In case P does happen (a network break), it's a CP system.


It's too expensive.


Massive lock in.


It is, but it's quite expensive.


It's the first to market and pretty groundbreaking tech. Alternatively, you can use CockroachDB, TiDB, or other open source solutions, but you will need to manage your own hardware/cloud.


Right. That's the choice cash-strapped startups will make instead of using this. And once they're experts at that particular problem domain, they will figure out how to make their first choice redundant/HA enough that they probably won't need what spanner provides.

I used Linux instead of Windows NT as a kid because it was cheaper and ran well on cheaper hardware. Then I became an expert at it. Then it became the OS I used for servers.

I've wanted to use Spanner for years, but at this point, the longest my production SQL database will ever be down is 2-3 minutes, and it's not worth the cost of a rack full of servers and a massive migration effort to theoretically shave off a couple of minutes of potential downtime for me. If I'm going to go that route, I would probably just opt for something like DRBD. The way my current infrastructure is set up, it wouldn't add any costs for me to do this.


Those are not even close to Cloud Spanner; what makes Spanner great is all the Google backend reliability that you don't have to manage.


You can run CockroachDB on GCE instances.


Because it comes equipped with Google's support offering, i.e. "godspeed, and maybe we'll get back to you if your blog post makes it to the front page of a news aggregator". Companies pay real money to have real support from companies such as Microsoft. Look at https://blogs.msdn.microsoft.com/ntdebugging/ for example, they sink ungodly amounts of resources helping customers debug obscure corner cases.


Are you an actual paying GCP customer with a support contract?

There will always be some cases of bad customer service but our support interactions have always been quick, relevant and helpful.


As there's a story like [1] on the front page of HN weekly if not daily, I've done my best to avoid building anything on their platform. To quote [2], "They have no phone number to contact, no way to dispute this other than email — which they have ignored us for over a month now without replying to our continued requests."

[1] https://news.ycombinator.com/item?id=14356409

[2] https://medium.com/@contact_16315/firebase-costs-increased-b...


It appears to be very expensive. I used their calculator and it says 1 Spanner node and 1 GB of storage is about $657 a month.


Yup, but cost is relative to the problem you're trying to solve.

When your app needs a horizontally scaling DB across multiple regions, and you don't have to come up with your own sharding solution, suddenly $8000 a year sounds like a great price.


General computer science-y kind of software philosophy question:

Should we really be thinking of sharding as something you can outsource to the infrastructure layer?

It just seems like having a stance about how the data in your specific domain naturally shards is probably going to pay off not just in the long run, but immediately, in terms of sanity-checking your information architecture.

And I wonder if in 2017 we haven't gotten to the point where a cloud application should just shard, because we're trying to think about software as something that runs on a transient instance with access to a subset of data, appearing and reappearing, not a giant box with everything on it.

I get that Google engineers are darn close to abstracting away that "giant box with everything on it" behind an API, but I guess I'm asking if that's really how we should be thinking about our code.


As with basically all software design questions, the answer is "it depends".

Writing a LOB app with 4 expected users all located in the same location, and expected storage requirements of a gigabyte a year? Go with a traditional DB, add replication if you need high availability, make and test your backups, done.

Writing the new Facebook weknoweverything app that captures smartphone audio and video continuously for all users at all times, automatically transcribes all conversations and produces AI-driven summaries of all video, all of which are saved forever and highly searchable? You're going to need a highly customized data storage architecture to have a prayer of keeping up with that data.

Somewhere in between? Some of those are good candidates for Spanner. Some aren't.


I agree with your line of thinking. But at really big companies with hundreds of small teams, all working on separate projects, when the boss says to hook all that stuff together I'm sure a solution like Spanner looks really delicious and cost-effective.

For you and me working with data we already understand and added to our project piece by piece, we can safely shard and compartmentalize.


They're effectively competing with the likes of NonStop doing linear scaling on fault-tolerant systems but geographically distributed. The mainframe and NonStop solutions start at $1+ million if volume is significant. The VMS clusters can similarly cost 6 digits. Spanner might be cheaper than them if you need availability, consistency, and speed more than space.


I guess I was looking at it from the wrong perspective then. It is very expensive for me then :)

I guess it is not even meant for me (a single developer with a small project) then!


I hear you on that. It's why I'm holding out high hopes for CockroachDB; otherwise, there's no chance I can have this kind of DB. However, there is an opportunity for strong consistency in a cluster within a small area, just distributed enough to reduce cascading failures. A project might build a VMS-style clustering system for that, integrated with a FOSS database. One could already exist, but the DB subfield is too flooded for me to track.


Also take a look at https://www.pingcap.com/ for sql scale-out and http://www.scylladb.com/ for cassandra-but-fast.


Thanks for reminding me about TiDB. Turns out I was on the first thread here about it. The next had some interesting comments:

https://news.ycombinator.com/item?id=13298664

Particularly the comments about them mixing Go and Rust, and wondering whether they should move to gRPC. The integration complexity concerns me a bit, but at least they're smart about what pieces they use. There was also a lot of detail on the architecture, although consistency and availability vs. Spanner were not clear in my links. I'll have to look into it more. Thanks.


My problem with this is how it's priced. Why am I paying for instance hours when I don't have/need instance access? I would have thought it'd be priced like Datastore, per GB used and API calls made. As it sits right now, it's fairly expensive.


As a heavy user of Datastore, I'm glad the pricing model is around instances and data stored. Paying for reads and writes makes doing any sort of migration or MapReduce job extremely expensive.


They aren't actual VM instances, just blocks of compute and storage. BigTable is priced the same way.

At the scale they're targeting, it makes sense, especially for large companies who will gladly pay for the reduced complexity and easy scaling while removing all the operational overhead and cost of other solutions that don't come close to this.


The bottleneck likely isn't I/O. My guess is it's RAM. RAM ain't cheap.


tl;dr building an ACID database that does everything at scale is hard, but we did it

Linked paper: http://delivery.acm.org/10.1145/3060000/3056103/p331-bacon.p...

Useful parts: "Lessons learned and challenges" p. 11. "Conclusions", p.12.


Your link doesn't work, it shows "An error occurred while processing your request.". Could you provide a "BibTeX reference" or equivalent?


The paper is called: Spanner: Becoming a SQL System

ACM link: http://dl.acm.org/citation.cfm?id=3056103&CFID=915581248&CFT...

Discussion from 2 days ago: https://news.ycombinator.com/item?id=14337817


Currently salivating for a self-hosted version of this. I suppose it's too much to expect that this would ever become a reality.


CockroachDB is inspired by Spanner and available now. Github page says "CockroachDB is production-ready"

https://github.com/cockroachdb/cockroach


Also take a look at https://www.pingcap.com/


It would probably require your own self-hosted version of Google's cloud infrastructure, so it's unlikely to happen.


This is basically impossible: for strong consistency, you need very accurate clocks. Google installs GPS clocks in its data centers.

Also, Bigtable, Spanner's predecessor, isn't open source either.


Well, that's how Spanner's consistency works. CockroachDB uses a hybrid vclock/wall clock solution that's possibly going to be tuneable to just clocks. https://jepsen.io/analyses/cockroachdb-beta-20160829


Every GPS chip in every GPS-enabled device includes the clock — by design, it can't work without it. So perhaps it is possible to quickly hack something from a cheap Android phone placed somewhere near the window?


Using GPS as your time source is a fabulously good way to trash your entire database during a leap second. See this paper for a discussion on how essentially everybody but Google gets this wrong: http://crin.eng.uts.edu.au/~darryl/Publications/LeapSecond_c...


GPS time does not have leap seconds: http://www.oc.nps.edu/oc2902w/gps/timsys.html

> GPS Time is a uniformly counting time scale ... The word "uniformly" is used above to indicate that there are no "leap seconds" in this time system.

> The GPS message contains information that allows a receiver to convert GPS Time into Universal Time

In fact the article you linked corroborates this, because they used GPS time to show how inaccurate NTP servers are.
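For the curious, receivers get the GPS-UTC offset (18 seconds as of 2017) from the navigation message and apply it. A minimal sketch of that conversion, with the offset hard-coded as an assumption:

```python
# GPS time counts uniformly from the GPS epoch (1980-01-06) with no leap
# seconds; UTC = GPS time minus the broadcast leap-second offset (18 s in
# 2017, hard-coded here as an assumption rather than read from the message).
from datetime import datetime, timedelta

GPS_EPOCH = datetime(1980, 1, 6)
GPS_UTC_OFFSET_S = 18  # as broadcast in the GPS navigation message (2017)

def gps_to_utc(gps_week, time_of_week_s):
    gps_time = GPS_EPOCH + timedelta(weeks=gps_week, seconds=time_of_week_s)
    return gps_time - timedelta(seconds=GPS_UTC_OFFSET_S)

print(gps_to_utc(1949, 302400))  # 2017-05-17 11:59:42 (UTC)
```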


That was the same idea I had. I found that SparkFun even has a guide on it:

https://www.sparkfun.com/pages/GPS_Guide


Better resources: http://www.satsignal.eu/ntp/Raspberry-Pi-NTP.html

https://www.febo.com/cgi-bin/mailman/listinfo/time-nuts (I don't understand a lot discussed there, but many interesting links and info about current deals to be found)

There are also differences in the quality of the timing output of different GPS modules, with some being optimized for timing applications (as opposed to location finding).


Appreciate it, as I'm looking for timing-focused ones.


Try CockroachDB, the OSS clone of Spanner


I'm curious to see what the real-world implications of this iteration of Spanner are against Calvin-based designs, like FaunaDB. Obviously Calvin is better for low-latency workloads, but I'm curious about the comparison with a mix of workloads, where latency is important but not mission-critical.


Mass market support questions:

PHP Support? Does it support views and triggers like MySQL/PostgreSQL?




Same problem with CockroachDB 1.0. Limited views. No triggers.


Pricing starts at $8000/year so you probably don't want to migrate your PHP software over.


Why? I'm confused as to why using PHP would be relevant to how much it costs.


It's not relevant at all. People just hate PHP because it's the trend.


Probably because a lot of the PHP things are done by tightwads who want to spend as little as possible.


Don't worry, this is Google, so expect a 700x price hike (all those wonderful millions of dollars per year per project :D ). See: https://medium.com/@contact_16315/firebase-costs-increased-b...


How does this compare to Apache Cassandra?


Cassandra is a NoSQL database, so you get horizontal scalability but need to deal with eventual consistency and there is no SQL support. It is similar to Google Bigtable [0].

Spanner is a database that merges the horizontal scalability of a NoSQL database with the strong consistency and SQL semantics of a relational database, basically the best of both worlds. People like to call this type of database "NewSQL."

Cassandra is also open source, Spanner is not. CockroachDB is the OSS clone of Spanner, but there are some differences and tradeoffs to be made as with all things.

(I work at Google Cloud)

[0] https://cloud.google.com/bigtable/


Thanks. With the tunable consistency I'd forgotten about the NoSQL part; I remembered Cassandra as being almost like a SQL database with horizontal scalability.


More information about Spanner itself, the globally synchronized clock it uses, and its relation to the CAP theorem (by Brewer himself): https://research.google.com/pubs/pub45855.html


Cloud Spanner gets me super excited; I wish they would share more details about its design and architecture.


I can't find any information on how you manage backups. Can I restore to a point in time?


Any rough timeline on when multiple regions will be available?

Looking at Cosmos DB for (at least readonly) regional distribution for this now as well as the other options like scylla/cockroachdb/tidb.


If the pricing is predictable enough, it could be the missing piece of a fully serverless application.


My company has been migrating to Spanner, and we are starting to get pretty familiar with it. We started building a Go SQL driver for Spanner and a protoc plugin for generating a persistence layer for Spanner.

https://github.com/tcncloud/protoc-gen-persist

https://github.com/tcncloud/sqlspanner

Both are pretty new projects, and we ran into a snag with the sql driver when it came time to implement delete statements, but both are pretty cool, and we would love contributors!


Have they announced the End Of Life date yet?


Since Google uses Spanner in-house extensively, it is likely to be available for a good long time.


> Google uses Spanner in-house extensively

True

> it is likely to be available for a good long time.

Where does this conclusion come from?

The google mapreduce paper was released in 2004. [0]

"By 2014, Google was no longer using MapReduce as their primary Big Data processing model" [1]

The google spanner paper was released in 2012.

If Spanner lives as long as MapReduce did, it will be largely replaced in 5 years.

[0] https://research.google.com/archive/mapreduce.html

[1] http://www.datacenterknowledge.com/archives/2014/06/25/googl...


It's true that MapReduce is no longer the "primary Big Data processing model", for some definition of primary.

However, it is still very much available. There are still many important production MapReduce jobs running AIUI.

[work at Google]


I think you're talking about two different things: "will Google drop support" and "will Google invent something better"

MapReduce is still supported today. Most teams have chosen to move to newer infrastructure because it is better. Nobody forced those teams to move.

Here Google has not dropped support, but has invented something better.

[Disclaimer - work at Google, but not on anything related to Cloud]


Google disproves Google.

https://cloud.google.com/prediction/docs/end-of-life-faq

"As we've expanded our Cloud Machine Learning services, many of the use-cases supported by Cloud Prediction API can be better served by Cloud Machine Learning Engine."

In this case "invent something better" == "drop support".


The MapReduce paper was released in 2004, and BigTable is still available.



