Google's backbone is a little different. Google Cloud shares it with YouTube, Maps, and the rest. Beyond the DC-to-DC links, our backbone extends to a vast number of edge PoPs (more than AWS and Azure combined), detailed at [0]. Our DC-to-DC network is pretty great too; it's what allows things like Spanner to exist, by making network partitions (the P in CAP) rare, see [1].
Here's what it means for you in practice:
- Google's Load Balancer supports a single global anycast endpoint.
- When your packet heads for Google's network, it enters at the nearest PoP, which acts as an on-ramp; from there it traverses only Google's network and never touches the public internet.
- Similarly, when data is en route to a customer, Google's network carries it over the private backbone all the way to the DC or PoP nearest that customer.
- By default, Google gives you a global, software-defined VPC, so there's no need to create VPN tunnels between zones, regions, etc.
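A rough sketch of why a single anycast endpoint lands each client on a nearby PoP: every PoP announces the same prefix, and the client's upstream routers pick one of those routes via BGP best-path selection. The `Route` fields, PoP names, and the two-step tie-break below are illustrative assumptions; real BGP evaluates more attributes first (local preference, origin, MED, and others), and "nearest" means best by routing policy, not necessarily lowest latency.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    pop: str          # hypothetical PoP that announced the shared anycast prefix
    as_path_len: int  # AS-path length as seen from the client's network
    igp_cost: int     # internal cost to reach that exit, used as a tie-breaker


def pick_on_ramp(routes: list[Route]) -> str:
    """Return the PoP where the client's traffic would enter the backbone.

    Toy approximation of BGP best-path selection: shortest AS path wins,
    then lowest internal cost. Real BGP checks other attributes before these.
    """
    best = min(routes, key=lambda r: (r.as_path_len, r.igp_cost))
    return best.pop
```

For a client that sees `Route("london", 1, 10)`, `Route("frankfurt", 2, 5)`, and `Route("amsterdam", 1, 20)`, the sketch picks `"london"`: the shortest AS path wins before internal cost is even considered.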
I work at AWS and I think there are definitely some similarities and differences. We do share our backbone with CloudFront, and hence with our video traffic, of which there's quite a lot these days. We also advertise our network ranges broadly; it's our mission to carry as much of the traffic as possible ourselves. So those aspects are very similar.
But a genuine difference is that we don't try to operate a global "seamless" network, because we optimize for the "A" in CAP. Our experience is that at the low-ish level of a network, it can be too easy for outages and availability issues to spread quickly. For example, with a global network, a misconfiguration or error can more easily propagate globally and bring everything down.
Instead, we have autonomous, uncoupled regions, and it's one of our core principles that faults and errors stay within these regions (or better yet, within availability zones). That does mean that partitions can happen, but we find that most customers use active-standby configurations for key data (where it makes no real difference), and we also build tools that work with partitionable networks at a higher level. For example, Route 53 supports multi-region routing and failover, and does it measurably better than simple anycast routing can.
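A minimal sketch of the active-standby failover logic described above, assuming a DNS resolver that answers with the primary region while its health checks pass and with the standby region otherwise. The `Endpoint` class, the threshold of three consecutive failures, and the "answer with the primary if both are down" fallback are hypothetical choices for illustration, not Route 53's implementation.

```python
from dataclasses import dataclass


@dataclass
class Endpoint:
    region: str
    address: str
    failure_threshold: int = 3     # hypothetical: consecutive failed checks before "unhealthy"
    consecutive_failures: int = 0

    def record_check(self, ok: bool) -> None:
        """Feed one health-check result into the endpoint's state."""
        self.consecutive_failures = 0 if ok else self.consecutive_failures + 1

    @property
    def healthy(self) -> bool:
        return self.consecutive_failures < self.failure_threshold


def resolve_failover(primary: Endpoint, standby: Endpoint) -> str:
    """Answer a DNS query under an active-standby policy."""
    if primary.healthy:
        return primary.address
    if standby.healthy:
        return standby.address
    # Both regions look down: answer with the primary rather than nothing
    # (an assumed design choice for this sketch).
    return primary.address
```

With `primary = Endpoint("us-east-1", "198.51.100.1")` and `standby = Endpoint("eu-west-1", "198.51.100.2")`, three failed checks on the primary flip the answer to `"198.51.100.2"`, and one passing check flips it back. The coordination lives entirely in this higher-level routing decision; the two regions never need to share a network control plane.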
Over time, we're offering more and more multi-region services, such as cross-region replication for data, but the coordination is done at higher levels where we can achieve higher levels of availability in simpler ways, built on top of a more solid foundation.
Yeah, at least one of our outages in the last year involved what I'd paraphrase as "anycast is hard". There's definitely an advantage to limited-blast-radius services, but I wouldn't trade GCS's multi-regional buckets for S3 CRR (the same applies to Datastore, etc.). We have strict reliability requirements for global services; our perspective is that some customers want regional control for regulatory reasons, and we're happy to meet those requirements. But composing a global or even multi-regional service on an untrusted, lossy network is "crazy".
Again, disclosure: I work on Google Cloud (and this network existed when I got here!)
>For example, with global networking then a misconfiguration or error can more easily propagate globally and bring everything down.
This sounds like Nassim Taleb's antifragile meme [1].
If I were running IT for some large enterprise (which I'm not!), I might replicate services on both AWS and Google Cloud, a bit like how Apple tries to have more than one supplier for its hardware components.
Just in case you can forward this on: AWS EC2 eu-west-2 (London) to the GCP Network Load Balancer is about 12 ms round trip for some reason, whereas most other anycast services located in London are 0.5-2 ms round trip.
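One way to reproduce a round-trip comparison like this, sketched under the assumption that timing a TCP handshake is a fair proxy for network RTT: `connect()` returns once the SYN/SYN-ACK exchange completes. The helper name and its defaults are made up for illustration, and the target should be an IP and port you're allowed to probe.

```python
import socket
import statistics
import time


def tcp_connect_rtt_ms(host: str, port: int, samples: int = 5, timeout: float = 2.0) -> float:
    """Median time to complete a TCP handshake to (host, port), in milliseconds.

    The client's connect() finishes after the SYN/SYN-ACK exchange, so each
    sample approximates one network round trip plus small local overhead.
    """
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Pass an IP literal as `host` to keep DNS lookup time out of the sample.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    # Median is less sensitive than the mean to a single slow handshake.
    return statistics.median(timings)
```

Running it from the same EC2 instance against the load balancer's address and against another London anycast service, e.g. `tcp_connect_rtt_ms("203.0.113.10", 443)` with a placeholder address, would give numbers directly comparable to the 12 ms vs 0.5-2 ms above.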
(work at G)
[0] https://peering.google.com/#/infrastructure
[1] https://cloudplatform.googleblog.com/2017/02/inside-Cloud-Sp...