Google Infrastructure Security Design Overview

contingencies · on Jan 13, 2017

Many of these solutions are unavailable below a certain scale, and there is currently little commercial utility or pressure in offering these features in a wholly-owned-and-operated fashion to small businesses or individuals. The new deal (eg. DDOS resistance) is to rent an implementation, or go without. Basically, the gap between everyone else and the Googles of the world is large and growing.

On the other hand, I wonder how useful some of them are. Boot-level security sounds fantastic but the cost of engineering and at the rate they probably cycle hardware, with decent service-level signatures this probably largely wasted money (eg. unexpected behavior like comms from service X to service Y is default-denied at multiple levels, logged, triggers hard shutdown/reset of system). While performance is cited as a concern, you'd save a lot of money removing the design/deployment/maintenance of all that complexity and could afford a little extra (more standard) hardware.

mikecb · on Jan 13, 2017

I find that thoroughness to be what's most impressive about this stack, considering every layer and securing it both independently and in relation to the others: it's as close to a textbook example as I can imagine. After all, what's the point of securing up the stack if you can't trust the bottom? Here's hoping AWS and Microsoft get there too.

Edit: Just trawling through, seems like quite a few of the tools are on github.com/google

obulpathi · on Jan 13, 2017

> Many of these solutions are unavailable below a certain scale

This was true before Google Cloud. With Google Cloud, you can enjoy these benefits whether you are an individual developer with a sub 100$ monthly budget / a mom n pop shop with 1000$ budget or a SMB / Startup with a 10K$ - 100K $ to spend on your infrastructure.

chrishacken · on Jan 13, 2017

That's exactly what he's saying. Small startups, who don't want to use Google, would have a lot of difficulty implementing some of these designs on their own.

puzzle · on Jan 13, 2017

Without boot level security, it's easy for the NSAs of the world to slip in a hard drive or two with extra "surprise" software on it, later engaging in active/passive surveillance or credential theft. Always assume that any one single employee could be compromised.

notriddle · on Jan 13, 2017

The NSA's of the world don't need to hack Google's infrastructure. They can just ask.

This is protection from rogue employees acting independently, assuming it's not just marketing and ego-stroking for the engineers.

toomuchtodo · on Jan 13, 2017

Why don't you google "NSA google smiley face".

skybrian · on Jan 14, 2017

Yes, and that happened before many of the security measures described in this doc were in place. It's one of the reasons behind Google's current and ongoing investments in security. Knowing that yes, the NSA is going after you is a wake-up call.

In particular, the doc says all data on the WAN (between data centers) is now encrypted.

leesalminen · on Jan 13, 2017

I don't get it....search results returned only your comment.

toomuchtodo · on Jan 13, 2017

First result for that search: http://www.slate.com/blogs/future_tense/2013/10/30/nsa_smile...

OPs comment:

> The NSA's of the world don't need to hack Google's infrastructure. They can just ask.

NSA doesn't just ask; they found ways to MITM Google.

contingencies · on Jan 13, 2017

First of all, NSA hardware attacks of this ilk are supposed to occur through mail. Operations the scale of Google can acquire hardware in a secure/monitored fashion that bypasses public shipping facilities which would largely frustrate this type of attack. Also, I would hazard a guess than Google building their own hardware makes attacks on their boards far more difficult than for the rest of us. As for disks, they would be acquired in serial numbered batches from known suppliers and could be quickly tested to match known performance and sensor (eg. heat) metrics at the time of ingress. This is not very difficult, and assists in protection against tampering. In addition, the use of commercial grade disk hardware acquired in large batches means that the ultimate internal destination of a given disk in the organization is very difficult to ascertain, therefore the workload would be unidentifiable. Careful internal distribution processes would add stronger protections. Regardless of a compromised disk, proper architecture in a large-scale system mitigates the impact and data exfiltration capacity of individual compromised machines. Removed hardware would always be destroyed.

puzzle · on Jan 13, 2017

With the NSA's budget, I don't see why they would limit themselves to mail-only attacks. They could compromise any level in the supply chain, especially for targets which are worth the effort. They, or, more likely, the Brits tapped Google's DC-to-DC fiber and reverse engineered all sorts of internal protocols, as seen in Snowden's leaks.

chinathrow · on Jan 13, 2017

Yes - or they rigged _all_ commercially available HSMs in use for encrypting a DC-to-DC fiber.

puzzle · on Jan 13, 2017

I think it was Neils Provos who said on stage that Google does not trust link encryption, but rather prefers end-to-end, even though that's a much greater problem in terms of key management.

lima · on Jan 13, 2017

> Boot-level security

You can get pretty far with commodity hardware. Even Secure Boot with custom keys prevents most threats.

contingencies · on Jan 13, 2017

IMHO the biggest problem with commodity hardware is IPMI BMCs, a problem so insidious and widespread as to limit the utility of implementing trusted boot. (I designed datacenters for a major bitcoin exchange.) I would hazard a guess that Google's custom hardware has a more intelligent/limited/secure (and crypto-validated firmware based) IPMI implementation, and this contributes far more to security versus commodity hardware than cryptographically secured main processor / system boot.

lima · on Jan 13, 2017

I agree. Is there any serious effort at making an open source BMC firmware?

At least Intel AMT improves the situation a bit.

wmf · on Jan 13, 2017

OpenBMC https://code.facebook.com/posts/1601610310055392/introducing...

zbjornson · on Jan 13, 2017

> We have started rolling out automatic encryption for the WAN traversal hop of customer VM to VM traffic. ... all control plane WAN traffic within the infrastructure is already encrypted. In the future we plan to ... also encrypt inter-VM LAN traffic within the data center.

It would be nice if this was more explicit. For example, is traffic that is TLS-terminated at their LB reencrypted all the way to the back end VM? At what point is it decrypted again? Are those keys unique to us or are they used for whatever traffic happens to traverse the same network paths? (I assume shared but with software-defined networking maybe it's practical for them to be unique.) What does the "control plane" encompass?

In any case, I'm curious what people think about trusting the service provider for inter-service and inter-VM encryption. Do you use the LB's TLS termination? Do you still enable encryption for your DB connections even if it is (or will soon be) redundant with their network encryption?

mentat · on Jan 13, 2017

Anyone with access to the hypervisor at the service provider will have access to plaintext. TLS protects you from service provider network compromise within whatever scopes that covers. If you're in the cloud, you do have to have some basic trust in your service provider as compute is always in plaintext (barring homomorphic encryption).

xyzzyz · on Jan 13, 2017

Anyone with access to the hypervisor at the service provider will have access to plaintext.

This is mostly true with today's state of the industry, but with upcoming technologies like Intel SGX[1], the hypervisor will not be able to access the plaintext anymore.

[1] - https://software.intel.com/en-us/blogs/2013/09/26/protecting...

blazespin · on Jan 13, 2017

It's not really an issue of trust but rather defense in depth. You want to protect against rogue employees who can tap into the network, for example.

_mlxl · on Jan 13, 2017

In the CIO summary they mention every service uses KeyCZAR.

First line on KeyCZAR repo:

"Important note: KeyCzar has some known security issues which may influence your decision to use it."

https://github.com/google/keyczar#known-security-issues

nealmueller · on Jan 13, 2017

I work at Google. The final bullet in the CIO Summary on Keyczar was a typographical error, taken from our paper on encryption at rest (https://goo.gl/hSordh). It's since been removed from this Security Design Overview. The encryption at rest paper goes into additional detail and includes the important clarification that while a very old version of Keyczar was open-sourced, the open-sourced version has not been updated to reflect internal developments.

_mlxl · on Jan 13, 2017

Thanks for the reply and follow on information. Wondering why those internal changes didnt get rolled into the public release, especially if they were security focused updates? Lack of adoption of the library maybe?

jcims · on Jan 13, 2017

They could be design decisions that are tailored for Google's use or issues for which Google has other compensating controls.

apeace · on Jan 13, 2017

The first listed issue is "Use of SHA 1 and 1024 bit DSA", which they admit are "considered weak by current security standards".

Not sure why the OP has been downvoted. Definitely something interesting to note.

noja · on Jan 13, 2017

Like "Use of SHA 1 and 1024 bit DSA". Ouch.

sweis · on Jan 13, 2017

Keyczar is no longer being maintained and should probably be deprecated.

Either that or someone can take the reins and update it to use modern algorithms.

dwheeler · on Jan 15, 2017

This is great to see. For those who don't know, this is an "assurance case" (definition: "a body of evidence organized into an argument demonstrating that some claim about a system holds, i.e., is assured") - https://www.us-cert.gov/bsi/articles/knowledge/assurance-cas...

I'm glad to see more assurance cases. You can't just do one thing and have a secure system. And if you want people to trust you, you need to give them a reason to trust.

The CII best practices badge ( https://bestpractices.coreinfrastructure.org ) also has an assurance case; details at https://github.com/linuxfoundation/cii-best-practices-badge/... . If you want to help us make that better, let us know!

petters · on Jan 13, 2017

> ... and laser-based intrusion detection systems

Huh? I thought that was exclusive to movies like Entrapment and Mission Impossible.

scrollaway · on Jan 13, 2017

It's a fancy term for motion detectors. If my neighbour can afford one for his yard, it's not that crazy to put some in a datacenter :)

Edit: I obviously wasn't implying they're using the same ones. Come on, now >.>

foxylad · on Jan 13, 2017

Motion detectors are usually passive infra-red (PIR) sensors - no lasers involved. Unless you can cite a consumer-grade laser-based motion detector, I think this means Google's data centers are protected by slightly higher level gear.

dalore · on Jan 13, 2017

https://www.amazon.com/Homesafe-Safety-Motion-Detector-Senso...

Homesafe Safety Beam Laser Motion Detector Sensor & Alert

Only $39.99!

runeks · on Jan 14, 2017

> This high tech device creates an invisible infrared beam up to 60 feet long and sounds a loud alarm, pleasant chime, or mutiple chimes when the beam is crossed.

SEJeff · on Jan 13, 2017

From my understanding (after doing a few weeks of research on this for my own home security) the current "top of the line" tech is the Tomographic motion detectors which build up a mesh.

It has been commercialized by a security company named Xandem, some info on it:

https://en.wikipedia.org/wiki/Tomography

https://www.youtube.com/watch?v=Y8updJWoSxE

I'll be purchasing a Xandem system soon

timdierks · on Jan 13, 2017

Nope, real-world solutions to monitor air ducts and other spaces that need to be open but which shouldn't have people in them.

mnm1 · on Jan 13, 2017

I wonder what their data deletion policies really are for something like Photos. I deleted all my old photos weeks ago but when I pull down the archive of my Google data, they're still there. With such a policy, I could see that data sitting around for years while Google claims that it's in the process of deletion, something that is not actual deletion. Then again, I doubt they actually ever delete anything.

londons_explore · on Jan 13, 2017

In which case, you likley didn't delete them correctly.

If you delete them from your device, it doesn't delete the cloud copy.

If you delete from an album, it removes the image from the album, but not from your account.

Google's privacy policy says has limits to delete user data, and I can assure you they are very strict about that. (Lots of data is deleted within hours, but the multiple days is to ensure all backups of it are gone too)

See http://blog.tech-and-law.com/2010/11/google-data-retention-p...

petters · on Jan 13, 2017

This is correct. A deletion should be effective/visible immediately, but it can take some time before all backups are guaranteed to be gone.

drieddust · on Jan 13, 2017

No it isn't google takes your data to tapes as well as offline long term storage.

[1]https://www.youtube.com/watch?v=eNliOm9NtCM

mkj · on Jan 13, 2017

They could be deleting encryption keys to the tapes? All speculation.

puzzle · on Jan 13, 2017

There was a talk about the backup infrastructure. The speaker talked about the issue of keys, but didn't provide specific details:

http://highscalability.com/blog/2014/2/3/how-google-backs-up...

puzzle · on Jan 13, 2017

There are whole teams and pipelines dedicated to making sure data is deleted on all media, tapes included. The long tail can be affected by things such as a machine holding a bunch of GFS chunks from your files that went to the hardware repair queue in the meantime. Those chunks might not even be that useful without the others stored on other machines, but in the general case you can't make guarantees that e.g. they don't hold information that a skilled person could use to identify you.

Klathmon · on Jan 13, 2017

Also, I'm not sure if photos has this, but you also need to "empty the trash" in your drive account after you delete something before it will be actually deleted.

IIRC it gets auto-deleted after 30 days or something.

pawadu · on Jan 13, 2017

do you see the actual photos or just the folder thumbnails?

fowl2 · on Jan 13, 2017

so they've reinvented kerberos, presumably in a way that works. interesting.

(and there are many other things)

puzzle · on Jan 13, 2017

Yeah, it's called LOAS:

https://mobile.twitter.com/jbeda/status/715373975182807040

Godel_unicode · on Jan 13, 2017

Why do you think that Kerberos doesn't work?

bogomipz · on Jan 13, 2017

I have a question about Step 5 in the post, it states:

Is "Step 5: Add '1' to the end"

Is this a delimiter for beginning of the padding or does it server some other purpose?

timdierks · on Jan 13, 2017

Did you mean for this to be somewhere else?

bogomipz · on Jan 13, 2017

Yeah I did, thanks. That's what I get for multitasking. Unfortunately its too late to delete :(