I've read the article and the entire comment thread and nobody is talking about the cloud provider itself ... ?
When someone borgs[1] up their data to store at rsync.net, we just assume that we are the threat. Of course I don't believe that, but it's perfectly rational, and we encourage people to think of rsync.net as a threat as they design their backups.
Comments in this thread are actually discounting the threat of Amazon personnel, GCS personnel, etc., as if that threat was zero.
Not only is it non-zero, I would go further: if you're storing the data on AWS and generating your keys on AWS and managing your keys with AWS ... you're doing it wrong.
This is an interesting and important point that you raised.
> if you're storing the data on AWS and generating your keys on AWS and managing your keys with AWS ... you're doing it wrong.
This is a reasonable thing to do if you've decided that you trust AWS and expect any violations of this trust to be dealt with by the legal department.
It's less reasonable if you're concerned about AWS employees going rogue and somehow breaking the security of KMS without anyone else knowing.
It's even less reasonable to do this if you're concerned about AWS credentials being leaked or compromised, which in turn grants an attacker access to KMS (i.e., a government would be more successful by compelling IAM to grant access than they would trying to subpoena KMS for raw keys).
(Sure, you can audit access via CloudTrail, but that's a forensics tool, not a prevention tool.)
But that's kind of the point I wrote in the article, no? You need to know your threat model. You've stated yours succinctly, and I think it's a commendable one, but many enterprises are a bit more relaxed.
> It's less reasonable if you're concerned about AWS employees going rogue and somehow breaking the security of KMS without anyone else knowing.
That's the least of the concerns. Remember that AWS is subject to court orders of all types (legitimate ones and NSLs). Even if nobody goes rogue, any data that AWS (or any cloud/SaaS provider) can access must be assumed to be compromised.
> Comments in this thread are actually discounting the threat of Amazon personnel, GCS personnel, etc., as if that threat was zero.
Thank you for this comment, this continuously blows my mind.
If one hands over cleartext data to some third party, one must assume they will abuse it. They might not, but "might" is not a convincing control in a threat model. If they can, one must assume they will. And plan accordingly.
Sure, sometimes it is ok. But any threat model that assumes a third party could abuse the data but never will is flawed.
It is vital to remember two things. First, even if one 100% trusts the current management of a company to do No Evil, management can change. Second, regardless of management, a company is subject to subpoenas and NSLs, so your data will be handed over.
If you don't directly control the generation, storage and usage of all encryption keys, your data is effectively in the clear.
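As a minimal sketch of the alternative (assuming Python's `cryptography` package and boto3; the bucket and object names are placeholders): generate and hold the key entirely outside the provider, and only ever hand them ciphertext.

    import os
    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)  # generated locally, never uploaded
    nonce = os.urandom(12)
    blob = nonce + AESGCM(key).encrypt(nonce, b"backup contents", None)

    # The provider only ever stores ciphertext; a subpoena gets them the blob,
    # but without `key` it is indistinguishable from random noise.
    boto3.client("s3").put_object(Bucket="my-backups", Key="2024/backup.bin", Body=blob)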
Hey, I do exactly this! Borg into rsync.net. I know it's not exactly your domain, but are you guys working on making this flow easier? I found borg difficult to set up, and tbh I don't really trust my setup long term. All the borg tutorials I found are on random people's hobby blogs; it would be nice if you had one specific to rsync.net.
Either way, thank you for providing such an affordable service!
The volume/pricing entry point for borg is really inviting. I would love to see the same thing for the ZFS part. Sadly, 5TB is a steep entry point for me.
That's workable for many situations which only require data "storage", but the moment you require cloud-side data "processing" too, the situation changes.
I would agree that if you have no use-case for cloud-side data processing, then the cloud doesn't need the encryption keys; however there are a lot of use-cases for data for which processing is highly advantageous.
For example, if you just want to store data in the cloud, sure, use client-side encryption. But what if you want to query it (to use an S3 example) with S3 Select or Athena, or load it into a data lake? There are a lot of things one might do. It's useful to be able to query data from within the cloud, without having to transfer it out and separately decrypt it.
This use-case is likely one of the primary reasons why customers are willing to trust cloud providers to manage the keys too – for many data sets, though not all. And that's what technologies like cloud KMS are good for. Access to the actual keys is tightly limited within the cloud provider, both among employees and technologically, and not even other cloud services are capable of accessing them (either accessing the keys or encrypting/decrypting using them) unless you grant them explicit permission, with internal technology that enforces this.
Maybe you didn't intend on using "new fancy foo service" to process your stored data yesterday, but today it sounds good, so you can alter an access control grant and allow it to interact with your existing stored data and the necessary encryption keys. Maybe every time some new data arrives in S3, you want to process that data with Lambda and synchronize it with another system. Then it's helpful if you can grant Lambda permissions to decrypt the data so that it can process it :-)
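To make that concrete, here's a rough envelope-encryption sketch of the pattern (Python/boto3; the key alias and record are placeholders, not anyone's production setup): the data key is generated under a KMS key, and any principal that IAM later allows to call kms:Decrypt can unwrap it.

    import os
    import boto3
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    kms = boto3.client("kms")

    # Ask KMS for a fresh data key under your master key ("alias/app-data" is
    # a placeholder). You get the key back in plaintext and wrapped forms.
    resp = kms.generate_data_key(KeyId="alias/app-data", KeySpec="AES_256")
    data_key, wrapped_key = resp["Plaintext"], resp["CiphertextBlob"]

    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, b"customer record", None)
    # Store wrapped_key + nonce + ciphertext, then discard data_key from memory.

    # Later: any principal IAM allows to call kms:Decrypt (your new Lambda, say)
    # can unwrap the data key and process the record.
    recovered = kms.decrypt(CiphertextBlob=wrapped_key)["Plaintext"]
    plaintext = AESGCM(recovered).decrypt(nonce, ciphertext, None)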
Generic server-side encryption does do something: it protects against threats that exist at the data center layer, such as certain attacks through lower layers of infrastructure that don't necessarily involve a full compromise of the machines involved, or inappropriate disposal of old storage devices, mistakes by employees, etc. It protects against some insider attacks, just not those by capable administrators.
> however there are a lot of use-cases for data for which processing is highly advantageous.
Most use-cases require 'processing', otherwise there would be no point to most cloud services and we would only ever use, say, S3.
The moment you spin up even a virtual machine, it will need to be able to decrypt the EBS volumes you have attached to it. Even if you encrypt the filesystem with your own mechanism and feed it the key by hand every time, the key is still available in the instance. Instance memory is not easily available to AWS employees, but neither are the physical media that back EBS volumes (whatever they are; details are scarce, and all the pieces may not even be physically in the same building). If we are concerned to that extent, then we can't use any other service. Which is ok; there are plenty of businesses that have requirements that disallow the use of any cloud providers.
KMS is a money printing racket for AWS. It only really helps you if someone walks out of a data center with your hard drive. For everything else the attackers are just going to get your IAM creds which have open access to all your keys.
So many times people forget that encryption does not solve confidentiality problems. It just translates them into key management problems. And this is such an amazing example that I'll use it in the future.
I've worked with physical servers at many companies. Every time, storage devices have a lifecycle: they are bought new, then plugged in (and some data is written), then after some time they move to another physical location, and then they "die" (i.e., are put in the trash, either because they broke or for other reasons).
Encryption at rest is an efficient way to secure data for the latter cases.
A worker stole a hard disk? No data is stolen.
A hard disk is lost during transit, somehow? No data is stolen.
A broken device is retrieved by someone, opened up, and read directly? No data is stolen.
Your old device, binned for some reason, could still be read if plugged into the proper hardware? No data is stolen.
All of that with little performance impact, no software modification, little engineering overhead, and very little work to do. It eases the lifecycle of storage devices, because storage devices are now worthless per se (except for their hardware cost, of course). They carry virtually no data, no worth.
Can you propose any other way to protect against those threat models? Rewrite all software so that every program handles its own private keys? Yeah, that's a nightmare, not gonna happen. And even if it did... how would you encrypt your rootfs? Ah yes: encryption at rest :)
...which is a concept nearly anyone working in IT understands.
I don't think the vast majority of people with IT/ops experience seriously think that encryption at rest provides data protection from people getting unauthorized access to the system, in person or remotely... aside from maybe managers who ended up in charge of IT and engineering departments without much practical background in either.
The author is confusing "it costs us nothing (now that encryption can be done in hardware and is integrated into most desktop operating systems) and protects in some scenarios, so yeah, we just decided to mandate it always be done" with "PEOPLE THINK ENCRYPTION AT REST IS A MAGIC BULLET LOOK AT ME I'M INSIGHTFUL, POST LINKS TO MY BLOG ON LINKEDIN!"
The whole post is insulting to the intelligence of even a fairly junior desktop support technician.
> The author is confusing "it costs us nothing (now that encryption can be done in hardware and is integrated into most desktop operating systems) and protects in some scenarios, so yeah, we just decided to mandate it always be done" with "PEOPLE THINK ENCRYPTION AT REST IS A MAGIC BULLET LOOK AT ME I'M INSIGHTFUL, POST LINKS TO MY BLOG ON LINKEDIN!"
What in the article gave you that impression?
I do not hold this confusion in my mind, nor did I deliberately encode such a statement in my blog. I'm curious why you think this is what I was saying.
> The whole post is insulting to the intelligence of even a fairly junior desktop support technician.
If that were true, every time someone posts "Show HN: My Hot New Database Encryption Library in Haskell", they would be mitigating the confused deputy attack by design, rather than what we see today: namely, failing to even protect against padding oracle attacks.
That's what the article was actually talking about.
Comments like these are doubly great: useful tech perspectives, and a reminder that the comments often do not reflect understanding of the original submissions.
Seems to totally miss the point, but I still upvoted because it illuminates the purpose of disk-level encryption so well, adding color to the conversation.
>Important: I’m chiefly interested in discussing one use-case, and not focusing on other use cases. Namely, I’m focusing on encryption-at-rest in the narrow context of web applications and/or cloud services.
In web applications and cloud services drives could still be misplaced, stolen, or improperly disposed of.
Further, if data is encrypted at rest then there are multiple levels of auth that must fail for a breach to occur, namely access to the data and access to the key.
Definitely true, and a layered defense against data loss / theft has obvious advantages. But take for instance a small SaaS running on a cloud PaaS (e.g., AWS, GCP, etc.). What is the likelihood of improper disposal of a hard disk? And then what is the likelihood this improperly disposed of hard disk survives the process of removal / improper disposal to then be found by someone nefarious? And then, what is the likelihood that that particular drive was in a volume that contained anything sensitive?
Then what is the cost / overhead / complexity / other cons of adding encryption at rest? Cyber budgets often go crazy; I see so many clients buying tech based on marketing hype, or on the security team's lust for cool tech, rather than on what reduces the most risk for the dollars available.
Last time I implemented encryption at rest (when I worked for a small cloud provider), it was as easy as adding an option when creating the disk.
The option triggered the implementation of a dm-crypt layer between the physical device and the upper storage layers. Crypto keys were stored in the storage system. Once revoked, the whole server was rendered useless (from a data thief's point of view).
We benchmarked the stuff a bit. There was indeed a loss, but since dm-crypt uses AES (with hardware acceleration) and we had multiple hundreds of thousands of IOPS per device, we did not care.
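For a rough feel of why the loss didn't matter (a userspace sketch only, not the in-kernel dm-crypt path; assumes Python's `cryptography` package with AES-NI available):

    import os
    import time
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key, nonce = AESGCM.generate_key(bit_length=256), os.urandom(12)
    buf = os.urandom(64 * 1024 * 1024)  # 64 MiB of random data

    start = time.perf_counter()
    AESGCM(key).encrypt(nonce, buf, None)
    elapsed = time.perf_counter() - start
    # With AES-NI this is typically on the order of a GB/s per core, which is
    # why the crypto overhead gets lost in the noise for most disk workloads.
    print(f"~{len(buf) / elapsed / 1e6:.0f} MB/s")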
While threat modeling, you talk about specific scenarios and specific threats. That does not mean other scenarios and threats don't exist. It just means they aren't the focus of that particular conversation.
>In web applications and cloud services drives could still be misplaced, stolen, or improperly disposed of.
This is explicitly called out in the article by the author, despite it not being part of the threat model the author is examining. And people are still bringing it up like some sort of gotcha.
See (again):
>This is not a comprehensive blog post covering every possible use case or threat model relating to encryption at rest.
> I’d say the author is being so restrictive in the scope of threats that it isn’t very useful.
Loss of control of the hard disks can manifest in many different ways in the real world, but from a cryptography and software development perspective, they are all flavors of the same underlying problem.
That's not being "restrictive", it's recognizing the common denominator.
The problem is that after that common denominator is recognized, the post implies that it is outside the threat model of "web applications and/or cloud services", when it is not.
It doesn't need in-depth discussion, and the way data is still highly exposed despite disk encryption is very important, but that implication is not great.
> Rewrite all software so that every program handles its own private keys? Yeah, that's a nightmare, not gonna happen.
PCI 4.0 disallows using Full Disk Encryption to fulfill its encryption at rest requirements, because it demands a finer-grained encryption authorization model. For applications subject to PCI 4.0 compliance, my clients have to encrypt beyond the application level, and down to the service account level used by the application (most of these applications use multiple service accounts for representing teams that use the applications), with different files encrypted for different authorization groups.
There are very few enterprise-scale solutions addressing this at the moment, and the stopgap I've seen is indeed application teams resorting to handling their own keys if the enterprise doesn't offer a finer-grained solution. The QSAs don't yet understand how seismic a shift this is, nor how inadequate the assumed authorization schema is for the many edge cases out there that will be unable to adopt the assumption that a single key for a single group is sufficient to model all use cases. It will be a little messy until folks sort through the edge cases coming down the pike.
You’re absolutely right! Full disk encryption (FDE) does indeed fall short of PCI 4.0 requirements because it transparently decrypts data for anything running on the OS or accessed by the user, which isn't sufficient for finer-grained control.
PCI 4.0 demands more granular encryption authorization models, where data is either encrypted at the file or column level, or access to plain data is restricted to more specific roles than just the OS user account. This is to reduce the risk of unauthorized access.
There are a few dev-centric data protection solutions that address these requirements seamlessly. These solutions allow you to store and encrypt any type of data with simple APIs while transparently managing keys, rotation, and granular access controls.
I am currently evaluating Piiano Vault (https://www.piiano.com/pii-data-privacy-vault) as one such solution for a new product that I am currently working on that would require a PCI compliant zero-data solution. Seems like its delivery model is very flexible - it can be fully self-hosted or consumed as a managed SaaS, providing robust data protection tailored to my specific needs.
For this threat model, the encryption should happen at the volume level, not the application level.
Which is also what the author writes:
>If you’re only interested in compliance requirements, you can probably just enable Full Disk Encryption and call it a day. Then, if your server’s hard drive grows legs and walks out of the data center, your users’ most sensitive data will remain confidential.
> Unfortunately, for the server-side encryption at rest use case, that’s basically all that Disk Encryption protects against.
(Something tells me that you did not read the article that you claim is so narrow-minded.)
It sounds to me like you read a different article than I wrote.
The point of my article was not "Encryption-At-Rest Is Bad" as you seem to have taken it to mean.
Rather, the point is that other techniques, when you sit down and actually think them through, do not provide any significant value over just phoning it in with Full Disk Encryption.
How you get from "software libraries that encrypt-at-rest routinely fail to provide value on top of full disk encryption" to "Scott says FDE is bad" is unclear.
Additionally: from a software perspective, the risks you all outlined are morally equivalent to "the hard drives grew legs and walked out", because they are the same risk; namely, loss of control of the actual hard drives.
The article in question is focused on threats when those drives are plugged in and the keys are being used to encrypt/decrypt data. As several others have pointed out already, I explicitly state this, repeatedly. I don't know how to make it more clear.
All the risk vectors you list are about physically being able to get the disk. The author states that "growing legs and walking out" (understood as people gaining physical access to the disk) is already mitigated by disk encryption.
So not only are you not adding anything to the article, you are actively trying to dismiss the idea that the author has thought through the cases you bring up (and calling them narrow-minded on false grounds).
I think we all would love to see a risk that is mitigated by encryption at rest and is not already being mitigated by disk encryption.
It also defends against cultural indifference to privacy violations. If you go to Reddit r/sysadmin, most SREs and DevOps folks generally do not care too much for challenging subpoenas and government data requests. If a three letter spook shows up to their data center and demands data, it's just another Tuesday for them. IT people are very different from software engineers who are more likely to protest online and offline about civil right violations and government overreach.
But the SRE/DevOps team would certainly have access to the decryption key. It seems a little weird to encrypt just to stop another team from doing their job. Whether data should be handed over to the govt or not should probably be a company decision and not the dev team's, anyway.
Most companies don't really think too hard about these decisions, they just go with the easiest route. If the upstream hardware and software enforce encryption by default, very few companies would go out of their way to specifically try to disable that functionality if doing so is very tricky.
Different levels of encryption at rest play different roles in your particular scenario.
If you use a cloud service and they have an encryption at rest feature that you enable, the default is for them to control the key. Or, for Azure, put a "customer controlled" key in Key Vault. But again, that's their service. In this scenario you're only protected against people physically in the data centers, not their DevOps folks. But that feature gives you a checkmark for SOC 2 or other regulations...
The problem protecting against DevOps folks at a cloud is:
1. You are burdened with putting a key somewhere outside of their cloud. If you do something at the filesystem level, like enabling BitLocker on a VM, you're going to experience pain during reboots, and it's not possible on a root volume if you're not given a console.
2. You can't do this on a cloud service like RDS. You'd have to do row-level encryption with your application doing the decryption/encryption. But your application has to have the key, and now you're back to #1. The VM with the key needs its root disk encrypted, or the drive where the key is stored encrypted. And again, you're not able to use a cloud service like EKS or App Service; you're stuck with VMs.
Generally, I tend to just stick with regulation requirements which protect against the physical hard disk.
This is a weird take. For one, handing over data because someone in power told you to is the exact opposite of power tripping. It's pretty much the definition of being disempowered.
But secondly, the idea that anyone who actually has a say in whether the data is turned over (the legal team, executives) gives a flying fork about how a random sysadmin in their data center feels about it is wildly off base. The data is going out or not according to legal's instructions. Either sysadmin Bob does it, or he takes a principled stand, gets fired, and Bob 2 takes care of it.
I'm a sysadmin and I have no idea what you're referring to. Yeah there are guys in my field who are happy to overreach in terms of data access, but I'm sure we can come up with some anecdotes about devs over-harvesting data on...virtually any modern online platform.
Most of my colleagues (at least those I've met) tend to be conscientious about data access and privacy. I host several E2EE services for friends/family because I believe in privacy and data sovereignty. But if my company's legal department says "give access to <these guys>," as a sibling commenter said, if I refuse then I'm fired and those guys will still get access. So I do what legal says and keep my job, TYVM.
> A hard disk is lost during transit, somehow? No data is stolen.
Arguably, customer data should not leave a secure location on a physical disk. If you want to move your server, back up the data, wipe the disk, move the server/disk/etc., and then put the data back on the disk once it is back in a secure location. The author mentions this in the article: define a secure boundary, and don't let the data exit that boundary unencrypted.
Never underestimate the bandwidth of a stationwagon loaded with tapes.
Arguably, your position is bogus. Why cause a bunch of extra copying around to move a server? "Back up" indeed - to what? Another disk? It's turtles all the way down.
> Never underestimate the bandwidth of a stationwagon loaded with tapes.
I don't. I know it has amazing bandwidth and it might be a lot faster than network transfer.
> Arguably, your position is bogus.
That's not nice.
> Why cause a bunch of extra copying around to move a server? "Back up" indeed - to what? Another disk? It's turtles all the way down.
Yes, to another disk, in the same data center or in another one. If the data is encrypted end-to-end, it might be better to transfer it over the network than to move it between data centers on physical disks, for security reasons. It's less likely that an adversary is able to execute an attack where they can pinpoint the exact moment the exact data they want is being backed up between data centers. On the other hand, organizing a heist of a few trucks filled with physical storage leaving the data center might be easier to pull off.
In these two scenarios we expect the encryption to be equally hard to crack for data at rest and data in transit, which is usually not true for performance reasons.
If the device is lost or stolen during physical transport, all it takes is for the attacker to gain control of the key, or to find a weakness in the firmware or encryption method. While those are stupidly improbable, the reality is that once you lose physical custody, the attacker has a lot more options.
That doesn't have anything to do with the customer's data and isn't helped by the suggestion of wiping the drives before transport. The data is encrypted no matter what happens to firmware; that is at best a way to compromise the server(s) after transport.
If you're using a secure disk encryption technology, and you manage to clear the keys from the TPM or overwrite the header containing the KDF salts and other metadata, that should render the device data unrecoverable.
Laptop theft is actually a far worse problem than data center theft. Well, not in terms of scope, but in terms of frequency at least; you can actually expect laptops to be stolen.
I think you misread the article. I found it to be a great, even excellent, discussion of WHY, and then an explanation of HOW in terms of the WHY.
In my experience, a lot of the motivating factors for large enterprises mandating encryption at rest aren't about specific security controls. They will often hand-wave in that direction, but as the OP says in their post, without being able to describe a coherent threat model.
Instead, a lot of motivating factors I've seen are about preventing various paths for "legitimate" data disclosure to third parties. For example, when encryption at rest is combined with additional requirements like "bring your own key", it means a subpoena or NSL needs to be served on the _first party who owns the data_ (as they need to provide the keys) and can't be served on just the cloud provider without the first party having at least visibility of it.
Salesforce has an add-on product called "Shield" that costs 30-50% of your overall license cost. It allows for encryption at rest at the field level (not all fields are supported, and it introduces query limitations). Companies purchase this add-on thinking it makes them more secure, but it does essentially nothing to protect your data from exfiltration. If Salesforce's raw, multi-tenant data stores are leaked, it seems unlikely that your company is going to be the one taking heat for it. The only reason to go through this trouble is to check the regulatory box. Also, it seems like Salesforce should be encrypting the entire disk at rest anyway. Instead they created this feature to charge a premium to those with regulatory requirements and to try to shift the liability.
Markup aside, your description of what Salesforce is offering is what the article is saying should be done. Encryption of the disk at rest doesn't do anything for data exfil situations; it protects against physical theft or improper disposal - only.
Sorry, still reading, so I haven't gotten there yet. How does Salesforce offering a "light" version of encryption at rest improve security? Or are you saying it's a better balance of performance / security, by only selectively encrypting specific data points?
Anyways, the improved security comes from the fact that even when the server itself is improperly accessed (maliciously or not), the data you aren't currently accessing remains encrypted.
With (just) full disk encryption, you aren't protected when the (running) server is accessed. All of the data can be exfiltrated in plaintext.
Gotcha... so basically encryption of disk at rest prevents someone from walking out with a drive...
Encryption "at rest" in the database prevents someone with server or direct db connection from pulling the data.
I had never really thought of those as two different vectors, but of course they are. Thanks for clarifying!
With Salesforce and how a lot of these companies manage their security model, I'm still confident that investing in securing unauthorized user access is still orders of magnitude more useful than putting time and effort into this vector.
>I'm still confident that investing in securing unauthorized user access is still orders of magnitude more useful than putting time and effort into this vector.
These are addressing two different scenarios, so they should be mitigated separately. In one case, you are mitigating against unauthorized access. In the other, you are mitigating the damage that can be done when someone has already gained unauthorized access (however that occurred). After all, the only system immune to unauthorized access is the one that never gets powered on.
"Defense in-depth" is thrown around a lot, but it really is important. I do agree though, when it comes to priority of implementation, I would start with protecting against unauthorized access first.
I don't disagree on a conceptual level, but on a regular basis I deal with companies completely lacking any real access model (users without MFA, blanket admin-level access, etc.) getting sold on this particular product and then spending 7 figures to adopt it.
It sounds like they are using/implementing something similar to SQL Server Always Encrypted[0]. This basically works by encrypting specific fields using a certificate that needs to be supplied by the connecting SQL client (application). An obvious limitation is that you can't use the fields for sorting in queries (ORDER BY), and unless deterministic encryption is enabled, you can't use them in filters (WHERE) either. The same applies to any T-SQL logic on the data fields; because the encrypted blob is opaque to SQL Server, it is decrypted client-side. There is no workaround, except pulling the data locally and sorting client-side.
> An obvious limitation is that you can't use the fields for sorting in queries (ORDER BY), and unless deterministic encryption is enabled, you can't use them in filters (WHERE) either. The same applies to any T-SQL logic on the data fields; because the encrypted blob is opaque to SQL Server, it is decrypted client-side. There is no workaround, except pulling the data locally and sorting client-side.
It sounds like it is in addition to full-disk encryption, not instead of it.
Encrypting each field with a distinct key that an attacker cannot glean by simply exfiltrating all the data on disk and/or all the data in RAM protects against online attacks in a way that full-disk encryption cannot.
The real question is: does Salesforce do this properly?
It’s certainly possible that there’s a valid oversight here, but Salesforce has a rather talented security team, and the company truly lives by “Trust is our #1 value”^1
I can’t speak for the implementation, but my guess is that it’s been very thoroughly vetted by both internal security and external pen tests. They wouldn’t market a high profile security feature without that.
SaaS vendors charging a big premium to locked-in customers that have compliance requirements is nothing new; it's basically a standard play in the rent-seeking startup model.
I'm a huge fan of MSSQL's transparent data encryption in situations where we have more than 1 client per database engine, and we want to separate the data physically.
I worked on a project where separate databases (a la Postgres) tied to separate clients weren't enough. Postgres can still read across the databases.
With TDE we tie the key to the individual clients meaning even if the engine messes up, since the connection isn't made with the right key, you still can't read the contents.
We still did encryption at rest as that comes for free these days, for reasons mentioned here in the comments.
I just wish Postgres would come with TDE. Paying for software is fine, but the cost of MSSQL is way more than $0. In fact, it's usually cheaper to set up a Postgres instance per client. That way, when the engine messes up, well, there is only that one client's data.
And I know a database is unlikely to mess up. I'm more likely the culprit, and as such I prefer to have my guardrails.
Funny, I always considered the impetus for Encryption at Rest to be the "left my laptop in the airport" scenario. Or, maybe, "Where'd that CD with all of the medical records go?"
Originally, I never felt that the EAR issue was that germane at data centers, since you're mostly protecting against someone backing a truck through the loading door and stealing trays of drives.
The disposal issue is valid; that was just something we were diligent about, using a secure disposal service.
Key management has always been an issue, since no one wanted to be the person who, when the machines spooled up after a glitch at 3am, had to be there to type in a password to open the volumes. Everything else is basically locking the file cabinet drawers and taping the key to the back of the cabinet.
Nowadays, it's different. It's more ubiquitous. Encrypting a laptop is a mouse click, and painless after that. Cloud providers have the infrastructure to manage it at their level.
I'm still not sure what the solution is for "self hosted" infrastructure. Hardware key modules are still pretty expensive. I haven't seen a writeup (not that I've looked at all recently) of how best to set up encryption on those 4 Dells my friend has racked across the country in Georgia (though even modern machines have some level of volume encryption, I think).
The author used the term "unknown unknowns". This is a variant of the way I talk about this state:
"you don't know what you don't know".
There are two audiences for this article then: the ones who know what they don't know (or know it all), and those like me who are ignorant-squared.
I found it extremely helpful for getting me to "know what I don't know". If you have no idea what "encryption at rest" is or why it is important, then this is very useful and helpful.
As the author clearly states it does not cover other things you don't know, like why and how to use full disk encryption.
That said, it would be helpful for us i² [i squared] folks to have some of the more basic terms explained. Although "encryption at rest" is somewhat understandable, it would be pleasant to have it explained. For example what are the other kinds of encryption that are not "at rest"?
There are a bunch of diligent amateurs out here, ones who know we don't know what we don't know and are attempting to build cryptographic things. We learn not to create our own implementations for example, but articles that specifically address us are good.
> That said, it would be helpful for us i² [i squared] folks to have some of the more basic terms explained. Although "encryption at rest" is somewhat understandable, it would be pleasant to have it explained.
That's helpful feedback, actually. What terms seemed opaque or misleading to you as you read it? I'm always happy to fix mistakes in blog posts to improve clarity.
> For example what are the other kinds?
I contrast "encryption at rest" with "encryption in transit" (i.e., TLS) and "end-to-end encryption" (E2EE for short; i.e., what Signal gives you).
As writing advice, it went from very understandable and approachable to stuff like:
"You can get this property by stapling HKDF onto your protocol (once for key derivation, again for commitment). See also: PASETO v3 and v4, or Version 2 of the AWS Encryption SDK.
It may be tempting to build a committing AEAD scheme out of, e.g., AES-CTR and HMAC, but take care that you don’t introduce canonicalization risks in your MAC."
I would almost suggest breaking stuff like this into two articles, one which is very technical and correct, and one that conveys the high-level message. The high-level one can link to the technically correct one whenever the urge would come to explain something more fully.
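To unpack the quoted jargon a little (a rough sketch of the general idea only, in Python with the `cryptography` package; this is not the article's, PASETO's, or the AWS SDK's exact construction): deriving separate encryption and commitment values from one key via HKDF lets the decryptor reject a wrong key outright, instead of a ciphertext quietly "working" under two different keys.

    import os
    import hmac
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def derive(key: bytes, purpose: bytes) -> bytes:
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=purpose).derive(key)

    def seal(key: bytes, plaintext: bytes) -> bytes:
        nonce = os.urandom(12)
        ct = AESGCM(derive(key, b"encryption")).encrypt(nonce, plaintext, None)
        return derive(key, b"commitment") + nonce + ct  # commitment rides along

    def unseal(key: bytes, blob: bytes) -> bytes:
        commitment, nonce, ct = blob[:32], blob[32:44], blob[44:]
        # Check the key commitment before touching the ciphertext.
        if not hmac.compare_digest(commitment, derive(key, b"commitment")):
            raise ValueError("wrong key: commitment mismatch")
        return AESGCM(derive(key, b"encryption")).decrypt(nonce, ct, None)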
Thanks for the reply. It is an interesting question, because the truth is that I don't know. At one point, very long ago (VLG), I knew nothing about computers at all but would buy "Byte" magazine (I told you VLG) and read it. It was complete gibberish. Then after some time (six months?) I could suddenly understand.
I actually did learn quite a bit from your article. The use of tech terms like HKDF & AEAD was helpful rather than a hindrance. For example, the phrase "stapling HKDF onto your protocol" is surprisingly helpful. I looked up HKDF, and the use of "stapling" gives me a concept of how it is used. So good.
Going back over it, I believe the real problem is that your previous post, "Lucid Multi-Key Deputies Require Commitment", is required reading (or knowledge) for this post. Once I read that, much of this was easier. You hid that reference in the "why should you believe me" section, and yet to my reading this article builds on that one. It defines the terms and provides the context that is missing.
So concrete suggestions (easy to come up with after the fact!):
- ask yourself if this is a continuation of a thought, topic, etc and give refs if so.
- in addition to "why you should believe" maybe add a section "for i²" readers :-)
Cool things are ones where you never see the world the same. Just to reiterate, I don't see the world of cryptography the same after reading your article, despite the quibbles, so thanks.
You're the author? Cool, I liked the article a lot. You do use a lot of terms that won't make sense to non-crypto nerds, like AEAD, IND-CCA secure, etc. A lot of posters here are fixating too much on the "stolen hard disk" picture which I think the article addressed by declaring it out of scope. So the real points aren't getting through.
> A lot of posters here are fixating too much on the "stolen hard disk" picture which I think the article addressed by declaring it out of scope. So the real points aren't getting through.
This is a crypto nerd blog post though. The whole point is to talk about cryptographic library design!
As you identify in the second paragraph, it is not true to say that “you don’t know what you don’t know.” There are indeed things that you do know that you don’t know. I know I don’t know how to fly a plane.
Basically I am complaining that you chose to rephrase a perfectly good description of a problem (with a lot of history) into an incorrect statement.
So it would be more accurate to say that “there are some things that I don’t know that I don’t know.” However, “unknown unknowns” and “known unknowns” are more succinct. And, thanks to that idiot Rumsfeld, more commonly understood.
I get what you are saying about it being more succinct. However, the next statement, "it is not true to say ...", confuses me. In fact there are many cases where I did not know that I did not know something important.
I was cross-country skiing across a frozen lake. The ice was 24 inches thick. I knew it was that thick. I did not know that where a lake narrows there can be current that prevents ice from forming. Moreover, I was unaware that there were major issues about frozen lakes that I was unaware of. I was not aware there was any risk, and therefore came much too close to falling through the ice. Those little "crack crack crack" sounds stick with me. There have been other examples when I was unaware that I was ignorant of life-threatening issues.
My use of that phrase is perhaps particular to the fact that most of my realization of "unknown unknowns" has been when they are life threatening.
I think the reason is simple: "encryption at rest" == "it is going to be encrypted in backups".
People asking about "encryption at rest" are really asking if backups of your web application data are encrypted.
Earlier, I think, it was quite a plague: unencrypted backup files leaking out because someone exposed them on some open FTP server to "quick and dirty" copy backups to a new environment or some test environment, and then forgot to close it down or remove the files.
Another threat would be developers exposing the database server directly to the internet because someone from marketing wants to connect the "new shiny super business intelligence" tool and the developers know no better than to "allow all" on the firewall. Then someone might steal raw DB files but might not really have access to the web application and its encryption keys.
For the reasons mentioned by author I can see how it seems like security theater. But I think my reasons are quite valid and on topic.
The thing that's security theater isn't encrypting at rest in general.
The thing that's security theater is encrypting insecurely, or failing to authenticate your access patterns, such that an attacker with privileged access to your database hardware can realistically get all the plaintext records they want.
>> The thing that's security theater isn't encrypting at rest in general
> The thing that's security theater is encrypting insecurely
Security theater should be defined as:
Doing things that outwardly appear to improve security but have de minimis or less effect on actual security.
The 93-section questionnaire from bigco's IT department is security theater. Filling it out does zero to improve security for bigco or myco or my users.
IDK, I have multiple times seen significant practical security improvements as a direct consequence of some "93-section questionnaire", because the very first section had a few questions like "Are you doing this simple, well-known best-practice thing?", which they were not, because it took some time, effort and/or money and they just didn't care.
But once the questionnaire mattered, they started doing it just so they could legally answer "yes" to that question. Things like finally changing the default admin passwords on that service they installed a year ago, and testing backup recovery to find out that it actually can't be done due to a bug in the backup script skipping some key data.
> Doing things that outwardly appear to improve security but have de minimis or less effect on actual security.
Right. And that's exactly the situation the article describes.
The accusation of "security theater" was only levied when IT departments reached for the "full disk encryption" potion to mitigate the ailment of "attacker has active, online access to our database via SQL injection", when that's not at all what it's designed to prevent.
They can insist that they're "encrypting their database", but does it actually matter for the threats they're worried about? No. Thus, security theater.
The same is true of insecure client-side encryption.
I wonder if OP works in healthcare. After HIPAA passed, encryption at rest was the buzzword of the decade, as it was one of the primary requirements of HIPAA.
The problem, of course, being that most healthcare breaches are of live applications whose data isn't at rest, so all the data was being stolen anyway.
From 2012-2021 I worked in health tech and was on many calls with large customers whose security questionnaires asked whether we were actually storing their data encrypted at rest. We even had to get audited by Aetna to validate encryption at rest (amongst other things). To me this seemed like such a joke of a requirement, because all our data was in AWS and breaches were far more likely from other avenues.
So to me this reads as a jaded SWE or CISSP who has dealt with how much attention this one attack vector is paid, but ultimately it is kind of a given now in modern cloud infra.
When I joined Amazon, the team I was hired on was called AWS Crypto Tools (which owned the AWS Encryption SDK, among other developer tools), while another team was called Transport Libraries (which owned S2N).
When I left in 2023, they had started adopting more of the "Encryption At Rest" lingo for what was previously called Crypto Tools. I don't know if they landed on different verbiage since then.
> After HIPAA passed, encryption at rest was the buzzword of the decade, as it was one of the primary requirements of HIPAA.
Andy Jassy decided that it was time to Return To Office.
I was hired in 2019 as a fully remote employee. This means my options in 2023 were a) move to Seattle or b) walk. I chose to walk.
The leadership of the Cryptography org fought tooth and nail to get an exception for me, but were unable to do so. I still hold everyone in AWS Cryptography in high regard.
This might sound like a stupid question, but I'll ask anyway:
What problem does encryption at rest solve for something which is never intended to actually rest?
For example, a MySQL database powering a web application is expected to be alive and responding to requests 24/7. It's never really intended to be at rest.
So what benefit does encryption at rest bring? Won't a hacker be attempting to take data when it's online (and therefore not resting)?
Your RAID reports an issue with one of the disks. You disable it and have the DC staff swap it. What happens to the disk? It might be resold. It might be put into the next server. It might land in an office drawer with a label "to be wiped".
> What problem does encryption at rest solve for something which is never intended to actually rest?
That's the point of the article, luckily enough. "Disk Encryption is important for disk disposal and mitigating hardware theft, not preventing data leakage to online attackers."
> So what benefit does encryption at rest bring? Won't a hacker be attempting to take data when it's online (and therefore not resting)?
"At rest" is used to contrast with "in flight", where the data is being transferred between computers. So data "in flight" is protected by HTTPS etc, once it has been transferred it is "at rest" on the destination computer (even if that computer is still online).
Intention becomes victim to reality easily, often with no input.
I haven't read the post yet, but I feel like expecting a perfectly well-defined model is kind of in bad faith. Defense in depth is established, FDE is one tool among many.
By focusing on the hacker we forget about the person who may get the equipment downstream; procedures/processes fail, and so on.
It's preparing for the unknowns. Sounds like paranoia? That's the job! Defending against human nature - malice, forgetfulness, etc.
I guess I'll close with this: you're the only one who can make your security assessment. What's important, what's at risk, and so on. It's trade-offs all the way down.
I should clarify that I see the value of encryption at rest for something like an employee laptop, which could be left at a bar (while powered off) by accident.
I just don't get the value of it for always online servers.
You can remove storage devices from online servers without interruption. Said devices will contain data that could be "lost" that way. Hence: encryption at rest.
An intruder gains access to an API box and could try to read sensitive data from a DB. But the interesting fields are encrypted, and the key is somewhere in RAM. Not impossible to exfiltrate, but it takes much more time and skill, so it can't be made into an unattended malware payload. Also, a key for one customer won't give access to the data of other customers, even if the common database access credentials are obtained.
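That per-customer separation can be as simple as deriving subkeys from one master secret (a sketch assuming Python's `cryptography` package; key storage and rotation details elided):

    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    MASTER_KEY = b"\x00" * 32  # in reality: from an HSM/KMS, never hardcoded

    def customer_key(customer_id: str) -> bytes:
        # Each customer gets an independent subkey: exfiltrating one from RAM
        # reveals nothing about any other customer's data.
        return HKDF(algorithm=hashes.SHA256(), length=32, salt=None,
                    info=b"customer-data:" + customer_id.encode()).derive(MASTER_KEY)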
I've written a whitepaper about our encryption at rest at work, which includes some consideration of the threats involved.
The first important part is: Encryption at rest protects higher levels of the stack from access by lower levels of the stack. If your attack is working at an application level - e.g. you have a database connection - encryption at rest is no tool to deal with this. Encryption at rest is more about protecting the database from an attacker with physical access.
The second consideration is: there will always be a tradeoff between availability and security when dealing with encryption at rest. Manually decrypting a system is very secure, but if that system goes offline at 3am, it'll be offline until the device is decrypted. Automatically decrypting a system using Clevis/Tang, TPM, BitLocker and such gives you more availability, but could make it possible for an attacker to access your data if they have sufficient control.
And that's the third consideration: if an attacker has sufficient control, they can defeat automated decryption of encryption at rest, and they may have ways to start attacking manual decryption as well. For example, they might be able to start looking at memory contents and disk writes and do some differential cryptanalysis against the encrypted data.
But with all of these three together, you arrive at the goal and security level we have formulated for our encryption at rest:
Our encryption at rest is supposed to defend us against employees of our hosters getting access to one or two of our virtual or physical storage devices. If they have access to one or two storage devices, they must not be able to access customer data on the devices.
Naturally, the question is: But what happens if they have more drives?
Well, the simple answer from the whitepaper is: That's a problem for the lawyers.
Handling 1-2 of our drives is entirely plausible as daily business. Swapping drives via remote hands, or decommissioning dedicated servers with 2 drives at a specific hoster, happens a lot; they handle 1-2 drives then, and our goal ensures they cannot gain access to customer data.
However, if a datacenter tech starts pulling 3 or more drives and starts analyzing the data on them together, that's outside of normal operational procedures, and we can start considering it an attack and start suing them. Or they are being directed by law enforcement. Both are issues for the legal teams, though.
At least that's our view. Encryption at rest works in a very different set of circumstances and mindset than other security topics in a software stack.
My thought here is that not all of the data in the database is being accessed at the same time, so the un-accessed data is "at rest". Is that correct, or am I barking up the wrong tree?
Assuming full-disk encryption is in use (LUKS, TrueCrypt/VeraCrypt, BitLocker, etc.), there is enough information held in RAM to decrypt the entire disk. If the attacker gains access to a privileged user, or at least to a user allowed to read the file system (such as the user running the database), they can exfiltrate the unencrypted contents of the disk, regardless of what the DB software is actively accessing.
Encryption at rest is nice for when a device has to get retired. Without the key, the drive is indistinguishable from random noise. No longer do you need to run DBAN for hours, put the drive through a degausser, or drill holes in it. No worrying about plaintext data hiding due to relocated sectors or wear-leveling. Just purge the keys and you’re done. Then the drive even has a chance at getting responsibly reused.
Yeah, though SSDs make that even easier, with the aptly named SECURE ERASE command. Modern SSDs encrypt the contents of the drive at rest _anyway_ (transparently, using a key that's baked into the hardware), as encryption algorithms are very good at removing repeated patterns that might degrade the flash over time.
> The first question to answer when data is being encrypted is, “How are the keys being managed?” This is a very deep rabbit hole of complexity, but one good answer for a centralized service is, “Cloud-based key management service with audit logging”; i.e. AWS KMS, Google CloudKMS, etc.
This is of course the beef. What's the best practice in managing user data keys so that data is available only when there's an authenticated user around? There are ways to derive keys from the secret exchange involved in user authentication.
> What's the best practice in managing user data keys so that data is available only when there's an authenticated user around?
What does it mean for an authenticated user to be "around"?
If you want a human to manually approve the decryption operations of the machine, but it can still store/encrypt new records, you can use HPKE so that only the person possessing the corresponding secret key can decipher the data.
At least, you can until a quantum computer is built.
A working definition for some apps could be: The user's data should not be available to the system if there isn't an active user session, such that the user's privacy interests are cryptographically protected in event of a breach or data leak occurring when the user is not actively using the system.
I wasn't thinking of manual approval of any cryptographic steps. Just that when you log in to work on your data stored in the system, the system can only then decrypt the data, and when you log out, the system forgets the keys until next time.
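One common shape for that (a sketch only, assuming Python's `cryptography` package; the password and KDF parameters are illustrative): wrap each user's data key under a key derived from their login secret, so the data key only exists in memory during a session.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.hazmat.primitives.kdf.scrypt import Scrypt

    def kek_from_password(password: bytes, salt: bytes) -> bytes:
        # Derive a key-encryption key from the login secret at each login.
        return Scrypt(salt=salt, length=32, n=2**14, r=8, p=1).derive(password)

    # Enrollment: generate the user's data key; store only the wrapped form.
    salt, data_key = os.urandom(16), AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    wrapped = AESGCM(kek_from_password(b"hunter2", salt)).encrypt(nonce, data_key, None)

    # Login: unwrap into memory for the session; logout: drop the reference.
    session_key = AESGCM(kek_from_password(b"hunter2", salt)).decrypt(nonce, wrapped, None)
    del session_key  # "the system forgets the keys until next time"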
Okay, this sounds vaguely like a problem that may be solved by "HPKE where the secret key is reconstructed from a threshold secret sharing scheme" (>=2 of N shares needed, 1 held by the service and 1 held by the employee's hardware device, where 1 additional share is held in cold storage for break-glass reasons).
I would need to actually sit down and walk through the architecture, threat model, etc. to recommend anything specific. I'm not going to do that on a message board comment, because I probably am missing something.
Full disk encryption or similarly transparent data encryption at the database allows you to continue to use the database as a full database.
Decrypting on the "client" (app server) means you can't really use its native query language (SQL) effectively on the encrypted columns.
Not sure what the state of the art is in searchable encryption for DB indexes, but just trying to do stuff that requires a scan becomes untenable, due to having to read and decrypt on the client to find or aggregate it.
The security sandbox becomes app server memory instead of DB server memory, preventing effective use of the locality on the DB server. I didn't see the article address this.
It can make sense to encrypt specific sensitive columns that are not used for searches or aggregations later, but in many systems the reason you have discrete data columns is to query them later, not just to retrieve a single record to decrypt and view on screen (vs. just storing a single document and encrypting/decrypting it).
Disk encryption is easy to use without reducing the functionality of the DB; client encryption specifically and purposely handicaps the functionality of the DB, so its use case is very narrow IMO.
I tend to treat both the app server and db ram as unencrypted so they require good access controls to use them (don't let Bob run sql queries against all the data unless he is authorized to do so).
> Not sure what the state of the art is in searchable encryption for DB indexes, but just trying to do stuff that requires a scan becomes untenable, due to having to read and decrypt on the client to find or aggregate it.
There are a lot of different approaches, but the one CipherSweet uses is actually simple.
First, take the HMAC() of the plaintext (or of some pre-determined transformation of the plaintext), with a static key.
Now, throw away most of it, except a few bits. Store those.
Later, when you want to query your database, perform the same operation on your query.
One of two things will happen:
1. Despite most of the bits being discarded, you will find your plaintext.
2. With overwhelming probability, you will also find some false positives. This will be significantly less than a full table scan (O(log N) vs O(N)). Your library needs to filter those out.
This simple abstraction gives you k-anonymity. The only difficulty is, you need to know how many bits to keep. This is not trivial and requires knowing the shape of your data.
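A minimal sketch of that flow in Python (illustrative only, not CipherSweet's actual wire format; key and bit count are placeholders):

    import hashlib
    import hmac

    INDEX_KEY = b"32-byte-static-key-for-indexing!"  # distinct from the AEAD key
    BITS = 16  # how many bits to keep; tuning this requires knowing your data

    def blind_index(plaintext: str) -> bytes:
        digest = hmac.new(INDEX_KEY, plaintext.lower().encode(),
                          hashlib.sha256).digest()
        return digest[:BITS // 8]  # truncate: many plaintexts share each value

    # Write path: store blind_index(email) in an indexed column next to the
    # ciphertext. Read path: SELECT ... WHERE email_idx = blind_index(query),
    # then decrypt the handful of candidate rows and discard false positives.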
I was reading your reply and started thinking, this sounds a lot like what I did to do encrypted search with Bloom filters and indexes. Then I clicked on the first link and found the exact website I used when researching and building our encrypted search implementation for a healthcare startup. It worked fabulously well, but it definitely requires a huge amount of insight into your data (and fine-tuning if your data scales beyond your initial assumptions).
That's awesome that AWS has now rolled it into their SDK. I had to custom build it for our Node.JS implementation running w/ AWS's KMS infrastructure.
Are you the author of the paragonie website? The coincidence was startling. If so, I greatly thank you for the resource.
Edit
After going back and re-reading the blog post, looks like you are the author. Again thank you, you were super helpful .
One way I’ve seen (eg, searching by zip code) is to encrypt all possible buckets you would search by (prefixes/suffixes) using a different (search) key, then encrypting the relationship foreign keys. Then the application searches for the encrypted values and decrypts the foreign keys.
This strategy provides only obfuscation, not encryption. If the same plaintext always "encrypts" to the same ciphertext, it becomes possible (sometimes even trivial) for an attacker with access to large amounts of related information (such as the entire database) to use correlations and inference to effectively decipher it.
One-time pads effectively save you here. The application knows zipcode 1234 == "AWER", but the database doesn't, and there isn't any way to derive that without outside information. The technique is a pseudo-anonymization technique, not encryption.
Assuming you want "find all users in zipcode 12345" to be a supported query, it does not matter what encryption scheme you use, you will have one of these two problems:
On the one hand, you can require that 12345 always maps to AWERQ, in which case an attacker can use frequency analysis, metadata chaining, etc. to determine with some confidence that AWERQ = 12345. Calling this "pseudonymization" is definitely more accurate than calling it "encryption", but you might as well just use a one-way hash function instead. It doesn't do anything against determined attackers with prolonged or broad exposure to the data; I don't see the value except perhaps for compliance with poorly thought out or outdated regulations.
On the other hand, you can have 12345 map to a different string every time, but that means you need a different key/salt/IV/nonce for every row or cell, defeating indexing and aggregation, so all queries become full table scans. This significantly frustrates an attacker, but also significantly frustrates legitimate operations.
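A toy illustration of the first horn (the ciphertexts are made up for the example):

```php
<?php
// Toy illustration: deterministic "encryption" of a low-cardinality
// column leaks frequencies. The ciphertexts here are invented.
$rows = ['AWERQ', 'AWERQ', 'ZK2P9', 'AWERQ', 'ZK2P9', 'AWERQ'];

// The attacker never decrypts anything; they just count.
$freq = array_count_values($rows);
arsort($freq);
print_r($freq); // AWERQ => 4, ZK2P9 => 2

// Cross-reference with public data (e.g., the most common zip codes
// in the service's user base) and 'AWERQ' => 12345 with high confidence.
```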
With something like zip codes, so long as all the data is encrypted, there's very little chance someone can work out what that zipcode is (and if there isn't any information in the column name, even less). The only way someone could determine that it is a zipcode is by looking at commonality with known decrypted data. Even if they were to determine that it was a zipcode, they would only know which zipcode it was for the users they had decrypted. In other words, the blast radius is very small and compartmentalized, while still allowing searches.
If an attacker has a full data dump, they probably have a recent/frequent queries dump too. Column name obfuscation won't go very far.
Zipcodes are short. If not given extra padding, their ciphertexts will still be short.
Zipcodes are also low cardinality. Unless you use multiple salts/nonces/IVs/keys, the frequency of ciphertexts will match the frequency of plaintexts.
In many situations, a prepared attacker will be able to insert their own data beforehand, allowing them to perform a chosen-plaintext attack and potentially decipher much of the data. The best protection against this is to not reuse salts/nonces/IVs/keys and thus again defeat performant searches.
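To make the chosen-plaintext point concrete, here's a toy sketch; `detEncrypt()` is a stand-in for any deterministic scheme the attacker can exercise through the application (sign-ups, profile edits) and later observe in a dump:

```php
<?php
// Toy chosen-plaintext dictionary attack against deterministic encryption.
// detEncrypt() is a stand-in: the attacker never learns the key, but can
// submit chosen plaintexts through the app and see the stored ciphertexts.
function detEncrypt(string $zip): string
{
    static $key = 'server-side-secret';
    return bin2hex(hash_hmac('sha256', $zip, $key, true));
}

// Enumerate all 100,000 possible zip codes and record the ciphertexts.
$dictionary = [];
foreach (range(0, 99999) as $n) {
    $zip = str_pad((string) $n, 5, '0', STR_PAD_LEFT);
    $dictionary[detEncrypt($zip)] = $zip;
}

// Every deterministic zip ciphertext in a stolen dump now "decrypts"
// with a single array lookup: $dictionary[$ciphertext].
```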
None of this is necessarily to say it's not worth it, but rather to underscore the article's point: know your threat model.
I once reviewed a PHP library (I don't remember which one, but it was extremely popular) using `mb_strlen($string)` to get the number of bytes to encrypt/decrypt. It was just waiting for someone to come along and en/decrypt text in a non-English language.
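For anyone who hasn't been bitten by this, a quick demonstration of the divergence (assuming the usual UTF-8 internal encoding):

```php
<?php
// The bug in a nutshell: mb_strlen() counts characters, not bytes,
// so the two diverge on any multibyte input.
$plaintext = "héllo"; // 5 characters, 6 bytes in UTF-8

var_dump(mb_strlen($plaintext));          // int(5) -- wrong for crypto
var_dump(strlen($plaintext));             // int(6) -- actual byte length
var_dump(mb_strlen($plaintext, '8bit'));  // int(6) -- also byte length

// Slicing ciphertext/plaintext buffers with the character count silently
// truncates data the moment a user writes in a non-ASCII language.
```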
An issue that wasn’t mentioned: protecting an encryption-at-rest key is not so easy, and the solutions I’ve heard of that are easy to deploy are barely effective against an attacker with physical access to the datacenter.
> Cryptography is a tool for turning a whole swathe of problems into key management problems. Key management problems are way harder than (virtually all) cryptographers think.
> Cryptography is a tool for turning a whole swathe of problems into key management problems. Key management problems are way harder than (virtually all) cryptographers think.
As someone who has issues remembering where they left their house keys at times, I want this on a bloody coffee cup/t-shirt.
Further, every practicing cryptographer everywhere should be forced to keep one on their person at all times, and be required to present it to do normal tasks, as a reminder of the suffering their work creates for those implementing their cryptosystems. They probably won't care. But it'd make me feel better if they were as annoyed in the everyday as I am wrangling this kind of nonsense.
I think it's very valuable for people who might not already know this to see an example of why it's true, as the post does:
> "What’s happening here is simple: The web application has the ability to decrypt different records encrypted with different keys. If you pass records that were encrypted for Alice to the application to decrypt it for Bob, and you’re not authenticating your access patterns, Bob can read Alice’s data by performing this attack."
An interesting question here is whether or not there's a god key that allows the administrator to decrypt all the data even if they can't authenticate as the user (or whether they just have copies of all keys). Searching HN for 'lavabit' turns up some related results.
In a way, when people talk about encryption, what they really mean is authorization to access data.
Whether or not the data are encrypted using some whiz-bang algorithm is irrelevant, as long as some intermediary ensures that the data are only accessible by the intended clients.
Sometimes I wonder if all the focus on encryption is actually wasting cycles that could be spent instead making sure the authorization model is bulletproof.
For instance if a DBMS could ensure that only client A can access client A's data then does it matter if the data are stored encrypted in the DB?
You might say: well, if they aren't encrypted, then anyone with root can just read the data directly. But it may be the case that anyone with root can access the data regardless of whether it's encrypted, because they can just pull it from the memory space of the DB engine.
There are a lot of considerations, but it does seem like people get caught up in the "how" of encryption, because of all the fancy maths and cool-sounding algorithms, rather than focusing on the "what" they're actually trying to accomplish, which is usually "prevent clients from accessing data they shouldn't be able to access".
Encryption at Rest makes it easy to reason about data hygiene, since access to the data is gated through access to the keys.
You want to delete data? Toss the keys. You want to confidentially process data? Make the keys available to a TEE or such. You want to prevent yourself from having constant access to the data? Let the client provide the keys. And of course, you want to protect the keys? Use an HSM.
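A minimal sketch of the "toss the keys" pattern (crypto-shredding), assuming a libsodium-style secretbox and a master key that would really live in a KMS/HSM:

```php
<?php
// Sketch of crypto-shredding: one data key per user, wrapped by a master
// key. In practice the master key lives in a KMS/HSM; it's generated
// inline here only to keep the sketch self-contained.
$masterKey = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);

// Provision: generate and wrap a per-user data key.
$dataKey = random_bytes(SODIUM_CRYPTO_SECRETBOX_KEYBYTES);
$nonce   = random_bytes(SODIUM_CRYPTO_SECRETBOX_NONCEBYTES);
$wrapped = $nonce . sodium_crypto_secretbox($dataKey, $nonce, $masterKey);
// Store $wrapped in a key table; encrypt the user's rows under $dataKey.

// Unwrap when the data needs to be read:
$dataKey = sodium_crypto_secretbox_open(
    substr($wrapped, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES),
    substr($wrapped, 0, SODIUM_CRYPTO_SECRETBOX_NONCEBYTES),
    $masterKey
);

// "Delete" the user: drop $wrapped from the key table. Every ciphertext
// encrypted under $dataKey is now noise, including copies in old backups.
```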
In the context of cloud, it's cargo cult security. Something somewhere says "you must have encryption at rest." I find it very hard to believe Amazon's or Google's servers do not already have full disk encryption. So what are you protecting again? If you're storing the decryption keys in KMS in the same cloud, you're not hiding anything from the cloud provider. The only rationale I can think of for doing this is defense-in-depth, but seeing how many companies struggle to even get IAM right I doubt this would help much.
Security compliance and real world security are a universe apart. You have encryption at rest with mandated AES-GCM/SHA512? Cool story bro, some teenagers just broke into your network with a bit of social engineering and a 6 year old CVE.
> I find it very hard to believe Amazon's or Google's servers do not already have full disk encryption.
I am confident that they do. Even better, they can be configured to use your KMS key rather than the service key, and you can configure KMS to use external key stores (i.e., an HSM in your datacenter outside of AWS's control, that you could theoretically pull the plug on at any time).
> So I'm not so sure what's the point of encryption at rest in AWS except just to tick off a compliance and regulatory checklist.
> The private key is with them anyway; just don't encrypt and save a few milliwatts of power.
"Them" is Amazon, a company with over 1 million employees, last I checked.
It's perfectly reasonable to trust the KMS team to keep your keys secure, even if you don't trust the RDS team to never try to look at your data.
I know it's tempting to think of all of AWS as a sort of "Dave" who wears multiple hats, but we're talking about a large company. Protecting against other parts of the same company is still a worthwhile and meaningful security control.
> It's perfectly reasonable to trust the KMS team to keep your keys secure, even if you don't trust the RDS team to never try to look at your data.
If the database is live, then the data can be decrypted, and who knows where it ends up. Encryption at rest solves only the threat scenario where the RDS team has access to the database storage layer. It doesn't do anything to mitigate threats after the data has been read from storage.
As a customer, I neither know nor care how they have teamed up internally. Not my problem.
From my perspective, I don't have the secret keys. Only AWS does, and they can decrypt whatever and whenever they want, maybe because they have a warrant or some three-letter agency has them do it.
Real life case study:
We armed our servers (VPSes) to the teeth. Then an attacker gained access to our hosting provider's administration panel. They used that to download our hard disk content.
> Then, if your server’s hard drive grows legs and walks out of the data center, your users’ most sensitive data will remain confidential.
> Unfortunately, for the server-side encryption at rest use case, that’s basically all that Disk Encryption protects against.
If you aren't able to self-host, then encryption at rest is a real use case and the next best thing to actually controlling your data. That being said, obviously self-hosting with FDE at rest is best.
Or you can end up like the people who lost their data [1][2].
> Or you can end up like the people who lost their data [1]
I don't see how encryption at rest could've changed the outcome.
In the article, the cloud provider, which has full control over the VMs, was compromised. The VMs were hosting various Bitcoin services, which needed continuous wallet access for operation. So, I'd say there was no data at rest to be secured. The attackers could theoretically patch the application to make malicious transactions or just extract the wallet from RAM.
Also, the article suggests that the attackers were getting inside the running VMs rather than accessing VM storage directly.
Encryption at rest is something your cloud provider does to pass SOC audits. End of sentence. If you have stronger security concerns, then you need to turn to other tools in your toolbox.
Encryption at rest has several valid use-cases beyond SOC audits. End of sentence.
Edit: Since this has gotten some negative votes, I'll happily expand.
The two primary examples of FDE being real-world useful (i.e. not just checking boxes) are loss of physical control of a device and cryptographic erasure (at device end-of-life).
Neither of these use cases is relevant to the threat model the article is discussing, but it's ridiculous to say that FDE is only for SOC.
This is why I said "your cloud provider". If you're handling your own physical devices, yes, YMMV. (For example, FDE on company laptops should obviously be non-negotiable). But expecting it to do anything else is just magical thinking.
Cloud providers don't store stuff in a literal cloud, so it follows that they too must worry about their own physical devices.
If you agree that FDE is good for physical access to lost/stolen devices and cryptographic erasure, I'm not sure why you don't think that applies to hardware in a data center, which is just as capable of being lost/stolen and also needs to be securely disposed of.
> But expecting it to do anything else is just magical thinking.
It certainly does more than just check boxes for SOC, which was my entire point.
The main threat to encrypting health information is actually dishonest cryptographers.
The quality of healthcare privacy reasoning is like this: instead of searching for user Michael Knight in a database of cancer patients, you hash the name, use a Bloom filter to determine whether the hash exists in the protected data set or not, and then tell your regulators and the public that the foreign-jurisdiction research assistants hired from Fiverr don't have access to your health information, because everything is encrypted and we only compare hashes. It's like that "sudo make me a sandwich" cartoon but, "cryptographically disclose your health information."
> The main threat to encrypting health information is actually dishonest cryptographers.
Wow, okay, you have me hooked.
> instead of searching for user Michael Knight in a database of cancer patients, you hash the name, use a bloom filter to determine if the hash exists in the protected data set or not
The protocol you loosely described here could be either totally fine or horrendously broken depending on the implementation details and architecture of the application. Not to mention the universal cryptography concerns; i.e., key management.
> and then tell your regulators and the public that the foreign jurisdiction research assistants hired from fiverr don't have access to your health information because everything is encrypted and we only compare hashes.
You've said "hashes" twice now, so I have to ask:
1. Are they only hashing, or are they using a keyed hash (e.g., HMAC)?
2. Are they doing something ridiculous like using MD5?
If you're familiar with cryptography at all, passing around MD5(data) is vastly different from truncating HMAC-SHA384(data, staticKey) to a few bits and protecting staticKey with an HSM.
Without more detail, I can't tell if your assertion that cryptographers are being dishonest is warranted. It's sort of like simplifying RSA encryption to "multiplying integers". Yes, that's what's happening on some level, but some essential information was omitted.
In your HMAC example you have a separate key to manage, whereas in my example the implication is that it's SHA-256 or a variant, which provides a layer of obfuscation in the lookup and may even implement a "standard" to fool regulators. Most regulations and standards say what tools to use (key bits, algos, etc.), not that the implementations need to be approved by a professional.
My example is that the scheme uses tools in a way that is meaningless, because I can just take a phone book, hash the names, and check whether they are in the cancer database as though it were a cleartext database. To say the data subject's data is private in this case because it is hashed (or often, inaccurately, "encrypted") is to mislead the people who make decisions about it. I'm saying the designers of such systems represent themselves as cryptography and security experts while building weakened systems like this to mislead regulators on behalf of their bosses and users who just want the cleartext data.
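The phone-book attack is trivially scriptable, which is the point. A toy sketch:

```php
<?php
// The "phone book" attack on unkeyed hashes of low-entropy identifiers:
// without a key, anyone can recompute the same hashes from public data.
$cancerDb = [hash('sha256', 'Michael Knight') => true /* , ... */];

$phoneBook = ['Michael Knight', 'Sarah Connor', 'John Smith'];
foreach ($phoneBook as $name) {
    if (isset($cancerDb[hash('sha256', $name)])) {
        echo "$name is in the cancer database\n";
    }
}
// With an HMAC and a properly guarded key, the attacker can't compute
// the lookup values, and this entire attack disappears.
```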
Most protocols are a shell game: if you don't know where the root-of-trust secret is managed, you're the sucker at the table. My experience has been that working cryptographers (protocol designers) in government, health, and financial institutions are, in general, a very refined class of bullshitters who pound the table whenever you ask them about details, and one should not be intimidated by their theatrics. Hence, in PHI management, dishonest cryptographers working on behalf of anti-privacy interests are the main threat.
Thanks for clarifying, and confirming at least one of my suspicions.
All I can say here is, "Yikes."
If you (or, well, anyone) ever need an honest third party to audit a cryptography design--whether it's because you suspect bullshit or just want assurance that it's actually a good design--talk to the Crypto team at Trail of Bits.
There is a book about that, "Malicious Cryptography" by Adam Young and Moti Yung. Not about dishonest cryptographers per se, but about various sneaky tricks that could be used in crypto systems.
Anyway I'm sure you remember the notorious Dual EC DRBG.
Encryption at rest is a misleading term. It is often used to describe an insecure setup consisting of disk encryption only, rather than encryption of the individual values in the data store.
> But who is stealing data off of servers by taking the server?
Away from pure bare metal, in the presence of bugs in security separation, someone could steal data simply by having a VM on the same server as yours. Encryption at rest, with each VM or service/account having its own keys, removes the risk of data accidentally becoming visible in another environment that shares the hardware.
Though the chance of some useful data being revealed this way is pretty small on large-scale cloud installations. And if you are using a cheap VPS host for sensitive data, you might not be valuing your data enough.
There are other paths to the attack he mentioned. E.g., you find an API that accepts ciphertext (or part of one), or a cloud backup/restore flow. You likely need another vulnerability as well, but it does happen.
It has happened before that disks have been taken from servers in third-party data centers. Most commonly by the feds, but also by other actors. Do you control the physical security of your server?
It's not security theatre in the rare case where the stated goal is to protect against physical attacks, or for hardware disposal.
If the goal is to put “military grade encryption” sticker in the sales deck, or to pass some certification, or to create a vague cloud of plausibility that you’re taking security seriously in some other way, then absolutely it’s security theatre. It deserves to be called out.