In most cases, encrypting sensitive information like e-mail addresses with a memory-resident key (e.g. injected using tools like Vault) in the application layer is a better strategy, at least if you need asynchronous access to that information (e.g. to send out weekly update e-mails). Most of the data leaks in the past were caused by compromised or misconfigured databases, not by compromised application server code.
Also, within the EU I need to be able to proactively reach my users (e.g. to notify them about a data loss), so only storing hashes of e-mail addresses and hoping users will log in so that I can send them an e-mail won't work.
This kind of encryption-at-rest scheme becomes an absolute necessity when cryptographic secrets have to be stored, such as 2FA TOTP secret keys or recovery codes.
Encrypting the email addresses and any Personally Identifiable Information about your users may also be good practice, to limit which eyes can actually see the plaintext data (database provider, former developers whose credentials were never rotated, an old backup left lying around, etc.).
One issue with this, though, could be the inability to use the encrypted field for queries (e.g. select * from users where email = 'foo@bar.com'), but OP's solution of hashing can help here: store the email encrypted, its hash in clear text, and do a query on the hash.
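A rough sketch of the lookup half of that idea, in Python. It uses a keyed hash (HMAC-SHA256) rather than a bare hash, which is my substitution and not necessarily what OP does. The key name, the in-memory "table" and the lower-casing step are all assumptions for illustration, and the ciphertext column is treated as opaque bytes produced elsewhere.

```python
import hashlib
import hmac

# Hypothetical key; in practice it would be injected at runtime, not hard-coded.
INDEX_KEY = b"example-index-key-not-for-production"


def email_index(email: str) -> str:
    """Keyed hash of the address, stored in clear text next to the ciphertext."""
    normalized = email.strip().lower()  # assumption: lookups ignore case
    return hmac.new(INDEX_KEY, normalized.encode("utf-8"), hashlib.sha256).hexdigest()


# Stand-in for a users table: index hash -> row holding the encrypted address.
users = {
    email_index("foo@bar.com"): {"id": 1, "email_ciphertext": b"<opaque ciphertext>"},
}


def find_user(email: str):
    # Equivalent of: SELECT * FROM users WHERE email_hash = :hash
    return users.get(email_index(email))
```

The query never touches the encrypted column, so the encryption key is only needed by code that actually has to read the address.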
Sure, if you encrypt with an application-level key it makes it harder for any adversary to use your data, as they will need not only to get access to your data but also to obtain the encryption key to do anything with it.
Encrypting data like this is easy and can drastically reduce your attack surface.
This would not work for any serious/useful service: e-mails are not only for marketing, there are many good reasons to send one, like (user-requested) notifications, invoicing, ... and also screw-ups! If your service had a problem (security, broken data, invoicing again, long downtime, ...), you had better contact your users before they find out on Hacker News.
Even just a very thin encryption layer would probably do a decent job. Attackers are typically going at it from an infrastructure perspective: they make a hole, poke around for basic configuration info, locate the database, and siphon it out. They may or may not have enough time and knowledge to reverse-engineer a basic column-specific symmetric scheme.
The only drawback is that such a scheme must then be made available to any system that consumes the database, possibly from multiple languages.
Not any system that consumes the database, just any system that needs to send email. Unless you are using the encryption for other fields as well, I suppose.
Plaintext email could be stored client side in a cookie and may be submitted to the server when use of the email is required, and if it validates.
If the user logs in and the site is down, a backup system could email them about the issue: "This is the backup system; primary systems are down. Please contact support if you need more information." There is no need to email users who aren't currently using the system about downtime, or in fact to email users at all if they aren't using the system.
Further, if a "password recovery" flow is modified slightly, it can be repurposed for password-less logins by using strong tokens sent to the user's email as they request them. A simplified 2FA flow can be established as well, where a token is texted to the user after verifying the email address. A second layer of security on top of texting tokens can be achieved using Google Authenticator.
To use such a system, the user will need to be OK with sending their email address each time they need email from the system AND be OK with having their phone handy to login. Of course not every use case requires security, or can be used with this proposed security system.
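For what it's worth, the "strong token sent by email" part of this can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the 15-minute lifetime, the function names and the in-memory dict standing in for a table are all made up, and actually sending the mail is left out.

```python
import hashlib
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 15 * 60  # assumed lifetime

# Stand-in for a table: sha256(token) -> (user_id, expiry timestamp)
pending = {}


def issue_token(user_id: int, now: Optional[float] = None) -> str:
    now = time.time() if now is None else now
    token = secrets.token_urlsafe(32)  # 32 random bytes, URL-safe encoding
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    pending[digest] = (user_id, now + TOKEN_TTL_SECONDS)
    return token  # goes into the e-mailed link; only its hash is stored


def redeem_token(token: str, now: Optional[float] = None) -> Optional[int]:
    now = time.time() if now is None else now
    digest = hashlib.sha256(token.encode("utf-8")).hexdigest()
    entry = pending.pop(digest, None)  # single use: removed on first attempt
    if entry is None:
        return None
    user_id, expires_at = entry
    return user_id if now <= expires_at else None
```

Storing only the hash of the token means a leaked table does not hand out working login links.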
But how do you contact users if they aren't on the site? What if you have a data breach and need to notify them, or need to remove their account because they are inactive and want to give them a heads up?
If your account recovery works by sending an email... which then sets a plaintext email cookie, there's no actual auth, right?
To make this make sense, I think you are assuming but without explicitly stating the use of signed cookies? EDIT: "if it validates", I guess so.
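Assuming signed cookies really are what "if it validates" means, a minimal Python sketch would look something like this. The key and function names are hypothetical, and a real framework's signed-cookie support would normally be used instead.

```python
import base64
import hashlib
import hmac

COOKIE_KEY = b"example-cookie-key"  # hypothetical server-side secret


def sign_cookie(email: str) -> str:
    payload = base64.urlsafe_b64encode(email.encode("utf-8")).decode("ascii")
    mac = hmac.new(COOKIE_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    return f"{payload}.{mac}"


def read_cookie(cookie: str):
    """Return the e-mail if the signature checks out, else None."""
    payload, sep, mac = cookie.rpartition(".")
    if not sep:
        return None
    expected = hmac.new(COOKIE_KEY, payload.encode("utf-8"), hashlib.sha256).hexdigest()
    # Compare as bytes so untrusted, non-ASCII input cannot raise an error.
    if not hmac.compare_digest(mac.encode("utf-8"), expected.encode("utf-8")):
        return None
    return base64.urlsafe_b64decode(payload).decode("utf-8")
```

Note that signing only proves the server issued the value. The address is still readable by anyone holding the cookie, so this addresses tampering, not confidentiality.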
The other bit which is not clear to me is, what is the key in the database to identify ownership of user information?
You need a linking record which looks like hash(email) -> uid (or user record or whatever) which does not seem any better than what is proposed in TFA.
OTOH if no information is stored against the user's email / uid / username then you probably don't need login or auth.
This scheme struggles in the face of email address case folding.
At the protocol level, email addresses are case-folded on the RHS but case-sensitive on the LHS. So it’s crucial that LHS case is preserved by delivery systems. Unfortunately most users then treat them as folded on both. So you can successfully verify one variant, store the downcased hash, and it’ll subsequently match but delivery bounces. Or, hash the exact original input but have many baffled users unable to access their accounts. Neither is a good outcome.
This is not an edge behaviour either, I have tons of users that mix up their email capitalisation from day to day.
No it doesn't, convert the case when generating the hash – think of it as part of the hash function. But leave the case unchanged from whatever the user entered for any steps that involve sending an e-mail to the address.
Isn't this problem orthogonal to storing hashed email addresses? You'll always have access to the email address the user typed in when you want to send a transactional email, so you can perform whatever sanitization needs to be done at that point. How does storing the email in plaintext get around issues involving sending emails to case-sensitive mailboxes?
> How does storing the email in plaintext get around issues involving sending emails to case-sensitive mailboxes?
By the validation one performs at initial sign-up.
I’m implicitly saying it’s okay to require that an email address was entered with perfectly matched case at signup, which is validated by a code or link etc, and then be more forgiving about what you receive in all subsequent uses because it’s supportive of ordinary humans trying to use your product.
You could manage that with hashed emails by separately storing the case of letters in the user part, without storing which letters it's referring to. You could then apply that case to the user-provided email during checkout.
(You could probably actually do a pretty good job in most cases just by brute-forcing the possible combinations and seeing which one matches the hash, but the worst-case CPU cost would be bad for long emails.)
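The brute-force variant could look roughly like this in Python. Everything here is an assumption for illustration: plain SHA-256 stands in for whatever salted hash is really stored, the domain is lower-cased before hashing, and the 16-letter cap is an arbitrary guard against the 2^n blow-up mentioned above.

```python
import hashlib
from itertools import product


def exact_hash(email: str) -> str:
    # Plain SHA-256 for brevity; the thread's scheme would add a fixed salt.
    local, _, domain = email.rpartition("@")
    return hashlib.sha256(f"{local}@{domain.lower()}".encode("utf-8")).hexdigest()


def recover_case(typed: str, verified_hash: str, max_letters: int = 16):
    """Try every upper/lower variant of the local part until one matches."""
    local, _, domain = typed.rpartition("@")
    domain = domain.lower()
    letters = sum(ch.isalpha() for ch in local)
    if letters > max_letters:  # 2**letters candidates; cap the worst case
        return None
    choices = [(ch.lower(), ch.upper()) if ch.isalpha() else (ch,) for ch in local]
    for combo in product(*choices):
        candidate = "".join(combo) + "@" + domain
        if exact_hash(candidate) == verified_hash:
            return candidate
    return None
```

A six-letter local part needs at most 64 hashes, a twenty-letter one about a million, which is where the worst-case cost comes from.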
This was proposed in another comment as well, and it’s a neat idea that unfortunately becomes fragile in the face of UTF-8 local parts, since the Unicode folding standards change over time.
Fair point, though you could probably store all non-ASCII characters in plaintext and still get most of the benefit. I suspect UTF-8 in emails is rare overall, at least for sites that don't have certain region-limited audiences.
That isn't how email confirmation works, because it's a) enormously prone to false negatives, b) subject to ruinous delays, c) occurring after the transaction i.e. too late, and d) building you a reputation as a bad sender.
This is why all email address confirmation today is asking you to click a link or enter a code.
Also note that the latter strategy is rising in popularity; it is because clickable links, whilst seeming so convenient, are themselves prone to both false positives and negatives, and also increase the likelihood of ending up in junk mail.
But wouldn't the user face this issue all the time if they have a case-sensitive mailbox but type their own address in the wrong case? So the assumption may be that a user with a case-sensitive mailbox is used to typing in the correct address?
No — many accounts aren’t created by the people that use them, and in any case we shouldn’t be relying on correct repeated string input and then blaming the user for a fulfilment process failure due to a typo from hours ago that we silently accepted at the time, and (worse) we can’t even distinguish between a capitalisation error and a discontinued recipient, even if it was previously verified.
As designers/developers, it’s our problem to solve.
Wait a second. Back up. Let's be clear here: the case you are talking about is the user entering their own email address incorrectly, and you're saying we as designers/developers should make a system that knows this and sends it to the correct email? What? Huh? If Person@place.com and person@place.com are two different recipients, how the hell is my application supposed to know which one you actually mean!?
The same way we deal with all email validation. Verify it once at application signup, then rely on the precise verified form.
It is unrealistic to expect end users to get the capitalisation of their email address consistently right. It is realistic to expect it to be done right at signup, since in the best practice case this’ll include a verification loop.
> No — many accounts aren’t created by the people that use them
Can you elaborate on this? I can only think of 2 examples, but neither seem like good ones:
1- someone holds power of attorney over someone else, and registers an account (email account?) in their name. But if there's PoA involved, the 2nd person (probably?) isn't able to manage an email account on their own, so this doesn't seem a meaningful distinction to worry about. (Though if it's not an untreatable condition, it's possible that they might resume using their email account themselves, I guess.)
2- the account is created by your ISP when you register, and they "helpfully" choose a username for you. So from this point of view, you didn't truly "create the account"
5- parental/grandparental accounts. What’s more, speaking from our own support mailbox, these are the folks most likely to miscapitalize their email address.
Okay... so how is this issue currently handled? Say I create a new email account: lAsZlO@inopinatus.com and the mailbox is case sensitive.
Next I create a youtube account but as E-Mail address I enter laszlo@inopinatus.com and I am told to click the link in the confirmation E-Mail... that I never receive. Well I do not like youtube anyway so I head over to hackernews and create an account to write this comment. Oh no I can not because I typed my E-Mail address in lowercase again. So now I am wondering if there is some error so I open your homepage, search for the support page and file a bug report, typing my E-Mail address in lowercase into the E-Mail field. I never receive your response telling me to write my E-Mail address in the correct casing.
I come to the conclusion that my E-Mail account just does not work and create another one at a provider that is case insensitive. OR I figure out/am told that the case is important and will never forget it again.
So where exactly are services like youtube or hn that ask me for my E-Mail responsible for handling upper/lower case correctly?
The solution could still be: store both the hash of the canonicalized address and the hash of the exact address that you know worked at some point in time. If the user later enters an address that matches the first but not the second, issue a warning that, should they not receive the e-mail, they should double-check their spelling.
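In Python, that two-hash check might be sketched as follows. The record layout and the return values are made up for the example, and plain SHA-256 again stands in for whatever salted or keyed hash is actually stored.

```python
import hashlib


def _h(value: str) -> str:
    # Plain SHA-256 for brevity; a fixed salt or key would be added in practice.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def make_record(verified_email: str) -> dict:
    """Stored at signup, once the address has been verified."""
    return {"exact": _h(verified_email), "folded": _h(verified_email.lower())}


def check_email(typed: str, record: dict) -> str:
    if _h(typed) == record["exact"]:
        return "ok"        # same spelling as the verified address
    if _h(typed.lower()) == record["folded"]:
        return "warn"      # same account, but the capitalisation differs
    return "no_match"
```

On "warn" the site would still send the mail to the address as typed, but could show the hint about double-checking the spelling.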
Actually I do perform case routing & delivery in one of my Fastmail accounts. It’s not the default, but it is possible, if you’re willing to write or generate Sieve, which I am.
A base58-encoded extension part, for segmenting actions due to email (and the replies/responses to/consequences of email) originated by a SaaS platform. Helps with routing of support requests in particular so we don't have to go back to the end-user to say "which <platform organisation> did you mean?" and other similar CRM-ish behaviours. Also allows us to manage bounces, spam complaints, and RTBF assertions by (organisation,end-user) tuple and similar. The discriminator string itself comes from an application subsystem where it was already generated for our state machines. The (minor) downside is outbound delivery sometimes being delayed by greylisting more frequently than otherwise.
Since I've contributed to MTAs & MDAs and built multinational ISP email services in a previous life, I'm confident that every MTA of consequence is case-preserving of envelopes, I'm happy to rely on it. (this is also why I feel on solid ground pointing out the hidden gotchas in various proposed schemes that don't perfectly accommodate the same rule)
If the email provider says that email addresses are case sensitive, then that's the truth you live with, it's not your system, not your design and you can't dictate other systems how they should work.
It depends on configuration. I doubt very many SMTP servers are case sensitive in this day and age. This is not the case on my Postfix servers. Sendmail was also case insensitive in its default configuration (though it has been many years.)
I thought email addresses were case folded on the right hand side and site dependent on the left hand side.
The right hand side is more or less forced by the rules of DNS.
As for the left hand side, if I run the email for a site, can't I decide whether to deliver ABC@ and abc@ both to the same mailbox or to different mailboxes? And can't someone else make a different decision for their site?
If a site administrator does not have the prerogative to decide this, what rule prevents them? (And if there is such a rule, can you rely on it being enforced?)
Of course, but when processing an arbitrary email address, which will almost always be not on your site, you MUST treat the left hand side as case-sensitive (unless you have knowledge about that email domain).
site dependent means that when given an address you must treat the lhs as case sensitive. To do otherwise will mean that you've potentially broken the address and can no longer properly use it.
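Put as code, the rule described here (fold the domain, never touch the local part) is short. A minimal Python sketch, with a function name of my own choosing and no attempt at full address validation:

```python
def normalize_for_delivery(address: str) -> str:
    """Lower-case only the domain; leave the local part exactly as given."""
    # Split on the last "@" so quoted local parts containing "@" survive.
    local, sep, domain = address.rpartition("@")
    if not sep or not local or not domain:
        raise ValueError(f"not an e-mail address: {address!r}")
    return f"{local}@{domain.lower()}"
```

With this, ABC@Example.COM and abc@Example.COM stay distinct, while ABC@Example.COM and ABC@example.com collapse to the same address.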
You could store the hash of the downcased address plus a capitalization mask which tells you which letters to capitalize.
This works from a technical perspective, as letters with ambiguous capitalization (Turkish i, etc) aren't allowed in emails. It's a very minor privacy compromise: if a user has a very rare pattern of capitalization then an attacker with access to the database could identify their account. Negligible compared to the current standard.
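A small Python sketch of the mask idea, assuming ASCII-only local parts (that is the assumption the comment above relies on). The mask format, one '0' or '1' per character, is just one possible encoding.

```python
def case_mask(local_part: str) -> str:
    """'1' where a character is upper case, '0' elsewhere."""
    return "".join("1" if ch.isupper() else "0" for ch in local_part)


def apply_mask(typed_local: str, mask: str) -> str:
    """Re-apply the stored capitalisation to whatever case the user typed."""
    if len(typed_local) != len(mask):
        raise ValueError("mask does not fit this local part")
    return "".join(
        ch.upper() if bit == "1" else ch.lower()
        for ch, bit in zip(typed_local, mask)
    )
```

For lAsZlO the stored mask would be 010101, and applying it to LASZLO or laszlo gives back lAsZlO, so only the downcased hash and the mask need to be kept.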
It’s a neat idea but unfortunately RFC 6531 opened up the local part to most of UTF-8, so internationalised capitalisation is in the mix now.
Ultimately I’ll never advise delivering to email addresses other than the precise octets of the one already verified, and this means the gold standard is always folding for match and uniqueness, but delivery precisely as verified.
How about this: store the verified email address, but encrypted using the hash of the case-folded input as part of the key. The intention being, you had to have the matching folded form in hand to obtain the verified canonical form. For extra jollies, only decrypt it on the client. (cryptography warning: I write this as the idea comes to me and without any analysis of emergent properties, vulnerabilities etc)
> For extra jollies, only decrypt it on the client.
Uh... How does that work if you need the email address to send email to the user? If only the client is able to decrypt the address, you will basically have to wait until the user connects and gives you the email (which you presumably never store long-term, handling it like you'd handle credit card numbers). That severely limits what you can do with an address.
If you're okay with being technically able to access the email address, you'd probably be better off with just straight encryption. That covers data leak protection, key rotation, backups, etc.
If you want some magic ID which is known only to the user and your servers will only use it to verify identity, then why not use just passwords with client-provided KDF parameters. Your machines will never know the plain data.
True. But since so many websites (e.g. all aviation companies I've encountered) case-smash the LHS of the email address and can get away with it since all other email software has had to adapt, this is a rather minor concern by now.
You're trying to sound clever but it's not working, I don't think you understand what the logic actually is, if you are comparing those two.
A lot of emails get casefolded no matter what. This is enough of an inconvenience that nobody in their right mind would operate a case sensitive mailbox you would be using for account signups.
A lot of people are still using that part of a regex no matter what - I still run into this regularly (and have been, for two decades now).
"There is some danger that common usage and widespread sloppy coding will establish a de facto standard for e-mail addresses that is more restrictive than the recorded formal standard."
This is EXACTLY it. Both in the local part, and in TLDs - nothing clever about people being too clever by half, and generating false negatives on their input side.
"Funky"? These things are two decades old by now. There have been multiple generations of communication protocols since this became standard - and yet still people consider this some weird aberration, even the 4-letter TLDs.
Indeed, fallback is still necessary - but it doesn't follow "meh, just go back to the 3-letter maximum, because a lot of people still live in 1999."
I have a 3.2 domain on a ccTLD and even that gets shot down regularly enough that I wouldn't consider using it as a primary address. There should be no excuse for that, ccTLDs are older than a good chunk of the people writing the code excluding them, and yet here we are.
Says who? Email addresses are case insensitive. If email software treats emails as case sensitive then it is broken. People have to write email addresses on paper forms, in all caps.
Says RFC 5321 [1]: "The local-part of a mailbox MUST BE treated as case sensitive."
It _does_ recommend receivers treat it as case insensitive for maximum interoperability, so it is de facto insensitive, but something implementing it as case sensitive isn't broken.
It does make it broken. Broken means not working. If your software refuses an email because it's in the wrong case then that software is broken. And quoting out of an RFC is not going to make users stop complaining.
Email addresses are written in a variety of situations where preserving case is not possible. For instance on forms, or over the phone. If the IETF wants to ignore that then that's the IETF's problem, don't make it yours too.
I think it is a fair point -- when a technical standard and/or convention so vastly disagrees with common user perception, perhaps the requirement should be broadened to account for both.
On the flipside, you’re trying to tell me I should be willing to accept a lower standard than I wish to, or that I’m used to, or that has been established for decades, because of some anachronistic bureaucrat, and my response to that is a short expletive.
You, and the paper forms, are incorrect. In fact, on such forms, you should use the proper case for your email address, otherwise you are entering an incorrect address, which may be fraud.
Some things that become difficult if you don't have a verified email address for your users:
- Most common: a user has a support request because they can't get into their account (e.g. you have sign-in-with-Facebook and they lost their account there, or got banned).
- Your authentication partner (again e.g. Facebook) disables your integration for some reason - someone reports your account as abusive (maybe maliciously) and it gets locked, and your attempts to work through Facebook customer support hit a brick wall. If you have email addresses you can at least get your users back into their accounts via a reset-password style flow.
- You have a data breach, and you need to tell your users what happened and what private data of theirs was leaked to an attacker.
- You get a legal threat - a DMCA takedown message for example - and need to pass it on to your users.
- You sell your service to another company and the lawyers involved in the transaction insist on emailing out a terms of service update.
Between 'yes' and 'no' we could still have airgapped or at least segregated systems, where an email address is known, but only to the part of the system responsible for communication.
In larger systems that could be a reasonable way to build things.
Keeping email addresses in the "auth" microservice which has tighter security - blocking security team code reviews, a smaller team who are allowed to modify it for example.
Right? And especially in these event stream architectures we like to use now. We still use outbound mail queues don't we, if for no other reason than to control the blast radius of any bugs in the system.
I don't need to send you a mail, I need to tell the system that handles mail to send you a mail. As long as I spend a small amount of care on avoiding the Confused Deputy scenario (eg, open relay), that would work better and contain much of the PII to a low-traffic (network and, as you say, code delta) system.
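A toy Python sketch of that boundary, purely to show the shape of the interface. The class, method and template names are hypothetical, and the list and dict stand in for a real queue and a separate datastore.

```python
class MailService:
    """The only component that ever holds plaintext addresses."""

    def __init__(self, allowed_templates):
        self._addresses = {}  # user_id -> address; would be its own datastore
        self._allowed = set(allowed_templates)
        self.outbox = []  # stand-in for the outbound mail queue

    def register(self, user_id: int, address: str) -> None:
        self._addresses[user_id] = address

    def enqueue(self, user_id: int, template: str) -> bool:
        # Callers pick a user and a pre-approved template, never a raw
        # recipient or body, so the service cannot be driven as an open relay.
        if template not in self._allowed or user_id not in self._addresses:
            return False
        self.outbox.append((self._addresses[user_id], template))
        return True
```

The rest of the application only ever passes user IDs and template names across this boundary, which is what keeps the PII confined to the low-traffic system.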
This is a clever idea but limited in applicability. It is probably fine for a low security web app or game, but could still leak personal information if the db got hacked.
The problem is that the salt has to be the same for each record and that emails present a limited search space.
Imagine I stole the database for blackmailable-fetish.com. All the emails are hashed with the same salt so I can brute-force the following restricted space:
[top 200 first names][top 1000 surnames][digits from 0 - 999]@[top 5 email providers]
That would probably get me 75% of the emails - let the extortion games begin!
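To put a number on that search space, and to show why a shared salt makes it cheap, here is a short Python sketch. The salt value, the hash construction (SHA-256 over salt plus address) and the candidate format are assumptions for illustration, and the 75% figure above is the commenter's estimate, not something this computes.

```python
import hashlib

FIXED_SALT = b"leaked-along-with-the-db"  # hypothetical

# 200 first names x 1000 surnames x 1000 numeric suffixes (0-999) x 5 providers
first_names, surnames, suffixes, providers = 200, 1000, 1000, 5
search_space = first_names * surnames * suffixes * providers
print(search_space)  # 1000000000 candidate addresses


def salted_hash(email: str) -> str:
    return hashlib.sha256(FIXED_SALT + email.encode("utf-8")).hexdigest()


def crack(stolen_hashes: set, candidates) -> dict:
    """One hash per candidate covers every row, because the salt is shared."""
    found = {}
    for email in candidates:
        digest = salted_hash(email)
        if digest in stolen_hashes:
            found[digest] = email
    return found
```

A billion fast hashes is a small job on commodity hardware, and the cost does not grow with the number of rows in the stolen table.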
Because it gives the false appearance of security. With this scheme, you always need to act as if the e-mail addresses are plaintext anyway. It should not be used.
True, but I maintain that if you are worried that hackers may steal your email database then a much better approach is to encrypt the emails with an external key.
You could just do that anyway.
I've seen spam email trying to extort people by threatening to release indecent images of them (which the sender doesn't have; supposedly they were captured from the victim's selfie cam).
In this case, you don't have to be accurate, you're just trying to call someone's bluff, on the small chance they've actually done said thing (and in addition believe you can prove it AND that payment will silence you).
Depending on the size of the user database it might be cheap enough to try all random-salts+hashed emails (if it fits in RAM it's probably cheap enough).
Sidenote, but I find this post maddening to understand, because the author seems to be using the word "e-mail" to mean both "e-mail address" and "e-mail message", and then uses ambiguous pronouns to boot:
> In conclusion, if you only use emails for transactional emails, you might be able to only store hashed versions of them.
HUH?
The most obvious way to interpret this sentence is as storing hashed versions of transactional e-mail messages. Which makes no sense and isn't what the author means, but wow this is some confusing writing.
> Earlier this year, when I went from having only Facebook-login [...] to allow registrations with email and password, one of my concerns was how to implement this is a way that protects the data and privacy of my users.
Any privacy effort is laudable. Then again, if you're serious about protecting your users' data and privacy, Facebook login is the elephant in the room.
Fully agree and you can be certain that Facebook does save your E-Mail address.
I use authentication services like auth0 and AWS cognito. The first one I think is completely safe for privacy, the second one is used for convenience (I think the service is good for stuff you host on AWS anyway, although it is generic, so it isn't restricted to that).
But using an auth-service is mostly about deferring risk of breaches to people more proficient in security. That comes with the cost that said auth service can know which services registered users are using.
The author is correct though. Even when a site employs such an auth service, it can be good practice to hash the mail address or even other identifiers for your own DB (you still need that to associate state with a user).
I fully agree, so when I released the update with email/password registration, I also stopped allowing new account creations via Facebook. Now it's only supported for login for legacy reasons, and those users can disconnect their Facebook account after connecting an email.
Why would you go to the trouble of not enabling Facebook login (as long as you provide other login methods of course)? If someone is using Facebook & Facebook login they clearly don't mind being tracked by Facebook.
Wouldn't the lack of means to contact all of your users, immediately and directly, create other compliance challenges? You would be unable to notify users of a data breach until their next login; former users might be left permanently in the dark. Similarly, being unable to push legally mandated notice of policy updates could be an impossible challenge. I can see how this proposed scheme could work day to day, but you would likely be well served to retain un-hashed emails in cold storage.
> For every transactional email I need to send out - registration, account recovery, and email change verification - the user always initiates this by submitting their email address, and it will at that time be available to the backend to perform the needed action.
This sounds like terrible UX, not to mention email use cases not initiated by the user. I really think you'd be shooting yourself in the foot by setting up a small site with this philosophy because you don't need emails right now
Good points. Though given how many emails have been leaked already, not sure sha256 with fixed salt achieves much. One can build a rainbow table with that salt fairly quickly. You might as well use bcrypt, scrypt and co.
Using something like bcrypt would definitely be better, but considering that the email is the identifier, I would have no way of retrieving the correct hash to check it against, so the salt must be fixed to allow for lookups.
I'm currently using SHA512 with a fixed salt. If someone gains access to only the database and not the salt, the emails are well protected. If someone gains access to both, then it's true that they could build a rainbow table to check if a given email exists in the database. What they _can't_ do is easily use all the emails in the database for spam/phishing/etc.
You can't, really... bcrypt hashes are not consistent: run bcrypt twice on the same email and you are going to get two different hashes, because each run uses a fresh random salt. You can't search your DB for the matching hash; you would have to iterate through every entry to compare.
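The same point in runnable form. bcrypt isn't in the Python standard library, so this sketch swaps in PBKDF2 with a random per-record salt, which behaves the same way for the purpose of this argument; the iteration count and helper names are my own assumptions.

```python
import hashlib
import os

ITERATIONS = 100_000  # assumed work factor


def slow_hash(email: str, salt: bytes) -> bytes:
    return hashlib.pbkdf2_hmac("sha256", email.encode("utf-8"), salt, ITERATIONS)


def make_record(email: str):
    """Password-style storage: fresh random salt, kept next to the hash."""
    salt = os.urandom(16)
    return salt, slow_hash(email, salt)


def find_by_scan(email: str, records):
    """No index is possible: re-hash the input once per stored salt."""
    for i, (salt, digest) in enumerate(records):
        if slow_hash(email, salt) == digest:
            return i
    return None
```

Two records made from the same address carry different digests, so a lookup costs one slow hash per row. With a fixed salt the digest is deterministic and can be indexed, which is the trade-off being discussed here.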
When signing up for a service, I always sign up with <name-of-service>@<my-domain.com>, which makes it easy to see who sold my email address and to filter/block by service.
It's a good idea to protect user privacy. One drawback I can think of with storing a hashed email: what if the user forgets their username / email ID and wants to know it? (This is a common use case.) In such a case you have to collect additional unique data to help the user regain access to their account, but that defeats the original purpose of protecting user privacy.
People often forget usernames but not so often e-mails. The e-mail is usually the primary/only means of identifying users anyway so you're not going to provide it back on request anyway.
Well, first of all, I have 5 email accounts that I regularly use.
One of those uses multiple domains (it's an iCloud account that I registered way back in the iTools days, so it supports @mac.com, @me.com, and @icloud.com....possibly even @itools.com, not sure about that one).
I regularly use a + extension to the addresses that support it when I'm signing up somewhere that I don't 100% trust, and that allows arbitrary additions to the address. And I don't always remember what version of a site's name I will have used (eg, user+hn@example.com, user+hackernews@example.com, user+ycombinator@example.com, etc).
I recognize that I'm an outlier, but trial and error is completely infeasible in this scenario.
You could use a combination of TOTP/google/facebook or some other side channel verification and use that to allow unlimited tries for a certain period of time to allow for more guesses? I'm thinking that for the most part people generally have/keep <5 email addresses (that they use often enough to log in with) so they should be able to just iterate through the list of emails that they've used over their lifetime and figure it out?
I do wonder though -- if the hash secret gets out then I think we're right back to where we started... it would be easy to cross-reference leaked email DBs with the dumped DB and work backwards. I'm now a bit less sure this does much for user privacy against an even slightly motivated opponent (one who would almost certainly have access to at least one dump of previously-exposed emails)...
The more important takeaway from this article for me is that sites should be hashing the Facebook user ID, since it's often far more personally-identifiable information than an email address.
The article points out that this doesn't maintain privacy:
> I discovered that, even though the ID was unique to my FB-app, it was still possible to go to facebook.com/{id} and be redirected to the user’s FB-profile.
So far I haven’t seen a comment point this out or suggest anything similar, so: instead of maintaining an application-level list of email addresses for use in a breach (or for other reasons), rely on the mail service itself. Having previously sent a verification email, it has a record of the destination at least in a log, and during registration the address could be placed in a “verified member” list, all more or less managed within the mail service.
Article 34(3)(a) of the GDPR seems to exempt the data controller from that requirement:
> The communication to the data subject referred to in paragraph 1 shall not be required if any of the following conditions are met:
> (a) the controller has implemented appropriate technical and organisational protection measures, and those measures were applied to the personal data affected by the personal data breach, in particular those that render the personal data unintelligible to any person who is not authorised to access it, such as encryption;
> (c) it would involve disproportionate effort. In such a case, there shall instead be a public communication or similar measure whereby the data subjects are informed in an equally effective manner.
This is the sort of content I come on HN for. It introduced me to a possibility I haven't considered, and it's followed by an interesting debate in the comments. Thank you for sharing.
The most important caveat I can think of is the ability to inform irregular users about something important, either for legal or ethical reasons. For example, my note taking app is shutting down, and users might have important things stored there. I could also message them about a deprecated feature, a change to the ToS, or ironically a data breach.
Nonetheless, it's still a good idea and I'll keep it in mind.
Storing email feels like a no-brainer for a system that needs to send messages to its customers. Some prefer phone numbers, which maybe provide stronger guarantees while being maybe not as long lasting.
As an individual, the issue is that "anon" or "throw-away" emails are not that commoditized.
I heard that "login with Apple" is meant to provide an email proxy, hiding your real email, but I have not seen it deployed, except on Reddit. As good as it may be, it’s Apple only.
I can always wildcard on a domain I rent and use klingo@domain as a mean to compartment identifiers but it is not low maintenance.
I use wildcards and they're extremely low maintenance. Literally no maintenance required since I set them up.
Steps to reproduce:
1. register a domain name
2. register that domain with your email service provider of choice (I use Protonmail)
3. create an email address on that domain
4. set that address up as a catch-all address for the domain
5. profit
For lack of a better word, I guess. This is what I do for myself and my SO, but this isn’t a commodity that anyone can get directly from their email provider.
Good catch, I missed in the conclusion that they suggested hashing the email if necessary (earlier in the article they mentioned only hashing the identifier).
You're right that could be an issue, but hopefully anyone who registered via Facebook will take care to add an email/password to their account and disconnect Facebook before they delete their Facebook account.
Yeah with data breaches becoming more and more common, I really think it's irresponsible to not have a way of contacting your users. Sure, you could throw a banner up on your website - but the comms should be immediate.
This might be reasonable for a service that doesn't sell anything, or there's absolutely nothing owed to users and users have no reasonable expectation of privacy. But any commercial or professional organisation that doesn't have a method for contacting end-users is either A. Shady as fuck (numbered accounts, darknet-hosted, ignorance by design), or B. irresponsible.
This is a website whose sole purpose is to extract PPC/ad and referral revenue from its users. There's no personal information requested from users, other than "Display Name". This is actually one of the few exceptions where I think the owner of the website is being more responsible with their users' data by not keeping anything.
However, if they are breached and are serving malware to customers for a week before realising, they will have no way to tell their users they may have been affected. Or what if someone decides to install a backdoor and log the user's email and password when logging in? This is nitpicking and honestly probably 1% of websites hacked in this way actually notify their customers, but it's nevertheless still a hole in the design.
They're also likely capping their earning potential if they do plan to sell the website, as they don't have any delicious user data to sell to marketers. For which I commend Daniel and Bjorn! Well done.
I don't know, I'm thinking this is great, but also pretty bad. Maybe adding an opt-in for breach notifications would be useful, or having a third-party service to subscribe to breach notifications for the website would be the best of both worlds.
Sounds unnecessarily complicated for no real benefit, with issues along the road.
And it does feel weird to use Facebook in this example.
If you don't care for an email address, and you are using the login only for maintaining that list, use a permalink. That's probably easier and better.
One permalink for edit, one permalink for viewing.
We did this slightly differently. For login we stored a hash of the normalized email address (all lowercase, and handling gmail's dots and plusses). For sending emails we had them encrypted in a separate database, which only the mail-sending servers had access to - not the web-facing servers. That way we didn't need to ask for the email address every time, and it was still fairly well protected.
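A minimal sketch of the lookup-hash half of that setup, in Python. The parent comment doesn't say which hash or normalization rules were used, so the keyed HMAC-SHA256, the `LOOKUP_KEY` value, and the Gmail rules below are all assumptions for illustration:

```python
import hashlib
import hmac

# Hypothetical application secret; a real deployment would inject it at runtime.
LOOKUP_KEY = b"example-key-not-for-production"

# Assumption: only these domains ignore dots and "+suffix" in the local part.
GMAIL_DOMAINS = {"gmail.com", "googlemail.com"}


def normalize_email(address: str) -> str:
    """Lowercase the address and fold Gmail's dot and plus variants together."""
    local, _, domain = address.strip().lower().rpartition("@")
    if domain in GMAIL_DOMAINS:
        local = local.split("+", 1)[0].replace(".", "")
    return f"{local}@{domain}"


def email_lookup_hash(address: str) -> str:
    """Return a keyed hash of the normalized address, usable as a login lookup key."""
    normalized = normalize_email(address).encode("utf-8")
    return hmac.new(LOOKUP_KEY, normalized, hashlib.sha256).hexdigest()
```

With this, `John.Doe+news@Gmail.com` and `johndoe@gmail.com` map to the same lookup hash, while addresses on other domains keep their dots and plus suffixes. The encrypted copy for the mail-sending servers would live elsewhere and is not shown here.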
I wonder what a database that supported a moral equivalent of cgroups would look like.
I can't create a record, I can't delete a record, I can't see the email field, but I can change the subscription plan for this user, or change their avatar.
We tend to do table or row level permissions, matrixed with verb. Column level occurs at the application layer, leaving plenty of room underneath for exfiltration.
I admire your dedication to keeping your users' data secure, anonymous and private.
> For Wishy.gift I use SHA512 with a fixed salt
Just an FYI in case you don’t know: if you want to allow different accounts with the same email, then in case of a data breach the duplicate hashes would make it obvious that this has occurred. Salting with a different nonce for every row is not much harder and would protect in that case.
Let's imagine that a@exam.com and b@exam.com have the same hash, so you use a different salt for each so that the hashes differ.
How do you know which one is which? Which salt belongs to what email?
On this website, this only happens for actions that require entering your email address anyway.
> For every transactional email I need to send out - registration, account recovery, and email change verification - the user always initiates this by submitting their email, and it will at that time be available to the backend to perform the needed action.
So this only works if you only use the email as a username. No way to notify the user of things like ToS changes, security issues, notifications from the service, etc.
Some services don't have the desire (or need!) to send out generic service notifications.
For ToS changes, it's often enough to display them to the user when they first log in after the change. That's how many mobile apps handle it already.
The security notification thing could be an issue though. It doesn't seem to be necessary per the GDPR, but it's probably a good idea to communicate security breaches.
In technical writing within this industry, “email” is interchangeably used to mean either the protocol, an address or a message, depending on context. “If user confirmed their account by entering a confirmation code received via email or phone, that email or phone number becomes verified” is a routine sentence that will confuse no one.
> “email” is interchangeably used to mean either the protocol, an address or a message
"email" isn't a protocol. SMTP is though, and referring to RFC 5321 ("a specification of the basic protocol for Internet electronic mail transport"), section 2.3.11 [1], we see that: "As used in this specification, an "address" is a character string that identifies a user to whom mail will be sent or a location into which mail will be deposited."
I stand corrected on the former, should’ve written “system” instead of “protocol”.
As a frequent reader of technical blogs and reference documentation, I stand by my point that “email” alone is used to mean email address quite frequently.
Ambiguity is ever-present in human language, I’d say I rarely know precisely what I will read about when clicking a link here. Confusion between “email address” and “email message” is relatively mild, in fact (post about Kafka from not long ago comes to mind).
I am not the OP, and frankly I don’t believe full justification has a place on the web just yet. Hyphenation alone is rarely enough to make it bearable, and I think browsers’ rendering engines don’t do more than that.
I think you can store emails but process them only in an anonymizing publicly auditable proxy, ensuring that downstream business services do not have plaintext access whilst still being able to send outbound emails whenever you want. I wrote about it recently: https://futurice.com/blog/trustworthy-services-from-cloud-pr...
The key is to grant cloudfunctions.functions.sourceCodeGet (or the AWS/Azure equivalent) on the edge so anybody can verify that your proxy is above board. End users just have to trust the cloud provider's access controls, not the service provider's word on implementation.
I think the others are missing the fact that if you use the same salt for every row, it's less secure. So you'll be storing the email more securely, but still not as securely as you should be storing the password.
To do it any more securely would require pulling up every single record for its salt, and hashing the login with that salt and checking it. It's virtually impossible at any real scale.
It's pretty scalable. 10 billion email addresses times 16+32+4 bytes of salt, SHA512/256, and ID is 520GB of RAM; available in a single (big) machine and searchable in under a second with a few cores.
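To make the per-row-salt lookup and the arithmetic above concrete, here is a rough Python sketch. It is only an illustration: SHA-256 stands in for SHA-512/256 (same 32-byte digest length), the rows live in a plain Python list rather than a packed in-memory table, and all function names are made up for this example.

```python
import hashlib
import hmac
import os

SALT_BYTES, DIGEST_BYTES, ID_BYTES = 16, 32, 4  # sizes quoted in the comment above


def salted_hash(email: str, salt: bytes) -> bytes:
    # Assumption: SHA-256 as a stand-in for SHA-512/256; both give 32-byte digests.
    return hashlib.sha256(salt + email.encode("utf-8")).digest()


def make_row(user_id: int, email: str):
    """Build one stored row: (id, per-row random salt, salted digest)."""
    salt = os.urandom(SALT_BYTES)
    return user_id, salt, salted_hash(email, salt)


def find_user(rows, email: str):
    """Linear scan: re-hash the submitted email with every row's salt."""
    for user_id, salt, digest in rows:
        if hmac.compare_digest(salted_hash(email, salt), digest):
            return user_id
    return None


def table_size_bytes(n_rows: int) -> int:
    """Memory needed for n_rows packed (salt, digest, id) records."""
    return n_rows * (SALT_BYTES + DIGEST_BYTES + ID_BYTES)


rows = [make_row(1, "a@exam.com"), make_row(2, "b@exam.com")]
print(find_user(rows, "b@exam.com"))        # 2
print(table_size_bytes(10_000_000_000))     # 520000000000 bytes, i.e. 520 GB
```

This also answers "which salt belongs to what email": nobody needs to know in advance, since the lookup tries every row's salt against the submitted address. The cost is one hash per row per lookup, which is the trade-off being argued about.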
... or if you do, make sure they are deleted pretty quickly afterwards (with logrotate or something comparable).
Having some logs is invaluable for debugging, but for example keeping them only for 14 days could be an acceptable compromise between debuggability and privacy.
One thing worth noting is that often, you don't even need to store passwords.
If a user wishes to log in, you send them a link/code by email. That increases security dramatically, as most email services already have some more advanced protections built-in. You also don't have to worry about leaks that much, as there are just no passwords to be leaked.
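For anyone wondering what such a login link might carry, here is one possible Python sketch of a signed, expiring token. The `SIGNING_KEY`, the 15-minute lifetime, and the token layout are all assumptions for this example, not a description of how any particular service does it.

```python
import base64
import hashlib
import hmac
import time

SIGNING_KEY = b"example-signing-key"  # hypothetical secret, for illustration only
TOKEN_TTL_SECONDS = 15 * 60           # assumed lifetime of a login link


def make_login_token(subject: str, now=None) -> str:
    """Return 'payload.signature', where payload is 'subject:expiry' (base64url)."""
    issued = time.time() if now is None else now
    payload = f"{subject}:{int(issued) + TOKEN_TTL_SECONDS}".encode("utf-8")
    signature = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return (base64.urlsafe_b64encode(payload).decode("ascii") + "."
            + base64.urlsafe_b64encode(signature).decode("ascii"))


def verify_login_token(token: str, now=None):
    """Return the subject if the token is authentic and unexpired, else None."""
    try:
        payload_b64, signature_b64 = token.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        signature = base64.urlsafe_b64decode(signature_b64)
    except ValueError:  # wrong number of parts or invalid base64
        return None
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(signature, expected):
        return None
    subject, _, expires = payload.decode("utf-8").rpartition(":")
    current = time.time() if now is None else now
    if int(expires) < current:
        return None
    return subject
```

The subject could be an account ID or an email hash. Note that this sketch does not make the token single-use; a real service would probably also record used tokens so a link cannot be replayed within its lifetime.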
Please don't do that. It makes me very angry when I'm trying to quickly get some information from a public/shared device and, instead of getting logged in, I get a link on my own device (which I need to retype / send / ...). Most likely I don't care about the security of the account that much in that case. But my email login never goes anywhere close to a device I don't own.
That might be in breach of the GDPR. In the event of a personal data breach, you need to tell the data subjects about the breach [0]. You can’t just put a notice on your page, since someone might not be using your service any more, but you still have their data. And GDPR aside, it is very short-sighted to assume you will never ever need to e-mail users on your own.
Your comment was dead; I vouched for it, as it's a valid assumption that you have to notify users of a breach. But in this case, it seems like the following applies (34.3.a from your link):
> The communication to the data subject referred to in paragraph 1 shall not be required if any of the following conditions are met:
> the controller has implemented appropriate technical and organisational protection measures, and those measures were applied to the personal data affected by the personal data breach, in particular those that render the personal data unintelligible to any person who is not authorised to access it, such as encryption;
With emphasis on the "in particular those that render the personal data unintelligible". Since the email is no longer an email, it should not be counted as personal data; it's just random characters, and no notification is needed.