I get he's ranting about imprecise terminology, but really, what does this knowledge gain the end user? Maybe I'm missing the point. Users don't care if their password was hashed or encrypted or stored in an ice freezer in Antarctica.
An end user doesn't have any control or knowledge over the password storage mechanism for any site, so the best thing is to use a password manager that generates strong random passwords--because that is something the end user can control.
I’ve seen a horror story in the form of a “[HELP] sysadmin won’t give encryption keys” post on a forum. Mid-2000s, non-English, so I’m paraphrasing.
The sysadmin had resigned, so there was no tech person anymore. The boss got a lost-password inquiry from a customer, panicked, called the former admin for help, didn’t understand a single word of the answer, and hit the forum.
And in the forum post he said the admin “wouldn’t cooperate” and was “very adamant and confrontational” about “giving out encryption keys” to “decrypt valuable customer database we can’t afford to rebuild but might have to”, and how it had been driving the company towards bankruptcy over the last few days, etc. etc.
Eventually forum members realized he was talking about a password reset, and convinced him that recovering a hashed password is actually impossible, and that it isn’t necessary either, which is the point: you just verify the customer’s identity and let them change the password. But it took the community hours, on top of a day and a half of tense hours for the boss’s company, just to understand the situation and act accordingly.
So I think it might be worth communicating how passwords work every once in a while.
"Our former system administrator is extremely uncooperative and very confrontational" or "we need to decrypt valuable customer data that we can't afford to rebuild and this is driving us towards bankruptcy" are the kind of words that make most educated people very much unwilling to help, no matter how much money you promise them ;).
It's not a matter of principle, it's a matter of headaches. It's very unlikely that a customer like that would pay enough to make it worthwhile.
There are plenty of people willing to work at any price point. Even if a $200/hr consultant dismisses the request, a $10/hr one can probably still figure it out.
However if the alternative is bankruptcy I think there IS money to be spent on the solution.
That is also assuming you have the time, the resources, and the general knowledge to pick out a great contractor whom you can trust with the security and safety of a critical system.
^ This. When someone approaches you saying things like "our former sysadmin is confrontational and unhelpful", you don't just have to solve whatever problem the guy who ragequit couldn't solve. You also have to undo the damage done by the asshole who's contacting you after they tried to solve it themselves, and sometimes by the one or two unqualified people they tried to hire afterwards.
You know how employers are skeptical about hiring people who start the interview by saying how awful their last/current workplace was/is? Same here. I've definitely met hundreds of developers and sysadmins that I wouldn't trust, and whose services I wouldn't recommend. I've definitely been wrong and hired some of them myself. But I would never say something like that in public. At the end of the day, who's the smartass that hired these people in the first place? I get that it's frustrating, but such is life; you still have to be professional about it.
Plus, 200 USD/hour for a quick fix looks like a great deal, but even if the fix itself takes one hour, it's rarely a one-hour job with customers like these. You routinely end up invoicing for less time than you spend deflecting their attempts to negotiate the bill, explaining why it took one hour and itemizing it, and getting them to pay the stupid invoice.
My favorite is when the website accepts your long random password but then the login fails. Because the set password function truncated the password before hashing it but the login function doesn't do that.
Indian NPS, the equivalent of a personal pension account, government operated. The website forces you to change your password every 90 days. It accepts any length for the new password, silently truncates it, then hashes and stores that. Now you're locked out, because you don't know what the actual truncated password they stored was.
Happened to me twice: changed password, locked out, had to reset. Then, while logging in, I noticed their 12-character requirement; at the next reset I used 12 characters and did not get locked out.
Yes, but that would be OK if the login form truncated the password in the same way as the password change form. They'd both arrive at the same hash. Which is what the comment above you said. What you describe probably means the password change form actually did run the hash on the full length, while the login form truncated it.
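Either direction of the mismatch produces the same lockout. A minimal sketch of the bug class (hypothetical names, PBKDF2 standing in for whatever the site actually uses), with the set path truncating and the login path forgetting to:

```python
import hashlib
import os

MAX_LEN = 12  # the form's silent limit, as in the NPS story above

def set_password(password: str, salt: bytes) -> bytes:
    # The registration/change form silently truncates before hashing.
    truncated = password[:MAX_LEN]
    return hashlib.pbkdf2_hmac("sha256", truncated.encode(), salt, 100_000)

def check_password(password: str, salt: bytes, stored: bytes) -> bool:
    # The login form hashes the full input, so anything longer than
    # MAX_LEN characters can never match what was stored.
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return candidate == stored

salt = os.urandom(16)
stored = set_password("correct horse battery", salt)
print(check_password("correct horse battery", salt, stored))  # False: locked out
print(check_password("correct hors", salt, stored))           # True: the first 12 chars
```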
I never said it's OK, I just explained what's happening.
Ah, now I understand why I got those downvotes. What I meant by "OK" was that it would "work". It wouldn't exhibit the behavior my parent comment was describing. Not that it would be secure or good practice.
This still happens, in 2020, with breathtaking regularity. Several times a year, a site won't accept my password, so I then start the whole "Will 20 characters work? No, that didn't work. Will 16? Nope... 15? Nope... 12? Nope. Oops I skipped 14... AH YES FINALLY."
One of the hats I wear these days is looking through source code for security vulnerabilities. It's shocking how many SQL injection vulnerabilities I find in newly written code. I remember first reading about SQL injection back in the 90s and yet developers are still making that very basic mistake. It is also a bit scary how many of the code analysis tools miss intentional flaws I've added to the code to test the scanner. These aren't ancient lint checkers, they're ridiculously priced enterprise tools. It worries me that they're giving a false sense of security that is going to get lots of implementations burned.
>These aren't ancient lint checkers, they're ridiculously priced enterprise tools. It worries me that they're giving a false sense of security that is going to get lots of implementations burned.
But no one is going to get fired or held back for choosing those 'enterprise' tools. They've spent a lot of money, everyone has a checklist, and people move on, even if there's a breach. And some jr might have pointed out "hey, this open source toolset does this, but is kept up to date and has found 47 things our scanner didn't find" and this jr will likely be ignored.
“Well with open source tools, you pay with time and we prefer this gold-plated, Cadillac, enterprise-grade tool. We’re a serious company now, after all.”
I'd argue it's bad API design. You can't make an API that's easy to use wrong and insecurely and then be surprised that people use it wrong and insecurely.
I can only half agree with you on that. Yes, I also dislike APIs that make wrong or unsafe use easy and correct use more bothersome but seemingly no different in behaviour (until it goes BOOM), but I also find that soooo many people simply don't have the awareness that they are interfacing with another system that interprets their data in a potentially unsafe way. And these people will misuse any API like this.
Unfortunately short of forcing everyone to use an ORM I don't see how we can block the unsafe API, which I'm assuming to be the string-based query interface e.g. `conn.query("SELECT * FROM users")` since any interface that accepts a string will allow a dynamically constructed string which lets developers open themselves to injection attacks. Only ORMs AFAIK can prevent this, e.g. db().users.all() or db().users.select(name="bob").
It'd be nice if the languages offered a way for the query-compiling function to require that the query strings given to it are static, compile-time strings.
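Short of compile-time enforcement, parameterized queries already keep user input out of the SQL text without requiring an ORM; the driver sends the value as data, never as query syntax. A minimal sqlite3 sketch with a hypothetical users table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, is_admin INTEGER)")
conn.execute("INSERT INTO users VALUES ('bob', 0)")

name = "' OR '1'='1"  # attacker-controlled input

# Unsafe: the input becomes part of the SQL text itself.
rows = conn.execute(f"SELECT * FROM users WHERE name = '{name}'").fetchall()
print(rows)  # every row comes back: classic injection

# Safe: the placeholder keeps the input as a bound value, not SQL.
rows = conn.execute("SELECT * FROM users WHERE name = ?", (name,)).fetchall()
print(rows)  # [] - no user is literally named "' OR '1'='1"
```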
Because it's ridiculous to decide that the bottom of the decision graph should be held responsible for the poor decisions made from above.
Engineers should only be held accountable for decisions they made personally. The unfortunate reality is that, a terrifying amount of the time, terrible decisions are handed to engineering teams as required implementation details from managers, executives, product directors, etc. So should engineers be held criminally accountable for their product manager demanding MD5 hashes on passwords?
That sound you hear is millions of actual engineers, ones with qualifications/certifications/professional obligations/legal liability (think “civil”, “mechanical”, “electrical”, et al.) all rolling their eyes yet again at the code monkeys who call themselves “Engineers”, yet aren’t prepared to tell their boss that the decisions they’re trying to make, with inadequate training and experience, are dangerous...
Do you think the guy signing off on a bridge or power station design gets pushed around by “product managers” demanding shitty or cheapskate design or construction?
You’re welcome to disagree about calling them software engineers, but I’m not seeing how that is relevant or constructive to the whole point of this post and comment chain?
The point GP is making that civil engineers will not (knowingly) design an unsafe product just because they got something in a specification that led them to do so. They’ll object and refuse.
You were arguing that the specs or decisions made above their pay grade forced the software engineers into a dangerously faulty design which they dutifully created and shipped.
It's a bit circular, though, isn't it? It is in part due to the fact that a mechanical engineer _has_ that legal certification that they have the standing to meaningfully object, and actually be listened to by their PHB.
An engineer should be willing to tell his boss "No, because I would go to jail." If the manager insists anyway, the engineer should be obliged to refuse, even if that costs them their job.
Of course management should also be held accountable, but until engineers are forced to have some skin in the game they will continue to be as pliable as wet noodles.
Consider this: who better to blow the whistle on management than an engineer who knows they've been given an illegal order?
I have never seen an engineer who understood the issues be 'as pliable as wet noodles'.
What you're asking is for an engineer to be stuck in a place of legal culpability if they do the work, and to be fired if they don't. Added bonus: you mention whistle blowing, but who the hell would they whistle blow to?
If there's a proper industry or governmental group to blow the whistle to, that will also see the engineer financially compensated until such a time as they find a new job, then fine, it's fair to make engineers culpable. Otherwise, you're just making engineers suffer for the bad decisions of those made above them, by making them either legally liable, or risking their jobs.
Any employee who refuses an illegal order is risking their job. If your boss asked you to perform brain surgery and promised to fire you if you refused, would you give it your best try or would you tell him to fuck off? If software "engineers" want respect they need to grow a spine and learn how to say that simple two-letter word.
Insecure websites aren't illegal. Only -breaches- are. And that's only a civil offense, leading to a fine, which the company pays. Right now, both the punishment and the decision are aligned.
You're looking to now make it so that there's a punishment levied on the developer, who has no more power to say no than they currently do. You want them to say no, but all you're doing is making it more unpleasant for them to not do so; you've done nothing to make it easier to do so.
I agree with the GP, but I also think it is poorly phrased. It's not that the engineers should be willing to say they'd go to jail, it's that they should be able to.
Right now, the conversation goes, management: "I want X". Engineer: "X is insecure, we should do Y instead". M: "Y will cost us $Z more than X, and it's never going to matter for us, anyway."
This puts the engineer in a position where they need to argue and justify the cost. Compared to "Sorry, I can't do that; it's illegal and I'd go to jail if I did that and was found out." Now the engineer doesn't have to justify anything. The law isn't a burden on the engineer here, it's a shield.
Yes, there are still some scenarios in which management insists. In my limited experience, those are gray scenarios where it's arguable whether the law applies. But the point is, it's much easier for an engineer to argue whether the law applies than whether it is worth the money.
I think you can see these effects in the lengths companies go to protect healthcare data (HIPAA) vs any other random personal data.
So this is a reasonable argument, but before we get there, there should be real responsibility placed on the company. Currently, when user data is lost, the companies suffer absolutely no consequences. They have literally no incentive to make sure their system is secure.
The equation, for management, should be: "Y will cost us $Z more than X, but if X is hacked we will get taken to the cleaners."
For the actual engineer to be held responsible you would have to add a formalised approval process, so that it's clear who signed what off.
Imagine you signed off on something, and then changes were made without your knowledge - that's much easier to do with software than with a bridge.
I guess my point is not that engineers should be exempt from being held accountable for their work, but that engineers are frequently asked to do things incorrectly/poorly/negligently and then assessed by their employers for their willingness to comply. Sure, you can say "Just stand up to your employer", but that's an incredibly dismissive stance on a complicated issue. Yes, you should say no to requirements that are flat out illegal, but is it ever that cut and dry? I'd be surprised to find it was ever that simple.
According to my engineering professional association, it is that simple. They'd have no qualms stripping me of my license to practice engineering for knowingly approving an unsafe design, regardless of what effect that decision would have had on my financial well-being.
The big difference is that companies need professional engineers. Professional approval on certain things is required by law. I'm not sure that would be a good idea for software, but that is what makes the system work for professional engineers.
> They'd have no qualms stripping me of my license to practice engineering for knowingly approving an unsafe design
Presumably there would be repercussions at the government level if a company repeatedly demanded engineers do things worthy of stripping their licenses though, no?
This varies by jurisdiction, but perhaps I oversimplified. There is another reason that doesn't happen.
The individual engineer doing the work needs a license, but the company itself also needs a permit to practice. The permit must be held by an engineer, who is personally responsible for the engineering that occurs under their permit.
So, the permit holder needs to worry not just about their own ethical behaviour, but that of all engineers in the company. They are incentivized to ensure the company will hold the public safety paramount, or to walk away if they cannot (thereby leaving the company without a permit).
If the company has a pattern of misbehaviour, it may be difficult to obtain a permit.
_Exactly_. Tech companies regularly exist purely based off of illegal business models (they call them _Disruptions_).
So, yes, we agree, if there are repercussions for a company regularly breaking the law, then engineers can and should refuse work that has negative legal or moral repercussions. But in the world of tech, that's not the case.
Wait a minute, I thought software engineers were in high demand, with companies fighting over them and opportunities everywhere! At least that's what one out of every 10 HN articles tells me. Surely that demand gives them a little power and agency over their work. I think we are pretending here that these developers have only one option: "Sure, boss! Whatever you say, boss!"
I personally believe that you, the software developer typing in the code, should hold yourself personally accountable for what you are typing in. You might also be designing what you type in, or even setting the requirements, but it might be other people. Regardless, you are making the software come into being--you're the one coding it and pushing it to the repo, so you should set the standard of what is acceptable. This "well, boss told me to do it!" rationalization and blame-shifting is how we get dangerous and unethical software.
And, yes, I have quit software jobs where I was asked to write software I considered ethically questionable, and failed to change the boss's mind.
> I think we are pretending here that these developers have only one option: "Sure, boss! Whatever you say, boss!"
No, I am suggesting that there is significantly more grey area between your moral highground and reality.
> I personally believe that you, the software developer typing in the code, should hold yourself personally accountable for what you are typing in
Yep.
> This "well, boss told me to do it!" rationalization and blame-shifting is how we get dangerous and unethical software.
It really is a strange world that, when corporations are attempting to turn profit on illegal behavior, it's the meaningless bodies-in-seats that we're trying to hold accountable.
I am, and have been, repeatedly, suggesting that the lowest level cannot be held accountable without holding the rest of the levels accountable. Jailing engineers for doing things their companies demanded of them is ridiculous if you're not also jailing those doing the demanding. I'm kind of shocked this isn't painfully obvious.
> And, yes, I have quit software jobs where I was asked to write software I considered ethically questionable, and failed to change the boss's mind.
Congratulations, that's a level of privilege lots can't afford.
Better if both would face repercussions, since if only the boss has skin in the game passive introverted engineers will silently follow orders, knowing that they're not personally risking much if they comply but risk losing their jobs if they don't.
My understanding is bcrypt truncates at 72 bytes by default, which could be 18 emojis I guess?
Truncation should still be consistent between registration and login, but I think this is also the default in PHP if you are using password_hash() today. Is that a security issue?
> Although implementing a maximum password length does reduce the possible keyspace for passwords, a limit of 64 characters still leaves a key space of at least 2^420, which is completely infeasible for an attacker to break. As such, it does not represent a meaningful reduction in security.
Oh this... I had something weirdly different recently. On a DigitalOcean account, all of a sudden I couldn't log in. Apparently the login page had been updated and said my email address was invalid. Which was weird because I had definitely logged in before. I use the + trick with Gmail, so I thought that was the issue, in combination with the new login page. Wasn't the case. Apparently my DNS service, which also does filtering, blocked part of their CDN, including part of the email validation JavaScript. Turned that off, could log in again.
PayPal has this problem, but only in certain change password forms. I think it was a password reset form. This happened to me recently. It was incredibly frustrating.
I can't believe sites think it's acceptable to do that. I get it, algorithms like bcrypt have a size limit. But there are reasonably secure ways to get around that, for instance hashing the password with HMAC-SHA256 before bcrypting it.
Or better yet, pick one of the other algos with better limits and protection like Argon2 or Scrypt.
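A sketch of the pre-hash route mentioned above, assuming the pyca bcrypt package; the base64 step matters because the raw HMAC digest can contain NUL bytes, which many bcrypt implementations treat as a string terminator:

```python
import base64
import hashlib
import hmac

import bcrypt

PEPPER = b"server-side secret"  # hypothetical; keep it out of the database

def prehash(password: str) -> bytes:
    # HMAC-SHA256 compresses input of any length down to 32 bytes,
    # comfortably under bcrypt's 72-byte limit.
    mac = hmac.new(PEPPER, password.encode(), hashlib.sha256).digest()
    return base64.b64encode(mac)  # avoid NUL bytes in the raw digest

def hash_password(password: str) -> bytes:
    return bcrypt.hashpw(prehash(password), bcrypt.gensalt())

def verify(password: str, stored: bytes) -> bool:
    return bcrypt.checkpw(prehash(password), stored)
```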
I had this happen recently with a finance website! Although technically the reverse, it silently stopped logging me in because the password field was changed to use HTML validation to enforce a max length of N, but they had previously accepted my password that was length N+1. Maddening.
I think FF recently deployed a change so that it no longer silently truncates inputs that are too long. It won’t solve the problem in all cases but hopefully in some.
The reason this is a problem even for new applications [1] is that it's the result of a leaky abstraction in common password hashing libraries which forces the site operator into a tradeoff between login service reliability/consistency and security.
Iterated password hashing algorithms like bcrypt [2] use a parameter, tuneable by the site operator, that drastically varies the computational cost of calculating the hash, so a brute force attack on the hash can be made to require orders of magnitude more computational power. The tradeoff is that it makes users wait longer to log in, and (might) incur additional operational cost to the site. This is why it's a tuneable parameter: turning it up increases security at the cost of user delay and operations.
In theory. In practice, the user delay and operational cost are also proportional to the user's password length, which you don't control. Observe, the leak. This can cause wildly varying response times for authentication attempts, and opens your authentication system up to a DoS attack. The only reasonable solution is to put a small cap on password length so that the longest passwords don't take more than 2-5 times as long to compute, as per each site's tolerance of variance.
So why does the runtime of the iterated hash function depend on the length of the password? In each iteration the user's password is joined to the previous iteration's hash, and hashed together to get this iteration's hash. The runtime of any hash function is proportional to the length of the data fed in, the data fed in includes the user's password, thus the runtime of an iteration is proportional to the length of the user's password. So the runtime of the whole iterated password hash is `O(Iterations * PasswordLength)`.
There are various schemes one could come up with to change the time to `O(Iterations + PasswordLength)` ~ `O(Iterations)`, such as concatenating the first hash of the user's password instead of the password itself on all successive iterations, so all iterations but the first are independent of the user-chosen password length. There could be some security/entropy-based arguments for avoiding this solution, though I don't know what those could be.
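A sketch of both constructions side by side (illustrative only; real KDFs like bcrypt and PBKDF2 are more careful than this):

```python
import hashlib

def iterated_naive(password: bytes, salt: bytes, iterations: int) -> bytes:
    # Each round re-feeds the full password:
    # O(iterations * len(password)).
    h = hashlib.sha256(salt + password).digest()
    for _ in range(iterations):
        h = hashlib.sha256(h + password).digest()
    return h

def iterated_prehashed(password: bytes, salt: bytes, iterations: int) -> bytes:
    # Hash the password once, then iterate on fixed-size digests:
    # O(iterations + len(password)), independent of password length.
    h = hashlib.sha256(salt + password).digest()
    for _ in range(iterations):
        h = hashlib.sha256(h).digest()
    return h
```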
It's fine to truncate the password (though ideally you don't do it at a super short length).
The issue you are replying to is referring to when a site doesn't do it for both account creation and login. Which is a bug. A stupid, hard to detect, hard to explain, likely never to be fixed, bug. That only affects people trying to be secure.
I get that different parts of a service truncating the password to different lengths is a problem, and that it's a different (perhaps worse) problem than the one that causes site operators to limit password lengths.
Worst thing about 1Password, and LastPass when I used it: it doesn't let you pick which special characters it uses, despite that being such a common restriction on websites. So you have to add them manually, or swap things out.
Enpass’ password generator has a field where you can enter characters that aren’t allowed.
Regretfully, Enpass doesn’t store this field, nor the rest of the complexity rules, as part of the password entry. The next time the password has to be changed, you have to figure out the underlying complexity rules again.
The old “abchkkunenukzimejienejsidmdjiwknevgjk bgiknhhhnnisplwkslandhgabsndmskalpaapowhsoslxiaiapjsbsnsnaja” is not secure, but “P@55w0rd” is super duper secure.
in our app we have a similar requirement.. I kicked and screamed and sent them spec documents from NIST.. no one cared.. we have a max length of 10 chars... that SERIOUSLY hurts... 8 to 10 chars is our current requirement... plus some combination of numbers and special chars... WTFBBQ !!!!111...
I mean, if you insist on a 10-char maximum, then mandating symbols to increase the search space is a good idea, right? (Granted, that doesn't make a 10-char max sane)
Allowing symbols increases the search space, but requiring them reduces it.
And in practice this effect can be exaggerated when people don't use random passwords but must actually choose a password they'll remember - because what they'll actually do is choose something easy and then shove a symbol in there to meet your requirement. You may well allow 30+ different symbols, or even more, but the users will invariably pick one of a dozen or so that were easiest to reach on their keyboard and they may learn to be shy of characters that sometimes "don't work" such as quote marks and any local currency symbol even if those are easy to type.
Also, the chance that a site will refuse a strong random password is directly correlated with the importance of the account. Case in point: the appointment site for my barber happily takes a 30+ character randomly generated password. My old bank would not allow a password longer than 12 characters and only recognized 4 special characters. That is why they are my old bank.
This is less annoying than sites that accept your strong password, except their backend actually can't handle it. I love being locked out immediately on account creation.
If a person ever even considers writing code which could generate either of these error messages:
"Your password is too long!"
"Your password uses special characters!"
... they are not only incompetent to write code which handles passwords, they have been so misinformed that they are an outright liability. They need to relearn everything they currently know on the subject and start over. Few things in the technical world instantly convey such utter anti-knowledge as the presence of either or both of those error messages in a codebase.
Some passwords are too long - I wouldn’t expect any website to accept a gigabyte-long password - and I wouldn’t judge a site that doesn’t accept \0 either.
I don't remember the math on hashing/bcrypt, but isn't it the case that all passwords hash to a fixed-length string? Like, why even have something like "your bank password must be 8-12 characters long"?
Obviously for a gigabyte long it's a bandwidth and hash-computing issue :p
> Obviously for a gigabyte long it's a bandwidth and hash-computing issue :p
Yes, that’s why you put in limits which are way beyond reasonable passwords but way below that. Say a few hundred or thousand bytes.
Also worth consideration: most of these work on bytes, probably utf8. A user wants to be cute and put emoji in there, that’s 4 bytes a pop. So depending how the system counts them, “hospital plane” might be considered 2, 4 or 8 characters.
But wait! Group emoji are concatenations of those: you can have a single multi-character emoji composed of half a dozen codepoints, and two dozen bytes once encoded.
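For instance, Python counts one family emoji (four people joined by zero-width joiners) like this:

```python
s = "👨\u200d👩\u200d👧\u200d👦"  # a single rendered "family" glyph
print(len(s))                   # 7 codepoints: 4 emoji + 3 zero-width joiners
print(len(s.encode("utf-8")))   # 25 bytes once encoded
```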
If that’s true, can you please share a link to your website so that I can stop using Dropbox and migrate my encoded data to be stored in your password field?
Hrm, yes, good point. So my snarky comment loses its charm, if it ever had any. Still, though, I think it’s reasonable to alert the user if your password exceeds its allocated storage rather than silently truncate.
bcrypt has a maximum length of 72 characters and it's what Mozilla recommends for encryption. Do you think Mozilla is too "incompetent to write code which handles passwords" and "an outright liability"?
I think the idea behind the article was to explain why people are encouraged to stop using passwords from Site A if Site A gets 'hacked,' even on Sites B, C, D, etc. It's not just that encryption was broken on one site. It's that now that password has been matched up to a hash, so its strength is kaput.
I do think the message got a little lost in the post though, with the follow-on discussions of salting, bcrypt vs. md5, etc.
But hasn't the password + salt been matched up, not just the password? Or is his point that you don't know, so you need to treat it like the password is compromised (which I agree with). But the password is compromised either way (hash or encryption). I'm confused, maybe I need to go back and read it again.
Compromising the hash and salt, since they must be stored close together, makes it possible to identify whether the salted hash corresponds to a password in a corpus of previously compromised passwords. An attacker can compute Hash(PW, Salt) for every PW in a list of leaked/cracked plaintext passwords. If they've guessed your password and it's shared across multiple services: lateral compromise. Salting only prevents rainbow table attacks, where an attacker precomputes all possible hash values for a known keyspace (like, say, 8-character alphanumeric passwords) and just looks up a match. Encryption is concerning because it necessitates the ability to decrypt (they're inverse operations of each other), and presumably there's a shared key stored somewhere to do the comparison, which means it's likely trivial to recover the password compared to hash cracking, undermining any strength or complexity benefits. This also likely points to other bad behaviors built on this "feature", such as helpfully emailing you your plaintext password when you forget it.
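A sketch of that lateral attack, assuming the breached site used a fast unkeyed hash and the per-user salt leaked alongside the hash (all names and the corpus are hypothetical):

```python
import hashlib
import os

def site_hash(password: str, salt: bytes) -> bytes:
    # Stand-in for whatever fast hash the breached site used.
    return hashlib.sha256(salt + password.encode()).digest()

# What the attacker stole: a user's salt and hash, stored side by side.
salt = os.urandom(16)
stolen_hash = site_hash("hunter2", salt)

# Plaintexts cracked or leaked from earlier breaches.
corpus = ["123456", "password", "hunter2", "letmein"]

for pw in corpus:
    if site_hash(pw, salt) == stolen_hash:
        print("match:", pw)  # now try this password on the user's other accounts
        break
```

The salt does nothing to stop this; it only kills precomputed (rainbow) tables.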
... we can build a function that takes the output from hash(password) to deterministically create a new candidate password, let's call this function pass(hash), and then chain the hash and our new function together as many times as we want. This lets us store much less data, while doing more work during our look-up phase.
Now if I find a hash 92fe87 in a password hash file, I do not learn that the password was jimmy, instead I need to compute pass(hash(jimmy)) and that's the password I was looking for. And if I find 39a4e6 which isn't in my list, I calculate hash(pass(39a4e6)) and discover that's 213eea, then I look this up in the table and I discover the password I need was 12345678. Obviously real Rainbow Tables don't just run the hash twice like this, but instead some fixed number of times chosen by the creator to trade off less space versus more work to find a password.
I should actually fix this. What I've described above is basic "chaining", but Rainbow Tables are a further improvement still by Philippe Oechslin. The additional insight in Rainbow Tables is that we can reduce collisions in our hash-pass-hash-pass back and forth if we modify that pass function so that its behaviour varies by depth, this way if a collision occurs but at different depths in different chains (e.g. maybe the chain starting with password "password" hashes immediately to 5f4dcc but in another chain the value 5f4dcc is found for the password "j58X_m04" after six steps) the next call to pass() will diverge again, so the collision only wastes a small fraction of our precomputation effort. If the collision does happen at the same place in the chain, the final hash output will be identical to another chain, so it's easy to discover this problem and apply whichever mitigation seems appropriate.
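A toy version of the chain construction (wholly illustrative; real tables use much longer chains and carefully chosen reduction functions). The depth parameter in the pass function is the rainbow refinement described above:

```python
import hashlib
import string

ALPHABET = string.ascii_lowercase
PW_LEN = 6
CHAIN_LEN = 1000

def h(pw: str) -> str:
    return hashlib.sha256(pw.encode()).hexdigest()

def pass_fn(digest: str, depth: int) -> str:
    # Deterministically map a hash back into the password space.
    # Mixing in the depth means a collision at different depths in
    # two chains diverges again on the very next step.
    n = int(digest, 16) + depth
    return "".join(ALPHABET[(n >> (5 * i)) % 26] for i in range(PW_LEN))

def chain_end(start_pw: str) -> str:
    pw = start_pw
    for depth in range(CHAIN_LEN):
        pw = pass_fn(h(pw), depth)
    return pw  # the table stores only (start_pw, chain_end) pairs
```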
Interesting, I haven't worked with rainbow tables very much since by the time I got into the world of hash cracking it had either been deprecated by salting or wasn't relevant (i.e. NTLM). That is a clever trick of trading back some of the space for extra time; I remember some of the rainbow table file sizes being ridiculous to the point of almost unusable haha.
If one uses bcrypt for hashing passwords, as currently the best practice recommends, building basically a salted rainbow table becomes rather expensive, too. Not impossible, since the amortized cost for many common passwords is relatively low, but still sort of expensive.
Ideally a machine that generates and checks the hashes should be a box without a NIC, connected to the rest of the servers via a bunch of RS-232 ports. This would make extracting the salt much harder, down to effectively impossible. Few orgs can afford such a setup, though, due to the hassle of administering it.
> Not impossible, since the amortized cost for many common passwords is relatively low, but still sort of expensive.
This statement seems like it gravely underplays the numbers.
Traditional Unix crypt uses a 12-bit salt. So this means your precomputation (whether a Rainbow Table or not) is 4096 times more expensive. That's just about plausible though already uncomfortable ("Sorry boss, I know you said the budget was $10 but I actually spent forty thousand dollars").
But bcrypt uses a 128-bit salt. So now your precomputation is so much more expensive that if the equivalent ordinary brute force attack on a single password cost 1¢ and took one second on one machine, you'd spend a billion dollars per second, over a billion seconds, on each of a billion machines, and still not even have scratched the surface of the extra work you've incurred to do your precomputation.
Rainbow Tables are a precomputation "time-space tradeoff" attack. You do a bunch of preparatory work which is amortized over multiple attacks and results in needing space to store all your pre-computed data. This is nice for two reasons:
1. You get to do all the hard work before your attack, leaving less time between the attack and your successful acquisition of the passwords compared to work that's necessarily done after stealing the credential database.
2. You can re-use this work in other attacks
But if you're waiting until you know the salt you don't get either of these advantages, so Rainbow Tables are irrelevant.
It's like if somebody mentions the F-14 fighter jet in a discussion about the fastest way to get from Times Square to Trump Tower. Yes the F-14 fighter jet is a fast aeroplane, but it can't go to either of those places so it isn't relevant whereas Usain Bolt is a very fast human so he really could run from one to the other.
It does make logins on other machines difficult. You may open up your password manager's web interface on your friend's laptop, but then you've opened up all your passwords to potential malware, including banks etc. When you just wanted to log in to Reddit, where you have an account you somewhat care about: you've spent years under that username, but it's not a huge catastrophe if it's taken away, unlike, say, your domain registrations, your email password, your e-government logins, banks, etc. The "all eggs in one basket" still bothers me.
We haven’t lived through a big password manager leak yet. But a password manager gives up the list of usernames, passwords and websites. That kind of compromise is probably worth a million dollars per user. That’s a lot of data in the same vault. Just saying.
Most people aren't worth a million dollars, that's downright silly.
But I do agree that centralised storage of everyone's passwords does not make sense.
I've decided to use KeePass and store the encrypted password file in cloud storage; that way it syncs between my phone and computers, and I can always get at it if I really need to. But it does make logging into Reddit on someone else's PC very difficult.
The knowledge it gains the end user is the confidence that the service I am using actually knows what they are doing. I have often found that if someone uses imprecise terminology, it is a symptom of a bigger issue.
> I have often found that if someone uses imprecise terminology, it is a symptom of a bigger issue.
What a spot-on observation. Maybe this is why I am always so pedantic, even when I tell myself that it's not necessary in the given context. Subconsciously, I'm making this connection, and I can't turn it off.
I've found it enormously valuable at the beginning of projects to hash out (erm...) a glossary so we know that everybody has the same understanding of the terms we use.
As an end user who uses Chrome, and whose mum does too, it means you can tell her to use Chrome's suggested secure password rather than 'P@ssword' or similar.
And my mum who's not really techie does care because she gets emails from Yahoo and the like saying sorry we've been hacked yet again please reset your passwords and gets in a panic over it.
As a non-technical person, I care to know the difference between encryption and hashing, and to know why easy passwords are useless. This was an extremely informative article.
> I get he's ranting about imprecise terminology, but really, what does this knowledge gain the end user?
Reminds me of an experience I had in college. I was working phones over the summer at a call center and had this software developer call in. This was during the runup to Y2K, and I mentioned the "Y2K bug" and he starts ranting about how it's not a bug.
And I'm like, yes I understand it's not technically a bug, but most non-technical people don't know the difference.
I've never understood why people are like this with non-technical folks. If you're having a technical discussion with another technical person, then yes, the distinction absolutely matters. But to non-technical people? Who exactly benefits from that?
I agree, to an extent. Yes, the average person isn’t going to care that, for example, (original) Doom and Marathon are 2.5D engines rather than 3D engines, nor the technical differences between CDs, DVDs, or Blu-ray.
On the other hand, the general water-level of knowledge is frustratingly low at times: I’ve received customer support which insisted that they couldn’t register proof of identity to activate a SIM card for an iPhone, only for Android; a salesperson saying iPhones couldn’t be used in those cheap strap-a-phone-to-your-face 3D headsets; and a previous boss had to tell off one of their own customer support team for saying their product didn’t support Firefox when the only browser installed on the customer support team’s computers was Firefox.
In fact, hashing passwords is better than encrypting them!
If you encrypt a password, it means that somewhere you have a key that you use to decrypt it to check if it's valid at login. It means there is a way that you (or, more importantly, an attacker) can decrypt the passwords.
Instead, if you use a good hashing algorithm, it is practically impossible to find the password given the hash. Yes, if the password is really simple you can get it, but come on, if the password is really simple, what's the point of protecting it?
By the way, I think we should phase out passwords anyway; I prefer to implement password-less authentication in the applications I develop: when you want to sign in, an email (or an SMS) is sent to you, you click a link with a temporary token, and you are authenticated.
No password to remember, no forgot-password, change-password, or recover-password flows to implement, no password to store, and the user doesn't have to choose a password. And I hate choosing passwords. (In fact I ended up using a password manager that generates random passwords for me, but it's not the ideal solution: passwords have to be synced across all my devices, not all websites/apps have forms made correctly to support a password manager, and the password manager extension (Bitwarden) conflicts with Firefox's integrated password manager, so I end up with some passwords saved in the password manager and others in Firefox, and it's a mess.)
> By the way, I think we should phase out passwords anyway; I prefer to implement password-less authentication in the applications I develop: when you want to sign in, an email (or an SMS) is sent to you, you click a link with a temporary token, and you are authenticated.
Please don’t do this. For one thing, SMS is fundamentally broken as a secure delivery method. But more than that… it’s just so, so deeply annoying.
I worked in an office that had no mobile signal. So for me a 2fa SMS involved walking out of the office and down the road for a few minutes until the SMS came through and then running back to my desk in the hope that I get there in time to enter the code before it times out.
You may not be able to get your phone on the corporate network without installing their MDM, and you may not want to give your employer the ability to wipe your phone.
Support for that is very uneven. Most UK networks only support it if you have an iPhone or a Samsung phone, and sometimes only if you bought the phone from the network directly.
But how will you log in to your SMS inbox? With an email? But what if that requires an SMS inbox as well.
Once we're in the realm of 'you only have to remember this one password' you might as well use a password manager that unlocks with that password and does the rest for you (be it with autofill or webauthn and the likes).
Yubikeys are fine too, even as a single factor in some cases.
Regarding simple passwords, we added a check against the top 100K seclist passwords when first registering, to keep users from using easily guessable passwords (we also had an experiment where we checked if that password was one of the frequently compromised ones).
This literally converted into:
1- Users abandoning sign-up: "oh how am I supposed to find a password I will remember"
2- Users bashing us on the app store reviews: "make it super hard to sign-up" even though we only ask for username and password, not even an e-mail
3- Users logging in, liking the app, then a few months later when they got logged out for whatever reason, completely forgetting what their password was and not having a fallback e-mail.
We ended up pulling it back. We just have a small note now that says "easily guessable password" but allow them to proceed with registration.
This is a good summary of a novel we've been writing based on our experience of tackling similar issues with clients. Working title: Misaligned Incentives. The best real-world solutions we've seen address this issue head-on by providing tangible incentives to the user in such a way that motivates them to act and doesn't harm the overall business objective. Example: product/service discount in a form of a coupon if you register a 2nd auth factor. Finding that balance is challenging, it is very context-sensitive. Selling it to the service owners is even more fun.
You could make the minimum password length longer than the longest SecList password. Then users can’t reuse any of those insecure passwords! Plus it’s also a fast O(1) check. :)
Does your app really need people to register an account? I’ve seen plenty of apps that make people sign up when there’s absolutely no reason to require it.
The caveat with a third party oauth solution is that you are now dependent and reliant on the third party to _let_ you use them to log in. Here are some fun experiences I’ve had with Facebook over the last couple of years:
- Our app was _deleted_ without any notice and any means of appealing (didn’t appear in the appeals page, and of course there’s no human support). We even filed a ticket and were told that they couldn’t help us because the app was “gone” in their system. Luckily we require an email address or we would have completely lost the ability to authenticate a subset of our users.
- A different internal app was banned from using “Facebook Login” because we were “providing a broken user experience” — the app was not even exposed for login in our system. We couldn’t appeal because the warning notice didn’t allow responding from our mailing list. Changing the primary contact didn’t work either, and we even disabled the login on the app just in case. Still revoked with no means of getting it back.
Google has been less awful to work with, but they make you jump through lots of hoops to get public login permissions. In summary, think very carefully about a third party Oauth solution.
Every time I want to use the service I have to go through this? I don’t think I would like that. Much easier to just paste in my password. Plus these emails are like sending passwords in plain text. If they are intercepted someone can impersonate me.
There are three components worth looking at. Each of them is commonly secured with TLS.
Firstly, submission, sending an email you just wrote from your client to a server. This is usually done over a specifically TLS-secured "SMTP submission port" 587 although it can also be done with STARTTLS.
Second, relay, getting email from your server to somebody else's server. A large proportion of today's servers default to STARTTLS over SMTP for MX. So this means when they connect to a peer server to exchange mail they'll enquire about using TLS and do so if possible. A passive adversary can't stop this happening.
Finally, delivery. Almost all modern IMAP clients default to using TLS with IMAP, so this step will be encrypted. Even in clients that don't require TLS a passive adversary can't stop them upgrading by default if possible.
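For the first hop, a typical submission client looks something like this (a sketch; the server name and credentials are placeholders):

```python
import smtplib
from email.message import EmailMessage

msg = EmailMessage()
msg["From"] = "me@example.com"
msg["To"] = "you@example.org"
msg["Subject"] = "hello"
msg.set_content("sent over a TLS-protected submission channel")

# Port 587: connect in plaintext, then upgrade before authenticating.
with smtplib.SMTP("mail.example.com", 587) as server:
    server.starttls()  # raises if the upgrade fails, so credentials never go in the clear
    server.login("me@example.com", "app-password")
    server.send_message(msg)
```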
This is misleading. Remember our context here is that we're getting a sign-in token for some web site, let's say it's the EXA Metal Pole Limited (Europe) site, example.com
The plain text is stored briefly on EXA's outbound mail server mail-blast.example.com, and then it's transmitted to my inbound MX mx1.tlrmx.org, stored very briefly there, and passed to the IMAP server imap.tlrmx.org.
So that's three servers, but, one of them is controlled by the same people as the site we're logging into. If they want a backdoor they can just make one, they don't need to steal their own sign-in tokens, that would be really stupid.
OK, so two servers left. But those are both operated by me, the recipient of the tokens. Why am I stealing my own tokens? To what end? "Oh no I broke into my own account and have impersonated myself" ?
Now, many people use say GMail instead of their own mail servers. But can we reasonably say these people's mail was "intercepted" by GMail, the outfit they've explicitly chosen to receive and store email on their behalf?
And even if we insist upon using the word "intercepted" this way ("The Buccaneers pass was intercepted by Mike Evans" [Evans is a Buccaneers Wide Receiver, the pass was presumably meant for Mike and so we would not ordinarily call this an interception, but if you insist...]) it's unclear what unexpected gain is achieved. GMail could just build their own backdoor and sign in as you to get the tokens instead of "intercepting" them if for some crazy reason that was what they wanted.
Email is federated, not point to point. It quite often hops between a couple of servers. Cloud hosted stuff typically gets routed through the cloud provider first (and whatever intelligence agencies are tapping that feed), which then pushes it to the top-tier smtp server nearest the destination for obscure hosts.
Still we’re in a perverse situation here. Running your own server is getting harder to do since everything operates on white lists, and I wouldn’t trust the big name providers for something like this.
The email approach is what StreamYard does. If someone gets a forwarded email within a short timeframe, they have access. Then they cookie you with an access token.
This is both good and bad. When I needed a whole team to have access to my account, I just built a mailing list, and used that address for signing up. Yes, it was annoying that we'd all get email every time someone logged in on a new device, but it was also pretty straight forward to use.
it's called two factor for a reason! You're suggesting a return to one factor, but ditching the PW and using the backup means of auth. What's supposed to happen is that you combine something you know (pw) with something you have (phone) as it's generally difficult for an attacker to get both.
Cryptographically, encrypting doesn't actually add any more security so... no point imo
edit: but infosec isn't completely equal to cryptography, so some deterrence like that will prevent some attacks. But it's like adding a real beefy padlock on your door (the hashing), and then putting a piece of tape to keep your door shut. Or putting a piece of tape over the keyhole of your padlock.
I always wondered something: does using a secret key as salt and keeping the last (few) block(s) of a block cipher as output produce a reasonable hashing algorithm? maybe with three salts, one for the key, one as a prefix to the password and one as a suffix?
What the GP describes is absolutely correct. It may not be all that common but it is a known pattern. That you haven't heard of it doesn't mean it doesn't exist.
> An alternative approach is to hash the passwords as usual and then encrypt the hashes with a symmetrical encryption key before storing them in the database, with the key acting as the pepper.
> A password hash is a representation of your password that can't be reversed, but the original password may still be determined if someone hashes it again and gets the same result.
I love workshopping copy!
How about:
To mitigate events like this, we only store a scrambled version of your password. Though your actual password can’t be simply unscrambled from the leaked data, it is possible it could be deduced by a guess and check process - especially if you are using a weak or common password.
I think it's important to include the technical term "hash" at least once. Then users can research the topic.
"Scrambled" is imprecise enough that it could mean either hashed or encrypted. In radio usage, it specifically refers to encryption, so someone researching "scrambling" would get confused very quickly by that explanation.
I'm not sure why implementing a pepper (alongside, or even instead of, a salt) is so rare. It's arguably much easier to implement than a salt, and protects against both attacks described here.
The only caveat is that your database must not be tightly coupled with your application code, so that your pepper remains secret even if your DB is breached (which is usually the case anyway).
A pepper is essentially a secret encryption key (it's a long secret string that's added to the password and the salt to ensure more entropy). With cloud key management services (e.g. both AWS and GCP have a KMS), I think it's more beneficial to just encrypt the hash before putting it in your database. The process looks like this:
Upon password creation:
1. Generate hash as hash of password + salt.
2. Encrypt the hash with a public key from KMS (you can store the public key in your server code).
3. In your database store the encrypted hash, the salt, plus some "key ID" that identifies which KMS public key you used (this is so you can rotate keys later).
Upon user login to verify the password:
1. Retrieve the user's encrypted password hash, salt and KMS key ID from the database.
2. Make a call to KMS to decrypt the hash (KMS internally stores the corresponding private key but never lets you access it).
3. Then hash the password the user entered + salt and compare it to the decrypted hash to see if there is a match.
Benefits of this are:
1. If an attacker steals your database, they can't decrypt any of the passwords or the password hashes.
2. KMS never exposes the private key of the asymmetric key pair, so you know this won't get exposed either. The only way to decrypt something is to make an API call to KMS.
3. Thus, the only valid attack really is if the attacker is able to gain the same access privileges as your server. But even then they still need to call KMS one-at-a-time to decrypt hashes, and all of those KMS calls are logged in an audit trail, so it should be much easier to see if you have anomalous calls to KMS. There is a huge benefit here in that it is impossible to do bulk decryption without a giant audit trail.
We do something similar for storing all DB entries (since our data is sensitive, as we're a financial services company). Even if someone gets access to our DB, all they'll get is garbage :)
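A sketch of that flow with a generic KMS handle; the kms and public_key objects and their methods are placeholders for whichever vendor SDK you use, not a real API, and PBKDF2 stands in for your preferred password hash:

```python
import hashlib
import hmac
import os

def create_password_record(password: str, public_key, key_id: str) -> dict:
    salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)
    return {
        "encrypted_hash": public_key.encrypt(digest),  # local call, no KMS round trip
        "salt": salt,
        "key_id": key_id,  # recorded so keys can be rotated later
    }

def verify_login(password: str, record: dict, kms) -> bool:
    # One audited KMS call per login attempt; bulk decryption of a
    # stolen table would show up as a huge anomaly in the audit trail.
    stored = kms.decrypt(record["key_id"], record["encrypted_hash"])
    candidate = hashlib.pbkdf2_hmac(
        "sha256", password.encode(), record["salt"], 100_000
    )
    return hmac.compare_digest(stored, candidate)
```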
Yep. Not sure the details of AWS, but in GCP access to KMS APIs and specific keys is controlled by IAM, and you can set "conditions" on IAM policies to restrict access by things like IP of the request: https://cloud.google.com/iam/docs/conditions-overview
pepper instead of salt is a bad idea because if your pepper is leaked the attacker can brute force all of your passwords at once.
The main argument I've heard against pepper is because people are afraid of losing the pepper. It either needs to be directly in your code which is easier to leak, or out of band which is easier to lose.
edit: Does anyone else want to go to Waffle House whenever they talk about salting and peppering their hash? Too bad the closest one is an hour away.
If the pepper is stored as an environment variable, adding it in addition to the salt can be a minor increase in the password security. The thing is it often isn't that hard to get if someone has access to your server.
If it's just the db that is exposed though, it could be a small added layer of protection.
And yes, I would not ever do it instead of a salt.
In this scenario, would running each service (database/backend/frontend) in separate clouds/environments be beneficial in mitigating risks?
Let me explain. For example, I might have my NextJS frontend on Vercel, using its secret management/env tools.
The backend as a vanilla node-apollo-server-express could probably be on a cheap VPS, being monitored/restarted/load-balanced by PM2.
The database would be cloud, either PostgreSQL as a Service, or Fauna or something.
Would this scenario be better than just cramming everything into a VPS and trying to get that as secure/closed down as possible and be done with it (do monthly updates and whatnot)...
I've recently faced this conundrum at work. A small new app that, optimistically, will not have more than a few hundred concurrent users...
As @wongarsu said, I think running the DB separately makes a lot of sense. Also, that (in theory) protects any ENV variables on the main host server from exposure if your DB is compromised. And the DB is likely better protected and up-to-date if you use a DB-as-a-service. But I wouldn't go extreme either. There are a lot of hidden costs associated with utilizing additional machines, whether they be via lamda functions, VPSs, etc.
One other good thing about keeping things separate is updates in theory should be easier, as besides OS stuff, you only have a handful of applications on each machine. But again, for small apps, it's generally not worth the extra complexity. 3 platforms is likely as far as I go (client-side server, API, DB) in the majority of cases with small apps, and most of the time I'd probably just go with 2 assuming Node can serve both the client and API.
Theoretically, more separation means more security. On the other hand, securing many things is harder than securing few things.
PostgreSQL as a service (or whatever DB you prefer) is worth it, that means somebody else can get backups, security relevant settings, patches, version upgrades etc right. Everything else is probably fine on one server, but it doesn't sound like that would be much besides the backend in your case anyways.
> pepper instead of salt is a bad idea because if your pepper is leaked the attacker can brute force all of your passwords at once.
This isn't true, you could simply do encrypted_pw = md5(pepper + md5(password)) or whatever.
Edit: Getting a bunch of comments on this, just want to clarify that I used md5 purely to illustrate a one-way hashing function. In actual practice, you'd use something else (HMAC with SHA256 most likely).
No, I strongly recommend not doing so. The reason you put a salt in is to prevent multiple hashes from being the same (because people use the same password). With a pepper you still protect against cross-referencing the hash with other databases, but any two users in your database with the same password will still have the same hash, and people tend to reuse passwords (or similar passwords), so this is a very real attack vector for getting passwords without really breaking any hash fully (e.g.: user with known password on a different platform => try out similar passwords => break the hash for anyone with a similar password).
So a unique per hash specific salt is the most important thing to do.
Pepper/shared secret can make it additionally harder to crack any hashes as while you know all salts (they are stored alongside the hash) you don't know the pepper.
Lastly there is additional data (AD) (sometimes named differently), which can prevent some forms of hash reuse attack where you e.g. find some attack which allows you to overwrite hashes+salts in the db (but not more). Then you could rewrite all hashes to known ones and get access. Tbh, for many systems, if an attacker can do something like this they don't need to do that anymore. But for other (often large and complex) systems it's helpful.
The idea behind AD is that you (somehow depending on algorithm) include some additional data which needs to match. The most common example is to use the user id (if immutable) as AD so this hash+salt(+pepper) is only usable for given user and never for any other user.
If you ever write a auth sub-system for a big enterprise system I would recommend you to use salt+pepper+AD(uid), for everything else I would think salt+pepper is enough. But never should you use a hash without unique salt under any circumstance. It's always the wrong path to take.(For password hashing.)
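A rough sketch of that salt+pepper+AD(uid) layering using only Node's built-in crypto (illustrative only; in practice prefer a vetted library -- the npm argon2 package, for instance, has secret and associatedData options for exactly this):

    import { createHmac, randomBytes, scryptSync, timingSafeEqual } from "crypto";

    const PEPPER = process.env.PW_PEPPER!; // shared secret, kept outside the DB

    function hashPassword(userId: string, password: string) {
      const salt = randomBytes(16); // unique per hash, stored alongside it
      // Bind the password to this user (the AD) and to the pepper...
      const bound = createHmac("sha256", PEPPER).update(userId).update(password).digest();
      // ...then run the slow KDF with the per-user salt.
      const hash = scryptSync(bound, salt, 32);
      return { salt: salt.toString("hex"), hash: hash.toString("hex") };
    }

    function verifyPassword(userId: string, password: string, saltHex: string, hashHex: string) {
      const bound = createHmac("sha256", PEPPER).update(userId).update(password).digest();
      const candidate = scryptSync(bound, Buffer.from(saltHex, "hex"), 32);
      return timingSafeEqual(candidate, Buffer.from(hashHex, "hex"));
    }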
Short answer: if they are unique because they're a small sample from a large space (e.g. UUID v4) no salt is needed. If they're unique but maybe predictable, salt.
Thanks! Yes, they are random GUID-like values and are short lived (2-4hrs). Had a need to store them for a reason and decided to only add pepper to hash considering they are unique and short lived anyway.
SHA256 is also no good for storing passwords; you need to use a PBKDF like scrypt, bcrypt, or pbkdf2.
The SHA-family cryptographic hash functions are purposefully designed for throughput, if you combine them thousands of times like in PBKDF2 they can be fine. One round of SHA256 is trivial to brute-force especially with the plethora of ASICs available.
HMAC is also completely unnecessary here, and see the article title for your variable naming: it's not encrypted_pw it's hashed_pw.
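To make the "combined thousands of times" point concrete, here's a sketch with Node's built-in PBKDF2 (the iteration count is just a commonly cited ballpark, not gospel):

    import { pbkdf2Sync, randomBytes } from "crypto";

    const salt = randomBytes(16);
    // Hundreds of thousands of HMAC-SHA256 rounds make each guess costly;
    // a single round of SHA-256 would be trivial to brute force on a GPU.
    const hashed_pw = pbkdf2Sync("correct horse battery staple", salt, 600_000, 32, "sha256");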
I know this is just a comment on HN, but I want to point out that MD5 is not a password hashing function, and is broken even for a number of other hash purposes. For passwords something like argon2id or another modern slow hashing function is appropriate, and for general purpose, SHA2 and SHA3, as well as BLAKE2 (and maybe BLAKE3) would be good choices.
Further, there's so little reason not to salt that you ought to do it. It's built into the hash strings in most modern password hashes. Peppers are also a great idea. Defence in depth!
The discussion of "MD5 collision resistance is bad, but MD5 preimage resistance is still OK, so using MD5 for applications where only preimage resistance is required is OK" is, I think, completely valid (however pointless it is).
However for password hash functions to be useful they also need to have a high work factor to make brute forcing even more difficult.
The only unknown in that equation, once you get hold of the pepper, is the password. So, what added security does it provide, other than requiring another md5 computation?
How does that help? They can iterate through possible passwords and generate md5(pepper + md5(password)) just as easily as md5(pepper+password). The point is that they can iterate ONCE and match against all passwords in the DB. With salt they have to iterate for each row in the DB which is much more time consuming.
If that gets leaked the attacker could brute force that, and match the results to your entire list of hashes. If they're salted, they would still have to brute force each password individually.
I think you don't quite understand the purpose of a salt. Pepper and salt protect against very different things. And frankly, salt is much more important.
Am I missing something here? The attacker still only needs to know the pepper to brute force all the passwords with this scheme. Since the pepper is deterministic, and especially with md5, which is already extraordinarily quick to compute, the attacker can just take md5(pepper) and then do the extremely quick length-extension operation with md5(password).
> The only caveat is that your database isn't coupled tightly with your application code, so your pepper remains secret even if your DB is breached (which is usually the case).
There are many attack scenarios in which storing a secret separate from the DB doesn't get you much at all. Suppose an attacker finds an RCE vulnerability in the application – then they can slurp up the contents of the DB, but they can read the pepper from the configuration too.
Suppose they just have a SQL injection – they can't directly get the pepper from a SQL injection, assuming it is being stored somewhere outside the database. But it may help them get it indirectly – for example, they could UPDATE their account to have admin privileges, and then access admin-only features of the application which may end up revealing the pepper. I've seen many apps which have admin-only screens to run arbitrary scripts, view configuration files or environment variables, install plugins, etc. – those kinds of features are helpful in supporting the application, but can be used to turn a SQL injection into more complete control over it.
But what about having a separate auth sub-service with a separate database?
Who says a SQL injection allows you to update your account to be admin? (SQL has its own permission system, even though it's not used that much.)
Who says making yourself application admin lets you do more than ban users?
Sure, for many (especially smaller) systems a pepper won't help much, and AD (additional data) is even less likely to help.
But for many (especially bigger/enterprise) systems this might very well not be the case.
Furthermore, using a pepper tends not to hurt.
I think it's a fallacy to assume that just because many less well-built systems allow a SQL injection to escalate, all do, and that a pepper is therefore useless. Especially given that most systems need some config/secret management anyway, adding a pepper tends to be cheap (in dev effort) and doesn't really cost much at runtime either (normally).
> But what about having a separate auth sub-service with a separate database?
Sure, my point was mainly about classic monolithic apps. If you split your app up into lots of separate micro-services, my point may no longer hold.
> Who says a SQL injection allows you to update your account to be admin?
I wasn't talking about a database superuser account, I was talking about an application-level admin. Usually there is some database table which stores user permissions, and an attacker can do an UPDATE/INSERT on that table to grant an ordinary user account full admin permissions, or even create a brand new user with those permissions. All this can happen within the single ordinary DB account used by the application (given most applications only use one DB account)
> There are many attack scenarios in which storing a secret separate from the DB doesn't get you much at all. Suppose an attacker finds an RCE vulnerability in the application – then they can slurp up the contents of the DB, but they can read the pepper from the configuration too.
If auth is a service accessed only via some remote protocol that allows a minimal number of operations which don't map directly onto database commands, they cannot exploit an RCE in the app to get the auth database.
Such a service can also enforce reasonable rate limits and quotas, something regular direct database access can't.
It's really hard to store pepper in a system meaningfully more secure than the password hashes themselves. For any suggestion of a safer way to store pepper my response is "store the password hashes there instead". Until you get to the point of decrypting password hashes in a TPM I don't see any benefit and at that point you've switched to a hardware solution.
That's not the argument though - the argument is that storing high-security, rarely accessed information (password hashes / encrypted passwords) that is needed for a very limited number of well-defined operations on the same system, within the same security domain, as low-security, often-accessed information with a high number of poorly defined operations is what's responsible for the majority of hashed/encrypted credential leaks anyway.
You are going to have a much harder time getting data out of a service that only speaks HTTPS, with 6 endpoints that can only be fed one type of JSON with only specific fields present per endpoint, rate-limited to rates per second that make sense for those individual endpoints, backed by a database internal to that service, and getting 15 deploys per year, than out of a Gigantic Database Supporting Application That Does Everything For Users and Business and Marketing and Analytics that keeps changing based on weekly sprints.
Yes, and furthermore (db) admin login should preferably be done via a different SQL user, so that a normal-usage SQL injection can never be used to grant anyone admin rights.
But then, just giving the default user admin rights makes many (not-so-good but widely used) deployment tools and methodologies much easier to use, so you see it quite often.
I have seen it a few times: the default DB user being a DB admin because the ad-hoc deployment script written during initial development was never updated, and security didn't matter back during pre-release bootstrapping. ;=(
> Yes, and furthermore (db) admin login should preferably be done via a different SQL user, so that a normal-usage SQL injection can never be used to grant anyone admin rights.
You have to distinguish application admin rights from DB admin rights. Most apps run with an ordinary DB user account, and even with a SQL injection bug in the app, you'd need to find an additional database vulnerability to upgrade the ordinary DB user account to a DB admin account.
However, for most apps, what is really valuable is the data, not the database, and for the data you don't need the DB admin account. The app's own users are often stored in a database table, with some flag (or something more complex like a separate group membership table) used to give an app user admin rights in the application. The point is, those admin rights can potentially unlock maintenance-focused features which could be used to extract the password pepper from the configuration. (Of course, as other commenters have pointed out, this is much harder in a microservices architecture with a user authentication service than in a classic monolithic app.) Giving a user admin access via SQL injection can also have other benefits – while data theft can be done via a SQL injection vulnerability, it can be cumbersome; using REST APIs, file export screens, etc, can potentially make data theft quicker and easier. The cost is increased risk of discovery, since auditing of admin rights may discover some unexpected new user granted admin rights, or some existing user having them unexpectedly – by contrast, data theft via a pure SQL injection with no data changes made would not be detected by an access audit.
I have wondered the same thing. Make it a piece of data that comes in from the environment or a secrets manager. It’s not bulletproof, but if done correctly seems like it makes the whole hashing scheme a little better.
> It's arguably much easier to implement than salt
Salting is easier since it can be completely wrapped up in the key derivation function. The programmer doesn't have to know anything about it, they just use generate_key(passwd) and check_password(passwd, key).
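For example, a sketch with the npm bcrypt package, where the salt never surfaces in application code (it's generated internally and embedded in the hash string):

    import bcrypt from "bcrypt";

    // (top-level await: assumes an ES module context)
    const hash = await bcrypt.hash("hunter2", 12);    // salt generated and stored inside `hash`
    const ok = await bcrypt.compare("hunter2", hash); // re-extracts the salt automatically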
> The original password is never stored thus keeping it a secret even from the website you provided it to
Never stored, hopefully, but the website has a chance of seeing it every time one signs in. A malevolent site or developer or admin could store the password somewhere, or try to reuse it on a number of other well-known sites. Hence one different password per site, to protect against the sites themselves as well.
We're finally seeing challenge-response in the form of WebAuthn, though I'm not sure why it's taken so long to get to where we are now. It's not like challenge-response is a new concept. And you don't need a hardware security key to handle challenge-response. If you don't have that as an option, you could use the password to directly generate the response. Of course the browser has to support this, otherwise you'd be relying on the page's JS which doesn't help with the attacker controlled server scenario.
Agreed. My point here is that there was no reason that we couldn't have had something better than passwords in the time between first using passwords on the web and now.
There was a class of password algorithm that used your password and some other information as a seed for generating an asymmetric cypher, but the original Stanford proposal was apparently a bit light and nobody ever implemented it. I bumped into some new algorithms in the same category, but I'm blanking on the clever name they used to describe them all.
The server would only ever have your 'public key'. I'm not sure where you get the saltiness to prevent lookup tables but still let me log in from three devices with the same credentials.
I think you're referring to PAKE (password-authenticated key exchange). SRP (Secure Remote Password) might be the Stanford proposal you're referring to, although it's actually fairly commonly implemented as far as PAKEs go (Apple for instance uses it for their iCloud Keychain).
The Stanford proposal you're talking about was named pwdhash.
This approach is sometimes called a "Deterministic password manager" and that might be the phrase you wanted.
There are several problems with this approach, especially with regard to the need to sometimes change passwords.
But the point your parent was making is that in the modern era (certainly this century) none of this was necessary, there are asymmetric PAKEs which render it unnecessary for a site to know your password at all in order to confirm that you still know what it is.
The difference between hashing and encrypting that I did not find spelled out clearly enough in the article is that with leaked hashes, the risk of cracking is proportional to the password strength. High-entropy passwords will not get cracked even if they are hashed with MD5, which pretty much represents the worst-case scenario[1]. With encryption it is very much an all-or-nothing scenario: if the key gets cracked or leaked, all of the passwords are revealed at once.
[1] Napkin math: a single Titan RTX does some 65 GH/s of MD5, so an 80-bit password would take on the order of 2^80/65e9 ≈ 1.9e13 seconds ≈ 500e3 gpu-years to crack.
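(Spelling that out: 2^80 ≈ 1.2e24 hashes; at 65e9 hashes/second that is ≈ 1.9e13 seconds, or roughly 5.9e5 years of a single GPU grinding away, so the 500e3 gpu-years figure is the right order of magnitude.)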
If you have a random 80 bit password, yes. But passwords are the farthest thing from random. Even high entropy passwords from a password generator will only use the printable ASCII character set, which reduces the number of possible values of a byte from 256 to 95. An 80 bit password from a conventional random password generator would have an effective entropy of (80/8)*log2(95) ~ 66 bits, which would take about 30 GPU years.
An 80-bit password is not the same thing as a 10-ASCII-character password. The reduced alphabet of only using printable characters is taken into account when figuring the bit strength of a password: an 80-bit password drawn from the 95 printable ASCII characters is roughly 80/log2(95) ≈ 13 characters long.
Mince meat all looks the same to me, so it seems like a poor hashing medium, I would be passing auth 100% of the time, regardless of the cow. (Or dog! Or human!)
You take your letters and substitute them for numbers, A = 1, Z = 26.
Like a checksum, you add all the letters together. The same word produces the same sum each time, but if I had the sum, I wouldn't know for certain what the original word was.
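A toy version of that letter-sum idea (illustrative only; note how easily two words collide, which is exactly what real hash functions are designed to avoid):

    const letterSum = (word: string) =>
      [...word.toLowerCase()].reduce((sum, ch) => sum + (ch.charCodeAt(0) - 96), 0);

    letterSum("cab"); // 3 + 1 + 2 = 6, but "abc" and "bbb" also sum to 6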
I wonder why it isn't best practice to hash with a salt and then encrypt the passwords using something like AES.
The encryption key can be stored in the secrets manager and be injected via environment variables.
Unlike a pepper it is possible to change the encryption key if it is leaked. I don't think changing a pepper for existing hashes is possible, but if they are encrypted you can just reencrypt them with the new key.
So is there an obvious downside I'm not seeing to hash with a salt and then encrypt?
It's possible to change the pepper during login (the same way you'd migrate to adjusted parameters).
The downside for the scheme is complexity and limited upside; complexity gets a lot more attention when it comes to security considerations.
Best practice especially needs to be simple; it's easy to mess this stuff up and hard to understand. A lot of the comments on this post betray a very poor understanding of password storage; they simply haven't come across the correct information.
Overall pepper is good as long as you include salt. There are times when the db gets leaked and the env variables don't.
There's nothing wrong with your scheme if it's implemented properly, but being able to change the site-wide key is a limited upside compared to using a pepper. There is an upside though.
And all of this doesn't matter much as long as you do the bare minimum of using a tuned pbkdf+salt and keep your stuff patched.
I don't see any downsides. As you point out, rotating a hashed pepper isn't as clean. Yes, you can introduce a new pepper and wrap the old hash in another hash using the new pepper, but you'll still have to keep the old pepper around to obtain the old hash during checks, creating risks if the attacker gets access to the old pepper and an old DB backup. With AES you can throw the old key away once all of your DB has migrated and you've updated your key backups.
Just make sure you use a suitable AES mode. If it turns out you've XORed all hashes with the same keystream that depends only on your pepper, it's not really helpful :).
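A sketch of a suitable mode in Node (AES-256-GCM with a fresh IV per record, which avoids exactly that keystream-reuse trap; the key name is a placeholder):

    import { createCipheriv, randomBytes } from "crypto";

    const key = Buffer.from(process.env.PW_ENC_KEY!, "hex"); // 32 bytes, from a secrets manager

    function encryptHash(saltedHash: Buffer) {
      const iv = randomBytes(12); // fresh random IV per record
      const cipher = createCipheriv("aes-256-gcm", key, iv);
      const ciphertext = Buffer.concat([cipher.update(saltedHash), cipher.final()]);
      return { iv, ciphertext, tag: cipher.getAuthTag() }; // store all three
    }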
I can only think of two things, and they're manageable.
One, slightly longer password checks. And two, the temptation to lean on the AES key and set the cost of the hash too low. If the key does get out (we had an entire era where people stole environment variables from servers), then the attacker decrypts the entire password table once and gets cheaper guesses per second.
I guess that is true, but if you store the encryption key in a secret store on Azure or AWS, I wouldn't really worry about that.
Maybe the real question is why use a pepper over encryption?
They have the same downside (as you mentioned) but at least you can change the encryption key without having all the users change passwords. I don't see any advantage in using a pepper over encryption, except maybe implementation complexity.
Pepper, encryption, they both guard against drive-bys, which can happen for sure (missing backups anyone?) but they're not the only thing that can happen.
Like any disaster preparedness situation, make sure your strategy doesn't count on an asset you've already listed as unavailable.
So I tried to show this article to my mom and she had no idea what it was saying. When you introduce terms (and you are targeting non-"computer" people) you have to clearly define them right away. For instance she was really confused that hashing was used as both a verb and a noun.
It may also be unclear what "reversing" means here. You have to remember that most non-technical people don't think in the programmer style "input-output" way.
Say "reversing a hash" and the may think of reversing the string like abc->cba.
Often you cannot do better in explaining it without doing a mini course. And people would have to sit down, concentrate, and work through the new concepts, and that's tiring, looks like school, "I was never good at math", and you've lost everyone. You can't make the average person understand hashing, mainly because they don't care and will glaze over. You can dress it up with engaging stories and metaphors, and they will remember the engaging story but not actually build the right mental model if you actually poke at their understanding.
If your email is compromised, the attacker can request a password reset, then log in that way.
Why isn't it common to use "login emails" with one-time-token login buttons in the email?
The only downside I see is that it is harder to log in on a machine where you do not want to access your emails; for example like this:
On the machine with no email access the site would have to show a qr code, and a phone where you are already logged in would have to scan it, to approve the login.
Is this any less secure, or is it just not feasible for average users to understand?
I asked this exact question some years ago. I believe the answer has many parts:
Effective security is tailored to the situation: the app, the users, and the adversaries.
Login emails are slow and require multiple steps. The user must enter their email into the website, switch to their email client, wait for the email, open it, and click the link. If the user's email goes down, they cannot log in to the website. Some folks get distracted while waiting for the email. Sometimes a login email will get marked as spam. Sometimes a user permanently loses access to their email account. These things reduce user engagement and increase support costs, and the company makes less money.
There are many ways for attackers to obtain email: DNS, BGP, email server, email server backups, user device backups, malware, re-used passwords, and phishing.
There are only a handful of ways for an attacker to obtain passwords: malware, password re-use, and phishing. Notice that this is a subset of the attacks on email. And passwords have special mitigations. Some devices have password managers that resist malware. Security teams have various options for reducing password re-use.
Good authentication depends on something you know (password) and something you have. Password managers turn something you know (password) into something you have (device). Criminals rarely steal devices to gain access to online accounts. So in practice, something you have is just as good as something you know.
I get the part where you list the possible inconveniences of email.
What I don't get is how having a smaller attack surface for losing a password, and using a secure password manager, mitigates the attack vector of gaining access to my account via a password reset through my compromised email.
If I understand correctly you're saying that email is more vulnerable, so this only strengthens my point, I guess?
Password reset is a rare event, so it has extra mitigations:
1. Extra security checks. For example, if you buy a new laptop and use a coffee shop's wifi to try to reset your bank password, they will lock your account. You will have to call and talk to a person and give extra personal information to get it unlocked.
2. Notify the user about the password reset. Use email, text, phone call, and postal mail.
3. Automatically lock the account if suspicious activity occurs in a time period after password reset. Examples:
- Orders over $100 shipped to new addresses.
- Risky transactions: buying gift cards, buying plane tickets in foreign countries, changing the delivery address of a shipment.
- Using a known device (with cookies/fingerprint) and the old password.
4. Require extra confirmation for transactions. Examples: re-enter credit card numbers, security codes, and personal id numbers (SSN in USA).
5. Preserve user data so it can be restored to the state before the password reset.
These mitigations work well enough for protecting accounts from fraudulent password resets.
> If your email is compromised, the attacker can request a password reset, then log in that way.
Doesn't this answer your own question? (ideally email password reset should also require MFA)
As another, non-security point, occasionally it can take a good few minutes for emails to arrive in your inbox. Though to be fair, that's sometimes the case for SMS too.
Many comments complain that Troy explains technical stuff like encryption vs. hashing that users don't care about or understand. However, don't forget that many readers of his blog are technologically literate and can appreciate the nuances of this content. In fact, I find this particular post quite easy to understand for any beginner who wants to learn about password security.
The distinction between encryption and hashing can never get too much education, both for the end users and the more technical developers/sysadmins.
If we're being extra pedantic, hashing is just using some function that maps inputs to a set of values and is not necessarily hard to reverse, so he should've used the term cryptographic hash.
But you're talking about cryptographic hashes which are by design difficult/impossible to reverse. Their unidirectional nature is what makes them cryptographic hashes instead of just plain hashes.
> Saying that passwords are “encrypted” over and over again doesn’t make it so. They’re bcrypt hashes so good job there, but the fact they’re suggesting everyone changes their password illustrates that even good hashing has its risks.
This is correct, but I am going out on a limb and guessing that legal counsel had something to do with the wording here (which is perplexing because I tend to expect legal definitions of terms to be more specific).
I was at an organization that also had a data breach, and legal counsel advised us to write a similar email when disclosing publicly that the breach occurred. I was personally on multiple phone calls with legal counsel about this, and it was quite frustrating to try and explain the difference between encryption and hashing, or stay on point with the fact that our passwords were not encrypted, trying to get people to stop using that word on phone calls. Early on, they'd ask questions like, "But aren't your passwords encrypted?!" And you'd have to explain, no, they're not, they're hashed, which is most likely better than encrypted (although I'm open to being proven wrong on that).
They were, also, mostly useless on explaining what their perspective of encryption was. I never got an explanation from counsel, and at best, I was linked to a blog post that suggested some security best practices (not a legal definition of anything we were liable for).
The sad thing is, with a data breach like that, you probably do (and should) feel terrible for your customers, anyone who trusted you with their emails, passwords, etc. But the laws surrounding it are confusing enough to make it easy for some people to push this out of their mind and just focus on, "What is the best thing we can do to legally cover our asses?" Even if that means saying factually incorrect or misleading things like, "Your passwords were encrypted."
His explanation is STILL too technical. Here is his explanation:
A password hash is a representation of your password that can't be reversed, but the original password may still be determined if someone hashes it again and gets the same result.
Compare to:
A hash is like a fingerprint of your password. Your fingerprint tells me very little about you. You might be a man or a woman, tall or short, young or old. But if you show up, I can tell that you still have the same fingerprint. Likewise a hash doesn't tell me your password, but if my computer keeps guessing them, it can know when it is right. And my computer guesses really, really fast.
For the record I just asked my teenage son who doesn't know what a hash is whether he understood either explanation. He understood mine, but not Troy's. And he's a smart kid. If he doesn't understand an explanation, I'm pretty sure most adults won't either.
And as far as operational security is concerned, this fixates on just the one threat that the author talks about at the end, which depends on a weak password.
Wonder which one your teenage son likes better? Of course the answer can be skewed one way or the other knowing which one was written by his parent :-)
I don't know which he likes better, but I dislike that one because it is misleading. The phrase "scrambled" suggests that you might be able to unscramble it fairly trivially.
I occasionally help new programmers figure out the basics of coding online, and not understanding the difference between hashing and encryption is one of the biggest points of confusion I see. They're fundamentally different things, and knowing the difference is something you should learn sometime after for loops and sometime before trees and graphs. This is something the general public is very much confused about, and if, as a professional, you get this wrong, it very much makes you look bad. I don't want to hear my doctor confusing the pancreas for the liver, and I don't want my IT professionals calling SHA256 an encryption algorithm. Understanding the difference is important. I'm not sure why the audience at Hacker News finds this so academic or controversial.
Someday, when I grow up, I want to write like Troy Hunt.
There was nothing new for me in that article, and yet I enjoyed reading it.
There's a special place in my heart for people who make security things readable.
A bit of a tangent, but even if hashing was hypothetically equivalent to encryption, wouldn't it still be good practice for organizations to encourage users to change their passwords after a data breach?
A good hashing algorithm today could be useless down the road if computation becomes orders of magnitudes faster, or RSA becomes trivial to reverse, etc. Similar to how MD5 was initially designed as a cryptographic hash function, but isn't considered one today.
I think the idea "Well my password was encrypted, so why should I have to change it?" seems a bit silly.
I want to log in everywhere with my iPhone. My iPhone has my FaceID stored locally, so in order to log in somewhere with my account, someone has to be in possession of my iPhone first. That's the first factor.
Secondly they have to be me, or at least somehow fake my face so that my iPhone can match my FaceID. That's the second factor.
That is already a super convenient two-factor authentication scheme. Why can't websites send a request which invokes my iPhone's two-factor auth mechanism to log in?
Passwords AND password managers should be made redundant.
Your face is only authenticating you to your device because that's what you chose. If you don't want that (e.g. your identical twin sister loves pranking you) you can just use a different authenticator. The remote web site deliberately has no idea your face was involved, it just knows your identity was verified on its behalf by the hardware storing your private key.
I can have a near-infinite number of passwords, but I only have 1 face, or 10 fingers. When all of my fingerprints are compromised, and the system only allows fingerprint login, what do I do then?
Troy doesn't understand users. I know how this stuff works and the blog post had me snoozing.
Users don't have room in their brain for both "encrypted" and "hashed". All they care about is whether they are secure, and whether some pro tells them the vendor is lying.
Why should they be expected to know?
They are receiving an email from a professional IT specialist who couldn't even figure out how this stuff works.
It's the vendor's job to get it right, and the lawyers' and activists' role to hold them accountable.
Question: is it good to hash passwords as password + email (so changing the email requires input of the password) + a site-wide long random string (the same string for every user)? Thoughts?
You'll be better off with a per-user random salt than the <email>+<site_const> scheme. If you later want to include/exclude a re-auth flow to your email change process then you retain that flexibility (though as an end-user I dislike such additional barriers -- I suppose they thwart cookie stuffing and other kinds of vulnerabilities if your service is broken in that way, but I still don't like it).
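A minimal sketch of the per-user-salt alternative (scrypt here is just one reasonable KDF choice):

    import { randomBytes, scryptSync } from "crypto";

    const password = "correct horse battery staple";
    const salt = randomBytes(16);                // random, per user, stored with the hash
    const hash = scryptSync(password, salt, 32); // nothing derived from the email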
Maybe still with a salt? Otherwise I'd think it would still be reasonably crackable, because both things can (could) be known.
The only thing that bothers me here (not sure why) is that if the user changes their email you get a password change as well, and that might signal an account takeover if your system analyzes behavior like that. Could be wrong on my gut feeling here.
Unfortunately, email is not a unique salt in the context of the internet. If the owner of the email uses the same password elsewhere (very likely), it's probably already in a rainbow table.
That is nonsense. You might want to check again what a rainbow table is and how it's used, because it's not what you think ^^
It's possible that another service uses the same hashing method, with the email as salt, and the same password. It's irrelevant at best when it comes to security. If the password were cracked it would make its way into dictionaries; the password is compromised regardless of the salt and the hashing method.
The purpose of (properly) salting is to prevent against rainbow table attacks when your users table with password hashes is leaked, right?
That's why OWASP advises that salts be at least 16 characters long. This requires an attacker to generate a (currently) infeasible number of rainbow tables.
However, if every web developer out there followed your advice, then the number of rainbow tables ever needed is reduced to the number of emails; assuming everyone uses the same hashing function. Sure there are a lot of email addresses, but maybe the attacker only wants to target users at @bigco.com. So now only 100 rainbow tables are required to cover the C-levels and VPs there.
If you know systems are doing SHA(password), you could generate a database of precomputed hashes of all passwords up to 10 characters. That's called a rainbow table. It can take a year to generate one and a terabyte to store it. The benefit of the table is to make lookups really fast.
You could still try to brute force up to 10 characters on the fly, or try a dictionary of known passwords. Having a precomputed table is much faster though.
Systems have to use a salt per user, like HASH(username + password), because user+password won't be found in the table. This effectively breaks the usage of rainbow tables. The salt only needs to be unique per user account; an email is good enough.
It's nonsense when you say to generate a table per user. It could take a freaking year to generate a table! That's not feasible.
That being said, for something like Linux or Windows accounts, if they were using something like HASH(username + password), it could be worthwhile to make a table because the username is always "root", so that use case should use something other than the username as salt. ^^
And it takes a week for 9 characters and a few hours for 8. The point is that with a random salt the amount of work that needs to be done is proportional to the number of accounts you want to break across all services, not just one. It isn't a ton of additional security (especially if an email account is compromised...why is everything so terrible), but it is strictly better and doesn't have failure modes like the same username being used in a billion devices.
He's right about one thing: hashing is mostly for data integrity and encryption for confidentiality. So if passwords are leaked and the organization says "don't worry, fortunately your password is encrypted, but it's better to change it just in case" when in reality they hashed it, they are just misleading their customers and not even trying to improve the password security they have in place.
Here is my take on explaining password hashing for a non-technical audience:
> Making a password hash gives you a scrambled text that nobody can turn back into your password but that anybody can use to check if a password guess is your real password.
This uses the folk understanding of entropy to be more memorable in a non-misleading way. You can't un-scramble an egg hash.
> They’re bcrypt hashes so good job there, but the fact they’re suggesting everyone changes their password illustrates that even good hashing has its risks.
It's good that they suggested changing passwords, though. I have a feeling that if you call them out they're going to go back to "the passwords are secure because they were encrypted".
What are people using to store API tokens (e.g. HMAC-based secrets)? My understanding is that it is symmetric, so you need to store it in a recoverable way -- which always means you can leak it. What are better ways?
My non-crypto intuition on hashes is that the key idea is that there should only be one hash that matches my password (no collisions, I think, is the word?).
So naively I understand that a non-salted hash for "password" is a terrible idea, since two people's hashes for "password" will be the same...
But I still struggle with the practical safety this gives in DB breaches where the salt is in the breach...
Troy seems to say that this makes the cracking process slower, which I think I see...
But is it fair to think that the KEY ISSUE is the uniqueness of the hashing function for any given input string?
They are just trying to avoid a quick lookup in a pre-computed table. Adding a salt means they have to start from scratch.
Uniqueness is not really a factor, reversibility is. Given a hash will be fixed-length, and passwords can be an arbitrary length, you get a very large number of passwords mapping to each hash. But the key is to make them hard to find.
So for the server it's good to have a strong (which often means slow to compute) and non-broken scheme. Something like MD5 is just too fast to compute; if someone is targeting you in particular (and not the entire breach) you might have a bad time anyway. Some schemes have a work factor, meaning the computer basically repeats calculations a lot to waste time, which a cracker would also have to do. This factor can be updated to keep up with computing power over time.
On the client the best you can do is not reuse passwords, make them long, and try not to overlap with any known wordlists (dictionaries, past breaches, etc).
Not having a salt means that hashes can be precomputed for common passwords and the exercise of cracking passwords is a simple text search in your precomputed table.
Having a salt means that the universe of password:hash correlations is different for each password entry.
Thanks, yeah, that's what I mean... but I see that the core insight is that using salted hashes exponentially decreases the ease of finding and storing "hashed passwords" for a lookup table.
So am I still in the right mindset to think that the "uniqueness" of every hash / lack of collisions is still a valid concern?
I.e. that the hash function is really unlikely to produce the same hash twice...
Seems like a rant against hashing, but using bcrypt at level 10 (like he demonstrates) is just so much better than encrypting. Level 10 takes about a full second on hardware from a couple years ago, which was the last time I checked. Yes, you can verify that one of them is "iloveyou", when you already know that, but any kind of dictionary attack, at one per second, is not going to be a good time. And like he said, if you encrypt, and lose the key too, game over.
I took that to be a sarcastic comment on how many sites will see that password and say "wow, strong password!" when it's actually one of the most crackable passwords available.
This is a good password because it has lowercase, uppercase, numeric and non-alphanumeric values plus is 8 characters long. Yet somehow, your human brain looked at it and decided "no, it's not a good password" because what you're seeing is merely character substitution. The hackers have worked this out too which is why arbitrary composition rules on websites are useless.
Either you stopped reading 1 sentence too early, or you are posting in bad faith to try and smear Troy.
Yeah I think that whole paragraph needs to be rewritten; one could also come away thinking that substituting characters in a word with symbols is also a "good" password.
The issue here isn't so much the password strength, but the algorithm used to hash the password. A good password hashing algorithm like scrypt takes a long time to make a single guess, even if the input is simple, and since every single password has its own salt, it has to be re-calculated for every password, even if all of them are 'P@ssword'.
This is true, except for scrypt, which is very tough to make ASICs for. As for bcrypt, it's true that ASICs can go very fast, but you ultimately have a massive advantage as the defender here. An attacker needs to try billions of combinations per hash, but you can simply spend a whole second of CPU time per hash (scaled to your current load, roughly) if you want -- that's not a lot for a user, but for an attacker, several billion guesses at even half a second each makes cracking very, very hard.
You aren't technically wrong (the algorithm can add some number of bits of effective security), but that's still horribly misleading. "P@ssw0rd" is emphatically not a good password, and "because it has lowercase, uppercase, numeric and non-alphanumeric values" does not (significantly) improve a password. See https://www.explainxkcd.com/wiki/index.php/936 for a specific example. (But note that you need eight or more words if the attacker has password hashes that they can attack offline.) Salting adds 0-2 words worth of security depending on how many users can (no longer) be attacked with each hash invocation and a good (slow and optimization-resistant) algorithm adds 1-2 words depending on how slow you're willing to make login.
Edit: I think TFA might be being facetious in the claim "This is a good password because it has lowercase, uppercase, numeric and non-alphanumeric values plus is 8 characters long.", although it doesn't communicate that very clearly.
The immediate answer is that Microsoft built the systems being discussed here last century, and at the time this wasn't highlighted. Both the LANMAN and NT schemes are from the era when Microsoft was just getting into things like remote authentication that we now take for granted as a baseline.
Even though it's probably true this is an unsatisfying answer because the Unix password hash (at least a decade older) is stronger, using 12-bit salt and a deliberately pessimized hash algorithm.
A further (and longer lived) excuse for Microsoft is that they strongly favour backwards compatibility which has seen their company be very successful over a long period. Having accepted this poor scheme in the 1990s it was hard to give it up and annoy millions of users.
One of the grave mistakes Microsoft made a little later is that they began relying upon these password hashes as secrets themselves, and that's where it starts to get interesting, because that's a sign that key people at Microsoft didn't know that they didn't know what they were doing. That is, not only did Microsoft not have people who were competent to do this properly, they didn't even have people who knew they needed those people.
User PC gets owned, attacker elevates to admin, now has access to IT Support hashes cached locally which can be sent to other devices directly to authenticate.
Troy is sarcastically explaining that this is a "good password" according to older password complexity rules, but a horrible password given predictable password substitution logic. Password complexity rules accomplish little.
By seed you mean salt. Salts are stored in plaintext, so they don't increase the entropy of the password. Instead they make it so that each password hashes uniquely so that everyone with the same password gets different hashes. They also mitigate rainbow tables by effectively requiring the attacker to create a rainbow table per target
I'm not sure what you mean by a seed. If you mean a salt, that's no more secret than the hash. It has the effect of requiring you to crack each hash separately, but doesn't make it any harder to crack an individual hash.
Terrible crypto design is a recurring feature of Microsoft products too, e.g. the original LANMAN password hash has a maximum password length of 14 characters and just treats all passwords as two separate 7 byte values to verify.
If we use MD5(password) then brute force of a 14 character random password (say using a 64 symbol character set) isn't really viable because you need to try about 2^84 values. MD5 isn't an acceptable password hash, but using a strong random password was enough to save us.
But if we use LMhash(password) that same password can be brute forced in just 2^43 operations. This design is so bad that there's no way for the user to protect themselves.
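Rough arithmetic behind those figures: the full 14-character password over a 64-symbol set gives 64^14 = (2^6)^14 = 2^84 guesses, but LANMAN uppercases the password and splits it into two independent 7-byte halves, each over roughly 69 symbols: 69^7 ≈ 2^43 guesses per half, so cracking both halves costs only about 2 × 2^43 = 2^44, vastly less than 2^84.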
This popularized Rainbow Tables. You may use a time-space tradeoff for any hashing scheme that doesn't have salt, or doesn't have enough salt compared to how much it's used, but for most schemes a strong random password overwhelms the power of the time-space tradeoff anyway. Rainbow Tables tilt this tradeoff so that it's more favourable in exchange for a modest reduction in effectiveness† and for LANMAN they're enough to turn it from "If you had a super computer and hundreds of gigabytes of disk space you could crack any Windows password hash near instantly" to "If you have a mid-range PC with 5GB of hard disk space and download this software you can crack 99.9% of Windows password hashes in a few minutes".
†For a lot of pre-computation effort you can get all the effectiveness back. But that's not important here.
That was a fun one. There were lots of lessons from that that I used in my security awareness training. The bottom line was that all this stuff was in an essentially-unencrypted backup file.
There were other lessons too, but that was the main one.
I don't know why you are base64 encoding things before sending them to the DB. Just use a binary blob.
scrypt is a hashing algorithm, not an encryption algorithm. So that doesn't seem like what you want.
LZ-String is a compression algorithm, that's trivially reversible.
The problem you'll run into is that the "right" encryption algorithm is constantly changing. Probably the best thing to do would be to not do it yourself; instead, rely on the encryption capabilities of your DB if provided. If you opt to do your own encryption, then be prepared to constantly maintain and update it. Pushing that responsibility onto the DB reduces the burden to a simple "make sure your DB is up to date", which you should be doing anyway.
If you use scrypt, you won't be able to get the file back out ever again.
If you use lzma, it's compressed, not encrypted, and anyone who gets the compressed data can trivially decompress it.
Consider using an nacl secretbox if you need to encrypt it.
Consider very carefully where you'll store the password.
Does the user input it each time they access the file (in which case use a key-derivation-function on their password)? Do you encrypt it with an application key? Per-user key stored in vault? Do you have a hardware encryption tool available (an HSM) or something like aws KMS?
Your encryption is only as secure as you keep the encryption keys.
Yes, I did read the article. It's not a password, it's a PDF file; I store its contents as plain base64 now, but I'd rather store them in a more secret fashion in case of a leak...
For the encryption itself, you can use e.g. LibSodium or AES (in a sensible mode!!!). Of course, you'll need a key too, which is where you can use a KDF. But without knowing more about your threat model, it is hard to tell you how to do the key derivation.
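A sketch of that flow with Node's built-ins (passphrase-derived key via scrypt, then AES-256-GCM; the file path and key name are placeholders, and the threat-model caveat above still applies):

    import { createCipheriv, randomBytes, scryptSync } from "crypto";
    import { readFileSync } from "fs";

    const pdfBytes = readFileSync("doc.pdf");
    const passphrase = process.env.FILE_KEY!; // from a secrets manager, not the DB

    const salt = randomBytes(16);
    const key = scryptSync(passphrase, salt, 32); // the KDF step
    const iv = randomBytes(12);
    const cipher = createCipheriv("aes-256-gcm", key, iv);
    const encrypted = Buffer.concat([cipher.update(pdfBytes), cipher.final()]);
    const tag = cipher.getAuthTag();
    // Store salt, iv, tag and encrypted; keep the passphrase elsewhere.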
Why is the auth database a part of the application to begin with? Why is it not externalized behind a single service, with bare-minimum "set-password / is-this-my-password / trigger-x" endpoints exposed to the application and per-route / per-source / per-question rate limiting? This is 2020, not 2001.
Because while service oriented architecture is currently "hot" it's not a requirement for many systems and monolith is still king in most corporate environments.
It's not just because the decade counter changed that everyone must rewrite their systems using the latest fad.
If you have a single database, located on a single server you have a single security domain. No amount of hand waving is going to create a magic security boundary that will not be crossed by the most banal part of the application.
One thing that always annoys me when I read a Troy Hunt blog post, or when he tweets something, is how stubborn he is and how unable to admit that someone else said something clever or that he did something wrong.
The comments on this blog post have some really, really good suggestions for how to explain a hash much more simply, such that even children would understand:
> A hash is like a fingerprint of your password. Your fingerprint tells me very little about you. You might be a man or a woman, tall or short, young or old. But if you show up, I can tell that you still have the same fingerprint. Likewise a hash doesn't tell me your password, but if my computer keeps guessing them, it can know when it is right. And my computer guesses really, really fast.
Troy completely dismisses this because it's not 100% scientifically correct in the analogy. IMHO it's good enough for the layman to understand though which is the point of his blog post no?
It strikes me that he's unable to admit that someone might have done something slightly better than him, or just give credit where credit due.
Reminds me of so many other blog posts where this happened as well, and his infamous claim that HTTPS is faster than HTTP: when people called him out on it he argued that HTTP2 is faster than HTTP1, going in circles...
Arrogance annoys me so much that I actually dislike reading his posts...
The author thinks passwords are not encrypted on Wattpad because they ask you to change passwords on other sites where you have used the same password. The reason is more likely that Wattpad uses a common encryption algorithm. The hackers' chance of gaining access to your other accounts using the same password is now much higher, especially if you use a relatively easy-to-guess string. Using a fast computer, they can run through common encryption algorithms with each guess of your password at their leisure... Once they have a match, they can log in to your other sites.
Also, hashing is nothing like encryption. To clarify: hashing is used to map very large data types to far smaller ones of just a few dozen (or few thousand) characters, for the sake of speedy lookups. Hashing algorithms are not one-way; you can determine the possible input values used to generate the hash. In the case of (relatively speaking) very small input data types like passwords, there is likely a 1:1 ratio of input password to hash, making it effectively a plain-text password. So why would anyone use it? What evidence is there to suggest Wattpad would use an obviously ineffective method for encrypting passwords?
So the statement "a password hash is a representation of your password that can't be reversed", and other statements about the security of hashing, are just incorrect - for passwords. I agree that hashing is statistically one-way for large inputs like big strings or images, since then the set of possible inputs that map to a particular hash is very, very large. It is not so for small ones, like passwords, where the input space is smaller than or similar in size to the hash range.
You're confusing run-of-the-mill hash functions used e.g. for hash maps with cryptographic hash functions which have an entirely different purpose and therefore different characteristics (and conversely, using SHA256 to hash your dictionary keys would be totally overkill).
Cryptographic hash functions are one-way in the sense that (unless they are shown to be broken) it is not believed that there is a computationally tractable way of recovering the unhashed input from the hash. You can always brute force it, but even "just" for passwords, the input space is way too big for that, especially if you use a salt - unless you use a very common password, of course (even if limited to 26 letters and 8 characters that's over 200 billion different passwords).
So, most of your comment is just wrong, except for the part about how Wattpad stores passwords, which is just very likely wrong, since I can't prove it.
> (even if limited to 26 letters and 8 characters that's over 200 billion different passwords)
Of course, if you're using a fast cryptographic hash function then someone with a good GPU can throw a surprising amount of brute force at the problem. A Tesla M60, which is five years old, can do about 1.4 billion SHA-256 hashes per second; Amazon will rent you a server with one of these for $0.75/hour, and cracking a single 8-character lowercase alphabetic password should take about 2.5 minutes at most. Half that on average.
(Functions like bcrypt help with this by being designed for slowness, and newer ones like scrypt also try to use large amounts of memory with unpredictable access patterns so they're harder to accelerate with GPU/FPGA/ASIC hardware.)
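(Sanity check on that figure: 26^8 ≈ 2.1e11 candidate passwords; at 1.4e9 hashes/second that is ≈ 150 seconds, i.e. about 2.5 minutes to exhaust the space.)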
This is all true, and this is why we have bcrypt etc., but it was just an example for illustration.
In real-life situations, the input space is much larger still (if you account for longer passwords, case sensitivity, special characters etc.), as long as your passwords are truly unguessable. And at some point, even a very fast hashing algorithm won't be able to keep up. I don't know the exact calculations, but I would expect MD5 to still be hard to break for truly strong passwords (say, 20 random characters). The problem is more that people don't actually choose strong passwords most of the time.
Because once you salt it (use a different salt for each hash, which solves your collision issue) and purposely slow it down (through multiple rounds, etc.), it effectively ruins precomputing the hashes because of how many permutations there are.
Of course this depends on which hashing algorithm you use. Md5? You can do millions a second. Bcrypt? You can make one hash take 100ms; that's 10 a second. They also use a random salt for each hash, so precomputed hashes will only work for that hash.
The issues you highlighted are very real for md5, but there’s a reason md5 is not recommended for passwords.
They care (sometimes, sometimes they don't act like it, see https://www.ieee-security.org/TC/SPW2020/ConPro/papers/bhaga... ) that it was compromised.
However, for the websites managing passwords, I'd suggest reviewing the NIST guidelines: https://nvlpubs.nist.gov/nistpubs/SpecialPublications/NIST.S...