I saw a very interesting talk last year from someone who, as part of a company's security team, had set up a system that continually attacked the hashes of every employee's Active Directory passwords. If one was cracked, the employee would receive an automated email with a note containing the last few characters of their password and a suggestion to change it.
I recall they also spoke on some security aspects of the system's design, like how the cracked passwords never touched disk and had to be destroyed as soon as possible, etc.
I wish I could find a recording or a writeup on this somewhere, as I thought it was a pretty cool (and effective) approach.
I used to work at a University in the UK. One of my responsibilities was the email system. We constantly suffered targeted phishing attacks where the sender pretended to be from the IT department and required the recipient to respond with their password, for various made up reasons. Our spam filters captured most of these on the way in, but some still got through. And people replied. People replied all the time. Students and staff. Even after being constantly informed about the problem and told to not reply to such emails. When they replied, the attacker would log in and start sending out spam.
Anyway, the format of our passwords was quite strict. I don't remember the exact rules, but it required "special" characters and lower/upper case letters and numbers and a minimum length etc. So what I did was write a system to scan all outgoing email. It would search that email for all strings which matched our password pattern. It would then attempt to authenticate against the Active Directory with each of those strings. If any succeeded, it would block the email, and the person sending the email would get a response telling them to not send their password via email. I would also be Cc'd in, so I could keep an eye on it.
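A minimal sketch of the outgoing-mail check described above, not the original system. The password policy (8+ characters with lower case, upper case, digit, and special character) is a stand-in because the real rules aren't given, and `try_auth` is a hypothetical callback standing in for whatever bind/login call the directory offers.

```python
import re
from typing import Callable, List

# Assumed policy: any run of 8+ non-space characters is a candidate token.
CANDIDATE = re.compile(r"\S{8,}")
TRAILING_PUNCTUATION = ".,;:!?)\"'"


def looks_like_password(token: str) -> bool:
    """True if the token satisfies the assumed complexity rules."""
    return (
        any(c.islower() for c in token)
        and any(c.isupper() for c in token)
        and any(c.isdigit() for c in token)
        and any(not c.isalnum() for c in token)
    )


def candidate_passwords(body: str) -> List[str]:
    """Collect every distinct string in the mail body that matches the policy."""
    found: List[str] = []
    for token in CANDIDATE.findall(body):
        # Also try the token without sentence punctuation stuck to its end.
        for variant in (token, token.rstrip(TRAILING_PUNCTUATION)):
            if looks_like_password(variant) and variant not in found:
                found.append(variant)
    return found


def should_block(sender: str, body: str,
                 try_auth: Callable[[str, str], bool]) -> bool:
    """Block the mail if any candidate authenticates as the sender."""
    return any(try_auth(sender, candidate) for candidate in candidate_passwords(body))
```

Each candidate costs one authentication attempt, so a real deployment would presumably need to keep this from tripping the sender's own lockout counter.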
I was reading about something similar for CI servers: compare stdout/stderr against the values of all initialized secrets and, if any strings match, filter them out. It's a simple way to block "echo $SECRET_STUFF" from publishing to a public log. It doesn't catch everything, as it'd still be possible to curl out to transmit information, but it works quite well for the more common case of "Let me dump all environment vars to debug why this isn't working...".
Did you consider regularly sending phishing emails yourself, and automatically calling out anyone who replies to them at all? I mean, you won't catch quite as many people, but eventually most will learn, one would think.
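Roughly, that log filter could look like the sketch below. The function name and the `[secure]` placeholder are made up for illustration, and this is not how any particular CI product does it.

```python
from typing import Iterable


def mask_secrets(line: str, secrets: Iterable[str],
                 placeholder: str = "[secure]") -> str:
    """Replace every occurrence of a known secret value in a log line."""
    # Longest first, so a secret that contains another one is masked whole.
    for secret in sorted(set(secrets), key=len, reverse=True):
        if secret:  # an empty value would otherwise match everywhere
            line = line.replace(secret, placeholder)
    return line
```

As noted, this only catches exact matches: a secret that is base64-encoded, split across lines, or sent out over the network goes straight past it.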
Yes. Second solution is much better if you can do it: faster, more efficient, more accurate, but it depends on using an authentication mechanism which gives the server access to the plain password during authentication, so it will not work in all situations.
What if I send an email with a partial dictionary, hidden in an invisible HTML block. If the user is blocked from replying, I know their password was in my original email.
While I can envision a case where you might be able to make this work for specific configurations of the blocking, bounce handling, and from-address forging, and while it would let you use the resources of the remote org to do most of the checking, it would also require fairly intimate knowledge of quite a few configuration variables. So many, in fact, that you would probably already have an administrator password of some sort if you were able to carry it out. :)
Story time...please excuse the tangent, related to the above comment.
In the mid 1990s, I was the computer security officer for the 81st Medical Group in the USAF, which is the proper name for a rather large DOD hospital in southern Mississippi.
Though it was 22 years ago, the hospital was almost completely paperless. Every member of the staff, from doctors to orderlies, used one of the 10,000 or so VT320 terminals spread across a dozen or so buildings on the campus and beyond. Needless to say, on an average day, a person would enter their userid and password many times. Many of those accounts were very powerful, because we were networked with the rest of the DOD's medical records. One example report I ran with a doctor's account credentials was 'List everyone in the DOD, past or present, who is or was HIV positive.' ('Was' because the person could be dead.)
Furthermore, this entire system was reachable via the Internet, via AFIN (Air Force Information Network).
This probably strikes anyone reading this as...kind of nuts. And by today's standards it certainly is. But 22 years ago, most people weren't thinking in those terms.
The implications did freak me out a bit, once I took the job, and though I didn't have the power to do much about it structurally, I could do some things to improve password security.
So I had a dedicated (dating myself here) Pentium Pro Linux server that did nothing but run password attacks on our entire authentication database. On top of that was some automation I wrote that, once an account's password was guessed, would send automated e-mails, daily, to the account holder and their manager.
If the password wasn't fixed in a week, then their account would be automatically expired, forcing them to pick a new password.
The system didn't stop them from picking the same one as before, which people frequently did, but the automation was smart enough to expire their password again the next day without the grace period if that was done, which was annoying enough to get people to stop that practice.
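The expiry rule in that story is simple enough to write down. A rough sketch follows, with names of my own choosing; it says nothing about how the original automation was actually built.

```python
from datetime import date, timedelta

GRACE_PERIOD = timedelta(days=7)  # "a week" to fix a guessed password
REUSE_PERIOD = timedelta(days=1)  # "the next day" if the cracked password is re-picked


def expiry_date(cracked_on: date, reused_cracked_password: bool) -> date:
    """Date on which the account's password gets force-expired."""
    delay = REUSE_PERIOD if reused_cracked_password else GRACE_PERIOD
    return cracked_on + delay


def should_expire(today: date, cracked_on: date,
                  reused_cracked_password: bool) -> bool:
    """True once the applicable delay has elapsed since the password was guessed."""
    return today >= expiry_date(cracked_on, reused_cracked_password)
```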
This was rather...unpopular...among the staff. But I had that little 'HIV Positive Report' presentation I mentioned before. I said the account I ran that report from was behind the password '1234', and that anyone in the world could have logged in, run the report, and published the results. The thought of that spooked even the most technically and security clueless medical types.
Scare tactics? Yup. But sometimes scare tactics are justified.
Last time I submitted something I thought was really interesting here (https://news.ycombinator.com/item?id=12652035) I don't recall it getting anywhere. But maybe the 10th anniversary of Hacker News really wasn't that interesting. (:
Many of those accounts were very powerful, because we were networked with the rest of the DOD's medical records. One example report I ran with a doctor's account credentials was 'List everyone in the DOD, past or present, who is or was HIV positive.'
Uh, um, wow. That's a pretty serious abuse of privilege you're admitting to, even by 1990s standards. I'd lawyer up if I were you. Shit-storm incoming...
The sysadmin in the CS computer labs at my uni (Leeds) was doing this in 2002.
Changing cracked passwords was compulsory, you got an email that just said (paraphrasing) "Your password is crap" and then were forced to change it at next login.
It was quite amusing, and instructive for the first year students. I seem to remember it happening to maybe 15 to 20% of people when I started, even though everyone was warned repeatedly. And this was with hardware and hashes from 15 years ago. Many (most?!) websites were still storing passwords in plain-text and didn't use TLS for log-in forms in those days!
That's a cool idea, but wouldn't it be more efficient to use something like zxcvbn to estimate the strength of new passwords and reject weak ones? That way you're not wasting electricity running a GPU array at full tilt 24/7.
It might not catch the kinds of things that seem strong but end up on word lists. `correctbatteryhorsestaple`, and even more so `correctbatteryhorsestaple1` or `correctbatteryhorsestaple!` would probably pass a "strength" test with flying colors, but you bet it would get cracked in a moment by any script kiddie with a word list.
I remember seeing a list of cracked passwords and one of the ones they got was !QAZ2wsx#EDC4rfv%TGB6yhn. It passes every single password strength checker and dictionary word checker in the world, and still gets cracked.
zxcvbn actually correctly identifies that password as being the result of the user hitting multiple adjacent keys on a standard QWERTY keyboard. Still passes the strength check though because it doesn't identify the more complex pattern of moving top to bottom, then left to right across the keyboard, alternating shift with every column.
This also shows that measurements of entropy are always relative to a particular party's knowledge, which is an interesting concept. (When we say that we're measuring the uncertainty that an attacker -- or message recipient -- has, that naturally depends on what the attacker or message recipient knows.)
Usually in entropy estimates you assume the attacker has full knowledge of the password generation method used. (Kerckhoffs's principle.) In reality, most attackers won't know what generation method you used, but it's better not to rely on security by obscurity when it comes to passwords.
zxcvbn accounts for the use of word lists. (And keyboard patterns, and common dates, and repeated characters, and a dozen or so other common patterns you probably haven't thought of yet.) Try it yourself: https://dl.dropboxusercontent.com/u/209/zxcvbn/test/index.ht...
And in fact, four random words is actually quite strong. The XKCD comic that password is taken from accounts for the use of word lists in its entropy calculation. In fact, it even _assumes_ the attacker knows the exact 2048-word dictionary you're selecting the words from. Even under those assumptions, four random words is _still_ a pretty strong password.
But a brute force test like the parent comment described wouldn't catch that either, unless it had 'correctbatteryhorsestaple' as a word in one of its dictionaries. And if you're going to go that route, it's just as easy to put 'correctbatteryhorsestaple' in one of zxcvbn's dictionaries.
Any common password pattern you could catch via brute force could also be detected via zxcvbn, except that zxcvbn would be much faster and more efficient at it.
Yes, the info I was missing, which you provided in your first reply, was that zxcvbn does use word lists. I should have acknowledged that in my reply, thank you.
It may have improved, but a few years ago at least zxcvbn was implemented by repeatedly trying various password "simplifications" at some kind of entropy cost. At the time it was quite easy to construct long (something like 60-char - our limit at the time) pathological passwords where a single zxcvbn check would take at least a minute of CPU time, with strongly super-linear growth for each additional char.
Might have been improved by now; not sure. If it's not you might be wasting electricity another way ;-).
Where you use this is important, too. Accounts that can provide remote access or admin rights should be scrutinized heavily, whereas an office temp with an email address (and no access from the outside world) isn't much of a threat.
Not true. When that email address gets broken into, it'll get used to send spam, making your mail server a spam source, and making it harder for all the rest of your mail to reach people's inboxes.
Not to mention producing more spam for the rest of us.
If someone has broken into your internal network, harvested or brute forced credentials, and is now sending emails externally, you have WAY bigger problems than spam.
Checking against the most common password lists (with some masking, maybe?) achieves _most_ of this, but you'd definitely need to do an offline attack to do better than that, on the order of an hour or so of GPU time per account. It wouldn't be trivial!
I don't trust myself enough to manage passwords properly for some small services I run 'cos I simply don't have the spare time to invest compared to investing in functionality.
For that reason I've been trying out password-less login for a while now (it works via email), and so far non-tech folks haven't complained either.
It is pretty much as though you always used the "forgot password" mechanism to login.
That's a really neat solution, and avoids the cognitive overhead of having to remember yet another password (or the security risk of re-using passwords). I particularly like the way you tie the log-in token to a particular browser session so that it can't be hijacked!
Plus by merging all of the log-in paths (registration, 'forgot password', and normal login), you have one thing to design and secure rather than three. That seems like a huge advantage from a security perspective.
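For anyone curious what "log in by email, token tied to a browser session" might look like, here is a rough sketch. It is not the implementation discussed above: the class name, the in-memory dict, and the 15-minute lifetime are all assumptions.

```python
import hashlib
import hmac
import secrets
import time
from typing import Optional

TOKEN_TTL_SECONDS = 15 * 60  # assumed lifetime of an emailed login link


class MagicLinkStore:
    """In-memory store of pending login tokens (a real service would persist these)."""

    def __init__(self) -> None:
        # token digest -> (email, session id that requested it, expiry timestamp)
        self._pending = {}

    def issue(self, email: str, session_id: str,
              now: Optional[float] = None) -> str:
        """Create a one-time token; the caller emails it inside a login link."""
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)
        # Store only a digest, so a leaked table does not contain usable tokens.
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._pending[digest] = (email, session_id, now + TOKEN_TTL_SECONDS)
        return token

    def redeem(self, token: str, session_id: str,
               now: Optional[float] = None) -> Optional[str]:
        """Return the email on success, or None. A token is consumed on first use."""
        now = time.time() if now is None else now
        digest = hashlib.sha256(token.encode()).hexdigest()
        entry = self._pending.pop(digest, None)
        if entry is None:
            return None
        email, bound_session, expires_at = entry
        same_session = hmac.compare_digest(bound_session.encode(), session_id.encode())
        if now > expires_at or not same_session:
            return None
        return email
```

One trade-off in this sketch: a redemption attempt from the wrong session burns the token, so the user has to request a new link. That errs on the side of safety at some cost in convenience.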
I am not an expert on password hashing, but I was wondering why websites can't hash their passwords twice using two different hash algorithms. That way, when the hashes are exposed, the attackers have to go through two algorithms. Is the increase in cracking time so marginal that people don't bother?
"In this paper, we study the existence of multicollisions in iterated hash functions. We show that finding multicollisions, i.e. r-tuples of messages that all hash to the same value, is not much harder than finding ordinary collisions, i.e. pairs of messages, even for extremely large values of r. More precisely, the ratio of the complexities of the attacks is approximately equal to the logarithm of r. Then, using large multicollisions as a tool, we solve a long standing open problem and prove that concatenating the results of several iterated hash functions in order to build a larger one does not yield a secure construction. We also discuss the potential impact of our attack on several published schemes. Quite surprisingly, for subtle reasons, the schemes we study happen to be immune to our attack."
In that paper they're concatenating two hash outputs which obviously (right?) only makes it easier to reverse the hash - the attacker need only pick whichever hash he can work with quicker, and find the preimage for that. Given the fact that password entropy is much lower than hash entropy, the preimage thus found is almost certainly the original password, not a collision - rendering solving the other hash preimage moot.
If you want to force the attack to go through two hashes, you'd use functions sequentially (e.g. F(G(input)), not F(input)+G(input)), which the paper at least initially doesn't talk about. I didn't read the whole thing, mind you...
Note that password-stretchers like bcrypt/PBKDF2 use only fairly small extensions to this idea, so clearly in general the construction isn't known to be flawed.
Shouldn't the reduction in input space for the subsequent hash functions actually make it easier to find collisions?
Or is finding collisions not closely related to input space?
The paper is a bit above my level of understanding and I tried making sense of how the cryptanalysis is done to no avail.
There's no actual need for two algorithms -- you can just hash them with the same algorithm twice. This is essentially what PBKDF2 does, hashing the password a configurable number (thousands) of times. It's a little more complicated than that, but that's why you should not just use a hash, but a KDF like PBKDF2, scrypt, or bcrypt -- they'll do what you're suggesting and more. They are designed precisely to make it a bunch of work for an attacker to guess-and-check your passwords.
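In Python, for example, PBKDF2 is available in the standard library as `hashlib.pbkdf2_hmac`. A minimal sketch follows; the iteration count and salt size are illustrative assumptions, not recommendations, and the helper names are made up.

```python
import hashlib
import hmac
import os
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative only; tune to your hardware and latency budget


def hash_password(password: str, salt: Optional[bytes] = None,
                  iterations: int = ITERATIONS) -> Tuple[bytes, int, bytes]:
    """Derive a key from the password; store salt, iterations, and key together."""
    salt = os.urandom(16) if salt is None else salt
    derived = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return salt, iterations, derived


def verify_password(password: str, salt: bytes, iterations: int,
                    expected: bytes) -> bool:
    """Re-derive with the stored parameters and compare in constant time."""
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)
    return hmac.compare_digest(candidate, expected)
```

Storing the iteration count next to each hash is what lets you raise the work factor later without invalidating existing entries.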
Instead of the added complexity of implementing multiple hash algorithms, if you're using something like bcrypt or PBKDF2 you can just increase the work-factor which makes the attacker (or indeed your application) do more work to calculate the hash.
There's a risk, depending on your use case and traffic levels, that if you crank work factors too high, you can impact the users' perception of your performance (e.g. a login operation might appear slow).
Which is actually better in terms of time required to brute force (as in, which takes longer)? Two different fast algorithms with moderate work factors, or one algorithm with a pretty high work factor?
Well, AFAIK you can keep cranking the work factor as high as you like, so realistically one algorithm with a high work factor is likely to be better, as it's a simpler thing to implement and has no drawbacks in terms of security.
I think that's more security through obscurity rather than difficulty.
Once the attacker figures out what the 2 hashing algorithms are, the scenario basically becomes the same as cracking hashes of 1 algorithm of increased difficulty (through number of passes).
So, like the other answer implied, the increased complexity of maintaining 2 algorithms might not be worth the obscurity trade-off in the end.
However, I am not a security professional either, so perhaps my opinion is not comprehensive enough.
I'm not sure how modern hashes fare in this regard, but one issue is that hashing a hash reduces your input space from all possible passwords to all possible hashes of the first hasher.
Technically, it does actually. Suppose we use SHA-256 as first hashing algorithm, the input space for the second is effectively reduced to the 256 bits of output from the first which is much smaller compared to the input space of the first algorithm.
38 characters of ASCII gibberish (all 95 printable characters) gives you less than 2^256 possible passwords, and pretty much nobody uses a password with more entropy than that.
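That figure is easy to check with exact integer arithmetic (the function names below are just for illustration):

```python
import math

PRINTABLE_ASCII = 95  # the 95 printable characters mentioned above


def password_space(length: int, alphabet: int = PRINTABLE_ASCII) -> int:
    """Number of distinct passwords of exactly this length."""
    return alphabet ** length


def entropy_bits(length: int, alphabet: int = PRINTABLE_ASCII) -> float:
    """Bits of entropy if every character is chosen uniformly at random."""
    return length * math.log2(alphabet)


# 38 characters stay below 2^256; the 39th character tips it over.
print(password_space(38) < 2 ** 256)  # True
print(password_space(39) > 2 ** 256)  # True
```

So 38 random printable characters come to a little under 250 bits, and it takes 39 to exceed a 256-bit hash output.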
The problem is that you aren't just dealing with "any output from the first hashing algo", since passwords are going into that algo in the first place. A hashing algo can't add entropy, but collisions can reduce it.
An undetected I/O error is vastly more likely than even one accidental SHA-256 collision among billions of passwords. It's not quite literally impossible but I'd bet my car I never see it happen.
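To put a number on that, here is the usual birthday-bound estimate. It treats SHA-256 as an ideal random function, which is an assumption, and the ten-billion figure is just an example.

```python
def collision_probability_upper_bound(n_inputs: int, output_bits: int = 256) -> float:
    """Union bound on the chance that any two of n distinct inputs collide."""
    pairs = n_inputs * (n_inputs - 1) // 2
    return pairs / 2 ** output_bits


# Ten billion distinct passwords hashed with a 256-bit function.
p = collision_probability_upper_bound(10 ** 10)
print(p < 1e-57)  # True
```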
You are correct that there are 10^8 possibilities for a string of 8 digits.
10 possibilities in position 1
10 possibilities in position 2
...
10 possibilities in position 8
10 * 10 * 10 * 10 * 10 * 10 * 10 * 10
Alternatively one can even simply observe that 99999999 is the highest number possible and since 00000000 is possible also then we have 99999999 + 1 different possibilities = 100000000 = 10^8
I built my latest application using Amazon Cognito for user management. My application and database don't ever know anything about the passwords. Amazon's problem.
You have two choices: try and do it on your own, or delegate to someone who you think can do it better. There are risks either way. Considering how much data Amazon has, they probably invest significantly more than you (or OneLogin) on security.
While letting them handle the security options is probably going to result in a more secure system for you, it's certainly not "Amazon's Problem" when your database gets leaked and your user data gets out. For example, you're still going to have to explain to your users that you were compromised, and you're still going to show up in the haveibeenpwned list, not "An AWS Cognito Account".
I'm comfortable using passwords <20 characters distributed among a range of sites because I have a realistic view: if one gets compromised, not every account does, and most accounts are not critical. Some are luggage keys, some are Medeco.
But those are bad comparisons. A key and lock is an asynchronous single use authentication+authorization mechanism. Passwords are just the authentication part, so trying to replace these just requires we have a secure way to authenticate ourselves.
We have the benefit that we are using digital systems, so our authentication can be digital, too. We can also rely on multiple factors to improve how authentic this process is. Biometrics, digital files, access to other accounts and networks, offline code generators, and personal information all provide lots of authentication data and multiply the effort needed to defeat the system. By combining all these factors, we can create a new digital key that is far more difficult to defeat than old methods by themselves, and ultimately is more flexible because it can be made up of any of these things.
The problem mainly seems to be that we live in a world of different locks, and most locks don't accept this particular kind of digital key. We've hacked around this problem and made some attempts at more compatible solutions, but they really fall short of their true potential.
In the future, you should simply be able to use any system and know that it will authenticate you in a way that can't be copied or cracked. Today that just isn't the case. So for now, maybe we should move the goal posts. We can keep making our keys more unwieldy, but we can also get more guard dogs.
The guard dogs need to exist not only to protect the locks, but the keys, too. If you go to unlock a door, a thief can knock you out and steal your key. Each aspect of our digital access needs guard dogs. We can no longer accept insecure communication methods, nor insecure computing platforms, to exchange our authentication. I think the real challenge going forward is rethinking how we process data altogether.
It's a lot easier than all of that. New two-factor authentication standards like U2F achieve most of that with just a simple, inexpensive hardware token.
U2F is cumbersome when you have to travel and change phone numbers frequently. In many cases there is no real way to have a line to one of your authentication methods, since you can never get the SMS confirmation.
U2F does not require SMS, and that's what it's designed to work around: problems inherent to "traditional" 2nd factor authentication. It does this by a secure connection from your browser (which does a challenge/response with your U2F token) to the server. You can connect your U2F token to your phone and auth on any network without SMS.
But U2F is really only a stopgap technology designed to provide a better mechanism than SMS or TOTP. There are still difficulties users will find with this mechanism that are problematic to secure or make less cumbersome, slowing adoption and security in general. And U2F still has several attacks that will work against it, making it somewhat trivial for malware to take over an account.
I envision a future where not only are there many factors we can use to authenticate, but that we might never need to "reset" our accounts again. That the majority of attacks on the user could end, and that servers will be more resilient to both general attacks and specifically data exfiltration. And that the data we use to secure accounts on the server can't be reused. An almost secure technological world.
This requires implementing strong security measures in all of the computers we use today. It also requires the adoption of universal multi-factor authentication methods, and a methodology to protect them from abuse by attackers. You can't get there by tacking more complicated mechanisms onto computers that are already not secure.
Suuuper nitpicky, but in the paragraph directly below Dark Helmet, Jeff calls his Graphics Card a 1080 GTX Ti. The GTX goes in front of 1080, since GTX is the general product line.
You are correct. It is a mistake. It's also confusing naming, because not too long ago, nVidia named their cards with GT and GTX as a suffix, such as the GeForce 8800/9800 GT and GTX.
What this shows is that even with best practices passwords are a fairly weak security control. We need a standardised second factor id.
The FCC or corresponding body elsewhere should mandate that phone networks and phones support a secure messaging protocol which could guarantee that a message could be sent to a phone number and only be received by that device.
Password-only authentication is like locks on luggage, even with best practices.
I'm not sure why you think this article shows that with best practice passwords, it's a relatively weak control?
From the article, a user choosing a random (i.e. not in wordlists) 8 character password with upper/lower/numeric characters could expect an attacker to take 3 years to crack the password (and that's attacking one hash!).
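For a sense of scale, here is the arithmetic behind a figure like that. The guess rate below is derived from the "3 years" quoted above under the assumption that it means exhausting the whole keyspace; the article's own rate isn't restated in this thread.

```python
ALPHABET = 26 + 26 + 10  # upper, lower, digits
LENGTH = 8
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def keyspace(alphabet: int = ALPHABET, length: int = LENGTH) -> int:
    """Number of possible random passwords of the given shape."""
    return alphabet ** length


def implied_guesses_per_second(years: float) -> float:
    """Guess rate needed to try every password within the given time."""
    return keyspace() / (years * SECONDS_PER_YEAR)


print(keyspace())  # 218340105584896
```

Under that assumption, three years corresponds to roughly 2.3 million guesses per second against a single hash.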
Now to be clear, I totally think that passwords are a bad idea (mainly because humans aren't well equipped to choose and manage large numbers of random strings) but I don't really see why this article advances that concept?
A very motivated attacker, or one with a sophisticated set of wordlists and masks, could eventually recover 39 × 16 = 624 passwords, or about five percent of the total users. That's reasonable, but higher than I would like.
Sure, so the attacker has entirely compromised the site in the first place; offline brute-force only works where the attacker has already got a copy of the database.
They've then eventually got access to 5% of the users' passwords, and the ones they got access to were all based on dictionary words...
Assuming that the site has any level of reactive/detective controls, they've noticed the breach and invalidated the passwords, thus rendering them useless.
What I'm saying is the offline password cracking times demonstrated in this article don't seem to indicate any more weakness in the use of passwords than was already known. The percentage of attackers who are going to bother hitting a PBKDF2'ed password database on a forum site with any level of dedicated cracking, past a run with some dictionaries, just isn't that high.
Attackers are drowning in existing compromised password databases already, many/most of which exposed clear-text or weakly hashed (MD5/SHA-1 with no salt) passwords, so realistically speaking the incentive for getting another set where you really have to work hard to get them isn't likely to be that high unless it's a high-value site.
If you want examples, just look at the lists on https://haveibeenpwned.com/: 500 million accounts with cleartext passwords from a single dump is at the top of the list.
That said, I admire Discourse's efforts to move the bar higher by increasing complexity and blocking weak passwords; it all helps people move away from less secure alternatives.
Strong password hashing definitely has its place as part of overall app security.
Where I think many/most applications would benefit from more security is in detecting/reacting to attacks.
Most apps have no controls in this line at all, and make an attacker's life very easy in that they can keep trying vast numbers of attacks without being blocked by the application.
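Even a very small control helps here. Below is a sketch of a sliding-window throttle on failed logins; the class name and thresholds are invented for illustration, and a real app would likely key on both account and source address and keep state somewhere shared.

```python
import time
from collections import defaultdict, deque
from typing import Optional


class LoginThrottle:
    """Block a key (account, IP, ...) after too many recent failed logins."""

    def __init__(self, max_failures: int = 5, window_seconds: float = 300.0) -> None:
        self.max_failures = max_failures
        self.window_seconds = window_seconds
        self._failures = defaultdict(deque)  # key -> timestamps of recent failures

    def record_failure(self, key: str, now: Optional[float] = None) -> None:
        """Remember one failed attempt for this key."""
        self._failures[key].append(time.time() if now is None else now)

    def is_blocked(self, key: str, now: Optional[float] = None) -> bool:
        """True while the key has max_failures or more failures inside the window."""
        now = time.time() if now is None else now
        attempts = self._failures[key]
        # Drop failures that have aged out of the window.
        while attempts and now - attempts[0] > self.window_seconds:
            attempts.popleft()
        return len(attempts) >= self.max_failures
```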
Most of those passwords that got cracked, my reaction is, OK, of course that's a weak password... but "1qaz2wsx3e" and "A3eilm2s2y"? Geez! How'd they get those?
Would it be a bad security practice to keep a database of the SHA hashes of maybe the 10 000 most common passwords then alert users who try to use them? Obviously you would do the comparison before applying your actual bcrypt/PBKDF2 function with salt.
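Something like the sketch below, run at signup or password change before the slow salted hash is computed. SHA-256 and the tiny word list are assumptions for the example; the list of common passwords would come from whatever source you trust.

```python
import hashlib
from typing import Iterable, Set


def sha256_hex(password: str) -> str:
    """Plain, unsalted SHA-256, used only for the common-password lookup."""
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


def build_blocklist(common_passwords: Iterable[str]) -> Set[str]:
    """Precompute hashes of the most common passwords."""
    return {sha256_hex(p) for p in common_passwords}


def is_common(password: str, blocklist: Set[str]) -> bool:
    """True if the proposed password is on the common-password list."""
    return sha256_hex(password) in blocklist
```

Since lists of common passwords are public anyway, hashing them is mostly a convenience for fixed-size lookups; it doesn't hide anything from an attacker.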
Presumably more is always better, but there's a very long tail of passwords so the hit rate will drop off a cliff, and now you're storing 5x as much data for increasingly questionable benefit.
Encrypting the hashes in the database would make it safer. That way the password hashes can't be attacked in this way unless they can decrypt them first.
Encrypting passwords wouldn't add a lot here unless you're using some mechanism to protect the encryption/decryption key (like using a Hardware Security Module), as an attacker who compromises the database is likely to compromise the key at the same time.
If you do have a hardware security module, then why not just do away with hashing altogether? Encrypt the passwords with AES-128 and you'll likely be fine (as long as the attacker can't extract the key from the HSM).
It's slightly useful if you only give the key to your application servers, and not your database servers. Now you need an application server breach and not just read access to a database.
It's not unheard of for something like a decommissioned database backup to wind up insecure and on the internet without being properly wiped, causing a whole-db leak without anyone actually breaking into a production system.