SQL injection with raw MD5 hashes (cvk.posterous.com)
125 points by philfreo on Nov 28, 2010 | 36 comments



Summary: binary MD5 hashes are, in fact, binary, and any given MD5 hash has a roughly 1/16 chance of containing a given SQL metacharacter. The CTF challenge here required the calculation of a hash containing a whole 4-character injection string; this is harder, but not Hard.
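
The 1/16 figure can be checked with a quick calculation. This is a minimal Python sketch, assuming the 16 digest bytes behave like independent uniform random bytes (an idealization); the function name is made up for illustration.

```python
import hashlib


def chance_digest_contains_byte(digest_len: int = 16) -> float:
    """Probability that at least one of `digest_len` uniform random bytes
    equals one specific byte value (for example 0x27, the single quote)."""
    return 1.0 - (255.0 / 256.0) ** digest_len


p = chance_digest_contains_byte()
print(f"{p:.4f}")  # slightly under 1/16 = 0.0625

# A raw digest is 16 arbitrary bytes; a hex digest only uses 0-9 and a-f.
raw = hashlib.md5(b"example").digest()
hexed = hashlib.md5(b"example").hexdigest()
print(len(raw), len(hexed))
```

The hex form cannot contain a quote or any other SQL metacharacter, which is why it is the safe default.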

You should know that the exact same problem applies (more so, in fact) to encrypted strings. Developer laziness insulates most apps from raw MD5 digests (most devs use the hex digest function, which returns human-readable [and safe] output). But the exact same "iterate 1 bit at a time until you hit a jackpot block" trick works with almost every application that uses AES.

It's not just SQL, either; I've used it to get XSS out of corrupted AES decryptions as well.

And, of course, the same trick works in the opposite direction. If you encrypt a string with a quoting domain (say, where the character ';' needs to be quoted '\;' to preserve an encoded tuple), attackers can pad inputs to get the quote and the metacharacter to span blocks, then use block corruption to kill the quote but not the metacharacter.

Have I mentioned today how you shouldn't be building crypto code? DON'T DATE ROBOTS.


It feels like programmers have it hammered into their minds all the time never to use unescaped strings in DB queries. It's quite hard to believe that a decent programmer would concatenate such raw output into an SQL string unescaped. Nevertheless, I'm sure this happens all the time.

The interesting thing about this report is how fast you can brute-force calculate this in order to find a magic byte sequence.
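
A rough sketch of what such a brute-force search might look like in Python. The candidate generator (decimal strings) and the function name are assumptions for illustration, not the method used by the challenge authors or the winning team.

```python
import hashlib


def find_input_with_pattern(pattern: bytes, max_tries: int = 5_000_000):
    """Hash the decimal strings "0", "1", "2", ... and return the first
    (input, raw_digest) pair whose raw MD5 digest contains `pattern`,
    or None if nothing is found within `max_tries` candidates."""
    for i in range(max_tries):
        candidate = str(i).encode("ascii")
        digest = hashlib.md5(candidate).digest()
        if pattern in digest:
            return candidate, digest
    return None


# A 2-byte pattern shows up in roughly 15 of every 65536 digests,
# so this search finishes almost immediately.
result = find_input_with_pattern(b"'=")
print(result)
```

Each extra pattern byte multiplies the expected work by about 256, so a 3-byte pattern such as `'='` needs on the order of a million hashes, which is still only seconds of work.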


The winner managed it with a 3-char injection: http://blog.nibbles.fr/2039


> It's not just SQL, either; I've used it to get XSS out of corrupted AES decryptions as well.

How would you go about such a thing? My initial thought is to simply permute over the controllable bytes and see what comes out, looking for specific characters that get you what you need, e.g. a quote to break out of an HTML attribute, but I think I might be missing an easier path.


Nope, that's how you do it. In the case I'm thinking of, I only needed one character (but it needed to be the last byte).


I wish there was an easy answer to encryption and security, but it seems like an arms race a la "The Cassini Division". No matter how you build a system there are always more inputs to compromise... and the attackers never give up.


The easy way is to not use a stupid function like md5(?, true), which returns binary data, and then include its output in your SQL without escaping.

It's really not rocket science.


Exactly. It's so damn easy to not make these mistakes. I don't know how people manage it.


Is it easy to enumerate all the potential attack points in a system?


In all my years of dealing with ugly and insecure PHP code, never have I seen anyone using raw md5's in PHP.

Sounds like a fabricated situation just for the game to me, versus something trying to resemble real-life situations. Not saying it's not possible, but the type of coders that would make a SQL injection oversight with something like that would also not care to provide a non-default raw_output value to the function for the heck of it.


I am more impressed by whoever came up with the challenge, although setting the second parameter of md5 to 'true' made it obvious what the solution to the puzzle was (nobody would ever turn binary output on in real life).

The French team came up with the solution quicker as they worked out that any '=' would work - their keyspace search was thus an order of magnitude quicker:

http://blog.nibbles.fr/2039


I thought it required '='[^1-9]? Because the real 'password' = 'bogus crap' comparison would turn into 0, which would then be compared to a string starting with something other than 1-9, which gets turned into 0 as well.


This might be only tangentially related, and I'm sure nearly everyone on HN knows this, but salting MD5's is probably the easiest way to significantly increase their security. By changing md5($var) to md5($var . "foo") this type of attack could be prevented, assuming that the crackers don't have access to the source code [in this challenge, they did].

To get an idea of what can be exposed if you don't salt, and thus use predictable hashes, have a look here: http://www.google.com/#q=inurl:c4ca4238a0b923820dcc509a6f758...

md5(1) = c4ca4238...


+1 to prodigal_erik

Also, concatenating "foo" to the end of /every/ password is not salting your hash. You need to have a suitably random string for it to be considered a salt. Also, you want to have a different random string for every password in your database.

[+1 to tptacek for constantly saying don't write crypto code :-p]
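
A minimal sketch of per-password random salts in Python. Note the swap: instead of a bare salted MD5, this uses the standard library's PBKDF2 (`hashlib.pbkdf2_hmac`), which is not what the thread's PHP snippets do. The function names and the iteration count are assumptions for illustration.

```python
import hashlib
import hmac
import secrets

ITERATIONS = 100_000  # illustrative value, not a recommendation


def hash_password(password: str):
    """Return (salt, derived_key) using a fresh random salt per password."""
    salt = secrets.token_bytes(16)
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, derived


def verify_password(password: str, salt: bytes, expected: bytes) -> bool:
    """Recompute the key with the stored salt and compare in constant time."""
    derived = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return hmac.compare_digest(derived, expected)


salt_a, key_a = hash_password("hunter2")
salt_b, key_b = hash_password("hunter2")
print(key_a != key_b)  # same password, different salts, different keys
```

Both the salt and the derived key are stored per user, so identical passwords no longer produce identical database entries.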


I should have noted that "foo" wasn't meant to be taken literally; clearly it should be something that wouldn't show up in a dictionary. Even so, the point was that any salt is substantially better than none, and is easy to implement.


There are theoretical attacks against simply adding secret bits to a single hash. You're probably better off with HMAC, which is essentially md5($key . md5($key . $message)) with a few padding details.

http://en.wikipedia.org/wiki/HMAC
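
To make the "few padding details" concrete, here is a sketch in Python that builds HMAC-MD5 by hand following the RFC 2104 construction and checks it against the standard library's `hmac` module. The helper name `hmac_md5_manual` is made up for illustration; in real code one would simply call `hmac.new`.

```python
import hashlib
import hmac


def hmac_md5_manual(key: bytes, message: bytes) -> bytes:
    """HMAC-MD5 spelled out: two nested hashes with padded, XORed keys."""
    block_size = 64  # MD5 processes input in 64-byte blocks
    if len(key) > block_size:
        key = hashlib.md5(key).digest()  # long keys are hashed first
    key = key.ljust(block_size, b"\x00")  # short keys are zero-padded
    inner_pad = bytes(b ^ 0x36 for b in key)
    outer_pad = bytes(b ^ 0x5C for b in key)
    inner = hashlib.md5(inner_pad + message).digest()
    return hashlib.md5(outer_pad + inner).digest()


key = b"secret key"
message = b"some message"
manual = hmac_md5_manual(key, message)
library = hmac.new(key, message, hashlib.md5).digest()
print(manual == library)
```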


"Salting" is not the solution to every problem. More to the point, if you put binary data in an SQL query things are going to break, even if you use tricks like this to make it a bit less predictable. (Really, the probability of random binary data containing a ' character is pretty high...)


And it looks like an even faster solution, here, but in French. http://blog.nibbles.fr/2039



PHP contains filters for a reason. They are there to protect you.

While this was a clever challenge, in general you should sanitize all of your input using filters.


Filtering of input is not an appropriate way of ensuring that SQL queries are safe, e.g. you'd be telling people named "O'Hara" that they're not SQL-compatible, or you'll fall into the magic_quotes trap.

You should escape output.

In this case there isn't anything wrong with the input, and you can't say that any byte of MD5 is wrong or dangerous.

It's just data that is used improperly.

Binary data in SQL queries is especially tricky, because strings may be expected to be in a specific character encoding (e.g. UTF-8), so you either need to use prepared statements and pass the argument explicitly as binary, or use string-safe escaping (e.g. hex).
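
A small sketch of the prepared-statement option, using Python's built-in `sqlite3` as a stand-in for the MySQL setup in the challenge. The table layout and function name are assumptions, and MD5 appears only to mirror the thread.

```python
import hashlib
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password_hash BLOB)")

# Store the raw 16-byte digest as a bound BLOB parameter.
stored = hashlib.md5(b"hunter2").digest()
conn.execute("INSERT INTO users VALUES (?, ?)", ("alice", stored))


def check_login(name: str, password: str) -> bool:
    """Compare against the stored digest without ever splicing bytes into SQL."""
    candidate = hashlib.md5(password.encode("utf-8")).digest()
    row = conn.execute(
        "SELECT 1 FROM users WHERE name = ? AND password_hash = ?",
        (name, candidate),
    ).fetchone()
    return row is not None


print(check_login("alice", "hunter2"), check_login("alice", "guess"))
```

Because the digest travels as a parameter, any quote or equals bytes inside it are treated as data. The hex alternative would be to bind `candidate.hex()` against a text column instead.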


In general, you should not use PHP, but if you must, you should be using a framework that handles this kind of stuff for you.

Input sanitisation has been solved thousands of times by people smarter than you. Do not reinvent the wheel (poorly).


magic_quotes is turned off by default for a reason too.


This is all quite clever, but any wary developer makes sure to use either properly sanitized inputs or, better still, prepared SQL statements (as available, for example, through PHP's PDO), so this trickery, no matter how clever, is really not a big issue.


"Any wary developer" ranks up there with "a sufficiently clever compiler" among the True Scotsmen of our industry. I don't think you're right; empirically, unsafe SQL is a very big issue, and the places where a QA team is unlikely to stumble across injection vectors (ie, any vector where the string "O'Malley" doesn't pop up an error to paste into a ticket) are worse still, since teams don't find them.


i think you don't look outside your own "boxeola" too much. not all development is done by a) a clueless team that passes the software on to b) quality assurance + bughunters. relying on or assuming that someone else will take care of eventual loose ends is a very bad and ignorant approach to creating software, and in my soon 15 years of experience as a european programmer this is thankfully nowhere near the majority of cases.


I think one key difference is in when the software was developed. If I had to guess, most software developed in the past 5 years (being conservative here) is most likely not vulnerable to SQL injection, simply because doing things the right way (that is, using parameterized queries and the like) is so easy these days. But if you look at software developed prior -- and software built at a lower level, e.g. directly hitting the PHP mysql API -- you've got a fair shot at stumbling upon SQL injection, I'd say. That means that most developers writing say, Django or Rails apps, are never going to have to think about SQL injection, and won't see how prevalent it is. In theory, SQL injection is a solved problem -- we've known how to avoid it for a long, long time -- but in practice, it's still out there in full force, largely due to legacy code.


Parameterized queries are also simply not the panacea they're made out to be; there remain plenty of opportunities for injection even with a parameterized query. SQL injection via MD5 digests is indeed unlikely, but not because of query structure --- rather, because most developers don't know to take the binary result instead of the more convenient hex result.

It just rubs me the wrong way when people claim "if you do things right, you'll never run into this problem anyways". Two possible interpretations: either "do things right" is too broad to mean anything (no-true-Scotsman-style) or "do things right" involves a piece of advice like "use parameterized queries" which doesn't actually work reliably in the real world.

We have absolutely found SQL injection in Rails code before. I've also had engagements on very modern financial codebases where the developers were able to expound at length on how impossible it would be for them to have injection --- providing entirely sane design rules to back it up --- that ended up losing their entire app to pre-auth SQLI. There's always some tiny corner of the app --- a custom query builder, a hand-hacked pagination system, a sort column generator, a table selector parameter, that one stored procedure that does dynamic SQL and doesn't know what U+2032 is --- that manages to slip up.


> Parameterized queries are also simply not the panacea they're made out to be; there remain plenty of opportunities for injection even with a parameterized query.

Can you describe a case or point to example code using a parameterized query that is vulnerable to SQL injection? I've seen a stored procedure that built raw queries and passed them to sp_executesql (T-SQL), which provided a vector for SQL injection. However, I am struggling to think of a case where a parameterized query could allow for SQL injection.


Not every input to a query can be bound as a parameter.

Simplest example: User.find_by_sql("SELECT * from users where name = ? LIMIT #{ limit }", name)

Other examples: sort order (ASC/DESC), table selection, join columns, GROUP BY argument.

If you think I'm arguing against parameterized queries: of course not. Use them. But know their limitations.
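
One common way to handle the parts that cannot be bound is to validate them against a fixed allowlist before interpolating. A minimal Python sketch; the column names, limits, and function name are hypothetical.

```python
ALLOWED_SORT_COLUMNS = {"name", "created_at"}
ALLOWED_DIRECTIONS = {"ASC", "DESC"}


def build_user_query(sort_column: str, direction: str, limit) -> str:
    """Build a query whose structural parts come only from allowlists.
    The name filter stays a bound parameter (the `?`)."""
    if sort_column not in ALLOWED_SORT_COLUMNS:
        raise ValueError(f"unsupported sort column: {sort_column!r}")
    direction = direction.upper()
    if direction not in ALLOWED_DIRECTIONS:
        raise ValueError(f"unsupported sort direction: {direction!r}")
    limit = int(limit)  # raises ValueError for anything non-numeric
    if not 1 <= limit <= 1000:
        raise ValueError("limit out of range")
    return (
        "SELECT * FROM users WHERE name = ? "
        f"ORDER BY {sort_column} {direction} LIMIT {limit}"
    )


sql = build_user_query("name", "desc", "25")
print(sql)
```

Only values that match a known-good set ever reach the SQL string, so an input such as `"1; DROP TABLE users"` for the limit is rejected rather than escaped.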


For those who didn't know what tptacek meant by U+2032, like me, a quick Google search picked up this: http://news.ycombinator.net/item?id=1940713 , in which he explained it three days ago: "This is a classic web security problem; most famously, WinAPI systems have a 'flattening' function that would convert things like PRIME U+2032 into ASCII 0x27 (the tick that terminates SQL statements)."
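
A toy Python model of why that ordering matters: a check that runs before the flattening step never sees the quote. The one-entry mapping table below is a hypothetical stand-in for the platform conversion described in the quote, not a real API.

```python
# Hypothetical stand-in for a "best fit" conversion table.
BEST_FIT = {"\u2032": "'"}  # PRIME -> ASCII apostrophe


def naive_filter(value: str) -> str:
    """Reject input containing an ASCII single quote."""
    if "'" in value:
        raise ValueError("quote not allowed")
    return value


def flatten_to_ascii(value: str) -> str:
    """Apply the toy best-fit table character by character."""
    return "".join(BEST_FIT.get(ch, ch) for ch in value)


user_input = "x\u2032 OR \u20321\u2032=\u20321"
checked = naive_filter(user_input)     # passes: no ASCII quote yet
flattened = flatten_to_ascii(checked)  # quotes appear after the check
print(flattened)
```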


I am not so sure of that. There is lots of code that is not so much developed as it is evolved from a Q&D n-liner (n<20). Such code can easily have some left-over opportunities for SQL injection attacks.


We're basically paid to look in other people's boxeolae.

Sorry, if your own personal experience suggests that most software is rigorous with SQL and input validation, I think I have to assert a broader and more accurate perspective on the issue. It feels statistically improbable to me that we're just missing all the "good" software.


[deleted]


[deleted]


Sorry, I actually intended to reply to the parent of your comment -- reposted and deleted my response just after you commented.

Parameterized queries aren't a panacea, but they seem to mitigate most of the issues. That said, a lot of people rely solely on them, thinking they really are. While certain things make the vast majority of attacks infeasible (e.g. using SQLAlchemy rather than any straight SQL), nothing can replace actually knowing about the security issues and designing solutions that are immune to them.


this could be a case of demographic difference. programming culture and methods of approach are vastly different in europe, the americas, and asia, as indeed often told by migrating workers in our field. consequently we can assert that it is equally likely that it is just you who have a narrow and self-serving perspective of your business, and that there is more diversity out there than what you specifically deal with for your living :-)


I know there was already a shorter solution, but instead of '||'[1-9] one could have searched for '!=' or '<>', thus opening up the search space.



