PHP: md5('240610708') == md5('QNKCDZO') (3v4l.org)
240 points by dbrgn on May 4, 2015 | 175 comments



I'm not exactly clear on how PHP == works, but you can see the MD5 for yourself:

    $ echo -n 240610708 | md5sum
    0e462097431906509019562988736854  -
    $ echo -n QNKCDZO | md5sum
    0e830400451993494058024219903391  -
    $ echo -n aabg7XSs | md5sum
    0e087386482136013740957780965295  -
All of them start with 0e, which makes me think that they're being parsed as floats and getting converted to 0.0. This is why "magic" operators like == in PHP and JavaScript never should have existed in the first place. Operators like == should be, by default, extremely boring. PHP's just happens to be a bit more magical even than JavaScript's.
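
For anyone who wants to check this outside PHP, here is a minimal sketch in Python (chosen only for illustration; it is not how PHP does it internally). It recomputes the three digests above with `hashlib` and checks that each one is `0e` followed only by decimal digits, which is the shape a float parser reads as zero. The helper names `md5_hex` and `looks_like_zero_float` are made up for this example.

```python
import hashlib
import re

# "0e" followed only by decimal digits reads as 0 * 10**N in scientific notation.
ZERO_EXPONENT = re.compile(r"0e[0-9]+")

def md5_hex(text: str) -> str:
    """Hex MD5 digest of text, like `echo -n text | md5sum`."""
    return hashlib.md5(text.encode("ascii")).hexdigest()

def looks_like_zero_float(digest: str) -> bool:
    """Hypothetical helper: True if the whole digest has the form 0eNNN..."""
    return ZERO_EXPONENT.fullmatch(digest) is not None

if __name__ == "__main__":
    for text in ("240610708", "QNKCDZO", "aabg7XSs"):
        digest = md5_hex(text)
        # float() on such a digest gives 0.0, whatever the exponent digits are.
        print(text, digest, looks_like_zero_float(digest), float(digest))
```

A digest with even one hex letter after the `0e` no longer matches, so only a small fraction of hashes are affected.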


Once I wrote a little PHP application to manage a clan in a browser game. I used an MD5 hash as session id that I checked with if(session_id)

When users started reporting that their logins would sometimes not work the first time, I found out that strings that start with zero are coerced to 0 and then interpreted as false.

Never used PHP for anything important since.


To be fair, this kind of thing (maybe not exactly this, but type-coercion bugs) can happen in JavaScript, which is all the rage now for "important" stuff.


This is levels worse than what JavaScript does though. Most high-level languages have some sort of implicit coercion (even Python lets you do truth tests on non-boolean values). The problem here is the programmer isn't confused about types at all. They're comparing two things of the same type: two strings! Nevertheless, given two strings PHP tries to coerce them into numbers before carrying out the equality test. Yes, you will have coercion bugs in other languages if you're testing things of different types, but I don't know any other language where an equality test between two things of the same type automatically coerces both into another type.


It can happen in a few languages, but PHP is notably more aggressive in trying to convert to int.

Actually a common way to grief new websites is to try to register '0' as a username. `if (string)` is a common way to check for null, and '0' will often fail.


Yeah but javascript has 'use strict' whereas PHP decided that the easter egg "looks like you're using the wrong language!" was more important than actually allowing a 'use strict' to force === instead of ==.


While that's true, JavaScript is still horribly error-prone because of this. The suggestion that JS would be a much better language if the == operator worked more like === in the first place is very reasonable.


I don't think strict mode affects == vs. ===. Best bet is to use a linter to catch that.


What did you use for the important stuff that was 100% predictable?


zeroes and ones.


This does not appear to be the case in PHP 5.6, even for most strings with '==' gotchas:

  <?php
  if ('0e24') echo 'true'; else echo 'false';

  outputs:
  true
As far as I know the only strings that fail an if check are "" and "0". (Which is still a pitfall, but not one you'd hit with an MD5 hash)


Yeah, documentation is for pussies.


Here is one PHP core developer claiming that PHP documentation is wrong, even on fundamental things...

http://www.reddit.com/r/lolphp/comments/2md8c0/new_safe_cast...

Just saying....


But it is not wrong in this case.


PHP doesn't even pass its own test suite.

That's right... they consider certain tests failing OK, a certain number of failures OK, tests failing nondeterministically OK.

Just saying... Who gives a shit about programmer headaches? ;)


> I used an MD5 hash as session id

> Never used PHP for anything important since.

The problem here isn't PHP, the problem here is you.


Nah, the problem is PHP.

See: http://blog.codinghorror.com/falling-into-the-pit-of-success...

> When you write code in [PHP], you're always circling the pit of despair, just one misstep away from plunging to your doom.


I'd be willing to say this is true for any language in varying ways.


But to widely varying degrees. This kind of problem is a direct consequence of having a relatively weak and dynamic type system (or other semantics that mean you might as well have).

Plenty of people have warned about this kind of danger for a very long time. However, there seems to be a significant subset of the web development community that only has experience with languages like JS and PHP and to a lesser extent other dynamic languages like Ruby and Python, who simply fail to realise how many of these bugs should have been entirely prevented by using better tools by now. The usual counter seems to be something about unit tests, at which point anyone following the discussion who actually knows anything about type systems and the wider world of programming languages dies a little inside.

It is entirely fair to criticise bad tools for being bad, particularly in specific ways and with clearly identified problems that can result as in this case. It's bad enough that we are stuck with JS for front-end web development these days, but there aren't many good arguments for using something as bad as PHP on the back-end in 2015.


And what is gained from doing so? We must remain critical.


No, the problem is using a shitty function like MD5 for any practical purpose.


The hash function is completely irrelevant to this bug - whether you use a hash that returns 0 for every input, or invent a hash function that returns a perfectly unique and unpredictable hash for all inputs, PHP will still shoot you in the foot.


I didn't hear him blaming PHP. Defensive, are we?


He didn't blame PHP, just never used it again for anything important. Did we read the same comment?


Please, read "The Design of Everyday Things": http://www.amazon.com/Design-Everyday-Things-Donald-Norman/d...


If you don't like "keep it simple stupid" and determinism in your language of choice (much less immutability)... you're basically everything wrong with programming in the year 2015


You bought a new car. You took it out for a ride. A tree falls in front of you. You brake, but the car proceeds to hit the tree anyway.

You call the car company and talk to their engineers. One of them asks, 'Did this happen on a Friday evening, when it was raining?' You say, 'Yes, how do you know?'

The engineer replies:

"Our brakes do not work on rainy Friday evenings. If you REALLY want to brake on a rainy Friday evening, you should also pull the lever under the dashboard that is normally used to open the hood. It is very clearly printed in our manual. Didn't you read it? Our car is not the problem. You are the problem."

You were enlightened. You came back home. You never took the car out on rainy Friday evenings. When somebody asked about the car, you said, "Yeah, it is a great car. But you've got to know how to use it."

You took great pride in knowing how to drive this car, which can easily kill someone who hasn't read the manual. When you heard that someone got killed while driving this car, you simply said, 'That car is OK, but you should really know how to drive it. Sadly, this guy didn't. He was the problem, the car ain't...'


From now on, when I write “RTFM,” I will also link to this comment.


> You bought a new car.

There's your problem, wasting money on something that only depreciates in value. Tsk tsk.


It's not the complexity of the car manual, but the falling tree that I fear.


sure, everything should be done perfectly or not at all ...


We can accept that perfection may be impossible, difficult to obtain, or a poor tradeoff against other factors.

But that doesn’t mean that all imperfect designs are of equal merit.


This, combined with the fact that you can increment strings gives some 'interesting' results:

    $a = "2d9"; 
    $a++; 
    echo $a . "\n"; 
    $a++; 
    echo $a . "\n"; 
Output

    2e0
    3


This is wonderful, I love it!

Interestingly [1], this echoes "2e0" followed by "3" in hhvm-3.7.0, but "3" followed by "4" in hhvm-3.6.0.

[1] http://3v4l.org/sJhP8


That means that someone was using this "feature" in a relatively core piece of code from the PHP ecosystem. Enough that hhvm felt they needed to support it.


There is some nasty type conversion going on here, from the type of stochastic random throws of two nine-sided dice to floats to integers. Where is your type preservation, PHP?


Is there any way to defend against this one? I know to use === to turn off type conversion with the equality operator, but what about here?


Add a condition that checks the type, and don't try to increment a string...

if (!is_string($notstr)) ++$notstr;

edit:

I think checking for an integer is better, if it's just incrementing an integer.

I was going to say type hinting, but I just realized PHP's primitives cannot be type hinted.


Who tries to increment strings anyway? What is your point here?


Given how happy PHP is about converting strings to integers on demand, it would be pretty easy to take string input intended to be a number, forget to actually convert it, and go around using it happily until one day you accidentally set off a bomb.


Yeah, who would do that? Nobody. So why the _ is it possible in the first place?


Because PHP is dynamically typed, it's easier to accidentally increment a string.


Ahh PHP, the language where true == false

    php > if ((true == "foo") && ("foo" == 0) && (0 == false)) echo "yay!";
    yay!


I've never seen one, but somewhere there must surely be a PHP version of the infamous 'WAT' talk about JavaScript, full of examples like this and the "2d9"->"2e0"->3 example mentioned by lars.



This truly just bummed me out :(


Don't let it - he understands exactly how the types are being converted in order to make it appear that true == false.

This sort of thing happens in type conversion languages. You can either use === to stop conversion or you can understand how conversion works.

I'm not sure how the order of conversions is decided by PHP, but here's a brief explanation:

Compare "foo" to true. Convert the string "foo" to a boolean value. As it is desirable that a non-empty string evaluate to true, we will say they are "equal."

Compare "foo" to 0. Convert the string "foo" to a numeric value. As "foo" does not start with 0x it cannot be hex, and as it does not start with 0 it cannot be octal, so evaluate it as decimal - there are no numbers before the first letter so the string foo, when numerical, is 0.

Evaluate 0 to false. Well, that's just binary now isn't it? Of course false and 0 are equal!

The moral of the story, == is not "exactly equal" it is "relatively equal."
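
To make the chain above concrete, here is a toy model in Python (not PHP, and not PHP's real comparison algorithm). It only encodes the three rules described in this comment, as the thread describes them for the PHP of the time; newer PHP versions may behave differently. The function names `to_number`, `to_bool` and `loose_eq` are invented for the sketch.

```python
import re

# Leading integer digits only; enough for this illustration.
_LEADING_INT = re.compile(r"[+-]?\d+")

def to_number(s: str) -> int:
    """Numeric value of a string: its leading digits, or 0 if there are none."""
    m = _LEADING_INT.match(s)
    return int(m.group(0)) if m else 0

def to_bool(v) -> bool:
    """Truthiness as described in the thread: "" and "0" are the falsy strings."""
    if isinstance(v, str):
        return v not in ("", "0")
    return bool(v)

def loose_eq(a, b) -> bool:
    """Toy model of == for bools, ints and strings (an assumption, not PHP's code)."""
    if isinstance(a, bool) or isinstance(b, bool):
        return to_bool(a) == to_bool(b)      # anything vs. bool: compare as bools
    if isinstance(a, str) and isinstance(b, int):
        return to_number(a) == b             # string vs. number: compare as numbers
    if isinstance(a, int) and isinstance(b, str):
        return a == to_number(b)
    return a == b                            # same kinds: plain comparison

if __name__ == "__main__":
    print(loose_eq(True, "foo"), loose_eq("foo", 0), loose_eq(0, False))
    print(loose_eq(True, False))
```

Each individual step holds in the model, yet `loose_eq(True, False)` is false, so the "equality" is not transitive. The sketch deliberately leaves out the string-vs-string numeric case that the original post is about.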


The problem with designing a language that does these sorts of implicit type conversions is that the "equality" operator violates the fundamental properties of equality. Since grade school mathematics we are all taught that equality is symmetric and transitive, and PHP's == operator is neither.


AFAIK == is always symmetric.


> You can either use === to stop conversion or you can understand how conversion works.

It has been my experience that, to a first approximation, no-one fully understands how conversion works in such languages to the point of never getting it wrong in practice.

Of course we didn't know that would be the case when some of these languages were first created, but I think it is a compelling argument for making an actual-equality == operator the default in any new programming language design. There are enough plausible differences because of things like reference vs. value semantics already, without breaking basic intuitions about what comparisons mean as well.


> This sort of thing happens in type conversion languages. You can either use === to stop conversion or you can understand how conversion works.

You must admit that this is a lot of behavior to keep in mind.

Eg, there is no pattern like "try converting the value on the right to the type of the value on the left".

> Compare "foo" to 0. Convert the string "foo" to a numeric value.

I would expect this to convert 0 to "0" and fail. I suppose it's done this way because there's no way to represent a hexadecimal number except as a string.

> The moral of the story, == is not "exactly equal" it is "relatively equal."

The moral of the story for me would be "never use ==", if I were using PHP. I don't want to think about so many rules when trying to do a simple comparison.

FWIW, Ruby allows type conversion, but it generally must be explicit: `5 == "5"` is false; you must either do `5.to_s` or `"5".to_i` to compare, therefore nothing unexpected can happen. `if some_var` does "convert" to a boolean, but the rule is "nil and false are falsey, everything else is truthy", so again, not much to remember.

"Hard to mess up" is better than "easy to mess up", even if it's possible to avoid the mistake.


"This sort of thing happens in type conversion languages. You can either use === to stop conversion or you can understand how conversion works."

Even JavaScript isn't insane enough to somehow coerce a string to 0.


Yes, JavaScript will convert strings into numbers

    console.log(5*"12");
    60
    console.log(5*"0x0C");
    60


Actually, sometimes type conversion makes some code a little bit handier.

We use Java at the backend and of course JavaScript for the frontend. When serializing, in Java we have to write

        String dataRaw = "42";
        int objectId = Integer.parseInt(dataRaw);
Meanwhile, in JS, it is fairly simple:

        dataRaw = "42";
        var objectid = +dataRaw;


12 is not 0.


Yes and no. JavaScript gives you the good ol' "NaN" which is a number (despite not being a number).

PHP doesn't have that concept in it.


Sort of. 'NaN' is one of the IEEE 754 floating point constants, along with 'Inf' for infinity. They are numeric types, in that they can be returned via operations on numbers, such as dividing zero by zero or adding '-Inf' to 'Inf'. See https://en.wikipedia.org/wiki/NaN

I always understood that the 'isNaN()' function was required to check if a numeric variable is equal to 'NaN' directly, since normal equality cannot be used as there are multiple valid bitwise representations of 'NaN' in the standard - it is a float with an exponent of all ones and a non-zero fraction. However, 'isNaN()' now seems to have been co-opted into being used to check if a string is not a number, i.e. does not represent a numeric value, and in fact I believe this is now the documented description of the function in ECMAScript?


gNaN's Not a Number


I tried replacing == with === and it gave me bool(false) in all cases.

Here's the code: http://3v4l.org/15hr7


Same goes for the `0E` prefix with an uppercase E

The likelihood of generating a hash value with that kind of prefix is 2 in 65536.

Finding a collision `hash(a) == hash(b)` with this "weak" equality comparison is approximately 1 in 256 if I'm not mistaken.


You are mistaken: my guess is that you are taking the square root because of the birthday paradox, but that is incorrect, and the birthday paradox does not apply here anyway.

The probability of generating a hash with the right prefix is 10 in 16^3, or about 0.25%. Finding a 0e... == 0e... collision has probability ~6e-6, if both inputs are random. The chance that two hashes collide in this way given N random inputs is 1-(1-p)^(N-1), for N>0.


> The likelihood of generating a hash value with that kind of prefix is 2 in 65536.

The prefix is not sufficient though, the suffix must be entirely decimal otherwise it's not a valid number in scientific notation.


The prefix is sufficient. Any hash matching /0e[0-9].*/ works.


As far as I can see the prefix is not sufficient, a single non-digit character in the tail fails the conversion (and the equality check): http://3v4l.org/ctASF (vs http://3v4l.org/5FvJu, exact same strings but for the last character replaced by a digit)


Which means the probability of generating a hash value of the form 0[eE][0-9]{30} is (1/128)(10/16)^30 or 5.9e-9.

It certainly reduces the strength of the hash (and MD5 shouldn't be used anymore in any case), but it is still only a roughly 6 in a billion chance that someone chooses e.g. a password whose hash happens to be exploitable in this manner.
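
The arithmetic can be reproduced with a few lines of Python (a sketch under the same assumption as above: every hex character of the digest is uniformly random). The function name `p_exploitable` and its parameters are invented for this example.

```python
def p_exploitable(prefixes: int = 2, hex_len: int = 32) -> float:
    """Chance that a random hex digest is a "0e"-style prefix plus decimal digits only.

    prefixes: how many two-character prefixes count ("0e" and "0E" -> 2).
    hex_len:  digest length in hex characters (32 for MD5).
    """
    p_prefix = prefixes / 16**2            # matching prefixes out of 256 possible
    p_tail = (10 / 16) ** (hex_len - 2)    # every remaining character is 0-9
    return p_prefix * p_tail

if __name__ == "__main__":
    print(f"{p_exploitable():.2e}")             # counting both 0e and 0E
    print(f"{p_exploitable(prefixes=1):.2e}")   # counting only lowercase 0e
```

With both prefixes counted this gives about 5.9e-9, matching the figure above. If the digest is always lowercase hex, as in the md5sum output earlier in the thread, only `0e` can occur and the figure halves.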


From the manual:

  > The value is given by the initial portion of the string.
  > If the string starts with valid numeric data, this will
  > be the value used. Otherwise, the value will be 0 (zero).
  > Valid numeric data is an optional sign, followed by one
  > or more digits (optionally containing a decimal point),
  > followed by an optional exponent. The exponent is an 'e'
  > or 'E' followed by one or more digits.
Also:

  > If you compare a number with a string or the comparison
  > involves numerical strings, then each string is converted
  > to a number and the comparison performed numerically.
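
Read literally, the two quotes describe two different things: taking the value of a string's leading portion, and comparing two strings that are numeric as a whole. A rough Python sketch of that reading follows (an approximation of the manual's wording, not PHP's actual parser; the regex and the names `leading_value`, `is_numeric_string` and `loose_str_eq` are assumptions made for this example).

```python
import re

# Optional sign, digits with an optional decimal point, optional exponent.
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")

def leading_value(s: str) -> float:
    """First quote: value of the initial numeric portion, or 0 if there is none."""
    m = NUMERIC.match(s)
    return float(m.group(0)) if m else 0.0

def is_numeric_string(s: str) -> bool:
    """True only if the entire string is numeric data."""
    return NUMERIC.fullmatch(s) is not None

def loose_str_eq(a: str, b: str) -> bool:
    """Second quote: two numeric strings are compared as numbers."""
    if is_numeric_string(a) and is_numeric_string(b):
        return float(a) == float(b)
    return a == b                      # otherwise an ordinary string comparison

if __name__ == "__main__":
    print(leading_value("10th_grade"), leading_value("Hello"))
    print(loose_str_eq("0e1", "0e2"), loose_str_eq("0e1a", "0e2a"))
```

In this model `"0e1"` and `"0e2"` compare equal because both are zero as numbers, while a single non-digit in the tail makes the strings non-numeric and the comparison falls back to plain string equality, which fits the 3v4l.org results linked above.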


You're right:

    % php -r 'var_dump("0e1" == "0e2");'
    bool(true)


I don't think I'd use the word "magic" here, because it implies that == works when, by any reasonable standard, it does not.


Type coercion is fine so long as you recognize it as the syntactic sugar that it is. JS and PHP support easy type coercion because HTTP is string-only and it would be a pain in the ass to explicitly cast every value you get over the wire. You just have to be sure that, when you use it, you do so intentionally and not out of laziness.


Here's what I took away:

It's a pain in the ass to validate and sanitize your input.


So, you should use the old shell trick of adding an "X" to the front of the strings before comparing?


Or use === instead of ==.

The PHP developers have been pretty honest about the mistakes they made early on because they didn't know better. Unfortunately, many of those mistakes persist. The difference between == and === is one of the more well-known mistakes.


PHP's type coercion is like nothing I have ever seen in any other language. It's horrendously messy, ugly and completely inexcusable. Strings type-cast to integers are 0. Seriously? Take a look at this:

    $arr = array(0, "was", "invented", "in", "india");
    var_dump(in_array("Hello", $arr));

And yeah, it is TRUE because "Hello" got coerced to 0. I blogged about a major bug I faced in PHP, where the column name "10th_grade" was being type-cast to "10", causing "bindParam" to fail [1]. Even if they have to continue this "feature" because of backwards compatibility, the least they could have done was NOT to use it in the newer functions, but no, even those have this stupid "type juggling".

[1]: http://coffeecoder.net/blog/my-perfect-reason-avoid-php-type...


There are a couple of things we have learnt in our collective 50+ years of software engineering:

1. Code is not English: Nice try COBOL, and someone had to try, but a failed experiment. Bizarre holdouts: SQL

2. People are not idiots, and will not collapse into a gibbering heap if their programming language insists that 0 and "0" are different things and must be managed accordingly. Bizarre holdouts: PHP, Javascript. Honourable mention: Excel (no Excel, that is not a f&@cking date, I will tell you if I want a date).


> 2. People are not idiots, and will not collapse into a gibbering heap if their programming language insists that 0 and "0" are different things and must be managed accordingly.

This. People are not idiots, they're learning. By making your language assume the programmer is an idiot you're making it more difficult for said programmer to form a coherent mental model of what's going on.


> Bizarre holdouts: SQL

I think SQL is actually one of the better implementations of this idea. It's a bit verbose, but I don't think it's tripped up people in the same way that PHP and JS do.


SQL is great for a very specific job: talking to a database. If you try to do anything else in it, you end up in a horrible mess (e.g. cursors).

Luckily, people rarely try to do anything difficult in SQL, because they are using another language and dropping into SQL to talk to their database. This can lead to inefficient code, depending on the API/SQL engine, but it means people end up with sane code (unless their other language is PHP, of course.)


Absolutely, to be fair to JS, Eich admitted it was a horrible mistake, and tools like JSLint enforce the use of ===.

I haven't seen any mea culpa from the PHP team yet. Would like to read about it.


Yes. To be strictly fair, both JS and PHP have legitimate excuses; JS because it was done in an insanely short timescale, PHP because it was (initially at least) cobbled together by an amateur for his own purposes. I doubt anyone could have predicted that both languages between them would basically be running the planet by 2015 :)


They're sorry you're such a terrible coder, worse than Rasmus Lerdorf himself:

"For all the folks getting excited about my quotes. Here is another - Yes, I am a terrible coder, but I am probably still better than you :)" - http://en.wikiquote.org/wiki/Rasmus_Lerdorf


> in_array(.., .., $strict)

I think you're aware of the third parameter but for anyone who reads this post, it disables the type coercion of the in_array call.


Spoiling a good rant with facts.


The fact that you need to specify a third, optional parameter to get sane output out of a really basic function is still pretty rant-worthy.


But beware, with strict comparison: 1 ≠ 1.0 (because int ≠ float).


But... wouldn't you expect that?

I mean, they AREN'T the same value really.


Agree, they aren't the same type to begin with.


    PHP's type coercion is like nothing I have
    ever seen in any other language. It's
    horrendously messy, ugly and completely
    inexcusable.
Is it objectively worse than type coercion in JavaScript?


Oh yes. In Javascript, the operands are only coerced if one of the operands is a number. So when comparing two strings (regardless of whether the strings can be interpreted as a number), you always get a regular string compare.

  "12" == "12.0" -> false, basic string compare
Furthermore, if one operand happens to be a number, and the other operand has illegal characters to be interpreted as a number, the two operands aren't equal.

  0 == "foo" -> false, "foo" is not a valid number
  12 == "12 monkeys" -> false, "12 monkeys" is not a valid number
  12 == "12.0" -> true, "12.0" is a valid number, and compares equal to 12.
In short, in JavaScript the == operator actually makes sense. In PHP, every single one of the above examples would evaluate to true.


While it is obvious that PHP's == operator is horrible, JavaScript has its share of pretty bad issues, like "x" - 1 giving NaN.

What I don't understand is why some people agree that PHP is a horrible language, while at the same time praising JavaScript as messiah of scripting. These two languages don't just have problems, they have very similar problems. Moreover, they gained popularity for very similar reasons (lack of choice).

Seriously, if you posted something similar to OP about JavaScript the first thing people would tell you is "What, you're still not using ===?!"


PHP: var_export(0 == "hello"); // true

JavaScript: console.log(0 == "hello"); // false


Actually, I was hoping for something more than a single example.

Or, did you mean that PHP and JavaScript were neck-and-neck all the way up to that one example, and ultimately it's the very one that proves PHP's type coercion is worse?


You're in a thread about how PHP's type coercion can easily cause a serious vulnerability. So, the title of this thread is your second example. If you want a third example, find it yourself.


Give me a break. 3 examples isn't enough to answer the question. Your comment history shows you ask questions in lieu of doing your own research. If you don't want to take the time, then move on.


Amusingly enough, the current Puppet parser does this too because it has some weird form of type juggling. This has been fixed in the future parser, which actually has a type system too :).


This is a well-known PHP trick. Use === to get the right result.

  php > var_dump(md5('240610708') == md5('QNKCDZO'));
  bool(true)
  php > var_dump(md5('240610708'), md5('QNKCDZO'));
  string(32) "0e462097431906509019562988736854"
  string(32) "0e830400451993494058024219903391"
  php > var_dump(md5('240610708') === md5('QNKCDZO'));
  bool(false)
  php > var_dump("0e462097431906509019562988736854" == "0e830400451993494058024219903391");
  bool(true)
  php > var_dump("0e462097431906509019562988736854" === "0e830400451993494058024219903391");
  bool(false)


> This is a well-known PHP trick. Use === to get the right result.

Everybody knows PHP is a trickly-typed language. Read the docs people or PHP will take advantage of your gullible ass.


Perhaps the ==== operator should be reserved.


Absolutely! However, we must be careful not to define it in a too predictable way lest we violate the Principle of Most Surprise.


php_real_equivalence_4()


Exactly!

But it must be invoked with an additional NULL parameter to achieve the real effect, and the return value must be checked for TRUE, FALSE and NULL:

  php_real_equivalence_4($x, $y, null);


Except they had to call it php_real_equivalnce_4() because php_real_equivalence_4() was taken.


PHP's == has a lot of oddball effects. They were put in so that things would behave the way a novice expects them to (3 == '3') but would confuse more experienced programmers, or those coming from other languages.

Unless you're deliberately taking advantage of automatic type conversion and whatnot, you should probably use === by default.


> They were put in so that things would behave the way a novice expects them to (3 == '3')

It's a very wrong approach. It may look newbie-friendly, but in fact it makes the language much harder to learn and use. Any novice will be constantly attempting to form a mental model of what's going on and how the language interprets concepts. Refusing to do things like 3 == '3' is simple and makes sense. Assuming a programmer is an idiot and trying to outguess his mistakes makes the language so complicated that the novice will not be able to form a coherent model and will most likely assume that "this thing is magic".


It's hard for newbies who want to master the language. It's not hard for people who have no interest in learning a programming language and just want to make the thingy in their HTML do some stuff.

Register globals,

    <?php
        if ($category == 2) {
            echo 'Foo';
        }
    ?>
and be done.

We have to remember the PHP origins and audience from way back to understand why this was considered easy to use.


That's actually interesting. It's not obvious to me that "2" should be parsed as an int and not a string. Perhaps we should either be explicit about what we want "2" to be parsed as (int, long, float, double, bigint, bigfloat, string...) or let the parsing of a number be determined in a more dynamic way. If you're comparing a string with an integer literal, then you probably want the string interpretation of the literal, right?

Not that this is particularly important, I guess.


We are pretty sure what the literals mean. On the other hand we have many string channels: get/post/cookie/persistent storage¹/… Given that environment it's probably natural that you try to convert a string into its "intended" type.

¹no DB, but the "just write your visitor counter into a plain text file" back then


>We are pretty sure what the literals mean.

News to me. You have to enter a really high-precision number as a string in Java so it won't be rounded off to fit within a double. This is an unsolved problem.


shhhhh, people don't realize PHP started out as just a tool for Rasmus and ended up evolving. No, to them, PHP was DESIGNED this way on purpose from the ground up.


Do you consider that an acceptable excuse for its behaviors fifteen years after the fact? Because I do not.


Of course not, but everyone keeps comparing PHP to languages that were designed and developed to be languages, not a toolset that some crappy developer (his own words) created for his personal site that ended up evolving and becoming a real language.

It's got quirks, we get it. Let's keep improving the language as we go instead of constantly bashing it. I mean PHP is one of the most widely used languages on the web today.. Clearly it's doing something right.


People compare PHP to other languages regularly used in 2015 for web development. In that light, it compares very poorly.

McDonald's is super popular, too, and deserves even more of a ration of shit than they get for feeding people slop.


Designers of future languages, please take this example as a proof of the rule: don't design anything for newbies. They will find a way to make an error anyway, but dumbs-based design will be the problem for everyone else.


That - and the error is going to be a lot more subtle and harder to find.

In all fairness though, it's a balancing act - There are benefits to dynamic typing, but PHP clearly overdid it. (See also the disaster that was/is magic quotes)


I believe you are confusing dynamic with weak typing. Other languages got dynamic typing quite right.


Right - that should have read weak typing.


And in C we can do this to get TRUE:

    return (33 == '3');
:P


Incorrect. However, (0x33 == '3') will return true, as will (51 == '3'). Your point is valid, even if your code is wrong. Automatic type coercion can produce unexpected results in any language.

PHP's automatic type coercion rules are designed to help newbies at the expense of experienced developers. C's automatic type coercion rules are, largely, designed to expose the underlying memory layout to developers who know what they're doing, at the expense of inexperienced developers. Both can easily contain dangerous pitfalls, but I prefer the latter philosophy over the former.

(Disclaimer: I have built a career as a C programmer and frequently use its lower-level features to great advantage. I am biased.)


And there are no type-coercion rules at play in case of 51 == '3', because type of '3' is int (as per ISO 9899 p. 6.4.4.3.2).


Excellent point. Thank you for the clarification. This is true in c99 as well.


Okay, I stand completely corrected.


> you should probably use === by default.

Unfortunately this can also backfire if your class/module is used in a different context where it gets strings instead of integers and you were just using === without really thinking about it.

We had a case where the code was something like:

  function doSomething($value) {
    if ($value === 0) {
      //do something
    } else {
      //do something else
    }
  }
This was then used in a slightly different context where $value was the string '0', so it ended up incorrectly in the //do something else block, doing the completely wrong thing. In this case the type-coercing == would have been better. I think the developer was expecting a type error due to the ===, but it's not a type error, it just falls into the else block.


You're describing the expected behavior of === and a bug.

This is not the === operator "backfiring."


Absolutely, I was just suggesting that the "you should always use the === operator" advice, which I see a lot of people give (multiple times in this thread, for example), does not guarantee you won't run into problems with incorrect types, and giving an explanation.

As always, you should be thinking when programming.


As a rough generalization, all PHP code that involves "==" and "!=" should be considered broken.

PHP introduced "===" and "!==" a long time ago, and every programmer should know that they have to use that, without any excuses.

Also, don't use "in_array($a, $b)", but use "in_array($a, $b, true)" instead.


>without any excuses...

Oh yea? How about this,

http://www.reddit.com/r/PHP/comments/2zhg6z/how_true_is_this...


I don't see how "==" would help in that situation, other than "solving" this particular issue by opening another can of worms.

You simply can't use php arrays for user-generated keys in a safe manner. At the very least you have to add some prefix like '_stuff_' to all keys, to avoid accidental conversions. And yes, this "proper" solution (Can you ever say "proper" in php? Anyway ...) doesn't have to involve "==", but works perfectly (and preferably) with "===".


So what you're basically saying is that the "standard" variations and APIs which people will find and use are broken, and the ones actually working are hidden somewhere in the documentation. And you're saying you think this is just fine?

In that case, I have a hammer to sell you, and I think you know which one.

http://blog.codinghorror.com/the-php-singularity/


> And you're saying you think this is just fine?

Not sure where you read this. I didn't provide any judgement of the situation.

Strawman arguments like this should have no place on HN.


Reminds me of bash, where I also have to prefix values to compare with x, to be able to handle empty vars.

    if [ x$1 == x$2 ];
But automatic string to float conversion is just crazy, esp. in comparison context. Perl, which is equally soft, has at least numerical and string comparison operators.

    $ perl -e'print "0e462097431906509019562988736854" ==
                    "0e830400451993494058024219903391"'
    1
    $ perl -e'print "0e462097431906509019562988736854" eq
                    "0e830400451993494058024219903391"'
So the solution is to use ===, which compares the strings' values (not references) without any numeric conversion, or the strcmp function, and to refrain from using == with strings at all. '0XAB' == '0xab' is true. Comparing any non-numeric string to 0 with == will return true.


I'm not sure why this crazy "x" prefix tale still continues. You can simply quote them instead. Especially if you use bash and not some other sh-compatible shell:

    if [ "$1" == "$2" ];
will work just fine.

If you need full sh compatibility, the test should be on "x$1" anyway (still quoted).


I think you meant “=”, not “==” (though the latter would work with bash).


Well, either works in the example; the parent was talking about bash.

For sh version, I'd go with super-safe:

    if test "x$1" = "x$2"


If you're doing that, even better to use "x${1}" to be safer. Also, conditional expressions ( [[ instead of [ or `test`) are generally a bit more well-behaved. See http://wiki.bash-hackers.org/syntax/ccmd/conditional_express... for more info.


But [[ is a bashism - it won't work on bare sh.


Actually, you don't prefix with “x” to handle empty vars, but to handle special characters, as Stephane Chazelas recently reminded us: http://www.zsh.org/mla/workers/2015/msg00797.html


Again here conditional expressions should make this a non-issue ( [[ instead of [ ) since the stuff inside doesn't get parsed the same as general input. See http://wiki.bash-hackers.org/syntax/ccmd/conditional_express...


Yes, but then you need either bash or zsh. It won't work on bare sh (or on dash, which is the default /bin/sh on Debian and derivatives like Ubuntu).




(Shameless plug) http://blog.hackensplat.com/2012/04/php-some-strings-are-mor...

At which point in this article do I start making stuff up about PHP's comparison operators?


> If you want to compare two strings that are the same except they each use different ways of expressing an 'é', you need to add another equal sign and use ==== to differentiate them, as === will see them as equal.


The fact that PHP is a dynamic language and that "==" automatically converts both operands to a float because of the "0e" prefix of the string is problematic. Perhaps it's a bug in the PHP source code.

See below.

		# the examples are essentially the same as this comparison.
		php > var_dump("0e462097431906509019562988736854" == "0e830400451993494058024219903391");
		bool(true)

		# md5() does return a string type, but just happens to start with "0e"
		php > var_dump(md5('240610708'));
		string(32) "0e462097431906509019562988736854"
		php > var_dump(md5('QNKCDZO'));
		string(32) "0e830400451993494058024219903391"

		# and if PHP treats them as floats instead of strings, they all evaluate to the same thing: float(0)
		php > var_dump(0e462097431906509019562988736854);
		float(0)
		php > var_dump(0e830400451993494058024219903391);
		float(0)
		php > var_dump(0e087386482136013740957780965295);
		float(0)


One thing to note.

The md5 and sha1 interfaces have a second param which prevents this bug.

Instead of returning a string it will return binary data which won't get coerced to a float.

For example:

    <?php
    if (md5('240610708', true) == md5('QNKCDZO', true)){
        printf("Will never go here\n");
    }
PHP has a lot of.....PHPisms.


There's no "binary data" type. Raw hash output can certainly start with bytes matching "0e" or "0E", it's just a lot more rare.


Just to make it clear, I did not come up with this example. Unfortunately I can't find the source anymore. It also contained some technical explanations about why this works. So if anyone remembers, I'd be happy if you could comment with the link.



Author of the original tweet here, thanks for sharing! Here's the link to the "original original" MD5 tweet https://twitter.com/spazef0rze/status/439352552443084800

For similar tricks for SHA-1 and plaintext see https://twitter.com/spazef0rze/status/523010190900469760


Nothing magic here. Be careful with the == comparison operator and its type juggling. If you want to match things precisely, use the === operator. Loose comparisons can have dangerous side-effects!

The MD5 examples are really just cloaked comparisons like this one, later in the list:

var_dump('0010e2' == '1e3');

((10 * 10^2) == (1 * 10^3))
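
The same arithmetic can be checked outside PHP. A minimal sketch in Python, whose float() also reads scientific notation; it is only a stand-in for the numeric parsing, not PHP's actual implementation:

```python
# float() reads both strings as mantissa * 10**exponent.
a = float("0010e2")   # 10 * 10**2
b = float("1e3")      # 1 * 10**3
print(a, b, a == b)   # 1000.0 1000.0 True

# Compared as plain text, the two strings are of course different.
print("0010e2" == "1e3")  # False
```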

http://php.net/manual/en/language.operators.comparison.php

http://php.net/manual/en/types.comparisons.php


All of this I can understand, but why octal numbers are not compared the same way is beyond me.

    var_dump(0xA == '0xA'); // bool(true)
    var_dump(012 == '012'); // bool(false)


Check out which types those examples get cast to and it should make more sense :) I don't know the exact rules for type detection in PHP, but it looks like that's the cause.
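
To spell that out: as far as I know, PHP 5 recognised hexadecimal strings such as '0xA' as numeric in comparisons, but never gave a string with a leading zero an octal meaning, so '012' is read as decimal 12 while the literal 012 is octal 10. (PHP 7 later stopped treating hex strings as numeric.) Those string rules are my assumption about PHP 5, not something taken from its source. The same arithmetic in Python, used here only as a neutral calculator:

```python
# Literal values as the PHP parser would see them in source code.
hex_literal = 0xA      # 10
octal_literal = 0o12   # PHP spells this 012; the value is 10

# Assumed PHP 5 string-to-number rules: hex prefix honoured,
# leading zero ignored (the string is read as decimal).
hex_string_value = int("0xA", 16)    # 10
octal_string_value = int("012", 10)  # 12

print(hex_literal == hex_string_value)      # True
print(octal_literal == octal_string_value)  # False
```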


I looked at this and said "oh well, at least hhvm is consistent".



While I do believe that it is possible to write great Apps with PHP I tend to stay away from it because it is not statically typed. For quick and dirty proof of concept it is nice though (IMHO).


PHP : 1 week is not always 7 days:

  $_1week = new DateInterval("P1W");
  $_7days = new DateInterval("P7D");
  var_dump($_1week == $_7days); // true
  var_dump($_1week);
  var_dump($_1week == $_7days); // false
  var_dump($_7days);
  var_dump($_1week == $_7days); // true
http://3v4l.org/CcAk8

Same result with '$_1week = new DateInterval("P7D");' :-)


I agree that all languages have their warts and a good programmer should know about them.

I think what makes both PHP and Javascript not so great is the fact that it is so easy to overlook deadly mistakes like using "==" instead of "===" or forgetting to add a "var". Worst of all, those errors can go unnoticed until something breaks, and when it does it's pretty hard to find the root of the problem.


sigh.. yes, == can be weird. get over it. any php dev worth anything knows to use ===


It's appalling to think that there are 'developers' who are still only now realising PHP's horrendous nature.


OK, how did this happen?


PHP's `==` tries very hard (even harder than javascript's) to "please" the user. That means that if it can, it will fall back to converting both sides to numbers and comparing those.

Here all hashes are of the form "0e{digits}" which is a valid scientific notation, so when `==` internally converts them to numbers they're all parsed to `float(0)` and therefore equal, success!
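
A rough model of that behaviour in Python, to make the mechanism concrete. This is a sketch under stated assumptions, not PHP's real algorithm: the regular expression only approximates what PHP accepts as a numeric string, `loose_eq` is a hypothetical name, and everything numeric is compared as a float, whereas PHP also has an integer path.

```python
import re

# Approximation of a numeric string: optional sign, digits with an
# optional fraction, optional exponent.
NUMERIC = re.compile(r"[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def loose_eq(a: str, b: str) -> bool:
    """Model of a loose string == string comparison."""
    if NUMERIC.fullmatch(a) and NUMERIC.fullmatch(b):
        # Both look numeric: compare as numbers, not as text.
        return float(a) == float(b)
    return a == b


h1 = "0e462097431906509019562988736854"  # md5('240610708')
h2 = "0e830400451993494058024219903391"  # md5('QNKCDZO')

print(float(h1), float(h2))    # 0.0 0.0
print(loose_eq(h1, h2))        # True
print(h1 == h2)                # False (strict, like ===)
print(loose_eq("abc", "abd"))  # False: non-numeric strings compare as text
```

Zero times ten to any power is still zero, so the exponent digits never matter.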


How about ==== and ===== and ======?

For security reasons, I suggest that PHP implement such operators... :D Example:

"abc" === 'abc'; # ==> true

"abc" ==== 'abc'; # ==> false, single-quote vs double-quote

"abc" ===== 'abc'; # ==> true, this is how it works

j.k :D


Here's a threaded app for finding such collisions: https://github.com/beched/php_hash_collision_finder


slightly off topic:

  $a = "DjBlYVWap4fQC8b3C73+NATPA2We"."c"."E+FNMAP+2WcTIdAzJQv6y2hFaP0F"."V"."y7hgdJc4ZlbX0fNKQgWdePWo3R7w";
  $b = "DjBlYVWap4fQC8b3C73+NATPA2We"."d"."E+FNMAP+2WcTIdAzJQv6y2hFaP0F"."d"."y7hgdJc4ZlbX0fNKQgWdePWo3R7w";
  var_dump($a === $b); // false
  var_dump(md5(base64_decode($a)) === md5(base64_decode($b))); // true
:-P


usual story

== is not the same as ===


Not the usual story: == should be deprecated and a warning should be displayed. PHP has explicit coercion features; devs should use them along with ===.


I mean it's the usual story when people post stuff about PHP comparisons.

http://php.net/md5

The example itself uses ===, although it gives no explanation of why.


There is also a weird casting when a string starts with a digit: var_dump(10 == '10xyz');
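
What happens there, as far as I understand PHP 5 and 7 (PHP 8 changed these rules): when a number is compared with a string, the string is converted to a number using whatever numeric prefix it has, or 0 if it has none. A rough Python model, with a hypothetical helper name and a regular expression that only approximates PHP's parser:

```python
import re

# Leading whitespace, optional sign, digits with optional fraction,
# optional exponent. Anything after that prefix is ignored.
LEADING_NUMBER = re.compile(
    r"\s*[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?"
)


def php_style_number(s: str) -> float:
    """Value of the leading numeric prefix of s, or 0 if there is none."""
    m = LEADING_NUMBER.match(s)
    return float(m.group(0)) if m else 0.0


print(10 == php_style_number("10xyz"))  # True
print(0 == php_style_number("hello"))   # True
print(10 == php_style_number("xyz10"))  # False
```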


Another annoying thing about PHP is that it keeps emitting warning-related INFO messages on the webpage for visitors to see, even when you have proper try{}catch{} error handling. Then you have to use set_error_handler to suppress the unwanted messages.


  php > var_dump("hello" == 0);
  bool(true)
  php >


This is why Strong Typing is so important!


So this is a PHP fail. But all the same, MD5 has been shown to fail collision resistance several times now.


The problem is this is an issue with any comparison of hex digit strings. It's a possible issue with any hash function not just MD5.


Yes md5 is broken. We've known this for quite some time.


The problem is with PHP, not MD5


This has nothing to do with md5 itself.


it's not _that_ broken



