All of them start with 0e, which makes me think that they're being parsed as floats and getting converted to 0.0. This is why "magic" operators like == in PHP and JavaScript never should have existed in the first place. Operators like == should be, by default, extremely boring. PHP's just happens to be a bit more magical even than JavaScript's.
Once I wrote a little PHP application to manage a clan in a browser game. I used an MD5 hash as the session ID, which I checked with
if(session_id)
When users started reporting that their logins would sometimes not work the first time, I found out that strings that start with zero are coerced to 0 and then interpreted as false.
To be fair, this kind of thing (maybe not exactly this, but type-coercion bugs) can happen in JavaScript, which is all the rage now for "important" stuff.
This is levels worse than what JavaScript does though. Most high-level languages have some sort of implicit coercion (even Python lets you do truth tests on non-boolean values). The problem here is that the programmer isn't confused about types at all. They're comparing two things of the same type: two strings! Nevertheless, given two strings, PHP tries to coerce them into numbers before carrying out the equality test. Yes, you will have coercion bugs in other languages if you're testing things of different types, but I don't know any other language where an equality test between two things of the same type automatically coerces them into another type.
It can happen in a few languages, but PHP is notably more aggressive in trying to convert to int.
Actually a common way to grief new websites is to try to register '0' as a username. `if (string)` is a common way to check for null, and '0' will often fail.
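The '0' username problem is easy to sketch in Python. The helper `php_truthy` below is a hypothetical model of PHP's boolean conversion for strings (null, "" and "0" are falsy); it is not PHP itself, and `username_was_provided` is just an illustrative name for the explicit check you probably wanted.

```python
from typing import Optional


def php_truthy(value: Optional[str]) -> bool:
    """Hypothetical model of PHP's `if ($string)`: null, "" and "0" are falsy."""
    return value is not None and value not in ("", "0")


def username_was_provided(username: Optional[str]) -> bool:
    """Explicit check: only a missing or empty value counts as 'not provided'."""
    return username is not None and username != ""


# The truthiness shortcut rejects a perfectly valid username...
print(php_truthy("0"))             # False
# ...while the explicit check accepts it.
print(username_was_provided("0"))  # True
```

The point of the explicit version is that "missing" and "looks like zero" are kept as separate questions.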
Yeah, but JavaScript has 'use strict', whereas PHP decided that the easter egg "looks like you're using the wrong language!" was more important than actually allowing a 'use strict' to force === instead of ==.
While that's true, JavaScript is still horribly error-prone because of this. The suggestion that JS would be a much better language if the == operator worked more like === in the first place is very reasonable.
But to widely varying degrees. This kind of problem is a direct consequence of having a relatively weak and dynamic type system (or other semantics that mean you might as well have).
Plenty of people have warned about this kind of danger for a very long time. However, there seems to be a significant subset of the web development community that only has experience with languages like JS and PHP and to a lesser extent other dynamic languages like Ruby and Python, who simply fail to realise how many of these bugs should have been entirely prevented by using better tools by now. The usual counter seems to be something about unit tests, at which point anyone following the discussion who actually knows anything about type systems and the wider world of programming languages dies a little inside.
It is entirely fair to criticise bad tools for being bad, particularly in specific ways and with clearly identified problems that can result as in this case. It's bad enough that we are stuck with JS for front-end web development these days, but there aren't many good arguments for using something as bad as PHP on the back-end in 2015.
The hash function is completely irrelevant to this bug - whether you use a hash that returns 0 for every input, or invent a hash function that returns a perfectly unique and unpredictable hash for all inputs, PHP will still shoot you in the foot.
If you don't like "keep it simple stupid" and determinism in your language of choice (much less immutability)... you're basically everything wrong with programming in the year 2015
You buy a new car. You take it out for a ride. A tree falls in front of you. You brake, but the car hits the tree anyway.
You call the car company and talk to their engineers. One of them asks, 'Did this happen on a Friday evening, when it was raining?' You say, 'Yes, how do you know?'
The engineer replies:
"Our brakes do not work on rainy Friday evenings. If you REALLY want to brake on a rainy Friday evening, you should also pull the lever under the dashboard that is normally used to open the hood. It is very clearly printed in our manual. Didn't you read it? Our car is not the problem. You are the problem."
You are enlightened. You come back home. You never take the car out on rainy Friday evenings. When somebody asks about the car, you say, "Yeah, it is a great car. But you've got to know how to use it."
You take great pride in knowing how to drive this car, which can easily kill someone who hasn't read the manual. When you hear that someone got killed while driving this car, you simply say, 'That car is OK, but you should really know how to drive it. Sadly, this guy didn't. He was the problem, the car ain't.'
That means that someone was using this "feature" in a relatively core piece of code from the PHP ecosystem. Enough that HHVM felt they needed to support it.
There is some nasty type conversion going on here, from the type of stochastic random throws of two nine-sided dice to floats to integers. Where is your type preservation, PHP?
Given how happy PHP is about converting strings to integers on demand, it would be pretty easy to take string input intended to be a number, forget to actually convert it, and go around using it happily until one day you accidentally set off a bomb.
I've never seen one, but somewhere there must surely be a PHP version of the infamous 'WAT' talk about JavaScript, full of examples like this and the "2d9"->"2e0"->3 example mentioned by lars.
Don't let it - he understands exactly how the types are being converted in order to make it appear that true == false.
This sort of thing happens in type conversion languages. You can either use === to stop conversion or you can understand how conversion works.
I'm not sure how the order of conversions is decided by PHP, but here's a brief explanation:
Compare "foo" to true. Convert the string "foo" to a boolean value. As it is desirable that a non-empty string evaluate to true, we will say they are "equal."
Compare "foo" to 0. Convert the string "foo" to a numeric value. As "foo" does not start with 0x it cannot be hex, and as it does not start with 0 it cannot be octal, so evaluate it as decimal - there are no numbers before the first letter so the string foo, when numerical, is 0.
Evaluate 0 to false. Well, that's just binary now isn't it? Of course false and 0 are equal!
The moral of the story, == is not "exactly equal" it is "relatively equal."
The problem with designing a language that does these sorts of implicit type conversions is that the "equality" operator violates the fundamental properties of equality. Since grade school mathematics we are all taught that equality is symmetric and transitive, and PHP's == operator is neither.
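The true == "foo" == 0 == false chain can be reproduced in a toy model. The Python sketch below is only an approximation of PHP 5/7 loose comparison for bool, number and non-numeric string operands; `to_number`, `to_bool` and `loose_eq` are made-up names, and comparing two numeric strings with each other is deliberately left out.

```python
import re

# Leading numeric prefix: optional sign, digits with an optional decimal
# point, optional exponent. Anything after the prefix is ignored.
_NUM_PREFIX = re.compile(r"\s*[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def to_number(s: str) -> float:
    """Model of string-to-number conversion: leading numeric prefix, else 0."""
    m = _NUM_PREFIX.match(s)
    return float(m.group(0)) if m else 0.0


def to_bool(v) -> bool:
    """Model of boolean conversion: "" and "0" are the only falsy strings."""
    if isinstance(v, str):
        return v not in ("", "0")
    return bool(v)


def loose_eq(a, b) -> bool:
    """Toy model of `==` (assumed PHP 5/7 rules, not the real implementation)."""
    if isinstance(a, bool) or isinstance(b, bool):
        return to_bool(a) == to_bool(b)      # a bool on either side wins
    if isinstance(a, str) and not isinstance(b, str):
        return to_number(a) == b             # string vs number: go numeric
    if isinstance(b, str) and not isinstance(a, str):
        return a == to_number(b)
    return a == b


print(loose_eq(True, "foo"))   # True
print(loose_eq("foo", 0))      # True
print(loose_eq(0, False))      # True
print(loose_eq(True, False))   # False, so the relation is not transitive
```

Each individual step looks defensible; it is only when you chain them that the result stops behaving like equality.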
You can either use === to stop conversion or you can understand how conversion works.
It has been my experience that, to a first approximation, no-one fully understands how conversion works in such languages to the point of never getting it wrong in practice.
Of course we didn't know that would be the case when some of these languages were first created, but I think it is a compelling argument for making an actual-equality == operator the default in any new programming language design. There are enough plausible differences because of things like reference vs. value semantics already, without breaking basic intuitions about what comparisons mean as well.
> This sort of thing happens in type conversion languages. You can either use === to stop conversion or you can understand how conversion works.
You must admit that this is a lot of behavior to keep in mind.
E.g., there is no pattern like "try converting the value on the right to the type of the value on the left".
> Compare "foo" to 0. Convert the string "foo" to a numeric value.
I would expect this to convert 0 to "0" and fail. I suppose it's done this way because there's no way to represent a hexadecimal number except as a string.
> The moral of the story, == is not "exactly equal" it is "relatively equal."
The moral of the story for me would be "never use ==", if I were using PHP. I don't want to think about so many rules when trying to do a simple comparison.
FWIW, Ruby allows type conversion, but it generally must be explicit: `5 == "5"` is false; you must either do `5.to_s` or `"5".to_i` to compare, therefore nothing unexpected can happen. `if some_var` does "convert" to a boolean, but the rule is "nil and false are falsey, everything else is truthy", so again, not much to remember.
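Python takes the same "explicit or nothing" approach as Ruby here, which makes for a quick illustration. In this small sketch `parse_quantity` is a hypothetical helper name; `int()` is the standard built-in and raises `ValueError` on input it cannot parse completely.

```python
def parse_quantity(raw: str) -> int:
    """Convert user input to an int, rejecting anything that is not a whole number."""
    return int(raw)  # raises ValueError for input such as "12 monkeys"


print(5 == "5")                  # False: no implicit conversion
print(5 == parse_quantity("5"))  # True: the conversion is explicit

try:
    parse_quantity("12 monkeys")
except ValueError as exc:
    print("rejected:", exc)
```

Bad input fails loudly at the conversion step instead of quietly turning into 12 (or 0) somewhere downstream.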
"Hard to mess up" is better than "easy to mess up", even if it's possible to avoid the mistake.
Sort of. 'NaN' is one of the IEEE 754 floating point constants, along with 'Inf' for infinity. They are numeric types, in that they can be returned via operations on numbers, such as dividing zero by zero or adding '-Inf' to 'Inf'. See https://en.wikipedia.org/wiki/NaN
I always understood that the 'isNaN()' function was required to check whether a numeric variable is 'NaN', since normal equality cannot be used: IEEE 754 defines 'NaN' as comparing unequal to every value, including itself, and there are also multiple valid bitwise representations of 'NaN' in the standard (it is a float with an exponent of all ones and a non-zero fraction). However, 'isNaN()' now seems to have been co-opted into being used to check if a string is not a number, i.e. does not represent a numeric value, and in fact I believe this is now the documented description of the function in ECMAScript?
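The NaN behaviour is not specific to JavaScript; any language with IEEE 754 floats shows it. A minimal Python sketch follows, where the bit pattern used for the second NaN is just one arbitrary example of a non-default payload:

```python
import math
import struct

# Adding -Inf to Inf produces NaN, as described above.
nan = float("inf") + float("-inf")

print(nan == nan)        # False: NaN is unequal to everything, itself included
print(math.isnan(nan))   # True: the dedicated check is the reliable one

# A NaN with a different bit pattern: exponent all ones, non-zero fraction.
other_nan = struct.unpack(">d", bytes.fromhex("7ff8000000000001"))[0]
print(math.isnan(other_nan))  # True
```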
You are mistaken: my guess is that you are taking the square root because of the birthday paradox, but that is incorrect, and the birthday paradox does not apply here anyway.
The probability of generating a hash with the right prefix is 10 in 16^3, or about 0.25%. Finding a 0e... == 0e... collision has probability ~6e-6, if both inputs are random. The chance that two hashes collide in this way given N random inputs is 1-(1-p)^(N-1), for N>0.
As far as I can see the prefix is not sufficient, a single non-digit character in the tail fails the conversion (and the equality check): http://3v4l.org/ctASF (vs http://3v4l.org/5FvJu, exact same strings but for the last character replaced by a digit)
Which means the probability of generating a hash value of the form 0e[0-9]{30} is (1/256)(10/16)^30, or about 2.9e-9: md5() emits lowercase hex, so each of the first two characters has a 1/16 chance of being the required one.
It certainly reduces the strength of the hash (and MD5 shouldn't be used anymore in any case), but it is still only a roughly 3 in a billion chance of someone choosing e.g. a password and it happening to be exploitable in this manner.
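For anyone who wants to check the arithmetic, here is a small Python sketch. It assumes a uniformly random 32-character lowercase hex digest, so the leading "0" and the "e" each have a 1/16 chance; counting "e" and "E" as two separate outcomes would double the result. The function name is made up for illustration.

```python
HEX_DIGITS = 16
DECIMAL_DIGITS = 10
HASH_LENGTH = 32  # hex characters in an MD5 digest


def magic_hash_probability(length: int = HASH_LENGTH) -> float:
    """Chance that a random lowercase hex digest looks like 0e<digits only>."""
    prefix = (1 / HEX_DIGITS) ** 2                        # "0" followed by "e"
    tail = (DECIMAL_DIGITS / HEX_DIGITS) ** (length - 2)  # 30 decimal digits
    return prefix * tail


p = magic_hash_probability()
print(f"{p:.1e}")  # on the order of 3 in a billion
```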
> The value is given by the initial portion of the string.
> If the string starts with valid numeric data, this will
> be the value used. Otherwise, the value will be 0 (zero).
> Valid numeric data is an optional sign, followed by one
> or more digits (optionally containing a decimal point),
> followed by an optional exponent. The exponent is an 'e'
> or 'E' followed by one or more digits.
Also:
> If you compare a number with a string or the comparison
> involves numerical strings, then each string is converted
> to a number and the comparison performed numerically.
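The two quoted rules can be turned into a rough Python model. This is a sketch under assumptions: the function names are invented, everything is compared as a float (real PHP compares integer-like strings as integers), and the exact rules differ between PHP versions.

```python
import re

# Numeric form from the quoted text: optional sign, digits with an optional
# decimal point, optional exponent.
_NUMERIC = re.compile(r"\s*[+-]?([0-9]+(\.[0-9]*)?|\.[0-9]+)([eE][+-]?[0-9]+)?")


def is_numeric_string(s: str) -> bool:
    """True if the whole string is valid numeric data."""
    return _NUMERIC.fullmatch(s) is not None


def leading_number(s: str) -> float:
    """Value of the initial numeric portion of the string, or 0 if there is none."""
    m = _NUMERIC.match(s)
    return float(m.group(0)) if m else 0.0


def loose_eq_strings(a: str, b: str) -> bool:
    """Model of string == string: numeric strings are compared as numbers."""
    if is_numeric_string(a) and is_numeric_string(b):
        return float(a) == float(b)
    return a == b


print(leading_number("12 monkeys"))        # 12.0
print(leading_number("foo"))               # 0.0
print(loose_eq_strings("1e3", "1000"))     # True
print(loose_eq_strings("0e12", "0e99"))    # True: both are zero
print(loose_eq_strings("0e12a", "0e99"))   # False: one stray letter, plain string compare
```

The last two lines are the whole magic-hash issue in miniature: the comparison only goes numeric when every character after "0e" is a digit.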
Type coercion is fine so long as you recognize it as the syntactic sugar that it is. JS and PHP support easy type coercion because HTTP is string-only and it would be a pain in the ass to explicitly cast every value you get over the wire. You just have to be sure that, when you use it, you do so intentionally and not out of laziness.
The PHP developers have been pretty honest about the mistakes they made early on because they didn't know better. Unfortunately, many of those mistakes persist. The difference between == and === is one of the more well-known mistakes.
PHP's type coercion is nothing like I have ever seen in any other language. It's horrendously messy, ugly and completely inexcusable. Non-numeric strings cast to integers are 0. Seriously? Take a look at this,
and yeah, it is TRUE because "Hello" got coerced to 0. I blogged about a major bug I faced in PHP, where the column name "10th_grade" was being cast to "10", causing "bindParam" to fail [1]. Even if they have to continue this "feature" for backwards compatibility, the least they could have done was NOT to use it in the newer functions, but no, even those have this stupid "type juggling".
There are a couple of things we have learnt in our collective 50+ years of software engineering:
1. Code is not English: Nice try COBOL, and someone had to try, but a failed experiment. Bizarre holdouts: SQL
2. People are not idiots, and will not collapse into a gibbering heap if their programming language insists that 0 and "0" are different things and must be managed accordingly. Bizarre holdouts: PHP, JavaScript. Honourable mention: Excel (no, Excel, that is not a f&@cking date, I will tell you if I want a date).
> 2. People are not idiots, and will not collapse into a gibbering heap if their programming language insists that 0 and "0" are different things and must be managed accordingly.
This. People are not idiots, they're learning. By making your language assume the programmer is an idiot, you're making it more difficult for said programmer to form a coherent mental model of what's going on.
I think SQL is actually one of the better implementations of this idea. It's a bit verbose, but I don't think it's tripped up people in the same way that PHP and JS do.
SQL is great for a very specific job: talking to a database. If you try to do anything else in it, you end up in a horrible mess (e.g. cursors).
Luckily, people rarely try to do anything difficult in SQL, because they are using another language and dropping into SQL to talk to their database. This can lead to inefficient code, depending on the API/SQL engine, but it means people end up with sane code (unless their other language is PHP, of course.)
Yes. To be strictly fair, both JS and PHP have legitimate excuses; JS because it was done in an insanely short timescale, PHP because it was (initially at least) cobbled together by an amateur for his own purposes. I doubt anyone could have predicted that both languages between them would basically be running the planet by 2015 :)
They're sorry you're such a terrible coder, worse than Rasmus Lerdorf himself:
"For all the folks getting excited about my quotes. Here is another - Yes, I am a terrible coder, but I am probably still better than you :)" - http://en.wikiquote.org/wiki/Rasmus_Lerdorf
Oh yes. In JavaScript, the operands are only coerced if one of the operands is a number. So when comparing two strings (regardless of whether the strings can be interpreted as a number), you always get a regular string compare.
"12" == "12.0" -> false, basic string compare
Furthermore, if one operand happens to be a number, and the other operand has illegal characters to be interpreted as a number, the two operands aren't equal.
0 == "foo" -> false, "foo" is not a valid number
12 == "12 monkeys" -> false, "12 monkeys" is not a valid number
12 == "12.0" -> true, "12.0" is a valid number, and compares equal to 12.
In short, in JavaScript the == operator actually makes sense. In PHP, every single one of the above examples would evaluate to true.
While it is obvious that PHP's == operator is horrible, JavaScript has its share of pretty bad issues, like "x" - 1 giving NaN.
What I don't understand is why some people agree that PHP is a horrible language, while at the same time praising JavaScript as messiah of scripting. These two languages don't just have problems, they have very similar problems. Moreover, they gained popularity for very similar reasons (lack of choice).
Seriously, if you posted something similar to OP about JavaScript the first thing people would tell you is "What, you're still not using ===?!"
Actually, I was hoping for something more than a single example.
Or, did you mean that PHP and JavaScript were neck-and-neck all the way up to that one example, and ultimately it's the very one that proves PHP's type coercion is worse?
You're in a thread about how PHP's type coercion can easily cause a serious vulnerability. So, the title of this thread is your second example. If you want a third example, find it yourself.
Give me a break. 3 examples isn't enough to answer the question. Your comment history shows you ask questions in lieu of doing your own research. If you don't want to take the time, then move on.
Amusingly enough, the current Puppet parser does this too because it has some weird form of type juggling. This has been fixed in the future parser, which actually has a type system too :).
PHP's == has a lot of oddball effects. They were put in so that things would behave the way a novice expects them to (3 == '3'), but they confuse more experienced programmers and those coming from other languages.
Unless you're deliberately taking advantage of automatic type conversion and whatnot, you should probably use === by default.
> They were put in so that things would behave the way a novice expects them to (3 == '3')
It's a very wrong approach. It may look newbie-friendly, but in fact it makes the language much harder to learn and use. Any novice will be constantly attempting to form a mental model of what's going on and how the language interprets concepts. Refusing to do things like 3 == '3' is simple and makes sense. Assuming a programmer is an idiot and trying to outguess his mistakes makes the language so complicated that the novice will not be able to form a coherent model and will most likely assume that "this thing is magic".
It's hard for newbies who want to master the language. It's not hard for people who have no interest in learning a programming language and just want to make the thingy in their HTML do some stuff.
Register globals,
<?php
if ($category == 2) {
    echo 'Foo';
}
?>
and be done.
We have to remember the PHP origins and audience from way back to understand why this was considered easy to use.
That's actually interesting. It's not obvious to me that "2" should be parsed as an int and not a string. Perhaps we should either be explicit about what we want "2" to be parsed as (int, long, float, double, bigint, bigfloat, string...) or let the parsing of a number be determined in a more dynamic way. If you're comparing a string with an integer literal, then you probably want the string interpretation of the literal, right?
We are pretty sure what the literals mean. On the other hand, we have many string channels: get/post/cookie/persistent storage¹/… Given that environment, it's probably natural that you try to convert a string into its "intended" type.
¹no DB, but the "just write your visitor counter into a plain text file" back then
News to me. You have to enter a really high-precision number as a string in Java so it won't be rounded off to fit within a double. This is an unsolved problem.
shhhhh, people don't realize PHP started out as just a tool for Rasmus and ended up evolving. No, to them, PHP was DESIGNED this way on purpose from the ground up.
Of course not, but everyone keeps comparing PHP to languages that were designed and developed to be languages, not a toolset that some crappy developer (his own words) created for his personal site that ended up evolving and becoming a real language.
It's got quirks, we get it. Let's keep improving the language as we go instead of constantly bashing it. I mean, PHP is one of the most widely used languages on the web today. Clearly it's doing something right.
Designers of future languages, please take this example as proof of the rule: don't design anything for newbies. They will find a way to make an error anyway, but a design aimed at dummies will be a problem for everyone else.
That - and the error is going to be a lot more subtle and harder to find.
In all fairness though, it's a balancing act - There are benefits to dynamic typing, but PHP clearly overdid it. (See also the disaster that was/is magic quotes)
Incorrect. However, (0x33 == '3') will return true, as will (51 == '3'). Your point is valid, even if your code is wrong. Automatic type coercion can produce unexpected results in any language.
PHP's automatic type coercion rules are designed to help newbies at the expense of experienced developers. C's automatic type coercion rules are, largely, designed to expose the underlying memory layout to developers who know what they're doing, at the expense of inexperienced developers. Both can easily contain dangerous pitfalls, but I prefer the latter philosophy over the former.
(Disclaimer: I have built a career as a C programmer and frequently use its lower-level features to great advantage. I am biased.)
Unfortunately, this can also backfire if your class/module is used in a different context where it gets strings instead of integers and you were just using === without really thinking about it.
We had a case where the code was something like:
function doSomething($value) {
    if ($value === 0) {
        //do something
    } else {
        //do something else
    }
}
This was then used in a slightly different context where $value was the string '0', so it ended up incorrectly in the //do something else block, doing completely the wrong thing. In this case the type-coerced == would have been better. I think the developer was expecting a type error because of the ===, but it's not a type error; it just falls into the else block.
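One way to get the type error the developer was hoping for is to check the type explicitly at the boundary. The sketch below shows the idea in Python rather than PHP; `do_something` is a hypothetical stand-in for the function in the anecdote, and the returned strings only mark which branch ran.

```python
def do_something(value: int) -> str:
    """Hypothetical stand-in: fail loudly on a wrong type instead of falling through."""
    # bool is a subclass of int in Python, so it is rejected explicitly.
    if isinstance(value, bool) or not isinstance(value, int):
        raise TypeError(f"expected int, got {type(value).__name__}: {value!r}")
    if value == 0:
        return "zero branch"
    return "non-zero branch"


print(do_something(0))   # zero branch

try:
    do_something("0")    # the string from the "slightly different context"
except TypeError as exc:
    print("caught:", exc)
```

With the check in place, the strict comparison and the loose comparison both stop being a guess about what the caller passed in.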
Absolutely. I was just pointing out that the "you should always use the === operator" advice, which I see a lot of people give (multiple times in this thread, for example), does not guarantee you won't run into problems with incorrect types, and giving an explanation.
As always, you should be thinking when programming.
I don't see how "==" would help in that situation, other than "solving" this particular issue by opening another can of worms.
You simply can't use PHP arrays for user-generated keys in a safe manner. At the very least you have to add some prefix like '_stuff_' to all keys, to avoid accidental conversions. And yes, this "proper" solution (Can you ever say "proper" in PHP? Anyway ...) doesn't have to involve "==", but works perfectly (and preferably) with "===".
So what you're basically saying is that the "standard" variations and APIs which people will find and use are broken, and the ones actually working are hidden somewhere in the documentation. And you're saying you think this is just fine?
In that case, I have a hammer to sell you, and I think you know which one.
Reminds me of bash, where I also have to prefix the values being compared with x, to be able to handle empty vars.
if [ x$1 == x$2 ];
But automatic string to float conversion is just crazy, esp. in comparison context. Perl, which is equally soft, has at least numerical and string comparison operators.
So the solution is to use === which does not compare references with strings but the values, or the strcmp function. And refrain from using == with strings at all.
'0XAB' == '0xab' is true.
Comparing any non-numeric string (such as "foo") to 0 with == will return true.
I'm not sure why this crazy "x" prefix tale still continues. You can simply quote them instead. Especially if you use bash and not some other sh-compatible shell:
if [ "$1" == "$2" ];
will work just fine.
If you need full sh compatibility, the test should be for "x$1" anyway (still quoted).
> If you want to compare two strings that are the same except they each use different ways of expressing an 'é', you need to add another equal sign and use ==== to differentiate them, as === will see them as equal.
The fact that PHP is a dynamic language and that "==" automatically converts both operands to a float because of the "0e" prefix of the string is problematic. Perhaps it's a bug in the PHP source code.
See below.
# the examples were essentially the same as this comparison.
php > var_dump("0e462097431906509019562988736854" == "0e830400451993494058024219903391");
bool(true)
# md5() does return a string type, but just happens to start with "0e"
php > var_dump(md5('240610708'));
string(32) "0e462097431906509019562988736854"
php > var_dump(md5('QNKCDZO'));
string(32) "0e830400451993494058024219903391"
# and if PHP treats them as floats instead of strings, they all evaluate to the same thing: float(0)
php > var_dump(0e462097431906509019562988736854);
float(0)
php > var_dump(0e830400451993494058024219903391);
float(0)
php > var_dump(0e087386482136013740957780965295);
float(0)
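The same thing can be checked outside PHP. The Python sketch below hashes the two inputs from the transcript with the standard `hashlib` module and then parses the digests as floats; Python's `float()` is only standing in for PHP's numeric conversion here, which is an assumption of the illustration.

```python
import hashlib

inputs = ["240610708", "QNKCDZO"]
digests = [hashlib.md5(s.encode("ascii")).hexdigest() for s in inputs]

for s, d in zip(inputs, digests):
    # Each digest is "0e" followed by 30 decimal digits, i.e. zero times
    # ten to some huge power, which parses as 0.0.
    print(s, d, float(d))

print(digests[0] == digests[1])                # False: the strings differ
print(float(digests[0]) == float(digests[1]))  # True: both parse as 0.0
```

So the strings really are different; it is only the numeric detour that makes them look equal.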
Just to make it clear, I did not come up with this example. Unfortunately I can't find the source anymore. It also contained some technical explanations about why this works. So if anyone remembers, I'd be happy if you could comment with the link.
Nothing magic here. Be careful with the == comparison operator and its type juggling. If you want to match things precisely, use the === operator. Loose comparisons can have dangerous side-effects!
The MD5 examples are really just cloaked comparisons like this one, later in the list:
Check out which types those examples get cast to and it should make more sense :) I don't know the exact rules for type detection in PHP, but it looks like that's the cause.
While I do believe that it is possible to write great Apps with PHP I tend to stay away from it because it is not statically typed. For quick and dirty proof of concept it is nice though (IMHO).
I agree that all languages have their warts and a good programmer should know about them.
I think what makes both PHP and JavaScript not so great is the fact that it is so easy to overlook deadly mistakes like using "==" instead of "===" or forgetting to add a "var". And worst of all, those errors can go unnoticed until something breaks, and when it does, it's pretty hard to find the root of the problem.
PHP's `==` tries very hard (even harder than JavaScript's) to "please" the user. That means that if it can, it will fall back to converting both sides to numbers and comparing those.
Here all hashes are of the form "0e{digits}" which is a valid scientific notation, so when `==` internally converts them to numbers they're all parsed to `float(0)` and therefore equal, success!
Not the usual story: == should be deprecated and a warning should be displayed. PHP has explicit coercion features; devs should use them alongside ===.
Another annoying thing about PHP is that it keeps emitting warning-related INFO messages on the webpage for visitors to see, even when you have proper try{}catch{} error handling. Then you have to use set_error_handler to suppress the unwanted messages.