
"This RFC fixes the very strange case in PHP where 0 == "foo" results in true. There are some other edge cases like that one, and this RFC fixes them."

It's good they fixed it, I suppose, but yikes. As much as people say PHP has advanced from its fractal-of-awfulness days, a lot of those fixes are shocking if you come from other languages.




Most (non-strongly-typed?) languages have weird edge cases of this sort. There's a very funny presentation by Gary Bernhardt highlighting some of these for Ruby/Javascript: https://www.destroyallsoftware.com/talks/wat

If you look hard enough, you'll probably find some site that shows surprising behaviors of your favorite language. Examples:

Python: https://github.com/cosmologicon/pywat

Javascript: https://loomcom.com/blog/0097_the_wats_of_javascript.html

Ruby: https://idiosyncratic-ruby.com/29-limitations-of-language.ht...


The issue is coercion sanity, not that PHP or JS are sane by any other standard to begin with. You can do `0 == "0"` in Javascript (try it now in the devtools console) and, because the left-hand side is a number, it'll coerce the right-hand string to a number, just like PHP. (Strictly speaking it uses the spec's ToNumber conversion, which is what Number() does, not parseInt/parseFloat.) What it won't do is parse the string "foo" as a zero, which would be insane even by Javascript standards.

`parseFloat("foo") == parseFloat("foo")` and `parseInt("foo") == parseInt("foo")` in Javascript evaluate to false, which is the sane thing to do: both sides are NaN, and `NaN != NaN` is part of the IEEE 754 floating-point standard, which predates every single one of the languages you mentioned.
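To make the NaN point concrete, here is a rough Python model of the string-to-number step JavaScript's `==` performs (the spec's ToNumber, i.e. what `Number("...")` does). It is a sketch under simplifying assumptions: the helper names are hypothetical, and only plain decimal literals are recognised, whereas real ToNumber also accepts things like `"Infinity"` and hex. Note that `parseInt`/`parseFloat` differ again: `parseInt("12abc")` is 12, while `Number("12abc")` is NaN.

```python
import math
import re

# Plain decimal literals only; real ToNumber accepts more (e.g. "Infinity").
_DECIMAL = re.compile(r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def js_like_to_number(s: str) -> float:
    """Simplified, hypothetical model of JavaScript's ToNumber for strings."""
    t = s.strip()
    if t == "":
        return 0.0  # Number("") is 0, which is why 0 == "" is true in JS
    if _DECIMAL.fullmatch(t):
        return float(t)
    return math.nan  # unparseable text becomes NaN, never 0


def js_like_loose_eq(n: float, s: str) -> bool:
    """Model of number == string: convert the string, then compare numbers."""
    return n == js_like_to_number(s)


print(js_like_loose_eq(0, "0"))    # True
print(js_like_loose_eq(0, "foo"))  # False: 0 == NaN is false
nan = float("nan")
print(nan == nan)                  # False: an IEEE 754 NaN is unequal to itself
```

Because unparseable text maps to NaN rather than 0, the comparison comes out false, and since Python floats are IEEE 754 doubles, NaN behaves the same way here as in JS.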

We're all competing in the Olympics of suffering, so none of us should be promoting this shit :)


It's important to note here that PHP is converting to an integer, while JS is converting to a double. Integers don't have NaN as a concept. JS doesn't have integers, but if it did it would probably make the same kind of mistake. (This is a guess of course, but a reasonable one given the other insane conversions it does.)


Python may have Wats, but C has Nasal Demons.

https://en.m.wikipedia.org/wiki/Undefined_behavior#Examples_...


Java:

    jshell> Long a = 12l; Long b = 12l;
    a ==> 12
    b ==> 12
    jshell> a == b
    $3 ==> true
    jshell> a = 54321l; b = 54321l;
    a ==> 54321
    b ==> 54321
    jshell> a == b
    $6 ==> false
¯\_(ツ)_/¯


Really? Not being a Java guy, what is the explanation for this?

Terms like boxing and reference comparison come to mind, but I can't stitch them into a narrative to explain this.


`Long` is a boxed (reference) type, but small numbers are interned so you get the same reference when boxing a primitive value. Larger numbers aren't interned, so you get fresh boxes.

https://stackoverflow.com/questions/1700081/why-is-128-128-f...


Reference comparison it is, with a twist: the JVM caches the most common values. You'll get the same object for values from -128 to +127 when autoboxing (which goes through Long.valueOf). The language spec guarantees this range. For Integer, the upper bound can be raised with -XX:AutoBoxCacheMax, but the cache cannot be turned off; Long's cache is fixed at -128 to 127.

If you explicitly ask for a new object (`Long a = new Long(123l);`, a constructor deprecated since Java 9), you will get a new object and the comparison will fail as expected.

The moral of the story is to always use .equals().
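A small Python model of what the jshell session shows, assuming Python's `is` stands in for Java's `==` on references and a custom `__eq__` stands in for `.equals()`. `Box` and `box()` are hypothetical names; the -128..127 range mirrors the cache described above.

```python
class Box:
    """Stand-in for java.lang.Long: an object wrapping a value."""

    __slots__ = ("value",)

    def __init__(self, value: int) -> None:
        self.value = value

    def __eq__(self, other: object) -> bool:
        # Plays the role of Long.equals(): compares the wrapped values.
        if not isinstance(other, Box):
            return NotImplemented
        return self.value == other.value

    def __hash__(self) -> int:
        return hash(self.value)


# Pre-built boxes for small values, like Java's cache for -128..127.
_CACHE = {v: Box(v) for v in range(-128, 128)}


def box(value: int) -> Box:
    """Return the cached box for small values, a fresh object otherwise."""
    cached = _CACHE.get(value)
    return cached if cached is not None else Box(value)


a, b = box(12), box(12)
print(a is b)   # True: both come from the cache
a, b = box(54321), box(54321)
print(a is b)   # False: two distinct objects, like Java's == on Long
print(a == b)   # True: value comparison, like .equals()
```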


I love the "Wat" talk. A few years ago I wrote a blog post where I attempted to dig into the JavaScript cases Gary cites and figure out what is actually happening.

https://medium.com/@mikehall314/wat-s-happening-a-closer-loo...



Are there real use cases for type coercion?


For example, it saves you tons of explicit integer parsing when working with the DOM, which uses strings for almost everything.


Yes. You have a list of floats and you want to add n to the n-th element. Then n is both an integer (when indexing the list) and a float (when adding).

Of course the language can require an explicit cast, but in most cases it goes against the intuitiveness and simplicity that dynamic languages want to achieve.
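A minimal Python illustration of the list-of-floats case: Python silently promotes an int to a float in arithmetic, yet refuses to add a string to a number, so it allows exactly the numeric coercion described here and no more. The list contents are made up for the example.

```python
values = [0.5, 1.5, 2.5]

for n in range(len(values)):
    # n is an int when used as an index, and is promoted to float when added.
    values[n] += n

print(values)  # [0.5, 2.5, 4.5]

try:
    values[0] + "1"  # no implicit str -> number conversion
except TypeError as exc:
    print("TypeError:", exc)
```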


And? None of it should be condoned. It's all bad. How is that an argument?


I think it is good and should be condoned. Sometimes having simple rules with the occasionally strange outcome is better than having a complicated set of rules to achieve a simple set of outcomes.


It's a good thing all other languages are perfect and have always been perfect, otherwise the rest of the community would have no pedestal to stand upon.


Nobody is saying other languages are perfect. These are particularly egregious mistakes though.


It was a consequence of type conversion in a loosely typed language. Loose typing has consequences, and edge cases are one of them. PHP also has strict typing capabilities (the === operator, declare(strict_types=1)). If you need strict typing, use it; it's right there for you.

My complaint isn't with the valid criticisms of PHP; it's that no matter how much progress the language makes, people are quick to find reasons to be dismissive.

The language is good, and people do good work with it. Lots of good work. How Javascript became the golden child in a community that can't forgive PHP is beyond me.


> My complaint isn't with the valid criticisms of PHP; it's that no matter how much progress the language makes, people are quick to find reasons to be dismissive.

It's perfectly valid to be dismissive of PHP, and you should recognize that. Main reason: there's a metric ton of PHP 5.X websites out there that are never going to be upgraded. That alone is reason enough to distrust PHP.

If the community makes a concerted effort to introduce automated upgrades of legacy PHP codebases to the newer (supposedly better) versions, then I personally would become a fan.

Everybody can tout the newest and greatest. Working with -- and improving -- the legacy stuff is what earns the respect of many, myself included.

(Disclaimer: not a JS dev. You seem to imply it's ironic that JS devs mock PHP, but that isn't the case with me, at least.)


Your point of view is so backwards. You dismiss an entire tool because legacy work exists? So because Windows 3.1 doesn't upgrade to Windows 10, I shouldn't bother? Because IE11 doesn't run ES6, we shouldn't work with it?

You are free to criticize the enormous number of legacy codebases out there, but that is not the fault of PHP's core team and the community working to improve the language and ecosystem, and it doesn't make PHP a bad tool. Environment maintenance isn't the language's responsibility, and PHP has been -very- careful with its upgrade path. It's child's play to move between versions.

If anything, you should be impressed by legacy PHP's robustness. How long will modern stacks last unattended? Barely a year, in my experience, before some CI step or dependency breaks and the whole tower crumbles. Try running an npm-based build pipeline in two years and tell me how that automated upgrade path worked out.


This analogy doesn't work. You can still use a lot of these legacy mistakes today.


You can write bad code in any language, but we've tried to move on from that and employ solid software engineering within PHP.


> You dismiss an entire tool because legacy work exists? So because Windows 3.1 doesn't upgrade to Windows 10, I shouldn't bother? Because IE11 doesn't run ES6, we shouldn't work with it?

The crucial difference here is that nobody will ever even attempt to upgrade those PHP 5.X websites to PHP 7 or 8, whereas a lot of people gradually upgraded from Win 3.1 to Win 10 over the years.

Obviously I am not saying "don't bother". Judging by your strong words (which don't make your argument more compelling), you seem to be a fan, so you do you: code all the PHP you like. I am giving you a realistic take on why many skip PHP. You getting a bit emotional over this doesn't change my stance.


It's just a tired discussion, sorry, hence the strong words. I respect your points of view so far and I appreciate the discussion (for what it's worth, I've been upvoting your comments since I see they're getting grayed out).

There is a strong, intelligent community backing PHP and working with PHP, and I feel you are dismissing it for things out of its control. You're not taking a realistic perspective on what starting a project with PHP is really like today. It's fast, it's easy to deploy, and you can use all the software engineering best practices you please. Instead, you're implying that the abandoned projects from yesteryear are what PHP is about today.

During its prime, the web exploded with PHP installations built by people who had never touched software before and may never have touched it again; that's what legacy PHP is. It can't be fixed by the PHP community because those installations aren't run by people who are involved in the PHP community. PHP has a very sane upgrade path, but the people who run those installs will never know about it or be interested in it.


I agree that my point of view on PHP, and many others', is skewed by a sort of "shameful past". And I agree it's uncharitable. I face the same thing myself as part of the Elixir community: despite being a very well-made language with an excellent runtime, Elixir is still derided for its lack of static typing.

So I know your sentiment and I sympathize with it.

I too was a bit too dismissive and I am sorry.

My point was more along the lines of the apparent fact that the PHP community has shrunk and, as you said, the original people who made it popular are long gone, likely never to come back. Not sure if anything can be done about it. As a fan of a fringe language, I know how it is.

For what it's worth, I've only heard good things about Laravel and several other libraries (or frameworks? I don't know), mostly about big productivity boosts.

Here's to hoping that one day all of that will converge somewhere where all of us will have to deal with less BS.


There is such a thing as writing code that is good, functional and satisfies the business requirements; this same code can also run untouched for years. Yes, this is a thing, and actually a hallmark of quality work.


Non-constructive snark with zero correlation to the topic at hand.

I have been visiting -- and repairing -- PHP 5.X websites for years. They are anything but stable. I also maintained Java behemoths for years, and they have been both (a) stable and robust and (b) legacy. But the PHP projects I worked on a long time ago? Far from it. Could be anecdotal evidence.

To not make this annoying for future readers: I am okay with legacy code sitting untouched (although I will argue it is usually more broken than you seem to imply), but I am not okay with advertising a language that definitely didn't help the reputation of the IT industry in the eyes of everybody else. PHP sits on a broken foundation.

I have a right to express my opinion and you have a right to express yours. Let's leave it at that since we aren't going to achieve anything here (except flags for non-constructive posts).


Oh please!!


It's not a binary of good/bad, it's how good and how bad? PHP in my mind is quite bad. A lot of other languages still suck, but suck significantly less.


It's a shame PHP borrowed so much from Perl but not its separate "eq" and "==" operators, which coerce to string and to number, respectively, before comparing. I want it to always do one or always the other, not either one unpredictably.


PHP half-copied Perl a lot. Another example is implementing arrays and hashes, but not as separate types. So you get odd behavior like some array sort functions (asort(), uasort()) sorting values without renumbering keys, because there's no distinction between an array and a map. Or copying Perl's string interpolation and concatenation operator but not its regular expression literals, so writing regexes in PHP requires an extra layer of escape sequences.
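A rough Python sketch of the array/map conflation, assuming a PHP array can be modeled as an insertion-ordered `dict` (a simplification; real PHP arrays have extra rules, such as numeric-string keys becoming integers). Sorting by value while keeping keys, as `asort()` does, leaves key 0 somewhere other than the first position; renumbering afterwards, as `sort()` does, restores the list-like view.

```python
# A PHP "array" modeled as an insertion-ordered dict (simplified assumption).
php_array = {0: "pear", 1: "apple", 2: "fig"}

# Like asort(): order by value, but each value keeps its original key.
asorted = dict(sorted(php_array.items(), key=lambda kv: kv[1]))

# Like sort(): order by value, then renumber the keys 0..n-1.
resorted = dict(enumerate(sorted(php_array.values())))

print(list(asorted.items()))   # [(1, 'apple'), (2, 'fig'), (0, 'pear')]
print(asorted[0])              # pear: key 0 is no longer the first element
print(list(resorted.items()))  # [(0, 'apple'), (1, 'fig'), (2, 'pear')]
```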


I'm not aware of any extra layer of escape sequences required in PHP compared with Perl. Could you elaborate?


PHP requires escaping quotes and literal backslashes (i.e., backslashes not used for metacharacters or for escaping other characters). In many cases it doesn't affect anything, but it can turn into a bit of a nightmare if you're matching anything involving backslashes, since you need four backslashes to match one literal backslash. Perl's regular expression literals are also nicer for things like splitting a long expression over multiple lines.
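The two layers of escaping are easy to see in Python, which has the same problem with ordinary string literals. Raw strings (`r"..."`) switch off the string-literal layer, roughly the role Perl's regex literals play. This is a Python analogy, not PHP code.

```python
import re

text = r"C:\temp"  # one literal backslash, at index 2

# Ordinary literal: "\\\\" is the two characters \\ after string escaping,
# which the regex engine then reads as one escaped (literal) backslash.
plain = re.compile("\\\\")

# Raw literal: no string-level escaping, so only the regex layer remains.
raw = re.compile(r"\\")

print(plain.pattern == raw.pattern)  # True: both patterns are the text \\
print(plain.search(text).start())    # 2
```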


I thought preg_quote() took care of that just like quotemeta in Perl?


Yeah, that's a real shame. I wrote lots of Perl, and while it requires a bit of experience, I almost never ran into issues like those we see in PHP and JS. The fact that Perl makes you specify whether a variable is a hash, an array or a scalar also helps a lot, and IMHO the idea that the operator, not the operand, determines the type of the arguments is very powerful.


Yes, maybe, but who would write that? It seems like an arbitrary example used to denigrate weakly typed languages in general. Not sure what is being tested there, but it's easy enough to do this: `if (is_numeric($foo) and !intval($foo)) { ... }`


Should a string, when converted to an integer, become 1, not 0? (Unless it is a string of only digits, like "567". Then it should become 567.)

  (bool) 'x'  //  true
  (int) true  //  1
If a string is truthy in a boolean situation, shouldn't it also be truthy in a numeric one?

The types form a hierarchy in my mind, from simple to complex: from boolean to integer, to float, to string, to compound types like arrays and objects. It seems like values should cast up and down the chain uniformly.


These are easy questions.

1. If you convert a string containing non-numeric characters to an integer, an error is raised.

2. Truthiness should not exist as a concept. Just booleans and expressions that evaluate to booleans.

PHP's backwards compatibility promise makes these more difficult questions, of course.


> 2. Truthiness should not exist as a concept.

Meh, it's actually quite useful in many situations.


Easy questions but not obvious answers.

What kind of error to raise? Do you return a NaN? Throw an exception? Both have their uses.


Throw a standard exception. E.g., Python calls it a ValueError. Returning NaN is clearly the wrong design.
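For reference, this is how Python's `int()` behaves: text that isn't a whole number raises `ValueError` instead of quietly becoming 0 or NaN, while surrounding whitespace is accepted. `describe_conversion` is a hypothetical helper used only to display the results.

```python
def describe_conversion(raw: str) -> str:
    """Report what Python's strict int() does with a given string."""
    try:
        return f"{raw!r} -> {int(raw)}"
    except ValueError:
        # Non-numeric text is an error, not a silent 0 or NaN.
        return f"{raw!r} -> ValueError"


for raw in ["567", " 42 ", "foo", "", "12abc"]:
    print(describe_conversion(raw))
```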


In reality, if you know what you are doing, you would never compare like that (0 == "foo"). What would be the use case?

If you on the other hand don't know what you are doing, you are going to shoot yourself in the foot in one way or another.

If you really have some dynamic type situation and you for some legitimate reason don't know if your variable is going to be an int or a string, you would just compare using the === operator, which is a standard and well-known tool and does not have this issue.
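As a rough Python analogue of PHP's `===`, assuming "same type and same value" is the rule that matters for scalars (PHP's rules for arrays and objects are more involved and are not modeled). `strict_equals` is a hypothetical name. Python's own `==` already refuses to equate 0 and "foo", so the extra strictness shows up mostly with int vs float and bool vs int.

```python
def strict_equals(a: object, b: object) -> bool:
    """Rough analogue of PHP's === for scalars: same type and same value."""
    return type(a) is type(b) and a == b


print(strict_equals(0, "foo"))  # False: different types, no coercion
print(strict_equals(0, 0))      # True
print(strict_equals(0, 0.0))    # False, although 0 == 0.0 in Python
```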


I could see this coming up in a pretty straightforward way, actually. I used to have a coworker who would always write his conditions backwards, like:

    if(false == something) {}
In this case it's a semi-obscure C programming style (sometimes called "Yoda conditions") meant to avoid accidentally writing = in an if statement. (I think it's silly, but a decent number of people do it.) So I could easily see a case in PHP where someone writes...

    if(0 == $foo)
and $foo usually holds a numeric string, but maybe a user entered "" there, so it's cast to 0 and a code path runs that maybe shouldn't.


> What would be the use case?

Reading data from an Excel sheet. A1 is “foo”, A2 is blank. B1 is =A1, which evaluates to “foo”, and B2 is =A2, which evaluates to 0. If you want to aggregate according to column B, you have to compare 0 and “foo”.
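A sketch of that aggregation in Python, with made-up rows: Python's `==` never equates 0 with "foo", so the two groups stay separate. Under PHP 7's loose `==`, `0 == "foo"` was true, which is what makes a loosely compared version of this grouping risky.

```python
from collections import defaultdict

# Hypothetical rows: (value from column B, amount to sum).
# =A1 produced "foo"; =A2 referenced a blank cell and produced 0.
rows = [("foo", 10), (0, 5), ("foo", 7), (0, 1)]

totals = defaultdict(int)
for key, amount in rows:
    totals[key] += amount  # dict keys match by hash and ==; 0 != "foo" here

print(dict(totals))  # {'foo': 17, 0: 6}
```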


You have an Excel sheet with various unknown data types, which can be a string, a number or a formula that needs evaluation? And you don't check the types of variables before comparing them? In that case I would argue that you fall into the "don't know what you are doing" category, because checking the type would be the first thing needed. In fact, that is exactly the situation where it's needed...


I think that the big mistake PHP and JS made was to loosely pick their weak typing from Perl while trying to look too much like Java. Perl has different operators for numeric and string equality, so you know that `$a == "3"` would convert both its arguments to number and then compare them, while `$a eq "3"` would convert them both as strings and run a character comparison on them. Using == for both was kind of asking for people to get confused and mess up, IMHO.
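A minimal Python model of the Perl approach, where the operator rather than the operands decides how values are compared. `num_eq`, `str_eq` and `to_num` are hypothetical helpers, and numification is simplified to "leading decimal number, else 0". That mirrors real Perl, where `"foo" == 0` is also true (with a warning under `use warnings`), but `eq` at least gives you an unambiguous string comparison.

```python
import re
from typing import Union

Scalar = Union[int, str]

# Leading decimal number, allowing leading whitespace (simplified).
_LEADING_NUM = re.compile(r"\s*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?")


def to_num(x: Scalar) -> float:
    """Simplified numification: a leading decimal number, otherwise 0."""
    if isinstance(x, int):
        return float(x)
    m = _LEADING_NUM.match(x)
    return float(m.group()) if m else 0.0


def num_eq(a: Scalar, b: Scalar) -> bool:
    """Like Perl's ==: both operands are treated as numbers."""
    return to_num(a) == to_num(b)


def str_eq(a: Scalar, b: Scalar) -> bool:
    """Like Perl's eq: both operands are treated as strings."""
    return str(a) == str(b)


print(num_eq("10", "10.0"), str_eq("10", "10.0"))  # True False
print(num_eq("1E1", "010"), str_eq("1E1", "010"))  # True False
```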


I’ve never used Perl, but I periodically mix up -eq and = in bash scripts. It’s quite annoying since there’s no obvious indication that one is for numbers and the other is for strings.


There's no obvious indication what the standard C function

  void (*signal(int sig, void (*func)(int)))(int)
means; if you don't know C, it's plain gibberish, impossible to fully understand.

Everything in computers (and, I'd say, in life more generally) rests on assumptions built on knowledge we implicitly have. After a while, writing eq or == becomes natural and you stop mixing them up. These kinds of errors are frequent when you don't use something daily; for instance, after more than a decade I still keep forgetting PowerShell's syntax because I don't use it often, and when I do, I tend to do basic things.


Seems like this still doesn't fix the infamous "1E1" == "010", which I guess can't be fixed without breaking a lot of old code and assumptions.
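The reason `"1E1" == "010"` survives the RFC is that both operands are numeric strings, so PHP compares them as numbers (10 and 10). A simplified Python model of that rule, with a hypothetical helper name: only decimal literals count as numeric, and leading and trailing whitespace is allowed, as in PHP 8.

```python
import re

# Decimal numbers with optional surrounding whitespace (simplified model).
_NUMERIC = re.compile(
    r"[ \t\n\r\v\f]*[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?[ \t\n\r\v\f]*"
)


def php_loose_str_eq(a: str, b: str) -> bool:
    """Simplified model of PHP's string == string."""
    if _NUMERIC.fullmatch(a) and _NUMERIC.fullmatch(b):
        return float(a) == float(b)  # both numeric: compare as numbers
    return a == b                    # otherwise: compare as text


print(php_loose_str_eq("1E1", "010"))  # True: both are 10 numerically
print(php_loose_str_eq("abc", "ABC"))  # False
```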


Why does 0 == "foo"?


"foo" is cast to an int, and since it doesn't start with a number, the cast yields 0, which is falsy.

(int)"9" === 9

(int)"a" === 0
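A simplified Python model of the `(int)` cast in those two examples: skip leading whitespace, read an optional sign and leading decimal digits, and return 0 if there are none. `php_int_cast` is a hypothetical name, and the model ignores details of the real cast such as exponent notation and overflow.

```python
import re

# Optional leading whitespace, then an optional sign and decimal digits.
_INT_PREFIX = re.compile(r"[ \t\n\r\v\f]*([+-]?\d+)")


def php_int_cast(s: str) -> int:
    """Simplified, hypothetical model of PHP's (int) cast on a string."""
    m = _INT_PREFIX.match(s)
    return int(m.group(1)) if m else 0  # no leading digits -> 0


print(php_int_cast("9"))      # 9
print(php_int_cast("a"))      # 0
print(php_int_cast("12abc"))  # 12
```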

I assume this will cause backward incompatibility? I could see people relying on this instead of is_numeric. Not sure I like this change, since it introduces multiple ways of type casting (and therefore complexity). Before, it was obscure but simple; now it'd be both obscure and complex.

EDIT: It will break backward compatibility: https://wiki.php.net/rfc/string_to_number_comparison

I'm surprised this was 44 against 1 in the vote. Seems like it's very likely to cause issues. IMO they should just implement a "strict comparison" mode, where comparing different types throws a runtime exception. Or just deprecate "==" for "===".


>I'm surprised this was 44 against 1 in the vote. Seems like it's very likely to cause issues.

On the other hand, seems very much like the kind of crap that you're better to rip the band-aid and fix, than to keep for an eternity lest the fix causes issues...

>Or just deprecate "==" for "===".

As if this would cause less issues?


> you're better to rip the band-aid

Not for silent failure. This sort of change means that tons of systems will fail in unexpected ways in unexpected places, and often people will never know about it. It's a security nightmare.

Ripping the band-aid off is much more compelling when it will result in an exception. Silent failure, though. Ugh, if you actually care about writing reliable software, it's the worst.


Automatic type conversion is prone to errors. Either they acknowledge that and force strict comparison (or add a way to do that), or they ignore it, and then their change has no real effect, since mistakes will still happen with the new scheme.

>As if this would cause less issues?

Causing issues isn't the problem. Silently breaking code is. Deprecating would give people time to adapt their code and would warn them about the downsides of loose type conversion. Changing the semantics of loose comparison will make static analysis and finding potential issues much harder (and may postpone finding the problems it introduces to years down the line). Adding a mode that warns or throws on loose comparisons would keep current code working while warning users of potential issues.
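A sketch of the kind of opt-in mode proposed here, written in Python because it is easy to model there. `checked_eq` and its `warn_only` flag are hypothetical, not a PHP feature: mismatched types either raise, or, in warn-only mode, emit a warning and fall back to the ordinary result so existing code keeps working.

```python
import warnings


def checked_eq(a: object, b: object, warn_only: bool = False) -> bool:
    """Hypothetical strict-comparison mode for ==."""
    if type(a) is not type(b):
        msg = f"loose comparison of {type(a).__name__} with {type(b).__name__}"
        if not warn_only:
            raise TypeError(msg)
        # Warn-only: flag the comparison but keep the existing result.
        warnings.warn(msg, RuntimeWarning, stacklevel=2)
    return a == b


print(checked_eq(1, 1))                      # True
print(checked_eq(0, "foo", warn_only=True))  # False, plus a RuntimeWarning
try:
    checked_eq(0, "foo")
except TypeError as exc:
    print("TypeError:", exc)
```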


> I assume this will cause backward incompatibility?

Yes, but I assume code that will be broken by this is already broken.


"If you compare a number with a string or the comparison involves numerical strings, then each string is converted to a number"

And a string that doesn't look like a number at all is converted to 0.

No, I'm not condoning this madness.
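To see what the RFC actually changes, here is a simplified Python model of integer-to-string `==` before and after. Assumptions: only decimal integers and floats count as numeric, whitespace handling follows PHP 8's numeric-string rules, and the helper names are hypothetical. PHP 7 always converts the string to a number (non-numeric text becomes 0); PHP 8 does that only for numeric strings and otherwise compares the number, converted to a string, with the string.

```python
import re

# Simplified, hypothetical model: decimal ints/floats only.
_WS = r"[ \t\n\r\v\f]*"
_NUM = r"[+-]?(?:\d+(?:\.\d*)?|\.\d+)(?:[eE][+-]?\d+)?"
_NUMERIC = re.compile(_WS + _NUM + _WS)  # the whole string is a number
_LEADING = re.compile(_WS + _NUM)        # the string only starts with one


def php7_int_eq_str(n: int, s: str) -> bool:
    """PHP 7: always convert the string to a number; no leading number -> 0."""
    m = _LEADING.match(s)
    return n == (float(m.group()) if m else 0)


def php8_int_eq_str(n: int, s: str) -> bool:
    """PHP 8: numeric strings compare as numbers, others as strings."""
    if _NUMERIC.fullmatch(s):
        return n == float(s)
    return str(n) == s


cases = [(0, "foo"), (0, ""), (42, "42foo"), (42, " 42"), (0, "0")]
for n, s in cases:
    print(f"{n} == {s!r}: PHP 7 -> {php7_int_eq_str(n, s)}, "
          f"PHP 8 -> {php8_int_eq_str(n, s)}")
```

Under this model, `0 == "foo"`, `0 == ""` and `42 == "42foo"` flip from true to false, while `42 == " 42"` and `0 == "0"` stay true.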


Well, in C you can cast any random set of bytes to an int, so there's that...


Sure, but in C you usually do that explicitly, on purpose. Here it's almost certainly not what you wanted or expected. Also, C is 20-30 years older, and I don't think I've ever read anyone really compliment its type system in any way other than saying it's simple.
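For contrast, an explicit reinterpretation looks like this in Python: `struct.unpack` treats four raw bytes as a little-endian signed 32-bit integer, roughly what copying a `char[4]` into an `int32_t` does on a little-endian machine. `bytes_as_int32` is a hypothetical helper. Even this deliberate conversion turns the bytes of "foo" into a large number, not 0.

```python
import struct


def bytes_as_int32(raw: bytes) -> int:
    """Reinterpret exactly 4 bytes as a little-endian signed 32-bit int."""
    (value,) = struct.unpack("<i", raw)
    return value


print(bytes_as_int32(b"foo\x00"))  # 7303014, i.e. 0x006F6F66
```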


If one doesn't know the language, one can write shitty code, regardless of whether it's C or PHP. Any decent PHP programmer knows about the `===` operator, which doesn't do any type juggling.


This problem isn't really fixed by === per se. Converting strings to numbers is a common need, and the weird type conversion here is broken (and being fixed) for more than just implicit conversions in comparisons.

The PHP team is calling it "saner numeric strings"...seems to indicate they agree.


Should it be shocking? It's another language that has its own history and culture. It evolved in its own way.



