'9223372036854775807' == '9223372036854775808' (php.net)
129 points by moe on April 12, 2012 | 138 comments



I can understand the rationale for coercing strings to numbers for an operation that is not valid on strings, but coercing strings to numbers just because it's possible is clearly a terrible idea. It's like they looked at JavaScript and decided that the == operator was just not hazardous enough.


Please have a look at the comments by "jabakobob at gmail dot com" and myself ("nikic@php.net"). They explain why such behavior is actually good, in most cases.


Throwing away data without unavoidably good reason is BAD behavior. Period.

We've got gigabytes of RAM and terabytes of storage to work with; don't throw away characters in a string just because they don't fit in a compact data type used only because there is a passing resemblance of one to the other.

If I'm comparing two 53-digit barcodes, and they differ only by the last digit (checksum), then it's very important that comparing those two STRINGS comes up FALSE.


> If I'm comparing two 53-digit barcodes, and they differ only by the last digit (checksum), then it's very important that comparing those two STRINGS comes up FALSE.

Then use === and do a type-strict comparison.

    <?php

    // Prints bool(true)
    var_dump('9223372036854775807' == '9223372036854775808');

    // Prints bool(false)
    var_dump('9223372036854775807' === '9223372036854775808');
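
For anyone wondering where the `true` comes from, here is a minimal sketch in Python (not PHP). It assumes, as discussed elsewhere in this thread, that the values end up being compared as 64-bit doubles. A double has a 53-bit significand, so neighbouring integers near 2^63 collapse to the same value. The variable names are made up for illustration.

```python
# Illustration in Python, not PHP: shows the precision loss of 64-bit doubles.
a = "9223372036854775807"  # 2**63 - 1
b = "9223372036854775808"  # 2**63

strings_equal = (a == b)               # compared character by character
ints_equal = (int(a) == int(b))        # Python ints are arbitrary precision
floats_equal = (float(a) == float(b))  # both round to the same double, 2**63

print(strings_equal, ints_equal, floats_equal)  # prints: False False True

# The same collapse already starts just above 2**53.
collapses_above_2_53 = float(2**53) == float(2**53 + 1)
print(collapses_above_2_53)  # prints: True
```

The string and integer comparisons keep every digit; only the trip through a double makes the two values indistinguishable.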


Last week I spent 4 hours trying to find what turned out to be a missing = because C++ will not complain, but will behave very differently, when = is confused with ==. Now you want to add === to the mix?

I'll stand by the axiom that throwing away data should NOT occur unless no other sensible option is available. If I'm comparing two literal strings, I shouldn't have to start with the obscure knowledge that a simple comparison will result in an aggressive attempt to perform two consecutive non-obvious type casts with a high risk of data loss.

I'm reminded of the great Belkin router fiasco: wireless routers were shipped with the "hold muh beer" great idea that random web page requests would be redirected to Belkin ad pages. I don't buy Belkin products any more (and that was years ago now) because knowing they would go there broke the trust that they wouldn't. Ditto here: if PHP is going to go to great lengths to try to throw away critical data (hey, I'm storing those numbers as strings BECAUSE I need all the digits), then I can't trust that the language won't do other similarly stupid things. I'm working in an industry where such a cavalier attitude to data can cost MILLIONS of $$$ over one failure, and can't afford to use a language where such failures are systemic. That there exists a workaround is inadequate. </tangent>

Fine. I could use ===.

The problem remains that a fundamental axiom of the language design is that casting lossless to lossy data types, without direction or warning, is considered acceptable. Ya know, if PHP wants to convert my numeric strings to integers for comparison, fine ... IF it maintains precision and preserves all the data. I shouldn't have to know of and use other operators/functions to explicitly avoid a pathological pursuit of forgetfulness.


> Last week I spent 4 hours trying to find what turned out to be a missing = because C++ will not complain, but will behave very differently, when = is confused with ==.

Uh, all reasonable compilers warn about ambiguous use of = as a truth value.

  $ g++ -Wall -c a.cc
  a.cc: In function ‘int foo(int, int)’:
  a.cc:2:12: warning: suggest parentheses around assignment used as truth value [-Wparentheses]

  $ clang++ -Wall -c a.cc          # output is colored
  a.cc:2:9: warning: using the result of an assignment as a condition without parentheses [-Wparentheses]
    if (x = y) return 0;
        ~~^~~
  a.cc:2:9: note: place parentheses around the assignment to silence this warning
    if (x = y) return 0;
          ^
        (    )
  a.cc:2:9: note: use '==' to turn this assignment into an equality comparison
    if (x = y) return 0;
          ^
          ==
  1 warning generated.


You're assuming simple single-operator comparison. It compiles if ( x = y && p ) return -1; with joy.


Correction: if ( (x = y) && p ) return -1; to avoid -Wall warnings.


> C++ will not complain

Every half-decent compiler will spit warnings at you, though.

> Now you want to add === to the mix?

It's been this way forever. PHP does all these conversions intentionally and trusts that you want them done. If you don't, go write C++ with FastCGI yourself. The difference between == and === is something you should pick up within your first few days of using PHP - what kind of industry has an economy measured above the "MILLIONS of $$$", but can't afford a dedicated PHP programmer?

I'm almost glad there isn't an official first-party public bug tracker, mailing list, etc. for Javascript - it would be ten times worse than this.


No offense, but it took you four hours to chase down a syntax error? If it survived long enough to see that kind of debugging effort, it was almost certainly dead code; seems like a unit test would have caught it before the first commit. And what compiler are you using? GCC certainly will issue warnings when the result of a = operator is used in a boolean context.


"No offence", yes, always a good way to start one's post! I think you missed out the "just saying", though. You need that too.

'=' vs '==' is not a syntax error. Consider "x=y=z" vs "x=y==z". And it's in somebody else's code. And they wrote it 2 months ago, but the programmer who's using it has only just started working with it. And they are super busy and don't have time to look at it. And it sort of looks like the problem is in the code you changed last week.

You can easily lose 4 hours over this stuff... have some imagination ;)


I'm just sayin', but this is a ridiculous strawman. (Well, you're right that it's not a syntax error in the sense of compiler output. I should have been more precise and called it a "syntax goof" instead.)

I addressed the "someone else wrote it two months ago" point above: If that happened, and this was in the code, it was dead code for two months, because it clearly couldn't have been running correctly. That's a process problem, not a syntax issue, and the appropriate fix is clearly not to modify the syntax of the language.

(Edit for ctdonath: good grief. 1.) the reply was to to3m's post, not yours. 2.) The "two months" thing comes straight out of his example, please read it. 3.) It was a JOKE, based on his chiding me for language. 4.) Why are you still flaming about this?)


"two months ago" is your own straw man. You made that up.

I'd written the code the day before, and it was failing a pre-commit unit test. As I posted elsewhere, this kind of "forgot the second =" error can compile without warning, esp. within a complex evaluation. The process was running fine, as it caught the existence of the logic error early. That it took hours to find was a matter of tracing symptoms back to cause in an embedded system not easily debugged when running.

One could make a valid argument that this is a problem of language syntax, as everyone has been bit by the = vs == difference. As such, and in line with this thread OP, you'd think a new popular language would learn from that mistake and would not throw === into the mix as a solution to an even more obscure problem (casting a string to a float? really?).


Uh, the unit test being executed before the first commit was failing. It's a logic error, not a syntax error.

And I was using GCC, which is happy to assign values amid a more complex logical evaluation. Try: if ( x = y && p ) return -1;


    $ echo 'int foo(int x, int y, int p) { if(x=y&&p) return -1; return 0; }' > test.c
    $ gcc -Wall -c test.c
    test.c: In function ‘foo’:
    test.c:1:1: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
This warning has been there since at least gcc 2.x, I believe.

I'm not making fun of you, really. But seriously: if you are spending 4 hours chasing bugs that can be trivially found by turning warnings on in your compiler, you have some process issues unrelated to C or PHP syntax.


It likes if ( (x = y) && p ) return -1;

(No, I don't have the exact statement in question handy. It was more complicated than this example.)


Yes, and that is because "(x == y) && p" should be written as "x == y && p" according to gcc. If you add the brackets gcc takes that as a sign that you really wanted to do an assignment.


Unit tests aren't going to help with syntax errors. If it won't compile, then you can't run a unit test against it!


Yes. But as has been pointed out, I was imprecise: I should have said "syntax goof" or somesuch. It was an error "in" syntax, not "of" syntax.


I didn't notice the correction :-) In that case, yeah a unit test will definitely help.


That's an opt-in workaround, and proposing it as a solution is like trying to cover the Sun with your thumb. You'll have an infinite number of errors everywhere you can input numbers where the == operator is used, but you can only choose to use === in the finite number of expressions that you, yourself, write. This is insane!


I was trying to figure out a concrete example why this behaviour was dangerous, but couldn't come up with anything. Thanks for your barcode example - it eases my mind (and I'll use it in arguments with my friends!)


If == is supposed to be a numeric comparison and you are supposed to use strcmp for strings then why does == (sometimes) work on strings? Make it always coerce to a number, or die trying. Overloading it as numeric or string compare depending on what the string looks like is ridiculous.

And BTW, things that work in most cases, but not all cases, are exactly where bugs come from.


> things that work in most cases, but not all cases, are exactly where bugs come from

If I didn't have such a low pain tolerance, I would get that tattooed on my body somewhere prominent.


Well, "why such behaviour is good" is stretching it. Maybe just "why such behaviour is"?


It may be a matter of taste, but I always considered this particular feature of == beneficial. In the normal case, it is what you need (in the context that most PHP applications deal with, namely communication over HTTP and with MySQL).

What is shown here is just a rare edge case that you should not normally encounter.

But I think that == has some other behaviors which are really detrimental. Like 0 == 'hallo world'. Sadly those can't be fixed due to backwards compatibility.


If it's mildly good in most cases as you claim, and catastrophically bad in certain cases as seems clear, then overall it's not a net gain.

This is what continually baffles me about PHP. There are plenty of bad languages out there, but PHP seems pretty much unique in having a community that actively resists any improvement and actively campaigns to keep the broken stuff around.


interesting when you see your comments on hacker news as submissions to hacker news a day later: ( http://news.ycombinator.com/item?id=3825132 )

"...My 5cents on that (recently published issues):

<? if ('9223372036854775807' == '9223372036854775808') { echo "I can not count!\n"; } ?> (see https://bugs.php.net/bug.php?id=54547)

or

built-in PHP web server dies with a large Content-Length header value: The value of the Content-Length header is passed directly to a pemalloc() call in sapi/cli/php_cli_server.c on line 1538. The inline function defined within Zend/zend_alloc.h for malloc() will fail, and will terminate the process with the error message "Out of memory". (see https://bugs.php.net/bug.php?id=61461) Luckily we are getting Javascript ready to replace all PHP on the server sooner or later ;-)..."


You are bragging as if you invented that bug :)


This behavior is documented here: http://php.net/manual/en/language.operators.comparison.php

  If you compare a number with a string or the comparison
  involves numerical strings, then each string is converted
  to a number and the comparison performed numerically.
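
The quoted rule can be modelled roughly as follows. This is a hypothetical sketch in Python, not PHP's actual implementation: the names `NUMERIC`, `to_number` and `loose_equals` are invented here, the definition of "numeric string" is simplified (no leading whitespace, no hex), and the 64-bit integer range is an assumption about the platform.

```python
import re

# Hypothetical model of the documented rule; this is NOT PHP's code.
# "Numeric string" here: optional sign, digits, optional fraction/exponent.
NUMERIC = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)([eE][+-]?\d+)?")
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1  # assumes a 64-bit integer type


def to_number(s):
    """Convert a numeric string to int if it fits in 64 bits, else to float."""
    if re.fullmatch(r"[+-]?\d+", s):
        n = int(s)
        if INT64_MIN <= n <= INT64_MAX:
            return n
    return float(s)  # overflowing integers and decimals become doubles


def loose_equals(a, b):
    """Compare two strings the way the quoted rule describes."""
    if NUMERIC.fullmatch(a) and NUMERIC.fullmatch(b):
        x, y = to_number(a), to_number(b)
        if isinstance(x, float) or isinstance(y, float):
            return float(x) == float(y)  # mixed int/float compares as doubles
        return x == y
    return a == b  # at least one non-numeric string: plain string comparison


print(loose_equals("9223372036854775807", "9223372036854775808"))  # True
print(loose_equals("9223372036854775806", "9223372036854775807"))  # False
print(loose_equals("1.00", "1.0"))                                 # True
```

Under these assumptions the surprise only appears when one side overflows the integer range: two strings that both fit in 64 bits are still compared exactly.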


Ah, an obscure point of absurdity which utterly kills my pending interest in the language. If this sort of thing exists under the hood, revealed only by a detailed analysis of the specification, what other nonsense is there? Going so far as analyzing a string to determine whether it consists entirely of numbers for the non-sequitur process of then and only then converting it to what it isn't for logical evaluation is working pretty hard to do something counter-intuitive; might be tolerable if it actually preserved all digits, but not only does it work hard to convert a string to an integer, it then converts large integers in to floating-point values - not just one, but two layers of explicitly undesired and unnecessary and unreasonable typecasting.

I'm currently working with barcodes: numerical strings from 6 to 55 digits. In no way can I risk having one barcode be evaluated as equal to a literally different barcode just because the symbols in that string just happen to exhibit a passing resemblance to data of a different type.

Again, it's not just that it has loose typing. It's that it's taking what is OBVIOUSLY a string, converting it to an integer, THEN converting it to yet another data type which imposes data loss.

Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew you...

ETA: Oh, I'd love to know the justification for the downvoting.


> Going so far as analyzing a string to determine whether it consists entirely of numbers

I was about to give an outraged reply that, if PHP is like Perl, then it doesn't scan the string afresh, just keeps a flag indicating whether or not it thinks a string is numeric. However, it turns out that's not true at all. `Perl_looks_like_number`, defined in `sv.c`, calls `Perl_grok_number`, defined beginning on l. 577 (as of v5.14.2) in `numeric.c`, which (after some book-keeping) does this:

    if (s == send) {
      return 0;
    } else if (*s == '-') {
      s++;
      numtype = IS_NUMBER_NEG;
    }
    else if (*s == '+')
      s++;

    if (s == send)
      return 0;

    if (isDIGIT(*s)) {
      UV value = *s - '0';
      if (++s < send) {
        int digit = *s - '0';
        if (digit >= 0 && digit <= 9) {
          value = value * 10 + digit;
          if (++s < send) {
            digit = *s - '0';
            if (digit >= 0 && digit <= 9) {
              value = value * 10 + digit;
              if (++s < send) {
                digit = *s - '0';
                if (digit >= 0 && digit <= 9) {
                  value = value * 10 + digit;
and goes on and on and on and on in the same vein. Sheesh! (I didn't forget to close that last brace; the next line is de-dented, but that seems to be a mistake.)


Perl does cache whether an SV contains something usable as an integer (the IOK flag) or a floating point number (the NOK flag). That's why you almost never see `looks_like_number` on its own and always called after using one of the appropriate flag checking macros.


"Intolerable for real-world use. A toy language."

Seems like a silly thing to say. I wouldn't write avionics software with it, but there are a billion websites demonstrating that it's pretty decent for real-world use. At least as good as any other language, I'd guess.


You could use strcmp like you're supposed to I suppose, or ===.


The language could not throw away data on an obscure whim.


It does not throw it away on an obscure whim. It's clearly documented, and well known. === is pretty much the standard.

Your lack of knowledge does not make it obscure.


> It does not throw it away on an obscure whim.

So you could have predicted this yesterday? Just because it's codified somewhere, it doesn't make it clear, or anything other than a whim, or a product of circumstances, at best. That's not how languages should be defined, even if PHP clearly demonstrates that they can end up that way by chance.

Rasmus' lack of foresight does not make it reasonable.


Yes, you could have predicted this yesterday.

In fact, this behavior has been documented explicitly for a year and a half: http://web.archive.org/web/20100808122711/http://www.php.net...

Earlier versions have said the comparison converts the numbers to integers though, which may be incorrect, and misleading if it was. Did PHP not convert float-like strings to floats in, eg, 2009? http://web.archive.org/web/20091024233139/http://www.php.net...


Why, it does indeed say, a link or two down from there, that strings will be coerced to float: http://web.archive.org/web/20091024234517/http://www.php.net...

The point, however, is that you shouldn't pepper your language with operations which have consequences as hard to foresee as this with no good reason, and I really don't think that saving yourself some type conversions here and there would do.


"no good reason" is entirely subjective, though. If your purpose is to make the language simpler to newcomers, implicit conversions everywhere are a great way to get things done. And the popularity of PHP (especially for new-to-programming people) heavily supports that they made the correct decision to work well for that market.

The same kind of logic is used to make `false == ""` true. Or any 'falsy' language. If you want strictly typed behavior, yes, it's stupid to do that. If you don't, then it makes some things simpler, at the expense of more edge cases that are unlikely to happen - note that this bug was reported in 2011, and people are acting like it's a new thing. Because it comes up so rarely that, while it technically exists, many people never encounter it.


> "no good reason" is entirely subjective, though. If your purpose is to make the language simpler to newcomers, implicit conversions everywhere are a great way to get things done. And the popularity of PHP (especially for new-to-programming people) heavily supports that they made the correct decision to work well for that market.

You are right, in a way. Sure, it may attract and retain more newcomers, but that's like saying that tobacco is "teenager friendly". I think it's not beginner friendly at all if you must have years of experience to avoid the innumerable pitfalls which PHP lays for you all over the place, learning, e.g. the range of Integers in PHP, which defines when a string will be either a float or an int, or that you should actually use strcmp.

In Python, Ruby, or heck, Haskell, you'd just have to do == and there would be no surprises.


I agree entirely, but we're thinking like programmers. Grab someone who's never programmed at all and ask them if `123 is equal to "123"`.

This essentially breaks down to the top-down vs bottom-up education style debate. You can learn the gritty details and get caught up in minor details that may not matter in other languages, or learn how to do something, and get tripped up by the details in other languages. Similarly, we could teach kids abstract algebra, or basic +-*/ and then over-simplify when they try to divide by zero.

Neither is ideal, both have useful traits and problems, so we have to pick one. Or try to come up with something radically different.

edit: to ask it another way: if PHP is a massively-popular gateway drug to the world of programming, but it gives some people horrible flashbacks for the rest of their lives, do you want to make it illegal and close the door to a huge number of people?


You've never used floats, have you?


Yes, I have, thank you. How's the health?

Oh, and "everyone knows that PHP lights the upper-rightmost pixel in your screen purple and will crash if there's no screen" would not, in fact, justify such a thing.


> Yes, I have, thank you.

You must hate programming then.

    9223372036854775807.0 == 9223372036854775808


Alright, I'll spell it out for you: the behaviour may be what you'd expect from floating-point comparisons, but it doesn't have to be a floating point comparison in the first place.


No, it doesn't. Language designers make lots of decisions that end up being silly. But they make them. In PHP, if it looks like a number, it will get treated like a number when being compared via ==. It's a simple, well-established, fundamental rule.


Sadly, while '===' is the quick fix, you then have to litter your code with type casting operators if you're comparing numbers, particularly those sourced from, say, a database, where everything is returned as a string. Or from GET or POST data, where everything is a string.

This tripped me up when I was trying to compare two numbers, one of which was the result of a COUNT query via PDO. Of course, that COUNT result was a string.

I suppose if you worked entirely with strings it's alright. Or it wouldn't be so bad if you could make the reasonable assumption that functions returned appropriately typed data.


It does what it's intended to do. Use the methods you are supposed to use. You're just arguing for the sake of arguing now, or you just don't know what you are talking about at all.


> Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew you...

I'm sorry, I would like that to be true, but programmers rarely are half as smart as they think they are. We have many more years of PHP and its resulting insanity ahead of us.


Downvoting because you're railing against a language without understanding it. That, and your C++ comment above, make you sound like the programmer version of internet tough guy. Whatever your real life skills may be, it certainly sounds like a lot of posturing.


Just do the /bin/sh thing and prepend "x" on the strings before comparing them. :)


"Intolerable for real-world use. A toy language. Alas, PHP, we hardly knew you..."

Have you been out on the internet the past decade? Do you have a tendency to make extremities of things and trying to stand the needle on its tip?


It throws away data on an obtuse, obscurely-documented, whim.

Making extremes? My medical application would fail FDA approval in minutes if ported to PHP precisely because of this issue.


All languages have their idiosyncrasies. You can pick out some obscure aspect of any language and say $LANG sucks.

Btw, PHP's behavior doesn't totally make sense to me either. But I'm willing to assume that its users and designers have thought this through and it makes sense for PHP's intended use cases, because I don't know PHP.

Javascript (which most people on HN seem to like) also has similar issues (null vs. undefined, == and === etc). It got so bad that "Javascript: the good parts" had to be written to define a de-facto sane subset of the language. People are actually writing in Coffeescript (in part) to avoid Javascript's pitfalls.

YMMV.


Your medical application should probably be using === for comparisons. I'm not defending PHP's language design, but I think it's pretty well known among professional PHP developers that you should almost always use === and avoid implicit type conversions.


How does one ensure that they do not accidentally use ==?


Got me. How do old-school C programmers ensure they don't accidentally use = when they mean == ?

There are actually some decent commercial PHP IDEs, believe it or not. I wouldn't be surprised if some of them are able to Warn on loose equality comparisons. I don't have much direct experience with them though.


> How do old-school C programmers ensure they don't accidentally use = when they mean == ?

By making the constant the expression's lvalue. But they don't have to do this anymore; gcc warns when you accidentally use = instead of == now.


"obscurely-documented"? That seems pretty clear to me, given that it's a fundamental feature of the language, and documented (floating point problems too) in an obvious location.


Would your medical application fail FDA approval if you used a language like C or C++ that contained strncmp? Because that function throws away data, too.


There's a difference between strncmp existing precisely so you can specify how many characters to compare, vs. throwing away trailing characters in a string just because, by sheer chance, it contains only numerics.


Who ever thought of this? I can understand "don't use == for strings", or implicit conversion when one of the arguments is a number, but this is extremely sneaky as it will only behave this way with two numerically-looking strings. Ouch. Why does it even do that check, wasting cycles beyond a normal string comparison? It looks like an elaborate and cruel trap for novice programmers.

Edit: I know == is not a string comparison. But you'd expect it to fail in a predictable way when passed strings that are not parseable as numbers, instead of trying to fall back on a string comparison so that people get the wrong idea.


PHP gets input from various places, and one might want "1.00" from a form or URL to equal "1.0" read from a cookie or via database adapter that stringifies everything.


If only we had some sort of way of explicitly telling our computers that we wanted to convert a sequence of characters into a number.
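
As a sketch of what "explicitly telling the computer" could look like, here is the "1.00" versus "1.0" case in Python (not PHP), using the standard `decimal` module so that no digits are lost. The helper name `parse_amount` is made up for this example.

```python
from decimal import Decimal, InvalidOperation


def parse_amount(text):
    """Explicitly convert input text to an exact decimal, or raise ValueError."""
    try:
        value = Decimal(text)
    except InvalidOperation:
        raise ValueError(f"not a number: {text!r}") from None
    if not value.is_finite():  # reject "NaN" and "Infinity"
        raise ValueError(f"not a finite number: {text!r}")
    return value


# The caller opts in to numeric comparison; the strings themselves stay distinct.
print("1.00" == "1.0")                              # False
print(parse_amount("1.00") == parse_amount("1.0"))  # True

# Large values keep all their digits, so these remain different.
print(parse_amount("9223372036854775807") == parse_amount("9223372036854775808"))  # False
```

The design choice is that the conversion happens where the programmer asks for it, and input that is not a number fails loudly instead of falling back to another kind of comparison.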


What I don't understand is why all these weak-typed languages don't optionally allow one to strong type a variable.


Weak typing does not require bonkers coercion of types.


That's like saying why doesn't someone create a combination hammer and screwdriver. A proper tool for every job.


And as much as I love python and javascript, I'd really like to use a screwdriver now and then rather than leave hammer marks everywhere, especially if they are being used for something beyond formatting HTML.

This hamdriver you speak of: tell me more.


This thing already exists, it is called "strong typing" (in C++, Go, Java etc).


Yes, that was the joke.


Not defending the bad design decision, but string comparison in PHP is strcmp, not ==.


IMO this conversion should fail if the number represented is not valid, or fall back to arbitrary precision math (GMP library for instance), instead of silently making such a questionable conversion.

I generally avoid exceptions/error_levels in all languages but this is probably a good cause for them, in order to keep the rest backwards compatible.


How do you test that it isn't valid? I think you may be underestimating the difficulty in predicting whether a particular decimal number can be accurately represented as a floating point type. It may be non-obvious, but the representation of precise numbers changes depending on the number base. For example, in base 10, we can't precisely represent 1/3. In base 3, we can (0.1). In base 2, we can't precisely represent 0.1, or 1/10. A simple number such as 0.1 has no precise representation in base 2.
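
The base-dependence described above can be checked mechanically. A fraction in lowest terms has a finite expansion in a given base exactly when every prime factor of its denominator also divides the base. A minimal sketch in Python, with a made-up helper name `terminates`:

```python
from fractions import Fraction
from math import gcd


def terminates(value, base):
    """Return True if the fraction has a finite expansion in the given base."""
    q = Fraction(value).denominator  # Fraction reduces to lowest terms
    g = gcd(q, base)
    while g > 1:
        # Strip every factor the denominator shares with the base.
        q //= g
        g = gcd(q, base)
    return q == 1


print(terminates(Fraction(1, 3), 10))  # False: 0.333... in base 10
print(terminates(Fraction(1, 3), 3))   # True: 0.1 in base 3
print(terminates("0.1", 10))           # True
print(terminates("0.1", 2))            # False: 1/10 has a factor of 5
print(terminates("0.25", 2))           # True: 1/4 is 0.01 in base 2
```

Note that the values are passed as strings or exact fractions; passing the float `0.1` would hand `Fraction` the already-rounded binary value.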

In this case in php, the truncation happens due to loss of precision in the mantissa of the double precision float. But there are so many other ways to lose precision, I don't think it's reasonable to ask a language to attempt to account for them.

This is why languages should have clear rules about when type conversion occurs, and allow the user to prevent it when it isn't desirable.

edit: in fact, amusingly, php seems to be doing some non-standard stuff with its floats. I was going to make a point about how you can't determine if a double is a "correct" representation of a string decimal, but in mocking an example I discovered something odd. Check this out:

This is what one should expect:

$ ruby -e'puts "%5.25f" % 0.1'

0.1000000000000000055511151

$ perl -wle'printf "%5.25f\n", 0.1'

0.1000000000000000055511151

But in php:

$ php -r'printf("%5.25f\n", 0.1);'

0.1000000000000000000000000

$ php -r'printf("%5.25f\n", "0.1");'

0.1000000000000000000000000

Is php changing the type conversion? Or not using double precision at all?
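
For comparison, the same check in Python gives the Ruby/Perl digits, and the hex form shows the stored bit pattern. This only confirms what the nearest 64-bit double to 0.1 looks like; it says nothing about how PHP formats its output internally.

```python
x = 0.1

# Same format string as the Ruby/Perl/PHP one-liners above.
digits_25 = "%5.25f" % x
print(digits_25)  # 0.1000000000000000055511151

# Exact hexadecimal form of the stored double.
hex_form = x.hex()
print(hex_form)   # 0x1.999999999999ap-4
```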


The idea of casting everything to float is just wrong; a string of digits without a dot should be converted to a (big) integer, without any loss of precision. Anyway I just can't fathom how anyone could think weak typing is a good idea; it might make some superficial things "easier", but you'll soon shoot yourself in the foot with it.


Precision in floats is accepted as a fact of life. Converting exceedingly big INT string literals to bigger float types is a hack to win some naive benchmarks against languages doing native proper arbitrary precision. This shouldn't have happened in the first place, but since it's there and backwards compatibility is important, it could be shown in a warning error_level[1] that the conversion happened, so the user could at least check that and hack a solution together.

[1] This doesn't really happen in PHP, but you have $php_errormsg that can be set without stopping execution (as happens with some errors/warnings when error_level is not set to E_STRICT, and below that depending on the error). These errors could be triggered in a new level, let's say "E_PEDANTIC".


You just reiterated my point. This is precisely why your original suggestion of failing an "invalid" conversion is untenable. ALL conversions lack precision -- there is no such thing as an "invalid" conversion.


Nope, there are conventions.

We have two distinct problems here:

- strings converting to numbers without there being any number on any side. "Peculiar" of PHP but easy to circumvent using string comparison. IMO belongs in PHP4 but not at all in PHP5, which is an attempt at a "general-purpose" language. To be frank, I thought PHP4 made more sense because it was 1st of class at what it did, while PHP5 falls short to a number of languages in basically everything.

- automatic integer-to-float comparison to accommodate bigger integers. A horrible hack to squeeze a little extra performance in naive benchmarks in computers with no native 64 bit integer support. This really makes no sense whatsoever now and may have had some partial justification in the early 90s, prior to PHP4 even.

Both ideas are terrible and pretty much unique to PHP of all popular languages.

This is not a philosophical debate about typing styles or the existence of perfect type conversions. PHP's problems in this regard are relics from a dubious past.


> - automatic integer-to-float comparison to accommodate bigger integers. A horrible hack to squeeze a little extra performance in naive benchmarks in computers with no native 64 bit integer support. This really makes no sense whatsoever now and may have had some partial justification in the early 90s, prior to PHP4 even.

No, this is not unique to php. Many popular, comparable languages perform an int -> float conversion. For example, Perl:

$ perl -wle'print "20938410923849012834092834" + 0 if "20938410923849012834092834" == "20938410923849012834092835"'

2.0938410923849e+25

> This is not a philosophical debate about typing styles or the existence of perfect type conversions. PHP's problems in this regard are relics from a dubious past.

Conversion from string -> number, and loose numeric types which auto-convert to float are near universal in loosely typed languages, out of necessity -- if such a scheme doesn't work consistently it can't be used at all. This brings me back to my point. You said "IMO this conversion should fail if the number represented is not valid, or fall back to arbitrary precision math". My response is that you cannot provide such a rule on the basis of "is it valid" because there is no such thing as a "valid" type conversion -- ALL have precision loss. It is inherent in the datatype. When I said "you may be underestimating the difficulty in predicting whether a particular decimal number can be accurately represented as a floating point type" you should perhaps read that as "you cannot do this, it is not possible".

Instead you might suggest that no loose conversion, no loose typing be permitted in a language design -- and I would agree wholeheartedly. But your suggestion that this be handled on a case-by-case basis depending on the numeric value is fundamentally unworkable. Big integers are not the only area this type of problem presents.


Perl5 is old enough so this behaviour has a niche. Possibly even PHP4 is old enough for that. But PHP5 was born when both 64 bit ints and good, open arbitrary precision libraries were available and very fast.

This, now, where it's being used, is absurd. There are no two ways about it. And this doesn't happen elsewhere to this extent.

I will leave you the last word though. Cheerio.


Testing validity should be pretty easy. Remove insignificant zeroes from both ends, then only accept the conversion when it is precisely correct. This can be done by simply converting to double and then back and seeing if you get the same thing. If there's any difference, it's not sufficiently accurate, bail out.


You're missing the point -- there is no such thing as a precisely correct conversion to a floating point number.

Your suggestion would make type conversion utterly unusable as it would fail seemingly randomly -- for example on simple numbers such as "0.1"


Sure there is such a thing. 0.25 can be precisely converted to float. You're right that such a thing would fail a lot, but that doesn't mean the goal is impossible, merely that achieving it is not very useful.
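
To make the "precisely correct" question concrete, here is a minimal Python sketch (Python floats are IEEE 754 doubles). The helper name converts_exactly is made up for illustration; it compares the decimal string and the resulting double as exact fractions, which is stricter than a print-and-compare round trip.

```python
from fractions import Fraction


def converts_exactly(text: str) -> bool:
    """Return True if the decimal string is represented exactly by a double."""
    # Fraction(text) is the exact decimal value; Fraction(float(text)) is the
    # exact value of the double that the string was rounded to.
    return Fraction(text) == Fraction(float(text))


for sample in ["0.25", "0.1", "9223372036854775807", "9223372036854775808"]:
    print(sample, converts_exactly(sample))
```

Under this check "0.25" and "9223372036854775808" (which is 2**63) pass, while "0.1" and "9223372036854775807" do not, so a rule of this kind would reject a lot of everyday input.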


I did not say it was "impossible," I said it would be "utterly unusable."

I am happy to see you agree with me.


What's the low-level cost to determine whether a given string is "numerical" or not? Also, would "001" be considered a numerical string?

This reminds me of Excel-like programs that by default, automatically detect (and convert) fields that appear to be dates/strings...often to catastrophic effect.


You just hit upon by far my least favorite bug in Excel. Any integers around I think 40000, which are quite easy to come upon in various datasets, are automatically "detected" as a date. It makes Excel very dangerous for reading CSV files.


This is why I like programming languages with type systems and "numerical towers":

    Prelude> "9223372036854775807" == "9223372036854775808"
    False
    Prelude> "9223372036854775807" == 9223372036854775808

    <interactive>:1:25:
        No instance for (Num [Char])
          arising from the literal `9223372036854775808'
                   at <interactive>:1:25-43
        Possible fix: add an instance declaration for (Num [Char])
        In the second argument of `(==)', namely `9223372036854775808'
        In the expression: "9223372036854775807" == 9223372036854775808
        In the definition of `it':
        it = "9223372036854775807" == 9223372036854775808
    Prelude>
Yes. Strings are not numbers.


You don't actually need type systems to test equality and identity the right way...


The point here is that these strings are neither equal nor identical.


Yes, it's the implicit conversion that matters. If you try to write 2 == 2.0 in Haskell, it will blow up, because doubles and integers are not the same type. You need to explicitly convert one of them to another representation before you can compare them. That guarantees defined and repeatable semantics at compile time, which I think is excellent.

(This is not strictly required, of course; you can write a typeclass that defines a two-parameter ==, so that instead of a -> a -> Bool it is a -> b -> Bool. But that's dumb, so nobody does.)


Uh? I just tried it and it works without blowing up.

    Prelude> 2 == 2.0
    True


As a working programmer who has to use PHP, I just use === all the time and have long since moved on from even thinking about the insanity of PHP's == operator. Kinda like JavaScript programmers.


This. PHP's "==" is yet another trap of incompetent language design and almost all code that ever used it does the wrong thing for some inputs. $x == $y && $y == $z doesn't even tell you that $x == $z, much less that $a[$x] == $a[$y].


A somewhat related story dealing with MaxInt in Javascript.

One of the worst bugs I've encountered, years ago, involved the conversion of Javascript integers from string to number. Javascript numbers have only 53 bits of integer precision, while most other languages have a 64-bit long int. When the backend language generated Javascript snippets (JSON) containing integers wider than 53 bits, the horror started at the frontend. Javascript happily truncated the int to 53 bits upon conversion from string to number. It was not a happy tale since those long integers were account numbers. The wrong accounts ended up getting updated, seemingly at random at first.


I think the lesson there is that numeric types should only be used for things you actually want to do arithmetic with. An account ID that just happens to be all digits should still be stored and transmitted as a string.
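
A rough Python sketch of that failure mode, assuming a made-up payload and account number (2**53 + 1). Passing parse_int=float to json.loads stands in for a parser that maps every number to a double, the way the frontend in the story did; it is not how any particular backend behaved.

```python
import json

# Hypothetical payload: the same account ID sent as a number and as a string.
payload = '{"account": 9007199254740993, "account_str": "9007199254740993"}'

# Default parsing keeps integers exact (Python ints are arbitrary precision).
exact = json.loads(payload)

# Simulate a double-only parser by converting every integer literal to float.
as_doubles = json.loads(payload, parse_int=float)

print(exact["account"])            # 9007199254740993
print(int(as_doubles["account"]))  # 9007199254740992, off by one
print(as_doubles["account_str"])   # 9007199254740993, the string survives
```

The numeric field silently lands on a neighbouring value, while the string field comes through untouched.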


The lesson I got was to be very careful about data type limitations when crossing a language boundary. The problem is not limited to numeric types. Different encodings and code pages can screw up string values as well.


If you're not using UTF-8 everywhere then you're doing it wrong. Exceptions made for legacy systems, but you should get that data into UTF-8 as soon as possible.


It's unwise to lazily adopt a silver bullet without understanding the context and thinking through the consequence. I can say if you are not using XML with encoding specified to encode everything everywhere, then you are doing it wrong. You should get all your data into XML as soon as possible. Of course it sounds ludicrous.


XML is just one data storage and exchange format among many, with no particularly interesting properties and no compelling reason to use it. UTF-8 is the only encoding that's ASCII compatible, widely accepted/expected, and can represent any text you'll ever encounter.

I can come up with half a dozen reasons to use something other than XML for data storage. I've yet to hear anyone give me a compelling reason to use something other than UTF-8 for encoding strings. Just because what I said is absurd when you replace UTF-8 with XML doesn't mean the original was absurd.


UTF-8 is not efficient for random access.

I don't have a problem with UTF-8. I have a problem with the silver bullet attitude of advocating one approach for all cases without thought. That's just intellectually lazy.


No encoding that can handle all the necessary languages will be efficient for random access.

I'm not saying don't think about it. But once you think about it, I think there's really only one sane conclusion to reach.


Never say never. UTF-32 handles them just fine.
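
For anyone following the random-access argument, a small Python sketch of the byte widths involved. The sample characters are arbitrary picks (ASCII, Latin-1 range, Hangul, emoji) and utf8_width is a made-up helper name.

```python
# One character from each UTF-8 length class: 'a', e-acute, a Hangul
# syllable, and an emoji outside the Basic Multilingual Plane.
samples = ["a", "\u00e9", "\ud55c", "\U0001F600"]


def utf8_width(ch: str) -> int:
    """Number of bytes a single code point occupies in UTF-8."""
    return len(ch.encode("utf-8"))


for ch in samples:
    # UTF-8 uses 1 to 4 bytes per code point; UTF-32 always uses 4.
    print(repr(ch), utf8_width(ch), len(ch.encode("utf-32-le")))
```

With fixed 4-byte units you can index the n-th code point directly; with UTF-8 you have to scan from the start (or keep an index).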


Precomposed versus decomposed accents? Jamo versus precomposed Hangul characters? The Unicode code point is rarely a useful thing to know about on its own, and code which assumes that one code point equals one "character", for whatever definition of a character is in use, is likely to work poorly with UTF-32.
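
A short Python illustration of the precomposed versus decomposed case, using the standard unicodedata module. The variable names are just for the example.

```python
import unicodedata

precomposed = "\u00e9"   # e-acute as a single code point
decomposed = "e\u0301"   # 'e' followed by a combining acute accent

# Same visible character, different code point sequences.
print(precomposed == decomposed)           # False
print(len(precomposed), len(decomposed))   # 1 2

# Normalizing both to the same form (NFC here) makes them compare equal.
normalized = unicodedata.normalize("NFC", decomposed)
print(normalized == precomposed)           # True
```

So even with fixed-width code points, indexing by code point does not give you indexing by user-visible character.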


Some of the comments on the bug report asking for the operation of == to change are misguided. Such a change would break many real-world applications. As I understand it, PHP is casting number-like strings to integers, and this fails because both numbers generated from the cast are above PHP_INT_MAX, so their values are undefined.

This is easily solved by using the type-checking === operator, which exists for that purpose.

I hesitate to say that this is a feature, not a bug, but it is clear that this is documented behavior.


In this perspective, writing `strcmp` everywhere is not boilerplate, but a requirement.

(Such that

    strcmp('9223372036854775807', '9223372036854775808');
returns -1, meaning the strings are not equal.)


No, you just need strict comparison (like you need it in JS and any other language with weak typing):

    php > var_dump('9223372036854775807' == '9223372036854775808');
    bool(true)
    php > var_dump('9223372036854775807' === '9223372036854775808');
    bool(false)


I beg your pardon, but you do not 'need it in JS', if 'it' is referring to using === instead of == to compare strings.

Here is what node.js says:

    > "9223372036854775807" == "9223372036854775808"
    false


Ah but more fun with Javascript:

    >>> "9223372036854775807" == "9223372036854775808"
    false
    >>> 9223372036854775807 == "9223372036854775808"
    true
    >>> 9223372036854775807 == 9223372036854775808
    true

I believe the grandparent post is referring more to "general" use cases than this one. Personally I now default to strict comparison operators both in JS and PHP unless I explicitly want a loose comparison, and end up missing most of these strange vagaries these days.


Not in this case, but it's commonly accepted that == is "broken" in JavaScript, and === should be the equality operator of choice most of the time.

The one case where I sometimes use == is if I want to check for "null" or "undefined". Even then it scares me.


The fact PHP fails at floating point math isn't news (and I swear I've seen this exact bug somewhere, same numbers and everything, somewhere else)...

It's the fact PHP refuses to fix it that is the news.


This sort of thing is an intrinsic property of floating point math. It has limited precision. When the numbers get sufficiently large, that precision is insufficient to distinguish successive integers. That is to say, this is symptomatic of PHP implementing floating point stuff correctly.

This is what ghci (Haskell) says:

  Prelude> 9223372036844775807 == 9223372036844775808
  False
  Prelude> 9223372036844775807.0 == 9223372036844775808.0
  True
This is what Python says:

  >>> 9223372036844775807 == 9223372036844775808
  False
  >>> 9223372036844775807.0 == 9223372036844775808.0
  True
Here is what SBCL (Common Lisp) says:

  * (= 9223372036844775807 9223372036844775808)
  NIL
  * (= 9223372036844775807.0 9223372036844775808.0)
  T
Lua:

  > print(9223372036844775807 == 9223372036844775808)
  true (!!!!!)
  > print(9223372036844775807.0 == 9223372036844775808.0)
  true
Javascript:

  alert(9223372036844775807 == 9223372036844775808)
  true (!!!!!)
  alert(9223372036844775807.0 == 9223372036844775808.0)
  true
Other languages that will also do this[1]: Javascript, Lua. Languages that won't: anything with actual, honest-to-god integers, and not floats or doubles masquerading as them.[2] Languages that actually handle numbers sensibly: Lisp.[3] I'm not familiar with any others that actually treat rational numbers like rational numbers, but I expect there are some. (It's still, of course, impossible to treat real numbers like real numbers, meaning that this sort of thing will also happen there.)

[1] Well, not the string-to-number bit, but whatever.

[2] Except for the niggle that they'll still do this when you're using floating point numbers, because this is what floating point numbers do.

[3] https://en.wikipedia.org/wiki/Numerical_tower


I think you may be missing the point, or maybe it's I who am missing the point. Your three very thorough examples are a good way of showing how (most?) languages handle floating-point arithmetic vs. arbitrary-precision arithmetic.

But it seems to me - and let me stress that I am not a PHP developer and won't be bothered to install PHP on my machine at this time - that PHP is failing to exhibit exactly the behavior your code examples are giving.

Put it another way - type coercion 'run amok' being another thing entirely, you are correct that this bug stems from the fact that PHP is converting these integers to floating-point, and the standard floating point implementations will all behave in this exact way (thus, not a PHP bug.)

However, the issue here is that (again, "most?") languages also provide an easy way to get to arbitrary-precision arithmetic - and indeed, in the three examples you posted, you simply encode in the most natural way (by simply writing them) the two integers and they automatically compare correctly.

My understanding is that this is not the case in PHP, and that is a shame.


I agree that, when we are talking about high-level languages, I prefer ones that will transparently convert integers to bignum when required. I'm just replying to the contention that (paraphrased) "this is a bug in PHP's handling of floating point numbers".


Ah, point taken, sorry about that. :)


Lua internally represents any kind of number as a floating-point one.

http://www.lua.org/pil/2.3.html


To anybody not familiar with PHP, it appears to fail at string comparison.


Appears? It DOES fail at string comparison. They're strings. You can't make it any plainer that they're strings. Sometimes, like with barcode processing, strings contain only numeric characters - they're still strings regardless. If I can't ask if one string literal is equal to another string literal without it making a concerted effort to not just convert both strings to integers (which, BTW, would be fine so long as it preserved all digits) but to then convert it from a non-lossy to a lossy data type - that's TWO unwarranted type conversions - then the axioms of the language are untenable, rendering it useless in the real world and making it little more than a toy.

If you're going to run with PHP's axioms, then the specifications should (1) demand an unlimited-length integer type, and (2) NEVER convert a non-lossy to lossy data type without very good overt reason.

Fail.


Arguably, such an obtuse implicit cast is failure at string comparison.



This is how the strtol function in the standard C library works. Here's a test program:

    #include <stdio.h>
    #include <stdlib.h> /* strtol, strtod */

    /* Convert string to long and return true if successful. */
    int string_long(char *beg, long *num)
        {
        char *end;
        *num = strtol(beg, &end, 10);
        return *beg != '\0' && *end == '\0';
        }

    int main(void)
        {
        char *x_str = "9223372036854775807";
        char *y_str = "9223372036854775808";

        long x;
        long y;

        int x_ok = string_long(x_str, &x);
        int y_ok = string_long(y_str, &y);

        printf("x_str = %s\n", x_str);
        printf("y_str = %s\n", y_str);

        printf("x = %ld (ok=%d)\n", x, x_ok);
        printf("y = %ld (ok=%d)\n", y, y_ok);
        printf("x and y are %s\n", x == y ? "equal" : "not equal");

        return 0;
        }

And here's the output:

    x_str = 9223372036854775807
    y_str = 9223372036854775808
    x = 9223372036854775807 (ok=1)
    y = 9223372036854775807 (ok=1)
    x and y are equal
I compiled it like so:

    gcc -c -Wall -Werror -ansi -O3 -fPIC src/test_num.c -o obj/test_num.o


Incidentally if you try using those numeric constants directly in a C program, it fails to compile:

    #include <stdio.h>

    int main(void)
        {
        long x = 9223372036854775807;
        long y = 9223372036854775808;

        printf("x = %ld\n", x);
        printf("y = %ld\n", y);
        printf("x and y are %s\n", x == y ? "equal" : "not equal");

        return 0;
        }
The error message is:

    gcc -c -Wall -Werror -ansi -O3 -fPIC src/test_num2.c -o obj/test_num2.o
    src/test_num2.c: In function ‘main’:
    src/test_num2.c:6:11: error: integer constant is so large that it is unsigned [-Werror]
    src/test_num2.c:6:2: error: this decimal constant is unsigned only in ISO C90 [-Werror]
    cc1: all warnings being treated as errors


Judging by the huge number of stupid comments on that bug report this bug was posted on reddit.

If you don't actually use PHP (as most of the commenters seem not to) don't comment on the bug, it has nothing to do with you and you are just making noise.


You can do literal comparison using ===

But the problem, as I see it, is that this is very error-prone. A less error-prone solution would be to convert a string value to a number only when the other operand is really a number. For example:

'9223372036854775807' == 9223372036854775808


> A less error-prone solution would be to convert a string value to a number only when the other operand is really a number.

That is, in fact, exactly how JS handles it.


It's the same with JavaScript regardless of whether you use == or ===

9223372036854775807 == 9223372036854775808

true

9223372036854775807 === 9223372036854775808

true


Perhaps those two lines are the same in JS and PHP (I do not have PHP installed so I can't confirm, but I do have node installed):

    > 9223372036854775807 == 9223372036854775808
    true
    > 9223372036854775807 === 9223372036854775808
    true
However, that is not the two lines written about in this post. The following works as one would expect on JS, but (I assume, based on the report) not in PHP:

    > "9223372036854775807" == "9223372036854775808"
    false
    > "9223372036854775807" === "9223372036854775808"
    false


ah, I see. I missed that. I just checked in chrome's console and see false for both when they're strings.


The difference being that the php code sample is trying to compare two strings.


While this comment is beside the point, I'm still curious why this version works the way it does. I may be missing something obvious, but why does JS think those two integers are the same?


Probably due to implicit conversion to floating point format. After conversion, mantissa and exponent are the same, and least significant bits are truncated during conversion.
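
You can see this from Python, whose float is the same IEEE 754 double that JS numbers use. A minimal sketch (math.ulp needs Python 3.9 or later):

```python
import math

# Converting each integer to a double, as JS does for every number literal.
a = float(9223372036854775807)
b = float(9223372036854775808)

print(a == b)        # True: both land on the same double
print(a == 2.0**63)  # True: that double is exactly 2**63
print(math.ulp(a))   # 2048.0: gap to the next representable double
```

Near 2**63 adjacent doubles are 2048 apart, so two integers that differ by 1 have no chance of staying distinct.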


JavaScript doesn't have a fixed integer data type.


Is "javascript does it too" supposed to be a defence?


More of an observation than anything else. I don't actually use PHP but find these recent PHP focused articles are helping me learn a thing or two about languages I do use.


Fair enough.


[deleted]


Your example would return true if just the first character was alike. And no, that's not how it works. PHP converts strings that look like numbers into floats and compares afterwards.
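
A rough Python model of the behaviour as described here, not PHP's actual implementation. The names as_float_if_numeric and loose_equals are invented for the sketch, and Python's float() is only an approximation of what PHP treats as a numeric string.

```python
def as_float_if_numeric(text: str):
    """Return the string as a float if it parses as a number, else None."""
    try:
        return float(text)
    except ValueError:
        return None


def loose_equals(left: str, right: str) -> bool:
    """Compare numerically when both strings look like numbers (assumed rule)."""
    l_num = as_float_if_numeric(left)
    r_num = as_float_if_numeric(right)
    if l_num is not None and r_num is not None:
        return l_num == r_num   # precision may already be lost here
    return left == right        # otherwise a plain string comparison


print(loose_equals("9223372036854775807", "9223372036854775808"))  # True
print("9223372036854775807" == "9223372036854775808")              # False
print(loose_equals("abc", "abd"))                                  # False
```

The first comparison comes out equal only because both strings round to the same double before the comparison happens.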


WHY?

"Hey, this very precise literal has a passing resemblance to another data type, so let's convert it to that, but since it's got too many digits and resembles another data type at this point, let's just throw away some of the data, make the conversion, THEN perform the equality comparison." Yeah, that makes sense. FAIL.


I'm not defending PHP. I'm not saying it makes sense. I'm just saying that's what's happening.



