Formatting floating point numbers (zverovich.net)
170 points by matt_d on June 14, 2019 | 130 comments



Author of Grisu here.

It's really surprising how long it took to find the most efficient algorithms for double conversions. Coonen already published a good algorithm in 1980, but that one was kind of lost.

For a long time Steele & White's algorithm was the state-of-the-art (with some improvements here and there over time).

Now, Ryu is by far the fastest algorithm out there, but that took ages...

Unfortunately, there doesn't seem to be an easy-to-use, complete library for Ryu yet.

The Grisu library (https://github.com/google/double-conversion) is probably still the go-to library if you don't want to implement it again...


I just wanted to say I appreciate the imagery preceding your paper, as I think it captures the struggle of FPP conversion perfectly. As my office's unofficial "FPP expert", I sympathize with the struggle. You've done an excellent job crafting a paper that not only captures your work but also articulates the FPP problem so accessibly. Anybody who cares to consider your paper can gain insight into the true nature of the dragon.


For those who are interested in the paper:

https://dl.acm.org/citation.cfm?id=3192369

"Ryū: fast float-to-string conversion", Ulf Adams 2018

"Section 5 reviews the existing literature. All of the early ap- proaches require arbitrary precision arithmetic in the general case. More recent developments work with xed precision arithmetic, which is faster, but can increase the complexity of the code. By contrast, Ryu ̄ is simple and fast."

Also interesting:

"We did not compare our implementation against the C standard library function printf, as its specification does not include the correctness criteria set forth by Steele and White [15], and, accordingly, neither the glibc nor the MacOS implementation does."


My understanding is that fixed-length conversion to string is more or less trivial; the tricky part is writing a fast algorithm to compute the shortest round-trippable conversion to string, correct?


Depends.

"Correct" conversion, where the length doesn't matter (which could be called "fixed length" for a certain length that is big enough) is easier, because one can avoid a lot of rounding and imprecision issue: using the right technique, just produce enough digits until the imprecisions don't matter anymore.

However, fixed length is as difficult as shortest if the length is limited, since the last digit sometimes lies on a boundary and thus runs into the same difficulties as the shortest representation. Think of it this way: deciding whether 6 digits is enough is often similar to asking whether the 6th digit is a 0 or a 9 (roughly speaking). This means that producing the best fixed-length representation (for length 6) runs into exactly the same question.


What about libfmt? Frankly I find it a little easier to use than double-conversion and it's equal in speed.


Looks like a great library. Not sure I actually knew about it. I will probably update the double-conversion library README to add a link.

Internally it uses the same algorithm, Grisu, (with some additional optimizations) so there shouldn't be any difference in output.

If it's easier to use for you, just go with it!

I'm guessing the double-conversion library is a bit more flexible, but I could be wrong.


libfmt's implementation of Grisu is what the original post (Formatting Floating Point Numbers) concerns, no?


libdouble-conversion is indeed far more flexible. I think that's what leads to all the extra typing! I mean, just including it is almost more typing than I'm willing to undertake ;-)

  #include "third_party/double-conversion/double-conversion/double-conversion.h"


In emacs, Meta-/ might help. :)

> Expands to the most recent, preceding word for which this is a prefix.


MSVC has the std charconv stuff for this, and the other implementers should follow soon enough. That means sooner or later all C++ compilers will probably have Ryu.
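
For reference, a minimal C++17 sketch of that <charconv> interface (the overload without a format argument is specified to produce the shortest string that round-trips back to the same double):

    #include <charconv>
    #include <cstdio>

    int main() {
        char buf[64];
        // No format argument: shortest decimal string that parses back
        // to exactly the same double.
        auto result = std::to_chars(buf, buf + sizeof buf, 0.1);
        *result.ptr = '\0';
        std::printf("%s\n", buf);  // prints "0.1"
    }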


This highlights, for me, one of the reasons why our obsession with "human readable" serialization formats is so misplaced. You're burning up a lot of CPU power here, potentially creating a synchronization point, just to do something that could be addressed with a simple 4 or 8 byte memcpy.


It also creates a lot of misconceptions of how accurate floating point math is.

An interesting aspect of decimal formatting is how frequently it masks representation error (i.e. the encoding of 0.1, etc.): because the error is symmetrical in the formatter/parser, such non-representable fractions appear to be stored perfectly to unsuspecting users.

This can be quite deceptive: if more users were aware of just how many of the simple rational decimals they input are converted into imprecise representations, they probably wouldn't trust computers as much as they do. To confuse things more, when operating on periodic representations the result often matches the representation error of the equivalent accurate decimal value encoded directly (i.e. there were errors, but everything canceled out through formatting); when it occasionally does not (e.g. 0.1 + 0.2), it makes the problem appear all the more elusive.

I think this detail is often lost in explanations of 0.1 + 0.2, that is: representation error is extremely common; 0.1 + 0.2 is merely one of the cases where it both persists through the formatter AND you notice it, because the inputs were short decimals and it's so obvious that the output should be a non-periodic decimal.

TL;DR formatting floats to decimals makes us trust floating point math far more than we should - it's healthy to remember that the formatting process is necessarily imprecise and that you are merely looking at a proxy for the underlying value. Remember that next time you look at _seemingly_ non-periodic decimal output.
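
To make that concrete, here's a small C++ illustration of how the formatter hides the representation error until it doesn't:

    #include <cstdio>

    int main() {
        double a = 0.1, b = 0.2;
        std::printf("%.17g\n", a + b);  // 0.30000000000000004 -- the error surfaces
        std::printf("%.17g\n", 0.3);    // 0.29999999999999999 -- also not exact...
        std::printf("%g\n", 0.3);       // 0.3 -- ...but the default precision hides it
    }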


Or perhaps it makes us mistrust more than we should? How often are we working on problems where the difference between 0.1 and 0.100000000000000006 is of any practical importance?

When I format a float out to 5 decimal places, I'm sort of making a statement that anything beyond that doesn't matter to me.


Small differences can matter.

One that comes to mind for me is the old game Kohan: Immortal Sovereigns. There was a native Linux port for the game that could play online with the Windows machines. However, the pathfinding solution used floating point numbers to make the decisions. A very small difference in the way Linux and Windows rounded the numbers meant that an online game between a Linux and Windows host would de-synchronize after 15-30 minutes because a model that zigged on the Linux host would zag on the Windows host, and the game would detect the difference in state after a bit and kick one of the players out. There was no way to rejoin the game at this point either.

This bug was never fixed, and the company that made the port (Loki) went out of business.


> When I format a float out to 5 decimal places, I'm sort of making a statement that anything beyond that doesn't matter to me.

Yes, it's true the vast majority of the time it doesn't actually matter. However decimal formatting does such a good job at giving us the impression that these errors are merely edge-cases, and calculators automatically formatting to 10sf etc further that illusion. If people are not aware it's only an illusion (or just how far the rabbit hole goes) it can be dangerous when they go on to create or use things where that fact matters.


Fair enough.

One thing along similar lines that I've always hated to see is when the floating-point type in a programming language is called "real".


haha, yes it's a terrible name, such arrogance to suggest fitting infinite things into a very finite thing. In fact they couldn't even call it rational, because even after the precision limitation it's a subset of the rationals (which is where representation error comes from, due to the base)... Inigo Quilez came up with a really interesting way around this limitation where numbers are encoded with a numerator and denominator, which he called "floating bar"; this essentially does not have representation error, but it will likely hit precision errors sooner, or at least in a different way (it's kinda difficult to compare directly).


Yeah, that is more what I'm on about. I can accept that a computer's `int` type is not an infinite set, even if it does cause problems at the boundaries. It feels like more of a little white lie to me.

Whereas, even between their minimum and maximum values, and even subject to their numeric precision, there are still so many rational numbers that an IEEE float can't represent. So it's not even a rational. Nor can it represent a single irrational number, thereby failing to capture one iota of what qualitatively distinguishes the real numbers. . . if Hacker News supported gifs, I'd be inserting one that features Mandy Patinkin right here.


Yeah, I think this is the single aspect that everyone finds unintuitive, everyone can understand it not having infinite precision or magnitude. It's probably a very reasonable design choice if we could just know the reasoning behind it, I assume it will be mostly about practicality of performance and implementing the operators that have to work with the encoding.


>Inigo Quilez came up with a really interesting way around this limitation where numbers are encoded with a numerator and denominator, which he called "floating bar"

Thanks for the read! More fun than the original article to my taste :)

Here's the link, since Googling "floating bar" nets you very different results:

http://www.iquilezles.org/www/articles/floatingbar/floatingb...


I just remember his name, writes lots of interesting stuff :)


I always wonder if he ever says "My name is Inigo Quilez. Prepare to learn!".

My favorite post by him is where he explains that you never need to have trigonometry calls in your 3D engine. Because every now and then you still see an "educational" article in the spirit of "learn trig to spin a cube in 3D!" :/


Yeah, I've noticed in physics programming that in other people's code you can often come across the non-trig ways of doing things that avoid unit vectors and rooting etc. (which are both elegant and efficient)... However, I've never personally come across an explicit explanation or "tutorial" of these targeted at any level of programmer's vector proficiency. Instead I've always discovered them in code, and had to figure out how they work.

I guess the smart people writing a lot of this stuff just assume everyone will derive them as they go. That's why we need more Inigo Quilezes :D to lay it out for us mere mortals and encourage their use more widely.


Common Lisp has had a rational number type since the 1980s at least, presumably inherited from MACSYMA.


Then you should hate integer too.


"integer" as a type is less offensive though, as it only has one intuitive deficiency compared the mathematical definition (finite range). Where as "real" as a type has many deficiencies... it simply does not contain irrational numbers, and it does not contain all rational numbers in 3 respects: range, precision and unrepresentable due to base2, and for integers it can represent a non-contiguous range!


countable vs. uncountable deficiencies...


But I can build a computer that would accept any pair of integers you sent it, add them and return the result. Granted, they'd have to be streamed over a network, in little-endian form, but unbounded integer addition is truly doable. And the restriction is implicit: you can only send it finite integers, and given finite time it will return another finite answer.

You can't say the same about most real numbers, all the ones that we postulate must exist but that have infinite complexity. You can't ever construct a single one of them.


I'm not sure what your point is. It's impossible to ask the computer to store those numbers, so why does it matter whether they can be stored?

Or in other words: It can store all the non-computable numbers you give it. That should suffice.


The way I see it, 'int' as a truncation of 'integer' nicely symbolizes the way the type truncates integers to a fixed number of bits.

Of course any self-respecting modern language should also provide arbitrary-precision integers at least as an option.


It is not a matter of modernity, but a matter of use-case. You don't need arbitrary-length integers for a coffee machine or a fridge. You don't even need FP to handle e.g. temperatures; fixed-point is often more than enough. So if you are making some sort of "portable assembler" for IOT devices, you can safely stick with simple integers.


> I think that illusion can be a bit dangerous when we create things or use things based on that incorrect assumption.

I'd be curious to hear some of the problems programmers have run into from this conceptual discrepancy. We've got probably billions of running instances of web frameworks built atop double precision IEEE 754 to choose from. Are there any obvious examples you know?


Operations that you think of as being associative are not. A simple example is adding small and large numbers together. If you add the small numbers together and then the large one (e.g. sum from smallest to largest), the small parts are better represented in the sum than when summing from largest to smallest. Could happen if you have a series of small interest payments and are applying them to the starting principal.
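
A small C++ sketch of that effect, one large value plus many small ones summed in both orders:

    #include <cstdio>

    int main() {
        double big = 1e16;  // ulp at this magnitude is 2, so a lone +1.0 is rounded away

        double small_first = 0.0;
        for (int i = 0; i < 1000; ++i) small_first += 1.0;  // exact: 1000.0
        small_first += big;                                  // 1e16 + 1000 is representable

        double large_first = big;
        for (int i = 0; i < 1000; ++i) large_first += 1.0;   // each +1.0 is lost to rounding

        std::printf("%.1f\n", small_first);  // 10000000000001000.0
        std::printf("%.1f\n", large_first);  // 10000000000000000.0
    }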


I've worked with large datasets, aggregating millions of numbers, summing, dividing, averaging, etc... and I have tested the orders of operations, trying to force some accumulated error, and I've actually never been able to show any difference in the realm of 6-8 significant digits I looked at.


Here's a common problem that shows up in implementations of games:

    class Time {
        uint32 m_CycleCount;
        float m_CyclesPerSec;
        float m_Time;
        
    public:
        Time() {
            m_CyclesPerSec = CPU_GetCyclesPerSec();
            m_CycleCount = CPU_GetCurCycleCount();
            m_Time = 0.0f;
        }

        float GetTime() { return m_Time; }

        void Update() {
            // note that this is expected to wrap
            // during the lifetime of the game --
            // modular math works correctly in that case
            // as long as Update() is called at least once
            // every 2^32-1 cycles.
            uint32 curCycleCount = CPU_GetCurCycleCount();
            float dt = (curCycleCount - m_CycleCount) / m_CyclesPerSec;
            m_CycleCount = curCycleCount;
            m_Time += dt;
        }
    };

    void GAME_MainLoop() {
       Time t;
       while( !GAME_HasQuit() ) {
           t.Update();
           GAME_step( t.GetTime() );
       }
    }
The problem is that m_Time will become large relative to dt, the longer the game is running. Worse, as your CPU/GPU gets faster and the game's framerate rises, dt becomes smaller. So something that looks completely fine during development (where m_Time stays small and dt is large due to debug builds) turns into a literal time bomb as users play and upgrade their hardware.

At 300fps, time will literally stop advancing after the game has been running for around 8 hours, and in-game things that depend on framerate can become noticeably jittery well before then.


Though inside game logic you should probably default to double, which would easily avoid this problem.


If I'm going to use a 64 bit type for time I'd probably just use int64 microseconds, have over 250,000 years of uptime before overflowing, and not have to worry about the precision changing the longer the game is active.


So using fixed point. You could do that, but you can't make every single time-using variable fixed point without a lot of unnecessary work. Without sufficient care you end up with less precision than floating point. If you don't want to spend a ton of upfront time on carefully optimizing every variable just to avoid wasting 10 exponent bits, default to double.


> in the realm of 6-8 significant digits I looked at

That is far inside the range of 64-bit double precision. Whether error propagates up to that range of significance depends on the math, but I doubt the aggregation you are describing would cause it... provided nothing silly happens to subtotals, like intermediate rounding to precision (you'd be surprised).

Something like the compounding the parent was describing is far more prone to significant error propagation.


I've seen rounding errors on something as simple as adding the taxes calculated per line item vs calculating a tax on a subtotal. This was flagged as incorrect downstream where taxes were calculated the other way.

In a real-life transaction where pennies are not exchanged this could mean a difference of a nickel on a $20 purchase which isn't a meaningful difference but certainly not insignificant.


How much was the difference? Was there any rounding involved at any step? When dealing with money, I see rounding and integer math all the time. As another comment has mentioned, within 53 bits of mantissa, the number range is so big, we are talking 16 digits. I'd be curious to see a real-world example where the float math is the source of error, as opposed to some other bug.


It doesn't take much imagination to construct an example. Knowing 0.1 isn't exactly represented, make a formula with it that should come out to exactly half a cent. Depending on fp precision it will be slightly above or below half a cent, and rounding will not work as expected. We found this in prod at a rate of hundreds to thousands of times per day; it only takes volume to surface unlikely errors.


I thought I was clear that the result was rounded. The fp imprecision affects the correctness of rounding e.g. half to nearest even.


It is the rounding that seems to be an issue here.


    x = 1e-20 + 1e20 - 1e20
    y = 1e20 - 1e20 + 1e-20
    assert x == y


That's like the first rule of floating point numbers though: don't do that


The person I replied to claims to have looked at large* data sets and never seen a discrepancy in 6-8 significant digits. I thought I'd show them a small data set with 3 samples that retains no significant digits.

* Never mind that "millions" isn't large by current standards...


But you are observing all the digits, not just 6-8. It's implicit in the semantics of the operation, and that's something everyone who works with floating point should know.


I think I see where you're hung up. If I change it to:

    x = 1e20 + 1 - 1e20
    y = 1e20 - 1e20 + 1
    assert x == y
There is only 1 digit, and it's wrong. You don't even need 6-8. I probably should've used this as my example in the first place.


You're making the same mistake but now it's less obvious because of the change of scale. When you compare floating point numbers, simple == is not usually what you want; you need to compare them with a tolerance. Choosing the tolerance can be difficult, but in general when working with small numbers you need a small value and with large numbers you need a large value. This dataset involves datapoint(s) at 1e20; at that magnitude, whatever you're measuring, the error in your measurements is going to be way more than 1, so a choice of tolerance ≤ 1 is a mistake.


Ugh, you're preaching to the choir. I wasn't trying to make a point about the equality operator, I was trying to make a point about x and y being completely different. I must be really bad at communicating with people.


That's the thing: x and y are not "completely different"; they are within the tolerance one would use to compare them correctly.


So umm, more than a "difference in the realm of 6-8 significant digits"?


That construction can turn a residual any number of digits out into a first-digit difference. It wouldn't matter without making comparisons with a tolerance way under the noise floor of the dataset. But yes, you have technically invented a situation that differs from that real-world anecdote in regard to that property, in an extremely literal interpretation.


And here I was, worried I might be tedious or pedantic, trying to argue that floating point is just not that simple. You've really outdone me in that regard.


JavaScript is precise for 2^53 range. It's unlikely that you're operating with numbers outside of that range if you're dealing with real life things, so for most practical purposes doubles are enough.


> JavaScript is precise for 2^53 range

What does this mean to you? It's very easy to get horrible rounding error with real-life sized things. For instance

    document.writeln(1.0 % 0.2);
The right answer is 0.0, and the most it can be wrong is 0.2. It's nearly as wrong as possible. These are real-life sized numbers.

btw: I think IEEE-754 is really great, but it's also important to understand your tools.
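
The same thing as a standalone C++ snippet, for anyone who wants to poke at it outside a browser:

    #include <cmath>
    #include <cstdio>

    int main() {
        // The nearest double to 0.2 is slightly larger than 0.2, so 1.0 falls
        // just short of five multiples of it and the remainder is almost a
        // full divisor rather than 0.
        std::printf("%.17g\n", std::fmod(1.0, 0.2));  // 0.19999999999999996
    }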


> document.writeln(1.0 % 0.2);

> The right answer is 0.0, and the most it can be wrong is 0.2. It's nearly as wrong as possible.

Just to clarify for others, you're implicitly contriving that to mean: you care about the error being positive. The numerical error in 0.1 % 0.2 is actually fairly ordinarily tiny (on the order of x10^-17), but using modulo may create sensitivity to these tiny errors by introducing discontinuity where it matters.


Call it a contrivance if you want, my point was just that you can get very surprising results even for "real life" sized numbers.

Not sure why you changed it from "1.0 % 0.2" to "0.1 % 0.2". The error on the one I showed was near 0.2, not 1e-17. Did I miss your point?


I mistakenly used 0.1 instead of 1.0, but the _numerical_ error is still on the order of 10^-17; the modulo is further introducing a discontinuity that creates sensitivity to that tiny numerical error. Whether that is a problem depends on what you are doing with the result... 0.19999999999999996 is very close to 0 as far as modulo is concerned.

I'm not arguing against you, just clarifying the difference between propagation of error into significant numerical error through something like compounding, and being sensitive to very tiny errors by depending on discontinuities such as those introduced by modulo.


I'm talking about integer numbers. 2^53 = 9007199254740992. You can do any arithmetic operations with any integer number from -9007199254740992 to 9007199254740992 and results will be correct. E.g. 9007199254740991 + 1 = 9007199254740992. But outside of that range there will be errors, e.g. 9007199254740992 + 1 = 9007199254740992
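
For reference, a small C++ sketch of where that contiguous-integer range ends:

    #include <cstdio>

    int main() {
        double limit = 9007199254740992.0;   // 2^53
        std::printf("%.0f\n", limit - 1.0);  // 9007199254740991 -- still exact
        std::printf("%.0f\n", limit + 1.0);  // 9007199254740992 -- the +1 is lost
        std::printf("%.0f\n", limit + 2.0);  // 9007199254740994 -- representable again
    }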


You are describing only one of the types of numerical error that can occur, and it is not commonly a problem: this is only an edge case that occurs at the significand limit, where the exponent alone must be used to approximate larger magnitudes, at which point integers become non-contiguous.

The types of errors being discussed by others are all in the realm of non-integer rationals, where limitations in either precision or representation introduce error that then compounds in operations no matter the order of magnitude... and btw _real_ life tends to contain _real_ numbers, which commonly includes rationals in use of IEEE 754.


I didn't see anything in your message above indicating you were restricting yourself to integers.


Actually this is a source of many issues where a 64-bit int, say a DB autoinc id, can't be exactly represented in a JS number. Not an in-real-life value, but still a practical concern.


I spent a day debugging a problem this created. Without going into irrelevant domain details, we had a set of objects, each of which has numeric properties A and B. The formal specification says objects are categorized in a certain way iff the sum of A and B is less than or equal to 0.3.

The root problem, of course, was that 0.2 + 0.1 <= 0.3 isn't actually true in floating point arithmetic.

It wasn't immediately obvious where the problem was since there were a lot of moving parts in play, and it did not immediately occur to us to doubt the seemingly simple arithmetic.


I can't show you, but my job involves writing a lot of numerical grading code (as in code that grades calculated student answers in a number of different ways). I've had the pleasure of seeing many other systems' pretty horrible attempts at this, both from the outside and in; in both cases numerical errors rooted in floating point math abound. To give an easy example, a common numerical property required for building formatters and graders of various aspects (scientific notation, significant figures etc.) is the base 10 floored order of magnitude. The most common way of obtaining this is numerically, using logarithms, but this has a number of unavoidable edge cases where it fails due to floating point error - resulting in a myriad of grading errors and formatting that is off by one sf.

These are an easy target to find issues that _matter_ because users are essentially fuzzing the inputs, so they are bound to find an error if it exists, and they will also care when they do!

When these oversights become a problem is very context sensitive. I suppose mine is quite biased.
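
A sketch of the kind of edge case meant above (not the actual grading code, just an illustration; the function name is made up): the naive floor(log10(x)) can land on the wrong side of an exact power of ten because of rounding in log10, so a more robust version checks the estimate and nudges it.

    #include <cmath>

    // Floored base-10 order of magnitude. The log10-based estimate can be
    // off by one near exact powers of ten, so it is verified against the
    // corresponding power and adjusted. Sketch only: assumes x != 0 and a
    // moderate exponent so that std::pow(10.0, e) is itself accurate.
    int floored_order_of_magnitude(double x) {
        double ax = std::fabs(x);
        int e = static_cast<int>(std::floor(std::log10(ax)));
        if (ax < std::pow(10.0, e)) --e;            // estimate was one too high
        else if (ax >= std::pow(10.0, e + 1)) ++e;  // estimate was one too low
        return e;
    }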


Web frameworks usually don't need floats at all. Developers working on other things encounter these problems all the time.

https://randomascii.wordpress.com/category/floating-point/

https://stackoverflow.com/questions/46028336/inconsistent-be...


You'd be surprised. Decades ago I had to convert mortgage software that was using 32-bit floating point to fixed point arithmetic because they discovered that the cumulative errors from computing payments over 30+ year mortgages could lead to non-trivial inconsistencies.

Don't knock the impact of cumulative loss of precision.


It can in mathematical modelling


What is the relative error in 0.1? What is the relative error in 0.2? And what is the relative error in the sum? That's how one thinks seriously about floats.


That's a really succinct way of putting it. I would add a less catchy companion to highlight the effect of the formatter:

What is the relative error in 0.1? What is the relative error in 0.3? And what is the relative error in the sum? And what's the relative error in 0.4?

(i.e if they are the same it will be obscured by the formatter).


The most precise measurement I know of is the frequency of a laser-cooled Rb 87 standard 6,834,682,610.904333 which is handled with full precision by 64-bit floating point.

If you think this isn't precise enough for you, maybe you don't really understand your precision needs.


Or you aren’t doing something physical. For example there are tons of things in math that can use as much precision as you want. For a toy example, looking at rates of convergence or divergence in extremely small regions of the Mandelbrot set. There are techniques that cut down the requirement for precision for that problem, but they are necessary because the default level of precision is insufficient.


> if more users were aware of just how many of the simple rational decimals they input were converted into imprecise representations they probably wouldn't trust computers as much as they do

I disagree. People overestimate the accuracy of decimal encoding so much more than they ever overestimate floating point. Then when they see 0.1 + 0.2 they tend to learn entirely the wrong lesson, and start underestimating the accuracy of floating point alone.

> TL;DR formatting floats to decimals makes us trust floating point math far more than we should

The only reason a decimal encoding can cause too much trust is because we trust decimal too much. Decimal fails in exactly the same ways.


I wonder about the terminology of calling these "errors". That implies that there's a mistake, when really floats are among the most accurate ways to represent arbitrary real numbers in a finite number of bits.

Another way to phrase it is that a single float value represents a range of real numbers, rather than a single real number. So 0.1 stored as a double precision float really represents the range of real numbers from roughly 0.09999999999999999862 to 0.10000000000000001249.
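
One way to see those neighbouring values, as a quick C++ sketch:

    #include <cmath>
    #include <cstdio>

    int main() {
        double x = 0.1;
        // The doubles adjacent to the stored value of 0.1; every real number
        // closer to x than to either neighbour ends up stored as x.
        std::printf("%.20g\n", std::nextafter(x, 0.0));  // just below
        std::printf("%.20g\n", x);
        std::printf("%.20g\n", std::nextafter(x, 1.0));  // just above
    }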


> Another way to phrase it is that a single float value represents a range of real numbers, rather than a single real number. So 0.1 stored as a double precision float really represents the range of real numbers from roughly 0.09999999999999999862 to 0.10000000000000001249.

But the range is different for every number; it's not a constant error like precision epsilon. I think "representation error" is a reasonable name, as it's used when describing the error in converting representations between base 10 and base 2.

> I wonder about the terminology of calling these "errors". That implies that there's a mistake, when really floats are among the most accurate ways to represent arbitrary real numbers in a finite number of bits.

If you only care about the irrational portion of real numbers, maybe, but for rationals this is definitely not true: you could use a format based on fractions which, unlike IEEE 754, would contain no representation error compared to base 10 decimals - in fact it would even allow you to represent rationals that base 10 decimals could not, such as 1/3. Inigo Quilez came up with one such format, "floating bar" [1]. There are advantages and disadvantages to such a format (e.g. performance). But in terms of numerical error I doubt IEEE 754 is best, and for representation error it is definitely not (and I say that as someone who likes the IEEE 754 design O_o).

[1] http://www.iquilezles.org/www/articles/floatingbar/floatingb...


Hm, yes, perhaps "error" is still a good word for it.

I'm not sure I understand what you mean by "holes", the idea is that all the real numbers between roughly 0.09999999999999999862 and 0.10000000000000001249 are represented by the same double precision floating point value as for 0.1.

Thinking about it in terms of ranges helps to understand why 0.1 + 0.2 goes "wrong", as the middle of the range of real numbers represented by a double precision floating point value 0.1 is slightly higher than 0.1.


Sorry, I edited that away as I realised it was not strictly relevant to what you meant... I was highlighting how there are values in your range that _can_ be represented exactly, e.g. 0.125 - like I said, not really relevant.


What you should be saying is not "can be represented exactly" but "can be represented identically in decimal"

There is nothing inherently less accurate about binary floating-point representation than decimal. Some binary numbers can be identical to their decimal counterparts, while others aren't. This is OK, as we should know what amount of precision is required in our answer and round to that.


> What you should be saying is not "can be represented exactly" but "can be represented identically in decimal"

No, exactly: if the decimal is non-periodic, such as 0.1, it is exact - it exactly represents 1/10, but base 2 cannot represent 1/10 exactly.

> There is nothing inherently less accurate about binary floating-point representation than decimal.

Yes there is, and this is the part that most people do not intuit; this was the entire point I was making about the deceptiveness of formatting masking representation error of the fractional part in my original comment... we are not talking merely about precision, but about representation error, which is dependent on the base:

       base10  base3  base2
 1/10  yes     no     no
 1/3   no      yes    no
 1/2   yes     no     yes
For the first decimal place in base 10 (i.e. 0 through 9 over a denominator of 10) you will find only 1 out of 10 possible fractional parts can be represented exactly in IEEE 754 binary. IEEE 754 actually specifies a decimal format, not that it's ever used, but if you were to implement it you would see these discrepancies between binary and decimal using the same format at any precision, by noticing a periodic significand in the encoding when the source representation is non-periodic.

This is not a deficiency of IEEE 754 per se, but of the entire concept of a decimal point in any base, which makes it impossible to finitely represent all rational numbers, kind of making them pseudo-irrational numbers as a side effect of representation... the solution is to use fractions, of course.


At least for me, that's the most interesting part of it. (0.125 isn't in the range I mentioned above though)


Ok, this is getting a bit meta-recursive: I made a mistake in the explanation of my mistake that I attempted to edit away before you replied to it. Anyway, I was talking about the range above your range, yo dawg :D...

The point was that if you take a limited-precision range of rationals in base 10, e.g. 0.10000 to 0.20000, and convert them to base 2, there are "holes" of non-representable numbers in the range. These holes are of different sizes (one of which is the range you are talking about), so I summarized it as that.


I'm also not convinced they help human readability that much.

e.g., when I've done my own binary formats, I typically also create a "dump" utility that converts the file to a readable text dump. I find the ergonomics of this are just fine for my purposes.

  less somefile.json
isn't that much easier to type than

  dumpdat somefile.dat | less

That said, different purposes are different. I'm not so worried about human-readability for short-lived stuff, but, if you're talking about data that may be sitting around for decades, then the human readability question becomes something you might better characterize as "future comprehensibility": If someone's trying to dust off 50-year-old business data from deep in the archives, they're going to have a much higher chance of success if it's CSV files than if it's in some custom binary format whose documentation was lost 40 years ago.


The thing is, "human readable" files aren't, once they get to a nontrivial size. Picking apart a megabyte XML file in a text editor is arguably harder than the same data as a blob in a hex editor.

In fact, thinking about it now, I bet Wireshark would be awesome for that.


??? Even integers are difficult to read in a hex editor thanks to little-endianness...


Eh, once you're used to reading integers in hexadecimal, byte order isn't that important. If I told you that 1234 was going to be written right to left, you'd know how big 4321 was, right?


“Something in the middle” could be an option. E.g. (in C):

  > printf ("%a\n", M_PI);
  0x1.921fb54442d18p+1

  > printf ("%a\n", 42.0);
  0x1.5p+5


Hear, hear!

Sometimes you do need to reliably communicate a floating-point number across a text channel. Hex float is low-risk in the sense that both the encoding and decoding process are much simpler and therefore easier to verify than something that round-trips through decimal. Yes, there are some correct implementations of the decimal<->binary algorithms that round-trip correctly. But there also remain a number of incorrect implementations.


To be clear: human readable formats make sense when you have humans reading the data. What is insane is having machines read and write human readable formats to talk to each other.


This should be a lot higher. Hexfloats are not just a C/C++ thing, plenty of languages and software packages support them by now. And they can very much be used as part of a text input format, to avoid the sizeable overhead of "human-friendly" float parsing. If you maintain a text-based format that has any chance of being a CPU/compute bottleneck in this way, you should definitely make sure that hexfloats are supported!

Edit: someone else mentioned that roundtrip accuracy could also be an issue. Theoretically true, I guess, but really, if you want to use decimal-based float input you should just use an algorithm that roundtrips correctly, and eat that overhead. Anything else has a potential to impact reproducibility of results, etc. It's not worth it silently corrupting your data just for that saving in compute cost.


The "obsession" came about from all the problems with binary formats. Byte ordering. Unspecified signedness. Memcpy:ing structs with padding. Programmers micro-optimizing by choosing the smallest possible field size, and then you can't change it when it turns out to be too small. Format specs using WORD as if that's a well-defined size. Using int, long, etc, and assuming they are the same size everywhere.

It's a trade-off of CPU usage versus flexibility. It's not surprising that textual serialization formats took off together with managed memory languages.


Rubbish. It's possible to have poorly defined text formats just as easily as binary formats. Remind me how to read CSV again? How do I store 64-bit integers in JSON?

Text formats are popular because you don't have to use a special tool to view them. That's the only reason.


> Text formats are popular because you don't have to use a special tool to view them. That's the only reason.

But you need a special tool to get a computer to read them. Seems like not a win. Throw in compression & encryption... and suddenly you need a special tool anyway.


Byte ordering issues are easily addressed by still emitting text, but not converting to a base that differs in a prime factor. Just emit the mantissa and exponent in - say - hexadecimal form and the negative sign bit as a `-`. It avoids all the endianness issues and is trivially converted.


Fortunately with text you can't have issues with byte ordering, unspecified signedness, or padding... oh wait. ;-)


>You're burning up a lot of CPU power here

This is easy to write off as micro-optimizing, but it's no joke. I recently profiled some hard real-time software and was surprised to find that >50% of busy processor time was spent somewhere in the fcvt() family of functions.

It's fine for the use case in the medium term (we're within deadline and not starving for more cycles) so I'm avoiding the serialization format transition headache, but it did send me down the rabbit hole of floating point conversion techniques for half a day.


A few years ago I wrote a naive Java string-to-floating point conversion routine that only converted 'simple' numbers, i.e. no scientific notation, only one (fixed) decimal separator, etc. It was the first Java code I had written in 10 years and the approach was probably not very Java-esque (I imitated what I'd do in C++, using (IIRC) java.nio.ByteBuffer or something similar). Anyway this simple implementation was ~10 times faster than the 'officially' suggested way, using formatted string conversion routines, even after asking around (on SO and similar) on how to do high performance string-to-float conversion.

There are probably similar issues in other programming languages, this is not to rag on Java; my point is that when you don't need the 'fancy' floating point formatting stuff, and you need high performance (in my case, I had to convert many billions of floating-points-as-strings and the conversion took ~50% of my program's runtime, which was measured in days), it can pay off to write a custom version of something as seemingly mundane as converting a string to a number.
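
For the curious, here's a C++ sketch of roughly that kind of restricted parser (optional sign, digits, a single '.', no exponent); the function name is just illustrative. Note it is not correctly rounded for every input - the digits are accumulated into an integer and divided once - which is the usual trade-off such special-purpose converters make:

    #include <cstdint>

    // Parses strings like "-123.456". Assumes well-formed input with at
    // most ~18 digits total and at most 8 fractional digits; anything
    // fancier should fall back to a full-featured routine.
    double parse_simple(const char* s) {
        bool neg = (*s == '-');
        if (neg || *s == '+') ++s;
        std::uint64_t digits = 0;
        int frac = 0;
        bool after_point = false;
        for (; *s; ++s) {
            if (*s == '.') { after_point = true; continue; }
            digits = digits * 10 + static_cast<std::uint64_t>(*s - '0');
            if (after_point) ++frac;
        }
        static const double pow10[] = {1e0, 1e1, 1e2, 1e3, 1e4,
                                       1e5, 1e6, 1e7, 1e8};
        double value = static_cast<double>(digits) / pow10[frac];
        return neg ? -value : value;
    }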


The biggest win, in my experience, is simply recognizing when you don’t need all ~16 significant digits provided by the standard implementations. If you’re only printing e.g. six digits, and can afford to round in the wrong direction 0.00000001% of the time, you can skip a ton of work. And as you mention, it gets even easier when you don’t have to support scientific notation.

The standard library functions occupy an awkward middle ground between this type of much faster, slightly sloppy float-to-string converter, and direct storage of the binary representation; in most applications where float serialization performance is actually relevant, at least one of these two alternatives is better.


Agreed.

I deal with financial data which is in well-defined formats for the various asset classes. You can save a lot of time by just writing some parsing functions to deal with your exact needs and avoid needless work. And, hopefully avoid using the built-in functions for conversions as those will almost always be A LOT slower due to what they're required to handle.

I prefer to store that type of data in csv files because there are times when you need to read through the data to find errors, do ad hoc analysis, etc. If you're loading those types of files very frequently, there are better formats of course. But when the parsing time is peanuts compared to the runtime of your program, it's nice to be able to load the file into other programs when you need to, without having to convert it first.


> that only converted 'simple' numbers, i.e. no scientific notation, only one (fixed) decimal separator, etc. [...] Anyway this simple implementation was ~10 times faster

What about numbers with a large order of magnitude? e.g 1e234. Is making very long strings still faster than switching to e notation?


I don't know - I knew that my input data wouldn't have to deal with such cases so I didn't have to deal with / check for it. This was part of 'etc' catchall :)


I guess an alternative is _always_ using e notation but not bothering to move the decimal, i.e. effectively the same as the encoding but in base 10. That would be both simple and have a small maximum string length.

Users probably wouldn't like it though :P damn users.


I had the same experience the other way. I halved the runtime of an ETL program by writing a custom float to string function.


> that could be addressed with a simple 4 or 8 byte memcpy.

This won't account for endianness.


In a serialization format you should explicitly define the endianness to be used and then convert to and from it explicitly in the serialize/deserialize code. So memcpy is wrong but the grandparent's point still stands. Defining a serialization format for float/double that's just 4/8 bytes in little-endian might be a better idea than the mess we sometimes have right now, because IEEE 754 is more of a standard than anything else in the space.
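
A minimal sketch of what such a serialize/deserialize pair might look like, fixing little-endian on the wire independent of host byte order (function names are just for illustration):

    #include <cstdint>
    #include <cstring>

    void write_f64_le(double value, unsigned char out[8]) {
        std::uint64_t bits;
        std::memcpy(&bits, &value, sizeof bits);  // IEEE 754 bit pattern
        for (int i = 0; i < 8; ++i)
            out[i] = static_cast<unsigned char>(bits >> (8 * i));
    }

    double read_f64_le(const unsigned char in[8]) {
        std::uint64_t bits = 0;
        for (int i = 0; i < 8; ++i)
            bits |= static_cast<std::uint64_t>(in[i]) << (8 * i);
        double value;
        std::memcpy(&value, &bits, sizeof value);
        return value;
    }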


Floating point formatting deals only with the default endianness of the hardware though. Afaik if you need to deal with that you have to do it yourself regardless (with bswap functions or similar).


Fortunately much of the world has already settled on an endianness, and the one that happens to be most logical to process (byte offset n has weight 256^n, much like bit x has weight 2^x.)

In the future, hopefully big-endian machines will be as uncommon as bytes that are not 8 bits, and the only protocols still using it are the ones which were designed long ago.


Indeed, "human readable" really means "in something similar to English" much of the time, and is a vague and quite subjective term; would someone who knows only English think Chinese is not "human readable"?

Personally, I consider binary formats to be just as readable (from the hexdump, or sometimes even the ASCII directly) and in some ways even easier to parse without ambiguity --- it just requires a bit (pun intended) of learning, like any language. I've worked with someone who could "read" TCP/IP; and I can read much of Z80 and some x86 Asm as well as a few other binary formats.


Sometimes I wish one could configure REPLs, for example, to print floats not as the shortest input which maps to that float ("0.1"), but the actual number represented by the float ("0.1000000000000000055511151231257827021181583404541015625") by default.

Somewhat annoying, but might avoid a lot of confusion ("Is floating point math broken?", https://stackoverflow.com/questions/588004/is-floating-point... ).
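
You can get close to that today with printf, at least where printf is correctly rounded (glibc is); the exact decimal expansion of the double nearest 0.1 has 55 fractional digits, so asking for all of them prints the stored value in full:

    #include <cstdio>

    int main() {
        std::printf("%.55f\n", 0.1);
        // 0.1000000000000000055511151231257827021181583404541015625
    }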


Would you like 1e-100 to print as

  1.00000000000000001999189980260288361964776078853415942018260300593659569925554346761767628861329298958274607481091185079852827053974965402226843604196126360835628314127871794272492894246908066589163059300043457860230145025079449986855914338755579873208034769049845635890960693359375e-100?


Nice example! Makes sense that it can get quite long, though it was not present in my mind. But, hey, that's what the number represented actually is.

As I said, make it optional...


IMO hex should be the default representation of binary fp. Printing hex gives the exact value, without giving the illusion of having precision in the hundreds of digits, and can be done way faster than conversion to decimal.

  >>> 0.1.hex()
  0x1.999999999999ap-4
  >>> 1e-100.hex()
  0x1.bff2ee48e0530p-333


We had such floating point printing performance issues in Java once that I needed to implement a Grisu variant that didn't fall into bignum calculations and had a few other tweaks to short-circuit the printing and improve some common cases. It made a decent impact, especially on garbage generation (it was specifically zero-GC). It has since been ported to a couple of other languages, and I was thinking of redoing it lately for another project and using the newer algo. https://github.com/jnordwick/zerog-grisu


To me the most complicated thing about formatting and parsing floating point numbers is usually whether they use a decimal comma or a decimal dot. It never seems to be what you expect when using a mix of operating systems in different languages.


As someone from a country which uses commas as a decimal separator (Norway)... Just use a dot. If you're not going to make a _serious_ commitment to actually make your software work well in all the various locales, use a dot. People will understand it regardless.

You're probably going to do something stupid like showing the user a comma-separated list of numbers at some point, which will be needlessly hard to parse for a human when your numbers use a comma as a decimal separator. You (or someone else, or your users if they are technical) will probably at some point make something which tries to parse some output, and that will break if you switch between points and commas arbitrarily. Your users will want to copy a number your software prints and paste it into a calculator or REPL or something, and that probably doesn't work with comma as a decimal separator.

Half-assed "localization" from people who don't know anything about how other countries work is just needlessly annoying to be subjected to.

That's at least my perspective as a Norwegian who experiences a lot of _bad_ localization even though I know English fairly well and configure all my computing devices to use English. The perspective of someone from a country where English is less well known might be different.

<rant>

Examples of horrible localization from clueless American companies or organizations include:

* A lot of software will use your IP address to determine your language. That's annoying when I'm in Norway and want my computers to use English, but it's horrible when abroad. No, Google, I don't want French text just because I'm staying in France for a bit.

* Software will translate error messages, but not provide an error code. All information about error messages online is in English on stackoverflow or whatever. If Debian prints an error message in Norwegian, there's absolutely no information about the error anywhere on the web.

* There was a trend for a while where websites would tick the "localization" checkbox by adding a Google Translate widget, so English websites would automatically translate themselves into completely broken Norwegian. That would've been useless if I didn't know English, and it's even worse considering I already know the source language just as well as Norwegian. Luckily, most websites seem to have stopped doing that.

</rant>


Adding to this: ALWAYS use spaces as thousand-separators.

Resist the temptation to use commas or dots as thousand-separators. Seeing a number with a dot as a decimal separator instead of a comma will be fine for most people (even if proper localisation would mean using a comma), but if you throw in commas that mean something else you WILL confuse people. And I imagine the inverse is also true.


A space between numbers makes two separate numbers.


I would love to hear why that's wrong, when we use whitespace to separate pretty much everything in textual representation.


Well, SI did standardize the thousands separator as a space, so there is at least precedent for using spaced numbers there.

I personally don't like it though, and tend to prefer either the Swiss system (i.e. 3'800'000.0) or the maritime system (i.e. 3_800_000.0) if separators must be used.


Yeah, I feel your pain. I'm Swedish living in Hong Kong; I don't want my webpages or programs translated into either Swedish or Chinese, give me English please! And as for the float parsing part, I have had to fix many bugs through the years where floating point numbers were stringified on a Swedish computer and then read back into floats on an English computer, or vice versa. And sometimes a mix of the two where human input is involved. Easy to fix, but still a common problem.


Life is so much easier when you use decimal to represent your numbers [http://speleotrove.com/decimal/IEEE-cowlishaw-arith16.pdf]


Can this formatting round-trip all double floats? The article doesn't say.


Yes.

These algorithms (Grisu, Ryu, ...) all satisfy the "internal identity requirement", which means that they can print a double and read it back in to the same double.

The harder part is to also produce the shortest of all possible string-representations. (And then picking the closest if there are many).


These are two separate problems. This is talking about printing floating point numbers, and does not talk about parsing algorithms at all.

When considering the correctness of a floating point decimal printing algorithm, you could use "round trips with Y parsing algorithm", but that's flawed -- it ties the printing algorithm directly to a particular parsing algorithm.

Instead what is usually considered is "what is the closest representable floating point value to the real value output by the printing algorithm?" If the printing algorithm outputs a string representation which is closer to the input float than it is to any other float, then it's correct. It's up to each parsing algorithm to provide the reverse condition -- that it parses each string to the float which is closest to the real value of the represented string.

Modern float-print algorithms like Grisu / Dragon / etc. also add the additional restriction that they output the shortest possible string that meets that condition; for example, the real number 0.69999998 is closer to the 32-bit float (0b1.011_0011_0011_0011_0011_0011 * 2^-1, represented in memory as 0x3f333333) than any other float. The real number 0.7 is slightly further from that float, but it's still closer to it than any other float, and is much shorter -- those algorithms should print that float as "0.7".

A correct parser should parse both the string "0.7" and "0.69999998" to the same 32-bit float result.
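
A quick C++ check of that last claim (both strings should parse to the same bit pattern, 0x3f333333):

    #include <cstdint>
    #include <cstdio>
    #include <cstdlib>
    #include <cstring>

    int main() {
        float a = std::strtof("0.7", nullptr);
        float b = std::strtof("0.69999998", nullptr);
        std::uint32_t ba, bb;
        std::memcpy(&ba, &a, sizeof ba);
        std::memcpy(&bb, &b, sizeof bb);
        std::printf("%08x %08x\n", (unsigned)ba, (unsigned)bb);  // 3f333333 3f333333
    }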


I'm not sure, but generally speaking converting doubles to strings will fail to round trip at least the NaNs

If round tripping is important, my recommendation would be to output something that directly corresponds to the binary representation of the float. For example, printf %a


See this article for an in-depth look at implementing these conversions: https://research.swtch.com/ftoa


I don't understand why anyone would write a new Grisu conversion, with Ryu 3x as fast. It might have educational benefits, preparing you for a Ryu conversion, but shipping it?


Ryu is only a year old. Grisu is 10 years old. It takes time for people to absorb these things.


This is interesting. I have learned to just use whatever operating system/RTL utilities are provided (like Apple's NumberFormatter class).



