
I thought it was well known among programmers that (at least using IEEE floating point representation) half the set of storable floats are between -1.0 and 1.0, 1/4 are between -inf and -1.0, and another quarter are between 1.0 and inf.

One practical application of this is: if you have, say, a percentage value, and fixed point is out of the question, and you want to retain as much precision as possible and still use floats, don't store it in a float with range [0.0,100.0]. Store it with the range [0.0,1.0]. It's also why, if you're dealing with angles, you should not store them in the range [0.0,360.0), but instead store them either as radians [0-2π), or better: [-π,π), or store them as [-1.0,1.0) and use trig routines designed to work with that range.

I always thought this made intuitive sense when you understand how the number is encoded. Then again, when I learned programming, we weren't "allowed" to use floats until we demonstrated that we understood how they were represented in memory.



A stronger argument can be made for storing 'percentages' as [0.0,1.0]: the precision involved is seldom the limiting factor.

It has to do with why I put percentage in scare quotes. A percentage is a certain way of writing a fraction, one which is more convenient in some cases for humans to work with. Generally, whatever you're doing with that number is more effectively done with the range [0.0,1.0], other than perhaps printing it, which is trivial. Carrying it around inside the calculation imposes a conceptual burden which is of no use in implementing whatever problem one is trying to solve.

It's true that you avoid a strange discontinuity of gradient at 1%, but it hardly matters; the real driving force here is that any practical use of 60% is realized using 0.6. So if you're going to divide your 'percentage' by 100 in order to do N * M%, just do it when you read in the number and get it out of the way.
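
To make the "divide once at the boundary" point concrete, here is a minimal sketch (the `read_percentage` helper name is mine, not anything standard):

```python
def read_percentage(s):
    # Parse "15%" or "15" into the fraction 0.15 once, at the input
    # boundary, so the rest of the code works in [0.0, 1.0] directly.
    return float(s.rstrip('%')) / 100.0

price = 250.0
discount = read_percentage("15%")
print(price * (1 - discount))
```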


> [if] you want to retain as much precision as possible and still use floats, don't store it in a float with range [0.0,100.0]. Store it with the range [0.0,1.0].

I just tested this out and it doesn't seem true.

The two storing methods seem similarly precise over most of the range of fractions [0,1], sometimes one gives lower spacing, sometimes the other. For instance, for fractions from 0.5 to 0.638 we get smaller spacing if using [0,100], but for 0.638 to 1 the spacing is smaller if storing in [0,1].

For very small fractions (< 1e-38), it also seems more accurate to store in the range [0,100], since you are representing smaller numbers with the same bit patterns. That is, the smallest positive nonzero float32 is 1.40129846e-45, so if you store the fraction directly in [0,1], that's the smallest representable fraction; but if you store it scaled into [0,100], that same bit pattern represents the fraction 1.40129846e-47, which is smaller.
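
The subnormal limit is easy to check directly in numpy:

```python
import numpy as np

# smallest positive (subnormal) float32, ~1.4e-45
smallest = np.nextafter(np.float32(0), np.float32(1))
print(smallest)
# Interpreted as a value stored in [0,100], the same bit pattern
# encodes the fraction smallest/100, i.e. ~1.4e-47.
```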

For the general result, see for yourself in python/numpy:

    import numpy as np
    import matplotlib.pyplot as plt

    x = np.linspace(0, 1, 10000)
    plt.plot(x, np.float64(np.spacing(np.float32(x*100)))/100)  # plot spacing stored as [0,100]
    plt.plot(x, np.float64(np.spacing(np.float32(x))))  # plot spacing stored as [0,1]
    plt.show()


> and you want to retain as much precision as possible and still use floats

They're equivalent. Both have 23 mantissa bits (24 with the implied leading bit, except for denormals).

For example, 2^n * x for any float x all have exactly the same precision, since you can multiply or divide by 2 without changing any of the mantissa bits (until you hit denormals or infinities).
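
A quick way to see the power-of-two claim, using `math.frexp` to split a float into mantissa and exponent:

```python
import math

x = 0.6137
m1, e1 = math.frexp(x)           # x == m1 * 2**e1, with 0.5 <= m1 < 1
m2, e2 = math.frexp(x * 2**10)
# Multiplying by 2**10 only shifted the exponent; the mantissa
# bits are untouched.
assert m1 == m2 and e2 == e1 + 10
```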

So multiplying or dividing by 100 has pretty much the same precision (up to 100 maybe causing a one-bit flip in the lowest mantissa bit).

Another way to see it:

IEEE 754 guarantees that A op B, where op is float mult or div, is bit exact up to 1 ulp or 1/2 ulp depending on rounding mode. So by converting between these representations you have exactly the same set of real numbers obtainable.

However, once you do what you claim is better, scaling an existing result to fit into some other range, you have potentially added a single bit flip of error in the low bit, so you may end up worse off doing what you claim.
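
The extra rounding from scaling is easy to observe: multiply by 100 and divide back, and the round trip is not always bit exact. A sketch:

```python
import random

random.seed(0)
xs = [random.random() for _ in range(10000)]
# Each * and / rounds once; scaling by a non-power-of-two and back
# does not always return the original bit pattern.
bad = sum((x * 100) / 100 != x for x in xs)
print(bad, "of", len(xs), "values fail the round trip")
```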

> [y]ou should not store them in the range [0.0,360.0), but instead store them either as radians [0-2π), or better: [-π,π), or store them as [-1.0,1.0) and use trig routines designed to work with that range.

Nope, same reasons. Every elementary op you do adds at most 1/2 ulp of error (assuming one of the round-to-nearest modes). So if you have something in radians, scale it to [-1.0,1.0] to store/transmit it, and then your lib wants radians and scales it back: at best you lost nothing, at worst you have added error.

Trig routines for floating point generally do range reduction, and then have to do a decent amount of work to get 1/2 ulp precision (which pretty much all major C++ libs achieve now; it's not guaranteed by IEEE!).

So sure, you can try to re-derive a lib that works on [-bob,+bob], but you'll likely gain nothing over simply using whatever your library provides (which almost always goes through radians at some level, since that is the most useful, and the lowest-level stacks end up there).

Please don't tell people to scale numbers as if that makes them better. It makes them worse.


What you are saying here is expressing some misunderstandings/misconceptions, and may confuse readers.

There's no reason to prefer floating point values with any particular exponent, as long as you are not getting too close to the ends, which for double precision is roughly googol^3 or 1/googol^3. (These numbers are absurdly big/small, and you are generally only going to get to them by multiplying a long list of big or small numbers together; if you need to do that you might need to occasionally renormalize the result and track the exponent separately, or work with logarithms instead.) Even for single precision, the exponent limits are about 10^38 or 10^(-38), which is very very big.

> want to retain as much precision as possible and still use floats, don't store it in a float with range [0.0,100.0]. Store it with the range [0.0,1.0]

This doesn't make sense to me. There are just as many floating point numbers between 64 and 128 as there are between 0.5 and 1.0, and the same number between 32 and 64, between 0.25 and 0.5, etc. All you did in multiplying by a constant is twirl up the mantissa bits and shift the exponent by ~7. Unless you care about the precise rounding in the ~16th decimal digit, there is limited practical difference. (Well, one tiny difference is you are preventing some of the integer-valued percentages from having an exact representation, if for some reason you care about that. On the flip side, if you need to compose these percentages or apply them to some other quantity the range 0–1 is generally more convenient because you won't have to do an extra division by 100.)
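
The "just as many floats per binade" claim can be checked by counting bit patterns (a sketch for float32 using `struct`):

```python
import struct

def bits(x):
    # reinterpret a float32 as its raw 32-bit pattern
    return struct.unpack('<I', struct.pack('<f', x))[0]

# For positive floats, consecutive values have consecutive bit
# patterns, so differences count representable numbers.
print(bits(128.0) - bits(64.0))   # floats in [64, 128): 2**23
print(bits(1.0) - bits(0.5))      # floats in [0.5, 1.0): 2**23
```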

> if you're dealing with angles, you should not store them in the range [0.0,360.0), but instead store them either as radians [0-2π), or better: [-π,π), or store them as [-1.0,1.0) and use trig routines designed to work with that range.

Floats from 0 to 360 is a perfectly fine representation for angles, though you may want to use -180 to 180 if you want to accumulate or compare many very small angles in either direction, since there is much more precision near e.g. -0.00001 than near 359.99999. (Of course, if whatever software libraries you are using expect radians, it can be convenient to use radians as a representation, but it won't be any more or less precise.)
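
numpy's `np.spacing` (the gap to the next representable float) shows the asymmetry directly:

```python
import numpy as np

# ULP near the top of the [0, 360) range vs near zero (float64):
print(np.spacing(np.float64(359.99999)))  # ~5.7e-14
print(np.spacing(np.float64(1e-5)))       # ~1.7e-21
```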

The reason pure mathematicians (and as a consequence most scientists) use radians instead is because the trig functions are easier to write down as power series and easier to do calculus with (using pen and paper) when expressed in terms of radians, because it eliminates an annoying extra constant.

Using numbers in the range -1 to 1 can be more convenient than radians mainly because π is not exactly representable in floating point (it can sometimes be nice to get an exact answer for arcsin(1) or the like), and because there are other mathematical tools which are nice to express in the interval [-1, 1].

Aside: If you are using your angles and trig functions for doing geometry (rather than, say, approximating periodic functions), let me instead recommend representing your angles as a pair of numbers (cos a, sin a), and then using vector algebra instead of trigonometry, ideally avoiding angle measures altogether except at interfaces with people or code expecting them. You'll save a lot of transcendental function evaluations and your code will be easier to write and reason about.
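
A minimal sketch of the (cos a, sin a) representation; rotation and composition become pure arithmetic (the helper names here are mine):

```python
import math

def rotate(v, u):
    # rotate 2D vector v by the angle whose (cos, sin) pair is u
    c, s = u
    x, y = v
    return (c * x - s * y, s * x + c * y)

def compose(u1, u2):
    # composing two rotations is just complex multiplication
    # of the unit vectors
    c1, s1 = u1
    c2, s2 = u2
    return (c1 * c2 - s1 * s2, s1 * c2 + c1 * s2)

a = math.radians(90)
u = (math.cos(a), math.sin(a))
print(rotate((1.0, 0.0), u))  # a quarter turn of the x axis, ~(0, 1)
```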

Aside #2: The biggest thing you should worry about with floating point arithmetic is places in your code where two nearly equal numbers get subtracted. This results in "catastrophic cancellation" that can eat up most of your precision. For example, you need to be careful when writing code to find the roots of quadratic equations, and shouldn't just naïvely use the "quadratic formula" or one of the two roots will often be very imprecise.
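
A standard sketch of the problem and the usual fix (compute the larger-magnitude root first, then recover the other from the product of roots, c/a):

```python
import math

def roots_naive(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    return ((-b + d) / (2 * a), (-b - d) / (2 * a))

def roots_stable(a, b, c):
    d = math.sqrt(b * b - 4 * a * c)
    # add d to b with matching sign to avoid cancellation,
    # then get the small root from x1 * x2 == c / a
    q = -0.5 * (b + math.copysign(d, b))
    return (q / a, c / q)

# x^2 - 1e8 x + 1 = 0 has roots ~1e8 and ~1e-8
print(roots_naive(1.0, -1e8, 1.0))   # small root loses most of its digits
print(roots_stable(1.0, -1e8, 1.0))
```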


The quadratic solver implementation in kurbo[1] is designed to be fast and reasonably precise for a wide range of inputs. But for a definitive treatment of how to solve quadratic equations, see "The Ins and Outs of Solving Quadratic Equations with Floating-Point Arithmetic" by Goualard[2]. I thought I understood the problem space pretty well, then I came across that.

[1]: https://github.com/linebender/kurbo/blob/b5bd2aa856781c6cf46...

[2]: https://cnrs.hal.science/hal-04116310/document



Seconded: great paper, clear and comprehensive.

I cited it from the first paragraph of https://en.wikipedia.org/wiki/Quadratic_formula#Numerical_ca... when I was working on it back in March but I'm sure the discussion could still be more complete there.


Accurate representation of a single quantity is one thing. Doing several mathematical operations with that quantity while _maintaining_ accuracy is another.


More accurately speaking: reduce the total inaccuracy contributed by all operations. The inaccuracy will (generally) be proportional to the number of operations, but keep in mind that some operations can be exceptionally inaccurate due to, for example, catastrophic cancellation. So do not normalize a percentage into [0,1] if the input was already in [0,100] and an additional operation is needed for normalization. (By the way, cospi etc. counts as a single operation here, which is why it is so good to have.)
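
Error growth with the operation count is easy to demonstrate with a naive running sum versus `math.fsum`, which computes a correctly-rounded result:

```python
import math

xs = [0.1] * 1000
naive = sum(xs)        # 999 additions, each one rounding
exact = math.fsum(xs)  # correctly-rounded sum of the same floats
print(naive == exact, naive - exact)
```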


Depends on the field; in numerical simulations this is well known, and a lot of effort goes into normalizing everything to fit into that range to minimize numerical issues.

Many new programmers, and those who deal with languages such as Python, don't really think about such things, treating floats as mathematically precise.



