Floating points between zero and one

mkl · 2024-08-29T14:37:05 1724942225

> I will note that the author says there are 1,056,964,610 "normal floating-point numbers" in [0,1) and I am not entirely sure why this disagrees with the number I produced.

The reason is there in the quote. You counted both normal and subnormal [1] floating point numbers, and they just counted normal floating point numbers. On mobile so can't check precise details, but that would explain it.

[1] https://en.wikipedia.org/wiki/Subnormal_number

tialaramex · 2024-08-29T15:29:13 1724945353

Huh, that Wiki link taught me something I didn't know - I hadn't realised a different FP representation could have Denormals (a word I knew) which weren't Subnormal (a new word to me, more precisely defining what's different about these values).

The Subnormals (presumably any Denormals regardless?) tend to be annoyingly slow, because the FP hardware usually special cases them to some slow but correct silicon not a fast path for the usual case. So, in some code it can make sense to "zap" subnormals, converting them to zero since they're almost zero anyway. Obviously in other code this is a terrible idea, so your domain knowledge comes into play when authoring such code. For example if we're using -1 .. +1 as the 0dB for our Karaoke software, a subnormal is clearly inaudible noise, but it might make our sophisticated reverb simulator (so you can pretend you're singing in a big auditorium) a thousand times slower than normal, so, zap those subnormals before the simulator and we hear the same results but don't get punished with miserable performance.

mkl · 2024-08-29T15:44:29 1724946269

Denormal and subnormal are synonyms as far as I know and can find. That first sentence is awkwardly phrased: "subnormal numbers are the subset of denormalized numbers (sometimes called denormals)" would be better as "subnormal numbers (sometimes called denormals) are the subset of denormalized numbers". It does imply there are other "denormalized" numbers though; not sure what they are.

stephencanon · 2024-08-29T18:02:31 1724954551

For IEEE 754-2019 basic formats, subnormal and denormal are synonyms.

There are formats (like x87 80-bit as implemented in pre-387 hardware¹) that are not 754 basic formats which admit non-normal encodings that are not subnormal (i.e. not smaller in magnitude than the smallest normal number). Some people used to call these "denormal" as well (even though the x87 docs call them "unnormal"), which caused confusion. "Subnormal" is unambiguous.

In decimal IEEE 754 formats there are also non-canonical encodings, but these are neither subnormal nor denormal (decimal encodings are not, in general, normalized the same way binary floating-point is).

-----

¹ on 387 and later these are invalid encodings, and behave like a signaling NaN.

jcranmer · 2024-08-29T16:11:22 1724947882

Terminology is confusing!

A floating-point number is generally defined as (-1)^sign * (mantissa) * base^exponent, where mantissa is a sequence of digits in the base.

A normalized floating-point number has mantissa be a fixed number of digits for a given floating-point type.

A denormalized floating-point number is one with a different number of digits, where the exponent is the minimum possible value.

An unnormalized floating-point number is one with a different number of digits, but the exponent is not the minimum possible value. (This doesn't occur in the standard IEEE 754 types, but the PPC and the x86 long double floating-point types have these numbers).

There's also noncanonical floating-point numbers, which are different representations for the same value. These occur, for example, with IEEE 754 decimal FP, as well as many pre-IEEE 754 types (where instead of subnormals, you tend to get noncanonical representations of 0). There's also another kind of scenario where you have multiple values that are all still canonical--for example, 0.0 and 0.00 are different canonical decimal FP values in IEEE 754 (they have different exponents, and I'm not going to attempt to remember what they are off the top of my head), and of course -0 and +0 are distinct values of 0 that work differently.

librasteve · 2024-08-29T22:05:15 1724969115

this is wrong(!)

> A normalized floating-point number has mantissa be a fixed number of digits for a given floating-point type.

well, no. in fact...

> A normalized [binary] floating-point number has mantissa which is bit shifted so that the first bit to the right of the binary point is '1'.

(not a great look in an expert explainer comment)

labcomputer · 2024-08-29T23:44:40 1724975080

Erm, no, not in 754 floating point.

The significand of a normalized number starts with an implied 1 to the left of the decimal point, which isn’t explicitly represented. In other words, the significand represents the range [1, 2) for normalized numbers.

In fact, that’s reason we need subnormal numbers: they are numbers for which the implied first bit is not 1, so the significand represents the range [0, 1).

Subnormal numbers therefore do not have a fixed number of digits (precision). As subnormals approach zero, the precision gets smaller.

librasteve · 2024-08-30T12:33:46 1725021226

fair point - I did reference this from a general article and it did not specify P754 - I guess this has an implied 1 and can save one bit of data path … but the definition of normalized is that the mantissa is shifted to be aligned to the binary point and not that is have a fixed number of digits since it will often be packed with trailing zeros

labcomputer · 2024-08-31T06:53:32 1725087212

> but the definition of normalized is that the mantissa is shifted to be aligned to the binary point

And that’s the key point. A subnormal number does not have the significand/mantissa aligned to start with an (implied) 1. It has leading zeros. Thus the number of digits of a subnormal number is not fixed.

1.0000x10^0 has the same precision as 1.0000x10^-99.

1.0000x10^-99 does not have the same precision as 0.0010x10^-99.

That is why the original poster wrote the subnormals have a not-fixed number of digits.

But if you’re going to go around writing snarky “gotcha” comments, at least try to get your facts straight.

Asraelite · 2024-08-29T17:09:25 1724951365

You didn't cover what exactly "subnormal" means

jcranmer · 2024-08-29T17:25:04 1724952304

In all practical terms, subnormal is the same as denormal (a non-normalized number with the smallest possible exponent).

I think this is a situation where there's confusion over precision of terminology, so things have shifted to a new term which is consistently used. Subnormal is now the new preferred terminology for denormal, I think out of confusion as to whether or not unnormal numbers were also denormal (I have never heard the term so used, but I'm also not a great connoisseur of non-IEEE 754 floating-point).

There's a similar attempt to shift use of the term "mantissa" to "significand", since a mantissa implies that the number is in the range [0, 1), whereas the usual implementation in a floating point type is to use the range [1, base).

a_e_k · 2024-08-30T00:21:32 1724977292

> So, in some code it can make sense to "zap" subnormals, converting them to zero since they're almost zero anyway.

Flush to zero (FTZ) is the usual term for this.

titzer · 2024-08-29T22:39:46 1724971186

> FP hardware usually special cases them to some slow but correct silicon

This is basically only true on Intel chips. Subnormals are not that slow on modern ARM or AMD hardware.

adgjlsfhk1 · 2024-08-30T05:20:28 1724995228

This is only mostly true unfortunately. If I recall my tests on M1 M1 has a performance penalty for subnormals, but more like a 10x than the 200x that it causes on Intel.

ryandrake · 2024-08-29T14:53:10 1724943190

I thought it was well known among programmers that (at least using IEEE floating point representation) half the set or storable floats are between -1.0 and 1.0, 1/4 are between -inf and -1.0 and another quarter are between 1.0 and inf.

One practical application of this is: if you have, say a percentage value, and fixed point is out of the question, and you want to retain as much precision as possible and still use floats, don't store it in a float with range [0.0,100.0]. Store it with the range [0.0,1.0]. Also why if you're dealing with angles, you should not store them in the range [0.0,360.0), but instead store them either as radians [0-2π), or better: [-π,π), or store them as [-1.0,1.0) and use trig routines designed to work with that range.

I always thought this made intuitive sense when you understand how the number is encoded. Then again, when I learned programming, we weren't "allowed" to use floats until we demonstrated that we understood how they were represented in memory.

samatman · 2024-08-29T16:55:05 1724950505

A stronger argument for storing 'percentages' as [0.0,1.0] can be made, the precision involved is seldom a limiting factor.

It has to do with why I put percentage in scare quotes. A percentage is a certain way of writing a fraction, one which is more convenient in some cases for humans to work with. Generally, whatever you're doing with that number is more effectively done with the range [0.0,1.0], other than perhaps printing it, which is trivial. Carrying it around inside the calculation imposes a conceptual burden which is of no use in implementing whatever problem one is trying to solve.

It's true that you avoid a strange discontinuity of gradient at 1%, but it hardly matters, the real driving force here is that any practical use of 60% is realized using 0.6, so if you're going to divide your 'percentage' by 100 in order to do N * M%, just do it when you read in the number and get it out of the way.

ealloc · 2024-08-29T19:37:45 1724960265

> [if] you want to retain as much precision as possible and still use floats, don't store it in a float with range [0.0,100.0]. Store it with the range [0.0,1.0].

I just tested this out and it doesn't seem true.

The two storing methods seem similarly precise over most of the range of fractions [0,1], sometimes one gives lower spacing, sometimes the other. For instance, for fractions from 0.5 to 0.638 we get smaller spacing if using [0,100], but for 0.638 to 1 the spacing is smaller if storing in [0,1].

For very small fractions (< 1e-38), it also seems more accurate to store in the range [0,100] since you are representing smaller numbers with the same bit pattern. That is, because the smallest nonzero positive float32 is 1.40129846e-45, so if you store as a float32 range [0,1] that's the smallest possible representable fraction, but if you're storing as a float in range[0,100], that actually represents a fraction 1.40129846e-47, which is smaller.

For the general result, see for yourself in python/numpy:

    x = np.linspace(0,1,10000)
    plt.plot(x, np.float64(np.spacing(np.float32(x*100)))/100)  # plot spacing stored as [0,100]
    plt.plot(x, np.float64(np.spacing(np.float32(x))))  # plot spacing stored as [0,1]

SideQuark · 2024-08-30T01:05:05 1724979905

> and you want to retain as much precision as possible and still use floats

They're equivalent. Both have a 23 bits (24 with implied mantissa, except for denormals).

Example, 2^n * (floating point here) are all exactly the same precision, since you can mult or div by 2 and not change any of the mantissa bits (untill you hit denormal or infinities).

So mult or div by 100 has prettyMuch the same precision (up to 100 maybe having some off bit flip in the lowest bit).

Another way to see it:

IEEE 754 guarantees that A op B where x is float mult or div, is bit exact up to 1 ulp or 1/2 ulp depending on rounding mode. So by changing between these you have the exact same real number obtainiable.

However, once you do what you claim is better, scaling an existing result to fit into some other range, you have potentially added a single bit flip error in the low bit, so you may be worse doing what you claim.

> ou should not store them in the range [0.0,360.0), but instead store them either as radians [0-2π), or better: [-π,π), or store them as [-1.0,1.0) and use trig routines designed to work with that range.

Nope, same reasons. Every elementary op you do adds max 1/2 ulp error (assuming one of the round to nearest modes). So, if you have something in radians, scale to 1.0 store/transmit whatever, and then your lib wants radians andScales back, at best you lost nothing, at worst you have added error.

Trig routines in general for floating point do range reduction, and then have to do a decent amount of work to get 1/2 ulp precision (which pretty much all major C++ libs do now - it's not IEEE guaranteed!).

So sure you can try to rederive a lib that works on [-bob,+bob], but you'll likley gain nothing against simply using whatever your library uses (which almost always goes into radians at some level since that is the most useful, and the lowest stacks end up there).

Please don't tell people to scale numbers as if that makes them better. It makes them worse.

jacobolus · 2024-08-29T15:30:43 1724945443

What you are saying here is expressing some misunderstandings/misconceptions, and may confuse readers.

There's no reason to prefer floating point values with any particular exponent, as long as you are not getting too close to the ends, which for double precision is roughly googol^3 or 1/googol^3. (These numbers are absurdly big/small, and you are generally only going to get to them by multiplying a long list of big or small numbers together; if you need to do that you might need to occasionally renormalize the result and track the exponent separately, or work with logarithms instead.) Even for single precision, the exponent limits are about 10^38 or 10^(-38), which is very very big.

> want to retain as much precision as possible and still use floats, don't store it in a float with range [0.0,100.0]. Store it with the range [0.0,1.0]

This doesn't make sense to me. There are just as many floating point numbers between 64 and 128 as there are between 0.5 and 1.0, and the same number between 32 and 64, between 0.25 and 0.5, etc. All you did in multiplying by a constant is twirl up the mantissa bits and shift the exponent by ~7. Unless you care about the precise rounding in the ~16th decimal digit, there is limited practical difference. (Well, one tiny difference is you are preventing some of the integer-valued percentages from having an exact representation, if for some reason you care about that. On the flip side, if you need to compose these percentages or apply them to some other quantity the range 0–1 is generally more convenient because you won't have to do an extra division by 100.)

> if you're dealing with angles, you should not store them in the range [0.0,360.0), but instead store them either as radians [0-2π), or better: [-π,π), or store them as [-1.0,1.0) and use trig routines designed to work with that range.

Floats from 0 to 360 is a perfectly fine representation for angles, though you may want to use -180 to 180 if you want to accumulate or compare many very small angles in either direction, since there is much more precision near e.g. -0.00001 than near 359.99999. (Of course, if whatever software libraries you are using expect radians, it can be convenient to use radians as a representation, but it won't be any more or less precise.)

The reason pure mathematicians (and as a consequence most scientists) use radians instead is because the trig functions are easier to write down as power series and easier to do calculus with (using pen and paper) when expressed in terms of radians, because it eliminates an annoying extra constant.

Using numbers in the range -1 to 1 can be more convenient than radians mainly because π is not exactly representable in floating point (it can sometimes be nice to get an exact answer for arcsin(1) or the like), and because there are other mathematical tools which are nice to express in the interval [-1, 1].

Aside: If you are using your angles and trig functions for doing geometry (rather than, say, approximating periodic functions), let me instead recommend representing your angles as a pair of numbers (cos a, sin a), and then using vector algebra instead of trigonometry, ideally avoiding angle measures altogether except at interfaces with people or code expecting them. You'll save a lot of transcendental function evaluations and your code will be easier to write and reason about.

Aside #2: The biggest thing you should worry about with floating point arithmetic is places in your code where two nearly equal numbers get subtracted. This results in "catastrophic cancellation" that can eat up most of your precision. For example, you need to be careful when writing code to find the roots of quadratic equations, and shouldn't just naïvely use the "quadratic formula" or one of the two roots will often be very imprecise.

raphlinus · 2024-08-29T21:54:44 1724968484

The quadratic solver implementation in kurbo is designed to be fast and reasonably precise for a wide range of inputs. But for a definitive treatment of how to solve quadratic equations, see "The Ins and Outs of Solving Quadratic Equations with Floating-Point Arithmetic" by Goualard[2]. I thought I understood the problem space pretty well, then I came across that.

[1]: https://github.com/linebender/kurbo/blob/b5bd2aa856781c6cf46...

[2]: https://cnrs.hal.science/hal-04116310/document

SideQuark · 2024-08-30T01:07:29 1724980049

Also nice - should compare to [2]

https://github.com/ChrisLomont/BetterQuadraticRoots

jacobolus · 2024-08-30T00:04:24 1724976264

Seconded: great paper, clear and comprehensive.

I cited it from the first paragraph of https://en.wikipedia.org/wiki/Quadratic_formula#Numerical_ca... when I was working on it back in March but I'm sure the discussion could still be more complete there.

akira2501 · 2024-08-29T20:39:03 1724963943

Accurate representation of a single quantity is one thing. Doing several mathematical operations with that quantity while _maintaining_ accuracy is another.

lifthrasiir · 2024-08-29T15:06:37 1724943997

More accurately speaking, reduce the total inaccuracy contributed by all operations. The inaccuracy will be (generally) proportional to the number of operations, but keep in mind that some operation would be exceptionally inaccurate due to, for example, catastrophic cancelation. So do not normalize percentage into [0,1] if the input was already in [0,100] and an additional operation is needed for normalization. (By the way, cospi etc counts as a single instruction here which is why it is so good to have.)

_aavaa_ · 2024-08-29T14:57:58 1724943478

Depends on the field, in numerical simulations this is well known and a lot of effort goes into normalizing everything to fit into that range to minimize numerical issues.

Many new programmers, and those that deal with languages such as Python, don't really think about such things, treating floats as mathematically precise.

zokier · 2024-08-29T14:45:55 1724942755

I was going to comment that floats have nextUp operation which allows iterating them without needing to go through u32/from_bits, but apparently that feature is still behind feature gate in Rust

https://github.com/rust-lang/rust/issues/91399

That is despite the fact that it is a IEEE-754 required operation! C and friends have equivalent available as nextafter https://en.cppreference.com/w/c/numeric/math/nextafter

jcranmer · 2024-08-29T15:27:23 1724945243

IEEE 754-1985 had a nextAfter function (which C implemented them as nextafter and nexttoward), but that was removed in IEEE 754-2008 in favor of nextUp and nextDown, which C implemented as nextup and nextdown, but only as of C23, which still hasn't officially been published by ISO.

There's also another wrinkle: next* functions go from -subnormal to -0 to subnormal, and don't give you an iteration for both -0 and +0, which can be useful for some iteration purposes.

tialaramex · 2024-08-29T15:47:19 1724946439

I think even with these operations, it's more intuitive to a programmer to just try all the bit patterns using f32::from_bits on integers especially if they aren't already deeply familiar with how the IEEE representation works, which seems like a pre-requisite for this experiment.

lostmsu · 2024-08-31T04:57:50 1725080270

If you use f16 or f8, you could even reasonably plot the result.

tialaramex · 2024-08-31T14:49:35 1725115775

Rust doesn't provide an f8 type, and it only provides f16 in nightly builds for now.

I don't see additional value in "plotting the result" for f16 over f32. 65536 values is too many to understand it other than as a pattern, and the patterns is present in f32. Can you elaborate ?

lifthrasiir · 2024-08-29T14:58:29 1724943509

I was more surprised that the proposed method does not include nextafter itself, which comes with its own distinct edge cases.

vlovich123 · 2024-08-29T15:36:12 1724945772

Reasoning explained here: https://github.com/rust-lang/rfcs/blob/master/text/3173-floa...

TLDR: no one really uses it, the spec is ambiguous and deprecated, and those that do use it seem to be using it with constants that are usually infinities which is better suited and clearer with next_up/next_down, and those that don’t use infinities tend to assume the range of the value and pick a constant outside the range which can be dangerous if the range changes

dirtdobber · 2024-08-29T15:16:55 1724944615

This is pretty easy to verify by just knowing how 32 bit floats are represented.

A 32-bit float is represented as:

- 1 bit = sign bit

- 8 bits = exponent; represents a signed value in [-128, 127]. Note that an exponent value 127 is special and reserved for infinity.

- 23 bits = mantissa

Numbers >= 1.0 ==> (sign = 1, 0 <= exponent <= 126, mantissa >=0). That's (2^23 * 127) + 1 (to include infinity). This comes out to exactly 1,065,353,217 for numbers in the range [1.0, +inf].

Numbers < 1.0 and > 0 ==> (sign = 1, exponent < 0, mantissa >= 0). That is 2^23 * 128 = 2^30. This comes out to exactly 1,073,741,824 for numbers in the range (0.0, 1.0).

mbauman · 2024-08-29T18:36:19 1724956579

So where is the even split point? It's [0, 1.5) and [1.5, Inf)! There's `2^62-2^51-1` values in both those sets. Or `2^30-2^22-1` for 32 bit.

Or perhaps even more interestingly, there's the same number — 2^62 or 2^30 — in [0, 2) as [1, Inf).

nyc111 · 2024-08-30T06:18:36 1724998716

What does “number of floats between 1 and positive infinity” mean? For example, in PHP “A numeric value that is larger than PHP_FLOAT_MAX is considered infinite.” So in computer science “infinite” means a number greater than the largest number the computer can handle and it has nothing to do with mathematical infinity or the philosophical infinity. I’ve been trying to understand how mathematicians define infinite, that’s why I’m asking.

bob1029 · 2024-08-30T02:14:47 1724984087

Normalized floating point math is quite pleasant compared to the alternatives.

I read a statement like "var x = n1 * n2 * n3;" as "x is a point in some 3 dimensional space". There is no limit to the dimensions. 7 figures gives you something in the hypercube(1,1,1,1,1,1,1). This can make comparison of complex figures extremely cheap if the tradeoffs make sense for your application (they often don't).

TZubiri · 2024-08-29T22:57:43 1724972263

This is why programmers using percentages in code is a pet peeve of mine.

applying a 15% discount by multiplying by 15 and dividing by 100 is unhinged to me.

peterhull90 · 2024-08-29T16:02:58 1724947378

In the HN comment that the article discusses [0] is the conclusion that commenter a1369209993 is correct (there are as many between 0 & 1 as 1 & +INF) and llm_trw is not correct? I got a bit confused.

Also, the article links to a blog by Daniel Lemire [1] in which he says (with regard to producing an unbiased random float) "picking an integer in [0,2^32) at random and dividing it by 2^32, was equivalent to picking a number at random in [0,1)" is incorrect and there is a ratio of up to 257:1 in the distribution so obtained. Not wanting to disagree with Daniel Lemire but I can't see why, and a quick experiment in Python didn't give this ratio.

[0]: https://news.ycombinator.com/item?id=41112688

[1]: https://lemire.me/blog/2017/02/28/how-many-floating-point-nu...

kccqzy · 2024-08-29T18:38:45 1724956725

The blog post explained it perfectly. There are 2^32 integers when you pick from [0,2^32). But there are 0x3f800000 floating point numbers between 0 and 1. And the former number is not divisible by the latter number. Therefore using division by 2^32 cannot be unbiased.

It's helpful if you first look at smaller examples. If we were to generate random integers in [0,10) by first generating random integers in [0,50) and then dividing by 5, that's valid. Exactly 5 numbers get mapped to one number each: the numbers [0,5) get mapped to 0, [5,10) get mapped to 1 etc. But what if you do the same division trick if you instead want to get numbers in [0,3)? Do you do the same division trick? Then the probability of the number 2 appearing is less than that of 0 or 1.

bell-cot · 2024-08-29T16:01:33 1724947293

Don't recall where I heard it, but: "Representing numbers with floating point is somewhere between lossy compression and a crap hash algorithm."

jacobolus · 2024-08-29T16:36:06 1724949366

This seems excessively glib. In my view floating point numbers are one of the most valuable and significant innovations in the history of science, and the widespread availability of IEEE-754-compliant float hardware is one of the best things about computers. We all owe an enormous debt of gratitude to the folks who spent a ton of (technical and political) work to make it happen.

Instead I would say: numerical analysis is hard irrespective of the number representation, and we should try to spread at least basic knowledge about it so people can accomplish their goals while avoiding some of the pitfalls.

jjmarr · 2024-08-29T17:11:25 1724951485

"Floating points are the worst method of representing rational numbers, except for all the others." —Winston Churchill probably

lmm · 2024-08-29T22:13:03 1724969583

Nah, arbitrary precision rationals are much better if you actually only need to represent rationals. And frankly even for reals, decimal types are better for most use cases.

jacobolus · 2024-08-29T23:57:57 1724975877

For "most use cases" (by far) of arbitrary fractional numbers, floats are significantly better than integer based representations, which is why they are pervasive. For certain niche uses (such as some kinds of money accounting calculations) alternative integer-based representations have some advantages, and those who need them use them where appropriate.

lmm · 2024-08-30T06:33:01 1724999581

Disagree. IEEE floats see a disproportionate amount of use because most programming languages make them much easier to use than more appropriate representations, and people (understandably) use the representation the language guides them to rather than seriously analysing and considering the options.

jjmarr · 2024-08-31T09:09:48 1725095388

Arbitrary precision sounds good but my registers have 64 bits and it takes several instructions to load arbitrary bit slices from memory.

It's a lot more performant to use the equivalent of scientific notation.

lmm · 2024-09-02T01:22:03 1725240123

Sure, but the vast majority of the time that performance difference doesn't matter, or at least isn't worth the large sacrifices in usability and consistency that it costs.

jacobolus · 2024-09-02T17:21:33 1725297693

The vast majority of the time floats are a significantly more convenient representation, with more than enough precision for the job, and alternatives are so slow that they are somewhere between painful and literally impossible to use. Arbitrary precision computation grows exponentially more expensive as you perform arithmetic operations on it, and people routinely apply transcendental functions whose results cannot be represented using a finite number of binary digits.

If you are finding otherwise in your personal work, you are likely in some niche field sheltered far from the bulk of numerical computing applications.

adgjlsfhk1 · 2024-08-30T14:36:33 1725028593

IMO IEE-754 is pretty good, but could be a bunch better. I really wish they had gotten rid of -0, Infs and all but one of the NaNs (which you would encode as -0 currently is encoded and be equal to itself). Currently the behavior of which NaN gets produced is unspecified and inconsistent across hardware which really sucks. Also, the exponent bias should be changed so that every nonzero finite value has a representable finite inverse. Currently 1/subnormal is Inf which is really annoying since ti means transforming x/y to x*(1/y) can overflow.

jacobolus · 2024-08-30T15:15:10 1725030910

Without -0 or Infs, IEEE-754 would be significantly worse for doing scientific computations. I personally find these to be among the most useful and important features of floating point arithmetic.

adgjlsfhk1 · 2024-08-30T15:23:05 1725031385

Do you have an example? I have a hard time believing this since any scientific data will have uncertainty that should make it completely random as to what sign your zero is supposed to be. Similarly with Inf, if any intermediate result gives an Inf, you've lost all precision, and the final result is probably going to be NaN anyway.

jacobolus · 2024-08-30T15:26:05 1725031565

> random as to what sign your zero is supposed to be

The purpose of negative zero is to preserve the sign of underflow, so you can e.g. get the branch cuts right when you are implementing some complex function. Cf. https://grouper.ieee.org/groups/msc/ANSI_IEEE-Std-754-2019/b...

Functions with singularities are ubiquitous in physics and many other branches of science, https://dlmf.nist.gov

adgjlsfhk1 · 2024-08-30T16:44:25 1725036265

I understand that that is the purpose, but the problem is as soon as you have any error in your computations, you lose the sign of the zero. From the paper you posted:

(ii) Rounding errors can obscure the singularities. That is why, for example, sqrt(z^2-1) = sqrt((z+1)(z-1)) fails so badly when either z^2=1 very nearly or when z^2<0 very nearly. To avoid this problem, the programmer may have to decompose complex arithmetic expressions into separate computations of real and imaginary parts. thereby forgoing Some of the advantages of a compact notation.

(iii) Careless handling can turn infinity or the sign of zero into misinformation that subsequently disappears leaving behind only a plausible but incorrect result. That is why compilers must not transform z-1 into z-(1+0i) as we have seen above, nor -(-x-x^2) into x+x^2 as we shall see below, lest a subsequent logarithm or square root produce a nonzero imaginary part whose sign is opposite to what was intended.

Branch cuts are a fundamentally dumb idea in finite precision math because if your input number has any uncertainty (which it always does), you don't know which side of the branch cut you are on. With a massive amount of manual work, you can sometimes hack the system well enough to work for extremely contrived cases, but in reality, it will never be useful because within the broader scope of the problem, if you've hit a singularity, you don't know what side you hit it from.

Functions with singularities are ubiquitous in physics and other branches of science, but symbolic math is needed to correctly track them.

jacobolus · 2024-08-30T18:00:49 1725040849

Using symbolic math requires an amount of CPU/memory which is exponential in the number of operations applied, so is often (usually) not a practical or even possible choice.

You may think signed zero is a "fundamentally dumb idea" but it has helped a lot of people to accomplish their work, so... shrug.

The basic issue is that numerical analysis is hard, and implementing numerical algorithms involves plenty of edge cases, so whoever is writing that code need to have a pretty good understanding of the problem and the tools and be willing to spend time on careful reasoning. Picking a slightly (or very) different number representation with different trade-offs doesn't really make it easier.

Even if you (somehow) had a perfect number representation with infinite resolution and nearly free operations, you'd have to be careful when implementing numerical algorithms and would have to understand how to do error analysis and have an understanding of numerical stability etc. to design new ones.

adgjlsfhk1 · 2024-08-30T18:37:23 1725043043

There's also Arb if you need guaranteed intervals to make sure you are on the right side of the branch cut. I completely agree that care and numerical analysis is necessary. My point is that once you've done the analysis, your actual answers aren't infinite (hopefully), and if your calculation returns Inf, it means that your program is wrong. Given that, you might as well just make the value of your function exactly on the branch cut NaN and be done with it.

jacobolus · 2024-09-02T17:32:56 1725298376

> if your calculation returns Inf, it means that your program is wrong

In my own programs, e.g. working with rational approximations or more general functions which are meromorphic in big parts of their domain possibly with branch cuts, or representing the sphere using the stereographic projection, Inf is a commonly expected and entirely correct result, either of intermediate calculations or of final results, and is also a commonly expected program input. In my experience looking at other people's code it is relatively common for Inf to be a perfectly reasonable result of numerical calculations. In some contexts, underflow resulting in 0 or overflow resulting in Inf is a more precise result than 1.000000000000000. It all depends what your numbers represent and what you are trying to compute.

(In some programs, it is even fine and ordinary to expect 0/0 or Inf/Inf resulting in NaN in intermediate computations, though these typically need to be checked for and special-cased so they can be recomputed to get the correct non-NaN but possibly Inf result.)

bell-cot · 2024-08-29T17:42:38 1724953358

Motor vehicles are one of the most valuable and significant innovations in history, etc. etc.

But that doesn't make them safe.

chillingeffect · 2024-08-29T15:18:54 1724944734

Tiny caveat, this only applies to float formats which include subnormals, including ieee754. Other systems may or may not include subnormals.

samatman · 2024-08-29T16:57:10 1724950630

I think we've reached the point where saying "float" should mean IEEE754, and the burden of being more specific falls on those working with various legacy formats.

buildbot · 2024-08-30T16:34:52 1725035692

There's recently been an expansion of floating point types, with BF16 and the MX datatypes - so many modern examples of float meaning different things too.

samatman · 2024-08-30T21:44:17 1725054257

I'm very much in favor of that! These should also be referred to using something more specific than "float".

hbrmjen · 2024-09-02T14:56:48 1725289008

Hfjjxxjxknkydglxy que bueno que bueno o en la tarde para ti mi vida hermosa te amo muchísimo y te amo más extremo y mucho más Ere. Gsgsusywheheljw. Ehhehgshw

antishatter · 2024-08-29T14:20:53 1724941253

Theres infinite numbers between any two numbers…different sizes of infinity and all that…

jimhefferon · 2024-08-29T14:25:09 1724941509

This post is not about real numbers, though. It's about floating point numbers.

ot · 2024-08-29T14:37:02 1724942222

Even if we were talking about real numbers (which we're not), it would still be true that (0, 1) and (1, +inf) have the same cardinality :)

As touched on in the post, 1/x is an explicit bijection (but any set of real numbers that contains an interval has the same cardinality)

mbauman · 2024-08-29T18:27:30 1724956050

But perhaps that's the interesting point — 1/x is *cannot be* an exact bijection for floating point numbers, because there's not a one-to-one mapping between the two sets.

davidt84 · 2024-08-29T14:23:32 1724941412

Not between floating point numbers.

ohnoitsahuman · 2024-08-29T14:23:54 1724941434

Counterpoint: Pick a random number between 0 and Infinity.

What set of infinite numbers are you more likely to wind up with?

Do this a million times.

Therefore 1 to Infinity is significantly larger than 0 to 1

jepler · 2024-08-29T14:51:22 1724943082

That may feel like an intuitive declaration, but that's not how it works in standard mathematics. In standard set theory, the cardinality of the reals is equal to the cardinality of any non-degenerate interval of reals. Wikipedia quotes this fact without proof (https://en.wikipedia.org/wiki/Cardinality_of_the_continuum#S...).

Here's one hand-wavy proof of why the cardinality of the real interval P=[0,1] is the same as the cardinality of the real interval Q=[0, infinity]: The function f(x) = 1/x-1 is a bijective function that maps the P interval onto the Q interval, which also proves the cardinality of the two sets is equal. (https://en.wikipedia.org/wiki/Bijection#Cardinality).

If you're not comfortable with 1/0 = infinity as a general matter, then simply replace the f(x) I gave with an explicit piecewise function f(x) = { 0 if x = ∞, else (1/x-1) } and the proof still works.

mitthrowaway2 · 2024-08-29T16:04:37 1724947477

There are several other objections posted, but only one that really refutes the heart of your claim.

To judge what set of numbers you're more likely to end up with, you need to specify a probability distribution. Without any specific information to prefer one number over another, you want the highest entropy distribution. It turns out that the most "natural" probability distribution for x extending between 0 and infinity is not uniform over x, but over its logarithm; therefore, by symmetry arguments, the probability is actually more like 50% that you draw a number between 0 and 1.

And it turns out that floating point numbers more or less respect this property.

jcranmer · 2024-08-29T15:16:08 1724944568

Pick a random number according to which distribution?

You sound like you want a uniform distribution (i.e., P(x in [a, b]) = b - a / total support of D), but when you have an infinite support, the denominator is infinity - 0 = infinity, so the probability that x is in any finite interval in that set is 0. I've never taken real analysis, so my knowledge here is quite shaky, but I'm not even certain that the resulting probability distribution function is well-defined as a probability distribution function in the first place.

Real analysis, aka, real numbers (and consequently infinite sets) are far weirder than you ever expected them to be.

roywiggins · 2024-08-29T18:35:37 1724956537

Yes, you can't define a uniform distribution on the reals at all, there's no way to make it to sum up to 1, which is required. Either it's 0 everywhere and the cumulative probability is 0, or it's a positive constant everywhere and the integral diverges, or it's not uniform.

mkl · 2024-08-29T14:44:43 1724942683

Not in floating point, which is the whole point here. Floats can only represent a finite subset of the real numbers (there are only 2^32 or 2^64 possible bit patterns, after all). They can represent lots of small numbers close together around 0, and lots of big numbers with increasingly large gaps between them, and there's an infinitely large gap between ~10^308 and inf. They are designed so that the possible floating point numbers are among the most useful numbers for most human calculations.

Even ignoring the realities of floating point, your argument depends on sampling uniformly from an infinite interval, which is not possible: https://math.stackexchange.com/questions/14777/why-isnt-ther...

kccqzy · 2024-08-29T15:20:42 1724944842

That's not how cardinality is measured. The very first class on an introduction to set theory course will teach you that bijections are used to measure the size of a set.

Consider the tan function. When you give it a number between 0 and pi/2, it gives you a number between 0 and infinity, and it does so in a bijective way. Therefore there are equal numbers between 0 and pi/2 as compared to 0 to infinity. Now consider a simple linear function that multiplies its input by pi/2. From here we know that there are equal numbers between 0 and 1 as compared to 0 and pi/2.

samatman · 2024-08-29T17:01:28 1724950888

Allow me to offer a brief proof of the contrary.

You may select any real number between zero and infinity. We will call this R.

I will give you 1/R, which is between zero and one.

QED.

I believe you're conflating range, where it is trivially true that [0,∞] is of greater extent than (0,1), with quantity, where it is not the case that there exists a greater quantity of values in the former range than in the latter.

grodriguez100 · 2024-08-29T21:27:51 1724966871

> You may select any real number between zero and infinity.

Should that have been “between one and infinity”? Otherwise you cannot claim that 1/R is between zero and one.

samatman · 2024-08-30T21:43:13 1725054193

You got me on that one.

I would prefer, given the domain, to amend to "given an R between 0 and infinity, I will return R for all R < 1, or 1/R otherwise". But yes, the proof was flawed.

alexarnesen · 2024-08-29T14:50:30 1724943030

I like the approach, but infinities can be counter intuitive. See Cantor, Dedekind. The problem is your experiment. We don't say "there are N reals between 0 and 1", we talk about cardinality. The sets (0,1] and (1,3] in R1 don't have "the same number of items" in them; they have the same cardinality.