Hardware support for UNUM floating point arithmetic [pdf] (inria.fr)
50 points by g0xA52A2A on Nov 3, 2017 | 17 comments


Woah, really cool. My understanding is that Posit, the successor to unum versions 1 and 2, is the future and is much easier to implement in hardware [0].

[0]: https://posithub.org/about


There is a Posit BoF at SC17 (supercomputing) Nov 14 DEN: http://sc17.supercomputing.org/presentation/?id=bof135&sess=...

Posit Paper: http://johngustafson.net/pdfs/BeatingFloatingPoint.pdf

A new data type called a posit is designed as a direct drop-in replacement for IEEE Standard 754 floating-point numbers (floats). Unlike earlier forms of universal number (unum) arithmetic, posits do not require interval arithmetic or variable size operands; like floats, they round if an answer is inexact. However, they provide compelling advantages over floats, including larger dynamic range, higher accuracy, better closure, bitwise identical results across systems, simpler hardware, and simpler exception handling.

source: https://twitter.com/daniel_bilar/status/920252363159539712
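
For anyone curious what the format actually looks like, here is a rough, illustrative C sketch of decoding an 8-bit posit, written from the description in that paper. The constants and the helper name are my own, a real implementation would stay in integer arithmetic throughout, and the handling of truncated exponent bits is my reading of the paper, so treat it as a sketch rather than a reference decoder:

    #include <stdint.h>
    #include <stdio.h>
    #include <math.h>

    #define ES 1   /* number of exponent bits; useed = 2^(2^ES) = 4 */

    double posit8_to_double(uint8_t p) {
        if (p == 0x00) return 0.0;   /* the unique zero */
        if (p == 0x80) return NAN;   /* "not a real" */

        int sign = (p >> 7) & 1;
        uint8_t bits = sign ? (uint8_t)-p : p;   /* 2's complement for negatives */

        /* Regime: run of identical bits after the sign bit, ended by its opposite. */
        int regime_bit = (bits >> 6) & 1;
        int run = 0, i = 6;
        while (i >= 0 && ((bits >> i) & 1) == regime_bit) { run++; i--; }
        if (i >= 0) i--;                          /* skip the terminating bit */
        int k = regime_bit ? run - 1 : -run;

        /* Exponent: up to ES bits; bits cut off at the end count as zero. */
        int exp = 0, ebits = 0;
        while (ebits < ES && i >= 0) { exp = (exp << 1) | ((bits >> i) & 1); ebits++; i--; }
        exp <<= ES - ebits;

        /* Fraction: whatever is left, with a hidden leading 1. */
        double frac = 1.0, w = 0.5;
        while (i >= 0) { frac += ((bits >> i) & 1) * w; w *= 0.5; i--; }

        double mag = ldexp(frac, k * (1 << ES) + exp);
        return sign ? -mag : mag;
    }

    int main(void) {
        printf("%g %g %g\n",
               posit8_to_double(0x40),    /* +1 */
               posit8_to_double(0x60),    /* useed^1 = 4 */
               posit8_to_double(0x01));   /* minpos = 1/4096 */
        return 0;
    }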


I am confused. I see half the performance and 3-5x the area of IEEE754, but the conclusion is that UNUM is comparable. That would not be my conclusion. What am I missing?


The idea is that computers spend much more energy moving data between the processor and memory than actually processing the data. Unums use a variable length encoding which makes processing more costly but reduces the amount of data transfer. At least in theory; whether this is a net benefit remains to be proven.


One of the ideas behind UNUM is that we're using extra bits to guard against the possibility of catastrophic cancellation (the massive loss of significant figures when subtracting two nearly-equal values), and using intervals would allow us to spend those extra bits only when necessary (since it's a variable-width format). The argument is that using unums instead of floats would allow for denser packing in memory, and once you count the energy cost of moving each value to and from memory, it's an overall win.
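
For concreteness, a tiny illustrative C example of the cancellation being described (the expression and its rewritten form are my own):

    #include <math.h>
    #include <stdio.h>

    /* (1 - cos(x)) / x^2 approaches 0.5 as x -> 0, but for small x the
       subtraction cancels away every significant digit that the rounded
       cos(x) carried, so the naive form computes garbage. */
    int main(void) {
        volatile double x = 1e-8;
        double naive  = (1.0 - cos(x)) / (x * x);        /* 0 on a typical libm */
        double stable = 2.0 * pow(sin(x / 2.0) / x, 2);  /* rewritten to avoid cancellation: ~0.5 */
        printf("naive = %.17g, stable = %.17g\n", naive, stable);
        return 0;
    }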

That argument is rather mendacious, though. Yes, it does cost energy to maintain large register files, caches, and DIMMs, but hardware already aggressively turns that stuff off when not in use. Thus the lowest-power solution is generally the one that maximizes the use of everything. Furthermore, the cache and RAM are generally a fixed cost anyway: you're not going to say "oh, I only need 12GB of RAM instead of 16GB because I can store on average 3 bytes/value instead of 4". Thus the only component you can really claim counts as part of the unum power usage is the register file, and this paper appears to be going with a fixed-width format for the hardware computations anyway, so the win there isn't going to happen.


If it's exactly reproducible on different hardware and software configurations, as seems to be one of the goals, that alone would be a useful improvement on IEEE754.


> If it's exactly reproducible on different hardware and software configurations, as seems to be one of the goals, that alone would be a useful improvement on IEEE754.

IEEE754 already supports that (except for the minor fact that the NaN types can in theory be represented by different bit sequences - in practice the same ones are used everywhere). The problem is rather that typical programming languages (such as C and C++) have no proper support for IEEE754.

And you have to be very precise about your intentions: for each of the operations defined in IEEE754 you have to specify which of the four rounding modes (plus one optional one in the 2008 revision) is to be used. You also have to be precise about whether a fused multiply-add (MAC) or a multiply (then round) followed by an add (then round again) is to be used. This in turn means that, depending on what the compiler makes of it, either the program will not run on processors with no MAC support or the code has to fall back to a (slow) software emulation.
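
A small illustrative example of why the MAC distinction matters (assuming a C99 libm with fma()):

    #include <math.h>
    #include <stdio.h>

    /* A fused multiply-add rounds once; a separate multiply and add round
       twice, so the two can give different answers for the same operands. */
    int main(void) {
        double x = 1.0 + ldexp(1.0, -27);       /* 1 + 2^-27 */
        double unfused = x * x - x * x;         /* x*x rounded first: 0 */
        double fused   = fma(x, x, -(x * x));   /* exact x*x minus rounded x*x: 2^-54 */
        /* Note: with aggressive FP contraction a compiler may silently fuse the
           "unfused" line too, which is exactly the kind of choice discussed above. */
        printf("unfused = %g, fused = %g\n", unfused, fused);
        return 0;
    }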


> except for the minor fact that the NaN types can in theory be represented by different bit sequences - in practice the same ones are used everywhere

I can’t say it’s terribly common, but there’s always someone like me out there going “Ooh, 51 free bits! Don’t mind if I do.” It’s a common trick for value representation in implementations of dynamic languages.
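
Roughly, the trick looks like this (a minimal sketch; the helper names are made up, and a real VM would also pack a type tag into the payload):

    #include <assert.h>
    #include <stdint.h>
    #include <string.h>
    #include <stdio.h>

    /* A quiet NaN has all exponent bits and the top mantissa bit set; the
       remaining 51 mantissa bits (plus the sign bit) are free to carry a
       payload, e.g. a tagged 48-bit pointer in a dynamic-language value. */
    #define QNAN_BITS    UINT64_C(0x7FF8000000000000)
    #define PAYLOAD_MASK UINT64_C(0x0007FFFFFFFFFFFF)   /* the low 51 bits */

    static double box(uint64_t payload) {
        uint64_t bits = QNAN_BITS | (payload & PAYLOAD_MASK);
        double d;
        memcpy(&d, &bits, sizeof d);   /* type-pun without undefined behaviour */
        return d;
    }

    static uint64_t unbox(double d) {
        uint64_t bits;
        memcpy(&bits, &d, sizeof bits);
        return bits & PAYLOAD_MASK;
    }

    int main(void) {
        double v = box(0xDEADBEEF);
        assert(v != v);   /* still an ordinary NaN as far as the FPU is concerned */
        printf("payload = %llx\n", (unsigned long long)unbox(v));
        return 0;
    }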

Anyway, poor floating-point support is about as prevalent as poor Unicode support—the common cases seem to work, lulling you into a false sense of security before you discover that the edge cases are untested. I’ve seen bugs caused by a green thread getting rescheduled onto a different OS thread with a different rounding mode.
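
As an illustration of how nasty that is, the same division gives a different bit pattern under a different inherited rounding mode (a sketch assuming a hosted C99 <fenv.h>):

    #include <fenv.h>
    #include <stdio.h>

    #pragma STDC FENV_ACCESS ON   /* we change the rounding mode at run time */

    int main(void) {
        volatile double one = 1.0, three = 3.0;   /* volatile: force a run-time divide */

        fesetround(FE_TONEAREST);
        double a = one / three;
        fesetround(FE_UPWARD);
        double b = one / three;
        fesetround(FE_TONEAREST);                 /* restore the default */

        printf("%.17g\n%.17g\n", a, b);           /* the last digit differs */
        return 0;
    }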


There are a few other very minor details that can vary between platforms:

- In binary floating-point, implementations are allowed to "detect tininess" either "before" or "after rounding". ARM and PPC detect it before rounding, x86 detects it after. This only changes whether or not the underflow flag is set for results in a tiny 1/4-ulp-wide interval, and only affects multiplication, fma, and conversion (results from the other basic operations cannot land in that interval). Since almost no one cares about flags, this is not a big deal; if flush-to-zero is enabled, however, it will perturb results that land in this band.

- Implementations are allowed to set or not set the invalid flag for fma(0, inf, quiet nan). Again, almost no one cares about flags, so no problem, but if invalid is unmasked, this affects whether or not you trap.

The bigger issue, as you say, is that C/C++ leave the width of intermediate expression evaluation up to the implementation (but the compiler has to say what it does via FLT_EVAL_METHOD, so you can refuse to compile if the compiler doesn't do what your program needs).
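
A sketch of both points, assuming a hosted C99 environment (the #error check mirrors the "refuse to compile" suggestion above; the flag read just shows the mechanism, not the platform difference itself):

    #include <float.h>
    #include <fenv.h>
    #include <stdio.h>

    /* Refuse to compile if intermediate expressions are evaluated in a wider
       format than their type (e.g. x87 long double). */
    #if defined(FLT_EVAL_METHOD) && FLT_EVAL_METHOD != 0
    #error "this code assumes float/double expressions are evaluated at their own width"
    #endif

    #pragma STDC FENV_ACCESS ON   /* tell the compiler we read the FP status flags */

    int main(void) {
        feclearexcept(FE_ALL_EXCEPT);
        volatile double tiny = DBL_MIN;   /* smallest normal double */
        volatile double r = tiny / 3.0;   /* tiny and inexact -> underflow flag */
        printf("r = %g, underflow = %d\n", r, fetestexcept(FE_UNDERFLOW) != 0);
        return 0;
    }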


Thanks for these details about the flags. I really was not aware that such differences exist.


Sure, I meant “relatively easily reproducible in portable C code”.

I may be wrong, but it’s my understanding that even given the precise rounding modes, fusing etc, the results could still differ as implementations are allowed to use varying extended precision internally (and do).

For example, as far as I know it’s difficult to guarantee identical FPU results on x86 and ARM.

[Edit to add: I guess I'm complaining about the popular implementations rather than the IEEE spec itself, but for ordinary users like me it amounts to the same thing. Overall IEEE 754 is wonderful, so it's exciting to see a proposal for something even better!]


> For example, as far as I know it’s difficult to guarantee identical FPU results on x86 and ARM.

Can you give details/resources on how it is difficult to obtain identical FPU results on x86 and ARM?

Does this even hold if you program in assembly using either

- only the primitives that are defined exactly in IEEE 754:2008 (i.e. not some functions defined in some, say, C library)

or

- using "identical" implementations of more complex functions (i.e. not the IEEE 754 primitives; think of cos, erf, gamma, ...)?


> Can you give details/resources on how it is difficult to obtain identical FPU results on x86 and ARM?

Not personally -- having already run into problems getting reproducible results on a single x86 machine with different compilers, I haven't even tried getting ARM to match!

Here's a long list of links on various issues: https://gafferongames.com/post/floating_point_determinism/

And here's a blog post that goes into detail on how to get reproducible results on x86: http://yosefk.com/blog/consistency-how-to-defeat-the-purpose... It's not too bad, but it sounds a bit fragile and I have no idea how well it would translate to other architectures. [Edit to add: hmm, actually, that post does say this is mostly just an x86 problem, or rather x87]

> using "identical" implementations of more complex functions

That's a rather onerous requirement! Especially for multi-platform work. In almost all cases I'd like to be able to use the platform's math library, which is presumably well-optimized. Is there a good, reasonably efficient, highly portable implementation of math.h that gives fully reproducible results?

It seems to me (but I'd love to be convinced otherwise!) that if you really want reproducible results, fixed-point is a much better road to go down. Ints are just a lot more consistent than floats on pretty much every platform with a C compiler.
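
For example, a minimal Q16.16 sketch (names are made up): every value is stored as value*65536 in a 32-bit int, and every operation is plain integer arithmetic, which behaves identically on any platform with a C compiler (as long as you manage range and overflow yourself):

    #include <stdint.h>
    #include <stdio.h>

    typedef int32_t q16_16;              /* 16 integer bits, 16 fraction bits */
    #define Q_ONE (1 << 16)

    static q16_16 q_from_double(double d) { return (q16_16)(d * Q_ONE); }
    static double q_to_double(q16_16 q)   { return (double)q / Q_ONE; }
    static q16_16 q_add(q16_16 a, q16_16 b) { return a + b; }
    static q16_16 q_mul(q16_16 a, q16_16 b) {
        return (q16_16)(((int64_t)a * b) >> 16);   /* widen, multiply, rescale */
    }

    int main(void) {
        q16_16 x = q_from_double(1.5), y = q_from_double(2.25);
        /* 1.5 * 2.25 + 0.125 = 3.5, computed entirely in integers */
        printf("%g\n", q_to_double(q_add(q_mul(x, y), q_from_double(0.125))));
        return 0;
    }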


please add a [PDF] to the header


Just wondering: why is it useful to know it's a PDF before clicking?


In my case, while I'm on my phone, the pdf is automatically downloaded and opened in a different app; so it is nice to have a warning.


It's also something of a legacy convention (though one I prefer to maintain), from when PDF files were a leading vector for malware.

Though even back then, malware concerns aside, people would curse when a link caused Acrobat Reader (or, depending on one's system, the full Acrobat program) to unexpectedly fire up. Once upon a time, browsers didn't have integrated PDF handling.

And, even now, with integrated PDF handling, said handling is not proof against exploits.



