> DEC64 is intended to be the only number type in the next generation of application programming languages.
Please no. There's a very solid case for integers in programming languages: many places in code call for a number which must be an integer, and having the type enforce this is nice; you don't need to check and abort (or floor, or whatever) if someone passes you a non-integer. Basically, anything that does an array index wants an integer, and that includes every for loop walking through a memory buffer (an array, a string, a file, anything that can be counted; the list goes on and on). x[i] where i is not an integer just doesn't make sense; ideally, let a type system enforce that.
Granted, of course, many languages will just truncate a float in an integer context, so funny stuff does happen (I don't really feel that this is a good thing). (Although interestingly, JS is not one of them.) Personally, I think JS needs an integer type. Especially when you see people start getting into bignum stuff in JS, it gets silly fast: first, you can only store so many bits before floating point loses precision, and even when you do have an "integer", JS will find funny ways to shoot you in the foot. For example, something as simple as x + 1 > x does not hold true for all integers in JS.
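A quick illustration of that in Python, whose floats are the same 64-bit binary doubles that JS numbers use:

    big = float(2**53)        # 9007199254740992.0
    print(big + 1 == big)     # True: adding 1 is a no-op past 2**53
    print(big + 2 == big)     # False: only every other integer is representable up here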
And there's also the fact that 64 bits is overkill for many applications - especially when you need to store large arrays of them. Another of the most difficult concepts for JS-only programmers to grasp seems to be the finiteness of memory...
Nothing is stopping you from building a natural number/integer type on top of a dec64 type. You could add whatever logic is necessary, like truncating fractional values, throwing exceptions, etc.
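For what it's worth, a rough sketch of that idea in Python, with Decimal standing in for DEC64 (the class name and behaviour are made up for illustration):

    from decimal import Decimal

    class IntOnDecimal:
        """Toy integer type enforced on top of a decimal value."""
        def __init__(self, value):
            d = Decimal(value)
            if d != d.to_integral_value():
                raise ValueError("not an integer: %s" % d)
            self.value = d

        def __add__(self, other):
            return IntOnDecimal(self.value + IntOnDecimal(other).value)

    print((IntOnDecimal(3) + 4).value)    # 7
    # IntOnDecimal("0.5") raises ValueError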
That sounds like a lot of work and a lot of pain (and likely bugs) just to end up not quite back at square one thanks to the resulting performance drop. I feel what you've done here is the rough equivalent of pointing out that "nothing is stopping you" from building a webpage generator in brainfuck. It's Turing complete after all! You could add whatever logic is necessary, like UTF-8, XML, and JSON parsing, etc.
Technically correct, but fails to address the core point that this might be a terrible idea.
This would benefit from a better comparison with IEEE 754-2008 decimal64; he dismisses it as being too inefficient but I don't see much other discussion about it, which is too bad since at least it's already implemented in hardware on some platforms.
Also, it's worth mentioning http://speleotrove.com/decimal/ as a great repository of decimal float info. I hope it will be updated with some discussion of this proposed format.
One difference is with non-numbers. IEEE 754 defines two infinities (positive and negative) and a great many NaN values. DEC64 seems to simplify things to a single NaN value. That seems more practical for the average use case.
But it complicates things by having 255 different zeros which are all equal, and by defining
1 / 0 == 0 / 0 == (-1) / 0 == MAX_DEC64 + X == (-MAX_DEC64) - X
(X being some number large enough to cause overflow)
I'd qualify the proposal as a middlebrow dismissal of all the real thought that went into the IEEE float standards (I mean, there is no actual substance to his criticism except "uh, look, it's so bad nobody uses it," which isn't even true; it is in IBM POWER7, IBM zSeries, Fujitsu SPARC64, SAP ABAP and gcc). Floating point on a finite-precision computer is hard; you can't gloss over the details and assume you have solved everything.
Holy crap. I was about to comment that NaNs are not equal to each other, as is the case in every other floating-point representation on the planet[0]. I wasn't actually expecting the page to say either way, but I decided to have a quick look in the vague hope of finding a quote to support that, and instead discovered:
"nan is equal to itself."
That strikes me as being very odd, and will likely catch a lot of programmers out who are expecting the "normal" behaviour there. I predict many bugs, and much wailing and gnashing of teeth, as a result.
[0] Maybe.
Edit: I see Stormbrew made the point before me, lower down the thread.
Interesting approach: rather than normalize to 1, it normalizes to the largest whole number. The positives of this are that you don't run into issues with encoding things like .1 that you do in binary (or any repeating binary fraction); the downside is that you lose precision when you have more than 16 digits of precision. And of course the complaint most folks will throw at it is that it fails differently depending on which 'end' of the number you're losing precision from. The largest decimal number you can represent in 56 bits is 72,057,594,037,927,935, so the largest decimal number you can represent with any digit value is 9,999,999,999,999,999 (16 digits), and you have "extra" space at the top (0-6).
However, to Doug Crockford's credit, the number space it covers is pretty useful for a lot of different things, and in scripting languages and other uses that aren't safety related I can see the advantage of a 'fractional integer' type.
Edit: Mike Cowlishaw, who is also quite interested in decimal arithmetic has a much deeper treatment here: http://speleotrove.com/decimal/
But it can still bite you. I got hit with an IEEE 754 implementation detail that took me two days to figure out (http://boston.conman.org/2014/03/05.1). The same problem would exist with this proposal as well.
The summary: RLIM_INFINITY on a 32-bit system can be safely stored in a double. RLIM_INFINITY on a 64-bit system can't. I had code in Lua (which uses doubles as the only numeric type) that worked fine on 32-bit systems, but not 64-bit systems. It took me two days to track the root cause down.
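The failure mode is easy to reproduce in any language whose only number type is a double. In Python, assuming RLIM_INFINITY is the all-ones 64-bit value (as it is on 64-bit Linux):

    RLIM_INFINITY = 2**64 - 1                 # assumed value for a 64-bit system
    as_double = float(RLIM_INFINITY)          # what a doubles-only language ends up storing
    print(int(as_double) == RLIM_INFINITY)    # False: the round trip changed the value
    print(int(as_double) - RLIM_INFINITY)     # 1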
Which is why Lua 5.3 will have a 64 bit integer type, and will move away from float types only. You can't interop with external stuff without full 64 bit ints.
> Interesting approach, rather than normalize to 1 it normalizes to the largest whole number. The positives of this are that you don't run into issues with encoding things like .1 that you do in binary
Nonsense. Binary floating-point too normalizes to the largest possible significand. It then omits the leading 1 from the representation, since the leading 1 is by definition a 1. That this trick is simple to pull in binary is only one of the advantages of a base-2 floating-point representation.
The fact that 0.1 is not representable in binary has nothing to do with the choice of representing it as 0x1999999999999a * 2^-56: binary floating point already represents it exactly that way, and that does not make 0.1 representable.
Instead of being stuck with yet another inaccurate number type for integers, I want to see hardware assisted bigints. Something like:
- A value is either a 63 bit signed integer or a pointer to external storage, using one bit to tell which.
- One instruction to take two operands, test if either is a pointer, do arithmetic, and test for overflow. Those cases would jump to a previously configured operation table for software helpers.
- A bit to distinguish bigint from trap-on-overflow, which would differ only in what the software helper does.
- Separate branch predictor for this and normal branches?
I don't know much about CPUs, but this doesn't seem unreasonable, and it could eliminate classes of software errors.
I see no need to provide a single instruction for this. It can already be implemented in a couple of instructions in standard instruction sets (I am familiar with OCaml's Zarith, which does exactly this: https://forge.ocamlcore.org/scm/viewvc.php/*checkout*/trunk/... . Other implementations surely exist).
The only inefficient aspect of the current implementation is that most operations contain two or three conditional jumps, two for the arguments and sometimes one for the possibility of overflow.
The way modern, heavily-pipelined processors are designed, any instruction that could possibly need to obtain an address from an address and jump there would have to be micro-coded. Also, from the instruction-dependency point of view, all instruction dependencies are fetched as early as possible (before the instruction executes and before it is known which of them may be ignored). This is why cmov is sometimes worse than a conditional branch. The entries of the “previously configured operation table” would need to be fetched all the time. Again, the simplest way not to fetch them all the time would be to micro-code the instruction, meaning that it would take about as much time as a short sequence of instructions that expands to the same micro-instructions.
There used to be a way in IA-32/x86_64 to annotate conditional branch instructions with a static prediction (cs: and ds: prefixes), which could be useful regardless of the approach, but nowadays I believe this annotation is ignored altogether.
I agree that static prediction would be nice. I wonder if the new Intel bounds checking extension can be abused for that.
My main problem with writing it out is code size, since ideally you want every single integer operation in your nice fast C-like-language program to expand to that kind of sequence. But maybe it doesn't actually matter that much.
> Those cases would jump to a previously configured operation table for software helpers.
What you want is called software interrupt or exception. MIPS and Alpha can do it for integer overflows. SPARC can do it for the two bottom tag bits of integers (to distinguish integers from pointers). I bet that if you only wait long enough, you will eventually see an architecture with both features. ;-)
In the meantime we have to fall back to conditional jumps. :-/
I've always thought that something like a REPNZ ADCSB/W/D/Q would be very useful for multiprecision math. Add two numbers in memory, propagate carry, and repeat until you get to the end of the number.
"DEC64 is intended to be the only number type in the next generation of application programming languages."
FFS, just because the designers of JS couldn't numerically compute their way out of a paper bag doesn't mean that FUTURE languages should be saddled with that mistake.
Where's exp(), log(), sin(), cos()? He did the bare minimum amount of work and left all the interesting functionality out. We have relatively fast ways of dealing with these in IEEE754, but I see no equivalent here.
I really don't want to rehash the arguments that applied 30 years ago and still apply today. Decimal floating point is good for some things, but to be considered the "only number type in the next generation of programming languages" is laughable.
It can provide very fast performance on integer values, eliminating the need for a separate int type and avoiding the terrible errors that can result from int truncation.
What! I do not see how making a float fast on integers should ever eliminate the need for an int type. Ints and Real-approximations like Dec64 are simply different things entirely.
It's annoying enough that this happens in Javascript.
Why does anybody listen to Crockford? He wrote a human-readable data format but refused to give it comments. He wrote an obnoxious linter that bans continue;. Now he's proposing to replace integers with decimal floating point type (what happens if someone wants to store a 64-bit integer?).
As a succinct example, 64-bits is 584 years of nanoseconds. 56-bits is only 2 years of nanoseconds.
The problem is that many extant APIs return 64-bit integers, so if your language only has 56-bit integers you are creating a bug/vulnerability every time you want to talk to the outside world.
e.g. sqlite has sqlite3_column_int64() to get a value from a result set. How do you use that safely if your language can only do 56-bit ints? Ugly.
Remember the "Twitter Apocalypse" when Twitter changed their tweet IDs to be bigger than 56-bits and all the JS programmers had to switch to using strings?
EDIT: I also reject the premise that just because there's an ugly hack available, we can get rid of useful language features. Am I working for the language or is the language working for me?
The question is, why do APIs return 64-bit values? In general it's not because they need all 64 bits, it's because the 64-bit integer type is convenient for them. This might make Crockford's proposal completely impractical but it doesn't invalidate the argument that led to it.
I reject the nanosecond example because it's completely arbitrary. 64 bits of picoseconds would only cover 0.584 years so should we claim 64 bits isn't enough? Wouldn't 2000 years of microseconds in 56 bits be good enough?
I'll give you credit for bitboards though, that's one I hadn't considered.
Still, hardware is inherently base-2. What I'd like to see is hardware assisted adaptive precision predicates and compilers/runtimes that make proper use of modern instructions.
I have never ever thought that float/double/decimal was too much choice.
Wrong. JavaScript has a binary floating point type.
This is proposing a decimal floating point type.
Which is actually not a bad idea. This "specification" is, however, laughable. Especially since there are quite good decimal floating point specifications out there that he could have just stolen^H^H^Hborrowed from.
Which is not bad if that floating point type is decimal. All of your integers do what they're supposed to. In addition, things like the standard programming fail of using "i += 0.1;" as a loop iterator actually work when you use decimal floating point.
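For example, in Python (its decimal module standing in for any decimal floating point type, not DEC64 specifically):

    from decimal import Decimal

    total_binary, total_decimal = 0.0, Decimal("0")
    for _ in range(10):
        total_binary += 0.1
        total_decimal += Decimal("0.1")

    print(total_binary == 1.0)       # False: the binary sum is 0.9999999999999999
    print(total_decimal == 1)        # True: ten decimal 0.1s are exactly 1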
Also, things like converting to/from strings become operations with a priori bounded limits just by taking a quick look at the number of digits if you have decimal floating point.
This is not true for binary floating point. There is a reason why printf libraries are so blasted huge and have malloc issues--it's the binary floating point conversions.
Much of the stupidity we encounter in dealing with floating point (inexact numbers on exact decimals, silly rounding, inability to handle significant figures, weird conversion to/from strings) simply goes away if you use decimal floating point.
In addition, if you go back through the ECMAScript archives, they actually thought about this back in ECMAScript4 but rejected it for some good reasons. I don't agree with those reasons, but they did think about and discuss them.
To clarify, they are basic floating-point errors from C that happen to be errors for integers as well in Javascript, because Javascript does not support fixed-precision integers.
I know the author despises types, but sometimes types provide valuable semantic information. Like, indexing into an array using a float is completely unnecessary and meaningless.
It's not completely unnecessary and meaningless. In computer graphics, floating point indexing is used to interpolate between texel values during texture look-up.
I don't typically think of it as indexing an array, but it's conceptually similar in some circumstances I suppose. The hardware still eventually has to read values from integer-based addresses, though. As to the details:
Sampling modes customize how the value calculation is done. Some will simply round the index to the nearest integer texel, resulting in only one value being accessed (cheap.)
Others will interpolate via weighted average as you're thinking, in the 1D case. Then there's the 2D and 3D cases... plus mipmaps... plus anti-aliasing patterns... at which point you could easily have 16+ samples bearing little resemblance to "indexing an array", especially as one can apply all these interpolation techniques to generic functions such as perlin noise which aren't array based.
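To make the simplest (1D, linear-filter) case concrete, here's a sketch in Python, with a clamp-to-edge addressing mode assumed and the function name made up:

    import math

    def sample_1d(texels, x):
        """Linearly interpolate a 1-D 'texture' at fractional index x."""
        i = math.floor(x)
        t = x - i                                         # fractional part = blend weight
        clamp = lambda k: max(0, min(k, len(texels) - 1))
        return (1 - t) * texels[clamp(i)] + t * texels[clamp(i + 1)]

    print(sample_1d([0.0, 10.0, 20.0], 0.5))              # 5.0: halfway between texels 0 and 1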
I do not see any basis for the statement 'very fast performance on integer values', given that simple calculations like subtraction and checking of simple equality require many operations in the proposed standard.
> DEC64 is a number type. It can precisely represent decimal fractions with 16 decimal places, which makes it well suited to all applications that are concerned with money.
From the reference code
> Rounding is to the nearest value. Ties are rounded away from zero.
> This variant of the round-to-nearest method is also called unbiased rounding, convergent rounding, statistician's rounding, Dutch rounding, Gaussian rounding, odd-even rounding, bankers' rounding, broken rounding, or DDR rounding and is widely used in bookkeeping. This is the default rounding mode used in IEEE 754 computing functions and operators.
Rounding ties away from zero rather than using banker's rounding isn't what you want for financial calculations (unless you're performing Office Space-style bank theft), so this should never be used for numeric calculations involving money. I wonder what he was really trying to solve with this, since he missed a fairly important aspect of the big picture. That being said, there's no problem statement on his entire website to address what actual problem he was trying to solve, so all we see is a half-baked solution for some unknown problem. On the plus side, at least he didn't use the "for Good, not Evil" clause in the license this time.
The superman penny shaving attack is not related to rounding behavior. You may multiply with fractional values, but when you add or subtract from a ledger, you are doing so in fixed precision (usually cents). If you were calculating something like interest and consistently rounding in one direction, you'd be either over or undercharging people, but the money wouldn't vanish.
In other cases, you need to actually think about rounding mode (e.g. how many shares of AAPL can I buy with $5000).
> there's no problem statement on his entire website to address what actual problem he was trying to solve
This is very true. I don't understand what problem this is trying to solve, because I don't spend a lot of time confused by the internal detail that a double is binary. The storage format isn't important beyond knowing roughly how many digits of accuracy I can count on.
So is this primarily a performance vs. convenience thing?
If both base2 and base10 floating point are implemented in hardware, what makes base10 inherently less efficient?
Also, I don't have a good intuition for the difference in what numbers can be exactly represented. I'd love to see this represented visually somehow.
Double precision can exactly represent integers up to 2^53, then half of integers between 2^53 and 2^54, then a quarter of integers between 2^54 and 2^55, etc.
Dec64 would be able to exactly represent integers up to 2^55, then 1/10th of all integers between 2^55 and 10(2^55), then 1/100th of all integers between 10(2^55) and 100(2^55).
So the "holes" are different, so-to-speak. How would this affect accuracy of complicated expressions?
Decimal floating point would hardly affect accuracy compared to binary floating point -- with the exception of domains where the inputs and output are strictly human-readable and human-precise constructs (e.g. banking... is there a better term for this?).
Binary floating point can accurately represent fractions of the form 1/(2^n). Decimal floating point can accurately represent fractions of the form 1/((2^n)*(5^m)). Either can only approximate fractions with other prime factors in the denominator (1/3, 1/7, 1/11, ...).
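That split is easy to check in Python, since Fraction recovers a float's exact value:

    from fractions import Fraction
    from decimal import Decimal

    print(Fraction(0.125) == Fraction(1, 8))    # True:  1/2**3 is exact in binary
    print(Fraction(0.1) == Fraction(1, 10))     # False: 1/(2*5) is not exact in binary...
    print(Decimal("0.1") * 10 == 1)             # True:  ...but it is exact in decimal
    print(Decimal(1) / Decimal(3) * 3 == 1)     # False: 1/3 is exact in neither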
In terms of a programmer having to be concerned with accumulated errors due to approximations in the representation, I'd assert that decimal floating point in no way changes the scope of concerns or approach to numeric programming compared to binary floating point. I'd guess even in a domain with a very precise need of fractional decimals, a programmer would still need to reason about and account for numeric representation errors in decimal floating point such as overflow and underflow.
> Decimal floating point would hardly affect accuracy compared to binary floating point
The larger the radix, the more of the mantissa is wasted. Given fixed storage, base 2 floats will have higher precision over their range than higher bases, like 10 and 16.
The difference is easy to illustrate with base 16 and base 2, since we can convert between the two easily. Converting a base 16 float to base 2 will result in leading zeroes some of the time, which could have been used for storing data. The same is true with base 10, but you have to do more math to demonstrate it.
The thing that remains most compelling about decimal encodings is in the types of errors seen and how human-comprehensible they are.
For example, arithmetic operations in binary floating point are notorious for producing mystery meat error quantities, because the actual error quantity is in binary and only later represented as a decimal amount. And when coding it is most natural to account for floating point errors in decimal terms, even though this is a false representation.
So the main thrust of any decimal float proposal comes from "this is more in tune with the way humans think" and not the specific performance and precision constraints. That said, I have no special insights into Crockford's proposal.
I imagine it's because binary exponents correspond very well to binary arithmetics, while decimal exponents don't.
Imagine you have a floating point format which has 8-bit mantissa (i.e. 8 bits to store the digits, without the floating point). You're trying to calculate 200 + 200. In binary, that's
0b11001000 + 0b11001000 = 0b110010000
However, to represent the result you would need 9 bits, which you don't have, so you instead represent it as 400 = 200 * 2 = 0b11001000 * 2 ^ 1. Notice how the resulting mantissa is just the result (0b110010000) shifted one bit.
If your exponent is decimal exponent, you would instead have to represent 400 as 400 = 40 * 10 = 0b101000 * 10 ^ 1. In this case, the resulting mantissa has to be calculated separately (using more expensive operations), as it has no connection to the mathematical result of the operation.
Because division/multiplication by powers of two is a simple bitshift and can be implemented very easily on silicon. Division/multiplication by ten is complicated and needs many more gates and more time.
It's not inherently slower, however you have to add extra logic to correctly handle overflows and the like. However given the size of modern floating point units I doubt it's a massive overhead. Basically they would need to convert back and forth between the native zeros and ones of hardware and the decimal representation. And extra logic might mean slower hardware in certain circumstances.
Basically try to implement a BCD counter in verilog and you'll see where the overhead appears compared to a "dumb" binary counter.
In practice it would be slow because not a whole lot of CPU architectures natively handle BCD. If this "standard" goes mainstream maybe the vendors will adapt and make special purpose "DEC64 FPU" hardware.
I'm not really sure what's the point of using this floating point format outside of banking and probably a few other niche applications. For general purpose computing I see absolutely no benefits.
It's not inherently slower. It's a question of economics. How much are people willing to spend on a CPU to make it fast? With IEEE 754 math, there's a lot of monetary incentive because a lot of code uses 754. With a new decimal class, there is much less incentive.
Because it's easy to do things in base 2 with binary signals (like the kind used in computers). It's unnatural to use base 10 for anything in this domain.
It would be extremely unintuitive if math could be done faster in any base except for 2 (or a power thereof) when running on a modern CPU.
GCC, ICC, IBM C, and HP C have adopted the extension. Links to the respective software implementations can be found under "cost of implementation" in the C++11 proposal above, except GCC, which uses IBM's library. Its support is documented here: http://gcc.gnu.org/onlinedocs/gcc/Decimal-Float.html
Meanwhile, hardware support so far exists in POWER6 and 7 and SPARC64 X.
This may seem like slow adoption if you aren't accustomed to the glacial pace of standards processes. Especially in the context of a component as critical as floating point numerics. If there is any holdup it would be lack of demand for this format, which the article's proposal doesn't affect.
Python has three basic numeric types, unbounded integers, floats (with the precision of C doubles), and complex (for example, 5+0.3j). However, its standard library includes both a decimal type and a fraction type. The decimal type has been in the language for over 9 years. The documentation is quite clear [1].
Python's decimal type conforms to IBM's General Decimal Arithmetic Specification [2] which is based on IEEE 754 and IEEE 854. Python's decimal type is very complete. For example Python supports the complete range of rounding options found in these specifications (ROUND_CEILING, ROUND_DOWN, ROUND_FLOOR, ROUND_HALF_DOWN, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_UP, and ROUND_05UP). By comparison Dec64 supports only one.
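For example, choosing the rounding mode explicitly per operation:

    from decimal import Decimal, ROUND_HALF_EVEN, ROUND_HALF_UP, ROUND_FLOOR

    x = Decimal("2.5")
    print(x.quantize(Decimal("1"), rounding=ROUND_HALF_EVEN))              # 2 (bankers' rounding)
    print(x.quantize(Decimal("1"), rounding=ROUND_HALF_UP))                # 3
    print(Decimal("-2.5").quantize(Decimal("1"), rounding=ROUND_FLOOR))    # -3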
Despite having used Python on and off for over 10 years, I've never felt a need to use this decimal type. Integers and floats seem to work for me (although it's nice to have a good decimal implementation available if I need it).
I don't think this is necessarily an appeal to authority, for two reasons.
First, because some people will treat that authorship credit as a warning label rather than authority; that would make it ad hominem, not appeal to authority.
Second, because it actually gives useful context to some parts of this spec: "Oh, it makes sense that the author of a language without integer types would propose a spec that claims you don't need integer types". That's not a logical fallacy at all; that's perfectly reasonable reasoning.
"An appeal to authority is an argument from the fact that a person judged to be an authority affirms a proposition to the claim that the proposition is true."
I don't think the person you replied to was claiming that DEC64 was good or bad because it was written by Crockford? Didn't feel like he was passing any judgement, merely pointing out who the author was...
Rightly or wrongly, Crockford is seen as somewhat of a "god" among the JavaScript crowd, due to his "JavaScript: The Good Parts" book and his work on JSON. Within that crowd, he's seen to possess an extremely high amount of authority. So I think that many of them would see this other work as superior based on his involvement alone, without investigating the ideas and proposals themselves any further.
It doesn't logically follow that it is good because an authority has created it. But it definitely gives us an information gain, in the information theory sense.
A specific thing I haven't seen anyone mention is that NaN is equal to NaN in this, which is quite different from IEEE FP. Although it's a thing that often seems counterintuitive at first glance, doesn't this kind of ruin the error-taint quality of NaN?
How does he explain storing simple integers in a reasonable amount of space? Any decent programmer will avoid using too many bits for a variable that never exceeds a certain value (short, int, etc). It seems rather foolish and arrogant to claim this one half-implemented number type can satisfy everyone's needs in every programming language.
Next week he'll have a number format with 48 mantissa bits, 8 exponent bits and 8 unsigned base bits to define a base value between 0 and 255. Look at all the performance and simplicity involved!
"Any decent programmer will avoid using too many bits for a variable that never exceeds a certain value (short, int, etc)."
Why? Why aren't you using half-bytes also?
If all your pointers are 64-bit aligned, all your variables are 64-bit aligned and your processor isn't any faster processing 16-bit numbers - if it even has instructions to process those - than 64-bit numbers?
All your variables are not 64-bit aligned. An array of 16-bit integers will generally use 16 bits (2 bytes) per integer. So will two or more subsequent integers in a struct. In general, variables smaller than the alignment of the CPU (but still power of two sizes) only need to be aligned to their own size.
I actually use half-bytes when it makes sense; my language of choice has bit-vectors so I can use exactly the number of bits I desire.
> If all your pointers are 64-bit aligned, all your variables are 64-bit aligned and your processor isn't any faster processing 16-bit numbers - if it even have instructions to process those - than 64-bit numbers?
Maybe I have an array with at least 4 16-bit numbers? If I'm counting bits, then it already means I have a lot of numbers. If I have 2 billion numbers in the range [0,15], then I can easily represent them in an array of 4 or 8 bit values, but will run into performance issues trying to do so (if I can at all) using a similar array of 64 bit values.
> If all your pointers are 64-bit aligned, all your variables are 64-bit aligned and your processor isn't any faster processing 16-bit numbers - if it even have instructions to process those - than 64-bit numbers
Memory bandwidth. If the processor can read a single 64-bit integer in a clock cycle, it can read 4 16-bit ones just as well. Memory is slower than the core.
As an example with AVX instructions you can process 8 floats at the same time, compared to 4 doubles. So if float is enough for you you can expect double performance in either memory transfer bound or ideally vectorizable algorithms.
And in mobile computer graphics 16bit values are common.
This is some serious amateur hour. The most glaring problem is this:
> There are 255 possible representations of zero. They are all considered to be equal.
There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations. Aside from the fact that you're wasting an entire byte of your representation, this means that you can't check for equality by comparing bits. Take a look at the assembly implementation of equality checking:
The "fast path" (which is ten instruction) applies only if the two numbers have the same exponent. The slow path calls subtraction and returns true if the result is zero. The implementation of subtraction falls back on yet another function, which jumps around even more:
For most comparisons (no, comparing numbers with the same exponent is not the norm) it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT. Many of these instructions are branches – and inherently unpredictable ones at that, which means that pipeline stalls will be normal. All told, I would expect equality comparison to typically take around 100 cycles. It's not even clear to me that this implementation is correct because at the end of the subtraction, it compares the result to the zero word, which is only one of the 255 possible representations of zero. The lack of a single canonical representation of any number is just as bad for other arithmetic operations and comparisons, if not worse.
Crockford's bugaboo with IEEE 754 floating-point is bizarre, verging on pathological. He devoted a section in his book "JavaScript: The Good Parts" to a rather ill-informed rant against it. When I saw him give a talk, I took the opportunity to ask him what he thought would be a good alternative to using IEEE 754. His answer was – I shit you not – "I don't know". Apparently this proposal is the answer. No thanks, I will stick with the amazingly successful, ubiquitous, thoroughly thought out standard that was spearheaded by William Kahan – one of the greatest numerical analysts of all time. Anyone who doesn't appreciate how good we have it with IEEE 754 should really read "An Interview with the Old Man of Floating-Point" [1], in which Kahan relates just how messed up this stuff was before the IEEE 754 standardization process. It should also be noted that there already is an IEEE standard for decimal floating-point [2], which is not only infinitely better thought out than this drivel, but also is already implemented in hardware on many systems sold by IBM and others, specifically for financial applications.
> There are also 255 representations of almost all representable numbers. [...] Aside from the fact that you're wasting an entire byte of your representation
How is this different than any other floating-point representation? I'm pretty sure IEEE floating-point has the same redundancy, though numbers are normalized so comparisons are cheaper as you note. But IEEE doubles "waste" even more bits due to the 2^52 representations of NaN.
> For most comparisons [...] it will take around FIFTY INSTRUCTIONS TO CHECK IF TWO NUMBERS ARE EQUAL OR NOT.
Good point, sounds like a notable weakness and barrier to adoption.
> Crockfords bugaboo with IEEE 754 floating-point is bizarre, verging on pathological.
He calls it "the most frequently reported bug in JavaScript." Wouldn't you be interested in improving on the most common cause of user confusion in a technology you care about?
IEEE 754 doesn't waste any bits – there is only a single representation of each value (except for multiple NaNs). In this proposal, there are 255 representations of most values, which means that it has almost an entire byte of redundancy. The waste is bad, but the lack of a canonical representation of each value is worse.
I personally think that the way to handle floating-point confusion is better user education. However, if you really want a decimal standard, then, as I mentioned above, there already is one that is part of the IEEE 754 standard. Not only do there exist hardware implementations, but there are also high-quality software implementations.
A better approach to making things more intuitive in all bases, not just base 10, is using rational numbers. The natural way is to use reduced pairs of integers, but this is unfortunately quite prone to overflow. You can improve that by using reduced ratios of – guess what – floating point numbers.
> There are also 255 representations of almost all representable numbers. For example, 10 is 1 x 10^1 or 10 x 10^0 – or any one of 253 other representations.
You are not correct. The smallest significand possible is 1x10^1, but you can't delve further into positive exponents. Conversely, 56 signed bits allows the largest integer power of 10 as 10 000 000 000 000 000 so the exponent will be -15. So there are exactly 17 representations of 10, and that's the worst it gets. All other numbers except powers of 10 have fewer representations, and most real world data affected by noise has a single representation because they use the full precision of the significand, and you can't shift them to the right or left without overflow or loss of precision.
So the redundancy is much less than you think, one in 10 real values has two representations, one in 100 has three etc. This is common for other decimal formats and not that big of a problem, detecting zero is a simple NOR gate on all significant bits.
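The "exactly 17 representations of 10" figure is easy to sanity-check with a brute-force enumeration (assuming, per the description above, a signed 56-bit coefficient and exponents -127..127, with -128 reserved for nan):

    COEFF_MAX = 2**55 - 1                 # largest positive signed 56-bit coefficient

    reps = [(10**(1 - e), e) for e in range(-127, 2)    # need coefficient * 10**e == 10
            if 10**(1 - e) <= COEFF_MAX]
    print(len(reps))                      # 17
    print(reps[0], reps[-1])              # (10000000000000000, -15) (1, 1)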
The real problem with this format is the very high price in hardware (changing the exponent requires recomputing the significand) and complete unsuitability for any kind of numerical problem or scientific number crunching. Because designing a floating point format takes numerical scientists and hardware designers, not assembly programmers and language designers.
Heck, the only reason he put the exponent in the lower byte and not the upper byte, where it would have ensured a perfect compatibility to most positive integers, is that X64 assembly does not allow direct access to the upper byte.
You are right. I forgot about the interaction between the exponent and the significand. The lack of a canonical representation is still quite problematic.
In one format, IEEE 754 has 24576 possible representations of zero[1], which fits your definition of "wasted bits". Some of your other criticisms might be valid, but at this point I'd like to see an accurate technical comparison between DEC64 and the decimal formats of IEEE 754.
This is why decimal floating-point formats are kind of a disaster in general and are only implemented in relatively rare hardware intended for financial uses. In many of those applications, using a decimal fixed point representation is better – i.e. counting in millionths of a dollar (you can still count up to ±9 trillion dollars with 64 bits). But yes, a technical comparison of different decimal formats would definitely be interesting. I suspect that despite the occasional failure of intuitiveness, we're far better off with binary formats and better programmer education.
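A sketch of what that looks like in practice (Python, with made-up names; the point is that amounts are plain integers and rounding happens only where you explicitly ask for it):

    from decimal import Decimal, ROUND_HALF_EVEN

    SCALE = 10**6                                # fixed point: millionths of a dollar

    def to_fixed(amount_str):
        return int((Decimal(amount_str) * SCALE).to_integral_value(ROUND_HALF_EVEN))

    def add_interest(balance, annual_rate, days):
        interest = Decimal(balance) * Decimal(annual_rate) * days / 365
        return balance + int(interest.to_integral_value(ROUND_HALF_EVEN))

    b = to_fixed("1000.00")                      # 1000000000
    print(add_interest(b, "0.05", 30))           # 1004109589, i.e. $1004.109589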
I feel like that bug has more to do with Javascript's (and clearly Crockford's) inane desire to pretend it is the case that all numeric types are the same.
Nobody who does anything with numbers believes that! Even if all you can do is count your fingers you believe in the difference between integers and floats. They have different algebraic properties entirely and it takes a whole hell of a lot of work to get from one to the other---there's even a whole class (fractions) in between.
I'm not sure what that has to do with it. Even if you are ok with the idea that integers and "decimal numbers" are different, it's still confusing that 0.1 + 0.2 != 0.3.
It's confusing because it is very difficult to look at a decimal number and know whether it can be represented exactly as base-2 floating point. It's especially confusing because you get no feedback about it! Here is a Ruby session:
The precise value of double(0.1) is 0.1000000000000000055511151231257827021181583404541015625. That is precise, not an approximation.
If you know of a program in any of these languages that will print this value for "0.1" using built-in functionality, please let me know because I would love to know about it.
Likewise the precise value of double(1e50) is 100000000000000007629769841091887003294964970946560. Anything else is an approximation of its true value.
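Python's decimal module will print both of these with built-in functionality, since converting a float to Decimal is exact:

    from decimal import Decimal

    print(Decimal(0.1))    # 0.1000000000000000055511151231257827021181583404541015625
    print(Decimal(1e50))   # 100000000000000007629769841091887003294964970946560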
In another message you said that what's really important is that the string representation uniquely identifies the precise value. While that will help you reconstruct the value later, it does not help you understand why 0.1 + 0.2 != 0.3.
It helps because 0.1 + 0.2 produces 0.30000000000000004 for 64-bit floats – so at least you can see that this value isn't the same as 0.3. In Ruby you just get two values that print the same yet aren't equal, which is way more confusing. I agree that printing the minimal number of digits required for reconstruction does not help with explaining why 0.1, 0.2 and 0.3 in 64-bit floats aren't the real values 1/10, 2/10 and 3/10.
    rasky at monocle in ~
    ↪ python
    Python 2.7.5 (default, Sep 2 2013, 05:24:04)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.0.68)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 0.1
    0.1
    >>> ^D
    rasky at monocle in ~
    ↪ python3
    Python 3.3.3 (default, Dec 24 2013, 13:54:32)
    [GCC 4.2.1 Compatible Apple LLVM 5.0 (clang-500.2.79)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> 0.1
    0.1
It used to, but it was changed to reduce users' confusion.
We may be speaking across each other here. Ruby is lying in the sense that there are multiple distinct float values that it will print as 0.3 – in particular, confusion ensues when two values look the same but are unequal. These other languages print each distinct float value differently, using just enough decimal digits to reconstruct the exact binary value. Ruby doesn't give you enough digits to reconstruct the value you have. Nobody actually prints the full correct value because it's fifty digits long and is completely redundant given that you know you're dealing with a 64-bit float.
Yeah, that's my point. 0.1 and 0.2 do add exactly to 0.3, but in any finite representation of real numbers you'll get rounding errors on some equations like this. If you use an infinite representation then equality is no longer computable.
> There are also 255 representations of almost all representable numbers.
This isn't true. Only zero has 255 representations. All other numbers are limited by the size of the coefficient, 16 digits. So a number with 1 significant digit would have 16 representations, one with two would have 15, etc. Not that I'm defending the format, but if we're critiquing it for being sloppy, we need to be careful about such things.
Could you elaborate on how these points not apply to the IEEE standard for decimal floating-point? From what I can grasp, also the IEEE standard has multiple representations for the same values.
> For most comparisons (no, comparing numbers with the same exponent is not the norm)
I'd say that exponent 0 is the norm, meaning precise numbers in a really wide range, isn't it?
That is not the case – there is exactly one representation for every single IEEE 754 value besides NaN – and checking if something is NaN or not is easy (two instructions). Not only that, but negative and positive values are in integer order, meaning that you can compare same-signed values using integer comparisons.
What I meant by exponent zero not being the norm is that most numbers you work with won't have exponent zero. Specifically only one in 2^16 of the possibly representable values and only one in 2^24 of the possible representations. If you're using the "natural" representation, then only the integers -9 through 9 have zero exponent.
I'm not an expert but you seem to be wrong on this. The wikipedia article you linked to for the IEEE 754 decimal64 format says:
"Because the significand is not normalized, most values with less than 16 significant digits have multiple possible representations; 1×102=0.1×103=0.01×104, etc. Zero has 768 possible representations (1536 if you include both signed zeros)."
I meant the IEEE 754 binary32/64 standard that has been implemented in hardware everywhere since the early 1980s and is what we all actually use on a day to day basis – the usage of which is what Crockford thinks is the biggest mistake in JavaScript. I now see that the above post was asking specifically about "IEEE 754 decimal", which I missed. Yes, there are multiple representations of many numbers in that too. This is why non-binary floating-point representations kind of suck.
Denorms basically don't occur except in very specific numerical circumstances, so everybody treats these things as unique.
This actually had a significant impact on the implementation of IEEE754 on the DEC Alpha chips. The DEC Alpha microprocessors used a software trap whenever a denormal was encountered.
This was a fine idea--until you start emulating an x86 and discover that one of the big programs you want to support--AutoCAD--buries things into denormals and your performance is dog-slow.
That's not true – you could easily pick one of the 255 representations here as the representation and make sure that all operations always produce that one. Then you're still wasting a whole byte of your representation – and one in 255 bit patterns are invalid – but at least you can check for equality using bitwise comparison, and operations don't have to worry about all the possible representation of each value.
Correction: as was pointed out above, there isn't actually a whole byte of waste in this representation. It varies from 255 representations of zero, 17 of 10^-127, to only one representation of numbers that can't be shifted left or right without overflowing or losing bits.
Agreed. Both Haskell and much of the Lisp family support rational numbers, the latter with them automatically being used instead of imprecise results when doing divisions which can't be represented accurately. I miss them a lot when working in other languages.
> In the bigger picture, what I feel we are generally missing in most programming languages is an EASY way to store and handle fractions.
I'm squeamish about that. I fear that a lot of new programmers would overuse a fraction type if it were built in, and this can quickly lead to really poor performance.
In most cases, floating point works just fine. Reducing fractions is slow, so you really only want to use them when perfect precision is absolutely necessary.
What I see happening is someone new to programming hits a point where they need a fractional number type. They look in the documentation and they see "integer, floating point, fraction" and they say "Aha! Fraction! I know what those are."
A simple example (I will use decimal instead of binary as it is easier, and binary suffers from the same stuff but for other numbers):
(10/3)x(9/4) == 7.5
but due to the fact that computers store the value instead of the fraction it would come out as something like 7.4999999999
because instead of doing
(10x9)/(3x4) it would do 3.3333333333333 x 2.25
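In Python terms, the fractions module does exactly that bookkeeping:

    from fractions import Fraction

    print(Fraction(10, 3) * Fraction(9, 4))           # 15/2
    print(float(Fraction(10, 3) * Fraction(9, 4)))    # 7.5, exactly

    # The same idea rescues the classic decimal example:
    print(0.1 + 0.2 == 0.3)                                       # False with binary floats
    print(Fraction(1, 10) + Fraction(2, 10) == Fraction(3, 10))   # True with exact rationals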
I see a few flaws that would prevent this from being a decent hardware type:
1) The exponent bits should be the higher order bits. Otherwise, this type breaks compatibility with existing comparator circuitry.
2) This representation uses 10 as the exponent base. That will require quite a bit of extra circuitry, as opposed to what would be required if a base of 2 was used. Citing examples from COBOL and BASIC as reasons for using base 10 is not very convincing.
3) With both fields being 2's complement, you're wasting a bit just to indicate sign. The IEEE single precision floating point standard cleverly avoids this by implicitly subtracting 127 from the exponent value.
4) 255 possible representations of zero? Wat?
5) This type may be suitable for large numbers, but there's no fraction support. In a lot of work that would require doing math on the large numbers that this "standard" affords, those large numbers are involved in division operations, and there's no type for the results of such an operation to cast into.
6) This data type seems to be designed to make efficient by-hand (and perhaps by-software) translation into a human-readable string. But who cares? That's not a reason to choose a data type, especially not a "scientific" one. You choose data types to represent the range of data you have in a format that makes for efficient calculation and conversion with the mathematical or logical operations you want to be able to perform.
A real alternative to floating points would be a Logarithmic number system (http://en.wikipedia.org/wiki/Logarithmic_number_system) which has been shown to be a "more accurate alternative to floating-point, with improved speed."
You've got the wrong person or you're being really subtly tongue-in-cheek.
Douglas Crockford is the standardizer of JSON and author of "JavaScript: The Good Parts", in which he popularized non-terrible Javascript. Brendan Eich is the creator of Javascript.
However, one nit in terms of interesting architecture: The Burroughs 5000 series had a floating point format in which an exponent of zero allowed the mantissa to be treated as an ordinary integer. In fact, the whole addressing scheme was decimal. The addresses were stored in one decimal digit per nibble, so it was doing decimal at the hardware level.
While this looks interesting at first blush, good luck with "DEC64 is intended to be the only number type in the next generation of application programming languages." I think float will be around for a while, what with the blinding speed in today's processors, and the availability of 80 bit intermediate precision.
In my experience float is well understood by very few of those who use it. I have had to explain multiple times to people who have been using floats for years that just because floats have guaranteed minimum 6 significant digits doesn't mean that the result of your calculations will be correct to 6 significant digits.
I thought the advantage of 2's complement was that we only had one zero and no additional conversions to do arithmetic on negatives, simplifying operations and ALU implementations.
Without normalization, how would that work with DEC64? Set both numbers to the highest exponent?
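My rough understanding (a toy sketch, not the reference implementation): you align the operands to the smaller exponent by scaling the other coefficient up, and the expensive part is detecting overflow and re-rounding when the scaled coefficient no longer fits in 56 bits:

    def dec_add(a, b):
        """Toy addition of (coefficient, exponent) pairs in base 10."""
        (ca, ea), (cb, eb) = a, b
        if ea > eb:
            (ca, ea), (cb, eb) = (cb, eb), (ca, ea)   # ensure ea <= eb
        cb *= 10 ** (eb - ea)                         # scale to the common (smaller) exponent
        return (ca + cb, ea)                          # real code must re-round into 56 bits

    print(dec_add((15, -1), (2, 0)))                  # (35, -1), i.e. 3.5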
>By giving programmers a choice of number types, programmers are required to waste their time making choices that don’t matter
So I guess making sure that you can never do array[0.5] before your program ever runs doesn't matter? At least not to Crockford apparently, who seems to have some kind of irrational hatred for static type systems.
The name is a bit confusing. I have seen dec32 and dec64 numbers in some mil std ICDs. I can't find any of the documents online but here [1] is a discussion about dec 32 that links to a pdf which briefly describes the dec32 and dec64 formats.
"A later revision of IEEE 754 attempted to remedy this, but the formats it recommended were so inefficient that it has not found much acceptance."
IBM put hardware support for the IEEE 754-2008 Decimal Format in their POWER architecture. The POWER7 clocks 5 GHz. Decimal floating point is only considered slow because Intel and ARM do not have any support for decimal floating point in hardware. Lack of acceptance probably comes from lack of support in standard libraries rather than inefficiency inherent in the standard.
POWER7 has a max clock of 4.25 GHz according to that reference.
Anyhow, clock rate in this case is irrelevant. Look at the instruction latencies. I don't have these handy, but I'd bet $50 at the chance of winning $10 that decimal floating point instructions (at least the divide) are slower than the IEEE754 ones.
These chips are also ridiculously expensive, so probably not the best benchmark.
As a "specification" this document is laughable. For example, rounding modes and overflow behavior are not addressed. The comment that object pointers can be stuffed into the coefficient field (usually called 'mantissa') is completely non-sequitur. Frankly I am surprised to see such a big name behind it.
I imagine this project is inspired by the sad state of numerical computing in Javascript, but this proposal will surely only make it worse. The world certainly doesn't need a new, incompatible, poorly-thought-out floating point format.
Compare the level of thought and detail in this "specification" to the level of thought and detail in this famous summary overview of floating point issues: https://ece.uwaterloo.ca/~dwharder/NumericalAnalysis/02Numer... ("What every computer scientist should know...")
> DEC64 is intended to be the only number type in the next generation of application programming languages.
I would hate if this became the only numeric type available in JS (in fact, I don't want it at all), but it's dishonest to quote "specification" as you have and then dismiss it as such, as it doesn't claim to be one, the word doesn't appear a single time in it, and if you go through to the github repo, the readme explicitly refers to the page as a "descriptive web page".
In fact, the only line of actual substance in your post is "For example, rounding modes and overflow behavior are not addressed", and it turns out that's only true for the descriptive web page, not the reference implementation.
> I imagine this project is inspired by the sad state of numerical computing in Javascript, but this proposal will surely only make it worse. The world certainly doesn't need a new, incompatible, poorly-thought-out floating point format.
I'll just quote pg here:
Yeah, we know that. But is that the most interesting thing one can say about this article? Is it not at least a source of ideas for things to investigate further?
The problem with the middlebrow dismissal is that it's a magnet for upvotes. The "U R a fag"s get downvoted and end up at the bottom of the page where they cause little trouble. But this sort of comment rises to the top. Things have now gotten to the stage where I flinch slightly as I click on the "comments" link, bracing myself for the dismissive comment I know will be waiting for me at the top of the page.
This format is suicide from a numerical stability POV. It will lose 3 to 4 bits of precision for an operation with mixed exponents, whereas double will only lose one. Any kind of marginally stable problem, say eigenvalues with poorly conditioned data, will yield crap.
From my own understanding, an operation will lose precision iff (*) the result cannot be represented with 52 significant coefficient bits. Logically, the same happens in IEEE754; the difference is that the loss of precision in DEC64 is always a multiple of 3.5 bits, whereas IEEE754 can lose precision in decrements of one bit.
(*) Maybe excluding over/underflow scenarios.
To clarify, what you say sounds like (I'm pretty sure that's not what you meant) every operation with mixed exponents will lead to a loss of precision. From my understanding, that is not the case. It also sounds like every IEEE754 operation with mixed exponents only leads to one bit of precision loss, while it could lead to much more (in fact, up to <total number of fraction bits between the two IEEE754 doubles> - 52).
OK. It looks like the proposed representation represents numbers as a signed 56 bit integer times ten to a signed 8-bit number. That does avoid fiddling with BCD, but it's VERY different from usual floating point. Look at the neighborhood of zero. As with usual floating point there's a gap between zero and the smallest nonzero number... but unlike floating point numbers, the next larger nonzero number is twice the smallest nonzero number. The behavior of numerical methods in DEC64 will, I suspect, be quite different from that of floating point. It would be very interesting to know what the difference is, and I'd hope that proponents of it as "the only number type in the next generation of application programming languages" would exercise due diligence in that and other regards.
I completely accept and agree with your criticism of my criticism.
My gut sense is that this proposal is simply too amateurish to be worth the effort of thoroughly debunking. Floating point is kind of like cryptography: the pitfalls are subtle and the consequences of getting it wrong are severe (rockets crashing, etc). Leave it to the experts. This is not a domain where you want to "roll your own."
> rounding modes and overflow behavior are not addressed
He provides a reference implementation. That means this and many other details are defined by code. Quoting dec64.asm: "Rounding is to the nearest value. Ties are rounded away from zero. Integer division is floored."
> Compare the level of thought and detail in this "specification" to ... [Goldberg]
I don't think this comparison is fair. David Goldberg's text is an introduction to the topic. Douglas Crockford describes an idea and gives you a reference implementation.
> He provides a reference implementation. That means this and many other details are defined by code.
I dream of a day when we stop thinking of "reference implementations" as proper specifications. The whole concept of a "reference implementation" leads to an entire class of nightmarish problems.
What happens if there is an unintentional bug in the reference implementation that causes valid (but incorrect) output for certain inputs? What if feeding it certain input reliably produces a segfault? Does that mean that other implementations should mimic that behavior? What if a patch is issued to the reference implementation that changes the behavior of certain inputs? Is this the same as issuing an amendment to the specification? Does this mean that other implementations now need to change their behavior as well?
Or, worse, what if there is certain behavior that should be explicitly left undefined in a proper specification? A reference implementation cannot express this concept - by definition, everything is defined based on the output that the reference implementation provides.
Finally, there's the fact that it takes time and effort to produce a proper specification, and this process usually reveals (to the author) complexities about the problem and edge cases that may not become apparent simply by providing a reference implementation.
Bitcoin suffers from this problem. In fact, some people in the community frown upon attempts to make mining software or clients when bitcoin-qt/bitcoind can just be used instead, for the exact reasons you mentioned.
The spec for a valid transaction in Bitcoin can currently only be defined as "a transaction that bitcoin-qt accepts." The problem is magnified by how disorganized the source code is.
On the other hand, if you don't have a reference specification you shouldn't declare something standard.
(re: all the web standards that had very wide "interpretation" by different browser efforts, leading to chaos and a whole industry based on fear an uncertainty)
Said reference implementation is written in x86 assembly. 1261 lines, 650 lines sans comments and blanks.
To be fair it is extensively commented, but the comments describe what it does, not why. And for fuck's sake, hundreds of lines of assembly is not a spec, even if it is the most readable code in the world.
And what if the code has a bug? Code is also difficult to analyze. The people who do numerical computing need to prove theorems about what their algorithms produce.
> The BASIC language eliminated much of the complexity of FORTRAN by having a single number type. This simplified the programming model and avoided a class of errors caused by selection of the wrong type. The efficiencies that could have gained from having numerous number types proved to be insignificant.
I don't agree with that, and I don't think BASIC has much (if anything) to offer in terms of good language design.
I tried finding out whether that "having a single number type" claim was actually true (the microcomputer versions used a percentage sign suffix (I%, J%) to denote integers).
Off topic: in that PDF (page 4) the letter "Oh" is distinguished from the numeral "Zero" by having a diagonal slash through the "Oh". Yes, that program printed "NØ UNIQUE SØLUTIØN".
That made me think of the periodic rants here on HN about the supposedly nigh-insurmountable inconsistencies in mathematical notation.
Not all microcomputer versions of BASIC used the percent sign to designate integers. The one I grew up using (Microsoft BASIC on the TRS-80 Color Computer) used only one representation for numbers, floating point (5 byte value, not IEEE 754 based).
It's addressed, but in the wrong way. IEEE 754 has positive and negative infinity for a reason. Why do there have to be 255 zero values? Also, what about rounding modes? Floating point math is really, really hard and this specification makes it look too easy.
I think this is pretty interesting and appreciate that it's presented with working code.
I think people freaking out that they're taking our precious integers away are being a little brash. A natural number type could easily be built on top of a dec64 type for those cases where you really need only whole numbers.
I seem to remember that "wobble" which is how the relative roundoff error relates to the absolute error, becomes worse as the base becomes larger (the ratio being the base itself). So that may be one disadvantage in using a decimal base.
Was this sweeping pronouncement approved by the Central Committee for the Design of Future Programming Languages (CCDFPL), or is Crockford only speaking for himself?
“By giving programmers a choice of number types, programmers are required to waste their time making choices that don’t matter. Even worse, making a bad choice can lead to a loss of accuracy or destructive bugs. This is a bad practice that is very deeply ingrained.”
This is like saying that the sharp blades on scissors (diverse numeric types) make them prone to causing bodily harm to surrounding people (errors due to constraints of types), then concluding that we should replace all scissors (numeric types) with those rounded plastic scissors (dec64 play dough floats) which come with a play dough set.
Every time somebody has the idea that by deluding developers more we can save them trouble and make them feel safer, we pat that person on the back and follow their recipe.
Then two years later there's a HN post about why you really shouldn't be doing whatever was prescribed.