We Need Hardware Traps for Integer Overflow
126 points by mmastrac on June 4, 2014 | 109 comments



(Mill team)

With apologies to those tired of hearing about the new Mill CPU, let me explain how the Mill traps on integer overflow:

For overflow-able arithmetic operations, we support four modes:

* truncate

* except

* saturate

* double-width

With excepting overflow, the result is marked as invalid (we term it "Not-a-Result (NaR)").

As these invalid results are used in further computation, the NaR propagates.

When you finally use the result non-speculatively, e.g. store it or branch on it, the hardware faults.

NaRs have lots of other uses.

This is described in the Metadata talk http://millcomputing.com/topic/metadata/

And http://millcomputing.com/topic/introduction-to-the-mill-cpu-... for a broader overview.


I think the article overstates the cost of doing this in software and understates the cost of doing it in hardware.

The cost of doing it in software is just a highly predictable (not taken) branch after every integer arithmetic operation that the compiler can't prove stays within bounds. The article presents no data on this cost. I have none to hand either, but I'm going to predict that on a modern CPU with typical workloads it will be small enough that it would be very hard to measure.

The article speaks as though doing it in hardware would be free, but that's very far from true. The hardware solution might have a nominal cost of 'zero clock cycles' where overflow doesn't occur, but extra transistors in critical, heavily used parts of the CPU core would be burning a small but nonzero amount of energy all the time even on code that doesn't use the overflow check - i.e. the vast majority.

If you think overflow check is a great feature (of which personally I'm not at all convinced), go ahead and add it to a new language or provide it as a library function in an existing language. But imposing it as an inescapable tax on all hardware makes no sense whatsoever.


> The cost of doing it in software is just a highly predictable (not taken) branch after every integer arithmetic operation that the compiler can't prove stays within bounds. The article presents no data on this cost.

The article does mention a 5-10% cost on JavaScript VMs, as one datapoint. And as another, that languages like Rust choose to allow overflow to avoid the overhead, despite preferring the security of not having silent overflows (implying they measured a slowdown they find unacceptable).


Here's another point: how much would clock speed suffer from adding this feature? What about instruction latencies? Does it increase the pipeline length, giving a bigger branch misprediction penalty? Does it interfere with microarchitecture optimizations we are using or could be using? All of these are performance questions that may arise from adding this instruction. There is a reason why RISC eliminated most of the mixing of control flow with arithmetic.

As for his numbers on the performance penalty of doing it in software, it surely depends on the microarchitecture. All Intel x86 processors before Haswell did only ONE branch per cycle; in Haswell they increased that to two, so the cost of doing it in software should have gone down recently. And Intel is adding new instructions to speed up arbitrary-precision arithmetic in Broadwell, so they have probably already thought about this and decided it's not worth the cost in hardware.


> The cost of doing it in software is just a highly predictable (not taken) branch after every integer arithmetic operation that the compiler can't prove stays within bounds.

While I agree with you that hardware overflow traps are a bad idea, I think the article's author is referring more to the general overhead of a software BIGNUM implementation. Specifically with his JavaScript example, I think it's very plausible that using integers instead of IEEE754 would incur a 10% overhead if you were throwing lots of big enough numbers around.

What the author really wants is a hardware BIGNUM implementation, not overflow trapping. I just don't think he really thought it through.

His contention that simply checking for overflow in C or C++ incurs a 5% overhead is unquestionably false though. Typically one would check for integer overflow in C like this:

  unsigned int do_addition(unsigned int a, unsigned int b)
  {
	unsigned int tmp;

	tmp = a;
	a += b;
	if (a < tmp)
		asm volatile ("nop;" ::); /* Handle it here */

	return a;
  }
Any decent compiler will do the right thing (GCC with -Os):

  0000000000000000 <do_addition>:
     0:	89 f0                	mov    %esi,%eax
     2:	01 f8                	add    %edi,%eax
     4:	73 01                	jae    7 <do_addition+0x7>
     6:	90                   	nop
     7:	c3                   	retq
I think rwallace is absolutely right in saying the overhead of such a check would be very close to zero in the non-overflow case.


There probably is close to zero overhead in a small benchmark. Hardware branch prediction is indeed good. However, it's a finite resource. In a large application, lots of needless branches everywhere translates into fewer branch prediction resources available for the branches that matter, which means more mispredictions. Also, they take up icache, itlb, etc. It's not at all obvious that the overhead would be close to zero in context.


Would it be possible to mark the JO instruction (jump-if-overflow) as unlikely, so that the CPU would always predict the branch to not be taken, without consuming one branch prediction slot?


> so that the CPU would always predict the branch to not be taken

AFAIK there isn't any notion of likely/unlikely in object code on most CPU architectures. I'm only really familiar with x86 and ARM though.

__builtin_expect() in GCC (the "likely()" and "unlikely()" macros in the Linux kernel use this) can do a lot of things to optimize branches, like prefetching instructions and deciding whether the jump should be on the usually-taken or usually-not-taken side, but it can't actually emit instructions that directly tell the CPU "always predict this" AFAIK.
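For reference, the kernel macros are just thin wrappers around that builtin; here's a rough sketch of how the earlier overflow check could use them (the function name is just illustrative):

  #define likely(x)   __builtin_expect(!!(x), 1)
  #define unlikely(x) __builtin_expect(!!(x), 0)

  unsigned int add_with_hint(unsigned int a, unsigned int b)
  {
          unsigned int sum = a + b;
          if (unlikely(sum < a))
                  asm volatile ("nop;" ::); /* overflow path, rarely taken */
          return sum;
  }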


> It's not at all obvious that the overhead would be close to zero in context.

That's a fair point, although I think using a single function for all your checked addition like the one above would go a long way towards mitigating the resource waste you mention.

Maybe that's a naive assumption on my part: I suppose you could construct a branch predictor that maintains state based on the function call chain as opposed to simply using the address of the branch... but that seems like it would be prohibitively complex.


Note, in case someone blindly takes this as advice: This method of expressing it in C doesn't work for multiplication (you could wrap more than once).

I'm not sure you can express an overflow check for multiplication purely in portable C (without library or intrinsic support), actually. Well, I guess you could break the multiplication into checked additions manually, but that's probably not a great idea.


He also picked one of the most trivial cases - unsigned addition. Signed data, especially signed multiplication, requires a lot more steps.

I'm working on a set of numerical problems in C now that involve checking for integer overflow. The best way of doing software overflow checks depends on the larger scope of the problem. You can knit your checks into various places in your code in ways that avoid unnecessary duplication of computational effort. However, that is a lot of work and has the potential for hidden programmer introduced errors.

The real problem is that languages like C simply don't allow you to take advantage of hardware which already exists in the CPU without writing in-line assembler. A standard set of "checked" math macros which handled the portability issues would probably satisfy most C applications.

Edit: For addition, subtraction, and multiplication, you just take one operand, calculate the largest possible second operand for that data type which won't overflow, and check that the actual second operand doesn't exceed it (remembering to take signs into account). For division and modulus, check for division by zero. For multiplication, division, modulus, negation, and absolute value of signed values, check that you are not negating the maximum negative integer, as integer ranges are not symmetrical (e.g. one byte is -128 to +127).
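As a minimal sketch of that limit-check pattern for signed addition (the function name and the ok flag are purely illustrative):

  #include <limits.h>

  /* Check the second operand against the largest/smallest value
     that won't overflow, before doing the addition. */
  int checked_add(int a, int b, int *ok)
  {
          if ((b > 0 && a > INT_MAX - b) ||
              (b < 0 && a < INT_MIN - b)) {
                  *ok = 0;
                  return 0;
          }
          *ok = 1;
          return a + b;
  }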

If you are looping over arrays and have multiple checks for different cases (e.g. negative, positive, etc.), then you can have different loops for different cases and so avoid redundant checks for that data. I'm working on this sort of application, so the above works out best for that. If you're doing something a bit different, then different algorithms may make sense. Unfortunately, there's no universal one-size-fits-all solution to this problem in software.


> For addition, subtraction, and multiplication, you just take one operand, calculate the largest possible second operand for that data type which won't overflow, and check that the actual second operand doesn't exceed it

Sure, but that calculation is CPU-dependent, since it depends on how the underlying hardware represents signed integers. By definition, it is impossible to portably check for signed integer overflow in C, as I'm sure you know.

I implemented a simplistic BIGNUM library in C once (that's where I pulled that expanding multiply code in the other comment from). The only truly portable way to do that is to make your bignums sign-magnitude and use exclusively unsigned arithmetic on them. That's what I was envisioning in my original point about performance degradation due to overflow checking.

Realistically, of course, most CPUs these days are two's complement, and you can make signed overflow defined by compiling with "-fwrapv", which I would guess is what you're doing.


Yes I'm assuming two's complement, but there's not a lot of hardware around these days that isn't two's complement. I'm writing a library for something that already assumes two's complement while doing other things.

If the code had to be portable to one's complement hardware, then I would create special cases for that type of hardware. Laying my hands on such hardware for testing would be the big problem, and if you haven't tested it, then how do you know that it works?

As for "-fwrapv", it's not portable either, and I need to cover both signed and unsigned math. It's also not compatible with what I need to link to (I've gone down this road already). I also need to cover the largest native word sizes, so the trick of using a larger word size won't work for me.

I'm only dealing with arrays of numbers though, so I can often amortize the checking calculations over many array elements instead of doing them each time. This is an example of knitting the checks into the overall algorithm instead of using a generic approach.

As things stand, there's currently no universal one-size-fits-all answer to this problem in most languages.

I do like how Python has handled this - integers are infinitely expandable and simply can't overflow. This comes at the expense of performance though. What this type of solution needs is an option for unchecked native arithmetic for cases where you need maximum speed.


> As for "-fwrapv" [...] It's also not compatible with what I need to link to (I've gone down this road already)

I'm actually really curious: exactly what issues did you run into with this? Intuitively I wouldn't think it would be a problem.


It's surprising there aren't at least intrinsics for some of these things in GCC/clang. At least as far as I can tell.


http://clang.llvm.org/docs/LanguageExtensions.html#builtin-f..., then search for "Checked Arithmetic Builtins"
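For example, a small sketch using the type-specific builtins documented there (they write the wrapped result through the pointer and return true if the operation overflowed):

  #include <stdio.h>

  int main(void)
  {
          int sum;
          /* __builtin_sadd_overflow: signed int addition with overflow check */
          if (__builtin_sadd_overflow(2000000000, 2000000000, &sum))
                  puts("overflow");
          else
                  printf("%d\n", sum);
          return 0;
  }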


Nice! Thanks for the link, I admit I didn't do as much looking on the clang side as gcc. Doesn't look like these exist in gcc though, unfortunately. :/


> Note, in case someone blindly takes this as advice: This method of expressing it in C doesn't work for multiplication (you could wrap more than once).

Yeah, you're absolutely right.

> I'm not sure you can express an overflow check for multiplication purely in portable C (without library or intrinsic support), actually. Well, I guess you could break the multiplication into checked additions manually, but that's probably not a great idea.

I've never tried to implement this, but couldn't you look at the position of the most significant bit in the multiplicands to guess if the multiply will overflow? You'd have to be okay with false positives I suppose? Naively it would be slow, but if you have a hardware CLZ instruction it could be pretty fast.

I did find this on SO:

  x = a * b;
  if (a != 0 && x / a != b) {
    // overflow handling
  }
That makes sense, but of course it would be HORRENDOUSLY expensive.

If you're willing to be non-portable, you could just do expanding multiplies and check the high word:

  struct exmulres {
        unsigned long hi;
        unsigned long lo;
  };

  static inline __attribute__((always_inline)) struct exmulres do_one_expanding_multiply(unsigned long a, unsigned long b)
  {
        struct exmulres res;
        #if defined(__x86_64__)
                asm ("movq %0,%%rax; mulq %1; movq %%rdx,%1; movq %%rax,%0"
                : "=r" (res.lo), "=r" (res.hi) : "0" (a), "1" (b) : "%rax","%rdx");
        #elif defined(__i386__)
                asm ("movl %0,%%eax; mull %1; movl %%edx,%1; movl %%eax,%0"
                : "=r" (res.lo), "=r" (res.hi) : "0" (a), "1" (b) : "%eax","%edx");
        #elif defined(__arm__)
                asm ("umull %0,%1,%2,%3"
                : "=r" (res.lo), "=r" (res.hi) : "r" (a), "r" (b) :);
        #else
                #error "No expanding multiply assembly has been written for your architecture"
        #endif
        return res;
  }
It probably wouldn't be very difficult to fill that out for all the CPU architectures Linux supports, which would cover all your bases unless you're doing super embedded stuff.

EDIT: Typo


Yeah that div check works I guess, but as you say is horribly expensive.

For the latter it looks like at least on recent versions of gcc, you can do something like (WARNING: Just a vague proof of concept, please don't use it. It might kill kittens):

    uint64_t func(uint64_t a, uint64_t b)
    {
        unsigned __int128 res = (unsigned __int128)a * (unsigned __int128)b;
        if (res >> 64) {
            asm volatile ("nop;" ::); /* handle here */
        }
        return (uint64_t)res;
    }

Which compiles to this in gcc -Os and -O3:

    0:	48 89 f8             	mov    %rdi,%rax
    3:	48 f7 e6             	mul    %rsi
    6:	48 85 d2             	test   %rdx,%rdx
    9:	74 01                	je     c <func+0xc>
    b:	90                   	nop
    c:	f3 c3                	repz retq 
Which I'm pretty sure could be optimized further by using the overflow flag, but the compiler doesn't seem to want to ignore the upper word.

edit: and it probably suffers the same problem as described here: https://news.ycombinator.com/item?id=7848771 -- and also probably has general issues with signedness that would need to be further checked.

edit2: changed it to uints to simplify and avoid a whole class of problem. I still don't think you should go use this, though.

(inspiration from SO: http://stackoverflow.com/a/13187798)


Looking at the GCC source [1] starting at line 150, this is almost what GCC actually does: perform the multiplication with twice the operand size and then check by shifting right, except that GCC compares the results of two shift operations whose shift amounts differ by one.

[1] https://github.com/mirrors/gcc/blob/master/libgcc/libgcc2.c


For future reference, you can link to a specific line in github with #LNNN. Which you can get by clicking on the line number. Pretty handy.

https://github.com/mirrors/gcc/blob/master/libgcc/libgcc2.c#...


In x86 assembly almost all multiplication operations return a result double the width of the inputs: http://www.aldeid.com/wiki/X86-assembly/Instructions/mul

Checking for overflow is just a matter of testing if the high-order bits are zero.


The correct thing to do is also to check the carry and overflow flags after (I)MUL, which does the right thing even when only the lower bits of the multiplication are saved.

Here's an example where there is overflow, but your suggestion would not catch it:

    mov eax, 0x7fffffff
    mov ecx, 2
    imul ecx ; edx:eax = eax * ecx
    ; edx = 0
    ; eax = 0xfffffffe
    ; overflow flag = 1
There is signed overflow here, but testing edx (the higher word of the result) would not catch it.


Yes. I decided not to mention IMUL to keep the post simple.

There are also issues with detecting overflow of SSE multiplies (signed and unsigned).


Yep, in assembly it's usually possible, no question. That's why I specified portable C.

C has some interesting gaps for something that's long been described as a macro for assembly. A rotate operator would also be nice.


There are better checks. I'm generally not a fan of Microsoft, but the SafeInt library ( https://safeint.codeplex.com/ ) does those kinds of checks in sofware.

I don't remember where I read it, but regarding multiplication: if you multiply two N-bit numbers together, the result will always fit in a 2N-bit number. For instance, the product of two 16 bit numbers will fit in 32 bits, the product of two 32 bit numbers will fit in 64 bits, etc. You can assign the product to an integer able to hold the larger size and then check if it would also fit in an integer of the smaller size.
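A minimal sketch of that widen-then-check idea for unsigned 32-bit operands (the function name is just illustrative):

  #include <stdint.h>

  /* Multiply in 64-bit arithmetic, then verify the product
     still fits in 32 bits. Returns 0 on overflow. */
  int checked_mul32(uint32_t a, uint32_t b, uint32_t *out)
  {
          uint64_t wide = (uint64_t)a * (uint64_t)b;
          if (wide > UINT32_MAX)
                  return 0;
          *out = (uint32_t)wide;
          return 1;
  }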


A smarter way is to split both numbers to multiply into two halves, and pretend to do two-word arithmetic on a machine with words half the size.

In decimal, that means that you compute the product of two-digit numbers using single-digit multiplications and shifts (= multiplications by 10):

  23 x 67
  = (20+3) x (60+7)
  = (2x6) x 100 + (2x7) x 10 + (3x6) x 10 + 3x7


> I have none to hand either, but I'm going to predict that on a modern CPU with typical workloads it will be small enough that it would be very hard to measure.

I write in a language that does (sort of) this (Python!), and I'd agree. Python gives, by default, integers that aren't bounded, so overflow doesn't happen. Of course, under the hood, it must be keeping track of all the bits of the number somehow, and determining when it can't store a section of the integer in the underlying uintxx_t; it must use some sort of overflow detection here, even if perhaps it's not truly overflowing. Integer objects (for those interested) are 28 bytes by default and more if they get larger; a rough test with multiplication just now seems to take ~25-50ns (compared to ~2-3ns for an int32_t in C), and in my experience, most of the time, this doesn't matter.

With this flurry of new languages, I had hoped to see more of "integer" just being an integer, and fixed-sized integers being a special case. Often, I feel I really don't know what the bounds on the input will be. (Sometimes, there aren't, aside from available memory constraints.) Obviously, there will need to be code that needs to run fast and can use a more restrictive type.

Something like, `ranged_integer<0, 200>`, which I can use when I know my inputs should never exceed [0, 200], and has some well-defined behavior if I over/underflow. (Or even, configurable!) In the `ranged_integer<0, 200>` case, I'd expect the compiler/library to be able to easily optimize that into a `uint8_t`. (You could have `native::uint8_t`, which is just whatever the processor has, and you're on your own, but it's more obvious that you're on your own.)


Eh, new instructions are added all the time. Just look at BMI - I'd be willing to bet the static power cost of adding that dwarfs what trapping arithmetic would cost. And most of those instructions are more esoteric and/or give less benefit over existing instructions than trapping arithmetic would.

Also, the current branches for overflow checks might be well-predicted, but only by using branch prediction resources that could go to other branches. Not to mention the extra code size and issue resources taken.


The processor is already calculating overflow with every operation. All it has to do is allow that bit to trigger a trap.


Which isn't as trivial as you are implying, because it requires a feedback path back to the trap-handling parts of the core.


I agree and would add: what do you do when an integer overflow occurs? This would add additional complexity for transferring and handling this knowledge all the way up to the application level.


It's a little amusing that x86 has the INTO instruction, a single byte opcode at position CEh, that was designed specifically for this purpose and was there since the 8086, but when AMD designed their 64-bit extensions, it turned into an invalid instruction (and Intel was forced to go along, presumably for compatibility.) A rather shortsighted move, I think; instead of having a possibly useful (but not previously often-used) instruction, that single-byte opcode becomes wasted in 64-bit mode. With it, adding overflow-trapping arithmetic support to a compiler would be trivial and only add 1 extra byte to each arithmetic operation that needs it.

Ditto for BOUND, which is useful for automatic array bounds-checking - it performs both lower and upper bounds checks.

Also, I don't really get why integers wrapping around should be "unexpected". It is only to those who don't understand how integers work. The saying "know your limit, play within it" comes to mind.


Also, the original SPARC chips had TADDCCTV, which would add two 32-bit integers, trapping either if there was overflow or if either operand had a nonzero value in its two low-order bits. This was specifically to support tagged fixnum addition in Lisp.

They dropped this instruction for the 64-bit SPARCs.


Those instructions generate an interrupt; you'd have to define the OS ABI to make those instructions trap back to the application in a catchable way.


There's an existing way to do this already: SIGFPE + the UNIX signal-handling mechanism. Integer overflow is even mentioned as one of the possible causes of this signal. The BSDs have a defined constant FPE_INTOVF_TRAP for it too. On Windows, SEH is an equivalent functionality.
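A rough sketch of how an application could hook that signal (an assumption for illustration; on most systems today only integer division by zero, FPE_INTDIV, actually arrives this way, so that's what the demo triggers):

  #define _POSIX_C_SOURCE 200809L
  #include <signal.h>
  #include <stdio.h>
  #include <stdlib.h>

  static void fpe_handler(int sig, siginfo_t *info, void *ctx)
  {
          (void)sig; (void)ctx;
          if (info->si_code == FPE_INTDIV)
                  fprintf(stderr, "integer divide by zero trapped\n");
          /* FPE_INTOVF would indicate an integer overflow trap on a
             platform that actually delivers one. */
          _Exit(1);
  }

  int main(void)
  {
          struct sigaction sa = {0};
          sa.sa_sigaction = fpe_handler;
          sa.sa_flags = SA_SIGINFO;
          sigaction(SIGFPE, &sa, NULL);

          volatile int zero = 0;
          return 1 / zero; /* raises SIGFPE (FPE_INTDIV) as a demo */
  }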


If you put the INTO immediately after the overflowing instruction, the OS could just go back one instruction (harder than it seems, but not impossible). Wouldn't that work?


That'd require the OS to define that as part of the ABI.

Also, you almost certainly don't want to just go back an instruction; you want to catch and handle the overflow.


Modern CPUs don't like traps ("exceptions"). The exception causes a pipeline flush which kills performance for math-intensive code.

For example, detecting integer overflow on x86 and x86_64 CPUs is easy: check the overflow flag after every arithmetic operation. It would only be slightly more difficult to detect overflow for SSE (vector) operations, which would require doing some bit masking and shifting.

For a language such as Swift, building it in is simple.


The entire premise of the linked article is that doing it the way you suggest is too expensive for the common case, where no overflow happens. Further, it implicitly asserts that the cost of a hardware trap/exception, while great, will be offset by the savings from the common case.

Now, it doesn't back these assertions with much data, but neither do you :)


Exactly -- the premise would be that non-overflowing arithmetic would be the "fast path" and overflow would be truly exceptional.

This is similar to null-reference protection in various languages and runtimes, where null dereferences are caught by mapping guard pages at the beginning of memory. Dereferencing a pointer is optimistic and assumes that it will succeed, the same as arithmetic operations would in this proposal.


The problem is that potentially-overflowing integer instructions make a real mess out of things like out-of-order execution and speculative execution.

For data to back this, see your favorite computer architecture reference, particularly anything that discusses the consequences of highly-complex instructions in things like the VAX.


Citing stuff from the 80s RISC movement is pretty outdated nowadays, especially in this specific case where quintessential RISC architectures such as MIPS implemented trapping arithmetic instructions from the start.


To the contrary, out-of-order execution and speculative execution make precise exceptions trivial to support. As long as exceptions are not frequent, there is no performance penalty either. It's the older in-order pipelines that struggle with precise exceptions. But there are well-known tricks to work around that too...


The problem with a hardware implementation is that you break algorithms and coding systems for which integer overflow, or rather integer wrapping, is a necessary piece rather than an exception case. You could, in theory, add an instruction that disables integer exceptions, but that would make the hardware more complicated. You could also redesign the aforementioned algorithms and systems to not rely on the implicit wrapping, but then you make them more expensive.


No one is advocating eliminating modulo arithmetic - the trapping instructions would be different ops, just like on architectures that already implement this such as MIPS.


OK, so then it's on the compiler to use the non-exception-generating instructions when generating the assembly.


I suspect the best solution would be something like NaN behaviour: carry on with your performance-critical arithmetic and then examine the final result for validity.


Exceptions are - by definition - rare, and when they happen you have other problems than worrying about performance or pipeline flushes.

It means that your program contains a bug.

What's Swift got to do with it anyway? A language that was launched last week is now a benchmark for including new features? Has it been open sourced since then? If not, how is building it in simple?


"Those who do not understand the hardware are doomed to blog about it, poorly.

Intel chips already detect overflow, that is what the OF (bit 11) flag in the flags/eflags register indicates. That the most recent operation overflowed.

Testing, and generating a trap, for an overflow is a single instruction (JO – jump if overflow and JNO – jump if no overflow).

This is true of almost all CPU’s. At the hardware/assembly programming level, overflow detection is handled by hardware, and detected by a single instruction. The reason all of your languages listed don’t have any sort of trap/etc. is simple. The language designers did not bother to check the result of their operations. Why, most likely because almost all of them are ultimately implemented in C, and C does not reflect the overflow status back into the C language level.

So the problem isn’t the hardware. It is the software architectural design(s) implemented above the hardware. They are not bothering to communicate to you the results of what the hardware already has available and tells them about."

(Quoting a comment from Anon)


> Why, most likely because almost all of them are ultimately implemented in C

That isn't right: languages created before and after C exhibit the same behaviour. Some languages do explicitly check the overflow flag every time.

The reason for not always checking (or never checking) is that in a tight loop the extra instruction can significantly affect performance, especially back when CPUs had a fraction of their current speed. The reason for not exposing the value to higher-level constructs is similar: you have to check it every time it might change and update an appropriate structure, which is expensive given that overflow should be a rare occurrence, so checking every time and saving the result is wasteful.

The linked article specifically mentions the performance effect of checking overflow flags in software. I believe what it is calling for is some form of interrupt that fires when the flag is switched on in a context where a handler for it is enabled - a fairly expensive event happens upon overflow but when all is well there is no difference in performance (no extra instructions run). Of course there would be complications here: how does the CPU keep track of what to call (if anything) in the current situation? Task/thread handling code in the OS would presumably need to be involved in helping maintain this information during context switches.


The performance penalty you mention ("in a tight loop the extra instruction can significantly affect performance") doesn't happen if you add the new type to the language (that is, what in VC++ is a "SafeInt"). In a really tight loop you wouldn't use that type. That type is important exactly for the things in the MSFT example I gave (calculating how much to allocate -- overflow means you allocated much less and you don't catch that!). So no, you don't have to "check every time."

The reason it's not in standard C is to be portable to some odd old architecture which doesn't have the overflow flag at all. A modern language can clearly be designed to depend on the overflow flag. The cost would happen only when the programmer really does access it (in a modern language: by using such a type), and the cost would be minimal, as there is direct hardware support.

> I believe what it is calling for is some form of interrupt that fires when the flag is switched on in a context where a handler for it is enabled

And that is misguided, as it doesn't allow for fine-grained control -- it's all or nothing: either all instructions generate "an interrupt" or none. If you want to change the behavior from variable to variable, changing the processor mode would cost. If you add new instructions for all the things that can overflow and trap, you'd add a lot of new instructions. So that's also bad. The simplest approach is to use what's already there. The flag is there in the CPU; it's not used by the languages the OP mentions, but once the language supports it for a "safe" integer type, it will be checked only when it's really needed: for that type and nowhere else.

Finally, the maintenance of the exception-handling code (what you refer to as "how does the CPU keep track of what to call") is something that modern compilers and even assembly writers must take care of, and it is very well understood among them: for example, the Win x64 ABI expects every non-leaf function to maintain its stack-unwinding information properly, so even if I write assembly code I effectively have to support exceptions outside of my code for every non-trivial function I write. So this part is very well known, and most of it is taken care of outside of the OS. The OS merely has some expectations; the compiler writers (in the broader sense, including the native code generator and linker) must fulfill them.


And it is true, there is already the hardware "overflow" flag, just not visible in "higher" level languages.


Setting a flag is not the same as raising an exception. When you get a segfault that's not because a flag was checked, that's because a chunk of hardware interrupted the next operation.

So yes, you can check for overflow by inspecting a flag, but that's not all that much better than coding around the potential overflow to see if your operands + operation combo will cause an overflow.

Imagine setting a flag on a divide by 0 instead of raising an exception.

So the author is most likely aware of the overflow flag (just like any other old timer that has programmed in assembly and that had a cursory look at the flags register, or in some cases more than a cursory look :) ), he's writing about hardware exceptions, not about how to write in javascript.

But inspecting a flag simply isn't on the same level as raising a hardware exception, right along with segfault, division by 0, FPEs and so on.

That instantly propagates through to any programming language executed on the machine. Checking a flag in a register is a decision by a compiler writer and many compiler writers (consciously!) ignore that (usually because the language specs are vague enough that wrapping is considered acceptable when it is in fact almost always simply wrong).


No, I don't agree that "the author is most likely aware of the overflow flag" since the author says:

"Processors should support integer math instructions that optionally trap on overflow. Because popular architectures lack this feature, otherwise excellent modern systems programming languages, such as Rust, Go, and D, have default integer types that wrap."

In short, he believes there's no hardware support and therefore Rust, Go and D don't have it.

There is however hardware support in every imaginable CPU -- the overflow/carry flags are important for synthesizing adds bigger than the native size (e.g. a 128-bit add if your registers are 64 bits). Except for the implementor of the HLL compiler, you as the user wouldn't be able to see how it's implemented in the hardware unless you look at the resulting generated code (you'd see an "exception" triggered no matter how it's implemented).
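To make the "synthesizing bigger adds" point concrete, here is a minimal C sketch of a 128-bit add built from two 64-bit adds, propagating the carry by hand (struct and function names are just illustrative):

  #include <stdint.h>

  struct u128 { uint64_t lo, hi; };

  static struct u128 add128(struct u128 a, struct u128 b)
  {
          struct u128 r;
          r.lo = a.lo + b.lo;
          /* (r.lo < a.lo) is exactly the carry-out that the hardware
             carry flag records for the low-word addition. */
          r.hi = a.hi + b.hi + (r.lo < a.lo);
          return r;
  }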

I know: I had the opportunity to implement a language which has special behavior when a floating-point compare involves NaN. Luckily, there is a flag in the Intel CPU for that too, and as far as I know you can't access it from any popular high-level language, but my code generator injects an additional flag check into every FP compare, and in production code I haven't seen any negative performance impact compared to the case where I don't do the stated injection. That's how good current branch prediction in CPUs can be.

So no, you don't need "hardware traps"; you need somebody to first try to implement the given behavior in any HLL before we argue further. As soon as you have "wrappable" and "unwrappable" types in the language, you can have all the crypto primitives that need wrapping use the "wrappables" without any performance impact.

In fact "unwrappables" were interesting mostly for the calculations of the limits, practically wherever you'd use, for example, Microsoft's SafeInt classes.

The example from their header:

       void* AllocateMemForStructs(int StructSize, int HowMany) {
          SafeInt<unsigned long> s(StructSize);
          s *= HowMany;
          return malloc(s);
       }
At least using MSFT's Visual C++ you can already have exceptions on integer overflows.


> "Processors should support integer math instructions that optionally trap on overflow. Because popular architectures lack this feature, otherwise excellent modern systems programming languages, such as Rust, Go, and D, have default integer types that wrap."

The operative word there is trap.

There is no hardware support for traps on overflow, period.

You can detect overflow in software and then you can generate an exception (or deal with it in some other fashion) like you describe, this is not the same as a trap.

Feel free to stick to your definition if you want to but the accepted one is that a trap operates much like an interrupt would.

http://en.wikipedia.org/wiki/Trap_%28computing%29


Of course I know the semantics of a "trap"; the topic here is that a trap is not necessary to have reasonable high-level-language support for overflow checking, as the hardware flag is already present and simply not used (according to the OP, in the sense of "not having types that can produce an exception on overflow") in the languages the OP mentions ("Rust, Go, and D").

I also gave an example of a language (Visual C++) which already implements exceptions (the application programmer sees only exceptions, not traps) on integer overflows, provided the programmer specifies that he wants such a type (declares it as SafeInt).


I think you're missing the point entirely, which is that if we had such a trap then integer overflow situations would be dealt with automatically rather than that they would rely on support by the individual languages.

Now whether or not the balance of burden (hardware/software) would favour a hardware solution or not is a different matter entirely. But I completely get what the author is trying to achieve and I'm well aware of the various bits in the flag registers of a whole pile of processors. I see having such a trap as a distinct advantage, just like I see division by 0 and floating point exceptions as advantageous.

The fact that you could do this in software right now has no bearing on his argument; that's a choice by the implementors of the various languages, which are usually built for speed rather than safety and where unchecked integer overflows are the norm. Retrofitting a trap mechanism in hardware would likely turn up a whole pile of bugs in systems that we currently consider to be solid.


Well, I have news for you (re "I see division by 0 and floating point exceptions as advantageous"): when you use floating point in any current language, division by 0 will not trigger any "exception", and it's certainly not supported by any language directly. CPU traps for the FPU are disabled by default by the language std libs.

Even worse, Java's model doesn't allow anybody to reach to the hardware.

The author of IEEE 754, prof. Kahan lamented exactly that: that the languages of today effectively can't use a lot of the features he designed (in 2004, http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf Page 20: "Java lacks these flags and cannot conform to IEEE 754 without them.").

That was the FP domain, but it is instructive. Traps are a relatively "all or nothing" thing. To have a language where some integer variables behave one way and some another (you know, types are important now), you already have the hardware flags, and they simply aren't used in the languages the OP mentions.

Just changing the compiler behavior would be absolutely enough to "turn up a whole pile of bugs in systems that we currently consider to be solid", as you suggest. You certainly don't need to wait for some new CPUs.

So we return to what Anon said: "Those who do not understand the hardware are doomed to blog about it, poorly." If somebody wants language types that trigger exceptions on overflows, he doesn't have to wait for new processors. It's possible now, if there's interest in it. And even though real "traps", as a very specific CPU feature, already exist for FP, no language I know of uses them for types.


I'm gonna shill for a moment and link a post I wrote on integer overflows very recently. http://forrestthewoods.com/perfect-prevention-of-int-overflo...

TLDR: Not accidentally performing an undefined operation is really really hard.


Hello,

The comment system on your blog eats comments, which become lost forever when submitted. You might want to know that.

Apart from that:

1) “2147483650f” is Java syntax. Also this number, written in C++ as 2147483650.0f, actually represents the number 2147483648 (assuming float is the single-precision IEEE 754 format). You might want to denote that number 2147483648.0f, which would be less confusing.

2) the line “const int min = 0x80000000;” does NOT contain undefined behavior. The overflow occurs during a conversion from an integer type to a signed integer type. Overflow during such a conversion is implementation-defined (or an implementation-defined signal is raised). Even an implementation-defined signal is not undefined behavior, but in practice, the compiler you use produces wrap-around behavior, and it will continue to do so, because it has been forced to document it.


What login did you use for making the comments? It just uses Disqus which is pretty common these days. I was able to successfully make a comment with a Twitter login. Recent posts have had surprisingly few comments but not zero. Strange...

The 2147483648 vs 2147483650 thing is actually kind of interesting. The Visual Studio watch window prints floats with one less digit of precision than they need, as it actually displays (from memory) 2.14748365e9 when it should display 2.147483648e9. Thanks for the catch.

I'll have to check the spec docs to see if the unsigned-to-signed overflow is undefined or implementation-defined. I think you're correct but I need to verify.


One of the surprisingly annoying minor things we implemented for some safe code was a set of C routines for "safe arithmetic"--explicitly and carefully catching overflows due to multiplication and addition, for example.

This code proved invaluable when writing binary parsers designed to support "unsafe" mesh data coming off of the network--the file might be garbage, but we could at least safely parse it.

I'd go so far to say that if you are dealing with a binary format--especially a sane one which has length headers and chunks for rapid seeking--and you aren't using something similar, you are doing it wrong.

EDIT:

For the curious, you may find some of these bit-twiddling hacks to be of some use.

http://graphics.stanford.edu/~seander/bithacks.html#IntegerL...


Regehr, where this post is from, recently had an article about how to do safe arithmetic in C.

http://blog.regehr.org/archives/1139

As you say, it is surprisingly difficult to get right, especially if you want good performance. A common mistake is checking for signed integer overflow after the fact. It is too late once undefined behavior has already happened.
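For instance, something like this sketch looks reasonable but is already too late, because the signed addition itself is the undefined operation:

  /* Broken: if a + b overflows, the behavior is undefined before the
     test ever runs; the compiler may assume the condition is false. */
  int bad_checked_add(int a, int b)
  {
          int r = a + b;
          if (b > 0 && r < a) {
                  /* "handle overflow" -- may be optimized away */
          }
          return r;
  }

The portable fix is to test against INT_MAX/INT_MIN before performing the operation, as in the limit-check sketch earlier in the thread.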


> It is too late when undefined behavior has already happened.

This seems more like a theoretical concern, as just about all the hardware out there is 2's complement and signed overflow wraps around in the usual way - because that's the simplest, most straightforward way to implement it. (This also means I think compilers that exploit this "loophole" in the language are seriously violating the principle of least surprise.) If you're working on the few exceptions to this, then integer overflow is probably going to be the least of your worries...


> One of the surprisingly annoying minor things we implemented for some safe code were C routines for "safe arithmetic"

I'm surprised that something that necessary and tricky to implement still hasn't been considered for the C standard library.


Personally, I'd like to see more programming done in languages that simply don't allow integer overflow in the first place. Most current languages have arbitrary-precision integers; well-implemented arbitrary-precision integers are quite efficient when they fit in a machine word, and as efficient as possible when larger. Sure, you'll lose a bit of performance due to checks for overflow, but those checks need to exist anyway, in which case it seems preferable to Just Work rather than failing.


Checks are the smallest cost of bignums. Dynamic allocation isn't free, it's terribly expensive. Dereferencing pointers isn't free, it's terribly expensive. It's one thing for a scripting language (where people expect poor, inconsistent performance) to have automatic bignums but it's quite another for a language in which people will be writing performant code to have automatic bignums. They introduce a thousand difficult-to-debug edge cases that slow your code to 1/3 or 1/5 (or 1/100th or 1/1000th) speed. I don't know about you, but I don't consider that "Just Working."

Don't get me wrong, they could be a useful feature, perhaps even one which should be enabled by default in Swift (I'd argue against them, but I wouldn't be entirely unsympathetic to their proponents). However, they aren't for everyone and they don't come cheap.


Context matters. Most arithmetic operations are not performed in hot loops or on large vectors. I think it's better to have a default where correctness and error prevention are gained at the cost of speed. When speed is needed, an explicit choice to use an unsafe but faster-performing number type would be the prerogative of the programmer. It's important to have both options available, because context matters.


Isn't the alternative introducing a thousand difficult-to-debug edge cases that cause your code to produce incorrect output, or crash? I'd rather the default setup be "can be slow sometimes". If you're writing performance-sensitive code where you want to control the overflow behavior, make that an option, but not the default.

In the context of Swift, it would make a lot of sense to me for Int to be a bignum with no built-in limits on range. The common case of looping over a small set of values, computing indexes for UI elements, etc. will fall within the range of machine native integers and remain fast. If you blow out the limits, your code will still work. If you want control/speed, use one of the provided integer types with an explicit width.


First of all, I'm all for overflow traps and I'm quite happy with bignums in my scripting languages. I only claim:

1. There is a large class of languages where they do not make sense as a default.

2. The most sensible approach is to fastlane the all-int native case, but this comes at the cost of a precipitous drop in speed the moment even a single bignum happens.

> Isn't the alternative introducing a thousand difficult-to-debug edge cases that cause your code to produce incorrect output, or crash?

Unintentional bignums are usually bugs, so incorrect output / crash is usually the best one could hope for anyway. People understand that programs have bugs and crash, but the combination of slowness with incorrect output is a special recipe for infuriated users (I waited 30 minutes for the file to load and then it crashed?!?!).


You sound like you have more experience with this than I, so take the following with that in mind....

My experience with this has mainly been in languages like Python, where bignum performance is way down on the list of worries, and C, where you wrap if you're lucky and end up with exploitable security vulnerabilities if you're not. I don't believe I've seen a language at work which traps overflows.

My experience with C has been that overflows (which would be unintentional bignums in another environment) are almost always one of:

1. Underflow of unsigned values used for sizes, especially for memory allocation, which causes a request for vast quantities of memory.

2. Computations on large quantities without thinking about how large they can be. File sizes are a really common example of this, especially since they fit into 32 bits for so long.

3. Reading textual data containing integers that either doesn't specify how large they can be, or specifies it but the specification got ignored. A fun example of this was the Twitpocalypse, when Twitter collected enough tweets for the tweet IDs to overflow signed 32-bit integers and broke a ton of clients.

#1 is an error, and crashing would be fine. That's usually what happens in C anyway, just in a somewhat more confusing manner because you get an error trying to allocate 4 billion bytes of memory rather than the more obvious "can't put -1 in here" that actually happened.

#2 is pretty much the poster child for automatic bignums. The downside being potential slowness, but most aren't performance sensitive.

#3 is an interesting case. I'm not sure that using a language "integer" type is even the right way to represent an ID from a remote service, even if that ID happens to be an integer. But automatic bignums would have prevented this problem, even if the design wasn't necessarily right to begin with.

As far as I know (and I realize it's probably hard to tell), I haven't seen bugs where automatic bignums would produce the wrong answer. They can be bugs (e.g. #1 above, possibly #3) but they don't involve incorrect output.

Anyway, to me it just makes sense for the default behavior to be correctness. When you write "a + b" without taking special measures, the result should be the right answer in a mathematical context. If you want bounds checking, that should be made explicit. (A way to create configurable integer types with arbitrary bounds could be nice.) If you want wrapping, that should be made explicit too. And if you want performance at the potential expense of correctness at the edges (presumably because you either know you won't hit the edges, or they behave the way you want them to) then that ought to be opt-in, not opt-out.


> Dynamic allocation isn't free, it's terribly expensive. Dereferencing pointers isn't free, it's terribly expensive.

So don't do either of those things until your arithmetic overflows the size of a word; until then, you can keep numbers in a register.


That's not easy either, as it'd require very heavy inlining by the compiler. Functions that accept just 'integer' have to check whether the value is a native number or an actual reference and process it differently. C/C++, Java*, and C# have it easier there - when you pass an 'int'/'long' the receiver knows it's a native number.

* Fixnum support (headless objects), needed for non-Java languages on the JVM, is still unimplemented to my knowledge [0]

[0] http://bugs.java.com/bugdatabase/view_bug.do?bug_id=6674617


> That's not easy either as it'd require very heavy inlining by the compiler.

Right, bignums need to be a language feature, not a library feature. Compilers with native bignum support often have ways of handling native unboxed single-register numbers, and then branching to full bignum routines when needed. Haskell can do that, for instance.

Also, you can use the standard trick of decreasing the maximum single-register size and using the extra bits to identify indirect objects.
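A rough sketch of that tagging trick in C (the names are purely illustrative; assumes the low bit as the tag and arithmetic right shift for signed values, as on typical compilers):

  #include <stdint.h>
  #include <stdbool.h>

  typedef uintptr_t value;   /* either an inline fixnum or a pointer */

  static inline bool is_fixnum(value v)        { return v & 1; }
  static inline value make_fixnum(intptr_t n)  { return ((uintptr_t)n << 1) | 1; }
  static inline intptr_t fixnum_value(value v) { return (intptr_t)v >> 1; }
  /* Values with the low bit clear are pointers to boxed bignums, so
     fixnums give up one bit of range in exchange for staying in a register. */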


>>Also, you can use the standard trick of decreasing the maximum single-register size and using the extra bits to identify indirect objects.

That's a given, but you still pay the price (the mask/shift). That was the point of the "languages like C/C#/Java have it easier" part. The point is mostly that even if built-in support (interrupts) exists in the hardware, bignums still need quite a lot of extra code in place, plus a good optimizing/inlining compiler.

Personally I am happy with constrained integer types - otherwise the entire stack (incl. storage) must support bignums, and often (almost always) going out of range would actually be a bug.


I am tempted to do a Bill Gates here and say: 64 bits should be enough for everyone (in the common case). If not, using SSE-like math is an option too. Using bignums by default would make everyone suffer - they do come with a price, and the hardware can't support such a bizarre beast naturally.


> Sure, you'll lose a bit of performance due to checks for overflow...

Which is why the author wants support for integer overflow traps. He even mentions Python, which does what you describe, and suggests that other languages don't do this precisely because of the performance hit for which he's proposing a solution.


You seem to be defining "current languages" to mean "languages that have recently originated" rather than "languages that are currently used". But in the languages that are heavily used currently - C, C++, C#, and Java - while bignum packages are available, the integers within the languages themselves are not arbitrary-precision.


It isn't just recent languages that have this. One of the confusions of many folks starting to use most lisps is actually that they typically have very solid number libraries. Such that it is sometimes confusing to see 1/3 as the number you are holding, instead of an approximation.


x86 processors already have overflow and carry bits in their flags register to tell when overflow has occurred.

It makes more sense to me to have compiler writers check the flags if they care about overflow, and avoid the slow down if they don't.


No, doing it in hardware makes more sense. If you expect overflows you disable overflow checks in the hardware and everything is as before. If you don't expect overflows you enable overflow checks in the hardware and if you have a bug you get the exception. If you do it in software you have to execute additional instruction every time but they will do nothing useful if your program is correct.


I found it weird that nobody mentioned it, but was this posted because Apple's new language Swift has integer overflow detection (with &+, &-, etc. operators for doing conventional C-style modulo integer math)?


It was posted to reddit six days ago[1], four days before Swift.

[1] http://www.reddit.com/r/programming/comments/26s8iu/we_need_...


> In a unicode language we might use ⊞, ⊟, etc.

Nope.


Or you could have a "sticky" overflow flag, so that you could check for overflow after each complete expression instead of after every single math operation. This needs one new flag and one new conditional instruction. (Plus compiler modifications.)


I don't know if this solves all the associated problems. If part of the chain of operations isn't side-effect-free, e.g. a function call, you'll still have to check more than once for the whole chain. And if you want to do more than throw an error/exception you may need to re-run the operations on a slow path to get the correct result.

But I like it a lot even so. For a general case it seems like it would help.


The branch will be predicted (close to perfectly) by the hardware, so the branch cost would be something like an extra cycle.


x86-64 could repurpose the Adjust flag (bit 4 of EFLAGS) for this. It is no longer used, since the BCD instructions were dropped in the x86->x86-64 transition. There is also precedent in the last few years for new instructions that change flag semantics: ADCX and ADOX.


An interesting tangent to this, in terms of trapping math operations, is that Swift added an operator for a non-trapping div-by-zero operation: &/

I'm not sure I've seen that in any other languages before.


I've worked with many CPUs (i86, Z80, 6502, ...) and all had some kind of overflow / underflow indicator, normally in the status register.


ARMv5 and beyond have saturating arithmetic instructions, which can be pretty handy. It won't cause an exception, but it is at least one way to improve the situation.

There's also a sticky saturation flag in the status register.


Deprecated (and moved to NEON) in aarch64, FWIW.


Huh, okay, I guess that's sensible. Thanks for pointing that out, I haven't been paying much attention to the latest in ARM, so I hadn't heard that.


In case anyone has the same difficulty reading this as I did, "integer types that wrap" is supposed to refer to arithmetic modulo 2^n. I thought at first he was talking about wrapped integer types.


> wrapped integer types

A google search for "wrapped integer types" gives me the Wikipedia article on Integer Overflow as its first result. The other hits seem to be related.

What else do you mean by "wrapped integer types" if not arithmetic modulo 2^n?


I think he probably means boxed integer types: http://msdn.microsoft.com/en-us/library/yz2be5wk.aspx


A google search for "striped horse" gives me the Wikipedia article for Zebra as its first result. Similarly a google search for "wrapped integer types" takes you to a Wikipedia page for "integer overflow" that never uses the phrase "wrapped integer."

simcop2387 is correct that I assumed he was talking about boxing at first.


I wonder how often it'd be possible to deduce that an overflow is not possible from the context. (For example, if we've just checked that INT_MAX/a > b before doing a*b.)


Division is a lot more expensive than the multiplication itself; checking the overflow flag after the multiplication - which the OP criticizes for being too slow - is going to be much faster than your check.


That's probably more expensive than a jno instruction after the possibly overflowing operation.


My assembler is very rusty, but couldn't compilers check the carry flag on i386 after a math operation that could potentially overflow, and handle the trapping in software?


Of course, and this is what actually happens if you enable overflow checking, but it comes at a price - if your code is correct you will never need the checks, yet you will execute them every time.


Here's the best guide I could find on enabling those checks in GCC. It is unclear whether they take advantage of the carry bits; do you have a source for that information?

http://www.pixelbeat.org/programming/gcc/integer_overflow.ht...


You can see the implementation in GCC here [1], for example addition starting on line 74. What machine code gets emitted for that obviously depends on the target architecture but it is reasonable to assume that the optimizer recognizes these patterns and uses hardware flags where available.

[1] https://github.com/mirrors/gcc/blob/master/libgcc/libgcc2.c


http://www.emulators.com/docs/LazyOverflowDetect_Final.pdf This paper seems to imply that high-level languages can't access the carry bit, yet C#'s checked mechanism uses it.

I've been using -ftrapv with gcc for a long time, from here: https://gcc.gnu.org/onlinedocs/gcc/Code-Gen-Options.

From the code you posted, and the assembly [1] emitted in the PDF above, I can't see the optimiser taking advantage of the carry bit. There's no mention of CF or OF:

[1]

  lea ebx, DWORD PTR [edi+eax]
  cmp ebx, edi
  jae SHORT $LN4@unsigned_c
  cmp ebx, eax
  jae SHORT $LN4@unsigned_c
  mov ecx, 1
  jmp SHORT $LN5@unsigned_c
  xor ecx, ecx

The GCC flag -ftrapv only works for signed integers; it's not clear what the carry bit behaviour is for unsigned integers.


Looking over it again, it seems like quite a leap for the optimiser to do that; I have my doubts.


Didn't Lisp machines have hardware traps of some kind?


Lots of them. But the CPU wasn't even pipelined, never mind superscalar, so adding traps was easy -- just more microcode.

For example, there was tag type DTP-GC-FORWARD. When the CPU loaded a word from memory with this value in the tag field, it would automatically indirect through the pointer contained in the word.



