Bug in the Pentium FPU (1994) (trnicely.net)
105 points by shubhamjain on June 2, 2015 | 30 comments



Ah yes, Dr Nicely caused quite a bit of excitement at Intel. I was on the p6 architecture team at the time. (p6 == Pentium Pro) Our FPU was formally verified and didn't have the same bug.

To be nice to Dr Nicely we sent him a pre-release p6 development system to test with his program to demonstrate that his bug was fixed. He was working on a prime number sieve program at the same time and came back reporting that the p6 ran at 1/2 the speed of a Pentium for his code. Wow, another black eye/firestorm caused by Dr. Nicely. He had too much of an audience for him to report to the world this new processor was slower.

So I got to spend a lot of time learning how the sieve works and what was happening. For the most part it allocates a huge array in memory with each byte representing a number. You walk the array with a stride of known primes setting bytes, and whatever is left must be prime, i.e. every 3rd number is not prime, every 5th is not prime, every 7th ....
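A minimal sketch of the byte-per-number sieve described above. This is not Nicely's program; the function name `byte_sieve` and the use of a Python `bytearray` are assumptions made for illustration.

```python
def byte_sieve(limit):
    """Return the primes below `limit`, using one byte per candidate number."""
    if limit < 2:
        return []
    composite = bytearray(limit)        # 0 means "not marked yet"
    composite[0] = composite[1] = 1     # 0 and 1 are not prime
    for p in range(2, limit):
        if composite[p]:
            continue                    # p was marked by a smaller prime
        # Walk the array with stride p, writing one byte per step.
        for multiple in range(2 * p, limit, p):
            composite[multiple] = 1     # blind write, no read first
    return [n for n in range(limit) if not composite[n]]


print(byte_sieve(30))
```

Once the stride exceeds the cache line size, each step of the inner loop touches a different line, which is the access pattern the rest of the comment is about.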

So in the steady state you are writing a single byte to a cache line without reading anything. And every write hits a different cache line.

Now p6 had a write allocate cache, but the Pentium would only allocate on read, so on the Pentium a write that misses the cache would become a write to memory. On the p6 that write would need to load the cache line from memory into the cache and then the line in the cache was modified. And since every line in the cache was also modified we had to flush some other cache line first to make room. So every 1 byte write would become a 32-byte write to memory followed by a 32-byte read from memory.
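The traffic arithmetic in that paragraph can be written down as a toy model. It only counts payload bytes for a single store that misses the cache, using the 32-byte line size quoted above; the function and parameter names are hypothetical, and real bus transactions are more involved than this.

```python
LINE_BYTES = 32  # cache line size quoted in the comment above


def bytes_per_missed_write(write_allocate, evicts_dirty_line=True, store_bytes=1):
    """Payload bytes moved to or from memory for one store that misses."""
    if not write_allocate:
        # Pentium-style: the store goes straight to memory.
        return store_bytes
    traffic = LINE_BYTES                # read the target line into the cache
    if evicts_dirty_line:
        traffic += LINE_BYTES           # write back the modified victim line
    return traffic


print(bytes_per_missed_write(write_allocate=False))  # 1
print(bytes_per_missed_write(write_allocate=True))   # 64
```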

Normally write allocate is a good thing, but in this case it was a killer. We were stumped.

Then the magic observation: 99% of these writes were marking a space that was already marked. When you get up to walking by large strides most of those were already covered by one of the smaller factors.

So if you changed the code from:

     array[N] = 1
to:

     if (!array[N]) array[N] = 1

Now suddenly we are doing a read first, and after that read we skip the write so the data in the cache doesn't become modified and can be discarded in the future.
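To get a feel for how many writes the check avoids, here is a small sketch that counts them in a plain sieve. The name `count_marks` is made up, and the ratio it reports depends on the sieve layout and range, so it will not necessarily match the 99% observed on Nicely's code.

```python
def count_marks(limit):
    """Sieve below `limit`; return (fresh_writes, skipped_writes)."""
    marked = bytearray(limit)
    fresh = skipped = 0
    for p in range(2, limit):
        if marked[p]:
            continue
        for multiple in range(2 * p, limit, p):
            if not marked[multiple]:    # read first ...
                marked[multiple] = 1    # ... and write only when needed
                fresh += 1
            else:
                skipped += 1            # already marked by a smaller prime
    return fresh, skipped


print(count_marks(30))  # (18, 12)
```

The skipped fraction grows as the range grows, since larger numbers tend to have more distinct prime factors.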

Also the p6 was a super-scalar machine that ran multiple iterations of this loop in parallel and could have multiple reads going to memory at the same time. With that small tweak the program got 4X faster and we went from being 1/2X the speed of a Pentium to being twice the speed. And this was at the same clock frequency! The test hardware ran at 100 MHz, we released at 200 MHz and went up from there.


You can also reduce re-marking spaces by starting your iteration in the marking process at i*i (as all multiples of i less than i*i would have been marked by a previous iteration) instead of 2*i. So instead of, e.g., marking off multiples of 3 starting at 6 (which would have already been marked off from the 2 case), you would start at 9.
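A short sketch of that variant, with a hypothetical function name. The outer loop can also stop once the square of the candidate reaches the limit, because every composite below the limit has a prime factor no larger than its square root.

```python
def sieve_from_square(limit):
    """Return the primes below `limit`, marking each prime's multiples from its square."""
    marked = bytearray(limit)
    i = 2
    while i * i < limit:
        if not marked[i]:
            # Smaller multiples of i have a smaller prime factor and are
            # already marked, so start at i*i instead of 2*i.
            for multiple in range(i * i, limit, i):
                marked[multiple] = 1
        i += 1
    return [n for n in range(2, limit) if not marked[n]]


print(sieve_from_square(50))
```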


Of course, and you only store odd numbers, and you can do all the smaller factors in a single pass. Nicely had already done some of these tweaks and I tried some others, but it didn't change the overall problem:

In the steady state you are still striding a huge array in memory and missing the cache with every access.
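The odd-only storage mentioned above can be sketched as follows. The index mapping (slot k stands for the number 2*k + 1) is one common choice and is assumed here, not taken from Nicely's code.

```python
def odd_only_sieve(limit):
    """Return the primes below `limit`; slot k of the array stands for 2*k + 1."""
    if limit <= 2:
        return []
    size = limit // 2                   # odd numbers 1, 3, 5, ... below limit
    marked = bytearray(size)
    marked[0] = 1                       # slot 0 is the number 1, not prime
    k = 1
    while (2 * k + 1) ** 2 < limit:
        if not marked[k]:
            p = 2 * k + 1
            # p*p is odd and sits at index p*p // 2. Stepping the index
            # by p moves the number by 2*p, which skips even multiples.
            for idx in range(p * p // 2, size, p):
                marked[idx] = 1
        k += 1
    return [2] + [2 * k + 1 for k in range(size) if not marked[k]]


print(odd_only_sieve(50))
```

This halves the array, but the inner loop is still a strided walk over memory, so the cache behavior described in the comment is unchanged.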


One thing I always wondered about this: why was he using the FPU? These all seem like integer operations.


Look here: http://www.mersenne.org/various/math.php at the Lucas-Lehmer test, which uses floating point FFTs to square large numbers. I am not sure if that is what he was doing originally, but I suspect it was.

We used that prime95 program from mersenne.org quite a bit in testing because it was very close to our best max-power test for processors. It would keep both the FP and integer ALUs saturated and validated all the results so if anything was wrong it would start complaining.
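For reference, the Lucas-Lehmer test itself is only a few lines. The sketch below uses Python's built-in big integers for the squaring step; prime95 does that step with floating point FFTs, which this sketch does not attempt to reproduce.

```python
def lucas_lehmer(p):
    """Return True if 2**p - 1 is prime, for a prime exponent p."""
    if p == 2:
        return True                     # 2**2 - 1 = 3; the loop needs p > 2
    m = (1 << p) - 1                    # the Mersenne number being tested
    s = 4
    for _ in range(p - 2):
        s = (s * s - 2) % m             # the squaring is the expensive step
    return s == 0


print([p for p in (3, 5, 7, 11, 13, 17, 19, 23, 29, 31) if lucas_lehmer(p)])
```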


Explained in the second question of his FAQ: http://www.trnicely.net/pentbug/pentbug.html


And then you get an optimizing compiler and the slow behavior is back.


Or you get an optimizing compiler that's aware of this difference and automatically emits a read before a blind write.


...Which makes things worse for any CPU that doesn't have this bug.


It doesn't sound like a bug, just a legitimate difference in how things are done. In any case, CPU-specific optimizations are hardly rare.


A more detailed explanation of the cause:

http://www.cs.earlham.edu/~dusko/cs63/fdiv.html

5 entries out of the 1066-entry lookup table were wrong. They probably didn't use test vectors that exercised all the entries.

But in general, testing complex ICs is hard. There are analogue effects too - if an instruction happens to make the right set of transistors switch in a certain way, going past the estimated margins, power supply fluctuations and crosstalk could flip a bit or two. Sometimes these bit-flips don't cause any problems since it happens in an unused part of the logic, but sometimes they do. As the enthusiasts who like to overclock have shown, it's easy to get something that looks like it works most of the time, but then completely crashes when executing just the right instructions.
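A widely circulated quick check for the lookup-table flaw divides 4195835 by 3145727 and multiplies back. Those operands come from general coverage of the bug, not from this thread. On a correct FPU the residual is zero or negligibly small, while flawed Pentiums were widely reported to return 256. The function name below is made up.

```python
def fdiv_residual(x=4195835.0, y=3145727.0):
    """Return x - (x / y) * y, which should be essentially zero."""
    return x - (x / y) * y


residual = fdiv_residual()
print("residual:", residual)
print("looks OK" if abs(residual) < 1e-6 else "division looks wrong")
```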


My favorite tale about testing is from AMD's effort. They hired ACL experts to verify their FPU. These experts built a translator from Verilog to ACL and then formally verified the resulting ACL code. They found several bugs that had slipped past a suite of 84 million tests.


This was done by Boyer and Moore (the same names behind the Boyer–Moore string search algorithm) at Computational Logic, Inc. The automated reasoning software that thesz is referring to here is ACL2, and not that other Austin export that goes by the name ACL.


CLI is no more, it all but stopped in 1997. http://computationallogic.com/news/index.html

While you might be referring to Austin City Limits (the television show), you might also be referring to the ACL Festival, which didn't launch until 2002.


When I was in college a couple Intel engineers came as part of a recruiting event and taught my digital design class for a day. They told us about this bug and it blew our minds that they were using a simple (but super fast) lookup table to do math (a big part of the class was all about how to reduce logic). They had a few more gates to play with than the typical chip designs that had influenced this course's curriculum. It was also a little mind blowing that they were talking about this horrible embarrassing bug with such candor with us.

I think we all applied for the jobs they were there pitching us after that :-)


In the end, despite the expensive recall, this was a big win for the Intel brand. The early processor years were tumultuous, and on that battleground the 486 clones from Cyrix and TI lost. Intel went the trademark route with the Pentium and the media embraced the incident as something that would sink them.

Although very few people would ever come across the bug, Intel allowed every processor to be exchanged. No matter if you were a gaming consumer or a giant corporation using coprocessor-heavy software.

So I remember a UPS driver coming by my student flat with an exchange processor and picking my faulty unit up a week later or so. It was incredible service that made Intel as a brand very reliable.


> the media embraced the incident as something that would sink them.

It could have given them a damn good kicking in the consumer retail market and this was new(ish) territory for Intel, so they pulled out all the stops to make sure there was no way it could be made to look like they were trying to fob off the end user.

In reality bugs were present in CPUs all the time in the past, errata were published and libraries & compilers & such were adjusted to avoid the problems with no one making a big fuss - look at the Linux source for remnants of these issues in the x86 line of processors (there were a few in 386/486 era chips, some time after FDIV came the "F00F" bug in the Pentium line, and so on). Other chip lines are similarly affected: I remember from my youth tinkering with assembly language there being a bug in some 6502s that meant indirect jumps referencing the last byte of a page would not work as expected.

But the FDIV bug came to attention around the time when CPUs were first being marketed directly at end users rather than PC makers in a big way (as the next phase of the battle you mention in the i486 era), with the man on the street suddenly being aware that they might be able to make the value choice between the alternatives. That is in part why the Pentium line got a name instead of just a code/number (80586, i586, ...): Intel found they couldn't stop people using a number directly in their product names, which would have made it harder to differentiate their products from the competition (of course the workaround for this that everyone used was to call their alternative chips "pentium class"). Even ignoring that, a name tends to be much easier for marketing to work with, but I digress... The common consumer had different expectations of how flaws were dealt with and Intel couldn't risk trying to placate people with "this has happened before, your software will be recompiled and everything will be fine, in fact you are likely not to be affected anyway, stay calm, we've got this" because the masses probably wouldn't take that, especially as Intel's competitors would capitalise on the situation in any way they were given time to, so they instead took the route already common in direct consumer markets: the face-saving recall and free replacement.


For some "fun" reading, here are the 149 public errata of Haswell CPUs: http://www.intel.com/content/dam/www/public/us/en/documents/...


> you are likely not to be affected anyway

True, but the real problem was there was no way to tell if your results were corrupted or not. This made the chip unusable if you had to rely on the results.


There is a story I heard about this, but I can't verify if it is true; though it does come from a professor of the field who could have contacts there at the time.

Internally Intel knew about the bug before release but they dismissed it as minor, despite warnings from a certain engineer. Once the bug got out, Intel's top executives had a meeting on how to avert the disaster. After the meeting where difficult decisions were made (the cost of replacement and marketing to regain the trust of the consumers was huge), this certain engineer went to the CEO and said “I told you”. The CEO answered “You are fired”.

This engineer then went to AMD, where he had a key role in the AMD K6 and Athlon lines, which gave Intel much trouble with their performance.

Maybe the story is fake, but it sounds interesting. :)


That scores pretty much a 10/10 on the urban legend scale.


> Although very few people would ever come across the bug, Intel allowed every processor to be exchanged. No matter if you were a gaming consumer or a giant corporation using coprocessor-heavy software.

Initially they only wanted to exchange CPUs if this bug was likely to significantly affect you. I had to prepare a letter describing our use of AutoCAD to get a replacement CPU ordered. As the media seized on the bug and it became more widely known they revised the policy to update everyone.


PKZIP 2.04g, now there's a version number that fires a long-dormant set of neurons.


This, people, is how you write a bug report. Very detailed, he even told them which version of pkzip he used. Very nice of him.


I grew up in Lynchburg and was in high school at this time. It was by far one of the most exciting reasons my town ever hit national headlines. :) Thanks for sharing this -- it was nice to relive the memories.


Heh, thought these comments sounded familiar

https://news.ycombinator.com/item?id=1742088


hehe -- history repeats itself! :)


Well, huh. "bc" is still that bad:

  % bc -l
  bc 1.06
  Copyright 1991-1994, 1997, 1998, 2000 Free Software Foundation, Inc.
  This is free software with ABSOLUTELY NO WARRANTY.
  For details type `warranty'.
  (824633702441.0)*(1/824633702441.0)
  .99999999224129613242
  (824633702441.0)*(1/824633702441.0)-0.999999996274709702
  -.00000000403341356958
  ((824633702441.0)*(1/824633702441.0))/0.999999996274709702
  .99999999596658641539
Python on the same machine is not:

  % python
  Python 2.7.10 (default, May 26 2015, 13:01:57)
  [GCC 4.2.1 Compatible Apple LLVM 6.1.0 (clang-602.0.53)] on darwin
  Type "help", "copyright", "credits" or "license" for more information.
  >>> (824633702441.0)*(1/824633702441.0)
  0.9999999999999999


In bc, write

    scale=1000
(or however many digits you want) first, and you'll get better results.
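The effect of `scale` can be reproduced with exact integer arithmetic. The sketch below assumes bc's behavior of truncating a quotient to `scale` decimal places (`bc -l` starts with scale=20); the function name is hypothetical. With scale=20 it yields the same digits as the first result in the transcript above.

```python
from fractions import Fraction


def truncated_roundtrip(n, scale):
    """Return n * (1/n), with 1/n truncated to `scale` decimal places."""
    reciprocal_digits = 10 ** scale // n    # 1/n as a truncated integer
    return Fraction(n * reciprocal_digits, 10 ** scale)


n = 824633702441
for scale in (20, 1000):
    shortfall = 1 - truncated_roundtrip(n, scale)
    # Truncating 1/n loses less than 10**-scale, so after multiplying
    # by n the shortfall stays below n * 10**-scale.
    print(scale, shortfall < Fraction(n, 10 ** scale))
```

So the discrepancy comes from the number of digits kept for the reciprocal, and raising `scale` shrinks it accordingly.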


"The bug has been observed on all Pentiums I have tested or had tested to date, including a Dell P90, a Gateway P90, a Micron P60, an Insight P60, and a Packard-Bell P60."

This made me laugh... we used to call them Packard-Smells.



