Hacker News new | past | comments | ask | show | jobs | submit login

> CPU technology is quite arcane, very high level, there are so many patents, IP money and a lot of secrecy involved, since CPU tech is quite a strategic one for geopolitical power. Do you work as an engineer at intel, ARM, AMD? On chip design?

If such a mechanism existing it would be documented at at least a high level and its effects observable under controlled tests. Neither are, in contrast to the power and temperature envelopes I mentioned. There is no actual evidence that aged chips operating with the same clockrate perform computation slower, your subjective experience that hardware 'slows down' does not count.

> It's not about failing, it's about error detection. Redundancy is a form of error detection. If several gates disagree on a result, they have to start again what they worked on. That's one simple form of error detection.

> CPU never really fail, they just slow down because gates generate more and more errors, requiring recalculation until they finally correct the detected error. An aged chip will just have more and more errors, that will slow it down. Which is the reason why old chip are slower, independently of software.

This is not how consumer CPUs work. It's not even how high-reliability CPUs necessarily work (some work through a high level of redundancy but they don't generally automatically retry operations when a failure happens: that's a great way of getting stuck). Such redundancy is so incredibly expensive from a power and chip area point of view that no CPU vendor would be competetive in the market with a CPU which worked like you describe. If a single gate fails in a CPU, the effects can range from unnoticable to halt-and-catch-fire.

The only error correction which is present is memory based, where errors are more common and ECC can be implemented relatively cheaply compared to error checking computations.




> If such a mechanism existing it would be documented

Why would it? It's an internal functionality, and CPU usually have a 1 year warranty or so, and I'm not sure they really have guaranteed FLOPS, only frequency I guess. If it's tightly coupled to trade secrets, I would not expect this to be documented. I also doubt that you could find everything you want to know in a CPU documentation.

> There is no actual evidence

The wikipedia article I mentioned, physics is enough evidence.

> If a single gate fails in a CPU

I did not say fail, I meant "miscalculated". There is a very low probability of it happening, but it can still happen because of the high quantity of transistors, hence error correction.

> Such redundancy is so incredibly expensive from a power and chip area point of view

Sure it is, so what? At one point all CPU need it and it becomes necessary. There are billions (I think?) of transistors on a CPU.


Documentation is light on details, but both major CPU vendors give extensive documentation on the performance attributes of their processors, such as how many cycles an instruction may take to complete, and none see fit to mention once 'may take an arbitrary amount longer as the CPU ages'. Not to mention, these performance attributes are frequently measured by reasearchers and engineers, and such an effect as instructions taking more cycles on one sample compared to another from the same batch has yet to be observed (and it's notable and noted when it does differ, e.g. from different steppings or microcode versions). At least one of the many many people who investigate this in great detail would have commented on it.

The wikipedia article you linked makes zero mention of redundant gates as a workaround for reliability issues. The only thing close is that designers must consider it, but this is design at the level of the geometry of the chip, not its logic. It doesn't even make good sense as a strategy: the extra cost of redundant logic to work around reliability issues on a smaller node will outweigh the advantages of that node.

One of the greatest things about modern CPUs is how reliably they do work given that you need such a high yield on individual transistors.


Thanks for convincing me!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: