Hacker News

It's wild to me that we gave up hardware error correction on memory at the same time we increased memory sizes about 1000x, shrinking the cells on the die (and with them the reliability margin) by a roughly similar amount.



This is true, but even today the rate of bit flips per GB per hour is still really low.

However, failures somewhere along the path memory chip -> chip pin -> DIMM -> DIMM connector -> motherboard -> CPU socket -> CPU pin -> CPU are pretty common. Sure, ECC helps with random bit flips, but it's also very useful for diagnosing that something is broken in the CPU <-> memory chip pipeline. It's very frustrating to debug something when the main sign of the problem is a reboot.



