
I remember, circa 1999, having a database server with a stuck bit in memory. The bit happened to fall within the page cache, so it subtly corrupted disk writes, which led to the database throwing checksum errors. It took an insane amount of time to even diagnose where the problem could be. We of course thought it was the disks themselves and tried many variations of disks and external RAID cards. Finally, one run of memtest86 found the real problem, and I threw away the memory and motherboard and replaced them with a board capable of ECC RAM.

I forget now why we even thought to build a server without ECC RAM, but I sure learned my lesson after that.



