
I would strongly recommend also testing for when things fail. That is the hardest part of troubleshooting and of building a reliable system.

If the disks are SAS, you can make do with READ LONG/WRITE LONG to generate medium errors. Be aware that doing this too often to the same disk may kill it.

You can also use relatively inexpensive SAS/SATA link error injectors from Quarch.

Be sure to test hot pulls of the disks to simulate a hard failure.

Simulating other soft failures is much harder. Sometimes a disk just stops responding without going offline, which makes the condition harder for the system to detect and can leave it waiting on timeouts.
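The "stops responding but doesn't go offline" case can at least be caught from the outside by putting a deadline on an ordinary read. This is a minimal Python sketch, assuming you supply `read_fn`, a hypothetical callable that reads a block from the disk under test. The function name and the three status labels are illustrative and don't come from any particular tool.

```python
import concurrent.futures
import time
from typing import Callable, Tuple


def probe_with_timeout(read_fn: Callable[[], object], timeout_s: float) -> Tuple[str, float]:
    """Run one read attempt and classify the disk as 'ok', 'error' or 'hung'.

    Returns (status, elapsed_seconds). read_fn is a caller-supplied
    (hypothetical) function that reads from the disk under test.
    """
    start = time.monotonic()
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(read_fn)
    try:
        future.result(timeout=timeout_s)
        status = "ok"
    except concurrent.futures.TimeoutError:
        # No answer and no error within the deadline: the "stuck but not offline" case.
        status = "hung"
    except OSError:
        # The disk answered, but with an error (e.g. a medium error surfacing as EIO).
        status = "error"
    finally:
        # Don't block on a stuck read; the worker thread is simply left behind.
        pool.shutdown(wait=False)
    return status, time.monotonic() - start
```

For example, `read_fn` could be `lambda: os.pread(fd, 4096, offset)` on a file descriptor opened on the block device. Reads served from the page cache can hide a stuck disk, so a real probe would need to bypass the cache or vary offsets. The worker thread is abandoned rather than killed because a read blocked in the kernel generally can't be interrupted; note that Python still waits for such threads at interpreter exit.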



I couldn't agree more. Of the time I have set aside to build out the new storage, around 75% is dedicated to testing, and about half of that goes to failure scenarios. I have a list of 'bad things to do' that I'll be executing and reporting on.
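One way to keep a list like that executable and reportable is a small table-driven harness. This is a hypothetical Python sketch: `Scenario`, `run_scenarios` and `report` are invented names, and the `inject`/`detected`/`restore` hooks are placeholders for whatever actually pulls a disk, drives a link error injector, or checks the storage stack's logs.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Scenario:
    """One 'bad thing to do'. All three hooks are hypothetical placeholders."""
    name: str
    inject: Callable[[], None]                  # cause the fault, e.g. pull a disk
    detected: Callable[[], bool]                # did the system notice and react?
    restore: Callable[[], None] = lambda: None  # put the hardware back afterwards


@dataclass
class Result:
    name: str
    passed: bool
    note: str = ""


def run_scenarios(scenarios: List[Scenario]) -> List[Result]:
    results: List[Result] = []
    for s in scenarios:
        try:
            s.inject()
            ok = s.detected()
            results.append(Result(s.name, ok, "" if ok else "fault went unnoticed"))
        except Exception as exc:
            # A harness step blowing up is recorded as a failure, not hidden.
            results.append(Result(s.name, False, f"{type(exc).__name__}: {exc}"))
        finally:
            # Always try to return the system to a known state before the next scenario.
            s.restore()
    return results


def report(results: List[Result]) -> str:
    lines = [f"{'PASS' if r.passed else 'FAIL'}  {r.name}  {r.note}".rstrip() for r in results]
    passed = sum(r.passed for r in results)
    lines.append(f"{passed}/{len(results)} scenarios passed")
    return "\n".join(lines)
```

Keeping `restore` separate from `inject` means a scenario that fails midway still attempts to put the disk back, so one bad run doesn't poison the scenarios after it.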



