
My thoughts exactly. The article is quite inflammatory and tosses out some bold statements without really digging into them. My favorite:

"Finally, the unpredictable latency of SSD-based arrays - often called all-flash arrays - is gaining mind share. The problem: if there are too many writes for an SSD to keep up with, reads have to wait for writes to complete - which can be many milliseconds. Reads taking as long as writes? That's not the performance customers think they are buying."

This is completely false in a properly designed server system. Use the deadline scheduler with SSDs so that reads aren't starved by bulk I/O operations. This is fairly common knowledge. Also, if you're throwing too much I/O load at any storage system, things are going to slow down. This should not be a surprise. SSDs are sorta magical (Artur), but they're not pure magic. They can't fix everything.
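For reference, on Linux the scheduler is a per-device setting exposed through sysfs. A minimal sketch, assuming a hypothetical device name and root privileges for the write:

    from pathlib import Path

    dev = "sda"  # hypothetical device name; substitute the SSD in question
    sched = Path(f"/sys/block/{dev}/queue/scheduler")

    # reading lists the available schedulers with the active one in brackets,
    # e.g. "noop [deadline] cfq"
    print(sched.read_text().strip())

    # writing one of the listed names switches the scheduler (needs root)
    sched.write_text("deadline")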

While Facebook started out with Fusion-io, they very quickly transitioned to their own home-designed, home-grown flash storage. I'd be wary of taking any of their facts or findings and applying them to flash storage in general. In short, these could just be Facebook problems, because they decided to go build their own.

He also talks about the "unpredictability of all-flash arrays" as if the fault is 100% due to the flash. In my experience, it's usually the RAID/proprietary controller doing something unpredictable and wonky. Sometimes the drive and controller do something dumb in concert, but it's usually the controller.

EDIT: It was 2-3 years ago that flash controller designers started to focus on uniform latency and performance rather than concentrating on peak performance. You can see this in the maturation of I/O latency graphs from the various Anandtech reviews.




There is unpredictability in SSDs; however, it's more like whether an IOP will take 1 ns or 1 ms, instead of 10 ms or 100 ms with an HD.

The variability is an order of magnitude greater, but the worst case is several orders of magnitude better. Quite simply, no one cares whether you might get 10,000 IOPS or 200,000 IOPS from an SSD when all you're going to get from a 15K drive is 500 IOPS.
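To put those IOPS figures in per-operation terms, here's the quick arithmetic at queue depth 1 (illustrative only, using the numbers above):

    # per-I/O service time implied by the IOPS figures above, at queue depth 1
    for name, iops in [("15K HDD", 500), ("slow SSD", 10_000), ("fast SSD", 200_000)]:
        print(f"{name}: {1e6 / iops:.0f} us per I/O")
    # 15K HDD: 2000 us per I/O
    # slow SSD: 100 us per I/O
    # fast SSD: 5 us per I/O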


Best case for an SSD is more like 10µs, and the worst case is still tens of milliseconds. Average-case and 90th-percentile latency are the measures where the most important improvements show up.

And the difference between a fast SSD and a slow SSD is pretty big: for the same workload a fast PCIe SSD can show an average latency of 208µs with 846µs standard deviation, while a low-end SATA drive shows average latency of 1782µs and standard deviation of 4155µs (both are recent consumer drives).
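To make the average-vs-tail point concrete, here's a small sketch of the summary stats involved, over made-up latency samples (not the drives mentioned above):

    import statistics

    # hypothetical read latencies in microseconds, with one slow outlier
    latencies_us = [95, 110, 130, 150, 180, 220, 400, 900, 2500, 12000]

    mean = statistics.mean(latencies_us)
    stdev = statistics.stdev(latencies_us)
    p90 = statistics.quantiles(latencies_us, n=10, method="inclusive")[-1]
    p99 = statistics.quantiles(latencies_us, n=100, method="inclusive")[-1]

    # the single outlier inflates both the mean and the standard deviation
    print(f"mean={mean:.0f}us  stdev={stdev:.0f}us  p90={p90:.0f}us  p99={p99:.0f}us")

A single slow outlier drags the mean and standard deviation up, which is why the percentile view matters.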


Where does one find 10µs reads? NAND typically has a tRead of 50 to 100µs, so the NAND operation alone takes more than 10µs.

tProg is around 1ms and tErase can be upwards of 2ms.

All in all, this means large variability in read performance, depending on what else is happening on the SSD and how well the SSD manages the program and erase operations in the background.

This doesn't change with the interface (SAS/SATA/PCIe) either; each one adds its own queues and link errors, and thus more variability.

Then you have differences in over-provisioning, which let high-OP drives better mask the program and erase operations.
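A toy model of the effect described above, using the rough tRead/tProg/tErase figures quoted in this thread (numbers and structure are illustrative only; real FTLs are far more involved):

    # rough NAND timings from the comment above, in microseconds
    T_READ, T_PROG, T_ERASE = 75, 1000, 2000

    def read_latency(die_busy_for_us=0):
        # a read that lands on a die already mid-program or mid-erase
        # waits for that operation to finish before the ~75us read starts
        return die_busy_for_us + T_READ

    print(read_latency())              # idle die:                  75 us
    print(read_latency(T_PROG / 2))    # halfway through a program: 575 us
    print(read_latency(T_ERASE))       # erase just started:        2075 us

More over-provisioning gives the firmware more spare blocks to play with, so it has a better chance of keeping background program/erase work away from the dies currently serving reads.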


One can see tens of ms from an SSD if you run it long enough and hard enough. It is also possible to get hundreds of ms and even seconds if the SSD is going bad.

It's true that 99% of your IOs will see service times below 1ms, but it's the other 1% that really matters if you want to avoid late-night calls or a mid-day crisis.


deadline seems less useful now that a bunch of edge cases were fixed in cfq (the graphs used to look way worse than this): http://blog.pgaddict.com/posts/postgresql-io-schedulers-cfq-...



