
I'm trying to imagine why wear leveling would depend on the chips. The number of writes you can get out of them would, but the optimal wear leveling algorithm should just spread the writes around as evenly as possible regardless of that, shouldn't it?

Likewise, some chips could require more error correction because they expect more errors. But the amount of error correction is a tunable in the algorithm; it's a space vs. resilience trade-off. Where to set the dial shouldn't be hard to derive from the manufacturer's specifications, or from empirical testing if the spec is worthless.
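
To put rough numbers on that dial, here's a hedged sketch assuming a BCH-style code, where correcting t bit errors in a codeword of up to 2^m - 1 bits costs about m*t parity bits (the codeword size and m below are illustrative, not from any particular datasheet):

    # Space vs. resilience dial for a BCH-style code: correcting t bit
    # errors in a codeword of up to 2**m - 1 bits costs roughly m * t
    # parity bits. All numbers here are illustrative.

    def ecc_overhead(data_bits, t, m=13):
        """Fraction of stored bits spent on parity for t-bit correction."""
        parity_bits = m * t                   # upper bound on BCH parity
        return parity_bits / (data_bits + parity_bits)

    if __name__ == "__main__":
        data_bits = 512 * 8                   # one 512-byte codeword
        for t in (4, 8, 16, 40):
            print(f"t={t:2d}  overhead={ecc_overhead(data_bits, t):.1%}")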




Wear leveling: just search Google Scholar for "wear leveling" to get an idea of how deep that rabbit hole goes. At minimum, think about the difference between placing an infrequently written logical sector vs a frequently written one, and what the optimal strategy would be given physical blocks with varying write life left on them. Then how do you partition between regions you treat as SLC vs MLC? And on and on and on.
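
To make the placement question concrete, here's a toy sketch of the core idea (not any vendor's actual FTL): route hot data to the least-worn free block and cold data to the most-worn one, so remaining life evens out over time.

    # Toy hot/cold placement -- not a real FTL, just the basic heuristic:
    # frequently rewritten (hot) data goes to the least-worn free block,
    # rarely rewritten (cold) data gets parked on the most-worn one.

    def pick_block(free_blocks, erase_counts, is_hot):
        """free_blocks: list of block ids; erase_counts: block id -> erases."""
        if is_hot:
            return min(free_blocks, key=lambda b: erase_counts[b])
        return max(free_blocks, key=lambda b: erase_counts[b])

    if __name__ == "__main__":
        erase_counts = {0: 120, 1: 950, 2: 40, 3: 600}
        free = list(erase_counts)
        print("hot  ->", pick_block(free, erase_counts, is_hot=True))   # block 2
        print("cold ->", pick_block(free, erase_counts, is_hot=False))  # block 1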

Error correction: not just the amount. What block size? Interleaving? How do you trade overhead due to more ECC against overhead due to more spare blocks (you have to decide how many to allocate up front)? What is the behavior vs temperature, and what are you running the thing at?
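
As a back-of-the-envelope illustration of that ECC-vs-spares trade (the overhead budget and parity sizes are made up, not from any real device):

    # Splitting a fixed overhead budget between ECC parity and spare
    # blocks: more parity per codeword leaves less raw capacity to
    # retire bad blocks into. Numbers are made up.

    TOTAL_OVERHEAD = 0.12          # assume 12% of raw capacity is not user data

    def spare_fraction(parity_bytes_per_kib):
        """Roughly what's left for spare blocks after paying for parity."""
        return TOTAL_OVERHEAD - parity_bytes_per_kib / 1024

    if __name__ == "__main__":
        for parity in (40, 70, 100):   # bytes of parity per 1 KiB of data
            print(f"{parity:3d} B/KiB parity -> "
                  f"{spare_fraction(parity):.1%} of raw capacity left for spares")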

There has been an ongoing push to standardize NAND interfaces, but the controller still needs to know a fair amount about the specific chips it's talking to. I don't know how it will resolve, but hopefully the integration becomes more like DRAM, where things are a bit more consistent. Then again, if you care a lot about reliability you qualify specific DRAM modules too...


> At minimum think about the difference between placing an infrequently written logical sector vs a frequently written one, and what the optimal strategy would be given physical blocks with varying write life left on them.

That doesn't have a simple answer, but why would it depend on whose chips you have?

> What is the behavior vs temperature and what are you running the thing at?

This is where you might start seeing differences between manufacturers. But this is also an optimization. If you don't have this information, the chip should still meet a minimum spec and you can use a conservative value. If you do know it, you can do something more efficient. And if this becomes popular, the chip makers will start publishing what's necessary to do the optimization.
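
For example (hypothetical numbers, just to show the conservative-default idea applied to a retention-driven scrub interval):

    # Conservative default vs. optimized tuning for a data-retention-driven
    # scrub interval. All retention figures are invented, not from a datasheet.

    WORST_CASE_RETENTION_DAYS = 90          # minimum spec, any temperature

    # Characterization data, if the vendor publishes it: temperature (C) -> days
    CHARACTERIZED_RETENTION = {25: 365, 40: 240, 55: 150, 70: 90}

    def scrub_interval_days(temp_c=None, margin=0.5):
        """Scrub at half the expected retention; fall back to the worst case."""
        if temp_c is None:
            return WORST_CASE_RETENTION_DAYS * margin
        # Use the characterized value for the nearest temperature at or above ours.
        hotter = [t for t in sorted(CHARACTERIZED_RETENTION) if t >= temp_c]
        key = hotter[0] if hotter else max(CHARACTERIZED_RETENTION)
        return CHARACTERIZED_RETENTION[key] * margin

    if __name__ == "__main__":
        print("no data:", scrub_interval_days())      # 45.0 days
        print("at 30 C:", scrub_interval_days(30))    # 120.0 days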


> That's not a simple answer but why would it depend on whose chips you have?

How do you estimate how much life is left on any given part of the physical flash?

This is one of those areas where neglecting the optimizations qualitatively changes the result. It's like Twitter without scalability: sure, you can build it in a weekend, but what problem does that solve?

Don't get me wrong, I would like to see all of this stuff get better standardized, documented, and open sourced, but for the most part this is a cost-sensitive commodity, and if you want to roll your own SSD at qty 1 from bare chips it's going to be orders of magnitude more expensive than letting SanDisk or whoever do it. Maybe it will be possible to get the hyperscalers and cloud operators to push for the kind of standardization you describe; it would benefit them.


> How do you estimate how much life is left on any given part of the physical flash?

In proportion to how many times it's been rewritten already, or by checking the error rate for what's already written there.
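
i.e. something like this rough sketch (the rated endurance and per-codeword ECC budget below are placeholders; the real values would come from the spec or from testing):

    # Two rough remaining-life estimates. The rated endurance and ECC
    # budget are placeholders; real values come from the spec or testing.

    RATED_ERASE_CYCLES = 3000        # assumed program/erase endurance rating
    ECC_CORRECTABLE_BITS = 40        # assumed correctable bits per codeword

    def life_left_from_cycles(erase_count):
        """Fraction of rated program/erase cycles remaining."""
        return max(0.0, 1.0 - erase_count / RATED_ERASE_CYCLES)

    def life_left_from_errors(raw_bit_errors_per_codeword):
        """How much of the ECC correction budget is still unused."""
        return max(0.0, 1.0 - raw_bit_errors_per_codeword / ECC_CORRECTABLE_BITS)

    if __name__ == "__main__":
        print(f"after 1800 cycles: {life_left_from_cycles(1800):.0%} left")
        print(f"at 12 raw errors/codeword: {life_left_from_errors(12):.0%} of margin left")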

> This is one of those areas where neglecting the optimizations qualitatively changes the result.

It mostly changes how much error correction you have to use. But the nature of it also allows other optimizations.

Suppose you're going to use a multi-drive array. Now you could stripe the error correction across devices and use the same system to recover from device failures, which is more resilient at the same level of total error correction.
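
A minimal XOR-parity sketch in that spirit (RAID-5-like, one device's worth of parity; chunks assumed equal size): the same parity that guards a stripe can rebuild a whole failed device.

    # XOR parity striped across devices: the same redundancy that guards
    # against bad sectors can rebuild an entire failed device.

    def make_parity(chunks):
        """chunks: equal-length byte strings, one per device in the stripe."""
        parity = bytearray(len(chunks[0]))
        for chunk in chunks:
            for i, b in enumerate(chunk):
                parity[i] ^= b
        return bytes(parity)

    def rebuild(surviving, parity):
        """Recover the missing chunk from the surviving chunks plus parity."""
        return make_parity(surviving + [parity])

    if __name__ == "__main__":
        d0, d1, d2 = b"disk", b"data", b"test"   # one stripe across 3 devices
        p = make_parity([d0, d1, d2])
        assert rebuild([d0, d2], p) == d1        # device 1 lost, recovered
        print("rebuilt:", rebuild([d0, d2], p))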



