
Consumer SSDs don't have much room to offer a different abstraction from emulating the semantics of hard drives and older technology. But in the enterprise SSD space, there's a lot of experimentation with exactly this kind of thing. One of the most popular right now is zoned namespaces, which separates write and erase operations but otherwise still abstracts away most of the esoteric details that will vary between products and chip generations. That makes it a usable model for both flash and SMR hard drives. It doesn't completely preclude dishonest caching, but removes some of the incentive for it.
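
If you want to poke at this model from userspace, Linux already exposes zoned devices (ZNS SSDs and host-managed SMR drives) through the blkzoned ioctls. A minimal sketch, assuming a hypothetical zoned namespace at /dev/nvme0n2:

  /* List the first few zones of a zoned block device via
     BLKREPORTZONE. Offsets are in 512-byte sectors; the write
     pointer (wp) is what makes the append-only model explicit. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <stdlib.h>
  #include <sys/ioctl.h>
  #include <linux/blkzoned.h>

  int main(void)
  {
      int fd = open("/dev/nvme0n2", O_RDONLY);  /* hypothetical ZNS namespace */
      if (fd < 0) { perror("open"); return 1; }

      unsigned int nr = 4;
      struct blk_zone_report *rep =
          calloc(1, sizeof(*rep) + nr * sizeof(struct blk_zone));
      rep->sector = 0;       /* start reporting from the first zone */
      rep->nr_zones = nr;

      if (ioctl(fd, BLKREPORTZONE, rep) < 0) { perror("report"); return 1; }

      for (unsigned int i = 0; i < rep->nr_zones; i++) {
          struct blk_zone *z = &rep->zones[i];
          /* Writes land at z->wp; a zone reset (the erase-like
             operation, BLKRESETZONE) rewinds it to z->start. */
          printf("zone %u: start=%llu len=%llu wp=%llu cond=%u\n", i,
                 (unsigned long long)z->start, (unsigned long long)z->len,
                 (unsigned long long)z->wp, z->cond);
      }
      free(rep);
      return 0;
  }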



Check out https://www.snia.org/ if you want to track development in this area.


There is no strong reason why a consumer SSD can't allow reformatting into a smaller conventional namespace plus a separate zoned namespace. Zone-aware CoW file systems could then efficiently combine FS-level compaction/space reclamation with NAND-level rewrites/wear-leveling.

I'd probably pay for "unlocking" ZNS on my Samsung 980 Pro, if just to reduce the write amplification.


Enabling those features on the drive side is little more than changing some #ifdef statements in the firmware, since the same controllers are used in high-end consumer drives and low-power data center drives. But that doesn't begin to address the changes necessary to make those features actually usable to a non-trivial number of customers, such as anyone running Windows.


Isn't this a chicken and egg problem? Why would OS vendors spend time implementing this on their side if the drives don't support it?

The difference here being that it's not clear to me there's much cost on the drive side to actually allow this, aside maybe from the desire to segment the market.

To me, this looks like the whole sector size situation. OSs, including regular Windows, have supported 4K drives for quite a while now. I bought a Samsung 980 (non-pro) the other day that still pretends to have 512B sectors. The OEM drive in my laptop (some kind of Samsung) can be formatted with a 4k namespace, but the default is also 512B. The 980 doesn't even support this.
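
You can see the 512e emulation from userspace with the standard Linux block ioctls, by the way. A quick sketch (the device path is just an example):

  /* Ask the kernel what sector sizes a drive advertises. A 512e
     drive reports 512 logical / 4096 physical. */
  #include <fcntl.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>

  int main(void)
  {
      int fd = open("/dev/nvme0n1", O_RDONLY);
      if (fd < 0) { perror("open"); return 1; }

      int logical = 0;
      unsigned int physical = 0;
      ioctl(fd, BLKSSZGET, &logical);    /* sector size LBAs address */
      ioctl(fd, BLKPBSZGET, &physical);  /* sector size the media uses */

      printf("logical=%d physical=%u%s\n", logical, physical,
             (logical == 512 && physical == 4096) ? "  (512e)" : "");
      return 0;
  }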


It's not quite a chicken and egg problem. Features like ZNS come into existence in the first place because they are desired by the hyperscale cloud providers who control their entire stack and are willing to sacrifice compatibility for small efficiency gains that matter at large scale.

The problem for the rest of the market is that the man-hours to rewrite your software stack for a different storage interface that allows e.g. a 2% reduction in write amplification aren't worthwhile if you only have a fleet of a few thousand drives to worry about. There's minimal trickle-down because the benefits are small and the compatibility costs are non-zero.

Even simple stuff like switching to shipping drives with a 4kB LBA size by default has very little performance impact (since drives are tracking things with 4kB granularity either way) and would be problematic for customers that want to apply a 512B disk image. The downsides of switching are small enough that they could easily be tolerated for the sake of a significant improvement, but for most of the market the potential improvement is truly insignificant. (And of course, fancy features are a market segmentation tool for drive vendors.)


> Why would OS vendors spend time implementing this on their side if the drives don't support it?

In the case of Microsoft, forcing the adoption of a de facto standard they create (and refusing to support competing ones OOTB) is immensely beneficial in terms of licensing fees.


> Consumer SSDs don't have much room to offer a different abstraction from emulating the semantics of hard drives and older technology.

From what I understand, the abstraction works a lot like virtual memory. The drive shows up as a virtual address space pretending to be a disk drive, and the drive's firmware maps virtual addresses to physical ones.

That doesn't seem at all incompatible with exposing the mappings to the OS through newer APIs so the OS can inspect or change the mappings instead of having the firmware do it.
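
As a purely illustrative toy (not any vendor's actual firmware), the mapping layer described above looks something like a page table:

  /* Toy page-level FTL: a table mapping logical block addresses to
     physical NAND pages. Overwrites never touch flash in place;
     they remap to a fresh page and leave the old one stale. */
  #include <stdint.h>
  #include <string.h>

  #define NUM_LBAS (1u << 20)

  static uint32_t l2p[NUM_LBAS];   /* logical -> physical page map */
  static uint32_t next_free_page;  /* naive bump allocator over NAND pages */

  void ftl_init(void)
  {
      memset(l2p, 0xFF, sizeof(l2p));  /* all LBAs start unmapped */
  }

  uint32_t ftl_write(uint32_t lba)
  {
      uint32_t old = l2p[lba];       /* previous physical page, if any */
      l2p[lba] = next_free_page++;   /* program a fresh page instead */
      /* If 'old' was mapped it is now stale; garbage collection must
         copy out its live neighbors and erase the whole block before
         any of those pages can be reused. */
      (void)old;
      return l2p[lba];
  }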


The current standard block storage abstraction presented by SSDs is a logical address space of either 512-byte or 4kB blocks (but pretty much always 4kB behind the scenes). Allocation is implicit upon writing to a block, and deallocation is explicit but optional. This model is indeed a good match for how virtual memory is handled, especially on systems with 4kB pages; there are already NVMe commands analogous to e.g. madvise().
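
The explicit-deallocation half of that model is already reachable from userspace; a sketch (device and range are placeholders), which the kernel turns into an NVMe Dataset Management / Deallocate command:

  /* Explicitly deallocate a byte range on a block device: the
     storage analogue of madvise(MADV_DONTNEED). */
  #include <fcntl.h>
  #include <stdint.h>
  #include <stdio.h>
  #include <sys/ioctl.h>
  #include <linux/fs.h>

  int main(void)
  {
      int fd = open("/dev/nvme0n1", O_WRONLY);
      if (fd < 0) { perror("open"); return 1; }

      /* {offset, length} in bytes; afterwards the SSD may treat the
         range as implicitly zero and reclaim the flash behind it. */
      uint64_t range[2] = { 0, 1ull << 20 };
      if (ioctl(fd, BLKDISCARD, range) < 0) { perror("discard"); return 1; }
      return 0;
  }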

The problem is that it's not a good match for how flash memory actually works, especially with regard to the extreme size disparity between a NAND page (the unit of writes) and a NAND erase block (the unit of erases). Giving the OS an interface to query which blocks the SSD considers live/allocated rather than deallocated and implicitly zero doesn't seem all that useful. Giving the OS an interface to manipulate the SSD's logical-to-physical mappings (while retaining the rest of the abstraction's features) would be rather impractical, as both the SSD and the host system would have to care about implementation details like wear leveling.

Going beyond the current HDD-like abstraction augmented with optional hints, to an abstraction that is actually more efficient and a better match for the fundamental characteristics of NAND flash memory, requires moving away from a RAM/VM-like model and toward something that imposes extra constraints the host software must obey (e.g. append-only zones). Those constraints are what breaks compatibility with existing software.
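
Concretely, the discipline a zoned model imposes on the host looks something like this sketch (device, offsets, and zone size all hypothetical): writes must land sequentially at the zone's write pointer, and space comes back only a whole zone at a time.

  #define _GNU_SOURCE  /* for O_DIRECT */
  #include <fcntl.h>
  #include <stdio.h>
  #include <string.h>
  #include <unistd.h>
  #include <sys/ioctl.h>
  #include <linux/blkzoned.h>

  int main(void)
  {
      int fd = open("/dev/nvme0n2", O_WRONLY | O_DIRECT);
      if (fd < 0) { perror("open"); return 1; }

      /* Byte offsets; the current write pointer would be learned via
         BLKREPORTZONE (which reports in 512-byte sectors). */
      off_t wp = 0, zone_start = 0, zone_bytes = 1 << 20;

      char buf[4096] __attribute__((aligned(4096)));
      memset(buf, 0xAB, sizeof(buf));

      /* Legal: append exactly at the write pointer. A write anywhere
         else in the zone is rejected by the device. */
      if (pwrite(fd, buf, sizeof(buf), wp) < 0) perror("pwrite");

      /* Reclaim: rewind the whole zone in one go (the erase-block-
         sized unit), rather than freeing individual LBAs. */
      struct blk_zone_range zr = { .sector = zone_start / 512,
                                   .nr_sectors = zone_bytes / 512 };
      if (ioctl(fd, BLKRESETZONE, &zr) < 0) perror("reset");
      return 0;
  }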


If anything, consumer-level SSDs are moving in the opposite direction. On the Samsung 980 Pro it is not even possible to change the sector size from 512 bytes to 4K.



