
> Maybe we could do all the "smart" stuff in the OS and application code, and just attach commodity dumb flash chips to our computer.

Yeah, this is how things are supposed to be done, and the fact that it's not happening is a huge problem. Hardware makers isolate our operating systems in exactly the same way operating systems isolate our processes. The operating system is not really in charge of anything; the hardware just gives it an illusory sandboxed machine to play around in, a machine that doesn't even reflect what the hardware truly looks like. The real computers are all hidden and programmed by proprietary firmware.

https://youtu.be/36myc8wQhLo




Flash storage is extremely complex at the low level. The very fact that we're talking about microcontroller flash as if it's even in the same ballpark as NVMe SSDs in terms of complexity or storage management says a lot on its own about how well people here understand the subject (including me).

I haven't done research on flash design in almost a decade, back when I worked on backup software, and my conclusion back then was basically this: you're better off buying a reliable drive that meets your reliability/performance requirements, tweaking your application to match the underlying drive's operational behavior (coalesce writes, append as much as you can, take care with multithreading on HDDs vs. SSDs, et cetera), and testing the hell out of that with a blessed software stack. So we also did extensive tests on which host filesystems, kernel versions, etc. seemed "valid" or "good". It wasn't easy.
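
To make the "coalesce writes, append as much as you can" advice concrete, here's a minimal sketch of the kind of application-side tweak I mean; the class name and flush threshold are made up for illustration, not taken from any particular product:

    import os

    class LogWriter:
        """Coalesce small writes into larger, sequential appends (illustrative sketch)."""

        def __init__(self, path, flush_threshold=1 << 20):
            # O_APPEND keeps every write at the end of the file, so the drive
            # sees fewer, larger, sequential writes instead of many tiny ones.
            self.fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
            self.buf = bytearray()
            self.flush_threshold = flush_threshold

        def write(self, record: bytes):
            self.buf += record
            if len(self.buf) >= self.flush_threshold:
                self.flush()

        def flush(self):
            if self.buf:
                os.write(self.fd, bytes(self.buf))  # one large write instead of many small ones
                self.buf.clear()
            os.fsync(self.fd)  # ask the kernel (and the drive) to make it durable

        def close(self):
            self.flush()
            os.close(self.fd)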

The amount of complexity needed to manage error correction and wear leveling on these devices alone, including auxiliary constraints, probably rivals the entire Linux I/O stack. And it's all extremely vendor-specific. An auxiliary case, e.g. the OP's case of handling power loss and flushing correctly, is vastly easier when you only have to consider some controller firmware and some capacitors on the drive, versus guaranteeing the whole OS can handle any state the drive might be in at the time of failure, with adequate backup power, for any vendor and any device class. You'll inevitably conclude the drive is the better place to do this job precisely because it eliminates a massive number of variables like these.
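
For a sense of scale: even the host-software half of "handle power loss and flush correctly" for a single small file is the classic write-temp/fsync/rename/fsync-the-directory dance sketched below (a standard POSIX pattern, nothing drive-specific). Everything the drive firmware does with caches, capacitors, and FTL state sits underneath this:

    import os

    def atomic_write(path: str, data: bytes) -> None:
        """Crash-safe replacement of a small file (standard POSIX pattern)."""
        tmp = path + ".tmp"
        fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            os.write(fd, data)
            os.fsync(fd)              # 1. flush the file's data to the device
        finally:
            os.close(fd)
        os.rename(tmp, path)          # 2. atomically swap the new file in
        dfd = os.open(os.path.dirname(path) or ".", os.O_RDONLY)
        try:
            os.fsync(dfd)             # 3. persist the directory entry (the rename)
        finally:
            os.close(dfd)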

"Oh, but what about error correction and all that? Wouldn't that be better handled by the OS?" I don't know. What do you think "error correction" means for a flash drive? Every PHY on your computer for almost every moderately high-speed interface has a built in error correction layer to account for introduced channel noise, in theory no different than "error correction" on SSDs in the large, but nobody here is like, "damn, I need every number on the USB PHY controller on my mobo so that I can handle the error correction myself in the host software", because that would be insane for most of the same reasons and nearly impossible to handle for every class of device. Many "Errors" are transients that are expected in normal operation, actually, aside from the extra fact you couldn't do ECC on the host CPU for most high speed interfaces. Good luck doing ECC across 8x NVMe drives when that has to go over the bus to the CPU to get anything done...

You think you want this job. You do not want this job. And we all believe we could handle this job because all the complexity is hidden well enough, and oiled by enough blood, sweat, and tears, to meet most reasonable use cases.


Apple’s SSDs are like that in some systems, and they’ve gotten flak for it.


No, they look like any normal flash drive, actually. Traditionally, for any drive you can buy at the store, the storage controller sits on the NVMe drive itself, mounted on the PCB next to the flash chips, and the controller handles all the "meaty stuff"; it's what the OS talks to. The reason for this is obvious: you can plug the drive into an arbitrary computer, and the controller abstracts away the differences between vendors, so any NVMe drive works anywhere. The key takeaway is that the storage controller sits "between" the two.

Apple still has a flash storage controller that exists entirely separately from the host CPU and the host software stack, just like all existing flash drives do today. The difference? The controller doesn't sit on a literal, physical drive next to the flash chips, because there is no drive; they just solder the flash directly onto the board instead of mounting an M.2 module. Again, there's no variability here, so it can all be "hard coded." The storage controller instead sits next to the CPU, in the "T2 security chip", which also handles things like in-line encryption on the path from the host to the flash (something normally handled by host software before the data is put on the bus). It also does some other stuff.

So it's not really a matter of "architecture". The architecture of all T2 Macs that feature this design is, at a high level, very close to that of any existing flash drive. It's just that Apple is able to put the NVMe controller in a different spot, and their "NVMe controller" actually does more than that; it doesn't have to sit on a separate PCB next to the flash chips at all, because it's not a general third-party drive. It just has to exist "on the path" between the flash chips and the host CPU.



