>The thing that first got to me was what really made me see the MP3 + ID3 file i...

rasz · on June 7, 2022

Updating a 32-bit field rewrites 2-5MB on a flash memory device, genius specification!

DaiPlusPlus · on June 8, 2022

...does that mean ye-olde OLTP SQL leads to accelerated storage aging more than other use-cases? So doing an UPDATE to set a single `int`-column in a single row is far more expensive (in terms of wear-and-tear) than we assumed?

...could a DoS attack on any SSD-based cloud database/storage provider be done simply by doing lots of small writes?

------------

If this really is true, it makes me wonder why Intel's 3D XPoint (y'know: Optane SSD) lost its steam - it's byte-addressable (and so I assume it's also byte-writeable... right?), so surely would be perfect for OLTP scenarios like this?

At the same time, I'm not aware of any actual modern RDBMS that reads and writes anything less than a 4KB page at a time, regardless of underlying storage media, simply because 4096 bytes is such a common allocation-unit everywhere you look: Linux, Windows, HDDs, SSDs, NTFS, extfs, even btrfs defaults to 4KB for file extents.

Can anyone with experience with the internals of "serious" RDBMS (i.e. MySQL, PostgreSQL, Sybase, MSSQL, Oracle, etc) explain exactly what happens at a low-level on-disk or on-SSD (or on-SAN?) when you run a SQL UPDATE to overwrite a single `int` column in a single row?

jval43 · on June 8, 2022

Usually the whole row is copied, updated, and then appended to a different file on disk. After some time a job runs that removes the unused rows by deleting them for good, thus rewriting files and compacting the database. This has lots of advantages: on a small scale, resilience against power outages (journaling), and on a large scale you can e.g. sync a second DB / backup by just streaming appends.

Simplifying a lot here of course. E.g. if your DB uses MVCC (multiversion concurrency control), old values cannot be removed until the last transaction is done seeing them.
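To make the append-then-compact idea concrete, here is a toy sketch in Python. It is not how any particular RDBMS stores rows: the JSON-lines log format and the names `append_version`, `read_latest`, `compact` and `count_lines` are all made up for illustration, and it ignores MVCC entirely (a real engine could not drop an old version while a transaction can still see it).

```python
import json
import os
import tempfile

def append_version(log_path, row_id, row):
    """An 'UPDATE' appends a new version of the row; nothing is overwritten."""
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps({"id": row_id, "row": row}) + "\n")

def read_latest(log_path):
    """Replay the log; the last version written for each id wins."""
    latest = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            rec = json.loads(line)
            latest[rec["id"]] = rec["row"]
    return latest

def count_lines(log_path):
    with open(log_path, encoding="utf-8") as f:
        return sum(1 for _ in f)

def compact(log_path):
    """Rewrite the log keeping only the newest version of each row."""
    latest = read_latest(log_path)
    tmp_path = log_path + ".compact"
    with open(tmp_path, "w", encoding="utf-8") as f:
        for row_id, row in latest.items():
            f.write(json.dumps({"id": row_id, "row": row}) + "\n")
    os.replace(tmp_path, log_path)  # swap the compacted file into place

with tempfile.TemporaryDirectory() as d:
    log = os.path.join(d, "rows.log")
    append_version(log, 1, {"plays": 1})
    append_version(log, 1, {"plays": 2})  # update: old version stays in the log
    append_version(log, 2, {"plays": 7})
    before = read_latest(log)
    lines_before = count_lines(log)
    compact(log)
    lines_after = count_lines(log)
    after = read_latest(log)
```

The visible data is the same before and after compaction; only the dead version of row 1 is gone from the file.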

Of course if you write your own storage engine you can do whatever you want. I wrote one for MariaDB a few years ago and you basically just have to implement the minimally required interfaces for it to work.

DBs traditionally have these technical details figured out pretty well, and the line between what is part of the DB and what is done by the OS can be blurry at times.

layer8 · on June 7, 2022

Doesn’t flash memory support rewriting single blocks?

iggldiggl · on June 8, 2022

At least for SSDs the description I read was something like: yes, you can write a single block if you only need to turn 0s into 1s, but no for the reverse, which requires erasing a whole group (forgot the precise term for that) of blocks.

rasz · on June 8, 2022

In NOR flash erase actually sets all bits to 1. Afaik same thing for NAND flash, but in case of SSD it's abstracted away, especially if the drive implements compression.
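A tiny model of that behaviour, as a sketch only: erase sets every bit to 1, and programming can only clear bits, so the stored value ends up as old AND new. The 16-byte "block" and the function names are invented for the example; real erase blocks are far larger and an SSD controller hides all of this.

```python
ERASED = 0xFF  # an erased flash byte reads back as all ones

def erase_block(block):
    """Erase works on the whole block and sets every bit to 1."""
    for i in range(len(block)):
        block[i] = ERASED

def program_byte(block, offset, value):
    """Programming can only flip 1 -> 0, so the result is old AND new."""
    block[offset] &= value

block = bytearray(16)
erase_block(block)

program_byte(block, 0, 0x0F)   # 0xFF & 0x0F leaves 0x0F
first = block[0]
program_byte(block, 0, 0xF0)   # 0x0F & 0xF0 leaves 0x00, not 0xF0
second = block[0]

erase_block(block)             # only an erase brings the 1 bits back
program_byte(block, 0, 0xF0)
third = block[0]
```

Getting `0xF0` into a byte that already holds `0x0F` needs the erase in between, which is why a small overwrite can turn into a much larger operation.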

iggldiggl · on June 8, 2022

Thanks for the correction, I was only going by some half-remembered memory.

layer8 · on June 8, 2022

I actually meant blocks on the file system level, meaning that you don’t have to rewrite the whole MP3 file once the play count field is there. Of course that might still mean that more has to be rewritten on the flash storage level, but that would be independent from the MP3 file size.

rasz · on June 8, 2022

Modifying files in place seems to be very rare outside of OS. Do you know of any everyday programs doing it other than actual disk editors? Not even Hex Editors get it right.

HxD will modify a file in place one character at a time, that is, one individual WriteFile call per byte. Depending on write caching policy, modifying 100KB of a file has a chance of generating 100K*(cluster size) total written bytes to SSD.
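For a sense of scale, the worst case described above works out as below. The 4096-byte cluster size and reading "100KB" as 100 * 1024 bytes are assumptions, and so is the premise that no write caching coalesces the single-byte writes.

```python
def worst_case_bytes_written(edited_bytes, cluster_size):
    """Every single-byte write costs one full cluster in the worst case."""
    return edited_bytes * cluster_size

EDITED_BYTES = 100 * 1024   # "100KB" of edits, one write call per byte
CLUSTER_SIZE = 4096         # assumed cluster size in bytes

total_bytes = worst_case_bytes_written(EDITED_BYTES, CLUSTER_SIZE)
total_mib = total_bytes / 2**20
print(f"{total_bytes} bytes = {total_mib:.0f} MiB")  # 419430400 bytes = 400 MiB
```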

Frhed DGAF and replaces whole file in place from offset 0 no matter the size of a file, offset of edit or size of edits.

The safe way nowadays is write new file, rename, delete old one. Best case scenario you get appends.
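A minimal Python sketch of the write-new-then-rename pattern. `replace_file_safely` is a made-up helper name. `os.replace` renames over the destination, so the "delete old one" step happens as part of the rename rather than separately.

```python
import os
import tempfile

def replace_file_safely(path, data):
    """Write data to a temp file in the same directory, then rename it over path."""
    dir_name = os.path.dirname(os.path.abspath(path))
    # Same directory keeps the rename on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=dir_name)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the new contents to disk before renaming
        os.replace(tmp_path, path)  # readers see either the old or the new file
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)     # do not leave a stray temp file behind
        raise
```

Note that this rewrites the whole file on every change, so for a one-field update it is the expensive option in terms of bytes written, even if it is the safe one.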

layer8 · on June 8, 2022

> Modifying files in place seems to be very rare outside of OS.

It’s difficult to tell how common it is, but it is trivial by either using seek() and write() or by mmap()-ing a file. I’ve done the former for the purpose of simulating virtual memory to work with multi-gigabyte data structures in 32-bit memory space. Dropbox does it when syncing changed blocks within a file. Databases usually do it.
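Both approaches, sketched in Python for a 4-byte counter at a known offset. The file layout here (a 4-byte header followed by a little-endian counter) is invented for the example and is not the ID3 frame layout; the function names are hypothetical too.

```python
import mmap
import os
import struct
import tempfile

def bump_counter_seek(path, offset):
    """Read, increment and overwrite 4 bytes in place using seek() and write()."""
    with open(path, "r+b") as f:
        f.seek(offset)
        (count,) = struct.unpack("<I", f.read(4))
        count = (count + 1) & 0xFFFFFFFF   # wrap instead of overflowing
        f.seek(offset)
        f.write(struct.pack("<I", count))
    return count

def bump_counter_mmap(path, offset):
    """Same update through a memory mapping of the file."""
    with open(path, "r+b") as f:
        with mmap.mmap(f.fileno(), 0) as m:
            (count,) = struct.unpack_from("<I", m, offset)
            count = (count + 1) & 0xFFFFFFFF
            struct.pack_into("<I", m, offset, count)
            m.flush()
    return count

with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "track.bin")
    with open(path, "wb") as f:
        f.write(b"HEAD" + struct.pack("<I", 41) + b"x" * 1000)
    size_before = os.path.getsize(path)
    after_seek = bump_counter_seek(path, 4)
    after_mmap = bump_counter_mmap(path, 4)
    size_after = os.path.getsize(path)
    with open(path, "rb") as f:
        final = f.read()
```

Either way the program only touches 4 bytes and the file size does not change; how much the filesystem and the drive end up rewriting underneath is a separate question.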

philjohn · on June 7, 2022

Especially on an SSD

LanternLight83 · on June 7, 2022

I don't know that the number of times a track is started would have any perceptible impact on an SSD, especially given the small size of the write (much smaller than, e.g., a game save). I do think it would be cute to have an old rip of my favorite ambient YouTube mix that has accumulated hundreds of plays, even if that count isn't synced between devices (it'd accumulate them on whichever device I actually played it off of), and that that sort of passive, local scrobbling is worth at least as much enjoyment as a small game save.

philjohn · on June 8, 2022

I suppose it depends - although it's pretty much the worst case for write amplification, 1-4 bytes leading to a 4k block erase and reprogram.
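Put as a ratio, taking the 4k figure from the comment at face value (actual erase block sizes vary by device, so treat the constant as an assumption):

```python
def write_amplification(logical_bytes, physical_bytes):
    """Bytes the device reprograms per byte the application asked to write."""
    return physical_bytes / logical_bytes

BLOCK_BYTES = 4096  # assumption: the "4k block" from the comment

amp_4_bytes = write_amplification(4, BLOCK_BYTES)   # 1024.0
amp_1_byte = write_amplification(1, BLOCK_BYTES)    # 4096.0
```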

codeflo · on June 8, 2022

True, but if you’re just listening and not for some reason updating your whole library, we’re talking about one block write every three minutes or so. That’s negligible compared to everything else the system is doing.
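A rough upper bound on that, assuming music plays around the clock, every track is three minutes long, and each play costs exactly one 4096-byte block write (all three numbers are assumptions for the estimate):

```python
TRACK_MINUTES = 3      # assumed track length
BLOCK_BYTES = 4096     # assumed cost of one play-count update

writes_per_day = (24 * 60) // TRACK_MINUTES      # 480 plays in 24 hours
bytes_per_day = writes_per_day * BLOCK_BYTES     # 1,966,080 bytes
mib_per_day = bytes_per_day / 2**20              # 1.875 MiB
```

Under those assumptions, non-stop listening comes to under 2 MiB of block writes per day.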