Hacker News

>The thing that first got to me was what really made me see the MP3 + ID3 file in a different light. Play counter (PCNT). This mighty little frame contains a number and it is intended to be the number of times the file has been played. According to spec it should be incremented when it begins playing. This means that the file changes as people “consume” the media in it.

Thanks, I hate it.
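For reference, this is roughly what bumping the counter involves. The following is a minimal Python sketch that assumes an ID3v2.3 tag (v2.4 encodes frame sizes as synchsafe integers instead). `increment_play_counter` and `pcnt_frame` are hypothetical helpers, not part of any ID3 library. Per the v2.3 spec, the counter is at least 32 bits long and gains a leading byte when it reaches all ones:

```python
def increment_play_counter(body: bytes) -> bytes:
    """Return a new PCNT frame body with the play counter incremented by one."""
    width = max(len(body), 4)  # the counter must be at least 32 bits long
    value = int.from_bytes(body, "big") + 1
    # ID3v2.3: when the counter reaches all ones, one byte is inserted in front.
    if value == (1 << (8 * width)) - 1:
        width += 1
    return value.to_bytes(width, "big")


def pcnt_frame(body: bytes) -> bytes:
    """Wrap a counter body in an ID3v2.3 frame header (assumed v2.3, not v2.4)."""
    # Header: 4-byte frame ID, 4-byte big-endian size (excluding header), 2 flag bytes.
    return b"PCNT" + len(body).to_bytes(4, "big") + b"\x00\x00" + body
```

Because the counter can grow, the frame (and therefore the tag) can change size. Unless the tag has enough padding, a player then has to rewrite everything after it, including the audio data.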




Updating a 32-bit counter rewrites 2-5MB on flash memory devices. Genius specification!


...does that mean ye olde OLTP SQL leads to accelerated storage aging more than other use cases? So doing an UPDATE to set a single `int` column in a single row is far more expensive (in terms of wear and tear) than we assumed?

...could a DoS attack on any SSD-based cloud database/storage provider be done simply by doing lots of small writes?

------------

If this really is true, it makes me wonder why Intel's 3D XPoint (y'know: Optane SSD) lost its steam - it's byte-addressable (and so I assume it's also byte-writeable... right?), so surely it would be perfect for OLTP scenarios like this?

At the same time, I'm not aware of any actual modern RDBMS that reads and writes anything less than a 4KB page at a time, regardless of the underlying storage media, simply because 4096 bytes is such a common allocation unit everywhere you look: Linux, Windows, HDDs, SSDs, NTFS, extfs, even btrfs defaults to 4KB for file extents.

Can anyone with experience of the internals of "serious" RDBMSs (e.g. MySQL, PostgreSQL, Sybase, MSSQL, Oracle) explain exactly what happens at a low level, on disk or on SSD (or on a SAN?), when you run a SQL UPDATE to overwrite a single `int` column in a single row?


Usually the whole row is copied, updated, and then appended to a different file on disk. After some time, a job runs that removes the unused rows by deleting them for good, rewriting files and compacting the database. This has lots of advantages: at a small scale, resilience against power outages (journaling); at a large scale, you can e.g. sync a second DB or a backup just by streaming the appends.
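The following toy, in-memory sketch shows the copy-append-compact cycle described above. `AppendOnlyTable` is hypothetical and stands in for what a real engine does with files, a WAL, and background jobs:

```python
from typing import Dict, List, Optional, Tuple


class AppendOnlyTable:
    """Every update appends a full copy of the row; compaction drops stale copies."""

    def __init__(self) -> None:
        self.log: List[Tuple[int, str]] = []  # (row_id, row_payload) in write order

    def update(self, row_id: int, payload: str) -> None:
        self.log.append((row_id, payload))  # never overwrite in place

    def get(self, row_id: int) -> Optional[str]:
        for rid, payload in reversed(self.log):  # the newest copy wins
            if rid == row_id:
                return payload
        return None

    def compact(self) -> None:
        latest: Dict[int, str] = {}
        for rid, payload in self.log:
            latest[rid] = payload  # later entries replace earlier ones
        self.log = list(latest.items())
```

Since `log` only ever grows between compactions, shipping its new entries to a replica is enough to keep that replica in sync.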

Simplifying a lot here, of course. E.g. if your DB uses MVCC (multiversion concurrency control), old values cannot be removed until the last transaction that can see them is done.
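Here is a simplified sketch of that rule. It assumes every version is committed and that a snapshot sees only versions written by transactions with a lower ID. `Version` and `reclaimable` are hypothetical names, and real MVCC implementations track far more state:

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Version:
    created_txid: int  # transaction that wrote this version (assumed committed)
    value: int


def reclaimable(versions: List[Version], oldest_active_snapshot: int) -> List[Version]:
    """Versions that no running transaction can still see (simplified model)."""
    ordered = sorted(versions, key=lambda v: v.created_txid)
    visible = [v for v in ordered if v.created_txid < oldest_active_snapshot]
    if not visible:
        return []  # the oldest reader predates every version: keep them all
    # The newest version the oldest reader can see must stay; anything older can go.
    keep_from = visible[-1]
    return [v for v in ordered if v.created_txid < keep_from.created_txid]
```

A long-running transaction keeps `oldest_active_snapshot` low, which is exactly what stops the cleanup job from reclaiming space.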

Of course if you write your own storage engine you can do whatever you want. I wrote one for MariaDB a few years ago and you basically just have to implement the minimally required interfaces for it to work.

DBs traditionally have these technical details figured out pretty well, and the line between what is part of the DB and what is done by the OS can be blurry at times.


Doesn’t flash memory support rewriting single blocks?


At least for SSDs, the description I read was something like: yes, you can write a single block if you only need to turn 0s into 1s, but not the reverse, which requires erasing a whole group (forgot the precise term for that) of blocks.


In NOR flash, erase actually sets all bits to 1. AFAIK the same goes for NAND flash, but in the case of an SSD it's abstracted away, especially if the drive implements compression.
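Here is a simplified model of those bit semantics. It is NOR-style and ignores the SSD's flash translation layer, which normally remaps a write to an already-erased page instead of erasing in place. The function names are hypothetical:

```python
def erase(page_size: int) -> bytearray:
    """Erasing sets every bit to 1 (all 0xFF bytes)."""
    return bytearray(b"\xff" * page_size)


def can_program_in_place(old: bytes, new: bytes) -> bool:
    """Programming can only clear bits (1 -> 0), so every 1 in `new` must already be 1 in `old`."""
    return len(old) == len(new) and all((o & n) == n for o, n in zip(old, new))


def program(cell: bytearray, data: bytes) -> None:
    """Program `data` into `cell`, refusing writes that would need to set a bit."""
    if not can_program_in_place(cell, data):
        raise ValueError("this write would set a bit; the cell needs an erase first")
    for i, b in enumerate(data):
        cell[i] &= b  # programming can only clear bits
```

For example, incrementing a counter byte from `0x01` to `0x02` has to set a bit, so `can_program_in_place(b"\x01", b"\x02")` is `False` and the cell needs an erase first.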


Thanks for the correction, I was only going by some half-remembered memory.


I actually meant blocks at the file system level, meaning that you don’t have to rewrite the whole MP3 file once the play count field is there. Of course, that might still mean that more has to be rewritten at the flash storage level, but that would be independent of the MP3 file size.


Modifying files in place seems to be very rare outside of the OS. Do you know of any everyday programs that do it, other than actual disk editors? Not even hex editors get it right.

HxD will modify a file in place one character at a time, that is, with an individual WriteFile call per byte. Depending on the write caching policy, modifying 100KB of a file can generate 100K × (cluster size) total bytes written to the SSD.
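Here is some back-of-the-envelope arithmetic for that worst case. It assumes a 4096-byte cluster and that every 1-byte WriteFile ends up flushed as its own full cluster. Both functions are hypothetical helpers:

```python
def per_byte_writes(edited_bytes: int, cluster_size: int) -> int:
    """Worst case: each 1-byte write is flushed as a full cluster."""
    return edited_bytes * cluster_size


def batched_write(offset: int, edited_bytes: int, cluster_size: int) -> int:
    """One contiguous write: only clusters overlapping the edited range are rewritten."""
    first = offset // cluster_size
    last = (offset + edited_bytes - 1) // cluster_size
    return (last - first + 1) * cluster_size
```

For a 100,000-byte edit starting at offset 0, `per_byte_writes(100_000, 4096)` gives 409,600,000 bytes, while `batched_write(0, 100_000, 4096)` gives 102,400 bytes (25 clusters).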

Frhed DGAF and rewrites the whole file in place from offset 0, no matter the size of the file, the offset of the edit, or the size of the edits.

The safe way nowadays is to write a new file, rename it, and delete the old one. In the best-case scenario, you get appends.
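The following is a minimal Python sketch of the write-new-then-rename approach. `atomic_write` is a hypothetical helper. It uses `os.replace`, which renames over the old file in one step, so the separate delete is folded into the rename:

```python
import os
import tempfile


def atomic_write(path: str, data: bytes) -> None:
    """Write `data` to a temp file next to `path`, then rename it over `path`."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # make sure the bytes hit storage before the rename
        os.replace(tmp_path, path)  # readers see either the old file or the new one
    except BaseException:
        # Clean up the temp file if anything failed before the rename completed.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

For full durability on POSIX systems, you would also fsync the containing directory after the rename.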


> Modifying files in place seems to be very rare outside of OS.

It’s difficult to tell how common it is, but it is trivial to do, either with seek() and write() or by mmap()-ing a file. I’ve done the former to simulate virtual memory so I could work with multi-gigabyte data structures in a 32-bit address space. Dropbox does it when syncing changed blocks within a file. Databases usually do it.


Especially on an SSD


I doubt the number of times a track is started would have any perceptible impact on an SSD, especially given the small size of the write (much smaller than, e.g., a game save). I do think it would be cute to have an old rip of my favorite ambient YouTube mix that has accumulated hundreds of plays, even if that count isn't synced between devices (it would accumulate plays on whichever device I actually played it from). That sort of passive, local scrobbling is worth at least as much enjoyment as a small game save.


I suppose it depends, although it's pretty much the worst case for write amplification: 1-4 bytes leading to a 4K block erase and reprogram.


True, but if you’re just listening and not for some reason updating your whole library, we’re talking about one block write every three minutes or so. That’s negligible compared to everything else the system is doing.
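The following puts numbers on the last two comments, under the same simplified assumptions: a 4-byte counter, a 4096-byte block per update, one track every three minutes, and listening around the clock. Real NAND erase blocks are typically much larger than a 4 KiB page, and the controller remaps writes, so treat this as an illustration only. Both functions are hypothetical helpers:

```python
def write_amplification(logical_bytes: int, physical_bytes: int) -> float:
    """Physical bytes written per logical byte changed."""
    return physical_bytes / logical_bytes


def daily_bytes_written(minutes_per_track: float, block_size: int, hours: float = 24.0) -> float:
    """Bytes written per day if every track start costs one block write."""
    tracks_started = hours * 60 / minutes_per_track
    return tracks_started * block_size
```

`write_amplification(4, 4096)` is 1024.0, and `daily_bytes_written(3, 4096)` is 1,966,080 bytes (480 tracks), a bit under 2 MiB a day.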



