
The SQLite format itself is not very simple, because at its heart it is a database file format. By using SQLite you are unknowingly constraining your use case; for example, you can indeed stream BLOBs, but you can't randomly access them, because the SQLite format stores a large BLOB across pages in a linked list, at least when I last checked. BLOBs are also limited in size anyway (4GB AFAIK), so streaming itself might not be that useful. Using SQLite also means that you have to bring SQLite into your code base, and SQLite is not very small if you are just using it as a container.

> My archiver could even keep up with 7z in some cases (for size and access speed).

7z might feel slow because it enables solid compression by default, which trades decompression speed for compression ratio. I can't imagine 7z having a similar compression ratio with the right options though; was your input incompressible?



Yes, the limits are important to keep in mind; I should have given that context up front.

In my case it happened to work out because it was a CDC-based deduplicating format that compressed batches of chunks. That leaves a lot of flexibility for working within the limits.
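(To spell out the jargon: CDC = content-defined chunking. The sketch below is not my actual code, just the general idea with made-up size parameters; chunk boundaries come from a rolling hash over the content, so the same data tends to produce the same chunks even after insertions, which is what makes dedup work.)

    /* Rough sketch of content-defined chunking (not my real implementation):
     * a Gear-style rolling hash decides where chunks end, so identical data
     * tends to produce identical chunk boundaries. */
    #include <stddef.h>
    #include <stdint.h>

    #define CDC_MIN  (16 * 1024)        /* never cut before this many bytes  */
    #define CDC_MAX  (256 * 1024)       /* always cut after this many bytes  */
    #define CDC_MASK ((1u << 16) - 1)   /* ~64 KiB average chunk size        */

    static uint32_t gear[256];

    /* Any fixed pseudo-random constants work, as long as writer and reader
     * agree; a simple xorshift32 fills the table deterministically. */
    static void cdc_init(void)
    {
        uint32_t x = 0x9E3779B9u;
        for (int i = 0; i < 256; i++) {
            x ^= x << 13; x ^= x >> 17; x ^= x << 5;
            gear[i] = x;
        }
    }

    /* Length of the next chunk starting at data[0]. */
    static size_t cdc_next_chunk(const uint8_t *data, size_t len)
    {
        uint32_t h = 0;
        size_t max = len < CDC_MAX ? len : CDC_MAX;

        for (size_t i = 0; i < max; i++) {
            h = (h << 1) + gear[data[i]];
            if (i >= CDC_MIN && (h & CDC_MASK) == 0)
                return i + 1;            /* content-defined boundary */
        }
        return max;                      /* end of input or forced cut */
    }

Each unique chunk is stored at most once, and chunks are compressed in batches, so individual BLOBs stay well under the limits.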

The primary goal here was also making the reader as simple as possible whilst still having decent performance.

I think my workload is very unfair towards (typical) compressing archivers: small incremental additions, a need for random access, and indeed frequently incompressible files, at least when viewed in isolation.

I really only brought up 7z because it is good at what it does; it is just (ironically) too flexible for what was needed. There is probably some way of getting it to perform much better here.

zpack is probably a better comparison in terms of functionality, but I didn't want to assume familiarity with that one. (Also, I can't really keep up with it; my solution is not tweaked to that level, even ignoring the SQLite overhead.)


BLOBs support random access - the handles aren't stateful. https://www.sqlite.org/c3ref/blob_read.html

You're right that their size is limited, though, and it's actually worse than you even thought (1 GB).
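For reference, a minimal sketch of the incremental BLOB I/O API; the database name, table, column, and rowid here are made up for illustration:

    /* Minimal sketch of SQLite incremental BLOB I/O: read a 4 KiB slice at
     * a 1 MiB offset without loading the whole blob.  The file, table,
     * column, and rowid are made up for illustration. */
    #include <stdio.h>
    #include <sqlite3.h>

    int main(void)
    {
        sqlite3 *db;
        sqlite3_blob *blob;
        static char buf[4096];

        if (sqlite3_open("archive.db", &db) != SQLITE_OK)
            return 1;

        /* flags = 0 opens a read-only handle on main.files.data, rowid 42. */
        if (sqlite3_blob_open(db, "main", "files", "data", 42, 0, &blob) != SQLITE_OK) {
            fprintf(stderr, "open: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return 1;
        }

        /* Random access: N bytes at an arbitrary byte offset, passed per call,
         * which is what "not stateful" means here. */
        if (sqlite3_blob_read(blob, buf, sizeof buf, 1 << 20) != SQLITE_OK)
            fprintf(stderr, "read: %s\n", sqlite3_errmsg(db));

        sqlite3_blob_close(blob);
        sqlite3_close(db);
        return 0;
    }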


My statement wasn't precise enough; you are correct that a random-access API is provided. But it ultimately goes through the `accessPayload` function in btree.c, whose comment notes that:

    ** The content being read or written might appear on the main page
    ** or be scattered out on multiple overflow pages.
In other words, the API may read from multiple scattered pages without the caller knowing. That said, I see this can be considered random-accessible enough, as the underlying file system would use similarly structured indices behind the scenes anyway... (But modern file systems do allocate pages consecutively for performance.)



