
The SQLite format itself is not very simple, because at its heart it is a database file format. By using SQLite you are unknowingly constraining your use case; for example, you can indeed stream BLOBs, but you can't randomly access them, because the SQLite format stores a large BLOB across pages in a linked list, at least when I last checked. BLOBs are also limited in size anyway (4GB AFAIK), so streaming itself might not be that useful. Using SQLite also means that you have to bring SQLite into your code base, and SQLite is not very small if you are just using it as a container.

> My archiver could even keep up with 7z in some cases (for size and access speed).

7z might feel slow because it enables solid compression by default, which trades decompression speed for compression ratio. I can't imagine 7z having a similar compression ratio with the right options though; was your input incompressible?



Yes, the limits are important to keep in mind; I should have given that context up front.

In my case it happened to work out because it was a CDC-based deduplicating format that compressed batches of chunks. That leaves a lot of flexibility for working within the limits.
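(To spell out the jargon: CDC = content-defined chunking. The sketch below is not my actual code, just the general idea with made-up size parameters; chunk boundaries come from a rolling hash over the content, so the same data tends to produce the same chunks even after insertions, which is what makes dedup work.)

    /* Rough sketch of content-defined chunking (not my real implementation):
     * a Gear-style rolling hash decides where chunks end, so identical data
     * tends to produce identical chunk boundaries. */
    #include <stddef.h>
    #include <stdint.h>

    #define CDC_MIN  (16 * 1024)        /* never cut before this many bytes  */
    #define CDC_MAX  (256 * 1024)       /* always cut after this many bytes  */
    #define CDC_MASK ((1u << 16) - 1)   /* ~64 KiB average chunk size        */

    static uint32_t gear[256];

    /* Any fixed pseudo-random constants work, as long as writer and reader
     * agree; a simple xorshift32 fills the table deterministically. */
    static void cdc_init(void)
    {
        uint32_t x = 0x9E3779B9u;
        for (int i = 0; i < 256; i++) {
            x ^= x << 13; x ^= x >> 17; x ^= x << 5;
            gear[i] = x;
        }
    }

    /* Length of the next chunk starting at data[0]. */
    static size_t cdc_next_chunk(const uint8_t *data, size_t len)
    {
        uint32_t h = 0;
        size_t max = len < CDC_MAX ? len : CDC_MAX;

        for (size_t i = 0; i < max; i++) {
            h = (h << 1) + gear[data[i]];
            if (i >= CDC_MIN && (h & CDC_MASK) == 0)
                return i + 1;            /* content-defined boundary */
        }
        return max;                      /* end of input or forced cut */
    }

Each unique chunk is stored at most once, and chunks are compressed in batches, so individual BLOBs stay well under the limits.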

The primary goal here was also making the reader as simple as possible whilst still having decent performance.

I think my workload is very unfair towards (typical) compressing archivers: small incremental additions, a need for random access, and indeed frequently incompressible files, at least when viewed in isolation.

I really only brought up 7z because it is good at what it does; it is just (ironically) too flexible for what was needed. There is probably some way of getting it to perform much better here.

zpack is probably a better comparison in terms of functionality, but I didn't want to assume familiarity with that one. (Also, I can't really keep up with it; my solution is not tweaked to that level, even ignoring the SQLite overhead.)


BLOBs support random access - the handles aren't stateful. https://www.sqlite.org/c3ref/blob_read.html

You're right that their size is limited, though, and it's actually worse than you even thought (1 GB).
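For reference, a minimal sketch of the incremental BLOB I/O API; the database name, table, column, and rowid here are made up for illustration:

    /* Minimal sketch of SQLite incremental BLOB I/O: read a 4 KiB slice at
     * a 1 MiB offset without loading the whole blob.  The file, table,
     * column, and rowid are made up for illustration. */
    #include <stdio.h>
    #include <sqlite3.h>

    int main(void)
    {
        sqlite3 *db;
        sqlite3_blob *blob;
        static char buf[4096];

        if (sqlite3_open("archive.db", &db) != SQLITE_OK)
            return 1;

        /* flags = 0 opens a read-only handle on main.files.data, rowid 42. */
        if (sqlite3_blob_open(db, "main", "files", "data", 42, 0, &blob) != SQLITE_OK) {
            fprintf(stderr, "open: %s\n", sqlite3_errmsg(db));
            sqlite3_close(db);
            return 1;
        }

        /* Random access: N bytes at an arbitrary byte offset, passed per call,
         * which is what "not stateful" means here. */
        if (sqlite3_blob_read(blob, buf, sizeof buf, 1 << 20) != SQLITE_OK)
            fprintf(stderr, "read: %s\n", sqlite3_errmsg(db));

        sqlite3_blob_close(blob);
        sqlite3_close(db);
        return 0;
    }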


My statement wasn't precise enough; you are correct that a random-access API is provided. But it ultimately goes through the `accessPayload` function in btree.c, whose comment notes that:

    ** The content being read or written might appear on the main page
    ** or be scattered out on multiple overflow pages.
In other words, the API may read from multiple scattered pages without the caller knowing. That said, I see this can be considered random-accessible enough, as the underlying file system would use similarly structured indices behind the scenes anyway... (But modern file systems do allocate pages consecutively for performance.)



