Hacker News new | past | comments | ask | show | jobs | submit login

SQLite is great but it's way overhyped and abused on HN. People are very eager to turn SQLite into a durable, distributed database and it's really not meant for that, and by going down that road instead of using something like MySQL or Postgres you're missing out on lots of important functionality and tooling.

I only say this because I have made this mistake at my previous startup. We built these really cool distributed databases on top of a similar storage engine (RocksDB) plus Kafka, but it ended up being more trouble than it was worth. We should have just used a battle-tested relational database instead.

Using SQLite for these applications is really fun, and it seems like a good idea on paper. But in practice I just don't think it's worth it. YMMV though.




So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

Also databases all use the same fundamental primitives and it's up to you to choose the level of abstraction you need. For example, FoundationDB is a durable distributed database that uses SQLite underneath as the storage layer but exposes an ordered key/value API, but then allows you to build your own relational DB on top.

If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices like CockroachDB/Yugabyte/TiDB/Memsql/etc that can serve the purpose instead of building your own.


It's actually quite similar. Both are embedded storage engines that are designed for a single node.

Actually, the case for RocksDB for backing a distributed data store is probably much stronger than SQLite given that it supports multiple concurrent writers.

SQLite lacks many important characteristics that one would expect a distributed data store to have. Row level locking is one obvious feature that's super important in a highly concurrent context (as mentioned, RocksDB has this). Want to backup your production DB? You're going to need to block all writes until the backup completes.

Additionally, features like profiling and replication are nonexistent or immature with SQLite. Rqlite and Litestream are super new relative to tools like Postgres and MySQL and you can't find a lot of people that know how to run them.

Also, you can't autoscale your app since your processes are now highly stateful. Sure, this is a problem with MySQL/Postgres too, but I can pay AWS or Google Cloud for a managed version that will abstract this problem away from me.

Most of these problems are solvable with enough net new software on top of SQLite. But... why? I think the only reason you'd subject yourself to such an architecture is because you want to learn (great!) or you're gunning for that next promotion and need to show off your system design skills :P


> So you didn't use SQLite then? Because RocksDB + Kafka is not similar at all.

To me, I could make the connection in the sense that just like sqlite, rocksdb is an embedded store, while Kafka can be used to build a replicated log (log device).

> If you just needed distributed SQL because a single instance wasn't enough then there are already plenty of choices...

Well, that was GP's point, too? In addition, they mention that existing DBMS like Postgres have way more breath and depth than a replicated sqlite can ever hope to have (which isn't really a controversial assertion at all, tbh).


I accept that you learned a lot about the limits of combining RockDB with Kafka, especially in the exact way you combined them.

This might have limited utility if the goal were to combine RocksDB with something else. And even less for SQLite and something else.

The big push of interest in SQLite serverside isn't driven by people who have never set up pgbounce, but rather by developers who have both read the SQLite docs very carefully and have used the library extensively, and know what it's good for.


I'm not sure why you concluded that SQLite is the problem when you built a "really cool distributed database" with Kafka. Distributed databases are complicated, Kafka's complicated.

If you're saying that a replicated Postgres setup would be simpler than what you're built, I agree; but SQLite+Litestream probably would be too.


Litestream is too much work if you're not using S3: replication over sftp. Even fossil has nicer no nonsense replication done over http/s. It's way easier to set up mysql with replication than manage unix accounts and public keys.


Is this any good? https://github.com/rqlite/rqlite

I've been looking for a turn key solution that is better than me running a single node Postgres instance "bare metal" or in a container.

postgres-operator seems cool but... k8s, pretty heavy I guess.


It’s the default storage engine for FoundationDB - not sure many would agree that isn’t a “durable, distributed database”.


For one thing, they're ripping it out because of its poor write parallelism https://youtu.be/nlus1Z7TVTI?t=271

But that's orthogonal to my point. As a user of FoundationDB, you're not programming directly against SQLite, so you aren't going to run into these issues as much since FoundationDB exposes different semantics and coordinates concurrency across many SQLite instances in parallel.

I think it's best to think of SQLite as a replacement for your filesystem, rather than a replacement for your relational DBMS.




Consider applying for YC's Spring batch! Applications are open till Feb 11.

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: