Each user's data is naturally partitioned at the atproto repository level, so this is the sweet spot for per-user SQLite databases. It would make total sense for a PDS instance to have just a single user on it, and in fact that is likely for many self-hosters. It's also worth noting that the PDS software already had SQLite support, which made this change somewhat easier.
There are legitimate trade-offs to this kind of a system, but it comes out way ahead in this case, and it's not as wild as it may seem to those unfamiliar with the power of SQLite.
A major consideration is that we're planning to run at least 100+ instances, which would require operating 100+ high availability (primary+replica) Postgres clusters. This would be a huge amount of operational and financial overhead.
We chose this route because it is better for us as a small team, with relatively limited resources. But it also has the property of being much easier for self-hosters, of which we hope there will be many.
> Using SQLite is most certainly not "building their own filebased database-system"
[..] Each user has their own SQLite file [..]
[..] We also introduce 3 separate SQLite databases for managing service state [..]
This doesn't use SQLite for the database management, but for the individual "document". The database management itself is handled in the application server. You shuffle files around and poke into whichever one matches; that is exactly a classical file-based database-management system.
> It would make total sense for a PDS instance to have a single user, and in fact that is likely for many self-hosters.
Sure, if it's just a low-user instance, performance is not much of a concern. But from my impression here, this is also the code Bluesky uses for everything else, from small instances up to massively high-user ones. And then I want to see how RAM holds up when you have 10k+ user databases open at the same time on one instance.
> There are trade-offs to this kind of a system but it comes out way ahead in this case.
Which is why I want to see some actual numbers and solid explanations going more into detail than the gossip in the comments here.
> A major consideration is that we're planning to run at least 100+ instances, which would require operating 100+ high availability (primary+replica) Postgres clusters.
Are those independent instances, or just 100+ instance servers from the same company in different locations? Either way, I don't see how this can replace a whole Postgres cluster without removing significant functionality. I mean, SQLite does not have good replication on its own AFAIK, so since you seem to still use replication, you're just replacing it with another solution? Which also means you remove the same options for everyone else and force them to use your solution. I don't see how this will be beneficial for self-hosters.
SQLite is just about as mature and well-tested as it gets in the entire world of software: https://www.sqlite.org/testing.html