Each user's data is naturally partitioned at the atproto repository level, so this is the sweet spot for per-user SQLite databases. It would make total sense for a PDS instance to have just a single user on it, and in fact that is likely for many self-hosters. It's also worth noting that the PDS software already had SQLite support, which made this change somewhat easier.
There are legitimate trade-offs to this kind of a system, but it comes out way ahead in this case, and it's not as wild as it may seem to those unfamiliar with the power of SQLite.
A major consideration is that we're planning to run at least 100+ instances, which would require operating 100+ high availability (primary+replica) Postgres clusters. This would be a huge amount of operational and financial overhead.
We chose this route because it is better for us as a small team, with relatively limited resources. But it also has the property of being much easier for self-hosters, of which we hope there will be many.
> Using SQLite is most certainly not "building their own filebased database-system"
[..] Each user has their own SQLite file [..]
[..] We also introduce 3 separate SQLite databases for managing service state [..]
This doesn't use SQLite for the database management, but for the individual "document". The database management itself is handled in the application server. You shuffle files around and poke into whichever one matches; that is exactly a classical file-based database-management system.
> It would make total sense for a PDS instance to have a single user, and in fact that is likely for many self-hosters.
Sure, if it's just a low-user instance, performance is not much of a concern. But from my impression here, this is also the code Bluesky uses for everything else, from small instances up to massively high-user ones. And then I want to see how RAM holds up when you have 10k+ user databases open at the same time on one instance.
> There are trade-offs to this kind of a system but it comes out way ahead in this case.
Which is why I want to see some actual numbers and solid explanations going more into detail than the gossip in the comments here.
> A major consideration is that we're planning to run at least 100+ instances, which would require operating 100+ high availability (primary+replica) Postgres clusters.
Are those independent instances, or just 100+ instance servers from the same company in different locations? Either way, I don't see how this can replace a whole Postgres cluster without removing significant functionality. I mean, SQLite does not have good replication on its own AFAIK, so since you seem to still use replication, you're just replacing it with another solution? Which also means you remove the same options for everyone else and force them to use your solution. I don't see how this will be beneficial for self-hosters.
SQLite is just about as mature and well-tested as it gets in the entire world of software: https://www.sqlite.org/testing.html