It uses SQLite files (all of Datasette is built around SQLite at the moment), which are stored on Fly Volumes and backed up to S3 using Litestream - Datasette Cloud provides a dedicated Fly Machines Firecracker container for each team account.
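For anyone curious what that replication setup looks like, here's a rough sketch of the Litestream side - the database path, bucket name and credentials are placeholders, not the real Datasette Cloud configuration:

    # Continuously replicate a SQLite file from the Fly Volume to S3
    # (LITESTREAM_ACCESS_KEY_ID / LITESTREAM_SECRET_ACCESS_KEY need to be set)
    litestream replicate /data/team.db s3://example-backups/team.db

    # On a fresh machine, restore the latest copy from S3 before starting Datasette
    litestream restore -o /data/team.db s3://example-backups/team.db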
The initial goal was to provide a private collaboration space, where scaling isn't as much of a challenge - at least until you get companies with thousands of employees all using it at once, though even then I would expect SQLite to be able to keep up.
I've since realized that the "publishing" aspect of Datasette is crucially important to support. For that I have a few approaches I'm exploring:
1. Published data sits behind a Varnish cache, which should then handle huge spikes of traffic as long as it's to the same set of URLs.
2. Datasette already has a great scalability story for read-only data: you publish to something like Cloud Run or Vercel, which can spin up new copies of the data on demand to handle increased traffic. So I could let Datasette Cloud users say "publish this subset of data once every X minutes" and use that (there's a sketch of that publish command just after this list).
3. Fly are working on LiteFS (https://fly.io/docs/litefs/), which is a perfect match for Datasette Cloud - it would let me run read replicas of SQLite databases in multiple regions around the world.
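To illustrate option 2: today's publishing flow is a one-liner, so the scheduled version would essentially just run something like this on a timer - the database, service name and title here are made up for the example:

    # Publish a read-only copy of a SQLite database to Google Cloud Run;
    # Cloud Run scales stateless copies of it up and down with traffic
    datasette publish cloudrun data.db \
      --service=my-published-data \
      --title="Published subset"

(Vercel works the same way via the datasette-publish-vercel plugin.)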
Part of Datasette/Datasette Cloud development is sponsored by Fly at the moment, in return for which we'll be publishing detailed notes on what we learn about building and scaling on their platform.
In terms of scaling volume storage itself... the technical size limit for SQLite is 280TB, but I'm not planning on getting anywhere near that! I expect the sweet spot for Datasette Cloud will be more around the 100MB to 100GB range, probably mostly <10GB.