The inability to reuse database connections would be a huge performance hit.
In a traditional webapp backend, you have a pool of connections to the database. User01 hits your service, and grabs a connection off the pool. User02 does the same, and so on. These connections get put back in the pool for reuse once a user is done with them.
In your design, every time a user hits your service, a new connection, specific to that user, will have to be made. This will incur network traffic and the overhead of logging in to the DBMS.
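To make that concrete, here's a minimal sketch of what per-tenant connection handling tends to look like, assuming a DB-API driver like psycopg2 and a hypothetical tenant_dsn() helper. A "cold" tenant pays the full connect + login cost on the request path, which is exactly the cost a shared pool exists to avoid:

    import psycopg2

    _connections = {}  # tenant_id -> open connection; grows with active tenants

    def tenant_dsn(tenant_id):
        # Hypothetical helper: in this design every tenant has its own database and login.
        return f"dbname=tenant_{tenant_id} user=tenant_{tenant_id} host=db.internal"

    def get_connection(tenant_id):
        conn = _connections.get(tenant_id)
        if conn is None:
            # Cold tenant: pay TCP + auth on the request path.
            conn = psycopg2.connect(tenant_dsn(tenant_id))
            _connections[tenant_id] = conn
        return conn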
If you're thinking about using something like SQLite, you will hit a hard wall when the OS isn't able to open any more file descriptors, as well.
Like you said, DB administration will be a huge pain in the ass. Rather than having Flyway or Liquibase or whatever run a migration on one database, you'll have to run it on thousands of databases. There will be significant downtime when your databases are not in a consistent state with one another. There will also be bloat from the migration tool's record keeping, which will be duplicated in every user's database rather than kept once in a single shared database.
A lot of the tools a database gives you for free will also need to be implemented in application logic instead. For example, you might want to run a query that says "show me every user using more than 1GB of storage," but under your design, you'll have to log into every user's database individually, determine the storage used, and add it to an in-memory list.
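For instance, the "over 1 GB" report under a one-SQLite-file-per-user layout ends up as application code roughly like this (the directory layout and threshold are assumptions):

    import glob
    import os

    GIGABYTE = 1024 ** 3

    def users_over_quota(db_dir="/var/data/user_dbs", limit=GIGABYTE):
        # What would be one GROUP BY / HAVING query against a shared database
        # becomes a walk over every user's file plus an in-memory list.
        heavy = []
        for path in glob.glob(os.path.join(db_dir, "user_*.sqlite3")):
            size = os.path.getsize(path)
            if size > limit:
                heavy.append((path, size))
        return heavy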
If you ever want to allow users to collaborate, you will end up replicating owner_id-style metadata anyway, and the entire benefit of this design will evaporate.
Most frameworks are not set up to handle this style of database access, either. I don't use Rails, but Spring Boot would fight you every step of the way if you tried to do this.
> In your design, every time a user hits your service, a new connection, specific to that user, will have to be made. This will incur network traffic and the overhead of logging in to the DBMS.
If connecting to your DB is significantly increasing your page times, you've got seriously fast pages. Even back when I was working with a MySQL database regularly in 2010, connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).
> If you're thinking about using something like SQLite, you will hit a hard wall when the OS isn't able to open any more file descriptors, as well.
You could always keep open only the most recently used 1 million databases. It's pretty easy to tune the FD limit; FreeBSD lets you set it to one FD per 16 KB of RAM without modifying the kernel, but it's a challenge to be so memory- and CPU-efficient that that becomes the limit.
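A rough sketch of that "keep only the most recently used databases open" idea, with an LRU of SQLite handles so the process stays under the FD limit (the cache size and path scheme are made up):

    import sqlite3
    from collections import OrderedDict

    MAX_OPEN = 1_000_000  # tune against the OS file-descriptor limit

    _open_dbs = OrderedDict()  # tenant_id -> sqlite3.Connection, in LRU order

    def get_db(tenant_id):
        conn = _open_dbs.get(tenant_id)
        if conn is not None:
            _open_dbs.move_to_end(tenant_id)           # mark as most recently used
            return conn
        if len(_open_dbs) >= MAX_OPEN:
            _, oldest = _open_dbs.popitem(last=False)  # evict least recently used
            oldest.close()                             # frees its file descriptor
        conn = sqlite3.connect(f"/var/data/user_dbs/user_{tenant_id}.sqlite3")
        _open_dbs[tenant_id] = conn
        return conn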
All that said, it really depends on the application. If this is consumer facing, and each account is an end user, one database per user is probably excessive overhead; one database (or sharded, if you've got the users) or one (flatish) file per user makes a lot more sense.
If it's business facing, then one database per account could make sense. You would have isolation between customers and could put big customers on dedicated boxes and let the customer drive upgrades and migrations, etc. Just please please please consider that corporations merge and divide all the time; don't be like G Suite and not offer a way to merge and divide accounts to reflect their corporate ownership.
Ok, you and I can be friends. A lot of people are using 'lightweight' frameworks where hello world is 30 ms, and then they call slow services and run slow queries, etc.
If your target is 10 ms, then you probably should worry about db connection time.
>connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far)
Is that with creating a new connection in the pool?
We work with microservices, so we have a "db gateway": any request to the DB goes through it, and it routes to the correct DB server for that tenant. Our latency for an already "hot" connection is about 40-50 ms on average, but I believe the lowest number I got (for a query by primary key on a fairly small table) was no less than 20-30 ms.
Opening a new connection added at least a couple of tens of milliseconds on top of that.
Yeah, that's a totally new connection (no pool), time from memory (could be off).
On my home system with a database I happen to be running anyway, I see:
$ time mysql -u mythtv -h 192.168.0.12 mythconverg \
-e 'select * from credits where person = 147628 limit 1' > /dev/null
real 0m0.023s
Server is running a Celeron(R) CPU 1007U @ 1.50GHz, the client is a Celeron(R) 2955U @ 1.40GHz, and networking is 1 Gbps. I don't have a super easy way to measure just the connect + login time, so this is connect + login + an indexed query. The server is lightly loaded and I warmed up the table, but it's also a laptop chip on a desktop-oriented board with a low-end NIC.
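If you do want to split connect + login out from the query itself, something like this gives rough numbers (PyMySQL assumed; the password is a placeholder for the setup above):

    import time
    import pymysql

    t0 = time.perf_counter()
    conn = pymysql.connect(host="192.168.0.12", user="mythtv",
                           password="...", database="mythconverg")
    t1 = time.perf_counter()
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM credits WHERE person = 147628 LIMIT 1")
        cur.fetchall()
    t2 = time.perf_counter()
    print(f"connect+login: {(t1 - t0) * 1e3:.1f} ms, query: {(t2 - t1) * 1e3:.1f} ms")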
> If connecting to your DB is significantly increasing your page times, you've got seriously fast pages. Even back when I was working with a MySQL database regularly in 2010, connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).
One of the major ways I helped Mastodon was improving their connection pooling situation. If you haven't encountered connection sizing issues with your database, count yourself lucky.
My experience was with Apache + mod_php, so there was no option to pool connections between workers, and you would set the Apache connection limits such that they summed up to less than the MySQL connection limits (unless you had a lot of traffic that didn't hit the database... then sizing would be tricky)
I think (but it's not totally clear) that the proposal is about B2B services and it's one database per paying account, not one database per end user.
For instance, if you're Slack, the proposal would be to have one database per Slack workspace, not one database per person who has a login to any workspace. You absolutely need to have data relationships between multiple users in a workspace. You don't necessarily need relationships between multiple workspaces (as it happens, Slack only added this feature very recently), and having different databases means that if company 1 triggers pathological behavior on their Slack, company 2 is unaffected.
Or, if you're Wikia^WFandom, you'd have one database per site. You could certainly run one big database for the entire site, but you could also have one database for Wookieepedia, one for Memory Alpha, one for the Harry Potter Wiki, etc.
In these situations, you wouldn't have the problem with the performance hit or with making it work with frameworks - you'd run separate web application servers per workspace/site/customer, too. Some of your machines would be running Rails/Spring Boot/Django/PHP/whatever with an environment file for Wookieepedia. Some would run Memory Alpha. Ideally you'd throw this in some sort of autoscaler (anything from Apache-managed FastCGI processes to Kubernetes would work fine). But User02, when they visit Wookieepedia, would hit a Wookieepedia application server that User01 has previously used and that already has a connection to the Wookieepedia DB.
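A sketch of the "environment file per tenant" wiring, with made-up variable names; every worker in a given deployment connects to exactly one tenant's database, so the usual single-pool model still applies:

    import os
    import psycopg2

    # e.g. the env file for the Wookieepedia deployment might contain:
    #   TENANT_NAME=wookieepedia
    #   DATABASE_URL=postgres://wookieepedia_app@db-wookieepedia.internal/wookieepedia
    TENANT_NAME = os.environ["TENANT_NAME"]
    DATABASE_URL = os.environ["DATABASE_URL"]

    # Every worker in this deployment pools connections against one tenant's
    # database only.
    conn = psycopg2.connect(DATABASE_URL)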
Yes, you would need to deal with doing a DB migration per site/customer instead of one big migration - but on the other hand, you get to do a DB migration per customer instead of one big migration. Each one is unaffected by the others. You'd spend a bit of time automating the process of migrations and you'd hit your test sites first, and then migration is much more reliable for your customers. If you really need downtime, you can schedule separate downtime per customer, and if one customer has a bunch of data that takes forever to migrate, no other customer cares. It's a tradeoff, and it may be useful.
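Sketching the rollout loop, where migrate() stands in for whatever tool actually applies the migration (Flyway, ActiveRecord, ...) and the canary list is made up:

    CANARY_TENANTS = ["internal-test", "friendly-customer"]

    def migrate(tenant):
        # Hypothetical: shell out to (or call) the migration tool with this
        # tenant's database URL.
        raise NotImplementedError

    def roll_out(all_tenants):
        failures = []
        ordered = CANARY_TENANTS + [t for t in all_tenants if t not in CANARY_TENANTS]
        for tenant in ordered:
            try:
                migrate(tenant)
            except Exception as exc:
                # One slow or failing tenant doesn't block the others.
                failures.append((tenant, exc))
        return failures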
If you want to allow customers to collaborate, you need to build it as an API, as if they were two different products interacting at arm's length - but you can certainly do that.
Not a problem with MySQL: "use `tenant`" switches a connection's schema.
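For instance (PyMySQL assumed; the schema name should come from a trusted allow-list, never user input):

    import pymysql

    conn = pymysql.connect(host="db.internal", user="app", password="...")

    def with_tenant(conn, tenant_schema):
        # tenant_schema must come from a trusted allow-list, not user input.
        with conn.cursor() as cur:
            cur.execute(f"USE `{tenant_schema}`")  # same connection, new default schema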
Rails migrations work reasonably well with the apartment gem. Never had a problem with inconsistent database migrations. Sometimes a migration will fail for a tenant, but ActiveRecord records which migrations have already run, so you fix the migration and reapply; it's a no-op where it's already done.
We don't use a single mysqld for every tenant, mind, so it's not like migrating tenants is completely serialized.
But if the idea is to isolate accounts from each other, the different schemas would be available to different DB users. You would have to re-authenticate to get access to the other DB.
Using schemas gives you imperfect but still improved isolation. It's still possible for a database connection to cross into another tenant, but if your schema search path only includes the tenant in question, it significantly reduces the chance that cross-customer data is accidentally shared.
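A sketch of scoping a Postgres connection to one tenant's schema via search_path, using psycopg2's sql module so the identifier stays properly quoted:

    from psycopg2 import sql

    def scope_to_tenant(conn, tenant_schema):
        with conn.cursor() as cur:
            # Only the tenant's schema is on the path, so unqualified table names
            # can't silently resolve to another customer's tables.
            cur.execute(sql.SQL("SET search_path TO {}").format(
                sql.Identifier(tenant_schema)))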
I think numeric ids should be allocated out of the same key space, while other identifiers should be hierarchical and scoped to the tenant in the database.
The same query run across all databases should either return rows in exactly one database (the valid tenant's) and an empty set in all the others, OR it should return the same result set regardless.
I just realized what I am proposing: a hidden, out-of-band column that is effectively the "database id" for that row.
If you built a tenanting library that used partitioning rather than schemas, you'd probably end up with something that looked pretty close to what you're describing.
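Something like this, perhaps, in Postgres terms (table and column names are made up): the tenant column is the partition key, i.e. the not-so-hidden "database id" for each row:

    import psycopg2

    DDL = """
    CREATE TABLE widgets (
        tenant_id bigint NOT NULL,   -- the not-so-hidden "database id" for the row
        id        bigint NOT NULL,
        name      text,
        PRIMARY KEY (tenant_id, id)
    ) PARTITION BY LIST (tenant_id);

    CREATE TABLE widgets_tenant_42 PARTITION OF widgets FOR VALUES IN (42);
    """

    def create_tables(conn):
        with conn.cursor() as cur:
            cur.execute(DDL)
        conn.commit()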
With schemas, it's definitely possible to use the same generator for ids across schemas (at least, I'm 90% sure it is in Postgres), but you'll probably end up fighting against ORM libraries to get it to work properly (Rails for instance makes a LOT of assumptions about how the id column works), and you aren't technically guaranteed uniqueness since you'll still have distinct PK columns.
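Concretely, the shared-generator version would look something like this in Postgres (schema and table names made up, and the tenant schemas are assumed to exist): each tenant schema's table defaults its id to one sequence in public, though as noted the distinct PK columns still don't guarantee uniqueness if someone inserts explicit ids.

    import psycopg2

    SHARED_ID_DDL = """
    CREATE SEQUENCE IF NOT EXISTS public.global_id_seq;
    CREATE TABLE tenant_a.widgets (
        id   bigint PRIMARY KEY DEFAULT nextval('public.global_id_seq'),
        name text
    );
    CREATE TABLE tenant_b.widgets (
        id   bigint PRIMARY KEY DEFAULT nextval('public.global_id_seq'),
        name text
    );
    """

    def install(conn):
        with conn.cursor() as cur:
            cur.execute(SHARED_ID_DDL)
        conn.commit()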
(It's not actually simpler when query execution over 100s of millions of rows is a perf bottleneck, and each tenant has several billion rows in the main tables. Then you're grateful you can schlep them around, and keep small tenants fast, etc. Even now, Postgres would still be a dubious choice due to the unpredictability of its query planner, though I use it for all my hobby projects.)
It's a terrible idea in the same way that using PHP instead of Rust to build a production large scale application is a terrible idea (i.e. it's actually a great idea but it's not "cool").
It’s not a cool factor issue. It’s an issue of bloating the system catalogs, inability to use the buffer pool, and having to run database migrations for each and every separate schema or maintaining concurrent versions of application code to deal with different schema versions.
It’d be just as dumb to do it in Rust as it would be in PHP.
As you can see now that the thread has matured, there are a lot of proponents of this architecture that have production experience with it, so it's likely not as dumb as you assume.
> As you can see now that the thread has matured, there are a lot of proponents of this architecture that have production experience with it, ...
Skimming through the updated comments I do not see many claiming it was a good idea or successful at scale. It may work fine for 10s or even 100s of customers, but it quickly grows out of control. Trying to maintain 100,000 customer schemas and running database migrations across all of them is a serious headache.
> ...so it's likely not as dumb as you assume.
I'm not just assuming, I've tried out some of the ideas proposed in this thread and know first hand they do not work at scale. Index page caching in particular is a killer, as you lose most benefits of a centralized BTREE structure when each customer has their own top-level pages. Also, writing dynamic SQL that strings together 100K "... UNION ALL SELECT * FROM customer_12345.widget" branches is both incredibly annoying and painfully slow.
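The query generation itself is trivial to sketch (schema naming and the widget table are taken from the example above; the ids are assumed to be numeric and come from your own catalog); the pain is that the planner then has to deal with 100K branches:

    def cross_tenant_widget_query(customer_ids):
        selects = [
            f"SELECT {cid} AS customer_id, w.* FROM customer_{cid}.widget w"
            for cid in customer_ids
        ]
        # 100,000 of these branches is painful to build, and worse to plan and execute.
        return "\nUNION ALL\n".join(selects)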
Extremely few companies that sell B2B SaaS software for enterprises have 10K customers, let alone 100K (that's the kind of customer base that pays for a Sauron-looking tower in downtown SF). ServiceNow, Workday, etc., are publicly traded and have fewer than 5000 customers each.
All of them also (a) don't run a single multitenant cluster for all their customers and (b) are a massive pain in the ass to run in every possible way (an assumption, but a safe one at that!).
Or you have a connection pool for every user, which is basically the same except that you must do some of the mapping yourself, and you have more connection pools and open connections.
Serverless lambdas only re-use connections in certain situations, IIRC. That might not be the "traditional webapp backend", but it's a growing concept.