The inability to reuse database connections would be a huge performance hit.
In a traditional webapp backend, you have a pool of connections to the database. User01 hits your service, and grabs a connection off the pool. User02 does the same, and so on. These connections get put back in the pool for reuse once a user is done with them.
In your design, every time a user hits your service, a new connection, specific to that user, will have to be made. This will incur network traffic and the overhead of logging in to the DBMS.
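To make that concrete, here's a minimal sketch of what per-tenant connection handling tends to look like, assuming a DB-API driver like psycopg2 and a hypothetical tenant_dsn() helper. A "cold" tenant pays the full connect + login cost on the request path, which is exactly the cost a shared pool exists to avoid:

    import psycopg2

    _connections = {}  # tenant_id -> open connection; grows with active tenants

    def tenant_dsn(tenant_id):
        # Hypothetical helper: in this design every tenant has its own database and login.
        return f"dbname=tenant_{tenant_id} user=tenant_{tenant_id} host=db.internal"

    def get_connection(tenant_id):
        conn = _connections.get(tenant_id)
        if conn is None:
            # Cold tenant: pay TCP + auth on the request path.
            conn = psycopg2.connect(tenant_dsn(tenant_id))
            _connections[tenant_id] = conn
        return conn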
If you're thinking about using something like SQLite, you will hit a hard wall when the OS isn't able to open any more file descriptors, as well.
Like you said, DB administration will be a huge pain in the ass. Rather than having Flyway or Liquibase or whatever run a migration on one database, you'll have to run it on thousands of databases. There will be significant downtime when your databases are not in a consistent state with one another. There will also be bloat from the migration tool's record keeping, which will be duplicated in every user's database rather than kept once in a single shared database.
A lot of the tools a database gives you for free will also need to be implemented in application logic instead. For example, you might want to run a query that says "show me every user using more than 1GB of storage," but under your design, you'll have to log into every user's database individually, determine the storage used, and add it to an in-memory list.
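For instance, the "over 1 GB" report under a one-SQLite-file-per-user layout ends up as application code roughly like this (the directory layout and threshold are assumptions):

    import glob
    import os

    GIGABYTE = 1024 ** 3

    def users_over_quota(db_dir="/var/data/user_dbs", limit=GIGABYTE):
        # What would be one GROUP BY / HAVING query against a shared database
        # becomes a walk over every user's file plus an in-memory list.
        heavy = []
        for path in glob.glob(os.path.join(db_dir, "user_*.sqlite3")):
            size = os.path.getsize(path)
            if size > limit:
                heavy.append((path, size))
        return heavy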
If you ever want to allow users to collaborate, you will end up replicating owner_id-style metadata anyway, and the entire benefit of this design will evaporate.
Most frameworks are not set up to handle this style of database access, either. I don't use Rails, but Spring Boot would fight you every step of the way if you tried to do this.
> In your design, every time a user hits your service, a new connection, specific to that user, will have to be made. This will incur network traffic and the overhead of logging in to the DBMS.
If connecting to your DB is significantly increasing your page times, you've got seriously fast pages. Even back when I was working with a MySQL database regularly in 2010, connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).
> If you're thinking about using something like SQLite, you will hit a hard wall when the OS isn't able to open any more file descriptors, as well.
You could always keep open only the most recently used 1 million databases. It's pretty easy to tune the FD limit; FreeBSD lets you set it to one FD per 16 KB of RAM without modifying the kernel, but it's a challenge to be so memory- and CPU-efficient that that becomes the limit.
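A rough sketch of that "keep only the most recently used databases open" idea, with an LRU of SQLite handles so the process stays under the FD limit (the cache size and path scheme are made up):

    import sqlite3
    from collections import OrderedDict

    MAX_OPEN = 1_000_000  # tune against the OS file-descriptor limit

    _open_dbs = OrderedDict()  # tenant_id -> sqlite3.Connection, in LRU order

    def get_db(tenant_id):
        conn = _open_dbs.get(tenant_id)
        if conn is not None:
            _open_dbs.move_to_end(tenant_id)           # mark as most recently used
            return conn
        if len(_open_dbs) >= MAX_OPEN:
            _, oldest = _open_dbs.popitem(last=False)  # evict least recently used
            oldest.close()                             # frees its file descriptor
        conn = sqlite3.connect(f"/var/data/user_dbs/user_{tenant_id}.sqlite3")
        _open_dbs[tenant_id] = conn
        return conn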
All that said, it really depends on the application. If this is consumer facing, and each account is an end user, one database per user is probably excessive overhead; one database (or sharded, if you've got the users) or one (flatish) file per user makes a lot more sense.
If it's business facing, then one database per account could make sense. You would have isolation between customers and could put big customers on dedicated boxes and let the customer drive upgrades and migrations, etc. Just please please please consider that corporations merge and divide all the time; don't be like G Suite and not offer a way to merge and divide accounts to reflect their corporate ownership.
Ok, you and I can be friends. A lot of people are using 'lightweight' frameworks where hello world is 30 ms, and then they call slow services and run slow queries, etc.
If your target is 10 ms, then you probably should worry about db connection time.
>connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far)
Is that with creating a new connection in the pool?
We work with microservices, so we have a "db gateway": any request to the DB goes through it, and it routes to the correct DB server for that tenant. Our latency for an already "hot" connection is about 40-50 ms on average, but I believe the lowest number I got (for a query by primary key on a fairly small table) was no less than 20-30 ms.
Opening a new connection added at least a couple of tens of milliseconds on top of that.
Yeah, that's a totally new connection (no pool), time from memory (could be off).
On my home system with a database I happen to be running anyway, I see:
$ time mysql -u mythtv -h 192.168.0.12 mythconverg \
-e 'select * from credits where person = 147628 limit 1' > /dev/null
real 0m0.023s
Server is running a Celeron(R) CPU 1007U @ 1.50GHz, the client is a Celeron(R) 2955U @ 1.40GHz, and networking is 1 Gbps. I don't have a super easy way to measure just the connect + login time, so this is connect + login + an indexed query. The server is lightly loaded and I warmed up the table, but it's also a laptop chip on a desktop-oriented board with a low-end NIC.
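If you do want to split connect + login out from the query itself, something like this gives rough numbers (PyMySQL assumed; the password is a placeholder for the setup above):

    import time
    import pymysql

    t0 = time.perf_counter()
    conn = pymysql.connect(host="192.168.0.12", user="mythtv",
                           password="...", database="mythconverg")
    t1 = time.perf_counter()
    with conn.cursor() as cur:
        cur.execute("SELECT * FROM credits WHERE person = 147628 LIMIT 1")
        cur.fetchall()
    t2 = time.perf_counter()
    print(f"connect+login: {(t1 - t0) * 1e3:.1f} ms, query: {(t2 - t1) * 1e3:.1f} ms")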
> If connecting to your DB is significantly increasing your page times, you've got seriously fast pages. Even back when I was working with a MySQL database regularly in 2010, connect + login was 5 ms at maximum (and I think it was much less, I just don't remember that far).
One of the major ways I helped Mastodon was improving their connection pooling situation. If you haven't encountered connection sizing issues with your database, count yourself lucky.
My experience was with Apache + mod_php, so there was no option to pool connections between workers, and you would set the Apache connection limits such that they summed up to less than the MySQL connection limits (unless you had a lot of traffic that didn't hit the database... then sizing would be tricky)
I think (but it's not totally clear) that the proposal is about B2B services and it's one database per paying account, not one database per end user.
For instance, if you're Slack, the proposal would be to have one database per Slack workspace, not one database per person who has a login to any workspace. You absolutely need to have data relationships between multiple users in a workspace. You don't necessarily need relationships between multiple workspaces (as it happens, Slack only added this feature very recently), and having different databases means that if company 1 triggers pathological behavior on their Slack, company 2 is unaffected.
Or, if you're Wikia^WFandom, you'd have one database per site. You could certainly run one big database for the entire site, but you could also have one database for Wookieepedia, one for Memory Alpha, one for the Harry Potter Wiki, etc.
In these situations, you wouldn't have the problem with the performance hit or with making it work with frameworks - you'd run separate web application servers per workspace/site/customer, too. Some of your machines would be running Rails/Spring Boot/Django/PHP/whatever with an environment file for Wookieepedia. Some would run Memory Alpha. Ideally you'd throw this in some sort of autoscaler (anything from Apache-managed FastCGI processes to Kubernetes would work fine). But User02, when they visit Wookieepedia, would hit a Wookieepedia application server that User01 has previously used and that already has a connection to the Wookieepedia DB.
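A sketch of the "environment file per tenant" wiring, with made-up variable names; every worker in a given deployment connects to exactly one tenant's database, so the usual single-pool model still applies:

    import os
    import psycopg2

    # e.g. the env file for the Wookieepedia deployment might contain:
    #   TENANT_NAME=wookieepedia
    #   DATABASE_URL=postgres://wookieepedia_app@db-wookieepedia.internal/wookieepedia
    TENANT_NAME = os.environ["TENANT_NAME"]
    DATABASE_URL = os.environ["DATABASE_URL"]

    # Every worker in this deployment pools connections against one tenant's
    # database only.
    conn = psycopg2.connect(DATABASE_URL)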
Yes, you would need to deal with doing a DB migration per site/customer instead of one big migration - but on the other hand, you get to do a DB migration per customer instead of one big migration. Each one is unaffected by the others. You'd spend a bit of time automating the process of migrations and you'd hit your test sites first, and then migration is much more reliable for your customers. If you really need downtime, you can schedule separate downtime per customer, and if one customer has a bunch of data that takes forever to migrate, no other customer cares. It's a tradeoff, and it may be useful.
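Sketching the rollout loop, where migrate() stands in for whatever tool actually applies the migration (Flyway, ActiveRecord, ...) and the canary list is made up:

    CANARY_TENANTS = ["internal-test", "friendly-customer"]

    def migrate(tenant):
        # Hypothetical: shell out to (or call) the migration tool with this
        # tenant's database URL.
        raise NotImplementedError

    def roll_out(all_tenants):
        failures = []
        ordered = CANARY_TENANTS + [t for t in all_tenants if t not in CANARY_TENANTS]
        for tenant in ordered:
            try:
                migrate(tenant)
            except Exception as exc:
                # One slow or failing tenant doesn't block the others.
                failures.append((tenant, exc))
        return failures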
If you want to allow customers to collaborate, you need to build it as an API, as if they were two different products interacting at arm's length - but you can certainly do that.
Not a problem with MySQL: "use `tenant`" switches a connection's schema.
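For instance (PyMySQL assumed; the schema name should come from a trusted allow-list, never user input):

    import pymysql

    conn = pymysql.connect(host="db.internal", user="app", password="...")

    def with_tenant(conn, tenant_schema):
        # tenant_schema must come from a trusted allow-list, not user input.
        with conn.cursor() as cur:
            cur.execute(f"USE `{tenant_schema}`")  # same connection, new default schema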
Rails migrations work reasonably well with the apartment gem. Never had a problem with inconsistent database migrations. Sometimes a migration will fail for a tenant, but ActiveRecord records which migrations have already run, so you fix the migration and reapply; it's a no-op where it's already done.
We don't use a single mysqld for every tenant, mind, so it's not like migrating tenants is completely serialized.
But if the idea is to isolate accounts from each other, the different schemas would be available to different DB users. You would have to re-authenticate to get access to the other DB.
Using schemas gives you imperfect but still improved isolation. It's still possible for a database connection to cross into another tenant, but if your schema search path only includes the tenant in question, it significantly reduces the chance that cross-customer data is accidentally shared.
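A sketch of scoping a Postgres connection to one tenant's schema via search_path, using psycopg2's sql module so the identifier stays properly quoted:

    from psycopg2 import sql

    def scope_to_tenant(conn, tenant_schema):
        with conn.cursor() as cur:
            # Only the tenant's schema is on the path, so unqualified table names
            # can't silently resolve to another customer's tables.
            cur.execute(sql.SQL("SET search_path TO {}").format(
                sql.Identifier(tenant_schema)))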
I think numeric ids should be allocated out of the same key space, while other identifiers should be hierarchical and scoped to the tenant in the database.
The same query run across all databases should either return rows in exactly one database (the valid tenant's) and an empty set in all the others, OR it should return the same result set regardless.
I just realized what I am proposing: a hidden, out-of-band column that is effectively the "database id" for that row.
If you built a tenanting library that used partitioning rather than schemas, you'd probably end up with something that looked pretty close to what you're describing.
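Something like this, perhaps, in Postgres terms (table and column names are made up): the tenant column is the partition key, i.e. the not-so-hidden "database id" for each row:

    import psycopg2

    DDL = """
    CREATE TABLE widgets (
        tenant_id bigint NOT NULL,   -- the not-so-hidden "database id" for the row
        id        bigint NOT NULL,
        name      text,
        PRIMARY KEY (tenant_id, id)
    ) PARTITION BY LIST (tenant_id);

    CREATE TABLE widgets_tenant_42 PARTITION OF widgets FOR VALUES IN (42);
    """

    def create_tables(conn):
        with conn.cursor() as cur:
            cur.execute(DDL)
        conn.commit()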
With schemas, it's definitely possible to use the same generator for ids across schemas (at least, I'm 90% sure it is in Postgres), but you'll probably end up fighting against ORM libraries to get it to work properly (Rails for instance makes a LOT of assumptions about how the id column works), and you aren't technically guaranteed uniqueness since you'll still have distinct PK columns.
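Concretely, the shared-generator version would look something like this in Postgres (schema and table names made up, and the tenant schemas are assumed to exist): each tenant schema's table defaults its id to one sequence in public, though as noted the distinct PK columns still don't guarantee uniqueness if someone inserts explicit ids.

    import psycopg2

    SHARED_ID_DDL = """
    CREATE SEQUENCE IF NOT EXISTS public.global_id_seq;
    CREATE TABLE tenant_a.widgets (
        id   bigint PRIMARY KEY DEFAULT nextval('public.global_id_seq'),
        name text
    );
    CREATE TABLE tenant_b.widgets (
        id   bigint PRIMARY KEY DEFAULT nextval('public.global_id_seq'),
        name text
    );
    """

    def install(conn):
        with conn.cursor() as cur:
            cur.execute(SHARED_ID_DDL)
        conn.commit()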
(It's not actually simpler when query execution over 100s of millions of rows is a perf bottleneck, and each tenant has several billion rows in the main tables. Then you're grateful you can schlep them around, and keep small tenants fast, etc. Even now, Postgres would still be a dubious choice due to the unpredictability of its query planner, though I use it for all my hobby projects.)
It's a terrible idea in the same way that using PHP instead of Rust to build a production large scale application is a terrible idea (i.e. it's actually a great idea but it's not "cool").
It’s not a cool factor issue. It’s an issue of bloating the system catalogs, inability to use the buffer pool, and having to run database migrations for each and every separate schema or maintaining concurrent versions of application code to deal with different schema versions.
It’d be just as dumb to do it in Rust as it would be in PHP.
As you can see now that the thread has matured, there are a lot of proponents of this architecture that have production experience with it, so it's likely not as dumb as you assume.
> As you can see now that the thread has matured, there are a lot of proponents of this architecture that have production experience with it, ...
Skimming through the updated comments I do not see many claiming it was a good idea or successful at scale. It may work fine for 10s or even 100s of customers, but it quickly grows out of control. Trying to maintain 100,000 customer schemas and running database migrations across all of them is a serious headache.
> ...so it's likely not as dumb as you assume.
I'm not just assuming, I've tried out some of the ideas proposed in this thread and know first hand they do not work at scale. Index page caching in particular is a killer, as you lose most benefits of a centralized BTREE structure when each customer has their own top-level pages. Also, writing dynamic SQL that strings together 100K "... UNION ALL SELECT * FROM customer_12345.widget" branches is both incredibly annoying and painfully slow.
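The query generation itself is trivial to sketch (schema naming and the widget table are taken from the example above; the ids are assumed to be numeric and come from your own catalog); the pain is that the planner then has to deal with 100K branches:

    def cross_tenant_widget_query(customer_ids):
        selects = [
            f"SELECT {cid} AS customer_id, w.* FROM customer_{cid}.widget w"
            for cid in customer_ids
        ]
        # 100,000 of these branches is painful to build, and worse to plan and execute.
        return "\nUNION ALL\n".join(selects)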
Extremely few companies that sell B2B SaaS software for enterprises have 10K customers, let alone 100K (that's the kind of customer base that pays for a Sauron-looking tower in downtown SF). ServiceNow, Workday, etc., are publicly traded and have fewer than 5000 customers each.
All of them also (a) don't run a single multitenant cluster for all their customers and (b) are a massive pain in the ass to run in every possible way (an assumption, but a safe one at that!).
Or you have a connection pool for every user, which is basically the same except that you must do some of the mapping yourself, and you have more connection pools and open connections.
Serverless lambdas only re-use connections in certain situations, IIRC. That might not be the "traditional webapp backend", but it's a growing concept.