The multi-tenancy problem already applies to almost every multitenant database in Postgres, Oracle and MySQL, whether or not they use FTS. You just might not notice its impact in your case if query performance is "good enough".

War story time.

A while ago, I worked on a big SQL (Oracle) database, millions of rows per tenant, 10ks of tenants. Tenant data was highly dissimilar between most tenants in all sorts of ways, and the distribution of data "shape" between tenants wasn't that spiky. Tenants were all over the place, and the common case wasn't that common. Tenants (multitenant tables and/or whole separate schemas) routinely got migrated between physical database hosts by hand (GoldenGate didn't exist at the beginning of this company, and was too expensive by the end).

A whole host of problems in this environment cropped up because, inside the DB, indexes and cache-like structures were shared between tenants. An index histogram for a DATETIME column of a multitenant table might indicate that 75% of the dates were in 2016. But it turns out that most of that 75% were the rows owned by a few big tenants, and the other 1000+ tenants on that database were overwhelmingly unlikely to have 2016 dates. As a result, query plans often sucked for queries that filtered on that date.
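To make the histogram skew concrete, here's a toy sketch in Python. The tenant names and row counts are made up for illustration (they're not from the real system), and a real optimizer's histogram is far more elaborate than a single fraction. The point is only that a table-wide statistic can say "75% of rows are in 2016" while a typical small tenant has none.

```python
# Hypothetical data: (tenant_id, year) pairs for one shared multitenant table.
# Two big tenants own every 2016 row; 1250 small tenants have only 2021 rows.
rows = (
    [("big-1", 2016)] * 6000
    + [("big-2", 2016)] * 1500
    + [(f"small-{i}", 2021) for i in range(1250) for _ in range(2)]
)


def fraction_in_year(rows, year, tenant=None):
    """Share of rows in the given year, optionally restricted to one tenant."""
    scoped = [y for t, y in rows if tenant is None or t == tenant]
    return sum(1 for y in scoped if y == year) / len(scoped)


# What a table-wide statistic sees vs. what one small tenant actually has.
global_share = fraction_in_year(rows, 2016)
small_share = fraction_in_year(rows, 2016, tenant="small-700")

print(f"table-wide share of 2016 rows: {global_share:.2f}")
print(f"small-700 share of 2016 rows:  {small_share:.2f}")
```

A planner working from the table-wide number would expect a 2016 filter to match most rows, which is exactly wrong for the small tenant.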

Breaking tables up by tenant didn't help: query plans, too, were cached by the DB. The issue was that they were cached by query text. So on a database with lots of separate (identical) logical schemas, a query plan would get built for some query when it first ran against a schema with one set of index histograms, and then another schema with totally different histograms would run the same query, pull the now-inappropriate plan out of the cache, and do something dumb. This was a crappy family of bugs, in that it was happening all the time without major impact (a schema that is tiny overall is not going to cause customer/stability problems by running dumb query plans on tiny data), but it cropped up unpredictably with much larger impact when customers loaded large amounts of data and/or rare queries happened to run from cache on a huge customer's schema.

The solve for the query plan issue? Prefix each query with the customer's ID in a comment, because the plan cacher was too dumb (or intended for this use case, who knows?) to strip comments. The SQL keys in the plan cache would end up looking like a zillion variants of "/* CUSTOMER-1A2BFC10 */ SELECT ....". I imagine this trick is commonplace, but folks at that gig felt clever for finding it out back in the bad old days.
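For anyone who hasn't seen the trick, here's roughly what it looks like as a Python sketch. `tag_query` and `ToyPlanCache` are names I'm inventing for illustration; the toy cache just assumes plans are keyed by the exact SQL text, which is a simplification of what a real database does, and the `orders` query and second customer ID are placeholders.

```python
def tag_query(sql: str, customer_id: str) -> str:
    """Prepend a per-customer comment so the SQL text differs per tenant."""
    # Refuse IDs that could close the comment early and alter the statement.
    if "*/" in customer_id or "/*" in customer_id:
        raise ValueError("customer_id must not contain comment delimiters")
    return f"/* CUSTOMER-{customer_id} */ {sql}"


class ToyPlanCache:
    """Stand-in for a plan cache keyed by exact SQL text (an assumption)."""

    def __init__(self):
        self._plans = {}
        self.builds = 0  # how many times a plan was built rather than reused

    def plan_for(self, sql_text: str, schema_stats: str) -> str:
        # A plan is built from whichever schema's stats ran the text first.
        if sql_text not in self._plans:
            self.builds += 1
            self._plans[sql_text] = f"plan built from {schema_stats} stats"
        return self._plans[sql_text]


cache = ToyPlanCache()
query = "SELECT * FROM orders WHERE created_at >= :start"

# Untagged: tenant B silently reuses the plan built for tenant A.
untagged_a = cache.plan_for(query, "tenant-A")
untagged_b = cache.plan_for(query, "tenant-B")

# Tagged: each tenant's text is distinct, so each gets its own plan.
tagged_a = cache.plan_for(tag_query(query, "1A2BFC10"), "tenant-A")
tagged_b = cache.plan_for(tag_query(query, "9F00D3E7"), "tenant-B")

print(untagged_b)
print(tagged_b)
```

The obvious cost is that the cache now holds one entry per customer per query, so this trades cache space and extra plan builds for plans that match each tenant's data.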

All of which is to say: database multitenancy poses problems for far more than search. That's not an indictment of multitenancy in general, but it does teach the valuable lesson that abandoning multitenancy, as wacky and inefficient as that seems, should be considered a first-class tool in the toolbox of solutions here. Database-on-demand solutions (e.g. things like neon.tech, or the ability to swiftly provision and detach an AWS Aurora replica to run queries, potentially removing everything but one tenant's data) are becoming increasingly popular, and those might reduce the pain of tenant-per-instance-ifying database layout.



