
That might be the goal. If you are storing private data for your users which should not be queried, you can have separate encryption keys for each user and keep their data in separate databases.
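A minimal sketch of that idea, under some assumptions the comment does not state: each user gets their own SQLite file, and each user's key is derived from a secret only that user holds plus a per-user salt. The names `derive_user_key`, `open_user_db` and the `items` table are hypothetical. The encryption step itself is not shown, since the Python standard library has no cipher for it; you would pass the derived key to an encryption layer of your choice.

```python
import hashlib
import secrets
import sqlite3
from pathlib import Path


def derive_user_key(user_secret: bytes, salt: bytes, iterations: int = 600_000) -> bytes:
    """Derive a 32-byte key for one user with PBKDF2-HMAC-SHA256.

    Assumption: the key comes from a secret the user holds, so the
    operator cannot recreate it without the user's involvement.
    """
    return hashlib.pbkdf2_hmac("sha256", user_secret, salt, iterations, dklen=32)


def open_user_db(data_dir: Path, user_id: str) -> sqlite3.Connection:
    """Open (or create) the database file that belongs to a single user."""
    data_dir.mkdir(parents=True, exist_ok=True)
    # Hash the user id so arbitrary user names never end up in a file path.
    file_name = hashlib.sha256(user_id.encode("utf-8")).hexdigest() + ".sqlite3"
    conn = sqlite3.connect(data_dir / file_name)
    # Hypothetical table: `body` would hold ciphertext produced with the user's key.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS items (id INTEGER PRIMARY KEY, body BLOB NOT NULL)"
    )
    return conn


def new_salt() -> bytes:
    """Generate a random per-user salt, stored next to the user's account."""
    return secrets.token_bytes(16)
```

With this layout there is no single table to run a cross-user query against: anything site-wide means opening every user's file and obtaining every user's key.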



We did pretty much that ages ago when I built a webmail provider.

Marketing hated it, because collecting user data took conscious effort: they had to ask the dev team, and requests were often turned down once hard questions were asked about whether collating personal information was justified.

That was a feature, from my perspective.

We actually took most user demographics data entirely offline. Data that was only intended for creating aggregate, anonymised profiles of our users was kept encrypted in a bank box when not being analysed. Any analysis on it was done on an airgapped machine, and only the anonymised reports were taken out.

The only central database that was online was one that kept a mapping of whether or not a given user name was available.

Then each storage shard kept track of which users were on which backend, and the account data that had to be online for each user (primarily their actual mail, since we were a mail provider, along with settings etc. for their mailbox) was kept on a per-user basis.
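Roughly, the split between the central database and the shards could look like the sketch below. It is an illustration, not the original system: the class names `Directory` and `Shard`, the in-memory SQLite table, the hash-based `shard_for` routing and the "least loaded backend" placement are all assumptions made for the example.

```python
import hashlib
import sqlite3


class Directory:
    """Central online database: records only that a user name is taken."""

    def __init__(self) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE taken (name TEXT PRIMARY KEY)")

    def is_available(self, name: str) -> bool:
        row = self.db.execute("SELECT 1 FROM taken WHERE name = ?", (name,)).fetchone()
        return row is None

    def claim(self, name: str) -> bool:
        """Reserve a name; returns False if somebody already has it."""
        try:
            with self.db:
                self.db.execute("INSERT INTO taken (name) VALUES (?)", (name,))
            return True
        except sqlite3.IntegrityError:
            return False


class Shard:
    """One storage shard: knows which of its backends holds which user."""

    def __init__(self, backends: list[str]) -> None:
        self.backends = backends
        self.user_backend: dict[str, str] = {}

    def place(self, name: str) -> str:
        """Assign a user to the least loaded backend (idempotent)."""
        if name not in self.user_backend:
            load = {backend: 0 for backend in self.backends}
            for backend in self.user_backend.values():
                load[backend] += 1
            self.user_backend[name] = min(self.backends, key=load.__getitem__)
        return self.user_backend[name]

    def backend_for(self, name: str) -> str:
        return self.user_backend[name]


def shard_for(name: str, shards: list[Shard]) -> Shard:
    """Assumed routing rule: a stable hash of the user name picks the shard."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]
```

The point of the shape is that the only thing you can ask centrally is "is this name taken?"; everything else requires going to the shard, and then to that one user's data.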

It worked well, but you need to have buy-in from the top for this approach, as there will be constant pressure to ease access to more and more user data.


I think this is an excellent architecture for powerful, respectful, hosted applications. I’ve been thinking about a few extensions of this idea:

First, use advances in privacy technology to create a service-wide data warehouse that has enough information to help you make good decisions without exposing any specific user’s data. Done properly, users will benefit from your improved decision-making without giving up their personal data. Differential Privacy can do this.
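To make the Differential Privacy point concrete, here is a minimal sketch of the Laplace mechanism applied to a counting query. The function names and the `epsilon` value are illustrative assumptions. A counting query has sensitivity 1 (one user changes the count by at most 1), so the noise scale is `1 / epsilon`. Floating-point Laplace sampling like this has known weaknesses, so a real warehouse should rely on a vetted differential privacy library instead.

```python
import random
from typing import Callable, Iterable, Optional


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    rate = 1.0 / scale
    return rng.expovariate(rate) - rng.expovariate(rate)


def private_count(
    values: Iterable,
    predicate: Callable[[object], bool],
    epsilon: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return a noisy count of the values matching `predicate`.

    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    rng = rng or random.SystemRandom()
    true_count = sum(1 for value in values if predicate(value))
    # Sensitivity of a count is 1, so the Laplace scale is 1 / epsilon.
    return true_count + laplace_noise(1.0 / epsilon, rng)
```

For example, `private_count(ages, lambda age: age >= 30, epsilon=0.5)` answers "how many users are 30 or older?" with enough noise that no single user's presence can be inferred from the result. Each query spends privacy budget, so repeated queries over the same data need a budget on total epsilon.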

Second, give users the opportunity to download their own little database in native format (e.g. SQLite). This is the ultimate in data portability. I think Dolt [0] might be good for this, because its git-like approach gives you push/pull syncing as well as diffing. That would make it easy for users to keep a local copy of the data up to date.
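If each user's data already lives in its own SQLite database, the download can be a straight copy of that database. A small sketch using the standard library's online backup API; `export_user_db` is a hypothetical helper name, and the Dolt-style push/pull syncing is not shown here.

```python
import sqlite3
from pathlib import Path


def export_user_db(user_db: sqlite3.Connection, dest_path: Path) -> Path:
    """Copy one user's live database into a standalone file for download."""
    dest = sqlite3.connect(dest_path)
    try:
        # Connection.backup copies the whole database consistently,
        # even while the source connection remains in use.
        user_db.backup(dest)
    finally:
        dest.close()
    return dest_path
```

The resulting file opens with any SQLite client, so the user needs no tooling from the service to read their own data.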

Third, you can start to support self-hosting and perhaps even open-source the primary user-facing application. The hosted service sells convenience and features enabled by the privacy-respecting data warehouse.

The big questions, of course, are many:

- Would users pay for this?

- Do increased development cost and reduced velocity outweigh the privacy benefits?

- Would the open-source component enable clones that undermine your business, or attract new users who may eventually upgrade to your paid service?

I would like to find out the answers!

[0] https://github.com/dolthub/dolt


One of the interesting side-effects, to me, with respect to what you mention, is that designing things this way prevents you from accidentally building solutions that are hard to self-host. The boundary between "per-user" or "per-tenant" vs "site-wide" becomes very sharp because it becomes a choice of where the data is stored, so it's always obvious when you're stepping across that boundary.


Re #3: At my former B2B SaaS, each customer had their own MySQL schema. We allowed users to perform a full mysqldump of their schema as a form of backup. We found that, for us, the database schema alone wasn’t enough for anyone to straight up copy our product. The magic was in the business logic code which was closed-source.



