At the scale of Notion, with millions of users, they’d have that much data. I’ve...

iLoveOncall · 2024-07-14T21:13:59 1720991639

The concern isn't the scale, it's the use. What is there to _process_ when they're supposed to only store and retrieve to show to users?

ctippett · 2024-07-15T01:32:13 1721007133

The data doesn't have to be the content of users' notes. Think of all the metadata they're likely collecting per user/notebook/interaction – the data's likely useful for things like flagging security events, calculating the graph of interconnected notes, indexing hashed content for search (or AI embeddings?) ... these are just a few use cases that come to mind off the top of my head.

TeMPOraL · 2024-07-15T06:30:10 1721025010

Of which security and stability seem like the only reasonable use cases. Indexing content for search globally? Embeddings? They just can't help themselves, can they? All that juicy data, can't possibly leave it alone.

bastawhiz · 2024-07-15T02:46:58 1721011618

Great, you build only store and retrieve functionality. How:

1. Do you identify which types of content your users use the most?

2. Do you find users who are abusing your system?

3. Do you load and process data (even on a customer-by-customer basis) to fine-tune models for the QA service that you offer as an optional upgrade? Especially when there could be gigabytes of data for a single customer.

4. Do you identify corrupt data caused by a bug in your code that saves data to the db? You're not doing a full table scan over hundreds of billions of records across almost 500 logical shards in your production fleet.

These are just the examples I came up with off the dome. The job of the business is to operate on the data. If you can't even query it, you can't operate on it. Running a business is far more than just being a dumb CRUD API.

fragmede · 2024-07-15T03:17:09 1721013429

Fwiw, you should be able to answer #1 and #2 without hitting the main db if you've got good observability into your system.

bastawhiz · 2024-07-15T20:13:47 1721074427

Observability data comes from a _drumroll_ database! Most analytics products that can answer these questions are just time series data warehouses.

fragmede · 2024-07-15T20:22:01 1721074921

a database, obviously, but are you really storing metrics and logs next to customer data in the same database, or did you skip over the part where I used the word “main”?