Hi all—I'm the EM for the Search team at Notion, and I want to chime in to clear...

jzelinskie · 2024-07-15T05:17:37 1721020657

This is a fantastic post that explains a lot of the end product, but I'd love to hear more about the journey specifically on denormalizing permissions at Notion. Scaling out authorization logic like this is actually very under-documented in industry. Mind if I email you to chat?

Full disclosure: I'm a founder of authzed (W21), the company building SpiceDB, an open source project inspired by Google's internal scalable authorization system. We offer a product that streams changes to fully denormalized permissions for search engines to consume, but I'm not trying to pitch; you just don't often hear about other solutions built in this space!

atak1 · 2024-07-16T20:37:57 1721162277

Curious - what do you guys use for the T step of your ELT? With nested blocks 12 layers deep, I can imagine it gets complicated to try to de-normalize using regular SQL.

Have you explored a pattern like https://runtrellis.com or https://unstructured.io/ for unnesting?

mritchie712 · 2024-07-15T13:10:41 1721049041

Hey! While you're here...

> Iceberg and Delta Lake, on the other hand, weren’t optimized for our update-heavy workload when we considered them in 2022

Curious about your thoughts here. Have you followed Icebergs progress? Do you think it'd be a tougher decision in 2024 between Hudi and Iceberg?

infogulch · 2024-07-15T09:18:54 1721035134

Interesting! Now I'm curious how you handle live permission changes and indexes with stale permission data.

jitl · 2024-07-15T13:50:28 1721051428

(I’m not on the search team, but I did write some search stuff back in 2019, explanation may be outdated)

The blocks (pages are a block) in Notion are a big tree, with your workspace at the root. Some attributes of blocks affect the search index of their recursive children, like permissions: granting access to a page grants access to its recursive child blocks.

When you change permissions, we kick off an online recursive reindex job for that page and its recursive subpages. While the job is running, the index has stale entries with outdated permissions.

When you search, we query the index for pages matching your query that you have to. Because the index permissions can be stale, we also reload the result set from Postgres and apply our normal online server-side permission checks to filter out pages you lost access to but that have stale permissions in the index.

infogulch · 2024-07-15T19:00:56 1721070056

Neat, thanks for sharing!