They didn’t say the quiet part out loud, which is almost certainly that the Fivetran and Snowflake bills for what they were doing were enormous, and that those bills were what actually got management’s attention about fixing this.



Found this comment on the article (from Fivetran's CEO, so, with that in mind) enlightening regarding the costs they were facing: https://twitter.com/frasergeorgew/status/1808326803796512865


Snowflake as a destination is very easy to work with on Fivetran. Fivetran didn't have S3 as a destination until late 2022, so it effectively forced you to use one of BigQuery, Snowflake, or Redshift. That makes the Fivetran CEO's defence pretty weak.


They weren't that quiet about it:

> Moving several large, crucial Postgres datasets (some of them tens of TB large) to data lake gave us a net savings of over a million dollars for 2022 and proportionally higher savings in 2023 and 2024.


I'd like to see more details. 10s of TB isn't that large -- why so expensive?


Fivetran charges by "monthly active rows", which quickly adds up when you have hundreds of millions to billions of rows that are constantly changing.

https://fivetran.com/docs/usage-based-pricing
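
As a back-of-the-envelope of how that compounds (the per-million-rows rate below is invented for illustration, not Fivetran's actual pricing, which is tiered and negotiated):

  # Hypothetical MAR pricing sketch -- the rate is made up.
  HYPOTHETICAL_RATE_PER_MILLION_MAR = 500.0  # USD/month, illustrative only

  def monthly_cost(total_rows: int, fraction_touched: float) -> float:
      # A row counts as "active" if it was inserted or updated that month,
      # so a churny dataset pays for the same rows month after month.
      active_rows = total_rows * fraction_touched
      return active_rows / 1_000_000 * HYPOTHETICAL_RATE_PER_MILLION_MAR

  # 5 billion rows with 10% changing each month:
  print(f"${monthly_cost(5_000_000_000, 0.10):,.0f}/month")  # $250,000/month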


Yep, and Notion's data model is really bad for this pricing: almost every line you type is a "block", which is a new row in their database.
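
To make that concrete, a toy version of the block-per-row shape (the schema here is an assumption based on their public engineering posts, not their actual DDL):

  import datetime
  import uuid

  def new_block(parent_id: str, text: str) -> dict:
      # One row per block; the field set is illustrative.
      return {
          "id": str(uuid.uuid4()),
          "parent_id": parent_id,
          "type": "text",
          "content": text,
          "last_edited": datetime.datetime.now(datetime.timezone.utc),
      }

  page_id = str(uuid.uuid4())
  # Typing a ten-line note creates ten rows, and every later edit to any
  # line updates its row -- each touch is a fresh "monthly active row".
  blocks = [new_block(page_id, f"line {i}") for i in range(10)]
  print(len(blocks), "rows for one small note")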


They’re likely paying for egress from the databases as well.


DBA salaries, maybe?


Maybe cloud hosted


I thought the quiet part was that they are data mining their customer data (and disclosing it to multiple third parties) because it’s not E2EE and they can read everyone’s private and proprietary notes.

Otherwise, this is the perfect app for sharding/horizontal scalability. Your notes don’t need to be queried or joined with anyone else’s notes.
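
A minimal sketch of what that routing could look like (shard count and hash scheme are illustrative, not Notion's actual setup):

  import hashlib

  SHARDS = [f"postgres-shard-{i:02d}" for i in range(32)]

  def shard_for(owner_id: str) -> str:
      # Every note belongs to one owner, so a stable hash of the owner ID
      # sends all of their data to (and reads from) a single shard.
      digest = hashlib.sha256(owner_id.encode()).digest()
      return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

  print(shard_for("workspace-1234"))  # every query for this owner hits one shard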


Also whether this data lake is worth the costs/effort. How does it add value to the user experience? What is this “AI” stuff that this data lake enables?

For example, they mention search. But I imagine that is just searching within your own docs, which I presume should be fast and efficient if everything is sharded by user in Postgres (something like the sketch below).

The tech stuff is all fine and good, but if it adds no value, it's just playing with technology for technology's sake.
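
Something like this, scoped to one tenant on one shard (table and column names are assumptions, just to show the query never needs to cross tenants):

  import psycopg  # psycopg 3

  QUERY = """
      SELECT id, content
      FROM blocks
      WHERE workspace_id = %s
        AND to_tsvector('english', content) @@ plainto_tsquery('english', %s)
      LIMIT 50
  """

  def search_notes(conn: psycopg.Connection, workspace_id: str, terms: str):
      # The workspace_id predicate keeps the search inside one tenant,
      # which is why a per-user Postgres shard can serve it directly.
      with conn.cursor() as cur:
          cur.execute(QUERY, (workspace_id, terms))
          return cur.fetchall()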


I too was surprised to read that they were syncing what reads, at a glance, as their entire database into the data lake. IIUC, the reason Snowflake prioritizes inserts over updates is that you're supposed to stream events derived from your data, not the data itself.
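
A sketch of that insert-only pattern (the event shape is illustrative): instead of replicating every UPDATE, you append an immutable event describing the change, so the warehouse only ever sees INSERTs:

  import datetime
  import json

  def block_edited_event(block_id: str, new_length: int) -> str:
      # An append-only fact about what changed; the current state, if
      # needed, is rebuilt from the event log downstream.
      return json.dumps({
          "event": "block_edited",
          "block_id": block_id,
          "new_length": new_length,
          "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
      })

  print(block_edited_event("b-123", 42))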


This ^. This switch from managed to in-house is a good example of only building when necessary.



