They didn’t say the quiet part out loud, which is almost certainly that the Fivetran and Snowflake bills for what they were doing were enormous, and that those bills were what actually got management’s attention about fixing this.



Found this comment on the article (from Fivetran's CEO, so, with that in mind) enlightening regarding the costs they were facing: https://twitter.com/frasergeorgew/status/1808326803796512865


Snowflake as a destination is very easy to work with on Fivetran. Fivetran didn't have S3 as a destination until late 2022, so it effectively forced you to use one of BigQuery, Snowflake, or Redshift. That makes the Fivetran CEO's defence pretty weak.


They weren't that quiet about it:

> Moving several large, crucial Postgres datasets (some of them tens of TB large) to data lake gave us a net savings of over a million dollars for 2022 and proportionally higher savings in 2023 and 2024.


I'd like to see more details. 10s of TB isn't that large -- why so expensive?


Fivetran charges by "monthly active rows", which quickly adds up when you have hundreds of millions to billions of rows that are constantly changing.

https://fivetran.com/docs/usage-based-pricing
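
As a back-of-the-envelope of how that compounds (the per-million-rows rate below is invented for illustration, not Fivetran's actual pricing, which is tiered and negotiated):

  # Hypothetical MAR pricing sketch -- the rate is made up.
  HYPOTHETICAL_RATE_PER_MILLION_MAR = 500.0  # USD/month, illustrative only

  def monthly_cost(total_rows: int, fraction_touched: float) -> float:
      # A row counts as "active" if it was inserted or updated that month,
      # so a churny dataset pays for the same rows month after month.
      active_rows = total_rows * fraction_touched
      return active_rows / 1_000_000 * HYPOTHETICAL_RATE_PER_MILLION_MAR

  # 5 billion rows with 10% changing each month:
  print(f"${monthly_cost(5_000_000_000, 0.10):,.0f}/month")  # $250,000/month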


Yep, and Notion's data model is really bad for this pricing: almost every line you type is a "block", which is a new row in their database.
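
To make that concrete, a toy version of the block-per-row shape (the schema here is an assumption based on their public engineering posts, not their actual DDL):

  import datetime
  import uuid

  def new_block(parent_id: str, text: str) -> dict:
      # One row per block; the field set is illustrative.
      return {
          "id": str(uuid.uuid4()),
          "parent_id": parent_id,
          "type": "text",
          "content": text,
          "last_edited": datetime.datetime.now(datetime.timezone.utc),
      }

  page_id = str(uuid.uuid4())
  # Typing a ten-line note creates ten rows, and every later edit to any
  # line updates its row -- each touch is a fresh "monthly active row".
  blocks = [new_block(page_id, f"line {i}") for i in range(10)]
  print(len(blocks), "rows for one small note")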


They’re likely paying for egress from the databases as well.


DBA salaries, maybe?


Maybe cloud hosted


I thought the quiet part was that they are data mining their customer data (and disclosing it to multiple third parties) because it’s not E2EE and they can read everyone’s private and proprietary notes.

Otherwise, this is the perfect app for sharding/horizontal scalability. Your notes don’t need to be queried or joined with anyone else’s notes.
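
A minimal sketch of what that routing could look like (shard count and hash scheme are illustrative, not Notion's actual setup):

  import hashlib

  SHARDS = [f"postgres-shard-{i:02d}" for i in range(32)]

  def shard_for(owner_id: str) -> str:
      # Every note belongs to one owner, so a stable hash of the owner ID
      # sends all of their data to (and reads from) a single shard.
      digest = hashlib.sha256(owner_id.encode()).digest()
      return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]

  print(shard_for("workspace-1234"))  # every query for this owner hits one shard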


Also whether this data lake is worth the costs/effort. How does it add value to the user experience? What is this “AI” stuff that this data lake enables?

For example, they mention search. But I imagine that is just searching within your own docs, which I presume should be fast and efficient if everything is sharded by user in Postgres (something like the sketch below).

The tech stuff is all fine and good, but if it adds no value, it's just playing with technology for technology's sake.
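
Something like this, scoped to one tenant on one shard (table and column names are assumptions, just to show the query never needs to cross tenants):

  import psycopg  # psycopg 3

  QUERY = """
      SELECT id, content
      FROM blocks
      WHERE workspace_id = %s
        AND to_tsvector('english', content) @@ plainto_tsquery('english', %s)
      LIMIT 50
  """

  def search_notes(conn: psycopg.Connection, workspace_id: str, terms: str):
      # The workspace_id predicate keeps the search inside one tenant,
      # which is why a per-user Postgres shard can serve it directly.
      with conn.cursor() as cur:
          cur.execute(QUERY, (workspace_id, terms))
          return cur.fetchall()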


I too was surprised to read that they were syncing what reads, at a glance, as their entire database into the data lake. IIUC, the reason Snowflake prioritizes inserts over updates is that you're supposed to stream events derived from your data, not the data itself.
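
A sketch of that insert-only pattern (the event shape is illustrative): instead of replicating every UPDATE, you append an immutable event describing the change, so the warehouse only ever sees INSERTs:

  import datetime
  import json

  def block_edited_event(block_id: str, new_length: int) -> str:
      # An append-only fact about what changed; the current state, if
      # needed, is rebuilt from the event log downstream.
      return json.dumps({
          "event": "block_edited",
          "block_id": block_id,
          "new_length": new_length,
          "at": datetime.datetime.now(datetime.timezone.utc).isoformat(),
      })

  print(block_edited_event("b-123", 42))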


This ^. This switch from managed to in-house is a good example of only building when necessary.



