They didn’t say the quiet part out loud: the Fivetran and Snowflake bills for what they were doing were almost certainly enormous, and those bills are most likely what got management’s attention about fixing this.
Snowflake as a destination is very easy to work with on Fivetran. Fivetran didn't have S3 as a destination until late 2022, so until then it effectively forced you to use BigQuery, Snowflake, or Redshift as your destination.
So the Fivetran CEO's defence is pretty stupid.
> Moving several large, crucial Postgres datasets (some of them tens of TB large) to data lake gave us a net savings of over a million dollars for 2022 and proportionally higher savings in 2023 and 2024.
I thought the quiet part was that they are data mining their customer data (and disclosing it to multiple third parties) because it’s not E2EE and they can read everyone’s private and proprietary notes.
Otherwise, this is the perfect app for sharding/horizontal scalability. Your notes don’t need to be queried or joined with anyone else’s notes.
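To make that concrete, here's a minimal sketch of what user-based shard routing could look like. The shard count, DSN scheme, and helper names are all made up for illustration:

    # Route every notes query for a user to exactly one Postgres shard,
    # so no cross-user joins or cross-shard queries are ever needed.
    import hashlib

    NUM_SHARDS = 16  # assumed shard count, purely illustrative

    def shard_for_user(user_id: str) -> int:
        """Deterministically map a user to one of NUM_SHARDS shards."""
        digest = hashlib.sha256(user_id.encode("utf-8")).digest()
        return int.from_bytes(digest[:4], "big") % NUM_SHARDS

    def notes_dsn(user_id: str) -> str:
        # Hypothetical DSN naming; a real deployment would use a shard
        # map or service discovery rather than a naming convention.
        return f"postgresql://notes-shard-{shard_for_user(user_id)}/notes"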
Also worth asking is whether this data lake justifies its cost and effort. How does it improve the user experience? What is this “AI” stuff that it enables?
For example, they mention search. But I imagine that means searching only within your own docs, which should already be fast and efficient if everything is sharded by user in Postgres.
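For what it's worth, per-user search on a single shard is a stock Postgres pattern. A rough sketch, assuming a hypothetical notes(id, user_id, body, body_tsv tsvector) table with a btree index on user_id and a GIN index on body_tsv:

    # Full-text search confined to one user's documents. Pinning the
    # query to a single user_id keeps it on that user's shard and that
    # user's slice of the index.
    import psycopg2

    SEARCH_SQL = """
        SELECT id, ts_headline('english', body, q) AS snippet
        FROM notes, plainto_tsquery('english', %(terms)s) AS q
        WHERE user_id = %(user_id)s
          AND body_tsv @@ q
        ORDER BY ts_rank(body_tsv, q) DESC
        LIMIT 20;
    """

    def search_notes(conn, user_id: str, terms: str):
        with conn.cursor() as cur:
            cur.execute(SEARCH_SQL, {"user_id": user_id, "terms": terms})
            return cur.fetchall()

    # Usage (connection string is a placeholder):
    # conn = psycopg2.connect("postgresql://notes-shard-3/notes")
    # search_notes(conn, "user-42", "quarterly roadmap")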
The tech stuff is all fine and good, but if it adds no value, it's just playing with technology for technology's sake.
I too was surprised to read that they were syncing what reads, at a glance, as their entire database into the data lake. IIUC, Snowflake prioritizes inserts over updates because you're supposed to stream events derived from your data, not mirror the data itself.
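As a toy illustration of that pattern: instead of replaying UPDATEs against a mirrored table, you append immutable change events and let downstream queries reconstruct current state, which is the insert-heavy workload these warehouses are built for. The event shape and names below are invented, not Notion's or Fivetran's actual format:

    # Emit an append-only change event instead of updating a mirrored row.
    import json
    import time
    import uuid

    def note_updated_event(note_id: str, user_id: str, new_title: str) -> dict:
        """Build an insert-only record describing a change to a note."""
        return {
            "event_id": str(uuid.uuid4()),
            "event_type": "note_updated",
            "occurred_at": time.time(),
            "note_id": note_id,
            "user_id": user_id,
            "payload": {"title": new_title},
        }

    # Loading this into the warehouse (or a queue feeding it) is a pure
    # INSERT; no row in the destination is ever updated in place.
    print(json.dumps(note_updated_event("note-123", "user-42", "Q3 planning")))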