
And what if you want to join data from different SaaS and internal systems, e.g. Google Analytics and a Pega decisioning system?

Are you going to spend months upfront carefully modelling the data in order to ingest it, making sure to handle schema and data-quality issues, etc.? All to support one use case that only needs a handful of fields?

No. Which is why data lakes exist: they're cost effective. You simply dump the data and ask the engineer or data scientist building the use case to do the heavy lifting, rather than a centralised data team.
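
For concreteness, here's a minimal schema-on-read sketch in Python; the paths, field names, and join key are all made up. The point is just that selection, joining, and data-quality handling happen at read time, not at ingest:

    # Raw exports were just dumped into the lake; the person building the
    # use case reads only the fields they need. All names are hypothetical.
    import pandas as pd

    # Raw Google Analytics export, dumped as-is (newline-delimited JSON).
    ga = pd.read_json("lake/raw/google_analytics/2024-01-15.jsonl", lines=True)

    # Raw Pega decisioning export, also dumped without upfront modelling.
    pega = pd.read_json("lake/raw/pega_decisions/2024-01-15.jsonl", lines=True)

    # The use case only needs a handful of fields, so select and join just those.
    result = (
        ga[["client_id", "session_id", "landing_page"]]
        .merge(
            pega[["client_id", "decision", "propensity"]],
            on="client_id",
            how="inner",
        )
    )
    print(result.head())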

There are integration companies that solve this specific use case. I’ve used Fivetran [0] and highly recommend it. They extract and load data from your SaaS tools into your warehouse, and your data scientists can run SQL against the resulting tables. Their most popular warehouse targets are Redshift and Snowflake, so you can still use a centralized data warehouse without dedicating internal resources to the integrations.

[0] https://fivetran.com/
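
As a rough sketch of the analyst-side workflow once the data has landed in the warehouse (connection details, schema, and table names are hypothetical; Redshift speaks the Postgres protocol, so psycopg2 works as the client driver):

    # Once an EL tool has replicated the SaaS data, analysis is just SQL.
    import psycopg2

    conn = psycopg2.connect(
        host="my-cluster.example.redshift.amazonaws.com",
        port=5439,
        dbname="analytics",
        user="data_scientist",
        password="...",
    )

    with conn, conn.cursor() as cur:
        # Join the replicated Google Analytics and Pega tables directly in SQL.
        cur.execute(
            """
            SELECT ga.client_id, ga.landing_page, pega.decision
            FROM google_analytics.sessions AS ga
            JOIN pega.decisions AS pega
              ON pega.client_id = ga.client_id
            LIMIT 100;
            """
        )
        for row in cur.fetchall():
            print(row)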


What I find amazing is that Fivetran is essentially a bunch of glue code forwarding data between different APIs and database formats, and it's legitimately useful, in part because when an upstream API breaks they go and fix the connectors for you instead of leaving you to deal with the resulting emergency. But it's only needed because data interchange standards are in such poor shape. If users demanded that SaaS products expose data, event streams, and replication logs via robust, standardized APIs, a lot of the use cases for Fivetran would disappear.
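
To make that concrete, here's a purely hypothetical sketch of what a standardized, cursor-based change-feed API might look like to a consumer; the endpoint, parameters, and response shape are invented for illustration:

    # Poll a (hypothetical) standard change-feed endpoint with a cursor,
    # so syncing it needs no vendor-specific glue code.
    import requests

    BASE_URL = "https://api.example-saas.com/v1/changes"  # hypothetical endpoint
    cursor = None

    while True:
        params = {"limit": 500}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(BASE_URL, params=params, timeout=30)
        resp.raise_for_status()
        payload = resp.json()

        for event in payload["events"]:
            # Each event describes one row-level change (insert/update/delete),
            # which a consumer could apply to a warehouse table or append to a lake.
            print(event["table"], event["op"], event["data"])

        if not payload.get("next_cursor"):
            break  # caught up with the stream
        cursor = payload["next_cursor"]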
