I'm trying to better understand why analysts and data scientists sometimes prefer to set up a data lake, which I define as: "a collection of files, in an open-source format like Parquet or ORC, in a blob store like S3". I often hear two reasons:
- Cost
- Supporting data science workloads
However, the emergence of data warehouses that separate compute from storage, like Snowflake, BigQuery and Redshift RA3, has drastically reduced the cost advantage of data lakes. And in my experience, data science workloads are generally compute-bound, so in principle you can just execute a SQL query from your data science environment (Spark, Python) against your data warehouse; the overhead of serializing and deserializing your data an extra time should be a tiny fraction of your overall runtime.
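For example, I picture the "data science against the warehouse" workflow being roughly as simple as this (a BigQuery-flavoured sketch; the project and table names are invented):

```python
# pip install google-cloud-bigquery
from google.cloud import bigquery

client = bigquery.Client(project="my-analytics-project")  # placeholder project

sql = """
    SELECT user_id, SUM(amount) AS lifetime_spend
    FROM `my-analytics-project.sales.orders`   -- hypothetical table
    GROUP BY user_id
"""

# The heavy lifting runs on the warehouse's compute; only the (already
# aggregated) result set is serialized over the wire into pandas.
df = client.query(sql).to_dataframe()
print(df.describe())
```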
I suspect that the reason some data scientists prefer data lakes is cultural rather than technical: basically, they have an irrational dislike of SQL and RDBMSs. However, I haven't done data science work myself in years, so maybe I'm just not getting it. Are there good reasons for using a data lake rather than Snowflake, BigQuery, or Redshift RA3? Please enlighten me!
"Raw" data lands in the lake S3 or similar object store. Processing pipelines take data, slice and combine, and push data back to the lake. Data in the lake will vary from the truly raw like log files to highly processed. Keeping the raw analogy we could say it's like the difference between a cow carcass and a carefully butchered filet mignon.
But you nearly always need a way to query and analyse the raw data, so you have an engine sitting on top, like Snowflake. Engines can help drive your processing, since it's much easier to write pipelines against them, and they also let people access the data at a higher level of abstraction.
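As a sketch of that "engine on top" idea (placeholder credentials, stage and table names), you can point Snowflake at the Parquet files already sitting in the lake and query them with plain SQL:

```python
# pip install "snowflake-connector-python[pandas]"
import snowflake.connector

# Connection parameters are placeholders.
conn = snowflake.connector.connect(
    account="my_account", user="me", password="...",
    warehouse="ANALYTICS_WH", database="LAKE", schema="RAW",
)
cur = conn.cursor()

# Expose the lake's Parquet files as an external table (names are made up).
cur.execute("""
    CREATE STAGE IF NOT EXISTS raw_logs_stage
      URL = 's3://my-company-lake/raw/app-logs/'
      CREDENTIALS = (AWS_KEY_ID='...' AWS_SECRET_KEY='...')
""")
cur.execute("""
    CREATE EXTERNAL TABLE IF NOT EXISTS raw_logs
      LOCATION = @raw_logs_stage
      FILE_FORMAT = (TYPE = PARQUET)
""")

# Anyone can now poke at the "carcass" without touching the pipeline code.
df = cur.execute(
    "SELECT value:event_type::string AS event_type, COUNT(*) AS n "
    "FROM raw_logs GROUP BY 1 ORDER BY n DESC"
).fetch_pandas_all()
print(df.head())
```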
It's not an either/or.