I’m building our company’s first data platform right now (Fivetran, dbt, Snowflake), so I’ll definitely check this out!!
1) Do you have Metabase on your roadmap? Lightdash?
2) I see that you alert on schema changes, which is great. Can you monitor for schema changes in a Postgres database? Reason I ask: Fivetran (and others) will try to buffer some schema changes from you to prevent data loss (dropped columns, renamed columns, etc.). There is some more complex nuance I have in mind here, but it’s a bit too long to type out on my phone :)
1) An integration with Metabase Cloud is on our roadmap for Q1! We'd love to integrate with Lightdash, but they don't have a public API just yet[1].
2) Several of our customers use us to alert on schema changes in Postgres, specifically so they can get ahead of application database changes that will end up in the warehouse, so you're definitely not alone! Here's a link on how to connect Postgres: https://docs.metaplane.dev/docs/postgres
That's an excellent stack and one we kept front and center when building out Metaplane, so definitely let us know if you have any feedback or suggestions here!
My plan was to monitor the postgres database in the staging environment, so we can be alerted to schema changes before they are released into production (and hopefully stop the production deploy).
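For the staging check, here is a rough sketch of just the comparison step, in Python. It assumes you snapshot `information_schema.columns` before and after each staging deploy into a `{(table, column): data_type}` mapping. The function name `diff_schemas` is hypothetical and not part of Metaplane or Fivetran:

```python
"""Sketch: diff two snapshots of a Postgres schema (hypothetical helper).

Assumption: each snapshot is a mapping {(table, column): data_type}, built
from rows of a query such as:
    SELECT table_name, column_name, data_type
    FROM information_schema.columns
    WHERE table_schema = 'public';
"""
from typing import Dict, List, Tuple

Snapshot = Dict[Tuple[str, str], str]


def diff_schemas(before: Snapshot, after: Snapshot) -> Dict[str, object]:
    """Classify column-level changes between two schema snapshots."""
    dropped = sorted(f"{t}.{c}" for (t, c) in before.keys() - after.keys())
    added = sorted(f"{t}.{c}" for (t, c) in after.keys() - before.keys())
    retyped = sorted(
        f"{t}.{c}: {before[(t, c)]} -> {after[(t, c)]}"
        for (t, c) in before.keys() & after.keys()
        if before[(t, c)] != after[(t, c)]
    )
    # information_schema has no record of renames, so a rename shows up
    # as one dropped column plus one added column.
    breaking = bool(dropped or retyped)
    return {"dropped": dropped, "added": added, "retyped": retyped, "breaking": breaking}
```

Treating drops and type changes as "breaking" while allowing pure additions is one possible policy. A staging deploy could be blocked whenever `breaking` is true.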
I have a goal of moving this even further upstream into the CI build for the source application itself (Ruby on Rails in this case), so that the application's test suite will fail if a developer introduces a breaking schema change. Note: this is a pretty tricky problem to solve without a) the tests being way too brittle or b) super slow end-to-end tests. I have some goals of introducing something that is a mashup of Spectacles [1], Pact [2], and dbt models [3].
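Here is one way the Pact-style idea could be sketched, as a consumer-driven contract check in Python. The downstream (dbt) side declares the columns and types it depends on, and CI fails if the application's migrated test database no longer satisfies them. Everything here is hypothetical: `REQUIRED_COLUMNS`, `contract_violations`, and `main` are illustrative names, not a Pact, Spectacles, or dbt API. The sketch also assumes the current schema is read into a `{(table, column): data_type}` mapping after running migrations:

```python
"""Sketch of a consumer-driven schema contract check for CI (hypothetical)."""
import sys
from typing import Dict, List, Tuple

Schema = Dict[Tuple[str, str], str]

# Hypothetical contract: columns that downstream dbt models select from.
REQUIRED_COLUMNS: Schema = {
    ("users", "id"): "bigint",
    ("users", "email"): "text",
    ("orders", "user_id"): "bigint",
}


def contract_violations(schema: Schema, contract: Schema) -> List[str]:
    """Return a message for every required column that is missing or retyped."""
    violations = []
    for (table, column), expected_type in sorted(contract.items()):
        actual = schema.get((table, column))
        if actual is None:
            violations.append(f"missing column {table}.{column}")
        elif actual != expected_type:
            violations.append(f"{table}.{column} is {actual}, expected {expected_type}")
    return violations


def main(schema: Schema) -> int:
    """Print violations to stderr; return a nonzero exit code if any exist."""
    problems = contract_violations(schema, REQUIRED_COLUMNS)
    for problem in problems:
        print(f"schema contract violation: {problem}", file=sys.stderr)
    return 1 if problems else 0
```

In CI this would run as `sys.exit(main(current_schema))` after migrations. Because it only inspects the schema and never runs the pipeline, it stays fast. It is only as brittle as the contract list itself.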
That sounds like a great plan. We're planning to build our public API and CI/CD integrations early next year, so that developers can know what the downstream impact of their changes might be, and whether it could introduce unexpected results. We may be able to slot right in there with Pact.
Mitigating the impact with monitoring is where we're at right now, but we're with you that preventing errors can be even more important.
If it's interesting to you, we're happy to open up a shared slack channel to dig into the nuance as well! Just email me (guru@metaplane.dev) with the email you'd like to be added.
When Nick Schrock created Dagster, he argued that many "data cleaning" tasks that people attribute to "data engineering" aren't actually "cleaning" but architecture problems. I believe schema changes also fall into this category. I'm extremely new to data engineering, but when I ask "What are the things that will break this system?", the answer is an application engineer thinking "I'm going to rename this column, my tests pass, so this should be fine." That will break things all the time. (The same goes for dropping a column or changing a one-to-many relationship into a many-to-many.)