
How do you end up with two identical SQL tables in the first place?



Consider owning an ETL system where users can create periodic jobs, like calculating a "daily active user" rollup table from a bunch of event source tables. Now consider being asked to switch the query execution engine from A to B, perhaps because engine B is much more cost-effective for your system's query patterns. For example, you might be tasked with switching from MapReduce to Tez, or from Presto to Spark. Before the full migration, you can reduce risk by double-writing jobs, with engine A writing to the standard output location and engine B writing to a separate migration location. This should produce two identical SQL tables, which you should then verify, because sometimes these execution engines have bugs hiding in the corners :)
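For the verification step, here is a rough sketch of one way to do it (not necessarily the approach from the article). It assumes both outputs are reachable from a single SQL connection; SQLite stands in for the real warehouse, and the table and column names (dau_engine_a, dau_engine_b, day, active_users) are made up for illustration. The idea is to tag each row with its source, group on every column, and report any row whose count differs between the two tables, so duplicate rows and NULLs are handled too.

```python
import sqlite3


def table_diff(conn, table_a, table_b, columns):
    """Return rows whose number of occurrences differs between two tables.

    Each result row is (*columns, count_in_a, count_in_b). An empty list
    means the tables hold the same multiset of rows.
    Identifiers are interpolated directly, so pass trusted names only.
    """
    cols = ", ".join(columns)
    query = f"""
        SELECT {cols}, SUM(in_a) AS count_a, SUM(in_b) AS count_b
        FROM (
            SELECT {cols}, 1 AS in_a, 0 AS in_b FROM {table_a}
            UNION ALL
            SELECT {cols}, 0 AS in_a, 1 AS in_b FROM {table_b}
        ) AS tagged
        GROUP BY {cols}
        HAVING SUM(in_a) <> SUM(in_b)
    """
    return conn.execute(query).fetchall()


# Hypothetical double-write outputs: engine A (standard location) and
# engine B (migration location) disagree on one day.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dau_engine_a (day TEXT, active_users INTEGER)")
conn.execute("CREATE TABLE dau_engine_b (day TEXT, active_users INTEGER)")
conn.executemany(
    "INSERT INTO dau_engine_a VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 12)],
)
conn.executemany(
    "INSERT INTO dau_engine_b VALUES (?, ?)",
    [("2024-01-01", 10), ("2024-01-02", 13)],
)

mismatches = sorted(
    table_diff(conn, "dau_engine_a", "dau_engine_b", ["day", "active_users"])
)
for row in mismatches:
    print(row)
```

Grouping on every column is simple but can get expensive on wide or very large tables, so on a real warehouse you might compare per-partition row counts or checksums first and only run the full diff where those disagree.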


I immediately considered it for comparison of backups.


The SQL in the article is more like brain gymnastics.

It is interesting, but not something you should use. It scales horribly with the number of columns.


Good point


or a host migration.


Regression testing.


grading homework



