That is about it. I would only add that the actual data quality matters, too: I often get a ton of junk to work with, part of which is useless. And the difficulty isn't just "algebraic debugging", but embedding the whole pipeline in a way that won't blow up or grind to a halt when it's used in a production environment, especially during peak loads when "everybody is looking" or when a new semantic event type shows up.
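
To make the "won't blow up" part concrete, here is a minimal sketch of the kind of defensive step I have in mind. It is not an actual pipeline: the event types, the field names, and the `process_batch` helper are all made up for illustration. The idea is only that junk records and unseen event types get counted and set aside instead of raising in the middle of a batch.

```python
from collections import Counter

# Hypothetical event types and their required fields (illustration only).
REQUIRED_FIELDS = {
    "click": {"user_id", "ts"},
    "purchase": {"user_id", "ts", "amount"},
}


def process_batch(records):
    """Split a batch into usable records and a tally of rejects, without raising."""
    good = []
    rejects = Counter()
    for rec in records:
        if not isinstance(rec, dict):
            rejects["not_a_dict"] += 1
            continue
        etype = rec.get("type")
        # Only string types can be looked up; anything else counts as unknown.
        required = REQUIRED_FIELDS.get(etype) if isinstance(etype, str) else None
        if required is None:
            # A new or unrecognized event type: set it aside, don't crash.
            rejects["unknown_type"] += 1
            continue
        if not required.issubset(rec.keys()):
            rejects["missing_fields"] += 1
            continue
        good.append(rec)
    return good, rejects


batch = [
    {"type": "click", "user_id": 1, "ts": 0},
    "garbage",
    {"type": "new_thing"},
    {"type": "purchase", "user_id": 1, "ts": 0},  # no "amount"
]
good, rejects = process_batch(batch)
```

The reject tally is the useful part in practice: a sudden jump in `unknown_type` is a cheap signal that a new event type has appeared, and the rest of the batch still goes through.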