Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Same Query, Different Hash: Snowflake's query hashing blind spots (twingdata.com)
3 points by dangoldin 11 months ago | hide | past | favorite | 3 comments


Really interesting! Could you clarify what difference having two different query_parameterized_hash for similar queries is? Is there a performance hit?


Yea - the idea is that Snowflake will generate these after a query runs in order to help you look at multiple runs of the same query. So imagine you run a query that's "select a from b where c = 1" and you want to find all examples of that query running. That's where "query_hash" comes in. But Snowflake also says well what if we let you be generic about the parameters - so "where c=1" and "where c=2" and "where c=300000" all have the same query_parameterized_hash.

That's the intent but turns out it's only doing a very simple hashing and not actually looking at the canonical version of the query. For example it won't treat aliases/renames as the same even though it should. This makes it harder to look at all queries that are in essence doing the same thing.


Oh that's really interesting! I imagine there could be a reason for it, for instance the data is distributed differently in the micropartitions so different where values could result in different data lookup patterns as you may skip more/less blocks. But overall this makes a lot of sense!




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: