We looked at Parquet early on and it wasn’t competitive even with what we were u...

We looked at Parquet early on and it wasn’t competitive even with what we were using at the time.

And yeah, this really depends not just on the dataset, but also on how selective your queries are, what predicates and aggregations they employ, etc. A significant percentage of queries gets orders of magnitude faster. I can’t disclose how much faster things got on average, but it was a significant gain, way more than would be sufficient to offset the increased cost of encoding (which is another aspect people typically don’t consider), even considering that much of the data people encode is hardly ever touched.