anarazel · on Sept 15, 2017

> I thought the bigger advantage was that you could do sequential-only I/O on projections?

Not quite following: a column store won't usually give you more sequential IO than a row store. Often enough it's the contrary, because for some queries you have to combine column[-groups]. What you get is higher compression ratios, better IO and cache access patterns for filter-heavy queries, and computations that are easier to vectorize. Especially if you either filter heavily or aggregate only a few columns, you can do a lot less IO in total, but the sequential-ness doesn't really improve.
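To make the "less IO, not more sequential IO" point concrete, here is a minimal sketch under stated assumptions: a hypothetical three-column table (`id`, `region`, `amount`) of fixed-width int64 values, stored once row-wise and once column-wise, with `bytes_touched` standing in for IO volume. Compression, pages, and indexes are not modeled, and the function names are made up for illustration.

```python
import array

# Hypothetical table: three int64 columns (id, region, amount).
rows = [(i, i % 4, i * 10) for i in range(1000)]

# Row layout: whole tuples stored one after another in one buffer.
row_store = array.array("q", [v for row in rows for v in row])

# Column layout: one contiguous buffer per column.
col_store = {
    "id": array.array("q", (r[0] for r in rows)),
    "region": array.array("q", (r[1] for r in rows)),
    "amount": array.array("q", (r[2] for r in rows)),
}


def sum_amount_row_store(store, region, ncols=3):
    """SELECT sum(amount) WHERE region = ?, scanning every full tuple."""
    total = 0
    bytes_touched = 0
    for off in range(0, len(store), ncols):
        tup = store[off:off + ncols]
        bytes_touched += ncols * store.itemsize  # all columns are read
        if tup[1] == region:
            total += tup[2]
    return total, bytes_touched


def sum_amount_col_store(store, region):
    """Same query, reading only the 'region' and 'amount' columns."""
    reg, amt = store["region"], store["amount"]
    total = sum(a for r, a in zip(reg, amt) if r == region)
    # Both columns are still scanned start to finish: less data, same pattern.
    bytes_touched = (len(reg) + len(amt)) * reg.itemsize
    return total, bytes_touched
```

Both functions return the same sum, but the columnar scan touches two thirds of the bytes because `id` is never read. Both are still plain sequential passes; skipping to only the matching `amount` values would turn the scan into random access, which is the trade-off the comment describes.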

> Or are you saying that you're only CPU-bottlenecked?

Oftentimes, yes. You might be constrained on storage space, but storage is usually fast enough for an individual query doing sequential IO. What helps on the CPU side is parallelism (if you can push down enough work; a lot of that was added in 9.6 and 10), plain old code optimizations (better hash tables and the new expression evaluation framework, both in 10), and JITing parts of query processing (work in progress, with patches posted for 11).
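Pushing aggregation down into parallel workers only works when the aggregate can be split into a per-worker partial step and a combine step; PostgreSQL's parallel aggregation shows up in plans as Partial Aggregate and Finalize Aggregate nodes. The following is a minimal Python sketch of that shape for `avg`, not PostgreSQL's implementation. The function names and chunking scheme are assumptions, and it uses threads purely to show the structure, so it gives no real CPU speedup under Python's GIL.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce


def partial_avg(chunk):
    # Per-worker step: return a transition state (sum, count), not the average.
    return (sum(chunk), len(chunk))


def combine(a, b):
    # Merge two transition states; associativity makes the split order irrelevant.
    return (a[0] + b[0], a[1] + b[1])


def finalize_avg(state):
    # Leader step: turn the combined state into the final result.
    total, count = state
    return total / count if count else None


def parallel_avg(values, workers=4):
    # Split the input into roughly equal contiguous chunks, one per worker.
    size = max(1, -(-len(values) // workers))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        states = list(pool.map(partial_avg, chunks))
    return finalize_avg(reduce(combine, states, (0, 0)))
```

For example, `parallel_avg(list(range(1, 101)))` splits the input into four chunks of 25, and the combined state `(5050, 100)` finalizes to `50.5`. Shipping `(sum, count)` instead of per-worker averages is what keeps the result exact regardless of how the rows were divided.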