
I thought the bigger advantage was that you could do sequential-only I/O on projections?

Or are you saying that you're only CPU-bottlenecked?




> I thought the bigger advantage was that you could do sequential-only I/O on projections?

Not quite following. A column store will usually not have more sequential IO than a row store. Often it's the contrary, because for some queries you have to combine columns (or column groups) again. What you get is higher compression ratios, better IO & cache access patterns for filter-heavy queries, and computations that are easier to vectorize. Especially if you either filter heavily or aggregate only a few rows, you can do a lot less IO in total, but the sequential-ness doesn't really improve.
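
To put rough numbers on that, here's a back-of-the-envelope sketch in Python. It's a toy cost model, not how any real engine accounts for IO: the table shape, the 1% selectivity and the helper names are all made up for illustration, and it ignores compression and page granularity entirely.

```python
def row_store_scan_bytes(n_rows, n_cols, width):
    # A row-store scan reads every column of every row in one sequential pass.
    return n_rows * n_cols * width


def column_store_scan_bytes(n_rows, width, matching_rows,
                            filter_cols=1, fetched_cols=1):
    # Filter columns are read in full; the other needed columns are only
    # read at the positions that passed the filter (idealized).
    return (filter_cols * n_rows + fetched_cols * matching_rows) * width


def contiguous_runs(positions):
    # Number of separate contiguous ranges needed to read sorted positions.
    runs = 0
    prev = None
    for p in positions:
        if prev is None or p != prev + 1:
            runs += 1
        prev = p
    return runs


# Hypothetical table: 1M rows, 10 fixed-width 8-byte columns.
n_rows, n_cols, width = 1_000_000, 10, 8
# Assume every 100th row passes the filter (1% selectivity).
matches = list(range(0, n_rows, 100))

row_bytes = row_store_scan_bytes(n_rows, n_cols, width)
col_bytes = column_store_scan_bytes(n_rows, width, len(matches))
runs = contiguous_runs(matches)

print(f"row store:    {row_bytes:>12,} bytes, 1 sequential pass")
print(f"column store: {col_bytes:>12,} bytes, "
      f"{runs:,} separate ranges in the fetched column")
```

In this model the column layout reads roughly a tenth of the bytes, but the second column is touched in thousands of scattered ranges rather than one pass. That's the "less IO, not more sequential IO" point.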

> Or are you saying that you're only CPU-bottlenecked?

Oftentimes, yes. You might be storage-space constrained, but storage speeds for individual sequential-IO type queries are usually fast enough. Parallelism helps with the CPU side (if you can push down enough work; a lot of that was added in 9.6 & 10), as do plain old code optimizations (better hash tables and the new expression evaluation framework, both in 10) and JITing parts of the query processing (WIP, patches posted for 11).



