Are column-store databases relevant on SSD/NVME? I ask because on a physical med...

azundo · on Nov 11, 2017

Columnar stores are as much about the compression benefits as the physical layout on disk. The article goes through a bunch of different relevant compression strategies.

geocar · on Nov 11, 2017

> Are column-store databases relevant on SSD/NVME?

Yes. SSD generally has lower latency for responses.

While sending data (throughput) is similar, the operating system doesn't ask for all of the blocks of a file at once -- even if you read(fd,buf,size) -- because other processes might ask for other blocks in the meantime. An IO scheduler is making decisions, and lower latency helps turn those decisions around faster.

> I ask because on a physical medium like hard disk, storing data on physical disk in column orientation can make a significant improvement to read operations.

I'm not really sure this is true. Hard drives have (for over a decade, probably longer) had logic that lies about the physical layout of the disk, to the point where all I can believe about the linear block address is that the circuitry "believes" software is more likely to ask for the next linear block address than for any other one.

Further to that, I imagine SSDs can probably make similar optimisations.

The reason column-orientation helps is that it reduces the volume of data that needs reading. If you have a table with 100 columns in it, but a query that operates on 2, then a column-oriented database needs to read 2 things, while the row-oriented database either reads 100 things, or it interleaves reads of 2 things with skips of 98 things.

It isn't difficult to believe that the circuitry needed to handle the former will outpace the circuitry needed to handle the latter for a long time.
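
To put rough numbers on the 100-column example, here is a back-of-the-envelope sketch. The row count (1,000,000), the fixed 8-byte value width and the absence of compression are all assumptions made purely for illustration; a real engine would differ on every one of them.

```python
# Rough estimate of data volume read for a query touching 2 of 100 columns.
# Assumptions (illustrative only): fixed-width 8-byte values, no compression,
# and a row store that has to read whole rows.

def bytes_read_row_store(n_rows, n_cols, value_size):
    # Every column of every row is pulled in, wanted or not.
    return n_rows * n_cols * value_size

def bytes_read_column_store(n_rows, cols_needed, value_size):
    # Only the columns the query names are read.
    return n_rows * cols_needed * value_size

n_rows = 1_000_000
row_bytes = bytes_read_row_store(n_rows, n_cols=100, value_size=8)
col_bytes = bytes_read_column_store(n_rows, cols_needed=2, value_size=8)
ratio = row_bytes // col_bytes

print(f"row store:    {row_bytes:,} bytes")
print(f"column store: {col_bytes:,} bytes")
print(f"ratio:        {ratio}x")
```

Under those assumptions the gap is just the ratio of total columns to columns used, and it holds regardless of how fast the underlying medium is.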

SamReidHughes · on Nov 13, 2017

Hard drives exhibit the straightforward performance that you'd expect from block addresses linearly increasing around the circle according to all the experiments I've done.

londons_explore · on Nov 11, 2017

Even SSDs are super slow compared to RAM. When you want to read a few bytes from millions of rows, an SSD has to decode an entire block of data for every read.
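
A small sketch of that effect, counting how many device pages get touched when reading one 2-byte field per row. The 4096-byte page size, the 800-byte row size and the 100,000 rows are assumptions picked for illustration, not properties of any particular SSD or database.

```python
# Count distinct pages touched when reading a small field from every row.
# Assumptions (illustrative only): 4096-byte pages, 800-byte rows in the
# row layout, 2-byte values packed back to back in the column layout.

PAGE_SIZE = 4096

def pages_touched(offsets, width, page_size=PAGE_SIZE):
    pages = set()
    for off in offsets:
        # A read may straddle a page boundary, so record both ends.
        pages.add(off // page_size)
        pages.add((off + width - 1) // page_size)
    return len(pages)

n_rows = 100_000
row_size = 800
width = 2

# Row layout: the field sits at the start of each 800-byte row.
row_pages = pages_touched((r * row_size for r in range(n_rows)), width)
# Column layout: the same values stored contiguously.
col_pages = pages_touched((r * width for r in range(n_rows)), width)

print(f"row layout:    {row_pages} pages")
print(f"column layout: {col_pages} pages")
```

With these numbers the row layout touches essentially every page of the table, while the column layout touches only the handful of pages that hold the values themselves.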

Also, even with NVMe SSDs, there is a lot of operating system overhead associated with every read. Having layers of drivers orchestrate the transfer of the 2 bytes of data you wanted really slows it down...