Count distinct performance compared on top 4 SQL databases

interactive_rep · on Jan 27, 2014

One of the reasons the subquery version runs about twice as fast on most of the databases (SQL Server, Oracle and MySQL) is that it can execute the query in a single pass through the data. When the query without the subquery takes twice as long, it is doing a second pass for the count distinct.
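A minimal sketch of the two query shapes being compared, run through Python's built-in `sqlite3` module so it is self-contained. The `logs` table and its `dashboard_id`/`user_id` columns are hypothetical stand-ins, not the blog post's schema, and SQLite is not one of the benchmarked databases. The sketch only shows that deduplicating (group, value) pairs in a subquery and then doing a plain `COUNT(*)` per group returns the same answer as `COUNT(DISTINCT ...)`. That equivalence is the rewrite the comment above attributes the speedup to; the sketch says nothing about timing.

```python
import sqlite3

# Hypothetical schema: one row per (dashboard_id, user_id) visit.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE logs (dashboard_id INTEGER, user_id INTEGER)")
conn.executemany(
    "INSERT INTO logs VALUES (?, ?)",
    [(1, 10), (1, 10), (1, 11), (2, 10), (2, 12), (2, 12), (2, 13)],
)

# Form 1: COUNT(DISTINCT ...) applied directly inside the GROUP BY.
direct = conn.execute(
    """
    SELECT dashboard_id, COUNT(DISTINCT user_id)
    FROM logs
    GROUP BY dashboard_id
    ORDER BY dashboard_id
    """
).fetchall()

# Form 2: deduplicate (dashboard_id, user_id) pairs in a subquery first,
# then do a plain COUNT(*) per group over the already-distinct rows.
via_subquery = conn.execute(
    """
    SELECT dashboard_id, COUNT(*)
    FROM (SELECT DISTINCT dashboard_id, user_id FROM logs) AS distinct_logs
    GROUP BY dashboard_id
    ORDER BY dashboard_id
    """
).fetchall()

print(direct)        # [(1, 2), (2, 3)]
print(via_subquery)  # [(1, 2), (2, 3)]
```

Whether a given planner actually executes form 2 in one pass depends on the database and its plan; prefixing either query with `EXPLAIN` in the target database is the way to check.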

dccoolgai · on Jan 27, 2014

Crazy to see those numbers on old MSSQL... there's still a whole bunch of stuff they're behind the curve on (JSON support, array support), but it's tough to argue with those benchmarks...

interactive_rep · on Jan 27, 2014

Another reason Postgres could be running very slowly is that it may be spooling the intermediate result to a temp file on the hard disk.

Do you see a performance increase when using SSDs?

hglaser · on Jan 27, 2014

We didn't try SSD drives. It's a good idea, worth a shot for a future post.

stock_toaster · on Jan 28, 2014

You ran this on RDS? (It's mentioned in the blog post.)

I would be worried about external factors, nonstandard/non-default configs, and other such things impacting your tests.

AlterEgo20 · on Jan 28, 2014

Did you use the default Postgres configuration? By default, Postgres uses a very small amount of RAM and is forced to store hash maps on disk, which causes a very significant slowdown.
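To make the "spill to disk when memory is short" point concrete, here is a toy Python sketch. It is not how Postgres implements it: the function `count_distinct_per_group`, its `max_pairs_in_memory` budget and `num_partitions` are all hypothetical. (In Postgres the comparable budget is the `work_mem` setting, the memory a sort or hash operation may use before writing to temporary disk files.) When the set of distinct (group, value) pairs fits in the budget, one in-memory pass suffices. Otherwise the sketch re-scans the input, writes each row to a temp file chosen by hashing its group, and deduplicates each partition separately. That means an extra pass plus disk writes and reads.

```python
import pickle
import tempfile
from collections import Counter


def count_distinct_per_group(rows, max_pairs_in_memory, num_partitions=4):
    """Return ({group: distinct_value_count}, spilled) for (group, value) rows.

    Hypothetical illustration only. `rows` must be re-iterable (e.g. a list),
    because the spill path scans the input a second time.
    """
    # Fast path: a single in-memory pass, as long as the budget holds.
    seen = set()
    for row in rows:
        seen.add(row)
        if len(seen) > max_pairs_in_memory:
            break
    else:
        counts = Counter(group for group, _ in seen)
        return dict(sorted(counts.items())), False

    # Slow path: over budget, so re-scan and spill every row to a temp file
    # picked by hashing its group, so each group lives in exactly one file.
    partitions = [tempfile.TemporaryFile() for _ in range(num_partitions)]
    try:
        for group, value in rows:
            pickle.dump((group, value), partitions[hash(group) % num_partitions])
        counts = Counter()
        for f in partitions:
            f.seek(0)
            part_seen = set()
            while True:
                try:
                    part_seen.add(pickle.load(f))
                except EOFError:
                    break
            # All rows for a group are in this partition, so its counts are final.
            counts.update(group for group, _ in part_seen)
        return dict(sorted(counts.items())), True
    finally:
        for f in partitions:
            f.close()


rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 12), (2, 12), (2, 13)]
print(count_distinct_per_group(rows, max_pairs_in_memory=100))  # ({1: 2, 2: 3}, False)
print(count_distinct_per_group(rows, max_pairs_in_memory=2))    # ({1: 2, 2: 3}, True)
```

Both calls give the same counts; only the second pays for the extra scan and temp-file I/O. A real engine would also partition recursively if a single partition still did not fit, which this sketch skips for brevity.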

toong · on Jan 28, 2014

See related discussion https://news.ycombinator.com/item?id=7114310