Count distinct performance compared on top 4 SQL databases (periscope.io)
31 points by hglaser on Jan 27, 2014 | 7 comments



One reason the subquery version runs about twice as fast on most of the databases (SQL Server, Oracle, and MySQL) is that the database can execute it in a single pass over the data. When the query takes twice as long (without the subquery), it is making a second pass for the count distinct.
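
For anyone who wants to see the two query shapes side by side, here is a minimal sketch using Python's built-in sqlite3 module. The `logs` table, its columns, and the `distinct_counts` helper are made up for illustration; they are not the schema from the blog post, and SQLite is not one of the four databases benchmarked, so this only shows that the two forms return the same answer, not how fast either one is.

```python
import sqlite3


def distinct_counts(rows):
    """Return (direct, subquery) per-dashboard distinct user counts.

    Hypothetical schema: logs(dashboard_id, user_id), both NOT NULL.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE logs ("
        "dashboard_id INTEGER NOT NULL, user_id INTEGER NOT NULL)"
    )
    conn.executemany("INSERT INTO logs VALUES (?, ?)", rows)

    # Form 1: COUNT(DISTINCT ...) directly inside the GROUP BY.
    direct = conn.execute(
        "SELECT dashboard_id, COUNT(DISTINCT user_id) FROM logs "
        "GROUP BY dashboard_id ORDER BY dashboard_id"
    ).fetchall()

    # Form 2: deduplicate in a subquery first, then use a plain COUNT(*).
    subquery = conn.execute(
        "SELECT dashboard_id, COUNT(*) FROM "
        "(SELECT DISTINCT dashboard_id, user_id FROM logs) AS deduped "
        "GROUP BY dashboard_id ORDER BY dashboard_id"
    ).fetchall()

    conn.close()
    return direct, subquery


# Dashboard 1 is seen by users 10 and 11; dashboard 2 only by user 10.
rows = [(1, 10), (1, 10), (1, 11), (2, 10), (2, 10)]
direct, subquery = distinct_counts(rows)
print(direct)
print(subquery)
```

The columns are declared NOT NULL on purpose: COUNT(DISTINCT user_id) skips NULLs, while the deduplicated COUNT(*) would count a NULL user as a row, so the two forms are only interchangeable when the counted column has no NULLs.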


Crazy to see those numbers on old MSSQL... there's still a whole bunch of stuff they're behind the curve on (JSON support, array support), but it's tough to argue with those benchmarks.


Another reason Postgres could be running very slowly is that it may be spooling the intermediate result to a temp file on the hard disk.

Do you see a performance increase when using SSDs?


We didn't try SSD drives. It's a good idea, worth a shot for a future post.


You ran this on RDS? (mentioned in blog post)

I would be worried about external factors, nonstandard/non-default configs, and other such things impacting your tests.


Did you use the default Postgres configuration? By default, Postgres uses a very small amount of RAM and is forced to store hash maps on disk, which causes a very significant slowdown.




