lmcinnes on May 2, 2016 | on: Comparing Clustering Algorithms

I agree that on some level more data sets would be nice, but I felt that they cluttered and obscured the exposition. Instead I used a single synthetic dataset, crafted to have various properties (noise, cluster shape, variable density, non-standard distributions) that will confound many different clustering approaches ... it is meant to be the "hard" case, with all the difficulties and confounding factors rolled into one dataset.
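
To make the idea concrete, here is a rough sketch of how a dataset with several of those properties could be put together. This is not the code behind the dataset in the post; the function name `make_hard_dataset`, the cluster centers, spreads, and point counts are all made up for illustration, and it only covers noise, a non-globular shape, and variable density.

```python
import math
import random


def make_hard_dataset(seed=0):
    """Return (points, labels) for a toy 2D dataset; label -1 marks noise.

    Hypothetical illustration only: all centers, spreads, and counts
    below are arbitrary choices, not values from the original post.
    """
    rng = random.Random(seed)
    points, labels = [], []

    # Variable density: a tight Gaussian blob and a diffuse one.
    blobs = [((-4.0, -4.0), 0.3, 200), ((4.0, 4.0), 1.5, 200)]
    for label, (center, spread, count) in enumerate(blobs):
        for _ in range(count):
            points.append((rng.gauss(center[0], spread),
                           rng.gauss(center[1], spread)))
            labels.append(label)

    # Cluster shape: a thin ring, which is not a compact globular cluster.
    for _ in range(200):
        angle = rng.uniform(0.0, 2.0 * math.pi)
        radius = rng.gauss(3.0, 0.15)
        points.append((4.0 + radius * math.cos(angle),
                       -4.0 + radius * math.sin(angle)))
        labels.append(2)

    # Noise: points scattered uniformly over the whole plotting area.
    for _ in range(100):
        points.append((rng.uniform(-10.0, 10.0), rng.uniform(-10.0, 10.0)))
        labels.append(-1)

    return points, labels
```

Calling `make_hard_dataset(seed=1)` gives 700 points with ground-truth labels, which could then be handed to whichever clustering implementation you want to compare.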

mattnedrich on May 2, 2016

Cool, I think you did a great job. Do you have run time data for each algorithm on that data set?

lmcinnes on May 2, 2016

It's included in the upper left corner of the plots. To be fair, these timings are for the sklearn implementations, some of which are excellent, but I can't speak for the performance of all of them.
