
I agree that on some level more datasets would be nice, but I felt that they cluttered and obscured the exposition. Instead I used a single synthetic dataset, crafted to have various properties (noise, cluster shape, variable density, non-standard distributions) that will confound many different clustering approaches. It is meant to be the "hard" case, with all the difficulties and confounding factors rolled into one dataset.



Cool, I think you did a great job. Do you have runtime data for each algorithm on that dataset?


It's included in the upper-left corner of the plots. To be fair, those timings are for the sklearn implementations, some of which are excellent, but I can't speak for the performance of all of them.



