
Can anyone clarify whether the song data is dense? If it is dense, I'm not sure MapReduce is the right paradigm, mainly because you will eventually reach a point where transfer time overwhelms compute time.



the EN song data is dense in the sense that there are far more "columns" than rows in almost any bulk analysis -- the average song unpacks to ~2000 segments, each with ~30 coefficients, plus global features.

however, in Paul's case here he's really just using MR as a quick way to run a parallel computation across many machines. There's no reduce step; it just takes a single average from each individual song, without correlating anything or using any inter-song statistics.
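The map-only pattern described above can be sketched roughly as follows. This is a toy illustration, not Paul's actual code; the record layout (`song["segments"]`, `"timbre"` coefficient lists) is an assumption loosely modeled on the Echo Nest analysis format:

```python
# Map-only job: each mapper emits one record per song -- the average of
# that song's segment coefficients. No reduce step, no inter-song stats.

def map_song(song):
    """Average the per-segment timbre coefficients for one song."""
    segments = song["segments"]            # ~2000 segments in a real song
    n_coeffs = len(segments[0]["timbre"])  # ~30 coefficients per segment
    sums = [0.0] * n_coeffs
    for seg in segments:
        for i, c in enumerate(seg["timbre"]):
            sums[i] += c
    return song["id"], [s / len(segments) for s in sums]

# Toy usage: two fake songs with two segments each.
songs = [
    {"id": "a", "segments": [{"timbre": [1.0, 2.0]}, {"timbre": [3.0, 4.0]}]},
    {"id": "b", "segments": [{"timbre": [0.0, 0.0]}, {"timbre": [2.0, 2.0]}]},
]
results = dict(map_song(s) for s in songs)
print(results)  # {'a': [2.0, 3.0], 'b': [1.0, 1.0]}
```

Since each song is processed independently, the framework's only real contribution here is scheduling and data distribution, which is consistent with the point above that this isn't using MapReduce's aggregation machinery at all.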



