the EN song data is dense in the sense that there are far more "columns" than rows in almost any bulk analysis -- the average song unpacks to ~2000 segments, each with ~30 coefficients plus global features.
however, in paul's case here he's really just using MapReduce as a quick way to run a parallel computation across many machines. there's no real reduce step: each map task computes a single average per individual song, without correlating anything or using any inter-song statistics.
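to make the "map-only" shape concrete, here's a minimal sketch of that pattern -- the song IDs and segment layout are made up for illustration, not the actual EN format. each song is processed independently, so the reduce is effectively an identity passthrough:

```python
def map_song(song_id, segments):
    """Emit one record per song: the mean of each segment coefficient.

    `segments` is a list of per-segment coefficient vectors
    (hypothetical layout standing in for the real song data).
    """
    n_coeffs = len(segments[0])
    sums = [0.0] * n_coeffs
    for seg in segments:
        for i, c in enumerate(seg):
            sums[i] += c
    return song_id, [s / len(segments) for s in sums]

# toy input: two "songs", each a list of segment coefficient vectors
songs = {
    "song_a": [[1.0, 2.0], [3.0, 4.0]],
    "song_b": [[0.0, 10.0], [0.0, 0.0]],
}

# no cross-song aggregation: one output record per input song
results = dict(map_song(sid, segs) for sid, segs in songs.items())
```

since every song's result depends only on that song, this embarrassingly-parallel shape fits MR trivially -- but any job scheduler that can fan work out over machines would do just as well.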