I'm not sure what I should be impressed by. Maybe there's some real technical feat happening here, but I feel like a basic mad-libs style algorithm could produce something better.
I'm not very familiar with mad-libs, so correct me if I'm wrong. I don't think generating a lyrics passage from an image (with zero hard-coded rules about content, grammar, or anything else) is something you can do with mad-libs.