It is just very hard to measure real things. No two programs are alike. How do you measure success in a software project when even the customer isn't clear about what the result should be?
The real problem with measuring real things is that real things have infinitely many dimensions. You can only measure along a limited number of them, and if you want to make a ranking you can only directly compare one (scalar) dimension at a time.
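To make the ranking point concrete, here is a small sketch. The projects, the three dimensions, and all the scores are made up purely for illustration; the only point is that each scalar dimension gives its own ordering, and that the full score vectors are not totally ordered.

```python
# Hypothetical projects scored on three made-up dimensions (higher is better).
projects = {
    "A": {"speed": 9, "readability": 3, "test_coverage": 5},
    "B": {"speed": 4, "readability": 8, "test_coverage": 6},
    "C": {"speed": 6, "readability": 5, "test_coverage": 9},
}


def rank_by(dimension):
    """Return project names ordered best-first along one scalar dimension."""
    return sorted(projects, key=lambda name: projects[name][dimension], reverse=True)


def dominates(x, y):
    """True if x is at least as good as y on every dimension and strictly better on one."""
    a, b = projects[x], projects[y]
    return all(a[d] >= b[d] for d in a) and any(a[d] > b[d] for d in a)


# Each single dimension produces a different ranking.
for dimension in ("speed", "readability", "test_coverage"):
    print(dimension, rank_by(dimension))

# Comparing all dimensions at once: no project beats another across the board.
pairs = [(x, y) for x in projects for y in projects if x != y]
print(any(dominates(x, y) for x, y in pairs))
```

With these numbers every dimension crowns a different winner, and no project dominates any other, so there is no "best" until you pick one dimension or some weighting, and that choice is itself not something you measured.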
So in the end you will always get overoptimization along the dimensions you are measuring, to the detriment of the ones you are not. I don't think there is a better solution than continuously changing the dimensions of measurement.