> Peeling off the front 8 or 16 elements should be no harder than peeling off th...

lmm · on May 10, 2017

Ultimately what we want is a representation that makes it easy to understand the performance characteristics of a given piece of code (and I'd agree that this is an area where the purist functional tradition has lagged behind). Assembly used to be that representation, but I don't know that it still can be; I've read that on modern hardware the biggest factor affecting performance is cache efficiency. So even seeing the sequence of machine instructions might not be enough to let a developer reason about performance, because two similar-looking instruction sequences can end up with very different performance characteristics depending on how they touch memory (branch prediction can have a similar effect, AIUI).
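To make the cache point concrete, here is a minimal sketch (my own illustration, not taken from any benchmark; the names `make_grid`, `sum_row_major` and `sum_col_major` are made up for this example). Both functions run the same loop body over the same flat buffer and differ only in loop order, so one walks memory sequentially and the other jumps `n` elements at a time. In CPython the interpreter overhead hides much of the gap, so treat the timings as indicative only; the same pair of loops in a compiled language usually shows the effect far more clearly.

```python
import time
from array import array


def make_grid(n):
    # Flat, contiguous buffer of n*n 64-bit ints, laid out row by row.
    return array("q", range(n * n))


def sum_row_major(grid, n):
    # Inner loop advances j, so consecutive reads hit adjacent addresses.
    total = 0
    for i in range(n):
        for j in range(n):
            total += grid[i * n + j]
    return total


def sum_col_major(grid, n):
    # Same body, loops swapped: consecutive reads are n elements apart.
    total = 0
    for j in range(n):
        for i in range(n):
            total += grid[i * n + j]
    return total


if __name__ == "__main__":
    n = 2048  # assumed size; large enough that the buffer exceeds typical L2 caches
    grid = make_grid(n)
    for fn in (sum_row_major, sum_col_major):
        start = time.perf_counter()
        result = fn(grid, n)
        elapsed = time.perf_counter() - start
        print(f"{fn.__name__}: sum={result} time={elapsed:.3f}s")
```

The two functions always return the same sum; whatever difference shows up is in the time taken, which is exactly the kind of thing you can't read off the instruction listing.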

In the long term I hope for a "nice" language (in particular, one in which checked correctness is easy) with explicit performance semantics along the lines of http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf . But yeah, under the current state of the art there is no ideal approach, so there's a tradeoff in practice, even though I don't think there should be one in theory.