In practice, probably not. In theory, there's no reason one couldn't compile a l...

mrkgnao · on May 9, 2017

GHC already does a lot of loop fusion and optimizing away of linked lists (which is good given the unfortunate overdue of cons-lists over nicer things among Haskell users) which can be extended to work with user code using rewrite rules. Libraries like text, bytestring, and vector do exactly the elision of intermediate structures that you talk about, allowing you to convert from a vector to a list, apply some function that annoyingly only works with lists, and convert back into a vector, and have that compile into a single tight for-loop in many cases.

I wonder what the upcoming linear types work will do.

burntsushi · on May 9, 2017

> Peeling off the front 8 or 16 elements should be no harder than peeling off the front 1, though naively you might have to handle each partial case from 1 to 15.

I think I under-specified my requirement. :-) By "16 bytes at a time," I mean, "run a single CPU instruction on those 16 bytes."

But yeah, I get your drift. I can see how it might be theoretically possible. I suppose the key gains might be in how much confidence a programmer can have that their code compiles down to the right set of instructions.

lmm · on May 10, 2017

Ultimately what we want is a representation that makes it easy to understand the performance characteristics of a given piece of code (and I'd agree that this is an area where the purist functional tradition has lagged behind). In the past assembly was this, but I don't know that it can be these days; I've read that on modern hardware, the biggest factor affecting performance is cache efficiency. So even seeing the sequence of machine instructions might not be enough to allow a developer to reason about performance, because two similar-looking sequences of CPU instructions can easily end up having very different performance characteristics (branch prediction can have a similar effect AIUI).

In the long term I hope for a "nice" language (in particular one in which checked correctness is easy) with explicit performance semantics along the lines of http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf . But yeah under the current state of the art there is no ideal approach, so there's a tradeoff in practice even though I don't think there should be one in theory.