
> Peeling off the front 8 or 16 elements should be no harder than peeling off the front 1, though naively you might have to handle each partial case from 1 to 15.

I think I under-specified my requirement. :-) By "16 bytes at a time," I mean, "run a single CPU instruction on those 16 bytes."

But yeah, I get your drift. I can see how it might be theoretically possible. I suppose the key gains might be in how much confidence a programmer can have that their code compiles down to the right set of instructions.
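To make the "peel off the front, then go 16 bytes at a time" idea concrete, here is a rough sketch in portable C. The function name `count_byte` and the byte-counting task are just assumptions for illustration; nothing in this thread specifies them. The inner 16-iteration loop is the part you would hope becomes one vector compare, but the language gives no guarantee that it does, which is exactly the confidence problem.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: count the bytes equal to `needle` in buf[0..len). */
size_t count_byte(const unsigned char *buf, size_t len, unsigned char needle)
{
    size_t i = 0;
    size_t count = 0;

    assert(buf != NULL || len == 0);

    /* Peel: scalar loop over the unaligned front (at most 15 bytes). */
    while (i < len && ((uintptr_t)(buf + i) % 16) != 0) {
        count += (buf[i] == needle);
        i++;
    }

    /* Body: fixed-size 16-byte blocks starting at an aligned address.
     * A compiler may turn the inner loop into a single SIMD compare,
     * but that is an optimization, not something the source promises. */
    for (; i + 16 <= len; i += 16) {
        for (size_t j = 0; j < 16; j++)
            count += (buf[i + j] == needle);
    }

    /* Tail: whatever is left after the last full block. */
    for (; i < len; i++)
        count += (buf[i] == needle);

    return count;
}
```

The only way to know what the body loop became is to read the generated assembly for your compiler and flags, or to write the intrinsics by hand.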




Ultimately what we want is a representation that makes it easy to understand the performance characteristics of a given piece of code (and I'd agree that this is an area where the purist functional tradition has lagged behind). Assembly used to be that representation, but I don't know that it can be these days; I've read that on modern hardware the biggest factor affecting performance is cache efficiency. So even seeing the sequence of machine instructions might not be enough for a developer to reason about performance, because two similar-looking instruction sequences can easily end up with very different performance characteristics (branch prediction can have a similar effect, AIUI).
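A small sketch of the cache point, in C. The function names and the matrix layout are assumptions made up for this example. The two sum functions do the same additions and differ only in loop order, yet on a large matrix the column-order version typically touches a new cache line on almost every access, so it tends to run slower. No timings are claimed here; measure on your own hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Sum an n-by-n matrix stored row by row, walking memory sequentially. */
long sum_row_order(const int *m, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            total += m[i * n + j];
    return total;
}

/* Same arithmetic, but consecutive accesses are n ints apart in memory. */
long sum_column_order(const int *m, size_t n)
{
    long total = 0;
    for (size_t j = 0; j < n; j++)
        for (size_t i = 0; i < n; i++)
            total += m[i * n + j];
    return total;
}

/* Hypothetical check: returns 1 if both traversals produce the same sum,
 * 0 if they differ or if allocation fails. Requires n > 0. */
int traversals_agree(size_t n)
{
    assert(n > 0);

    int *m = malloc(n * n * sizeof *m);
    if (m == NULL)
        return 0;

    /* Fill with small values so the sum cannot overflow a long. */
    for (size_t k = 0; k < n * n; k++)
        m[k] = (int)(k % 7);

    int same = (sum_row_order(m, n) == sum_column_order(m, n));
    free(m);
    return same;
}
```

The results are identical, so nothing in the source tells you which version is the fast one; you have to know the memory layout.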

In the long term I hope for a "nice" language (in particular, one in which checked correctness is easy) with explicit performance semantics along the lines of http://www.cs.cmu.edu/~rwh/papers/iolambda/short.pdf . But yeah, under the current state of the art there is no ideal approach, so there's a tradeoff in practice even though I don't think there should be one in theory.



