Wow. As someone with a background mostly in C/asm, it is rather shocking to see how much extra code compiling higher-level languages produces; summing the contents of a list (array or linked) shouldn't be more than half a dozen instructions in a loop... and then I see things like dynamic allocation. Seriously, wow.
This is not inherent in the process. What you're seeing is the confluence of several factors:
(1) Scheme is dynamically typed, not statically, so some operations need to operate on objects whose properties are not known at compile time and which need to be constructed/wrapped. The same does not necessarily hold for a statically typed language.
(2) C is not really designed to be the target language of a compilation process. In particular, there's no portable access to stack frames, so any mechanism that inspects the stack (in this case, for precise garbage collection) needs to recreate the necessary machinery. If you target a representation that is actually designed for this purpose (such as LLVM), a lot of these problems go away. Likewise, C does not have native support for continuations, tail calls, or a number of other mechanisms that more sophisticated backends do support.
(3) The compiler does not need to generate human readable code; code generation that targets C is generally not concerned with producing minimal output, since the C optimizer will strip away superfluous stuff, so the focus is more on keeping the backend simple.
> so some operations need to operate on objects whose properties are not known at compile time and which need to be constructed/wrapped
But in the given example, shouldn't the compiler be able to see, via a simple analysis, that this function is only being applied to lists of numbers?
> since the C optimizer will strip away superfluous stuff
That's a huge assumption; in practice I have not seen any optimisation that would take that Scheme-compiled C program and turn it into one a C programmer would write (if there were, I think it could certainly make Scheme a lot more popular), and I've been looking at compiler output for over two decades now. There has been improvement, but not that much. There's also the question "why even generate 'superfluous stuff' if it's going to disappear anyway?"
To paraphrase an old saying, "with great flexibility comes great complexity."
> But in the given example, shouldn't the compiler be able to see, via a simple analysis, that this function is only being applied to lists of numbers?
No, that's the beauty and the curse of dynamic languages such as the Scheme discussed here. Of course, you could do special optimizations, such as inlining bindings that never escape their scope, or defining special functions that operate on plain C structures instead of real Lisp data structures, but they add extra complexity and duplicated logic to the compiler. I'm sure those kinds of optimizations are out of scope for this article.
> in practice I have not seen any optimisation that would take that Scheme-compiled C program and turn it into one a C programmer would write
Obviously Scheme code is going to look different from C code; they're different languages. I think the point being made was that it's okay to emit C code that's a little verbose, doing things like defining extra vars and such, knowing that the C compiler will simplify many such things.
> There's also the question "why even generate 'superfluous stuff' if it's going to disappear anyway?"
To keep the compiler's implementation simple and understandable. The more complex output it has to produce, the harder it is to read/write/debug.
I am not talking about optimizations that require understanding of the source language. But, as a simple example, a Pascal-ish language may have the statement
x := (a+b)*(c+d)
compile into something like:
_t1 = a + b;
_t2 = c + d;
_t3 = _t1 * _t2;
x = _t3;
in order to simplify translation from an internal representation such as an AST. Putting such a piece of code back into human-readable form makes the backend more complicated, while C compilers know perfectly well how to optimize away the extraneous variables.
Note that this is just one of many things that contribute to the verbosity of compiler output, not even necessarily a major one in this case.
But what if you want to reduce the list with something other than a sum? With this substrate (and closures) you can separate concerns, increase reuse, and reduce bugs.
In Scheme, tail-call optimization is part of the standard. Without this requirement the generated code would probably be much more similar to what you'd write by hand.
On top of this, the generated code is said to be portable C.
I've skimmed through the papers mentioned in the article years ago, but gave up understanding them. This article is golden if I really want to understand what's going on in a real-world use of the techniques described in those papers.
I respect a person's taste being tuned to minimizing the number of code lines, so the asm output is like a haiku. But I hope others respect another person's taste that happens to be tuned to appreciating some of the things that can be done in such a system. (After all, in 2014 the devices we have in front of us no longer make the case for minimal asm quite so compelling.)