Hm, regarding the tail-call-optimization slides, the example used doesn't look like a tail-recursive call. I just tried compiling the function he provided with gcc at -O1, -O2, and -O3, and none of them produced a tail-optimized call. I rewrote it to match my idea of a tail-recursive call, and sure enough, under -O2 and -O3, gcc tail-optimized it. It didn't, however, with no optimization flags or with -O1. I rewrote it as follows:
long
fact(long x, long acc)
{
    if (x <= 0)
        return acc;
    return fact(x - 1, acc * x);
}
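For reference, once gcc tail-optimizes the accumulator version it effectively becomes a loop with no call instruction at all. This is a sketch of that equivalent loop, not the actual compiler output:

```c
/* Roughly what gcc's tail-call optimization turns fact(x, acc)
   into: the recursive call is replaced by updating the parameters
   and jumping back to the top of the function. */
long
fact_loop(long x, long acc)
{
    while (x > 0) {
        acc *= x;   /* same work as the acc*x argument */
        x -= 1;     /* same work as the x-1 argument   */
    }
    return acc;     /* the base case: x <= 0 */
}
```

You can confirm what your compiler did by inspecting the assembly (e.g. compile with -O2 -S and look for a backward jump instead of a call to fact).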
(edit: the whole slide deck is summarized well on the last slide: "If you do an optimization, test it on real world data. If it’s not drastically faster but makes the code less readable: undo it.")
I think it depends on the version of gcc; I tried compiling his function with 3.4.5 and it didn't look tail-call-optimized, but with -Os on 4.2.3 it did.
You are right that it does not look like what is usually called a "tail call", i.e., the call to 'fact' is not the last evaluated expression. However, this doesn't mean the compiler can't optimize it, provided that the enclosing expression is simple enough. GCC can probably handle arithmetic expressions. I have read somewhere (can't find it now) that the Erlang bytecode compiler also optimizes prepending to a list and some other primitive operations.
This means that in these cases you don't have to explicitly introduce an accumulator variable, because the compiler will create one for you.
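To make that concrete: assuming the function on the slides resembled the usual naive factorial below (an assumption; I don't have the slides at hand), the recursive call sits under a multiply and so is not a tail call as written. GCC at -O2 can nevertheless turn it into a loop by synthesizing the accumulator itself, because multiplication is associative:

```c
/* Naive factorial: the recursive call is NOT in tail position,
   since the multiply by x happens after fact_naive() returns.
   GCC's tail-recursion pass can still rewrite this into a loop
   with a compiler-generated accumulator. */
long
fact_naive(long x)
{
    if (x <= 0)
        return 1;
    return x * fact_naive(x - 1);   /* not a syntactic tail call */
}
```

If you compile this with -O2 -S you should see no recursive call in the output, even though the source form isn't tail-recursive.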