A C standard proposal exists for tail call [0], in the form of "return goto (exp...

haberman · 2024-08-19T16:30:18 1724085018

Thanks for the pointer, I'll have to check this out.

Can you elaborate on how "return goto" is easier to implement? [[musttail]] also ends the lifetime of local objects AFAICS.

One thing I'll mention from a quick scan:

> [The] function called in tail position must have identical type to the callee. This ensures both that the return value does not require any conversion, and also that argument passing space is available and calling convention (if relevant) is maintained.

One complaint I've seen repeatedly about [[musttail]] (which I implemented in Clang) is that this constraint is unnecessarily strict, since some architectures will allow tail calls even for functions that do not perfectly match: https://github.com/llvm/llvm-project/issues/54964

"But then the code would be nonportable." True, but tail call optimization is inherently nonportable, since some targets fundamentally do not support tail call optimization (eg. WASM without the tail call extension).

fuhsnn · 2024-08-19T19:25:35 1724095535

>[[musttail]] also ends the lifetime of local objects AFAICS.

That's good to know. I had this github issue [0] in the back of my mind, as well as witnessing occasions of clang turning [[musttail]] into inner loops, and concluded clang's implementation must be more sophisticated than simply replacing calls with jumps. Just a little paranoia from trying to be serious with compiler dev[1]: fulfilling a laid-out spec feels more sound versus imitating something out there.

>this constraint is unnecessarily strict

I would agree, at least for x86 psABI, it can be pretty elaborative as long as the return value is the same register and argument stack don't exceed what's provided. Debug/profiling side might hate it, though.

[0] https://github.com/llvm/llvm-project/issues/72555 [1] https://github.com/fuhsnn/slimcc/

haberman · 2024-08-19T20:08:13 1724098093

I certainly understand your caution. I don't have intimate expertise with the implementation of musttail in the backend -- when I implemented musttail in Clang, it was piggybacking on an existing attribute from the LLVM IR: https://llvm.org/docs/LangRef.html#call-instruction

That said, my rough understanding is that a tail call ends the lifetime of all objects in the old stack frame. It follows that it is UB to access any objects from the previous stack frame after a tail call, and that would include Gerben's first example in https://github.com/llvm/llvm-project/issues/72555

Your slimcc project looks really interesting, thanks for the pointer.

tines · 2024-08-19T17:03:08 1724086988

> some targets fundamentally do not support tail call optimization

Can't any tail call be rewritten as a loop? Couldn't a WASM compiler without the tail call extension implement it this way?

keithwinstein · 2024-08-19T18:26:52 1724092012

Yes, wasm2c implements the Wasm tail-call feature with trampolines, exactly this way. (https://github.com/WebAssembly/wabt/blob/main/test/wasm2c/ta... has an example.)

Doing it with a trampoline is probably slower than if C really had tail calls. On the other hand, adding "real" tail calls to C would probably require changing the ABI (e.g. to "tailcc" or "fastcc -tailcallopt"), and I think there's some reason to think this would probably impose a penalty everywhere (https://llvm.org/docs/CodeGenerator.html#tail-call-optimizat...).

haberman · 2024-08-19T18:40:13 1724092813

> On the other hand, adding "real" tail calls to C would probably require changing the ABI (e.g. to "tailcc" or "fastcc -tailcallopt")

But [[musttail]] does exactly this while respecting existing calling conventions: https://clang.llvm.org/docs/AttributeReference.html#musttail

keithwinstein · 2024-08-19T21:05:30 1724101530

No -- as discussed upthread, clang's musttail attribute requires the target function to have the same number of arguments as the caller and for each argument to be similar to the corresponding caller argument. That's stricter than the underlying LLVM musttail marker (when targeting the tailcc/swifttailcc calling conventions) and is too restrictive to implement Wasm's tail-call feature (and probably Scheme's, etc.), at least if arguments are getting passed to functions natively.

It would be nice if the more relaxed rules of the LLVM musttail marker with tailcc could be exposed in clang (and gcc). I think that's basically what "return goto" would do.

slaymaker1907 · 2024-08-19T17:56:31 1724090191

It can't be rewritten as loop due to function pointers. Using JS notation to avoid noise:

function logAndCall(statement, func) { console.log(statement); return func(); }

Tail call optimization is actually possible here since we call func in "tail position". It might be unlikely to blow up the stack, but it can definitely happen if you do a lot of continuation passing.

Perhaps more commonly for C++/Rust, tail call optimization would be enormously valuable to have behind the scenes for destructors. It's actually very difficult to implement a linked list in safe Rust that doesn't explode the stack for large lists since no TCO is done for destroying subobjects. You can still avoid stack overflow, but you have to do things like manually enumerating the list.

derefr · 2024-08-19T19:21:25 1724095285

I don't get the problem. The generated code for your example would return the same function pointer + evaluated args list to the wrapping trampoline (i.e. the code that contains the loop), where that loop expects each thing it invokes to return a sum type NextInvoke(func, args) | Return(value).

And that's not a special case for passed-in function pointers; that's what a trampoline always does. Trampolines expect static-invoked tail-calls to be codegen'ed into function-pointer references too.

slaymaker1907 · 2024-08-20T17:05:57 1724173557

A trampoline is less efficient theoretically because it implies that you must check a condition and that you can't make an unconditional tail-call. It's also not quite equivalent since it's a non-local compilation technique requiring the caller to do something differently.

It's completely different than the story with tail recursion which can be essentially reduced to syntactic sugar over a normal loop (just change the parameters to be mutable variables).

metadat · 2024-08-19T17:08:17 1724087297

Only with the aforementioned escape analysis. The function call stack frames serve a critical purpose in most nontrivial logic.

tonyg · 2024-08-19T17:25:56 1724088356

> Can't any tail call be rewritten as a loop?

No. In general tail calls cannot be rewritten into loops.

cobbal · 2024-08-19T18:11:54 1724091114

More specifically, tail recursion is usually easy to turn into a loop. Tail calls can be difficult to turn into loops when they call a different function, or worse a function passed in as a variable.

eru · 2024-08-20T05:23:24 1724131404

To give the standard example:

Consider a state machine where each state is a function, and you transition into a different state by tail-calling another function.

State machines can have arbitrarily complicated graphs, that would be hard to put into a simple loop.

(However, you can do something like a 'trampoline' to remove the mutual recursion.)

nxobject · 2024-08-19T16:19:59 1724084399

As an aside, I’m excited - but also with lots of trepidation - about the new energy in adding new functionality to C. Excited because there are changes and additions that are crying to be made, even clarifying ideas… trepidation because C++’s gung ho update cadence has somehow ended up in wart patched over wart. Especially when feature reacts with feature in an unfelicitous way far earlier than anticipated. I hope the standards process finds a way to be very conservative, really thoroughly test features in large and diverse codebases, rather than just relying on rationales alone, when choosing to include feature.

pjmlp · 2024-08-19T19:49:02 1724096942

What new energy? We have had C89, C99, C11, C17, C23

And yet, no fix in sight for proper strings, arrays, or at very least bounds checked slices, like Dennis Ritchie originally proposed to ANSI/ISO.