How do functions that do not end in ret work?

mananaysiempre · 2025-01-21T04:40:56 1737434456

A function with an unlikely slowpath can easily end up arranged as

    top part
    jxx slow
    fast middle part
  end:
    bottom part
    ret
  slow:
    slow middle part
    jmp end

There may be more than one slow part; the slow parts might be exiled from inside a loop rather than from a simple linear code path, and can themselves contain loops, etc. Play with __builtin_expect and objdump --visualize-jumps a bit and you’ll encounter many variations.
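As a rough sketch of source that tends to produce this kind of layout with GCC or Clang: the function names below (`sum_abs`, `handle_negative`) are made up for illustration, and whether the slow path actually gets moved past the `ret` depends on compiler, version, and flags.

```c
#include <stddef.h>

/* Hypothetical helper for the rare case. noinline (a GCC/Clang
   attribute) keeps it out of line so the hot loop stays compact. */
__attribute__((noinline))
static long handle_negative(long value)
{
    return -value;
}

/* Sum of absolute values, assuming negative inputs are rare. */
long sum_abs(const long *values, size_t count)
{
    long total = 0;
    for (size_t i = 0; i < count; i++) {
        long v = values[i];
        /* Tell the compiler this branch is unlikely to be taken. */
        if (__builtin_expect(v < 0, 0)) {
            v = handle_negative(v);   /* unlikely slow path */
        }
        total += v;
    }
    return total;
}
```

Compiling with something like -O2 and disassembling with objdump -d --visualize-jumps should show whether the slow part ended up after the ret in your build.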

DSMan195276 · 2025-01-21T03:46:49 1737431209

In addition to what others said, I'd simply point out that all 'ret' does on x86 is pop an address off the top of the stack and jump to it. It's more of a "helper" than a special instruction and its use is never required as long as you ensure the stack will be kept correct (such as with a tail-call situation).

dcrazy · 2025-01-21T05:11:44 1737436304

`ret` also updates the branch predictor’s shadow stack. Failing to balance `call` and `ret` can seriously impact performance.

dkersten · 2025-01-21T07:02:23 1737442943

If anyone else is looking for more information on this, like I was, this stack is called the “return stack buffer”.

DSMan195276 · 2025-01-21T14:48:22 1737470902

Right, I didn't want to get into it but definitely using 'ret' "properly" has big performance benefits. My point was just that it won't prevent your code from running, it's not like x86 will trigger an exception if they don't match up.

ack_complete · 2025-01-21T20:34:10 1737491650

RET does more these days. If Intel CET is enabled then it also updates the hardware shadow stack, and the program will crash if RET is bypassed unless the SSP is adjusted. IIRC Windows x64 also has pertinent requirements on how the function epilog restores registers and returns since it will trace portions of the instruction stream during stack unwinding.

duskwuff · 2025-01-21T02:40:02 1737427202

The return is somewhere before the end of the function, e.g.

  loop:
    do stuff
    if some condition: return
    do more stuff
    goto loop

Alternatively, the function might end with a tail-call to another function, written as an unconditional branch.
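A loop of that shape could come from C along these lines. This is only a sketch with a made-up function name (`find_zero`), and the compiler is free to rearrange the loop so the ret lands somewhere else entirely.

```c
#include <stddef.h>

/* Returns the index of the first zero byte.
   Assumes the caller guarantees that one exists. */
size_t find_zero(const unsigned char *p)
{
    size_t i = 0;
    for (;;) {
        if (p[i] == 0)
            return i;   /* the only return, in the middle of the loop */
        i++;            /* "do more stuff", then go around again */
    }
}
```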

jcranmer · 2025-01-21T02:53:06 1737427986

There are things like compiling a tail call as JMP func_addr.
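A minimal C sketch of a call in tail position follows; `scale` and `scale_by_ten` are hypothetical names. With optimization enabled, GCC and Clang will often compile the call in `scale_by_ten` as a jmp to `scale` with no ret of its own, but that is an optimization and not guaranteed.

```c
/* Hypothetical callee, kept out of line (GCC/Clang attribute) so the
   call stays visible in the disassembly instead of being inlined. */
__attribute__((noinline))
long scale(long x, long factor)
{
    return x * factor;
}

long scale_by_ten(long x)
{
    /* Call in tail position: nothing is left to do after it returns,
       so the callee can return directly to our caller. */
    return scale(x, 10);
}
```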

frogsRnice · 2025-01-21T03:36:55 1737430615

Would you not have to use a jump instead of a call for it to be a tail call at all? I.e., otherwise a new frame is created on each call.

nagaiaida · 2025-01-21T08:15:25 1737447325

the call is still in tail position whether or not it reuses the stack frame. there are also more involved ways to do tail call optimization than a direct single-jump compilation when you leave ret behind entirely, such as in forth-style threaded interpreters

frogsRnice · 2025-01-21T09:58:03 1737453483

I guess we're talking about optimising tail recursion. Would there be any reason to refer to a tail call other than that optimisation?

I’ll do some reading on the latter part of your post, thank you!

biodniggnj · 2025-01-21T14:59:53 1737471593

You don’t need recursion to make use of tail call elimination. In Scheme and SML all tail calls are eliminated. GCC also does it, but less often. Still, it’s not recursion that triggers it.

nagaiaida · 2025-01-21T10:09:43 1737454183

i only meant that "optimized/eliminated tail call" is more useful terminology than an uneliminated tail call not counting as "a tail call". i find this distinction useful when discussing clojure, for instance, where you have to explicitly trampoline recursive tail calls and there is a difference between an eliminated tail call and a call in tail position which is eligible for TCO
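for anyone unfamiliar with trampolining, here is a rough sketch of the idea in C rather than clojure. all names (`struct step`, `is_even_step`, `is_odd_step`, `is_even`) are made up for illustration: each function returns a description of the next call instead of making it, and a driver loop performs the calls, so the stack stays flat even if the compiler eliminates nothing.

```c
#include <stdbool.h>
#include <stddef.h>

/* One bounce of the trampoline: either "call next(n)" or, when
   next is NULL, "done, the answer is result". */
struct step {
    struct step (*next)(unsigned n);
    unsigned n;
    bool result;
};

static struct step is_odd_step(unsigned n);

static struct step is_even_step(unsigned n)
{
    if (n == 0)
        return (struct step){ NULL, 0, true };
    /* Would be the tail call is_odd(n - 1); return it as data instead. */
    return (struct step){ is_odd_step, n - 1, false };
}

static struct step is_odd_step(unsigned n)
{
    if (n == 0)
        return (struct step){ NULL, 0, false };
    return (struct step){ is_even_step, n - 1, false };
}

/* The trampoline: keep bouncing until a step reports it is done. */
bool is_even(unsigned n)
{
    struct step s = { is_even_step, n, false };
    while (s.next != NULL)
        s = s.next(s.n);
    return s.result;
}
```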

i'm not sure how commonly tail calls are eliminated in other forthlikes at the ~runtime level since you can just do it at call time when you really need it by dropping from the return stack, but i find it nice to be able to not just pop the stack doing things naively. basically since exit is itself a threaded word you can simply¹ check if the current instruction precedes a call to exit and drop a return address

in case it's helpful this is the relevant bit from mine (which started off as a toy 64-bit port of jonesforth):

  .macro STEP
    lodsq
    jmp *(%rax)
  .endm

  INTERPRET:
    mov (%rsi), %rcx
    mov $EXIT, %rdx
    lea 8(%rbp), %rbx
    cmp %rcx, %rdx     # tail call?
    cmovz (%rbp), %rsi # if so, we
    cmovz %rbx, %rbp   # can reuse
    RPUSH %rsi         # ret stack
    add   $8, %rax
    mov %rax, %rsi
    STEP

¹ provided you're willing to point the footguns over at the return stack manipulation side of things instead

russdill · 2025-01-21T04:41:41 1737434501

Yes, I think the most common is a tail call. There can of course also be several rets in a single function.

to11mtm · 2025-01-21T02:22:17 1737426137

My gut feeling (it's been a while since I've been that low level) is that it's various forms of inlining and/or flow continuation (which is kind of inlining, except when we talk about obfuscation/protection schemes, where you might inline but then do fun stuff on the inlined version).

ngneer · 2025-01-21T04:24:22 1737433462

If compilation uses jmp2ret mitigation, a trailing ret instruction will be replaced by a jmp to a return thunk. It is up to the return thunk to do as it pleases with program state.