Neat Rust Tricks: Passing Closures to C (seantheprogrammer.com)
215 points by rabidferret on Nov 25, 2019 | 66 comments



This is a really well-written article about a common way to trampoline Rust methods into C embedded RTOS tasks, which often work in exactly this way. I've done just this in the service of getting ChibiOS and uC/OS II running Rust code.


Is this a neat trick or just standard operating procedure for calling C from <your favorite lang>? As it was billed as a trick, I was expecting some sort of runtime code generation to pass the data pointer and some jump instruction to jump to the right spot and unpack the data pointer.

Maybe I just overcomplicate things ;-)


I’d say that “standard procedure” would be to do it the same way as it would be done in C: define a struct, allocate one somewhere, then pass a pointer to it as the data pointer. Using the anonymous struct which represents the closure itself seems like skipping a step: the user doesn’t need to spell out which values are stored in the struct.


If the language supports closures which capture variables from their surrounding environment, there's no way around using "the closure itself" as your data object. After all, "the user" is not expected to "know" what any given closure is capturing from the environment; part of the point of closures is implementing a sort of information hiding.


It's standard procedure. I've done the exact same thing when wrapping C APIs into Python using Cython, several times. You pass the Python closure as the void *data and then register a shared generic callback which casts it and calls it. Easy. Getting the memory management right is slightly tricky, but not too bad.

Fun fact: you can't safely do this with ctypes. Since it is called as pure Python, it cannot do watertight Python exception handling in a callback context (because even if you have a try/except block, an exception can always happen right before or after it), and ctypes provides no usable internal way of doing it - it just eats exceptions inside callbacks. This is what motivated me to rewrite Ceph's librbd bindings from ctypes to Cython.


I thought as much, thanks for the confirmation :-)


It does seem quite similar to Haskell FFI code: https://github.com/bobfrank/hasqlite/blob/4e38801d969a43e88b...

The "neat" factor comes from how little type wrangling and unsafe code is needed.


I believe this actually JITs a trampoline with libffi, so only one code pointer is needed, not separate code and data pointers.

(Also hi, go contribute to Nixpkgs again!)


Interestingly this is very similar to how I implemented passing closures into JavaScriptCore as hooks for JS class invocations ("function calls"). [0]

Essentially it's taking advantage of the fact that closures are static methods with "implicit" data pointers. It should be fairly obvious that this is a massive violation of safety and undefined behavior, and most likely to break when debugging symbols etc. are inserted.

The safest way to do this until Rust has figured out a stable-enough-ABI for closure passing would be a thread-local trampoline, I guess. Not very nice..

[0] https://github.com/psychonautwiki/rust-ul/blob/master/src/he...


I read an article by a guy talking about stupid C tricks. One of them was to 'mangle' raw pointers into an index before passing them, and then de-mangle them to get back a raw pointer. The advantages are that you can pass metadata along with the 'pointer', which also allows you to invalidate a pointer: the pointer can't be dereferenced, and the enclosed variables/data aren't accessible and cannot be modified by the target.

For callbacks the overhead likely isn't significant.


Where's the UB? It casts a boxed closure to a raw pointer, and then back to a boxed closure. There's no tricky introspection being done here.


I'm not entirely sure you read the code I'm referring to. There's no box there.


You can pass closures to C as C functions in TXR Lisp, a language I created.

Example:

http://rosettacode.org/wiki/Window_creation#Win32.2FWin64

In this program, a translation of Microsoft's "Your First Windows Program" from MSDN, defun is used to define a WindowProc callback. defun generates a lambda under the hood, which carries a lexical scope.

The lambda is passed directly to Win32 as a callback, which is nicely called for repainting the window. (Or at least, things appear that way to the programmer.)

Setting this up requires a few steps. We need a target function, of course, which can be any callable object.

Then there is this incantation:

  (deffi-cb wndproc-fn LRESULT (HWND UINT LPARAM WPARAM))
The deffi-cb operator takes a name and some type specifications: return type and parameters. The name is defined as a function; so here we get a function called wndproc-fn. This function is a converter. If we pass it a Lisp function, it gives back a FFI closure object.

Then in the program, we instantiate this closure object, and stick it into the WNDPROC structure as required by the Windows API. Here we use the above wndproc-fn converter to obtain WindowProc in the shape of a FFI closure:

  (let* ((hInstance (GetModuleHandle nil))
         (wc (new WNDCLASS
                  lpfnWndProc [wndproc-fn WindowProc]
         ...
The lpfnWndProc member of the WNDCLASS FFI structure is defined as having the FFI type closure; that will correspond to a function pointer on the C side. The rest is just Windows:

  (RegisterClass wc)
 
register the class, and then CreateWindow with that class by name and so on.


Here is another example of callbacks at work from the TXR Lisp test suite: using the C library function qsort to sort a Lisp array of strings.

http://www.kylheku.com/cgit/txr/tree/tests/017/qsort.tl

It's done in two ways, as UTF-8 char * strings and as wchar_t * strings.

What's used as the callback is the function cmp-str which is in TXR Lisp's standard library. A lambda expression could be used instead.

Also tested is the perpetration of a non-local control transfer out of the callback, instead of the normal return. This properly cleans up the temporary memory allocated for the string conversions.


TXR looks interesting. Is there a project README?


There is a boring and poorly maintained home page: http://www.nongnu.org/txr.

And big honkin' manual.


How does this relate to nested functions in C? (And resulting “infectious executable stacks”?)

https://nullprogram.com/blog/2019/11/15/


It doesn't. This is just showing the normal way that callbacks are implemented in vanilla C and how you would make that programming pattern interoperate with Rust closures. Neither one relies on the compiler trickery/runtime code generation described in the earlier article.


The executable stack trick is only required if you want to implement closures that can be called as if they were plain C functions, with only a function pointer and no extra (void *) argument.


It doesn't relate to it at all. The issues around linking to problematic object files mentioned in that article will apply to Rust as well, but that's unrelated to the subject of this article; it's a property of the linker you're using and the toolchain used to compile whatever C dependencies you have.


The problems there don't apply, I believe, because Rust closures don't require an executable stack.


That's correct - a Rust closure generally [1] can't be converted to a function pointer as it requires both code and state.

[1] https://github.com/rust-lang/rust/issues/39817


The whole point of jgtrosh's link is that there is a way to hide data behind a function pointer, so Rust could convert any closure to a function pointer. But it requires writable-and-executable memory, so it's a pretty bad idea (in GCC's implementation, that memory is on the stack, which is an extra bad idea, but I don't think it needs to be).


Definitely.

Technically this can also be done via static code trampolines that are mmap'd as well [1]. That approach has been used on iOS in the past to turn blocks into raw function pointers.

If you have a platform that allows W+X on code (yikes!), you can do [2] as well.

[1] https://github.com/plausiblelabs/plblockimp/blob/master/Sour... [2] https://www.mikeash.com/pyblog/friday-qa-2010-02-12-trampoli...


Anything that doesn't require W+X would need an entire page allocated per closure, wouldn't it?


No, you can of course allocate W+X pages from the OS and put multiple closures in them using a standard userspace memory allocator.

Or if the OS doesn't support W+X allocation at all, then you can have a bunch of tightly packed pregenerated trampolines in the binary.


Right, this is how Objective-C's implementation works, except it keeps around one page of trampolines and remaps that around when necessary to be able to "create" more trampolines on the fly, I believe.


Nope! You'd do something to the effect of:

  clo_code:
  4C8B1501100000  mov r10, [rel clo_code+0x1008]
  FF25F30F0000    jmp [rel clo_code+0x1000]
  0F1F00          nop3
  # one page away...
  struct clo_slot {
    void (*func)(void* _R10, ...);
    void* data;
  };
Edit: to use r10 rather than rotating all the argument registers.


For example, every platform that has a virtual machine with JIT compilation support.


C doesn't have nested functions - they are a GCC extension.


Now call qsort with a closure.


qsort(3) or even ftw(3) is the simple case. You can either dynamically generate trampoline code with exactly bounded dynamic scope (and even do the gcc-style executable stack hack) or simply stash the whole context in some TLS region and completely sidestep the whole issue.

Side point: ftw(3) is a much more interesting Unix API to call from an FFI layer than qsort(3). And I spent about a year pestering people from Sun, telling them they should implement fts_open(3) and friends, because it presents a saner API for FFIs with the same functionality.


And it appears that you succeeded. Awesome!

It seems Solaris has been adding many BSD and, especially, Linux compatibility APIs lately. It seems too little, too late; or perhaps the initiative is part of their effort to EoL Solaris, providing an upgrade path to Linux.


Well in fact I gave up about 10 years ago :)


`void (*)(void *, everything_else)` is in my personal experience a much more common API than that of `qsort` (probably for exactly the point you're pedantically trying to make), so I chose to focus on that API for this article.

There's really no reason to pass a rust closure to `qsort` instead of sorting in Rust. That said, if there's demand for real world use cases that require passing Rust closures to C APIs that take only a function pointer and not a data pointer, I'll be happy to write a follow up.


In any decent language, all functions are first-class, so if you want to use them as callbacks, you need that to work.

That's still true even if the API takes a separate context pointer that is given to your function as an argument.

There is still a function pointer there, and what you'd like to use as a function pointer is a function in your local language, and that's an object with an environment. Even if some instances of it have no environment, the FFI mechanism might want to tack on its own. For instance, the FFI needs to be able to route the callback to the appropriate function. Whatever function pointer FFI gives to the C function, when the C library calls that function, FFI has to dispatch it back to the original high level function. That requires context. Now that context could be shoehorned into that context parameter, but it could be inconvenient to do so; that parameter belongs to the program and to the conversation that program is having with the C API.


Generating native function pointers on the fly:

- is inherently slow, because CPUs have separate data and instruction caches;

- is extra slow in practice because you need a separate allocation for executable memory (unless your stacks and heap are RWX, which is a terrible idea);

- is not portable, requiring architecture- and OS-specific code; and

- is not supported at all in many environments (of varying levels of braindeadness).

For a statically compiled language like Rust, it makes much more sense to use the context pointer.


XSetErrorHandler for instance.


signal, sigaction.


I don't think qsort changes anything about the mechanism described in the blog post, but maybe I'm missing something. (I.e., use qsort_r...)

(qsort is really only for C. Other languages can potentially inline the comparison function, so using FFI for that is kind of insane.)


Gets me thinking: I wonder how good Intel CPUs are with dealing with this sort of thing. Can the CPU detect repeated jumps to comparators and in-line them from there? I’d be interested to see a comparative benchmark.


Use `qsort_r()`, which gives you an extra argument for your closure.


If a Rust closure doesn’t actually close over any variables, it can be coerced to a function pointer. Otherwise, you’re stuck with the workarounds others have mentioned.


I’m sure we could have rustc generate a trampoline function directly on the stack like gcc can. What could go wrong?


Nothing, if your Rust implementation is bug-free ;)


Just stash the data in a stack in thread local storage.

Problem solved.


Then it wouldn't be a lexical closure.


It can be from the perspective of the calling code.

Roughly speaking:

  thread_local! {
    static CBQ: Option<Box<impl FnMut(i32, i32) -> i32>>;
  }

  #[no_mangle]
  extern "C" fn qsort(array: *mut i32, val: usize, callback: impl FnMut(i32, i32) -> i32);

  pub fn rust_qsort(array : Vec<i32>, callback: impl FnMut(i32, i32) -> i32){
    CBQ.replace(Box::new(callback)).unwrap_none();
    
    unsafe {
        qsort(array.as_mut_ptr(), array.len(), &rust_qsort_callback);
    }
    
    CBQ.take().unwrap();
  }

  fn rust_qsort_callback(a: *mut i32, b: *mut i32) -> i32 {
    let callback = CBQ.take().unwrap();

    let (a, b) = unsafe { 
        (*a, *b)
    };

    let result = callback(a, b);
    
    CBQ.replace(callback).unwrap_none();

    result
  }

  fn main() {
    let a = vec![4,5,6,3,2];

    rust_qsort(a, |a, b| {
        if a < b {
            -1
        } else if a > b {
            1
        } else {
            0
        }
    })
  }
ought to work. (There's some fun with generics and panics to solve, but nothing which breaks the premise above.)


This fails horribly if called recursively (or from a signal handler). You need something like:

  wrapped_qsort(/* array,callback */)
    {
    auto tmp = CBQ;
    CBQ = wrap(callback);
    qsort(array.ptr,array.len,cbq_callback);
    CBQ = tmp; /* pop old value from stack */
    }


It'll fail from a signal handler inside qsort, or inside the callback function.

It won't fail recursively - while the second call is happening, the first will be stored on the stack (see the take and replace in the callback shim function)


Your `fn rust_qsort` takes ownership of the vector, so it frees its memory after sorting and it can't be used after sorting in the caller function. And generic `impl FnMut` won't work in `extern "C"`, it only accepts function pointers.


you have reinvented dynamic scoping :).


  This is the TXR Lisp interactive listener of TXR 228.
  Quit with :quit or Ctrl-D on empty line. Ctrl-X ? for cheatsheet.
  1> (with-dyn-lib nil
      (deffi qsort "qsort" void ((ptr (array wstr)) size-t size-t closure))
      (deffi-cb qsort-cb int ((ptr wstr-d) (ptr wstr-d))))
  #:lib-0005
  2> (let ((vec #("the" "quick" "brown" "fox"
                  "jumped" "over" "the" "lazy" "dogs")))
       (prinl vec)
       (qsort vec (length vec) (sizeof wstr)
              [qsort-cb (lambda (a b) (cmp-str a b))])
       (prinl vec))
  #("the" "quick" "brown" "fox" "jumped" "over" "the" "lazy" "dogs")
  #("brown" "dogs" "fox" "jumped" "lazy" "over" "quick" "the" "the")
  #("brown" "dogs" "fox" "jumped" "lazy" "over" "quick" "the" "the")

The lambda is pointless; we could create the FFI closure directly from cmp-str with [qsort-cb cmp-str]. It shows more clearly that we can use any closure.


> If you’re not familiar with C’s syntax, here’s the equivalent signature in Rust

Author is hilarious. Who is familiar with that but not C?


I came to Rust without writing C before. Most of my experience with C comes from problems exactly like this. I doubt I'm alone in this.


Rust is already doomed. The amount of literature being published about either comparisons or compatibility with C is a strong indicator C is here to stay.


If Rust is intended to replace C, wouldn't you expect lots of this sort of literature? i.e. isn't this actually a sign of its _success_?


Also, being able to add Rust to an existing C or C++ codebase was an important design consideration. Big projects like Firefox aren’t just going to re-write millions of lines of code all at once.


And this is why Rust is not going to succeed: it doesn't have great compatibility with C++.


No. It shows that people are still struggling with changing the way in which they write software to the "rust" way. The attitude of falling back to C or using unsafe rust just undermines the premise of the argument of why you should use rust.

This is just like the node.js craze a few years ago - people will rant on trying to justify why you should use rust and write the "rust" way before realising that what they already had worked as intended.

A true replacement for C (when one is finally developed) will remove all of these doubts and back-shadowing behaviour almost instantly (kind of like the react way of ux did)

EDIT: typo


> It shows that people are still struggling with changing the way in which they write software to the "rust" way.

It shows no such thing.

I generally work on relatively small ~1MLOC C++ codebases. There are codebases out there measured in the hundreds of MLOC. These are not the comparatively tiny javascript codebases you find React used in - where additional milliseconds of download / parse / evaluation time has a measurable effect on your user retention statistics. There is no "near instant" at 100M+ LOC scales. There is only incremental rewrites, and incremental rewrites means making your new code talk to your old code, and to other people's existing code.

This means interop with existing C ABIs. There is no such thing as a C "replacement" that can't talk to an existing C ABI, or expose an existing C ABI.

Of course, a C ABI doesn't mean C. It's more frequently C++ in my ecosystem, for example. But it could just as easily be Rust, or any number of other languages capable of exposing a C ABI.


>There are codebases out there measured in the hundreds of MLOC.

I agree with your argument, but I think in practice trying to "port" something with hundreds of MLOC is a losing battle (especially away from C). By the time you finished porting to Rust (or your other language of the week), Rust will likely have come and gone, replaced by something either better or "better".

IMO people should spend less time trying to re-invent the wheel in rust and more time either improving C or the tooling / static analysis for C. It would avoid _so many_ of these issues.


I agree that porting 100MLOC is a losing battle no matter the language. And I'll concede I'm not certain Rust will have the staying power - although I hope it might.

And I'm all for more C tooling. Static analysis, fuzzing, sanitizers, valgrind, clang thread safety annotations, etc. are all wonderful tools I lean on heavily. But these are opt in, patchwork, platform specific, rife with false positives, false negatives, inconsistent, slow, painful to configure and use... I've wanted far more out of my C and C++ tooling than it's been able to give me for many years now. I'll frequently try out new attributes and annotations, only to curse when they fail to handle really trivial edge cases.

Meanwhile, Rust? It already catches things I didn't even realize I wanted to catch. Static checks opt-out into fast dynamic checks opt-out into heisenbugs in auditable unsafe blocks. The defaults are great.

I doubt C or C++'s tooling will reach the state of Rust, as frozen today, within the next decade. Smart people have tried long and hard to improve things, with quite middling results, convincing me it's a hard problem. I'm a bit more optimistic that it might catch up within the next century, but if I'm not long dead by then, I'll almost certainly be long retired. If it eventually does catch up, I suspect it'll have taken more than a few notes from Rust's approach.

I share your wariness of re-inventing the wheel, but the C & C++ static analysis ecosystem has left enough to be desired that I think it's warranted in this case. It's to rust's credit that they aren't re-inventing everything, and e.g. leverage LLVM for codegen, optimizations, debug info generation, etc.


Given the 30+ years of proven security exploits due to memory corruption, the C community has proven multiple times that it doesn't care about those solutions, except when required to do so in certified software.

That Solaris, iOS, and in the future Android pursue hardware memory tagging as a workaround for memory corruption exploits is proof of how bad the situation is in terms of security.


Rome wasn't built in a day.

As for C staying around, unfortunately yes, until we get rid of POSIX based OSes, C will be around.

After all we need to keep those <UNIX clone OS> Security conferences alive. /s


I think a language being highly compatible with C is what would have the greatest potential to replace it. In some ways it's similar to Microsoft's "embrace, extend, extinguish" strategy.



