The Dark Arts of Advanced and Unsafe Rust Programming (rust-lang.org)
181 points by ngaut on Aug 24, 2018 | 25 comments



IMO the title should at least contain "The Rustonomicon"; "The Dark Arts of Advanced and Unsafe Rust Programming" is the subtitle. I wouldn't have clicked through if I had known that the link was to this document.


I recently had to write a lot of unsafe code in C# to interoperate with a driver.

As advanced as modern languages are, their excellent memory models only work within the language. Interoperability requires that you can work with memory directly when needed. (And only when needed.) I don't have a lot of Rust experience, but C# has plenty of tricks that let you avoid unsafe code while still getting close enough to working with real memory.

One thing that would help is some kind of tool that could take C header files, figure out how the structs and functions compile, and generate the Rust / C# structs and function signatures. This is such a time-consuming task to do manually in C#, and novices screw it up all the time.

(I wish I had the time to do more work in Rust. It's a great concept!)


There are a couple of projects that do this for Rust, depending on what your inputs and outputs are:

* https://github.com/rust-lang-nursery/rust-bindgen - Inputs are C/C++ headers, outputs are Rust type definitions and extern functions to interoperate with the types and functions in the headers

* https://github.com/immunant/c2rust - Inputs are C headers and source, outputs are Rust code that is semantically equivalent to the C (modulo bugs, etc.)

* https://github.com/eqrion/cbindgen/ - Inputs are Rust source, outputs C/C++ headers that can be used to interoperate with the types and functions exposed by Rust
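For a sense of what the first of these automates, here is a hand-written sketch of the kind of Rust definitions that correspond to a small C header. The header, the struct names, and `sensor_read` are all hypothetical, and real bindgen output differs in detail (it also emits layout tests and extra derives). The key point is `#[repr(C)]`, which makes rustc use C's field order and padding rules:

```rust
use std::mem::{align_of, size_of};
use std::os::raw::{c_int, c_uchar, c_uint};

// Hypothetical C header being mirrored:
//
//   struct point  { int x; int y; };
//   struct packet { unsigned char tag; unsigned int len; };
//   int sensor_read(struct packet *out);
//
// #[repr(C)] lays fields out in declaration order with C's padding rules,
// so Rust and C agree on the memory layout.
#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct Point {
    pub x: c_int,
    pub y: c_int,
}

#[repr(C)]
#[derive(Debug, Clone, Copy, Default)]
pub struct Packet {
    pub tag: c_uchar, // 1 byte, then padding so `len` is 4-byte aligned
    pub len: c_uint,
}

// The function would be declared like this; it is commented out because
// linking it requires the (hypothetical) C library.
// unsafe extern "C" {
//     pub fn sensor_read(out: *mut Packet) -> c_int;
// }

fn main() {
    // On mainstream targets with a 32-bit int, both structs are 8 bytes.
    println!("Point:  size {} align {}", size_of::<Point>(), align_of::<Point>());
    println!("Packet: size {} align {}", size_of::<Packet>(), align_of::<Packet>());
}
```

Getting this layout right by hand for every struct is exactly the tedious, error-prone part these tools take over.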


None of these are able to handle dynamic header templating properly, e.g. try embedding Guile Scheme in a non-C/C++ language through FFI alone.


bindgen does this, and in particular bindgen is great because it uses libclang to parse the C header files, so it's interpreting them the same way that clang would.

I'm using bindgen in a project to bind the Linux kernel headers, which makes me a little bit uncomfortable because the Linux kernel is only guaranteed to compile with GCC (and may use GCC-specific extensions in describing structure layout) and in particular there's no guarantee that the in-kernel ABI is stable across different compilers. But it seems to work well in practice, as long as I tell bindgen not to attempt to parse literally everything (which it fails at).

That said, if you're interoperating with memory-mapped IO or with memory mapped from a foreign process, the other thing to be very careful about is that almost all languages (including C!) by default only make sure that memory writes are coherent from the viewpoint of other code in that language, and not necessarily from the viewpoint of something looking directly at memory. It's generally permissible for the compiler to write a 64-bit number by writing each 32-bit half separately. It's generally permissible for it to write something and then immediately overwrite it, or to combine multiple writes, and so forth. In addition to structure layout, you probably want volatile read/write operations, same as you'd want volatile pointers in C: https://doc.rust-lang.org/std/ptr/fn.read_volatile.html or the atomic types with the "SeqCst" ordering https://doc.rust-lang.org/std/sync/atomic/index.html .
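A minimal sketch of the volatile calls linked above. There is no real device here: a plain array stands in for a memory-mapped register block, and the register indices `STATUS` and `CONTROL` and the helpers `write_reg`/`read_reg` are made up for the example. Volatile accesses are not elided or reordered by the compiler relative to other volatile accesses, but they are not a substitute for atomics when another thread or process shares the memory:

```rust
use std::ptr;

// Hypothetical register layout: word 0 is STATUS, word 1 is CONTROL.
const STATUS: usize = 0;
const CONTROL: usize = 1;

/// Write `value` to register `index` of the block starting at `base`.
///
/// # Safety
/// `base` must point to at least `index + 1` valid, aligned `u32`s.
unsafe fn write_reg(base: *mut u32, index: usize, value: u32) {
    // Unlike a plain `*p = value`, a volatile write is neither elided nor
    // merged with other volatile accesses by the compiler.
    unsafe { ptr::write_volatile(base.add(index), value) }
}

/// Read register `index`; same safety contract as `write_reg`.
unsafe fn read_reg(base: *const u32, index: usize) -> u32 {
    unsafe { ptr::read_volatile(base.add(index)) }
}

fn main() {
    // A plain array stands in for a memory-mapped register block.
    let mut regs = [0u32; 2];
    let base = regs.as_mut_ptr();

    unsafe {
        // The compiler keeps both writes, in program order, instead of
        // collapsing them into a single store of 0.
        write_reg(base, CONTROL, 1);
        write_reg(base, CONTROL, 0);
        write_reg(base, STATUS, 0xABCD);
        println!("status  = {:#x}", read_reg(base, STATUS));
        println!("control = {:#x}", read_reg(base, CONTROL));
    }
}
```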


You probably won't have too many problems; Google and friends have had a long-running project to get the kernel to compile with clang.

Unfortunately, they're not done yet. A guy I bumped into at a Linux kernel conference once said that their main issue is that some people within the kernel project want to keep GCC as the only way to compile the kernel.


"take C header files, figure out how the structs and functions compile, and generate the Rust / C# structs and function signatures. "

I have been playing with the idea, but often you also need to know what the function does with the pointer you pass in. Is it "in", "out" or whatever? With a lot of functions it would be doable, but there are quite a few Win32 functions where it's really, really hard to do direct interop from C#. For these cases I find C++/CLI wrappers easier to deal with.
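To make the in/out problem concrete, here is a sketch of how a hand-written Rust wrapper can encode knowledge that the C signature does not carry. The C function `get_version` is hypothetical; it is stood in for by a Rust `unsafe extern "C" fn` so the example runs on its own, and the assumption that both pointers are pure out-parameters comes from imagined documentation, not from the signature:

```rust
use std::mem::MaybeUninit;
use std::os::raw::c_int;

// Stand-in for a hypothetical C function from some header:
//   int get_version(int *out_major, int *out_minor);
// Nothing in that signature says the pointers are write-only "out"
// parameters; that knowledge has to come from the documentation.
unsafe extern "C" fn get_version(out_major: *mut c_int, out_minor: *mut c_int) -> c_int {
    if out_major.is_null() || out_minor.is_null() {
        return -1;
    }
    unsafe {
        out_major.write(2);
        out_minor.write(7);
    }
    0
}

/// Safe wrapper that encodes the "out" semantics in its signature: the
/// caller gets values back instead of handing over pointers.
fn version() -> Option<(c_int, c_int)> {
    let mut major = MaybeUninit::<c_int>::uninit();
    let mut minor = MaybeUninit::<c_int>::uninit();
    let rc = unsafe { get_version(major.as_mut_ptr(), minor.as_mut_ptr()) };
    if rc == 0 {
        // Only treat the values as initialized after the callee reports success.
        Some(unsafe { (major.assume_init(), minor.assume_init()) })
    } else {
        None
    }
}

fn main() {
    println!("{:?}", version());
}
```

A generator that only sees `int *` has no way to choose between this shape and an in/out parameter, which is why this step usually stays manual.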


This isn’t really enough. What you really want is something that hoists C memory management into the language you’re using; for example, the constructor for a high-level type would manage allocating memory and calling the constructor function, and the deinitializer would map to the custom free/close/release method. This isn’t really anything that I think can be automated; you’re going to have to write this wrapper yourself or find someone who’s already done it.
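A minimal sketch of that hoisting in Rust, where the deinitializer role is played by `Drop`. The C-style API (`widget_create`, `widget_set_value`, `widget_get_value`, `widget_destroy`) is hypothetical and is implemented in Rust behind `extern "C"` so the example is self-contained; in a real binding these would be declarations, for instance from bindgen:

```rust
use std::os::raw::c_int;
use std::sync::atomic::{AtomicUsize, Ordering};

// ---- Hypothetical C API (normally declared in an extern block) ----
#[repr(C)]
pub struct RawWidget {
    value: c_int,
}

// Counts destroy calls so the example can show that Drop ran.
static DESTROYED: AtomicUsize = AtomicUsize::new(0);

extern "C" fn widget_create() -> *mut RawWidget {
    Box::into_raw(Box::new(RawWidget { value: 0 }))
}

unsafe extern "C" fn widget_set_value(w: *mut RawWidget, v: c_int) {
    unsafe { (*w).value = v }
}

unsafe extern "C" fn widget_get_value(w: *const RawWidget) -> c_int {
    unsafe { (*w).value }
}

unsafe extern "C" fn widget_destroy(w: *mut RawWidget) {
    if !w.is_null() {
        drop(unsafe { Box::from_raw(w) });
        DESTROYED.fetch_add(1, Ordering::SeqCst);
    }
}

// ---- Safe, high-level wrapper ----
/// Owns exactly one RawWidget: `new` calls the C "create" function and
/// Drop calls the matching "destroy", so it is freed exactly once when
/// the wrapper goes out of scope.
pub struct Widget {
    raw: *mut RawWidget, // never null while the Widget exists
}

impl Widget {
    pub fn new() -> Option<Widget> {
        let raw = widget_create();
        if raw.is_null() { None } else { Some(Widget { raw }) }
    }

    pub fn set(&mut self, v: c_int) {
        unsafe { widget_set_value(self.raw, v) }
    }

    pub fn get(&self) -> c_int {
        unsafe { widget_get_value(self.raw) }
    }
}

impl Drop for Widget {
    fn drop(&mut self) {
        unsafe { widget_destroy(self.raw) }
    }
}

fn main() {
    {
        let mut w = Widget::new().expect("allocation failed");
        w.set(42);
        println!("value = {}", w.get());
    } // `w` goes out of scope here, so widget_destroy runs.
    println!("destroyed = {}", DESTROYED.load(Ordering::SeqCst));
}
```

The unsafe calls are confined to the wrapper, so code using `Widget` never touches the raw pointer.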


You might be able to kernelize it in a way where other language features are built on top of it. That's what the House team did for their Haskell-based OS: the unsafe stuff was stuffed into the H Layer.

http://programatica.cs.pdx.edu/House/

Likewise, there are metaprogramming routines that generate C code or interfaces for you from statements in something closer to the host language.


There are many tools to generate C# bindings from header files. I haven't used one in a long time, so I can't remember any names, but google and you will find some. None of them are perfect, so you always need to review the result and change some data types to more standard ones, but at least it saves you writing the bulk of the code.

You usually run into the problem that the header files include other header files, and you get a long dependency chain where somewhere along the way some type is not recognized by the converter. The best way to solve this is to cherry-pick the dependent types from the included header files and put everything into one big flat header before passing it to the tool. Macros can also be a PITA; you have to rewrite them into functions and constants before generating.
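Here is a small sketch of that macro rewrite, shown on the Rust side. The header fragment is hypothetical. Object-like macros become typed constants (the type has to be picked by hand, since the preprocessor has none), and a function-like macro becomes a `const fn` so it still works in constant contexts:

```rust
// Hypothetical C header fragment:
//
//   #define BUF_SIZE            256
//   #define FLAG_READ           (1u << 0)
//   #define FLAG_WRITE          (1u << 1)
//   #define MAKE_VERSION(a, b)  (((a) << 16) | (b))
//
// Object-like macros become typed constants; the types are chosen by hand.
pub const BUF_SIZE: usize = 256;
pub const FLAG_READ: u32 = 1 << 0;
pub const FLAG_WRITE: u32 = 1 << 1;

/// Function-like macro rewritten as a `const fn`, so it can still be used
/// in constant expressions the way the C macro could.
pub const fn make_version(major: u32, minor: u32) -> u32 {
    (major << 16) | minor
}

pub const CURRENT_VERSION: u32 = make_version(1, 2);

fn main() {
    let flags = FLAG_READ | FLAG_WRITE;
    println!("buf={} flags={:#b} version={:#x}", BUF_SIZE, flags, CURRENT_VERSION);
}
```

Unlike the macro, `make_version` evaluates each argument exactly once and has checked argument types.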

You also have http://pinvoke.net/ with almost all Win32 functions wrapped already. Same caveat there, though: review before use, because it's a wiki and the quality varies.


Taking header files and automatically generating Rust FFI code to match is exactly what rust-bindgen does

https://github.com/rust-lang-nursery/rust-bindgen


F# has a type provider system that can facilitate creation of such tools. And of course, it also has strong interoperability with C#.


This is not meant as a criticism of this book so much as of my own reading and comprehension abilities (English is not my mother tongue), but when I attempted to read the book, I found the writing style very off-putting and distracting. In contrast, I usually find documentation written in a formal technical style, like standards and references, to be clear and concise; I don't have to exert mental effort trying to decipher inconsequential verbiage, idioms, and cultural references beyond what I need to understand the material at hand.

Is there a resource like The Rustonomicon, but written in a concise and formal style?


Best bet is probably the API docs? For instance: https://doc.rust-lang.org/std/mem/fn.transmute.html

Not sure how to search to get a collection of the unsafe-related functions, though.
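As a small illustration of the kind of function those API pages cover, here is `mem::transmute` used to reinterpret an `f32` as its bit pattern, next to the safe method `f32::to_bits` that does the same job for this case. The helper name `f32_bits` is made up for the example:

```rust
use std::mem;

/// Reinterpret the bits of an f32 as a u32 using mem::transmute.
fn f32_bits(x: f32) -> u32 {
    // transmute only checks at compile time that the sizes match (4 bytes
    // each); it cannot check that the bits are valid for the target type.
    // Every bit pattern is a valid u32, so this particular use is sound.
    unsafe { mem::transmute::<f32, u32>(x) }
}

fn main() {
    let x = 1.0f32;
    let bits = f32_bits(x);
    // The safe standard-library method gives the same answer.
    assert_eq!(bits, x.to_bits());
    println!("{:#010x}", bits); // prints 0x3f800000 (IEEE 754 single-precision 1.0)
}
```

When a safe method like this exists, it is usually the better choice; transmute is for the cases that have none.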


It's a shame they didn't ever resume work on this. Apparently the guy who wrote it can't work on it any more.


Thankfully, Gankro is now a colleague of mine and can work on it as much as they want again :)


omg does that mean Learning Rust With Entirely Too Many Linked Lists is going to get finished up at some point?



It's still getting commits, by the looks of things[1]. As always, I'm sure they welcome contributions.

[1] https://github.com/rust-lang-nursery/nomicon/commits/master


Ada does, and it has better and clearer conventions for "unsafe" programming (pointer arithmetic, conversions, etc.), but it handles them naturally through its package system, without special keywords and esoteric conventions. Too bad no one will ever give it a chance due to its syntax :(

Rust is nothing but a slapdash C style copy of its semantics.


> Rust is nothing but a slapdash C style copy of its semantics.

Rust is a copy of Ada's semantics for dealing with memory unsafe code? Could you provide an example of what you mean?

> Ada does, and has better and more clear conventions for "unsafe" programming (pointer arithmetic, conversions, etc)

Can you give some examples here of what you mean? I found Interfaces.C.Pointers, but I don't know enough about the language to know if that's what you were talking about.


I meant Rust is a copy of Ada in general; it does not deal with unsafe programming in the same way. Instead, in Ada you do things like instantiate generic packages/procedures such as Ada.Unchecked_Deallocation and Ada.Unchecked_Conversion for the specific conversion or heap deallocation you need. So you are forced to be explicit about every unsafe operation by including such procedures and calling them, but you are not prone to the accidents that come from getting in the habit of wrapping things in perverse "unsafe" blocks every time you meet a well-thought-out restriction of the language.
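For comparison, a sketch of roughly corresponding operations on the Rust side: `mem::transmute` for an unchecked conversion, and `Box::into_raw`/`Box::from_raw` for memory the program frees explicitly. The mapping to the two Ada generics is approximate, and the helper names `to_bytes` and `free_i64` are made up for the example:

```rust
use std::mem;

/// Rough counterpart of instantiating Ada.Unchecked_Conversion for
/// u32 -> [u8; 4]: the bits are reinterpreted without any check.
fn to_bytes(x: u32) -> [u8; 4] {
    unsafe { mem::transmute::<u32, [u8; 4]>(x) }
}

/// Rough counterpart of Ada.Unchecked_Deallocation: take ownership back
/// from a raw pointer and free it.
///
/// # Safety
/// `p` must come from `Box::into_raw` and must not be used afterwards.
unsafe fn free_i64(p: *mut i64) {
    drop(unsafe { Box::from_raw(p) });
}

fn main() {
    // Box::into_raw gives up compiler-tracked ownership, roughly like
    // holding an Ada access value that you must free yourself.
    let p: *mut i64 = Box::into_raw(Box::new(7));
    unsafe {
        *p += 1;
        println!("value = {}", *p);
        free_i64(p); // `p` dangles from here on
    }
    // Byte order is target-dependent: [1, 0, 0, 0] on little-endian targets.
    println!("{:?}", to_bytes(1));
}
```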


> I meant Rust is a copy of Ada in general

Have you ever actually looked at rust? The headline feature of Rust is linear types with checked borrowing, something that Ada does not provide.
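A minimal sketch of what that checking means in practice (strictly speaking, Rust's ownership types are affine: a value can be moved at most once). Values move into functions that take them by value, and a mutable use is rejected while a shared borrow is still live. The commented-out lines are the ones the compiler refuses; uncommenting either one produces a compile error. The function names are made up for the example:

```rust
/// Takes ownership of `v`; the caller cannot use it afterwards.
fn consume(v: Vec<i32>) -> usize {
    v.len()
} // `v` is freed here

/// Borrows the data immutably; the caller keeps ownership.
fn sum(v: &[i32]) -> i32 {
    v.iter().sum()
}

fn main() {
    let mut v = vec![1, 2, 3];

    let total = sum(&v); // the shared borrow ends after this call...
    v.push(total); // ...so mutating `v` is allowed again

    let first = &v[0]; // a shared borrow that is still used below,
    // v.push(5);      // so this mutable use would be rejected
    println!("first = {}", first);

    let n = consume(v); // ownership moves into `consume`
    // println!("{:?}", v); // rejected: `v` was moved
    println!("n = {}", n);
}
```

All of this is checked at compile time, with no runtime cost.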


Plus Ada has a complicated runtime system that Rust doesn't.


Example?



