Rustgo: Calling Rust from Go with near-zero overhead (filippo.io)
282 points by FiloSottile on Aug 15, 2017 | 68 comments



> But to be clear, rustgo is not a real thing that you should use in production. For example, I suspect I should be saving g before the jump, the stack size is completely arbitrary, and shrinking the trampoline frame like that will probably confuse the hell out of debuggers. Also, a panic in Rust might get weird.

> To make it a real thing I'd start by calling morestack manually from a NOSPLIT assembly function to ensure we have enough goroutine stack space (instead of rolling back rsp) with a size obtained maybe from static analysis of the Rust function (instead of, well, made up).

> It could all be analyzed, generated and built by some "rustgo" tool, instead of hardcoded in Makefiles and assembly files.

Maybe define a Go target to teach Rust about the Go calling conventions? You may also want to use "xargo", which is specially built for stripping or customising "std" and for working with targets that lack binary stdlib support.


This is more complex than just a target. Rust has to communicate with the Go runtime environment.

Two main points:

- Go uses a very small stack for goroutines to make them dirt cheap. When you exceed this stack, Go transparently maps in more stack for you. Rust-generated assembly runs on the Go stack, but when it exceeds its stack it expects to explode, like a C program would, because that should be the OS stack. This is a larger problem than you'd think: Rust likes to put a ton of stuff on the stack. That's one of the nice things about Rust: putting _a ton_ of data on the stack is cheap, and it makes ownership simpler.

- Go's system calls and concurrency primitives are cooperative with its runtime. When made, they communicate that the routine can yield to the runtime. Targeted Rust code would _also_ have to make these calls, as would third-party crates.

Again, none of this is impossible: linker directives and FFI magic could import these functions/symbols. But it would also require Go to have a stabilized runtime environment for other languages to link against. Currently, just stating that Go has a runtime is controversial, so I don't expect this to happen soon.


> This is more complex than just a target. Rust has to communicate with the Go runtime environment.

If you want full Go embedding, sure, but this discussion is in the context of TFA, whose stated purpose is the ability to build optimised "pure" sub-functions without having to use Plan 9 assembly.

In that case, what a Go target does is remove the need for a trampoline and for manually fucking around with calling conventions.


> - Go uses a very small stack for goroutines to make them dirt cheap. When you exceed this stack, Go transparently maps in more stack for you. Rust-generated assembly runs on the Go stack, but when it exceeds its stack it expects to explode, like a C program would, because that should be the OS stack. This is a larger problem than you'd think: Rust likes to put a ton of stuff on the stack. That's one of the nice things about Rust: putting _a ton_ of data on the stack is cheap, and it makes ownership simpler.

This point confuses me; if Rust expects to run on a limited stack, why would it expect to put a ton of data on the stack?

> Currently just stating Go has a runtime is controversial

I've never heard any controversy... The Go community certainly call it a runtime, and there's even a "runtime" package. Do folks from VM languages get grumpy because Go's runtime is statically linked?


> This point confuses me; if Rust expects to run on a limited stack, why would it expect to put a ton of data on the stack?

Rust runs on a C stack. While it's not infinite[0], it's in a whole other ballpark than a Go stack, since it's non-growable (Rust used to have growable stacks before 1.0): the default C stack size is in the megabyte range (8MB virtual on most unices[1]), whereas in Go the initial goroutine stack is 2kB (since 1.4; 8kB before). (A quick sketch of the difference follows the footnotes.)

[0] You can set the size to "unlimited"; systems will vary in their behaviour there. On my OS X machine it sets the stack size to 64MB; Linux apparently allows actually unlimited stack sizes, but I've no idea how it actually works to provide that.

[1] I think libpthread defines its own stack size, circa 2MB, so 8MB would be the stack of your main thread and 2MB for sub-threads, but I'm not actually sure.
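
To make those numbers concrete, here's a minimal sketch (my own illustration, not from the article or the parent): a recursion deep enough to blow through a fixed 2kB stack runs fine in a goroutine, because the runtime grows the stack transparently where a fixed C stack would simply overflow.

  package main

  import "fmt"

  // Each frame keeps a little local data, so a hundred thousand frames need
  // far more than the 2kB a goroutine starts with; the Go runtime grows the
  // stack as needed instead of crashing.
  func depth(n int) int {
      if n == 0 {
          return 0
      }
      var pad [128]byte
      pad[n%128] = 1
      return int(pad[n%128]) + depth(n-1)
  }

  func main() {
      done := make(chan int)
      go func() { done <- depth(100000) }()
      fmt.Println(<-done) // 100000
  }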


Maybe this would make more sense in llgo. I don't know how it's structured, but if it uses LLVM's ABI/calling convention and possibly LLVM's segmented stacks, it might be (waves hands) a little easier to integrate with Rust.

As for concurrency and system calls, I'd say Go people would probably get a tremendous amount of value out of just single-threaded Rust code that doesn't do I/O. Like parsers.


Things may have changed since I last looked at it, but there's another issue: pinning.

If you ever want to do more than the most trivial FFI, you'll eventually want to be able to pass types back and forth (usually opaque to either side). AFAIK Go doesn't offer any pinning of its GC'd types, so the collector can move them out from under you.

C# has this beautiful thing where you can pass a delegate as a raw C fn pointer. It makes building interop a wonderful thing but you have to make sure to pin/GCHandle it appropriately.


Folks, stop, you are reinventing cgo now.

Defining a Go target for Rust actually makes sense in the context of replacing assembly (which has no runtime, GC or concurrency connotations), I was just too lazy to do it that way :)


> Folks, stop, you are reinventing cgo now.

This is a good thing, because `cgo` is really bad. No ASLR, always forking. These are completely _insane_ defaults; it manages to be slower than the JNI [1], which is an FFI from a completely managed, stack-based VM universe, not a _compiled_ language.

Somehow a compiled language calling a static binary manages to be slower than a dynamic language's runtime calling a static binary...

`cgo` isn't doing _anything right_. It is doing a lot of things wrong.

[1] https://lwn.net/Articles/446701/


No, if you think there are problems with cgo, the solution is to address those problems, not reinvent it. If you have performance concerns, file a bug or comment on an existing one.

Also, I don't see any reason why JNI should be slower than cgo: Go has its own scheduler that adds overhead to cgo and needs to switch stacks, whereas Java doesn't have to deal with such things.


> No, if you think there are problems with cgo, the solution is to address those problems, not reinvent it. If you have performance concerns, file a bug or comment on an existing one.

Well, sometimes the solution is to reinvent it and provide another solution. Sometimes a project has a specific goal which might preclude them from using your idea, or the people involved just have a slightly different vision.

Sometimes, if you believe the existing project is wrong from the ground up, the solution is to reinvent it as something else. Sometimes it doesn't pan out, sometimes it does. That's the beauty of open source.


>If you have performance concerns, file a bug or comment on an existing one.

Excuse my language, or don't. But the Go maintainers really don't give a shit about improving their language's performance. Also, FFI ASLR is disabled for debugging simplicity, which I can only imagine means they debug by reading literal core dumps by hand.

Furthermore I'd rather not donate my time and energy to a company as large as Google.

> Also, I don't see any reason why JNI should be slower than cgo: Go has its own scheduler that adds overhead to cgo and needs to switch stacks, whereas Java doesn't have to deal with such things.

The JVM has its own scheduler to keep locks fair; also, OpenJDK does green threading in its runtime to allow for GC cycles on JIT'd code. I'm pretty sure Oracle and Azul do as well, since JIT execution/cleanup requires stack swapping.


That article comment is from 2011; is cgo still slower than JNI?


According to this old SO comment[0] JNI calls on x86_64 have an overhead of about 6ns.

On my machine (2010 MBP @ 2.4GHz), cgo calls have an overhead of ~120ns.

[0] https://stackoverflow.com/questions/13973035/what-is-the-qua...
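
For what it's worth, a number like that can be reproduced with the standard benchmark harness; here's a rough sketch of my own (not the exact benchmark behind the figure above). Note that cgo can't be used directly in _test.go files, so the C call lives in a regular file:

  // nop.go
  package cgobench

  /*
  static void nop(void) {}
  */
  import "C"

  // callNop makes a single do-nothing cgo call.
  func callNop() { C.nop() }

  // nop_test.go
  package cgobench

  import "testing"

  // The ns/op reported by `go test -bench=.` is essentially the per-call
  // cgo overhead, since the C function does nothing.
  func BenchmarkCgoCall(b *testing.B) {
      for i := 0; i < b.N; i++ {
          callNop()
      }
  }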


I mean, kinda but not?

Based on https://github.com/golang/go/issues/12416 / https://golang.org/cmd/cgo/ :

> However, C code may not keep a copy of the Go pointer after the call returns.

It looks like they've just punted on the pinning issue and not allowed code that does it.

Having written a ton of FFI code across a wide range of languages, I can say this type of restriction means you're not going to be able to implement certain things, which is unfortunate.
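
To illustrate the restriction, here's a minimal sketch of my own (not from the cgo docs) of what the rule does and doesn't allow:

  package main

  /*
  #include <string.h>

  // Allowed: the C side only touches the Go-provided buffer during the call.
  static void fill(char *buf, size_t n) { memset(buf, 'x', n); }

  // Not allowed: stashing the pointer for use after the call returns, e.g.
  //   static char *saved;
  //   static void keep(char *buf) { saved = buf; }
  // would violate the rule quoted above.
  */
  import "C"

  import (
      "fmt"
      "unsafe"
  )

  func main() {
      buf := make([]byte, 8)
      C.fill((*C.char)(unsafe.Pointer(&buf[0])), C.size_t(len(buf)))
      fmt.Println(string(buf)) // "xxxxxxxx"
  }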


I haven't tried Rust yet, but I've been building libraries in Ruby, Node, and Python that call into a shared Go core, and my experience has been that the best approach is to simply compile static executable binaries for every platform, then call out to them in each language via stdin/stdout. I tried cgo, .so files, and the like, but this was a lot more trouble and had issues on both Windows and Alpine-flavored Linuxes.
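
For anyone curious what that looks like on the Go side, here's a rough single-shot sketch; `process` is a stand-in for the shared core, not code from my actual libraries:

  package main

  import (
      "encoding/json"
      "os"
  )

  type request struct {
      Input string `json:"input"`
  }

  type response struct {
      Output string `json:"output"`
  }

  // process stands in for whatever the shared Go core actually does.
  func process(s string) string { return s }

  func main() {
      var req request
      if err := json.NewDecoder(os.Stdin).Decode(&req); err != nil {
          os.Exit(1) // on failure: output nothing, exit non-zero
      }
      json.NewEncoder(os.Stdout).Encode(response{Output: process(req.Input)})
  }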

Is there some issue with this approach that I'm missing? Is the additional process overhead really enough that it's worth bending over backwards to avoid it?


As others have mentioned, it's too slow for what the author was trying to achieve. He had a section where he ruled out cgo for performance reasons, so if that's too slow, spawning a child process will be much slower. That may not matter for your use case, but the author is clearly aiming for almost no overhead considering how easy it is to use cgo to call into Rust.

The thing that should probably be said is that the difficulty is all on the Go side. Rust doesn't have any of the clumsiness that Go has when interacting with other languages. It's fully fluent in the lingua franca of FFI (the C ABI). If you were integrating your Ruby, Node or Python code with Rust instead of Go, there are nice libraries [1][2][3] that make it simple, easy, and very low-overhead.

For users of these scripting languages, Rust is a nice tool to keep in your back pocket to pull out in the rare cases where you're not getting the performance you need. It means being able to choose your tools based solely on developer ergonomics and existing team knowledge, knowing that in the rare cases where you do need to do something computationally intensive, you can drop to Rust, push everything through a Rayon parallel iterator, write the performance-sensitive logic, and push the result back. It's also really useful to use Rust in environments like Lambda/Cloud Functions that only support those scripting languages, since those environments tend to charge based on memory and CPU time, and Rust makes it easy to get by with a minimum of both.

[1] https://github.com/tildeio/helix (Ruby)

[2] https://github.com/neon-bindings/neon (Node)

[3] https://github.com/pyo3/pyo3 (Python)


Well, the stated goal was to use Rust for small hotspots. These hotspots could take very little time but be called a lot, so the overhead of creating a process / communicating with another process can be quite large (think of a function called in a tight inner loop).


That sounds like it could be a nightmare for error handling! How do you work with that? And aren't a great many performance benefits lost when you restrict Go to stdout?

I would think it would be better overall to just create a local http server in Go and use that instead. Or sockets if you're feeling up to it.


The only performance issue with stdout is if you write to a console and the console is rendering it. For example, printing 100000 lines of `Hello World!` to stdout takes 2.134 seconds when output to a console on my computer, 43 milliseconds when redirected to `nul` (the equivalent of > /dev/null), and 276 milliseconds when redirected to a file. So saying stdout is a performance issue is just FUD. Now, there are still issues with error handling, but those can be solved by implementing a message protocol over stdin/stdout. You also have a cost associated with launching the binary, but that can similarly be addressed with a message protocol over stdin/stdout, allowing the same running instance to serve multiple requests.
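
A rough sketch of that last idea, a single long-lived process answering newline-delimited requests so the launch cost is paid only once (illustrative only; uppercasing stands in for real work):

  package main

  import (
      "bufio"
      "fmt"
      "os"
      "strings"
  )

  func main() {
      in := bufio.NewScanner(os.Stdin)
      out := bufio.NewWriter(os.Stdout)
      for in.Scan() {
          // One request per line in, one response per line out.
          fmt.Fprintln(out, strings.ToUpper(in.Text()))
          out.Flush() // flush per message so the caller isn't left waiting
      }
  }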


The executable has really simple output: it either works and outputs JSON, or doesn't and outputs nothing, so there's no difficulty with error handling. I guess I can understand wanting tighter integration for more complex scenarios though... An HTTP server is an interesting idea, but could you run into issues with ports being restricted on production servers?

I'm not seeing any performance issues with stdout, but I'm also not writing much data.


> but could you run into issues with ports being restricted on production servers

Sorry, what? Just make the port configurable?


It was a question (note the question mark)... I don't see the need for snark.

Anyway, for my purposes, this wouldn't work, since the executable is embedded in libraries that are meant to run anywhere without any configuration. But yeah I could see that being fine under other circumstances I guess.


No snark. Just puzzled. You can still just make it configurable. Or just pick a random port and communicate it with your child process.


You might be interested in this plugin framework for Go that communicates over stdin/stdout: https://github.com/natefinch/pie


Building the Rust code into a .syso file might make the (user) build process easier here. This mechanism is used for the race detector (based on TSan), and there's an example of building and using one in the dev.boringcrypto branch. It would require a package author to create .syso files for all GOOS/GOARCH combinations they care about, although GOOS might not matter depending on whether any syscalls are made from Rust.

https://go-review.googlesource.com/c/55472


This is crazy. I love it.


Language interop in 2017 is still pretty dismal.

C is still the common denominator; you'd think it'd be easy, but it's hard. Years ago, when LLVM was showing promise and Google was going to get Python running on top of it, I was hopeful.

I guess nowadays it's a better design to run separate processes and have your languages communicate out of process (pipes, http, etc) rather than in-process.


  Go strives to find defaults that are good for its core use
  cases, and only accepts features that are fast enough to be 
  enabled by default, in a constant and successful fight
  against knobs
Made me chuckle.


I loved that line too.

"A constant and successful fight against knobs" really gets at a lot of what makes Go (often) a joy to use.


If you're already writing Rust, why would you even bother writing Go?


The same reason why Go is used to write many applications where Rust is not.

- Fast compilation

- Multi platform

- Great tooling

- Good std lib

- Easy learning curve

- Fast enough for most scenarios

- Good concurrency model


> Great tooling

With the caveat that I'm still fairly new to Go, I'm not sure I'd call this a complete win for Go over Rust. With regard to editing tools, yes, Go has the edge; completion, formatting, and function lookup are all still better than anything Rust has to offer, although rustfmt has come leaps and bounds from where it used to be, and RLS looks very promising. With regard to what I'll call "infrastructure" tooling (building, packaging, testing, etc.), Rust absolutely wins, hands down. Cargo is miles ahead of anything I've heard about in the Go world, which currently has multiple commonly used tools for dependency management (none of which are nearly as good as Cargo), and in the few weeks I've been writing Go I've already encountered Makefiles and custom shell scripts for common build tasks that would be a breeze with Cargo. That's not to say there aren't equally good options out there for Go, but if they exist, they don't seem to be universally used, which limits their usefulness.

More on a personal-opinion level, I also utterly despise the way GOPATH works. I generally try to group the projects I work on by category rather than by language (e.g. ~/code/projects for side projects, ~/code/forks for open source repos I contribute to but don't maintain, ~/code/scratch for simple projects in each language I use when I want to try out a package, etc.), which GOPATH is completely incompatible with: I'd need to add each of them to my GOPATH, put all my Go projects of each type into a "src" directory in each of them, and then either move all my non-Go projects there or have a bunch of Go-specific directories littered around in each of them. If I don't put my Go projects in a "src" directory inside one of the paths on my GOPATH, I can't even use gorename to change the name of a variable or function, which is frankly ridiculous. This isn't to say that my way of organizing my projects is the "best" way, but I feel like enforcing a directory structure outside of the project the tool is being used in is borderline hostile to the user.


Thankfully, https://github.com/golang/dep is almost stable now and should be merged into tip by the 1.10 release; after that, a reform of GOPATH is planned, so a lot of this nastiness should be gone.


Nice! I've looked into dep, and it was clear to me that it would eventually be the standard; I just hope that everyone switches to it once it's released as part of the official tools.


- Fast compilation

In return, you spend more time debugging since there are no interesting static guarantees.

- Multi platform

So is Rust? Even if it weren't, you've already tied yourself to the platform support of Rust at this point.

- Great tooling

Not familiar enough to comment on this one.

- Good std lib

I can't take Go's stdlib seriously with the way they handle errors.

- Easy learning curve

Despite using it for a year I still have to google Go's syntax and semantics daily. By far the least consistent and hardest to learn general purpose language I have touched.

- Fast enough for most scenarios

Sure.

- Good concurrency model

This is literally just wrong. How can you claim to be serious about concurrency when you have no concept of immutability in your language?


- - Fast compilation

- In return, you spend more time debugging since there are no interesting static guarantees.

I spend almost no time debugging my Go code... It generally either works correctly, or fails to compile... and in the cases where it doesn't work correctly, I'd prefer to have a test that makes it obvious what went wrong, and then make the test pass.

- - Great tooling

- Not familiar enough to comment on this one.

Go's tooling is one of the reasons I use it, esp. the very strict formatting style

- - Good std lib

- I can't take Go's stdlib seriously with the way they handle errors.

I prefer the way Go handles errors, because it makes all control flow visible by default.

- - Easy learning curve

- Despite using it for a year I still have to google Go's syntax and semantics daily. By far the least consistent and hardest to learn general purpose language I have touched.

I'd seriously question this: I was competent in the language after about a week, to the point of just having to look up package-specific stuff. Granted, I have a lot of background in C-family languages, but still...

- - Good concurrency model

- This is literally just wrong. How can you claim to be serious about concurrency when you have no concept of immutability in your language?

Share memory by communicating, don't communicate by sharing memory. Yes, it's completely different from what most people are used to, but it's a valid paradigm. I'd go look up "communicating sequential processes" and do some reading if I were you.
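
To make that concrete, here's a tiny illustrative example (mine, not from any Go docs) where the counter is owned by exactly one goroutine and everyone else reaches it only through channels, so neither locks nor immutability are needed:

  package main

  import "fmt"

  func main() {
      inc := make(chan struct{})
      total := make(chan int)

      // The counter lives in a single goroutine; it is "shared" only by
      // communicating over the two channels.
      go func() {
          n := 0
          for range inc {
              n++
          }
          total <- n
      }()

      for i := 0; i < 100; i++ {
          inc <- struct{}{}
      }
      close(inc)
      fmt.Println(<-total) // 100
  }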


> Share memory by communicating, don't communicate by sharing memory.

I suspect the parent was referring to the fact that Go just has this as a convention: there's no static guarantee that the programmer gets it right, and it's undefined behaviour if they don't (fortunately the race detector exists). Interestingly, Rust does provide guarantees about this sort of concurrency pattern, allowing one to get stronger control over sharing while using Go/CSP-like channels.
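
For example (my own sketch, not the parent's), the race detector will typically flag an unsynchronized version of that kind of sharing when run with `go run -race`:

  package main

  import "time"

  func main() {
      n := 0
      go func() { n++ }() // write from another goroutine, no synchronization
      n++                 // concurrent write: a data race
      time.Sleep(10 * time.Millisecond)
      println(n)
  }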


Yes, but saying it doesn't have a good concurrency model because it's missing a feature that's arguably irrelevant to the concurrency model the language recommends is pointless, and seemingly serves no purpose other than to attack the language.


Having some "feature" that means the concurrency model is sound (aka without undefined behaviour) isn't irrelevant to concurrency models. (I quoted feature because there's a variety of different ways to get this guarantee, some of which don't feel like language features, per se.)

On the other hand, it's fair that the convention/tooling is usually good enough; this is a stronger argument and probably a better one to be making than a dubious one about being irrelevant.


> Share memory by communicating, don't communicate by sharing memory. Yes, it's completely different from what most people are used to, but it's a valid paradigm. I'd go look up "communicating sequential processes" and do some reading if I were you.

LOL this is the classic sort of condescension from the Go community that makes the programming languages community rage. There is a lot of irony here, because Go is basically founded on an anti-intellectual disregard for all previous research.


And coming in and trashing a programming language is any better? I wasn't attempting to be condescending; CSP was new to me when I was starting out with Go. I'm sorry if my attempt to give you a reference point for looking up something that you seemingly didn't understand came across poorly.


Why do you think I don't understand CSP? These things are not so novel as you would think from the Go community...


Because you seemed to be saying that without immutability, concurrency is impossible?


I'm guessing this is more intended to facilitate using Rust in projects primarily written in Go rather than vice-versa.


In my and many other opinions, Go is practically superior for the 90% case.


Go is probably one of the worst and most useless languages invented in the last 20 years lol.


Go deserves a lot of hate because they set out to create a language from scratch without any legacy baggage, but instead of learning from the mistakes of the past and consulting people who actually know a thing about programming languages, they repeated all the same mistakes and then some.


-


I think you misread the parent comment. :)


What has this got to do with Rust? Nothing. He could have called any C library and it would have been exactly the same. I am pretty sure there is a crypto library or two written in C.

He is just writing a more direct manual version of CGo in assembly that bypasses a lot of what CGo does, to be much faster.

> Before anyone tries to compare this to cgo

The only meaningful message in this blog is that it's possible to write a faster CGo; that's it. Comparing it to CGo is the only useful possible outcome, but...

> But to be clear, rustgo is not a real thing that you should use in production. For example, I suspect I should be saving g before the jump, the stack size is completely arbitrary, and shrinking the trampoline frame like that will probably confuse the hell out of debuggers. Also, a panic in Rust might get weird.

So when you actually fix all those things you might be back where CGo was at the beginning.

This guy comes across as a classic "but i wanna be cool" hacker who discovers that when you bypass all the normal protections in a library and make some kind of direct custom call, things can be faster.

I guess so what?


Only comment which actually realized what the article was about and...downvoted. Thanks HN-retards.


Is rust actually faster than go? I had no idea.


Rust has an explicit goal of zero-cost abstractions, while Go will allow more expensive abstractions in the service of simplicity.


Go makes you write complex code so that they can be lazy and have a simple compiler.


Whereas Rust makes you write complex code because they put a lot of work into making a sophisticated compiler. It's a strange case of convergent evolution.


Rust code I've written has been fairly straightforward, so not sure why you think Rust is complex.


More complex code that then actually does the thing you intend it to do rather than having subtle concurrency bugs (among other sorts).


For the kind of numerical code here, it's not going to be significantly faster.


curve25519-dalek is 3X faster than the equivalent Go code (i.e. in the stdlib, authored by agl).

This is for a few reasons:

Rust has support for 128-bit integers (i.e. u128) which allows for faster arithmetic when operating on what are effectively multi-u128 bignums in an elliptic curve library.

LLVM is generally more sophisticated about optimizations. It wasn't until recently that Go had an SSA-based compiler, so its optimizer isn't nearly as sophisticated as LLVM's, which has been developed for many years now.


There is also gccgo, taking advantage of GCC's backend.

There is also some ongoing work on an LLVM-based compiler for Go, and LLVM now has officially supported Go bindings.


Go can and should fix the integer problem here. George Tankersley had a whole talk at Crypto Village this year about what a nightmare it is to write competitively fast curve code in Go.


My information may be out of date, but last time I poked through the Go compiler, it did no vectorization or anything equivalent to march=native. So there's plenty of opportunity for Rust to be faster.


As always, general benchmarks only give you limited insights, but this might be interesting:

Rust vs Go http://benchmarksgame.alioth.debian.org/u64q/compare.php?lan...

Rust vs C http://benchmarksgame.alioth.debian.org/u64q/rust.html


You can tweak Go to make it as fast or faster, but, out of the box, the runtime is a bit slower.

Building is faster, though, so that's nice.


what are the things that you can tweak?


Most Go programs can benefit a lot from smarter memory management, which puts less strain on the GC and allows for certain optimizations. Things like refactoring your function to write into a pre-allocated buffer instead of returning a newly-allocated object. Or using a concrete type like `*os.File` instead of `io.Writer` so that the compiler can eliminate a heap allocation.
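
For example (a sketch of my own, not from any particular codebase), the same transform written both ways:

  package main

  // double allocates a fresh slice on every call; the result typically
  // escapes to the heap and becomes work for the GC.
  func double(xs []int) []int {
      out := make([]int, len(xs))
      for i, x := range xs {
          out[i] = 2 * x
      }
      return out
  }

  // doubleInto writes into a caller-provided buffer, so a hot loop can
  // reuse the same memory and avoid allocating at all.
  func doubleInto(dst, xs []int) {
      for i, x := range xs {
          dst[i] = 2 * x
      }
  }

  func main() {
      xs := []int{1, 2, 3}
      buf := make([]int, len(xs)) // allocated once, reused across calls
      doubleInto(buf, xs)
      println(buf[0], buf[1], buf[2])
      _ = double(xs)
  }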

You can also eliminate some slice bounds-checking by "asserting" the size of the slice outside the hot path; see http://www.tapirgames.com/blog/go-1.7-bce
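
The trick described there looks roughly like this (my own minimal sketch of the pattern, not code from that post):

  package main

  // sum8 adds the first 8 bytes of s. The `_ = s[7]` line asserts up front
  // that s has at least 8 elements, which lets the compiler drop the
  // per-element bounds checks inside the loop.
  func sum8(s []byte) int {
      _ = s[7]
      total := 0
      for i := 0; i < 8; i++ {
          total += int(s[i])
      }
      return total
  }

  func main() {
      println(sum8([]byte{1, 2, 3, 4, 5, 6, 7, 8}))
  }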

Go also has a great profiler for discovering exactly which functions/statements are causing slowdown or allocating excessive memory.

At the end of the day, though, I don't think most Go programs can be optimized to run as fast as their Rust equivalents (without dropping into asm, at least). There's just too much overhead that you aren't allowed to disable.


> At the end of the day, though, I don't think most Go programs can be optimized to run as fast as their Rust equivalents (without dropping into asm, at least)

To be clear, I meant programs that don't rely on things like concurrency or networking. For example, I think editing binary blobs (like images) would be just as fast as Rust, if I remember some of the research I did correctly. I don't have any sources on that, and it was a while back, so it's kinda irrelevant, I guess.



