In short, the maximum possible speed is the same (+/- some nitpicks), but there can be significant differences in typical code, and it's hard to define what's a realistic typical example.
The big one is multi-threading. In Rust, whether you use threads or not, all globals must be thread-safe, and the borrow checker requires memory access to be shared XOR mutable. When writing single-threaded code takes 90% of the effort of writing multi-threaded code, Rust programmers may as well sprinkle threads all over the place, regardless of whether that's a 16x improvement or a 1.5x improvement. In C, the cost/benefit analysis is different. Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/OpenMP. The risk of having to debug heisenbugs means that code typically won't be made multi-threaded unless really necessary, and even then it's preferably kept to simple cases or very coarse-grained splits.
To be honest, I think a lot of the justification here is just a difference in standard library and ease of use.
I wouldn't consider there to be any notable effort in making threads build on target platforms in C relative to normal effort levels in C, but it's objectively more work than `std::thread::spawn(move || { ... });`.
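For concreteness, a minimal sketch of the Rust side (just std, nothing platform-specific): `move` hands ownership of the captured data to the new thread, and the borrow checker guarantees nothing else can race on it.

```rust
use std::thread;

fn main() {
    let data = vec![1, 2, 3];
    // Ownership of `data` moves into the thread, so no other
    // code can touch it concurrently.
    let handle = thread::spawn(move || {
        let sum: i32 = data.iter().sum();
        println!("sum = {sum}");
    });
    handle.join().unwrap();
}
```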
Despite the benefits, I don't actually think the memory safety really plays a role in the usage rate of parallelism. Case in point: Go has no memory safety guarantee under concurrency, with both data races and atomicity issues being easy to introduce, and yet it relies much more heavily on concurrency (with the degree of parallelism managed by the runtime) and with much less deliberation than Rust. After all, `go f()` is even easier.
(As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
He straight ported some C code to Rust and found the Rust code outperformed it by ~30% or something. The culprit ended up being that in C, he was using a hash table library he'd been copy-pasting between projects for years. In Rust, he used BTreeMap from the standard library, which turns out to be much better optimized.
This isn't evidence that Rust is faster than C. I mean, you could just backport that B-tree map to C and get exactly the same performance in C code. At the limit, I think both languages perform basically the same.
But most people aren't going to do that.
If we're comparing normal Rust to normal C - whatever that means - then I think Rust takes the win here. Even Bryan Cantrill - one of the best C programmers you're likely to ever run into - isn't using a particularly well-optimized hash table implementation in his C code. The quality of the standard tools matters.
When we talk about C, we're really talking about an ecosystem of practice. And in that ecosystem, having a better standard library will make the average program better.
The only real question I have with this is: did the program have to meet any specific performance target?
I could write a small utility in python that would be completely acceptable for use but at the same time be 15x slower than an implementation in another language.
So how do you compare code across languages when neither version was written for performance, given that one may happen to use some set of functions that favours that language in that particular app?
I think to compare, you have to at least have the goal of performance for both when testing. If he had needed his app to be 30% faster he would have made it so, but it didn't need to be, so he didn't. Which doesn't make it great for comparison.
Edit: I also see that your reply to the person above was specifically about the point that the libraries by themselves can help performance with no extra work, and I do agree with you there.
Honestly I'm not quite sure what point you're making.
> If he needed his app to be 30% faster he would have made it so
Would he have? Improving performance by 30% usually isn't so easy. Especially not in a codebase which (according to Cantrill) was pretty well optimized already.
The performance boost came to him as a surprise. As I remember the story, he had already made the C code pretty fast and didn't realise his C hash table implementation could be improved that much. The fact that Rust gave him a better map implementation out of the box is great, because it means he didn't need to be clever enough to figure those optimizations out himself.
It's not an apples-to-apples comparison. But I don't think comparing the world's fastest C code to the world's fastest Rust code is a good comparison either, since most programmers don't write code like that. It's usually incidental, low-effort performance differences that make a programming language "fast" in the real world. Like a good B-tree implementation just shipping with the language.
I did feel my post was a bit unneeded when I added my edit :)
My point about the 30% was that you mentioned the gain he got in Rust and attributed it to, essentially, better algorithms in the Rust lib he used. Once you know that, it's hard to say that Rust is 'faster', but the point is valid and I accept that he gained performance by using the Rust library.
My other point was that the speed of his code probably didn't matter at the time. If it had been a problem in the past, he probably would have taken the time to profile and gain some more speed. Sure, you can't gain speed that isn't there to be had, but as you pointed out, it wasn't a language issue, it was a library implementation issue.
He could just as easily have used a different program that happened to use a good C library, and the results would have been reversed.
I also agree that most devs are not working down at that level of optimisation, so the default libraries can help. But at the same time, it mostly doesn't matter if something takes 30% longer if the overall time is not a problem. If you are working on something where speed really matters and you are trying to shave off milliseconds, then you have to be the kind of developer who can work C or Rust at that level.
What I think it illustrates more is how much classic languages could gain by having a serious overhaul of their standard library and maybe even a rebrand if that's the expected baseline of a conformant implementation.
>If he needed his app to be 30% faster he would have made it so
That still validates "In short, the maximum possible speed is the same (+/- some nitpicks), but there can be significant differences in typical code" the parent wrote
> He straight ported some C code to rust and found the rust code outperformed it by ~30% or something. The culprit ended up being that in C, he was using a hash table library he's been copy pasting between projects for years. In rust, he used BTreeMap from the standard library, which turns out to be much better optimized.
Are you surprised? Rust is never inherently faster than C. When it appears faster, it boils down to library quality and algorithm choice, not the language.
Also worth noting that hash tables and B-trees have fundamentally different performance characteristics. If BTreeMap won, it is either the hash table implementation, or access patterns that favor B-tree cache locality. Neither says anything about Rust vs C. It is a library benchmark, not a language one.
And especially having performant and actively maintained default choices built in. With C, as described in the post you responded to, you'll typically end up building a personal collection of dusty old libraries that work well enough most of the time.
I think Rust projects will accumulate their own cruft over time; they are just younger. And the Rust ecosystem's churn (constant breakage, edition migrations, dependency hell in Cargo.lock) creates its own class of problems.
Either way, I would like to reiterate that the comparison is flawed at a more fundamental level because hash tables and B-trees are different data structures with different performance characteristics. O(1) average lookup vs O(log n) with cache-friendly ordered traversal. These are not interchangeable.
If BTreeMap outperformed his hash table, that is either because the hash table implementation was poor, or because the access patterns favored B-tree cache locality. Neither tells you anything about Rust vs C. It is a data structure benchmark.
More importantly, choosing between a hash table and a tree is an architectural decision with real trade-offs. It is not something that should be left to "whatever the standard library defaults to". If you are picking data structures without understanding why, that is on you, not on C's lack of a blessed standard library (BTW one size cannot fit all).
> If BTreeMap outperformed his hash table, that is either because the hash table implementation was poor, or because the access patterns favored B-tree cache locality. Neither tells you anything about Rust vs C. It is a data structure benchmark.
The specific thing it tells you about Rust vs C is that Rust makes using an optimized BTreeMap the default, much-easier thing to do when actually writing code. This is a developer experience feature rather than a raw language performance feature, since you could in principle write an equally-performant BTreeMap in C. But in practice Bryan Cantrill wasn't doing that.
> More importantly, choosing between a hash table and a tree is an architectural decision with real trade-offs. It is not something that should be left to "whatever the standard library defaults to". If you are picking data structures without understanding why, that is on you, not on C's lack of a blessed standard library (BTW one size cannot fit all).
The Rust standard library provides both a hash table and a b-tree map, and it's pretty easy to pull in a library that provides a more specialized map data structure if you need one for something (because in general it's easier to pull in any library for anything in a Rust project set up the default way). Again, a better developer experience that leads to developers making better decisions writing their software, rather than a fundamentally more performant language.
> the Rust ecosystem's churn (constant breakage, edition migrations, dependency hell in Cargo.lock) creates its own class of problems.
What churn? Rust hasn't broken compatibility since 1.0, over a decade ago. These days it feels like Rust changes more slowly than C and C++.
> Either way, I would like to reiterate that the comparison is flawed at a more fundamental level because hash tables and B-trees are different data structures with different performance characteristics. O(1) average lookup vs O(log n) with cache-friendly ordered traversal. These are not interchangeable.
They're mostly interchangeable when used as a map! In Rust code, in most cases you can just replace HashMap with BTreeMap. In practice, O(log n) and O(1) are very similar bounds, owing to how slowly log(n) grows with respect to n. Cache locality often matters much more than an O(log n) factor in your algorithm.
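To make "just replace" concrete, a minimal sketch; the two map APIs overlap almost entirely, the main caveats being that BTreeMap requires `Ord` keys (instead of `Hash + Eq`) and iterates in sorted order.

```rust
use std::collections::BTreeMap; // was: use std::collections::HashMap;

fn main() {
    // Only the type changed; the insert/lookup code is untouched.
    let mut counts: BTreeMap<&str, u32> = BTreeMap::new();
    for word in ["b", "a", "b"] {
        *counts.entry(word).or_insert(0) += 1;
    }
    assert_eq!(counts.get("b"), Some(&2));
}
```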
If you read the actual article, you'll see that Cantrill benchmarked his library using Rust's B-tree and hash table implementations. Both maps outperformed his C-based hash table implementation.
> Neither tells you anything about Rust vs C.
It tells you Rust's standard library has a faster hash map implementation than Bryan Cantrill's. If you need a hash table, you're almost certainly better off using Rust's than rolling your own in C.
One point of clarification: the C version does not have (and never had) a hash table; the C version had a BST (an AVL tree). Moreover, the "Rust hash table implementation" is in fact still B-tree based; the hash table described in the post is a much more nuanced implementation detail. The hash table implementation has really nothing to do with the C/Rust delta -- which is entirely a BST/B-tree delta. As I described in the post, implementing a B-tree in C is arduous -- and implementing a B-tree in C as a library would be absolutely brutal (because a B-tree relies on moving data). As I said in the piece, the memory safety of Rust is very much affecting performance here: it allows for the much more efficient data structure implementation.
I wouldn't consider implementing a B-tree in C any more "arduous" than implementing any other notable container/algorithm in C, nor would making a library be "brutal" as moving data really isn't an issue. Libraries are available if you need them.
Quite frankly, writing the same in Rust seems far, far more "arduous", and you'd only realistically be writing something using BTreeMap because someone else did the work for you.
However, being right there in std makes use much easier than searching around for an equivalent library to pull into your C codebase. That's the benefit.
I don't often do this, but I'm sorry, you don't know what you're talking about. If you bother to try looking for B-tree libraries in C, you will quickly find that they are either (1) the equivalent of undergraduate projects that are not used in production systems or (2) woven pretty deeply into a database implementation. This is because the memory model of C makes a B-tree library nasty: it will either be low performance or a very complicated interface -- and it is because moving data is emphatically an issue.
Can you mention 3 cases of breakage the language has had in the last, let's say, 5 years? I've had colleagues in different companies responsible for updating company-wide language toolchains tell me that in their experience updating Rust was the easiest of their bunch.
> edition migrations
One can write Rust 2015 code today and have access to pretty much every feature from the latest version. Upgrading editions (at your leisure) can be done most of the time just by using rustfix, but even if done by hand, the idea that they are onerous is overstating their effect.
Last time I checked there were <100 checks in the entire compiler for edition gates, with many checks corresponding to the same feature. Adding support for new features that don't affect prior editions and, by extension, existing code (like adding the async/await keywords, or support for k# and r# tokens) is precisely the point of editions.
> When it appears faster, it boils down to library quality and algorithm choice, not the language.
That's a thin, thin line of argumentation. The distinction between the ecosystem and language may as well not exist.
A lot of improvements of modern languages come down to convenience, and the more convenient something is, the more likely it is to be used. So it is meaningful to say that the average Rust program will perform better than the average C program given that there exist standard, well-performing, generic data structure libraries in Rust.
> It is a library benchmark, not a language one.
If you have infinite time to tune performance, perhaps. It is also meaningful to say that while importing a library may take a minute, writing equivalently performant code in C may take an hour.
> (As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
Is that beyond just "concurrency is tricky and a language that makes it easier to add concurrency will make it easier to add sneaky bugs"? I've definitely run into that, but have never written concurrent C to compare the ease of heisenbug-writing.
> Despite benefits, I don't actually think the memory safety really plays a role in the usage rate of parallelism.
I can see what you mean with explicit things like thread::spawn, but I think Tokio is a major exception. Multithreaded by default seems like it would be an insane choice without all the safety machinery. But we have the machinery, so instead most of the async ecosystem is automatically multithreaded, and it's mostly fine. (The biggest problems seem to be the Send bounds, i.e. the machinery again.) Cargo test being multithreaded by default is another big one.
You raise a good point here. When I think about writing multi-threaded code, three things come to mind about why it is so easy in Java and C#: (1) The standard library has lots of support for concurrency. (2) Garbage collection. (3) Debuggers have excellent support for multi-threaded code.
Not really, especially as garbage collection doesn't achieve memory safety. Safety-wise, it only helps avoid UAF due to lifecycle errors.
Garbage collection is primarily just a way to handle non-trivial object lifecycles without manual effort. Parallelism happens to often bring non-trivial object lifecycles, but this is not a major problem in parallelism.
In plain C, the common pattern is to try to keep lifecycles trivial, and the moment this either doesn't make sense or isn't possible, you usually just add a reference count member, roughly like this (a minimal sketch):
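```c
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical object type; the payload is whatever you need. */
typedef struct {
    atomic_int refcnt;
    /* ... payload ... */
} object_t;

static object_t *object_ref(object_t *o) {
    atomic_fetch_add(&o->refcnt, 1);
    return o;
}

static void object_unref(object_t *o) {
    /* The last reference out frees the object. */
    if (atomic_fetch_sub(&o->refcnt, 1) == 1)
        free(o);
}
```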
In both Go and C, all types used in concurrent code need to be reviewed for thread-safety and have appropriate serialization applied - in the C case, this just also includes the refcnt itself. And yes, you could have a UAF or a leak if you don't call ref/unref correctly, but that's unrelated to parallelism - it's just everyday life in manual memory management land.
The issues with parallelism are the same in Go and C: you might have invalid application states, whether due to missing serialization - e.g., forgetting to lock things appropriately or accidentally using types that are not thread-safe at all - or due to business logic flaws (say, two threads both sleeping, each waiting for the other one to trigger an event and wake it up).
> (As a personal anecdote, I've probably run into more concurrency-related heisenbugs in Go than I ever did in C, with C heisenbugs more commonly being memory mismanagement in single-threaded code with complex object lifetimes/ownership structures...)
Yes. All `&mut` references in Rust are equivalent to C's `restrict` qualified pointers. In the past I measured a ~15% real world performance improvement in one of my projects due to this (rustc has/had a flag where you can turn this on/off; it was disabled by default for quite some time due to codegen bugs in LLVM).
I was confused by this at first since `&T` clearly allows aliasing (which is what C's `restrict` is about). But I realize that Steve meant just the optimization opportunity: you can be guaranteed that (in the absence of UB), the data behind the `&T` can be known to not change in the absence of a contained `UnsafeCell<T>`, so you don't have to reload it after mutations through other pointers.
Yes. It's a bit tricky to think about, because while it is literally called 'noalias', what it actually means is more subtle. I already linked to a version of the C spec below, https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3220.pdf but if anyone is curious, this part is in "6.7.4.2 Formal definition of restrict" on page 122.
In some ways, this is kind of the core observation of Rust: "shared xor mutable". Aliasing is only an issue if the aliasing leads to mutability. You can frame it in terms of aliasing if you have to assume all aliases can mutate, but if they can't, then that changes things.
I used to use it, but very rarely, since it's instant UB if you get it wrong. In tiny codebases which you can hold in your head it's probably practical to sprinkle it everywhere, but in anything bigger it's quite risky.
Nevertheless, I don't write normal everyday C code anymore since Rust has pretty much made it completely obsolete for the type of software I write.
restrict works by making some situations undefined behavior that would otherwise be defined without it. It is probably unwise to use casually or habitually.
But of course the only thing restrict does in C is potentially introduce certain kinds of undefined behavior into a program that would be correct without it (and the compiler can then optimize on the assumption that the code is never invoked in a way that would trigger that UB).
Aliasing info is gold dust to a compiler in various situations although the absence of it in the past can mean that they start smoking crack when it's provided.
The simplest example is `memcpy(dst, src, len)` and similar iterative byte-copying operations. If the function did not use noalias, the compiler wouldn't be free to optimize individual byte reads/writes into register-sized ones, as the destination might overlap with the source. In practice this means 8x more CPU instructions per copy operation on a 64-bit machine.
Note that memcpy specifically may already be implemented this way under the hood because it requires noalias; but I imagine similar iterative copying operations can be optimized in a like manner ad-hoc when aliasing information is baked in like it is with Rust.
That's not a great example, since memcpy() already has all the information it needs to determine whether the regions overlap (src < dst + len && dst < src + len) and even where and by how much. So pretty much any quality implementation is already performing this test and selecting an appropriate code path, at the cost of a single branch rather than 8x as many memory operations.
The real purpose of restrict is to allow the compiler to cache a value that may be used many times in a loop (in memcpy(), each byte/word is used only once) in a register at the start of the loop, and not have to worry about repeatedly reaching back to memory to re-retrieve it because it might have been modified as a side effect of the loop body.
Say you have 2 pointers (that might overlap). You (or the compiler) keep one value read from the first pointer in a register, since the value is needed multiple times.
You then write through the second pointer. Now the value you kept in the register is invalidated, since you might have overwritten it through the overlapping pointers.
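The same situation in Rust terms, sketched minimally: because `&mut` is exclusive, the compiler may keep `*a` in a register across the store to `*b`, where the equivalent C without `restrict` would have to reload.

```rust
// `a` and `b` are &mut, so the compiler knows they can't overlap.
fn read_store_read(a: &mut i32, b: &mut i32) -> i32 {
    let x = *a; // load *a once, keep it in a register
    *b = 42;    // cannot touch *a: &mut references are exclusive
    x + *a      // *a is provably still x, so no reload is needed
}

fn main() {
    let (mut p, mut q) = (1, 0);
    assert_eq!(read_store_read(&mut p, &mut q), 2);
}
```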
Yes. Specifically, since Rust's design prevents shared mutability, if you have 2 mutable data structures you know that they don't alias, which makes auto-vectorization a whole lot easier.
What about generics (equivalent to templates in C++), which allow compile-time optimizations all the way down that may not be possible if the implementation is hidden behind a void*?
Unless you use `dyn`, all code is monomorphized, and that code on its own will get optimized.
This does come with code bloat, so the Rust std sometimes exposes a generic function (which gets monomorphized) but internally passes the work off to a non-generic function. This is to avoid monomorphizing the underlying code.
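A sketch of that pattern; the function names here are invented, but std uses the same trick in several places (e.g. around `AsRef` conversions).

```rust
use std::path::{Path, PathBuf};

// Generic shim: monomorphized once per caller type, but tiny.
pub fn config_path<P: AsRef<Path>>(dir: P) -> PathBuf {
    // Convert to the concrete type immediately, then delegate.
    config_path_inner(dir.as_ref())
}

// Non-generic worker: compiled exactly once, shared by all callers.
fn config_path_inner(dir: &Path) -> PathBuf {
    dir.join("config.toml")
}

fn main() {
    // &str callers and PathBuf callers share one compiled inner body.
    assert_eq!(config_path("/etc"), Path::new("/etc/config.toml"));
}
```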
> This does come with code bloat, so the Rust std sometimes exposes a generic function (which gets monomorphized) but internally passes the work off to a non-generic function.
There's no free lunch here. Reducing the amount of code that's monomorphised reduces the code emitted & improves compile times, but it reduces the scope of the code that's exposed to the input type, which reduces optimisation opportunities.
In C, the only way to write a monomorphized hash table or array list involves horribly ugly macros that are difficult to write and debug. Rust does monomorphization by default, but you can also use &dyn trait for vtable-like behaviour if you prefer.
I think the way Rust checks borrows also makes it a lot more feasible to avoid allocations/copies; not because it is impossible to do in C, but because doing it in C requires writing very careful documentation and the caller to actually read that documentation. In (safe) Rust this is all checked by the compiler such that libraries can leverage it without blowing their complexity budget.
The Rust version of this is "turn .iter() into .par_iter()."
It's also true that for both, it's not always as easy as "just make the for loop parallel." Stylo is significantly more complex than that.
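For anyone who hasn't seen it, a minimal sketch using the rayon crate (the usual provider of `par_iter`):

```rust
use rayon::prelude::*; // rayon = "1" in Cargo.toml

fn main() {
    let v: Vec<u64> = (0..1_000_000).collect();
    let serial: u64 = v.iter().sum();
    // One method change; rayon's thread pool splits up the work.
    let parallel: u64 = v.par_iter().sum();
    assert_eq!(serial, parallel);
}
```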
> to this I sigh in chrome.
I'm actually a Chrome user. Does Chrome do what Stylo does? I didn't think it did, but I also haven't really paid attention to the internals of any browsers in the last few years.
Concurrency is easy by default. The hard part is when you are trying to be clever.
You write concurrent code in Rust pretty much in the same way as you would write it in OpenMP, but with some extra syntax. Rust catches some mistakes automatically, but it also forces you to do some extra work. For example, you often have to wrap shared data in Arc when you convert single-threaded code to use multiple threads. And some common patterns are not easily available due to the limited ownership model. For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
> For example, you can't get mutable references to items in a shared container by thread id or loop iteration.
This would be a good candidate for a specialised container that internally uses unsafe. Well, for thread id at least; since the user of the API doesn't provide it, you could mark the API safe, since you wouldn't have to worry about incorrect inputs.
Loop iteration would be an input to the API, so you'd mark the API unsafe.
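For the loop-iteration case, it's worth noting one workaround that stays in safe Rust: split the container into disjoint mutable slices up front and hand each thread its own piece. A minimal sketch with std's scoped threads:

```rust
use std::thread;

fn main() {
    let mut data = vec![0u32; 8];
    let mid = data.len() / 2;
    // split_at_mut proves to the compiler that the two halves
    // don't overlap, so each thread gets an exclusive &mut slice.
    let (left, right) = data.split_at_mut(mid);
    thread::scope(|s| {
        s.spawn(|| left.iter_mut().for_each(|x| *x += 1));
        s.spawn(|| right.iter_mut().for_each(|x| *x += 2));
    });
    assert_eq!(data, [1, 1, 1, 1, 2, 2, 2, 2]);
}
```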
> When writing single-threaded code takes 90% of effort of writing multi-threaded one
That "when" is doing some heavy lifting! More seriously: You raise a very interesting point. When I moved from C++ to Java (10+ years ago), I was initially so nervous to add threads to my Java code. Why? Because it was (then) difficult and dangerous to do it in C++. C++ debuggers were awful, so I didn't think I could debug problems with multi-threaded C++ code. (Of course, the C++ ecosystem has drastically improved in the last 10 years, so I am sure it is now much more pleasant (and safe) to write multi-threaded C++ code.) When I finally sat down to add threads to some Java code, I could not believe how easy it was, including debugging. As a result, going forward, I was much more likely to add threads to my Java... or even start with a multi-threaded design, even if there is only a modest performance improvement.
> Even just spawning a thread is going to make somebody complain that they can't build the code on their platform due to C11/pthread/openmp.
This matches squarely with my experience, but it's not limited to threading, and Rust evades a large swath of these problems by relatively limited platform support. I look forward to the day I can run Rust wherever I run C!
While Rust doesn't have C coverage, it has (by my last check) better coverage than something like CPython currently does.
The big thing, though, is that Rust is honest about its tiers of support, whereas for many projects "supported platform" for minor platforms often means "it still compiles (at least we think it does; when the maintainer tries it and it fails, they will fix it)".
Not to be too glib though, there are obviously tools out there that have as much or more rigor than Rust and cover more platforms. Just... "supported platforms" means different things in different contexts.
All too common (not just with compilers) for someone to port the subset they care about and declare it done. Rust's decision to create standards of compliance and be conscious about which platforms are viable targets and which ones don't meet their needs is a completely valid way to ensure that whole classes of trouble never come up, despite complaints from some.
In C, one can build data structures with pointers that would require reference counting and heap allocation in Rust. The performance would also depend on what kind of CPU/features it is compiled for.
I'm still confused as to why Linux requires linking against TBB for multithreading, thus breaking CMake configs that lack an if(linux) branch for TBB. That stuff should be included by default without any effort from the developer.
I don't know the details since I'm mainly a Windows dev, but when porting to Linux, TBB has always been a huge pain in the ass, since it suddenly becomes an additional required dependency under GCC. This is with C++ and std::thread.
Also Clang; and in general, the parallel algorithms aren't available on platforms not supported by TBB.
C++26 will get another similar dependency, because BLAS algorithms are going to be added, and apparently the expectation is to build on top of battle-tested C/Fortran BLAS implementations.
CPUs are most energy efficient sitting idle doing nothing, so finishing work sooner in wall-clock time usually helps despite overheads.
Energy usage is most affected by high clock frequencies, and CPUs will boost clocks for single-threaded code.
Threads waiting on cache misses let the CPU use hyperthreading, which is actually energy efficient (you get context switching in hardware).
You can waste energy in pathological cases if you overuse spinlocks or spawn so many threads that bookkeeping takes more work than what the threads do, but helper libraries for multithreading all have thread pools, queues, and dynamic work splitting to avoid extreme cases.
Most of the time low speed up is merely Amdahl's law – even if you can distribute work across threads, there's not enough work to do.
Multithreading does not make code more efficient. It still takes the same amount of work and power (slightly more).
On a backend system where you already have multiple processes using various cores (databases, web servers, etc) it usually doesn’t make sense as a performance tool.
And on an embedded device you want to save power so it also rarely makes sense.
According to [1], the most important factor for the power consumption of code is how long the code takes to run. Code that spreads over multiple cores is generally more power efficient than code that runs sequentially, because the power consumption of multiple cores grows less than linearly (that is, it requires less than twice as much power to run two cores as it does one core).
Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so. Obviously if this is important in a particular context, it's probably worth measuring it in that context (e.g. embedded devices), but I suspect this is true more often than it isn't true.
>Therefore if parallelising code reduces the runtime of that code, it is almost always more energy efficient to do so
Only if it leads to better utilisation. But in the scenario that the parent comment suggests, it does not lead to better utilisation as all cores are constantly busy processing requests.
Throughput, as well as CPU time across cores, remains largely the same regardless of whether or not you parallelise individual programs/requests.
That's true, which is why I added the caveat that this is only true if parallelising reduces the overall runtime - if you can get in more requests per second through parallelisation. And the flip side of that is that if you're able to perfectly utilise all cores then you're already running everything in parallel.
That said, I suspect it's a rare case where you really do have perfect core utilisation.
> Multithreading does not make code more efficient. It still takes the same amount of work and power (slightly more).
In addition to my sibling comments, I would like to point out that multithreading quite often can save power. Typically the power consumption of an all-core load is within 2x the power consumption of a single-core load, while being many times faster, assuming your task parallelizes well. This makes sense because a fully loaded CPU core still needs all the L3 cache mechanisms, all the DRAM controller mechanisms, etc. to run at full speed. A fully idle system, on the other hand, can consume very little power if it idles well (which admittedly many CPUs do not).
Edit:
I would also add that if your system is running a single-threaded database and a single-threaded web server, that still leaves over a hundred underutilized cores on many modern server-class CPUs.
If you use a LAMP-style architecture with a scripting language handling requests and querying a database, you can go without ever writing a single line of multithreaded code and still be set up to utilize N cores.
Each web request can happen in a thread/process and their queries and spawns happen independently as well.
Multithreading can make an application more responsive and more performant for the end user. If multithreading causes the end user to wait less, the code is more performant.
> Are people making user facing apps in rust with uis?
We are talking not only about Rust, but also about C and C++. There are lots of C++ UI applications. Rust positions itself as an alternative to C++, so it is definitely intended to be used for UI applications too - it was created to write a browser!
At work I am using tools such as uv [1] and ruff [2], which are user-facing (although not GUI), and I definitely appreciate a 16x speedup if possible.
The engine being written in C++ does not mean the application is. You're conflating the platform with what is being built on top of it. Your logic would mean that all Python applications should be counted as C applications.
When a basic question is asked, a basic answer is given. I didn’t say that I think that’s the coolest or most interesting answer. It’s just the most obvious, straightforward one. It’s not even about Rust!
(And also, I don’t think things like work stealing queues are relevant to editors, but maybe that’s my own ignorance.)
You cannot have it both ways though. Either these are meaningful examples of Rust's benefits, or they are not worth mentioning.
In a thread about Rust's concurrency advantages, these editors were cited as examples. "Don't block the UI thread" as justification only works if Rust actually provides something novel here. If it is just basic threading that every language has done for decades, it should not have been brought up as evidence in the first place.
Plus if things like work-stealing queues and complex synchronization are not relevant to editors, then editors are a poor example for demonstrating Rust's concurrency story in the first place anyway.
Well, what about small CLI tools, like ripgrep and the like? Does multithreading not matter when we open a large number of files and process them? What about compilers?
Sure. But the more obviously parallel the problem is (visiting N files) the less compelling complex synchronization tools are.
To over-explain: if you just need to make N forks of the same logic, then it's very easy to do this correctly in C. The cases where I'm going to carefully maintain shared mutable state with locking are cases where the parallelism is less efficient (Amdahl's law).
Java-style apps that just haphazardly start threads are what Rust makes safer. But that's a category of program design I find brittle and painful.
The example you gave of a compiler is canonically implemented as multiple processes making .o files from .c files, not threads.
> The example you gave of a compiler is canonically implemented as multiple processes making .o files from .c files, not threads.
This is a huge limitation of C's compilation model, and basically every language designed since does it differently, so I'm not sure that's a good example. You do want some "interconnection" between translation units, or at least less fine-grained units.
It reminds me of the joke where someone claims "I can do math very fast", is probed with a multiplication, and immediately answers with some total bollocks answer.
- "That's not even close"
- "Yeah, but it was fast"
Sure, it's not a trivial problem, but why wouldn't we want better compilation results/developer ergonomics at the price of more compiler complexity and some minimal performance penalty?
And it's not like the performance is free of its own negatives: header-only libraries, for example, are a hack that manifested directly from this compilation model.
I almost ignored this post because I can't stand this particular war, where examples are cherry picked to prove either answer.
I'm very happy to see the nuanced take in this article, slowly deconstructing the implicit assumptions proposed by the person asking this question, to arrive at the same conclusion that I long have. I hope this post reaches the right people.
A particular language doesn't have a "speed", a particular implementation may have, and the language may have properties that make it difficult to make a fast implementation (of those specific properties/features) given the constraints of our current computer architectures. Even then, there's usually too many variables to make a generalized statement, and the question often presumes that performance is measured as total cpu time.
We recently had a post here where the claim being refuted was in quotes in the title, but half the comments were as if the article were making the claim, clearly indicating that people didn't read it (and don't understand how quote marks work).
From the other side of the table, I love performance comparisons, so I always read these things. I also enjoyed your commentary, thanks for writing it :)
I agree, I thought this was a weird example - but I'm not a Rust nor C programmer.
I assume this example is used because programmers of either language reach for asm when looking for raw performance. But to me, it shouldn't even be a discussion point, since even I know both languages can be made to emit the same assembly.
Also, I think it side-steps the hard parts of the question - which is, what are the performance impacts of Rust safety?
> what are the performance impacts of Rust safety?
None from the borrow model, since that is all compile time.
For safety at run time, there is a hit if you use certain structures, because you can't figure out things like dynamic bounds at compile time. But it's no different from C once you account for having to do manual bounds checking there.
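To make the run-time cost concrete, a minimal sketch of the two access modes Rust offers (`get_unchecked` is the unsafe opt-out on slices). In hot loops the optimizer can often prove the index in range and drop the check entirely, so the measured hit is frequently zero.

```rust
// Safe indexing: bounds-checked, panics if `i` is out of range.
fn get_checked(xs: &[u64], i: usize) -> u64 {
    xs[i]
}

// Unsafe opt-out: no check; the caller must guarantee i < xs.len(),
// otherwise this is UB - the moral equivalent of a raw C array read.
unsafe fn get_raw(xs: &[u64], i: usize) -> u64 {
    *xs.get_unchecked(i)
}

fn main() {
    let xs = [10, 20, 30];
    assert_eq!(get_checked(&xs, 1), 20);
    assert_eq!(unsafe { get_raw(&xs, 2) }, 30);
}
```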
One example where Rust enables better and faster abstractions is traits. In C you can approximate this with some ugly methods like macros, but in Rust it's not the implementer's choice; it's the caller's choice whether to use dynamic dispatch (a function pointer table in C) or static dispatch (direct function calls!).
In C the caller typically isn't choosing. The author of some library or API decides this for you.
This turns out to be fairly significant in something like an embedded context, where function pointers kill icache and rob cycles jumping through hoops. Say you want to bit-bang a bus protocol using GPIO: in C with function pointers this adds potentially non-trivial overhead, and your abstraction is no longer (never was) free. Traits let the caller decide to monomorphize that code and get effectively register reads and writes inlined, while still having an abstract interface to GPIO. This is excellent!
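A minimal sketch of that caller's choice (the `Gpio` trait here is invented for illustration, not any particular embedded HAL):

```rust
trait Gpio {
    fn set_high(&mut self);
}

struct Pin;
impl Gpio for Pin {
    fn set_high(&mut self) { /* register write; trivially inlined */ }
}

// Static dispatch: monomorphized per pin type; the call can inline away.
fn bitbang_static<G: Gpio>(pin: &mut G) {
    pin.set_high();
}

// Dynamic dispatch: one compiled copy; the call goes through a vtable.
fn bitbang_dyn(pin: &mut dyn Gpio) {
    pin.set_high();
}

fn main() {
    let mut pin = Pin;
    bitbang_static(&mut pin); // the caller picks static...
    bitbang_dyn(&mut pin);    // ...or dynamic, per call site
}
```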
I probably enjoy ELF hacking more than most, but patching an ELF binary via LD_PRELOAD, linker hacks, or even manual or assisted relinking tricks are just tools in the bag of performant C/C++ (and probably Rust too, but I don't get paid to make that fast). If you care about perf and for whatever reason are using someone else's code, you should be intimately familiar with your linker, binary format, ABI, and OS in addition to your hardware. It's all bytes in the end, and these abstractions are pliable with standard tooling.
I'd usually rather have a nice language-level interface for customizing implementation, but ELF and Linux scripting is typically good enough. Binary patching is in a much easier to use place these days with good free tooling and plenty of (admittedly exploit-oriented) tutorials to extrapolate from as examples.
In C++ you do it the other way around, have a single class that is polymorphic over templates. The name of this technique within C++ is type-erasure (that term means something else outside of C++).
Examples of type erasure in C++ are classes like std::function and std::any, and normally you need to implement the type erasure manually, but there are some library that can automate it to a degree, such as [1], but it's fairly clumsy.
How do APIs typically manage to actually "use" the "bar" of your example, such as storing it somewhere, without enforcing some kind of constraints?
Depending on exactly what you mean, this isn't correct. This syntax is the same as <T: BarTrait>, and you can store that T in any other generic struct that's parametrized by BarTrait, for example.
> you can store that T in any other generic struct that's parametrized by BarTrait, for example
Not really. You can store it in any struct that is specialized to the same concrete type as the value you received. If you get a pre-built struct from somewhere and try to store it there, your code won't compile.
I'm addressing the intent of the original question.
No one would ask this question in the case where the struct is generic over a type parameter bounded by the trait, since such a design can only store a homogeneous collection of values of a single concrete type implementing the trait; the question doesn't even make sense in that situation.
The question only arises for a struct that must store a heterogeneous collection of values with different concrete types implementing the trait, in which case a trait object (dyn Trait) is required.
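A minimal sketch of that heterogeneous case (the `Shape` trait is invented for illustration):

```rust
trait Shape {
    fn area(&self) -> f64;
}

struct Circle { r: f64 }
struct Square { s: f64 }

impl Shape for Circle {
    fn area(&self) -> f64 { std::f64::consts::PI * self.r * self.r }
}
impl Shape for Square {
    fn area(&self) -> f64 { self.s * self.s }
}

fn main() {
    // Different concrete types in one collection, behind dyn Trait.
    let shapes: Vec<Box<dyn Shape>> = vec![
        Box::new(Circle { r: 1.0 }),
        Box::new(Square { s: 2.0 }),
    ];
    let total: f64 = shapes.iter().map(|s| s.area()).sum();
    println!("total area = {total}");
}
```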
It's a tradeoff, though, as I think traits make Rust build times grow really quickly. I don't know the exact characteristics of it, and I think they've sped it up compared to how it used to be, but I do remember that you get noticeable build slowdowns the more you use traits, especially "complicated" ones.
Absolutely, was not trying to claim otherwise. But since we're engineers (at least I like to see myself as one), it's worth always keeping in mind that almost everything comes with tradeoffs, even traits :)
Someone down the line might be wondering why suddenly their Rust builds take 4x the time after merging something, and just maybe remembering this offhand comment will make them find the issue faster :)
It's never the case that only one thing is important.
In the extreme, you surely wouldn't accept a 1-day or even 1-week build time, for example? It seems like that could be possible rather than hypothetical for a 1-week build, since a system could fuzz over candidate compilations, run load tests, do PGO, and deliver something better. But even if runtime performance were so important that you had such a system, it's obvious you wouldn't ever have developer cycles that take a week to compile.
Build time also even does matter for release: if you have a critical bug in production and need to ship the fix, a 1 hour build time can still lose you a lot here. Release build time doesn't matter until it does.
A lot of C++ devs advocate for simple replacements for the STL that do not rely too much on zero-cost abstractions. That way you can have small binaries, fast compiles, and make a fast-debug kinda build where you only turn on a few optimizations.
That way you can get most of the speed of the Release version, with a fairly good chance of getting usable debug info.
A huge issue with C++ debug builds is the resulting executables are unusably slow, because the zero-cost abstractions are not zero cost in debug builds.
It's not just the compiler - MSVC, like all others, has a tendency to mangle code in release builds to such an extent that the debug info is next to useless (which, to be fair, is what I asked it to do, so I don't fault it).
Now to hate a bit on MSVC: its Edit & Continue functionality makes debug builds unbearably slow, but at least it doesn't work, so the first thing I do is turn it off.
You can debug release builds with gcc/clang just fine. They don't generate debug information by default, but you can always request it ("-O3 -g" is a perfectly fine combination of flags).
I think this also massively depends on your domain, familiarity with the code base and style of programming.
I've changed my approach significantly over time in how I debug (probably in part due to Rust's slower compile times), and usually get away with 2-3 compiles to fix a bug, but spend more time reasoning about the code.
Folks have worked tirelessly to improve the speed of the Rust compiler, and it's gotten significantly faster over time. However, there are also language-level reasons why it can take longer to compile than other languages, though the initial guess of "because of the safety checks" is not one of them, those are quite fast.
> How slow are we talking here?
It really depends on a large number of factors. I think saying "roughly like C++" isn't totally unfair, though again, it really depends.
My initial guess would be "because of the zero-cost abstractions", since I read "zero-cost" as "zero runtime cost" which implies shifting cost from runtime to compile time—as would happen with eg generics or any sort of global properties.
(Uh oh, there's an em-dash, I must be an AI. I don't think I am, but that's what an AI would think.)
People do have cold Rust compiles that can push up into the hours. Large crates often make design choices that favor more compile-time-friendly shapes.
Note that C++ also has almost as large a problem with compile times, with large build fanouts, including from templates, and it's not always realistic for incremental builds to solve it either, especially the time burnt on linking. E.g., I believe Chromium development often uses a mode with .dll dynamic linking, instead of the fully statically linked configuration they release, exactly to speed up incremental development. The "fast" case is C, not C++.
> I believe Chromium development often uses a mode with .dll dynamic linking, instead of the fully statically linked configuration they release, exactly to speed up incremental development. The "fast" case is C, not C++.
There's no Rust codebase that takes hours to compile cold unless 1) you're compiling a massive codebase in release mode with LTO enabled, in which case, you've asked for it, 2) you've ported Doom to the type system, or 3) you're compiling on a netbook.
I'm curious if this is tracked or observed somewhere; crater runs are a huge source of information, metrics about the compilation time of crates would be quite interesting.
AFAIK, it's not the traits that do it but rather the generics.
Rust does make it a lot easier to use generics, which is likely why using more traits appears to be the cause of longer build times. I think it's just that the more traits you have, the more likely you are to stumble over some generic code, which ultimately generates more code.
> AFAIK, it's not the traits that does it but rather the generics.
Aah, yes, that sounds more correct; the end result is the same, but I failed to remember the correct mechanism that led to it. Thank you for the correction!
I think personally the answer is "basically no", Rust, C and C++ are all the same kind of low-level languages with the same kind of compiler backends and optimizations, any performance thing you could do in one you can basically do in the other two.
However, in the spirit of the question: someone mentioned the stricter aliasing rules, that one does come to mind on Rust's side over C/C++. On the other hand, signed integer overflow being UB would count for C/C++ (in general: all the UB in C/C++ not present in Rust is there for performance reasons).
Another thing I thought of in Rust and C++s favor is generics. For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it. I don't know if C compilers specialize qsort() based on comparison function this way. They might, but it's certainly a lot more to ask of the compiler, and I would argue there are probably many cases like this where C++ and Rust can outperform C because of their much more powerful facilities for specialization.
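A minimal sketch of the contrast: in Rust the comparator's type is a generic parameter, so the compiler stamps out a copy of the sort specialized to this exact closure and can inline the comparison, whereas C's qsort receives an opaque function pointer.

```rust
fn main() {
    let mut v = vec![3u32, 1, 2];
    // The closure's type parameterizes sort_unstable_by, so this call
    // is monomorphized and the comparison inlines - unlike
    // qsort(ptr, n, size, cmp) calling through a function pointer.
    v.sort_unstable_by(|a, b| a.cmp(b));
    assert_eq!(v, [1, 2, 3]);
}
```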
I agree with this wholeheartedly. Rust is a LANGUAGE and C is a LANGUAGE. They are used to describe behaviours. When you COMPILE and then RUN them you can measure speed, but that's dependent on two additional bits that are not intrinsically part of the languages themselves.
Now: the languages may expose patterns that a compiler can make use of to improve optimizations. That IS interesting, but it is not a question of speed. It is a question of expressibility.
No. As you've made clear, it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.
Saying that a language is about "expressibility" is obvious. A language is nothing other than a form of expression; no more, no less.
Yes. But the speed depends on whether or not the compiler makes use of that information, and on the machine architecture the compiled code runs on.
Speed is a function of all three -- not just the language.
Optimizations for one architecture can lead to perverse behaviours on another (think cache misses and memory layout -- even PROGRAM layout can affect speed).
These things are out of scope of the language and as engineers I think we ought to aim to be a bit more precise. At a coarse level I can understand and even would agree with something like "Python is slower than C", but the same argument applies there as well.
But at some point objectivity ought to enter the playing field.
> ... it's a question of being able to express things in a way that gives more information to a compiler, allowing it to create executables that run faster.
There is expressing idea via code, and there is optimization of code. They are different. Writing what one may think is "fully optimized code" the first time is a mistake, every time, and usually not possible for a codebase of any significant size unless you're a one-in-a-billion savant.
Programming languages, like all languages, are expressive, but only as expressive as the author wants to be, or knows how to be. Rarely does one write code and think "if I'm not expressive enough in a way the compiler understands, my code might be slightly slower! Can't have that!"
No, people write code that they think is correct, compile it, and run it. If your goal is to write the most perfect code you possibly can, instead of the 95% solution that is robust, reliable, maintainable, and testable, you're doing it wrong.
Rust is starting to take up the same mental headspace as LLMs: they're both neat tools. That's it. I don't even mind people being excited about neat tools, because they're neat. The blinders about LLMs/Rust being silver bullets for the software industry need to go. They're just tools.
>in Rust and C++, the standard library sorting functions are templated on the comparison function. This means it's much easier for the compiler to specialize the sorting function, inline the comparisons and optimize around it.
I think this is something of a myth. Typically, a C compiler can't inline the comparison function passed to qsort because libc is dynamically linked (so the code for qsort isn't available). But if you statically link libc and have LTO, or if you just paste the implementation of qsort into your module, then a compiler can inline qsort's comparison function just as easily as a C++ compiler can inline the comparator passed to std::sort. As for type-specific optimizations, these can generally be done just as well for a (void *) that's been cast to a T as they can be for a T (though you do miss out on the possibility of passing by value).
That said, I think there is an indirect connection between a templated sort function and the ability to inline: it forces a compiler/linker architecture where the source code of the sort function is available to the compiler when it's generating code for calls to that function.
qsort is obviously just an example, this situation applies to anything that takes a callback: in C++/Rust, that's almost always generic and the compiler will monomorphize the function and optimize around it, and in C it's almost always a function pointer and a userData argument for state passed on the stack. (and, of course, it applies not just to callbacks, but more broadly to anything templated).
I'm actually very curious how good C compilers are at specializing situations like this; I don't actually know. In the vast majority of cases, the C compiler will not have access to the code (either because of dynamic linking, as in this example, or because the definition is in another translation unit), but what if it does? Either with static linking and LTO, or because the function is marked "inline" in a header? Will C compilers specialize as aggressively as Rust and C++ are forced to do?
If anyone has any resources that have looked into this, I would be curious to hear about it.
If you choose to put a boundary in your code that makes it span over several binaries, so that they can be swapped out at runtime, no compiler in any language can optimize that away, because that would be against the interface you explicitly chose. That's what dynamic linking aka. runtime linking is in C.
This is not an issue for libc, because the behaviour of that is not specified by the code itself, but by the spec, which is why C compilers can and do completely remove or change calls to libc, much to the distress of someone expecting a portable assembler.
Dynamic linking will inhibit inlining entirely, and so yes qsort does not in practice get inlined if libc is dynamically linked. However, compilers can inline definitions across translation units without much of any issue if whole program optimization is enabled.
The use of function pointers doesn't have much of an impact on inlining. If the argument supplied as a parameter is known at compile time then the compiler has no issue performing the direct substitution whether it's a function pointer or otherwise.
My point is that the real issue is just whether or not the function call is compiled as part of the same unit as the function. If it is, then, certainly, modern C compilers can inline functions called via function pointers. The inlining itself is not made easier via the template magic.
Your C comparator function is already "monomorphized" - it's just not type safe.
Wouldn't C++ and Rust eventually call down into those same libc functions?
I guess for your example, qsort(), it is optional, and you can choose another implementation of it. Though I tend to find that both standard libraries just delegate those lowest-level calls to the POSIX API.
Obviously. How about more complex things like multi-threading APIs though? Can the Rust compiler determine that the subject program doesn't need TLS and produce a binary that doesn't set it up at all, for example?
Optimising out TLS isn't going to be a good example of compiler capability. Whether another thread exists is a global property of a process, and beyond that the system that process operates in.
The compiler isn't going to know for instance that an LD_PRELOAD variable won't be set that would create a thread.
> Say the program is not dynamically linked. Still no?
Whether the program has dynamic dependencies does not dictate whether a thread can be created, that's a property of the OS. Windows has CreateRemoteThread, and I'd be shocked if similar capabilities didn't exist elsewhere.
If I mark something as thread-local, I want it to be thread-local.
I mean, it’s not that obvious, your parent asked about it directly, and you could easily imagine calling it libc for this.
I believe the answer to your question is "yes", because no-std binaries can be mere bytes in size, but I suspect that more complex programs will almost always have some dependency somewhere (possibly even the standard library, but I don't know offhand) that uses TLS somewhere in it.
There was a contest for which language the fastest tokenizer could be written in. I entered my naive 15 minutes Rust version and got second place among roughly 30 entries. First place was hand-crafted assembly.
I am not saying Rust is always faster. But it can be a damn performant language even if you don't think about performance too deeply or twist yourself into pretzels to write performant code.
And in my book that counts for something. Because yes, I want my code to be performant, but I'd also not have it blow up on edge cases, have a way to express limitations (like a type system) and have it testable. Rust is pretty good even if you ignore the hype. I write audio DSP code on embedded devices with a strict deadline in C++. I plan to explore Rust for this, especially now since more and more embedded devices start to have more than one processor core.
> On the other hand, signed integer overflow being UB would count for C/C++
C and C++ don't actually have an advantage here because this is only limited to signed integers unless you use compiler-specific intrinsics. Rust's standard library allows you to make overflow on any specific arithmetic operation UB on both signed and unsigned integers.
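For concreteness, a sketch of the menu Rust's integer types offer (`unchecked_add` is the UB-on-overflow variant, an unsafe method on the primitive integer types):

```rust
fn main() {
    let x: u8 = 250;
    assert_eq!(x.wrapping_add(10), 4);   // defined two's-complement wrap
    assert_eq!(x.checked_add(10), None); // overflow reported, never UB
    // Opting in to C-style semantics: overflow here would be UB,
    // which is exactly what lets the optimizer assume it can't happen.
    let y = unsafe { x.unchecked_add(5) };
    assert_eq!(y, 255);
}
```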
It's interesting, because it's a "cultural" thing like the author discusses, it's a very good point. Sure, you can do unsafe integer arithmetic in Rust. And you can do safe integer arithmetic with overflow in C/C++. But in both cases, do you? Probably you don't in either case.
"Culturally", C/C++ has opted for "unsafe-but-high-perf" everywhere, and Rust has "safe-but-slightly-lower-perf" everywhere, and you have to go out of your way to do it differently. Similarly with Zig and memory allocators: sure, you can do "dynamically dispatched stateful allocators that you pass to every function that allocates" in C, but do you? No, you probably don't, you probably just use malloc().
On the other hand: the author's point that the "culture of safety" and the borrow checker in Rust frees your hand to try some things in Rust which you might not in C/C++, and that leads to higher perf. I think that's very true in many cases.
Again, the answer is more or less "basically no, all these languages are as fast as each other", but the interesting nuance is in what is natural to do as an experienced programmer in them.
C++ isn't always "unsafe-but-high-perf". Move semantics are a good example. The spec goes to great lengths to ensure safety in a huge number of scenarios, at the cost of performance. Mostly shows up in two ways: one, unnecessary destructor calls on moved out objects, and two, allowing throwing exceptions in move constructors which prevents most optimizations that would be enabled by having move constructors in the first place (there was an article here recently on this topic).
Another one is std::shared_ptr. It always uses atomic operations for reference counting and there's no way to disable that behavior or any alternative to use when you don't need thread safety. On the other hand, Rust has both non-atomic Rc and atomic Arc.
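A minimal sketch of that distinction (both types are in std; the program itself is only illustrative):

```rust
use std::rc::Rc;
use std::sync::Arc;
use std::thread;

fn main() {
    // Rc: plain, non-atomic reference counting. It is !Send, so the
    // compiler rejects moving it into another thread.
    let local = Rc::new(vec![1, 2, 3]);
    let also_local = Rc::clone(&local); // cheap non-atomic increment
    drop(also_local);

    // Arc: atomic reference counting, allowed to cross threads.
    let shared = Arc::new(vec![1, 2, 3]);
    let handle = {
        let shared = Arc::clone(&shared); // atomic increment
        thread::spawn(move || shared.len())
    };
    assert_eq!(handle.join().unwrap(), 3);
}
```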
> one, unnecessary destructor calls on moved out objects
That issue predates move semantics by ages. The language has always had very simple object lifetimes: if you create Foo foo; it will call foo.~Foo() for you, even if you called ~Foo() before. Anything with more complex lifetimes either uses new or placement new behind the scenes.
> Another one is std::shared_ptr.
From what I understand shared_ptr doesn't care that much about performance because anyone using it to manage individual allocations already decided to take performance behind the shed to be shot, so they focused more on making it flexible.
C++11 totally could have started skipping destructors for moved out values only. They chose not to, and part of the reason was safety.
I don't agree with you about shared_ptr (it's very common to use it for a small number of large/collective allocations), but even if what you say is true, it's still a part of C++ that focuses on safety and ignores performance.
Bottom line - C++ isn't always "unsafe-but-high-perf".
The rust standard library does make targeted use of unchecked arithmetic when the containing type can ensure that that overflow never happens and benchmarks have shown that it benefits performance. E.g. in various iterator implementations. Which means the unsafe code has to be written and encapsulated once, users can now use safe for loops and still get that performance benefit.
The main performance difference between Rust, C, and C++ is the level of effort required to achieve it. Differences in level of effort between these languages will vary with both the type of code and the context.
It is an argument about economics. I can write C that is as fast as C++. This requires many times more code that takes longer to write and longer to debug. While the results may be the same, I get far better performance from C++ per unit cost. Budgets of time and money ultimately determine the relative performance of software that actually ships, not the choice of language per se.
I've done parallel C++ and Rust implementations of code. At least for the kind of performance-engineered software I write, the "unit cost of performance" in Rust is much better than C but still worse than C++. These relative costs depend on the kind of software you write.
I like this post. It is well-balanced. Unfortunately, we don't see enough of this in discussions of Rust vs $lang. Can you share a specific example of where the "unit cost of performance" in Rust is worse than C++?
I generally agree with your take, but I don't think C is in the same league as Rust or C++. C has absolutely terrible expressivity, you can't even have proper generic data structures. And something like small string optimization that is in standard C++ is basically impossible in C - it's not an effort question, it's a question of "are you even writing code, or assembly".
Yes, it is the difference between "in theory" and "in practice". In practice, almost no one would write the C required to keep up with the expressiveness of modern C++. The difference in effort is too large to be worth even considering. It is why I stopped using C for most things.
There is a similar argument around using "unsafe" in Rust. You need to use a lot of it in some cases to maintain performance parity with C++. Achievable in theory but a code base written in this way is probably going to be a poor experience for maintainers.
Each of these languages has a "happy path" of applications where differences in expressivity will not have a material impact on the software produced. C has a tiny "happy path" compared to the other two.
> On the other hand, signed integer overflow being UB would count for C/C++
Rust defaults to the platform treatment of overflows. So it should only make any difference if the compiler is using it to optimize your code, which will most likely lead to unintended behavior.
Rust's overflow behavior isn't platform-dependent. By default, Rust panics on overflow when compiled in debug mode and wraps on overflow when compiled in release mode, and either behavior can be selected in either mode by a compiler flag. In neither case does Rust consider it UB for arithmetic operations to wrap.
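A small illustration of that default (std::hint::black_box just keeps the compiler from folding the overflow away at compile time):

```rust
use std::hint::black_box;

fn main() {
    let x: u8 = black_box(u8::MAX);
    // Debug build: panics with "attempt to add with overflow".
    // Release build: wraps to 0.
    // Either behavior can be forced with -C overflow-checks=on/off.
    println!("{}", x + 1);
}
```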
Writing a function with UB for overflows doesn't cause unintended behavior if you're doing it to signal there aren't any overflows. And it's very important because it's needed to do basically any loop rewriting.
On the other hand, writing a function that recovers from overflows in an incorrect/useless way still isn't helpful if there are overflows.
This is a tangent, because it clearly didn’t pan out, but I had hope for rust having an edge when I learned about how all objects are known to be immutable or not. This means all the mutable objects can be held together, as well as the immutable ones, and we’d have more efficient use of the cache: memory writes to mutable objects share the cache with other mutable objects, not immutable objects, and the bandwidth isn’t wasted on writing back bytes of immutable objects that will never change.
As I don’t see any reason rust would be limited in runtime execution compared to c, I was hoping for this proving an edge.
I think it would be quite difficult to actually arrange the memory layout to take advantage of this in a useful way. Mutable/immutable is very context-dependent in rust.
Rust doesn't have immutable memory, only access restrictions. An exclusive owner of an object can always mutate it, or can lend temporary read-only access to it. So the same memory may flip between exclusive-write and shared-read back and forth.
It's an interesting optimization, but not something that could be done directly.
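A tiny sketch of that flipping, using nothing beyond std:

```rust
fn main() {
    let mut v = vec![1, 2, 3];

    {
        let r = &v;            // shared, read-only access for this scope
        println!("{:?}", r);
    }                          // shared borrow ends here

    v.push(4);                 // exclusive access again: mutation is allowed

    let r2 = &v;               // and back to shared-read
    println!("{:?}", r2);
}
```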
> For instance, in C, qsort() takes a function pointer for the comparison function, in Rust and C++, the standard library sorting functions are templated on the comparison function.
That's more of a critique of the standard libraries than the languages themselves.
If someone were writing C and cared, they could provide their own implementation of sort such that the callback could be inlined (LLVM can inline indirect calls when all call sites are known), just as it would be with C++'s std::sort.
Further, if the libc allows for LTO (active area of research with llvm-libc), it should be possible to optimize calls to qsort this way.
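For contrast, a minimal sketch of the Rust side of the qsort example, showing why inlining is the default there rather than an LTO special case:

```rust
fn main() {
    let mut v = vec![3_i32, 1, 2];
    // The closure's anonymous type is a generic parameter of
    // sort_unstable_by, so this call site gets its own monomorphized
    // copy of the sort with the comparison inlined, unlike qsort's
    // runtime function pointer.
    v.sort_unstable_by(|a, b| b.cmp(a));
    assert_eq!(v, [3, 2, 1]);
}
```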
"could" and "should" are doing some very theoretical heavy lifting here.
Sure, at the limit, I agree with you, but in reality, relying on the compiler to do any optimization that you care about (such as inlining an indirect function call in a hot loop) is incredibly unwise. Invariably, in some cases it will fail, and it will fail silently. If you're writing performance critical code in any language, you give the compiler no choice in the matter, and do the optimization yourself.
I do generally agree that in the case of qsort, it's an API design flaw
It's just a generic sorting function. If you need more you're supposed to write it yourself. The C standard library exists for convenience not performance.
> That's more of a critique of the standard libraries than the languages themselves.
But we're right to criticise the standard libraries. If the average programmer uses standard libraries, then the average program will be affected (positively and negatively) by its performance and quirks.
I’m not sure about the other UB opportunities, but in idiomatic rust code this just doesn’t come up.
In C, you frequently write for loops with signed integer counters so that the compiler can see the loop must hit its exit condition. In Rust you write for..each loops or invoke heavily inlined functional operators. It all ends up lowering to the same assembly. C++ is the worst here because size_t is everywhere in the standard library, so you usually end up using size_t for the loop counter, negating the compiler's ability to exploit UB.
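A sketch of the idiomatic Rust shape being described (illustrative, not from the thread):

```rust
// The trip count is a structural property of the iterator, so the
// optimizer can reason about termination without needing
// signed-overflow UB.
fn sum(xs: &[u64]) -> u64 {
    let mut total = 0;
    for x in xs.iter().copied() {
        total += x;
    }
    total
}

fn main() {
    println!("{}", sum(&[1, 2, 3]));
}
```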
Interestingly enough, Zig does not use the same terminology as C/C++/Rust do here. Zig has "illegal behavior," which is either "safety checked" or "unchecked." Unchecked illegal behavior is like undefined behavior. Compiler flags and in-source annotations can change the semantics from checked to unchecked or vice versa.
Anyway that's a long way of saying that you're right, integer overflow is illegal behavior, I just think it's interesting.
Your qsort example is basically the same reason people say C++ is faster than Rust. C++ templates are still a lot more powerful than Rust's system, but that's getting closer and closer every day.
They are likely referring to the scope of fine-grained specialization and compile-time codegen that is possible in modern C++ via template metaprogramming. Some types of complex optimizations common in C++ are not really expressible in Rust because the generics and compile-time facilities are significantly more limited.
As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20. In practice, few people have the time or patience to generate this code manually so it doesn't get written.
Effective optimization at scale is difficult without strong metaprogramming capabilities. This is an area of real strength for C++ compared to other systems languages.
Again, can you provide an example or two? It's hard to agree or disagree without an example.
I think all the wild C++ template stuff can be done via proc macros. E.g., in Rust you can add #[derive(Serialize, Deserialize)] to get a highly performant JSON parser & serializer. And that's just lovely. But I might be wrong? And maybe it's ugly? It's hard to tell without real examples.
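A minimal example of that derive, assuming serde and serde_json as dependencies (with serde's "derive" feature enabled):

```rust
use serde::{Deserialize, Serialize};

// The derive macro generates the (de)serialization code at compile time.
#[derive(Serialize, Deserialize, Debug)]
struct Point {
    x: i32,
    y: i32,
}

fn main() {
    let p = Point { x: 1, y: 2 };
    let json = serde_json::to_string(&p).unwrap();           // {"x":1,"y":2}
    let back: Point = serde_json::from_str(&json).unwrap();  // round-trips
    println!("{json} {back:?}");
}
```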
Specialization isn’t stable in Rust, but is possible with C++ templates. It’s used in the standard library for performance reasons. But it’s not clear if it’ll ever land for users.
> As with C, there is nothing preventing anyone from writing all of that generated code by hand. It is just far more work and much less maintainable than e.g. using C++20.
It's also still less elegant, but compile time codegen for specialisation is part of the language (build system?) with build.rs & macros. serde makes strong use of this to generate its serialisation/deserialisation code.
A few years ago I pulled a rust library into a swift app on ios via static linking & C FFI. And I had a tiny bit of C code bridge the languages together.
When I compiled the final binary, I ran llvm LTO across all 3 languages. That was incredibly cool.
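The actual bridge in that anecdote isn't shown, but a hypothetical sketch of the Rust side of such a C FFI surface (the names here are invented; this is library-side code, not a full program) looks like:

```rust
// A #[no_mangle] extern "C" function gives Swift a plain C symbol to
// call through a bridging header declaring
// `int32_t add_i32(int32_t, int32_t);`.
#[no_mangle]
pub extern "C" fn add_i32(a: i32, b: i32) -> i32 {
    a + b
}
```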
I like to say that there are two primary factors when we talk about how "fast" a language is:
1. What costs does the language actively inject into a program?
2. What optimizations does the language facilitate?
Most of the time, it's sufficient to just think about the first point. C and Rust are faster than Python and Javascript because the dynamic nature of the latter two requires implementations to inject runtime checks all over the place to enable that dynamism. Rust and C simply inject essentially zero active runtime checks, so membership in this club is easy to verify.
The second one is where we get bogged down, because drawing clean conclusions is complicated by the (possibly theoretical) existence of optimizing compilers that can leverage the optimizability inherent to the language, as well as the inherent fragility of such optimizations in practice. This is where we find ourselves saying things like "well Rust could have an advantage over C, since it frequently has more precise and comprehensive aliasing information to pass to the optimizer", though measuring this benefit is nontrivial and it's unclear how well LLVM is thoroughly utilizing this information at present. At the same time, the enormous observed gulf between Rust in release mode (where it's as fast as C) and Rust in debug mode (when it's as slow as Ruby) shows how important this consideration is; Rust would not have achieved C speeds if it did not carefully pick abstractions that were amenable to optimization.
It's also interesting to think about this in terms of the "zero cost abstractions"/"zero overhead abstractions" idea, which Stroustrup wrote as "What you don't use, you don't pay for. What you do use, you couldn't hand code any better". The first sentence is about 1, and the second one is about what you're able to do with 2.
I think there's a third question, but I don't know quite how to phrase it. Maybe "how real-world fast is the language?" or "how fast is the language in the hands of someone who isn't obsessively thinking about speed?"
That is, most of the time, most of the users aren't thinking about how to squeeze the last tenth of a percent of speed out of it. They aren't thinking about speed at all. They're thinking about writing code that works at all, and that hopefully doesn't crash too often. How fast is the language for them? Does it nudge them toward faster code, or slower? Are the default, idiomatic ways of writing things the fast way, or the slow way?
Is Javascript significantly slower? It is extremely common in the real world and so a lot of effort has gone into optimizing it - v8 is very good. Yes C and Rust enable more optimizations: they will be slightly faster, but javascript has had a lot of effort put into making it run fast.
Yes. V8 (and other Javascript JIT engines) are very good, with a lot of effort put into them by talented engineers. But there's a floor on performance imposed by the language's own semantics. Of course, if your program is I/O bound rather than CPU bound (especially at network-scale latencies), this may never be noticeable. But a Javascript program will use significantly more CPU, significantly more memory, and both CPU and memory usage will be significantly more variable and less predictable than a program written in C or Rust.
It's complicated, though mostly that complication doesn't change the overall conclusion.
Much of the language's semantics can be boiled away before JIT compilation, because that flexibility isn't in use at that time, which can be proven by a quick check before entering the hot code. (Or in the extreme, the JIT code doesn't check it at all, and the runtime invalidates that code lazily when an operation is performed that violates those preconditions.) Which thwarts people who do simple-minded comparisons of "what language is fastest at `for (i = 0; i < 10000000; i++) x += 7`?", because the runtime is entirely dominated by the hot loop, and the machine code for the hot loop is identical across all languages tested.
Still: you have to spend time JIT compiling. You have to do some dynamic checks in all but the innermost hot code. You have to materialize data in memory, even if just as a fallback, and you have to garbage collect periodically.
So I agree with your conclusion, except for perhaps un-nuanced use of the term "performance floor" -- there's really no elevated JS floor, at least not a global one; simple JS can generate the same or nearly the same machine code as equivalent C/C++/Rust, will use no more memory, and will never GC. But that floor only applies to a small subset of code (which can be the bulk of the runtime!), and the higher floor does kick in for everything else. So generally speaking, JS can only "be as fast" as non-managed languages for simple programs.
(I'll ignore the situations where the JIT can depend on stricter constraints at runtime than AOT-compiled languages, because I've never seen a real-world situation where it helps enough to counterbalance everything else.)
Yes, for most real-world examples JavaScript is significantly slower; JIT isn’t free and can be very sensitive to small code changes, you also have to consider the garbage collector.
Speed is also not the only metric, Rust and C enable much better control over memory usage. In general, it is easier to write a memory-efficient program in Rust or C than it is in JS.
It's not binary. If you try hard enough, I bet you can make an argument that C is faster and you can make an argument that Rust is faster.
There is a set of programs that you can write in C and that are correct, that you cannot write in Rust without leaning into unsafe code. So if by "Rust" we mean "the safe subset of Rust", then this implies that there must be optimal algorithms that can be written in C but not in Rust.
On the other hand, Rust's ownership semantics are like rocket fuel for the compiler's understanding of aliasing. The inability of compilers to track aliasing precisely is a top inhibitor of load elimination in C compilers (so much so that C compiler writers lean into shady nonsense like strict aliasing, and even that doesn't buy very much precision). But a Rust compiler doesn't need to rely on shady imprecise nonsense. Therefore, there are surely algorithms that, if written in a straightforward way in both Rust and C, will be faster in Rust. I could even imagine there are algorithms for which it would be very unnatural to write the C code in a way that matches Rust's performance.
I'm purely speaking theoretically, I have no examples of either case. Just trying to provide my PL/compiler perspective
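To offer one hypothetical example of the aliasing point anyway (a sketch, not a measured claim):

```rust
// Because &mut references may not alias, the compiler is free to keep
// *a in a register across the write to *b; the equivalent C needs
// `restrict` to license the same assumption.
fn update(a: &mut i32, b: &mut i32) {
    *a += 1;
    *b += 1; // cannot invalidate the cached value of *a
    *a += 1;
}

fn main() {
    let (mut x, mut y) = (0, 0);
    update(&mut x, &mut y);
    assert_eq!((x, y), (2, 1));
}
```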
> There is a set of programs that you can write in C and that are correct, that you cannot write in Rust without leaning into unsafe code. So if by "Rust" we mean "the safe subset of Rust"
Well, unsafe rust is part of rust. So no, we don’t mean that.
If programs were primarily unsafe rust, no one would use the language. Unsafe rust is strictly more difficult to write than C in certain constructs, because the complex invariants the rust language requires still apply in unsafe code; there's just no compiler proof that they're being upheld.
I'd argue that he's right: generally it refers to the safe subset, and in practice people relax that a bit, treating a little unsafe as acceptable. But as Steve points out, it really depends on the definitions you choose.
In general "Is programming language X faster than Y" is a meaningless question. It mostly comes down to specific implementations - specific compilers, interpreters, etc.
The only case where one language is likely to be inherently faster than another is when the other language is so high level or abstracted away from the processors it is going to run on that an optimizing compiler is going to have a hard time bridging that gap. It may take more work for an optimizing compiler to generate good code for one language than another, for example by having to recognize when aliasing doesn't exist, but again this is ultimately a matter of implementation not language.
Sure, but not all compilers are created equal and are going to go to the same lengths of analysis to discover optimization opportunities, or to have the same quality of code generation for that matter.
It might be interesting to compare LLVM-generated code (at the same/maximum optimization level) for Rust vs C, which would remove optimizer level of effort as a factor and better isolate difficulties/opportunities caused by the respective languages.
Huh. I expected two main advantages on Rust's side: usable multithreading (as mentioned) and stack allocation. For the latter, the ownership model makes it possible to stack-allocate things that you wouldn't dare put on the stack in either C or C++, thus saving malloc and free time as well as second-order effects from avoiding fragmentation.
Does Rust not do this for subtle reasons that I'm missing, or does it just not matter as much as I'd expect it to?
Both of those things are important, sure. I wanted this post to be talking about the higher level conceptual question, and then using interesting examples to tease out various aspects of that discussion, more than "here's what I think are the biggest differences between the two."
I think these two things are also things people would argue about a lot. It's hard to talk about them in a concrete sense of things, rather than just "I feel like code usually does X".
Right, sorry, I didn't mean to imply that you should have covered stack allocation in your post. I think your post covers the right material to make its point.
This is more of a side comment about a different question, perhaps "ok fine, but then what are the language differences that could be performance-relevant for one language or the other, even if (as you say) they don't lead to a yes/no answer for your original question?"
I think the only reasonable way to interpret this question is "is Rust written by reasonably competent Rust developer spending a reasonable amount of time faster/slower than an equally competent C developer spending the same amount of time".
I don't think a language should count as "fast" if it takes an expert or an inordinate amount of time to get good performance, because most code won't have that.
So on those grounds I would say Rust probably is faster than C, because it makes it much much easier to use multithreading and more optimised libraries. For example a lot of C code uses linked lists because they're easy to write in C, even when a vector would be faster and more appropriate. Multithreading can just be a one line change in Rust.
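A sketch of that one-line change, assuming the commonly used rayon crate as a dependency:

```rust
// The only change from the sequential version is iter() -> par_iter();
// rayon handles work splitting and the thread pool.
use rayon::prelude::*;

fn sum_of_squares(xs: &[i64]) -> i64 {
    xs.par_iter().map(|x| x * x).sum()
}

fn main() {
    let xs: Vec<i64> = (0..1_000).collect();
    println!("{}", sum_of_squares(&xs));
}
```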
Depends. If it takes an assembly programmer 8 hours to implement <X>, can an equally proficient Python programmer spending 8 hours to implement <X> create a faster program?
Let's say they only need 2 hours to get the <X> to work, and can use the remaining 6 hours for optimizing. Can 6 hours of optimizing a Python program make it faster than the assembly program?
The answer isn't obvious, and certainly depends on the specific <X>. I can imagine various <X> where even unlimited time spent optimizing Python code won't produce faster results than the assembly code, unless you drop into C/C++/Zig/Rust/D and write a native Python extension (and of course, at that point you're not comparing against Python, but that native language).
Or honestly, anything involving a hashmap. Of course you can write those in C, but it’s enough friction that most people won’t for minor things. In Rust, it’s trivial, so people are more likely to use them.
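For illustration, the std version really is this short:

```rust
use std::collections::HashMap;

fn main() {
    let mut counts: HashMap<&str, u32> = HashMap::new();
    for word in "the quick the lazy the".split_whitespace() {
        // No hand-rolled hashing or bucket management needed.
        *counts.entry(word).or_insert(0) += 1;
    }
    println!("{counts:?}");
}
```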
The question is what do we mean by "a fast language"? We could mean it to be how fast the fastest code that a performance expert in that language, with no resource constraints, could write. Or, we can restrict it to "idiomatic" code. Or we can say that a fast language is the one where an average programmer is most likely to produce fast code with a given budget (in which case probably none of the languages mentioned here are among the fastest).
It's compilers and compiler optimizations that make code run fast. The real question is whether the Rust language, with its richer memory semantics, gives the Rust compiler a bit more context for optimizing that the C compiler wouldn't have unless you hand-optimize your code.
If you do hand optimize your code, all bets are off. With both languages. But I think the notion that the Rust compiler has more context for optimizing than the C compiler is maybe not as controversial as the notion that language X is better/faster than language Y. Ultimately, producing fast/optimal code in C kind of is the whole point of C. And there aren't really any hacks you can do in C that you can't do in Rust, or vice versa. So, it would be hard to make the case that Rust is slower than C or the other way around.
However, there have been a few rewrites of popular unix tools in Rust that benchmark a bit faster than their C equivalents. Could those be optimized in C? Probably, but they just haven't been. So there is a case for arguing that maybe Rust code is a bit easier to make fast than C code.
> It's compilers and compiler optimizations that make code run fast
Well, then in many cases we are talking about LLVM vs LLVM.
> Ultimately, producing fast/optimal code in C kind of is the whole point of C
Mostly a nitpick, but I'm not convinced that's true. The performance queen has been traditionally C++. In C projects it's not rare to see very suboptimal design choices mandated by the language's very low expressivity (e.g. no multi-threading, sticking to an easier data structure, etc).
Compilers are only as good as the semantics you give them. C and C++ both have some pretty bad semantics in many places that heavily encourage inefficient coding patterns.
> It's compilers and compiler optimizations that make code run fast.
Compiler optimisations certainly play a large role, but they're not the only thing. Tracing-moving garbage collectors can trade off CPU usage for memory footprint and allow you to shift costs between them, so depending on the relative cost of CPU and RAM, you could gain speed (throughput) in exchange for RAM at a favourable price.
Arenas also offer a similar tradeoff knob, but they come with a higher development/evolution price tag.
Right, but if we assume that programmers' compensation is statistically correlated with their skill, then we can drop "average" and just talk about budget.
I think when designing a language, and a set of libraries for it, the designer has an idea of how code for said language should be written, what 'idiomatic' code looks like.
In that context, the designer can reason about how code written that way should perform.
So I think this is a meaningful question for a language designer, which makes it a meaningful question for the users as well, when phrased like this:
'How does idiomatic code (as imagined by the language creators) perform in language X vs Y?'
I may be biased, but suppose you have a budget that's reasonable in the industry for some project size, covering not only the initial development but also maintenance and evolution over the software's lifetime, especially when it's not small (say over 200KLOC). If you want to choose the language that would give you the fastest outcome, you will not get a faster program than if you chose Java. To get a faster program in any language, if possible, would require a significantly higher budget (especially for the maintenance and evolution).
I don't think so, but it may not be far behind. More importantly, though, I'm fairly confident it won't be Assembly, or C, or C++, or Rust, or Zig, but also not Python, or TS/JS. The candidates would most likely include Java, C#, and Go.
Purely by the numbers, an "average programmer" is much more likely to use Javascript, Python, or Java. The native languages have been a bit of a niche field since the late 90's (i.e. heavily slanted towards OS, embedded, and gamedev folks)
Where C application code often suffers, but by no means always, is the use of memory for data structures. A nice big chunk of static memory will make a function fast, but I’ve seen many C routines malloc memory, do a strcpy, compute a bit, and free it at the end, over and over, because there’s no convenient place to retain the state. There are no vectors, no hash maps, no crates.io and cargo to add a well-optimized data structure library.
It is for this reason I believe that Rust, and C++, have an advantage over C when it comes to writing fast code, because it’s much easier to drop in a good data structure. To a certain extent I think C++ has an advantage over Rust due to easier and better control over layout.
I'd certainly agree that malloc is the Achilles heel of any real world C. Overall though C++ was not a particularly good solution to memory efficiency since having OO available made the situation look like a fast sprint to the cake shop.
Heavy smalltalk-style OOP in C++ has kind of died out, especially with data structures. So with any templated data structure you’re reducing indirection from vtables, and you have the opportunity to allocate however you want, often in contiguous slabs to ease memory transfer and caching.
It's easier to write faster code in a language with compile-time facilities such as C++ or Rust than in C. For example, doing this sort of platform-specific optimization in C is a nightmare https://github.com/vitaut/zmij/blob/91f07497a3f6e2fb3a9f999a... (likely impossible without an external pass to generate multiple lookup tables).
> Some people have reported that, thanks to Rust’s checks, they are more willing to write code that’s a bit more dangerous than in the equivalent C (or C++)
I rewrote a C project in Rust some years ago, and in the Rust version I included many optimizations that I probably wouldn't have in C code, thanks to the ability to do them "fearlessly". The end result was so much more performant I had to double check I didn't leave something out!
The article does not mention the possible additional optimisation opportunities that arise in Rust code due to stricter aliasing rules of references. But I don’t have an example in mind. Does anyone know of an example of it happening in real code?
> When noalias annotations were first disabled in 2015 it resulted in between 0-5% increased runtime in various benchmarks.
This leaves us with a few relevant questions:
Were those benchmarks representative of real world code? (They're not linked, so we cannot know. The author is reliable, as far as I'm concerned, but we have no way to verify this off-hand comment directly, I link to it specifically because I'd take the author at their word. They do not make any claim about this, specifically.)
Those benchmarks are for Rust code with optimizations turned off and back on again, not Rust code vs C code. Does that make this a good benchmark of the question, or a bad one?
These were llvm's 'noalias' markers, which were written for `restrict` in C. Do those semantics actually take full advantage of Rust's aliasing model, or not? Could a compiler which implements these optimizations in a different way do better? (I'm actually not fully sure of the latest here, and I suspect some corners would be relying on the stacked borrows vs tree borrows stuff being finalized)
Another issue we have to consider here for the measurements taken then is that it was miscompiling, which, to me, calls into question how much we can trust that performance change.
Additionally, it was 10 years ago and LLVM has changed. It could be that LLVM does better now, or it could do worse. I would actually be interested in seeing some benchmarks with modern rustc.
Many C programs are valid C++ and are faster when compiled with a C++ compiler because of those stricter aliasing and type rules. Like you, though, I have no examples.
That seems very odd - if it's possible to make those optimisations without any additional type data then why wouldn't GCC do that anyway? The benefit of stricter type rules is that more information is available to the compiler. Using a different compiler doesn't inherently increase the amount of type information.
I believe the claim is more precisely stated as "Many C programs are valid C++ and are faster when compiled as C++" - i.e., even though the text of the program didn't change, the rules for interpreting that text changed, and it's that difference in interpretation that permits better optimizations.
When the optimizer knows writes can't change the reads, it can reorder and coalesce them. The main benefit of that is enabling autovectorization in more cases. Otherwise it saves a few loads here and there.
Not exactly real world, but real code example demonstrating strict aliasing rule in action for C++. https://godbolt.org/z/WvMb34Kea Rust should have even more opportunities of this due to restrictions it has for writable references.
There are two main differences between the versions with and without strict aliasing. Without strict aliasing, the compiler can't assume that the result accumulator doesn't change during the loop, so it has to repeatedly read and write it each iteration. With strict aliasing, it can just read it into a register, do the looping, and write the result back once at the end. The second effect is that with strict aliasing enabled, the compiler can vectorize the loop, processing 4 floats at a time; most likely the same uncertainty about the counter prevents vectorization without strict aliasing.
If you want a slightly simpler example, you can disable vectorization by adding '-fno-tree-vectorize'. With it disabled there is still a difference in the handling of the counter.
Using restrict pointers and multiple same type input arrays it would probably be possible to make something closer to real world example.
I believe this advantage is currently mostly theoretical, as the code ultimately gets compiled with LLVM which does not fully utilize all the additional optimization opportunities.
LLVM doesn't fully utilize all the power, but it does use an increasing amount every year. Flang and Rust have both given LLVM plenty of example code and a fair number of contributors who want to make LLVM work better for them.
While people can nitpick, the article is pretty clear that there isn't a single answer. Everything depends on how you constrain the problem. How much experience does the developer have? What time constraints are there? Is it idiomatic code? How maintainable is the code? You can write C with Rust-like safety checks or Rust with C-like unsafety.
When you can directly write assembly with either, comparing performance requires having some constraints.
For what it's worth, I think coding agents could provide a reasonable approximation of what "average" code looks like for a given language. If we benchmark that we'd have some indication of what the typical performance looks like for a given language.
I wrote this at a time when I was pretty anti-LLM, but I do think that you're right that there's some interesting implications of LLM usage in this space. And that's because one version of this question is "what can the average x programmer do compared to the average y programmer in the same amount of time," and I'm curious if LLMs lift all tides here, or not.
Back then the C implementation of the (i.e., "one") micro benchmark beat the Rust implementation. I could squeeze out more performance by precisely controlling the loop unrolling. Nowadays, I don't really care and operate under the assumption that "Python is faster than $X and if it is not, it is still fast enough!"
To answer the headline: No. Rust is not faster than C. C isn't faster than Rust either.
What is fast is writing code with zero abstractions or zero cost abstractions, and if you can't do that (because writing assembly sucks), get as close as possible.
Each layer you pile on adds abstraction. I've never had issues optimizing and profiling C code -- the tooling is excellent and the optimizations make sense. Get into Rust profiling and optimization and you're already in the weeds.
Want it fast? Turn off the runtime checks by calling unsafe code. From there, you can hope and pray like with most LLVM compiled languages.
If you want a stupid fast interpreter in C, you do computed goto, write a comment explaining why it's not, in fact, cursed, and you're done. In C++, Rust, etc. you'll sit there examining the generated code to see if the heuristics detected something that ends up not generating effectively-computed-goto code.
Not to mention panics, which are needed but also have branching overhead.
The only thing that is faster in Rust by default is probably math: you get so many more errors and warnings that catch overflows, casts, etc. that you didn't mean to do. That makes a small difference.
I love Rust. If I want pure speed, I write unsafe Rust, not C. But it's not going to be as fast as trivial C code by default, because the tradeoffs fundamentally differ: Rust is safe by default, and C is efficient by default.
The article makes some of the same points but it doesn't read like the author has spent weeks in a profiler combing over machine code to optimize Rust code. Sadly I have, and I'm not getting that time back.
> it doesn't read like the author has spent weeks in a profiler combing over machine code to optimize Rust code
It is true that this blog post was not intended to be a comprehensive comparison of the ways in which Rust and C differ in performance. It was meant to be a higher level discussion on the nature of the question itself, using a few examples to try and draw out interesting aspects of that comparison.
> If you want a stupid fast interpreter in C, you do computed goto, write a comment explaining why its not, in fact, cursed, and you're done.
Bit of an aside, but these days it might be worth experimenting with tail call interpreters coupled with `musttail` annotations. CPython saw performance improvements over their computed goto interpreters with this method, for example [0].
Definitely the combination of callgrind (valgrind --tool=callgrind) and kcachegrind, or the combination of Hotspot and perf.
I have toyed with Intel's VTune, but I felt it was very hard to get running, so it's discouraging before you even start. That said, if you need a lot of info on cache etc., VTune is fantastic.
Social factors mentioned there can make a big difference. I've seen plenty of C code choose safety over efficiency.
Our team writes a lot of C++ code for high-level stuff you'd normally do in say JS or Python. At the rate we make changes, we can't write very tight code. Strings and other structs end up getting copied needlessly due to ownership, like if something takes vector<string>& and internally copies those strings into a map, we don't bother also making an external-owned version taking vector<string*>&. Or less efficient algorithms are used due to ease of safe implementation. Or there are fewer or less optimized libs available. Or it's a webserver and we have to throw threads at it instead of event loops.
The end result is C++ code that's slower than the equivalent Python code, dev time being equal.
As long as you can get the Rust code to compile, it's about the same speed. The issue is that rustc is only available on limited platforms (and indeed, lack of rustc has killed off entire hardware architectures in popular distros, in a bit of tail wagging the dog), rustc changes in breaking ways (adding new features) every 3 months, and current Rust culture is all bleeding-edge types, so any Rust code you encounter in the wild will require curl rust.up | sh rather than being able to use the 1-year-old Rust toolchain from your repos.
What good is speed if you cannot compile? C has both. Maybe in another decade Rust will have settled down, but for now, wrangling all the incompatible Rust versions makes C the far better option. And no, setting cargo versions doesn't fix this. It's not something you'd run into writing Rust code within a company, but it's definitely something you run into trying to compile other people's Rust code.
I've never run into this issue in the wild. It sounds like a hypothetical. Upgrading your Rust toolchain is ridiculously easy, and using a year old outdated toolchain is more or less a philosophical hang up than a technical one.
It's not a hypothetical. I was put off the entire language after it happened to me three times in a row with unrelated software written in Rust. This was 1 month after Debian 12 was released (June 10, 2023), and I was running the brand new Debian 12 with rustc 1.63.0 from August 11, 2022. I ran into it with some web serial spidering and epub creation software (rust-wildbow-scraper). I ran into it with a software-defined radio spectrogram visualizer (plotsweep); I actually knew the author from IRC, and he was able to edit it to not use bleeding-edge rustc features, so I managed to compile it. I can't recall the third. In the years since, whenever I've dipped my toes into "compile random Rust programs with the repo Rust toolchain," it's been the same.
But as you can see from my specific examples and dates: this is not a hypothetical. Rust developer culture basically only writes for the latest; having a 1-year-old rustc is definitely not enough, and yes, installing compilers from a random website (curl site|sh) instead of my distro's repos is a problem.
Just because it hasn't happened to you doesn't mean it isn't a problem. Rust is a rolling release only compiler.
No, it's not. Your own misunderstandings of rust stable vs nightly and editions and using an unofficial installation method for a toolchain are not Rust's shortcomings. Sorry.
Ah, I see you are confused. I am claiming that Rust is only for rolling distros because of its need for its unique "official installation method," owing to the rapid addition of features that makes rustc need to be upgraded constantly. These facts are not in dispute, though one's appreciation of the consequences apparently is.
One bit I'm surprised isn't mentioned is Rust's "zero cost abstractions": in C/C++ the cost of similar efforts at a given pattern can vary a lot, and may be dramatically different from Rust's default selection. Even in Rust there will often be other options that are relatively easy to use but could have dramatic differences in performance for a specific use case.
These variances pretty much mean that trying to compare with other "low level" languages is far from an apples to apples comparison.
So, to answer the question, "It depends." ... In the end, I think developers tend to optimize for a preferred style or ergonomics over hard technical reasons... it's mostly opinion, IMO.
Aren't there any scenarios where a C compiler (without assistance by the developer) must be defensive about aliasing, in a way that the Rust compiler doesn't have to be?
I guess you could argue that C would reach the same speed because noalias is part of C as well. But I'd say that the interesting competition is for how fast idiomatic and "hand-optimized" (no unrolling, no aliasing hints etc) code is.
Comparing programming languages "performance" only makes sense if comparing idiomatic code. But you could argue that noalias in C is idiomatic. But you could equally well argue that multi threading in Rust is more idiomatic than it is in C and so on. That's where it becomes interesting (and difficult) to quantify.
Yes. Your point about noalias (the keyword is 'restrict' in C, noalias is the LLVM IR annotation) is right.
What I will say is that the fact that Rust uses this so much, and had to turn it off because of all the bugs it shook out, at least implies that it's not used very much in real-world C code. I don't know how to more scientifically analyze that, though.
Yes, in general Rust is faster than C, I would argue, because some problems that hinder C performance, such as strict aliasing and volatile data, simply don't exist in Rust, and immutable const propagation and const evaluation work too.
Yes, the same way is that Fortran is faster than C due to stricter aliasing rules.
But in practice C, Rust and Fortran are not really distinguishable on their own in larger projects. In larger projects things like data structures and libraries are going to dominate over slightly different compiler optimizations. This is usually Rust's `std` vs `libc` type stuff or whatever foundational libraries you pull in.
For most practical Rust, C, C++, Fortran and Zig have about the same performance. Then there is a notable jump to things like Go, C# and Java.
> In larger projects things like data structures and libraries are going to dominate over slightly different compiler optimizations.
At this level of abstraction you'll probably see on average an effect based on how easy it is to access/use better data structures and algorithms.
Both the ease of access to those (whether the language supports generics, how easy it is to use libraries/dependencies), and whether the population of algorithms and data structures available are up to date, or decades old, would have an impact.
Theoretically, C is likely faster than Rust, but only by an unnoticeably small margin. Still, this is unavoidable, because Rust works with abstractions that (1) add overhead per se, albeit tiny, and (2) force overhead at the design level.
Practically, that little margin can be removed through engineering effort, as both are proper systems-level programming languages that offer tight control over the generated machine code. That is, this whole discussion is basically pointless if we mix in engineering factors.
We better talk about overall engineering costs, and personally I think Rust would not overshoot C easily, mainly due to the limitations that Rust puts on the higher level designs.
Rust actually sits a few steps above the bare metal, to enforce its safety invariants. Bounds checks (which break auto-vectorization of loops), stack probes, fat pointers (which use up registers), a fixed index type (usize), etc.
There are other hidden costs coming from use of std. Even `Result` is a bit of an inefficiency.
I'm not saying any of these are bad. I'm just saying Rust would be slower than C if *naively* used.
Okay, cool, I can see where you're going with this. I wouldn't exactly agree, because all of these things are stuff you can easily opt out of, but I thought you were maybe suggesting something like "the borrow checker has overhead" which I would take more direct issue with.
(and yeah, the opt out question gets right to what you're saying about "naively used", I saw "unavoidable" but you're not actually saying it's unavoidable.)
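As a sketch of those opt-outs (get_unchecked is std's escape hatch for bounds checks; the assert here stands in for whatever invariant the surrounding code actually maintains):

```rust
// Iterators avoid per-element bounds checks entirely; get_unchecked
// removes one explicitly where an invariant already guarantees the
// index is in range.
fn third(xs: &[u32]) -> u32 {
    assert!(xs.len() > 2);
    // SAFETY: the assert above guarantees index 2 is in bounds.
    unsafe { *xs.get_unchecked(2) }
}

fn main() {
    assert_eq!(third(&[1, 2, 3, 4]), 3);
}
```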
Well, if we define a reasonable yardstick for this, and try to write idiomatic code without low-level hacks and optimization, Rust has the potential to be faster because its more strict aliasing rules might enable optimizations that the C compiler wouldn't try.
However I remember reading a few years back that due to the Rust frontend not communicating these opportunities to LLVM, and LLVM not being designed to take advantage of them, the real-world gains do not always materialize.
Also sometimes people write code in Rust that does not compile under the borrow checker rules, and alleviate this issue either by cloning objects or using RefCell, both of which have a runtime cost.
I think the "social factors" section is the most significant. I've heard plenty of anecdotes of people stating that they "code defensively" in C, because it's too easy to mess things up. Whereas with a language such as Rust, you're able to be more aggressive in your optimizations without feeling like you're walking through a minefield unassisted. The end result is that the language with more constraints lets you be more safely free in your implementation.
Sorry, maybe stupid question. But can't this be decided by some benchmarks, using some of the features in the article that purport to make Rust faster?
Part of what I'm getting at here is that you have to decide what is in those benchmarks in the first place. Yes, benchmarks would be an important part of answering this question, but it's not just one question: it's a bunch of related but different questions.
> But we’re not usually talking about that. We’re usually talking about something in the context of engineering, a specific project, with specific developers, with specific time constraints, and so on. I think that there are so many variables that it is difficult to draw generalized conclusions.
The world would be a more reasonable place if more people took this by heart.
struct field alignment/padding isn't part of the C spec iirc (at least not in the way mentioned in the article), but it's almost always done that way, which is important for having a stable abi
also, if performance is critical to you, profile stuff and compare outputted assembly, more often than not you'll find that llvm just outputs the same thing in both cases
See "6.7.3.2 Structure and union specifiers", paragraph 16 & 17:
> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
> Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared.
so they're ordered, which i didn't dispute, but alignment is implementation defined, so it could be aligned to the biggest field (like in the article), or packed in whatever (sequential) order the particular platform demands, which was my initial point
Ah, sorry, you're right I forgot about alignment. Yes, alignment is implementation defined, paragraph 16:
> Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.
But, I still don't think that what you've said is true. This is because alignment isn't decided per-object, but per type. That bit is covered more fully in 6.2.8 Alignment of objects.
You also have to be able to take a pointer to a (non-bitfield) member, and those pointers must be aligned. This is also why __attribute__((packed)) and such are non-standard extensions.
Then again: I have not passed the C specification lawyer bar, so it is possible that I am wrong here. I'm just an armchair lawyer. :)
It is indeed part of the standard. It says "Within a structure object, the non-bit-field members and the units in which bit-fields reside have addresses that increase in the order in which they are declared"[1], which doesn't allow implementations to reorder fields, at least according to my understanding.
> struct field alignment/padding isn't part of the C spec iirc
It's part of the ABI spec. It's true that C evolved in an ad hoc way and so the formal rigor got spread around to a bunch of different stakeholders. It's not true that C is a lawless wasteland where all behavior is subject to capricious and random whims, which is an attitude I see a lot in some communities.
People write low level software to deal with memory layout and alignment every day in C, have for forty years, and aren't stopping any time soon.
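As a side note in Rust terms, the layout question above can be poked at directly; a minimal sketch:

```rust
use std::mem::size_of;

// By default rustc may reorder fields to shrink padding; #[repr(C)]
// opts into C's declaration-order layout (with the
// implementation-defined alignment discussed above).
#[allow(dead_code)]
struct Reordered {
    a: u8,
    b: u64,
    c: u8,
}

#[allow(dead_code)]
#[repr(C)]
struct DeclOrder {
    a: u8,
    b: u64,
    c: u8,
}

fn main() {
    // Typically 16 vs 24 on a 64-bit target, though exact sizes are
    // ABI-dependent.
    println!("{} {}", size_of::<Reordered>(), size_of::<DeclOrder>());
}
```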
The real question at the core of any production: What's the minimum performance cost we can pay for abstractions that substantially boost development efficiency and maintainability? Just like in other engineering fields, the product tuned to yield the absolute maximum possible value in one attribute makes crippling sacrifices along other axes.
I feel like another optimization that rust code can exploit is uninhabited types.
When combined with generics and sum types, these can lead to entire branches being unreachable at the type level, like Option<!> or Result<T, !>. Rust hasn't stabilized !, but you can declare equivalent types in other ways, such as an empty enum with no variants.
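A minimal sketch of that, using std's Infallible as the uninhabited error type:

```rust
use std::convert::Infallible;

// Infallible has no values, so the Err arm below is statically
// unreachable and the compiler can erase the branch entirely.
fn parse_fixed() -> Result<u32, Infallible> {
    Ok(7)
}

fn main() {
    match parse_fixed() {
        Ok(v) => println!("{v}"),
        Err(never) => match never {}, // no variants: no code generated here
    }
}
```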
Sure, in the Result case, less in the option case. I didn't mention it because Infallible is documented and named specifically as an Error "The error type for errors that can never happen". The use of uninhabited types as an unreachable code optimization is useful beyond errors though.
It's a good reason to choose Rust over C++ for that application, and others that share its characteristics. (Or, more to the point of the article, it's a good reason to declare that Rust is faster than C++ for that application.)
It doesn't provide a lot of evidence in either direction for the rest of the vast space of potential programs.
(Knowing C++ fairly well and Rust not very well, I have Opinions, but they are not very well-informed opinions. They roughly boil down to: Rust is generally better for most programs, largely due to cargo not Rust, but C++ is better for more exploratory programming where you're going to be frequently reworking things as you go. Small changes ripple out across the codebase much more with Rust than C++ in my [limited] experience, and as a result the percentage of programming time spent fixing things up is substantially higher with Rust.)
Only if it is repeatable. We have no information on what they learned in the two failed attempts - it is likely that they learned from the failures and started other architectural changes that enabled the final one to work. As such we cannot say anything about this.
Rust does have some interesting features, which restrict what you are allowed to do and thus make some things impossible but in turn make other things easier. It is highly likely that those restrictions are part of what made this possible. Given infinite resources (which you never have) a C++ implementation could be faster because it has better shared data concepts - but those same shared data concepts make it extremely hard to reason about multi-threaded code and so humanly you might not be able to make it work.
In short, the previous two attempts were done by completely different groups of different people, a few years apart. Your direct question about whether wisdom from these two attempts was shared, either between them, or used by Stylo, isn't specifically discussed though.
> a C++ implementation could be faster because it has better shared data concepts
Data can be modified by any thread that wants to. It is up to you to ensure that modifications work correctly without race conditions. In Rust you can't do this (unsafe aside): the borrow checker rejects data access patterns that can't be proven correct.
Again let me be clear: the things rust doesn't allow are hard to get correct.
Only if there is a data race - if there is no data race C++ lets you do it. Rust doesn't let you do things that don't have a race but cannot be proven within the context of rust to not have a data race.
Is there a common pattern for "Is language X faster than language Y" ?
Like, what is your definition of faster:
faster to develop, to start, to execute, to handle different workloads with the same binaries (like JIT)?
Yes. "Is language X faster than language Y?" means "Is language X better than language Y?", which means "Do you like language X more than you like language Y?", which means "Do you have more experience with language X than language Y?" (well, good experiences, I guess).
So "Is language X faster than language Y?" is totally answerable, but the answer depends on the answerer.
>If we assume C is the ‘fastest language,’ whatever that means
I agree that it has no meaning. Speed(language) is undefined, therefore there is no faster language.
I get this often because python is referred to as a slow language, but since a python programmer can write more features than a C programmer in the same time, at least in my space, it causes faster programs in python, because some of those features are optimizations.
Now speed(program(language,programmer)) is defined, and you could do an experiment by having programmers of different languages write the same program and compare its execution times.
One thing I’ve seen MANY times in C that isn’t an issue in rust, is people using slow linked lists because writing a proper btree or hash map takes substantially more effort.
> An example of this from a long time ago is the Stylo project. Mozilla tried to parallelize Firefox’s style layout twice in C++, and both times the project failed. The multithreading was too tricky to get right. The third time, they used Rust, and managed to ship
I am getting tired of those Rust-promo comments citing Firefox or other projects from Mozilla that also fail.
Mozilla has consistently lost market share with Firefox. Nowadays it pushes things into it that the users do not want, so the death-cycle continues here; the whole AI slop is a wonderful example of this. I even had those things hover out (!) of firefox into other parts of my IceWM desktop. Even if this may be a separate bug or related to nouveau, why are those things I don't need, hovering outside of Firefox to begin with? I never asked or wanted for those things; Mozilla dictated that onto me.
Yet there are people such as Steve who constantly promote Rust, and cite Firefox or Mozilla. Something does not work here; the promo should instead be "thanks to Rust, Firefox is now realistically chasing Chrome again". But this is not happening. So why the promo? You cannot promote a new language by pointing at failing projects. That makes no sense.
It gets brought up because the conversation is not “is Firefox better than Chrome,” the conversation is about Rust’s multithreading guarantees. It’s just an entirely different conversation.
For example, your own beefs with Mozilla have nothing to do with the technical choices made by the code.
I love Betteridge's Law, and so one small thing I was trying to do here was subvert it a bit. Instead of "no," in this case, the answer is "the question is malformed."
This post might get the record for people responding to the title without reading the article. Jeez people, it takes five seconds to discover that it subverts expectations.
tl;dr: Rust officially allows you to write inline assembly so it's fast, but in C it's not officially specified as part of the language. Plus more points which do not actually indicate Rust is faster than C.
... well, that's what I get for reading an article with a silly title.
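(For reference, the feature that tl;dr is pointing at is Rust's stabilized `asm!` macro in `std::arch`. A minimal sketch, x86-64 only:)

    use std::arch::asm;

    fn add_one(mut x: u64) -> u64 {
        // A single register add; `unsafe` because the compiler cannot
        // check the assembly itself. x86-64 only.
        unsafe {
            asm!("add {x}, 1", x = inout(reg) x);
        }
        x
    }

    fn main() {
        assert_eq!(add_one(41), 42);
    }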
That’s not how I would summarize what I wrote, for what it’s worth. My summary would be “the question is malformed; you need to first state what the boundaries of the comparison are before you can draw any conclusions.” I think this is an interesting thing to discuss because many people assume that the answer to “is X faster than C?” is “no” for all values of X.
> many people assume that the answer to “is X faster than C?” is “no” for all values of X.
This is because C does so little for you -- bounds checking must be done explicitly, for instance, as you mention in the article -- so C is "faster" unless you work around Rust's bounds checking. It reminds me of some West Virginia residents I know who are very proud of how low their taxes are -- the roads are falling apart, but the taxes are very low! C is this way too.
C is pretty optimally fast in the trivial case, but once you add bounds checking, error handling, and memory management, its edge over Rust, Zig, and other lowish-level languages is much, much smaller.
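As a sketch of what "working around the bounds checking" looks like in practice (my example, not from the thread): an indexed loop carries a per-element bounds check that the optimizer may or may not elide, while the iterator form expresses the same loop with no check to elide at all:

    // Each `v[i]` carries a bounds check; LLVM can often hoist or remove
    // it here, but that's an optimization, not a guarantee.
    fn sum_indexed(v: &[u64]) -> u64 {
        let mut s = 0;
        for i in 0..v.len() {
            s += v[i];
        }
        s
    }

    // The iterator version never indexes, so there is nothing to elide.
    fn sum_iter(v: &[u64]) -> u64 {
        v.iter().sum()
    }

    fn main() {
        let v: Vec<u64> = (1..=10).collect();
        assert_eq!(sum_indexed(&v), sum_iter(&v));
    }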
In the real world the difference is rarely significant, assuming great programmers implement great algorithms. However, both of those assumptions are rarely true.
I read the post to see how you would answer, not because I was unclear about what the answer would be, because the only possible answer here is “sometimes.” I especially like the point that Rust can be faster because it enables you to write different things. As I never tire of getting downvoted for saying, I’ve improved the speed of a program by replacing C with Python, because nobody could figure out how to write the right thing in C. If even Python can do this, it must apply to just about every pair of languages.
The article felt fairly dispassionate and even-handed to me, and I say this as someone who dislikes Klabnik very much and also dislikes the Rust community (especially its insidious, forced MIT rewrites of popular GPL software, with which they also break backwards compatibility). It is worth mentioning that there are certain things about Rust that conceivably could make it faster, e.g., const by default (theoretically facilitating certain optimizations), but in practice, thus far, do not.
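(The "const by default" point, for anyone unfamiliar: Rust bindings are immutable unless marked `mut`, which in principle gives the optimizer more to work with. A minimal sketch:)

    fn main() {
        let x = 5;      // bindings are immutable by default
        // x += 1;      // error[E0384]: cannot assign twice to immutable variable
        let mut y = x;  // mutability is opt-in
        y += 1;
        println!("{y}");
    }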
> especially its insidious, forced MIT rewrites of popular GPL software
Is this some sort of movement?
I was aware that some Rust software had been released under permissive licenses but I didn't know it was activism besides the obvious C-is-obsolete angle.
It did not contribute to a yes/no answer, which is good, because it is not answerable with "yes" or "no", and the article points that out and explains why. So I would disagree; it does contribute to answering the question, in the form of spelling out why it is unanswerable.
Compare:
"Have you stopped beating your wife yet?"
"I do not beat my wife."
The response contributes to the answer, even if it brings you no closer to "yes" or "no".
As I just posted, any speed comparison needs to be based on specific implementations (compiler A vs compiler B), not languages.
When it comes to assembly, the "compiler" is the person writing the code. While assembly gives you the maximum flexibility to potentially equal or outperform any compiler for any language, not many people have the skill to do that, especially for large programs (which, due to the effort required, are rarely written in assembler). In general there is much more potential for improving the speed of programs by changing the design and using better algorithms, which is where high-level languages offer a big benefit by making this easier.
It depends on what you're writing and what the scope is. Here's a good quote about this from Steve Yegge, from his time at Geoworks (the company that built an entire desktop OS in 8086 assembly as a competitor to Windows for low-end PCs). https://steve-yegge.blogspot.com/2008/05/dynamic-languages-s...
> I went to the University of Washington and [then] I got hired by this company called Geoworks, doing assembly-language programming, and I did it for five years. To us, the Geoworkers, we wrote a whole operating system, the libraries, drivers, apps, you know: a desktop operating system in assembly. 8086 assembly! It wasn't even good assembly! We had four registers! [Plus the] si [register] if you counted, you know, if you counted 386, right? It was horrible.
> I mean, actually we kind of liked it. It was Object-Oriented Assembly. It's amazing what you can talk yourself into liking, which is the real irony of all this. And to us, C++ was the ultimate in Roman decadence. I mean, it was equivalent to going and vomiting so you could eat more. They had IF! We had jump CX zero! Right? They had "Objects". Well we did too, but I mean they had syntax for it, right? I mean it was all just such weeniness. And we knew that we could outperform any compiler out there because at the time, we could!
> The problem is, picture an ant walking across your garage floor, trying to make a straight line of it. It ain't gonna make a straight line. And you know this because you have perspective. You can see the ant walking around, going hee hee hee, look at him locally optimize for that rock, and now he's going off this way, right?
> This is what we were, when we were writing this giant assembly-language system. Because what happened was, Microsoft eventually released a platform for mobile devices that was much faster than ours. OK? And I started going in with my debugger, going, what? What is up with this? This rendering is just really slow, it's like sluggish, you know. And I went in and found out that some title bar was getting rendered 140 times every time you refreshed the screen. It wasn't just the title bar. Everything was getting called multiple times.
> Because we couldn't see how the system worked anymore!
> Small systems are not only easier to optimize, they're possible to optimize. And I mean globally optimize.
Instead, I'd say that Rust & C are close enough, speed-wise, that (1) which one is faster will depend on small details of the particular use case, or (2) the speed difference will matter less than other language considerations.
It’s a bit more fundamental than just "using it badly." The real tension lies in whether a language's safety invariants force a memory layout that is inherently at odds with the CPU cache hierarchy.
In low-latency systems, the true "tax" is often the loss of determinism. If I have to sacrifice a cache-friendly structure or introduce indirection just to satisfy a borrow checker's static analysis, the performance game is already lost, regardless of how "well" I use the language.
To give a concrete example: I previously built a high-frequency bridge for MT4 using a strict Modern C++ stack. I observed that after the initial warm-up, the working set actually settled from 13.6MB down to a stable 11.0MB and stayed there for a 7-day continuous stress test.
This 2.6MB drop was simply the OS reclaiming initialization overhead—a result of manual memory management (via custom pool allocators) preventing heap fragmentation from "pinning" that memory. You don't achieve that level of long-term residency stability by just "using a language well"; you get it by using a toolchain that allows you to treat the hardware as the ultimate source of truth.
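For what it's worth, the usual Rust-side answer to the layout tension described above is to flatten the data into contiguous Vecs and link entries with plain indices rather than references, so the borrow checker sees owned values and the CPU sees a linear walk. A minimal sketch (names hypothetical, not from the MT4 bridge):

    // Orders live contiguously in one Vec; links are u32 indices, not
    // pointers, so there is no lifetime entanglement and no per-node
    // heap indirection.
    struct Order {
        price: u64,
        next_at_level: Option<u32>, // index into `orders`
    }

    struct Book {
        orders: Vec<Order>,
    }

    impl Book {
        fn push(&mut self, price: u64, next_at_level: Option<u32>) -> u32 {
            let idx = self.orders.len() as u32;
            self.orders.push(Order { price, next_at_level });
            idx
        }
    }

    fn main() {
        let mut book = Book { orders: Vec::new() };
        let first = book.push(100, None);
        let second = book.push(100, Some(first));
        assert_eq!(book.orders[second as usize].next_at_level, Some(first));
        assert_eq!(book.orders[first as usize].price, 100);
    }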