Hacker News
Porting Libyaml to Safe Rust: Some Thoughts (simonask.github.io)
127 points by agluszak 11 months ago | 42 comments



> The interesting takeaway is this: For all of the different roles that yaml_string_t takes on, it turns out there is an abstraction in the Rust standard library that matches each use case precisely. At all points in the libyaml source code where strings are used, the use case maps directly to one of the several many ways of dealing with strings in Rust. Phew

Reminder! Our forefathers knew and loved ownership concepts, but lacked the tools to do anything but mentally map them. We stand on the shoulders of giants.


Great writeup. This part stuck out to me:

> The justification for "unsafe" C and C++ libraries is often that some performance is left on the table with safe Rust, primarily due to bounds checking. On modern hardware this justification has always seemed a bit suspicious to me, and I think this is another piece of (circumstantial) evidence that it often just doesn't matter in practice

That's too strong a statement, even though the sentiment is correct. Most software will not see a bottleneck from bounds checking because it's not in the fast path. All we can say in this case is that the work libyaml does is not impacted by bounds checking. That's not because of the hardware but because of software inefficiencies, or because the problem being solved isn't sensitive to bounds checking: most likely libyaml is not running at full speed on the underlying hardware anyway, so any bounds checking is just noise because there's so much slack already.

However, I have definitely encountered situations, in very specific circumstances, where bounds checking is a bottleneck when you're trying to actually run at hardware speeds, and unsafe is required to recover performance - of course, I benchmarked very carefully to isolate the specific bounds check that was the problem. I'll also give a shoutout to the assume crate, which I've found a much nicer mechanism for getting the compiler to elide bounds checks in optimized builds while keeping them in debug builds and validating that your assertion is correct. The annoying bit is that you can't ask the Rust compiler to elide all bounds checks for a benchmark to validate their impact.
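For readers unfamiliar with the pattern, here's a minimal sketch of the underlying idea (not the assume crate itself, just plain std): hoisting a single assertion above a loop hands the optimizer the fact it needs to drop the per-iteration bounds checks.

```rust
// One up-front check lets LLVM prove every `data[i]` below is in
// bounds, so the per-iteration checks can typically be elided in
// release builds (while debug builds keep full checking).
fn sum_first_n(data: &[u64], n: usize) -> u64 {
    assert!(n <= data.len()); // single hoisted check
    let mut total = 0;
    for i in 0..n {
        total += data[i]; // check usually optimized away
    }
    total
}

fn main() {
    let v = [1u64, 2, 3, 4];
    println!("{}", sum_first_n(&v, 3));
}
```

The assume crate generalizes this by turning the assertion into an optimizer hint in release builds while keeping it a real check in debug builds.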

Remember: on modern hardware your CPU can execute ~4-6 billion operations/s, read main memory at ~100 GB/s, and access disk millions of times per second at throughputs of ~1-2 GB/s - all consumer-grade hardware. Typical software applications are not targeting full utilization; instead they aim for reasonable performance with a more straightforward solution that can be maintained as cheaply as possible, because the bottleneck cost is typically the programmer's time rather than how fast the software can be made to run.


Unless this was updated after your post, I think it was covered in the following line:

> It's certainly possible to build things where bounds checking makes a difference. This just isn't that.

I will add that idiomatic Rust causes a good number of bounds checks to be optimized away. It's impressive the difference this can make. I maintain a parser combinator library that started as a fork. With some simple changes, I've seen massive performance improvements, and my best guess is that those changes allowed bounds checks to be removed.


Possible I missed it when I read it. Would need the author to indicate whether or not it was edited.

Yes, idiomatic Rust can cause a number of bounds checks to be elided when using iterators. Typically that's when you have all the data that you need to process up front. When you don't have that ability (e.g. data is being fed in piecemeal and you're copying into a temporary buffer), the bounds checking elision is something you need to do yourself.
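A small sketch of the iterator case, assuming you have all the data up front: pairing two slices with `zip` encodes the shared length in the iterator itself, so there's nothing left to check per element.

```rust
// Indexed version: `b[i]` carries a bounds check, because nothing
// proves `a` and `b` have the same length (it panics if they don't).
fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut acc = 0.0;
    for i in 0..a.len() {
        acc += a[i] * b[i];
    }
    acc
}

// Iterator version: `zip` stops at the shorter slice, so there are
// no per-element bounds checks to elide in the first place.
fn dot_zipped(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    println!("{}", dot_zipped(&a, &b)); // 1*4 + 2*5 + 3*6
}
```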


can you elaborate? rust iterators seem quite capable of streaming in the fashion you suggest.


If I have a buffer in a library and the external API is “append to buffer”, each append will do a bounds check. I'm not aware of how an iterator API would work here, especially considering that the iterator would hold ownership of the buffer, which would be incompatible with the call flow returning to the user after each append.
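A hypothetical sketch of that call flow (the `Decoder` type and `append` method are made up for illustration): the caller drives the loop, so the library only ever sees one chunk at a time and must copy it into its own buffer, a checked operation on every call.

```rust
// Hypothetical streaming API: control returns to the caller after
// every append, so no iterator can own the buffer for the whole run.
struct Decoder {
    buf: Vec<u8>,
}

impl Decoder {
    fn new() -> Self {
        Decoder { buf: Vec::new() }
    }

    // Each call crosses the API boundary; the internal copy is where
    // the hard-to-elide capacity/length checks happen.
    fn append(&mut self, chunk: &[u8]) {
        self.buf.extend_from_slice(chunk);
    }
}

fn main() {
    let mut d = Decoder::new();
    d.append(b"ab"); // caller regains control between chunks
    d.append(b"cd");
    assert_eq!(d.buf, b"abcd");
}
```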


Can you share what kind of changes, in your case, improved the performance?


Avoid directly indexing if possible. Use the collection functions and slices instead.
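A sketch of the slice version of that advice: take one subslice up front, so the range check happens once at the slice operation and the iterator walks the window with no further checks.

```rust
// One range check at the slicing step (panics if out of range),
// then a check-free walk over the window.
fn checksum(buf: &[u8], off: usize, len: usize) -> u32 {
    let window = &buf[off..off + len]; // single checked slice
    window.iter().map(|&b| b as u32).sum()
}

fn main() {
    let data = [1u8, 2, 3, 4, 5];
    println!("{}", checksum(&data, 1, 3)); // 2 + 3 + 4
}
```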


I'd add careful use of inlining so the compiler can see both the bounds check and the index to know the bounds check can be removed.


I think the general takeaway is that for most use cases, Rust’s additional safety comes at barely any cost. In some cases the safety gains performance (in not checking thing that the compiler can assert statically), in some cases there’s a hit, but it usually ends up somewhere in the middle.

The takeaway is that Rust’s safety and maintainability generally don’t impose corresponding penalties. And in the cases they do and it’s critical, you can always punt to unsafe.


*nod* Give https://blog.readyset.io/bounds-checks/ a read.

They tried doing a comparison between ReadySet compiled normally and ReadySet with bounds checking removed so thoroughly that they needed to use a patched toolchain to achieve it and found the difference to be within the noise threshold.

Their conclusion was:

> At the end of the day, it seems like at least for this kind of large-scale, complex application, the cost of pervasive runtime bounds checking is negligible. It’s tough to say precisely why this is, but my intuition is that CPU branch prediction is simply good enough in practice that the cost of the extra couple of instructions and a branch effectively ends up being zero - and compilers like LLVM are good enough at local optimizations to optimize most bounds checks away entirely. Not to mention, it’s likely that quite a few (if not the majority) of the bounds checks we removed are actually necessary, in that they’re validating some kind of user input or other edge conditions where we want to panic on an out of bounds access.


As I noted originally, that’s not the reason. It’s because no one is trying to write code that can do hundreds of millions of high level operations per second. There’s so much “inefficiency” in the software stack (or a problem domain where more work is being done in the hot path) that bounds checking is in the noise typically. Readyset’s experiment is flawed because any existence of bounds check on some critical hot path would have already been noticed and fixed as low hanging fruit.

So the remaining bounds check is off the hot path where it doesn’t matter. But in the hot path where you should be bounded by HW limits, it can be a significant slowdown. Prediction can help but it’s not free.

So for most people, they only need to care about bounds checking when they’re doing something in the hot path and even then only when their hot path is running into HW limits. If their hot path is some complicated CPU computation, bounds checking should be but a blip unless you do something stupid like check the bounds too frequently.

So the general advice to not worry too much about bounds checks when writing Rust is directionally correct for the vast majority of people, but recognize it’s incorrect in places and it’s hard to notice because it’s such a small thing hidden in code gen without an easy flag to test the impact.


That's fair.

I still think the "because it's not in the fast path" part of "Most software will not see a bottleneck because of bounds checking because it's not in the fast path" is a bit too much of a blanket statement, and could detract from the admonition to benchmark very carefully before optimizing, but otherwise I agree.


I suspect that information about memory aliasing from the borrow checker (that things are not aliased) allows the compiler to add in many optimizations that C/C++ can't perform, and the benefit from that is similar to the cost of bounds checking.

The memcpy vs. memmove thing comes to mind. So does stuff like iterating over the indices of a pointer to one std::vector while modifying another pointer to a std::vector. If you know you have exclusive access to the vector you are reading, then you know the size isn't changing, so you can just iterate from 0 to size and elide any bounds checking.
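A sketch of that point in Rust: the signature alone proves the two collections don't alias, because `dst` is the only live mutable reference while `src` is borrowed.

```rust
// `&[i32]` and `&mut Vec<i32>` cannot alias under Rust's borrow
// rules, so the compiler knows `src`'s length is fixed for the whole
// loop - the situation C++ can't assume when two pointers might alias.
fn copy_evens(src: &[i32], dst: &mut Vec<i32>) {
    for &x in src {
        if x % 2 == 0 {
            dst.push(x); // cannot invalidate or resize `src`
        }
    }
}

fn main() {
    let src = [1, 2, 3, 4, 5, 6];
    let mut dst = Vec::new();
    copy_evens(&src, &mut dst);
    assert_eq!(dst, [2, 4, 6]);
}
```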

With C++, if the pointers to the vectors could alias, then it will have to bounds check at runtime. Templates and Link time optimization helps with this sort of thing, but I'm guessing Rust still has an advantage.


Throughputs on modern hardware are insane. Latency/time-til-first-byte can often still be pretty sucky.


Care to give an example? A disk read is comparatively slow to compute/memory but it’s still in the tens or hundreds of microseconds if I recall correctly and memory is hundreds of nanoseconds. I am convinced latency issues are a remnant of how we build abstractions into software stacks and not a fault of the hardware.


Cache misses will destroy the performance of any CPU-intensive program. You really really want to do mostly sequential reads of RAM if you're processing a lot of data.


Now have 10 or 15 apps hammering away at the same disk at the same time and see how it behaves. Even SSDs slow down as read sizes go down.


The paradox of bounds checking is that it both rarely matters and always matters.


Tangential C style survey... For code like this:

    if (some_function(some_memory) != ERROR) {
        // ...
    } else {
        goto cleanup;
    }
This approaches what I dub "Happy Path Indentation Syndrome" (HPIS), where the "normal"/"happy path" functionality of the code happens under the `// ...` inside the passing scope.

If you have several such functions that can each fail, this approach means your "happy path" follows the deepening scope / indentation.

Instead, I much prefer styling it like this (assuming, obviously, that the "happy path" happens always):

    if (some_function(some_memory) == ERROR) {
        goto cleanup;
    }
    // ...
How do people approach this kind of situation?


Your preferred version is often called "guard clause" when it's in the beginning of functions. Although there's nothing preventing them from being in the middle...

I do agree that it's much better for readability.


Incidentally the latter is sometimes called railway error handling https://blog.logrocket.com/what-is-railway-oriented-programm...


> If you have several such functions that can each fail, this approach means your "happy path" follows the deepening scope / indentation.

In fact, I once had an interview candidate approach a simple string parsing problem by checking each char in a nested scope, happily going a dozen deep and not seeing a problem!


> goto cleanup;

Isn’t goto a big no no?


Not in C - it doesn't have a better / cleaner mechanism for dealing with error situations.

https://sourcegraph.com/search?q=context%3Aglobal+repo%3A%5E...

https://sourcegraph.com/search?q=context%3Aglobal+repo%3A%5E...


That’s so interesting and goes against everything I’ve been taught during high school and uni!


This is one of those lessons that go:

1. Junior level: don't do it

2. Journeyman level: don't do it

3. Master level: don't do it, except when...


This makes me happy. I recently did a review of the yaml parsers available from Ruby, aiming to improve the error reporting. Unfortunately libyaml is just bad there - many indentation issues at the end of the document will report errors on line 2 with completely lost context. Unfortunately x2, there's no real alternative.

I ended up parsing with libyaml and when something fails, parsing again using node and displaying errors through https://github.com/eemeli/yaml It's terrible but it works better. I hope to be able to use libyaml-safer in the future if it does improve on errors.

Also, having found an input that triggers an assertion failure in libyaml in the past, I'm not trusting it that much.


> In order to compare the performance of unsafe-libyaml and the safe Rust port, I found a relatively large YAML document (~700 KiB) and built a very simple benchmark using Criterion.

Is criterion environment-aware enough to be able to purge/drop the filesystem cache? It might be interesting to see how the performance looks with a cold cache.

> What these numbers do indicate is that the port to safe, idiomatic Rust does not produce meaningfully slower code.

I think that's reasonable and I would even suggest that Rust's value proposition is somewhat predicated on that. A c2rust port like the baseline you started with is probably best as a working crutch from which someone can rewrite a bit here / a bit there.


Disk I/O isn't relevant here, since you're testing the performance of the parser, not the speed of reading the string from disk. I would expect a file this small to be read into a buffer up front, with the benchmark code just invoking the parse function on it. Indeed, the document is small enough that I'd be worried about it hanging out in L3 cache the entire time, which would overestimate the parsing performance; real workloads typically parse strings sitting in random spots in memory.


Criterion has a warmup phase. So I guess both measurements were using a hot cache.


> Mercifully, C does not have exceptions

This made me smile, because, well, yes - C just segfaults. I think there’s a way to handle a segmentation fault? Or is it just the OS that kills the process when it tries to access an invalid memory address?


Yes, at least on Linux, the OS sends a signal (namely SIGSEGV) to the process, which the process may or may not handle. But kill also sends signals (including SIGSEGV if you specify it), so the answer to your question is that it's not really either/or but both.

(edit to add: note that it isn't trivial to handle seg fault in particular because unless you do some trickery you return to the same instruction that caused the segfault, and just get another segfault see e.g.: https://stackoverflow.com/questions/2663456/how-to-write-a-s... )


Signals are very hard to implement properly, and I imagine signal handlers don't compose well.

At some point you'd be better off using `setjmp` and `longjmp` to write your own exceptions from scratch, like Lua does.


I find that you can usually safely disregard the opinion of people who state “<thing> probably doesn’t matter in practice.”

Measure yourselves. Maybe it does matter. Maybe it doesn’t. It doesn’t change that this “it probably doesn’t matter in practice” sentiment is the primary reason a typical web request today is on the order of 20-30x slower than it should be.

For that matter, if you’re using libyaml to parse one yaml file, you’re probably fine. If you’re AWS and use libyaml, you’re definitely going to feel this change.


There are an infinite number of ways to do anything, you can't try them all. Rules of thumb are necessary for making progress.


This is a weird comment in response to a post where they did measure it themselves and spent time discussing the trade offs. If you’re going on some routine rant about kids these days not caring about performance, it seems like an odd place to start.

The AWS comment seems similarly odd given the investments they have made in security by design. This is the kind of work I’d expect them to support since an exploitable bug is a really big deal for them and they have demonstrated a willingness to put engineers on optimizing a safer approach to avoid the 90s dynamic of treating security and performance as opposing concerns.


If you're AWS you could probably put some FTEs on optimizing the new lib too


You cannot call it safer libyaml, when it still allows all unsafe yaml features to happen, creating and destroying objects at will, or calling custom code. A safe yaml, such as my YAML::Safe library, has a white-list of allowed classes, and disables all unsafe features by default.

The safe variant should use libsyck btw, which didn't implement all the new unsafe yaml features.


Rust uses "unsafe"/"safe Rust" as its jargon term that has a specific narrow definition in Rust's context.


Memory unsafe or logically unsafe?


Both. Logical unsafety leading to memory unsafety, leading to exploits. A very popular blogging platform had to close because of this.



