Error Stacking in Rust (greptime.com)
137 points by samanthasu 10 days ago | 102 comments





Ok, so the original idea of Result<T, Error> was that you have to consider and handle the error at each place.

But then people realised that 99% of the time you just want to handle the error by passing it upwards, and so ? was invented.

But then people realised that this loses context of where the error occurred, so now we're inventing call stacks.

So it seems that what people actually want is errors that by default get transferred to their caller and by default show the call stack where they occurred. And we have a name for that...exceptions.

It seems that what we're converging towards is really not all that different from checked exceptions, just where the error type is an enum of possible errors (which can be non-exhaustive) instead of a list of possible exception types (which IIUC was the main problem with Java's checked exceptions).


There's a critical difference between exceptions and what's happening in this article: exceptions create de facto nondeterministic behavior in programs. They cause every line in a function to potentially result in a return from the function with an unexpected type. Rust's error handling requires explicit return statements and explicit return types. This critical difference results in code that is far easier to document and reason about, and that performs slightly better as well.

GP specifically said checked exceptions, which don't create the problems you describe. (They do create other problems, of course.)

And exceptions don't have to be slower than putting errors in return values.

(Having said that, I am still not a proponent of exceptions for error handling.)


> They cause every line in a function to potentially result in a return from the function with an unexpected type.

That’s not non-deterministic. It’s just not statically typed.


This is an example of the non-determinism (in pseudocode):

  try {
    calculation1 = num1 / x;
    calculation2 = num2 / y;
    calculation3 = num3 / z;
  } catch (DivideByZeroError) {
    error("Which line failed?");
  }
If calculation2 was previously initialized to a default value, then how would we know if the calculation was completed before the exception was thrown without adding another 8 lines of boilerplate? This is compounded by other functions being able to throw their own exceptions. Consider:

  func error(msg) {
    display(msg);
    log("Error encountered: ", msg);
    shutdown_program();
  }
If both the display() and log() functions might throw IO exceptions, then how would we know whether or not the error was logged, even if the exceptions were checked, unless we create custom exception types for every possible error?

In conclusion, we don't know with certainty which path was taken through the code's execution, and this is tantamount to non-deterministic behavior.
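For comparison, here is a minimal Rust sketch of the same three divisions with errors as return values. The names `DivError` and `compute` are hypothetical and exist only for illustration. Each step tags its own failure, so the caller can tell which calculations completed. This is one possible encoding, not the only one.

```rust
/// Hypothetical error type: records which of the three divisions failed.
#[derive(Debug, PartialEq)]
enum DivError {
    First,
    Second,
    Third,
}

/// Performs the three divisions from the pseudocode above.
/// `checked_div` returns `None` on division by zero (or overflow),
/// and `ok_or` turns that into an error naming the failing step.
fn compute(nums: (i32, i32, i32), x: i32, y: i32, z: i32) -> Result<(i32, i32, i32), DivError> {
    let c1 = nums.0.checked_div(x).ok_or(DivError::First)?;
    let c2 = nums.1.checked_div(y).ok_or(DivError::Second)?;
    let c3 = nums.2.checked_div(z).ok_or(DivError::Third)?;
    Ok((c1, c2, c3))
}
```

Calling `compute((10, 20, 30), 2, 0, 5)` yields `Err(DivError::Second)`, which says that the first division succeeded and the third was never attempted.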


Non-determinism is when for same input you get a different output. According to that definition, the above code is deterministic.

> how would we know if the calculation was completed before the exception was thrown without adding another 8 lines of boilerplate?

You can't, exceptions or not. With exceptions, you need try..catch blocks. With return values, you need ifs.

But with exceptions, you have an option of not handling an error where it originated, without completely ignoring the error.

With error results, you can handle the error (which usually means just returning it to the caller) or ignore it (potential bugs). In languages with not-so-strong typing, it can be very easy to ignore the error. In languages with strong enough typing, you must handle the error. This can be made nicer with syntax sugar, but still leads to boilerplate. This also has an unfortunate consequence of more branching in the generated assembly, which can lead to poorer performance on the "happy path". Exceptions also have a cost, of course, but it's geared more towards making the happy path fast and the exceptional path slow, which seems like a better tradeoff in most cases.

> If both the display() and log() functions might throw IO exceptions, then how would we know whether or not the error was logged

But without exceptions, your code is simply ignoring any potential errors in display(). And in log() for that matter. If they threw exceptions, it would be impossible to just "swallow" an error silently.

> In conclusion, we don't know with certainty which path was taken through the code's execution, and this is tantamount to non-deterministic behavior.

I would disagree with that. People sometimes see exceptions as somehow "breaking" the program or leading it to an inconsistent state, but that's simply not true.

All you need to realize is that...

1. Every line can throw an exception unless proven otherwise.

2. You don't need to handle an exception close to where it was thrown in most cases. Corollary: you shouldn't use exceptions for control flow. If you notice that your callers use try..catch directly around your function, that might be a sign you should use return values, not exceptions. Return values are not forbidden in languages with exceptions!

3. But you do have to dispose of unmanaged resources. This is probably the main sticking point and does require some discipline.

...and suddenly exceptions stop being awkward and become quite nice to work with.


Rust still has panics, which are more or less the same mess. They are not as bad, but still annoying, especially when writing libraries. When writing a binary, it is at least possible to set panic to abort.

It does seem to be converging somewhere, but a major difference that I really like is pushing humans a little more to care about errors, instead of just letting whatever bubble up from wherever until a catch(...) somewhere.

With checked exceptions, it's very common for the user to end up with only a cryptic message from a leaf function deep inside something, and that's very hard to interpret.

Having a manual stack of meaningful messages that add context is so nice as a user. Even if I do get the stacktrace in a program that threw a deep exception, you typically won't understand anything as a user without access to the code, the stack trace for exceptions is just not meant for human consumption.


> pushing humans a little more to care about errors

This is 100% a reason that I like using SNAFU. The term I use for this is a "semantic stack trace" — a lot of the time, the person experiencing the error doesn't care that it occurred in "foo.rs" or "fn bar()" or "line 123". Instead, they care what the program is trying to do ("open the configuration file", "download the update file").

When I'm putting effort into my errors, I basically never use `snafu::Location` or `snafu::Backtrace`. My error stacks should always be unique — any stack can exactly point to a trace through my program.
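A "semantic stack trace" of this kind can be sketched with the standard library alone. The following is not SNAFU's API; it is a hand-rolled illustration, and the `Context` type, `render_chain`, and `example_error` are hypothetical names. Each layer says what the program was trying to do and points at the lower-level cause through `Error::source`.

```rust
use std::error::Error;
use std::fmt;

/// Hypothetical wrapper: a human-readable description of what the program
/// was trying to do, plus the lower-level error that caused the failure.
#[derive(Debug)]
struct Context {
    doing: &'static str,
    source: Box<dyn Error + 'static>,
}

impl fmt::Display for Context {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "could not {}", self.doing)
    }
}

impl Error for Context {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&*self.source)
    }
}

/// Renders the whole chain, outermost first, by following `source()`.
fn render_chain(err: &dyn Error) -> Vec<String> {
    let mut lines = vec![err.to_string()];
    let mut current = err.source();
    while let Some(inner) = current {
        lines.push(inner.to_string());
        current = inner.source();
    }
    lines
}

/// Builds a two-level example: starting the application failed because
/// opening the configuration file failed because of an I/O error.
fn example_error() -> Context {
    let io = std::io::Error::new(std::io::ErrorKind::NotFound, "file not found");
    let open = Context { doing: "open the configuration file", source: Box::new(io) };
    Context { doing: "start the application", source: Box::new(open) }
}
```

If every `doing` string is used at exactly one place in the program, the rendered chain identifies a unique path through the code without any file or line information.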


The problem with encoding only "what the program is trying to do" in the error is that it only helps users when it's an "expected" situation. For the "open the configuration file" example, it's usually something the user can understand and fix on their own: file is missing, bad permissions, etc.

But errors also need to be useful when reporting bugs to the author of the software. Error context and the error message can't always tell me what specific call stack caused the error, and I will most likely need that when tracking it down. I hesitate to want a backtrace included, as generating those is usually bad for performance, but I think SNAFU's "location" concept is a great compromise.

I see your reply further down about "Users rarely have good access to the developer team", but I just don't buy that line of reasoning. As a developer, I both want to make it as easy as possible for my users to solve problems on their own (so: informative error messages that give the user a good chance of figuring it out themselves), but I'm only human, and I know all the software I write has bugs. So I want my error reporting to have enough information such that the user can contact me and give me as much information as possible about the error, without needing a lot of back and forth, or without me needing to ask them to run things in a debugger or use a special build.

And on top of that, a lot of code is written inside a company, either as a network service, or tooling used only by people inside that company. The developers are very close to the use of that code, and having a lot of information come with errors is essential.

> My error stacks should always be unique — any stack can exactly point to a trace through my program.

That seems like more effort expended when using `snafu::Location` would suffice, without doing extra work that is IMO useless. I'd rather concentrate on other things and have my tools do fiddly, repetitive work for me.
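The general idea of letting the tooling record a location can be sketched with the standard library's `#[track_caller]` attribute and `std::panic::Location`. This is not how `snafu::Location` is implemented as far as this sketch is concerned; `LocatedError` is a hypothetical type used only to show the mechanism.

```rust
use std::fmt;
use std::panic::Location;

/// Hypothetical error type that remembers where it was created.
#[derive(Debug)]
struct LocatedError {
    message: String,
    location: &'static Location<'static>,
}

impl LocatedError {
    /// `#[track_caller]` makes `Location::caller()` report the call site
    /// of `new`, not this line inside the constructor.
    #[track_caller]
    fn new(message: impl Into<String>) -> Self {
        LocatedError { message: message.into(), location: Location::caller() }
    }
}

impl fmt::Display for LocatedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{} (at {}:{})", self.message, self.location.file(), self.location.line())
    }
}
```

Capturing a single `Location` is a pointer to static data, so it is far cheaper than walking the stack for a full backtrace.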


But... It's not the user that is seeing this, it's the developer. You catch at the top of your event loop and you log the stack trace to some place that can be reached by the dev team, be it Jira, some crash reporting tool, etc.

Yeah, but lots of diagnostic work is done by end users in the real world. Users rarely have good access to the developer team, if the team even still exists. Usually there are layers of insulation that mean your problem might be looked at in a few weeks or months only if the company thinks it might be interesting. Meanwhile you have your problem to fix and it is off to stack traces and access logs to try to figure out what went wrong. Maybe some library updated. Maybe there was a permissions change. Maybe some policy change at the OS level. Maybe some external resource went away or changed syntax. It is up to you as the end user to figure it out and fix it, or at least figure out a unique enough error message that you can Google to find someone else with the same problem.

There is nothing more frustrating than a dialog box that says "An error occurred" and then the program shuts down. Frankly I'd rather it crashed hard, at least then I might have some evidence to sift through in the blast zone.


> Yeah, but lots of diagnostic work is done by end users in the real world. Users rarely have good access to the developer team, if the team even still exists.

And hiding details prevents them from being able to know if error X is different from error Y, yes.

It's an unhandled error at that point. You do not know what is relevant, essentially by definition, because otherwise you would have handled it.

Display messages are almost completely unrelated to error handling, and have almost completely unrelated needs. If you decide to combine them, I'm pretty convinced that it's ALWAYS better to show ALL context somewhere, because otherwise troubleshooting frequently becomes impossible. It doesn't have to be a megabyte of stack trace info in a dialog box shown all the time, save it to a file and link to it or something.


The end user might not care, but as the developer I very much care about having a line-accurate backtrace.

When presented with a bug from the field, I also care about finding the path through my code where it occurred, but rarely do I need to know that `foo` called `foo_with_caching` called `foo_with_caching_recursive`. When reading a backtrace, I skip over large amounts of "implementation details" to get a big picture. For me, the exact functions / files / line numbers are not relevant, doubly so if I'm working in a situation where the error message isn't tied to a specific git commit and the functions/files/lines have moved over time.

To reiterate my point from above though, my error stacks are all unique — seeing the stack will point me to an exact line in my code where the error occurred, even though I don't include function/file/line as-is.


I don't really agree. Well, I do agree that often if I'm looking at a backtrace, I will be skipping over a lot of stack frames to find the "simplified" path that still is most useful.

But functions? Yep, absolutely need them. Files? Not quite so much, since it's rare that I'd use the same function name between files. (But sure, throw it in anyway.) Line numbers? No, those can be a big help. If a user reports an issue to me, the first thing I will ask them (if they didn't fill out the issue template properly) is what version they're using (and what git hash, if they've self-compiled from a random git checkout). So I can check out the same version on my laptop, and having a line-accurate trace can be very helpful.

> To reiterate my point from above though, my error stacks are all unique

To reiterate mine, my error stacks often aren't unique, and crafting them such that they would be seems like pointless make-work when there are tools that can make it so I don't need to care about this.

I really don't get this resistance against including this information. It adds little to binary size and removes little from performance, so why not include it? I agree that backtraces do add a lot to binary size and can murder performance, but this "StackedError" concept with function/file/line information seems like essentially the perfect compromise. Just... include it, and stop worrying about it.


> So it seems that what people actually want is errors that by default get transferred to their caller and by default show the call stack where they occurred. And we have a name for that...exceptions.

You've drawn the wrong conclusion - we don't want that by default. We want to choose. In most cases we'll just return the error to the caller, but we don't want that to be the default, because then we can miss critical points where we didn't want to do that.


> show the call stack where they occurred. And we have a name for that...exceptions.

Getting a stack trace isn't a distinguishing feature of exceptions; stack traces predate the notion of exceptions. The distinguishing feature of exceptions is that they're a parallel return path all the way back up to `main` that you can ignore if you don't care to handle the error, or intercept at any level if you do. For some contexts I think this is fine (scripting languages), and for other contexts I think that being forced to acknowledge errors in the main return path is preferable.


I think a lot of it is psychological. Being forced to ask yourself "what do I want to happen if there's an error here?" every single time seems to go a very long way. If the answer is "ignore it" or "bubble it up" then fine, but at least you considered and explicitly answered that question rather than totally forgetting that an unhappy path exists. Default consider vs. default ignore.

That's interesting. To me stack traces + default pass up the stack are the distinguishing features of exceptions.

Suppose we had a version of the ? operator that automatically appended a call stack to the error value returned. Are you saying that that's not "an exception" because I still need to write ? after each fallible function? Or because it's still part of the return type? Or is it specifically only an exception if it works via stack unwinding?
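Something close to that hypothetical `?` can already be approximated in today's Rust, because `?` converts the error through `From`, and a `From` impl can capture a backtrace. The sketch below uses `std::backtrace::Backtrace`; the `Traced` type and `parse_port` function are hypothetical names for illustration.

```rust
use std::backtrace::Backtrace;
use std::num::ParseIntError;

/// Hypothetical error type: the original error plus a backtrace captured
/// at the point where `?` converted it.
#[derive(Debug)]
struct Traced {
    inner: ParseIntError,
    backtrace: Backtrace,
}

impl From<ParseIntError> for Traced {
    fn from(inner: ParseIntError) -> Self {
        // `Backtrace::capture` only records frames when RUST_BACKTRACE or
        // RUST_LIB_BACKTRACE is set; otherwise it returns a disabled
        // placeholder, which keeps the common path cheap.
        Traced { inner, backtrace: Backtrace::capture() }
    }
}

/// The `?` here calls `Traced::from` on failure, so the backtrace is
/// attached without any extra code at the call site.
fn parse_port(text: &str) -> Result<u16, Traced> {
    let port = text.trim().parse::<u16>()?;
    Ok(port)
}
```

The error is still an ordinary value in the return type, and the caller still has to write `?` or handle it, which is the distinction the question above is probing.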


If we're making a distinction between "exceptions" and "errors as return values", then that implies that exceptions are not return values. And so the question to ask to identify each one is: is the error treated the same as a returned value would be? IOW, if it shows up in the usual return type location in a function signature, and if calling this function plops the value into my lap the same as it would for any other value, then it's errors-as-values. Whether or not stack unwinding is used and whether or not a stack trace is provided is an implementation detail. Note that C++ certainly has exceptions, and yet getting a stack trace from them is nontrivial.

I get what you're saying, but this is still very different from (checked) exceptions, both in syntax and ergonomics.

Java's checked exceptions are the worst. Having to declare every exception thrown as a part of your API/ABI makes for brittle, difficult-to-evolve interfaces.

Rust's Result and '?' syntax sidesteps a few of these issues. You can "add" underlying errors to the error return of your function without changing its API/ABI. You don't need to add a bunch of try/catch blocks, cluttering and confusing the code, in order to make sense of this and convert exceptions into whatever your API/ABI specifies. Rust's 'From<>' trait is damn-near magical when it comes to error conversion and propagation.
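As a rough illustration of that point, here is a sketch in which a function gains a second kind of underlying failure while its declared error type stays the same. `ConfigError`, `port_from_str`, and `port_from_file` are hypothetical names. Note that `#[non_exhaustive]` only affects code in other crates; inside the defining crate it changes nothing.

```rust
use std::fs;
use std::io;
use std::num::ParseIntError;

/// Hypothetical public error type. `#[non_exhaustive]` lets variants be
/// added later without breaking `match` expressions in downstream crates.
#[derive(Debug)]
#[non_exhaustive]
enum ConfigError {
    Io(io::Error),
    Parse(ParseIntError),
}

impl From<io::Error> for ConfigError {
    fn from(e: io::Error) -> Self {
        ConfigError::Io(e)
    }
}

impl From<ParseIntError> for ConfigError {
    fn from(e: ParseIntError) -> Self {
        ConfigError::Parse(e)
    }
}

/// Parses a port number; `?` converts `ParseIntError` via `From`.
fn port_from_str(text: &str) -> Result<u16, ConfigError> {
    Ok(text.trim().parse::<u16>()?)
}

/// Reads a port from a file. The declared error type is unchanged even
/// though a second kind of failure (I/O) is now possible.
fn port_from_file(path: &str) -> Result<u16, ConfigError> {
    let text = fs::read_to_string(path)?;
    port_from_str(&text)
}
```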

I get that not everyone is a functional programming enthusiast, but you can't do FP with exceptions. (Well, you can, via a sort of Try monad like Scala has, but it's error-prone and ugly to deal with.) With Result, you can, and it works seamlessly with the rest of the language and syntax.

I don't think Rust's error model is perfect, but it's miles ahead of what I've worked with in most other languages.


I generally disagree with you. I think that Result/Try types are essentially isomorphic to checked exceptions.

> Java's checked exceptions are the worst. Having to declare every exception thrown as a part of your API/ABI makes for brittle, difficult-to-evolve interfaces.

How is this different, in practice, from how it's done in Rust? You have to evolve your Result error type as well. The exact same concerns exist for both. The difference is that you actually have more choice/freedom with Java: you can choose to wrap all of your API's checked exceptions under one base type (analogous to defining a single error type for Result in Rust) so your function throws a single exception type, or you can have your function signature use an ad-hoc union type of several exception types without the boilerplate of wrapping them in a new type. In fact, many people have requested ad-hoc union types in Rust for a long time, because it's so painful to choose between all of your functions returning the same umbrella error type even though it only truly needs a subset of it vs. defining new mostly-redundant error types for each function in your API.

> Rust's Result and '?' syntax sidesteps a few of these issues. You can "add" underlying errors to the error return of your function without changing its API/ABI. You don't need to add a bunch of try/catch blocks, cluttering and confusing the code, in order to make sense of this and convert exceptions into whatever your API/ABI specifies. Rust's 'From<>' trait is damn-near magical when it comes to error conversion and propagation.

As I mentioned above, you can certainly define a base exception type (and you probably should in many cases) in Java, too. Yes, Java's syntax is fairly verbose, but Java's syntax is verbose for almost all of the language. So, is it the checked exception mechanism that is "bad", or is it just that all of Java is verbose? My take is that checked exceptions are, overall, good, and the syntax to work with them in Java is similarly tedious as the rest of the language.

Also, as a tangent, I kind of hate `From<>` in Rust. I think people lean on it way too much. It certainly makes the code shorter and "cleaner", but it also makes it harder to understand because of how implicit it is. And it causes people to miss opportunities where they actually could or should handle an error, just because the types happen to line up so that you can use `?`, instead of thinking about the actual local logic.

> I get that not everyone is a functional programming enthusiast, but you can't do FP with exceptions. (Well, you can, via a sort of Try monad like Scala has, but it's error-prone and ugly to deal with.) With Result, you can, and it works seamlessly with the rest of the language and syntax.

Can you elaborate on this? I feel like Scala's Try and Either are almost exactly the same as Rust's Result.


That's not a particularly novel observation; people have been pointing out the equivalence between checked exceptions and Result types for pretty much forever. See for example this thread from a decade ago: https://news.ycombinator.com/item?id=9545647

Checked exceptions that don't automatically propagate up the call stack to be specific. There's a subtle but incredibly important difference between just "exceptions" and what you're describing.

Yes and no. When a language has exceptions the code is perpetually wrapped in a fallible computational context. When the Result is reified as a type, you have the option (ha!) to write code that the type system guarantees won’t fail. This is nice.

Let’s not talk about panics, shall we?


That's just implementation details. You can absolutely do Result types with unwinding (and some auto-inserted catches) and you can absolutely do exceptions with chained early returns.

The relevant improvement new languages (Rust, Zig, Swift?) bring over old is making it explicit at the callsite what actions throw and how they're composed.


Java's main issue is that its `throws` isn't generic. It forces middleware-like code to choose between `throws Exception` and runtime-only plus boxing... both of which lose ALL details and ruin your compile-time safety.

IMO it just poisoned the well, and now everyone* thinks they don't like checked exceptions, when really they just don't like Java's badly crippled version.


You can have generic `throws` markers; e.g.,

    interface Frobinicator<E extends Exception> {
        void frobinicate() throws E;
    }

Which gives you a single exception type, not a list. Squashing the list of possibilities rather uselessly.

You can work around this with N `T extends Exception`s, but now you have to pick the correct one all the time. And e.g. using it in a `map`-style stream with a final collected throw means picking whether you're adding type N or not. Or possibly multiple new types. It rapidly grows to be unusable.

You also can't make a `class MyException<T>`. Or do a `catch (T e)`. There are a lot of blockages in practice to trying to do any of this - exceptions are very special in the type system, which is the problem.


You definitely won't find me defending Java too often. And I certainly agree that there are frustrating limitations. Like you said, it's annoying that Java does have ad-hoc union types, but only for the throws list in function signatures and for the type specification in catch blocks. So, it's definitely painful that you can't use a similar syntax when implementing something like the generic interface example I wrote.

> You also can't make a `class MyException<T>`. Or do a `catch (T e)`. There are a lot of blockages in practice to trying to do any of this - exceptions are very special in the type system, which is the problem.

Agreed.

But, my entire contention with the discussion around checked exceptions is that everyone found some sharp edges and limitations with Java's checked exceptions and instead of deciding that Java suck{ed,s}, everyone seemed to decide that checked exceptions suck. That was the wrong conclusion, IMO, and I truly believe it has slowed progress in programming language design.

It's only recently that statically typed failure modes are becoming mainstream again (e.g., Rust, Swift, and many third-party libraries for languages like TypeScript and Kotlin among others).

Speaking of streams and combinators like map, Swift has the `rethrows` keyword which is absolutely awesome, IMO. It's this kind of progress that I think we've missed out on from everyone rejecting checked exceptions as a concept for the last decade or so. We threw the baby out with the bathwater.


> ... instead of deciding that Java suck{ed,s}, everyone seemed to decide that checked exceptions suck.

Oh yes, absolutely agreed. It's particularly strange when it comes from people talking about how amazing Rust/Swift/etc ADT errors are - if implemented in a reasonable way, they're expressively identical, so it just becomes "do you like exceptions or returns" and that's much more opinion than fact. Java's checked exceptions are factually bad.


Go's approach has been to treat errors as a linked list, and thus one would explicitly create a chain of errors by wrapping each one as it passes up the stack. The end result would be an error like 'Error Z: Error Y: Error X', as each error in the list is 'unwrapped'.

The lack of any kind of caller information when creating an error makes it quite important to write decent error messages, which I think is actually quite hard to do.

At the same time I think it depends on what you're building: a library should have good errors (ideally well-typed ones too), but in an application you'd benefit from adding logging at each point in the stack (which can then contain caller information like file and line number) rather than just doing the logging at a system boundary; maybe set it at debug level. Then use tracing for the rest of it (for extra visibility in stuff like Sentry).

At least, I feel like that's how you'd be encouraged to do it in Go considering the opinions of Go's creators.


There's also a usability problem.

Handling results with map, map_err and .ok is way easier to follow than the minimum 4 lines you have to add in Java to do anything about a checked exception (try {} catch {}).
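A small sketch of those three combinators on `Result`, with hypothetical function names (`parse_fraction`, `fraction_or_default`) chosen for illustration:

```rust
/// Parses a percentage and scales it to a fraction.
/// `map` transforms the success value; `map_err` rewrites the error.
fn parse_fraction(text: &str) -> Result<f64, String> {
    text.trim()
        .parse::<u32>()
        .map(|percent| f64::from(percent) / 100.0)
        .map_err(|e| format!("bad percentage {text:?}: {e}"))
}

/// `.ok()` discards the error and keeps only an `Option`, here followed
/// by a default for callers that do not care why parsing failed.
fn fraction_or_default(text: &str) -> f64 {
    parse_fraction(text).ok().unwrap_or(1.0)
}
```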

Explicit error handling/ignoring/passing is way better than implicit, so the direction of checked exception is good.

The debate is not really checked exceptions vs Result, it's try/catch vs map_err (and friends). And I will always choose the latter.


But, this isn't you complaining about checked exceptions vs Result. This is you complaining about Java's overall syntax style vs Rust's.

Phrased another way, Java's syntax is fairly verbose for everything, not just for try-catching to handle exceptions.


I don't know any language that has exceptions and also has no try/catch type syntax.

> But, this isn't you complaining about checked exceptions vs Result

Yes, I said so exactly

> The debate is not really checked exceptions vs Result, it's try/catch vs map_err (and friends)

The fundamentals are the same, you are forced to handle/discard/bubble up any error, but in my mind (and I assume a lot of other developers), the word "exception" means try/catch, even though like I said the fundamentals are the same.


A simple usability improvement for try..catch in Java would be to make it an expression, so initializing a variable with a fallible operation no longer requires declaring it outside, which is ugly.

> But then people realised that 99% of the time you just want to handle the error by passing it upwards

This seems like a gross exaggeration

> So it seems that what people actually want is errors that by default get transferred to their caller

Hell no


You're not far off. This is one of my favorite topics in programming language design discussions, and I have opinions that some may even say are "controversial". For what it's worth, I've been writing Rust in production since 2016 (not 100% of my time since then, but I've had a good amount of experience with some decently long-lived projects of varying complexity).

First, I assert that Java's checked exceptions are a solidly good feature. Of course it has flaws. The whole rest of the language is also full of flaws, so that's not surprising.

Second, I assert that there are two things that have caused the vast majority of hate toward Java's checked exceptions: programmers not being taught/shown how and when they're intended to be used, and that oft-circulated interview transcript from 2003 where Anders Hejlsberg asserts that checked exceptions are a language design "dead end". I don't think he was right in 2003, and I especially don't think the opinion is correct today in light of how much strong static typing has really gained favor with the programming community. But, that opinion really took off and we spent years and years seeing that assessment repeated as a truism, which I think is why it took so long to finally start experimenting with statically typed failure modes again (e.g., Rust and Swift).

Now, here's where I'll get controversial about Rust error handling. I'll try really hard to keep this from turning into an entire dissertation, but I'll elaborate if anyone asks.

It is often a mistake to implement the `From` trait for error types and use the `?` operator everywhere. Error types in an API need to be aware of the context in which they occur, so just converting by type only often doesn't make sense. You may encounter a `FooError` type while your app is doing totally different things, so it's likely that not every `FooError` occurrence means the same thing to whoever is calling into your code. Also, sometimes you can actually handle an error, and getting into the muscle memory habit of just tacking `?` on to everything can lead to mistakenly propagating errors that you might have better handled by doing something else (including perhaps panicking).
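One way to act on this, sketched under the assumption of a hypothetical `RequestError` type: leave out the blanket `From` impl so that every call site has to state what the underlying error means in its context before `?` will compile.

```rust
use std::num::ParseIntError;

/// Hypothetical domain error: the same underlying `ParseIntError` means
/// different things depending on which field was being parsed.
#[derive(Debug, PartialEq)]
enum RequestError {
    BadUserId(ParseIntError),
    BadPageSize(ParseIntError),
}

/// No `From<ParseIntError>` impl exists on purpose: each call site has
/// to say what the failure means via `map_err` before using `?`.
fn parse_request(user_id: &str, page_size: &str) -> Result<(u64, u32), RequestError> {
    let user = user_id.parse::<u64>().map_err(RequestError::BadUserId)?;
    let size = page_size.parse::<u32>().map_err(RequestError::BadPageSize)?;
    Ok((user, size))
}
```

With a `From<ParseIntError>` impl, both lines could use a bare `?`, and the distinction between the two failures would be lost.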

There does seem to be a trend toward automatically adding stack traces in Rust errors. This is completely misguided, IMO. And this may be my MOST controversial opinion: stack traces almost *never* belong in a `Result<>` error type. Result types should be relevant to your "domain" (borrowing the term from "Domain Driven Design" even though I do NOT advocate for DDD in general).

Think about it this way: designing an API is about abstraction. So if you write an integer division function that takes two arguments and divides them, it might return `Result<i64, DivideByZero>`. If the caller passes in a 0 divisor, then what business is it of theirs to see what your private functions are called, how many of them are called, and what line of your file they were defined on? That's the leakiest of leaky abstractions.
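A minimal sketch of such a division function. The function name `divide` and the choice to wrap on overflow are assumptions made for this example; only the return type comes from the paragraph above.

```rust
/// The error named in the paragraph above; a unit struct carries no
/// information beyond "the divisor was zero".
#[derive(Debug, PartialEq)]
struct DivideByZero;

/// Integer division whose only domain failure is a zero divisor.
/// Assumption for this sketch: overflow (`i64::MIN / -1`) wraps via
/// `wrapping_div` rather than being reported as a second error.
fn divide(dividend: i64, divisor: i64) -> Result<i64, DivideByZero> {
    if divisor == 0 {
        return Err(DivideByZero);
    }
    Ok(dividend.wrapping_div(divisor))
}
```

The error value tells the caller everything relevant to the domain and nothing about how the function is implemented internally.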

You might be thinking: "But, if I see a result/error value that I didn't expect while running my program, the stack trace will help me track down the issue!" Yeah, no kidding. So, let's also start adding stack traces to our successful values, too! After all, if I call my division function and get back a `Result::Ok` with a weird number that I didn't expect, I might want to trace that back, too, right? (This suggestion is sarcastic to prove a point. It should, hopefully, sound ridiculous to add stack traces to every return value from every function.)

The issue is that Rust's Result (and Java's checked exceptions) require a different paradigm. A Result is in the type signature because it's part of your domain's API design. It's just values. It's not *for* debugging. You use a debugger for that or programmatically panic when something is truly unexpected and get the stack trace from that.

Which leads to the corollary to the previous controversial opinion: Rust has unchecked exceptions; they're called panics and they are 100% *okay to use* in the vast majority of applications that the vast majority of day-job programmers work on.

Obviously, context matters, and there are some places where panicking is unacceptable. But, Result is for expected domain failures. Panics are for programmer errors and unrecoverable constraint violations. And I'm not advocating for panics to be "lazy". Rust code that refuses to ever panic (as far as they know, but I hope they aren't indexing any vecs/arrays just in case!) usually leads to overly polluted error types where it ends up being difficult to understand what errors are actually meaningful and what errors are never actually going to happen. Instead of inspecting errors and figuring out which to handle and how, I've seen things just snowball into a giant mess of nested enums with sometimes redundant error "branches" and missed opportunities to actually handle some cases. If you, as the programmer, know for sure that you just added something to a HashMap earlier in your function and you know you didn't remove it, then for the love of all things sacred, just write `map.get("my-key").unwrap()` (or `.expect("message")`--whatever) instead of making the caller have to consider an error that will never happen, is not their fault, and that they can't do anything about!
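The HashMap case can be sketched as follows. `timeout_secs` and the `"timeout"` key are hypothetical; the point is only that the lookup cannot fail unless this function itself has a bug, so the return type stays a plain `u64`.

```rust
use std::collections::HashMap;

/// Applies overrides on top of a default and returns the timeout.
/// The "timeout" key is inserted unconditionally and never removed, so a
/// missing key would be a bug in this function, not something the caller
/// could handle. `expect` documents that invariant instead of widening
/// the return type to a `Result`.
fn timeout_secs(overrides: &[(&str, u64)]) -> u64 {
    let mut settings: HashMap<&str, u64> = HashMap::new();
    settings.insert("timeout", 30);
    for &(key, value) in overrides {
        settings.insert(key, value);
    }
    *settings
        .get("timeout")
        .expect("\"timeout\" is always inserted above")
}
```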

And, if you do have a situation where panicking is unacceptable (you must be using `#![no_std]`, right??), then don't make a bunch of different error types for all of the possible programmer bugs. Just make a single umbrella `FatalError` type and use that.

For further reading, I really like this piece from the book Real World OCaml, which also has a Result type and exceptions: https://dev.realworldocaml.org/error-handling.html. Specifically, the very last section at the bottom of the page, titled: "Choosing an Error-Handling Strategy". (The old version of that page used to be more plain HTML and the sections had anchors so I could link directly to that section...)

And for further reading about error handling strategy in a no-panic context, I really like the approach described here: https://sled.rs/errors


> Result is for expected domain failures. Panics are for programmer errors and unrecoverable constraint violations.

The problem is that "unrecoverable constraint violations" happen a lot in practice when you're dealing with filesystems, networking...anything that isn't pure computation.

Suppose I have a function that calls other functions that themselves make 3 database queries, two HTTP requests, and reads/writes from a cache directory. It considers all of them (except perhaps the caching) unrecoverable in the context of that function. What should it do?

I see three reasonable options:

(1). return a simple error type saying "Networking failure", "IO Error", etc if any of those fail

(2). return a complex error type that exposes the internal details of all the different things it's doing and which one failed and why

(3). panic if any of them fail

I would argue that (1) is unfit for purpose as you have no idea what's actually going wrong.

And (3) is currently very heavily discouraged, though I think if I'm understanding your argument right it probably makes the most sense. However it leaves your top-level function in the awkward position of needing to make that panic part of its API contract, without the type system to help. It's also highly limiting because the caller now can't distinguish between programmer errors and possibly-transient environmental conditions like a service outage.

(2) is what I'd expect to see in practice right now, and that's what leads to these automatic stack traces, etc. But none of these feel like good options. Ideally I'd want something that is:

- Debuggable (like (2) and (3))

- Part of the type system (like (1) and (2))

- Still allows introspection by the caller (like (1) and (2))

- Doesn't require a ton of boilerplate at each level (like (3), and possibly (1))

(edited for formatting)


No, I don't think you understood the GP's argument. Network and filesystem errors are not always "unrecoverable constraint violations". They're often just simple errors -- things that you should expect to happen, even -- and your (1), or, better, (2), are the most appropriate reactions to those.

"Unrecoverable constraint violations" occur, for example, when you've done a sanity check on some data structure and found that it's in a state that should be impossible, and so continuing from there is unsafe.

Even then, you may choose to handle them in a better way than simply aborting the program. For example, if I'm writing an HTTP service that is backed by a database, and I get a customer request that results in me finding that a column in the database is NULL when it shouldn't be, I'll probably just return a 500 error to the customer rather than panic!(). The assumption is that even though there's a problem with this particular data, that might be the result of an almost-never-hit edge case, and we can still serve other customer requests just fine.

Sure, a simple single-user command-line application may choose to panic!() if a critical data file can't be opened from the filesystem. Maybe that is an "unrecoverable constraint violation" sometimes. But I think there's a lot of nuance you're missing.


It's fun to think through examples like this. But, of course, we need to exercise caution because so much is dependent on the specific contexts of each individual project.

First, I will say that I probably misspoke (mistyped...?) by using the word "unrecoverable". At the end of the day, it's not even really about whether or not something is recoverable, but it's really just about whether the caller might "want" to be aware of it and how much detail the caller needs.

For your example, you end up writing,

> It's also highly limiting because the caller now can't distinguish between programmer errors and possibly-transient environmental conditions like a service outage.

That's the giveaway that the caller needs to know about service outages, specifically. So, you need to handle your HTTP requests and/or database queries in such a way that you can incorporate some of the failures into your function's error type.

But, you SHOULD NOT just implement `From` for converting all of your database library's errors into your function's error type. You have to actually inspect the error returned from the database and return an appropriate error. Specifically, if you're using a SQL db library, it might return an error if your query generates invalid SQL. That should be a panic because it's not a "service outage"; it's a programmer bug in the implementation of the function. Likewise, an auth error is not the same as an outage. If the db library specifically returns an error that it can't make a connection, then that's the one you'd want to wrap in your error type in this example.
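A rough sketch of what "inspect the error instead of a blanket `From`" could look like. `DbError` here is a stand-in for whatever a real database library returns, and `LoadUserError` and `translate_db_error` are invented names. Treating the auth failure as its own variant is also just an assumption for the example.

```rust
/// Stand-in for a database library's error type (hypothetical).
#[derive(Debug)]
pub enum DbError {
    ConnectionFailed(String),
    AuthFailed,
    InvalidSql(String),
}

/// The function's own error type: only failures the caller cares about.
#[derive(Debug, PartialEq)]
pub enum LoadUserError {
    ServiceOutage { detail: String },
    NotAuthorized,
}

/// Deliberately not a `From` impl: each case is looked at on its own.
pub fn translate_db_error(err: DbError) -> LoadUserError {
    match err {
        // Possibly transient, and the caller wants to know about it.
        DbError::ConnectionFailed(detail) => LoadUserError::ServiceOutage { detail },
        // Not an outage, so it gets its own variant (an assumption here).
        DbError::AuthFailed => LoadUserError::NotAuthorized,
        // A bug in the code that built the query: panic instead of
        // widening the error type with a case nobody can handle.
        DbError::InvalidSql(sql) => panic!("bug: generated invalid SQL: {sql}"),
    }
}
```

A call site would then use something like `query(..).map_err(translate_db_error)?`, which keeps the conversion explicit at every use.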

But, again, it all depends on exactly what kind of project we're working on. Your example of doing HTTP, and filesystem, and database queries reminds me of Firefox. Firefox obviously does HTTP stuff, and it uses the filesystem and a SQLite database for settings or configs or something... So, if we were talking about your example function in the context of writing a web browser, then a failed HTTP request is 100% normal and expected because the user's device might connect and disconnect from the internet at any time. So, HTTP failures should be represented in the function's signature. However, since the SQLite database is basically part of the application, itself, any errors when trying to query it are probably panic-worthy. Phrased differently: it's a working assumption of the application that the database is always accessible, so there's no reason to describe failure modes that aren't supposed to ever happen. If the database ever became inaccessible, the top-level main function should catch all panics, log something about them (maybe send off telemetry data, etc), and warn the user that an unexpected error occurred and either tell them to restart the app or just kill ourselves, etc.
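For the "top-level main function should catch all panics" part, a minimal sketch using `std::panic::catch_unwind` might look like the following. `run_guarded` and `panic_message` are invented names, and this only works when the binary is built with the default `panic = "unwind"` strategy (with `panic = "abort"` there is nothing to catch).

```rust
use std::any::Any;
use std::panic::{self, UnwindSafe};

/// Runs `app` and turns a panic into an error message instead of
/// letting it unwind past the top level.
pub fn run_guarded<T>(app: impl FnOnce() -> T + UnwindSafe) -> Result<T, String> {
    panic::catch_unwind(app).map_err(|payload| panic_message(&*payload))
}

/// Extracts the text of a panic payload, if it has one.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(msg) = payload.downcast_ref::<&str>() {
        // `panic!("literal")` carries a `&'static str`.
        (*msg).to_string()
    } else if let Some(msg) = payload.downcast_ref::<String>() {
        // `panic!("formatted {}", x)` carries a `String`.
        msg.clone()
    } else {
        "unknown panic payload".to_string()
    }
}
```

In `main`, the `Err` branch is where the logging, telemetry, and "an unexpected error occurred" message would go. The default panic hook still prints to stderr unless it is replaced.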

Have you ever written a function that returned a `String`? Or a `Vec`? Well, those require memory allocations and they may fail and panic. But, I've never worked in a context where it made sense to try to catch those panics and change those function signatures into `Result<String, OOM>`. My applications choose to assume that enough memory will be available, and I've made the decision to allow the apps to crash and burn if that assumption ends up violated rather than add the large burden of carefully handling that possibility in every line of code in these projects. And, so far, that has been the right call because none of my Rust projects have ever OOM'd yet (and some have literally been running in production for multiple years), and there's really nothing I would want to specifically do if they did-- I'd either figure out how to reduce the memory requirements or increase the server's memory.
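For comparison, this is roughly what opting in to fallible allocation looks like with the standard library's `Vec::try_reserve_exact`. `zeroed_buffer` is a made-up helper. On systems that overcommit memory, a successful reservation is not a hard guarantee, so treat this as a sketch of the API shape rather than a complete OOM strategy.

```rust
use std::collections::TryReserveError;

/// Hypothetical helper: builds a zero-filled buffer and reports
/// allocation failure as a value instead of aborting.
pub fn zeroed_buffer(len: usize) -> Result<Vec<u8>, TryReserveError> {
    let mut buf = Vec::new();
    // Fails with an error instead of aborting when the request cannot be met.
    buf.try_reserve_exact(len)?;
    // The capacity is already reserved, so this does not allocate again.
    buf.resize(len, 0);
    Ok(buf)
}
```

Every caller now has to deal with a `Result`, which is exactly the burden described above.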


> First, I assert that Java's checked exceptions are a solidly good feature.

I agree in theory, but I think they're very poorly implemented, and the syntax and tooling around handling them is terrible. And, frankly, those flaws (yes, I agree everything has flaws) make the overall feature mostly useless, unfortunately. It really doesn't matter where you think all the hate comes from; the hate is there, and it means that very few people use checked exceptions, except for where they're required to when stdlib methods throw them. Ultimately that's all that matters. If no one uses the feature, then it's not a useful feature, regardless of the reasons.

> The issue is that Rust's Result (and Java's checked exceptions) require a different paradigm. A Result is in the type signature because it's part of your domain's API design.

Correct, but in Java, checked exceptions are also a part of the API and ABI, so there's really little difference there, outside of ergonomics. (Which IMO are one of the most important parts!)

> (This suggestion is sarcastic to prove a point. It should, hopefully, sound ridiculous to add stack traces to every return value from every function.)

I don't think that proves a point. Sure, you can argue every proposal into absurdity; it doesn't make the suggestion itself bad.

> Rust has unchecked exceptions; they're called panics and they are 100% *okay to use* in the vast majority of applications that the vast majority of day-job programmers work on.

Yes, and this really bothers me. I wish more people would annotate their functions with `#[no_panic]`. Actually, I wish that was the default, and if you want to write a function that panics or calls functions that can panic, you need to annotate the function with `#[can_panic]`, and the compiler should enforce that, and `rustdoc` should surface that in all documentation.


> You might be thinking: "But, if I see a result/error value that I didn't expect while running my program, the stack trace will help me track down the issue!" Yeah, no kidding. So, let's also start adding stack traces to our successful values, too! After all, if I call my division function and get back a `Result::Ok` with a weird number that I didn't expect, I might want to trace that back, too, right? (This suggestion is sarcastic to prove a point. It should, hopefully, sound ridiculous to add stack traces to every return value from every function.)

I don't think I disagree with the ends you're proposing (don't add stack traces to every value, don't add stack traces specifically to Result::Err(E) variants); however, this is a bad way to justify it. Tools like dtrace / bpftrace do exactly this kind of stack tracing for both success and error cases across entire systems. This is a good thing™, and is actually very useful for both debugging, performance profiling, and understanding what your code is really doing on the hardware.

So I guess I disagree with how you're framing it. I would argue that adding stack traces to every value in Rust would be bad because it is a lot of overhead for something your kernel can and will do better.

> The issue is that Rust's Result (and Java's checked exceptions) require a different paradigm. A Result is in the type signature because it's part of your domain's API design. It's just values. It's not *for* debugging. You use a debugger for that or programmatically panic when something is truly unexpected and get the stack trace from that.

This really is the gist of it. However, I will say that in my experience the reason that Result types are nice (over e.g. exceptions) is that putting the error cases in the type contract means that you can have the compiler check when someone hasn't handled an error case (? and unwrap are "handling" it even if they may not always be appropriate), as well as statically verify which variants may be unused. One very frustrating thing I've had to encounter in C++ is finding a whole list of different errors that have been duplicated as multiple different opaque (e.g. behind a unique_ptr<std::exception> or some such) exceptions across the codebase.

Being able to know what variants of error can come out of an API is great! It just happens that working with a rich type system like Rust makes it possible to do all manner of things that languages-with-only-exceptions cannot.


Yeah, fair point about dtrace, et al, but I think my statement is still fine in context, since we're specifically talking about these Rust libraries that collect stack traces for error types.

And I agree and love having statically checked failure modes! So, if you're choosing to panic in Rust, it better be because of something that is really not able to be handled at all (caveat: the top-level event loop or whatever could catch panics/exceptions, print a "Oops! Something went wrong!" message to the user and then either die or try to keep going, etc, but no handling panics/exceptions in "middle" layers.).


Characterizing people who dislike checked exceptions as either bad programmers or unable to have their own opinion on the matter does not do a great service to your argument.

Yeah, that whole statement there is probably unnecessary and I can see it being off-putting. I'll edit it if I still can.

However, I just want to make it clear that I wasn't intending to call anyone a "bad programmer". At least not in a personally insulting way. We've all been in a position where we were uninitiated at something. And most of us have been in a situation where we've jumped into a new programming language without having any kind of "formal" education on the design, philosophy, and intended best practices. For example, with Java, one should read documents like: https://docs.oracle.com/javase/tutorial/essential/exceptions..., especially this part: https://docs.oracle.com/javase/tutorial/essential/exceptions....

So, again, that part wasn't actually meant as an insult. We're all uneducated about many things at every point in our lives. And I think that lack of education or guidance on designing error types and handling has caused a lot of people to end up burying themselves in checked exception hell, and dismissing the whole thing because of that frustration.

The other part about cargo-culting... well, yeah, that was me insulting people.


I have a theory that what people actually want is something à la named exceptions + forced try/catch with pattern matching + an automatically derived return type.

You can use one error enum type for your app.

For example, my code can fail, and it has only one error type, e.g. AppError,

but I can supplement that with a DB error, cache error, serde error, etc.


Addendum: CLU, Modula-3, and C++ had checked exceptions before Java got the blame.

> Consequently, this also means you cannot define two error variants from the same source type. Considering you are performing some I/O operations, you won't know whether an error is generated in the write path or the read path. This is also an important reason we don't use thiserror: the context is blurred in type.

This is true only if you add the #[from] attribute to a variant. Implementing std::convert::From is completely optional. Personally, I don't like it either, as it makes the context ambiguous. I only use it for "trivially" wrapped errors like eyre::Report.


Yup. I absolutely would throw `#[from]` on everything when I started using thiserror, but now only do so in incredibly obvious cases like

  enum CarWontMove {
      EngineTroubles(EngineTroubles),
      WheelsFellOff(WheelsFellOff),
  }
Even then, there’s often some additional context you can affix at that higher level.

SNAFU follows much the same idea: we have an attribute you can add [0] when you want to allow directly implementing `From`. Like thiserror, you can also mark an error as transparent [1] when even the error existing doesn't provide useful information.

[0]: https://docs.rs/snafu/latest/snafu/derive.Snafu.html#disabli...

[1]: https://docs.rs/snafu/latest/snafu/derive.Snafu.html#delegat...


Using #[from] in a thiserror enum is an antipattern, IMO. I kind of wish it wasn't included at all because it leads people to this design pattern where errors are just propagated upwards without any type differentiation or additional context.

You can absolutely have two different enum variants from the same source type. It would look something like:

    use thiserror::Error;

    #[derive(Debug, Error)]
    pub(crate) enum MyErrorType {
        // Extra format args refer to fields with a leading dot.
        #[error("failed to create staging directory at {}", .path.display())]
        CreateStagingDirectory {
            source: std::io::Error,
            path: std::path::PathBuf,
        },

        #[error("failed to copy files to staging directory")]
        CopyFiles {
            source: std::io::Error,
        },
    }
This does mean that you need to manually specify which error variant you are returning rather than just using ?:

    create_dir(path).map_err(|err| MyErrorType::CreateStagingDirectory {
        source: err, path: path.clone() 
    })?;
but I would argue that that is the entire point of defining a specific error type. If you don't care about the context and only that an io::Error occurred, then just return that directly or use a type-erased error.

This is one of the things I like about SNAFU: it makes this preferred pattern the default and makes it nicer to use. For example, your usage would look something like this with SNAFU:

    create_dir(path).context(CreateStagingDirectorySnafu { path })?;
Note a few points:

1. No need to use the closure

2. No need to carry the source error over yourself (`context` does this for you)

3. No need to explicitly call `clone` on the path (`context` does this for you)


Why does adding `backtrace` to thiserror/anyhow require adding debug symbols?

You'll certainly need it if you want to have human readable source code locations, but doesn't it work with addresses only? Can't you split off the debug symbols and then use `addr2line` to resolve source code locations when you get error messages from end users running release builds?


It should be possible (it'd need to also save memory map), but for some reason Rust's standard library wants to resolve human-readable paths at runtime.

Additionally, Rust has absurdly overly precise debug info.

Even set to minimum detail, it's still huge, and still keeps all of the layers of those "zero-cost" abstractions that were removed from the executable, so every `for` loop and every arithmetic operation has layers upon layers of debug junk.

External debug info is also more fragile. It's chronically broken on macOS (Rust doesn't test it with Apple's tools). On Linux, it often needs to use GNU debuginfo and be placed in system-wide directories to work reliably.


> (it'd need to also save memory map)

Typically the memory map is only required when capturing the backtrace. The stack frames' addresses are then given/stored/printed relative to the binary file sections (with the load-time address subtracted). E.g. SysRq+l on Linux. This occurs at runtime, so saving the memory map in addition to the relative addresses is not necessary.

Not sure if this is viable on all the platforms that Rust supports.

> but for some reason Rust's standard library wants to resolve human-readable paths at runtime.

Ah, I see that Rust's `std::backtrace::Backtrace` is missing any API to extract address information and it does not print the address infos either. Even with the `backtrace_frames` feature you only get a list of frames but no useful info can be extracted.

Hopefully this gets improved soon.

> External debug info is also more fragile.

I use external debug info all the time because uploading binaries with debug symbols to the (embedded) devices I run the code on is prohibitively expensive. It needs some extra steps in debugging but in general it seems to work reliably at least on the platforms I work with. The debugger client runs on my local computer with the debug symbols on disk and the code runs under a remote debugger on the device.

I'm sure there are flaky platforms that are not as reliable.


Your binary usually won't get loaded at the same address in memory. The addresses would be useless without the memory map.

That's solvable though. The bigger problem is how you unwind the stack. The stack is not generally unwindable, unless you're the compiler. Debug symbols include information from the compiler about the stack sizes and shapes to help backtrace with unwinding the stack. It's quite possible to include such symbols in the final binary without adding debug symbols, a lot of compilers just don't have a specification for that.


You don’t need debug symbols to unwind the stack, you just need the .eh_frame section, which compilers emit by default regardless of whether you’re building with debug symbols.

Source: I work on a profiler (Parca) that does stack unwinding. It works fine on Rust binaries with or without debug symbols.


> Your binary usually won't get loaded at the same address in memory.

The addresses you typically see in a backtrace error message (with debug syms disabled) are relative to the sections in the binary file, the runtime address it was loaded at has already been taken into account and subtracted. At least that's how you typically see a backtrace address in a typical native app on Linux.

> The bigger problem is how you unwind the stack.

Rust can unwind the stack on panic when built without debug symbols.


> Then, to be able to translate the stack pointer we will need to include a large debuginfo in our binary. In GreptimeDB, this means increasing the binary size by >700MB (4x compared to 170MB without debuginfo).

Surely that's comparing full debuginfo, right? Backtraces just need symbols, not full debuginfo, and there's no way the symbols are 4x the size of the binary.


There's also split-debuginfo, which allows emission of debug info into a separate file, rather than needing to distribute it in the binary. Then they could capture stack traces, and resolve the symbols later if necessary. That would also address their concern about how long it takes to capture a stack trace, because just gathering the addresses themselves is quick.

The simplest thing that "just works" for me is replacing ? with .context(h!())? and this macro:

    // Expands to a string literal naming the call site.
    #[macro_export]
    macro_rules! h {
        () => {
            concat!("at ", file!(), " line ", line!(), " column ", column!())
        };
    }

and then using anyhow::Result.

Solves 99% of problems in error handling
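A std-only variation on the same idea, for anyone who wants the location without a macro at each call site: `#[track_caller]` together with `std::panic::Location::caller()`. The `LocatedError` type is invented for this sketch and is not part of anyhow or any other crate.

```rust
use std::fmt;
use std::panic::Location;

/// Hypothetical error type that remembers where it was constructed.
#[derive(Debug)]
pub struct LocatedError {
    pub message: String,
    pub location: &'static Location<'static>,
}

impl LocatedError {
    /// `#[track_caller]` makes `Location::caller()` report the line that
    /// called `new`, not a line inside this function.
    #[track_caller]
    pub fn new(message: impl Into<String>) -> Self {
        Self {
            message: message.into(),
            location: Location::caller(),
        }
    }
}

impl fmt::Display for LocatedError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // `Location` displays as `file:line:column`.
        write!(f, "{} at {}", self.message, self.location)
    }
}

impl std::error::Error for LocatedError {}
```

Note that the attribute does not propagate through closures such as the one passed to `map_err`, so the location captured there would point at the closure body.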


Hey all, I’m the author of SNAFU (mentioned in the article). I’m off to bed now, but I’d be happy to try and answer any questions people might have sometime tomorrow.

I’m glad to see SNAFU was useful to others!


It looks really neat! Two questions:

* Can it be used as a build dependency (i.e., symbols from the snafu crate don't appear in the generated code)?

* I assume you have to use one of the macros (ensure! or location!) when constructing an error that contains a location?


It can't be used as a literal build dependency [0], no. However, the fact that your crate uses SNAFU should [1] be completely hidden from your users. From the outside, you just return a regular enum or struct as your error type. If you were to look at the symbols in the resulting binary, I would expect that you could see references to the trait method `snafu::ResultExt::context` (and similar functions across similar types) depending on how well the code was inlined. If you use other features like `snafu::Location` or `snafu::Report`, those would definitely show up.

You don't have to use the macros, no. When you define your error type, you can mark a field as `#[snafu(implicit)]` [2]. When the error is generated, that field will be implicitly generated via a trait method. The two types this is available for are backtraces and locations, but you could create your own implementations such as grabbing the current timestamp or an HTTP request ID.

[0]: https://doc.rust-lang.org/cargo/reference/specifying-depende...

[1]: There's one tiny leak I'm aware of, which is that your error type will implement the `snafu::ErrorCompat` trait, which is just a light polyfill for some features not present on the standard library's `Error` trait. It's a slow-burn goal to remove this at some point, likely when the error "provider API" stabilizes.

[2]: https://docs.rs/snafu/latest/snafu/derive.Snafu.html#control...


It's technically feasible to add SpanTrace support to thiserror fairly easily (30 mins work - Issue: https://github.com/dtolnay/thiserror/issues/400, PR: https://github.com/dtolnay/thiserror/pull/401). This would solve part of the problem in a way that is meaningfully good for that side of the ecosystem. I suspect you could probably do something similar for Snafu

Without deeply looking into it, I'd expect that to integrate with SNAFU, you could basically write something like this:

    #[derive(Debug)] // needed because `SomeError` below derives `Debug`
    struct SpanTraceWrapper(tracing_error::SpanTrace);
    
    impl snafu::GenerateImplicitData for SpanTraceWrapper {
        fn generate() -> Self {
            Self(tracing_error::SpanTrace::capture())
        }
    }
And then you can use it as

    #[derive(Debug, Snafu)]
    struct SomeError {
        #[snafu(implicit)]
        span_trace: SpanTraceWrapper,
    }
This will capture the `SpanTrace` whenever `SomeError` is constructed (e.g. `thing().context(SomeSnafu)` or `SomeSnafu.fail()`).

Neat :)

What is really annoying with thiserror is the wizard refusal to give us an easy way to print the error chain. No, I don't want to convert it to anyhow just to print the error...
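For what it's worth, walking the chain by hand only needs `std::error::Error::source`. A small sketch, where `format_chain` and `ConfigError` are invented for the example and the `": "` separator is an arbitrary choice:

```rust
use std::error::Error;
use std::fmt;

/// Joins an error and all of its sources into one line.
pub fn format_chain(err: &(dyn Error + 'static)) -> String {
    let mut out = err.to_string();
    let mut current = err.source();
    // Follow `source()` links until the root cause is reached.
    while let Some(cause) = current {
        out.push_str(": ");
        out.push_str(&cause.to_string());
        current = cause.source();
    }
    out
}

/// Hypothetical wrapper error, written by hand to stay std-only.
#[derive(Debug)]
pub struct ConfigError {
    pub source: std::num::ParseIntError,
}

impl fmt::Display for ConfigError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "failed to read config")
    }
}

impl Error for ConfigError {
    fn source(&self) -> Option<&(dyn Error + 'static)> {
        Some(&self.source)
    }
}
```

This works with any type that implements `Error`, including ones derived with thiserror, as long as their `source()` is wired up.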

Rust is full of these. I've found that the community simply falls back on blaming user error, or a failure to understand Rust, when vexed by what are, in my opinion, basic software operations.

As someone who works extensively in cpp/java/python, I want so much to love Rust, but unfortunately I haven't found it to be productive after 6+ side projects.


Rust's community is slightly more fragmented than it should be. The community being built while the language was changing so dramatically (e.g. async) didn't help, but it also is part of what led to Rust in the first place.

But it's still somewhat young, lots of stuff is being built. So some of the lack of productivity probably just comes from not knowing the right stacks yet.


It’s young, but my experience has been that developer ergonomics is not a focus, to the extent that c++ has a much stronger devex story.

same here.

Interesting approach! We had a similar journey at HASH to figuring out how we deal with stacked errors (as well as collecting parallel errors), developed the `error-stack` crate to solve for it. It works by abstracting over the boilerplate needed to stack errors by wrapping errors in a `Report`. Each time you change the context (which is equivalent to wrapping an error) the location is saved as well, with optional spantrace and backtrace support. It also supports supplying additional attachments, to enrich errors. We spent quite a bit of time on the user output, as well (both for `Debug` and `Display`) so hopefully the results are somewhat pleasant to work with and read.

This seems like a userland implementation of Zig's error return traces: https://ziglang.org/documentation/master/#Error-Return-Trace...

A good error report is not only about how it gets constructed; more importantly, it is about what a human can understand from its cause and trace. In this example, we analyzed and showed how to design stacked errors and what should be considered in this process.

    async fn handle_request(req: Request) -> Result<Output> {
        let msg = decode_msg(&req.msg).context(DecodeMessage)?; // propagate error with new stack and context
        verify_msg(&msg)?; // pass error to the caller directly
        Ok(process_msg(msg).await?) // pass error to the caller directly
    }

    // Not async: it is called above without `.await`.
    fn decode_msg(msg: &RawMessage) -> Result<Message> {
        serde_json::from_slice(&msg).context(SerdeJson) // propagate error with new stack and context
    }

How do you capture the virtual stack when `verify_msg` returns an error? Do you have some lint to make sure every error has a context attached?

I don't think you need a lint. When you define the error type returned by `handle_request`, you decide how the error type returned by `verify_msg` will be incorporated. If you've decided to implement `From` then you've decided you don't want/need to add context. Otherwise, the compiler will give you an error when you use `?`.

The one case I can think of where this won't work is when you are reusing error types across places. Recently, I've been experimenting with creating a lot of error types, going as far as one unique error type per function. I haven't done this for long enough to have a real report, but I haven't hated it so far.
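A std-only sketch of that point, with invented types (`DecodeError`, `VerifyError`, `HandleError`) standing in for the article's: the variant with a `From` impl flows through a bare `?`, while the one without it has to be given context at the call site or the code does not compile.

```rust
#[derive(Debug, PartialEq)]
pub struct DecodeError;

#[derive(Debug, PartialEq)]
pub struct VerifyError;

#[derive(Debug, PartialEq)]
pub enum HandleError {
    /// No `From<DecodeError>`: every call site must say what it was doing.
    Decode { source: DecodeError, stage: &'static str },
    /// Has a `From` impl: a deliberate choice to add no context.
    Verify(VerifyError),
}

impl From<VerifyError> for HandleError {
    fn from(err: VerifyError) -> Self {
        HandleError::Verify(err)
    }
}

fn decode(raw: &str) -> Result<u32, DecodeError> {
    raw.parse().map_err(|_| DecodeError)
}

fn verify(n: u32) -> Result<(), VerifyError> {
    if n == 0 { Err(VerifyError) } else { Ok(()) }
}

pub fn handle(raw: &str) -> Result<u32, HandleError> {
    // A bare `decode(raw)?` would be rejected by the compiler here,
    // because `HandleError` has no `From<DecodeError>` impl.
    let n = decode(raw).map_err(|source| HandleError::Decode { source, stage: "handle" })?;
    // Compiles with a bare `?` thanks to the `From` impl above.
    verify(n)?;
    Ok(n)
}
```

So the "lint" is the absence of a `From` impl. It stops helping once the same error type is shared between functions, as noted above.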


Inspired by this blog post I just added an `#[implicit]` field feature to the `thiserror` crate. It makes it easy to automatically annotate errors with things like code location (per this blog post), a timestamp, or a backtrace without requiring further modifications to the thiserror crate. I'm hoping that dtolnay will consider it. You can find my PR here: https://github.com/dtolnay/thiserror/pull/402


Probably also worth mentioning: https://crates.io/crates/error-stack

[flagged]


This seems to be a fairly common sentiment. I consider Rust's syntax fairly consistent and elegant for a curly brace language, but evidently I have some blind spots. What quibbles do you have with Rust's syntax?

The explosion of single character sigils and the taint of C++'s template syntax.

> and the taint of C++'s template syntax.

Interestingly enough, nobody says that when talking about TypeScript…


What languages have better template syntax?

Using that (or similar, e.g. with [] instead of <>) syntax for generics is common across many languages, not just Rust or C++. You may find it ugly, but there's plenty of precedent. What do you think would be better?

The only single-character sigils I can think of are '&', '*', '\'', and maybe '?'. Am I missing any?

The ' for lifetimes as well.

That was the third example I gave.

I kinda feel like macros!() should count.

It's probably the explosion of those characters and punctuation characters like in this example: https://x.com/AndersonAndrue/status/1864457598629540348

To quote Tsoding, https://x.com/tsoding/status/1832631084888080750: "Rust is a very safe, unergonomic language with annoying community and atrocious syntax. Which is somehow surprisingly miles better than modern C++."


> It's probably the explosion of those characters and punctuation characters like in this example: https://x.com/AndersonAndrue/status/1864457598629540348

I feel like there's plenty of places to make criticisms of Rust's syntax, but the example they picked has like half a dozen places where the full path to reference an item is used instead of importing it. Sure, there are languages where you're required to import things rather than referencing them in the full path, but there are also languages where you don't have any flexibility in the paths to use something (e.g. Go and Java) or where they dump literally everything into a single namespace with no context (e.g. C and C++). Using the entire path to something instead of importing it is absurdly less common by at least an order of magnitude over importing things directly, so it's not like people are abusing it all over the place (if anything, people probably import things _more_ than they need to, like with top-level functions that might otherwise make their provenance obvious). Having an option to do things verbosely that most people don't actually do is "unergonomic"? It's like saying the problem with Unix file paths is that using absolute paths for literally everything is ugly; sure, it can be, but that's also just not how 99% of people use them.


The first example doesn’t show anything particularly wrong with Rust syntax. You can construct a troll example like this of an arbitrarily complex type in any language with generics.

The second tweet, as is common for these discussions, doesn’t give any specific example of what’s wrong with Rust syntax.

I remain baffled by how many people hate Rust syntax. I really don’t get it!


I really want to get it, I just don’t quite, at least not yet.

Why is old mate artificially constructing the most non-representative example possible?

Full import paths for map and iter? I’ve never seen anyone double-underscore stuff in Rust. Outside of specific stdlib cases, I haven’t seen anyone manually need to (or do) specify iterator adaptors like that.

They could have at least chosen a representative example. This just comes across as fabricating something to get angry at.


Anderson's example has literally one more sigil than the equivalent C++

What's mandated to be a single character? I'm not sure what the popular style is today.

Which statically typed language do you find most agreeable?

Same question for any language.


Rust's syntax is largely irrelevant to its purpose. If you don't see the need for it, might as well learn something else.

I don't know; if it's only about a few bits of syntax, then it's possible to write a temporary proc-macro for those.


