Rust for C++ Programmers Part 7: Data Types

Nycto · on May 24, 2014

I would love to see an article like this for error handling in Rust. I was really interested in using conditions, but those were apparently backed out. Which leaves error handling through return values and macros. This seems like a step back from Exceptions to me. I want to be convinced otherwise, but I'm struggling to see how this is better than other mechanisms.

kibwen · on May 24, 2014

  > This seems like a step back from Exceptions to me. I 
  > want to be convinced otherwise, but I'm struggling to 
  > see how this is better than other mechanisms.

In a low-level language, guaranteeing memory safety in the face of resumable exceptions would be a nightmare. See Graydon's original post on the choice to avoid exceptions:

https://mail.mozilla.org/pipermail/rust-dev/2013-April/00381...

Selected quote:

  > In particular, to summarize for the impatient: once you get resumable 
  > exceptions, your code can only be correct if it leaves every data 
  > structure that might persist through an unwind-and-catch (that it 
  > acquired through &mut or @mut or the like, and live in an outer frame) 
  > in an internally-consistent state, at every possible exception-point.

  > I.e. you have to write in transactional/atomic-writes style in order to 
  > be correct. This is both performance-punitive and very hard to get 
  > right. Most C++ code simply isn't correct in this sense. Convince 
  > yourself via a quick read through the GotWs strcat linked to:

  > http://www.gotw.ca/gotw/059.htm
  > http://www.gotw.ca/gotw/008.htm

For more on the topic of exception-safety in C++, see the following paper by Bjarne Stroustrup:

http://www.stroustrup.com/except.pdf

I don't think that Rust's error handling solution is ideal, but I think that it might be approaching the best possible solution for its chosen context. Error handling is a hard problem!

kibwen · on May 24, 2014

One last thing that deserves to be mentioned: Rust does have unwinding-on-failure, which is similar to exceptions, with the restriction that unwinding can only be caught at task boundaries. This allows failure in a single component to be isolated and contained. The pertinent distinction here is that the unwinding is not resumable in the normal sense; at best, a parent task can detect that a child task has failed and attempt to restart the task, without having the ability to persist any of the failed task's state.

shasta · on May 24, 2014

> resumable exceptions

The word resumable is important to that quote, which isn't arguing against exceptions in general.

twic · on May 24, 2014

I suspect a similar analysis holds, though. After all, throw-catch style exceptions do involve resumption, just not at the point at which they're thrown. You have a similar challenge in making sure that you haven't left the world in an inconsistent state when an exception emerges from some code.

That said, i am skeptical that there is a significant safety practical difference between the use of checked exceptions, and the use of return values with a try! macro. In both cases, you are forced to acknowledge in the code that an exception can be thrown, which means that you have a chance to do the right thing about consistency.

We could imagine a version of checked exceptions where individual throw sites have to be tagged. A parallel universe version of Java [1] might look like:

  InputStream in = whatever();
  int b = in.read() throw IOException;

Wouldn't that be exactly isomorphic to Rust's use of try! ?

[1] No, not that Parallel Universe version of Java: http://blog.paralleluniverse.co/2014/05/01/modern-java/

TheHydroImpulse · on May 24, 2014

Error handling in Rust is actually pretty awesome. There's a standard `Result<T, E>` type. One can write a potentially fail-able function like:

    fn can_fail(arg: bool) -> Result<(), StrBuf> {
        if arg {
            Ok(())
        } else {
            Err(StrBuf::from_str("Oops! Something went wrong."))
        }
    }

Here, there's no value when the function succeeds `()`. When it fails (I don't mean fail as in a `fail!()` or a panic or anything), you get a string back.

You can pattern match the return value:

    match can_fail(true) {
        Ok(_) => {},
        Err(err) => {}
    }

This can get quite cumbersome, however. That's why there's a `try!` macro that adds composability. The idea is that if you have a function returning a `Result`, wherever that function is being called could also return a `Result`.

     fn higher_up() -> Result<(), StrBuf> {
         try!(can_fail(true));
     }

`try!` is simply:

    match $e { Ok(e) => e, Err(e) => return Err(e) }

This allows error to propagate up the chain.

When you're working in big-ish projects, it'd be best to have something better than a simple `StrBuf` for an error. You'd probably want a struct:

    pub struct LibError {
        message: StrBuf,
        error: Error
    }

    pub enum Error {
        One,
        Two,
        Three
    }

Where `Lib` is the library/project name.

You can then create a new result type based on your new error type:

    type LibResult<T> = Result<T, LibError>;

Then you can use `LibResult<T>` everywhere in your app.

You can view this example done in a few of my own libraries (https://github.com/TheHydroImpulse/gossip.rs/blob/master/src...) and cargo (https://github.com/carlhuda/cargo/blob/master/src/cargo/util...).

That's a simple overview of it. Error handling is super simple, not verbose (thanks to try!) and in your control. Because of Rust's type system, things like `Result<T, E>` is available and are so much better than simple return values (like integers: -1 vs 0 uhhh)

tomjakubowski · on May 24, 2014

In addition to the (very handy) try! macro you can also map over Results and even chain monadic-like operations on them (either on the Ok or Err sides of a Result):

    enum FooError {
        XWasFalse,
        XWasUnknown
    }

    enum BarError {
        VectorTooLarge,
        FooErr(FooError)
    }

    fn foo(x: Option<bool>) -> Result<uint, FooError> {
        match x {
            Some(true) => Ok(42),
            Some(false) => Err(XWasFalse),
            _ => Err(XWasUnknown)
        }
    }

    fn bar() -> Result<Vec<uint>, BarError> {
        foo(None).or_else(|e| {
            // We can recover from an XWasUnknown error returned by
            // Foo, but not from a XWasFalse, so we return the error wrapped
            // in bar's error type.
            match e {
                XWasFalse => Err(FooErr(e)),
                XWasUnknown => Ok(99)
            }
        }).and_then(|n| {
            if n < 100 {
                let vec: Vec<()> = Vec::with_capacity(n);
                Ok(vec)
            } else {
                Err(VectorTooLarge)
            }
        }).map(|vec| {
            vec.iter().map(|_| { 42 }).collect()
        })
    }

edit: added a better example

doe88 · on May 24, 2014

Another really great thing about returning a result Result<T, E> is that your caller code will then must use this return value or otherwise a warning will be emitted by the compiler at compile time. For those interested, more details are provided in the core lib documentation: http://doc.rust-lang.org/core/result/

Pxtl · on May 24, 2014

I tend to practice "only catch what you can handle" in exception-enabled languages - I haven't written in a systems language in almost a decade, mostly bad memories of C. How much does error-handling get in the way when you have to live without stack unwinding?

kibwen · on May 24, 2014

So, normally in Rust, it's no problem to ignore the return value of a function. However, some types are tagged with the `#[must_use]` attribute, which makes it a warning at compile time to ignore the return value of any function that returns that type. Take the following program, which writes a buffer of bytes directly to stdout:

  fn main() {
      let mut out = std::io::stdout();
      out.write([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21]);
  }

The `.write()` method returns a Result type. The output of compiling this program:

  $ rustc pxtl.rs
  pxtl.rs:3:5: 3:53 warning: unused result which must be used, #[warn(unused_must_use)] on by default
  pxtl.rs:3     out.write([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21]);
                ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Again, it's just a warning, so the program will compile and run as expected:

  $ ./pxtl
  Hello!

If you really don't care about the return value here, the simplest (and probably best) way of appeasing this warning is to explicitly ignore the return type by making use of pattern matching:

  fn main() {
      let mut out = std::io::stdout();
      let _ = out.write([0x48, 0x65, 0x6c, 0x6c, 0x6f, 0x21]);
  }

In Rust, the underscore is a pattern that means "I don't care about this thing, completely ignore it". The advantage of using the underscore here rather than an actual variable, e.g. `let x = out.write(...)`, is that it will be impossible to refer to the return value later on and thus explicitly expresses your intent to ignore it. (Furthermore, if you assigned the return value to a variable and then didn't use it later on, Rust would emit yet another warning, this time for having an unused variable.)

The warning message alludes to a second way of silencing this error, which is by sticking the `#[allow(unused_must_use)]` attribute on top of your function. This will silence any warnings that arise from that function. If you wanted to disable this warning for your entire program, you could instead stick the `#![allow(unused_must_use)]` global attribute at the top of your program. Alternatively, you could compile the program with the `--allow unused_must_use` flag to completely silence all warnings of this type.

(One final note: in all cases where you see word "allow" used above, if you replace it with "deny" it will turn the warning into a compile-time error, thus enabling you to enforce a more rigorous error-handling strategy if you so choose.)

teacup50 · on May 24, 2014

"only catch what you can handle" is incredible nonsense. Your code is the only code that knows how the code it's calling might fail -- it MUST catch all exceptions and either handle them, or re-raise them with a well defined type that is documented and declared in your API.

Anything else just leads to buggy software that has a try/catch block at the top level of the event loop/main/thread start function to deal with all the errors that leak out of its implementation and leave the process in an undefined state.

Exceptions are simply broken and awful. Java does them sorta right with checked exceptions, but the only safe thing is to not do them at all.

Pxtl · on May 24, 2014

Obviously you should be converting exceptions that leave your library into other areas, but internally? Converting exceptions over and over and over again just means losing information from those exceptions, or worse hiding them. If I'm forced to dump a stack-trace to the text file, I want the exact exception that caused the problem, not some vague "Operation Exception" that quintuply wraps my actual desired exception, or worse completely threw it out to "cleaned it up for me" and tells me nothing about what went wrong.

I just helped a teammate work through a bug the other day where somebody decided to "handle" a case-sensitivity problem in their home-brewed SqlLite data-access code by simply returning null for the data member if you got the wrong case. This resulted in improperly-cased column names producing objects with null members - no error happened because they were valid SQL queries, but the dictionary-reading code was silently failing when it was reading the result-set. If the program had just blown up when there was a miss on the dictionary of column names? We would've quickly found out about that stupid case-sensitivity.

Defensive coding just means your bugs go non-local and become data problems instead of exceptions.

teacup50 · on May 24, 2014

That doesn't make any sense. Defensive coding means not silently discarding errors (by returning null, in this case), and has nothing at all to do with exceptions.

As for rewrapping errors, yes, each subsystem should have its own error space. You don't lose data by nesting errors; on the contrary, each level can add additional context to an error result that makes debugging an unexpected issue far easier.

Daishiman · on May 24, 2014

Tell that to the Erlang guys, who have been writing some of the most fault-tolerant code of the past two decades with an explicit catch-what-you-can-handle attitude by design.

Their failure model lies in proper task supervision, coding for the expected case, and letting errors propagate up to the task level, where you can either kill a task, log and handle, propagate, or do whatever you wish.

teacup50 · on May 24, 2014

No, Erlang has been writing fault-tolerant code with an explicit functional, immutable design, with very explicit semantics for defining process supervision and restart at every point in the heirarchy.

That's not "catch-what-you-can-handle", that's "use functional programming and pervasive consideration of fault handling to ensure that you can handle faults at any layer".

pohl · on May 24, 2014

Java's checked exceptions are the worst — a failed language experiment if there ever was one.

Thank Guava for Throwables.propagate().

teacup50 · on May 24, 2014

The only failed language experiment are exceptions themselves.

I don't use Java APIs that don't throw checked exceptions; if your code does that, I won't even consider working at your place of business, because that means you don't understand that you've written a massive pile of ill-defined failure-prone code.

Unchecked exceptions are GOTO on steroids, and those GOTOs are part of the API contract. Java makes exception handling explicit and compiler checked -- hacking around checked exceptions makes exception handling implicit and human-checked, meaning that there's absolutely no static verification of a critical component of your API contract.

The problem isn't checked exceptions, the problem is that exceptions suck, and the only way to use them in a way that doesn't expose your code base to implicit GOTO failure modes is to use checked exceptions.

On our production software, we don't use exceptions at all, except where required by an API; instead, we always use monadic error handling. We have an uncaught exception handler for threads/thread pools/etc that does one thing: log the exception, and terminate the running Java process via System.exit(), allowing the process's watchdog to restart the failed process.

By its very nature, an uncaught exception is unexpected and places the process in an unknown state; the only safe thing to do is exit. Since the throwing of an uncaught exception triggers full process failure, it very much encourages defensive, safe practices that ensure that all error cases are handled and compiler-checked.

The result: our code is far more stable and reliable than any other project I've worked on, especially projects that have made use of runtime exceptions.

pohl · on May 24, 2014

I won't even consider working at your place of business...

Praise be! The feeling is mutual. I agree to disagree.

teacup50 · on May 24, 2014

Unfortunately, you'll lower the total value of the ecosystem by producing code and advocating practices that lower the level of reliability and correctness of code -- so agreeing to disagree doesn't really solve the issue that you write bad code.

pohl · on May 24, 2014

So you're saying that it's not possible to write reliable and correct code in a language like C#, and not one C# programmer in the world is worthy of being a colleague to the great teacup50. That's the logical conclusion of your assertions, for all of its exceptions are unchecked, and it is otherwise semantically similar. If it's possible to write reliable and correct code in C#, then it's possible to write reliable and correct code in Java minus checked exceptions.

As far as APIs are concerned, the important thing is that the API is documented to throw something. It's not at all important that the compiler forces you to pollute either the immediate method's body or its signature and the body of the calling method, etc.

teacup50 · on May 24, 2014

No, I'm saying it's not possible to write reliable and correct code in C# using exceptions without also doing all the heavy lifting of the compiler. You can also write reliable and correct code in dynamically typed languages, which involves doing even more work on behalf of the compiler.

This is not unique to C#; if you review coding standards for C++, you'll see plenty of people who have adopted a no-exceptions approach, Google included. Simply put, exceptions are a failed experiment, because checked exceptions are the only mechanism by which the type of your methods is fully defined.

As far as API documentation, that something gets thrown is part of the return signature, and it's no more pollution than expressing the return type is.

Your willingness to employ ambiguity as a means to avoid having to do the work necessary to fully specify your system's behavior is a lazy and logically flawed position; it creates a cognitive load for all consumers of your APIs, and breaks the utility of the compiler that we rely on to write and maintain reliable software more easily.

pohl · on May 24, 2014

Excuse me, but what exactly is it that you think you know about how I write code?

teacup50 · on May 24, 2014

You rely on human validation of your code's return type values via unchecked exceptions, and don't understand why your code is resultantly ambiguously defined.

pohl · on May 24, 2014

LOL...If you're just here to argue against your own imagination I guess you don't need me here.

teacup50 · on May 24, 2014

If you're ignorant to the degree that you don't understand how exceptions are part of the function's return type, it has nothing to do with my imagination.

pohl · on May 24, 2014

It's only the imaginary me in your head that doesn't understand that.

proksoup · on May 24, 2014

Is it maybe explicit programming? In my own mind I don't understand why errors should receive special treatment in the language. I prefer to explicitly handle them always.

ihnorton · on May 23, 2014

I'm curious why this isn't automatically destructured:

    struct IntPoint (int, int);

    fn foo(x: IntPoint) {
        let IntPoint(a, b) = x;  // Note that we need the name of the tuple
                                 // struct to destructure.

nrc · on May 24, 2014

In Rust, we are quite strict about the syntax used for destructuring matching the syntax used for instantiation. There is a name in the tuple-struct declaration, therefore you must use the name when pattern matching (even though it is redundant).

kibwen · on May 24, 2014

What do you mean by "automatically destructured"? If you mean the extra step of destructuring in the function body, that's not necessary. You can destructure like that anywhere that a pattern is accepted, which includes function parameter lists:

  struct Foo(int, int);

  fn bar(Foo(a, b): Foo) {
      println!("a: {}, b: {}", a, b);
  }

  fn main() {
      let qux = Foo(1, 2);
      bar(qux);  // a: 1, b: 2
  }

kzrdude · on May 24, 2014

What they mean is that there would be special cases for let patterns so that

     let (a, b) = Foo(a,b);

is a fine destructuring. It would be a special case for let, since the pattern would have to be more explicit in function arguments and in match, but I think they have a good point.

kibwen · on May 24, 2014

I should emphasize that tuple structs are a somewhat obscure feature. Their main use case is to support newtyping:

  struct Meters(f64);
  struct Miles(f64);
  let meters = Meters(10.4);
  let miles = metric_to_imperial(meters);
  let Miles(raw_miles) = miles;

The single-arity case above constitutes the vast majority of tuple struct usage. And, as you may expect, in any other context besides tuple structs a single-arity tuple is completely silly (the only reason that we have syntax for single-arity tuples at all is to make writing macros easier).

Ultimately it's just not a feature that would be pulling its weight. If you want a structure with multiple fields where destructuring is not necessary, just use a struct in the first place. Honestly, if we found a better way to support newtyping then I wouldn't be sad if we got rid of tuple structs entirely.

stormbrew · on May 24, 2014

For some reason I have an instinctual reaction that you should have to specify the type, but on reflection I'm not sure why. It is completely redundant, and since the fields aren't named for the purposes of a destructuring assignment like this any 2-field tuple is essentially equivalent.

Hm.

kibwen · on May 24, 2014

Tuple structs aren't used often, but the whole point of them is to force you to name a type. The idea is to restrict the types that you can call a function with; it turns a given tuple from a structural type to a nominal type.

For example, say you have two functions, where each function takes a single tuple of two floating point numbers:

  // Converts a Cartesian coordinate to a polar coordinate
  fn to_polar(coord: (f64, f64)) -> (f64, f64) { ... }

  // Calculates the area of a rectangle
  fn area(rect: (f64, f64)) -> f64 { ... }

Now, if you have a tuple that represents a coordinate, perhaps you don't want to feed it to the `area` function. Likewise for feeding a rectangle to the `to_polar` function. But because tuples are just structural types, something like `area(to_polar((2.5, 3.7))` is completely legal.

If you didn't want to allow this, or even if you just wanted to have greater control over all these anonymous tuples floating around, you'd use tuple structs to give them names:

  struct CarteCoord(f64, f64);
  struct PolarCoord(f64, f64);
  struct Rectangle(f64, f64);
  struct Area(f64);  // bonus round!

  fn to_polar(coord: CarteCoord) -> PolarCoord { ... }
  fn area(rect: Rectangle) -> Area { ... }

Taking the above steps makes `area(to_polar(CarteCoord(2.5, 3.7)))` a compile-time error. It's all about how strict you want to be with your types.

stormbrew · on May 24, 2014

Sure, and all of that makes sense for function arguments, but for a destructuring assignment I'm not sure the same constraints are meaningful.

winter_blue · on May 24, 2014

I just read the first section, just the section on structs -- what's different here, from C? It provides all of the same features, with a slightly different syntax.

nrc · on May 24, 2014

With structs, pretty much the only difference is the syntax. Enums are probably where the biggest differences are from C, as far as data types go.

kzrdude · on May 24, 2014

Read the rest!

To start you off, the best and most major difference to C is that Rust structs can have destructors.