Rust for C++ programmers – part 2: control flow

masklinn · on April 20, 2014

> If we want to index over the indices of `all` (a bit more like a standard C++ for loop over an array), you could do

    for i in range(0, all.len()) {
        println!("{}: {}", i, all.get(i));
    }

I'd suggest using `enumerate`[0] instead of creating a range on an existing collection:

    for (i, n) in all.iter().enumerate() {
        println!("{}: {}", i, n);
    }

[0] http://static.rust-lang.org/doc/0.10/std/iter/struct.Enumera...
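
For readers on a current toolchain, here is a minimal sketch of the same loop. It assumes present-day Rust rather than the 0.10 syntax quoted above, and the helper name `numbered` is made up for illustration:

```rust
// Pairs each element with its index without ever indexing into the slice.
// `numbered` is a hypothetical helper, not part of the original post.
fn numbered(all: &[i32]) -> Vec<String> {
    let mut lines = Vec::new();
    for (i, n) in all.iter().enumerate() {
        lines.push(format!("{}: {}", i, n));
    }
    lines
}

fn main() {
    let all = vec![10, 20, 30];
    for line in numbered(&all) {
        println!("{}", line);
    }
}
```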

acqq · on April 20, 2014

My questions are rather: why all.iter().enumerate() and not just all.enumerate()? Why for a in all.iter() and not simply for a in all?

Apparently the language is clever enough to match the types (which is good), so why isn't it also clever enough to recognize that `in` on an array means iteration? Why isn't it possible to enumerate the array directly?

Even C++ (at least in this case) has something simple:

http://msdn.microsoft.com/en-us/library/jj203382.aspx

    for( auto y : x ) { // Copy of 'x', almost always undesirable
        cout << y << " ";
    }

    for( auto &y : x ) { // Type inference by reference.
        // Observes and/or modifies in-place. Preferred when modify is needed.
        cout << y << " ";
    }

    for( const auto &y : x ) { // Type inference by reference.
        // Observes in-place. Preferred when no modify is needed.
        cout << y << " ";
    }

It doesn't even use an "in" keyword, yet it still allows iterating over anything iterable without much red tape, by invisibly expanding into the famous STL pattern: declare an iterator with an unnecessarily long declaration, then iterate between begin and end:

    range-based for: 
     Automatically recognizes arrays.
     Recognizes containers that have .begin() and .end().
     Uses argument-dependent lookup begin() and end() for anything else

masklinn · on April 20, 2014

> Why all.iter().enumerate() and not just all.enumerate() ?

enumerate is a method of the Iterator trait, so you need an iterator. This way it can have a truckload of reusable methods without polluting every collection's namespace I guess, and still things work for collections with multiple iterators (just ask for the right one).

The iteration method does not have to be called `iter`, I'd think (e.g. a tree structure might provide both a depth-first and a breadth-first iterator, neither of which would be accessed via `iter`?)
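
A rough sketch of that idea in current Rust follows. The `Tree` type and the method names `depth_first` and `breadth_first` are hypothetical, chosen only to show one collection handing out two different iterators, each of which still gets `enumerate` for free from the Iterator trait:

```rust
use std::collections::VecDeque;

// Hypothetical tree type, for illustration only.
struct Tree {
    value: i32,
    children: Vec<Tree>,
}

impl Tree {
    // Depth-first (pre-order) traversal driven by an explicit stack.
    fn depth_first(&self) -> impl Iterator<Item = i32> + '_ {
        let mut stack = vec![self];
        std::iter::from_fn(move || {
            let node = stack.pop()?;
            // Push children in reverse so the leftmost child is visited first.
            stack.extend(node.children.iter().rev());
            Some(node.value)
        })
    }

    // Breadth-first traversal driven by a queue.
    fn breadth_first(&self) -> impl Iterator<Item = i32> + '_ {
        let mut queue = VecDeque::new();
        queue.push_back(self);
        std::iter::from_fn(move || {
            let node = queue.pop_front()?;
            queue.extend(node.children.iter());
            Some(node.value)
        })
    }
}

fn leaf(value: i32) -> Tree {
    Tree { value, children: vec![] }
}

// Builds the tree 1 -> [2 -> [4, 5], 3].
fn sample() -> Tree {
    Tree {
        value: 1,
        children: vec![Tree { value: 2, children: vec![leaf(4), leaf(5)] }, leaf(3)],
    }
}

fn main() {
    let tree = sample();
    for (i, v) in tree.depth_first().enumerate() {
        println!("{}: {}", i, v);
    }
    for v in tree.breadth_first() {
        println!("{}", v);
    }
}
```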

pcwalton · on April 20, 2014

Yes, and in fact there are multiple kinds of iterators even for arrays. ".mut_iter()" and ".move_iter()" are very common.
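
Note for readers on current Rust: these methods were later renamed, so ".mut_iter()" is now `iter_mut()` and ".move_iter()" is now `into_iter()`. A small sketch of the three kinds, with a made-up function name `demo`:

```rust
// `demo` is a hypothetical function showing the three common iterator kinds.
fn demo() -> (i32, Vec<i32>, Vec<String>) {
    let mut nums = vec![1, 2, 3];

    // iter(): yields shared references, the vector is only read.
    let sum: i32 = nums.iter().sum();

    // iter_mut() (".mut_iter()" in the thread): yields mutable references,
    // so elements can be changed in place.
    for n in nums.iter_mut() {
        *n *= 10;
    }

    // into_iter() (".move_iter()" in the thread): consumes the vector and
    // yields the elements by value.
    let words = vec![String::from("a"), String::from("b")];
    let upper: Vec<String> = words.into_iter().map(|w| w.to_uppercase()).collect();

    (sum, nums, upper)
}

fn main() {
    println!("{:?}", demo());
}
```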

acqq · on April 20, 2014

I'd do:

    for x in all     does .iter()
    for (x,y) in all.enumerate()   does .iter.enumerate()

And where there's no single iter(), as with trees, only there would it have to be specified, since they wouldn't have iter():

    for x in all.depthfirstiter()
    for (x,y) in all.depthfirstiter().enumerate()

95% of the time, 5% of the constructs are used, so why not make them pleasant?

masklinn · on April 20, 2014

> I'd do: for x in all does .iter()

See pcwalton's answer in another subthread: an Iterable trait (which for would use instead of Iterator) was tried previously and didn't work out, but it's not shelved.

> for (x,y) in all.enumerate() does .iter.enumerate()

That makes literally no sense. What if enumerate() is the thing returning an iterable because reasons? Now it calls enumerate() on the wrong object, or blows up because there's no .iter in the first place.

You could have 30 methods on the Iterable trait (or directly on your vector) which would just delegate to the corresponding Iterator. Not sure there's that much value in it.

> for x in all.depthfirstiter()

Except according to your previous declaration that would result in a call to `all.iter.depthfirstiter()`. Or the compiler has to do some magic: traverse the chain until it finds the first (or last? Or all of them? How do you pick which one's right?) Iterable, and inject an iter() call into it…

pcwalton · on April 20, 2014

It's simpler to have iterables and iterators be different, and to not overload "for" in potentially confusing ways.

pcwalton · on April 20, 2014

enumerate() also has the advantage of guaranteed elimination of bounds checks. In general, iterators are Rust's solution for achieving C-like performance for common patterns with safety and without bounds checking.
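
To make the comparison concrete, here is a sketch of the two loop styles in current Rust. The function names are invented for the example, and it makes no performance measurement of its own; it only shows where the indexing happens:

```rust
// Index-based loop: every `values[i]` is a checked access, and it is up to
// the optimizer whether that check can be removed.
fn sum_indexed(values: &[u64]) -> u64 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

// Iterator-based loop: the iterator walks the slice itself, so the loop body
// never indexes and there is no per-element bounds check to remove.
fn sum_enumerated(values: &[u64]) -> u64 {
    let mut total = 0;
    for (_i, v) in values.iter().enumerate() {
        total += *v;
    }
    total
}

fn main() {
    let values = [1, 2, 3, 4];
    println!("{} {}", sum_indexed(&values), sum_enumerated(&values));
}
```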

masklinn · on April 20, 2014

That as well.

By the way, I always wonder: is there a reason for no Iterable trait, which for could just take directly and implicitly call `.iter()` on? E.g. why

    for e in my_vec.iter() {

but not

    for e in my_vec {

?

pcwalton · on April 20, 2014

You'd have to overload "for" to detect whether the expression implemented Iterable or Iterator to decide what to do, and that was felt to be too complex (e.g. what if you implement both?) There was an attempt to have a unified Iterable trait at one point [1], but I believe it didn't work with the trait matching rules.

[1]: https://github.com/mozilla/rust/issues/7597

masklinn · on April 20, 2014

> You'd have to overload "for" to detect whether the expression implemented Iterable or Iterator to decide what to do

Iterator implements Iterable as a trivial self-return (by default), and `for` only and always takes an Iterable?

> what if you implement both?

Depends what sort of weirdness you're trying to do I guess.

> I believe it didn't work with the trait matching rules.

Ah. Too bad.

pcwalton · on April 20, 2014

Yeah, I would rather have "for" defined only to operate on Iterables. If we modified the trait matching rules we could probably do that backwards compatibly post 1.0.
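
For readers on current Rust: a design along these lines did land later as the `IntoIterator` trait. `for` accepts any `IntoIterator`, and every `Iterator` implements it by returning itself. A minimal sketch, where the `Playlist` type and `titles` function are hypothetical:

```rust
// Hypothetical container type, for illustration only.
struct Playlist {
    tracks: Vec<String>,
}

// Lets `for t in &playlist` work by delegating to the inner Vec's iterator.
impl<'a> IntoIterator for &'a Playlist {
    type Item = &'a String;
    type IntoIter = std::slice::Iter<'a, String>;

    fn into_iter(self) -> Self::IntoIter {
        self.tracks.iter()
    }
}

fn titles(p: &Playlist) -> Vec<String> {
    let mut out = Vec::new();
    // No explicit `.iter()`: `for` calls `IntoIterator::into_iter` on `p`.
    for t in p {
        out.push(t.clone());
    }
    // An iterator is itself iterable, so adaptor chains work in `for` too.
    for (i, t) in p.into_iter().enumerate() {
        out[i] = format!("{}. {}", i + 1, t);
    }
    out
}

fn main() {
    let p = Playlist { tracks: vec!["intro".to_string(), "outro".to_string()] };
    println!("{:?}", titles(&p));
}
```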

stormbrew · on April 20, 2014

Throwing some thoughts on this entire conversation from the perspective of someone who's been using C++ for a long time:

One of the reasons why the C++ for (:) syntax above feels like such an improvement over the traditional iterator for loop is that it cuts out a lot of noise that's the same for almost every container loop you ever do. The same objection could be raised as in another branch of this conversation, that you might have other kinds of ranges on an object, and I've even done that before (as obj.something_begin() and obj.something_end()). But even before ranged-for came it almost always turned out ugly and I regretted it.

If most uses of for in Rust call iter() (I don't know that they do, my experience with it thus far is really minimal, but it does seem common so far), it seems like a case where use should lead design and the extra noise cut out.

Basically, if I have more than one kind of range on an object I'd probably rather have functions on the object that return the 'unusual' ranges for the iteration. So rather than having:

    for i in obj.iter() # the normal thing that I use all the time
    for i in obj.bfs_iter() # something I use sometimes to be more explicit
    for i in obj.dfs_iter() # another thing I use sometimes

I'd rather have:

    for i in obj # the normal thing that I use all the time
    for i in obj.breadth_first() # something I use sometimes to be more explicit
    for i in obj.depth_first() # another thing I use sometimes

Or if there is no sensible default, to be forced to name the thing I actually need.

It seems entirely sensible to me to expect the for loop construct to take an Iterable object. It actually makes more sense to me than for it to take an iterator directly, conceptually.

On the question of 'what if both?' I'd propose making that hard (or impossible?) to do. If for took Iterable, and Iterators also implemented Iterable as "return this", there'd really be no concept of having both (except in that if you have an iterator, it is always both) and for itself would be left fairly uncomplicated while still being able to take both Iterable and Iterator objects.

I'm probably missing some reason why this isn't an obvious way to go, as a definite Rust newb, but it would feel natural to my (perhaps C++-rotted) brain. Also I know there are no language changes until post-1.0, so for now it is what it is.

[edit] Huh. Ok, re-reading it looks like this was almost all covered ground. Never mind.

nrc · on April 20, 2014

I actually used enumerate in an earlier version of this post, but in the case I used it didn't work (pointed out to me on /r/rust). I'm going to go into enumerate and other iterator functions in future posts, so left it out for now. I agree this is a better formulation.

shmerl · on April 20, 2014

What is the reason for using overly minimalistic, assembly style keywords like fn, str etc? Even C++ doesn't go that far (I think string is more pleasant to read than str). It's not like it's saving much, and it only worsens readability.

jfager · on April 20, 2014

It's a matter of taste. I personally feel that your specific examples make Rust more readable, because of how little space gets wasted on those frequent, unambiguous names. Ubiquitous boilerplate-ish things that show up everywhere should recede a bit, imo; let me spend my line width budget on what's unique and important about my code.

But again, it's just a matter of taste and not really worth wasting a lot of time worrying about. The important thing about Rust is its semantics (speed, safety, control), not its syntax.

bjz_ · on April 20, 2014

> The important thing about Rust is its semantics (speed, safety, control), not its syntax.

Agreed. I love Haskell syntax above all else, but I like Rust's semantics too much to let that get in the way of me using it.

shmerl · on April 20, 2014

Sure, I didn't want to sound dismissive, I'm very interested in Rust semantics.

Derbasti · on April 20, 2014

if `str` is too short, surely `int` is, too:

    function fibonacci(integer index) {
        if (index <= 1) {
            return 1;
        } else {
            return fibonacci(index-1) + fibonacci(index-2);
        }
    }

compared to:

    fn fib(int n) {
        if (n <= 1) {
            return 1;
        } else {
            return fib(n-1) + fib(n-2);
        }
    }

Personally, I prefer `fn`, `int`, and `str`. I actually think that the short names are a nice way of distinguishing between built-in keywords (abbreviated, like `str`), and user-defined names (typically whole words). Also, short names are useful since they can be used as pre/post-fixes to function names like `strsplit`.

oscargrouch · on April 20, 2014

i think 'fn' is too much minimalism.. better 'fun' or even 'func' .. but maybe they wanted to distinguish the syntax from Go's 'func'

int -> int (ok)

str -> string

but anyway.. nothing that would prevent someone from enjoying the language.. but the 'fn' is the oddest for me.. getting used to it

gmjosack · on April 20, 2014

your comparison isn't fair. you're making an argument against fn/int vs function/integer but the largest difference in the code is between index and n. I prefer the version with a longer variable name despite the length.

also shouldn't the returns be unnecessary here since the conditional is an expression?
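
A sketch of what that looks like in current Rust, keeping the thread's convention that the first two values are both 1. Since `if`/`else` is an expression, the function body needs no `return`:

```rust
// The `if`/`else` is the last expression in the body, so its value is the
// function's return value. Note the absence of semicolons inside the branches.
fn fib(n: u32) -> u64 {
    if n <= 1 {
        1
    } else {
        fib(n - 1) + fib(n - 2)
    }
}

fn main() {
    println!("{}", fib(10));
}
```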

alkonaut · on April 20, 2014

Completely agree. Some language constructs that are "almost symbols", such as fn, certainly work well abbreviated, but for function names and types it feels outdated and hard to read. If a common function name is too long to spell out like iterate() then that is a candidate for sugaring out completely, not cutting in half.

Even in good C codebases that run static analysis, there are errors due to mixing up "memcpy()" and "memcmp()". UT4 is one example where this happened.

pcwalton · on April 21, 2014

> If a common function name is too long to spell out like iterate() then that is a candidate for sugaring out completely, not cutting in half.

We can't sugar it out completely because there are multiple types of iterators: ".mut_iter()" and ".move_iter()" for example just on arrays.

> Even in good C codebases that run static analysis, there are errors due to mixing up "memcpy()" and "memcmp()". UT4 is one example where this happened.

Well, that particular error would be unlikely to happen in Rust because the return types are different, and there aren't the implicit coercions that there are in C.

Alphasite_ · on April 21, 2014

I'm perhaps missing the issue completely, but isn't the argument about sensible defaults?

nrc · on April 20, 2014

It was a design decision made long ago and seems to have stuck (although it is slightly less extreme than it used to be). I also used to hate the abbreviation, but I find you get used to it quickly and after a few weeks it doesn't impact readability at all.

bjz_ · on April 20, 2014

Ergonomics. More commonly used things should be shorter and easier to type. Less used, potentially dangerous stuff should be longer - `unsafe` for example. It looks weird at first, but you quickly get used to it.

stormbrew · on April 20, 2014

On this syntactic readability level I find the semicolons, in terms of both the fact that they are usually necessary and that the final semicolon in a statement group has meaning based on whether or not it's present, much worse than any terse keywords.

If anything the terseness of keywords seems almost necessary because rust expressions seem to tend towards the wordy whenever type annotations come into play, and I'm just glad it didn't devolve into haskell-style symbolics for everything.

acqq · on April 20, 2014

I still find it a pity that such a new language makes you write "let", something even Basic programmers haven't had to do for at least 30-40 years:

https://www.powerbasic.com/support/help/pbcc/let_statement.h...

I remember not having to write let in Basic even in the eighties. Anyway, since 1995 (http://en.wikipedia.org/wiki/Limbo_(programming_language)) people have known a simple way to distinguish "define and assign" from "assign" without requiring a keyword: use := for define and assign, so x:=exp is equivalent to let x=exp, and use = for plain assignment. Google Go just reused that idea. Even though Go uses it today, I still don't understand why Rust avoids it, because it makes sources significantly more readable: the variable is not obstructed by the keyword.

let mut is even more "stuff" before the variable, in a language where declarations otherwise always follow the variable (which is good!). I'd use x := expression and x : mut = expression.

pcwalton · on April 20, 2014

It wouldn't work in Rust because "let" supports arbitrary (irrefutable) patterns, and Rust has full pattern matching. The parser has to know when a pattern is coming up. Languages that use :=, like Go, don't have a complete set of destructuring features. The convenience of being able to have an arbitrary pattern in "let" bindings far outweighs the disadvantage of having to type "let".

"x : mut = expression" wouldn't work in Rust because types follow :, and "mut" is not part of a type. "mut" is needed after each binding because one pattern can contain some mutable bindings and some immutable bindings.

jnbiche · on April 20, 2014

As someone who is learning OCaml at the same time as Rust (and who already had some Haskell), I can attest that the power of pattern matching is well-worth the extra "let". Plus, in addition to the signal it provides to the compiler, it also provides me a signal to know that I've got a new variable and the option here to do destructuring pattern matching. Very cool.

pcwalton: after getting over the hump of Rust's pointer semantics, I've found that it's a very elegant and fairly simple language. So I was partially wrong about my assessment of Rust as a "complex" language (I say "partially" because the pointer semantics do add a significant layer of unavoidable complexity).

I'd encourage anyone put off by the complexity of Rust's pointers to just push on through. I think you'll find it's worth the trouble.

Rust is definitely the language of the future for games, embedded programming, system programming and other areas of programming that require a high level of determinism. I can't wait until it's stable enough to use for real projects (hopefully sometime this year).

bjz_ · on April 20, 2014

> after getting over the hump of Rust's pointer semantics, I've found that it's a very elegant and fairly simple language

Definitely, I cannot stress this enough. Back when I first looked at the language in 0.3 it was awfully complex and I had doubts about how successful it would be. But ever since 0.4, with the removal of classes and argument modes, the core devs have been working extremely hard to make the core semantics of Rust as simple as possible, whilst at the same time making the language powerful and extensible. It will take time to dispel the myths still hanging around though, so I would urge you to share your experiences more widely.

steveklabnik · on April 20, 2014

Have you read http://static.rust-lang.org/doc/master/guide-pointers.html ?

jnbiche · on April 21, 2014

Yes. That was the guide, along with the pointer semantics table, that basically convinced me to give Rust a second chance, since if pointers could be summed up in a table, they were definitely comprehensible even for a PL peon like me.

steveklabnik · on April 21, 2014

Cool, thank you. I was wondering if it helped or hurt. Great!

acqq · on April 20, 2014

Can you please point me to examples of the kind of pattern matching constructs that don't allow using :=? I know it's obvious to you, I don't have your experience but I'd really want to understand. Thanks in advance.

Regarding "mut is not a part of a type," so what? From my perspective (of somebody who implemented just much simpler languages than Rust) it's something like a "storage attribute." That doesn't prevent me from designing the syntax to be more readable: it's a keyword anyway. It's certainly not the case that it can't go there because it would otherwise be unparsable.

Having x = expr, x := expr and x : mut = expr make all pretty clean and obvious to me...

bjz_ · on April 20, 2014

I would check out: http://pzol.github.io/getting_rusty/posts/20140417_destructu...

For example:

    struct Foo { x: (uint, uint), y: uint }

    let foo = Foo { x: (1, 2), y: 3 };
    let Foo { x: tuple @ (a, b), .. } = foo; // => a == 1; b == 2; tuple == (1, 2)

In terms of mut:

    let (mut x, ref y, &z) = (1, 2, &3);
    x = 2;   // reassignment is ok, because we bound `x` as mutable
    x += *y; // we bound `y` as a reference, so it must be derefed first
    x += z;  // we used a `&` pattern to get the value stored in the reference, so
             // derefing is not necessary
    *y += 1; // this would not compile, because y is immutable
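
For anyone trying this on a current compiler, here is a hedged sketch of the same ideas. It assumes present-day Rust (`u32` instead of `uint`), simplifies the struct pattern by dropping the `@` binding, and wraps everything in a made-up `demo` function:

```rust
struct Foo {
    x: (u32, u32),
    y: u32,
}

// `demo` is a hypothetical wrapper so the bindings can be returned and checked.
fn demo() -> (u32, u32, u32, i32) {
    let foo = Foo { x: (1, 2), y: 3 };
    // Struct pattern with a nested tuple pattern, all inside one `let`.
    let Foo { x: (a, b), y: c } = foo;

    // One pattern, three binding modes: mutable, by reference, and `&` to
    // read the value out of a reference.
    let (mut x, ref y, &z) = (1, 2, &3);
    x += *y; // `y` is a reference, so it is dereferenced first
    x += z;  // `z` is already a plain value

    (a, b, c, x)
}

fn main() {
    println!("{:?}", demo());
}
```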

acqq · on April 20, 2014

Thanks. Now I see that my preferred approach would collide with the current declaration logic of having mut, ref and & in front of the names. Otherwise it would seem possible:

    foo := Foo { x: (1, 2), y: 3 };
    Foo { x: tuple @ (a, b), .. } := foo; 
    ( x: mut, y: ref, &z ) := (1, 2, &3);
    xx: mut := 33;

I see that having "x: mut" has one character more than "mut x" but I don't believe it makes the text less readable, quite the contrary because it's certainly less surprising when everything that describes the variable follows the ":"

Thanks a lot, I've certainly learned more about Rust this way.

bjz_ · on April 20, 2014

Showing let bindings with their explicit type declarations might help:

    let mut x: int = 2;
    let x: &mut int = &mut 2;
    let (mut x, ref y, &z): (int, int, &int) = (1, 2, &3);

The pattern binding goes to the left of the `:`, and the input type goes to the right. Also note that you can pattern match on types too:

    let (mut x, ref y, &z): (_, int, &_) = (1, 2, &3);

acqq · on April 20, 2014

Thanks, if _ is allowed as the type, why are explicit types written at all?

Is the last example equivalent to

    let (mut x, ref y, &z): (_, _, &_) = (1, 2, &3);

and can we go even further

    let (mut x, ref y, &z) = (1, 2, &3);

If we can, when do we actually need specifying the types? To do the type conversion or what?

Probably because I don't know Haskell (sorry), I still imagine x, y and z as names, and mut, ref and & as "access" specifications that somehow define the "type" of x, y and z, rather than the type of the thing accessed through the given name.

bjz_ · on April 20, 2014

I might not have been clear enough. I was just trying to show how the current syntax fits in with `:`, and why your suggestions for `: mut` would not fit with that. All of these are valid Rust code:

    let (mut x, ref y, &z): (int, int, &int) = (1, 2, &3);
    let (mut x, ref y, &z): (_, _, &_) = (1, 2, &3);
    let (mut x, ref y, &z) = (1, 2, &3);

The types are still the same in each case. No conversions or dynamic typing are happening in any of them - they are all statically typed.

Rust, like Haskell and ML, uses an advanced form of type reconstruction called Hindley-Milner[0] type inference. This is a constraint solving algorithm that can deduce types from very little information. Sometimes you do need to add some annotations though to help the inference along a little, but that is very rare (although more common in Rust than in Haskell or ML).

The grammar could be expressed as:

    'let' <pattern> (':' <type>)? '=' <expr>

Where `<type>` is the type of the expression to the right of the `=`. As I said before: in practice it is very rare to explicitly annotate a `let` binding.

[0]: http://en.wikipedia.org/wiki/Hindley-Milner
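
A small sketch, assuming current Rust, of when the optional annotation in that grammar is and isn't needed. The function name `demo` is invented, and the example only illustrates inference in practice, not the algorithm behind it:

```rust
// `demo` is a hypothetical function used to show inferred and annotated lets.
fn demo() -> (i32, Vec<i32>) {
    // No annotations: the types of `n` and `doubled` are inferred from use.
    let n = 21;
    let doubled = n * 2;

    // Annotation needed: `parse` can produce many number types and `collect`
    // can build many containers, so inference needs a hint from somewhere.
    let parsed: Vec<i32> = "1 2 3".split(' ').map(|s| s.parse().unwrap()).collect();

    // A partial hint with `_` is enough when the element type is already known.
    let scaled: Vec<_> = parsed.iter().map(|v| v * 2).collect();

    (doubled, scaled)
}

fn main() {
    println!("{:?}", demo());
}
```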

gnuvince · on April 20, 2014

Pretty sure your parser would need unbounded lookahead to support this syntax.

acqq · on April 20, 2014

How do you think the parser parses anything on the left side, e.g. in C:

    *( arraybase + f1( d ) * f2( e, f ) + f3( g ) ) = x;

pcwalton · on April 20, 2014

The issue is that "expression" is the same nonterminal in C, whereas "expression" and "pattern" are separate nonterminals in Rust.

acqq · on April 21, 2014

In C a difference still exists between an lvalue and an rvalue. Even if you can unify something at one level, you have to generate completely different code. C++ is even harder. So I can imagine something could be done if there were the will; probably something would have to be a bit more unified. But I guess nobody even tried.

BTW, I really like the basic goals of Rust, namely making something directly linkable with C but having as much safety as possible and more modern features. And I appreciate the work of the developers of Rust. Even introductory materials would benefit from highlighting what we can do that wasn't as safe to do in C.

acqq · on April 20, 2014

I've tried to find any meaningful example of "why not." Is "let" supposed to be needed because of something like this?

    let (a, b) = get_tuple_of_two_ints();

I don't see that it's fundamentally different to parse if it would be written like this:

    (a, b) := get_tuple_of_two_ints();

It's true that the parser has to wait for := to know the semantics of the expression on the left side of it, but it can definitely be done.

pcwalton · on April 20, 2014

It would require making "expression" and "pattern" into the same nonterminal in the grammar, and then deciding afterwards whether something that was parsed was an expression or a pattern. This is sometimes called a "cover grammar" because it "covers" two distinct nonterminals. It is possible--ECMAScript 6 does it--but it's definitely a complex way to parse. It doesn't seem compatible with Rust's design goal of being easy to parse and regular, e.g. for tooling.

bjz_ · on April 20, 2014

Rust can pattern match on structs, fixed length vectors, references, and it has keywords to bind them as mutable and/or by-ref. Doing all that without a `let` would be extremely challenging.
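
As an illustration in current Rust (where fixed length vectors are called arrays), here is a sketch of several of those pattern kinds, each introduced by `let`. The `Point` type and `demo` function are hypothetical:

```rust
// Hypothetical struct, for illustration only.
struct Point {
    x: i32,
    y: i32,
}

fn demo() -> (i32, i32, i32, i32) {
    // Struct pattern.
    let Point { x, y } = Point { x: 1, y: 2 };
    // Fixed-length array pattern, ignoring the middle element.
    let [first, _, third] = [10, 20, 30];
    // Reference pattern: `&n` reads the value out from behind the reference.
    let &n = &5;
    // `mut` marks only the binding that needs it.
    let (mut total, step) = (x + y, first + third + n);
    total += step;
    (x, y, n, total)
}

fn main() {
    println!("{:?}", demo());
}
```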

mpartel · on April 20, 2014

Having programmed a lot in both Scala and Ruby, I'd argue the opposite. The keyword makes new variables easier to spot quickly, and with syntax highlighting, I really don't see `let [mut]` as clutter. Admittedly tastes in such things will always vary.

sixbrx · on April 20, 2014

I actually like "let" to introduce variables, it's what I scratch on paper when doing pseudo-code since before I met Rust or any other functional language. To omit it (or something like it, e.g. "var" or "val") would be to omit crucial information. It's short, clear, and has a long history in mathematics and logic too.

Theriac25 · on April 20, 2014

I'd like to inform you that you have bad taste and your post doesn't have anything to do with the article.

logicchains · on April 20, 2014

Why is the symbol '|' used for or in pattern matching, not '||'? If '|' is used for bitwise or elsewhere in the language then this might lead to confusion.

masklinn · on April 20, 2014

> If '|' is used for bitwise or elsewhere in the language then this might lead to confusion.

`||` is the logical or[0], why would it lead to less confusion?

(and yes, `|` is the bitwise or[1])

[0] http://static.rust-lang.org/doc/0.10/rust.html#lazy-boolean-...

[1] http://static.rust-lang.org/doc/0.10/rust.html#bitwise-opera...

logicchains · on April 20, 2014

I suppose I just assumed that logical or is closer in meaning to the or in pattern matching than bitwise or is.

adamnemecek · on April 20, 2014

It was probably inherited from ML.

pcwalton · on April 20, 2014

ML and Scala. Haskell uses a semicolon, but that wouldn't work in Rust's grammar.
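
A short sketch in current Rust of the three meanings side by side. The function names are made up for the example:

```rust
// In a pattern, `|` separates alternatives.
fn classify(n: u8) -> &'static str {
    match n {
        0 => "zero",
        1 | 2 | 3 => "small",
        _ => "large",
    }
}

// In an expression, `|` is bitwise or.
fn combine_flags() -> u8 {
    0b0101 | 0b0011
}

// In an expression, `||` is the lazy logical or.
fn either(a: bool, b: bool) -> bool {
    a || b
}

fn main() {
    println!("{} {} {}", classify(2), combine_flags(), either(false, true));
}
```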

jhgg · on April 20, 2014

What's the purpose of:

    y if y < 20

In the match block, can't we just use x?

masklinn · on April 20, 2014

Yes, you could write this as `_ if x < 20`. But not as `if x < 20`, I'm pretty sure the pattern is mandatory.

pcwalton · on April 20, 2014

Yes, you could just use x.
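
A sketch of both spellings in current Rust. The match arms here are invented, since the article's full match block is not quoted in this thread:

```rust
// Guard on a fresh binding: `y` is a new name for the matched value.
fn with_binding(x: i32) -> &'static str {
    match x {
        0 => "zero",
        y if y < 20 => "under twenty",
        _ => "twenty or more",
    }
}

// Guard on a wildcard: the pattern is still required, but the guard can
// refer to the outer `x` directly.
fn with_wildcard(x: i32) -> &'static str {
    match x {
        0 => "zero",
        _ if x < 20 => "under twenty",
        _ => "twenty or more",
    }
}

fn main() {
    println!("{} {}", with_binding(7), with_wildcard(7));
}
```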

dcsommer · on April 20, 2014

Another great entry. Keep 'em coming!

xixixao · on April 20, 2014

The two issues in the comments so far, iteration with an index variable and assignment/destructuring without an explicit let, are both handled beautifully by CoffeeScript. At least Rust gets "everything is an expression" right. One of the best things CoffeeScript "brought".

cmrx64 · on April 20, 2014

As if coffeescript originated the features... We don't have iteration with an index variable because it's seen as extraneous and unnecessary. If you aren't using iterators, you're paying for unnecessary bounds checks (LLVM often can't optimize them out). The explicit let mostly comes from the ML heritage and the desire for nice pattern matching.