Rust for C++ programmers – part 3: primitive types and operators

xioxox · on April 23, 2014

I understand that there is probably a performance motivation for int/uint having machine-dependent sizes. However, it seems to me that having different sizes on different platforms is a potential security hazard if the programmer doesn't think of this. It also gives rise to porting bugs. Isn't the idea of Rust to make a more secure language than C++? I would have thought mandating 32 or 64 bit for int/uint would make more sense, and if the programmer needs more or less they would have to think about that.

pcwalton · on April 23, 2014

It's not out of concern for performance.

The idea was in fact to reduce correctness gotchas; ints are often used for indexing into arrays, and you want the size of the integer to reflect the maximum size of an array on your platform.

What kind of security issues did you have in mind? Integer overflow can be a problem, but it is usually a memory safety issue, which Rust doesn't have due to bounds checks and iterators.

xioxox · on April 23, 2014

I'm not familiar with the bounds checks in rust, so that sounds a bit more reassuring. However, I could imagine logic errors if people assume they have a certain size, but they don't, as I assume you don't trap wrapping and so on.
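For what it's worth, here is a rough sketch of how wrapping can be made explicit rather than assumed. It is written against present-day Rust, which postdates this thread (`usize` there plays the role `uint` plays here), and the function name `add_sizes` is made up for illustration.

```rust
/// Adds two lengths, returning None instead of silently wrapping on overflow.
fn add_sizes(a: usize, b: usize) -> Option<usize> {
    a.checked_add(b)
}

fn main() {
    // usize is pointer-sized, so its width (and MAX) varies by platform.
    println!("usize is {} bits wide", usize::BITS);

    // Overflow is reported as None rather than as a wrapped value.
    assert_eq!(add_sizes(1, 2), Some(3));
    assert_eq!(add_sizes(usize::MAX, 1), None);

    // Wrapping has to be asked for explicitly.
    assert_eq!(usize::MAX.wrapping_add(1), 0);
}
```

Code that relies on a particular width can avoid the platform-dependent types altogether and use `u32` or `u64` directly.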

If they're intended for indexing arrays, I would have given them a name to reflect that, and discouraged people from mixing them with normal integers for doing arithmetic with. Perhaps forcing people to convert integers to array indices would be good. I try to use size_t and friends when programming C++.

alok-g · on April 23, 2014

Their use as index to arrays sounds reasonable. However in this case it may be wiser to give them less friendly names than those for the 32-bit and 64-bit counterparts. To not have portability issues, it is important that the programmer understands when int and uint should be used (index to arrays, the sizes of which could be machine dependent) and when specified-size integer types (when representing application-specific data).

pcwalton · on April 23, 2014

Yeah, we considered naming them intptr_t and uintptr_t, but the "use an int to index an array" is so ingrained into programmers' collective consciousnesses that we decided it was pretty futile to go against the grain here.

alok-g · on April 23, 2014

I don't really buy that reasoning. :-) If the goal is not to clean up bad things, we may just stay with C++.

When a new language arrives, the programmer already intends to learn something new and is probably working on a new project, so this is the best opportunity for the cleanup. Once the language matures and legacy codebases become rampant, that opportunity is gone forever for that language.

Did you consider naming the type something like 'index' itself, if that is the predominant good use-case for the type?

Ygg2 · on April 23, 2014

I do remember a discussion on the mailing list proposing that uint/int would just be aliases for u64/i64.

simias · on April 23, 2014

Wouldn't that mean that the code would have a huge performance penalty when running on 32bit hardware?

I think C99's way of handling that is fine, if you want to use whatever native width happens to be on your hardware you just use int/unsigned, otherwise you have stdint.h.

Like C, rust could mandate a minimal range for each type in the standard, but setting the type width in stone seems dangerous to me, both for backwards and forwards compatibility.

coldtea · on April 23, 2014

>Wouldn't that mean that the code would have a huge performance penalty when running on 32bit hardware?

Or it could just forget 32bit hardware altogether. Even mobile phones are going 64bit already -- and the language will be ready for production in 1-2 years, where even more platforms will be 64bit.

It's not like they have to compromise just to be able to run on some embedded stuff.

mercurial · on April 23, 2014

Having been the victim of a fun Javascript bug recently (something like ("1" + 10)*2), I must say that I agree that type coercion is the wrong thing to do 99% of the time (probably with the exception of attempting to divide integers).

sehugg · on April 23, 2014

I think that one safe exception is upcasting of integer types (e.g. i8 to i32) and I think it's a little awkward to program in Rust without it.

It seems also that using the "as" syntax for both upward and downward casts makes code a little harder to read. Either allowing implicit upcasting or having separate syntaxes for safe vs. unsafe numeric casts would make more sense to me.
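To make the distinction concrete, this is roughly how it looks in later versions of Rust (not the 2014 language under discussion): widening goes through the lossless `From` conversion, narrowing through the fallible `TryFrom`, and `as` remains available as the silent, possibly truncating cast. The helper names `widen` and `narrow` are invented for the example.

```rust
use std::convert::TryFrom;

/// Widening i8 -> i32 can never lose information, so `From` is infallible.
fn widen(x: i8) -> i32 {
    i32::from(x)
}

/// Narrowing i32 -> i8 may not fit, so `TryFrom` returns a Result.
fn narrow(x: i32) -> Option<i8> {
    i8::try_from(x).ok()
}

fn main() {
    assert_eq!(widen(-5), -5);
    assert_eq!(narrow(100), Some(100));
    assert_eq!(narrow(300), None);
    // `as` truncates silently: 300 is 0x12C, and the low byte 0x2C is 44.
    assert_eq!(300i32 as i8, 44);
}
```

This is not implicit upcasting, but it does give the safe and the lossy conversions visibly different spellings.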

masklinn · on April 23, 2014

A colleague recently went through that as well, a (JS) graphing package would sometimes display bar charts with an incorrect scale (it would pick the wrong maximum value, so part of the bars would be cut off).

Turns out the package expected numbers, but sometimes a bug meant the data source would give arrays (of a single number each). The package's own max (using <) would then perform a lexical comparison: the comparison would coerce to string, so a graph with a value at 800 and one at 1800 would pick a y-scale for 800 and cut off 1800. Further operations would coerce to number, so there was no error anywhere and most graphs looked correct.

repsilat · on April 23, 2014

Add me to the list. The other day I was debugging some simple Python code that worked something like

    return 10000 if x > 10000 else x

and it wasn't working because x was a string. Instead of raising an exception, though, Python decided that all strings are "greater than" all integers, and code was essentially equivalent to

    return x

masklinn · on April 23, 2014

> Instead of raising an exception, though, Python decided that all strings are "greater than" all integers

Yep. That's one of the things they changed/fixed in Python 3, where ordering comparisons are no longer defined between arbitrary types:

    >>> "foo" > 3
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unorderable types: str() > int()

nnq · on April 23, 2014

99% of such bugs can be avoided in a weakly typed language by simply having the string concatenation operator different from the "+" operator (PHP has the "." operator for this, VisualBasic has "&", Haskell has "++"), yet Javascript, Python and Ruby seem to stubbornly avoid this simple way of achieving some sort of sanity (and yes, on this one PHP is above Python and Javascript!).

Also, having "<" and ">" also work for strings is another semantic abomination. You need different operators or just functions for doing this on strings (and if you treat strings as collections they can also be your collections operators).

These problems are not caused by "type coercion" itself (even if lots of people agree that "type coercion" in general is a bad idea...), they are caused by mixing scalar operators and vector operators and converting one way or another when doing this. The only way out of this is to either throw an error for these coercion cases, or, when it actually makes sense (like in Matlab, where you have scalars and matrices), have a separate set of operators for these cases (like the ".+" family of operators).

lifthrasiir · on April 23, 2014

No. PHP's `+` operator always tries to convert to the number, and uses 0 (!!!) when it is not possible. (e.g. what is `"hello" + "world"`?) Having different operators hardly matters; operators should cause a visible and clear error if they were given unexpected operands. Python is better in that regard since it does not allow `str + int` and so on.

nnq · on April 23, 2014

You're right, I forgot about this abomination. I retract any praise for PHP :) But I still think having one operator "+" for scalars, one like "&" for strings and another one like "++" for lists makes a lot of sense even in a dynamic language.

saurik · on April 23, 2014

FWIW, I would not call this "coercion", I'd call this "destruction": while I personally prefer languages that statically don't allow even-potentially-lossy conversions (such as string to number, or float to int) without explicit syntax bounding the error, languages that blow up at runtime if you try to numeric-add "hello" yet accept the string "0" without complaint are "acceptable" in a way that PHP (and MySQL) simply destroying data (whether by replacement or truncation) and returning/storing garbage in an attempt to satisfy the required type constraints is not :/.

bpbp-mango · on April 23, 2014

Python is safe from this:

    Python 2.7.5 (default, Feb 19 2014, 13:47:28) 
    [GCC 4.8.2 20131212 (Red Hat 4.8.2-7)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> "0" + 5
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: cannot concatenate 'str' and 'int' objects

Aqwis · on April 23, 2014

Have you used Python a lot? That's not how it works. Python is not weakly typed and

    "string" + 5

throws an exception.

mercurial · on April 23, 2014

Yes (though it treats 1 as True and 0 as False, which it really ought to avoid).

masklinn · on April 23, 2014

Historical imperative though, for backwards compatibility with

    if some_bool == 1

or

    if isinstance(some_bool, int)

written before the bool type was added (which only happened in 2.3)

unwind · on April 23, 2014

I dream of a language where booleans cannot be compared explicitly against literals, that would be awesome.

I have a powerful dislike for code like the future version of the above, i.e.

    if some_bool == True

since that, in my mind's machine, just generates a bool result, which then (in the logic of the code) must be compared to True. If the result of == doesn't have to be compared, then why does some_bool? It's brutally inconsistent and fantastically annoying.

Unfortunately, to many programmers it's also second nature to write such comparisons. Gaah. :(

Dylan16807 · on April 23, 2014

>where booleans cannot be compared

Sounds like a way to make XOR be a big pain.

Or do you only want to ban comparing to literals? It can make things clearer to write something like "if bool==false" rather than "if not bool", especially when double negatives get involved.

Only ban comparing to literal true? Sure, go for it. But then the language is more complex for no real benefit.

masklinn · on April 23, 2014

Equality might be used for other things, e.g. a map with boolean keys, stuff like that. So I don't think it would be workable.

You could try removing Eq, Ord and TotalOrd from Rust's bool and see what breaks.

mercurial · on April 23, 2014

Haskell is a more complex case. (++) is the list concatenation operator, eg: [1,2,3,4] == [1,2] ++ [3,4]. While the default String type is nothing more than a linked list of Unicode codepoints, it's notoriously inefficient. The preferred way of representing strings is usually Data.Text, which being a monoid, supports <> for concatenation. However, if you only kept the addition operation for numbers, you could perfectly well do 1 <> 2 and obtain 3 (you can actually do that with Sum 1 <> Sum 2, obtaining Sum 3 as a result - wrapping with Sum is necessary due to the fact that numbers are also a monoid under multiplication). What would save you from a runtime error is the fact that the monoid concatenation operator expects its two parameters to have the same datatype.

nnq · on April 23, 2014

EDIT: My bad, I retract all reference to Python, its dynamic strong typing really prevents these kinds of problems. The behaviors I was thinking of happen in Javascript (and sometimes in Ruby as I recollect).

adamnemecek · on April 23, 2014

Ruby is the same as Python in this aspect.

saurik · on April 23, 2014

It isn't as much about the coercion as the consistency: the division case returning a float when given integers is fine if the division operator always does that. Some languages (VB being an example; I wish more were) have a "string concatenation" operator that is separate from numeric addition; 1 & 2 is always going to be "12".

alok-g · on April 23, 2014

I think too that integer division should by default yield floating-point. And likewise for uint subtraction, which should yield int and not uint by default. The programmer may appropriately cast to uint if she knows that the result will not be negative.
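For comparison, a small sketch of how both behaviours can be spelled out by hand in present-day Rust, where unsigned subtraction stays unsigned by default. The function names are hypothetical; `checked_sub` and `i64::from` are standard library calls.

```rust
/// Unsigned subtraction that reports underflow instead of wrapping.
fn sub_unsigned(a: u32, b: u32) -> Option<u32> {
    a.checked_sub(b)
}

/// Subtraction that widens to a signed type first, so the result may be negative.
fn sub_signed(a: u32, b: u32) -> i64 {
    i64::from(a) - i64::from(b)
}

fn main() {
    assert_eq!(sub_unsigned(5, 3), Some(2));
    assert_eq!(sub_unsigned(3, 5), None);
    assert_eq!(sub_signed(3, 5), -2);
}
```

Widening to `i64` rather than `i32` keeps every possible `u32` difference representable.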

camus2 · on April 23, 2014

what about rust for webapps? are there any frameworks out there?

mercurial · on April 23, 2014

Not really. A lot of the infrastructure has kept changing for a while (eg, the IO facilities were recently rewritten), so code for older versions of Rust (like [1]) falls behind and does not get updated anymore.

With Chris Morgan's rust-http library (which I think is also used by Servo), you should have the base on which to build a web framework. Somebody apparently wrote (and still maintains) a pure Rust Postgresql driver ([2]), so you could in theory write a simple CRUD webapp, if you're willing to write the routing yourself.

1: https://github.com/erickt/mre

2: https://github.com/sfackler/rust-postgres

chrismorgan · on April 23, 2014

At present it's largely Watch This Space.

The language isn't stable enough yet and there are still a few pieces that need writing, anyway.

b0b_d0e · on April 23, 2014

Kinda, yes, but not really. Currently, the best http framework in Rust is rust-http (https://github.com/chris-morgan/rust-http) and it has the things needed for handling requests and sending responses. But as far as actual full featured web frameworks, there is only one that I know of (discounting mre since it hasn't been updated in a long time) and that's because I am the creator of it. oxidize (https://github.com/jroweboy/oxidize) aims to be an inherently safe and fast web framework inspired by several other frameworks, but I don't consider oxidize to be even in a pre alpha state right now (pre alpha to me means, don't try to write code with it cause it will change drastically in the near future). I've rewritten the code for it twice already and a third time is sitting in a branch that doesn't compile right now. Progress has been a little slow lately since it became a group project for a class of mine, and that meant that I needed to manage several people working on it rather than actually contributing code. Over the summer I hope to get it in a good state and then make a post about it and fill out documentation, make tests, submit it to techempower benchmarks, and other necessities.

Having said that, I'm not confident yet that you want to be coding a web site in rust. In the name of safety, rust's compiler can be very difficult to please, and in the name of speed it doesn't compile blazingly fast either. Some people may have a different opinion from me, but besides writing a web app to learn the rust language, what advantages for the web does rust give you that go or haskell doesn't? When you are writing a web app, developer productivity is priority number one (in my opinion), and having to fight the borrow/lifetime checker and trying to find out why your struct doesn't fulfil `Send` are not things I would like to have to do when I'm making a web site. The tradeoff will be that hopefully a web site written in rust will be both fast and concurrently safe, but you will be sacrificing developer productivity for the reasons mentioned above. Regardless of those issues, I'm still very determined to make a rust web framework that is both fast and easy to develop for (probably by providing examples of common web idioms in oxidize so people can see how to do it), just, I want people that use my framework to be aware of the tradeoffs that they will make. In short, I consider oxidize an experiment in trying to make a powerful, expressive, and extensible web framework in rust, and along the way I've started to question whether this will have any practical applications, but in spite of that, I still am trying to achieve the previously listed goals.

mercurial · on April 23, 2014

What I like about it is that it's pretty devoid of macros. I'm curious as to how you are planning to handle the environment of the webapp (configuration, connection pool, business objects...), given that you settled for plain functions with a given signature.

b0b_d0e · on April 23, 2014

That's the rework that I'm working on! I'm debating between making a context trait and having the static function signature include a context instead of a request, and then everything request related would be included/appended in the context, or having the user implement a "controller" trait on their struct that has a method I will call after settling the routes (something like https://github.com/jroweboy/oxidize/blob/incoming/tmp/matchb...), where maybe a macro can generate this method from the urls. Note that this code doesn't compile though :) While I was initially fond of the latter, it's been very painful to work with in practice and I don't see it actually working, so I think that I'll be testing out to see if I can rework the static functions to have a context. I think the hardest part is that I want to eliminate any need to have any static variables, but that means I have to store them all in a struct that requires Send to be fulfilled, and that's more overhead for programmers. In the near future, I will be looking into using the different parts of rust-http directly rather than implementing the server trait to see if I can gain any benefits from that :)

mercurial · on April 23, 2014

Personally, I'd be inclined to have the context separate from the request, since they're really two different things. Also, you could introduce a FromRequest trait (trivially implemented by Request, returning self) to allow the user to build structures directly from a request (and its converse ToResponse), giving your function a signature of:

    fn (context: Context, req: FromRequest) -> ToResponse

    trait FromRequest {
      fn from_request<'a>(req: &'a Request) ->  &'a Self;
    }


    impl FromRequest for Request {
      fn from_request<'a>(req: &'a Request) ->  &'a Request {
        req
      }
    }

This would remove a lot of boilerplate for well-behaved applications. You'll also need a notion of middleware, which I guess would take a Request and return either a ToResponse or a request (eg, an authentication handler ought to return the response immediately in case of authentication failure), and similarly on the way out, something taking a Response and returning a Response.

mercurial · on April 23, 2014

Ah, obviously this wouldn't work, from_request would need to return Self and not a reference for it to work, which means some overhead in the case of Request, since you'd have to clone it.
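A sketch of what the by-value version might look like, written in present-day Rust syntax rather than the 2014 language. `Request` here is a stand-in struct, not rust-http's actual type, and `PathLength` is invented purely to show a second implementor.

```rust
/// Hypothetical request type, standing in for whatever the framework provides.
#[derive(Clone, Debug, PartialEq)]
struct Request {
    path: String,
}

/// Builds a value of the implementing type out of a borrowed request.
trait FromRequest: Sized {
    fn from_request(req: &Request) -> Self;
}

/// The trivial implementation has to clone, which is the overhead noted above.
impl FromRequest for Request {
    fn from_request(req: &Request) -> Self {
        req.clone()
    }
}

/// Example of a user-defined structure extracted from a request.
#[derive(Debug, PartialEq)]
struct PathLength(usize);

impl FromRequest for PathLength {
    fn from_request(req: &Request) -> Self {
        PathLength(req.path.len())
    }
}

fn main() {
    let req = Request { path: String::from("/users") };
    assert_eq!(Request::from_request(&req), req);
    assert_eq!(PathLength::from_request(&req), PathLength(6));
}
```

Only handlers that ask for the whole `Request` pay for the clone; a type like `PathLength` copies just the data it needs.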

sisalcat · on April 23, 2014

From the project home page: "prevents almost all crashes (in theory)"

How does it prevent index out of bounds errors and division by zero? No, not even in theory. What a ridiculous claim.

nrc · on April 23, 2014

All array indexing is bounds checked, so those kinds of errors are prevented. We're still working on the story around overflow checking.

sisalcat · on April 23, 2014

This prevents buffer overflow errors, but not crashes.

dbaupp · on April 23, 2014

It prevents the OS killing a misbehaving application (which is what is meant by "crash" in that context). This allows, for example, a multithreaded server application to continue even if one worker task indexes an array incorrectly.

kzrdude · on April 23, 2014

It also prevents an equally important problem: silent passing of errors. It's common for out of bounds accesses or stores in C or C++ to just pass silently, potentially corrupting data.

b0b_d0e · on April 23, 2014

Index out of bounds errors are handled in separate ways depending on the kind of array you are using. If I understand how it works correctly, you can be working with one of three main kinds of arrays: slices with a known length, slices with an unknown length, or a growable vector. In the case of slices with a known length, ie: let a = [0]; then trying to say a[1] is a compile error: since the compiler knows the length of the array at compile time, it will not let this code compile. In the case of an unknown length at compile time, the compiler cannot really help you here, but it will cause a task failure when you try to access out of bounds (since it does know the length at runtime, it checks to make sure that it is in bounds). This is similar to what happens when you are trying to access an element that doesn't exist on a growable vector as well. I hope this information clears up some things for you and even more so I hope my information is correct! I'm still learning rust, so I still have misunderstandings quite often.

dbaupp · on April 23, 2014

Indexing a fixed-length array (I.e. known at compile time) out of bounds is not prohibited at compile time. All array types trigger a task failure on out-of-bounds access (unless you explicitly opt-in to unchecked indexing).
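A minimal sketch of the run-time check, using present-day Rust, where "task failure" is called a panic. The helper names are made up, and `catch_unwind` is used only to observe the panic in a small example (it assumes the default unwinding panic strategy rather than `panic=abort`).

```rust
use std::panic;

/// Checked lookup: returns None instead of failing when idx is out of range.
fn lookup(data: &[i32], idx: usize) -> Option<i32> {
    data.get(idx).copied()
}

/// Plain indexing: reports whether `data[idx]` panicked at run time.
fn index_panics(data: &[i32], idx: usize) -> bool {
    panic::catch_unwind(|| data[idx]).is_err()
}

fn main() {
    let a = [10, 20, 30];
    assert_eq!(lookup(&a, 1), Some(20));
    assert_eq!(lookup(&a, 3), None);
    assert!(!index_panics(&a, 2));
    assert!(index_panics(&a, 3));
}
```

Either way the out-of-bounds read never happens: `get` turns it into a value the caller must handle, and plain indexing turns it into a controlled failure.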

guelo · on April 23, 2014

I was going to ask if array[random_integer()] would compile only some of the time.

b0b_d0e · on April 23, 2014

Oh thanks for clearing that up! Is there any specific reason why the rust compiler currently doesn't turn an out-of-bounds static index into a known length slice into a compile error? (I'm only referring to cases where the index you are trying to access is known at compile time as well as the length of the slice)

nikbackm · on April 23, 2014

It would be inconsistent and not really that helpful. How often do you use (invalid) constant indexes with fixed length arrays?

Probably better to handle all out-of-bounds errors the same way.

sisalcat · on April 23, 2014

> it will cause a task failure

i.e. a crash.

Jweb_Guru · on April 23, 2014

Not at all. Tasks can (and do) die without killing the main program. They also don't segfault, and as the stack unwinds destructors get called (which Rust can actually ensure is safe). If you've ever had the pleasure of dealing with a framework that likes to send SIGKILL indiscriminately to important programs, you'll appreciate the difference between the two.

coldtea · on April 23, 2014

No. Something recoverable, like an exception.

But it seems you have your mind set, and won't accept any other answer.

dbaupp · on April 23, 2014

It doesn't necessarily take down the whole program, i.e. it's recoverable, unlike memory corruption or a segfault.

(Yes I know you can install signal handlers for segv, but that is not isolated at a language level like a task failure is with Rust.)

sisalcat · on April 23, 2014

Yes, I can catch SIGSEGV. And I can catch Java NullPointerExceptions and ArrayIndexOutOfBoundsExceptions. So Java prevents all crashes, lol.

dbaupp · on April 23, 2014

Yes, Java applications are rarely killed by the OS/memory manager for trying to access memory they don't have control over.

azth · on April 23, 2014

Don't argue with him/her, the parent poster is clearly acting immature.

sisalcat · on April 23, 2014

That's a strange definition of "crash" you have there. I should go tell my customers next time that their programs definitely didn't crash.

alkonaut · on April 23, 2014

Java raises an exception. If you handle it in code, the app does not crash. The crash is a consequence of failure to handle and recover from the exception.

Granted, it is very rare that you can/should recover from nullpointer or OOB exceptions anyway. What you can do though is try to clean up things, show a polite message and shut down the application in a semi-controlled way.

"Preventing crashes" isn't the best description of Rust's novel protection systems. In my opinion, the best part is providing guarantees about data integrity and security, e.g. preventing heartbleed type read-overruns and preventing data races in concurrent code due to shared mutable state.

AlisdairO · on April 23, 2014

It's not that uncommon to want to recover from NPEs in a webapp environment. You want your thread of execution to error out, return a 500, and rollback whatever db changes it's made, but you want the webapp as a whole to keep running.

lucian1900 · on April 23, 2014

And this is precisely what task failure in Rust lets you do. Requests run in tasks, if the task fails, show an error message and make a new task for the next request.
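As a rough illustration in present-day Rust, where tasks have become OS threads and task failure is a panic: each "request" below runs on its own thread, and a failing one turns into an error value for the caller. `handle_request` and the error string are invented for the example.

```rust
use std::thread;

/// Runs one "request" on its own thread; a panic in the handler surfaces
/// as an Err from join() rather than taking down the caller.
fn handle_request(divisor: i32) -> Result<i32, String> {
    thread::spawn(move || 100 / divisor)
        .join()
        .map_err(|_| String::from("500 Internal Server Error"))
}

fn main() {
    assert_eq!(handle_request(4), Ok(25));
    // This worker panics on division by zero, but main keeps running.
    assert!(handle_request(0).is_err());
    assert_eq!(handle_request(5), Ok(20));
}
```

The third call succeeding after the second one failed is the point: the failure stays confined to the thread that caused it.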

AlisdairO · on April 23, 2014

Yes, sorry - I wasn't meaning to state that rust is bad in this regard. Just addressing the point of catching NPEs and so on.

alkonaut · on April 23, 2014

Of course when you have a web application running requests using some thread pool or even more lightweight worker, then what we call the "application" is likely the request and you most likely don't want to kill the app server because you have good reason to believe the other requests are unaffected by the failing requests.

coldtea · on April 23, 2014

>Granted, it is very rare that you can/should recover from nullpointer or OOB exceptions anyway.

Actually it can be quite common, especially in complex web / network programs with many components. In Java such errors do not corrupt memory like in C, and since they are isolated, no reason not to continue the operation of the overall program. The error could just be due to a resource not being found, or faulty input coming from outside -- that is, nothing that prevents you from continuing to work on other requests.

That's somewhat like how Erlang handles the case, if I am not mistaken.

adamnemecek · on April 23, 2014

I mean you get an error but not a segfault. Rust crashes on division by zero but honestly, how many crashes have you encountered in the real world due to division by zero?

pbsd · on April 23, 2014

Probably quite a few. There's also more than just division by 0 to be wary of: http://blog.cmpxchg8b.com/2013/02/the-other-integer-overflow...

dbaupp · on April 23, 2014

You get a task failure (similar to an exception) on division by 0 too.
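If you would rather get a value back than a failure, present-day Rust has a checked variant. A small sketch (the wrapper name `safe_div` is made up; `checked_div` is the standard library method):

```rust
/// Divides, returning None for the inputs that would otherwise panic.
fn safe_div(a: i32, b: i32) -> Option<i32> {
    a.checked_div(b)
}

fn main() {
    assert_eq!(safe_div(7, 2), Some(3));
    assert_eq!(safe_div(7, 0), None);
    // i32::MIN / -1 does not fit in an i32, so it is rejected as well.
    assert_eq!(safe_div(i32::MIN, -1), None);
}
```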

berkut · on April 23, 2014

Quite a few if you include what happens after a division by zero without proper checking / clamping.

Had one the other day when renormalising a Cumulative distribution function wasn't taking into account a possible empty set, and the lookup of the index was invalid due to the division by zero.