Type-Safe Unions in C++ and Rust

flyx86 · on Oct 7, 2016

> but it’s the first language I’ve experimented with that has made them a first-class feature

It is a feature of ALGOL68, Pascal, Ada and quite some newer languages:

https://en.wikipedia.org/wiki/Tagged_union

Manishearth · on Oct 7, 2016

One interesting thing about Rust is that none of the language features are really new. Even the borrow checker is from research papers and languages from quite a while ago. The only thing I can think of that might be truly unique to Rust is the concurrency safety model (Send+Sync), though that might be old too.

Rust has just managed to take all these features and put them together well, and strive to be more than a research language by working on things that would make others actually use the language.

(Of course, this particular feature is common in many, many languages)

dannyobrien · on Oct 7, 2016

Yes, worth noting that this was an initial goal of Rust: to use research that was rarely implemented, but established and non-novel among CS academics. See http://tim.dreamwidth.org/1784423.html -- I remember some comment about this being part of the choice of the name, too, but I can't find anything specific or definitive about that.

Manishearth · on Oct 7, 2016

The name "Rust" has so many origin stories; IIRC the creator kept changing the story each time he was asked :)

pcwalton · on Oct 7, 2016

> Even the borrow checker is from research papers and languages from quite a while ago.

Eh, that's underselling Rust's contributions. Rust is more flexible than anything I know of when it comes to enforcing aliasing-xor-mutability. Cyclone for example was much more restrictive in disallowing aliasing (see Grossman's "Existential Types for Imperative Languages").

The key feature that Rust has is flow-sensitive permissions on unique loan paths, which is actually pretty novel as far as I'm aware.

Manishearth · on Oct 7, 2016

I guess, yeah. I meant that what Cyclone does is along the same lines.

> The key feature that Rust has is flow-sensitive permissions on unique loan paths, which is actually pretty novel as far as I'm aware.

Huh, right.

catnaroek · on Oct 7, 2016

Indeed. Regions are 20 years old. Substructural types are 30 years old. Type inference is 40 years old. Hierarchical module systems fancier than Rust's are at least 30 years old. With the exception of regions, all this stuff is older than I am, myself. But Rust took all these isolated good ideas and made a practical programming language out of it.

junke · on Oct 7, 2016

Ada and Pascal are not really research languages, though. But still, there are improvements taken from research, see Ada 2012 contracts for example.

Manishearth · on Oct 7, 2016

Right, like I said, this particular feature exists in a million languages. I'm talking about the whole set of features that Rust has; some are from research languages and may have never been seen in the industry before (e.g. regions/borrow checking), but they're not really new. This particular feature in Rust descended from the ML family (since Rust used to be ML-like), which in turn probably got it from Ada or w/e.

nv-vn · on Oct 7, 2016

ML predates Ada by about 7 years, so that's unlikely. The way it is implemented in ML is most likely just an implementation of the mathematical concept of sum types rather than a feature influenced by existing programming languages.

Manishearth · on Oct 7, 2016

> ML predates Ada by about 7 years, so that's unlikely.

Ha! I didn't know that. Thanks :)

optimuspaul · on Oct 7, 2016

sorry if this is a dumb question, but what classifies a language as a research language?

bluejekyll · on Oct 7, 2016

That's a great question!

I would probably restate it though. What is research vs. a product. I just read about 1nm transistors, but we're probably looking at 10 years before Intel, et al, have built all the infrastructure to reliably deliver a CPU based on it.

In the case of a language, I would say it's similar, do you have the support infrastructure in place? Rust is amazing in this regard: Cargo, crates.io, docs.rs, rustup, etc. on top of that you have at least one large company and many others pushing the language in a large and distributed product.

I would classify Rust as a production ready coding platform.

optimuspaul · on Oct 7, 2016

Thanks!

It seems very obvious now. I was thinking the purpose of the language was for performing research, but that didn't make much sense. I see now that it's the language itself that is the subject of research, not the tool. English is fun.

grayrest · on Oct 7, 2016

There are lots of languages in the world. The majority of them are written by a single person for themselves or (if there's funding involved) a small group of people who deeply understand the compiler. These are either research, hobbyist, or company languages depending on the context of the author. They exist to explore a particular idea or solve a specific problem and don't aspire to be widely used.

If you hear the term used disparagingly it's because they tend to have issues you generally wouldn't want to put up with when you're on the clock: compiler bugs, lack of error messages, large missing pieces in the standard lib, spectacularly bad performance, etc because those weren't the problems the language was meant to explore.

vvanders · on Oct 7, 2016

Yup, a lot of people like to say X has Y, however execution of Y is just as important(if not moreso) than the initial idea of Y. Technical ideas don't thrive in isolation, they need to be nurtured and grown.

Just look at the dominance of Javascript in the programming landscape, you're seeing decades of high quality executions of the language(I include community in the "execution" definition) despite it being far from the best technical language.

bandrami · on Oct 7, 2016

the concurrency safety model (Send+Sync), though that might be old too

Smalltalk has had that since the early 1980s

dbaupp · on Oct 7, 2016

Like steveklabnik, I'm extremely curious to hear more about how Smalltalk's concurrency model is just like Rust's. The latter is a fairly flexible model to defend against data races (which seems to be a concept only really formalised in the late 80s) that puts most of the power in the programmer's hands (i.e. no need for compiler-inserted locks on every object, etc.) that comes from a finely balanced combination of the trait system and the manner in which Rust controls mutation.

It would be great to see how other languages have achieved a similar balance, and my impression is that Smalltalk (and like most languages) does not put nearly as much effort into controlling mutability, but maybe I'm wrong. Could you post some links/a description that explores how Smalltalk achieves a similar level of safety?

nickpsecurity · on Oct 7, 2016

"which seems to be a concept only really formalised in the late 80s"

http://brinch-hansen.net/papers/1975a.pdf

http://brinch-hansen.net/papers/

His first, concurrent OS was RC 4000 in 1969. It had many mechanisms in place. He got the key parts of the safety problem figured out by 1972. His language to handle much of it statically at compile-time was done in 1975. His Boss 2 system same year used coroutines + similar concepts to run 100 activities at a time with a proof of deadlock freedom. Finally, he used Concurrent Pascal to implement Solo system which ran its processes safely without using physical, memory protection. These summaries came from "Evolution of Operating Systems" on 2nd link.

So, the stuff was well-established by the mid-70's with operating systems using it in production. Just ignored by mainstream like a lot of good stuff for various reasons. ;)

dbaupp · on Oct 7, 2016

I don't have time to read it in full right now, but a quick glance over that paper doesn't show any formalisation of data races, which is specifically what that parenthetical is about, not just general "concurrency safety". Having a detailed description of concept seems important because Rust is fairly precise in what it defends against, possibly giving the programmer more flexibility and power than in a system that attempts to solve many problems (but of course giving them less assistance when writing something that fits within the rules of the more defensive languages). Of course, it's possible that a language "accidentally" solves that problem without it having been formalised, it just seems less likely.

nickpsecurity · on Oct 7, 2016

It's in the disk buffer and monitor examples. Here's a relevant quote:

"A disk buÆer is a data structure shared by two concurrent processes. The details of how such a buÆer is constructed are irrelevant to its users. All the processes need to know is that they can send and receive data through it. If they try to operate on the buÆer in any other way it is probably either a programming mistake or an example of tricky programming. In both cases, one would like a compiler to detect such misuse of a shared data structure."

"To make this possible, we must introduce a language construct that will enable a programmer to tell a compiler how a shared data structure can be used by processes. This kind of system component is called a monitor. A monitor can synchronize concurrent processes and transmit data between them. It can also control the order in which competing processes use shared, physical resources."

Sounds like he understands the problem is concurrent processes stepping on each others toes using a shared, data structure. He has nice drawings and charts to go with it. Plus, an early model to solve it statically at compile time. Jweb_Guru noted some limitations of his method but it understands and solves the fundamental problem. Hansen's own work improved on it later plus it was obsoleted by things like Ravenscar, SCOOP and now Rust's method. Nice that he had a whole OS protected from concurrency errors at compile time in the 70's, though. :)

dbaupp · on Oct 7, 2016

None of that looks like a formalisation or even a description of a data race, at least not in the modern sense.

nickpsecurity · on Oct 7, 2016

Im not a concurrency expert. Just had basic explanations and training common with other developers. How it was explained to me was two or more tasks trying to simultaneously access a shared resource for reading or writing. These accesses might not happen in the desired order, causing incorrectness. Then there were lock-related issues on top of that.

Hansens work formalized what I just described in terms of English, diagrams, and compiler checks. He started with sequential operations on private data in modules. He says if two or more share thd same private data they might not execute in desired order. The monitor pattern enforced user-specified order on function calls to shared data. Built-in to language & compiler.

If my description of race conditions is inaccurate or insufficient, I'd appreciate a link to one that you think is more accurate that I could use as a comparison point against the Hansen paper. Otherwise, his description of problems implementing concurrency sounds exactly like what I learned in books on multithreading, supercomputing, etc. Shared resource used in incorrect order due to concurrency.

Note: Also, his colleagues Dijkstra and Hoare were still inventing and developing formal verification at the time. Tooling sucked. Standard practice, like he did with Algol and COBOL compilers, was writing things like this in precise English with code examples or diagrams. Not sure if you were expecting a HOL model or something when you said "modern" but I figured Id mention stuff was primitive then.

dbaupp · on Oct 8, 2016

I'm not looking for a formal mathematical model or anything, just a precise plain English description of a data race (not a race condition), e.g. like the following:

A data race is when

- two threads access a single memory location,

- at least one of which is a write, and

- at least one of which has no synchronisation^

(^ synchronisation in the sense of things like atomic instructions, not necessarily full locks.)

nickpsecurity · on Oct 8, 2016

Alright that's clear. It's also in the first paper I gave you in the first illustration. The problem you're having is you've constrained the definition of concurrency or race conditions to only be about the language common among application programmers, esp threading. Concurrency is broader than that: it can apply to processes, threads, systems, protocols, or even hardware circuits. All that is required is two or more active things working with a shared resource in a way where operations might get interleaved and out of order in a way that makes computation incorrect.

Knowing that, Hansen starts on problem with RC4000 (1969) describing protecting communication between "processes" via message buffers, message queues, and checks performed on them to ensure validity. That and Dijkstra's critical regions are where the foundations were laid. His next step was the formalization the problem in 1972:

http://brinch-hansen.net/papers/1972a.pdf

Section's 1 and 2 cover basics of concurrent operation + race conditions. It was clear to me by the abstract but should be extra clear he's talking about them by this statement about Algorithm 1:

"The copying, output, and input of a record can now be executed concurrently. To simplify the argument, we will only consider cases in which these processes are arbitrarily interleaved but not overlapped in time. The erroneous concurrent statement can then be executed in six different ways with three possible results."

That's definitely a concurrency error. He then talks about mutual exclusion and synchronization with an await primitive. Closer to modern language but the focus on operating systems in the early 1970's means he says processes instead of threads and things like disk buffers or message queues instead of shared memory. Although his examples in 1972 paper are clearly shared memory since it's at algorithm level.

So, they discovered the problem around late 60's, implemented an early solution for sharing resources among processes in 1969, fully described race conditions + some other stuff by 1972, had a safer-by-design language to catch it at compile time by 1975, and applied that to implement a concurrent, production OS (Solo) for day-to-day use by academics by 1976.

So, race conditions were both formalized and initially solved in early to mid 1970's. I don't know how much more you need given they had an OS running 100 jobs/users at once interacting on shared resources without visible failures using their concurrency model. Mainstream OS's and apps are still having race conditions pop up on occasion. Clearly, what they were doing was addressing root cause if no race conditions occurred after a successful compile of "multiprogrammed" apps. :)

dbaupp · on Oct 9, 2016

You're still talking about "race conditions" and "concurrency errors", whereas I am talking specifically about data races not arbitrary race conditions or concurrency errors. I don't see how the first illustration demonstrates anything about data races, nor how the rest of your comment applies to this: a data race isn't just plain old interleaving of executions/possible results.

A data race is when a program may give undefined results (in the sense of undefined behaviour in C) because you've violated core language semantics. Solving "all" concurrency problems is a much broader and far more restrictive paradigm than just ensuring that a program has defined behaviour, and not having restrictions is key to systems programming. (In fact, it's not even clear to me how one can "solve" race conditions at a language level without tying into some sort of machine-readable product specification or without just removing concurrency entirely: a single piece of non-determinism may be fine in one program but bad, i.e. a race condition, in another.)

> The problem you're having is you've constrained the definition of concurrency or race conditions to only be about the language common among application programmers, esp threading

No, I'm not having any problem here: you said yourself that you're not an expert. A data race (not arbitrary race condition) only makes sense if there's shared memory and thus thread is a perfectly reasonable description (although "thread" is also often used to just mean thread of execution, not literal pthread thread). It's becoming clear that we're talking about different things since you're not in-tune with the jargon/subtle terminology.

nickpsecurity · on Oct 9, 2016

That is in fact where dispute came from: race conditions vs data races. I saw race and concurrency thinking you were talking about race conditions. My mistake.

Hmm. Ill have to look into the old stuff further to see about whether any of it, including Hansen, covers the accepted definition of data races. There's potential that Hansen's does but Im holding off until I think on it more.

gpderetta · on Oct 8, 2016

> Just ignored by mainstream like a lot of good stuff for various reasons. ;)

This paper introduced Monitors. It was of course hugely influential.

Mutex and condition variables, the building blocks of monitors made their way into posix threads and from there into C++11.

I believe Java is a much closer realization of Hansen model as every java object is a Monitor.

nickpsecurity · on Oct 9, 2016

Good point. I should qualify that statement better. Hansen's model was statically guaranteeing freedom from issues at compile time. There were projects aiming at that which could've implemented a similar model but didn't. More work could've gone into eliminating its limitations, etc. The Ada and Eiffel people carried the torch, though, with interesting progress. Now Rust is. Ada folks aren't resting, though, as ParaSail is pretty neat.

Jweb_Guru · on Oct 7, 2016

This system does not support recursion. It also does not have to reason about atomics and so on because it is not intended for multicore systems; indeed, it assumes that all access is serialized through a mutex, and doesn't consider the existence of types like single-threaded reference counters (like Rust's Rc). It is describing threads with exclusive access, and a scheduling mechanism for waiting on locks, but the mechanism is entirely dynamic (and the paper assumes the existence of a virtual machine, which is frequently invoked to deal with tricky cases).

Some relevant lines:

> A Concurrent Pascal compiler will check that the private data of a process only are accessed by that process. It will also check that the data structure of a class or monitor only is accessed by its procedures

is significantly less than what Rust provides, because:

> Processes cannot operate directly on shared data. They can only call monitor procedures that have access to shared data. A monitor procedure is executed as part of a calling process (just like any other procedure).

> If concurrent processes simultaneously call monitor procedures that operate on the same shared data these procedures will be executed strictly one at a time. Otherwise, the results of monitor calls would be unpredictable. This means that the machine must be able to delay processes for short periods of time until it is their turn to execute monitor procedures. We will not be concerned with how this is done, but will just notice that a monitor procedure has exclusive access to shared data while it is being executed.

Rust allows shared access to immutable data and does not require either serialization or the invocation of a virtual machine; it also allows nondeterministic operation to be used as long as it cannot cause memory unsafety.

Additionally, there are other, more basic things that the system is not capable of. For instance:

> (Strictly speaking, a compiler can only check that single monitor calls are made correctly; it cannot check sequences of monitor calls, for example whether a resource is always reserved before it is released. So one can only hope for compile time assurance of partial correctness.)

A major part of what Rust brings to the table is the ability to statically avoid problems like use after free and double free even in a concurrent setting. How would I avoid resource leaks in a system like this without a per-thread allocator (which would likely prevent sending across threads)?

Of course, it's not clear to me that this matters anyway, since I can't destroy a thread or shared data structure!

> Dynamic process deletion will certainly complicate the semantics and implementation of a programming language considerably. And since it appears to be unnecessary for a large class of real-time applications, it seems wise to exclude it altogether. So an operating system written in Concurrent Pascal will consist of a fixed number of processes, monitors, and classes. These components and their data structures will exist forever after system initialization.

I agree that people have a strong tendency to not know what past work has been done, and Concurrent Pascal is admirable, but what Rust does is not the same thing. Rust specifically tackles a lot of hard problems in concurrency that simply do not exist as long as you (1) assume away deallocation for shared data structures, and (2) serialize all access to those data structures.

(There are lots of other specific points I could go into; for instance, Concurrent Pascal apparently lacks facilities for generic programming, so I probably couldn't combine two correct concurrent data structures and expect a third correct concurrent data structure to come out. But that stuff isn't directly related to concurrency).

nickpsecurity · on Oct 7, 2016

"I agree that people have a strong tendency to not know what past work has been done, and Concurrent Pascal is admirable, but what Rust does is not the same thing. "

I'm not saying it does. The claim I replied to said preventing things like data races was done till well into the 80's. I showed systematic investigations of the problem plus a production solution had been done by mid-70's. You've nicely showed how far ahead methods like Rust got. ;)

dbaupp · on Oct 7, 2016

> The claim I replied to said preventing things like data races was done till well into the 80's

No, it definitely didn't: it said data races as a concept seemed to not be formalised until the late 80's.

nickpsecurity · on Oct 7, 2016

The word formalized might be where disagreement started. If you meant precise & useful, Hansen formalized the definition Ive seen in some books. If you mean in mathematical specification, you might be correct. Im not sure when that got started.

stcredzero · on Oct 7, 2016

my impression is that Smalltalk (and like most languages) does not put nearly as much effort into controlling mutability

VisualWorks got the ability to make certain objects immutable sometime in the early 2000's. The VM engineers were particularly proud of that one.

stcredzero · on Oct 7, 2016

Smalltalk has had that since the early 1980s

That really depends on which Smalltalk you're talking about, I think. Various implementations really had different concurrency models. Some implementations used native OS threads. Others ran synchronously "inside the image" but spawned threads to interact with the OS.

gpderetta · on Oct 7, 2016

that's interesting, AFAIK smaltalk is fully dynamic and has no compile-time type checking, so how would it statically enforce the equivalent of Send and Sync constraints?

stcredzero · on Oct 7, 2016

In most implementations, Smalltalk is compiled to bytecode, and uses late binding. It's usually run on a JIT.

how would it statically enforce the equivalent of Send and Sync constraints?

In some Smalltalks, normal execution is synchronous. Many of them also use read and write boundaries for various purposes. The former gives you Send and Sync constraints for free. The latter can be used in difficult edge cases. (Like when you're calling out to the OS.)

Manishearth · on Oct 8, 2016

> The former gives you Send and Sync constraints for free.

Uh, no, it gives you thread safety for free.

When my comment was talking about Send and Sync I was talking about the specific way Rust's typesystem enforces thread safety. I'm not saying that other languages don't, I'm saying that Rust's method of enforcing it might be one of the few unique things it does

pcwalton · on Oct 7, 2016

Smalltalk is dynamically typed. How can it have a static type system feature?

Manishearth · on Oct 7, 2016

(Yeah, just to clarify, in my comment above, I'm talking about systems similar to Rust's specific system for enforcing thread safety; not arbitrary systems that enforce thread safety in a different way)

steveklabnik · on Oct 7, 2016

I'd like to read more about this, do you have a recommendation of where I can look? I have a basic fluency in Smalltalk, but it's been a while.

Manishearth · on Oct 7, 2016

Interesting, thanks!

rurban · on Oct 7, 2016

Putting them well together in Rust is debatable.

Writing proper and safe concurrent code in Rust still looks horrible compared to languages which supports it natively in the type system, e.g. ponylang type capabilities. You still have to manually maintain locks, and it's also much slower.

There's a safety model, but it's only best practice, not enforced by the language nor the compiler. So calling it "safe" and "truly unique" is way off. Even parrot has a better, safe and lockless threading model, which guarantees safety.

Manishearth · on Oct 7, 2016

> You still have to manually maintain locks, and it's also much slower.

You ... don't. That's just the concurrency model that the abstractions in the stdlib expose, but Rust's safety is generic enough that you can use different abstractions (e.g. lockfree ones). See crossbeam for some alternative models. There are also transactional memeory impls in Rust. They still use Send and Sync for safety though. Manually maintaining locks is a feature of the concurrency library used, not the safety model.

Pony's system is actually pretty close to that of Rust. Sync is "immutable", Send is "isolated" (sort of). Of course, capabilities are different from auto traits, but the idea behind using these two capabilities for concurrency safety is similar.

(So I was wrong that Rust's concurrency safety system is unique, since Pony has something based on the same concepts)

> There's a safety model, but it's only best practice, not enforced by the language nor the compiler.

Yeah, Send and Sync are technically a part of the stdlib (and can be reimplemented outside of it, aside from a small interaction with statics -- Send/Sync are treated as special by the compiler when it comes to statics). However, it is enforced by the compiler in the sense that if you avoid best practice (using Send and Sync), you can't write parallel code without dropping to `unsafe`. If you do that you can design safe abstractions around that and use whatever concurrency enforcement you wish, though pretty much everyone sticks to Send and Sync since it works well with the rest of the language.

(Because of the statics thing, "not part of the language" is debatable, anyway)

gpderetta · on Oct 7, 2016

Having a language-imposed concurrency model would be useless and actively harmful in a system language. Rust provides the building blocks to build whatever safe abstractions are appropriate for the problem and domain at hand, plus a set of out of the box abstractions relatively low level that will be familiar to people coming from other system languages.

And of course the means to get rid of any abstraction and safety when required.

edit: accidentally a word

jonathanstrange · on Oct 7, 2016

> Having a language-imposed concurrency model would be useless and actively harmful in a system language.

It works fine in Ada.

Edit: Why am I downvoted for this? A language-defined concurrency model is indispensable for having safe and deadlock-free concurrency in the language (rather than in arbitrary utility libraries). That's why concurrency was explicitly included into Ada and into the additional Ravenscar profile. The rationale for this has always convinced me and it also works fine in Ada, so anybody care to elaborate what's wrong with it?

nickpsecurity · on Oct 7, 2016

It worked great for Concurrent Pascal and Solo:

http://brinch-hansen.net/papers/

I agree it's better to have mechanisms that can be turned into a proper model. A good default, though, greatly improves consistency and reliability in real-world systems.

rwmj · on Oct 7, 2016

Only if development of the language and OS are separate. If you co-develop the language with the OS, then it makes perfect sense to push safety features into the language (or conversely, to remove them from the language when they are no longer appropriate).

LISP-strength macros give you most of this capability.

gpderetta · on Oct 7, 2016

OS are not the only use cases for system languages and developing a language to be tightly tied to an os (and viceversa) is a great way to condemn both to obscurity.

amalcon · on Oct 7, 2016

Erm... I think Dennis Ritchie would have had a few things to say about that.

gpderetta · on Oct 7, 2016

good point; though, while designed to work well with each other, neither unix nor C require the other.

bluejekyll · on Oct 7, 2016

If you want a portable C, *libc is practically a requirement. Having POSIX is also a huge boon.

pcwalton · on Oct 7, 2016

> There's a safety model, but it's only best practice, not enforced by the language nor the compiler.

What? How is it not enforced?

> You still have to manually maintain locks, and it's also much slower.

Slower? I don't think it can be any faster even in theory!

dkersten · on Oct 7, 2016

I've come to the conclusion that most "modern advanced" features were actually in ALGOL.

Algebraic datatypes with (even just basic) pattern matching (preferably with compile-time exhaustiveness checking) are things I've wanted in C++ ten years ago. This gets me part of the way there, at least. :)

barrkel · on Oct 7, 2016

The tag in Pascal is (a) optionally stored (only the type of the tag is needed to distinguish between the different variants in the declaration, the field can be omitted) and (b) not actually strongly typed (that is, the tag is not consulted before access to one of the variants is made).

It is however a fine way to encourage sensible use of unions. But it's mostly just a suggestion.

titzer · on Oct 7, 2016

They look a lot nicer in ML and Haskell, IMO.

Of course I implemented them in Virgil, too. (shameless plug: https://github.com/titzer/virgil)

harrisi · on Oct 7, 2016

I don't intend to derail the discussion further, but I'm curious about Virgil. I can't find any information on it that tells me who is behind it or how to contact them. Do you have anywhere I could look to learn more about the project in general?

titzer · on Oct 7, 2016

Sure! It's pretty much just my perennial side project since at least 2004. Some information is available in publications (latest: http://dl.acm.org/citation.cfm?id=2491962), and there is some documentation in the wiki branch (https://github.com/titzer/virgil/tree/wiki).

masklinn · on Oct 7, 2016

They're specifically saying that it's the first language they've experimented with, not the first language in general, FFS.

frankmcsherry · on Oct 7, 2016

I know, right. The full quote is apparently less fun to argue against:

> I’m aware the idea for type-safe unions isn’t unique or original to Rust, but it’s the first language I’ve experimented with that has made them a first-class feature.

logicchains · on Oct 7, 2016

Note C++2017's std::variant is based upon boost::variant, which has been around since at least 2004 (http://www.boost.org/doc/libs/1_31_0/doc/html/variant.html, http://www.boost.org/users/history/).

boost::variant however lacks a nice visit method that takes lambdas, instead requiring the user to create visitor classes. This verbosity may be part of the reason it wasn't adopted in mass, in spite of its advantages in terms of type safety.

jb1991 · on Oct 7, 2016

They are implemented completely differently. The only real similarity is in the name and API. Boost variant suffers from significant performance penalties that the new std::variant does not have.

alexeiz · on Oct 7, 2016

If you look at http://www.boost.org/doc/libs/1_62_0/doc/html/variant/design..., Boost variant has no performance penalty due to backup storage if any of its bounded type is nothrow default-constructible. In practice, you just need a single nothrow default-constructible type in boost::variant to satisfy that requirement. This can be `int` or `boost::blank`. And that's that - no performance penalty.

amluto · on Oct 7, 2016

> Boost variant has no performance penalty...

I have a big project that uses boost::variant. The runtime performance may be fine, but I swear that just compiling the header took about 5 seconds.

Pre-C++-11 fake variadic templates are just painful.

(IMHO two of the major missing features in Rust that are really need are integer generics and varidic generics. Without integer generics, arrays over 32 elements don't work correctly, and that's really annoying sometimes.)

gpderetta · on Oct 7, 2016

Boost.variant suffers for being correct by default (i.e. strongly exception safe) plus opt-in for speed, while the new std::variant can get in an invalid state if an exception is thrown at a bad time. Let's say that the trade-offs will be hotly debated until the standard actually ships.

jb1991 · on Oct 7, 2016

The problem with Boost.variant is that all data is doubly allocated, including at least 1 heap allocation as a copy. This is not obvious and is certainly not what a lot of users would want. But you are right, it does this to avoid exceptional situations.

Sharlin · on Oct 7, 2016

The tradeoffs have been hotly debated for years :) The current spec is finally something the committee could agree on, after countless proposals, counter-proposals, endless email discussions and long evenings at committee meetings.

gpderetta · on Oct 7, 2016

I agree that the current spec is better than none at all, but it feels like a bad compromise. I would have preferred either a fully exception safe variant or one with an explicit empty state.

verroq · on Oct 7, 2016

Isn't this just an algebraic data structure?

gpderetta · on Oct 7, 2016

well, yes. The nice thing is that in C++ can be implemented purely as a library.

Sometimes C++ feels like the high level languages assembler.

bjz_ · on Oct 7, 2016

Unfortunately that means they lack lots of the pattern matching niceties that you get in languages with builtin ADTs. There is an impressive paper about implementing pattern matching on sub-classes, but it's pretty hackily done using the preprocessor, and could definitely do with some language support: http://www.stroustrup.com/OpenPatternMatching.pdf

jb1991 · on Oct 7, 2016

It is widely speculated that pattern matching and many other syntactic enhancements are coming to C++ because of the big door that std::variant has opened.

squidbidness · on Oct 7, 2016

IIRC Stroustrup mentions a desire for features such as these for versions of C++ after C++17.

MichaelMoser123 · on Oct 7, 2016

C++11 c++14 c++17 does that mean that we might see a match statement in c++20? It would be a significant addition - similar in complexity to anonymous functions.

MichaelMoser123 · on Oct 9, 2016

i remember that Bjarne Stroustrup held a talk in cppcon15 where he talked a lot about the GSL library [1] - it would add type annotations that can be checked by a tool, so as to check for potential memory problems (to me that sounds like a poor man's borrow checker).

One year later: i see the template library [1] but i don't see the analysis tool. Does anybody know what happened with this initiative?

[1] https://www.youtube.com/watch?v=1OEu9C51K2A

[2] https://github.com/Microsoft/GSL

pjmlp · on Oct 9, 2016

The tools are called clang tidy and Visual Studio 2015 Update 3.

Other vendors might eventually add support as well.

masklinn · on Oct 7, 2016

> The nice thing is that in C++ can be implemented purely as a library.

The bad thing is that they suck.

gpderetta · on Oct 7, 2016

They are certainly nowhere as nice as a builtin language feature, but they are not too bad. They are in fact quite usable.

In the next 10 years C++ might even grow another leg and incorporate them in the basic language (together with proper pattern matching).

Manishearth · on Oct 7, 2016

In languages where they are builtin they get used quite a lot to great effect, whereas this is cumbersome in C++, both with boost and the new std::variant stuff. "usable" might not cut it for something you want to use ubiquitously :)

It's good that C++ has this, but it hinders some programming patterns. Of course, C++ has other programming patterns that it's great at to compensate :)

stcredzero · on Oct 7, 2016

C++ is what it is. It is "itself." I mean that in the Irish euphemism for moonshine kind of way. C++ is kind of like its own const keyword. It is what it is. const isn't immutability. It's something else. It isn't "pure" -- but it's still pretty darn useful.

orlp · on Oct 7, 2016

I have no clue what you're trying to say.

stcredzero · on Oct 7, 2016

My guess is that you're missing a cultural referent, and you're unused to programming language discussion comments that aren't contentiously positive or negative, but are rather whimsical.

Benjamin_Dobell · on Oct 7, 2016

Believe it or not, even a weakly typed language like JavaScript can take advantage of type-safe unions, thanks to Flow - https://flowtype.org/docs/union-intersection-types.html

jjnoakes · on Oct 7, 2016

Is there any demand for a syntactic sugar layer on top of C++? Something to make these new features more ergonomic? Something one could opt-in to for newer code, or code that doesn't need to be backward compatible to 1990? Something that outputs valid C++ and so works with any tool chain?

pcwalton · on Oct 7, 2016

That would be a compiler for a language to C/C++. That design decision is virtually always a bad idea: LLVM has simpler semantics, allows proper GC, allows proper debug info, is widely portable, and avoids a needless AST serialization/deserialization step.

Don't compile to C.

jjnoakes · on Oct 7, 2016

Please don't tell others what not to do.

First, there are cases where compiling other languages to C or C++ is better than via LLVM; interoperability and portability are two (yes, C and C++ are more portable than LLVM).

Second, I'm not really talking about compiling some wildly different language to C++. I'm talking about some simple syntactic sugar. The same syntactic sugar that would be in an existing C or C++ compiler's front end, except instead of adding it to all the frontends in the world and waiting years, I'd like to implement it just once and now.

pcwalton · on Oct 7, 2016

Effort in this direction would be better spent just adding new features to clang (or GCC). It'll be so much easier than writing a new parser and semantic analysis for C++, which you will have to do if you want the new feature to behave properly in the presence of templates, etc. You'll end up wanting to just use clang's front end to help you, like most C++ tools do—and at that point why not just modify clang itself?

People bring up obscure platforms (though, interestingly, I can't recall anybody actually naming one of those platforms) all the time as motivation for compiling to C. But if you really need to do that, you can always revive the LLVM C backend. The fact that nobody has bothered to revive it and keep it up to date enough to be merged into LLVM proper, to me, is a strong indication that few people need support for these obscure platforms.

jjnoakes · on Oct 7, 2016

> Effort in this direction would be better spent just adding new features to clang (or GCC).

In your opinion. This isn't a fact. For example, I know first hand of code that has to compile with compilers that are not in the set { gcc, clang, msvc }. What do I do then? (Oh, and those compilers are closed-source).

> You'll end up wanting to just use clang's front end to help you

Probably, although there are other options too

> and at that point why not just modify clang itself?

The changes I make, if useful to the community, would probably end up back in clang. But they would also remain a separate tool, because of all those other pesky compilers I work with.

> you can always revive the LLVM C backend

I'm not sure how that helps me; not only does it seem a terribly roundabout way to get what I'm looking for (instead of sugar -> c++ -> my c++ compiler, you are proposing sugar -> clang -> llvm -> c -> my c compiler), but I don't think objects produced by that pipeline would link with C++ objects produced from my native compilers, among other issues.

> The fact that nobody has bothered to revive it and keep it up to date enough to be merged into LLVM proper, to me, is a strong indication that few people need support for these obscure platforms.

Or, is it evidence that people using more obscure platforms (we're talking fortune 500 companies here) stick to languages that exist on their platforms for various reasons?

solidsnack9000 · on Oct 7, 2016

Although I don't agree with this poster, it seems odd that they're being downvoted.

valarauca1 · on Oct 7, 2016

>>>yes, C and C++ are more portable than LLVM

C and C++ are more portable then their own compiler's backend? LLVM is clang/clang++'s assembler

I get what your saying if you compile to C it can be more portable. But then you need to compile to some C standard.

K&R C? ANSI C? C90? C99? C11? Why not have a compiler flag for each? What about embedded C? What companies embedded C?

I can repeat this for C++

Compiling to C/C++ in a platform and standard agnostic way is impossible.

jjnoakes · on Oct 7, 2016

Lots of languages do it already. How is it impossible?

valarauca1 · on Oct 7, 2016

No.

Lots of languages can give you a dump in 1 standard of C. You can't specify what C standard, or what platform to output outside of ARM/x64. NIM, Cython, C++ (via clang), [Ada/Cobalt/Fortran]-GCC, Perl, Agol, Oracle PL/SQL only support x86_64 and ARM.

Your argument was

    interoperability and portability

1) Rust outputs to the same object file format as C, and can be natively linked against C/C++ code. So interoperability is no issue.

2) You have no portability gain as you have 3 choices x86, x64, and ARM. All of which Rust already supports.

Also your main request

    $ rustc your_program.rs --emit llvm-ir
    $ llc -march=c -o your_program.c your_program.ll

or if you want C++

    $ rustc your_program.rs --emit llvm-ir
    $ llc -march=cpp -o your_program.cpp your_program.ll

jjnoakes · on Oct 8, 2016

Who needs more than one dialect of C or C++? I don't. Any systems I'm interested in may not support clang, but they have C11 compilers and C++14 compilers.

Rust is not interoperable with C++ as far as I know, and even if it was, I'm not interested in Rust for this discussion for reasons I've mentioned elsewhere.

The LLVM C backend was removed. Julia devs have a version that supports the LLVM bit code that they require, but I believe it is not everything LLVM can produce.

I was under the impression that march=cpp output C++ code that rebuilt the IR, not implemented the source program.

Thanks for the advice, although none of it really applies to what I'm suggesting.

posterboy · on Oct 8, 2016

> Please don't tell others what not to do.

kind of paradox to say that

jjnoakes · on Oct 8, 2016

Is it?

Mine was worded as a polite request and his was worded as a command.

jb1991 · on Oct 7, 2016

Lot of syntactic sugar recently added to C++ and a lot more in C++17. The structured bindings in particular are really excellent ways for unpacking tuples and structs with type inference into new variables automatically.

jjnoakes · on Oct 7, 2016

Yeah but if you read the article and look at the syntax for C++ match-like statements, it leaves a lot to be desired.

jb1991 · on Oct 7, 2016

For pattern matching, that's true. And my response is, stay tuned. C++ has a lot more to offer in this area in the future. std::variant is opening a big door for all kinds of new features in the language, and for those who cannot wait, there are pattern matching libraries out there to peak at.

bluejekyll · on Oct 7, 2016

I feel like this point is often made about C++: "in the future we will have that feature too!"

If you don't want to wait, want to experiment with something else, learn a new language, etc., Rust is awesome.

Obviously the counter argument to this is, "but there's 30+ years of C++ in production", but be honest, who actually wants to work on a 30 year old codebase?

jjnoakes · on Oct 7, 2016

That's why I made my proposal above. What if you could have the nice syntax now (via translation to c++ source) instead of waiting for the future?

Why do I have to wait for every compiler on every system I want to build on to update to some future standard when things could be done now across the board in many cases?

froydnj · on Oct 7, 2016

GCC is very close to the 30-year mark (0.9 released 22 March 1987). Linux turned 25 this past year; Firefox (the C++ parts, at least) is very close to 25, depending on what code you want to count. Apache is 20ish years old. Even LLVM, which I think of as a relatively young project, is getting close to 15 years. Maybe (hopefully?) very little code survives from the initial versions, but they've all withstood the test of time.

It's certainly pleasing to be able to sit down and write something from scratch or nearly from scratch. But it can be equally pleasing to extend a multi-million line codebase to do something new, or to enable the codebase as a whole to do something new/better.

bluejekyll · on Oct 7, 2016

:) I'm glad others enjoy doing it. Definitely one of the things I like about Rust at this point is the freshness.

tomne · on Oct 7, 2016

The one thing I like about Rust is the fact that you can easily expose a C-like interface for libraries, which allows you to contribute to old codebases without much of a problem, best of both worlds (It has a non-trivial cost when starting, sure, but so does keeping C++).

clappski · on Oct 7, 2016

It's not about want, it's about a requirement that they're maintained, expanded.

bluejekyll · on Oct 7, 2016

Yes, and perhaps I'm shirking my duty as a developer by wanting to avoid some ugly gnarly code. I will say this, I would be more likely to want to contribute to projects like these if they incorporated Rust as an alternative to C or C++ for their development.

pix64 · on Oct 7, 2016

I think at that point you're creating a new language or at least a new (incompatible) version of the language. In this case, why limit yourself to C++ at all?

jjnoakes · on Oct 7, 2016

I don't see the distinction you are suggesting. Call it a new language if you like, but if it compiles to C++, looks 99% like C++, and just smooths over the warts, then whatever you call it, that's what I'm suggesting.

(I want it to compile to C++ so I can use the compilers I use and it can interoperate with the C++ code it has to interoperate with and C++ programmers that work on this code wouldn't have to learn more than a few extras that might make their lives easier).

nambit · on Oct 7, 2016

You're looking for some sort of code generator for C++ which only adds syntactic sugar? Interesting idea but I don't think a good one exists yet.

jjnoakes · on Oct 7, 2016

Yes, that's exactly what I proposed.

lfowles · on Oct 7, 2016

Well we did get `for (auto x : range)` not long ago!

marvel_boy · on Oct 7, 2016

Newbie here. What are the differences with Swift Enumerations?

nathankleyn · on Oct 7, 2016

Compared to Rust enumerations, they are one and the same. In fact, looking at the documentation for Swift Enumerations yields this little nugget:

You can define Swift enumerations to store associated values of any given type, and the value types can be different for each case of the enumeration if needed. Enumerations similar to these are known as discriminated unions, tagged unions, or variants in other programming languages. [1]

Haskell and other ML family languages also have similar constructs, although these are usually modelled using "Algebraic Data Types", or "sum types". [2] [3]

[1]: https://developer.apple.com/library/content/documentation/Sw...

[2]: https://wiki.haskell.org/Algebraic_data_type

[3]: https://www.schoolofhaskell.com/school/to-infinity-and-beyon...

masklinn · on Oct 7, 2016

Details (some features they provide or can provide), Swift's enumerations and Rust's enum are both implementation of the older concept of Algebraic Data Types (ADT), which can be found in pretty much any language inspired by ML (directly or indirectly).

Manishearth · on Oct 7, 2016

Swift enums are basically the same as Rust ones. The differences between C++ variants and Rust enums listed in the blog post apply between C++ and Swift too.

killercup · on Oct 7, 2016

Came here to say that for the specific problem at hand (`ConnectionState`) you should probably use session types in Rust -- but someone on /r/rust beat me to it _and_ the author already added it to the post while I was reading! :)

sidlls · on Oct 7, 2016

This is neat. I'm writing a very badly constructed toy application to get used to Rust, and some of the differences have been interesting to note.

In this case, when I first encountered Rust's enums the first thing that came to mind is the fact that C and C++ both offer the ability to support tagged-unions, but certainly not as a first-class entity and definitely with a lot more cruft, with or without safety checks.

For me the jury is still out on where the best "fit" for Rust is. I really appreciate the enforced safety of Rust for higher-level systems applications programming. I'm not convinced yet it won't just get in the way pointlessly for much lower level programming (especially embedded). Perhaps that's just my relative novice understanding of Rust, though.

bluejekyll · on Oct 7, 2016

It depends, do you want a piece of code that will be more reliable and require less maintenance after development?

Even after more than a year of learning the language I do find that some things take me longer to build, but the end product is far better.

sidlls · on Oct 7, 2016

I can write reliable code requiring less maintenance in C or C++. What you're engaging in is language zealotry. Rust isn't a panacea.

bluejekyll · on Oct 7, 2016

I think you're reading into the comment too much. Rust by definition, will create a safer variant of whatever similar code you write in C or C++. This isn't really debatable, things like bounds checking on arrays, strongly typed error results, thread safe memory sharing semantics. These will absolutely guarantee that in general you will have safer code in Rust.

What I said is that it might take you longer to write it in Rust, than something similar in C/C++, but you won't have some of the guarantees you get from the Rust semantics. So the tradeoff is up to you; write code faster, or write code safer.

sidlls · on Oct 7, 2016

Ah, I see now. Thank you for clarifying. Not that I agree entirely, mind, just that your initial comment was clarified.

oldmanjay · on Oct 7, 2016

I'm always fascinated when people decide to have an opinion on facts. That little spark gets fanned into the flames of religion so easily, and it's no exaggeration to say it is a primary driver in the shape of our civilization.

sidlls · on Oct 7, 2016

I'm almost never fascinated when folks mistake opinion for fact. It's the primary driver in many language wars, as I see it. Some come to recognize that no language is perfect, especially their favored one. Others don't.

oldmanjay · on Oct 7, 2016

Language wars are just another religion, so I agree.

The real tragedy of being human is that no amount of intelligence can save you from being religious.

petters · on Oct 7, 2016

Hopefully there can be a new function returning optional at some point in the future.

jb1991 · on Oct 7, 2016

I'm not sure I understand; in C++17, why can't you write a function that returns std::optional?

_y4o5 · on Oct 7, 2016

I think the example is either naive or then contrived. There's probably some behavioral changes that go with the state of the connection in which case you'd be much better served by having a State interface and then implementations for different actual states.

For example:

class State { ... }; class Connected : public State { ... };

std::unique_ptr<State> state;

dbaupp · on Oct 8, 2016

That has downsides like, for one, requiring allocations for every state transition, and the runtime infrastructure (plus loss of static assurances) required to do downcasts when one needs functionality that only exists on a specific state. For closed state spaces like this example, a discriminated union is far more controlled and has many advantages, whereas subclassing is often better suited to open (or large) sets of states.

mike_hock · on Oct 7, 2016

It's clearly a contrived example to show how std::variant can be used for a state machine.

radarsat1 · on Oct 7, 2016

The first example could have been implemented in C or C++ using a union. Wouldn't necessarily be type-safer, but the reminder of writing myConnection.connected.m_id instead of just myConnection.m_id is also a pretty good way to ensuring that you remember to check if (myConnection.m_connectionState == CONNECTED) before ever accessing myConnection.connected. That said, having compiler errors is much better. I'd love to be able to achieve this in pure C. One way might be to use opaque types with accessor functions to return a pointer to the correct part of the union according to the requested connection state, a bit like std::variant::get_if.

Animats · on Oct 7, 2016

Someone just re-invented discriminated variants from Pascal.

jnbiche · on Oct 7, 2016

As someone pointed out elsewhere, Pascal's discriminated variants ("variant records") aren't typesafe, since the tag isn't checked before the variant is accessed. From Essential Pascal, 4th edition: "The use of a variant record type is not type-safe." The whole point of the article's discriminated unions/records in Rust and C++17 is type safety.

Maybe this has been built into one of the newer Pascal descendants, but it hasn't been around for years and years. And embedded C programmers have been using these kinds of "unsafe" discriminated unions for years.

That said, I do think Ada has had these kinds of actually typesafe variant records for quite a while. But again, Rust isn't making any claims to innovation. Even the language name ("Rust"), is a reference to the language being based on rusty old best practices.

pcwalton · on Oct 7, 2016

None of us would claim to have invented them. Sheesh, we wrote the bootstrap compiler in OCaml :)

pjmlp · on Oct 7, 2016

This seems to have become quite prevalent.

Apparently the majority of CS degrees don't teach history of programming languages.

On my language design lectures in the mid-90's we had to learn all major ones, all the way back to Fortran.

Manishearth · on Oct 7, 2016

This really doesn't seem like a reinventing to me (from either the Rust or C++ side). "Reinventing" implies ignoring history in the way you mention.

Tagged unions are prevalent in so many languages that the designers of both C++17 and Rust are bound to know about them. This is "borrowing" (or "stealing" :p).

The original Rust was ML-like and had an OCaml compiler, so Rust's enums definitely descended from those. I can't talk for C++ for sure, but like I said it's common in so many languages that they're bound to have derived inspiration from them.

pjmlp · on Oct 7, 2016

I guess I was ranting a bit out of place....

nickpsecurity · on Oct 7, 2016

I agree with Manishearth. It looks like they're learning more than forgetting. The Rust history page indicated it cherry-picked good things from a number of languages. Looks like this is just another one they've evolved into a type-safe default. A Good Thing.