Rust: Enable WebAssembly backend by default (hellorust.com)
473 points by a_humean on Nov 26, 2017 | 228 comments



This makes me so excited! Great job to Alex and the entire team getting this done.

I know Rust is not a perfect language for everyone. For me, it's really exciting that there is finally a high-level language which can scale for every potential use case.

Rust is great for kernels and the embedded space; Excels in the systems space; Is gaining traction in the native application space; Has great tooling for web backends; and, now, with WASM, it's capable of targeting the web (Yes, without DOM access this is limited, but my understanding is that it's coming at some point in the future).

Are there any other runtime-free, memory-safe languages that can target all of those use cases?


> Rust is great for the embedded space

Honestly, I would disagree here, unless we're talking in terms of potential, rather than actual ability right now. There's certainly some level of support, but I really did not get the feeling that Rust was 'great' for embedded, rather that it could be.


> but I really did not get the feeling that Rust was 'great' for embedded, rather that it could be.

My experience is very limited here, but I have used TockOS on one of their dev boards. That was very pleasant; having no stdlib makes it a little harder than std Rust, but it still felt more deterministic than writing in C.


I was thinking a lot in terms of device support too, I guess. That's important to me; I know with C that I don't have to think too hard about which device, at least in terms of "Does this board support my language well".

It's probably even more important for professional embedded engineers, where the hardware decisions may not even be made by the software developers, for various reasons.


Again, not speaking from experience here, but it should be easy enough using tools like https://docs.rs/bindgen/0.31.3/bindgen/ to generate FFI interfaces to C libs where that is needed.

So, I wouldn't necessarily say that's a blocker. Obviously everyone should make the best choice given their business requirements, etc.
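
Roughly, the kind of build.rs this enables looks like the sketch below (the header name and output path are just placeholders, not taken from any real project; bindgen would be listed as a build-dependency):

    // build.rs: generate Rust FFI declarations from a C header with bindgen.
    extern crate bindgen;

    use std::env;
    use std::path::PathBuf;

    fn main() {
        let bindings = bindgen::Builder::default()
            .header("wrapper.h")                       // placeholder C header
            .generate()
            .expect("unable to generate bindings");

        let out_path = PathBuf::from(env::var("OUT_DIR").unwrap());
        bindings
            .write_to_file(out_path.join("bindings.rs"))
            .expect("couldn't write bindings");
    }

The generated bindings.rs can then be include!()'d from the crate and called through normal unsafe FFI.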


Yup, bindgen works great. It has a little problem with C macros, but they're easy enough to redefine as static constants to make them work.
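
For instance (made-up names, just to show the shape of the workaround): if a header has a #define that bindgen skips, you can mirror it by hand:

    // Hand-written mirror of a C macro that bindgen couldn't translate:
    //   #define MAX_CLIENTS 64
    pub const MAX_CLIENTS: usize = 64;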


Good point, although one advantage with Rust is that it plays well with C -- so you could keep some C drivers or libraries, and do the rest of the work in Rust. That reduces attack surface area while providing a smoother on-ramp to full device support.


> one advantage with Rust is that it plays well with C

Advantage over what? This can be said about the majority of programming languages

For example, I’ve recently done an embedded ARM Linux project where I used C# and .NET Core for higher level parts, C++ for lower level, [DllImport] and C API between the two.


You're considering the "call into C" case, which is only half the story. "C calls into you" will always be easier in C++ and Rust, or any other language with "no" runtime, than heavier languages that rely on them. Yes, it can work, but it's a disadvantage.


In C# I use callbacks where I find appropriate (logging, also some rare events). Under the hood, the runtime marshals delegates (most languages would call them lambdas) into unmanaged C function pointers.

Ease of use is the same as when I'm consuming the C API from e.g. C++. No disadvantage for C#.


"C calls into you" means you already have a C code base and now need to call your C# functions.


Right, C# is OK when C code calls into it.

https://stackoverflow.com/a/5235549/126995


One neat trick with .NET on Windows, is that you can actually export static methods in assemblies as unmanaged entry points. In other words, things can LoadLibrary/GetProcAddress them, and invoke them as native.

C# doesn't support this out of the box, but it can be easily done by post-processing the generated assembly. There's a NuGet package for that.

https://www.nuget.org/packages/UnmanagedExports

I'm not sure if any of that works on other platforms, or with .NET Core. Probably not.


I once tried to use that trick for Nvidia Optimus integration.

It didn't work, because that recompilation step broke the debugger and invalidated the .PDB debug symbols.


To be clear, I’m not saying it’s not possible. I’m saying that one less runtime is an advantage.


I actually think the .NET runtime does a well above-average job with this - including clear declarations of managed and unmanaged code - so your ARM project sounds great and is what I would expect. I just don't think it's true of the "majority" of programming languages, at least if you count by number of users or HN popularity. We're both right.


Besides C, C++ and .NET, I have some experience with Python, Perl, VB6 and VBA. All of them have some form of C interop reasonably easy to use.

I also have some experience with Java, JavaScript and PHP. They don’t.


You have seen Python, Perl, or VB6/VBA being called from C code in a seamless, high-performance way? I suppose it's all a matter of degree, rather than black-and-white, and perhaps things have improved - but I'm just surprised.


A bit more complex than .NET, but only a bit.

Python: https://stackoverflow.com/a/33485103/126995

Basic: http://codingdomain.com/visualbasic/win32api/callback/

About the performance… neither of these languages is particularly fast in the first place :-) But I don't see a reason why these callbacks would be too slow.


Same, I have a number of devices here on my desk that will never get a Rust compiler targeting them. With C, it's simple: the "only" thing you have to worry about is what bugs or quirks the C compiler will have on that specific platform, not whether there's a compiler at all.

I've left embedded development behind me for a while now; things are mostly Linux on relatively modern ARM in the field where I used to work, so Rust would probably be an option there, but I still have a hard time considering those systems to be "embedded".


Agreed, I've been waiting 3+ years for first-class support for bit fields


I think another contender here would be OCaml/Reason, which has both JavaScript and WebAssembly targeting and is used in the systems space (e.g. Mirage). It's a bit different in its technical choices (functional, GC, for example). Facebook is using Reason for Messenger on mobile, for example.


Honest question: is Reason really targeting WASM right now? Do you know of a resource describing this? I cannot find anything about it.



Yes, but a byte code interpreter ported to wasm via emscripten != a compiler with a WASM backend.


Isn't the stuff that comes out of the OCaml compiler byte code?

As far as I can tell, OCaml has a GC and WASM (at the moment) doesn't so this seems like the only viable solution to me.


OCaml also has a native x86 backend, which produces quite efficient code.


I couldn't find anything on this either. The assembly emitters in their repo don't suggest a WASM backend: https://github.com/ocaml/ocaml/tree/trunk/asmcomp


AFAIK Reason is a frontend for OCaml and can target whatever OCaml can target. Note that for targeting JavaScript they are using BuckleScript, an OCaml backend that's not built by Facebook.


I consider Rust to be functional, although people would dispute that, I'm sure. GC is a huge difference though.


Rust doesn't have great support for function composition or partial application, both of which I think are essential to be considered a "functional" language.

Others will have different opinions for sure, and that's fine.


Any links to instructions for Reason WASM targeting?



There is a project TrueBit is working on for OCaml and WASM integration.


"Rust is great for kernels and the embedded space"

I'm no expert, but from a layperson's perspective, the rust usage in these areas depends on a lot of "unsafe" blocks, fighting the borrow checker, etc.

Does that ratchet it down from "great" for this space to something like "ok" or "usable"?

Or is this just the best we can do in terms of a memory safe language for kernel and/or bare metal embedded?


The point of unsafe blocks is to contain unsafety. In a language like C, the natural pattern is to expose raw pointers through the entire codebase and document how to use them correctly, when they can be mutated, who's responsible for freeing them, etc. In Rust, the natural pattern is to implement wrapper types and other abstractions that themselves make heavy use of unsafe blocks but whose APIs are safe.

This isn't unique to kernels. The standard heap-allocated vector type in Rust (on which the standard string type is based) is full of unsafe blocks: https://doc.rust-lang.org/src/alloc/vec.rs.html The problem of implementing a heap-allocated vector, with some elements uninitialized but available for use and with the ability to reallocate the entire vector and copy the members, is not a problem that lends itself well to the borrow checker's view of the world, so raw pointers are a fine way to solve it. Use of vectors themselves, their iterators, references to items in the vectors, etc. does match the borrow checker's view of the world, so the average Rust program running in userspace doesn't have to care that the standard library itself isn't implemented in safe Rust.

So copious usage of the "unsafe" keyword in a codebase can be a sign that the code as a whole is more safe: instead of having large portions of the code be unsafe (or instead of having the code pretend it's containing unsafety when it's not), only the parts that need raw memory access use it, and they maintain invariants used by the rest of the code.

It is certainly possible to do better - there are no machine-checked proofs that unsafe code does what it claims and upholds the invariants it claims to uphold. There could be, and I'm really interested to see what people do with languages that lend themselves better to proofs (usually dependently-typed, Turing-incomplete-but-very-powerful languages). But right now the state of the art is that such languages are a pain to use and writing proofs is even more of a pain. I think that's where the compromise is right now. The borrow checker is, arguably, a system for proving things about your code; it just only knows how to prove specific things. (Way more than C's type system proves, but way less than you'd prove in a perfect world.)
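
A toy sketch of that pattern (not from any real codebase): the unsafe access is buried inside a type whose public API can't be misused to break memory safety.

    pub struct Buffer {
        data: Vec<u8>,
    }

    impl Buffer {
        pub fn new(len: usize) -> Buffer {
            Buffer { data: vec![0; len] }
        }

        // The bounds check here upholds the invariant that the unchecked
        // access below never reads out of range; callers only ever see a
        // safe function.
        pub fn get(&self, i: usize) -> Option<u8> {
            if i < self.data.len() {
                Some(unsafe { *self.data.get_unchecked(i) })
            } else {
                None
            }
        }
    }

    fn main() {
        let b = Buffer::new(4);
        assert_eq!(b.get(3), Some(0));
        assert_eq!(b.get(4), None);
    }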


This is a teaser/preview and not necessarily what is going to come, but Rust could evolve to provide checks like this:

https://i.imgur.com/oqtEDoV.png

From miri, page 27 in their slide deck. The miri project has changed a lot since the slides were written — and it is now close to being merged into rustc.

miri: https://github.com/solson/miri


The usual problem in C is ambiguity over "how big is it", the cause of most buffer overflows. Rust tries to contain that, but doesn't solve the problem at the language level. Part of the trouble is that there's no way to talk about uninitialized memory in Rust. If you implement something like a growable array (rust calls this a "Vector", a poorly chosen name), it has space in use and space that's uninitialized and unused. There's no way to express that concept within the language. If you had a way in Rust to say that only elements 0..N are initialized, and only they can be accessed, you could express that situation. Initializing another slot means more elements are now initialized. This is run-time checkable with some overhead.

Once you can say that, you can prove it. The proof techniques are well known. The first step is to be able to talk about it within the language. A predicate like

    initialized(A,i,j)
meaning that array A is initialized from elements i to j inclusive is all that's needed. Then, when you write

    assert(initialized(A,i,j));
    A[j+1] = 0;
    assert(initialized(A,i,j+1));
you've expressed how much of the array is now initialized. Then automatic theorem proving takes over and proves that if the first assert is true, so is the second.

This eliminates the need for "unsafe" in many places. A fancier type system is not needed.

It only takes one wrong piece of unsafe code to enable a buffer overflow attack.


I work on proving things about unsafe Rust for my day job, and you really don't know what you're talking about if you think what you're proposing would be remotely sufficient to guarantee safety in a modern systems language. I'm not saying a way to deal with uninitialized data would not help, but that is not the primary source of potential unsafety in Rust, and it's completely infeasible for the language to include all the features that would be required to prove all unsafe Rust code safe (at least, given what we know now--if you have ideas about weak memory that I'm not aware of, feel free to ping me!). However, it is possible to do machine-checked proofs about unsafe Rust, and once those proofs are done, they remain valid forever, so I'd prefer the current approach (which is to verify the standard library and some major third party ones, and make the tooling for proving correctness of other Rust libraries as easy as possible).


Of course it's not sufficient. It does make it possible to talk about partially initialized arrays, which is a necessary step to checking their access, either at run time or by formal methods.

The "Weak" problem is mostly a back-pointer problem. If you had some way to talk about a back pointer locked in an invariant relationship with a forward pointer, that would deal with back pointers for trees and such. I sketched out a solution for this on YC earlier this year.[1] This doesn't deal with true circular lists, but those are relatively rare; people tend to use arrays for that today.

It's good to hear this criticized from someone who does program proving. I used to do that, a long time ago. What I usually get from the Rust crowd is macho assertion that they don't need checking of their unsafe code. History and the CERT archives indicate they do.

[1] https://news.ycombinator.com/item?id=14302823


Weak memory isn't related to weak pointers. What you're referring to is actually pretty easy to deal with (well, not easy, it's pretty tedious to prove in most of the individual cases, but it's not conceptually difficult). Weak memory refers to atomic operations with weak consistency (like relaxed, release-acquire, and perhaps most confoundingly consume). Modeling them along with lifetimes and ownership tracking requires a fairly sophisticated language model with lots of intermediate tokens and ghost state, which would be a tremendous burden to include in a programming language. Fortunately, it looks like most features of safe Rust extend straightforwardly to weak memory models, but that isn't always the case for unsafe Rust.


> Part of the trouble is that there's no way to talk about uninitialized memory in Rust.

You mentioned Vec right after that. Which is interesting, because Vec deals with uninitialized memory just fine. https://doc.rust-lang.org/std/vec/struct.Vec.html#method.res... allocates more space, but doesn't allow you to read it unless you explicitly add elements.

> It only takes one wrong piece of unsafe code to enable a buffer overflow attack.

Isn't that the case for every language? Here, you can limit the unsafe bit to the underlying data structure if any unsafe actions are required. Maybe Vec uses some unsafe bits, maybe not - but the usage of it is all safe code.
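
A quick illustration of that (just the standard Vec API, nothing exotic):

    fn main() {
        let mut v: Vec<u8> = Vec::new();
        v.reserve(1024);                 // allocate backing storage up front
        assert!(v.capacity() >= 1024);   // capacity grew...
        assert_eq!(v.len(), 0);          // ...but none of it is readable yet
        // v[0] here would panic: the uninitialized capacity is never exposed.
        v.push(42);                      // elements become readable only as they're added
        assert_eq!(v[0], 42);
    }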


In many other languages `unsafe` taints the block of code it is in, which propagates upwards. Rust is the other way around, where `unsafe` means "I'm doing something unsafe here, but let's not tell anyone who doesn't look". It can be a lot harder in Rust (than other languages I've used) to know whose code you have to trust: language implementors (probably ok) or random crates (urk).


Could you give some examples of these other languages? My only exposure to a similar feature is D's @safe/@trusted, where if you're writing @safe code, you'll always wrap unsafe code with @trusted stopping any propagation.


C#


C# has both unsafe blocks and unsafe methods. The behaviour is the same as in Rust. (An unsafe function can be called only from an unsafe scope, and safe functions can contain unsafe blocks.)


I don't think you are correct, unless the language has changed substantially.

https://docs.microsoft.com/en-us/dotnet/csharp/language-refe...

Among other things, you need to opt in to unsafe with the `/unsafe` compiler flag. I'm pretty much 100% certain that the `unsafe` keyword applied to scopes does not mean "treat the contained scope as if it were safe, but permit the use of unsafe operators" which is how it works in Rust.

Edit: to put some more words here, I think what you may be saying is "the unsafe fragment of C# behaves like Rust", which sounds sane to me: `unsafe` decorates regions and the compiler will complain at you if you use unsafe things outside such scopes. When I said "C#" I meant the larger language in which programs are either "safe" (verifiable) or not, and for which `unsafe { .. }` forces you into the unsafe fragment.

Rust doesn't have safe and unsafe language fragments; there is one language which follows rules similar to (as you say) unsafe C#, but which advertises itself as safe. In that language, `unsafe` moves you back in to "safe" code (or we could agree that it was never safe in the same sense that languages like C# are).


You're wrong, and that's exactly what "unsafe" block does in C# - it does not affect the outer blocks. So, for example, if you wrap your pointer arithmetic inside "unsafe", you don't need the "unsafe" modifier on the method. If you put the modifier on the method, then you don't need the modifier on the class. And the code that uses your "unsafe" classes and methods doesn't need to be in "unsafe" context.

If it were otherwise, no C# program would compile without "unsafe", because large parts of the standard library core (stuff in System) is unsafe. The only reason why "unsafe" is useful is precisely because it lets you isolate unsafe operations in a way that lets safe code invoke into them (assuming that you uphold your safety guarantees).

The /unsafe switch is a different thing - it needs to be turned on for the C# compiler to allow "unsafe" anywhere in its input, and is basically a project-level declaration that "this has some unsafe code somewhere in it".


Sure it does, Assemblies with unsafe code are tainted and fail verification.

There are deployment scenarios, like IIS, where admins can disable the execution/loading of Assemblies with unsafe code.


"unsafe" is a C# feature. Verification is a CLR feature. And they are only related in a sense that things that you can do inside "unsafe" blogs generally produce unverifiable assemblies. But the reverse is not true, you don't have to use an "unsafe" block to consume an unverifiable assembly.

Nor does "unsafe" breaks verification by itself - it's only the specific features that you might use inside (like pointer arithmetic) that would do so, and even then not all of them. For example, C# requires "unsafe" for any use of sizeof() that is not known at compile-time, and for which it needs to emit the sizeof IL opcode. So e.g. sizeof(int) is okay, but sizeof(IntPtr) is unsafe; also, sizeof all structs are considered unsafe. However, the IL opcode for sizeof is not unverifiable, and so such an assembly would pass verification, despite requiring "unsafe" in C#.


I know that, but unless one wants to become a MSIL and verification engine expert, the simplest explanation is that unsafe taints Assemblies.


Yes, but that's not what OP was talking about. Within an assembly, an unsafe block still doesn't taint any code that calls into it. And between assemblies, an unsafe assembly doesn't taint a safe one that depends on it. So it doesn't "propagate outwards".


> The usual problem in C is ambiguity over "how big is it", the cause of most buffer overflows

Buffer overflows in C are caused by accessing uninitialized memory, regardless of why the program does it (how big is it vs null pointer derefs vs various other ways).

Rust does solve that problem at the language level. There's no such thing as an uninitialized pointer, which is the only right solution I'd argue. If you do access out-of-bounds in a safe block, you panic, not cause a vulnerability. How is the problem not solved by this, combined with iterators?

What you're talking about, tracking uninitialized/initialized at runtime, can totally be done in rust too, e.g. with a [Option<T>] or such... but then you'll obviously pay the runtime price of doing such checking.

The more common case though is someone wishes to have a mutable vector which has a capacity and an accessible subset of data, which Vec implements.

What you're claiming can be proven easily is not so trivial to prove for the common case; it may be `j+n` where `n` may derive from user input.

It would be cool if blocks of rust now could be "theorem-proved-unsafe", but I think realistically it's a huge amount of work for a small win.

Writing and proving theorems about code is typically much more time-consuming than exhaustively testing or informally verifying a block of code via review, and when the unsafe code is small and simple enough, the results we get are already "good enough".

> It only takes one wrong piece of unsafe code to enable a buffer overflow attack.

Yup, though for many programs you can entirely avoid unsafe code (outside of the well-reviewed and well-tested stdlib), and you can pay special attention to unsafe code in review and testing.

That's a damn sight better than other low level non-gc'd languages like C++/C where the default is unsafe and all code is suspect.


Ah, no, buffer overflows are caused by failure to check bounds. Practically, this is usually only a serious problem when the buffer overflows into an area that has been initialized with something like a return address.


> (rust calls this a "Vector", a poorly chosen name),

What exactly makes it "poorly chosen"?


I think it’s based on the C++ name? (a name which Stepanov supposedly now regrets, https://stackoverflow.com/questions/581426/why-is-a-c-vector...)

But it disagrees with all past use of the name “vector” in mathematics and numerical computing, which is reasonably consistent and well defined and dates from the mid-19th century.


I think it agrees pretty well with the mathematical definition, it's just a slight generalization.

A Rust (C++, etc.) Vec<T> of length n where T implements addition and multiplication in a way that satisfies the field laws is the "reasonable" representation of an n-length "mathematical" vector over the field T. It just generalizes that storage type to situations where T is not a field.

It doesn't support addition of two vectors and multiplication by a element of T using the normal mathematical syntax, but that's a fairly reasonable choice given how we regularly use vectors as a sequence of independent numbers and the different priorities for math syntax and programming language syntax.


What makes a C++/Rust/etc vector distinct from an array is that it's resizable. That is its defining property.

On the other hand, a mathematical vector is not a resizable thing: a vector in R² is of a fundamentally different type from a vector in R³.

A type named "vector" should be more like Vec<T, n> for some type T and some integer n. Failing that, the reasonable representation of an n-length mathematical vector, in C++/whatever, is an array.


The whole idea of mutation is not common in mathematics, so of course mathematical vectors don't commonly resize.

The idea of appending an element to make a new vector is however extremely common, particularly in induction like proofs.

Vec<T, n> is in a sense already what we have, it's just that we are not capable of storing the n in the type information^0 so we store it at runtime.

^0 For a number of reasons. n can change with mutation while types can't. We don't necessarily know n ahead of time when building a vector (which happens in math too) but we have to know types at compile time. And we just don't have a great way of storing numbers in types in Rust anyways.


Just to clarify, a vector space is defined over a field but vectors themselves do not form a field.

Also more importantly in mathematics vectors aren’t really growable. When you change the number of elements you change the dimension (and thus the “type” for that vector). It is then ill-defined to add two vectors of different dimensions.


This is quite standard in the functional programming world; Common LISP, Scala, and Haskell all use "vector" to refer to (usually growable) contiguous-memory arrays with efficient random access, as opposed to "lists" with efficient append and prepend but inefficient random access.


I might be nitpicking but in Haskell and typical lisps “lists” don’t have efficient append. They are basically singly linked lists. A structure with efficient append and prepend (sometimes called a deque) is implemented as a finger tree in Haskell (with logarithmic random access), and as a vector of arrays in C++ (constant random access).


Good point. Sometimes efficient append, but usually just efficient prepend ^_^


I think it's wrong to say that unsafe blocks "contain" the unsafety. Unsafety can't really be contained once it's there -- that's what makes it unsafe.

The advantage is that you should be able to write application code without unsafe blocks; they should be in a library. And the library has a contract: if unsafe behavior does leak outside the library, it is clearly the fault of the library, and can't be blamed on the (safe) caller.


Unsafe blocks "contain the possible sources of unsafety" may be a more correct way to phrase it.

It is better to know that only a specific subset of the codebase could be responsible for memory unsafety even if that behavior may leak. If nothing else, it helps focus reviewing and testing effort.


I'm pretty sure y'all are saying the same thing. :-) my favorite way of describing it is "Rust lets you build safe abstractions, even if its core uses unsafe."


Terrific, in-depth, answer...and I appreciate it.


A really nice answer. It's too bad, I suppose, that the same abstraction can't be used "all the way down". I suppose then we'd be stuck with some high-level encoding of the chip's ISA, perhaps generalized to allow the programmer the ability to define new instructions, and an increased number of registers, etc.


You'd be surprised at how little unsafe you need. Consider https://os.phil-opp.com/page-tables/ for example, it uses the type system effectively to eliminate some issues, in the end, it only has two calls to unsafe.

On some level, yes, it's the best we can do, unless you want to add hardware semantics into your language. Until you do, writing two bytes to 0xB8000 is going to be an arbitrary memory access. At least, that's my take.
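
For context, "writing two bytes to 0xB8000" (the VGA text buffer on bare-metal x86) is, from the language's point of view, just an arbitrary raw-pointer write, so something like this sketch has to be unsafe no matter what:

    // Only meaningful on bare metal with the VGA text buffer mapped at 0xB8000.
    unsafe fn write_vga_cell(character: u8, color: u8) {
        let vga = 0xB8000 as *mut u8;
        *vga = character;        // character byte
        *vga.offset(1) = color;  // attribute/color byte
    }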


As someone who has worked with this page and develops a Rust kernel, I have to disagree.

The two places of unsafe are barely just the minimum.

When interacting with UEFI you have to include a bunch more unsafe clauses, since you'll be FFI'ing to PE-calling-convention C code to wrap the UEFI methods. You'll have to allocate memory without being able to malloc, which means passing a raw memory block, allocated in C, to Rust and telling the frame allocator to use that.

Setting up a syscall is unsafe: you have to use a naked function with internal unsafeness, and the entire construct is unsafe because a syscall can easily crash the system if it doesn't return properly.

You will have to dereference null pointers, because at the kernel level a null pointer is a valid address (my UEFI places a memory region at 0x00 for about 30 pages, which is a great starting region), unless you waste a buffer region. But Rust and LLVM don't allow that safely at all.

The very nature of a kernel is unsafe. It has to do very unsafe things in an unsafe way to enable software running in Ring 3 to make safe assumptions about its environment as long as it gets scheduled.

Of course, Rust is somewhat better than C, since its type system allows one to express things without the entire construct blowing up every 2 minutes unless you've been developing kernels for years.


> The two places of unsafe are barely just the minimum.

Yes, this is only one subsystem. Obviously, as you say, there are more than that in a more complete kernel. But even larger ones have still shown that the vast majority of their stuff works in safe code.

> Setting up a syscall is unsafe

Fun detail, this isn't unsafe anymore on x86! You don't need naked functions; you can

  extern "x86-interrupt" fn some_handler()
and LLVM knows how to properly do the codegen! This is kinda emblematic of what I mean: you can do more with safe code than you'd think.


>Fun detail, this isn't unsafe anymore on x86!

On x86, but not on x86-64, which has the syscall instruction replacing interrupts for syscalls (technically x86 had sysenter).

The syscall instruction requires writing the address of the syscall function into the LSTAR MSR. Additionally, the syscall entry is purely naked; only the EIP and segments are preserved, and in a way that requires the caller to handle it.

Sysret is also particularly nasty in being unsafe in various ways, including being buggy on all Intel CPUs.

At least it's faster than interrupts, and you don't technically need -mno-redzone for it if it's not part of your interrupt handling code.


> Consider https://os.phil-opp.com/page-tables/ for example, it uses the type system effectively to eliminate some issues, in the end, it only has two calls to unsafe.

However, those two calls to unsafe rely on an invariant maintained by the rest of the code that is not marked unsafe. Effectively, the entire module has to be treated as unsafe code by the programmer.


Yes, absolutely. It's still a far cry from what many people assume, which is that the whole shebang must be.


No, that's sort of the problem, isn't it? Safety actually does depend on the whole shebang, or rather, the correctness of all the code that supplies inputs to the "unsafe" block. I think safety has to be thought about at level of the unit of code that can enforce its own invariants, rather than the specific syntax that does the dirty work.


Safety depends on any module containing unsafe code upholding the invariants that are required. But once you've done that, it's safe.

The question is, how much code is in this state, vs being purely safe? My experience and that of others shows that it's generally a smaller amount than many people not experienced with Rust assume.

Basically, I find the "there's always unsafe code somewhere at the bottom and therefore it's the same as 100% unsafe" to be overly reductive.


> Basically, I find the "there's always unsafe code somewhere at the bottom and therefore it's the same as 100% unsafe" to be overly reductive.

I totally agree; it's not the same. However, I think there is disagreement about whether saying only "there's always unsafe code somewhere at the bottom" is better or worse than saying "it's 100% unsafe".

I tend to write some sketchy unsafe code, and the only thing I can say with a straight face is "please don't use this if you care about safety", i.e. treat it as if "it's 100% unsafe". I'm slightly weirded out that Rust has a keyword that amounts to "I understand UB really well; tell others to trust me on it transitively".

I'll be less weirded out when the UB story gets shaken out more, or if cargo gets an "audit-unsafe" option that shows you all unsafe blocks you transitively depend on (does that exist yet?). Still like Rust, but letting randoms write unsafe code is scary.


> However, I think there is disagreement about whether saying only "there's always unsafe code somewhere at the bottom" is better or worse than saying "it's 100% unsafe".

Are you saying that Rust doesn't have a better memory safety story than C++? In practice, that hasn't been our experience.


I would say that Rust has a different memory safety story from say C#, which distinguishes between "is safe" and "contains unsafe code outside the CLR itself". I find that very-binary distinction more helpful than Rust's memory safety story.

I don't use C++ anymore, but I am not clear on how Rust has better memory safety guarantees than C++. I do like its story more, for sure, and I could see how they might turn into guarantees (but I could also believe that someone could identify a similarly safe fragment of modern C++, and I don't want to start that).

Edit: will rust-lang.org take a PR changing "guaranteed memory safety" to "memory safety story"? ;)


> Edit: will rust-lang.org take a PR changing "guaranteed memory safety" to "memory safety story"? ;)

I hope not. The value proposition of Rust is that if you stick to writing code without unsafe, then you have a guarantee that you will not fall victim to memory unsafety unless there is a bug in the language or in some unsafe block in a dependency you're using.

Personally, I find your quibbling in this thread pretty strange. Rust very clearly provides more memory safety guarantees than C++. For starters, you could not take the guarantee I just stated above and apply it to C++ because there is no such thing as "C++ without unsafe blocks."


I really don't have a position on C++; I don't use it, and don't really want to make supportive or limiting statements.

I have used other memory safe languages, and I find their guarantees qualitatively different from Rust. I would say of them "you will not fall victim to memory unsafety unless there is a bug in the language or in the standard library" with no mention of other dependencies. The standard library is then fairly thoroughly audited.

By comparison, you have e.g. the `memmap` crate, which I like and use, written by serious people known to you, where `unsafe` is used in construction of the memory map. It is then on the user to ensure that there are no concurrent writes to the backing file for the lifetime of the map, at the risk of (as I understand it) UB. Would you describe such a program as "safe"? Or at least, can you understand why I might not? If a concurrent modification happens, is the bug to be found in the `unsafe { MmapMut::map_mut(&file)? }` block, or perhaps farther away where you safely open and modify the file before dropping the map?

I'm sorry if this comes off as quibbling; I do actually care about getting these things right, as I want to rely on them and do more interesting things (e.g. de-abomonate mapped memory).


C# has unsafe blocks just like Rust does. Likewise, Java has sun.misc.Unsafe. Python has ctypes. How is that different from Rust?


HN wants me to stop posting, so this will be the last one. You two know how to find me elsewhere.

What I should have written is: "you will not fall victim to memory unsafety unless there is a bug in the language or in the standard library, or you explicitly opt out of safety". Using `unsafe` in C# is opting out of its safety guarantees (and requires the `/unsafe` flag); I'm pretty sure none of the C# team would say "guaranteed memory safety" of unsafe C#. I expect the same is true of Java, and have no clue about Python.

To the extent that we are talking about unsafe C#, I think you are right; it's mostly the same as Rust. But, no one would claim that unsafe C# has "guaranteed memory safety" where the guarantee is "as long as your code and all the code you bring in don't have memory safety bugs". At least, I hope not.

My original point was just meant to be that there is the risk of harm when you try and quantify a language as "more safe" because there are fewer regions where unsafe operations are permitted. People might read "guaranteed memory safety" and misunderstand what it means, as evidenced by me not actually knowing what it means.


There is some tooling but not a ton. Check out the Miri link elsewhere in the thread; eventually I expect us to have amazing tools here. It's gonna take some time though.


Woops, I think I misunderstood the context and repeated what you were already saying. I have become that which I hate. :(


For kernel-level programming, I think Rust is basically the same as C or C++ in this regard. I haven't seen a kernel written in Rust where I am convinced there is a submodule that can be treated as if its correctness doesn't impact the safety of the entire kernel.


I don't understand this argument at all.

"Rust is not perfect, so it's basically the same as C"

The fact that "american fuzzy lop" and "syzkaller" have gone through the kernel and found dozens of use-after-free's (see https://github.com/google/syzkaller/blob/4bd70f/docs/linux/f...), shows that even incredibly carefully programmed C messes up in totally rote ways.

The basic cleanup code which is able to be exploited here would not be in an unsafe block if this were rust. In fact, because rust abstracts memory management out better, most of the cleanup code wouldn't exist at all.

Rust also allows better abstractions. For example, Rc<T> in rust is much easier to use without screwing up than the manually handled reference counters littered throughout the kernel. It seems obvious to me that numerous deadlocks in the kernel could be avoided by not hand-rolling refcounting everywhere.

You're right that there's still the potential with rust for some code to compromise the safety of the entire kernel.

But with rust that's a subset of code, not practically every single line of the whole thing.

It seems to me like memory corruption/use-after-free/etc only being able to happen in a small fraction of code is better than it being able to happen anywhere.
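
To make the Rc point concrete (a userspace sketch; a kernel would need its own allocator-aware equivalent):

    use std::rc::Rc;

    fn main() {
        let shared = Rc::new(vec![1, 2, 3]);
        let also_shared = Rc::clone(&shared); // refcount incremented automatically
        drop(shared);                         // decremented automatically
        println!("{:?}", also_shared);        // freed only when the last Rc is dropped
    }

Compare that with the get/put discipline hand-rolled around kref-style counters in C, where one missed decrement is a leak and one extra is a use-after-free.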


> Rc<T>

Rc requires memory allocation. When you boot a kernel you don't have that. Until you set it up, you can't use anything that requires a heap, only stack values.

The best solution I found is to use C code to allocate raw memory and then go through the elaborate process of setting up the kernel address space, after which you set up a frame allocator, after which you can set up interrupts and finally use page faults to use the kernel heap normally.

But wait, there is more! During a page-fault interrupt you can't write to the kernel heap except in place, since a page fault during a page fault is a double fault, and thus your only option is to crash the kernel. You can only read from the kernel heap, modify the heap in place, and write to page tables.

The same goes for some other interrupts, which, by the way, violate a lot of the ways Rust code likes to work. An interrupt is like a coroutine, but you don't have a nice runtime that abstracts it for you; instead the IRQ just smashes into the active execution, and let's hope you weren't holding any locks on important kernel stuff at that moment, because you will have to crack that lock open.


My point is really that rust allows you to build those abstractions, regardless of that specific one.

Sure, the kernel has reusable things like its embedded linked list etc, but they're not type-safe, and they can't represent ownership.

With rust's type-system, a generic rc-like thing could be created which is suitable for the world of the kernel, even if it's not the stdlib Rc.


You can't have Rc until you have set up a heap. The stack is not a safe place to put these things.

Without any memory management, things get hard fast, especially considering you still need to parse the memory layout to find out which parts of memory are even usable.


The original comment was "Rc<T> in rust is much easier to use without screwing up than the manually handled reference counters littered throughout the kernel."

That's heap vs. heap. It comes after the very brief stage you're mentioning.


Why is kernel programming any different from user-level programming here?

I tend to think that the driver and network level, which is where most problems in the real world tend to lie, is similar to userland in how often unsafe code gets used and the impact of unsafe code on the correctness of the kernel. In fact, the fact that you can write some drivers in userland on certain OS's--and they can crash without bringing down the whole OS--is strong evidence of this.


x86 is unsafe. Period. A lot of devices are unsafe. Period.

You can write some device drivers in userspace. But you can't handle page faults in userspace. Those need to be handled by the kernel and will be 100% unsafe because any mistake will lead to a certain and quick double or triple fault.

Kernel-level programming is very, very different from normal user-level programming. There is no malloc. There is no segmentation. Nothing prevents you from dereferencing a null pointer. In fact, in kernel-land a null pointer is valid memory. You don't have any of the forgiving properties of Ring 3 assembly code in kernel-land.


What you are describing is <10% of, say, the linux kernel. The rest of it reads very much like user-land code (and dereferencing a null pointer will often earn you a nice warning message if not a panic).


Those are usually the important 10% of a kernel that build the foundation of everything else.

Once you have VMM and Paging setup, preventing a null pointer deref is not hard (simply unpage 0x0)


> a submodule that can be treated as if its correctness doesn't impact the safety of the entire kernel.

You don't have to go all that way to be better than C or C++. Rust can give you a submodule A whose correctness doesn't impact the safety of any uses of submodule B.


If the correctness of submodule A impacts the safety of the entire kernel, then it impacts the safety of the uses of any other submodule, because you can't really contain the consequences of the unsafe behavior.


> you can't really contain the consequences of the unsafe behavior.

That's true, but the property I described can still help you track down the root cause and limit the number of places that interact with it.


Even if you wrote your entire program in unsafe, you're still enjoying an experience at least as good as C++; you just don't get the compile-time memory safety. And it's unlikely your entire program would ever have to be in unsafe.


Having never used rust and having barely used C, here are some uninformed opinions.

A memory safe language for the kernel doesn't seem possible using "traditional" safety tools. I'd expect a "safer" language would look a lot like formal verification. If you can make a safer language that doesn't look a lot like formal verification, you can probably automate a lot of formal verification.

There are still some big advantages to rust. By explicitly marking unsafe code, you've made it a lot easier to figure out what needs to be focused on. I imagine you can even formally verify just the unsafe parts of your ecosystem.

Also, Rust's tooling just seems more pleasant.


ESPOL for the Burroughs B5000 in 1961 already allowed for some kind of safety.

It was a systems programming language based on Algol and already had the notion of unsafe code for low-level code.

Those modules had to be marked as UNSAFE and could only be executed if allowed to do so by the sysadmin.

Similar examples could be provided for PL/S, PL/8, Mesa, Modula-2, Modula-3 and many others.

One big contribution that Rust brings to the table is educating a whole generation, which believes the myth that C was the very first systems programming language, that there are actually safer alternatives.


> Those modules had to be marked as UNSAFE and could only be executed if allowed to do so by the sysadmin.

Safety and authorization are completely orthogonal. The former is an objective property of a program, to be established or refuted with formal proof. The latter depends on the subjective whims of one specific human being, in this case the sysadmin. And, in my experience, the typical sysadmin is not a good enough semanticist to determine whether a program is safe or not.


The point being that unsafe code had to be marked as such to be compilable and that tainted the binary, which was only allowed to execute after given permission.

Formal proofs can also contain logic errors.

Any kind of safety improvement is better than what C offers.


> The point being that unsafe code had to be marked as such to be compilable and that tainted the binary, which was only allowed to execute after given permission.

And my point is that this is useless in the absence of someone or something capable of reliably determining whether the code is safe to run.

> Formal proofs can also contain logic errors.

So you have competent people write them.


Regarding formal proof systems, not only can they contain logic errors, they are still very far from being usable for the common programmer to do regular programming.

As for competent programmers, we all know how carefully many corporations choose who they hire.


Nim comes to mind. https://nim-lang.org/


Thank you for mentioning Nim here. :)

To expand a bit, Nim features its own JS backend with a high proportion of the stdlib supporting it. You can access the DOM via the `dom` module[1] and build some pretty cool stuff[2][3]. It doesn't target WebAssembly (yet) however.

On the systems side, Nim compiles to C/C++/ObjC and manages memory via a soft real-time GC (or via a choice of other GCs including boehm).

1 - https://nim-lang.org/docs/dom.html

2 - https://nim-lang.org/araq/karax.html

3 - https://picheta.me/snake/ (source available here: https://github.com/dom96/snake)


> Is gaining traction in the native application space

Are there any Rust-first UI frameworks rivaling something like Qt, yet?



Rust is not really good for web backends; it has no libraries for talking to popular services, and the constructs for networking are still uncertain/unstable/hard to use.


For reference, what I have used that makes me claim this without being misleading (not necessarily all at the same time):

https://rocket.rs/, https://rusoto.github.io/rusoto/rusoto_core/index.html, https://docs.rs/postgres/0.15.1/postgres/, https://docs.rs/kafka/0.7.0/kafka/, https://docs.rs/reqwest/0.8.1/reqwest/, as well as many other supporting libraries. I've not run into many things where there isn't already support.

Also, in terms of networking, if you're doing blocking IO, the stdlib is fine. Otherwise, from my experience, I really like https://tokio.rs/ for non-blocking (I know people are torn on this one) and have used it in my DNS impl, https://github.com/bluejekyll/trust-dns

For more info on web and Rust, this is a great page: http://www.arewewebyet.org/


Modern C++ with RAII checks all of your boxes except for web backends. (Though I've heard good things about WT.)

Disclosure: I only write software for scientific computing and have no interest or experience in web development.


>which can scale for every potential use case

For a certain type of lowish-level programming, yes. But there's a reason that many projects include something like Lua for the majority of the actual application-level work.


We switched from Go to Rust in production. After the initial ramp-up time, it's really a lot more productive than Go for us, at least.


May I ask what "production" is?


Real-time fraud detection as a service for adtech companies. We started with Java but got tired of GC pauses; perhaps it was due to an inexperienced team, but we were not able to tame it. Then we moved to Go, and now we are in Rust.


That is a lot of language swapping. How long of a period did this happen over?


That sounds like data science. Why aren't you using Python?


He mentioned GC pauses being an issue. Python's GC is significantly less sophisticated than Java/Go, which were also not satisfactory enough.


> but got tired of GC pauses

Lol, were you guys making 3D simulations or something? "GC hiccups" usually only become an issue in high-performance-demanding applications.


Decisions in online ad bidding have to be done in fractions of a second, so unpredictable latency could cost a lot of money.


Some companies seem to be fine with that.

https://www.sociomantic.com/

They use D, with their own GC implementation.


Ah, I see. Makes sense. I wish all adtech stacks would give high priority to performance. The current online ad landscape is an abomination.


The actual production environment, not development, research, or testing.


What area of computing are you talking about?


We've been happy with Rust, as well.


Ditto! Would love some clarification.


Interesting! Can you tell what are you working on? It may help others.


Asking for the next Pirates movie? (couldn't resist)


I'm so so so excited to see this land! I have a little hello world demo: https://gist.github.com/steveklabnik/d86491646b9e3e420f8806a...

To give you an idea of file sizes, the above produces a 116 byte wasm file. If you leave off the no-std stuff it produces a 3059 byte wasm file.

EDIT: I'm actually being told that there was a bug in wasm-gc that's since been fixed; even with libstd it should be able to eliminate all of it and get the tiny file now.
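
For a rough idea of what's being compiled (this is not the exact gist above, just the general shape): a plain exported function, built for the new target as a cdylib, e.g. `rustc --target wasm32-unknown-unknown -O --crate-type cdylib demo.rs`.

    // demo.rs: a minimal function exported with a C ABI, so the wasm module
    // exposes it to JavaScript under a predictable name.
    #[no_mangle]
    pub extern "C" fn add_one(x: i32) -> i32 {
        x + 1
    }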


That’s gorgeous. How much bigger does it grow if you heap allocate? Does it include 2mb of extra code like in normal desktop builds?


This Rust code [1] heap allocates some things (e.g., a string with `String::from`), and results in this [2] WASM file, which is 65kB, or 25kB after `gzip -9`.

[1]: https://github.com/killercup/wasm-experiments/blob/bf3b30eed...

[2]: https://github.com/killercup/wasm-experiments/blob/bf3b30eed...


That's much smaller than I expected, especially given how large Rust native binaries usually are. What allocator does that use? Does that 65k include its own malloc implementation like we needed for asmjs, or does WASM expose a system malloc library or something?


wasm exposes "memory sections"; the module states up front how much memory it wants. It can also generally call a "grow memory" function that makes this space bigger. That's it, from the direct wasm perspective.

That said, this target uses https://github.com/alexcrichton/dlmalloc-rs


It should be possible to reduce that a bunch more, 65kB sounds high.

Running the binaryen optimizer on that wasm shrinks it by 6%. Probably more can be done on the Rust side.


Absolutely, yeah. Having a real linker will help with that as well. Until then, you can also compile with

    [profile.release]
    opt-level = "s"
(i.e., optimizing for size), to get it down to 23kB gzipped.


I’m away from my computer right now, so may take a while to give you an exact number, but when I first did this, I forgot to turn on optimizations, and the libstd version was 173kb pre-gzip. So I’d imagine it’s similar to that, or at least, that gives you an order of magnitude.


Could you please add an example of using cargo instead of calling rustc directly? How do I configure cargo.toml?


Here's hello world: https://github.com/steveklabnik/semver.crates.io/commit/dc3b...

(I wanted to try to port semver parsing to the web today, but I hit an LLVM assertion, so it'll have to wait until we can fix that issue. This stuff is still very raw!)
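
The gist of it (a minimal sketch; details may differ from the commit above) is to build the crate as a cdylib and pass the new target to cargo:

    # Cargo.toml
    [lib]
    crate-type = ["cdylib"]

Then `cargo build --release --target wasm32-unknown-unknown` drops the .wasm file under target/wasm32-unknown-unknown/release/.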


So when will servo become a webapp so we can run it inside firefox?

To clarify, this is sarcasm. What I really want to say is when can we run firefox inside firefox? But it's not entirely written in Rust. The point of that question is to illuminate some of the absurdity of web apps. i.e. what function does the outer browser serve in this scenario and why do we need it?

Edit: All the answers point to the outer browser as providing a sandbox. I contend that is supposed to be an OS function. I also think that tabs in the browser came about because OSes didn't provide a decent way to handle multiple instances of applications. See what's happening here?


Your question is related to the technical aspects of the ecosystem, but not to the way that people use computers in general.

A good rhetorical question in response is, why is the web so much more popular when using computers than installed applications?

The implication you've made is that there is no benefit to the web, but that is obviously not true, as web usage far outstrips installed applications (besides the browser) at this point. Given this, it becomes necessary to have common tooling to target the different browsers out there.

Getting back to the core of your question, if there was a framework for which you could target all users on the internet (with a single codebase), it ran natively on all platforms, was easily run by everyone merely by clicking on an icon, which then cleans itself up after running... then maybe native apps would be a good avenue for delivering all applications.


What is the advantage of wasm over Java and Flash? I don't really see technical advantages. There is only the social/legal aspect: wasm is not controlled by a single corporation.


There is an immense technical advantage: wasm doesn't enforce a high-level memory model like the JVM's classes or Flash's ActionScript. It allows straightforward use as a compiler target for low level languages which do not use a bytecode verifier for safety.

Any attempt to use those languages or their existing ecosystems of code on the JVM, Flash, or Javascript will be incredibly fragile and convoluted.


The WASM memory is just a Uint8Array in the JS VM. You can use a big byte array and do manual memory management on the JVM too. It's a well-known pattern on the JVM.


That's entirely unhelpful for the call stack, values stored in registers, loading and storing other sizes, atomics, etc...

Wasm solves that because it's aware of what it's doing. Java et al do not.


wasm is more baked into the web platform than either are; flash wasn't as bad as java in this regard, but still. Nobody wrote flash libraries that your JS code could use, they used it for the equivalent of a big old <canvas>.

It also does not require a second runtime; integrating wasm support into existing JS VMs took very little code compared to an entire JVM or flash runtime.


> Nobody wrote flash libraries that your JS code could use

This isn't precisely true. Before widespread availability of similar HTML 5 features, Flash was used for file upload (especially multiple files w/ progress bars), clipboard access, and pre-websockets network push. Client-side image processing, too, although you might consider that as part of the "big old <canvas>" usage. Your overall point is right on, of course.


Ah that is fair! I forgot about those, but you're right.


Isolation. WASM is a target that allows users to effortlessly download and run arbitrary code from the web on all platforms with a modern web browser; an environment that is sandboxed such that arbitrary file system access or other system access is relatively safe and limited even without knowing the intentions of the author.

This is why javascript is so popular with its current monopoly on the web. No other language can do that, but with WASM other languages, such as Rust, can be part of the picture.


I don't want to attack Rust or WASM, as I am pretty excited about this, but it's not that easy.

That anything that runs in the browser is isolated, sandboxed, and whatnot is irrelevant. It's still arbitrary code running on often unwilling clients.

Don't believe me? Think about ad-blockers.


That's why I qualified it with "relatively safe". Arbitrary code from potentially malicious/incompetent sources is arbitrary code, but it's a platform on which it's relatively safe to run arbitrary code.


It's no different than the existing JS here.


Most JS is unwanted code running in the browser. Seriously, the web devs are the ones that want it, not the user.


Absolutely, not different at all. I am just saying that it is naive to think that the web platform is safer than other platforms. Many "attacks" happen completely inside the sandbox.


The web platform is not completely safe, but it is far safer than, say, an unsandboxed exe running on Windows.


Oh yeah, it's not a panacea for sure.



Windows 2000? Then you can only run up to Firefox 12. (And Rust code came to Firefox a few years later.)


> I contend that is supposed to be an OS function

Different sandboxes for different needs. Processes, jails, Docker, hypervisors/VMs, interpreters: they virtualise different resources with different levels of abstraction, and they all have pros and cons.


It still serves as a sandboxed interpreter/GUI framework and as a frontend for finding and loading apps, regardless of how complex the apps are.


It would certainly become interesting if Firefox were largely just a WASM interpreter with OS extensions, in which a browser engine runs, written in Rust and compiled to WASM. Though I imagine that to get proper sandboxing I'd probably implement a ring system similar to x86 rings for privilege separation.


It serves the same purpose as the outer kernel on a VM host: isolation between fully-featured inner environments using a solid, featureful platform. There are tinier platforms you can use for isolation, and yes, they're more secure, but they don't get you the wide real-world deployability that full-featured kernels or browsers get you.

I would genuinely like to be able to run Servo inside my un-jailbroken Chromebook. I already run an SSH client in the browser (using Google's Native Client instead of wasm, but same difference), and I don't see why Servo needs access to my SSH private keys. To me, the fact that the only way I can run Servo is to give it access to my entire user profile is the absurd thing.


It's in active development. https://github.com/browserhtml/browserhtml This way you don't have to depend on the browser including certain functionality because you can fall back to downloading a whole rendering engine.


I feel like the direction web and applications in general are heading is that of near-native performance in a sand-boxed execution environment managed by something like a browser. This potential is what excites me the most about webassembly, even more so than enabling the transition away from Javascript.


Maybe I just can't find it, but the largest blocker for using Rust on the web for me is the lack of an exposure/binding system (like embind).

I'm not particularly interested in writing entire webapps in Rust but I do see it being useful for smaller self-contained modules inside a larger JS app. Not having a robust binding API makes that difficult.

EDIT: But unrelated criticisms aside, congrats on the progress! I'm excited to see more first class support of WASM from languages other than C/C++.


I agree that the best use-case here isn't full apps in Rust, but writing high-performance modules.

The webpack team is working on stuff so you can just drop a Rust file into your project, and it will do everything needed for super easy integration.

https://crates.io/crates/stdweb sort of gives you the bindings you want, by the way.
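
For anyone wondering what those bindings look like, here's a minimal sketch of stdweb's js! macro, written from memory, so treat the exact calls (initialize, event_loop, the @{ } interpolation) as approximate rather than gospel:

    #[macro_use]
    extern crate stdweb;

    fn main() {
        stdweb::initialize();
        let name = "wasm";
        js! {
            // @{name} interpolates the Rust value into the JS snippet
            console.log("Hello from Rust, " + @{name});
        }
        stdweb::event_loop();
    }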


I am excited for WASM for the same reason; not for wholesale migration, but targeted optimization. Hope documentation for this (likely very popular use-case) improves as WASM matures.


Can anyone describe the (overhead) cost of calling into WASM code in the implementations so far? A lot of optimize-able areas are things like RxJS or Promise implementations (eg Bluebird), or generally libraries making the stack trace like [userspace code] -> [library code, approx 20 stack frames] -> [more userspace]. Should we think of the WASM-JS bridge cost like system calls, or like inline assembly?


I'm curious about this too.

I hand-wrote a perlin noise generation function in asmjs when that was the future. Then I called my noise function from a loop in javascript to paint an image (1 call per pixel - about 1M calls). Running the perlin noise code with the asmjs engine in Firefox turned out to be a bit slower than running all the code through FF's regular javascript engine. The asmjs code ran fast, but the FFI overhead at the boundary between JS and asmjs made the system slower overall.

Given how easy it looks, I might try the same experiment with Rust & WASM to see where we're at. That said, this is the sort of thing which can definitely be improved over time.


These days it's probably possible to use a buffer in the asmjs/WASM side and generate the image data there, only exporting back to the JS side after the job is complete. That should significantly cut down on the overhead.
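
A rough Rust sketch of that pattern (all names here are invented for illustration): do the per-pixel work inside wasm and hand JS a single pointer into linear memory, instead of paying the boundary cost once per pixel.

    #[no_mangle]
    pub extern "C" fn render_noise(width: usize, height: usize) -> *mut u8 {
        let mut pixels = vec![0u8; width * height * 4];
        for (i, px) in pixels.chunks_mut(4).enumerate() {
            let v = (i % 256) as u8; // stand-in for the real noise function
            px[0] = v;
            px[1] = v;
            px[2] = v;
            px[3] = 255;
        }
        let ptr = pixels.as_mut_ptr();
        std::mem::forget(pixels); // keep the buffer alive so JS can read it
        ptr
    }

On the JS side you would then view that region with something like new Uint8ClampedArray(memory.buffer, ptr, width * height * 4) and blit it once, assuming the module's memory is exported.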


You need to look at serialization overhead for exchanging data. For example, I don't think web assembly has direct access to JavaScript strings? What about JSON?

Seems like you're either converting a lot of strings (which wastes memory) or calling lots of JavaScript string methods from web assembly, which is not likely to be fast in an inner loop.


To be clear, I haven't done anything serious in WASM that would hit any limits here.


I guess it will depend heavily on the type of your exchanged parameters. If it's a few integers then it's no problem since both JS and the WASM side can easily read those.

However, for complex objects (like Javascript objects, strings or DOM objects) you need a serialization mechanism. That might be similar to what typical system call implementations do. However, system calls mostly have a very simple set of parameters, so transferring Javascript objects might be far worse. If you want to share Javascript objects on both sides you would also need some kind of reference-counted handles which can identify the objects.

I haven't done anything with WASM yet, but I'm pretty sure the domains you mentioned (promises, asynchronous programming and also DOM handling) are the ones least likely to benefit from WASM. Lower level math algorithms might however benefit a lot.
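
To make the integer-vs-object point concrete, here's a tiny hypothetical Rust example: the integer function crosses the boundary as-is, while a JS string first has to be copied into wasm linear memory as bytes.

    // Plain integers map directly onto wasm parameter types:
    #[no_mangle]
    pub extern "C" fn add(a: i32, b: i32) -> i32 {
        a + b
    }

    // A JS string does not; the caller has to copy its UTF-8 bytes into
    // linear memory and pass a (pointer, length) pair instead.
    #[no_mangle]
    pub extern "C" fn count_spaces(ptr: *const u8, len: usize) -> usize {
        let bytes = unsafe { std::slice::from_raw_parts(ptr, len) };
        bytes.iter().filter(|&&b| b == b' ').count()
    }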


I think the real issue is the lack of DOM access rather than binding to JS.


Isn't the point of the whole architecture that, from a security perspective, all IO will always be through JavaScript?

Replicating the attack surface of the browser in JavaScript land in WebAssembly land goes against the whole design of it.

It's for the community to make nice libraries that expose the browser features to WebAssembly land in JavaScript. It's also the right place to deal with cross browser differences.


Well, the IO is ultimately handled/sandboxed by the browser, not by JavaScript, so as far as security is concerned JavaScript and WASM should have the same access to IO/Web APIs. I think the attack surface you are talking about is really an implementation detail, and browsers could use a common middleware that both JavaScript and WASM can use.

In a distant future I could see JavaScript actually compiled to WASM (maybe the browser would pipe it directly into a WASM compiler before executing it) to reduce the maintenance costs and perhaps also the attack surface you are talking about.

The whole point of WASM was to cut out the middle man (i.e. JavaScript) so that we can reach native-like performance. WASM should be the ultimate dominator, not JavaScript.

>> It's for the community to make nice libraries that expose the browser features to WebAssembly land in JavaScript. It's also the right place to deal with cross browser differences.

I believe the community could make nice libraries compiled to WASM that anyone can use regardless of the programming language. JavaScript would be just another language (albeit a popular one, at least due to legacy reasons).


> I believe the community could make nice libraries compiled to WASM that anyone can use regardless of the programming language. JavaScript would be just another language (albeit a popular one, at least due to legacy reasons).

I understand and respect the aim -- the browser should be agnostic about programming languages.

But this is not the intention, because it is quite unfeasible. WebAssembly allows one to target the web with much more low-level languages, but that also means that the languages dictate the memory structure of things. In the WebAssembly world there are no 'javascript arrays' or 'object hashmaps'.

Imagine mapping the Javascript DOM API surface (with all its dynamic data structures and callbacks) to C++, Haskell and Smalltalk. Three languages that may compile to WebAssembly, but would have drastically different internal data structures. And all of the data marshalling you would need to do would need to happen for every DOM feature, for every language that compiles to WebAssembly.

OR! We just implement an API that does nothing more than interop with JS land. This doesn't change often and it will be easy to support many different languages quickly.

Think of Javascript as the BASH of the web. In a shell script you would pipe the output of one written-in-C program to the input of a written-in-Rust program. With HTML+JS you would hook up the heavy parts of the application and plug the IOs exactly how you want them. It's closer to configuration, really, much like a shell script.

So there needs to be some glue language, and we need to support Javascript and its DOM operations in a way that is completely backwards compatible, and at least as fast (read: tightly coupled) as it is now. And we want to limit the attack surface. There is no other choice than that Javascript will be that glue language.

But it's fitting, because Javascript, like BASH, is a language whose strongest competitive advantage is compatibility. Like BASH it will be the glue language and shell of a platform.


The web APIs are described using Web IDL, and I believe that could be translated to C++, Haskell and pretty much any language with more or less effort. In fact I think there are bindings for some languages such as Python and Objective-C, though they are not open. I see no reason why a WASM interface for the web APIs could not be exposed to the client (the code itself could be developed in C/C++ and compiled to WASM).

https://developer.apple.com/library/content/documentation/Co...

https://www.gnu.org/software/pythonwebkit/


Supposedly they are adding things like polymorphic inline caches, garbage collection, threads, etc., that would make it more like a VM. Perhaps enough that Python, PHP, Ruby, or similar languages could run in WASM without downloading the whole runtime. That plus native DOM access might make the front-end web as diverse as the back-end web.

I'm curious whether the fragmentation that would bring on is a net add, or a net drain on the web as a whole.


I wouldn't say that fragmentation on the backend is a net drain as a whole...


What do you mean by 'first class support' of WASM in C/C++? Is it somehow possible to produce wasm binaries without utilizing emscripten?


Yes, LLVM has had built-in wasm support for a while now.


The plan for emscripten is actually to replace its own wasm compiler with LLVM, and possibly drop its asm.js compiler in favor of running wasm2asm on LLVM's wasm output: https://github.com/kripken/emscripten/issues/5827

Aside from keeping emscripten's code smaller / more maintainable and allowing the team to focus more on its role as high-level tooling, this should improve the size and performance of emscripten output since IIRC it's currently missing out on a lot of optimization opportunity by producing wasm as transpiled asm.js.


> IIRC it's currently missing out on a lot of optimization opportunity by producing wasm as transpiled asm.js.

That's actually not true: the asm.js => wasm path emits better code (smaller, faster) than the wasm backend path currently.

However, the wasm backend path is being improved, and should eventually get to parity.


Ah okay, interesting. Is it because wasm doesn't yet add any new functionality over asm.js that using asm.js as an intermediary step isn't inherently worse?

In that case, it sounds like the LLVM backend will only yield clear user-facing benefits when new features like pthreads are introduced?


Well, the "asm.js to wasm" path actually isn't pure asm.js anymore. We added i64 support and other things a while back, as intrinsics. So the asm2wasm path isn't limited by asm.js. It's weird ;) but it produces good code...

The wasm backend does have other benefits, which is why we'd like to move emscripten to use it by default:

* It uses LLVM's default legalization code, so it can handle LLVM IR from more sources (i.e. not just C and C++ from clang).

* We can stop maintaining the out-of-tree LLVM that asm2wasm depends on.

The LLVM wasm backend isn't ready yet (larger output code, slower compile times, a few missing features) but it's getting there.


Incidentally, that's how this target works in Rust; we also have an emscripten-based target.


I would like to code WebAssembly by hand. Are there any good tools / tutorials on this?


Do you mean something like this? https://mbebenita.github.io/WasmExplorer/


Probably not any good tools or tutorials, but the wast format is not that hard [0][1] and most compilers accept that format. Just gotta keep the values on the stack in mind while you write.

0 - http://webassembly.org/docs/text-format/

1 - https://webassembly.github.io/spec/text/index.html


And there is already a browserify transform to compile inline rust to webassembly for browser code: https://github.com/browserify/rustify


Great. I like the direction Rust is going.


What is the Rusty way of handling memory management for a game engine client deployed on WebAssembly? When a program is running a game loop, there are certain entities which will stick around across frames, and there are certain entities which definitely will not. Is there a way to enforce that most of the implementation of the game reside in pure functions, while making memory management nearly bulletproof?


Nothing about this is specific to WebAssembly; you’d write the same Rust code as you would for any other target.


In that case, what's the Rusty way to handle a game loop, in which one wants to encourage the use of pure functions? I must confess: I'm thinking of using Rust and a few other languages as a compiler target for a unified Web Client + Server game engine. Is there a particular way in which Rust could support this better than other languages compiling to WebAssembly?


Rust is not super worried about purity. That said, Rust is also more pure-ish than not; the large amount of control over mutability and sharing helps. Controlling allocations is also generally easy, as there are no special things doing allocation behind your back. Arenas/memory pools are well supported too.
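
A minimal sketch of what that looks like in practice (all types and names invented for illustration): long-lived state in one place, a per-frame scratch buffer whose allocation is reused, and an update function that only mutates the scratch it is handed.

    struct World {
        positions: Vec<(f32, f32)>,
    }

    struct FrameScratch {
        visible: Vec<usize>, // reused every frame; no per-frame allocation
    }

    fn update(world: &World, scratch: &mut FrameScratch, viewport_right: f32) {
        scratch.visible.clear(); // keeps the existing capacity around
        for (i, &(x, _y)) in world.positions.iter().enumerate() {
            if x <= viewport_right {
                scratch.visible.push(i);
            }
        }
    }

    fn main() {
        let world = World { positions: vec![(0.0, 0.0), (10.0, 5.0)] };
        let mut scratch = FrameScratch { visible: Vec::new() };
        update(&world, &mut scratch, 5.0);
        assert_eq!(scratch.visible, vec![0usize]);
    }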

> better than other languages compiling to WebAssembly?

The best supported languages for this are C, C++, and Rust. So that's really the comparison here, IMO.


Such great news!

I'm looking forward to DOM integration with wasm, and potentially a way to compile from Scala Native to wasm.


Once Rust is implemented in Rust, I can run rustc in the browser and get binaries for Windows, for example?


Rust is written in Rust, with the exception of LLVM. C++ can be compiled to wasm too though, so that's not the blocker.

The bigger issue is that rustc uses features that aren't in wasm yet, like threads and file loading and such. It'll happen eventually, but not just yet...


Rust glued to React Native or the latter actually ported to Rust sounds like Nirvana.


And WebRender as a rendering layer for desktop apps, to replace Electron.


Has anyone given any thought to dynamic linking wasm objects to save bandwidth?


It totally works, it's just not particularly easy to do. wasm modules have import and export sections; you'd instantiate the std module first, then instantiate the module that depends on it, pointing its imports at std's exports. I don't know of any tooling that makes this super easy at the moment though; you basically have to do it by hand as far as I know.
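
On the Rust side, the import half of that is just an extern block. This is only a rough sketch (the function name is invented, and on this target the import lands under a default module name), but it is the piece the instantiating JS has to wire to some other module's export:

    extern "C" {
        // Becomes a wasm import; it has to be satisfied at instantiation time,
        // e.g. by pointing it at another module's exported function.
        fn shared_helper(x: i32) -> i32;
    }

    #[no_mangle]
    pub extern "C" fn twice(x: i32) -> i32 {
        unsafe { shared_helper(x) + shared_helper(x) }
    }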


I don't think you can at runtime from wasm though. Imports are resolved at load time. Sure, there's call_indirect, which lets you dynamically change the function index, but not the import/export. Now if you were willing to reload, you could use the JS API to create modules at runtime with dynamically downloaded pieces, but once created, that's it I think.


Sure, I'm talking about "instantiate one big module" vs "instantiate several smaller modules, then finally the last module". Which is a bit different than what you're talking about; you're right that after you've instantiated, you can't change imports.


great, thanks, that clarifies things quite a bit.


I would prefer compilers that produce compact statically linked "binaries".


Assuming that the Rust stdlib eventually is compatible/compiled for WASM, wouldn’t it be better if that was cached for all sites which use it, rather than downloading it each time?

I can imagine a world where all dependencies are compiled for WASM, similar to crates.io, and then all apps using those libs would benefit from sharing and load blazingly fast because they only load the site specific code.

Wouldn’t that be a good thing?


There are two problems with that, at least for the stdlib.

First, the stdlib contains a lot of compiler version-specific code so the savings would not be that great unless (until?) it becomes much more stable, along the lines of glibc.

Second, a large portion of the stdlib is generics, which are monomorphized per-app. Further, much of stdlib usage should be inlined and optimized into the app itself anyway.
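
(A small illustration of why the generic parts can't be shared: each app's binary carries its own copies of whatever stdlib generics it instantiates.)

    fn main() {
        // Each of these pulls a separate, app-specific copy of Vec's code
        // into this binary; there is no single shared "Vec" artifact that a
        // CDN could serve to every site.
        let bytes: Vec<u8> = vec![1, 2, 3];
        let words: Vec<String> = vec![String::from("monomorphized")];
        println!("{} {}", bytes.len(), words.len());
    }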

The portions of the stdlib statically linked into an app will generally always be smaller than the whole stdlib. It may make sense to dynamically link some parts of the platform, like the memory allocator, but the whole stdlib is not a good candidate.


That is a great point about the monomorphized code. I hadn’t considered that.


For what it's worth, the web community is currently pretty conflicted on using this approach. jQuery is often loaded from CDNs, while React (and friends) are almost always bundled into each app individually. I'd like to say there's a principled reason behind this, but I think the main reason is that the tooling (webpack, browserify) makes it simpler to bundle React than use a CDN version.

Actual benchmarks are pretty mixed. If you have a cached CDN copy of React it can be a little faster. But if you get a cache miss, the extra overhead of hitting another host to load dependencies is often 10x-100x slower for unlucky users than downloading 80k(?) of additional gzipped javascript. It's especially bad with React because unless you do isomorphic rendering, your page isn't visible at all until React is downloaded.

I suspect Rust's stdlib will compress to a similar size (maybe a little bigger), and the speed saving you're hoping for won't really be worth it in practice; even if you get around the monomorphising problem.


This is what we have CDNs for.


How does the performance of WASM compare to JIT'd JS code?


It's going to be heavily dependent on what you're doing. WASM can be as fast as native code but is more likely to be 1/2 the speed of native on realistic benchmarks. However, the overhead of moving between WASM and JS is quite large so in practical use your WASM version might be slower than a pure JS version. WASM vs JS benchmarks I've seen show WASM anywhere from 40x faster than pure JS to 10x slower than pure JS, due to this overhead.

If you're doing a large amount of CPU intensive work that can be isolated from the rest of the code that is an ideal scenario for WASM. You have the translation overhead of JS->WASM once, then do the computation, then WASM->JS once to get the result. A small function or code that needs to call back into JS (for DOM access, to query some state in your app, etc) is going to have a lot of those translations so won't see much or any benefit.



This makes me wonder: does WASM allow a JIT (specializer)? I.e., does WASM allow self-modifying code or code to write code?

If not, then JIT'ed JS could still be faster than WASM.


Wasm code is normally JIT-ed by the JS VM. That is, browser implementations don’t have an entire separate VM for wasm.


Actually, I don't know why so many people are so excited about WebAssembly. Yes, it is faster than JS, but JS itself is not that slow, and on the other hand we lose the open nature of the human-readable web.


Minifiers don't output any readable JavaScript anyway. Yes, it can be "decompiled" back to something readable, but so can WASM.


Actually, with minified JS you do not have to "decompile" anything. It is just pressing the pretty-print button in your dev tools, and you can use your normal debugger (just without proper names) to step through the code.


WASM has a non-lossy text-based format; nobody involved wants to lose view source.


So is that source as easy to read and step through as minified JS (I didn't know something like that exists)?

And does the one who deployed the code have to use the non-lossy text format, or can you simply convert any WASM binary to something human readable (with browser dev tools)?


A bit easier, given that it's formatted in a readable way.

Right now, the text format is the only option. People are already talking about how to make it connect back to the original source, if you have access to it, but that's a work in progress.


This is a really tired meme. It's not any less accessible than minified js.


Is it even faster? There is no particular reason why asm.js should be slower besides the initial parsing time.


You are absolutely right (I don't know why you are downvoted). On most browsers, WebAssembly is only a tiny bit faster than asm.js (except Safari on iOS 11, where wasm is about 3x faster, but this is mainly because their Javascript performance is so bad compared to the other browsers). But that just shows how good asm.js already is. The only major thing that WebAssembly currently has over asm.js is proper support for 64-bit integers.

The "initial parsing time" advantage of WebAssembly shouldn't be underestimated though, this is important for bigger code bases, and WebAssembly binaries are also a bit smaller (after compression) than the same code as asm.js.


You seem like someone who actually knows something about WASM and is polite enough to articulate an answer.

Why do you think that WASM is a good way to go? Why is it better than JS, and do you think using it does not endanger the open-source nature of the web?

So far I've only read that it is as accessible as minified JS, but actually I know how to read a lot of high-level languages, and assemblers look much less accessible to me than high-level languages (probably the reason why high-level languages were invented in the first place).

So far I understand that WASM is faster during execution and that parsing times are shorter compared to asm.js (and probably JS too).


The most important reason is that Javascript can focus on being a programming language again, not a compile target, and WASM can focus on being an efficient compile target, but doesn't need to be a language. The WASM spec can be extended independently from the Javascript spec, this enables faster introduction of new features (and the process is actually working!).

WASM is especially important to close the performance and power-efficiency gap that exists on mobile between native applications and Javascript applications. Writing a JS app that doesn't use garbage collection and doesn't get re-jitted while running isn't trivial; in WASM this is guaranteed. Only the initial parsing burns a few more CPU cycles compared to a native application; from then on, the difference is small or non-existent.

In the end, WASM is also an important counter-force to the closed ecosystems on Android and iOS. It is the key to have a system that's both open and reasonably secure.

I am not concerned about the view source aspect, but this is actually an important topic to the WASM designers. Browsers have a "view source" on WebAssembly, which shows the ASCII representation, and this is surprisingly readable, since WASM is a higher level representation than traditional CPU assembly (the name WebAssembly is a really poor choice IMHO).

In the end, shipping the high-level source code to client devices and compiling the code there is a massive waste of resources; it's better if this only happens once on the developer machine. The "open source aspect" needs to come from the developer hosting the original source code on e.g. GitHub, but even if developers want to keep their code closed, the WebAssembly view-source provides enough info to reverse-engineer their code, and with time better disassembly tools will be created which can recreate a "high-level representation"; stuff like this also exists for traditional executables.



