Just to clarify: compared to wasm2c, w2c2 does not (yet) have sandboxing capabilities, so it assumes the translated WebAssembly module is trustworthy.
The main "goal" of w2c2 so far has been allowing to port applications and libraries to as many systems as possible.
> gVisor requires a platform to implement interception of syscalls, basic context switching, and memory mapping functionality. Internally, gVisor uses an abstraction sensibly called Platform.
> Minijail is a sandboxing and containment tool used in ChromeOS and Android. It provides an executable that can be used to launch and sandbox other programs, and a library that can be used by code to sandbox itself.
I strongly recommend reading through this site if this topic interests you at all (which it probably does if you’re reading this comment thread!). Lots of explanation and detail about how it works, what it does, and how to use it yourself. It’s good stuff. And yeah, it’s been in Firefox for two years now: https://hacks.mozilla.org/2021/12/webassembly-and-back-again....
This feels like a really roundabout way to implement something that the compiler should be responsible for. All the pain and effort of C -> WASM -> C could be avoided if GCC or clang had some option to add bounds checking instrumentation for each memory access in the compiled output.
Maybe, but don't underestimate network effects. What's important about wasm is its universality - both where it can run and what can target it - which is already making for a powerful ecosystem of tools and compatibility.
GCC and clang could implement their own bounds checking rules, but C -> WASM -> C is actually <C | anything> -> WASM -> <C | anything>
The "universality" exists only for now because wasm is at the toy-language level. The more it will evolve towards being helpful for production, the more opinionated and complex it'll become which will reduce the number of languages supported and platforms it runs one.
Source: every language vm under the sun (CLR, JVM, Neko, etc.)
Totally agree. This is such an obvious thing to do, I'm amazed that I had never considered it a possibility until now. I guess I thought of sandboxing as something only VM-based programming languages were capable of. For decades we've been dealing with buffer overflow exploits in C apps - you really have to wonder why this hasn't already been an option in GCC and other compilers, or simply another pass in OS build steps.
I'm sure it's not a cure-all - it adds overhead and isn't applicable in all cases - but every small addition to security wouldn't be a bad thing. I don't see any reason why every command line utility in Unix-based OSes, for example, couldn't be sandboxed. Think of wget or curl, for example.
Sure, and containers give you syscall restrictions and OS protections, but you don’t see people embedding containers inside of other applications. People generally don’t like sidecars, so embedding wasm in-process makes a lot of sense.
On Linux seccomp is what provides syscall restrictions, and seccomp was originally added to Linux to support untrusted app sandboxing--CPUShare and later Chrome Native Client (NaCl). See https://lwn.net/Articles/332974/. This is why classic seccomp, as opposed to BPF seccomp, only permitted a small, fixed set of syscalls--read, write, and _exit.
A seccomp classic sandboxed process will be at least as secure as any WASM runtime, no matter the engine. Even though the former is running untrusted native code, the attack surface is much more narrow and transparent.
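For anyone curious, entering classic seccomp is a single prctl() call - a minimal sketch (strictly speaking, sigreturn is permitted too):

    #include <linux/seccomp.h>
    #include <sys/prctl.h>
    #include <unistd.h>

    int main(void) {
        /* From here on only read(), write(), _exit() (and sigreturn) are
           permitted; any other syscall kills the process with SIGKILL. */
        if (prctl(PR_SET_SECCOMP, SECCOMP_MODE_STRICT, 0, 0, 0) != 0)
            return 1;
        write(1, "inside classic seccomp\n", 23);
        _exit(0);
    }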
> Sure, and containers give you syscall restrictions and OS protections, but you don’t see people embedding containers inside of other applications.
It sounds like you're implying that C coders will opt out of the sandboxing provided by the OS, but that's not possible without writing kernel-level code. For userland processes, the sandboxing isn't optional, and your process will be sent a SEGV signal if it tries to access memory it's not allowed to access.
If those utilities cared about that sort of thing they'd already have been rewritten in OCaml 20 years ago. The reasons unix utilities are written in C and C compilers don't do bounds checking are political, not technical.
NaCl was CPU-specific, and the way they solved this problem with PNaCl (by using a subset of the LLVM IR) was more or less a hack, which most likely involved at least as much machinery in the browser as a WASM runtime. It also compiled slower, and performance wasn't any better than even the first WASM runtimes. The only thing PNaCl had going for it was straightforward multithreading support, which took far too long to materialize in WASM - on the other hand, Spectre/Meltdown would also have hit PNaCl hard.
Having worked with asm.js, NaCl, PNaCl (plus a couple other long-forgotten competitors like Adobe's Alchemy/Flascc) and finally WASM: WASM is the natural result of an evolutionary process, and considering what all could have gone wrong had business decisions overruled technology decisions, what we got out of the whole endeavour is really pretty damn good. It's really not surprising that Google abandoned PNaCl and went all in on WASM.
Yeah, WASM was the result of taking a look at NaCl, realizing it could never be specified and independently implemented, and writing a spec for something better that could.
WASM is superior in pretty much all ways to NaCl, other than taking a bit more time to go through the whole specification and implementation process. NaCl was a great prototype, and WASM is a much more polished result because of it.
(P)NaCl was just too close to the past: too many memories of ActiveX, Flash and other crap still loomed heavy in people's hearts, and they wanted a clean cut with the past, an open web, free from the diktat of corporations, etc.
I do understand why people wanted something different and rejected (p)NaCl. But the reason is not technical, it's political (in the broad sense). Any technical issues could have been solved by fixing and extending (p)NaCl, but everybody involved understood it was not going to work
I used to work in the Native Client team at Google, both on regular NaCl and pNaCl. It failed for technical reasons too. And maybe, mostly for technical reasons.
pNaCl's biggest mistake was using LLVM bitcode as a wire format (see, for example, [1]). Another problem was trying to use glibc, instead of newlib. That resulted in a lot of low-level incompatibilities, as glibc itself is too target-specific. And implementing shared objects between portable and native world was really messy.
asm.js appeared as a simpler alternative to NaCl, and then it was quickly replaced by wasm, developed jointly by Mozilla, Google and others.
Huuury up. I need WebAssembly in my life. My two use cases are:
1) Web UIs with DOM access from wasm/threads and all that goodness. I want to write my entire web application as a Rust wasm application without thunking through JS.
2) A native application with dynamic plugins as WASI libraries. Writing CLI tools and desktop applications in Rust with a practical method of loading plugins dynamically is .
No, yew.rs does not address this problem (WASM can't directly touch the DOM or other browser APIs such as WebGPU; only JS can). It just wraps around calls to JS.
Every time you use yew.rs or some other "pure Rust" or whatever framework, it's making tons of JS calls under the hood.
This is OK for some purposes, but if you're trying for performance on a workload that isn't CPU-bound, it sucks. And it's another abstraction layer that's bound to leak, I imagine.
This is a significant issue with WASM that a lot of us are waiting with bated breath to be fixed. We want to really be able to target the web browser with NO javascript involved.
> WASM can't directly touch the DOM or other browser APIs such as WebGPU
And that's a good thing. Both APIs (DOM and to a lesser extent WebGPU) are built on top of the Javascript object model. Trying to map the JS object model to WASM inside the WASM spec would be a really bad idea.
> This is a significant issue with WASM
It's a non-issue really, brought up again and again by people who haven't really looked into WASM in detail yet and what it's good for (hint: not to replace the HTML+CSS+JS combo).
The DOM is so slow to begin with that going through a (potentially automatically generated) Javascript shim from WASM wouldn't move the performance needle.
For APIs like WebGPU it would make more sense to have a WASM-friendly API, but that's really a design wart of WebGPU's Javascript API (WebGL actually did a slightly better job there). And besides, the overhead of calling from WASM into web APIs is mostly negligible compared to the overhead that happens inside the API (speaking specifically of WebGL and WebGPU here from my own experience).
1) Optimising an application for page load speed is affected by the layers of thunking.
For one thing, when a browser does the initial load of the page html, it scans for <script> tags, downloads all of them in parallel and does some preemptive compilation while preserving the declaration order for actual evaluation.
WASM modules do not receive that same optimisation because they are loaded during JavaScript evaluation, meaning the browser cannot know they exist until the JavaScript is evaluated, forgoing their ability to be optimised ahead of time.
Perhaps some level of optimisation is available through the use of a service worker, but it's unlikely that the initial load will be as fast in this context, as there is additional overhead associated with establishing a SW, and a SW only interacts with the network layer, so AOT compilation optimisations will be unavailable.
2) Tooling
One of the areas of promise for WASM-driven web applications was being able to exclusively use the tooling of the language you're consuming. The reliance on JavaScript to load WASM modules inherently ties you to the wild world of JavaScript tooling.
Personally, I have many esoteric use cases that require access to DOM APIs but no access to GUI (third party scripts sandboxed within an iframe for consumption). These use cases _must_ be optimised for initial page load. I have experimented with using Rust for my use case and, while it is faster during runtime, the initial load is slower.
3) Extras I'd like to see
I would love it if I could embed html (as a sort of manifest) within the wasm module such that the browser could use wasm as an entry point - removing the need to load html first, then wasm. As good as http2 multiplexing is, for whatever reason, loading html +1 file is slower than loading html with the application embedded within the file.
I'd also like to see us eliminate the need for the `Cross-Origin-Opener-Policy: same-site` and `Cross-Origin-Embedder-Policy: require-corp` headers as they make multi threading in the browser largely impractical.
> is evaluated, forgoing their ability to be optimised ahead of time.
...not sure what you mean by that, but WASM isn't actually AOT compiled in browsers anymore; instead there are several tiers of JIT-ting on a per-function level. E.g. when a WASM function is only called once, it is compiled quickly but without optimization, and when called more frequently, higher compilation tiers kick in, which compile the WASM function in the background with more optimizations.
This is pretty much the same strategy as used with Javascript code.
You can also split your WASM code into several dynamically loaded modules (and if needed, loaded and instantiated in parallel), but (AFAIK) unlike with Javascript bundlers this can't be done automatically. You'll have to design your code that's compiled to WASM from the ground up for being split into several modules (similar to how DLLs are used in native code).
> The reliance on JavaScript for WASM modules inherently relies on the wild world of JavaScript tooling.
I write all my WASM code with C/C++ tooling only, no npm or similar involved. There's only a minimal .html file which defines how a WebGL/WebGPU canvas integrates with the browser environment.
> I'd also like to see us eliminate the need for the `Cross-Origin-Opener-Policy: same-site` and `Cross-Origin-Embedder-Policy: require-corp` headers as they make multi threading in the browser largely impractical.
This whole webassembly thing does not seem like it will solve real problems. The more I learn about it, the more it seems like another kludge built on top of the kludges we have now.
Why doesn't HTML get an overhaul, with a massive set of new, actually useful and styleable components (e.g., a data table with sorting/paging/filtering capabilities built right into the browser), the ability to scope CSS, and things like that? That's how the shittiness of web dev can actually be solved, so that reliance on client-side code can be drastically reduced.
WebAssembly is "just another ISA" for a "software CPU" that happens to work well for traditional compiled languages, it doesn't really have to do much with high level web development centered around HTML+CSS.
For instance, personally I really appreciate that I can bundle an assembler written in C89 in the mid-90's and last updated in 2008, and a home computer emulator written in C and C++ into a VSCode extension without having to worry about platform compatibility or security issues (https://marketplace.visualstudio.com/items?itemName=floooh.v...). Stuff like this wasn't really possible until asm.js and then WebAssembly.
> that happens to work well for traditional compiled languages,
It doesn't, though, which is a big problem. WASM starts the heap at 0x0 which breaks the null handling of 99% of traditional AOT languages _and also_ breaks the null handling of optimized higher level runtimes that rely on page faults to identify that a null deref happened before backtracking and converting that into an exception.
The MVP priority for WASM was to protect the host and make it simple to be the host. The guest features, security, and general sanity is very poor as a result.
Any embedded platform without an MMU would have that same problem, though.
Programming languages and their runtimes should not depend on such hardware features IMHO (at most use them optionally for performance optimization).
Also, my guess is that WASM engines outside the browser would absolutely be able to trap on "zero page" accesses.
AFAIK the limitation of address zero being a regular memory location when running WASM in browsers is just a side effect of the WASM heap being a regular Javascript ArrayBuffer object, which cannot start at an index greater than zero (and adding an offset for every heap access would be prohibitively expensive). A WASM runtime which doesn't require to interoperate with Javascript could map the WASM heap to regular virtual memory with the first page being read/write protected.
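A minimal sketch of what such a non-browser runtime could do on a POSIX host (illustrative only, not any particular engine's code):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* Back the Wasm linear memory with virtual memory and revoke access to
       the first page, so a "null-ish" access at address 0 faults just like
       it would in native code. */
    uint8_t *alloc_wasm_heap(size_t size) {
        uint8_t *base = mmap(NULL, size, PROT_READ | PROT_WRITE,
                             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (base == MAP_FAILED)
            return NULL;
        mprotect(base, 4096, PROT_NONE);  /* trap accesses to the "zero page" */
        return base;
    }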
> Traditional AOT languages don't make promises about what happens if you try to dereference null. And they don't depend on a page fault happening.
The language doesn't make any promise, yet on every major platform for the last 40 years a null deref has resulted in a segfault because the lower address space is marked invalid.
This is absolutely, 100% baked into the practical expectations of every traditional AOT language at this point.
> For higher level runtimes, can you name some common ones that use page faults that way?
Android's ART does. I'd be shocked if OpenJDK doesn't, for the same reason. Same with .NET or any other mature JIT runtime. It's obviously ridiculously expensive to insert null checks before every object access, and since the CPU & some page protections can do it for free, why would you pay that?
> It's obviously ridiculously expensive to insert null checks before every object access
This is actually the other way around. Both .NET and the JVM do it the same way - they inject the smallest instruction that dereferences a pointer (e.g. `cmp byte ptr [rax], al` or `ldr xzr, [x2]`), and when it throws a hardware exception, it is caught by the registered runtime handler, which then resumes the execution of managed code from the hardware exception location and raises the corresponding NullReference/NullPointer exception there.
The only expensive part is when the exception does get raised, which costs roughly 4us in .NET and 11us in OpenJDK hotspot (Java has cheaper exceptions when you throw them out of Java, costing about 700ns at throughput, but much more expensive NPEs).
As a result, null-checks are almost always cheap, or free by being implicit (reading array length can be a null-check itself since it will be dereferencing nullptr + 0x08 or so depending on the implementation detail), and the compilers are pretty good at eliding them as well where possible.
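A minimal C sketch of that faulting-load pattern (illustrative only - the real runtimes do this in their JIT and signal machinery, and longjmp-ing out of a SIGSEGV handler is only safe under controlled conditions like these):

    #include <setjmp.h>
    #include <signal.h>
    #include <stdio.h>

    static sigjmp_buf on_null;

    static void segv_handler(int sig) {
        (void)sig;
        siglongjmp(on_null, 1);  /* resume in managed code, raise the NPE there */
    }

    /* Dereference first; pay for the exception machinery only on the rare
       path where the access actually faults. */
    int read_field(const int *obj) {
        if (sigsetjmp(on_null, 1))
            return -1;            /* stands in for throwing NullReferenceException */
        return *obj;              /* the cheap, implicit null check */
    }

    int main(void) {
        struct sigaction sa = {0};
        sa.sa_handler = segv_handler;
        sa.sa_flags = SA_NODEFER;  /* allow re-entry after siglongjmp */
        sigaction(SIGSEGV, &sa, NULL);
        printf("%d\n", read_field(NULL));  /* prints -1 instead of crashing */
    }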
Yes, but this doesn't work in webassembly because 0x0 is a valid address. That's my point. To do .NET & JVM in webassembly requires doing an actual null comparison and branch as you no longer get to just deref and catch the resulting segfault as there won't be a segfault. So you have to do an actual branch everywhere, which makes null checks way way more expensive.
> The language doesn't make any promise yet on every major platform for the last 40 years a null deref resulted in a segfault because the lower address space is marked invalid.
If you're using an offset, that's not super reliable; some major platforms of the past 40 years do give you memory at or near zero, and of course with a language like C the compiler is allowed to assume a pointer won't be null, and that can cause all sorts of weird program corruption even when a page fault would be safe-ish.
It doesn’t look like you know what webassembly is supposed to solve. It has nothing to do with styling. It’s about deploying native code in a safe sandboxed way. It’s not just about web pages. My use of wasm is web games that perform a lot better than anyone could achieve with JavaScript.
Honestly I don't get why so much software gets access to the entire filesystem just by running it. They should have made it so an .exe can only modify its own directory, a temp directory, and any file or directory passed to it by the user through the system's file picker or command line. That would solve 99% of the use cases.
Every time I program something that deletes files I get worried about accidentally deleting the entire filesystem by mistyping something. I shouldn't have to worry about that.
One of the reasons that webapps get as much trust as they do is simply because they don't have unrestricted file access. I wish there was an application format that promised the same on the desktop.
I know what webassembly is. In addition to whatever else it is used for, I hear a lot about it allowing devs to avoid JS and just write browser apps in whatever language they want to use. That there is a clamor for that is born of the massive pain that is current HTML/CSS/JS-based web dev.
So what I'm saying is: instead of wasting innovation effort on using WASM to improve the experience of writing web apps, let's use that effort to solve the actual problems with web dev.
"In addition to whatever else it's used for" is pulling a lot of weight there. People excited about it just for not writing JavaScript just aren't aware of what's out there. There's so many ways to avoid using JS these days. The value add of web assembly has always been portability and sandboxing. This is a big deal for software preservation.
making web apps is the least exciting thing wasm could be used for. plugin systems, embedded, serverless use cases are much more interesting off the top of my head.
i think we live in a world where we can do both: we can do wasm and we can continue making improvements to the big three webdev tools.
Also languages like C, C++ or Rust let you exactly define the layout of data on the heap, which is crucial for performance (look up data-oriented-design), and WASM preserves this in-memory layout (since it uses a simple linear heap, like C, C++, Rust, etc... but unlike Javascript, C# or Java). Achieving something similar in a high level language like Javascript would involve mapping all application data into a single ArrayBuffer "pseudo heap", and at that point, it's easier and more maintainable to do the same thing in C (or C++ or Rust).
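A small concrete illustration of that layout control (a generic data-oriented-design sketch, nothing WASM-specific):

    #include <stddef.h>

    /* Struct-of-arrays: hot fields packed contiguously instead of an array
       of individually heap-allocated objects. Wasm's linear memory preserves
       this layout byte-for-byte; a JS/GC object heap does not. */
    typedef struct {
        float *pos_x, *pos_y;
        float *vel_x, *vel_y;
        size_t count;
    } particles;

    void integrate(particles *p, float dt) {
        for (size_t i = 0; i < p->count; i++) {
            p->pos_x[i] += p->vel_x[i] * dt;  /* sequential, cache-friendly */
            p->pos_y[i] += p->vel_y[i] * dt;
        }
    }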
Having said all that: modern Javascript engines can perform surprisingly well (in general I'm seeing that Javascript performance is underrated, and WASM performance is often overrated - sane Javascript, WASM and native code can all be in the same performance ballpark, but native code usually has the most "optimization potential").
AssemblyScript is unfortunately not that fast (yet?) - about the same as javascript in most cases - also because it uses a garbage collector.
"in general I'm seeing that Javascript performance is underrated, and WASM performance is often overrated - sane Javascript, WASM and native code can all be in the same performance ballpark, but native code usually has the most "optimization potential""
And strong disagree. Javascript is indeed quite fast, but if you use a natively compiled wasm library in the right way (avoiding too many calls to wasm and back), you will get a world of difference in performance.
Well yeah, that's because each call is basically an "optimization barrier" for the compiler (on both sides of the call), and of course the call itself also adds overhead, although that has been drastically reduced over the years.
Webassembly doesn't solve the same problems as styleable components, scoped CSS, and things like that though. Webassembly solves the problem of "how do I deploy this software that isn't written in Javascript to the browser client?".
Sure, it also is sorta about performance, but JS interpreters are actually shockingly fast in the modern world. Mostly wasm is about moving your existing C++ code (or whatever) for your existing formats or algorithms or tools into a web/mobile client without having to hire someone to recode it all in TypeScript or whatever. That's not sexy, but it's valuable.
I think webassembly actually acts as a reliable target when building software. Every platform will have a fast web assembly runtime. So you just build for it and you can deploy for any platform mobile or desktop.
WebAssembly is like an openly designed Java virtual machine that's intended to run on all systems. You're completely confused. WebAssembly is extremely exciting and I believe it'll soon be the thing in the tech industry. With Wasm you can write one app and run it in the following platforms TODAY:
1. Desktop: Linux + Windows + OSX
2. Browsers: Chrome + Firefox
3. Phone: Android (no iOS)
4. Embedded
It's absolutely amazing. The biggest problem nowadays is packaging and various runtime things, but what I listed above 100% works; you just need to do the work to package it for N different platforms. How is this not appealing?
Imagine you write Google Sheets, it works on your browser, on OSX, on Linux, on Windows and on Android. It's the same binary.
I mean, just because it's not solving problems relevant to what you're working on, doesn't mean it's not solving problems. The implications of a relatively fast universal runtime that can safely sandbox untrusted code are quite far-reaching, and it will probably take a while to see the end-game. Docker and containers have been around for a fairly long time, and while the uptake of containers to solve problems has gradually increased over time, it's still nowhere near the peak. I think this is likewise true for WebAssembly, which has more potential than we currently know what to do with.
Personally I think the focus for web technologies should be deprecating old things and being very judicious about adding new things. New things that get added have to be maintained by all browser engines effectively forever, and this is part of why every browser engine is a huge multi-million-dollar-per-year endeavor just to maintain. Meanwhile, a lot of features being added to browsers don't necessarily justify all of this cost. Layout engines are unmanageably complex already. CSS is unmanageably complex already.

I think adding even more stuff is hardly the solution; rather, we need to figure out how to actually utilize what's here better before we can come up with successors. Rather than adding more scoped CSS and CSS modules junk, I'd rather just have improvements to CSS-in-JS. Maybe some targeted new APIs that make it easier to implement these features in the browser, not entire new paradigms that require implementing hundreds of thousands of new lines of code that will have to be maintained indefinitely.

Likewise for web components: the concept is fine, but every browser has to maintain all of this forever, and all of the edge cases that come from it; will it yield so much benefit over what exists today? Will it actually stop people from shipping megabytes of Javascript, or could they have already stopped doing that if they really wanted to, and all this will do is mop it around a bit? A large node_modules folder disappears when a web app disappears. A large Chromium source code checkout only continues to grow larger effectively forever.
WebAssembly though, gets a pass from me, because it's far more than "Web", but the Web part adds a lot to the overall package. It's just a win all around.
The unmanageable complexity is exactly the result of the current architecture of browsers. My proposal is to rip off all that cruft and actually address the pain points of web dev in a modern way.
For example, CSS is a disaster. That browsers need to be very complex to implement it correctly is all the more reason why it should be replaced.
Very few languages have "Some Language -> C" or "Some Language -> non-common OS / arch combo". The "just" part is a whole new backend, which is a massive amount of work for common languages.
But it turns out many languages do have "Some Language -> WASM" now.
WebAssembly brings portability to the table.
Did you read up on Cosmopolitan C Compiler when it was discussed here?
No, it's not exactly the same amount of portability as some language and its native toolchain. If you give me source code in some language that I can transpile to C or C++, or into WASM (and then into C), then I can give you a single file which can be executed on Linux, BSD, Mac, or Windows.
That's not remotely the same thing as "its native toolchain."
You, a developer or tech savvy person, might not see any difference.
But if I hand a non-tech-savvy person a file and they can just run it, no matter what OS they're using - with no pre-requisites - that's kind of magic.
> then I can give you a single file which can be executed on Linux, BSD, Mac, or Windows.
Using what libraries & syscalls? A portable IR and a portable executable are very different things, yet the WASM crowd loves to conflate them. WASM is only a portable IR; it doesn't get you portable programs, at least not if you want your program to do anything beyond a trivial hello world.
The first time I did that was at an internship when I needed a music player to listen to my music CDs on my work PC's CD-ROM drive while working.
Searched for it (probably on Lycos), found one, downloaded the .exe, launched it, popped a CD in. When I pressed play and heard the music start in my headphones, I had the feeling I just performed magic. Big smile on my face as I went back to work.
Viruses, etc. weren't in our reality in those naive and heady days.
Indeed, my DOS machine got infected by a Jazz Jackrabbit disk borrowed from a friend.
I had a text-mode launcher menu for running programs, and that had an "anti-virus" feature built-in that checksummed programs, and alerted you if their checksum ever changed (since these viruses spread by infecting .exes), which is how I found out about it!
I don't really buy the security argument if: “some traditional WebAssembly compilers can decorate memory accesses with bound-checking code. w2c2 currently can’t, but it totally could.”
Also, in my experience, naive bounds checking will eat up a lot of cycles. But maybe the C compiler can eliminate a bunch of them.
wasm2c (part of WABT) does this transpilation in a spec-conforming way; it passes all* the WebAssembly tests and enforces the memory-safety and determinism requirements and the rest of the spec. The memory bounds-checking itself doesn't have a runtime performance impact because it's all done with mprotect() and a segfault handler. (There are some other differences between w2c2 and wasm2c that also have to do with spec-conformance and safety; e.g., enforcing type-safety of indirect function calls. This costs <4 cycles but it's not zero.)
Re: bounds checks, the thing that consumes cycles isn't the bounds check itself, it's Wasm's requirement that OOB accesses produce a deterministic trap, even if the result of an OOB load is never observed and could be optimized out. wasm2c has to prevent the compiler from optimizing out an unobserved OOB load, and that forced liveness defeats some compiler optimizations (probably more than it needs to). But even with all that, we're talking like a <30% slowdown compared with native compilation across the SPECcpu benchmarks.
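For anyone wondering what "done with mprotect() and a segfault handler" looks like, here's a simplified sketch of the guard-page scheme for wasm32 on a 64-bit host (illustrative, not wasm2c's actual code):

    #include <stddef.h>
    #include <stdint.h>
    #include <sys/mman.h>

    /* Reserve 4 GiB of index space plus a guard region, so base + any 32-bit
       index (plus static offset) stays inside the reservation. Only the
       module's current memory is committed; everything else stays PROT_NONE,
       so an out-of-bounds access faults with no per-access check. */
    #define RESERVATION ((size_t)8 << 30)  /* 4 GiB index space + 4 GiB guard */

    uint8_t *reserve_linear_memory(size_t committed_bytes) {
        uint8_t *base = mmap(NULL, RESERVATION, PROT_NONE,
                             MAP_PRIVATE | MAP_ANONYMOUS | MAP_NORESERVE, -1, 0);
        if (base == MAP_FAILED)
            return NULL;
        mprotect(base, committed_bytes, PROT_READ | PROT_WRITE);
        return base;
    }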
If you want to transpile arbitrary Wasm to native code in a spec-conforming way, you're probably better-off using wasm2c (which, disclosure, I work on). If you trust the Wasm module, or you're good with the isolation you get from your operating system and don't need Wasm's determinism, w2c2 seems great. Both of these are far less battle-hardened than V8 or wasmtime, especially when you include the fact that now you need an optimizing C compiler in the TCB.
---
* The Wasm testsuite repo has recently merged in the "v4" version of the exception-handling proposal, and WABT is still on "v3". But it does pass all the core tests (including tail calls) at least until GC is merged.
It's much faster to execute than adding a software bounds-check on every load. (Because the module declares its memories explicitly, it's very easy for a runtime to use a zero-cost strategy to enforce that memory loads/stores are all in-bounds.)
But Wasm's safety is more than bounds-checking memory loads/stores. E.g., Wasm indirect function calls are safe, including cross-module function calls for modules compiled separately, because there's a runtime type check (which wasm2c does very efficiently, but not zero-cost).
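A sketch of what such a checked indirect call could look like in generated C (a hypothetical shape, not wasm2c's actual output):

    #include <stdint.h>
    #include <stdlib.h>

    /* Each table slot stores a signature id; the caller's expected signature
       is compared before the jump, so a type-confused call traps instead of
       jumping through a mismatched pointer. */
    typedef struct {
        uint32_t type_id;
        void (*func)(void);
    } table_elem;

    static void trap(void) { abort(); }

    void call_indirect(const table_elem *table, uint32_t table_size,
                       uint32_t idx, uint32_t expected_type) {
        if (idx >= table_size || table[idx].func == NULL) trap();
        if (table[idx].type_id != expected_type) trap();
        table[idx].func();  /* safe: signature verified at runtime */
    }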
And, Wasm modules are provably isolated (their only access outside the module is via explicit imports). Whereas if you wanted that from "normal C code," it's a lot harder -- at some point you'll have to scan something (the source? the object file?) to enforce isolation and make sure it's not, e.g., jumping to an arbitrary address or making a random syscall. There's obviously a huge amount of good work on SFI but it's not easy to do either on "normal C code" or on arbitrary x86-64 machine code.
> it's very easy for a runtime to use a zero-cost strategy to enforce that memory loads/stores are all in-bounds
I believe your statement is only true for wasm32 on a 64-bit host where guard pages can be placed around the memory.
Has anyone come up with a zero-cost strategy for wasm64?
This is something that CPU vendors could help with. x86 used to have segment registers but the limit checks were removed in x86_64 so FS/GS cannot be used for this purpose anymore.
If C code can automatically be compiled to wasm, which is then compiled to a safer C and machine code, then the same C code could also be transformed the same way into that output without the extra step. It's either that the C code that can be compiled to wasm is a subset of all C code (partially true), or that compilers trade off safety for performance (wasm has more rigid control flow, for example).
The goal usually isn't for one party to take a C program and transform it into "safe" machine code. The goal is for a possible adversary to take a C program and produce an IR, and then for somebody else (maybe you) to validate that IR and produce safe machine code. Wasm is a vastly better interchange format between distrustful parties than C would be!
(There are probably even better interchange formats coming on the horizon; Zachary Yedidia has some cutting-edge work on "lightweight fault isolation" that will be presented at the upcoming ASPLOS. Earlier talk here: https://youtu.be/AM5fdd6ULF0 . But outside of the research world, it's hard to beat Wasm for this.)
Less important: I don't think going through Wasm has to be viewed as an "extra step" -- every compiler uses an IR, and if you want that IR to easily admit a "safe" lowering (especially one that enforces safety across independently compiled translation units), it will probably look at least a little like Wasm, which is quite minimal in its design. Remember that Wasm evolved from things like PNaCl which is basically LLVM IR, and RLBox/Firefox considered a bunch of other SFI techniques before wasm2c.
Thanks for the shout-out -- in case someone wants to check it out, the code for Lightweight Fault Isolation is available here: https://github.com/zyedidia/lfi.
AFAIK the (software) bounds checking isn't necessary if the WASM heap has large enough "guard bands" at the top and bottom to trap memory accesses (because the WASM bounds checking isn't fine-grained, but only against the WASM heap boundaries).
Yes, that is exactly how production Wasm VMs optimize heap accesses. Bounds checks cause no overhead as a result. That's the case in all modern browsers, as well as in wasm2c.
This is an amazing idea! Am I missing something, or does this sound like a viable way to build cross-ecosystem packages with strong sandboxing? I am thinking of generating and compiling wasm to a native lib, then calling that native library from pip/npm/maven/nuget packages. I wonder if one could auto-generate those bindings and build a repo that pushes to all ecosystems from a single code base.
My use case would be a wasm-based validation lib that could be used in the browser, in edge proxies like Cloudflare Workers, and in a variety of backends (py/node/.net/java). This approach would negate the need to package a wasm engine like wasmtime.
I love the idea of AOT WASM compilers, but they will definitely be complicated by WASM GC. I wonder if compiling to another GC language, like Go, might help, or if they can just use a GC library like tgc or Oilpan.
I think WASM has a problem which isn't really noticed/worked on yet (at least, that's my perspective)
The WASM GC will be good, not because it's a GC, but because it would allow languages compiled to WASM to interoperate with e.g. the DOM using the same memory manager so you're not e.g. manually managing the memory of DOM objects while JS may hold references to them according to the terms of the GC.
But this also means the DOM API will be tied to the GC, right? I don't know this for fact, but logically that seems correct. So languages that do NOT have good use of the GC are going to be at a disadvantage when it comes to interacting with the DOM, presumably.
Meanwhile, the runtimes of languages like Go and modern Java (after Java gets goroutine-like behavior from Project Loom) rely on small ~8kb stacks and the ability to swap the stack to a new goroutine using setjmp/longjmp - but WASM isn't a traditional register machine like x86 ASM; instead it's more of a stack machine... with no setjmp/longjmp.
So not only do languages like Go and Java need to 'decouple' their runtimes for goroutine-like behavior from their GC and make use of the WASM GC instead (which is not tailored to the usage patterns of such languages), but they also need to emulate their own register machine 'on top of' the WASM stack machine so that their goroutine-like behavior works in WASM. Back in 2017 my coworker did this for Go, and so far as I know the implementation has not moved on from that approach since because there is no alternative.
So for Go and Java, you'd be working with a GC that isn't designed for the language and presumably that has performance implications.. and you'd be working with an emulated register machine on top of a sandboxed stack machine (WASM)... that all starts to seem quite a bit far from 'low level like native code' that WASM aims to promise.
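To make the missing primitive concrete, here is roughly the stack switch native runtimes rely on, sketched with POSIX ucontext (real runtimes use hand-written assembly); core WASM exposes nothing equivalent:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, coro_ctx;
    static char coro_stack[64 * 1024];  /* small private stack, goroutine-style */

    static void coro_body(void) {
        puts("running on the coroutine's own stack");
        swapcontext(&coro_ctx, &main_ctx);  /* yield back to the scheduler */
    }

    int main(void) {
        getcontext(&coro_ctx);
        coro_ctx.uc_stack.ss_sp = coro_stack;
        coro_ctx.uc_stack.ss_size = sizeof coro_stack;
        coro_ctx.uc_link = &main_ctx;
        makecontext(&coro_ctx, coro_body, 0);
        swapcontext(&main_ctx, &coro_ctx);  /* switch stacks: no WASM equivalent */
        puts("back on the original stack");
    }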
I hope I'm wrong, but I fear Go/Java do not have a bright, performant future when it comes to WASM.
Zig, C++, Rust - all probably have a bright future here. But the challenges for higher-level languages are stark.
I wouldn't be so sure, Java was definitely in mind while the GC proposal was being worked on. Maybe not as much with the goroutine/Java concurrency stuff, but not sure.
There's no "best" approach here. If you have RAM and enough cores, you can use JIT to achieve better performance over time for hot loops. Much better.
But if you need statically compiled code, no runtime is much simpler to handle, with a lot less to go wrong. But also expect performance that's just "ok". Nothing to write home about.
How much does this limit the code you can write?
Presumably WASM is made safe by limiting the ability of the code in some way. Otherwise the checks that occur as part of the C -> WASM -> C pipeline could be built into your standard C compiler.
You could build Software Fault Isolation into an existing compiler toolchain, and people have done it, but WebAssembly comes with an existing constraint which is why it's designed the way it is: a WASM file may be generated by an unknown or untrustworthy participant and needs to be consumed in a trustworthy context, i.e. served from a random HTTP server into a user's browser. Therefore the WASM file format has stringent requirements and a strict validation algorithm placed on it before you should execute it.
So, you're doing a classic thing in verification, which is just writing a "checker" that implements some (known good) algorithm in the smallest and most correct way possible, and then using that to check whether much bigger "unknown" things are safe. The goal is that the checker is much easier to implement correctly than auditing everything by hand.
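In WASM's case that small checker is the validator every engine runs before executing a module; with the standard Wasm C API (wasm.h, implemented by e.g. wasmtime and wasmer) you can call it directly (minimal sketch):

    #include <stdbool.h>
    #include <stdint.h>
    #include <wasm.h>

    /* Validate an untrusted binary without instantiating or running it. */
    bool is_valid_module(const uint8_t *bytes, size_t len) {
        wasm_engine_t *engine = wasm_engine_new();
        wasm_store_t *store = wasm_store_new(engine);
        wasm_byte_vec_t binary = { .size = len, .data = (wasm_byte_t *)bytes };
        bool ok = wasm_module_validate(store, &binary);
        wasm_store_delete(store);
        wasm_engine_delete(engine);
        return ok;
    }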
For example, you might have a media player with extensions (like foobar2000); traditionally extensions would be delivered as .dlls, because developers did not release the source. This would be a use case similar to the browser, where WebAssembly would be a good choice instead of random .dll files. They may not want to release the code, but you don't want to trust a random blob of binary code. If you trust your WASM implementation, you don't need to trust that the binary blob is harmless (it will just be rejected or forbidden from doing bad things.)
If you're not dealing with "random binary blob from potentially untrusted source", i.e. you are running the compiler yourself on some code you downloaded, and then running that code, then you don't really need WASM for this, because you could reasonably trust the compiler to uphold the security guarantees using SFI techniques. For example, if you wanted to make sure zlib was safe from buffer overflows in your main process, to reduce blast radius, a pure SFI toolchain would be fine. You can trust it works and then just compile zlib yourself.
But there's generally a lot more mindset and movement around WASM than anything else, so people use it for all of these cases, even cases where they control both the compiler generating code, and where the code is being run.
The difficulty of sandboxes is to offer a usable, useful API to the outside world that is still secure. Operating system processes would be perfectly fine sandboxes except for the huge hole that current syscalls and other out-of-process APIs rip in them.
I think it would be more interesting to compile to LLVM IR, so you don't have to deal with C. Part of the appeal of Wasm is that it's a modern system that doesn't have to deal with 70s nonsense.
The main "goal" of w2c2 so far has been allowing to port applications and libraries to as many systems as possible.
For more information, see the README of w2c2.
>I think it would be more interesting to compile to LLVM IR
C is a stable format that's been backwards compatible for decades; LLVM IR changes with every LLVM release. Unnecessarily tying stuff to an LLVM version is a nightmare waiting to happen.
But we're not expecting that the produced C (or LLVM IR) has already had target-specific optimization applied, so as long as the C compiler is an optimizing compiler, that's not a problem?
To be fair for point 1, if you have LLVM IR you can just run the full LLVM optimizer suite over it. But overall LLVM IR is a poor target because it constantly changes with every LLVM release.
LLVM does some optimizations, but compilers that target it normally do their own optimizations before generating the IR, because they know more than LLVM does about the higher-level source language and what can be done with it.
So e.g. you may generate LLVM IR directly and only get LLVM's optimizations, or you may generate C and compile it with Clang and get Clang's optimizations + LLVM's optimizations.
You could always implement your own pre-LLVM optimizations in your LLVM IR generator, but as I think we all know, that's a huge amount of extra work (which is the OP's point)
C is not a large language, and a transpiler can choose to generate code that only uses a subset of the features of the language, and can choose to be consistent about safety mechanisms that the underlying language doesn't guarantee.
When you say "70s nonsense" are there specific C features you are concerned about? I would think that transpilation can just avoid bad practices like passing around void * pointers and then casting them optimistically, or even the use of char * for strings in favor of a bounds-checked alternative.
C can't represent lots of information that would be useful, e.g. while C99 has a limited "restrict" qualifier, you can't express any more complex detail like "these things can only alias under these circumstances". If the hardware supports doing anything useful with sum types (e.g. atomic operations), there is no way to expose that in the C model. C specifies a concept of trap representations, but anything you would actually do with them is undefined behaviour, so you can't write code that e.g. safely errors out when someone tries to use an uninitialized pointer. Even for arithmetic on standard hardware, the ability to fully express even quite basic things is limited - you can't write code that e.g. permits signed overflow to occur and handles it if it does. You effectively can't use any floating-point rounding modes except the most broken one, at least not if your code ever uses threads. I'm sure there's more.
All the stuff trying to use WebAssembly outside the browser, without looking into all the bytecode formats that have travelled the same path for decades since the early 1960s, is getting tiresome.
Do you have any actual interesting critiques of the wasm bytecode format and what it has failed to learn from Java bytecode or CLR bytecode or whatever? Because that would be interesting, and I'm sure there are design choices to analyze and criticise. Empty references to the potential existence of problems isn't very interesting though.
Start with Burroughs Large Machines from 1961, do a stop at Xerox PARC, visit ETHZ, Bell Labs, Amsterdam University, Caltech, Tao Group, UCSD, IBM and Unisys mainframes, Microsoft R&D, and already there are several bytecodes to look into.
The hype around WebAssembly is also not very interesting for those of us that have read hundreds of papers on those bytecodes, but hey there is VC money to burn.
I think that network effects win over technical merit every time, and this time WebAssembly has the momentum. I think it's pretty neat that you can take a web server written in C++, implement some new compression algorithm in Go, and then have that web server compress pages with your Go code without having to recompile the webserver or touch any of its code. That's what WebAssembly outside of the browser offers; you can just write stuff in your language of choosing for other products. All the VC hype just means that everyone picked the same backend so there is some possibility of traction.
In reality, some flaws are preventing adoption. C and Rust are pretty much the only viable languages with WebAssembly support, and since those are also the chosen implementation languages for runtimes, nobody is really deriving this benefit. (Anything with a runtime seems to fail for programs beyond "hello world".)
(I tried a "real program" and WebAssembly once, but it ran out of memory when compiled with either gc or tinygo. We had to write the browser side of things in Javascript, which was a shame. The code in question was: https://github.com/pachyderm/pachyderm/blob/master/src/inter.... Somewhat complex, but not so complex you can't do it in Javascript. So a bit of a shame that WebAssembly didn't work out.)
The other thing that I think WebAssembly should fix (but people seem to want to kill me when I mention this) is that the Typescript compiler should just output WebAssembly and we can forget about node_modules and Webpack and that whole nightmare. There is AssemblyScript, but it doesn't run React, so doesn't matter for this use case.
Someday I will go insane and just compile node or deno to WebAssembly and just ship the whole VM with my app embedded in it. Then you'll have a real reason to want to kill me! Muahahaha.
There's some cool stuff going on with GC in WebAssembly, which should really help bring higher level languages efficiently to WASM: https://v8.dev/blog/wasm-gc-porting.
I know that there are existing bytecodes. I'm asking what you think WebAssembly should have learned from them but didn't. But based on your response I'm guessing you don't really have that kind of thoughtful critique of WebAssembly?
Here is one: despite all the security marketing, programs inside the sandbox can still be tricked into internal memory corruption, as there is no bounds checking inside the same linear memory segment.
So they are bound to black box attacks, where a clever sequence of function calls can eventually result in something being allowed that wasn't before, as the internal memory state of the WASM module is now corrupt.
This is correct, and it's something people who use WASM for security need to be aware of. My program can use WASM to execute untrusted user-supplied code (such as web pages) safely, but compiling my application to WASM and running it in a sandboxed VM doesn't protect me from security bugs in my own code.
Is this an example of WASM not learning from other bytecodes though? 1) Have other bytecodes fixed it? It's not like compiling your C application to the CLR protects you from internal memory corruption bugs either, right? And 2) was protecting against internal memory corruption even a goal? To me it always seemed like the purpose of WASM is to be a cross-platform way to run untrusted code sandboxed, just like JS but faster and more appropriate as a compilation target; and whether you compile to WASM or JS, your C program might have memory corruption bugs.
If you're only complaining about the VC-funded hype machine, I could probably be convinced that WASM is sold as something it isn't. I haven't seen it myself really but it's exactly what those sorts of people are wont to do. But that's surely a critique of the hype machine, not of WASM failing to learn from other bytecode designs, no?
This doesn't say what you're claiming. It just says that webasm stack frames are next to each other like C. A C program that crashes will crash in webasm.
"A C program that crashes will crash in webasm" is exactly what pjmpl said. The sandbox doesn't protect the sandboxes program from internal memory corruption.
You should stop arguing. He's correct in this critique, and you don't even disagree that he's correct. The problem is that it's a critique of how (he alleges) hype/VC people have over-sold WASM, not an actual problem with WASM.
You should stop arguing, because the claim of insecurity is not being backed up by anything.
Show me an exploit, show me with detailed and technical information, show some sort of evidence.
Both of you are just repeating your claim more forcefully, never adding anything that backs up your claims.
Compiling C to webasm doesn't fix C bugs within the local memory space of webasm? No one has those expectations except people with a bizarre vendetta against a simple and benign technology like webasm.
pjmlp's claim is: there are people out there who over-hype WASM, who claim that simply compiling a C program to WASM will automatically make it secure. pjmlp is claiming that these over-hyped claims are wrong (in their words: "programs inside the sandbox can still be tricked into internal memory corruption"). You agree that those over-hyped claims are wrong (in your own words: "compiling C to webasm doesn't fix C bugs within the local memory space of webasm"). I agree that those over-hyped claims are wrong.
If one wants to argue against pjmlp (as you and I both do), there are two ways to go: 1) you can say that nobody is making those claims, or 2) you can say that if people are making those claims, the problem is with those people, not WASM.
I choose #2, because it seems plausible to me that VC-backed startup types over-hype WASM. If I was more familiar with the discourse around WASM, I might try to argue #1 if I thought I could demonstrate that over-hyping WASM doesn't really happen. Maybe you could go the burden of proof route, that it's their responsibility to prove that people are over-hyping WASM and take the discussion from there.
But you ... don't seem to be engaging with the argument. You agree that "compiling C to wasm doesn't fix C bugs within the local memory space of wasm", yet you say that "the claim of insecurity is not being backed up by anything" when the claim of insecurity is that compiling C to wasm doesn't fix C bugs within the local memory space of wasm. That's literally what was meant by "internal memory corruption" in pjmlp's very first comment which mentioned the security angle. Again, they're wrong to mention this as a flaw of WASM because making C code resistant against "internal memory corruption" was never a goal of WASM, but that doesn't mean that their "claim of insecurity" is wrong.
To show an example of the kind of exploit pjmlp is talking about, just take any C program with some kind of exploitable buffer overflow, run it in wasmtime or wasmer or whatever, and exploit the buffer overflow. Maybe the buffer overflow lets an attacker write 101 bytes to a 100 byte buffer and therefore flip an "isAdministrator" flag from 0 to 1. You, I and pjmlp all agree that such security issues can exist; WASM doesn't magically protect C code from memory corruption. And that is not a flaw in WASM, which is what your argument should be focusing on.
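That toy scenario in C, for concreteness (hypothetical layout; every access stays inside the sandbox's linear memory, so nothing traps):

    #include <string.h>

    struct session {
        char name[100];
        int  is_admin;    /* sits right after the buffer in linear memory */
    };

    /* Writing 101 bytes stays "in bounds" as far as the Wasm sandbox is
       concerned, yet silently corrupts the program's own state. */
    void set_name(struct session *s, const char *input, size_t len) {
        memcpy(s->name, input, len);  /* len == 101 flips is_admin */
    }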
You both need to stop conflating bugs in a C program that become bugs in webasm with insecurity.
Insecurity would be escaping the VM it runs in. In a native compiled binary you have infinite permissions and can make system calls. You can't do that in a webasm VM.
Webasm is not insecure because a C program that crashes also crashes in webasm.
You still haven't shown any evidence of escaping the VM or actual insecurity.
Web pages can have insecure JavaScript even if the sandbox isn't escaped. Sandbox escape isn't the only possible vulnerability in sandboxed applications. This is basic stuff that I know you already agree with so I don't understand why you keep pressing it.
"Why would you blame WASM" is the right question. As I have said in LITERALLY every single comment so far, blaming WASM instead of the alleged hype people is where pjmlp is wrong. He's not wrong in the assertion that insecure programs may remain insecure when run in the WASM sandbox. But you refuse to listen. This conversation is like talking to a much less polite ChatGPT.
Fucking hell, please pay attention to the discussion you're in. pjmlp's claim is: marketing around WASM suggests that running C programs in WASM instead of natively magically makes those C programs safe.
That looks like you have to load up a local file with an exploit, and use a png library that isn't used by major software and that doesn't check for issues with the png file (major software already needs to deal with malicious files), and the end result is that it will run javascript - if javascript is able to be run from webasm in that context.
It is still worth looking at and is actual information, so I appreciate that.
Don't focus on the specific exploit, it's a general issue:
In order to be useful, your wasm application will likely have to be able to make system calls, or whatever their equivalent might be in your particular host environment. If you can corrupt internal state, you can control the arguments to these calls. The severity of the issue will depend on what your application is allowed to do: if all it has access to is some virtual file system, the host will still be safe. But if that virtual file system contains sensitive data, results may nevertheless be catastrophic if, say, it can also request resources over http.
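A toy illustration of that (hypothetical host import; the names are made up):

    /* host_fetch stands in for any capability the host grants the module. */
    extern void host_fetch(const char *url);

    struct config {
        char user_agent[64];
        char endpoint[128];   /* an overflow of user_agent rewrites this */
    };

    /* The sandbox happily passes the corrupted endpoint to the host, which
       has no way to tell a legitimate request from an attacker-chosen one. */
    void sync_data(struct config *c) {
        host_fetch(c->endpoint);
    }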
given the absolute idiocy some of them finance, WASM is a paragon of sensibility and should have money injected into it via IV. Special mentions go out to SoftBank, Tiger Global and a16z.
> Do you have any actual interesting critiques of the wasm bytecode format and what it has failed to learn from Java bytecode or CLR bytecode or whatever?
I'm not OP, and I don't know if you could have figured this out just by looking at the bytecode formats that came before it, but I think the biggest design mistake of WASM is most likely it being a stack machine. It gives very little (if any?) practical benefit and it just massively complicates everything, both the compilers and the VMs.
I'm not speculating here. I have a pet VM that I'm developing for a register-based IR which achieves roughly the same execution performance of guest programs as wasmtime, but compiles them into native code 160 times faster, doesn't compromise on security, has a bytecode format which takes roughly as much space as WASM, and with an implementation that is vastly less complex (wasmtime's Cranelift is ~150k lines of code; my codegen is, maybe, ~2k lines of code, depending on how you count).
This is a good one. I'm not an expert (I've written a few toy VMs, some stack-based and some more register-like, but nothing focused heavily on optimization) but I've also heard that stack-based bytecode formats are pretty much strictly worse than register-based ones.
Not just that, but which device today doesn't have hardware virtualization where you are literally in a sandbox, running on the actual CPU with all the advanced instruction set extensions available to you?
Some of these WASM run-times are literally hundreds of thousands of lines of code running just-in-time compilation. Inferno-level safety hazard.
It takes tens or hundreds of microseconds to launch a new thread on Linux, and tens or hundreds of milliseconds (or more) to launch a new VM.
It takes tens of cycles to instantiate a Wasm module and call one of its exported functions.
There are some serious benefits to OS-mediated hardware isolation, but there are also some real benefits to the "ahead-of-time" isolation you can get from something like Wasm (e.g. via wasm2c->a C compiler->machine code, but also with more mainstream tools like wasmtime).
That's an impressive feat if true, but I wonder when you would need it outside of a seriously threaded architecture. That is, the server you are embedded in has one thread per client or backend. In a modern server architecture I suspect you could still use KVM if you put your mind to it. For example, switching between internal processes doesn't have to be done according to Linux scheduling. KVM is just a hypervisor architecture, and even though it requires you to call it from the current thread, you can still build fast process isolation on it. Source: I have done it.
Launching a new VM is not something that should be done outside of a restart or reconfiguration.
I think for me, what WASM brings to the table is perhaps reduced Linux-isms. Everything has become a little bit Linux-or-nothing, and if WASM presents a unified API towards all operating systems that is a good thing. I'm still not happy that Browsers are de-facto operating systems now, and with WASM even more so.
I gave a talk at Cloud Native Wasm Day about some of the stuff you can do with a Wasm VM. Redpanda (where I work) is a storage engine whose performance is predicated on kernel bypass (direct IO, thread per core, locked memory). You can use stack switching to context-switch between the VM and host application in a handful of cycles. Also, having good upstream tooling for a lot of popular languages is big for adoption.
Cool. I watched it twice! I thought you meant the stack switching done by wasmtime, which is not "a handful of cycles," but I stand corrected: It's fiber/coro switching. Alfred, a friend of mine, gave a talk on using those on a bare metal unikernel we were creating back in the day! :)
Yeah, it’s just the stack switching itself that is a handful of cycles, but there is not much more overhead for the full VM switch if you structure your embedding the right way. The code is source-available if you want to peek at it!
https://hacks.mozilla.org/2021/12/webassembly-and-back-again...