Reaching the Unix philosophy's logical extreme with WebAssembly (xeiaso.net)
229 points by xena on Aug 28, 2023 | 124 comments



Reminds me of that time I wrote a StringToExecutableFile() function for running [e.g. a Rust binary] from C++, but that depended on several layers of build system horror to embed the executable file as a string, and it wasn't cross-platform.

Imagine a utility function that dumps an embedded string to an unlinked temporary file, sets the +x permission, and returns a /proc/self/fd/N filename so you can exec() a subprocess. It's somewhat difficult because of write^execute limitations in Linux.

Running WASM in process seems like a much saner idea.


This is of course not the purpose of your post, but since you're interested in this topic, I wanted to mention that you can now create memory-backed files on Linux using the memfd_create syscall, without touching any filesystem (and no unlink needed), and you can also execute them without the /proc/self/fd trick by using the execveat syscall. In glibc, there is fexecve, which uses execveat or falls back to the /proc trick on older kernels.
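
For the curious, a minimal sketch of that approach might look something like this (untested; the `payload`/`payload_len` symbols here are hypothetical and assumed to be embedded by your build system):

    /* Sketch: copy an embedded executable image into an anonymous memory
       file and exec it. Assumes Linux with glibc 2.27+ for memfd_create. */
    #define _GNU_SOURCE
    #include <sys/mman.h>   /* memfd_create */
    #include <unistd.h>     /* write, fork, fexecve, _exit */

    extern const unsigned char payload[];   /* hypothetical embedded binary */
    extern const size_t payload_len;

    static int run_embedded(char *const argv[], char *const envp[])
    {
        int fd = memfd_create("embedded-exe", MFD_CLOEXEC);
        if (fd < 0)
            return -1;
        if (write(fd, payload, payload_len) != (ssize_t)payload_len) {
            close(fd);
            return -1;
        }
        pid_t pid = fork();
        if (pid == 0) {
            /* glibc's fexecve() uses execveat(fd, "", ..., AT_EMPTY_PATH) on
               newer kernels and falls back to /proc/self/fd/N on older ones. */
            fexecve(fd, argv, envp);
            _exit(127);   /* only reached if the exec failed */
        }
        close(fd);
        return pid < 0 ? -1 : (int)pid;
    }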


Looks like memfd_create is from Linux 3.17 (2014), which was after I wrote the function. I sort of miss the days when simple stuff was hard.


>I sort of miss the days when simple stuff was hard.

what? what's the point?

for me it's the most annoying thing when the simple stuff is hard because why would it be?


Hacking is the art of doing things with software that don't seem possible.

In other words, it's just fun :)


Imo hacking is a different thing


You're entitled to your opinion, of course - language is subjective after all - but the meaning of the word "hacking" as I'm using it comes from the MIT hacking community as described by Richard Stallman [0] [1].

[0] https://youtu.be/D7PVrK58iGw?t=58

[1] https://stallman.org/articles/on-hacking.html


I agree. I was looking into how you start a child process in C++ recently and I was surprised and not at all surprised to find that the answer is still fork and execve. Ridiculous.
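
For reference, the classic incantation looks something like this (a minimal sketch in plain C, which is roughly what you end up writing from C++ anyway, with barely any error handling):

    #include <stdio.h>      /* perror */
    #include <sys/wait.h>   /* waitpid */
    #include <unistd.h>     /* fork, execvp, _exit */

    int main(void)
    {
        pid_t pid = fork();
        if (pid == 0) {
            /* Child: replace this process image with the new program. */
            char *const child_argv[] = { "ls", "-l", NULL };
            execvp("ls", child_argv);
            perror("execvp");   /* only reached if exec failed */
            _exit(127);
        }
        /* Parent: wait for the child to finish. */
        int status;
        waitpid(pid, &status, 0);
        return 0;
    }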


Wouldn't that be OS specific anyway? Like, Windows has no concept of forking processes but instead it uses the CreateProcess function.


Depending on how hacky you want to get, you can also just re-invent a non-relocating ELF loader


Neat talk but like... doesn't it all seem convoluted? What is the function of a Rube Goldberg web service? To allow it be made of anything?

The problem with using Rust in Go this way is that you miss out on Rust's benefits in all the parts where you don't use it; you get the VM overhead of WASM, which kills Rust's perf; and you most likely introduce problems at the boundaries of Rust/Go.

Again it's a neat idea but why in all things sane would anyone intentionally do this outside of puzzle-solving satisfaction?


Author of the talk here. When I am doing conference talks to help explain abstract concepts or ideas, I typically prefer to employ a strategy called surrealist satire. This basically helps people understand where something fits into the stack by demonstrating how something fits into the mold and then by doing another completely impractical thing with that surrealist solution. The goal of this is to help people hook something into a greater set of context (due to the assumptions I made about the audience, I had to explain a bit more about the topic than I would have otherwise at say a WebAssembly conference) so that they can figure out how things that seem unrelated are actually quite related.

In terms of performance numbers though, I have quite intentionally NOT included performance benchmarks in this talk because getting stable performance information is nontrivial. I plan to write something in the future about WebAssembly vs native code as a subprocess (the differences with windows may surprise you!), but that is not a thing for today.


> I typically prefer to employ a strategy called surrealist satire

Ah, gotchya.


> Neat talk but like... doesn't it all seem convoluted? What is the function of a Rube Goldberg web service? To allow it be made of anything?

I don't want to be the guy who explains the joke, but sometimes Xe creates elaborate shitposts that aren't entirely shitposts but contain a very fun element to them rather than being purely practical. Shitposting being an anagram for top insights after all.

That's the lens I used when interpreting this talk.


Roger that o7


The current reference (only?) implementation of jpeg-xl is a c++ library, which I do not entirely trust to run in process in my go web server, and yet I would like to process images. Conveniently, the build system for jpeg-xl seems to support building to wasm, so if I can jam that into my process, I'd be a lot happier.


Ah, now THAT's an interesting aspect I wish someone would have brought up over the years I've seen WASM.

I guess WASM as a target and embeddable VM really helps with security in those cases. Couldn't we also do the same though with any number of arch/vm pairings?

I guess what WASM brings to the table is a compile target friendly enough for things like C and C++, i.e. low level code, and a reasonable VM implementation. It just has to be accepted by everything (both as an output format, and virtual machine impl in the language choice) to work, I think...

Edit: Like why aren't we using https://stackoverflow.com/questions/4221605/compiling-c-for-...


I don't know the current state, but the JVM historically had quite bad isolation. Javascript's isolation in browsers is actually pretty good. It's possible that the existing non-browser WASI implementations are terrible at isolation; since they exist specifically to allow access to system resources, they might be bad at denying access to system resources...


The JVM’s security in the Applet days was bad because it was actually capable of a lot of functionality; it is trivial to properly sandbox something that has zero capabilities.

There is nothing inherent in the JVM that would make it less secure; we just realized in the meantime that blacklisting is not the way ahead, but whitelisting is.


Library sandboxing is a well known application of WASM. Search for the WebAssembly Component Model. There's also a way to use WASM to sandbox a C library - search for RLBox.


This looks more than a bit like the Inferno OS, created by some of the same folks who created Unix.

WASM allows you to distribute cross-platform binary code written in memory-safe Rust, running on a pretty compact VM. Think about a plugin architecture that could be built on top of that.


Java (as mentioned in the article) does the same, though.

IIRC Inferno OS is like Java in this regard.

How will WASM change what Java has already done?

Why would I compile Rust to WASM when I can compile Rust natively to any number of platforms? And use FFI?

I think WASM, while nice, doesn't bring much new to the table. It's been hyped for years, and I still only see it used here and there very sparingly.


This is how.

The installed size of OpenJDK 17 JRE on my machine is 186MB, according to the package manager.

I suspect that the WASM VM embedded in the program demonstrated in the blog post is 1.5 to 2 orders of magnitude smaller.


I'm no expert, but the JVM is very modular these days, and just the minimal modules give you runtimes an order of magnitude or two smaller. My guess is a set of minimal OpenJDK modules will be on the same order as a WASM VM. Would be curious to hear from someone more in the know.

Looking at the JRE size is a bit misleading because it's been sort of deprecated. You're not really supposed to make uberjars to run on a JRE anymore; you're expected to bundle just the JVM modules you need. It can make very small bundles.

But naturally an uberjar would be smaller. I think small executables are possible but are also just a non-goal now in the JVM world. Meanwhile they're obviously still very relevant in the web space, and hence WASM.

You're not really gonna send JVM bundles dynamically over the wire.

I do sort of agree with the parent that, while the goals are slightly different, it feels like WASM reinvented the JVM without really bringing any huge improvement (while you lose several decades of libraries).


There are many JDKs available, some of which specialize in embedded, like this one: https://en.wikipedia.org/wiki/JamaicaVM


There is no such thing as a JRE anymore; the way to package a Java application for quite some time now is to use jlink/jpackage, which creates a stripped “JRE” of only the used modules, helping both size and loading times.


I thought the Unix philosophy was do one thing well:

https://wikipedia.org/wiki/Unix_philosophy#Do_One_Thing_and_...

How is running anything through a giant virtual machine (a web browser) anywhere close to that? The browser is the monolith, people. Using a web browser to deliver an application is, always has been, and always will be the slowest and most bloated way to do that. The benefit, of course, is that the result is user friendly and cross platform. But let's not kid ourselves, this is as far from the Unix ideals as you can get.


See my comment in this thread about https://github.com/internet4000/find (for a summary); (web-)apps can be encapsulated, from the URI you access them at, and the code that is served by the loaded page. It could also run client side only, local first (web app manifests, service workers, sqlite/postgres wasm, etc.)


It's probably referring to the Markdown to HTML translator library, an example of how Wasm enables Unix-style composition.


This person is amazing. I wish I was half as smart and just as funny


Thanks, I try!


Wasn't there some kind of components proposal that would let modules written in different languages running in WebAssembly interop with each other?

WASI is great but there was an opportunity to standardize on something much more powerful.

Anyway I think the logical extreme of Unix philosophy started with Unix, moved on to Plan 9, and continued to improve from there. It's just that those further advancements are less popular and well known.

An old-fashioned file is not the logical extreme of anything. Not that it isn't useful or interesting.

Instead we don't even have a good way to do any kind of networking.


Wasi co-chair and Wasmtime maintainer here: we agree! Wasi Preview 1, which this article is about, was a first attempt at porting some of these Unix ideas to Wasm. We found pretty quickly that unix isn't the right abstraction for Wasm. Not only is it not really portable to platforms like Windows without reinventing a compatibility layer like cygwin, it also doesn't really make sense in a Web embedding, where users end up implementing something like a unix kernel in Javascript.

Wasi Preview 2, which we are aiming to launch by the end of the year, rebases Wasi on the Component Model proposal, which enables composition of Wasm programs, including those which are written in different languages, and which do not trust each other. Wasi is now specified in the Wit IDL, which has a strong type system for representing records, variants, lists, strings, and best of all, external resources, including sugar for constructors, methods, and destructors.

Instead of basing everything on the filesystem abstraction, the core Wasi primitives are the `input-stream`, `output-stream`, and `pollable` resource types, for readable and writable bytestreams, and a pseudo-future: you can `poll-oneoff` on a `list<pollable>` and it will block until one is ready, and return a `list<bool>` indicating the set which are ready. `wasi:filesystem/types.{descriptor}` is the resource for files, but if you need to read, write, or append to a file, you can do so by calling a method on `descriptor` that returns an `input-stream` or `output-stream`.

Preview 2 is also adding networking: wasi-sockets for platforms which support sockets, and wasi-http for those which don't, like the Web.

We are closing in on shipping Wasi Preview 2, but it's not quite fully baked yet; changes related to resources are slated to land in the next few weeks. The spec definitions are on github: https://github.com/WebAssembly/wasi-io/blob/main/wit/streams... https://github.com/WebAssembly/wasi-filesystem/blob/main/wit... . Stay tuned for much more approachable documentation, tutorials, and so on, once we are confident it is a stable target ready for users.



Yes, CM resources are unforgeable references.


Nothing like reinventing COM :)


“COM but you can actually implement it from the docs” is surprisingly compelling, honestly. There is just an absurd number of obscure corners in the original, from DCE RPC all the way to the highest levels (although those are not the only source of COM grief—IDispatch is an abomination; IStream is needlessly annoying; IMarshal is awful to use but at the same time I don’t think actually has a convincing equivalent elsewhere; etc.).


Ah, the pain of getting a DCOM connection working shudder

COM was pretty reliable. Yes, it was needlessly annoying and gave CS folk the screaming ab-dabs, but it worked and was predictable.

I can see COM-in-WASM being really useful. Especially if we can dynamically load components. And not only for browser coding.


Nobody has a good IPC/RPC-based abstraction set right now. And everybody is kind of struggling with that.

Look at the latest Microsoft thing about embedding Python in Excel. They're going in the wrong direction. What everybody wants is to be able to drive Excel from Python, i.e. an API that people could hook into.

Even WASM is kind of ... weak ... because it has to deal with the lowest common denominator--a web page with a single thread of execution and no access to anything.

And, COM wasn't terrible--it's just that the languages attempting to support it were very underpowered at the time. COM with VB6 created a huge ecosystem that probably still isn't really matched today.


COM works well, it’s still heavily used on Windows. The Windows Runtime APIs are just COM! It lets C++\C#\JavaScript\Rust all talk to each other in the same app.

I just think it’s funny since it was so heavily derided by people. It took a while for me to appreciate its reasons to exist, and I still don’t like some of the vocabulary they used.


And thus AREXX rises from the ashes of the corpse of Amiga.


Please clarify for those of us who aren't Amiga aficionados.


REXX itself was a language made by IBM and used for all manner of purposes; AREXX was a slightly simplified take on that for the Amiga. A number of programs on the Amiga would have an AREXX "port" which provided a set of APIs which could be called from AREXX to control aspects of the application. This meant that AREXX was often used to control multiple applications or chain them together (a bit like a batch control language used to achieve an objective by having each of the programs perform part of some greater whole).


Streams sounds more like DrawBridge!


Very glad to see this. It's a much better solution than The Unix Philosophy (TM) of sending around unstructured bytes.


Hmm, a strong type system? Is there any overlap between the Wit IDL and other similar projects like crABI?


The network really is the computer, this time!


> An old-fashioned file is not the logical extreme of anything. ... Instead we don't even have a good way to do any kind of networking.

One can take this further. There was a forgotten system called ChiOs which started with "A bit is a file. An ordered collection of files is a file." and went downhill from there.

"Everything is a file" has its limits.

I rather liked QNX. QNX is a microkernel with microservices. Everything is an inter-process function call. Call, wait for response or timeout. Works both locally and remotely. Very fast locally. There's a POSIX library, but when you call "read", some glue code makes a call to the file system service.

As a base abstraction, remote procedure calls work better than files. Files implemented via RPC are simple. APIs via RPC are very similar to local function calls. RPC via a file interface is complicated. You have to put a protocol on top of stream/file oriented sockets to get a message interface.

The point that the author seems to be making is that WASI offers a standard API. It's one that crosses a memory protection boundary, like an inter-process function call. This is a reasonable way to do things. For historical reasons, neither Unix nor Windows supports that approach well.



Not sure if we continued to improve from Plan 9; it's more like features were copied from Plan 9, but due to the fundamental differences in the foundation they aren't really the same and require boilerplate to be actually usable.


> As a little hint for anyone here, when someone openly introduces themselves as a philosopher, you should know you're in for some fun

I have found quite the opposite to be true in the real world, but Xe gets a special exemption for being a consistently high-quality writer and thinker.


WASIX may be of relevance to the article https://wasix.org/


I would use WASIX, but my WASM runtime of choice doesn't have WASIX support yet.


Isn't most of the point of WASM to enable portable executables? Why wrap your portable bytecode with an entirely non-portable platform API? Seems mad to me.


To play devil's advocate: targeting WASIX still gives you the benefit of compiling existing code once and running on any OS-architecture pair that Wasmer supports. I long for this sort of thing when building software that needs to run on Linux on multiple architectures inside AWS and also on developers' MacBooks. (Ever tried cross-compiling to macOS? Yeah, I prefer pulling teeth, too.)

Caveat: you'd need to use Wasmer forks of, e.g., rustc and Tokio, but I imagine there are at least some people who would be okay with that.


Wonderful talk, now I finally understand what WASI is.


In this paradigm, the browser's URL Omnibox can also be used as a CLI/REPL input. So each "URL or search" to a browser can open an app that accepts commands and inputs, and allows outputs, through the interfaces, or back to URLSearchParams (which could then be piped again). Maybe related (disclaimer: working on it): https://github.com/internet4000/find ; can be used with "local first web apps", and all the web "apps" which accept input from the URLSearchParams. Great talk!!


When I search on https://internet4000.github.io/find the history disappears (Edge, Firefox). I don't know how you did that, and I can't imagine why on earth you would do that, but I suggest removing that code. Even trying out the search is a total PITA: you can only search once, then you can't go back and try out a different search term.


The new OS I am creating, https://www.exaequos.com, is WebAssembly in a POSIX environment, so it runs in every web browser.


Oof. My Firefox browser on Ubuntu crashed and all my open VSCodium instances went along with it. Happened when loading for a second time. First time worked.


What's the connection between your Firefox and VSCodium? Or did your system run out of virtual memory and start killing things at random?


No connection that I can think of. Might be the memory. 16 GB physical. But I didn't have a lot open (browser + 3 vscodium) and it happened in a heartbeat after clicking in the browser.


It should not use a lot of memory when the login prompt is displayed, or even when bash is launched.


fun project! always love to see more unix cli in the browser.

It is super nice with the local first approach, and I wonder what it would take to be able to `mount https://example.org` from `test.com`, if both websites are running the "wasm web OS" you linked to.

Cheers!


Programs are compiled with a modified version of emscripten for supporting the system calls of exaequOS. I do not use WASI


Any relation to: (Maybe Data is a Bad Idea?) 06-21-2016 https://news.ycombinator.com/item?id=11945722


Instead of "files", they should have called it "objects".

Unix is an object-oriented OS.

(The nomenclature is quite backwards. It would be like building a database of employees, companies, etc., then saying "everything is an employee": companies are just a special kind of employee.)


"We have persistent objects, they're called files." -- Ken Thompson

https://en.wikiquote.org/wiki/Ken_Thompson


The problem with "everything is an object" it doesn't tell you a great deal off the bat. It's like saying "everything is a thing". Sure, it's true but it's not helpful.

In your "companies are just a special kind of employee" example well if that is true within your data model then that is quite interesting and contributes to the listeners understanding of how things work. The alternative "companies are a just a special kind of entity" is less useful.


I wonder if we can reliably expect a React-in-Rust framework that compiles to wasm instead of js? Or maybe just a React library written in Rust, but you can still write the source code in javascript?

Or maybe at least Typescript-to-WASM compiler to skip JS output?


I mean probably, yeah, but why? You'd still need JS to involve the DOM with anything, and that's the performance expensive stuff, so might as well just do the whole thing in JS.


Isn't there work on letting WASM call the DOM/Browser API's directly without having to go through JavaScript?


Like nuclear fusion, it’s always just over the horizon.


I doubt the gains from React rewritten in Rust would be significant; the expensive code is the stuff you write in components

Also Typescript -> WASM exists in some form through assemblyscript


Seems a bit silly injecting all that garbage collected untyped stuff straight back into a context where one of the aims was to get rid of it


At that point, why even bother with React? Just write a real user interface.


React-likes (including Vue, Preact, etc) have the most ergonomic interface for UX devs to develop with.


What do you define as a "real" user interface?


The talk was amusing but much less interesting than it could be. Yes, we have adapters and shims and they make interfacing easier, sometimes.

I guess after all these years of waiting for WebAssembly to actually be useful (as in: I can write apps as easily as I can on desktop, with similar performance, but with the safety guarantees of a web sandbox), I finally realized what I wanted: web pages that are backed by virtual machines. Every page is just a view into a VM's framebuffer. User events are delivered just like X11 or win32 events. The page works like a desktop app. Any code that can run in a VM, can run in the window.

After all, the browser is just an inner platform and you should be able to run a virtual machine in your inner platform.


I think that this was what Alan Kay was saying when he said that web pages should be objects (where he had previously said that objects are virtual machines all the way down)


Huh. https://en.wikipedia.org/wiki/Croquet_Project

When I say VM, I mean, "VMWare-style VMs" not "VMs that run a language the same way on multiple platforms".


Everything is not a file. There are files and file-contents. The content of a file is not a file.

The magic trick happens with the #! shebang, because that tells you who should interpret and execute the file-contents. That is "magic" because in a sense you must read (part of the content) to know how the contents, which are just 0s and 1s, should be interpreted.


There are two dimensions to "everything is a file". The first is the concept of a hierarchical filesystem namespace. The second is the notion that read and write can be used as a universal interface to any resource from the perspective of the runtime environment (e.g. kernel). People focus myopically on the first aspect, but it's really the second that is the most important and most enduring.

The word "file" is ambiguous, and was used ambiguously in Unix documentation and papers. It of course meant an object with a name in the filesystem. But it also can refer to an object (whether or not existing on or even referenced by the filesystem) which is accessed using read/write, independent of how you acquired the reference (i.e. file descriptor) to that object. If you look at the whole of the Unix system, read/write over file descriptors is the backbone of the environment; process inheritance, shell redirection... much of the time a program has no idea how a resource reference was acquired, just that it can be accessed using read/write.


> People focus myopically on the first aspect, but it's really the second that is the most important and most enduring

But something you can read and write to is not called a file, it is called a stream. It is no accident that Linux differentiates between “files” that you can write to at arbitrary places and ones that are append-only.


> Certain files do not refer to disk files at all, but to I/O devices. [...] An effort is made to make these special files behave exactly the same way that ordinary disk files behave. This means that programs generally do not need to know whether they are reading or writing on some device or on a disk file.

> [...]

> Files are uniformly regarded as consisting of a stream of bytes; the system makes no assumptions as to their contents.

> [...]

> There is no distinction between "random" and sequential I/O. The read and write calls are sequential in that, for example, if you read 100 bytes from a file, the next read call will return bytes starting just after the last one read. It is however possible to move the read pointer around (by means of a "seek" call) so as to read the file in any order.

Source: https://www.bell-labs.com/usr/dmr/www/notes.html

Though the abstraction is incomplete, the core concept in Unix is that "files" are opaque streams of bytes. Ancillary functions (e.g. seek, ioctl) are then layered atop the basic API to deal with files of different types as required.

Consider that before Unix most file APIs were record or block oriented. Unix unified the I/O model behind a single "file" abstraction--a stream of bytes. Unix stretched the term "file" to encompass a broader range of I/O tasks, but in turn also changed the treatment of traditional disk files in a way that made them look more like non-disk file I/O. This abstraction couldn't completely hide how I/O was serviced on the other side, but one can't criticize "everything is a file" without understanding the context of the time; nor can one fully appreciate the value-add.

Also, something I hadn't noticed before is how DMR emphasizes the synchronous nature of the API. Apparently many I/O APIs back then were asynchronous. Unix made the I/O model synchronous, but made it easy to create and juggle multiple processes so that you could implement various asynchronous models if you wanted. IOW, they flipped the default case. With the contemporary concern with intra-process I/O concurrency, that's an evolution we seem to be recapitulating.


So when you "mount" a device like a USB-drive, what happens? A single "file" (the USB drive) becomes multiple smaller files?

Is there also a reverse operation, turning multiple small files into a single bigger file?


> Everything is not a file.

A file is a collection of data, which usually has an associated identifier (filename and/or path). A USB mouse is not a file, but it can be represented as such.


The Unix Philosophy:

A) Tasks should be done by combinations of simple tools combined into an elegant pipe.

B) Each tool should do three things, one of them at least so-so well:

Those three are:

1) Hackily parse the output of the previous tool by flaky assumptions like that a certain delimiter is always present, that spaces will not occur in such and such an item, or that such and such a field is from this column to that one and never overflows.

2) Do the actual processing correctly and efficiently --- provided nothing in the data exceeds a 1023 character limit, or overflows an addition of two ints or doubles.

3) Produce output in some way that is hard to parse correctly for the next tool, like columns that may be empty, contain items with spaces, or overflow so that column widths are not reliable.


That's a very twisted definition of Unix.

1) This is not done by every tool. Instead, there are tools built for specific parsing purposes, and each tool is only in charge of interpreting its command-line arguments or, optionally, stdin in some suitable way. Tools are agnostic of the output of previous tools, and only care about processing data that makes sense to them. It's the task of the user to ensure this data is structured correctly.

2) Where are you getting these limits? Shell limits can be controlled by ulimit(1p), but the standard file descriptors function as unlimited streams of data.

3) Again, every process is free to choose the best way to output data. Some have flags that make processing the output easier by another tool, otherwise they default to what makes sense for the user.

You seem to have a bone to pick with the fact that the data shared between processes is unstructured, and that the user must handle this on their own, but this is what enables building independent tools that "do one thing well", yet are still able to work together. Sure, in a tightly controlled environment, tools can share structured data (e.g. objects in PowerShell, JSON in NuShell, Murex, etc.), but this comes at the expense of added complexity, since each tool needs to handle this specific format, encoding, etc. This is difficult to coordinate and scale, and arguably the rich Unix ecosystem wouldn't exist if some arcane format was chosen 40+ years ago, or if a new one needs to be supported whenever something "better" than JSON comes along. Leaving the data unstructured and up to each process and user to handle, ensures both past and future compatibility.


What if the format chosen 40+ years ago wasn't arcane, and arguably you could avoid the legitimate mess of parsing poorly structured data for 40+ years?


And what format would that have been? It would have predated XML and JSON, so it must've been a bespoke format created for this purpose.

Whatever it was, it would need updating, which means all tools would need to be updated to support the changes, while maintaining backwards compatibility. This is a mess in practice, and probably only acceptable to a single project or organization that maintains all tools, but it's not something that allows an open ecosystem to grow.

It's naive to think that a modern solution can "fix" this apparent problem. New shells and environments can be created that try to address it, but their future is uncertain. Meanwhile, the fact that Unix still exists today in many variations is a testament that those early decisions were largely correct.


The downsides you cite are even worse for the unstructured data, it's an even bigger mess in practice with poor "current compatibility"

And the fallacy of alive=right is also worse than "naivety" since it prolongs the pain for a few more decades longer than necessary (it's a big part of the reason why all those much better tools face an uncertain future)


Plain is arcane?

Mime type: text/arcane


The UNIX philosophy’s logical extreme is the NAND gate. Does one thing, does it well, and can be composed into arbitrary applications.


Only if the NAND gate is expressed as a file


Okay… please… someone implement ‘nand’ and write a script that does something simple using only it.


Sure! Here's an implementation of nand and a 2-bit adder that uses it: https://github.com/jewel/nand


Hah this is awesome!

I’m going to poke around and see how much it would take to stretch this into a basic Turing machine emulator.




Great find. People who like this might also like http://incredible.pm/ .

I don't know how I feel about the message "This is optimal!" displaying when you use one NAND gate to build an inverter at level 2. Level 1 forces you to build a NAND gate out of (1) an AND gate, plus (2) an inverter. It feels like it'd be more optimal to just reuse that inverter.

(And then level 3 is an AND gate, the other primitive† you started with...)

† Technically, you start with two primitives, implementing f(a, b) = a ∧ b and g(a,b) = ¬(b ⟶ a). You also get the ability to provide 0 and 1 as fixed inputs, thus ¬a ≡ g(a,1). What you lose after implementing the NAND gate is the ability to provide a fixed 1 input - instead, you are informed that you assume all gates automatically draw from this. Worst of all possible worlds.


One thing? It does a logical and and then negates the result. The shell doesn’t really let users compose commands this way, but

  alias NAND="AND | NEG"
Also, a NAND command would need to have 2 stdin’s.
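
For fun, a toy sketch of how a nand command could dodge the two-stdins problem: take its operands as file arguments, the way comm(1) or join(1) do, so the shell can hand it two pipes via process substitution.

    /* Toy nand(1): read one bit ('0' or '1') from each of two operand files
       and print the NAND of them. */
    #include <stdio.h>

    static int read_bit(const char *path)
    {
        FILE *f = fopen(path, "r");
        if (!f)
            return -1;
        int c = fgetc(f);
        fclose(f);
        return c == '0' ? 0 : c == '1' ? 1 : -1;
    }

    int main(int argc, char **argv)
    {
        if (argc != 3)
            return 2;
        int a = read_bit(argv[1]), b = read_bit(argv[2]);
        if (a < 0 || b < 0)
            return 2;
        printf("%d\n", !(a && b));
        return 0;
    }

Then something like `nand <(echo 1) <(echo 1)` prints 0 in bash.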


> One thing? It does a logical and and then negates the result.

Sorry? Take a second to think about what you're saying.

In what conceivable sense is "doing a logical and" one thing while "doing a logical nand" isn't?

You can equally claim that

    x & y
is really just

    (x ↑ y) ↑ (x ↑ y)


Not the person you asked, but the initial point was taking that idea to the extreme. The concept of "one thing" in the NAND case is more extreme (in terms of granularity). No need to be sorry, it's ok.


Right, the person I asked is claiming that NAND is two things. You appear to have responded as if you disagree with my comment while not actually disagreeing with any part of it.

What did you think I meant by this question?

>> In what conceivable sense is "doing a logical and" one thing while "doing a logical nand" isn't?


Sorry, I will try to make it more clear.

NAND is more complex than AND, in the sense that it is more expressive than AND (having functional completeness which AND does not).

Similarly, it can be built from other less complex operators (AND and NOT).

If you're taking "One thing" to the extreme, in terms of the granularity or complexity of that "one thing", NAND is not as granular or simple as AND, and therefore isn't taking it as far "to the extreme".


What's the argument that AND is less complex than NAND? It's true that NAND has completeness and AND doesn't, but so what? What you can build from something is not a measure of how complex it is. You measure complexity in terms of what it takes to describe something.


It seems naively obvious to me that a(b(x)) is more complex than b(x). Practically tautology.


You have to justify why you've chosen the particular starting point. NAND isn't defined as being "first you do AND, and then you negate it". It's defined like this:

    +---+---+-------+
    | a | b | a ↑ b |
    +---+---+-------+
    | 0 | 0 |   1   |
    +---+---+-------+
    | 0 | 1 |   1   |
    +---+---+-------+
    | 1 | 0 |   1   |
    +---+---+-------+
    | 1 | 1 |   0   |
    +---+---+-------+
AND is defined like this:

    +---+---+-------+
    | a | b | a & b |
    +---+---+-------+
    | 0 | 0 |   0   |
    +---+---+-------+
    | 0 | 1 |   0   |
    +---+---+-------+
    | 1 | 0 |   0   |
    +---+---+-------+
    | 1 | 1 |   1   |
    +---+---+-------+
You may notice that they are almost exactly the same.

> It seems naively obvious to me that a(b(x)) is more complex than b(x).

This is just obvious gibberish; if you define b(x, y) = x & y and a(x) = ~x, then you can say "I think a(b(x, y)) looks more complex than b(x, y)", but how do you respond to "when c(x, y) = x ↑ y, I think c(c(x,y), c(x,y)) looks more complex than c(x,y)"? The two claims can't both be true!

Everything, no matter how simple, can be described as the end of an arbitrarily long chain of functions. So what?


The pipeline pattern has its uses, but it's not the most important (or even the most well known?) tenet of the Unix philosophy.

It's "Everything is a File"[0]. That one has really stood the test of time. And it's often misunderstood to mean that everything implements {Read,Write,Seek,Truncate, etc.}. Then when a TCP socket or character device or whatever shows up, the whole abstraction leaks, and it seems like not everything is a file.

The Good Parts of Everything is a File:

1. Everything exists in a Filesystem, meaning a directory tree. There are directories, they list things, either resources, or more directories. A process gets access to all additional resources through the filesystem. We've really screwed up by putting networking in its own place, environment variables, etc.

2. File descriptors. If you squint, these are capabilities[1], which is probably the best way to manage resource permissions. They are fine grained, they handle delegation (see the sketch after the footnotes), there is a tree of legitimacy back to the root. We were so close, but we got this one wrong too, with a single global root, instead of a chroot per process by default.

[0] https://en.wikipedia.org/wiki/Everything_is_a_file

[1] https://en.wikipedia.org/wiki/Capability-based_security
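
To make the delegation point in (2) concrete, here's a minimal sketch of passing a descriptor to another process over a Unix socket with SCM_RIGHTS, which is the mechanism that makes fds behave like transferable capabilities (error handling omitted):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Send fd to whoever is on the other end of the connected Unix socket.
       The receiver gets its own descriptor referring to the same open file. */
    int send_fd(int sock, int fd)
    {
        char dummy = 'x';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        char ctrl[CMSG_SPACE(sizeof fd)];
        memset(ctrl, 0, sizeof ctrl);

        struct msghdr msg = {
            .msg_iov = &iov, .msg_iovlen = 1,
            .msg_control = ctrl, .msg_controllen = sizeof ctrl,
        };
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(sizeof fd);
        memcpy(CMSG_DATA(cm), &fd, sizeof fd);

        return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
    }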


They talk pretty extensively about everything being a file? They even mention the "everything is a file" description of that concept.


Yeah seems the GP didn't watch the video.


At some point I need to make a really misleading title for my talks or articles just to catch people like the GP that don't read the article or watch the talk and only comment based on what the title makes them assume it's about.


I did not watch the video. I assumed the blog post was a transcription since it contains parentheticals like "(audience laughs)".

My takeaway from the transcript was that the author was chaining together webassembly programs like a pipeline in a Unix shell. Is the video about something else entirely?


The post does indeed seem to be a full transcript of the video, and it does mention the everything-is-a-file: "So, going back to where we were with Unix, what does it mean for everything to be a file? What is a file in the first place?"

The talk is fantastic; Xe is a prolific hacker, low-level OS engineer, and engaging speaker. If anyone prefers YT, here's the direct link: https://www.youtube.com/watch?v=QNDvfez6QL0


The pipeline pattern is still very useful, even if it isn't the most important or most well known tenet (I am not sure of this, but maybe you are right). Still, of course it is not perfect. "Everything is a file" has its benefits too, but also is not perfect either.

However, even if not all of {Read,Write,Seek,Truncate,etc} are implemented for all objects, some of them will be implemented, e.g. a TCP socket or character device can use read/write (or some might be read-only or write-only) but is not seekable. This is useful because programs expecting other kinds of files will still work. For example, once I had a USB with an exFAT file system which I could not mount, but I knew it contained a ZIP file, so I tried it and it was able to extract the ZIP archive even though it could not be mounted; that is, it could treat the USB device itself as a file. And, commonly, it is useful to write stuff that could be written to files to pipes instead; for example in Heirloom-mailx you can write attachments to pipes instead of files, and I find this very helpful.

I think that the file system directory tree is not the best way. File descriptors are like capabilities, and that is good (and I agree that it is probably the best way to manage resource permissions), but I think that it could be done better as better capabilities. I think that a single global root and chroot per process are both not the best way; I have a (what I think, at least) better way.

My own design has a file system but does not have directory structures nor file names, but it is a hypertext file system, and files can have multiple forks, and the data streams can contain links (similar to UNIX hard links) to other files. Links can optionally be to a fixed version, or to a changeable version; copy on write can be used if you have both kinds of links to the same file. There is also journaling, and can have locks and transactions that can consist of multiple objects at once (this is necessary in the core system, so that you do not make a mess trying to do such things in user code like SQLite (and any other SQL database engine) does).

My own design also uses "proxy capabilities". Messages can be passed using capabilities, and these messages can contain sequences of bytes and/or capabilities. A program might also create its own capabilities, which can be proxies of others; you can implement fine grained security and also allow fault simulation and many other purposes. All I/O and system calls (except Yield and Quit) must use these capabilities (for full security). A program will receive an initial message when it starts, so it will start up with some capabilities that were passed in that message (which can be whatever capabilities the caller decided to give it). Also, the multiple objects transaction/locking mentioned above is actually a general feature of capabilities and is not specific to disk files; you can make a transaction or lock of any set of objects (if supported; some combinations might not be possible). Proxy capabilities are actually very useful and many of the high-level features of the system are implemented in terms of proxy capabilities, so the kernel does not need to know all of the possible uses.

In this way, you can easily emulate a "chroot per process", although it does not actually work like that. Such an initial message could as easily be used to emulate environment variables or whatever else you might want, too; a POSIX compatibility layer can be possible in user code if needed, although the low-level and high-level design of this system are not designed to be POSIX but rather something different, which does (what I am considering) working better in many ways.

(My own operating system design does not currently have a name.)


Sounds pretty cool. Which hardware platforms does it target (even if they are virtual)? Is there a way to run it?


Currently the implementation is not written yet (and it is supposed to be possible that multiple implementations can be made, including ones that can run on other operating systems as well as by itself); currently they are only design ideas. I have written some stuff about it on comp.os.misc, although a few of the ideas have changed a bit since then and I have another message with some additional ideas, which I have not sent yet. Hopefully if other people also comment then we can make improvements, and then write the actual specifications (there are both low-level and high-level specifications). I have essentially the ideas for how most of the low-level stuff works already, and once the actual specification is made, the kernel could be implemented even though the high-level system might not be made yet.


> It's magic, just without the spell slots.

I laughed out loud during a work meeting when I read this. Bravo, Xe!


You were reading HN during a work call! Call HR!


I'm losing my job soon due to their return to office bullshit. And I was hired as a remote employee.

HR can kiss my fuzzy butt lmao


I am sorry to hear that. Please accept my apologies for a quick and easy sarcastic comment that cut too close.

Good luck with the future job hunt.


Ah, thanks for the compassion. I didn't take any offense though. :)


[flagged]


Could you please stop posting unsubstantive comments and flamebait? You've unfortunately been doing it repeatedly. It's not what this site is for, and destroys what it is for.

If you wouldn't mind reviewing https://news.ycombinator.com/newsguidelines.html and taking the intended spirit of the site more to heart, we'd be grateful.




