Do Files want to be Actors? (lewiscampbell.tech)
120 points by LAC-Tech 10 days ago | 85 comments





Asynchronous I/O completion notification was a huge innovation, allowing computation to proceed concurrently with I/O and thus dramatically increasing the throughput of the computer. Unfortunately it was also a huge source of bugs. The following innovation that made it usable, by making it less bug-prone, was called a "multitasking operating system". The so-called "OS" allowed you to write simple sequential code, but used the computer efficiently by switching back and forth between multiple tasks as their respective I/Os completed. We're talking about the introduction of the Univac 1103A in 01953, 72 years ago, and the following 20 years of innovations, including things like Dijkstra's THE operating system.

That is, asynchronous I/O is 20 years older than the Unix system call interface this article speculates it should replace.

Of course, context switching between different tasks is not free, and event loops have frequently been able to provide higher efficiency. The equilibrium has rocked back and forth as I/O has gotten faster and slower relative to task context switching. CICS, select(), poll(), Oberon, the Macintosh system, Win16, the JavaScript event loop, Tcl/Tk, Win32 IOCP, Symbian active objects, kqueue, epoll, and io_uring are some of the results.

But don't try to sell asynchronous I/O as a "game-changing" paradigm shift. It's a different programming model that's harder to program but can provide higher performance, just like it has been for 70 years.

If you're shopping for a paradigm shift that can improve this tradeoff, there are several candidates. Erlang-style lightweight processes, software transactional memory, and JS-style promises (originally from E, which was inspired by KeyKOS, which had event loops but no promises or asynchronous I/O) come to mind.

The hardware development that might be actually new has arguably already failed in the market: Intel/Micron's "Optane" memory and Flash-based NVDIMMs. Flash has big disk energy, but is fast enough that copying it to RAM one word at a time like disk will probably bottleneck your performance by an order of magnitude. io_uring doesn't fix this. We need interfaces designed for zero-copy access to bulk persistent data. Maybe something like Multics or mmap(). LMDB and FlatBuffers suggest that the potential for improved performance is significant. Could such high-bandwidth, low-cost memory keep up with LLM inference, allowing you to do inference on a 256-gigabyte model with 256 gigabytes of Flash but a much smaller amount of RAM?


We already have extremely high-bandwidth storage. Jens Axboe posts benchmarks on Twitter using his io_uring test server, like this one[0] claiming 180 GB/s of write throughput, which is fast enough to start blurring the lines with memory bandwidth. You can't use mmap() for large persistent data because CPU silicon doesn't support enough virtual memory, ignoring the other performance issues with current implementations.

The elephant in the room is that using high-bandwidth storage well with minimal RAM requires different and much better scheduler design than currently exists in the vast majority of systems, including every OS I am familiar with. The higher the bandwidth and the larger the storage, the better your scheduling needs to be at latency hiding using techniques that don't involve a large cache. A lot of hardware tech, like Optane or HPC fabrics, was invented to avoid having to address schedulers being poor at latency hiding, which is essentially a (non-trivial) software problem.

Most users of modern fast asynchronous I/O still tend to delegate scheduling, treating the I/O schedule and execution schedule as separate concerns, even when done entirely in user space. It is a missed opportunity.

You do raise a valid issue: this requires a much more sophisticated design and implementation than most developers are comfortable with, which creates a lot of inertia behind doing things the classic way. This could all be abstracted away from the average developer in principle, something similar to a database kernel, but no one has built one yet.

[0] https://x.com/axboe/status/1854635553775378458


Thank you! This is extremely interesting indeed.

My perhaps naïve thoughts on "extremely high bandwidth" are that the bus from the NAND Flash matrix to the Flash chip's internal RAM buffer is typically something like 4 kibibytes wide, and maybe you can read a page of the Flash into that buffer in hundreds of nanoseconds, though the chips I've looked into take tens of microseconds. (And if you can cut the latency down that much, the scheduler problems get much easier, too.) If those buffers are then accessible over the CPU's memory bus, maybe you can usefully transfer pages from the Flash into the buffers at many times the bandwidth of the CPU's memory bus, as long as you're only reading a small part of each page. As I understand it, current SSDs mostly only approach the bandwidth of SDRAM if you're reading entire pages.

Back of the envelope: if you have 32 Flash chips in your system, and each one of them can read a 4096-byte page from its NAND matrix into a chip-internal RAM buffer every 100 ns, you'd have 1.3 terabytes per second of such bandwidth. However, this illustrates how demanding this kind of access would be to the hardware; basically it demands Flash as low-latency as SDRAM.
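Spelled out, under the same assumptions (32 chips, each moving one 4096-byte page into its on-chip buffer every 100 ns):

```latex
\[
32 \times \frac{4096\,\mathrm{B}}{100\,\mathrm{ns}}
  = 32 \times 40.96\,\mathrm{GB/s}
  = 1310.72\,\mathrm{GB/s}
  \approx 1.3\,\mathrm{TB/s}
\]
```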

You can definitely mmap() parts of devices or files that are larger than your CPU's virtual memory. mmap() does not require you to map the entire device or file.
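A minimal Python sketch of mapping only a window of a file, so the file can be far larger than the address space you spend on it. The helper name map_window is hypothetical; the one real constraint it handles is that mmap offsets must be multiples of mmap.ALLOCATIONGRANULARITY. For simplicity it returns a copy of the bytes, which a genuinely zero-copy reader would avoid by working on the open mapping instead.

```python
import mmap


def map_window(path: str, offset: int, length: int) -> bytes:
    """Map only [offset, offset + length) of a file and return those bytes.

    mmap requires the mapping offset to be a multiple of
    mmap.ALLOCATIONGRANULARITY, so round the offset down and
    skip the slack at the start of the mapping.
    """
    gran = mmap.ALLOCATIONGRANULARITY
    aligned = offset - (offset % gran)  # legal mapping offset
    slack = offset - aligned            # bytes before the requested start
    with open(path, "rb") as f:
        # Only slack + length bytes are mapped, not the whole file.
        with mmap.mmap(f.fileno(), slack + length,
                       offset=aligned, access=mmap.ACCESS_READ) as m:
            return m[slack:slack + length]  # copy out before unmapping
```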

I agree that you need some abstraction layer that provides a simple, reliable interface with adequate performance.


The I/O amplification from atomic 4KiB block size is definitely a thing. Even if you have the I/O bandwidth to burn this still represents a waste of useful RAM and memory bandwidth. There are a number of data layout and scheduling techniques that can significantly mitigate this.

Latency is always going to have a speed-of-light issue, and storage is being moved physically further from the CPU with time; latency reductions in silicon are re-added by distance. Flash gets around that with extreme parallelism, which implies very deep pipelining from the execution scheduling side to fill all of the I/O slots necessary to saturate the bandwidth.

This in turn creates a raft of second-order design problems on systems with extremely large and extremely parallel storage. Total memory requirements for just the execution state being scheduled, the data structures that provide data selectivity (so you can choose the optimal I/O to schedule), and landing buffers for parallel inflight I/O can easily overflow available RAM on large servers in plausible environments. And databases normally keep much more than just that in RAM. It is an interesting open architecture problem that has never been considered. You can't trivially patch existing architectures to make it work; it would need to look very different.

We've taken it as axiomatic that certain data structures will always fit in RAM when dealing with large data, but extremely parallel, fast, and large storage is exposing that assumption in interesting ways. An architecture that effectively decouples memory requirements from large, fast storage would radically change how we design data-intensive software, since there are a lot of design idioms and limitations today that primarily reflect RAM scaling issues.


1. The article wasn't talking about "asynchronous I/O". It was talking about message-passing (actor model/message-oriented OO) as the interface.

2. Message passing isn't new, it is at least 50 years old even if you don't go further back than early Smalltalk and the Actor model.

3. The article wasn't "selling" anything, and certainly nothing new; see (2). It was noting convergence from several directions on an old and somewhat misunderstood paradigm.
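For readers who haven't met the actor model, a minimal sketch using Python's asyncio (the class and message names are hypothetical, chosen only for illustration): an actor owns private state and a mailbox, senders enqueue reified messages without waiting, and the actor handles one message at a time, replying through a future when a reply is wanted.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Message:
    name: str                                # the message name ("selector")
    args: tuple = ()
    reply: Optional[asyncio.Future] = None   # where to put an answer, if any


class CounterActor:
    """Toy actor: private state, a mailbox, one message handled at a time."""

    def __init__(self) -> None:
        self.count = 0
        self.mailbox: asyncio.Queue = asyncio.Queue()

    def send(self, msg: Message) -> None:
        self.mailbox.put_nowait(msg)  # asynchronous: the sender does not wait

    async def run(self) -> None:
        while True:
            msg = await self.mailbox.get()
            if msg.name == "add":
                self.count += msg.args[0]
            elif msg.name == "get" and msg.reply is not None:
                msg.reply.set_result(self.count)
            elif msg.name == "stop":
                return


async def demo() -> int:
    actor = CounterActor()
    task = asyncio.create_task(actor.run())
    for n in (1, 2, 3):
        actor.send(Message("add", (n,)))
    answer = asyncio.get_running_loop().create_future()
    actor.send(Message("get", reply=answer))
    actor.send(Message("stop"))
    result = await answer
    await task
    return result
```

Running asyncio.run(demo()) yields 6: the three "add" messages are processed in order before the "get".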


It mentions the names of message-passing and the well-known actors model, but the ideas it engages with are the ideas of io_uring, which is a system for asynchronous I/O using a message queue. The two designs are not unrelated, and you could write an interesting post exploring the relationships between them, but unfortunately the author did not do that, because his engagement with the ideas of the actor model and message-passing was limited to quoting one of Hewitt and Baker's famous early attempts to define actors.

Instead, he jumps into salesman hypester mode with "The game has changed," which, like, gag. A lot of games are having their rules rewritten right now (tank/drone warfare, the energy market, freedom of expression, international finance, and artificial intelligence come to mind) but asynchronous I/O is not one of them. (Except, maybe, in the non-io_uring-related way I suggested—the advent of much-higher-bandwidth access to large-capacity storage devices than I/O buses can handle—to which message-passing is even less applicable.)

It's perfectly fair to describe actors as "somewhat misunderstood" because the ways Hewitt himself understood it over the 50 years he developed the idea frequently contradict one another. At the end of his life he spent several years writing https://arxiv.org/abs/0904.3036v12, which describes his conceptualization of it at that time, which was very different from the early versions, though I think he would deny that. The versions on the arXiv only go back to 02009, but I am pretty sure the draft paper he showed me when I met him a few years before that was an earlier draft of the same thing.


You're missing the point.

The existing "asynchronous" I/O mechanisms that I am aware of all use a procedural interface...which doesn't really work.

The first kind of procedural interface is what I would call "the simulation of synchronous I/O": basic OS behavior that goes outside the procedural model supported by the programming language, suspending your process and going off to do something else while the I/O completes.

This has various problems, mainly the one of suspending your process, but it is nice and simple from the perspective of a program in the call/return architectural style because it never has to see anything outside its understanding of the world.

The attractive convenience and the intrinsic problems have led us to reproduce this mechanism at the process level, the kernel-thread level, the user-thread level and most recently the async/await level. The fundamental flaw remains: we are simulating a synchronous procedural interface on top of something that is very different.[1]

Callback hell is another way of mapping asynchrony to synchronous procedural interfaces, but well...yikes. NT completion ports and the like are as well, and let's agree not to talk about aio(4).

Asynchronous messaging such as that in io_uring is a different way of interfacing with asynchronous I/O, just as Erlang messages are different from synchronous procedure calls and synchronous RPCs. Instead of all communication being encoded in individual procedures, it is encoded in reified messages (the io_uring_sqe struct). These have an "opcode", the message name, and parameters.
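To make the "reified message" point concrete, a toy model in Python. It is not the io_uring or liburing API, just an in-process simulation: each request is a message with an opcode, parameters, and an opaque user_data tag; it goes onto a submission queue, and a hypothetical toy_kernel drains that queue and posts completion messages carrying the same user_data. The field names loosely echo io_uring_sqe and io_uring_cqe, the opcode strings are made up, and os.pread/os.pwrite are Unix-only.

```python
import errno
import os
from collections import deque
from dataclasses import dataclass

# Hypothetical opcodes; real io_uring uses IORING_OP_* constants.
OP_READ, OP_WRITE = "read", "write"


@dataclass
class SQE:                 # submission queue entry: a reified request
    opcode: str
    fd: int
    buf: bytes = b""
    length: int = 0
    offset: int = 0
    user_data: int = 0     # opaque tag echoed back in the completion


@dataclass
class CQE:                 # completion queue entry
    user_data: int
    res: object            # data read, byte count written, or -errno


def toy_kernel(sq: deque, cq: deque) -> None:
    """Drain the submission queue, perform each request, post a completion."""
    while sq:
        sqe = sq.popleft()
        try:
            if sqe.opcode == OP_READ:
                res = os.pread(sqe.fd, sqe.length, sqe.offset)
            elif sqe.opcode == OP_WRITE:
                res = os.pwrite(sqe.fd, sqe.buf, sqe.offset)
            else:
                res = -errno.EINVAL  # unknown message name
        except OSError as e:
            res = -e.errno
        cq.append(CQE(sqe.user_data, res))
```

The caller never invokes "read" or "write" as procedures; it builds messages, enqueues them, and later matches completions to requests by user_data.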

Now you can ignore the completely different interface that is the point and instead focus on the underlying asynchronous I/O operations, but that is, well, missing the point.

I have built more asynchronous/message-oriented I/O APIs in userspace with and for Objective-S[2], and am personally very interested in how these could map to the io_uring kernel interface. I certainly agree with the poster's point that this is fundamentally different from what has come before. And again: the point is the messaging interface to (inherently asynchronous) I/O, not the fact that there is some (procedural) mechanism for asynchronous I/O.

[1] https://2020.programming-conference.org/details/salon-2020-p...

[2] https://objective.st


> Callback hell is another way of mapping asynchrony to synchronous procedural interfaces, but well...yikes. NT completion ports and the like are as well

NT IOCP are true async as all I/O in the NT kernel is asynchronous. It was a design principle.

NT also has I/O Rings, based on io_uring.


I didn't say they weren't.

This isn't my first interaction with Kragen on this site, so unfortunately I'm not sure all his criticisms are genuine. He's said a few interesting things, but the amount of stuff he's written about my post is well over twice the length of my actual post, so at some point you just skim read.

I'm not sure what you mean by "genuine". My criticisms are sincere, as always, though of course that is no guarantee that they are correct. I've certainly made every effort to ensure that they are, but I'm fallible. Do you think they are incorrect?

If you don't find my perspective of interest, you of course have no obligation to read it, but if I didn't think something like my comments were worth reading, I wouldn't have written them. It's true that they're longer than your post—but that's because your post is unfinished!

Sometimes my writing falls short of the mark, but in this case, other people seem to have found my comment worthwhile, and I think that it turned out to be much higher quality than I had hoped. This time I think you're missing out by skimming.


If I had a more interested audience, my posts would be more interesting.

Yeah, it's difficult for someone else to be of interest to someone so wrapped up in their ego they answer thoughtful critiques with lazy insults. Fortunately I have another audience.

On the other hand, the most interesting discussions to be had are the ones where the opponent disagrees with you.

As this is hard for many people, I have learned to completely ignore any part that is not of interest.

The funniest instance was an infuriated coworker calling me names, ready to fight me. He must have raged on for 10 whole minutes while I calmly looked at him without expression. After his rage, my conclusion was that he was right, and I calmly told him so: next shift, we will do it the way you've described. If he had been wrong, I would just have said NO.

Made a good friend that day :)


> Flash has big disk energy, but is fast enough that copying it to RAM one word at a time like disk will probably bottleneck your performance by an order of magnitude.

Much more of a slowdown than that. Mmap a big file on an SSD, access its bytes randomly, and you can get your 500MB/s read speed down to kilobytes/s.

> We need interfaces designed for zero-copy access to bulk persistent data.

This is in tension with caching. For it to work, you'd need to get system builders to stop bundling smaller, faster storage with their slower, larger storage.


I think that you have read me to be saying the opposite of what I meant in both of these cases. Possibly https://news.ycombinator.com/item?id=42595851 will clarify.

Thank you for writing this.

I cut my teeth on OS/2 in the early 90s, where using threads and processes to handle concurrent tasks was the recommended programming model. It was well-supported by the OS, with a comprehensive API for process/thread creation, deletion and inter-task communication. It was a very clear mental model: put each sequential sequence of operations in its own process/thread, and let the operating system deal with scheduling - including pausing tasks that were blocked on I/O.

My next encounter was Windows 3, with its event loop and cooperative multi-tasking. Whilst the new model was interesting, I was perplexed by needing to interleave my domain code with manual decisions on scheduling. It felt haphazard and unsatisfactory that the OS didn't handle scheduling for me. It made me appreciate more the benefits of OS-provided pre-emptive multi-tasking.

The contrast in models was stark. It seemed obvious that pre-emptive multi-tasking was better. And so it proved: NT bestowed it on Windows, and NeXT did the same for Mac.

Which brings us to today. I feel like I'm going through Groundhog Day with the renaissance of cooperative multi-tasking: promises, async/await and such. There's another topic today [0] that illustrates the challenges of attempting to perform actions concurrently in JavaScript. It brought back all the perplexity and haphazard scheduling decisions from my Windows 3 days.

As you note:

> Of course, context switching between different tasks is not free, and event loops have frequently been able to provide higher efficiency.

This is indeed true: having an OS or language runtime manage scheduling does incur an overhead. And, indeed, there are benchmarks [1] that can be interpreted as illustrating the performance benefits of cooperative over pre-emptive multitasking.

That may be true in isolation, but it inevitably places scheduling burden back on the application developer. Concurrent sequences of application domain operations - with the OS/runtime scheduling them - seems like a better division of responsibility.

[0]: https://news.ycombinator.com/item?id=42592224

[1]: https://hez2010.github.io/async-runtimes-benchmarks-2024/tak...


Did you ever use SOM?

To this day it still seems to have had a much better approach to component development and related tooling than even the COM reboot that WinRT offers.


Yes! Fond memories. I put it firmly in the Betamax category: superior technology that lost out for political/marketing reasons.

For those curious about SOM: OS/2 Technical Library: System Object Model Guide and Reference

https://archive.org/details/os2-2.0-som-1991


What makes me angriest about the current async propaganda... and I use the term deliberately to distinguish it from calm discussions about relative engineering tradeoffs, which is a different discussion... is the idea that it started with Node.

Somehow we collectively took all the incredible experience with cooperative multitasking gathered over literally decades prior to Node and just chucked it in the trash can and had to start over at Day Zero re-learning how to use it.

This is particularly pernicious because the major issue with async is that it scales more poorly than threads, due to the increasing design complexity and the ever-increasing chances that the various implicit requirements that each async task has for the behavior of other tasks in the system will conflict with each other. You have to build systems of a certain size before it reveals its true colors. By then it's too late to change those systems.


I would frame it a bit differently. Async scales very elegantly if and only if your entire software stack is purpose-built for async.

The mistake most people are making these days is mixing paradigms within the same thread of execution, sprinkling async throughout explicitly or implicitly synchronous architectures. There are deep architectural conflicts between synchronous and asynchronous designs, and trying to use both at the same time in the same thread is a recipe for complicated code that never quite works right.

If you are going to use async, you have to commit to it with everything that entails if you want it to work well, but most developers don't want to do that.
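A small Python illustration of that conflict, assuming asyncio (the function names are hypothetical): a synchronous call made directly inside a coroutine holds the event loop's only thread, so three "concurrent" tasks run one after another, while handing the same call to asyncio.to_thread (Python 3.9+) lets them overlap.

```python
import asyncio
import time


def blocking_io(seconds: float) -> None:
    time.sleep(seconds)  # stands in for any synchronous library call


async def naive(seconds: float) -> None:
    # Runs on the event loop thread: no other task can make progress.
    blocking_io(seconds)


async def offloaded(seconds: float) -> None:
    # Runs in a worker thread; the event loop stays free to run other tasks.
    await asyncio.to_thread(blocking_io, seconds)


async def elapsed(worker, n: int = 3, seconds: float = 0.2) -> float:
    """Run n copies of worker concurrently and return wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(worker(seconds) for _ in range(n)))
    return time.perf_counter() - start
```

With the defaults, asyncio.run(elapsed(naive)) takes roughly 0.6 s and asyncio.run(elapsed(offloaded)) roughly 0.2 s; the naive version looks asynchronous but serializes everything.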


This is actually a major issue in the LLM wrapper space. Building things like agents (which I think are insanely overhyped and I am so out on but won’t elaborate on), usually in Python, where you are making requests that might take 1-5 seconds to complete, with dependencies between responses, you basically need expert-level async knowledge to build anything interesting. For example, say you want two agents talking to each other and “thinking” independently in the same single-threaded Python process. You need to write your code in such a way that one agent thinking (making a multi-second call to an LLM) does not block the other from thinking, but at the same time, when the agents talk to each other, they shouldn't talk over each other. Now imagine you have n of these agents in the same program, say behind an async endpoint on a FastAPI server. It gets complicated quickly.
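One way to structure the two-agent case in a single-threaded asyncio program, as a sketch only: fake_llm_call is a hypothetical stand-in for a real LLM client, simulated with asyncio.sleep. Awaiting the call lets the agents think concurrently, and a shared asyncio.Lock acts as "the floor" so that only one agent speaks at a time.

```python
import asyncio


async def fake_llm_call(prompt: str, delay: float) -> str:
    """Hypothetical stand-in for a multi-second LLM request."""
    await asyncio.sleep(delay)  # yields to the event loop while "waiting"
    return f"reply to {prompt!r}"


class Agent:
    def __init__(self, name: str, delay: float) -> None:
        self.name = name
        self.delay = delay

    async def think(self, prompt: str) -> str:
        # Awaiting here lets other agents think at the same time.
        return await fake_llm_call(prompt, self.delay)

    async def say(self, floor: asyncio.Lock, transcript: list, text: str) -> None:
        # Only the agent holding the floor may speak, so turns never interleave.
        async with floor:
            for word in text.split():
                transcript.append((self.name, word))
                # A yield point: without the lock, other speakers could cut in here.
                await asyncio.sleep(0)


async def converse(agents: list, prompt: str) -> list:
    floor = asyncio.Lock()
    transcript: list = []

    async def turn(agent: Agent) -> None:
        thought = await agent.think(prompt)
        await agent.say(floor, transcript, thought)

    await asyncio.gather(*(turn(a) for a in agents))
    return transcript
```

The lock is held only while speaking, never while thinking, which is what keeps one slow agent from stalling the others.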

It's also unnecessary for virtually all actual systems today.

The systems that can potentially benefit from async/await are a tiny subset of what we build. The rest just don't even have the problem that async/await purports to solve, never mind if it actually manages to solve it.


> The following innovation that made it usable, by making it less bug-prone, was called a "multitasking operating system". The so-called "OS" allowed you to write simple sequential code, but used the computer efficiently by switching back and forth between multiple tasks as their respective I/Os completed. We're talking about the introduction of the Univac 1103A in 01953, 72 years ago, and the following 20 years of innovations, including things like Dijkstra's THE operating system. That is, asynchronous I/O is 20 years older than the Unix system call interface this article speculates it should replace.

That's just a scheduler though, and not necessarily an actor-oriented one. Multitasking doesn't imply communication between tasks, certainly not actor-oriented bidirectional message passing.


Yes, I agree. Something similar is why I don't think it's accurate to describe this article as being about actors: it's not about schedulers, but it's about asynchronous I/O, which is equally well not the same thing as actors, though scheduling and asynchronous I/O both have very interesting relationships with actors, which the article unfortunately does not go beyond vaguely gesturing at.

Sorry, but you are first changing "what the article is about" to something that the article is not, in fact, about, and then criticizing this thing that you just made up.

Not helpful.


I don't understand what you're saying; could you clarify which three things you are referring to?

I found your other comment above, the one mentioning Smalltalk, very interesting, and will reply to it later after thinking more about it.


I will do my best.

1. You:

> I don't think it's accurate to describe this article as being about actors

So the article says it is about actors. It says it is about messaging, it certainly is about asynchronous messaging, and you even agree that io_uring is an asynchronous message queue.

In what way is the article not about a connection between actors and io_uring?

This is what I mean when I write that you are changing what the article is about. It is about this: asynchronous messaging/actors.

It may not go into a lot of depth about that connection, but it clearly is about it. And it may be wrong to focus on this. It may be wrong in how it describes it. But you cannot claim that it is about something else.

2. You:

> it's not about schedulers

Yes. And? Why does an article showing the connection between a general concept of asynchronous messaging (actors) and a specific instance of asynchronous messaging (io_uring) have to be "about" schedulers?

Please don't answer; it is a rhetorical question.

3. You:

> but it's about asynchronous I/O

This is where you actually do the change. No: it is not about asynchronous I/O (in general). It is about an asynchronous messaging interface to I/O. Not the same thing. At all.

Once again, maybe you think it should be about this topic instead. And maybe you are even right that it should be about this (I don't think that's the case). But even if you were right that it should be about this other topic, you are not free to claim that it is about this other topic, when it clearly is not.

4. You:

> Asynchronous I/O completion notification was a huge innovation, ...

> But don't try to sell asynchronous I/O as a "game-changing" paradigm shift...

That's where you criticize the article for the thing you made up that it should be about, but is not. The article is not even about asynchronous I/O in general at all, never mind trying to sell asynchronous I/O as anything. It is talking about messaging, the fact that you can regard io_uring_sqe as a message and the submission and completion queues as message queues. Yielding something that's roughly equivalent to (some version of) the Actor model.


I take your example on multitasking operating systems as not being limited to only helping make friendly asynchronous I/O, but I do think a deeper consideration of Multics is coincidentally appropriate.

The telephone and electrical power networks were vast in scope (and still are), enabling interstate communication and power utilities. These echoed the transportation utilities that railroads had enabled. Multics was architected partially with the commercial goal of scaling up with users, a computing utility. But in a time with especially expensive memory, a large always-resident kernel was a lot of overhead. The hardware needed a lot of memory and would be contending with some communication network whose latency could not be specified at OS design time. Ergo, asynchronous I/O was key.

Put differently, Multics bet that computing hardware would continue to be expensive enough to be centralized, thereby requiring a CPU to contend with time-sharing across various communication channels. The CPU would be used for compute and scheduling.

Unix relaxed the hardware requirements significantly at the cost of programmer complexity. This coincided roughly with lower hardware costs, favoring compute (in broad strokes) over scheduling duties. The OS should get out of the way as much as possible.

After a bunch of failed grand hardware experiments in the 1980s, the ascendant Intel rose with a dominant but relatively straightforward CPU design. Designs like the Connection Machine were distilled into Out of Order Execution, a runtime system that could extract parallelism while contending with variable latency induced by the memory subsystem and variable instruction ordering. This limited asynchronous execution was mostly hidden away from the programmer until more recently, with HeartBleed.

Modern SoCs encompass many small cores, each running a process or maybe an RTOS, along with multiple CPU cores, many GPU cores, SIMD engines, signal processing engines, NPU cores, storage engines, etc.: a special compute engine for all seasons, ready to be configured and scheduled by the CPU OS, but whose asynchronous nature (a scheduling construct!) is no longer hidden from the programmer.

I think the article reflects how even on a single computer, the duty of the CPU (and therefore OS) has tilted in some cases towards scheduling over compute for the CPU. And of course, this is without considering yet cloud providers, the spiritual realization of a centralized computing utility.


These are good points. I hadn't thought about the perspective that the central processor in a heterogeneous multicore system may spend a lot of its time orchestrating rather than computing—whether it's a GE 635 with its I/O controllers https://bitsavers.org/pdf/ge/GE-6xx/CPB-371A_GE-635_System_M..., an IBM 360 with its "channels" https://en.wikipedia.org/wiki/IBM_System/360_architecture#In..., or a SoC with DSP cores and DMA peripherals—but it's obviously true now that you say it. I've seen a number of SoCs like the S1 MP3 player and some DVD players where the "central processor" is something like a Z80 or 8051 core, many orders of magnitude less capable than the computational payload.

(One quibble: I think when you said "HeartBleed" you meant Meltdown and Spectre.)

I think there have always been significant workloads that mostly came down to routing data between peripherals, lightly processed if at all. Linux's first great success domains in the 90s were basically routing packets into PPP over banks of modems and running Apache to copy data between a disk and a network card. I don't think that's either novel or an especially actors-related thing.

To the extent that a computational workload doesn't have a major "scheduling" aspect, it might be a good candidate for taking it off the CPU and putting it into some kind of dedicated logic—either an ASIC or an FPGA. This was harder when the PDP-11 and 6502 were new, but now we're in the age of dark silicon, FPGAs, and trillion-transistor chips.


Asynchronous IO with user space threads works wonders to get both the performance of async IO and the convenience of sequential programming.

Another example would be the Solo Operating System, written in Concurrent Pascal, with co-routines support, back in 1975, same year as UNIX V6 was released.

I came here to basically say this. io_uring is a leap into the past. It's going to be blocked in a lot of environments since it makes system calls unmanageable. Now mmap() on the other hand is a real enlightened system call. I got probably 20,000 followers on Twitter when I popularized it for LLMs two years ago.

>"Our operating systems want to do things asynchronously, on their own terms. They'll tell you when they're done. Maybe this is a new era. Maybe making syscalls from 1970s Unix directly is like a remote procedure call to another machine - a leaky abstraction, a feeble attempt to impose your old mental model onto a new reality."

I think the article fell off here. Linux has a profoundly synchronous I/O system. The I/O U-Ring works there by dispatching requests to a kernel-maintained pool of worker threads. There, the submitted requests are run synchronously.

So the I/O U-Ring is in fact the abstraction, "a feeble attempt to impose your [new] mental model onto an [old] reality", as the author might put it. The actual Linux I/O system is most trivially exposed to userland by the same old "1970s Unix" means of the traditional system calls.

The real change with the I/O U-Ring is that it offers a way to submit work in bulk and finally introduces to Linux a form of I/O that appears to the program to be truly asynchronous.

(And it's not without some fair cause that file I/O is implemented as synchronous code. It's hard to do it asynchronously when it's a complex operation with many steps, where you are interacting with a page cache, not only for file contents but even caching the very metadata that describes how to get from an offset into a file to the block numbers where the data is stored. Windows has a famously profoundly asynchronous I/O system, and even there actual file I/O is done just the same, with synchronous logic run in a worker thread, or copied directly from the cache if the requested data is already there.)
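A userspace analogy of that arrangement, using Python's concurrent.futures (an illustration only, not how the kernel's worker pool is built; the helper names are hypothetical): the program submits a batch of requests in one go, each request runs as an ordinary blocking pread on a worker thread, and the program reaps completions as they finish, not necessarily in submission order. os.pread is Unix-only.

```python
import os
from concurrent.futures import Future, ThreadPoolExecutor, as_completed
from typing import Dict, Iterable, Tuple


def submit_batch(pool: ThreadPoolExecutor, fd: int,
                 reqs: Iterable[Tuple[str, int, int]]) -> Dict[Future, str]:
    """Submit many (tag, offset, length) reads at once.

    Each one runs as plain synchronous code (os.pread) on a worker thread.
    """
    return {pool.submit(os.pread, fd, length, offset): tag
            for tag, offset, length in reqs}


def reap(futures: Dict[Future, str]) -> Dict[str, bytes]:
    """Collect completions in whatever order they finish."""
    results = {}
    for fut in as_completed(futures):
        results[futures[fut]] = fut.result()
    return results
```

To the caller the batch looks asynchronous, even though every individual read underneath is synchronous.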


Fwiw, a good portion of IO via io_uring is not executed via the threaded work queue anymore. Even with buffered file reads it's avoided for some common filesystems.

You're right of course that there are lots of cases (missing metadata, synchronous operation like extending files, ...) where it's all offloaded to the wq.


I suppose by "new reality" I meant that asynchronicity seems to be happening from both above (programming languages) and below (hardware and, I suppose, physics), and then in the middle we still have units of execution pausing on syscalls. I get that io_uring is an abstraction built on top of classic Unix syscalls, but I still find it interesting that even Linux people are starting to need an async model at the OS level.

This change of mentality is similar to how monolithic kernels get praised, only to have daemons, microservices and containers all over the place, using a much slower OS IPC communication channel than if it had been a microkernel from the start.

To me, it’s not a change of mentality, but a change of POV on how to think about concurrency. If everything is an actor, and you recognize it early on, you can use tech (elephant in the room is Erlang/Elixir) that might fare better overall. There’s no silver bullet ofc.

One thing common to that tech, including stuff on JVM and .NET, is that then you are at a point where UNIX/POSIX becomes irrelevant, basically my distributed systems story, since the early days of Java and .NET.

Naturally considering Erlang, and any other language with rich runtime and ecosystem as well.


Yes you're actually not the first person to mention microkernels to me, when I've ranted about this stuff.

How do they do syscalls? Is every one basically an IPC?


Yes, but a special kind of IPC, where ownership of the message block is exchanged.
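As a rough illustration of "ownership of the message block is exchanged", here is a single-slot channel where sending moves a buffer pointer into the channel and clears the sender's copy. It is a hypothetical sketch of the idea, not the QNX or seL4 API; `chan_send` and `chan_recv` are invented names, and real kernels do this with page mappings or registers rather than a struct.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>   /* malloc/free for message buffers */

/* Hypothetical single-slot channel. */
struct channel {
    char *msg;   /* NULL when the slot is empty */
    size_t len;
};

/* Move ownership of *buf into the channel. On success *buf becomes NULL,
   so the sender can no longer touch or free the message. */
static int chan_send(struct channel *ch, char **buf, size_t len) {
    assert(buf != NULL && *buf != NULL);
    if (ch->msg != NULL) return -1;   /* slot busy: sender keeps ownership */
    ch->msg = *buf;
    ch->len = len;
    *buf = NULL;
    return 0;
}

/* Move ownership out to the receiver, who must free() it.
   Returns NULL (and *len == 0) when the slot is empty. */
static char *chan_recv(struct channel *ch, size_t *len) {
    char *m = ch->msg;
    *len = ch->len;
    ch->msg = NULL;
    ch->len = 0;
    return m;
}
```

Because exactly one side holds the pointer at any time, no copy and no lock are needed for the payload itself.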

QNX and seL4 are two of the fastest ones, as general-purpose OSes. Then you have embedded ones for high-integrity computing like INTEGRITY RTOS.

Here is some info:

https://swd.de/Support/Documents/Manuals/Neutrino-Microkerne...

https://docs.sel4.systems/Tutorials/#sel4-mechanisms-tutoria...

https://www.researchgate.net/publication/386549964_An_Overvi...


Depends on the syscall. If you are talking about POSIX APIs or libc APIs, then yes, many of them are syscalls. For instance, on Fuchsia, opening a file requires IPC, but once you open it, read calls are syscalls into the kernel, which either return right away because the data is already paged in, or otherwise delegate to a filesystem in userspace to provide the data to back the pages it needs. That said, writes are just done asynchronously, so they will never block. Network I/O is done through IPC, but buffers for the socket live in the kernel, so you might not need to block for a read to finish, and you almost certainly don't block for a write unless there are no more buffers left for that socket. Many other POSIX syscalls can occur entirely without leaving the process. For instance, creating a file descriptor is an in-process concept. A call like uname will require a blocking IPC call, but that's not really a problem in practice.

If the caller uses native asynchronous APIs to perform IPC instead of POSIX APIs, then everything can be non-blocking.
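To show why "creating a file descriptor is an in-process concept" can be true, here is a minimal sketch of a per-process descriptor table, assuming an fd is just an index into an array of opaque handles. `handle_t`, `fd_alloc`, `fd_dup` and `fd_close` are hypothetical; this is not Fuchsia's fdio code.

```c
#include <assert.h>
#include <stdint.h>

#define MAX_FDS 16

typedef uint32_t handle_t;   /* hypothetical opaque kernel/service handle */

/* Per-process descriptor table: an fd is just an index into it. */
static handle_t fd_table[MAX_FDS];
static int fd_used[MAX_FDS];

/* Allocate the lowest free fd (POSIX-style rule). Pure in-process bookkeeping. */
static int fd_alloc(handle_t h) {
    for (int fd = 0; fd < MAX_FDS; fd++) {
        if (!fd_used[fd]) {
            fd_used[fd] = 1;
            fd_table[fd] = h;
            return fd;
        }
    }
    return -1;   /* table full (EMFILE in POSIX terms) */
}

/* dup(): a second entry for the same handle. A real implementation would
   reference-count or clone the underlying handle; this sketch does not. */
static int fd_dup(int fd) {
    if (fd < 0 || fd >= MAX_FDS || !fd_used[fd]) return -1;
    return fd_alloc(fd_table[fd]);
}

static int fd_close(int fd) {
    if (fd < 0 || fd >= MAX_FDS || !fd_used[fd]) return -1;
    fd_used[fd] = 0;
    return 0;
}
```

Only the operations that actually talk to a service (open, read on a remote file, and so on) need to leave the process.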


Like everything else, the sweet spot is probably in between.

Is this Greenspun's tenth rule but for the BEAM? Is everything a distributed system?

(Trick question! A computer has for a long time been a distributed system, we just don't like to worry about that usually.)


The fact I even got into this is Joe Armstrong's fault. I was listening to his talk about programming multicores and thought "you know, I'm on board with the actor model across the network, but what's the point of having multiple actors on one machine?" Which led me down the async rabbit hole, which through io_uring ends up reminding me of actors!

I may be too far into the BEAM, but it's definitely the case that a modern computer is a distributed system at so many levels.

The Network is the Computer. :)


Yep.

I still have some of the Sun manuals with that sentence.

It is kind of ironic how many decades we have been doing distributed systems, and now everyone talks about microservices as if they had rediscovered gunpowder.


Yeah, but remember that the BEAM is more than just a message queue. It has things like monitors and links, which are not part of the actor model and confer benefits: that's why you don't typically have to manually close a file in the BEAM, or how you can write to a (non-raw) file over the network transparently.

Maybe it's a bad analogy, but this reminded me of how the "actors" in Sims 1 were implemented. Rather than having a massive Sim class that knows how to interact with every object, the Sims ask each object what it can do, and it's up to the object (kernel) to handle the implementation details (IO in this case), do its work, and return later when it's done.

Isn’t that just OOP? Or was there actual concurrency involved?

There are a lot of ways to skin a cat. I doubt that most of how Linux works is really ideal at this point. (Which is not to say it's easy to replace or impractical).

But that structure/API is very similar to many similar patterns in computing. Look at Smalltalk or OOP to some degree. Maybe extend it with operational transforms.

I think there are a lot of interesting alternative ways to look at operating systems and many developments over the years. Such as Plan 9, MirageOS, and several other projects.

Also just to clarify I love cats and need a better metaphor.


This reminds me of modern mathematics: a lot of interesting and useful stuff comes from looking at old concepts from another viewpoint (a different way to skin a cat). E.g. "what if we look at the things we have in hand as elements of group XXX? Then the whole of group theory becomes applicable", etc. The trick is to notice similarities.

So now we can study Actor problems and apply the lessons to io_uring to avoid pitfalls earlier.


There's more than one way to skin a carrot

Wrapping file i/o in actors is exactly what we do in the Goblins distributed programming system. It ensures that only one operation is happening against the file descriptor at any given time while allowing other actors to use it asynchronously.
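The "one operation against the file at a time, while others use it asynchronously" property can be sketched as an actor with a FIFO mailbox whose state only its own step function touches. This is a hypothetical C illustration of the pattern, not Goblins code (Goblins is a Scheme library); the in-memory buffer stands in for the file descriptor, and `file_actor_send` / `file_actor_step` are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAILBOX_CAP 8
#define FILE_CAP 64

enum op { OP_APPEND, OP_SIZE };

struct msg {
    enum op op;
    const char *data;   /* payload for OP_APPEND */
    size_t *reply;      /* where OP_SIZE writes its answer */
};

/* Private state plus a FIFO mailbox. Only file_actor_step touches the
   state, so at most one operation runs against the "file" at a time. */
struct file_actor {
    char contents[FILE_CAP];
    size_t len;
    struct msg mailbox[MAILBOX_CAP];
    size_t head, count;
};

/* Called by other actors: enqueue and return immediately. */
static int file_actor_send(struct file_actor *a, struct msg m) {
    if (a->count == MAILBOX_CAP) return -1;   /* mailbox full */
    a->mailbox[(a->head + a->count) % MAILBOX_CAP] = m;
    a->count++;
    return 0;
}

/* Process exactly one message; returns 0 if the mailbox was empty. */
static int file_actor_step(struct file_actor *a) {
    if (a->count == 0) return 0;
    struct msg m = a->mailbox[a->head];
    a->head = (a->head + 1) % MAILBOX_CAP;
    a->count--;
    switch (m.op) {
    case OP_APPEND: {
        size_t n = strlen(m.data);
        if (n > FILE_CAP - a->len) n = FILE_CAP - a->len;   /* truncate when full */
        memcpy(a->contents + a->len, m.data, n);
        a->len += n;
        break;
    }
    case OP_SIZE:
        *m.reply = a->len;
        break;
    }
    return 1;
}
```

Senders never wait on each other; a size query sent after two appends observes both, because messages are handled strictly in arrival order.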

Interesting bit of synchronicity. I was recently looking into Erlang in the context of large-scale data distribution and was wondering whether, instead of "everything is a file", "everything is a process" would be more appropriate. The rationale was that it would better aggregate behaviour and lend itself to more reasonable scaling methodologies.

Wouldn't we only have 2 actors, the kernel and our program / event loop? But "actors" plural makes it sound like a system of many independent nodes "sends new messages to other actors and computes a new local state for itself."

You know, I'm not sure how many io_uring loops you'd need. If there are any wandering systems programmers around, could you tell us if there's any advantage to having more than one per process?

Surely one per core is near optimal?

The main benefit with Actors for me was that each Actor is single-threaded and stateful. That simplifies a lot (but moves the complexity to cross-actor communication). Not sure each file IO and state is guaranteed to be single-threaded.

Another characteristic of Actors is that Actors can spawn other Actors.

Also, "the mail system" (queues, message dispatch) is not part of the definition of the Actor Model. It's an implementation detail.


> Another characteristic of Actors is that Actors can spawn other Actors.

But it's entirely up to the actor if it wants to or not. I could imagine an actor supervision-hierarchy with directory-actors that have child file-actors and other child directory-actors. The file-actors could just be leaves in the hierarchy.

Each write operation on a file would be guaranteed to be single threaded via the file-actor. But a file-actor could also launch ephemeral child-actors that do read-only processing on a snapshot of the file. So, parallel processing would be possible for read-only operations.

A file-system does fit quite elegantly into the actor model imho. Whether it would be efficient or not, who knows, but at least on the surface it fits.
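The "writes go through the file-actor, read-only children work on a snapshot" idea can be sketched with plain data: the owner serializes writes, and a snapshot is an immutable copy that a child could process without seeing later writes. All names here (`file_write`, `file_snapshot`, `count_char`) are hypothetical, and the sketch leaves out the actual threads and supervision tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Immutable snapshot handed to read-only child actors. */
struct snapshot {
    size_t len;
    char data[];        /* flexible array member holding copied bytes */
};

struct file_state {
    char buf[64];
    size_t len;
    unsigned version;   /* bumped on every write */
};

/* Only the owning file-actor calls this, so writes are serialized. */
static void file_write(struct file_state *f, const char *s) {
    size_t n = strlen(s);
    if (n > sizeof f->buf - f->len) n = sizeof f->buf - f->len;
    memcpy(f->buf + f->len, s, n);
    f->len += n;
    f->version++;
}

/* Copy current contents so readers never observe later writes. */
static struct snapshot *file_snapshot(const struct file_state *f) {
    struct snapshot *s = malloc(sizeof *s + f->len);
    if (s == NULL) return NULL;
    s->len = f->len;
    memcpy(s->data, f->buf, f->len);
    return s;
}

/* An example read-only job a child actor might run on a snapshot. */
static size_t count_char(const struct snapshot *s, char c) {
    size_t n = 0;
    for (size_t i = 0; i < s->len; i++) n += (s->data[i] == c);
    return n;
}
```

Copying is the simplest way to get isolation; a real system might use copy-on-write pages instead, which is one of the efficiency questions the comment leaves open.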


For what it's worth, I have developed an actor-based async file i/o framework that does multiplexing, caching and format translation. Files are actors and pages (which serve fixed size blocks out of a particular file) are actors. The main motivation was to support caching and prioritised i/o. It doesn't target io_uring yet though.
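With pages as actors that each serve one fixed-size block, a file-level read has to be split into per-page messages. A minimal sketch of that arithmetic, assuming a hypothetical 4096-byte block size and an invented `split_read` helper (not the commenter's framework):

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_BYTES 4096   /* hypothetical fixed block size served by one page actor */

struct page_req {
    size_t page;     /* which page actor to message */
    size_t offset;   /* offset within that page */
    size_t len;      /* bytes to take from that page */
};

/* Split a read of [off, off + len) into per-page requests.
   Returns how many requests were written to out (at most max). */
static size_t split_read(size_t off, size_t len, struct page_req *out, size_t max) {
    size_t n = 0;
    while (len > 0 && n < max) {
        size_t in_page = off % PAGE_BYTES;
        size_t chunk = PAGE_BYTES - in_page;   /* bytes left in this page */
        if (chunk > len) chunk = len;
        out[n++] = (struct page_req){ off / PAGE_BYTES, in_page, chunk };
        off += chunk;
        len -= chunk;
    }
    return n;
}
```

For example, reading 5000 bytes at offset 4000 becomes three messages: 96 bytes from page 0, all 4096 bytes of page 1, and 808 bytes from page 2.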

Sounds interesting. When can you share it with us on HN?

I wonder how the stdlibs of existing languages will take advantage of io_uring and async syscalls. Does this mean we won’t have to spin up a subprocess for the most trivial things?

The Node.js std lib has had async versions of some of its modules for a while, e.g. fs/promises [1]. I know Node.js uses epoll, but I am pretty sure Bun uses io_uring.

Also worth mentioning: Zig has a really great io_uring library [2] right in the stdlib, which, without looking up the source code, I am guessing is what Bun uses under the hood:

[1] https://nodejs.org/api/fs.html#promises-api

[2] https://ziglang.org/documentation/0.13.0/std/#std.os.linux.I... - (give it a minute to load, the zig people are great system programmers but webdevs they are not lol)


Node.js has had async version of their modules from the very first version (or at least, long before it got popular). The "/promises" modules just wrap that in promises so you can use it with a nicer interface and async/await syntax (the original versions were callback based).

Wow, the Zig lib is as close to raw io_uring as it gets. It directly accepts an fd, a submission, and a result queue.
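For anyone who hasn't looked at io_uring, the submission/completion queue pair can be modelled roughly as two ring buffers, where each completion echoes the `user_data` of the submission it answers. This is a heavily simplified sketch: the structs are not the kernel's real `io_uring_sqe`/`io_uring_cqe` layouts, and `fake_kernel_process` only stands in for the kernel consuming entries.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define RING_ENTRIES 8   /* power of two so index masking works */

/* Simplified entries; the real ones carry many more fields. */
struct sqe { uint8_t opcode; int32_t fd; uint64_t user_data; };
struct cqe { uint64_t user_data; int32_t res; };

struct ring {
    struct sqe sq[RING_ENTRIES];
    struct cqe cq[RING_ENTRIES];
    uint32_t sq_head, sq_tail;   /* app advances tail, "kernel" advances head */
    uint32_t cq_head, cq_tail;   /* "kernel" advances tail, app advances head */
};

static int ring_submit(struct ring *r, struct sqe e) {
    if (r->sq_tail - r->sq_head == RING_ENTRIES) return -1;   /* SQ full */
    r->sq[r->sq_tail & (RING_ENTRIES - 1)] = e;
    r->sq_tail++;
    return 0;
}

/* Stand-in for the kernel: consume SQEs while there is room in the CQ,
   posting a completion that echoes user_data. */
static void fake_kernel_process(struct ring *r) {
    while (r->sq_head != r->sq_tail && r->cq_tail - r->cq_head < RING_ENTRIES) {
        struct sqe e = r->sq[r->sq_head & (RING_ENTRIES - 1)];
        r->sq_head++;
        r->cq[r->cq_tail & (RING_ENTRIES - 1)] =
            (struct cqe){ e.user_data, e.fd >= 0 ? 0 : -EBADF };
        r->cq_tail++;
    }
}

/* Pop one completion; returns 0 if none are ready. */
static int ring_reap(struct ring *r, struct cqe *out) {
    if (r->cq_head == r->cq_tail) return 0;
    *out = r->cq[r->cq_head & (RING_ENTRIES - 1)];
    r->cq_head++;
    return 1;
}
```

The `user_data` round trip is what lets a program match completions back to whatever issued the request, which is also where the actor-like "reply to sender" feel comes from.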

I'm not really sure what the point of this post is. As soon as you introduce asynchronous queueing of operation descriptors and completions, the analogy to Actors is present. IO completion ports are not a new concept in operating systems, nor is inter-process messaging. Hell, classic MacOS had async IO command objects with an API to queue them, and most likely you're going to put them in a queue when you get the completion callback too.

I think what's missing is that this is an extremely low-level programming model:

> Upon receipt of this message in the event E, the target consults its script (the actor analogue of program text), and using its current local state and the message as parameters, sends new messages to other actors and computes a new local state for itself.

It doesn't say anything about whether you do:

    - nginx-style state machines in C
    - callbacks in C++, or C++20 coroutines
    - async/await in Rust
    - Goroutines in Go
    - async/await in Python or JS, with garbage collection
etc.

I don't think the "actor model" really means that much these days.
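Taken literally, the quoted definition is just a step function: (local state, message) in, (new local state, messages to send) out. Here is a hypothetical counter actor written that way in C, which also shows how little the definition says about scheduling, mailboxes or syntax; `counter_behaviour`, `struct message` and `struct outbox` are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

enum kind { INC, GET, VALUE };

struct message {
    enum kind kind;
    int target;   /* id of the actor the message is addressed to */
    int sender;   /* id to reply to */
    long arg;
};

struct outbox {
    struct message msgs[4];
    size_t n;
};

/* The actor's "script": uses its local state and the message to emit
   messages to other actors and compute its new local state. */
static long counter_behaviour(int self, long state, struct message m, struct outbox *out) {
    switch (m.kind) {
    case INC:
        return state + m.arg;
    case GET:
        if (out->n < 4)
            out->msgs[out->n++] = (struct message){ VALUE, m.sender, self, state };
        return state;
    default:
        return state;   /* ignore messages it does not understand */
    }
}
```

Any of the styles listed above (state machines, callbacks, coroutines, goroutines, async/await) could be used to drive a function like this, which is arguably the commenter's point.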

What's a "canonical" and successful actor model program? What can we learn from such programs?

I think if you ask 5 people you'll get 5 different answers.

---

Also, with

    __u8    opcode;         /* type of operation for this sqe */
    __s32   fd;             /* file descriptor to do IO on */

then you have lost all static typing. It is too low level, so the analogy doesn't really hold up IMO.

Also, I don't understand why it's "do files want to be actors?", not "do Unix PROCESSES want to be actors?"

(copy of lobste.rs comment)


Which is why, when you have programming languages with rich runtimes and ecosystems, the OS kind of becomes irrelevant.

"an operating system is a collection of things that don't fit inside a language; there shouldn't be one"

-- Dan Ingalls

So what happens is that those runtimes build on top of whatever low-level primitives are available, and that is about it.

Even considering UNIX alone, many ways to do asynchronous IO aren't even part of POSIX; they have remained specific to each UNIX flavour.

To some extent, UNIX/POSIX API surface has been the C and C++ standard library that WG14 and WG21 didn't want to take over into ISO, but almost every C and C++ developer expects to exist anyway.


> then you have lost all static typing

Do you need it at this level? At some point everything is a bit-field. We impose typing to aid our mental models and build useful abstractions.

When interacting with the kernel we can let go, then reclaim, our types
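One way to "let go, then reclaim" types is to keep the raw opcode/fd entry at the boundary but only build it through typed constructors. A minimal sketch, with a simplified hypothetical `raw_sqe` and made-up opcode values (not io_uring's real numbers or struct):

```c
#include <assert.h>
#include <stdint.h>

/* Raw, untyped entry in the spirit of the quoted fields (simplified). */
struct raw_sqe {
    uint8_t opcode;
    int32_t fd;
    uint64_t addr;
    uint32_t len;
};

enum { OPC_READ = 1, OPC_CLOSE = 2 };   /* hypothetical opcode values */

/* Distinct wrapper types: passing a socket_fd where a file_fd is expected
   is a compile error, even though both just hold an int. */
typedef struct { int32_t fd; } file_fd;
typedef struct { int32_t fd; } socket_fd;

/* Typed constructors are the only code that touches opcode/fd directly. */
static struct raw_sqe prep_read(file_fd f, void *buf, uint32_t len) {
    return (struct raw_sqe){ OPC_READ, f.fd, (uint64_t)(uintptr_t)buf, len };
}

static struct raw_sqe prep_close_socket(socket_fd s) {
    return (struct raw_sqe){ OPC_CLOSE, s.fd, 0, 0 };
}
```

Calling `prep_read` with a `socket_fd` would be rejected by the compiler; the untyped bits only exist at the point where the entry is handed to the kernel.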


> Is it just me, or are these two seemingly unrelated schools of computing converging on the exact same idea? You send messages to some target (ie, the file descriptor).

Seems to me that it's a matter of perspective, specifically about who you're sending messages to.

You can consider the message as being sent to the file (descriptor).

You can also consider it a message sent to the kernel, in which case the kernel is an actor and the file is a passive data abstraction.

Both are accurate and useful; which one makes more sense will depend on context and your own background.


I do think there's something to be said for files being seen not as things that can be "owned" and "manipulated" and "locked" by a main process, but rather as their own independent services, in a unified registry, who can only return results asynchronously, and only over an external API - a la the Amazon Memo (https://chrislaing.net/blog/the-memo/). Whether there's an API broker (in this case, the kernel) is just an implementation detail of the "network" layer.

I've said this before, and I'll say it again: On a long enough timescale, everything converges to Erlang.

Does anybody know if macOS / iOS uses something similar to io_uring?


Not really, it depends on the OS X version, and the one used by Swift has multiple implementations.

Also, it isn't regular pthreads on OS X, but rather Apple's own flavour.

Described in the Internals section.


No. The only thing needed is Capabilities and we're done.

Files actually want to be microservices.

Oh, file descriptors, yeah. Not files themselves, that would be weird.

The file itself would be the state of the actor. So it makes sense to me (if I were to implement a file-system with actors).

Files and directories are actors in KeyKOS, which is one of those experiments someone did once that's worth reading about for new ideas.


