> Conceptually, during a door invocation the client thread that issues the door procedure call migrates to the server process associated with the door, and starts executing the procedure while in the address space of the server. When the service procedure is finished, a door return operation is performed and the thread migrates back to the client's address space with the results, if any, from the procedure call.
Note that Server/Client refer to threads on the same machine.
While I can see the performance benefits of this approach over traditional IPC (sockets, shared memory), it "opens the door" to potentially worse concurrency headaches than the ones you already have with threads you spawn and control yourself.
Has anyone here hands-on experience with these and can comment on how well this worked in practice?
IIUC, what they mean by "migrate" is that the client thread is paused and the server thread is given the remainder of the time slice, similar to how pipe(2) originally worked in Unix and even, I think, early Linux. It's the flow of control that "conceptually" shifts synchronously. This can provide surprising performance benefits in a lot of RPC scenarios, though less so now that TLB flushing and the like as part of a context switch have become more costly. There are no VM shenanigans except for some page-mapping optimizations for passing large chunks of data, which apparently weren't even implemented in the original Solaris implementation.
The kernel can spin up a thread on the server side, but this works just like common thread pool libraries, and I'm not sure the kernel has any special role here except to optimize context switching when there's no spare thread to service an incoming request and a new thread needs to be created. With a purely userspace implementation there may be some context switch bouncing unless an optimized primitive (e.g. some special futex mode, perhaps?) is available.
Other than maybe the file namespace attaching API (not sure of the exact semantics), and presuming I understand properly, I believe Doors, both functionally and the literal API, could be implemented entirely in userspace using Unix domain sockets, SCM_RIGHTS, and mmap. It just wouldn't have the context switching optimization without new kernel work. (See the switchto proposal for Linux from Google, though that was for threads in the same process.)
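For instance, the descriptor-passing piece of such a userspace emulation - handing an open fd to the peer over a Unix domain socket - is standard SCM_RIGHTS usage. A minimal sketch (send_fd() is just an illustrative helper name, not part of any doors API):

    #include <string.h>
    #include <sys/socket.h>
    #include <sys/uio.h>

    /* Pass an open file descriptor across an AF_UNIX socket via SCM_RIGHTS. */
    static int send_fd(int sock, int fd)
    {
        char dummy = 'd';
        struct iovec iov = { .iov_base = &dummy, .iov_len = 1 };
        union {
            struct cmsghdr hdr;
            char buf[CMSG_SPACE(sizeof(int))];
        } ctl;
        struct msghdr msg = { 0 };
        struct cmsghdr *cmsg;

        msg.msg_iov = &iov;
        msg.msg_iovlen = 1;
        msg.msg_control = ctl.buf;
        msg.msg_controllen = sizeof(ctl.buf);

        cmsg = CMSG_FIRSTHDR(&msg);
        cmsg->cmsg_level = SOL_SOCKET;
        cmsg->cmsg_type = SCM_RIGHTS;
        cmsg->cmsg_len = CMSG_LEN(sizeof(int));
        memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

        return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
    }

The context-switch optimization is the part this can't reproduce; the receiving process still has to be scheduled to service the request.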
There isn't a door_recv(2) system call or equivalent.
Doors truly don't transfer messages, they transfer the thread itself. As in the thread that made a door call is now just directly executing in the address space of the callee.
> Doors truly don't transfer messages, they transfer the thread itself. As in the thread that made a door call is now just directly executing in the address space of the callee.
In somewhat anachronistic verbiage (at least in a modern software context) this may be true, but today this statement makes it sound like code from the caller process is executing in the address space of the callee process, such that miraculously the caller code can now directly reference data in the callee. AFAICT that just isn't the case, and wouldn't even make sense--i.e. how would it know the addresses without a ton of complex reflection that's completely absent from example code? (Caller and callee don't need to have been forked from each other.) And according to the Linux implementation, the "argument" (a flat, contiguous block of data) passed from caller to callee is literally copied, either directly or by mapping in the pages. The caller even needs to provide a return buffer for the callee's returned data to be copied into (unless it's too large, in which case it's mapped in and the return argument vector updated to point to the newly mmap'd pages). File descriptors can also be passed, and of course that requires kernel involvement. (A sketch of this calling pattern follows below.)
AFAICT, the trick here pertains to scheduling alone, both wrt the hardware and software systems. I.e. a lighter-weight interface to the hardware task gating mechanism, like you say, reliant on the synchronous semantics of this design to skip involving the system scheduler. But all the other process attributes, including the address space, are switched out, perhaps in an optimized manner as mentioned elsethread, but still preserving typical process isolation semantics.
If I'm wrong, please correct me with pointers to more detailed technical documentation (or code--is this still in illumos?) because I'd love to dig more into it.
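For reference, the calling pattern described above looks roughly like this with the Solaris doors API (a hedged sketch: the door path and buffer sizes are made up, and it would need -ldoor to build on Solaris/illumos):

    #include <door.h>
    #include <fcntl.h>
    #include <stdio.h>

    int main(void)
    {
        char request[] = "ping";
        char reply[128];                /* return buffer supplied by the caller */
        door_arg_t arg = {
            .data_ptr  = request,
            .data_size = sizeof(request),
            .desc_ptr  = NULL,          /* optional file descriptors to pass */
            .desc_num  = 0,
            .rbuf      = reply,
            .rsize     = sizeof(reply),
        };

        int d = open("/tmp/example_door", O_RDONLY);  /* made-up door path */
        if (d < 0 || door_call(d, &arg) < 0) {
            perror("door_call");
            return 1;
        }
        /* If reply was too small, the results are mapped in instead and
           arg.rbuf/arg.data_ptr are updated to point at the new pages. */
        printf("server replied: %.*s\n", (int)arg.data_size, arg.data_ptr);
        return 0;
    }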
I didn't imply that the code remains and it's only data that is swapped out. The thread jumps to another complete address space.
It's like a system call instruction that instead of jumping into the kernel, jumps into another user process. There's a complete swap out of code and data in most cases.
Just as the kernel doesn't need a thread pool to respond to user requests via system calls, the same applies here. The calling thread is just directly executing in the callee's address space after door_call(2).
> Did you mean door_call or door_return instead of door_recv?
I did not. I said there is no door_recv(2) system call. The 'server' doesn't wait for messages at all.
I think what doors do is rendezvous synchronization: the caller is atomically blocked as the callee is unblocked (and vice versa on return). I don't think there is an efficient way to do that with just plain POSIX primitives or even with Linux specific syscalls (Binder and io_uring possibly might).
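For comparison, a plain-POSIX approximation of that rendezvous with two semaphores (a sketch, assuming both live in memory shared between client and server, e.g. set up with sem_init(..., 1, 0)) needs two separate wakeups, each going through the scheduler - which is exactly the inefficiency in question:

    #include <semaphore.h>

    struct rendezvous {
        sem_t request_ready;   /* posted by client, awaited by server */
        sem_t reply_ready;     /* posted by server, awaited by client */
    };

    /* Client side of a "call": wake the server, then block for the reply.
       Unlike a door call, these are two separate scheduling events. */
    static void call(struct rendezvous *r)
    {
        sem_post(&r->request_ready);
        sem_wait(&r->reply_ready);
    }

    /* Server loop: block for a request, handle it, wake the client. */
    static void serve(struct rendezvous *r)
    {
        for (;;) {
            sem_wait(&r->request_ready);
            /* ... handle the request ... */
            sem_post(&r->reply_ready);
        }
    }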
The thread in this context refers to the kernel scheduler thread[1], essentially the entity used to schedule user processes. By migrating the thread, the calling process is "suspended": its associated kernel thread (and thus scheduled time quanta, run queue position, etc.) saves the state into a Door "shuttle", picks up the server process, and continues execution in the server procedure; when the server process returns from the handler, the kernel thread picks up the Door "shuttle", restores the right client process state from it, and lets it continue - with the result of the IPC call.
This means that when you make a Door IPC call, the service routine is called immediately, not at some indefinite point in the future when the server process gets picked by the scheduler to run and finds an event waiting for it on a select/poll kind of call. If the service handler returns fast enough, it might even return before the client process's scheduler timeslice ends.
The rapid changing of the TLB etc. is mitigated by hardware features in the CPU that permit faster switches, something Sun already had experience with at the time from the Spring Operating System project - from which Doors IPC in fact came to be. Spring IPC calls were often faster than normal x86 syscalls at the time (timings just on the round trip: 20us for a typical syscall on a 486DX2, 11us for Spring IPC on a SPARCstation, >100us for Mach syscall/IPC).
EDIT:
[1] Some might remember references to 1:1 and M:N threading in the past, especially in discussions about threading support in various unices, etc.
The "1:1" originally referred to relationship between "kernel" thread and userspace thread, where kernel thread didn't mean "posix like thread in kernel" and more "the scheduler entity/concept", whether it was called process, thread, or "lightweight process"
Sounds like Android's binder was heavily inspired by this. Works "well" in practice in that I can't recall ever having concurrency problems, but I would not bother trying to benchmark the efficiency of Android's mess of abstraction layers piled over `/dev/binder`. It's hard to tell how much of the overhead is required to use this IPC style safely, and how much of the overhead is just Android being Android.
Not sure which one came first, but Binder is a direct descendant (down to sometimes still-matching symbol names and calls) of the BeOS IPC system. All the low-level components (Binder, Looper, even the serialization model) come from there.
From what I understand, Sun made their Doors concept public in 1993 and shipped a SpringOS beta with it in 1994, before BeOS was released, but it's hard to tell if Sun inspired BeOS, or if this was a natural solution to a common problem that both teams ran into at the same time.
I'd expect convergent evolution - both BeOS team and Spring team were very well aware of issues with Mach (which nearly single-handedly coined the idea that microkernels are slow and bad) and worked to design better IPC mechanisms.
Sharing of the scheduler slice is an even older idea, AFAIK, and technically something already done whenever you call into the kernel (it's not a context switch to a separate process, it's a switch to a different address space but running in the same scheduler thread).
Binder has been in the mainline kernel for years, and some projects ended up using it, if only to emulate an Android environment - both Anbox and its (AFAIK) successor Waydroid use the native kernel Binder to operate.
You can of course build your own use of it (depending on what exactly you want to do, you might end up writing your own userland instead of using Android's).
As far as I understand, it is already mainlined, it's just not built by "desktop" distributions since nobody really cares - all the cool kids want dbusFactorySingletonFactoryPatternSingletons to undo 20 years of hardware performance increases instead.
The Be Book, the Haiku source code, and yes, Android low-level internals docs.
A quick look through the BeOS and Android Binder-related APIs will quickly show how the Android side is derived from it (through OpenBinder, which was for a time going to be used in the next Palm system based on Linux - at least one of them).
Think of it in terms of REST. A door is an endpoint/path provided by a service. The client can make a request to it (call it). The server can/will respond.
The "endpoint" is set up via door_create(); the client connects by opening it (or receiving the open fd in other ways), and make the request by door_call(). The service sends its response by door_return().
Except that the "handover" between client and service is inline and synchronous; "nothing ever sleeps" in the process. The service needn't listen for and accept connections. The operating system "transfers" execution directly - it context switches to the service, runs the door function, and context switches back to the client on return. The "normal" scheduling (where the server/client sleeps, becomes runnable from pending I/O and is eventually selected by the scheduler) is bypassed here, and latency is lower.
Purely functionality-wise, there's nothing you can do with doors that you couldn't do with a (private) protocol across pipes, sockets, HTTP connections. You "simply" use a faster/lower-latency mechanism.
(I actually like the "task gate" comparison another poster made, though doors do not require a hardware-assisted context switch)
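To make that door_create()/door_call()/door_return() flow concrete, here is a minimal server-side sketch assuming the Solaris doors API (the path /tmp/example_door and the echo behaviour are made up; build with -ldoor on Solaris/illumos):

    #include <door.h>
    #include <fcntl.h>
    #include <stdio.h>
    #include <stropts.h>   /* fattach() */
    #include <unistd.h>

    /* Service procedure: invoked directly by a client's door_call(). */
    static void service(void *cookie, char *argp, size_t arg_size,
                        door_desc_t *dp, uint_t n_desc)
    {
        /* Echo the request back; door_return() does not return on success. */
        door_return(argp, arg_size, NULL, 0);
    }

    int main(void)
    {
        int d = door_create(service, NULL, 0);
        if (d < 0) { perror("door_create"); return 1; }

        /* Publish the door in the file namespace so clients can open() it. */
        close(open("/tmp/example_door", O_CREAT | O_RDWR, 0644));
        if (fattach(d, "/tmp/example_door") < 0) { perror("fattach"); return 1; }

        pause();   /* no accept()/recv() loop: calls jump straight into service() */
        return 0;
    }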
Well, Doors' speed was derived from hardware-assisted context switching, at least on SPARC. The combination of ASIDs (which allowed task switching with reduced TLB flushing) and the WIM register (which marked which register windows are valid for access by userspace) meant that IPC speed could be greatly increased - in fact that was the basis for the "fast path" IPC in Spring OS, from which Doors were ported into Solaris.
I was (more) of a Solaris/x86 kernel guy on that particular level and know the x86 kernel did not use task gates for doors (or for any context switching other than the double fault handler). Linux did task switching via task gates on x86 until 2.0, IIRC. But then, hardware assist or not, x86 task gates "aren't that fast".
The SPARC context switch code, to me, always was very complex. The hardware had so much "sharing" (the register window set could be split among multiple owners, as could the TSB/TLB, and the "MMU" was elaborate software in sparcv9 anyway). SPARC's achilles heel always was the "spills" - needless register window (and other CPU state) saves to/from memory. I'm kinda still curious from a "historical" point of view - thanks!
The historical point was that for Spring OS "fast path" calls, if you kept the register stack small enough, you could avoid spilling at all.
Switching from task A to task B to service a "fast path" call AFAIK (I have no access to the code) involved using the WIM register to mark the windows used by task A as invalid (so their use would trigger a trap) and changing the ASID value - so if task B was already in the TLB you'd avoid flushes, or reduce them to flushing only when running out of TLB slots.
The "golden" target for fast-path calls was calls that would require as little stack as possible, and for common services they might be even kept hot so they would be already in TLB.
So if I understand it correctly, the IPC advantage is that they preserve registers across the process context switch, thereby avoiding having to do notoriously expensive register saves and restores? In effect, leaking register contents across the context switch becomes a feature instead of a massive security risk. Brilliant!
Why would you care who spawned the thread? If your code is thread-safe, it shouldn't make a difference.
One potential problem with regular IPC I see is that it's nondeterministic in terms of performance/throughput because you can't be sure when the scheduler will decide to run the other side of whatever IPC mechanism you're using. With these "doors", you bypass scheduling altogether, you call straight "into" the server process thread. This may make a big difference for systems under load.