I've been intending to try process-to-process message passing using a lock-free data structure very much like io_uring where the processes share pages (one page read-write, the other page readonly, and inverse for the other process) on hardware with L2 tightly-integrated memory (SiFive FU740). The result being zero-copy message passing at L2 cache speeds. [for an embedded realtime custom OS, not for linux]
Which makes me wonder if Mach wasn't right all along.