A FOSS proposal for a new type of OS for a new type of computer (2021) (fosdem.org)
162 points by gjvc on March 26, 2022 | 76 comments



When a computer's permanent storage is all right there in the processors' memory map, there is no need for disk controllers or filesystems. It's all just RAM.

And errors are forever.

There were once LISP, Smalltalk, and Forth environments where you saved the entire state of the system, rather than using files. This was not a good thing. Only one person could really work on something. If you messed up the image, undoing the mess was hard. Revision control? What's that?

Progress has been made by learning how to minimize and discard state. What makes the web backend application industry go is that almost all the state is in the database. Database systems are well debugged and reliable today. This allows for mediocre quality in web apps.

Containers are another example of discarding state. Don't sysadmin, just flush and reload. So is the transition from object-oriented to functional programming.

No, making memory persistent is not the answer.

We do need to be thinking about new OS designs, because the hardware world is changing.

- Lots of RAM. Swapping out to disk, even SSD, is so last-cen. The moment that happens performance degrades so badly you may as well restart.

- Lots of CPUs. There are now 128-core CPUs. NVidia just announced a 144-core ARM CPU. The OS and hardware need to be really serious about interprocess communication. It's an afterthought in Unix/Linux. Needs to be at least as good as QNX. Probably better, and with hardware support.

- Different kinds of compute units. GPUs need to be first-class players in the OS. There are going to be more special purpose devices, for machine learning.

- SSD is fast enough that buffering SSD in RAM is mostly unnecessary.


> SSD is fast enough that buffering SSD in RAM is mostly unnecessary.

A very strong claim, which I'd contest depending on what you're caching. If it's application code then probably, but if you've got lots of data which you're expecting to re-read, such as a database with many gigs, I really disagree with that. If an SSD had, say, 1 gig/sec bandwidth for reading, that's about 22 times slower than my arthritic 7-year-old dual-channel RAM bandwidth[1]. A more modern, higher-spec server will have more channels (and hopefully more modern, faster RAM).

Can you elaborate a bit on your thoughts?

[1] and the path to ram may not have as many page faults -> kernel switches as reading from an SSD, but I may be wrong and it may not matter


https://www.club386.com/silicon-motion-pcie-5-0-nvme-ssds-pr... says 14 GB/s.

And of course the sum of bandwidths within a set of NVDIMMs can easily be much higher than that.


> Lots of RAM. Swapping out to disk, even SSD, is so last-cen. The moment that happens performance degrades so badly you may as well restart.

RAM is the new disk. These days we try to make codes live out of CPU cache as much as possible, scheduling the RAM access the way we used to schedule disk access, which is effective -- plus ça change, plus c'est la même chose.

> SSD is fast enough that buffering SSD in RAM is mostly unnecessary.

This is not even approximately true in practice for many applications. Striping an array of NVMe storage devices is still about an order of magnitude less bandwidth than RAM. Storage I/O is also high latency and highly concurrent; if your storage I/O is zero-copy then it implies locking up considerable amounts of RAM just to schedule the I/O transactions safely, never mind caching. Modern database kernels require clever I/O schedulers and large amounts of RAM to keep throughput from degrading to the throughput of even very fast storage hardware. And this assumes your storage code is aware of the peculiarities of SSD (inconvenient mix of block sizes, no overwrite, garbage collection, etc).

That said, you obliquely raise a valid point. Buffering storage in a cache usually doesn't work well once the storage:cache ratios become high enough, and the 1000:1 ratios seen in modern servers are well past that threshold. In these cases, you still need a lot of RAM but you end up redeploying it to other parts of the kernel architecture where it can be more effective than a buffer cache for storage.


“ If you messed up the image, undoing the mess was hard. Revision control? What's that?”

VisualAge for Smalltalk provided built-in version control.


More importantly, so does Squeak.


And NewSpeak, which I mentioned in the talk.

I didn't submit this, but I did the original talk.

BTW the slideshow is just eyecandy and you can't get anything very useful from it alone... but there's a script on my blog.


Multiuser systems with orthogonal persistence, like the single-user Smalltalk and image-based Lisps you mention, include EUMEL, L3, KeyKOS, OS/400 (iSeries), and I think Multics. (The Forths I know of were never image-based, so I can't comment on them.) It is definitely not the case that "only one person could really work on something [at a time]" on EUMEL, L3, KeyKOS, and OS/400. Maybe only one person could hack on the TCB at a time, but EUMEL, L3, KeyKOS, and OS/400 had extremely tiny TCBs, for that reason among others.

Having a single-level store doesn't save you from minimizing and discarding state. It just (mostly) decouples discarding state from power failures, making it intentional rather than accidental. In today's battery-powered environments, this is probably a better idea than ever. KeyKOS and EUMEL still had named files, containing sequences of bytes, in directories. They just weren't coupled to the disk device drivers. OS/400 has files, too. Smalltalk and Lisp environments were more radical in this sense.

Also, Smalltalk has had version control within its images for decades.

Stephen White and Pavel Curtis's MOO also had transparent persistence, didn't have files, and supported massively multiuser access; possibly a MOO server has supported more concurrent users than any AS/400, and certainly more than any EUMEL, L3, or KeyKOS system ever did. (LambdaMOO has 48 users online right now according to http://mudstats.com/Browse but had many more at its peak.) But, while MOO was programmable, it never supported high-performance or high-reliability computing as those other systems did. You could think of it as being written in a groupware scripting DSL, like a centralized Lotus Notes. The most complex MOO software I know of was a Gopher client, a "Gopher slate" that multiple people could use simultaneously. It used cooperative multitasking with timeouts that would abort long-running event handlers without rolling back their incomplete effects.

I agree, though, that a single-level store on its own doesn't solve the problems OSes are facing. But it might be useful. We have many layers of software on top of everything that accesses persistent data being built on the assumption that anything persistent is necessarily slow enough to cover millions of nanoseconds of processing. That greatly reduces the benefits of the enormous improvement SSDs represent over spinning rust.

I'd say that buffering SSD in RAM goes beyond "mostly unnecessary": it needs to be done in hardware (as with NVDIMMs) to not hurt performance in many cases. FRAM is even more extreme on this axis.

Another interesting question to me is migration. If I'm editing a Jupyter notebook on my laptop, it'd be nice to move it onto my cellphone and keep editing it when I travel, and it'd be nice to move the computations onto heavy servers when I'm connected to the internet. Lacking this, we seem to be moving to warehouse-scale computing: everybody runs all their notebooks on Google App Engine so they can use TPUs and access them from anywhere.

Related to migration is UI remoting. Why can't I use two cellphones together to access the same app, or two cellphones and a mouse as I/O devices to the same music synthesizer (running on my laptop)? Accelerometers and multitouch could be useful for a lot of things. Maybe ROS (the Willow Garage one, not the BBS software) has the right approach here?

Also, why is everything so goddamn slow and unreliable? Ivan Sutherland had a VR wireframe cube running on a head-mounted display more than 50 years ago. Why are these supercomputers in our pockets still too slow to run VR with low enough latency that people don't throw up? Why does it take multiple minutes to start up again if its battery runs down and I plug it in? Why does the alarm app on my cellphone refuse to launch when its Flash fills up? Why does my laptop become unresponsive and require rebooting when a web page uses too much RAM?

shakes fist at cloud

Anyway maybe I'll have a running prototype of a capability-based single-level-store operating system using optimistically-synchronized software transactional memory to guarantee real-time responsivity without an MMU or lots of concurrency bugs, I don't know, sometime next week. Or maybe not.


> but EUMEL, L3, KeyKOS, and OS/400 had extremely tiny TCBs

L3's TCB is indeed quite small, and I don't know enough about EUMEL and KeyKOS to comment – but is OS/400's TCB really "extremely tiny"? On the contrary, my impression is that it is quite expansive. When, in the mid-1990s, as part of the AS/400's CISC-to-RISC transition, IBM rewrote the OS/400's "Licensed Internal Code" layer from PL/MP (an IBM-internal-use-only PL/I dialect) into C++, the result was over 1 million lines of code [0]; the original PL/MP-based version had a roughly similar line count. A long way off "extremely tiny".

Part of the TCB for OS/400 (and contemporary IBM i) is the compiler backend – the compiler frontends compile to a common intermediate language (MI or TIMI), and the backend compiles that intermediate language into machine code for the underlying machine architecture (IMPI for the original CISC AS/400s, Power from the mid-1990s onwards). This is because the traditional OS/400 architecture has all processes executing in a single address space, and security is enforced at the intermediate language level, not by the hardware, making that compiler backend security-critical. It is hard to have an "extremely tiny" TCB when the compiler backend has to belong to it. Another factor making the TCB big is that IBM went for a "feature-rich"/"CISC-like" intermediate language rather than a more spartan one which would have pushed more functionality above the TCB.

[0] Berg, W., Cline, M., & Girou, M. (1995). Lessons learned from the OS/400 OO project. Communications of the ACM, 38(10), 54–64. doi:10.1145/226239.226253


I had the idea that the IMPI-code-emitting backend was entirely responsible for isolation between tasks, and that it was quite small and simple. It sounds like neither of those is correct?

Similar sorts of virtual machines implemented as interpreters are commonly a few hundred to a few thousand machine instructions.


It isn't small and simple, because keeping the underlying VM small and simple was never one of IBM's design goals. Instead, IBM made it feature-rich. Part of that was the influence of CISC design philosophy – if your hardware ISAs are rich, it is natural that your virtualised software ISA becomes rich as well, even richer (since richness is cheaper in software than hardware). Some would also suggest IBM made it complicated to try to prevent anyone from cloning it; if that was their plan, it was pretty successful: nobody has ever tried to clone the whole thing, although there have been partial emulations of certain limited subsets, such as the RPG compiler and facilities commonly used by RPG programs.

Have a look at the MI instruction set [0]: https://www.ibm.com/docs/en/i/7.2?topic=interface-machine-in...

You will see instructions like MATUP (Materialize User Profile): https://www.ibm.com/docs/en/i/7.2?topic=instructions-materia...

So stuff like user accounts is directly implemented in the VM.

You'd think in a capability-based operating system, you'd make the underlying VM agnostic as to stuff like user accounts – instead you'd have some kind of "security server" running as a process on top of the VM which generates "user ID capabilities" (it being the only process with the capability to generate those capabilities), and the VM itself doesn't treat those capabilities specially. But no, that is not how IBM designed AS/400 (or its System/38 predecessor.)

[0] Technically that's not the current MI, that's "old MI" (OMI). The "new MI" (NMI, aka W-code), introduced with "ILE" (Integrated Language Environment) in the mid-1990s is undocumented (although possibly IBM will share the documentation for it if you pay them $$$$ and sign an NDA). OMI is translated to NMI, and then NMI is in turn translated to POWER machine code; the current IBM compilers target NMI directly, whereas legacy IBM (or third-party) compilers output OMI. However, some of the NMI compilers, such as the ILE C compiler, support embedding OMI instructions; NMI supports most OMI instructions, although effectively it models them as built-in functions.


Thank you very much for the corrections!


> It just (mostly) decouples discarding state from power failures, making it intentional rather than accidental.

Just wondering, but how is an OS supposed to achieve this without fsync()ing all writes to memory and being slow as dog$h!t? Having a "single level" of storage means it becomes harder to tell what might be purely ephemeral, and can thus be disregarded altogether if RAM or disk caches are lost in a failure. You might think you can do this via new "volatile segments" API's but then you're just reintroducing the volatile storage that you were trying to get rid of.


> Just wondering, but how is an OS supposed to achieve this without fsync()ing all writes to memory and being slow as dog$h!t?

Separate from the persistent storage thing, I've wanted traditional file systems that work like this:

* Unit files. The unit of file consistency is the file. The default case. Open, create, close, no other open can read it before close, on close it replaces the old file for all later opens. On crashes, you always get a consistent file, the old version if the close hadn't committed yet.

* Log files. The unit of file consistency is one write. Append only. On crashes, you get a consistent file up to the end of a previous write. No trailing off into junk.

* Temp files. There is no file consistency guaranteed. Random access. On crashes, deleted.

* Managed files. A more complex API is available. Random access. On a write, you get two completion events. The first one is "buffer has been copied". The second is "data is safely in permanent storage". This is for databases, which really care about when the data is safely stored. The second full completion event might also be offered for other file types, primarily log files.

That covers the standard use cases and isn't too hard to do.
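
To make that concrete, here's a rough sketch of what the four types might look like as separate APIs (every type and function name below is invented for illustration; nothing like this exists in POSIX):

    /* Hypothetical sketch of the four file types as purpose-built APIs.
       Opaque handles; error handling elided. */
    #include <stddef.h>
    #include <sys/types.h>

    typedef struct unit_file    unit_file;
    typedef struct log_file     log_file;
    typedef struct temp_file    temp_file;
    typedef struct managed_file managed_file;

    /* Unit files: the whole file is the unit of consistency. The new
       version is invisible until commit, then atomically replaces the
       old one; a crash leaves the old version intact. */
    unit_file *unit_create(const char *name);
    int        unit_write(unit_file *f, const void *buf, size_t len);
    int        unit_commit(unit_file *f);   /* old file replaced here   */
    int        unit_abort(unit_file *f);    /* discard; old file stays  */

    /* Log files: append-only; each append is the unit of consistency.
       After a crash the file ends on a previous append boundary. */
    log_file *log_open(const char *name);
    int       log_append(log_file *f, const void *record, size_t len);

    /* Temp files: random access, no durability; deleted on crash. */
    temp_file *temp_create(void);
    ssize_t    temp_pwrite(temp_file *f, const void *buf, size_t len, off_t off);

    /* Managed files: random access with two completion events per write,
       for databases that care exactly when data is durable. */
    typedef void (*io_done)(void *ctx);
    managed_file *managed_open(const char *name);
    int managed_pwrite(managed_file *f, const void *buf, size_t len, off_t off,
                       io_done buffered,   /* 1st: buffer has been copied  */
                       io_done durable,    /* 2nd: data is on stable media */
                       void *ctx);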


As additional support for this thesis, the interface to the logical (sometimes physical) file system inside many database kernels is almost exactly this. All four file types have a concrete use case that requires their existence.

One of the things that makes them easier to use in a database kernel context is that even though they are all built on top of the same low-level storage primitives, each file type usually has its own purpose-built API instead of a generic overloaded file API, which would get messy very quickly. A file's type never changes and the set of types is enumerable, so this is clean.

The only thing keeping this from being a user space thing is that there is no practical way of making this POSIX compliant. POSIX has all but permanently frozen what is feasible in a file system. Many database kernels either work off raw block devices or an emulated raw block device on top of a POSIX file system, on which they build a file system, but it is never exposed to the world, which eliminates any requirement for compliance with external standards. Imagine how much easier it would be to build things like databases if this was a first-class API.


Not being POSIX compliant is actually needed for a modern OS. With POSIX you'll never get to safe concurrency, so you can hardly make use of all the CPU cores.

It must be a microkernel, supporting only safe languages. Certainly not Rust.

Genode is close, Pony was close, Fuchsia is getting there, Singularity was there already. Concurrent Pascal also.


Hmm, are you saying POSIX compliance makes scalability impossible? Because it makes it possible to run software written in unsafe languages? I'm pretty sure Fuchsia will always be able to run software written in C.


Not impossible, but hard. But more importantly, not safe.

Fuchsia supports C++, sure. Fuchsia is not a safe OS; just like Rust, it's a way to better safety. Dart for Flutter is their safe language there.


Sounds reasonable. I didn't know about Dart for Flutter!


I agree that this would be useful. Have you ever written a prototype of it?


> The unit of file consistency is the file. The default case. Open, create, close, no other open can read it before close

"$PROGRAM cannot open document $FILENAME because it's already open in $PROGRAM. Try closing any open applications and open $FILENAME again." Who thinks this is actually sane behavior? Unix does it right.


> "$PROGRAM cannot open document $FILENAME because it's already open in $PROGRAM. Try closing any open applications and open $FILENAME again." Who thinks this is actually sane behavior? Unix does it right.

Wouldn't a better approach be: "$FILENAME is already open in $PROGRAM. Would you like to: (1) Switch to $PROGRAM and continue editing it there; (2) Ask $PROGRAM to close it; (3) Edit it anyway (DANGEROUS!); (4) Abort". (For non-interactive processes, you wouldn't want to ask, you'd probably just want to always abort–which causes problems for some situations like "I want to install an upgrade to this shared library but $PROGRAM is executing using it", but there are ways to solve those problems.)

Kind of like swap files in vim – although vim doesn't have the "switch to $PROGRAM" option, because the underlying operating systems don't have good support for it – but that's an application-level feature which only works in one app. It would be better if it was an OS-level feature which worked across all apps.


Instead of giving you a mutual exclusion error, a newly created file written this way might simply be invisible until it's complete, at which point it atomically replaces the old version of the file (which might still be accessible through mechanisms like VMS version numbers, WAFL snapshots, Linux reflinks, or simply having opened it before the replacement). This is probably what a transactional filesystem would give you: the creation, filling, and naming of the file would normally be part of the same transaction.

But, with optimistic synchronization, transactions that were reading an old version of the file might require redoing from the start when a new version was written.
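
For what it's worth, the closest approximation with today's POSIX primitives is the usual write-to-a-temporary-name-then-rename() dance; a rough sketch (the file names are made up):

    /* Sketch: the new version is invisible under a temporary name until
       it is complete and durable, then rename() atomically swaps it in.
       Readers see either the old file or the new one, never a torn mix. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    static int replace_file(const char *path, const void *data, size_t len) {
        char tmp[4096];
        snprintf(tmp, sizeof tmp, "%s.tmp", path);     /* illustrative naming  */

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) return -1;
        if (write(fd, data, len) != (ssize_t)len ||    /* fill the new version */
            fsync(fd) < 0) {                           /* make it durable first */
            close(fd);
            unlink(tmp);
            return -1;
        }
        close(fd);
        return rename(tmp, path);                      /* the atomic "commit"  */
    }

    int main(void) {
        const char *doc = "new contents\n";
        return replace_file("report.txt", doc, strlen(doc)) == 0 ? 0 : 1;
    }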


Both KeyKOS and EUMEL used periodic globally consistent checkpoints supplemented by an explicit fsync() call for use on things that required durability like database transaction journals. So when the system crashed you would lose your last few minutes of work; EUMEL ran on floppies, so the interval could be as long as 15 minutes, but I think even KeyKOS only took a checkpoint every minute or so, perhaps because it would pause the whole system for close to a second on the IBM 370s it ran on. (Many years later, Norm Hardy showed me a SPARC workstation with I think 32 or 64 MiB of RAM; he said it was the biggest machine KeyKOS had ever run on, twice the size of the mainframe that was the previous record holder. Physically it was small enough to hold in one hand.)

I assume L3 used the same approach as EUMEL. I don't know enough about OS/400 or Multics to comment intelligently, but at least OS/400 is evidently reasonably performant and reliable, so I'd be surprised if it isn't a similar strategy.

There's a latency/bandwidth tradeoff: if you take checkpoints less often, more of the data you would have checkpointed gets overwritten, so your checkpoint bandwidth gets smaller. NetApp's WAFL (not a single-level store, but a filesystem using a similar globally-consistent-snapshot strategy) originally (01996) took a checkpoint ("snapshot") once a second and stored its journals in battery-backed RAM, but with modern SSDs a snapshot every 10 milliseconds or so seems feasible, depending on write bandwidth.

Most of these systems did have "volatile segments" actually, but not for the reason you suggest, which is performance; rather, it was for device drivers. You can't swap out your device drivers (in most cases) and they don't have state that would be useful across power failures anyway. So they would be, as EUMEL called it, "resident": exempted from both the snapshot mechanism and from paging.
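
To make the mechanism concrete, here's a toy user-space sketch of incremental checkpointing using mprotect() write-tracking, so each checkpoint writes only the pages dirtied since the last one. KeyKOS and EUMEL did this inside the kernel; everything below (file name, page counts) is just an illustration of the idea, not of their implementations.

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <signal.h>
    #include <stdint.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    #define NPAGES 64
    static long          pagesz;
    static char         *region;            /* the "single-level store" */
    static volatile char dirty[NPAGES];

    /* First write to a clean page faults; mark it dirty, let it proceed. */
    static void on_fault(int sig, siginfo_t *si, void *uc) {
        (void)sig; (void)uc;
        char *page = (char *)((uintptr_t)si->si_addr & ~(uintptr_t)(pagesz - 1));
        dirty[(page - region) / pagesz] = 1;
        mprotect(page, pagesz, PROT_READ | PROT_WRITE);
    }

    /* Write only the dirty pages, make them durable, re-arm tracking. */
    static void checkpoint(int fd) {
        for (long i = 0; i < NPAGES; i++) {
            if (!dirty[i]) continue;
            pwrite(fd, region + i * pagesz, pagesz, i * pagesz);
            dirty[i] = 0;
        }
        fsync(fd);
        mprotect(region, NPAGES * pagesz, PROT_READ);
    }

    int main(void) {
        pagesz = sysconf(_SC_PAGESIZE);
        region = mmap(NULL, NPAGES * pagesz, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (region == MAP_FAILED) return 1;

        struct sigaction sa = {0};
        sa.sa_sigaction = on_fault;
        sa.sa_flags = SA_SIGINFO;
        sigemptyset(&sa.sa_mask);
        sigaction(SIGSEGV, &sa, NULL);
        mprotect(region, NPAGES * pagesz, PROT_READ);      /* start tracking */

        int fd = open("checkpoint.img", O_RDWR | O_CREAT, 0600);
        if (fd < 0) return 1;

        for (int tick = 0; tick < 3; tick++) {
            strcpy(region + tick * pagesz, "some mutated state");
            checkpoint(fd);                    /* writes exactly one page */
        }
        close(fd);
        return 0;
    }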


> So when the system crashed you would lose your last few minutes of work

So, pretty much how disk storage is managed today? But then if you're going to checkpoint application programs from time to time, it might be better to design them so that they explicitly serialize their program state to a succinct representation that they can reload from, and only have OS-managed checkpoint and restore as a last resort for programs that aren't designed to do this... which is again quite distinct from having everything on a "single level".

> but with modern SSDs a snapshot every 10 milliseconds or so seems feasible, depending on write bandwidth.

Not so. Even swapping to SSD is generally avoided because every single write creates wear-and-tear that impacts lifetime of the device.


Well, I don't think it's "pretty much how disk storage is managed today," among other things because you don't have a multi-minute reboot process, and you usually lose some workspace context in a power failure.

The bigger advantage of carefully designed formats for persistent data is usually not succinctness but schema upgrade.

It's true that too much write bandwidth will destroy your SSDs prematurely, and that might indeed be a drawback of a single-level store, because you might end up spending that bandwidth on things that aren't worth it. This is especially true now that DWPD has collapsed from 10 or more down to, say, 0.2. But they're fast enough that you can do submillisecond durable commits, potentially hundreds of them per second. In the limit where each snapshot writes a single 4 KiB page, 100 snapshots per second is 0.4 MB/s.

If you have a 3-year-life 1TiB SSD with 0.2 DWPD, you can write about 220 TiB to it before it fails. So if you have 4 GiB of RAM, in the worst case where every checkpoint dirties every page, you can only take 56000 checkpoints before killing the drive: roughly nine minutes if you're doing it every 10 ms. But actually the drive probably isn't that fast. If you limit yourself to an Ultra DMA 2 speed of 33 MB/s on average, while still permitting bursts of the SSD's maximum speed, that's 2.8 months, but a maximum checkpoint latency of more than 2 minutes. You have to scale back to 7.7 MB/s to guarantee a disk life of a year, at which point your worst-case checkpoint time rises to over 9 minutes.


GemStone/S is a multi-user Smalltalk (with database characteristics). It has revision control and is quite fun to work in.


I should have mentioned it.


>And errors are forever.

Have you tried turning it on and off again? Oh, you can't.

>- Lots of CPUs. There are now 128-core CPUs. NVidia just announced a 144-core ARM CPU. The OS and hardware need to be really serious about interprocess communication. It's an afterthought in Unix/Linux. Needs to be at least as good as QNX. Probably better, and with hardware support.

Ah yes, Minecraft, the biggest traitor of all. A world bigger than earth but even getting to 250 players on a single server requires absurd levels of optimizations.


> Have you tried turning it on and off again? Oh, you can't.

Actually you can. That was one of their biggest advantages: being able to save and restore running images, and to debug into running images.

But diffing a serialized binary image (what changed?) was hard. You had to diff the commands leading to your image, a bit like in Docker.


> GPUs need to be first-class players in the OS

I've been starting to wonder, taking a crazy-stupid extreme, whether a GPU unit should be in charge of booting. Perhaps you boot into a graphical interpreter (like a boot into BASIC of old) which can then implement a menu (or have "autoexec.bas") to select a heavyweight OS to boot into.


Your Raspberry Pi already does :) https://thekandyancode.wordpress.com/2013/09/21/how-the-rasp...

More seriously, there's probably something to this from a motherboard design perspective, there's enough general purpose compute on a modern graphics card to do the initial bootloader stuff. But graphics cards are not standardized as much as CPUs are, so it's unclear if it would work as we've currently designed systems.


Great link for this, thanks. I had suspected some different ordering might already be called for given the need for e.g. RAM training.

That [version of?] pi booting is much more filename-oriented than all of the partitioning I have seen documented for a Rockchip [ http://www.loper-os.org/?p=2295 ].


Interesting idea, but I think he's a little too hung up about programming languages. I think there's space for new operating systems and that doesn't necessarily preclude the use of any programming language.

We could have OSes that don't use files, that save state and are able to resume in seconds, and we can do all of that in C (I don't think we would want to but we could). Also there doesn't exist a world where grandma (or even the Kool kids today) is going to learn smalltalk in order to use an OS.

The main hurdle I see with all of this is hardware initialization -- how to handle, from cold boot, the fact that your network card has no idea what frames have been sent and what has been received.

The first step I see in all of this is creating an operating system where the kernel has no concept of being booted. One thing people struggle to wrap their head around with Optane is "what do kernel upgrades look like", which would be messy in existing OSes.


> Also there doesn't exist a world where grandma (or even the Kool kids today) is going to learn smalltalk in order to use an OS.

I'd agree that well-designed operating systems shouldn't require anyone to understand programming to begin to use them, but equally I'd argue that anyone should be able to look at the code that's running on their computer and inspect it. It's one way (often a very effective way) to learn about software in the first place.

The technology landscape will no doubt look very different in another twenty years, and while it's relatively well-understood that anyone can contribute to FOSS projects regardless of age or experience, it's worth improving that opportunity when possible (both technically and socially).

There's some kind of analogy with web development here; the ability to view source for webpages and tinker with the HTML/CSS/JS easily is an educational pathway.

> The first step I see in all of this is creating an operating system where the kernel has no concept of being booted.

If both the operating system and network card are safe to initialize from cold at any point (and with robust software and protocol design, they should be), then even with peer-to-peer commodity memory as a storage medium, statelessness should be perfectly achievable, I think?

Slightly off-topic: you may be interested in Linux's kexec[1] functionality (it provides the ability for a running kernel to load and run another kernel).

[1] - https://wiki.archlinux.org/title/Kexec


A single-level store based OS uses even more files than current OS's like Windows or Linux. Because every memory mapping is notionally persisted to disk and is thus a file. Whereas currently you have "anonymous" mappings that are understood as ephemeral, with no "files" in storage backing them.


I've written code for a few different computing architectures that were designed to present a single logical level of RAM-like storage. That experience has convinced me that it is a terrible idea, largely because it makes elementary software optimization difficult and the default result is usually mediocre.

No matter how it is presented, all addressable storage in modern computing systems is hierarchical. Pretending these latency hierarchies don't exist doesn't make them go away.

If you obscure this by making it all look like RAM, the implied access latency variance due to access patterns and underlying concurrency control is still operative, just opaque. Computer science is full of data structures and algorithms designed to be fitted to the latency hierarchies of real hardware; it is difficult to make algorithms well-behaved when the hardware is designed to make it difficult to reason about these latency hierarchies.

Ironically, I always worked around this by reading the low-level system hardware documentation that described how the hierarchy was being hidden and scheduled so that the memory access patterns could be engineered to elicit specific hierarchy aware behaviors that the code wasn't supposed to know about. We ended up writing the same hierarchy aware code required for systems that don't use RAM as an abstraction, but with a much worse and more difficult to control API.

Working on different exotic computing architectures was extremely educational. It gave me an appreciation for the practical tradeoffs of various weird hardware models that most people only talk about in theory. It is similar to the perspective you gain from experience writing software in many different programming language paradigms. There are no silver bullets, just a different set of problems you have to deal with.


This talk is a wonderful 50-minute summary of computing over the last ~50 years with a couple of ideas for future directions, and I recommend every serious computerist watch.


Thank you very much! :-)

(It's my talk, FWIW...)


sent you an email....


Found it, and replied. Thanks for the comments!


1) The premise laid out at the link is very interesting, and one I have wondered about for more than a decade without reaching any conclusions. So much code in a modern kernel is predicated on the idea that if the data is on "disk" it will be slow and we should do something else while it arrives. What does kernel design look like when that's just not true anymore?

2) Then I went to the slides, and found what appeared to be a talk about programming languages and desktop environments.

3) Filesystems are often thought of as ways of finding data on some storage medium, and that's not wrong. But files are also a way to organize data, largely independently of the technical details of the filesystem itself.


The original PalmOS did this from the beginning, with the main difference being that memory was all RAM instead of non-volatile. Apps existed in one part of memory, and data in another. Data storage didn't use files, but databases and memory segments. Anyone trying to solve this problem today could learn from that system.

And yes, if your battery died, you lost everything. Palm devices were designed to be synced to PCs at least daily, so you'd just restore by syncing again.


Or learn from a better version of that same idea that was embodied in the Newton. Able to be coded using prototypes from top to bottom so that every app (including ones from Apple that were embedded in the system) was hookable and modifiable, data 'soups' that were persistent in nvram, and easy cross-application data sharing.


Except that memory segments are already files in all but name. The two notions were unified already in MULTICS, and planning for "single-level storage" makes that unification explicit.
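
You can see this today with POSIX shared memory: a "segment" is created by name, sized, and mapped, and on Linux it even shows up as a file under /dev/shm. A small sketch (the segment name is made up):

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <unistd.h>

    int main(void) {
        /* a named "memory segment"... */
        int fd = shm_open("/demo_segment", O_CREAT | O_RDWR, 0600);
        if (fd < 0 || ftruncate(fd, 4096) < 0) return 1;

        char *seg = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        if (seg == MAP_FAILED) return 1;

        strcpy(seg, "a 'segment' that is really a file");
        printf("%s\n", seg);          /* ...also visible at /dev/shm/demo_segment */

        munmap(seg, 4096);
        close(fd);
        shm_unlink("/demo_segment");  /* "unlink" -- file vocabulary again */
        return 0;
    }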


If the computer has non-volatile memory, then how do you turn it off and on again to fix the bugs in software? Hope we don't have to say rest in peace to this beautiful piece of troubleshooting procedure.


The same way you do it now. Most computers already have non-volatile memory, typically a tiny bit for such things as boot parameters and an enormous amount in the form of disk storage.

If, today, your disk is corrupt, you tell your OS to boot in single-user mode and run fsck. If that doesn’t work, you fall back a level and reformat a disk.

I don’t see why a device where all storage is directly addressable would be different.


But the disk does not get corrupt frequently because (1) user apps cannot write to it directly, they have to use a restrictive API, (2) each write operation is limited to a single logical part, and cannot affect other parts and (3) disk structure is designed to be recoverable.

As a result, in the existing system, if the video driver gets corrupted and messes up kernel memory, there is a very high probability that all the files are intact and the system will be fully usable once I reboot. Maybe the single file that I was working on will be damaged. Compare this to a fully persistent memory system, which has a high chance of being unusable afterward.


I expect OSes to be compartmentalized similarly. There's no way to manage a system with 2^trillions of states otherwise.

For example, memory protection mechanisms can be used to protect boot code from tampering, a level above that the core OS used for ‘just above reinstall’ recovery can use a different protection regime, the OS proper allows user modifications after sudo, but otherwise is protected from user processes, etc.

Actually, even without disk space being memory mapped, I think the MMU (and robust disk drivers) are what gives you that high probability of the file system staying intact when a process goes rogue. It used to be fairly common to lose a disk because some runaway process trashed the in-memory cache of essential on-disk data structures, and the OS then (blissfully unaware of the damage) flushed the incorrect data to disk.


Having the per-app compartments, recovery partition, or sudo-only memory is not going to help you if your text editor crashes, and loses all your documents. Or if photo app crashes and loses all the photos you ever had.

What you really want is some sort of generic "document" abstraction. So that user app can specify intent (read or write or update) and a document name, and system will enforce it. This way if your photo app gets damaged, the "blast radius" is limited only to the "documents" which are being edited now, which will hopefully be very close to zero. Hm.. I wonder what this sounds like? :)

(also: I used MS-DOS (with no memory protection, but with SMARTDRV disk cache) a lot, and crashed it a lot, often by my own code. The disk corruption was not that frequent at all. And in the worst case, you could often recover most of the data with NDD. The viruses and disk failures were far more common causes of data loss)


Some kind of overlayfs type solution? With a non-writeable "default" lower state and a writeable upper state where the changes occur, removing the upper state would effectively reset the system.


Fair call, sounds like an immutable OS like CoreOS where updating it requires a reboot. That's possibly all that's required!


IMHO something was lost when hard drives no longer had their own write-protect switch (remembering back to DEC rack-mount spinning drives).


Same way you fix errors with persistent caches. You clear them.


Eh, from hands on experience with Optane drives I'm very skeptical. They're still very far removed from RAM, even if they are good compared to SSDs. Like maybe next to RAM in the 1980s they seem about equivalent. Modern RAM has absurd bandwidth.


IMHO, the ideas about non-volatile memory are a distraction and not a very fundamental or interesting detail of future systems. But the core proposal of combining https://squeak.org/ on top of https://oberon.org is really quite an exciting idea and could be a lot of FOSS fun.


As he says, Squeak already runs on bare metal, so why have a version with Oberon below? He himself admits in the talk that a system as he proposes would have two languages rather than one (i.e., going against the spirit of the Smalltalk systems and LISP machines that he praises earlier in his talk).

There are Smalltalk ports on top of LuaJIT and Oberon ports on top of LuaJIT, so there is at least one reality where the two languages sit side by side (rather than on top of each other as the talk proposes); see Michael Engel's talk on Vimeo for more details.


I did the talk, BTW.

Because TBH Smalltalk isn't a great choice for low-level code such as OSes and drivers?

I hadn't seen Engel's talk at that time, but I have now, and we've chatted. There is a tiny tiny outside chance I might go and do a PhD with him... :-)


This is great. Ties together a lot of ideas that I've had but either can't properly express or lack the knowledge to execute on.


This talk follows a familiar pattern in historiography of computing: the "golden days" of Lisp machines, Smalltalk workstations, Plan 9/Oberon awesomeness, etc. In other words, glorified tales of better days, as told by grey-bearded veterans.

The speaker does not touch at all the details of *why* all of these were superseded by supposedly worse and more complex systems, "What went wrong?"

The systems that are being romanticized in the talk were a coder's delight, a peak of personal productivity, discoverability, malleability. They did have an integrated rich UI, where everything is a programmable action, everything is traceable and introspectable, wrapped by human-usable keyboard/mouse actions and UI presentation.

Still, they handled structuring, versioning and communicating *complex data* badly, something which was solved much better even by Unix/Windows-style filesystems and utilities. Plan 9 was an improvement over Unix in many aspects, but handling data complexity was not one of them.

My suspicion is that what is still very lacking today is well-structured and introspectable data, system-wide schemas, rigorously defined ABIs, serialization and IPC mechanisms, transactional guarantees. No more raw streams/blobs of bytes everywhere, no more untyped pipes, no more poorly defined structs in C headers that implicitly define ever-changing ABIs, no more file systems without transactional guarantees, no more reimplementation of sorting and filtering in every Unix utility. I hope that the state of the art has advanced far enough to finally have a useful and comprehensive common language for data and interaction (Fuchsia's FIDL + capnproto + a kind of Kaitai Struct on steroids?), including network communication (kind of a global IPFS-like database of schemas?).

So no, moving back to the "good old days" of a magic old programming language is not the answer, in my opinion. There are other pressing issues that have never been solved well before and may be finally solved now.


> The speaker does not touch at all the details of why all of these were superseded by supposedly worse and more complex systems, "What went wrong?"

I did the talk, FWIW.

I didn't go into that much because it would have turned it into a very different talk, and frankly a rather depressing one.

The real answer is very simple:

They were superseded by cheaper systems, which could be built and maintained by less-skilled and therefore cheaper people. And possibly also used by less-skilled people.


>The speaker does not touch at all the details of why all of these were superseded by supposedly worse and more complex systems, "What went wrong?"

Microsoft in the 1980s / 1990s


Partly. Not just them; in a way, MS was an outgrowth.

MS DOS wasn't written in C. MS BASIC wasn't.

But Windows was. And they didn't have a good direction for Windows until they got OS/2 3 in the divorce from IBM, and hired Dave Cutler et al from DEC to finish it and make it work.

OS/2 and NT were built using tools developed on (and for) UNIX. Mostly, C.

Win NT is the next generation of both DEC VMS and OS/2, and it was built using the language that was being honed on, and for, VMS' rival on the PDP-11 and VAX: UNIX.


> With a few terabytes of nonvolatile RAM, who needs an SSD any more?

I don't know if the author intended this to be rhetorical, but the reason "disk" file storage exists is to store files, right? Files will outlive a system like books have outlived entire societies.


We're talking about losing the constraint of filesystems that will burden future development.

At the same time we're clinging to programming using empirical thinking that came from the first machine language programs ever written. We should be at a point where we can think and write more declaratively even when executed sequentially. The number of cores has continued to go up, and any language that doesn't parallelize well isn't worth investing in for the long term.


Personally I'm horrified by the vision of 18 different electron apps, each using gigabytes of storage space even when they're not running.

But I think there's always some essential state that we'll want to persist. And I agree that modern filesystems are way too janky. It's hard to store anything safely because there's no transactional support. It's hard to share edits between computers because files are totally opaque to the operating system. So my changes will clobber your changes. And the save-load loop is inefficient. Why do I have to write out my entire file again after making a single character change?

But the core promise of persistent values is important. Why can't I just bless a variable in my program (and give it a name) and have it persist between instantiations? Like, if I give x some persistent identity, then set x = 5, it should still be 5 when I restart my computer, reload my webpage, or open a second web browser.
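
You can approximate that today by backing the variable with a file-backed mapping instead of anonymous memory; a minimal sketch (the path is made up, and this gives you none of the sharing or merging discussed below):

    /* Sketch of a "blessed" variable: it lives in a memory-mapped file,
       so its value survives restarts of the program. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("/var/tmp/blessed-x", O_RDWR | O_CREAT, 0600);
        if (fd < 0 || ftruncate(fd, sizeof(int)) < 0) return 1;

        int *x = mmap(NULL, sizeof(int), PROT_READ | PROT_WRITE,
                      MAP_SHARED, fd, 0);
        if (x == MAP_FAILED) return 1;

        printf("x was %d\n", *x);          /* last run's value; 0 on first run */
        *x = 5;                            /* still 5 after a restart...       */
        msync(x, sizeof(int), MS_SYNC);    /* ...once it has reached storage   */

        munmap(x, sizeof(int));
        close(fd);
        return 0;
    }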

I'm not sure if persistent memory is the answer. My take is that I think we need to start using CRDTs more deeply in our operating systems.

A well-written operation-based CRDT (Conflict-free Replicated Data Type) is an append-only log of patches, which together create a mutable value. Treating it as an append-only log makes it easy to store on disk or replicate over the network. And treating it like a value makes it easy to program against. The log of changes grows over time, but you don't have to store all of it. (Or really, any of it). You have a tunable knob of how much history you want to keep around. More history data means you can support branches, merging and time travel. Less history makes it smaller. (And the data you need to keep for merging is really tiny in practice, btw.)
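
As a toy illustration of the op-based flavour, here's a last-writer-wins register where the current value is a pure function of an append-only log of ops, so replicas that exchange ops in any order converge. Everything here is invented for the example; real CRDT libraries are far more sophisticated:

    #include <stdint.h>
    #include <stdio.h>

    typedef struct {            /* one patch in the append-only log         */
        uint64_t timestamp;     /* Lamport timestamp: orders concurrent ops */
        uint32_t actor;         /* tie-breaker when timestamps collide      */
        char     value[32];     /* payload of this write                    */
    } Op;

    typedef struct {
        Op  log[256];           /* append-only; persist/replicate this      */
        int len;
    } LwwRegister;

    /* Appending is the only mutation; the log never rewrites in place. */
    static void lww_apply(LwwRegister *r, Op op) { r->log[r->len++] = op; }

    /* The current value is derived from the log: the op with the highest
       (timestamp, actor) pair wins, so replicas that have seen the same
       set of ops -- in any order -- converge to the same value.          */
    static const char *lww_value(const LwwRegister *r) {
        const Op *best = NULL;
        for (int i = 0; i < r->len; i++) {
            const Op *o = &r->log[i];
            if (!best || o->timestamp > best->timestamp ||
                (o->timestamp == best->timestamp && o->actor > best->actor))
                best = o;
        }
        return best ? best->value : "";
    }

    int main(void) {
        LwwRegister a = {0}, b = {0};
        lww_apply(&a, (Op){ .timestamp = 1, .actor = 1, .value = "draft" });
        lww_apply(&b, (Op){ .timestamp = 2, .actor = 2, .value = "final" });
        /* exchange logs in either order; both replicas converge */
        lww_apply(&a, b.log[0]);
        lww_apply(&b, a.log[0]);
        printf("%s %s\n", lww_value(&a), lww_value(&b));   /* final final */
        return 0;
    }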

With CRDTs we can make magic variables which work like that if we want. And unlike persistent memory, CRDTs can share their value with other processes, other devices and other users.

Rather than sysfs, procfs, etc, I want all the OS's internal state exposed as live variables I can just subscribe to. And I want my own data (documents, etc) to work exactly the same way. For some data sets (like my source code), I want to keep a full change history. And for other data sets (my CPU temperature) that should be something I explicitly opt in to.

But regardless, all of this data should be able to be shared with a flick of the mouse.


As wonderful as ideas are, change from technology only becomes change in availability. Or, in simpler terms: it's not always about what is possible, and quite often it's about what is needed instead.


Earlier HN discussion at https://news.ycombinator.com/item?id=26066762 (119 comments)


Yeah, I submitted it myself right after I made the talk.


This reminds me of some Bret Victor talks.


I will take that as high praise! :-D


Alas, my Macbook Pro computer is so obsolete the video won't play!


Their website seems to be struggling a bit with the video.

Here's a mirror that will only last 24 hours or 100 downloads. https://wormhole.app/kr6xX#tHs1gcAJfyBgN5ti0tBpxA

Alternatively, here's a p2p mirror that will stay up as long as someone has the page open: https://instant.io/#8587ef24d016aff8c87c6de186e1e8584270a4c7


Thank you!


I think the next step should be focused on security. This captures part of it nicely: https://xkcd.com/1200/

I mean any user-space program can harm all (same-user) user-space data, and since there is often only one user on a system, this is fatal.


Well I've been doing computerized gas analysis since dirt was rocks and only the rare laboratories with mainframes could do it.

Before the first benchtop data systems came along you had to evaluate the curves from the graph paper yourself and then calculate numerical results by hand. A single-point report like CO2 content alone is quite easy & quick manually, but when you have dozens of hydrocarbons on the same graph that all need to be calculated in parallel it was not so quick and more prone to error.

Programmable calculators could then be used to save time on some of the purely numerical work for the high-data runs, and this is about like what would become the final part of the built-in workflow for these early application-specific computerized gas workstations.

Offices didn't have PCs yet, just typewriters, copiers, filing cabinets and simple calculators.

For laboratories the built-in gas report printers were leapfrogging technology with each generation, the inkjet first appeared in a HP model for the instrument lab before they began to apply it to office equipment.

The evolution has been interesting.

One particular series of late 1970's design benchtop data system was multi-user and multi-tasking way before Windows. Like others it took in the raw analog signal (still do but they're "black box" interfaces connected to PC's for further calculation now) from the benchtop gas analyzer in real time and produced digital data from the curves on the graph. This electronically accomplished that first manual procedure of measuring the curve to begin with.

From that point on it's all calculation.

Each gas run was autosaved in memory as a file having an automatically-assigned file number, no more than 3 digits. There was no way anybody had enough memory to record the entire analog signal at the time so this was just the key geometric features in digital form.

There was no file system.

Today on PCs the entire analog gas signal is recorded to HDD in digital format routinely. And people still lose files all the time, don't ask me how I know.

But for that antique data system there was no storage other than memory absolutely required, many bare-bones consoles were issued without the optional micro-cassette tape drive. The tape was quite slow and best for long-term storage or disaster recovery. An optional COM port became available years later which was faster, but there was no workflow change for most users. A great deal of fairly advanced work could be accomplished by many without need for any storage device I/O over a period of years. Just a one-page printout about each gas destined for the physical filing cabinet.

You never were supposed to turn the console off. Except whenever you wanted to! The major use case is 24/7 so there were expensive battery backup units to protect from short- or long-term power failures, which only preserved the memory. If the power failed the battery kept everything the same and you only lost a gas run if it was in progress and had not been saved in memory.

But you could also power down the console from the main switch any time and its high-power-consuming hardware, processor, and power supply would go cold. Like when you're done for the day, the gas runs are over, and you were finished sitting at the console. The memory alone would still be powered by the battery backup unit without depleting the battery at all unless the backup unit itself lost power. Under backup power you could also change any circuit board other than the dedicated memory PCB, including the main power supply, and it would power back up with the memory intact.

The OS was immutable and in ROM where it should be.

Also the default OS performance upon cold power-up was adequate for many simple gas tests without need for user programs or reference data files to be loaded beforehand. For many operators a few minutes of manual parameter entry gave full disaster recovery. So those guys didn't even need the expensive batteries, tapes or COM port. It was basically application-specific enough for the simple stuff already.

You could then extend that with optional user programs to more complex automation & calculations but only up to the limitations of memory.

User programs ended up as file numbers too so you were your own file system.

And to just get a truly worthwhile file listing, you had to write your own program.

Glad there were only 3 digits.

These things did achieve nearly zero show-stoppers per year on average, unlike modern PC systems which are in the handfuls at minimum and dozens for many operators on average.

There was no booting; the OS ran from ROM and the memory was accessed as needed from there. Optional storage devices were only needed for user items; the OS did not require any files from storage, and only one file was present by default in memory: simply its own memory work area.



