I think this is underrated as a design flaw for how Linux tends to be used in 2024. At its most benign it's an anachronism and a potential source of complexity; at its worst it's a major source of security vulnerabilities and unintended behavior (e.g. Linux multitenancy was designed for two people in the same lab sharing a server, not for running completely untrusted workloads at huge scale, so it doesn't really implement resource fairness or protect against DoS well).
I haven't had a chance to try it out, but this is why I think Talos Linux (https://www.talos.dev/) is a step in the right direction for Linux as it is used for cloud/servers. Though personally I think multitenancy, especially regarding containerized applications/cgroups, is a bigger problem, and I don't know if they're addressing that.
Actually, I have been wondering if using a Linux system as multi-user could be a boon for security.
As a single user, each and every process has full and complete control of $HOME. Instead, I would prefer all applications were sandboxed to their own little respective areas with minimal access to data unless explicitly authorized. Without going full Qubes OS, get some amount of application separation so my photo utility does not have permission to read ~/.ssh.
Create a user account for each application (Firefox, email client, PDF reader, etc.). Run each of those applications as its dedicated account. Each application then has its own $HOME with minimal user data. Barring a root-escalation or user-separation bug, the data in your true $HOME should be isolated. Even the process and environment-variable space is segregated.
This also has a win in that it becomes possible to better segregate the threat model of less trusted applications. Doing granular per-application network permissions is a bit hairy in Linux, but it is trivial to fully deny network access to a specific user account.
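A minimal sketch of what I mean, assuming iptables and sudo; the account name "app-firefox" is just an example, and a GUI app would additionally need access to the display server socket:

    # dedicated account for one application
    sudo useradd --create-home --shell /usr/sbin/nologin app-firefox
    # run the program as that account, with its own $HOME
    sudo -u app-firefox -H firefox
    # deny all outbound network traffic for that account
    # (the owner match only works in the OUTPUT chain)
    sudo iptables -A OUTPUT -m owner --uid-owner app-firefox -j REJECT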
Not true isolation, but for the semi-trusted development environment, gets you a little something.
I was experimenting with taking that to a middle ground using containers on NixOS. IMO the distinguishing feature of Qubes is the chrome indicating the security level of a window based on its VM - I put together https://github.com/andrewbaxter/filterway to use with window manager rules to hopefully get the same result.
For each container I'd run a `filterway` process with a unique app id outside the container and mount the filterway Wayland socket inside the container; then Wayland programs in the container would just work, IIRC (maybe I needed to set an environment variable for the Wayland socket, or XDG_RUNTIME_DIR, or something).
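Roughly like this, from memory. NixOS containers are systemd-nspawn under the hood, so the sketch uses nspawn directly; the socket path and machine name are made up, and filterway's actual invocation is in its README:

    # 1. outside the container, run filterway (per its README) so it exposes a
    #    proxy Wayland socket with a fixed app id, say /run/user/1000/wayland-untrusted
    # 2. bind that socket into the container and point Wayland clients at it
    sudo systemd-nspawn -D /var/lib/machines/untrusted \
      --bind=/run/user/1000/wayland-untrusted:/run/wayland-proxy \
      --setenv=XDG_RUNTIME_DIR=/run \
      --setenv=WAYLAND_DISPLAY=wayland-proxy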
I think the Wayland compositor itself was running as its own user, so I had some setuid commands so that the system bar launch icons could start/stop the containers as the Wayland user.
IIRC Wayland was pretty flexible; just mounting sockets in various places and making sure permissions were set on the socket worked great.
Some other quick notes: app ids are optional in the Wayland spec, but as long as you don't run any such apps in privileged contexts (outside of a container), you can still visually distinguish them. Also, IIRC Sway didn't have the ability to vary window chrome based on app id; I thought I'd indicate the permission level in the task/system bar instead, but I think other compositors do have more powerful window decoration rules.
Those solutions seem more aimed at keeping the system clean vs isolating what resources a program can access.
Flatpak does indeed get me part of the way there with better isolation, but the available apps seem so scattershot that I need a fallback mechanism for when there is no official Flatpak artifact. Distrobox makes a point of indicating it is not a security boundary.
For shared local usage or "pet" (as opposed to cattle) servers I'd agree, and in fact this is close to what Linux was designed for, since I'd consider the multiplexing lab server to also be a semi-trusted environment.
I'm referring more to how Linux is used in vast pools of "cattle" servers in the cloud, locally by e.g. one main user (who doesn't need multi-user but probably still needs some notion of "admin" and per-program permissions), or in a corporate setting (where the actual identity system is managed remotely). This is probably >99% of Linux environments.
> As a single user, each and every process has full and complete control of $HOME. I would prefer all applications were sandboxed to their own little respective areas with minimal access to data unless explicitly authorized.
This is what OpenBSD's unveil does. Firefox for example only has access to ~/Downloads (and some stuff in ~/.mozilla, ~/.config, ~/.cache) in my home directory.
Now this looks promising for mere mortals. I found jart's Linux port of pledge[0], which makes it seem possible to simply wrap utilities with a preceding script. If I couple this with distrobox/podman (which should work fine?), I might be able to pretty seamlessly lock down utilities by default with minimal shenanigans.
Assuming it does what it says on the tin, and it can work with GUI apps, this would get me almost all the way there.
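For what it's worth, I imagine the wrapper looking roughly like this; the utility name is a placeholder, the binary may be named pledge.com depending on how it's installed, and the promise/unveil flags are from my reading of the docs, so verify against the tool's help output first:

    #!/bin/sh
    # hypothetical wrapper placed earlier on PATH than the real binary:
    # allow only stdio + read-only filesystem promises, and unveil only ~/Downloads
    exec pledge -p 'stdio rpath' -v "r:$HOME/Downloads" /usr/bin/some-utility "$@"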
> Instead, I would prefer all applications were sandboxed to their own little respective areas with minimal access to data unless explicitly authorized.
You’ll be interested to learn about systemd-nspawn. You can sandbox stuff with it really easily. It’s like chroot, so it’s not really resource intensive; lighter than a container.
I think a pretty useful thing you can do is boot ephemeral instances, so whatever someone does there gets undone. That’s useful if you’re doing system testing or CI, because you just set up the machine once and then your scripts can do whatever you want. A perfect example is testing install scripts.
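For example (assuming a Debian-flavored host with debootstrap installed; the machine name is arbitrary):

    # one-time: build a minimal root filesystem to use as the template
    sudo debootstrap stable /var/lib/machines/testbox
    # throwaway instance: runs against a temporary snapshot of that tree,
    # so everything is discarded when the shell exits
    sudo systemd-nspawn -D /var/lib/machines/testbox --ephemeral --private-network /bin/bash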
Though this is also kinda the point of Flatpak and Snap, which are controversial in the Linux community. Then again, a lot of people dislike systemd, though fewer than originally.
nspawn does look interesting, and potentially exactly what I want. Although the wiki page is dense enough that I am concerned I will somehow misconfigure it and end up less secure than I would be without using it.
I Flatpak wherever I can, but several of my required applications are not first-party packaged, which makes me extra squeamish about installing them.
I read a good chunk of that wiki link, but didn't really come away with an understanding of how it differs from just using Docker for sandboxing an app.
It differs by not being insane. Trivial functionality that actually works. It's what's good about systemd.
It doesn't require forwarding sockets or giving free access to root just for building images. It doesn't explode just because you touch your nftables rules. It doesn't suddenly expose a process to the Internet because of some undocumented option. You can use all the normal tools such as auditd and SELinux without having your configuration overwritten by a madman.
You’re missing the trees for the forest. At a high level they are the same, just as with LXC or Podman or others, but it’s the details that are really important. Because you’re leveraging what’s already in the system you can really shrink things down, as another user mentioned. But there’s also a convenience in just being able to use systemd when it’s already built into your system.
I suggest also reading
man systemd-nspawn
Just type it into your terminal, you don’t need to install anything
*Nix was designed to be multi-user. It's probably the only security boundary that was present from nearly the start. I think there are some rough edges on the user-per-application model, but it should all be scriptable so that the machinations are mostly hidden.
I agree that multi-user should go away for modern server workloads; however, users are used as a blast door, mainly because Linux's security model is lacking. systemd, for example, commonly runs services under separate users to make it more difficult for a compromised application to elevate privileges. Android does something similar AFAIK.
Users should never have become a security boundary for isolating applications, but they unfortunately have, and there's not really an alternative.
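To make that concrete, systemd's own sandboxing leans on exactly this pattern; a quick way to see it in action (the unit name and command are just placeholders):

    # run a transient service under a dynamically allocated, throwaway user
    # with no access to /home
    sudo systemd-run -p DynamicUser=yes -p ProtectHome=yes \
      --unit=demo-sandbox /usr/bin/python3 -m http.server 8080
    # inspect the allocated user
    systemctl status demo-sandbox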
This is why I think multitenancy is the more important problem (though both are related), because it's the key to solving shared-kernel application permissions without "users". Containers were a step in the right direction but aren't a sufficient security boundary in themselves - what is currently handled by the "container runtime"/sandbox needs to be built into the kernel IMO.
Linux's security model doesn't become better just because everybody is doing it that way, and besides that, everybody is doing it because they are copying Linux.
Nah, it's been lacking since inception, with people trying things like chroot jails and suid bits decades before Linux was a twinkle in anyone's eye, and we still regularly fail at running untrusted code.
My impression was that all the hosted k8s providers are doing multitenancy with if not full per-customer VMs, at least additional abstractions like gVisor.
Are there some that aren't? Or are you referring here more to untrusted/shared in the sense of platforms like GitHub Actions just running everyone's different loads on the same pool of kernels?
Right, I'm talking about shared-kernel multitenancy. It isn't just about reducing the OS overhead from a host plus one or more VMs (or sandboxes) down to a single host; it's also about not having to continually start and stop VMs/sandboxes, which introduces its own resource overhead as well as a latency hit every time it's done (and that hit essentially always coincides with resource pressure from increased usage, since that's why you're scaling up). Also, even VMs and sandboxes don't really protect against DoS/resource-fairness/noisy-neighbor problems that well in many cases.
Why does this matter? Incurring kernel/sandbox boot overhead on cold start or scale-up means services have to over-provision resources to account for potential future needs, which wastes a lot of compute. I also think it's incredibly wasteful for companies to manage their own K8s clusters (if K8s supported multitenancy you'd probably want only one instance per datacenter, and move whatever per-cluster settings people depend on to per-service settings. This is also much closer to "how Google does things" and why I think Kubernetes sucks to use compared to Borg), again because of all the stranded compute, and also because of the complexity of setting it up and managing it. But without shared-kernel multitenancy, multi-tenant K8s has to employ a lot of complicated workarounds in userspace. Or you can use a serverless product, i.e. pay for someone else's implementation of those workarounds, and still suffer some of the resource/latency overhead from the lack of shared-kernel multitenancy.
This is one of the problems I want to address with my company BTW, but it would take years if not decades to get there, which is why I'm starting with something else.