Maybe it's me, but I don't see what the restrictions would be on making an image which has a root FS inside it, without privileges. The "bits" which make it a root FS don't need you to "mount" it as a root process and run chflags; you can dd into the image, change things, do math, change checksums. Sure, it's a lot harder, but the principle of what an image is, is: it's a stream of bits. If you can modify the stream of bits, you don't need root to mark a region as having magic root properties, be it a chflags FS state, a setuid bit, or whatever.
Also, again it may just be me, but if you are running a hypervisor-limited VM image on a stream of bits, and you can modify those bits outside that state, restricting the VM from having root at runtime is slightly odd.
This reads like a proscriptive "no root" rule has been metasploited into "we will wake you at 3am to check whether, in your dream, you are running as root" type extremism.
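To make the "stream of bits" point concrete, here is a minimal sketch in Python, assuming the image layer is a plain tar archive. The path `usr/bin/demo` and the helper name `build_layer` are made up for illustration. It runs as an ordinary user and still records root ownership and a setuid bit, because those are just header fields in the archive:

```python
import io
import tarfile


def build_layer(payload: bytes) -> bytes:
    """Return an uncompressed tar archive holding one root-owned setuid file."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo(name="usr/bin/demo")  # hypothetical path
        info.size = len(payload)
        info.uid = 0           # owner recorded as root in the header
        info.gid = 0
        info.uname = "root"
        info.gname = "root"
        info.mode = 0o4755     # setuid bit plus rwxr-xr-x
        tar.addfile(info, io.BytesIO(payload))
    return buf.getvalue()


layer = build_layer(b"#!/bin/sh\necho hello\n")

# Read the archive back to confirm what was recorded.
with tarfile.open(fileobj=io.BytesIO(layer)) as tar:
    member = tar.getmember("usr/bin/demo")
    print(member.uid, member.gid, oct(member.mode))  # 0 0 0o4755
```

Nothing here touches a real filesystem, which is why no privileges are needed. Whether this is a sensible way to build real images is a separate question.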
It's less about "could I write a program to modify bits directly?" and more about "what can I get that I won't have to support myself for the rest of time?".
Nothing stops anyone from writing code to interpret Dockerfiles or to directly fiddle with image layers. But taking the cost:value ratio proportional to everything else you need to be doing, it's probably a poor investment of time.
Google has economies of scale around this exact problem, which is why they've been pumping out work in this area -- Kaniko, Skaffold, image-rebase, FTL, rules_docker, Jib etc.
So his story reduces down to the real problem: what tooling can I find which runs without root but makes images that include root outcomes, for the set of things I need in an image which can't run unprivileged, and which works without calling setuid() or seteuid() to become root.
That's a good story, btw. I have people working near me who probably want the same thing from a lower driver, but who are nonetheless interested in builds that don't require root.
> Maybe its me, but I don't see what the restrictions would be to make an image which had a root FS inside it, without privileges.
There are many reasons why those restrictions exist. It's mainly related to what types of files you can create, and how you could trivially exploit the host if things like mknod(2) were allowed for unprivileged users. There are also some more subtle things, like distributions having certain directories be "chmod 000" (which root can access because of CAP_DAC_OVERRIDE but ordinary users cannot, so you need to emulate CAP_DAC_OVERRIDE to make it work).
In short, yes you would think it's trivial (I definitely did when I implemented umoci's rootless support) but it's actually quite difficult in some places.
Also, unprivileged FUSE is still not available in the upstream kernel, so you couldn't just write your own filesystem that generates the archives (and even if FUSE were unprivileged, it would still be suboptimal compared to just being more clever about how you create the image).
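As a rough illustration of "being more clever about how you create the image", the sketch below (Python, assuming a tar-based layer; the function name and the `var/locked` path are hypothetical, and this is not how umoci itself is implemented) writes a character-device entry and a mode-000 directory straight into an archive. No mknod(2) call and no CAP_DAC_OVERRIDE are involved, because nothing is created on the host filesystem:

```python
import io
import tarfile


def build_special_entries() -> bytes:
    """Return a tar archive with a device node and a chmod-000 directory."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        # Character device entry: only a header is written, no mknod(2).
        dev = tarfile.TarInfo(name="dev/null")
        dev.type = tarfile.CHRTYPE
        dev.devmajor, dev.devminor = 1, 3  # Linux numbers for /dev/null
        dev.mode = 0o666
        dev.uid = dev.gid = 0
        tar.addfile(dev)

        # Directory that only root could enter once unpacked.
        locked = tarfile.TarInfo(name="var/locked")  # hypothetical path
        locked.type = tarfile.DIRTYPE
        locked.mode = 0o000
        locked.uid = locked.gid = 0
        tar.addfile(locked)
    return buf.getvalue()


archive = build_special_entries()
with tarfile.open(fileobj=io.BytesIO(archive)) as tar:
    for m in tar.getmembers():
        print(m.name, m.type, oct(m.mode))
```

Writing the entries is the easy half. Unpacking them onto a real filesystem as an unprivileged user is where the mknod and "chmod 000" problems described above come back.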