Tiny Linux distro that runs the entire OS as Docker containers (github.com/rancher)
219 points by purak on April 29, 2017 | 171 comments



This is starting to smell like a system on top of a system to fix something that could be fixed in the system. Kind-of like implementing a filesystem on top of a filesystem... or putting a database on a filesystem to run another filesystem inside the database, or using a web browser as a runtime instead of an operating system.


This kind of thing happens when the base system is ubiquitous and therefore hard to change. It's easier to layer something on top of the base, where people can "opt in" and there's a large preexisting compatible audience.

Changing the base layer itself at a minimum requires people to upgrade, and now you don't have that advantage of the preexisting audience anymore. If your improvement requires a breaking change, then you're in a real pickle. So people will stack on more and more until it becomes more or less unbearable.


There is an advantage in layering components, or building new software on top of existing components.

However, a minor improvement or bugfix should be done in the component that is responsible for it. Creating another layer instead of fixing the problem is just creating more problems.


You can go whole hog with "containers" by just adopting unikernels.

But, unless you are willing to abandon Linux/Unix, this is kinda where you are left.


How would that work when containers make kernel calls?


Where containers would make kernel calls, unikernels make calls to the VM hypervisor. (Not exactly, but kinda.)


Which is why highly flexible, powerful base systems are usually preferable to less flexible base systems optimised towards specific goals.

Common Lisp is my go to example of this. It's possible to extend the language to support entirely new paradigms without breaking it, because of the power of the macro system.


The reason a new layer is being built is because people don't want vendor lock in and VM's aren't portable anymore.

You can't take an AWS VM and fire it up on DigitalOcean without trial by fire. VM's should be portable but they're not because cloud providers don't want them to be.

For a while everyone was locked into AWS but now that there's other options some companies want to hedge their bets, or even run parts of their workload locally.

Since the VM space is all proprietary now people are moving one level up and building a consistent environment above it. I fully expect docker to become proprietary in a couple years and the cycle will begin again.


"You can't take an AWS VM and fire it up on DigitalOcean without trial by fire. VM's should be portable but they're not because cloud providers don't want them to be."

You can't ?

I have never used DO, but I have deployed linux and FreeBSD systems on EC2 and moving them to bare metal was just a tar command away ... you can even pipeline 'dd' over ssh if you want to be fancy ...


You could say that it's as easy as using 'rsync'...


A useful analogy for understanding meatspace ecosystems. Software is a bit like pond scum.


The whole notion of containers is basically this. That's why I am not sure why not just fix the OS. If there's anything to fix in the first place.


I always felt, perhaps uncharitably, that the point of containers was "those other programmers are idiots so we need to encapsulate everything for the sake of defense"


One huge benefit of containers is that you can treat a program as something atomic: Delete the container and it's gone, as if it were never installed.

Modern package management systems like APT spend a lot of effort installing and removing files, and they don't do it completely; any file created by a program after it is installed will not be tracked.

You could accomplish the same thing in other ways (as Apple's sandboxing tech does), of course.
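The difference between manifest-based removal and deleting a container root can be sketched roughly as follows. This is a toy simulation in temporary directories, not how APT or Docker actually work; the helper names (`install`, `remove_by_manifest`, `leftover_files`) and the file paths are made up for illustration.

```python
import shutil
import tempfile
from pathlib import Path


def install(root: Path, files: list[str]) -> set[str]:
    """Create the packaged files under root and return the manifest of what was installed."""
    for rel in files:
        path = root / rel
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text("packaged\n")
    return set(files)


def remove_by_manifest(root: Path, manifest: set[str]) -> None:
    """Delete only the files listed in the manifest (hypothetical package-manager behaviour)."""
    for rel in manifest:
        (root / rel).unlink()


def leftover_files(root: Path) -> list[str]:
    """List every regular file still present under root."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


# Package-manager style: the manifest covers install-time files only.
host = Path(tempfile.mkdtemp())
manifest = install(host, ["usr/bin/app", "etc/app.conf"])
(host / "var/lib/app").mkdir(parents=True)
(host / "var/lib/app/data.db").write_text("created at runtime\n")  # never tracked
remove_by_manifest(host, manifest)
host_leftovers = leftover_files(host)

# Container style: everything lives under one root, so removal is a single delete.
container = Path(tempfile.mkdtemp())
install(container, ["usr/bin/app", "etc/app.conf"])
(container / "var/lib/app").mkdir(parents=True)
(container / "var/lib/app/data.db").write_text("created at runtime\n")
shutil.rmtree(container)
container_gone = not container.exists()

print("left behind on host:", host_leftovers)
print("container root removed:", container_gone)
shutil.rmtree(host)  # clean up the simulated host tree
```

The sketch ignores mounted volumes, which in a real container setup would survive the container being deleted.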


"Modern package management systems like APT spend a lot of effort installing and removing files, and they don't do it completely"

Well, there /is/ another way to do it.

STATIC LINK ALL THE THINGS

Which would work if licenses and copyrights didn't exist.


That's one of the main touted benefits of containers (a.k.a. build reproducibility). You can view containers as an overly complicated way to make software with complicated deployment brain-dead easy to deploy.

We're at the point in the hype cycle where it starts getting fashionable to dismiss that as an overkill, but the reality for most of us out there is that most software is way more complicated than a single executable and containers make it easier to deploy complicated software.


I'm talking about mutations to the file system. Things like database files, logs, /var/run, etc.

Managing internal dependencies (like libraries) is another concern entirely. But containers are good for that, too.


> Which would work if licenses and copyrights didn't exist.

I don't think it would.

Dynamic linking allows a library to be patched once and have the patch apply to all the programs using it. If every program was statically linked, you would have to update each one individually.

Not to mention the waste of space.

I'm guessing much of that is moot these days, but IMHO it's still something to aim for.


Patch a library and perhaps you end up breaking some programs that rely on that library.

The benefit of that goes away with containers anyway, you don't share libraries, every instance gets its own install.


Could have sworn that *nix already had mechanisms for loading different lib versions side by side...


I think GP is being snarky/sarcastic.


Programs either shouldn't create such files or, if the user created them, they shouldn't be removed.


Are you saying programs cannot create files? That's nonsensical. /var exists for this purpose.


tbh, I see the security point in Docker as a huge risk. Basically you're depending on everyone in the chain to regularly recompile your images or you will get compromised eventually. There is no such thing as an apt-get dist-upgrade or any way to create real useful audit logs with Docker.

On the other hand, deploying stuff is dead easy now. You just tell the hoster "deploy this Docker package, expose port X as HTTP, and put an SSL offloader in the front" and that's it, no more "we need <insert long list> to deploy this" or countless hours spent with the hoster on how to get weird-framework-x to behave correctly.


This is a good reason to build your own rootfs for containers and have as little as possible in them.


I work in a big company. It's much easier for us to spin up a container to run whatever experimental program we've thought might be useful to help us do our job than to provision a real box for it to run on or fit it into the whole bureaucracy.

If it's actually useful we'll find someplace for it to live (or just in containers if that's all that's needed). If not, finding that out was cheap.


This is a more charitable interpretation of what I meant: "we can quickly test the value of an idea via a prototype without the cost of making it super well behaved in other areas." Which is an excellent application of containers!

I just sometimes wonder if the cart hasn't gotten in front of the horse on the whole container front. The fact that the term "bare iron" has been hijacked to mean "not under virtualization" made me start to think that something is seriously fucked up.


It very much has. Mix it in with devops and agile, and before you know it everyone is throwing hot code directly to "prod"...


If you have machines with unused capacity, why would deploying a docker container image be easier than deploying an RPM package?


Because either there is no RPM or the RPM conflicts with other RPMs on the machine.


There are Software Collections for RPMs, which is basically namespacing. It can have quite an operational overhead if the thing you want to install has a lot of dependencies, though.


Sounds like then you should learn what a chroot is. All the existing linux platforms already provide the solution to your exact problem with much less overhead than docker.


I agree this is possible, but the tools are somewhat obscure and slow. You can debootstrap in a chroot and then install packages, but it takes a long time, and likely re-downloads hundreds of megabytes of packages you already have on the system.

I don't like Docker, but I think it does some differential compression with the layers when you are modifying the image. So you don't have to re-do this install from scratch.

You may also run into issues with user IDs and various system config files in the chroot. Configuring the service with flags, env vars, and config files is a bit of a pain.

Docker is essentially a glorified chroot... I've essentially tried to rebuild it, and it is unfortunately a lot of work.


Only that chroot won't also run on the development machine, which runs a totally different OS.


This is a self-inflicted problem. 787 engineers don't have to test their work on a Learjet.


You do realize that the scenario you have described could be carried out with chroot just as easily, right? The only way "containers" help (and not containers as is, but Docker specifically) is that they ship the whole chroot OS image with all the dependencies.


I think the "just as easily" is highly debatable. I'm not a container evangelist by any stretch, but if you were going to take a "just chroot it" approach, the very first thing you'd want to do to ease the operational burden is define some kind of standard app packaging format that defines what's in the chroot and an entry point and an environment, and maybe some scheme for mapping external data and various other niceties you get with containers, and at that point, congratulations, you've reinvented containers.


> I think the "just as easily" is highly debatable.

Heads up, you're talking about a different thing. You want to run these things as normal operations, but gunnihinn was talking about deploying an application to check if it is of any value. Chroot is just enough to make a mess as the application's developer instructed in INSTALL.txt without the need to worry about cleaning up afterwards.

And by the way, you seem to be confusing containers and Docker.


I disagree. Just google "doesn't work in chroot" and you'll be reminded of a litany of issues that come up when trying to build/run things in a chroot, and a container containing a linux distro makes a tidy little sandbox which generally avoids those issues. It's somewhere on a spectrum between a chroot and a VM, which I think a lot of people find value in.

And I'm not confusing containers and Docker, I'm just speaking a bit imprecisely. In my experience, conversations about "containers" are rarely about raw containers, but rather some specific containerization scheme and tools (e.g. docker). I suspect everyone in this entire subthread means "docker containers" when they say "containers."


> Just google "doesn't work in chroot" and you'll be reminded of a litany of issues that come up when trying to build/run things in a chroot

Yeah, some newbie forgot to mount-bind a necessary directory like /proc or /dev, didn't provide sensible /etc/resolv.conf, or messed up host's and chroot's paths, either in request or in configuration. Nothing that would render chroot unviable. Is this what you meant?
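The omissions listed above can be checked before entering a chroot. What follows is only a minimal sketch of such a preflight check; the function name `missing_chroot_pieces` is hypothetical, and the mount test relies on `os.path.ismount`, which may not detect a bind mount that comes from the same filesystem as the chroot directory.

```python
import os
import tempfile


def missing_chroot_pieces(root: str) -> list[str]:
    """Return a list of the usual chroot setup omissions found under root."""
    problems = []
    # /proc and /dev are normally bind-mounted (or mounted fresh) inside the chroot.
    for mount_point in ("proc", "dev"):
        if not os.path.ismount(os.path.join(root, mount_point)):
            problems.append(f"/{mount_point} is not mounted")
    # Name resolution inside the chroot needs a usable resolv.conf.
    resolv = os.path.join(root, "etc", "resolv.conf")
    if not os.path.isfile(resolv) or os.path.getsize(resolv) == 0:
        problems.append("/etc/resolv.conf is missing or empty")
    return problems


# An empty directory stands in for a freshly created, unprepared chroot.
empty_root = tempfile.mkdtemp()
report = missing_chroot_pieces(empty_root)
for line in report:
    print(line)
os.rmdir(empty_root)
```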


Why would anyone want to get all that right manually when containers manage all that automatically?


Good question. Why would anyone presented with fancy and fashionable third-party software use a mechanism that has been present for decades, ships with the operating system, works reliably and predictably, doesn't change substantially every quarter, doesn't do any magical things to network configuration, and is easy to inspect, debug, and adjust for an outsider? Why indeed?


So you have an organizational problem and you're hacking one problem with another hack. Clearly nothing can go wrong with this approach.


I don't see that as an organizational problem, but a way to try something out safely and faster.


I was referring to how they are unable to provision the right resources to run experiments because of all the red tape, and they are getting around the issue by using docker. What happens when the red tape guards get wind of this?


I don't think so. Consider that computers are generally so powerful these days that when running a single application stack they're seriously underutilized. The first way people went about getting this going was Virtual Machines (made popular by IBM with its VM/370 OS :-) and that works well but when all of your clients are running the exact same OS down to the same version, it is kind of waste to have 'n' copies of the OS loaded, so containers are a 'semantic' fractioning of the resources where the OS is common but the set of processes are unique to a client. You still need a way to allocate from the single set of resources so containers provide that abstraction.


What exactly is the problem with process sandboxing and language level VMs? The industry is tackling all the wrong problems. So have we given up on process sandboxing with capabilities? The whole containerization movement is one giant hammer to kill a fly kinda business these days. While you guys are figuring all this out I'm gonna stick with BEAM, JVM, and other tried and true methods. I'll check back in another 5 years to see what this mess has turned into. Maybe that's enough time to figure out how all the overlay networking is working out because, you know, we need a few more overlays between the hardware and the VM.


Because then each piece of software you develop is limited to a single process written in a single language. How do you use "process sandboxing with capabilities" to isolate, say, an Erlang process that uses a NIF embedding a shared-memory section to interact with a companion OpenCL C process? With containers (or VMs), the answer is obvious; with POSIX-level isolation primitives, not so much.


http://man7.org/linux/man-pages/man7/capabilities.7.html

There are several capabilities in that list to address your problem. Since your question is concerned about memory you can search for "memory" in that page and note the exact capabilities you will need.

OpenBSD has even more features (https://en.wikipedia.org/wiki/OpenBSD_security_features) alongside the capability model.

My gripe is that instead of learning how to properly use their tools and platforms people have started looking for golden hammers like docker and now we have a mess like RancherOS. If someone can explain to me how isolating the system services in docker containers (which by the way are not isolated since they are privileged) does anything above and beyond what capabilities and the security features in OpenBSD provide then I'll concede the point. My guess is there are no advantages and people are jumping on bandwagons they know nothing about and since the previous generation of tools was not utilized to its full extent neither will all the new docker hotness. People are hammering square pegs into round holes.


Note that my point wasn't about how "your process" can be non-isolated in certain ways using capabilities, such that you could, in theory, sandbox both of those processes individually and have them still do whatever IPC you want; yes, this is certainly possible, and obviously more sensible if one of those processes isn't so much "your" process as it is some other process managed by some other party that you're interacting with.

My point was more about the "developer UX" of needing to isolate things that way. Containers have the semantics of isolating a group of processes, but not performing any internal isolation between the processes in the container. This is almost always what you want: you want "your app" to be able to have multiple processes, and to not have any security boundary between the parts of "your app", just between "your app" and "other apps" or the OS. In other words, you want to have a single "process" (from the OS security subsystem's perspective) whose threads happen to be composed of multiple PIDs, multiple binaries, and multiple virtual-memory mappings. You want to fork(2) + exec(2) without creating a new security context in the process.
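On Linux, one way to see the "no new context on fork + exec" behaviour is to compare the namespace identifiers of a parent and a child it spawns. This is a rough sketch, assuming a Linux host with /proc mounted (on other systems it compares two empty dictionaries); it only looks at namespaces, not at capabilities or cgroups, and the helper name `namespace_ids` is made up.

```python
import json
import os
import subprocess
import sys

NS_DIR = "/proc/self/ns"


def namespace_ids() -> dict[str, str]:
    """Map each namespace type to its identifier, e.g. 'pid' -> 'pid:[...]'."""
    if not os.path.isdir(NS_DIR):  # non-Linux host: nothing to compare
        return {}
    return {
        name: os.readlink(os.path.join(NS_DIR, name))
        for name in sorted(os.listdir(NS_DIR))
    }


# The child is a separate binary invocation started via fork + exec by subprocess.
CHILD_CODE = (
    "import json, os\n"
    "d = '/proc/self/ns'\n"
    "names = sorted(os.listdir(d)) if os.path.isdir(d) else []\n"
    "print(json.dumps({n: os.readlink(os.path.join(d, n)) for n in names}))\n"
)

parent_ns = namespace_ids()
child = subprocess.run(
    [sys.executable, "-c", CHILD_CODE],
    capture_output=True,
    text=True,
    check=True,
)
child_ns = json.loads(child.stdout)

# Unless something calls unshare(2) or clone(2) with namespace flags,
# the child stays in the parent's namespaces.
same_context = parent_ns == child_ns
print("child shares the parent's namespaces:", same_context)
```

Every process started inside a container inherits the container's namespaces the same way, which is why the boundary sits around the group rather than around each PID.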

Sandboxing would make perfect sense if single processes were always the granularity that "apps" existed at. Sometimes they are. Sometimes they're not, and people do complex things with IPC-capability-objects. And sometimes, when they're not, that fact pushes people toward avoiding multi-process architectures in favor of monolithic apps that reimplement functionality that already exists in some other program in themselves, in order to bring that functionality into their platform/runtime so it can live in their process.

Containers let people avoid this decision, by just applying things like capabilities at the container level, rather than at the process level.

> If someone can explain to me how isolating the system services in docker containers (which by the way are not isolated since they are privileged) does anything above and beyond what capabilities and the security features in OpenBSD provide then I'll concede the point.

Completely apart from the above, my understanding of Rancher is that it's the Docker part of "Docker container", not the container part, that provides the benefit there. Docker is a packaging and service-management system; that its packages use containers is frequently beside the point. Rancher's system services are Docker images (i.e. Docker "packages"), and so you use Docker tooling to create, distribute, manage and upgrade them. If your own application on such a system is managed through Docker, this provides a neat solution to unifying your operations—you just do everything through the docker(1) command.


Ugh, "apps". If ever there is a tortured term in computing these days it's that one.


Okay, how about "a purchased software product launched through a GUI"? There's no guarantee that it's a single process, but there's an assumption that it's a single security context.


Well, at Google, where a lot of development on containers took place, the problems with process sandboxing and language-level VMs were machine resource allocation, both in memory and in I/O bandwidth. Let's say you have three "systems" on the box: one is a collection of processes providing elements of a file system, one is a collection of processes providing computation, and one is a collection of processes providing chat services. Now you want to allocate half of the disk I/O to the file services and half to the compute system, then 75 percent of the network bandwidth to the chat system, 20 percent to the file system, and the remainder plus any "unused" to the compute system. You want half of the memory in the system to go to the compute system and give the rest to the network file system processes.

That is a complex mix of services running on a machine, some sharing the same flavor of VM, and you're allocating fractions of the total available resource capability to different components. If you cannot make hard allocations that are enforced by the kernel you cannot accurately reason about how the system will perform under load, and if one of your missions is to get your total system utilization to be quite high, you have to run things at the edge.
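The split described above can be written out as plain arithmetic. The sketch below only computes the static shares; it does not model the "plus any unused" redistribution at runtime, and it enforces nothing (on Linux that job falls to the kernel, for example via cgroups). The table layout and the `allocate` helper are illustrative assumptions.

```python
from fractions import Fraction

# Explicit shares per resource, taken from the example in the comment.
SHARES = {
    "disk_io": {"file": Fraction(1, 2), "compute": Fraction(1, 2)},
    "network": {"chat": Fraction(3, 4), "file": Fraction(1, 5)},
    "memory": {"compute": Fraction(1, 2)},
}

# Which system receives whatever is not explicitly assigned.
REMAINDER_TO = {"network": "compute", "memory": "file"}


def allocate(resource: str) -> dict[str, Fraction]:
    """Return each system's fraction of a resource, handing the remainder to its designated system."""
    shares = dict(SHARES[resource])
    assigned = sum(shares.values(), Fraction(0))
    if assigned > 1:
        raise ValueError(f"{resource} is oversubscribed: {assigned}")
    leftover = 1 - assigned
    if leftover:
        target = REMAINDER_TO[resource]
        shares[target] = shares.get(target, Fraction(0)) + leftover
    return shares


for resource in SHARES:
    split = allocate(resource)
    print(resource, {system: str(share) for system, share in split.items()})
```

With these numbers the compute system is guaranteed only one twentieth of the network, which is why knowing that the limits are hard matters when reasoning about behaviour under load.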


Because process sandboxing turned out to be insufficient, and we need layered sandboxing with several levels of protection for security.


"it is kind of waste to have 'n' copies of the OS loaded, so containers are a 'semantic' fractioning of the resources where the OS is common but the set of processes are unique to a client. You still need a way to allocate from the single set of resources so containers provide that abstraction."

Agreed. What I find so odd is that this problem was simply and elegantly solved in BSD with 'jail' and everyone went about their business.

I do not understand why (what appears to be) the linux answer to 'jail' is so complicated and fraught and the subject of so much discussion.

I am not sure that containers and their build scripts represent the $huge_profit_potential that people think they do ...


Do you feel the same way about visibility in OOP? E.g. private data and methods?


Containers are just unifying the concepts of namespaces and cgroups into environments -- I'm not sure that there's a better way to do that at the OS level, as I enjoy the primitives being separate and having the potential to remix them as my understanding of containerization evolves. (Okay, so there's a few other things mixed in like SELinux settings.)

I think there's interesting ideas of environments in other operating systems, but I'm not seeing how you'd make things better by flattening containers into the OS, per se.


One obvious tweak would be to make it so that every bit of isolation that containers now get by default, you instead get per POSIX process-group/session or somesuch, so you don't have to think in terms of containers to get the benefit of containers—they're just something the OS does transparently whenever you make it clear that a set of processes forms a distinct, separate cluster.

Making existing programs compatible with such a paradigm probably wouldn't be any more work than e.g. adding SELinux/AppArmor support.


Nah, that would be too easy. Let's instead introduce yet another notion of sessions (logind).


(I'm not saying this is a good thing but ...) Containers are often used where you want to run several programs, but each depends on a mutually incompatible set of libraries. Two programs need python-somelib-1.0 and python-somelib-2.5, but both versions can't be installed at the same time. Or you need to upgrade the programs at different times and during the upgrade window they'd depend on different versions of python-somelib.

Now of course the correct way to solve this would be (a) to make both programs use the same version of python-somelib, (b) make upgrades happen atomically, (c) for Python modules to actually have some API backwards compatibility. But in the absence of doing the right thing you can use a container to effectively static-link everything instead. And worry about the security/bloat/management another time.


You forgot possibility (d) allow multiple library versions to be present simultaneously.
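Possibility (d) can be illustrated with a toy "store" in which both versions sit in separate directories and each program is pointed at the one it needs. This is a sketch only: `somelib` is the hypothetical library from the parent comment, the directory layout is invented, and PYTHONPATH stands in for whatever lookup mechanism a real system would use.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Lay out two versions of the same module side by side, one directory each.
store = Path(tempfile.mkdtemp())
for version in ("1.0", "2.5"):
    lib_dir = store / f"somelib-{version}"
    lib_dir.mkdir()
    (lib_dir / "somelib.py").write_text(f'VERSION = "{version}"\n')


def run_with(version: str) -> str:
    """Run a tiny 'program' that imports somelib, resolved from the chosen version's directory."""
    env = dict(os.environ, PYTHONPATH=str(store / f"somelib-{version}"))
    result = subprocess.run(
        [sys.executable, "-c", "import somelib; print(somelib.VERSION)"],
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


old_program = run_with("1.0")
new_program = run_with("2.5")
print("program A sees somelib", old_program)
print("program B sees somelib", new_program)
```

Both programs run on the same system at the same time, each against its own version, with no container involved.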


See: Virtualenv (for Python), Snappy, Flatpak

The main advantage of docker, as far as my understanding goes, is more of a prebuilt system configuration thing. Need a database? Load the prebuilt PostgreSQL image onto the respective machine.

One thing I think should get more use in general is that Dockerfiles are essentially completely reproducible scripts[1]. Too many companies I've seen still use Word documents full of manual steps one can easily get wrong for all their machine setups (especially in the Windows world). If you want to test something quickly, you're bogged down for a day.

[1]: example: https://github.com/kstaken/dockerfile-examples/blob/master/m...


Over the last 15 years I've built and maintained Linux distros using apt, with custom debs and preseed files which handle configuration, automatic upgrades for most systems, and a small shell script and ssh for the rest.

Now it's all docker rather than packages, and ansible (which leaves no trace of what it's doing on the target machine) rather than a for i in `cat hosts` loop. Fine, but where's the benefit?


Note that I didn't make any value statement towards Docker - I don't have that much experience with it and found their infrastructure to be rather clunky, when I tried it. So, I'm not sure I'm qualified enough to make definitive statements about its usefulness.

I can see, however, a benefit in encapsulating different services in different containers, as this potentially gives you some control over them (available disk space, network usage etc.) * . On top of that, I imagine starting out on a "machine agnostic" approach can be rather useful if you have to change your network landscape further down the line: If your database already is configured as if it were running on its own server, there's certainly a bunch of unintended coupling effects you can avoid.

That said, I can't see Docker being the silver bullet it gets hyped up to be sometimes. But that goes for most new and shiny things in the tech space...

* And yes, that's already feasible without containers. Docker's approach to this seems to provide a way to do it in a much more automated way than most alternatives though.



> The main advantage of docker, as far as my understanding goes, is more of a prebuilt system configuration thing

I think the main advantage is that it standardizes the interface around the application image/container. This allows powerful abstractions to be written once rather than requiring a bespoke implementation for each way that the application is structured. Imagine writing the equivalent of Kubernetes around some hacked-together allows-multiple-versions-of-a-dependency solution. It would be a nightmare. But because the Docker image/container interfaces are codified, you can build powerful logic around those boundaries without needing to understand what's inside those images/containers. Dynamically shifting load, recovering from failures, automatic deployments and scaling are all much easier when you don't have to worry about what language the application is written in or how the application is structured.


There is a way to do this without the security/bloat/management issues. Take a look at the approach of GNU Guix. Both versions of a package can coexist in the system. Libraries are dynamically-linked and security updates can be applied fast (without building the world), using the "grafting" system.


Actually, Lennart proposed some years ago exactly how you could "fix" the OS to apply the ideas from containerisation. http://0pointer.net/blog/revisiting-how-we-put-together-linu...


This seems horribly complicated and ridiculous, especially since btrfs, which was a requirement, was pretty terrible when it was written.


At the time of writing, btrfs was seen as the hero to be. Things have changed quite a lot. Currently we are solving this problem with a combination of containers and an immutable filesystem (ostree). Anyway, solutions discussed in the open are always a good idea, however ridiculous they may be perceived to be. Lennart, a Club Mate to you...


Well the author is Lennart Poettering, the guy that "gave" the Linux world Pulseaudio, Avahi and Systemd...


It's nothing like that at all. It's just a minimal operating system with a small install image. The only special thing it does is replace the init process with a system-docker process to reduce the overhead of resources used by the operating system.

The point isn't really to "run more Docker". It's to eliminate as much operating system overhead as possible, so that nearly every CPU cycle and byte of memory usage is dedicated to your containers.


You realize you're not making any sense? Docker is another abstraction, so it cannot be better in terms of resource usage than running whatever process you are now running inside docker. You literally have more overhead with the docker approach in terms of all resource utilization. You now have all the overhead of an OS and then you are layering docker on top of it. The OS has not gone away. It is still managing processes, memory, files, sockets, networking, etc.


Rather it's replacing systems like systemd and various other system daemons with the docker equivalents.

For example, why run a network supervision daemon if dockerd or equivalent handles all the important complex pieces of networking via container orchestration? Why have a local package manager, or system port mapper?


I'm pretty horrified by the argument you're making. The reason all those things are separate things is because they serve orthogonal functions. By bundling all of that into a single binary how have you improved things? You've increased the attack surface, reduced stability, increased complexity, and made things a lot harder to test and verify.


Well, I can't speak to the horror of the design, as I'm not really a proponent of that particular design. Just clarifying the parent's statement a little. Though the RancherOS design is not particularly worse than what systemd does now, based on my recent experiences. It's all one giant (poorly?) implemented binary either way. From a pragmatic standpoint I don't see a difference. What's the difference between one opaque binary and another, except possibly that one's written in Go, which I find easier to read if needed and is less likely to have buffer overruns? Really, cutting out one crapshoot seems logical, as at least there's only one system you'd need to learn. Still, I'd prefer an approach that is neither of those options.

Personally I prefer running SmartOS and Triton containers. Their system seems much more stable than any of the Linux containers and/or systemd setups I've tried. It sticks a bit more to traditional unix design, which makes sense to me. Items like the caching layer for containers build on ZFS snapshots, a well tested file system layer, rather than ad hoc userland tools. Triton also runs all of the orchestration layers in separate zones (containers), like RancherOS is trying to build. But each component is a simple(ish) service, and it's easy to `zfs list` and check on a container's file system or fix it or back it up, etc. Same with SVC or VM machine management, which both have small simple tools that do one thing pretty well.

To that note, docker has been moving towards breaking out and using smaller daemons haven't they? If that continues it might turn out more modular in the end wherein RancherOS would end up being more modular than systemd Linux setups. Imho, that'd be great.


Have you ever used Rancher OS? Not trying to sound like a judgemental jerk, but the arguments you are making are clearly from someone who is commenting based on a preconceived notion of what it is, without knowing what it really is. Instead of combating others in comments because they can't succinctly describe it to you, go research it instead. If you don't care enough to look into it, you shouldn't be caring to argue in the comments.


Not to sound like a judgmental and experienced jerk back, but how many container orchestration systems have you built and run in production? Since I have first-hand experience in building, hacking, and working around all the limitations of a few such systems deployed into production environments, I think I know all there is to know about RancherOS.

By all means continue to run RancherOS and let me know how that goes when you're managing a few hundred to maybe upwards of a thousand VMs and then layering a container orchestration system with the underlying VMs coming and going on an on-demand basis. I remember thinking "I really wish I had more of this docker stuff in the OS itself. Because dealing with all the caching, volume mounting, and instability in userspace is so much fun".

I'll await your report because clearly my experience with these systems and all the ways they fail is too combative for your taste. There are a few things they don't tell you on the brochure when you're drinking the kool-aid.


It's simply billed as the easiest way to run Docker, well: "The smallest, easiest way to run Docker in production at scale."

I might give it a crack based on that. If you are hell bent on running Docker then an equivalent of the Ubuntu minimal install seems a good way to start.


Your comment made me more interested in this solution than I was at first glance. I tend to prefer pragmatism over elegance, which is why I like using a web browser as a runtime, for instance. It'll work everywhere.

If this idea has similar pragmatic advantages over the would-be best solution, then I'm game.


Have a look at http://genode.org/


>Kind-of like implementing a filesystem on top op a filesystem

yep, when you want a distributed filesystem, building it on top of a good and already working local filesystem is a pretty robust and cheap approach.

>smell like a system on top of a system to fix something that could be fixed in the system.

The container layer is basically a distributed OS. Making a distributed OS by "fixing in the system" is pretty much a non-starter ... or a huge academic project.


How is docker a distributed OS?


Creator of RancherOS here. Thanks for the interest in our tiny distro. RancherOS was created at the beginning of 2015 and at the time was quite a novel concept. We strove not just to use container technologies in a Linux distro but to actually package everything as standard Docker containers. Fast forward two years, and what we were doing back then is now becoming the accepted practice. Most major distros are adopting more container packaging approaches in the form of flatpak, snap, and containerd. RancherOS 1.0 LTS was released a couple of weeks back and we have started development on 2.0. 1.0 was a bit ahead of its time and honestly had to employ a lot of techniques we didn't like to make it work. With 2.0 we will shift the focus from Docker to containerd, OCI, and LinuxKit, which will allow a much cleaner design.


Will there be an ARM version?


We released some versions for ARM but in the end ARM requires too much effort to support right now. As ARM64 matures and Docker gets full multi arch image support we'll revisit this.


This is clearly a trend, though it remains to be seen if it will garner enough acceptance to actually be "the future". systemd supports launching container-based services via nspawn and already namespaces "legacy" services very heavily. In fact, systemd et al were among the heaviest early drivers of cgroup technology for cleaner starting and stopping of groups of processes.


It very much is a trend, a trend by developers, for developers. All this is so that some primadonna developer somewhere can use the latest languages and frameworks to build his "app" without some sysadmin or other users of the available computing resources getting in his way.

It's about kicking the question of static vs dynamic can up the stack, as now you have something that is a hodgepodge of dynamic bindings that seems to behave like something static as long as you do not look behind the curtains. Oh, and don't mind the turtles...


> This is clearly a trend

More like a virus.


I am starting to wonder, why not just execute processes directly with cgroups commands?

  $ cgcreate -g memory,cpu:groupname/foo
  $ cgexec    -g memory,cpu:groupname/foo bash
https://wiki.archlinux.org/index.php/cgroups

It's the bare basics that libvirt, Docker et al. are based on anyway. So if you want to run just one process per "container", it seems rather logical to keep it simple and use cgroups commands directly. (Similarly on Windows, using just Sandboxie is so simple. Or do it like Android and execute every app/process as a different user.)

https://en.wikipedia.org/wiki/Cgroups
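
To check what the `cgexec` example above actually changes, a process's cgroup membership can be read back from `/proc/<pid>/cgroup`. Below is a minimal Python sketch, assuming the `hierarchy-ID:controller-list:cgroup-path` line format documented in cgroups(7). The helper names `parse_proc_cgroup` and `cgroup_of` and the sample text are made up for illustration; they are not part of any cgroup tooling.

```python
from typing import List, Optional, Tuple

Entry = Tuple[int, List[str], str]


def parse_proc_cgroup(text: str) -> List[Entry]:
    """Parse the contents of /proc/<pid>/cgroup.

    Each line has the form "hierarchy-ID:controller-list:cgroup-path".
    On cgroup v2 the ID is 0 and the controller list is empty.
    """
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The path may itself contain ':', so split at most twice.
        hierarchy_id, controllers, path = line.split(":", 2)
        names = [c for c in controllers.split(",") if c]
        entries.append((int(hierarchy_id), names, path))
    return entries


def cgroup_of(entries: List[Entry], controller: str) -> Optional[str]:
    """Return the cgroup path for a v1 controller, or None if not listed."""
    for _, names, path in entries:
        if controller in names:
            return path
    return None


# Hypothetical sample, shaped like what a shell started with
# `cgexec -g memory,cpu:groupname/foo bash` might report on cgroup v1.
SAMPLE = """\
4:memory:/groupname/foo
3:cpu,cpuacct:/groupname/foo
1:name=systemd:/user.slice/user-1000.slice/session-2.scope
"""

if __name__ == "__main__":
    for hierarchy_id, names, path in parse_proc_cgroup(SAMPLE):
        print(hierarchy_id, ",".join(names) or "(v2)", path)
```

On a real Linux machine, passing the contents of `/proc/self/cgroup` instead of `SAMPLE` shows where the current process sits.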


Managing cgroups directly would be redoing a lot of work already done by libcontainer/runc (which we leverage through Docker now, probably containerd in the future). Using these projects in RancherOS allows us to focus on other things and not reinvent the wheel.


A lot of different concepts got conflated into "containers", but a big value add of systems like Docker over this is the use of immutable FS images with built-in distribution mechanisms. Just like it's nice to have git remotes and porcelain around them vs. the limited utility of git if you couldn't push and pull.


systemd already puts each service in a separate cgroup AFAIK, so commands like cgexec aren't even needed. I suspect people are more interested in namespace separation, but recent versions of systemd can also do that [1]. I don't think systemd has image management so that's still a reason to use Docker.

[1] https://www.freedesktop.org/software/systemd/man/systemd.exe...


Systemd creates three "slices": system, user and machine. Daemons started by systemd go in system, user processes in user and virtual machines go in machine.

Each process is placed in a hierarchy so you end up with system-httpd which makes it easy to assign or limit resources based on the slice.

Red Hat covers it extensively in their performance and tuning class.
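
The hierarchy is encoded in the slice name itself: per systemd.slice(5), the dashes in a slice name spell out the path from the root slice, so a `system-httpd.slice` would sit inside `system.slice`. A rough Python sketch of that mapping follows. `slice_to_cgroup_path` is a made-up helper for illustration, and it ignores systemd's escaping rules for literal dashes in names.

```python
def slice_to_cgroup_path(slice_name: str) -> str:
    """Expand a systemd slice name into its relative cgroup path.

    "a-b.slice" lives inside "a.slice"; "-.slice" is the root slice.
    """
    suffix = ".slice"
    if not slice_name.endswith(suffix):
        raise ValueError(f"not a slice unit name: {slice_name!r}")
    stem = slice_name[: -len(suffix)]
    if stem == "-":
        return ""  # the root slice maps to the top of the hierarchy
    parts = stem.split("-")
    if not all(parts):
        raise ValueError(f"empty path component in {slice_name!r}")
    # Every dash-separated prefix is itself a slice on the way down.
    prefixes = ["-".join(parts[: i + 1]) + suffix for i in range(len(parts))]
    return "/".join(prefixes)


if __name__ == "__main__":
    for name in ("system.slice", "system-httpd.slice", "-.slice"):
        print(f"{name!r} -> {slice_to_cgroup_path(name)!r}")
```

Resource limits set on a parent slice then apply to everything nested below it, which is what makes per-slice limits convenient.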


In the case of system services, it is probably acceptable (perhaps even ideal) to leave the image management up to deb, RPM, or ostree. systemd shouldn't do everything :) .


Systemd technically does include rudimentary image management through systemd-importd and machinectl.


Linux's control groups are a resource management system, an attempt to do resource management better than the POSIX rlimit mechanism. They are not job objects, and do not do all of the things that jobs do (on operating systems that have them). However, a job object is the kind of mechanism that people want for this sort of thing.

* http://jdebp.eu./FGA/linux-control-groups-are-not-jobs.html


Because you can't make billions of $ doing that?


Unfortunately, systemd/nspawn does not benefit from the hype Docker garners, despite being infinitely better. This industry is becoming more and more hype- and cargo-cult-driven, instead of making sane technological choices.


Being easier to try, having clear documentation/marketing, and having a bigger community are also forms of "better". If something is so infinitely technically better then winning is "just" a matter of creating on-ramps to ease its adoption.


systemd's documentation is second to none: https://www.freedesktop.org/wiki/Software/systemd/


Only if one sets the bar quite low, and has very lax standards for doco. Unfortunately, people often do set the bar low in the Linux world. But to those from other worlds the descriptions that come to mind are "acceptable" and "mediocre". As people have pointed out passim over the years, the quality of doco expected as the norm in the worlds of the BSDs and the commercial Unices is noticeably higher than in the Linux world. By those standards, "second to none" is most definitely an exaggeration.

Sadly, "treated as an afterthought" is all too often still applicable, as well; this also being a disease of Linux doco that it hasn't wholly shaken off. The culture of updating the doco in lockstep when the software changes hasn't really taken a firm root, alas.

Just one example of such doco problems is a systemd issue where the doco does not tell the issue raiser that the entire basis for the issue is wrong. Users have to resort to finding commentary hidden in the source code. Raised as a documentation issue, it requests a documentation change to warn users of something that is not in fact the case at all. Ironically, the true doco issue is actually that it is deficient, and the correct doco change would be to move the commentary into the manual where users can easily see it.

* https://github.com/systemd/systemd/issues/5735

* https://www.freedesktop.org/software/systemd/man/systemd-tim...

* https://github.com/systemd/systemd/blob/5f36e3d30375cf04292b...

* http://jdebp.eu./FGA/systemd-documentation-errata.html


Can you give an example of a better-documented open-source project that is less than 5 years old and has the same level of complexity of the systemd family?


> despite being infinitely better

Genuinely curious, how is it "infinitely better"? I'm considering potentially switching away from Docker on my production boxes to something else, but I've mainly only heard about rkt when researching.


I feel exactly the same way about systemd as you seem to feel about docker.


>despite being infinitely better

How and why?

You're being condescending. I like Docker and if anything you make me not want to try systemd-nspawn with this attitude.


Great job proving the second part of what he said


Put up a sentence or two of informative discussion, or links of any kind; this is no way to advance the state of a thread.

"Go on, try it" doesn't help me in any way. I believe that Docker has more attention, you might call it hype. I'd say "eyes" instead. We have been shown a way to run everything in Docker here: the parent link of the thread (RancherOS). That's great, I already went ahead and tried it. I'm still waiting to be convinced that I should try nspawn instead.

Without your help, I won't even know what OS Distro I can download to try it, let alone why it's better.


Docker also is less invasive. You do not need to build the distro around docker to use docker...


This hasn't always been so, as historically a kernel patched to support aufs was a requirement. It's easier to install docker in 2017 as much of the plumbing shared by the container ecosystem (including nspawn) has become ubiquitous.


You do not know an operating system that has systemd? Given https://news.ycombinator.com/item?id=13715574 that seems very improbable.


I was fishing for an example of any OS or orchestrator that uses systemd-nspawn.

[1] shows me how to run systemd-nspawn by itself.

[2] shows me that it's basically just like chroot when it comes to a user experience.

When do I get to the part that's better than the entire ecosystem of schedulers and orchestration tools that has sprung up built on and around Docker? Are all of those companies wrong? (Are you trying to tell me it's all just hype and I should put everything into the hands of one competent sysadmin that manages nspawn and systemd?) I could be convinced of that, but I just don't see anyone doing that. I guess that's actually what was meant by cargo cult.

This all really just makes me want to go out and spend some more time looking at Rkt instead. We're all not even remotely convinced that this is better. Where is the mantl.io built on systemd-nspawn?

[1]: https://wiki.archlinux.org/index.php/Systemd-nspawn

[2]: https://rich0gentoo.wordpress.com/2014/07/14/quick-systemd-n...

Edit: I am still going to upvote you because you went to the trouble of going through my post history.


Well, between a good technology with a lot of hype and a better technology with no hype except for a few condescending people, how am I supposed to choose the latter? Tell me.

I just read the ArchWiki page on systemd-nspawn[1] and I fail to see how it is any better by the way. It just looks way harder to use (Docker images vs packages, scripts and per distro instructions ; docker create, docker start, docker ps, docker logs vs pacstrap, systemd-nspawn, machinectl, journalctl) and honestly not very different technically. systemd-nspawn just looks like a less user-friendly Docker to me.

[1]: https://wiki.archlinux.org/index.php/Systemd-nspawn


Would you prefer if all those commands were prefaced with `systemd`? Because that's all there is to it compared to docker in your example, then.

You're seeing condescension where there is none. I'm just pointing out facts. It's okay, Docker runs on hype, and apparently so do you. But then, I can't expect Red Hat to invest into advertising for a core system component, because developers ought to be aware of it.

nspawn also offers faster startup time, better integration with cgroups and chroot jails, etc.


>Would you prefer if all those command were prefaced with `systemd`?

Well, I'm fine with journalctl and machinectl as they're part of systemd. I'm not really fine with having to install respectively arch-install-scripts, debootstrap+debian-archive-keyring, debootstrap+ubuntu-keyring to run an Arch, Debian or Ubuntu container. What if I want to run something like CentOS or Alpine?

>But then, I can't expect Red Hat to invest into advertising for a core system component, because developers ought to be aware of it.

That's why Docker has the market. systemd is huge and scary, developers see it as a sysadmin only component. You cannot expect developers to know systemd without explaining it to them in a way they can understand.

>nspawn also offers faster startup time

Is Docker slow? Starting a container is usually instantaneous. Maybe the engine? For me it's managed with systemd and its weird socket binding, and it's pretty fast too. Fast is good, but I can't remember thinking "wow, docker is slow".

>better integration with cgroups and chroot jails

How? Why do I need this better integration?

- - -

I'm convinced there are not a lot of things Docker cannot do in comparison to systemd-nspawn. On the contrary, with systemd-nspawn:

- how do I spawn a container remotely?

- how do I share my "images"? is there an easy way to bundle the app I want to isolate? something at least kinda portable between Linuxes, so no .deb/.rpm

- can I include a file in my source code and tell my users something like "run docker build, then docker run and you're good to go"?

- my sysadmins just gave me the rights to run the docker command (we configured the user namespace so that I'm not indirectly root on the host), would it be that easy for them with nspawn?

- say I want a specific dependency, redis for example. Can I do something as simple as `docker run -p6379:6379 -v/data/redis:/data --name redis redis` or would I have to manually install the redis in the nspawn?


I've been using RancherOS for a few weeks now and I'm quite delighted with it. A confusion that occurred at first for me was the distinction between Rancher (the project/company), Rancher (the application) which can be used for installation and distribution, and RancherOS, which is the concept of replacing the Init process with Docker. Docker is used in separate instances - one to manage the containers that make the OS, and one for the remaining containers that make your apps.

I think the advent of Docker Swarm probably put a crimp on development and use of Rancher (the app). To me the way forward is Docker's own clustering tools, and the ease of standing up a cluster of Atom processors at www.packet.net where they install (as an option) RancherOS is very attractive.


I'd really love to see some of this stuff transition to the desktop too.

Like, for example, containerize Skype, so that it can't read my home. Or contain Firefox to just read `~/.mozilla` and `~/downloads`.

For binary blobs I don't trust that much, I'd really value this.

For FLOSS stuff, it still provides protection from bugs.


Problem is it's sometimes more annoying than helpful, e.g. having to copy files into the ~/.mozilla directory in order to be able to upload them to Dropbox via Firefox, or attach to an email.


Look at http://flatpak.org for precisely that.


Nope, it does a lot more: it re-packages software, and basically shoves a second package manager down my throat; one that actually bundles dependencies within the package, carrying along all the issues that that flow brings with it.

I want to isolate data, not libraries. Libraries are there to be shared.


Qubes (https://www.qubes-os.org/) is a good fit for this.


Yeah, creating a new OS/distribution will never fix the problem. You can't tell people "Oh, just wipe everything clean and install this other OS".

It needs to build on top of what we have, otherwise adoption will never take place.


Apparmor does that for you. Or selinux, depending on your distro.


Firejail.


I don't understand why running software on bare metal is viewed as a problem to be solved.

how many layers of abstraction are necessary, and why?

Obviously virtualizing serves a valuable purpose.

Making development more accessible is great. Simplistic dev services like this mean reliance on others' infrastructure, and being bound to the cloud.

Doesn't seem forward thinking.

Can you imagine if Google had decided to run their search app on Microsoft servers?


The basic thing is that we have ended up with a world of rock star code monkeys. And those rock stars can't be held back by some admin or exec saying no to using some hot new language or framework...


CoreOS works the same way. All containers. You can run `toolbox` to get into a systemd-nspawn'd Fedora container (any other container can be specified; it's just Fedora by default), from which you're supposed to do all your troubleshooting/analysis (caveat: systemd-nspawn does not seem to support `auditd` well).

I still strongly dislike "containers". It's not worth the complexity or instability. Two thumbs way down!


> CoreOS works the same way. All containers.

Does it though? I use CoreOS without containers (for the nice auto-updates/reboots), and it works really well with just systemd services. I'm aware the branding sells it this way (esp. the marketing rebrand as Container Linux or whatever), but does it run any containers as part of the base system? I've found CoreOS with containers not very reliable, and CoreOS without containers extremely reliable.

Since I use Go on servers which has pretty much zero dependencies, what I'd really like to see is the operating system reduced to a reliable set of device drivers (apologies to Andreessen), cloud config, auto-update and a process supervisor. That's it.

Even CoreOS comes with far too much junk - lots of binary utils that I don't need, and I'd prefer a much simpler supervisor than systemd. Nothing else required, not even containers - I can isolate on the server level instead when I need multiple instances, virtual servers are cheap.

CoreOS is the closest I've seen to this goal, the containers stuff I just ignored after a few tests with docker because unless you are running hundreds of processes, the tradeoff is not worth it IMO. Docker (the company) is not trustworthy enough to own this ecosystem, and Docker (the technology) is simply not reliable enough.

The OS for servers (and maybe even desktops) should be an essential but reliable component at the bottom of the chain, instead of constantly bundling more stuff and trying to move up the stack. Unfortunately there's no money in that.



Yeah, I agree and I think that FreeBSD jails in particular are much better (to be fair, I am not very well informed on Solaris Zones, so maybe they're the best). They are certainly much less ostentatious and do not try to redo everything for their own little subworld like Kubernetes does.

I sat down one day to try to write down what would make Linux containers/orchestration usable and good, and realized after about 20 minutes that I was describing FreeBSD jails almost to a T. The sample configuration format I theorized is very close to the real one.

However, I think that there's good reason for actual deployments of containerized systems to remain niche, as they did until the VCs started dumping hundreds of millions into the current Docker hype-cycle, and the big non-Amazons jumped on board as a mechanism to try to get an advantage over AWS.

What people really want are true VMs nearly as lightweight and efficient as containerized systems. In fact, I think many people wrongly believe that's what containerized systems are.


> What people really want are true VMs nearly as lightweight and efficient as containerized systems

Like what QubesOS is trying to do?


Could you elaborate on the instability concerns? What kind of workload are you running on containers?


Sure. Here's one example that I've dealt with in the last week.

We have a server that receives the logs from our kubernetes cluster via fluentd and parses/transforms them before shipping them out to a hosted log search backend thingy. This host has 5 Docker containers running fluentd receivers.

This works OK most of the time, but in some cases, particularly cases when the log volume is high and/or when a bug causes excessive writes to stdout/stderr (the container does have the appropriate log driver size setting configured at the Docker level), the container will cease to function. It cannot be accessed or controlled. docker-swarm will try but it cannot manipulate it. You can force kill the container in Docker, but then you can't bring the service/container back up because something doesn't get cleaned up right on Docker's insides. You have to restart the Docker daemon and then restart all of the containers with docker-swarm to get back to a good state. Due to https://github.com/moby/moby/issues/8795 , you also must manually run `conntrack -F` after restarting the Docker daemon (something that took some substantial debug/troubleshooting time to figure out).

We've had this happen on that server 3 times over the last month. That's ONE example. There are many more!

Containers are a VC-fueled fad. There are huge labor/complexity costs associated and relatively small gains. You're entering a sub-world with a bunch of layers to reimplement things for a containerized world, whereas the standard solutions have existed and worked well for many years, and the only reason not to use them is that the container platform doesn't accommodate them.

And what's the benefit? You get to run every application as a 120MB Docker image? You get to pay for space in a Docker Registry? Ostensibly you can fit a lot more applications onto a single machine (and correspondingly cut the ridiculous cloud costs that many companies pay because it's too hard to hire a couple of hardware jockeys or rent a server from a local colo), but you can also do this just fine without Docker.

Google is pushing containers hard because it's part of their strategy to challenge Amazon Cloud, not because it benefits the consumer.


I think you are suffering a bit from the over-engineering of Kubernetes and Docker and throwing the baby out with the bath water. Containers in general are great for simplifying deployment, development and testing. We use docker currently and it works great, but we are using just docker and using it only as a way to simplify the above. We are deploying 1 application to 1 system (EC2 via ASG).

There is also nothing tying you to Docker for containers. LXC also works great and it has no runtime, so you have none of the stability issues you can get with Docker. Though I must say Docker has improved a lot and I think it will stabilize and _it_ won't be an issue (not as sure about Kubernetes).


Sure, I agree that both Docker and k8s (at some level, k8s probably had to have a lot of that complexity to interface with Docker) are overengineered, and that there are better containerization processes/runtimes.

But I still don't think containers are what most people want. People need/want ultra-lightweight VMs with atomized state. NixOS looks promising but I haven't used it yet. It seems to give you a way to deterministically reason about your system without just shipping around blobs of the whole userland. You can also approximate this on other systems with a good scripting toolkit like Ansible.


All I want is a way to encapsulate an application and its dependencies as a single artifact that I can run, test and deploy. Right now containers are the best way to achieve this but I'll probably be happy with any solution to this problem.

NixOS does look interesting and I've considered playing with it for personal projects, but IMO it is still too fringe for use at work where you need both technical usefulness and a general consensus that it is appropriate (i.e. mindshare).


It seems deploying thousands of ultra-lightweight VMs with atomized state would still require an orchestration layer. I don't follow how that would remove complexity and/or improve stability.


It removes complexity because you can already use a lot of stuff that exists. Kubernetes has established itself as a black box.

Kubernetes has the concept of an "ingress" controller because it has established itself as the sole router for all traffic in the cluster. We already have systems to route traffic and determine "ingress" behind a single point (NAT). Kubernetes also manages all addressing internally, but we have technologies for that (DHCP et al). Kubernetes requires additional configuration to specify data persistence behavior, but we have many interfaces and technologies for that.

VMs would be able to plug into the existing infrastructure instead of demanding that everything be done the Kubernetes way. It reduces complexity because it allows you to reuse your existing infrastructure, and doesn't lock you in to a superstructure of redundant configuration.

kube is very ostentatious software in this way, and it makes sense that they'd throw compatibility and pluggability to the wind, because the strategic value is not in giving an orchestration platform to AWS users, but rather to encourage people to move to a platform where Kubernetes is a "native"-style experience.

As for orchestration, people were orchestrating highly-available components before Kubernetes and its ilk. Tools like Ansible were pretty successful at doing this. I have personally yet to find executing a `kubectl` command less painful than executing an Ansible playbook over an inventory file -- the only benefit would be faster response time for the individual commands, though you'd still need a scripting layer like Ansible if you wanted to chain them to be effective.


Right, and this orchestration layer is supposed to be designed for resilience and stability from the start, because it's an architectural problem that cannot be fixed later without a complete redesign of everything. There is one such proven architecture - supervision trees and it addresses both problems: complexity and stability. VMs are not even necessary for this, nor are cloud platforms.


I think at this point it's accepted that supervision trees are an app or runtime concern. Operating systems and distributed platforms have to support a wider range of architectural patterns.

Disclosure: I work on a cloud platform, Cloud Foundry, on behalf of Pivotal.


Supervision trees have enough flexibility to support any architecture pattern imaginable, you got it kind of backwards. They are like trees of arbitrary architecture patterns. The idea is to limit the scope of any possible problem, so that it only affects a tiny part of the system, but at the same time reducing complexity by only having to deal with little responsibility in each supervisor. Kind of a tiny orchestration system for each service, instead of a centralized one.
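
For readers who have not met the idea, here is a toy Python sketch of a supervision tree using a one-for-one restart strategy, in the spirit of Erlang/OTP supervisors but not modelled on any real API. The `Supervisor` class, the `flaky` helper, and the service names are all invented for illustration; a real system would supervise processes rather than plain function calls.

```python
from typing import Callable, Dict, List


class Supervisor:
    """One-for-one supervisor: a failing child is restarted on its own.

    When a child exceeds max_restarts, the supervisor gives up and raises,
    handing the decision to whichever supervisor sits above it.
    """

    def __init__(self, name: str, max_restarts: int = 3) -> None:
        self.name = name
        self.max_restarts = max_restarts
        self.children: Dict[str, Callable[[], None]] = {}
        self.restarts: Dict[str, int] = {}
        self.log: List[str] = []

    def add(self, child_name: str, run: Callable[[], None]) -> None:
        self.children[child_name] = run
        self.restarts[child_name] = 0

    def run_child(self, child_name: str) -> None:
        run = self.children[child_name]
        while True:
            try:
                run()
                self.log.append(f"{self.name}: {child_name} finished")
                return
            except Exception as exc:
                self.restarts[child_name] += 1
                if self.restarts[child_name] > self.max_restarts:
                    self.log.append(f"{self.name}: giving up on {child_name}")
                    raise RuntimeError(f"{self.name} failed") from exc
                self.log.append(f"{self.name}: restarting {child_name}")

    def run(self) -> None:
        # A (re)started supervisor begins with fresh restart counters.
        self.restarts = {child_name: 0 for child_name in self.children}
        for child_name in list(self.children):
            self.run_child(child_name)


def flaky(fail_times: int) -> Callable[[], None]:
    """Return a task that crashes fail_times times, then succeeds."""
    state = {"left": fail_times}

    def run() -> None:
        if state["left"] > 0:
            state["left"] -= 1
            raise RuntimeError("crash")

    return run


def build_tree(worker_failures: int = 2):
    """root supervises two small supervisors, each owning one service."""
    web = Supervisor("web")
    web.add("worker", flaky(worker_failures))
    db = Supervisor("db")
    db.add("store", flaky(0))
    root = Supervisor("root")
    root.add("web", web.run)
    root.add("db", db.run)
    return root, web, db


if __name__ == "__main__":
    root, web, db = build_tree()
    root.run()
    print("\n".join(web.log + db.log + root.log))
```

The crashes of `worker` are absorbed by the `web` supervisor alone; `root` and `db` never see them. Only when `web` exhausts its restart budget does the failure travel one level up, which is the scope-limiting behaviour described above.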


> You get to pay for space in a Docker Registry?

Nonsense. You can run your private Docker registry or if you want to support stuff like authentication and access control use Sonatype Nexus. Both open source.

> but you can also do this just fine without Docker

Not as easily. You'd need to use VMs with all their associated costs (especially if you use VMware) to provide proper isolation, and the hosting department usually will have day-long phone calls with the devs to get the software to reproducibly run, and god forbid there's an upgrade in an OS library. No problem there with Docker, as the environment to the software is complete and consistent (and if done right, immutable).


We run a "private Docker registry" ... backed by Elastic Container Registry. I'm sure this is the case with the vast majority. You're certainly right that it's possible, but it's about as likely as using cloud platforms in a rational way to start with (elastic load scaling only).


So your problem is with Docker, not containers as a concept.


Yeah, kind of. The complexity of the current iteration is not wholly the fault of Docker, though I'm sure some of the utilities had to increase complexity to work well with Docker (k8s just barely got support for other container platforms). This was a story about an annoyance/bug/issue with Docker, but I have annoyances with other things too.

Some people did not know how to do any server management before kubernetes became a big deal, so they think kubernetes is the only way to do it. For the rest of us, I don't think there's a lot of value brought by this ecosystem.


I have been running rkt on CoreOS and it has been pretty stable.


Probably it was not the vision of these types of projects, but this reminds me a lot of Qubes OS[1]. I actually have occasionally used docker (lxc) to run some applications that I was not trusting, and I was controlling them using cgroups. Right now my chrome browser is running like that.

[1] https://www.qubes-os.org/


I have to say this enrages me. The system services are still privileged containers and we are now basically emulating a micro-kernel (very badly, I might add) with a monolithic kernel. If you want to use a micro-kernel then use a fucking micro-kernel. Hacking a micro-kernel with docker is not the right approach, especially given the stability track record of docker itself. It's a hack and aesthetically unpleasant on all sorts of levels. Not the least of which is that docker itself is one giant hack.


You have serious problems if this enrages you. Let other people hack in peace. Go do your own "right" thing and leave the rest of us alone.


Didn't Docker release this themselves (different project) just a week or so ago? I forget the name, and can't seem to find it on their site.


There's definitely overlap in functionality between RancherOS and LinuxKit, but there are also a few pretty big differences.

- LinuxKit seems designed to be a piece that can be used to build a Linux distro but isn't a full distro out of the box like RancherOS

- As far as I know LinuxKit is still based on Alpine whereas RancherOS is custom and doesn't have much of a host filesystem

- LinuxKit is based on containerd and RancherOS is still based on Docker (though this is likely to change soon)

We're definitely interested in collaborating with LinuxKit since we do have similar goals. It's probably a good idea for us to write a more detailed blog post comparing the two since we've been getting this question pretty often lately.


LinuxKit is not based on Alpine - earlier versions before open sourcing were. We build our system containers from Alpine though, so there is still a connection. (I work on LinuxKit).


Thanks for the correction! That's good to know.

We considered using Alpine as the base image for our system containers for a time. They're still built using Buildroot currently, though we're playing around with another project that we might use instead.


Rancher has been around for a while now, at least a few years.

http://rancher.com/getting-started-with-rancheros/



this project is pretty old though. i remember playing with an earlier version sometime around august 2015


I guess RancherOS is not something "new". The first time I used it was in 2015...

Anyway, it seems its design has inspired some people recently.


for the ignorant, why would I want to wrap docker inside docker?


This is not Docker-in-Docker. This is an OS that acts like a "container hypervisor". It provides the bare minimum for hardware interfacing and hosting containers, just as a conventional hypervisor provides the bare minimum for hardware interfacing and hosting virtual machines. The benefit is that more system resources are available for your clients (containers, in this case).

I haven't used RancherOS but CoreOS works mostly-fine. However, I would avoid using these things altogether because containerization sucks.


I'd suggest that containerisation as a principle is fine, but it's not what most people want.

What people want is a mainframe where a lump of code is guaranteed to run the same every time, regardless of the machine state, and if something goes wrong with the underlying layer it self-heals (or automatically punts to a new machine, state intact).

What we have currently is a guarantee that the container will run, and once running will be terminated at an unknown time.

Mix in distributed systems (hard), atomic state handling (also hard) and scheduling (can be hard), and it's not all that fun to be productive for anything other than a basic website.


they actually mention it in their readme.

> it seemed logical and also it would really be bad if somebody did docker rm -f $(docker ps -qa) and deleted the entire OS

or are you asking why anyone would want a 'docker-os', which has everything but the docker daemon as a container?


No no, I get the coreos idea, of providing libc, kernel, docker service and nothing else.

I saw that line, but since if you want to run docker at scale you'd have each execution node under the tight control of a scheduler, it seemed like a small edge case.

As I said, for the ignorant such as myself, why would I choose this over coreos?


no, it's the OS services that are dockerized. so your containers won't be docker inside docker.


I still don't see any clear reason for doing this. I mean, "because we can" is interesting in a sort of academic way I guess.


Containers compartmentalize runtimes with their dependencies, eliminate library and version conflicts. That's just one benefit.


To separate all the OS-service containers, from the user-service containers.

I guess this is mostly so you don't accidentally delete OS-containers (like ntpd) when trying to delete all your containers.


for science


That's ungodly.



congrats, you predicted years after its inception.


Thanks!



