How I shrunk a Docker image by 98.8% – featuring fanotify (jtlebi.fr)
105 points by jtlebigot on April 25, 2015 | 47 comments



Reminds me of the method used for the demoscene FPS game .kkrieger, which was stored in less than 100 KB. They basically played through the game several times and trimmed out any code paths that weren't used in order to get it small enough, using a rudimentary C++ "pretty printer" they wrote that tracked executions of code paths. They had the advantage of being able to alter their code to only use constructs supported by their custom tool. Ultimately, this led to some bugs/features being stripped: for example, they hadn't pressed up in the menu during their run-through, so you could only navigate downward.

https://fgiesen.wordpress.com/2012/04/08/metaprogramming-for...


I don't think there is any way to prove that this found all the required files. The more paths there are through the code, each with its own potential file accesses that can't be predicted without runtime information, the more likely it is that one will be missed in this optimization stage.


Variation on the halting problem? Given infinite running time and arbitrary input, can you prove that a program will never access file X?


Agreed, this seems equivalent to me. Instead of the instruction "access file X", assume the instruction is HALT.


There's no general algorithm, but you could probably prove it, if you tried really hard for your given example. :D


I'm also interested in whether it's possible to say so with certainty.


You have runtime information though. It's true that this method will not find everything (e.g. files dlopened in strange code paths), but just like we have tools that can verify 100% code coverage in tests, you could fuzz inputs until you find that you've hit every single branch of the executable's instructions, and record all dependencies as you go.

You could argue that that can still be fooled by, e.g., making the software dlopen the argument given to it, at which point that code path would have different dependencies each time it was hit, but that argument quickly devolves. The same argument says that when I run `ls /tmp/file`, `/tmp/file` becomes a dependency of ls, and thus I must include every file in the image or it will have different behavior.

I think intelligent fuzzing + high branch coverage can prove that you have found all required files.


I don't think you can ever prove that you've found the required files for an arbitrary binary. (I especially have a hard time believing that such a proof would involve fuzzing, which is random.) However, it seems reasonable that you would be able to achieve a high enough level of confidence that this technique would be viable.


You cannot. It reduces to the halting problem, relatively trivially:

    <arbitrary code that cannot open foo.txt>
    do something with foo.txt
This will use foo.txt iff said code halts.
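
A concrete (entirely contrived) C version of the same shape: whether the fopen below is ever reached depends on whether the preceding computation terminates, which no general analysis can decide.

    #include <stdio.h>

    /* Stand-in for "arbitrary code": nobody knows whether this loop
       terminates for every starting value (the Collatz conjecture). */
    static void mystery(unsigned long n)
    {
        while (n != 1)
            n = (n % 2) ? 3 * n + 1 : n / 2;
    }

    int main(void)
    {
        mystery(27);               /* <arbitrary code that cannot open foo.txt> */
        fopen("foo.txt", "r");     /* reached iff the code above halts */
        return 0;
    }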

You can, however, prove that you've found a superset of the required files for an arbitrary binary. Or prove that you've found the required files for some, but not all, arbitrary binaries.


You're wrong.

You cannot say you haven't found all the dependencies, but you can say you have found all the dependencies (given the constraints I placed above).

The halting problem only says that there is no general procedure to decide whether an arbitrary program will halt.

However, you can prove a specific program halts if, in fact, that program halts.

The original question was not "can you prove that you can find the dependencies for an arbitrary binary", but "can you prove that all the dependencies were found for a single specific binary".

For some program that has an infinite loop you can say "I don't know if I've found everything", but if you have shown that you have hit every code branch, as I said above, then clearly this program both halts and has had all dependencies found, excepting different behavior for user input within those already explored branches.


I was thinking the same thing during the article, and the author says as much in the 'Last Thought', and doesn't recommend using this for production purposes. The footnotes say that this was more of an exercise in the syscall than in docker.


Sandstorm.io has baked something like this into its basic packaging tool for about a year now, except based on FUSE rather than fanotify. Really helps cut down package sizes - many are 10-20MB despite containing all userspace dependencies of the app. https://blog.sandstorm.io/news/2014-05-12-easy-port.html


The 80/20 solution here is to just find the few files that take up the largest amount of space and are clearly pointless to your app, and remove them. The USB hwdb, for example. Also, trimming down the timezone and locale DBs to just the ones your app runs on (hopefully UTC and UTF-8) should help—unless your app has to deal with data containing user-defined datetimes/charsets.

The other interesting thing to try, if your app's problem isn't so much library-dependencies but instead Unix shell dependencies, is to use a Busybox base image. Apps whose runtimes are already sandboxed VMs, especially, usually work great under Busybox: the JVM, Erlang's BEAM VM, etc.


A better idea for chroots or VM images is supermin, where you copy the files from the host filesystem. (http://libguestfs.org/supermin.1.html)


Isn't the point of running an application in a container, or any chrooted environment, to only isolate the application from the rest of the operating system?

Then why would you start out with a complete extra operating system in there? Why not just put the application and its dependencies in there?

Stripping non-dependencies from a complete operating system sounds like a very failure-prone way to accomplish almost the same thing. You really need to execute all code paths, which is difficult to guarantee (did you really run your application in all locales, for example?).


> Then why would you start out with a complete extra operating system in there? Why not just put the application and its dependencies in there?

Packaging is hard. Let's go shopping!


A layered file system already does this.


Any Unix-ish application (i.e. one that shells out to do something at some point) will have a package dependency tree that ends up transitively closing over the "base"/"essential" package set of the OS. "Dependency" has three meanings to a packaging system, even though at run time only one of them is relevant. There are:

1. "run-time dependencies" — package B needs package A installed because a binary from B actually makes use of a file from A when it runs.

2. "install-time dependencies" — package B needs package A installed because B is effectively a "plugin" for A. B is theoretically useless to the OS, except when used in the context of a sane A-like environment. This usually also implies that B, when installing itself, will run a script provided by A, usually to register itself in a database that A owns. This doesn't at all imply, though, that you couldn't just directly call the binary contained in the A package for a useful effect.

3. "asynchronous/maintenance-time dependencies" — package B needs package A because B does something to increase the system's entropy, and is written to assume that the system will compensate for this by having A running.

Docker images really only need type-1 dependencies, but as you dig toward the core of a package dependency graph, you start to see a lot more of type-2 and type-3 dependencies. If you execute a "debootstrap --variant=minbase", pretty much everything in there is there for type-2 or type-3 reasons.

A Docker container doesn't need to be a maintainable or autonomous OS distribution. It doesn't need grub, it doesn't need mkfs or fsck, it doesn't need mkinitramfs or the HAL hwdb; it doesn't need localegen, or debconf, or even apt itself. It needs to be a baked, static collection of files related to the application's run-time needs. But there's no demand you can make of apt or yum or even debootstrap that will spit out such a thing.

There was a project somewhat in this vein a long time ago, for embedded systems, called "Emdebian Baked"[1]. It was a misstep, I think, because it focused on creating variants of packages and a secondary dependency graph, rather than being a transformation one could apply to existing packages and the existing graph.

I've worked on and off on creating a transformation tool—effectively, a combination of a dependency graph "patch" that contains empty virtual-packages for many essential-package dependencies, a file filter/blacklist, and a final package whose installation burns away the whole package-management infrastructure from the chroot this is executing in. I haven't been happy with any of the results yet, though. Would anyone be interested in collaborating on such a thing as an open-source project?

[1] http://www.emdebian.org/baked/


Nix helps with this. There is an optimisation pass that can create hardlinks between identical files across packages, I believe, and there was a recent talk on package deduplication. Also, every package directly specifies every dependency. However, it currently won't help to remove unused files within each package, as that would violate the immutable hashes. The solution is to create more granular packages, or to leave the immutability zone and step into the mutable world if you have embedded scenarios.


I beg to differ, but we can probably compare data points until the cows come home.

Anyhow, even a large-ish application such as Oracle or a control system doesn't actually use ping or dd or troff, or most of what a modern Unix OS is composed of. Most setuid binaries are unnecessary, and removing them, if nothing else, decreases the attack surface.

Most web apps probably need nothing Unix-ish at all. A chrooted PHP app mounted noexec makes me sleep better than one running in a complete operating system. And most server-side Java apps reinvent everything Unix anyway, from mail processing to cron jobs, so they generally don't shell out as often as you'd think.

So I would argue it's actually pretty common that your applications have a limited set of dependencies. Especially compared to the hundreds of packages in any minimal modern unix install.


I agree that it's common, but it's not common enough to make this into a helpful property if you're trying to define a 100% solution. The reason Docker exists at all, apart from just nsexec(1)ing static binaries, is that a lot of things do need an environment—not of other Unix binaries per se, but of library assets like locales, charmaps, keymaps, geoip mappings, etc.—and then these asset packages think they're there to provide assets for maintenance-time functionality of a computer rather than to provide run-time functionality to an app in a container, so they pull in utilities related to themselves, which pulls in the base system.

If you can manage to get a working install of Postgres without pulling in half of Debian, I would be surprised.

But yes, on the other hand, it's perfectly possible to package some things, like the JVM, in a sort of "spread out in a directory but equivalent to static-linked" fashion. The sort of things that tell you to "unzip this into /opt/thispkg", because they don't really follow any Unix idioms at all, tend to be surprisingly container-friendly. They come from a world where binaries are expected to be portable across systems with different versions of OS libraries available, rather than a world where each app gets to ask the OS to install whatever OS library versions it requires.


Postgres is actually a good counterexample to your point. It is a self-contained application that doesn't shell out. It doesn't need to access any of the things you mention, including charmaps, keymaps and geoip mappings.

I regularly run it chrooted without problems. You do need to understand your use case, however. Things like external database utilities and backup scripts differ in requirements; some of them run outside the chroot, some don't.

It's absolutely not complicated, and if you have the faintest idea what you're doing it's much easier to get right than the fanotify dance described above.

And a complete operating system in a chroot would sit mostly unused, and only increase the attack surface for no reason at all. So, why?


> If you can manage to get a working install of Postgres without pulling in half of Debian, I would be surprised.

You mean like in this blog post: https://blog.docker.com/2013/06/create-light-weight-docker-c...


It's not only about the isolation, but also reproducibility across time and portability across machines.


The exact approach described here is very extreme. It's a top-down method with a tool. I find the tool may be of some interest, but I think a bottom-up method would be more practical. I have done some experiments with Yocto/OpenEmbedded and plan to put that out one day, once I have time to document it...


Why not just use a microkernel container like OSv from Cloudius? Same result with less effort.


The truth is hidden in a comment: the goal was to learn the fanotify syscall using a real-world use case. That said, when Dockerizing an application from scratch, using an optimized base image may be a suitable option. But that's not always the case. For instance, I often start a project from the Python base image, which contains loads of generic libraries that I will not use in a given project but that will be important for others. This is where a profiling-based approach is interesting: you get the ease of a known environment and the efficiency of an optimized image.
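
For anyone curious about the mechanism, the core of the fanotify approach is small. A rough sketch (not the article's actual code; error handling trimmed, and it needs CAP_SYS_ADMIN, as discussed below): mark the whole mount, then read events and resolve each event's file descriptor back to a path via /proc.

    #include <fcntl.h>
    #include <limits.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <sys/fanotify.h>

    int main(void)
    {
        /* Requires CAP_SYS_ADMIN. */
        int fan = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY);

        /* Watch every open/access on the mount containing "/". */
        fanotify_mark(fan, FAN_MARK_ADD | FAN_MARK_MOUNT,
                      FAN_OPEN | FAN_ACCESS, AT_FDCWD, "/");

        struct fanotify_event_metadata buf[200], *md;
        for (;;) {
            ssize_t len = read(fan, buf, sizeof(buf));
            if (len <= 0)
                break;
            for (md = buf; FAN_EVENT_OK(md, len); md = FAN_EVENT_NEXT(md, len)) {
                /* Each event carries an open fd on the accessed file;
                   resolve it back to a path through /proc. */
                char link[64], path[PATH_MAX];
                snprintf(link, sizeof(link), "/proc/self/fd/%d", md->fd);
                ssize_t n = readlink(link, path, sizeof(path) - 1);
                if (n > 0) {
                    path[n] = '\0';
                    printf("%s\n", path);
                }
                close(md->fd);
            }
        }
        return 0;
    }

Deduplicate the output while exercising the application and you have a candidate file list for the slimmed-down image.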


OSv is a unikernel and isn't a container... but anyway, good point!


ptrace probably would have been a better solution; at least it would have avoided the problems with links.



Why not ldd?


It doesn't find dependencies like /etc/ssl/certs/ca-certificates.crt or /usr/share/zoneinfo.


ah, thanks.


I've been playing with a project to do this. The first major obvious problem is anything that uses dlopen won't necessarily get all that it needs.
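
A contrived illustration (hypothetical loader, not from any particular project): the dependency only materializes for a branch and an input that the profiling run may never see.

    #include <dlfcn.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        /* The plugin path comes from the environment or argv, so a trace
           that never exercises this branch (or never sees this value)
           misses the file entirely. */
        const char *plugin = getenv("APP_PLUGIN");
        if (argc > 1)
            plugin = argv[1];
        if (!plugin)
            return 0;                /* the branch a profiling run may take */

        void *h = dlopen(plugin, RTLD_NOW);
        if (!h) {
            fprintf(stderr, "dlopen: %s\n", dlerror());
            return 1;
        }
        dlclose(h);
        return 0;
    }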


Yes. Grepping the code or fighting runtime errors are two complementary approaches I can think of... Not sure if there are other methods.


Another way of achieving this: https://github.com/PerArneng/fortune


While a standard base might be bigger, it does make it easier to cache when you use it in most images. A lot of smaller specific images will mostly be unique.


Does CAP_SYS_ADMIN still leak out of containers? I know at some point running with that meant you were root on the host...


That's only needed to find the files, not afterwards.


Could this method be used as a reversed way of creating Unikernels?


If only there were some way to describe in a few lines of text what an image should contain?


Do Docker images really have to contain an entire bloated Linux distro? Even for Xen, which, as a hypervisor, provides fewer services than Docker, it's possible to write applications which run directly under Xen.


They don't have to; one can run a static binary without any problems. It's just that most people keep throwing in a whole distro...


You can't make a truly static binary with glibc, so almost no one has a toolchain that is able to do it.


What do you mean by "truly static"?


One that doesn't try to load dynamic objects at runtime the way glibc does if you use certain functions.
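
A small illustration of the problem (hypothetical program): even when built with `gcc -static`, glibc's name-service functions go through NSS and dlopen libnss_*.so at run time, and gcc warns about exactly this at link time.

    #include <pwd.h>
    #include <stdio.h>

    int main(void)
    {
        /* getpwnam() goes through NSS; with glibc, even a -static build
           will try to dlopen libnss_files.so (or similar) here. */
        struct passwd *pw = getpwnam("root");
        if (pw)
            printf("root's home: %s\n", pw->pw_dir);
        return 0;
    }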


I'm not sure why one wouldn't want a whole distro though - it makes debugging, testing, etc. far easier.



