Dockerception: Docker building dockers – keeping them small (github.com/jamiemccrindle)
66 points by casca on April 6, 2015 | 17 comments



There's no need to do things exactly as is being done there.

You can bind-mount a directory into your build container, write the build output to that directory, and then use it as the input to the next dockerfile, without relying on the build dockerfile producing a valid tarball that contains a dockerfile, etc.

The difference isn't huge, but I think it's good to uncouple the build/run-image creation logic a little more.

Doing so also lets you more easily work on the dockerfiles independently and debug certain issues (e.g. you can compile the binary outside of docker, drop it into the run image, and make sure things are okay).
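A minimal sketch of that flow (the Dockerfile names, image tags and paths are placeholders, assuming the build image leaves the compiled binary in /build):

    # build image compiles the program; bind-mount ./out to collect it
    docker build -t myapp-build -f Dockerfile.build .
    docker run --rm -v "$PWD/out:/out" myapp-build \
        sh -c 'cp /build/my-program /out/'
    # run image only needs to COPY the artifact from ./out
    docker build -t myapp -f Dockerfile.run .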

As for his bonus points: You can avoid using busybox by compiling your go program with: "CGO_ENABLED=0 go build -a -installsuffix cgo my-program.go" (assuming go 1.4).

The output of this is a static binary which you can add to an empty docker container. The way to do that would be a dockerfile of `FROM scratch; COPY ./my-program /my-program; ENTRYPOINT ["/my-program"]`, with newlines replacing each semicolon. (`scratch` is Docker's reserved name for the empty base image.)
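Spelled out, with `my-program` standing in for your binary:

    FROM scratch
    COPY ./my-program /my-program
    ENTRYPOINT ["/my-program"]

Build the binary on the host first, then the image:

    CGO_ENABLED=0 go build -a -installsuffix cgo my-program.go
    docker build -t my-program .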

It will end up somewhat smaller than the busybox image, but the savings aren't nearly as dramatic as you might expect. If you do TLS/SSL or rely on any exec calls, you'll probably be better off just using busybox, so that the certificates and basic binaries are present.


I recently created a presentation about lean containers:

http://www.slideshare.net/KuanYenHeng/docker-on-a-diet

If you need a tiny base image with a decent package index, try Alpine Linux.
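For example, a small Python base might look something like this (package names are illustrative; clearing the apk cache keeps the layer small):

    FROM alpine:3.1
    # --update fetches a fresh package index; drop the cache afterwards
    RUN apk add --update python py-pip && \
        rm -rf /var/cache/apk/*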


I, too, read a blog post about the joys of Alpine Linux, but be forewarned that you'll need to test the hell out of your app and be prepared to chase some weird behavior. Its libc and/or Python are not the ones you know and love.

I am certain there are great use cases for this, but the discussions seem to imply that you can just change your FROM and save a ton of disk space, and there is more to the story than that.


I started using Alpine a couple of weeks ago. Really curious what sorts of problems you've encountered. The only big one for me so far is the change to DNS resolution.


I was using it to host a Scrapy spider, which is Python 2.7 using Twisted and a whole host of native cryptography libs (which I compiled against Alpine; I'm just listing them to give context for the issue).

The requests would leave the machine, but it seemed none of the handlers were firing with the result. I thought, ok, perhaps I've made a configuration mistake or the target website is acting funny.

After 3 hours of inserting every kind of debugging print and log statement I could think of, I decided to sanity check the setup using debian:wheezy. As my post above implied, it worked immediately and flawlessly.

Thus: it may work great for you, but ensure its libraries and binaries are a good fit for your situation.

I did see quite a few other comments discussing how Python, Ruby and other stacks that don't compile statically with ease may not be a good fit for the system the article was proposing, so this may not strictly be Alpine's fault. It's just that when I see a discussion of the great savings from using Alpine, I feel it's important to remind folks there is no free lunch.


That's true. In my case I had plenty of tests for my application (that I ran inside the container environment) so I felt safe to proceed.


You encounter the same kind of problem whenever you want to use Docker with anything that needs a compilation step. Even in front-end development you run into the same sort of issues when packaging for production (CSS preprocessors, CoffeeScript, 6to5 and such).

Some standardisation in this area would be nice, but the problem can be solved rather trivially by using a separate container with all the tools for the build process, and placing the build product into a lean production image. This process is easy to automate/script, and I don't feel that there is much gain from having some encompassing spec or standard that has to consider all the edge cases and is more complex than necessary as a result.
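A sketch of such a script for a front-end build (image names, paths and the npm steps stand in for whatever your toolchain actually is):

    #!/bin/sh
    set -e
    # 1. build image carries node, preprocessors and the rest of the toolchain
    docker build -t site-builder -f Dockerfile.build .
    # 2. run the build, collecting the compiled assets into ./dist
    docker run --rm -v "$PWD/dist:/dist" site-builder \
        sh -c 'npm install && npm run build && cp -r build/. /dist/'
    # 3. the production image only packages the static output
    docker build -t site -f Dockerfile.run .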


I just don't see the problem here.

Docker is a compiler (docker build), Dockerfiles are source code, and Docker images are binary artifacts. We already have tools that manage this for us: make, cmake, nmake, rake, scons, etc.

People need to get over their fear of build tools and use them. This is exactly the kind of problem they solve. Make IS the standard.
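For instance, a few lines of make are enough to chain the compile and the image build (a sketch; file and image names are assumed, and recipe lines need literal tabs as usual):

    all: image

    # static binary, as described elsewhere in the thread
    my-program: my-program.go
            CGO_ENABLED=0 go build -a -installsuffix cgo my-program.go

    image: my-program
            docker build -t my-program .

    .PHONY: all image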


We just switched to something similar for DockerUI [0]. centurylink/golang-builder [1] (500+MB) compiles a static binary and we drop that into a really light runtime (5MB) from scratch.

[0] https://github.com/crosbymichael/dockerui [1] https://github.com/CenturyLinkLabs/golang-builder
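As I recall the README, usage is roughly a one-liner like this (treat it as a sketch and check the repo for the exact invocation):

    # mount the package source at /src; the builder compiles a static binary
    docker run --rm -v "$(pwd):/src" centurylink/golang-builder
    # then build the tiny runtime image around the resulting binary
    docker build -t dockerui .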


I really like the idea here, but haven't thought of a good way to do it with languages like Python/Ruby/Javascript.

Is there any good way to bundle up, say, a Flask app on Gunicorn into a single, minimal-dependency executable? Or a Rails app? I haven't found any satisfactory solutions. There are projects that will compile python down, but that doesn't seem to cover the WSGI server part (gunicorn, etc).


> Is there any good way to bundle up, say, a Flask app on Gunicorn into a single, minimal-dependency executable?

You don't need to build a single executable for this approach to work. You could use something like buildroot to build your runtime docker image.

docker-nano was doing something to that effect, see for example nano/node.js: https://github.com/Docker-nano/Node.js


To track down dependencies, "strace" is your friend. And no, that's not an ideal solution, but it works reasonably well. Couple that with "ldd" on the interpreter.

For Ruby, additionally, "bundle install --standalone --path [..]" gets you far.
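Concretely, something like this (a rough sketch; the grep is approximate and you'll want to eyeball the output):

    # list the files the app actually opens at runtime
    strace -f -e trace=open,stat -o trace.log ruby app.rb
    grep -o '"/[^"]*"' trace.log | sort -u
    # list the shared libraries the interpreter itself links against
    ldd "$(which ruby)"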


It's possible to do this with PEX. I've done it experimentally by subclassing gunicorn.app.base.Application. Haven't tried it out in prod yet, though.


This is great for AOT languages that don't require runtime dependencies.

I'm going to use this in my projects from now on; it would be great to do away with hundreds of megabytes of layers.

I guess the downside is you lose some of the ability to find out where your final image came from, but that can easily be fixed with a well controlled build pipeline.


Look around at various Docker images and you'll find plenty of Dockerfiles that include things like "wget [some random url containing a tar archive with prebuilt binaries]". Getting people used to a proper workflow for this might very well improve the situation.


If you really want to keep your container images small, omit the base image entirely.

Make a folder with your app, identify and hardlink its dependencies into that folder, and stream the tar of that folder into Docker. No bash, no kernel, no /usr/bin, no binaries, libraries or files of any kind that you don't explicitly need.

It's a method that has been used for years in creating chroot environments, and it works for containers as well. It does require a bit more knowledge than `apt-get install`, however.
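A sketch of that approach for a single binary (paths are placeholders, and the ldd parsing is simplistic):

    mkdir -p root
    cp my-program root/
    # hardlink (or copy, across filesystems) each shared library it needs
    for lib in $(ldd my-program | grep -o '/[^ ]*'); do
        mkdir -p "root/$(dirname "$lib")"
        ln "$lib" "root/$lib" 2>/dev/null || cp "$lib" "root/$lib"
    done
    # stream the tree into docker as a single-layer image
    tar -C root -c . | docker import - my-program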


Docker will hopefully solve this with new Dockerfile syntax soon. A proposal to do that has been bouncing around[0] -- even a working implementation or two in forks -- but they haven't settled on a final design direction yet.

[0] https://github.com/docker/docker/issues/7115



