
Don't conflate using "apt-get" in a Dockerfile with what "docker build" does.



You can absolutely build a reproducible image with a Dockerfile if you have discipline and follow specific patterns.

But you can achieve the same result if you use similar techniques with a bash script.
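
For concreteness, a minimal sketch of what that pinning discipline might look like in a Dockerfile (the digest and version strings below are placeholders, not real pins):

    # Pin the base image by digest rather than a mutable tag (placeholder digest)
    FROM debian@sha256:<digest>
    # Pin exact package versions instead of letting them float
    RUN apt-get update \
     && apt-get install -y --no-install-recommends curl=<exact-version> \
     && rm -rf /var/lib/apt/lists/*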


You _can_ if you have _discipline_. That sounds like a footgun the longer a project goes on and the more people touch the code.

Just create a snapshot of the OS repo, so apt/dnf/opkg etc. will all reproduce the same results.

Make sure _any_ scripts you call don't make web requests. If they do, you have to validate the checksums of everything downloaded (see the sketch below).

And you still have no way to be sure that npm/pip/cargo package build scripts aren't pulling down arbitrary content at build time.
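
To make the first two items concrete, here's roughly what they might look like in a Dockerfile, assuming a Debian base; the snapshot date, package names, URL, and checksum are all placeholders:

    # Point apt at a fixed snapshot so package versions can't drift
    RUN echo 'deb [check-valid-until=no] https://snapshot.debian.org/archive/debian/<YYYYMMDDTHHMMSSZ>/ bookworm main' > /etc/apt/sources.list \
     && apt-get update \
     && apt-get install -y --no-install-recommends <pinned-packages>
    # Verify the checksum of anything fetched over the network before using it
    RUN curl -fsSLo /tmp/installer.sh https://example.com/installer.sh \
     && echo '<expected-sha256>  /tmp/installer.sh' | sha256sum -c - \
     && sh /tmp/installer.sh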


So, outside of the fact that a nix build disables networking (which you can actually do in a docker build, btw), how would you check all those build scripts in nix?

You seem to be comparing 2 different things.
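
(For what it's worth, build-time networking can be switched off with docker build's --network flag; the image name here is just a placeholder:)

    # Disable network access for RUN steps during the build
    docker build --network=none -t myimage .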


You don't. Those scripts will just fail, forcing you to rewrite them. This is why people trying to create new packages often complain: they need to patch up the original build of a given application so it doesn't do those things.

There are still ways a package can end up not fully reproducible, for example if it uses rand() during the build; Nix doesn't patch that, but stuff like that is fortunately not common.


Docker doesn't give you the proper tooling to avoid using e.g. apt-get in your Dockerfiles. For that reason, one might as well conflate them.


I'm not sure that this is a Docker problem, but you do have a point. I've used Docker from the very beginning, and it has always surprised me that users opt to use package managers rather than downloading the dependencies and then using ADD in the Dockerfile.

Using this approach, you get something reproducible. Using apt-get in a Dockerfile is an antipattern.
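
For example, the dependency can be fetched and checksum-verified ahead of time and then pulled in with ADD, so the build itself makes no network requests (file names and checksum are placeholders):

    # A vendored tarball copied (and auto-extracted) into the image
    ADD ./vendor/libfoo-1.2.3.tar.gz /opt/libfoo/
    # Or, on newer BuildKit versions, a remote file pinned by checksum
    ADD --checksum=sha256:<expected-sha256> https://example.com/libfoo-1.2.3.tar.gz /opt/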


Why? — I agree that it’s not reproducible, but so what?

We have 2-3 service updates a day from a dozen engineers working asynchronously — and we allow non-critical packages to float their versions. I’d say that successfully applies a fix/security patch/etc far, far more often than it breaks things.

Presumably we’re trying to minimize developer effort or maximize system reliability — and in my experience, having fresh packages does both.

So what’s the harm, precisely?


This feels like moving the goalposts. This is on a thread which began with the statement that Docker is reproducible. Will we next be saying that, OK, it's an issue that Docker isn't reproducible, but it's doing it for a noble reason?

Regardless.... I can give a few reasons that it matters, off the top of my head:

1) Debugging: It can make debugging more difficult because you can't trace your dependencies back to the source files they came from. To make it concrete, imagine debugging a stack trace but once you trace it into the code for your dependency, the line numbers don't seem to make any sense.

2) Compliance: It's extremely difficult to audit what version of what dependency was running in what environment at what time.

3) Update Reliability: If you are depending on mutable Docker tags or floating dependency installation within your Dockerfile, you may be surprised to discover that it's extremely inconsistent when dependency updates actually get picked up, as it is on the whim of Docker caching. Using a system that always does proper pinning makes it more deterministic as to when updates will roll out.

4) Large Version Drift: If you only work on a given project infrequently, you may be surprised to find that the difference between the cached versions of your mutably-referenced dependencies and the actual latest has gotten MUCH bigger than you expected. And there may be no way to make any fixes (even critical bugfixes) while staying on known-working dependencies.


Docker doesn't give you the tooling to build a package; it expects you to bring the toolchain of your choice. Docker executes your toolchain and does not prescribe one to you, only how it is executed.

Nix is the toolchain, which of course has its advantages.


In terms of builds and dependency management, Docker and nix actually work pretty similarly under the covers:

Both are mostly running well-controlled shell commands and hashing their outputs, while tightly controlling what's visible to what processes in terms of the filesystem. The difference is that nix is just enough better at it that it's practical to rebase the whole ecosystem on top of it (what you refer to as a "toolchain") whereas Docker is slightly too limited to do this.


Uh, I never even mentioned apt. Docker and nix are, likewise, very different. I'm not super familiar with either, but I do know docker isn't reproducible by design whereas nix is. I'm not sure nix is always deterministic, though I know docker (and apt) certainly aren't, nor are they reproducible by design.


So the thing here is that docker provides the tooling to produce reproducible artifacts with graphs of content-addressable inputs and outputs.

nix provides the toolchain of reproducible artifacts... and then uses that toolchain to build a graph of content-addressable inputs in order to produce a content-addressable output.

So yes they are very different, but not in the way you are describing. Using nix, just like using docker, cannot guarantee a reproducible output. Reproducible outputs are dependent on inputs. If your inputs change (and inputs can even be a build timestamp you inject into a binary) then so does your output.


With nix, you just have to be careful not to do anything non-deterministic to get a deterministic build. With docker build, you have to specifically design a deterministic build yourself. It's easier to just not use inputs that change than to design a new build that's perfectly deterministic.



