Magicpak: Build minimal Docker images without static linking (github.com/coord-e)
112 points by coorde on April 13, 2020 | 30 comments




'Cool' isn't the word I'd use here.

But I guess it's useful if you already have an ad-hoc build system and you can't/won't migrate to something industrial-strength.


Could you elaborate with some examples of what you mean, and where those might be better alternatives than using the linked tool?


Which of the linked tools?

The original commenter called magicpak a 'cool alternative to nix', to which I replied that nix is an industrial-strength, tested and very thorough solution, while magicpak is more like a dirty hack. (We used to have an internal tool much like magicpak before nix came along.)


Honest question: what's the difference between storing dependencies in the executable (statically) and storing them in the Docker image?


When statically linking, the linker can exclude functions that aren't referenced (directly or indirectly) by the main application. For something like libc or, especially, the C++ standard library, many and often even most routines can be excluded.

OTOH, container images seem to be bloated by unused binaries and completely unused libraries, not unused functions in shared libraries.
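As a rough illustration of how much gets left out (just a sketch; paths and exact sizes vary by system, and this assumes a glibc-based Linux with gcc installed):

    printf '#include <stdio.h>\nint main(void) { puts("hello"); return 0; }\n' > hello.c
    gcc -O2 -o hello-dynamic hello.c
    gcc -O2 -static -o hello-static hello.c

    ls -lh hello-dynamic hello-static       # the static binary carries only the libc objects it needs
    ls -lh /lib/x86_64-linux-gnu/libc.so.6  # vs. the full shared libc a dynamic binary maps at run time
    ldd hello-dynamic                       # the shared libraries the dynamic binary still depends on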


What are the main advantages of excluding unnecessary library functions / libraries / executables?

Here are some possible advantages I can think of: less surface area for security, less surface area for maintenance, and a smaller container image, which means faster image pulls and perhaps also faster boot times.

Disadvantages: you have to do the work to set it up in the first place and maintain it (putting in extra effort to avoid adding superfluous dependencies in future), while the above advantages may be marginal.


One advantage I've recently encountered is that CVE scanners like Nessus or StackRox seem to be naive. They don't seem to examine binaries for statically linked libraries, so a container with only a single statically linked binary is likely to stay nominally CVE-free in perpetuity.

This might also be true if you simply copy shared libraries. AFAICT these scanners rely on the package index to identify potential CVE exploits. But I may be wrong, and in any event it'd be much more difficult to identify the version of a statically linked library than a shared library, which can be identified by its name or hash.
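For what it's worth, identifying a bundled shared library is at least mechanically possible. Illustrative commands only, with a hypothetical ./libs/libssl.so.1.1 copied next to the binary:

    objdump -p libs/libssl.so.1.1 | grep SONAME   # report the soname, e.g. libssl.so.1.1
    sha256sum libs/libssl.so.1.1                  # hash that can be matched against known builds

    # A statically linked binary offers no such handle; at best you can grep
    # for embedded version strings, which is far less reliable.
    strings app-static | grep -i openssl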

This isn't an excuse to avoid keeping containers updated, but if you take a cynical view of vulnerability management, i.e. that any trick which improves the signal-to-noise ratio is worth exploring, then it's something to keep in mind.


It sounds like static linking would also hide CVEs that do affect the functions that were compiled into, and are used by, your statically linked binary, i.e. the ones that actually introduce security issues for that binary.

BTW, IIRC normal static linking doesn't strip out all unused functions; you need LTO (Link Time Optimisation) to do that.


Normal static linking does not include objects that aren't referenced. If you put every function in its own object file, then unused functions will not be linked in, regardless of how many object files are in the library. If you have 10,000 functions in two object files, then you can link none, either, or both - all subsets of object files, but likely many unused functions.
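Here's a minimal sketch of that object-file granularity (hypothetical foo/bar functions, GCC toolchain assumed):

    printf 'int foo(void) { return 1; }\n' > foo.c
    printf 'int bar(void) { return 2; }\n' > bar.c
    printf 'int foo(void);\nint main(void) { return foo(); }\n' > main.c   # only foo() is referenced

    gcc -c foo.c bar.c main.c
    ar rcs libfb.a foo.o bar.o          # static library with one function per object file
    gcc -o app main.o -L. -lfb

    nm app | grep -w -e foo -e bar      # foo is present, bar was never pulled in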

LTO does something more complex than just pick the right routines - it does the last few compilation/optimization stages when you run the linker. As a result, you get to inline across modules, drop some unused functions, and more - however, this only works for object files that were themselves compiled with LTO. If you have e.g. a third-party library that wasn't LTO compiled for your specific tool chain, you are back to the "whole object file or bust" link regime.
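Two common ways to drop unused functions even when they share an object file, sketched with GCC flags (Clang has equivalents; util.c and main.c are hypothetical):

    printf 'int used(void) { return 1; }\nint unused(void) { return 2; }\n' > util.c
    printf 'int used(void);\nint main(void) { return used(); }\n' > main.c

    # 1. Section-based garbage collection: each function gets its own section
    #    and the linker discards the sections nothing references.
    gcc -O2 -ffunction-sections -fdata-sections -c util.c main.c
    gcc -o app main.o util.o -Wl,--gc-sections
    nm app | grep -w unused || echo 'unused() was dropped'

    # 2. Link-time optimisation: code generation is deferred to link time,
    #    which also enables cross-module inlining; both the objects and the
    #    final link need -flto.
    gcc -O2 -flto -c util.c main.c
    gcc -O2 -flto -o app main.o util.o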


The wide gap between those two options is kind of dissatisfying; it would be cool if LTO could disassemble existing object files and do the unused-code removal at the assembly level. I guess that Facebook BOLT and Google Propeller are working on an approach like that.

https://github.com/facebookincubator/BOLT https://www.phoronix.com/scan.php?page=news_item&px=Google-P...


If the linker decides something is unused and strips it, then it doesn't have a realistic impact on security. It was proven the function won't be used, so it can't be exploited even if present.

The only edge case I can think of is if you could use it for ROPing some gadgets from the unused functions, but that's pretty unlikely and the gadgets are usually not that unique.

Most advantages are in performance: smaller size, faster loading, better cache usage.


ROP gadgets, and data strings/buffers - you might have a printf format string to feed into system() that you wouldn't have had otherwise and might not be able to supply with your shell code; also, ROP gadgets aren't very unique, but often system and library calls are available in libc or some other library; approximately 1% of the programs I write need system/exec calls - but it's always there in libc, making it easier to return to / exploit.


Research shows that containers are really bad for security: the majority of containers on Docker Hub have vulnerabilities with published CVEs and never receive updates.

Shared libraries, at least, allow more automated updates, but containers are still a problem.


You don't have to build the executable yourself. You can just apt-get install whatever and then bundle only the shared libraries used by the particular executable you care about (for instance, moreutils includes a ton of tools; maybe you only care about `vidir`, and `vidir` probably doesn't need all the libraries that `isutf8` (also packaged by moreutils) uses).

I did this just recently by hand when I wanted to run a source-only binary in my dom0 but didn't want to build it in the dom0. I built it in a container, checked what shared libraries it wanted with `ldd path/to/binary` inside the container, copied all of the referenced libraries to ~/libs, then copied that dir and the binary to dom0 and ran it with `LD_LIBRARY_PATH=$HOME/libs path/to/binary`; it worked perfectly.
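Roughly, the library-gathering step inside the container looked something like this (a sketch with illustrative commands, not my exact ones):

    BIN=path/to/binary
    mkdir -p ~/libs

    # ldd prints lines like "libfoo.so.1 => /usr/lib/libfoo.so.1 (0x...)";
    # grab the resolved paths and copy each library.
    ldd "$BIN" | awk '/=> \// { print $3 }' | while read -r lib; do
        cp -v "$lib" ~/libs/
    done

    # then, after moving ~/libs and the binary over to the dom0:
    LD_LIBRARY_PATH=$HOME/libs path/to/binary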


It's a good question, and one not answered by the README. The very first thing I add to any README now is a heading "Why?" (usually along with "What?" (is it)), because I often land on a piece of software's home page and, after reading, still don't know how it would benefit me or why it was created. The easiest way is a before-and-after kind of thing, but even a line or two can help.

The 5 W's[1] aren't just for journalists.

[1] https://en.wikipedia.org/wiki/Five_Ws


This isn't exactly "the difference", but some advantages of distributing software in Docker images include cross-platform support and a uniform sandboxing interface. I wrote more about this here: https://jonathan.bergknoff.com/journal/run-more-stuff-in-doc...


Maybe to rephrase the original question: given that you've already decided to distribute your software as a container image, what is the advantage of storing dependencies in an executable (statically) inside the container image vs storing them some other way inside the container image?


Basically, it's like distributing a single static binary in a tar file, versus distributing a dynamic binary and all its dependencies in a tar file. The whole purpose of using a tar file is to keep multiple files.

You can make a smaller tar file by including just the static binary (fewer inodes, less wasted file space from deps linking to each other, no unnecessary extra function-call data). And technically static binaries can sometimes execute faster (depending on page/CPU cache etc). But the advantages of static are much smaller when you consider the advantages of dynamic linking in a tar file - and even more so in a container.

So while there can be advantages, they are vanishingly small.


There's one point against static linking in development or with lots of deployments: you can't use layers efficiently. With shared deps you can install them before the app itself. With static linking your last layer always includes both, making it larger and slower to down/up-load.
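A sketch of the difference (hypothetical image names, a dynamically linked ./app with its libraries in ./libs, and a static ./app-static):

    printf 'FROM debian:bullseye-slim\nCOPY libs/ /usr/local/lib/\nRUN ldconfig\nCOPY app /usr/local/bin/app\nCMD ["/usr/local/bin/app"]\n' > Dockerfile.dynamic
    printf 'FROM scratch\nCOPY app-static /app\nCMD ["/app"]\n' > Dockerfile.static

    docker build -f Dockerfile.dynamic -t app:dynamic .
    docker build -f Dockerfile.static  -t app:static .

    docker history app:dynamic   # libs and app sit in separate layers; only the small app layer changes on rebuild
    docker history app:static    # one big layer that is re-uploaded whenever the app changes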


Not much besides the convenience of not needing to research/implement compiling something statically.


Beyond a method of software distribution, Docker containers are also a nice sandbox. Apps run inside Docker have very limited access to the rest of the system unless you explicitly give access.
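And you can tighten the defaults further if you want; these are all standard `docker run` flags (myapp:latest is just a placeholder image name):

    docker run --rm \
        --read-only \
        --cap-drop=ALL \
        --network=none \
        --pids-limit=64 \
        myapp:latest
    # --read-only    : read-only root filesystem
    # --cap-drop=ALL : drop all Linux capabilities
    # --network=none : no network access
    # --pids-limit   : cap the number of processes inside the container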


But don't assume it is a security perimeter, mostly because the Docker developers don't.

It’s a lock on the door - keeps honest people honest and erroneous rm -rf confined. But it might not stop a determined hacker.


Sandboxing should be left to the OS/user and not be part of the "binary" distribution method, imo.


Not at all. Docker is a security disaster.


In the former case you could have two closely related programs, say a DB server and its CLI, stored efficiently.


How does this work behind the scenes?

BTW: I might be an exceptional person, but the first thing I look for in new software is how it works. Granted, in most cases I might not be the target audience, but a brief introduction can effectively boost my interest in learning more. The opposite case, i.e. no such information, turns off my curiosity.


Does this handle non-executable resources, e.g. the trusted root SSL certificates that you typically install alongside OpenSSL?


Are any results available? How much does this save in a few common use cases?


How does this handle something like a Python program?



