I've been playing around with a tool to answer the more generic question: why is my binary (written in C, C++, Rust, etc) so large?
We use CPU profilers to tell us why our programs are slow. My tool is intended to be a size profiler: why is my program big?
It turns out there are some really interesting analyses and visualizations you can perform to answer this question.
A few direct comments on the article:
- I do not recommend stripping your binaries completely! Use "strip --strip-debug" instead (or just don't compile with -g). You should realize that debug information and the symbol table are two separate things. The debug information is much larger (4x or more) and much less essential. If you strip debug information but keep the symbol table, you can still get backtraces.
- I don't believe -Wl,--gc-sections has any effect unless you also compile with -ffunction-sections and/or -fdata-sections, which these examples for C/C++ do not.
If you care about binary size in C or C++ you should probably be compiling with -ffunction-sections/-fdata-sections/-Wl,--gc-sections these days. Sadly, these are not the default.
- It is fine to strip both debug information and symbol table if you want the distributable (I mentioned the ISP at the beginning for a reason) and the program does not rely on them in any way. Actually, I wonder why separate debug files [1] are not the norm on Unixes.
- `-Wl,--gc-sections` does have an effect even in the absence of `-ffunction-sections` (and so on), because libc and libstdc++ are already compiled in a way that allows for GC, perhaps necessarily. The example had a single function anyway, and I was lazy enough to exploit the fact that `-ffunction-sections` wouldn't have made a difference...
Without symbols you can't get backtraces, profile the program, use function-based DTrace probes, readably disassemble it, etc. I'm not saying it's impossible to distribute stripped binaries, I'm just saying I don't recommend it. Compared with the debug information, I think the symbol table is much bigger bang for the buck, considering how much smaller it is.
It's strange -- I can verify with -Wl,--print-gc-sections that this is indeed discarding some sections from a static glibc link, so I was wrong about that. On the other hand I can also see plenty of sections in libc.a that have more than one function in them -- not sure why this would happen if libc was indeed compiled with -ffunction-sections.
I agree that separate debug files are nice, and OS X's implementation of them is especially nice, since it does a very good job of finding the right debug information for an executable.
Yeah, I don't doubt the usefulness of the symbol table. I'm probably thinking of the distributable in Windows, where you... don't really do such things.
IIRC the coding convention of glibc is that most functions (and probably all public functions) are contained in their own files, so it effectively ends up with a similar result even when `-ffunction-sections` is missing.
Actually, stripping debug information (or compiling without -g) makes it very difficult to troubleshoot a problem in production, where the binary might be deployed at a customer's site and where the source code might not be available (firewall, no network connection, ...). The additional information is stored in separate ELF sections that are ignored by the runtime linker, so it does not hurt the performance of the program.
The space savings come nowhere close to the benefit of having the additional information available when debugging.
Some operating systems, for example SmartOS and any other OS based on the illumos source code, even embed type information derived from the source code into ELF binaries and libraries, in a special compact format (CTF), during the build with the ctfconvert(1ONBLD) / ctfmerge(1ONBLD) tools[1][2].
If you are ever considering stripping the binary just to save some disk space but do not have a good reason for it (like building for a space-constrained appliance), please abstain from doing so; every developer and engineer trying to debug your program will be thankful to you if you do not remove the debugging information.
Are you sure those flags have any effect? In the C++ project I'm trying it on, -ffunction-sections/-fdata-sections/--gc-sections causes the object files to grow a bit, but the resulting binary is exactly as big, to the byte, as it was without the flags.
The short version is: as small or large as you want to trade-off convenience vs performance vs size.
I've made useful firmware for a microcontroller (yes, in Rust) which is just 5KB. You can, as the article shows, dynamically link, and then it's about the same size as the equivalent C/C++.
The point is, there's nothing inherent about Rust - the language - which results in binaries appreciably different in size to what you can achieve in C/C++.
> I've made useful firmware for a micro-controller (yes in Rust)
Would you mind sharing which microcontroller and how to get a Rust compiler for it? I would love to use Rust to program microcontrollers. Every time I've looked into this, I've thought it wasn't possible, because I don't see any microcontrollers in the list of supported platforms here (https://doc.rust-lang.org/book/getting-started.html#platform...) or here (https://github.com/rust-lang/rust/tree/master/mk/cfg). I've considered trying avr-rust (https://github.com/avr-rust/rust), but the README says "NOTE: This does not currently work due to a bug." Any pointers to get started would be appreciated.
I've used Rust on a variety of ARM Cortex-M based microcontrollers including Atmel SAMD21, NXP LPC1830, and STM32F4. AVR is tougher because that's an 8-bit architecture with very new support in LLVM.
On microcontrollers, you use libcore instead of the full libstd, for lack of an OS or memory allocator. Libcore is easy to cross compile because it has no dependencies. You provide a target JSON file to specify LLVM options, and a C toolchain to use as a linker, and nightly rustc can cross-compile for platforms supported by LLVM.
Things like interrupt handlers and statically-allocated global data require big chunks of unsafe code. Rust has a promising future in this space, but it will take more experimentation to get the right abstractions to make microcontroller code more idiomatic.
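To make that concrete, a minimal libcore-only program looks roughly like this (a sketch only: the register address is made up, and a real project would use a proper linker script plus something like the cortex-m-rt crate for the entry point and vector table):

    #![no_std]
    #![no_main]

    use core::panic::PanicInfo;
    use core::ptr;

    // Hypothetical memory-mapped GPIO register, purely for illustration.
    const GPIO_OUT: *mut u32 = 0x4000_0000 as *mut u32;

    #[no_mangle]
    pub extern "C" fn main() -> ! {
        loop {
            // Touching hardware registers directly is inherently unsafe.
            unsafe { ptr::write_volatile(GPIO_OUT, 1) };
        }
    }

    // With no OS and no std, you must supply the panic handler yourself.
    #[panic_handler]
    fn panic(_info: &PanicInfo) -> ! {
        loop {}
    }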
> I've used Rust on a variety of ARM Cortex-M based microcontrollers including Atmel SAMD21, NXP LPC1830, and STM32F4
That's an encouraging start.
Having been there, and as reported on a weekly basis by people like @internetofshit, much of the code that runs the current IoT hype is utter and complete tripe. I have a faint hope that Rust could help in that regard, even though I recognize that much of the badness in IoT has to do with financial considerations (it's too costly to make a good-quality talking coffee-maker). Maybe precisely because IoT seems to be all about getting cheap stuff out cheaply, I have this hope that a language and development ecosystem that helps devs instead of shooting them in the foot would at least be a good start.
For 8-bit platforms, bytes/sbytes are still 8 bits. Most of them have native instructions that work on 16-bit integers; they just take up a pair of registers. You have access to the same array of integer sizes: plain int is 16 bits, longs are 32 bits, and long longs are still 64. Still, I much prefer using the integer definitions included in C99, where you define the bit size explicitly. uint16_t is a lot more explicit than int, especially if you've got code that's being shared between a few different micros with different word sizes.
On AVR 8 bit microcontrollers, at least: yes. Pointers are 16 bits, and there are a handful of instructions specifically for 16 bit integer and pointer operations (which operate on pairs of 8 bit registers). For everything else, operations are performed by chaining together 8 bit operations. Adding two 32 bit ints for example would need 4 add instructions.
Last time I looked at this, it seemed the number of supported microcontrollers was limited. It usually depends on whether LLVM supports them or not.
I was wondering if the Rust front end could be ported from LLVM to other compiler back ends. That could grow the number of supported microcontrollers quite quickly.
A related but distinct question might be: how does Rust file size scale? Meaning, a Hello World application is X, and X is bigger than in C/C++. But do Rust projects outgrow C/C++ projects as complexity increases? Or is the code generation actually consistent, and all of this simply about the corelib size and nothing more?
Given that Rust monomorphizes generic functions like C++, and given that Rust uses a high-quality C++ backend for code generation, I'd assume that default binary sizes would be comparable. Producing a more scientific comparison would require implementing a large project more-or-less identically in both languages, which is unlikely in the near future (it might be instructive to compare Servo to Gecko, but Servo isn't near complete yet, and even then Servo does many things differently that might influence the comparison).
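As a toy illustration of the monomorphization point (not a measurement): each concrete type a generic is used with gets its own copy of the machine code, just as with C++ templates, so generic-heavy code grows the binary in the same way in both languages.

    // Two instantiations, largest::<i32> and largest::<f64>, both end up in the binary.
    fn largest<T: PartialOrd + Copy>(items: &[T]) -> T {
        let mut max = items[0];
        for &item in &items[1..] {
            if item > max {
                max = item;
            }
        }
        max
    }

    fn main() {
        println!("{}", largest(&[1, 5, 3]));
        println!("{}", largest(&[1.0, 5.0, 3.0]));
    }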
> Just to be cautious: Is it a valid question to ask after all? We have hundreds of gigabytes of storage, if not some terabytes, and people should be using decent ISPs nowadays, so the binary size should not be a concern, right?
$ find /usr/bin -type f | wc -l
2254
$ du -sh /usr/bin
445M /usr/bin
If all these programs were written in Rust and statically compiled, assuming only a 600K difference per binary, that would make my /usr/bin 1.3G (or 300%) larger.
But in reality all those programs are dynamically linked against many libraries in /usr/lib, so the difference would be even bigger, with libraries duplicated between all those programs.
Sure, you can dynamically link with Rust too, but then, you hit the other problem that there is no stable ABI (yet?), and that upgrading Rust (every 6 weeks) means recompiling everything.
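For reference, dynamically linking the Rust standard library is one flag away; a minimal sketch (the caveat is exactly the ABI one above: the resulting binary only runs against the libstd .so shipped with the same compiler build):

    // hello.rs -- build with: rustc -O -C prefer-dynamic hello.rs
    // The binary then links against libstd-<hash>.so instead of embedding it,
    // so upgrading the compiler means rebuilding (or redistributing) everything.
    fn main() {
        println!("Hello, world!");
    }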
I would gladly sacrifice that disk space in exchange for system utilities written in Rust. We're still finding vulnerabilities in core utilities, even after all these decades.
Most recent vulnerabilities in core utilities really don't have a lot to do with memory safety, though - Shellshock & ImageMagick were input-sanitization failures, and other common ones are injection vulnerabilities or authentication weaknesses. Heartbleed excluded, most major vulnerabilities these days aren't related to memory safety.
Sure, but Rust isn't just about dealing with memory safety. The language also lends itself well to solving other common mistakes by virtue of its design and by being built on modern principles. Idiomatic C promotes throwing around pointers/arrays and hoping that the next coder who comes along to consume a struct reads the docs/header and understands how the data in that struct is supposed to be used. Idiomatic Rust uses its type system to strictly enforce how a struct and its data can/should be used. It's a world of difference and results in drastically fewer bugs. Not to mention the rest of the Rust ecosystem works in harmony with the language to further reduce bugs; testing as a first-class citizen of the language and its tooling is one of the big ones.
That isn't to say that you can't do something similar in C, but it is an order of magnitude more challenging to design a "module" in C that is explicit and robust compared to the effort to do the same in Rust. I've coded my fair share of cryptographic systems in both C and Rust. Bulletproof C is just _exhausting_ to code and work with. The same kind of code in Rust is, dare I say, fun to write. It's just a joy to use Rust's type system to enforce rules and invariants, and then codify those rules in the documentation comments above the structs/functions, and then have "cargo test" actually run the code in that documentation automatically to check it for validity.
And yes, as you point out, some of the big bugs lately have been logic bugs resulting not necessarily from poor code but from poor design. Thing is, the less mental capacity a language requires from a coder the more mental capacity that coder has to use for thinking about the application logic. i.e. in C when you get a string you have to think about how to handle the UTF-8 encoding and what to do about path names that somehow ended up with a non UTF-8 character, and whether the string is NULL terminated or pascal, and is memmove (src, dst), or (dst, src)? In Rust, well, that's all handled, so you think about what the string actually means and, hopefully, you'll realize that hey you should probably sanitize that string so it can't be used to gain shell access from an SVG file.
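A small sketch of that style (made-up names, nothing from a real codebase): a newtype whose only constructor validates its input, plus a doc example that `cargo test` compiles and runs as a doctest.

    /// A path component guaranteed not to contain separators or parent references.
    ///
    /// ```
    /// use mycrate::SafeName; // replace `mycrate` with the real crate name
    /// assert!(SafeName::new("notes.txt").is_some());
    /// assert!(SafeName::new("../etc/passwd").is_none());
    /// ```
    pub struct SafeName(String);

    impl SafeName {
        /// The only way to obtain a SafeName; downstream code can rely on the invariant.
        pub fn new(raw: &str) -> Option<SafeName> {
            if raw.is_empty() || raw.contains('/') || raw.contains('\\') || raw.contains("..") {
                None
            } else {
                Some(SafeName(raw.to_string()))
            }
        }

        pub fn as_str(&self) -> &str {
            &self.0
        }
    }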
Heartbleed is not a real memory-safety bug where the program reads beyond allocated memory. It is more a case of improper reuse of a previously allocated buffer, and could exist in safe Rust just as well.
You're right, there isn't a classic simple buffer overrun that Rust would trivially catch, but you're missing two things:
1) The problem was really sending back uninitialised memory. In Rust you can't have uninitialised memory. The oversize allocated buffer would have to have initialisation data passed in (possibly zeroes)
2) You'd never write the Rust code like that anyway. The abstractions available mean that you aren't separately carrying around the contents of some data and a length to pass to allocators.
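A sketch of what that looks like (a hypothetical function, not OpenSSL's actual API): the payload arrives as a slice that carries its own length, so an attacker-supplied length can only cap or reject the reply, never read past the data.

    fn heartbeat_response(payload: &[u8], claimed_len: usize) -> Option<&[u8]> {
        if claimed_len > payload.len() {
            // A mismatched length is a protocol error; and even if this check
            // were forgotten, slicing past the end panics instead of leaking
            // whatever happens to sit next to the buffer.
            return None;
        }
        Some(&payload[..claimed_len])
    }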
Forgetting about Rust for a second. Talking about dynamically linked libs in general.
Dynamic linking was an optimization which came about when memory was expensive. Memory is no longer expensive.
Is 1.3GB (or even 13GB) a lot on your hardware[1]?
According to "Do not prematurely optimize": Pretending we never had dynamic linking, and given today's hardware constraints[2], as a community would we choose to reimplement and universally adopt this optimization?
[1] Keep in mind that a single "modern" application, on average, weighs in the 10s of MBs or GBs.
[2] I'm talking about the general case, for the majority of OS distributions, ignoring the relatively exceptional case of embedded systems, which do in-fact still need it.
Main memory is cheap but slow. Having a frequently called function in a shared library vs. statically linked code could mean the difference of the code executing from CPU's cache or from main memory.
Even latest desktop processors have a L3 cache of only a few MB.
Inlining saves time lost due to jumping about, but it can cost time if it causes code replication (same as loop unrolling), because it can bloat the hot code to larger than the smallest cache.
So the arguments against inlining apply even more strongly when talking about every program being statically linked, the same code (standard library) will exist in memory in many places, and will get dumped and reloaded to L2/L3 every process swap. Nothing slower than having to wait for something to be faulted in.
And sufficiently aggressive inlining will increase the program size further. This might or might not be compensated for by the increase in instruction-pointer locality.
> Having a frequently called function in a shared library vs. statically linked code could mean the difference of the code executing from CPU's cache or from main memory.
I am under the impression that when a process switch happens, the CPU caches are flushed.
No. Only TLBs are flushed (and probably only partially). TLBs are used to associate virtual addresses with physical addresses, and memory maps differ per process.
(That's one reason why it's beneficial to schedule a process on the same CPU if possible - the data is still in the cache)
Dynamic linking offers modularity and separation of concerns.
I don't really care which point-release of zlib my program is linked with, I just want to decompress stuff. If someone finds a bug (or exploit), I am not the best person to quickly realize it and release an update -- the maintainer of zlib, and the packagers, and OS distributions, and sysadmins are in a much better position. But if it's statically linked, then developers have to be involved.
You could say that we could invent a mechanism to allow sysadmins to rebuild with patched libraries, but then we'd still need to reinvent all of the versioning and other headaches of dynamic libraries.
I think dynamic libraries are kind of like microservices. Sure, they can break stuff, but they allow higher degrees of complexity to still be manageable.
Memory is terribly expensive and I have to fight all of the other developers/product folks/upper management for every byte in my environment (hundreds of thousands of servers).
I have no choice but to use DSOs for our Rust code.
Dynamic linking also lets you update libraries due to things like security issues, it's not just a memory thing. Kinda agree on the space thing too (plus much less chance for things like buffer overflows..)
FWIW: I think everything has its place, and everything has tradeoffs. I can definitely see a lot of usefulness for dynamic linking. The point you raise probably being the best current reason.
... but since I'm already playing devil's advocate :)
Dynamic linking also lets you update libraries ... and cause security issues simultaneously across all applications. Increasing the number of possible attack vectors to successfully utilize that vulnerability.
Actually, it's a wash. If all we had was static linking, people would statically link the same common libraries. So you'd have to update multiple binaries for a single vulnerability.
I've seen this in my day job at Pivotal. The buildpacks team in NYC manages both the "rootfs"[0] of Cloud Foundry containers, as well as the buildpacks that run on them.
When a vulnerability in OpenSSL drops, they have to do two things. First, they release a new rootfs with the patched OpenSSL dynamic library. At this point the Ruby, Python, PHP, Staticfile, Binary and Golang buildpacks will be up to date.
Then they have to build and release a new NodeJS buildpack, because NodeJS statically links to OpenSSL.
Buildpacks can be updated independently of the underlying platform. The practical upshot is that anyone who keeps the NodeJS buildpack installed has a higher administrative burden than someone who uses the other buildpacks. The odds that the rootfs update and the NodeJS buildpack are updated out of sync are higher, so security is weakened.
This was a much more powerful reason before things like docker became common, and methodologies adapted to provide updates for docker images, which for this purpose are functionally identical to a static binary.
At least I hope "methodologies adapted", I don't use docker images, so that's an assumption on my part, but I feel it's a fairly safe bet.
Docker images don't have a nice way of updating without "rebuilding everything". There's a tool called zypper-docker that does allow you to update images, but there's no underlying support for rebasing (updating) in Docker. I was working on something like that for a while, but it's non-trivial to make it work properly.
Hmm, I assumed it would be something along the lines of the images being fairly static, and updated as a whole, and you just apply your configs and data, possibly through mount points.
I was responding to the comment that security updates to libraries make it harder to update static binaries. Docker has revived the problem, and there isn't a way of nicely updating images without rebuilding them (which in turn means you have to do a rollout of the new images). While it's not a big deal, it causes some issues that could be avoided.
Yes, but presumably you're running far fewer docker images than you have binaries that would be affected if you statically compiled everything. For example, I assume in a statically compiled system, an update to zlib will likely affect a lot more packages than docker images you are running (on a server I admin, there's 3 binaries in /bin that link to zlib, and 374 binaries in /usr/bin, which will condense down to some smaller, but still likely quite large set of OS packages). It's easier in a dynamically linked system, where you can just replace the library, but it's not that much better for the sysadmin, as if you want to make sure you are running the new code, you need to identify any running programs that have linked to zlib and restart them, as they still have the old code resident in memory.
> "Do not prematurely optimize" is not a software design rule, it's a time-management rule.
No, it's both. Optimization often affects the cost of later decisions, and the reason not to prematurely optimize is that it can easily take you to a local optimum which is not very optimal at all. This is a perfect example of that, as the GP comment points out. If memory were not as constrained as it was when this trade-off became common, it might not have become prevalent. Static binaries are faster (the degree to which depends on a lot of factors), while dynamically linked binaries are smaller on disk and in memory, provided the shared libraries are already used elsewhere. Modern optimizations at the OS level for forking and threading should make consideration of those negligible.
Dynamic linking wasn't invented as a premature optimization though. And if it didn't exist today, it would still not be premature to invent it, because dynamic linking does not only concern how large and fast your program is, but also how it is interacted with.
So here is my point: optimization that can affect the relevant interfaces of your software is not premature because deciding on the interfaces your software exposes is not premature.
You are choosing to focus on the "premature optimization" wording, which is fair, it was said. I'm focusing on "as a community would we choose to reimplement and universally adopt this optimization?" (emphasis mine). I think it would be implemented, I do not know of any evidence that makes me believe it would become universally adopted given modern resources.
I'm not sure exactly what was originally meant. I interpreted it as: dynamic linking is currently the norm, used in every mainstream OS, in most of the applications that run on them, and on all the major mobile platforms. If we had to make the choice right now, without the history of dynamic linking behind us, would we still choose to use it for the majority of platforms?
Dynamic linking certainly has no technical advantages like lower memory/disk usage or faster processing. Its main advantage, which has been cited before, is that it forces cohesion in the Linux community.
E.g if an author of a program finds a problem in the dynamic library he or she is using, the problem is forced to be solved upstream, benefitting all users of the library. If instead static linking was the norm, it is much more likely that the author would just solve the problem for him or herself and the solution would never reach the wider community.
In the best of worlds, we would have static linking everywhere but the "social contract" of dynamic linking would be enforced just as strongly.
You could go the way of busybox or uutils and have a single binary with many hard links. So 'ls', 'wc', 'grep', etc can all point at a single executable which dispatches to different functionality based on argv[0].
Then you can even share code between the binaries, which should make them even smaller.
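Roughly like this (a sketch with stub applets; busybox and uutils do the same dispatch on argv[0]):

    use std::env;
    use std::path::Path;

    fn main() {
        // The name we were invoked as: "ls", "wc", "grep", ... via hard links.
        let argv0 = env::args().next().unwrap_or_default();
        let applet = Path::new(&argv0)
            .file_name()
            .and_then(|n| n.to_str())
            .unwrap_or("")
            .to_string();

        match applet.as_str() {
            "ls" => run_ls(),
            "wc" => run_wc(),
            "grep" => run_grep(),
            other => eprintln!("unknown applet: {}", other),
        }
    }

    // Stubs standing in for the shared implementations.
    fn run_ls() { println!("ls: stub"); }
    fn run_wc() { println!("wc: stub"); }
    fn run_grep() { println!("grep: stub"); }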
> Just to be cautious: Is it a valid question to ask after all? We have hundreds of gigabytes of storage, if not some terabytes, and people should be using decent ISPs nowadays, so the binary size should not be a concern, right?
Phones? IoT? Embedded? OS devs? Someone just checked in a 14KB binary size reduction to our shell (by removing unnecessarily virtual methods in some C++) that was widely celebrated.
5 KB. There are a few unfortunate reasons this isn't even smaller, one of them being a bug report[2] I filed in LLVM.
Also note that Rust is a lot further along than Zig right now. Zig does not have backtraces or threads yet. But I believe that the executable size for hello world in release mode will not contain backtrace code, or threads, or a memory allocator, even when Zig catches up to Rust in terms of std lib functionality.
"C and C++ folks had been fine with that approximation for decades, but in the recent decade they had enough and started to provide an option to enable the link-time optimization (LTO)"
There has been a technique (the 'unity build') that approximates a poor-man's LTO for a long time. Basically you #include all your cpp files in to a giant translation unit and then compile it :)
VC++ first shipped it in Visual Studio .NET (early 2002), and I don't remember it being touted as the first production-quality implementation of the concept, so I assume it was around elsewhere before that.
I dunno about "much" older but I'm sure it was part of the Xbox tools in 2001... they provided some kind of early build of the VS2002 compiler. VS2002 proper was released in 2002 (oddly enough).
Go doesn't pride itself on not having a runtime though. People aren't surprised that there's space taken up by the garbage collector and green thread management and such.
I don't understand your remark. To the contrary, a runtime should decrease binary size. The same PHP script (interpreted, not JIT'd or run in a VM, obviously) will be only a few hundred bytes.
The runtime in this case is compiled into the binary. With php, the runtime is all contained in the php binary that runs your scripts, in Go's case, the runtime is copied into every binary it produces.
Aren't you just assuming the runtime is dynamically available? This is not a necessary feature of a runtime. You could compile PHP and add the size of the interpreter to the executable.
Before I found Rust I used to be a big advocate for Scala. What eventually drove me away from Scala and towards Go and Rust was that my Scala JARs were clocking in at 300 to 400 MB, so I consider the fact that a Rust binary has a pretty fixed overhead of only a few hundred KB a big win.
I recall someone once getting a Squeak Smalltalk image down to 384k. However, there was a Digitalk-originated project called "Firewall" that could produce Smalltalk images as small as 45k, suitable for writing command-line programs, even on early 90's machines. (Even recent versions of VisualWorks can get their memory footprint down to around that of Perl 5, and even beat Perl 5 in terms of startup speed, provided you are prepared to dig around and shut a whole lot of things off.)
That's fairly impressive. Perl 5 is pretty quick to start:
# perl -E 'my $cmd = shift; use Time::HiRes; my @times; for (1..10) { my $start=Time::HiRes::time; my $out = system($cmd); my $stop = Time::HiRes::time; die if $out>>8; my $time = $stop - $start; push @times, $time; printf "%0.4f\n", $time; } @times = sort {$a<=>$b} @times; @times = @times[1..8]; my $cumulative=0; $cumulative += $_ for @times; my $average = $cumulative/8; printf "Average time of 10 runs of \"%s\", dropping best and worst: %0.4f\n", $cmd, $average;' "perl -e '1;'"
0.0039
0.0032
0.0031
0.0031
0.0032
0.0035
0.0036
0.0030
0.0033
0.0032
Average time of 10 runs of "perl -e '1;'", dropping best and worst: 0.0033
That's Perl 5.22.1. For "python -c '1'" (Python 2.7.5) I get 0.0173. A minimal C program (just return success from main after including stdlib and stdio) built with default gcc opts is <9k in size, and vacillates between averaging 0.0007 and 0.0010 in the benchmark above when I run it.
That perl code uses Time::HiRes to measure the time it takes to start perl via system(), which includes the time it takes to fork a shell and parse the command and then fork to spawn perl. `time perl -e '1;'` is more representative of the raw perl startup time, which on my machine is reliably 0.002 real seconds, ⅔ of your times (that is, there's a lot of overhead in your measurement).
A minimal C program (just return success from main after including stdlib and stdio)
You do realize that a "including stdlib and stdio" means nothing for a C program, right? These are, literally, just the API definitions, in the Oracle-vs-Google Android sense. The default gcc options probably produced a dynamic executable; if you compile it statically, you might be able to shave a few more microseconds in startup time.
> includes the time it takes to fork a shell and parse the command and then fork to spawn perl.
The only difference from the shell builtin time, or /usr/bin/time, should be the shell startup and exec call. Every command is fork+exec, so you can't really get away from that. If I really cared, I would try to account for that, but I don't. I thought it was pretty obvious that I wasn't being very rigorous. I just did the minimum to make the values I got not useless.
> which on my machine is reliably 0.002 real seconds, ⅔ of your times
And on mine it fluctuates between 0.002 and 0.009. You mention your times relative to my times, but that's useless. I ran mine on a 512mb VPS. The only relevant measure to determine overhead would be my method vs the shell builtin. Is that what you were actually referring to?
> You do realize that a "including stdlib and stdio" means nothing for a C program, right?
Truthfully, so I wouldn't have to remember the setup for C, I just googled "minimal C program" and removed the printf it had. I only mentioned the includes for completeness sake, and I didn't want to clutter the comment with the source.
> if you compile it statically, you might be able to shave a few more microseconds in startup time
Possibly, but that's not really what I was trying to convey. I was just pointing out that a minimal Perl program isn't slower than a minimal C program, and it's notable if you can get your startup times close to that for a system with a virtual machine.
Err "minimal Perl program isn't slower than a minimal C program" was supposed to be "minimal Perl program isn't that much slower than a minimal C program".
I really like executable shrinking posts. However, isn't it the case that the size of the executable won't increase significantly if you use --release to distribute bigger programs? After all the size comes from the library and memory allocator being included in the executable. As long as the libraries are not heavy, the executable should stay reasonably small.
That matches my (amateur) understanding of how it works. The executable would increase by small amounts because presumably your own code is a relatively small portion of all the code included in the default binary. Then again, since it's not dynamically linking, every crate you use increases the size...
Another example of how languages by themselves don't matter: developing a good enough language, with all the nice ecosystem that goes with it, is a massive amount of work that can only be accomplished by OS vendors or by years of dedicated OSS contributors.
At this point, I think most language developers should turn to LLVM to spread their languages more rapidly, rather than trying to mess around with all the realities of language making on their own, especially when it involves statically typed and compiled languages.
*BSD and GNU/Linux ecosystems are a bit different in this regard, but on other OSes, systems programming languages that aren't part of the OS vendor's offerings usually have a hard time getting wide adoption and support.
How much of that initial 650k is constant in size? Would we expect it to increase with programs of substantial complexity, or is the overhead relatively constant?
The cause of this has always been known (within the Rust community), but it is not considered a problem. For most users, the advantages of static linking and using jemalloc significantly outweigh the cost of a 0.5mb constant overhead on binary size. Those users under different circumstances can configure their build using the tactics in this article to make a different trade off more suited to their circumstances.
I don't understand the article's point about ISP speeds. Really if you're on a slow enough ISP that a <1MB file is too large for someone to download, then practically speaking they won't be able to download anything smaller either.
Because it's not that single ~1MB file that's a problem, it's the cumulative effect. There are people on dialup and GPRS and even people who pay by the megabyte.