Code Inflation [pdf] (computer.org)
80 points by chilgart on April 17, 2015 | 29 comments



Those who don't see anything unusual or wasteful about an 8KB+ executable whose only action is to exit and return a value are recommended to watch this demo generated in realtime by a 4KB executable:

http://www.youtube.com/watch?v=jB0vBmiTr6o

There's more here: http://www.pouet.net/prodlist.php?type[]=4k

The demoscene is basically the exact opposite of mainstream software culture, and although it's focused on multimedia shows, I think some of their techniques and underlying motivation could be applied more generally...

I wonder if /bin/true and /bin/false in their simplest forms even meet the minimum "requirement for creativity" to be copyrightable, and if the reason they have bloated so significantly is so that they could be.


Here's an awesome one I found http://koti.kapsi.fi/rimina/living_in_a_box_720p.html Sad buildings


The problem with code inflation is not the space needed on disk or in memory. The real problem lies in quality assurance. With every byte a program grows (in source code), the complexity of the system grows with it. It is not only inherent complexity but also extrinsic complexity: for example, dependencies on dynamic libraries and on the operating system.

With growing complexity, systems become more and more fragile and difficult to maintain. You can see this when software fails on one computer but runs on another with the same operating system, because some small, odd dependency (e.g. on the graphics card) makes the program misbehave on that particular machine.

Sometimes I worry that all the computer scientists of the world are making computers more and more complex, and that one day the software will become unmanageable. Even today, as a normal computer user, I sometimes get the impression that computers take more time (installing updates and updates of the updates, worrying about threats, finding the best virus scanner ...) than they save us.


Here's a link to the coreutils source for true if anyone's interested:

http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/true...

I think the argument can be made that command-line tools shouldn't provide versioning and help text at all - leave it up to the manual pages.


On the contrary. Given today's disk sizes, I think man pages should be stored inside executables, in a special section of the executable that doesn't get loaded when the executable is loaded. That makes it easier to keep executable and man page in sync, and would allow us to (eventually) get rid of the man page directories for non-programmers. (There are some issues to solve here for zip/unzip, busybox and the like, but I think those are surmountable. Also, users running with small disks could strip such sections from binaries.)

I also think there should be another data section in each command-line program that contains an abstract layout of a dialog for entering arguments, as was customary in Macintosh Programmer's Workshop (http://en.m.wikipedia.org/wiki/Macintosh_Programmer%27s_Work...).

Shells could use that layout to make it easier for users to enter lesser used commands or command options. Some shells would use curses to layout these dialogs, others would choose to use a 'real' GUI. Alternatively, separate tools would be written, and shells would use a COMMANDO shell variable to pick the one the user prefers.


> I think the argument can be made that command-line tools shouldn't provide versioning and help text at all - leave it up to the manual pages.

I'd have a real hard time agreeing with that. But I could make an exception for /bin/true.


I think that the code to handle versioning and help text (except a brief usage message, which is often useful) shouldn't be included in each binary.

I agree that help text should be the purpose of the manpages.

A version number (or pointer to version string) could be one of the fields in the file header, to be read by a tool ("version"?) and displayed that way.

This way you still get versioning and documentation, without a massive duplication of functionality across every single binary.


Also worth looking at is false.c, which is considerably shorter: http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/fals...


That seems to be ~70 lines of code, not the 2k+ claimed by the article. Is he including statically linked libraries as well?

edit: Ah, I see he meant bytes, but mislabeled the graph.


So, I found this very interesting to read.

I created an empty file, a la the original true, named truth and placed it at the beginning of my $PATH.

    $ time truth
    
    real 0m0.002s
    user 0m0.000s
    sys  0m0.003s

I then realized that since true is a builtin, I'd need to call the true binary with its full path. I'd hate to give one an unfair advantage by having to search the path vs being called by name. So....

    $ time /usr/bin/true
    
    real 0m0.001s
    user 0m0.000s
    sys  0m0.000s

    
    $ time /usr/local/sbin/truth
    
    real 0m0.003s
    user 0m0.000s
    sys  0m0.000s

    
    $ time true # The builtin
    
    real 0m0.000s
    user 0m0.000s
    sys  0m0.000s

Clearly my 2KB true binary is superior to an empty file. The builtin (obviously) outperforms both, by at least one order of magnitude (estimated).

This post is not intended to be a serious performance comparison.


I just repeated your test, but got opposite results:

    $ time /usr/bin/true

    real 0m0.003s
    user 0m0.001s
    sys  0m0.002s

    $ time /usr/bin/truth

    real 0m0.001s
    user 0m0.000s
    sys  0m0.001s

I can't imagine for the life of me how an empty file could perform worse than an actual binary, but there's gotta be a reason.


If you're relying on an empty file (or a script containing just exit 0) instead of an actual binary, a shell process has to be started to evaluate it. The binary doesn't need to load a shell; it can just do a bit of setup, then make a single exit(0) system call.
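
A rough sketch of that fallback, for anyone who wants to poke at it. The file name "truth" is borrowed from upthread and the temp directory is just for illustration; this assumes the temp directory allows executing files, and the exact fallback mechanics differ between shells.

```shell
#!/bin/sh
# Sketch: run a zero-byte executable file and look at its exit status.
# "truth" and the temp directory are illustrative names, not anything standard.
dir=$(mktemp -d)
: > "$dir/truth"        # create an empty file
chmod +x "$dir/truth"   # mark it executable

# The kernel refuses to exec the file (no binary format, no #! line),
# so the calling shell falls back to interpreting it as a shell script.
# An empty script runs no commands and therefore succeeds.
"$dir/truth"
status=$?
echo "empty file exit status: $status"

rm -rf "$dir"
```

So the empty file does behave like true, but only because a shell steps in to interpret it, which is the extra work a real binary avoids.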


Thank you. That makes sense. So basically the empty file is seen by the shell as a "script" that has to be evaluated, so a separate shell process is spun up to handle it? And the difference between my output and the parent's has to do with some difference in our environments?


this is what happens when we aren't allowed to use p values anymore :)

https://www.sciencebasedmedicine.org/psychology-journal-bans...

The default p is p = 0.5, so I have no trouble accepting both your conclusion and your parent's :)

seriously though, this is why we use significance testing in real fields, like computer science...


Exactly. I did the most naive thing possible and only once for each version of true/truth. I was just bored/curious.


A good and informative read. While I'm new to programming, I have began to judge my productivity in terms of lines of code I remove, not add to a project.


Awesome. You're following in some big footsteps: http://akkartik.name/post/28672493


If all programmers did what you are doing, we would not have the problems we do.

I hope someone gives you a medal and merit badge and a cookie.

[1] have begun


Agreed; I now always run a minifier before committing.


Not sure if sarcastic...

Reducing lines doesn't matter. Reducing logic is what matters.


The same is true for standards. There are often so many non-features that make the standard harder to implement yet bring no real advantage. With standards the problem is that you have to implement everything to be standard-compliant. And then you have a reluctance to remove the cruft that nobody uses in newer versions of the standard.

I'm currently working a bit with SVG paths. There are some features that aren't really used that much in the wild. For instance, quadratic Bézier curves, arcs, a shorthand syntax for successive horizontal/vertical lines in a subpath. Those things could be debated but they are okay.

Then we also have things that are really just unneeded. You can also write numbers in the xxEyy or xx.yyEzz form. Scientific notation has limited uses, and computer graphics is not one of them. You can use a comma in addition to optional whitespace, but only at specific locations and at most one. Also, there's exactly one place in the grammar where the whitespace is not marked optional.
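
To give a feel for what a parser has to accept, here is a rough sketch of the number syntax as a regex. The pattern is my own reading of the SVG 1.1 path data grammar, not something copied from the spec, and the sample tokens are made up.

```shell
#!/bin/sh
# Sketch: classify tokens against an approximation of the SVG path number syntax.
# num_re is an assumed transcription of the grammar: optional sign,
# digits with optional fraction (or a bare fraction), optional exponent.
num_re='^[+-]?([0-9]+\.?[0-9]*|\.[0-9]+)([eE][+-]?[0-9]+)?$'

valid=""
for tok in 10 -2.5 .5 1e3 12.5E-2 1e e5 1.2.3; do
    if printf '%s\n' "$tok" | grep -Eq "$num_re"; then
        echo "$tok: number"
        valid="${valid:+$valid }$tok"   # collect accepted tokens
    else
        echo "$tok: not a number"
    fi
done
```

Everything from the exponent group onward is there only to support the scientific notation nobody writes by hand in a path.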


This article is the best response to the other article on the front page about how optimizations are "useless"


"OMG, /bin/sh increased in size by 191x from 1974 to 2015!" seems like the byte-counting equivalent of people who lose their minds about "YOUR FOOD HAS CHEMICALS IN IT WITH COMPLICATED NAMES!". Both sound impressive and might make you worry - and both omit important facts that would greatly reduce that effect.

For instance, 191x growth since 1974 seems steep until you realize that the corresponding storage has grown from, say, 10.5MB (capacity of an RL02 for the PDP-11) to ~1TB; a scaling of roughly 100,000x over the same period. That's not even really comparing apples-to-apples; the RL02 was not the sort of thing you'd have on your desk.
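
The back-of-the-envelope arithmetic, as a sketch. It assumes decimal units (1 MB = 10^6 bytes, 1 TB = 10^12 bytes) and uses only the figures quoted above; the variable names are mine.

```shell
#!/bin/sh
# Sketch: compare storage growth with the quoted /bin/sh growth (integer division).
old_bytes=10500000        # 10.5 MB, the RL02 figure used above
new_bytes=1000000000000   # ~1 TB
sh_growth=191             # /bin/sh growth factor quoted from the article

storage_growth=$((new_bytes / old_bytes))
ratio=$((storage_growth / sh_growth))

echo "storage grew about ${storage_growth}x"
echo "that is about ${ratio}x more than /bin/sh grew"
```

With those inputs the storage factor comes out a little under 100,000x, so storage outpaced the shell by a few hundred times.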


If we'd gotten 191 times better at avoiding bugs and security issues (and those problems scaled linearly with code size), this would be to the point.


Counterpoint:

The original 486 had 8 KB of L1 cache. My laptop now has (2x) 32 KB.



ironically, this article looks like it grew organically from the 14 words

   "Software tends to grow over time, whether or not there’s a need for it"

that it highlights, until it contained an introduction, a First Law, a table of data about the Unix "true" command, the same data as a figure, a logarithmic chart of bash, and a foray into "dark code" (which can be present at any size.)


I'm a big fan of Gerard Holzmann -- his work is well worth checking out.

http://en.wikipedia.org/wiki/Gerard_J._Holzmann


>> So, why does software grow? The answer seems to be: because it can.

Not really a meaningful explanation. Nor is the example.



