Those who don't see anything unusual or wasteful about an 8KB+ executable whose only action is to exit and return a value are recommended to watch this demo generated in realtime by a 4KB executable:
The demoscene is basically the exact opposite of mainstream software culture, and although it's focused on multimedia shows, I think some of their techniques and underlying motivation could be applied more generally...
I wonder if /bin/true and /bin/false in their simplest forms even meet the minimum "requirement for creativity" to be copyrightable, and if the reason they have bloated significantly is so that they could be.
The problem with code inflation is not the space needed on disk or in memory. The real problem lies in the quality assurance area. With every byte a program grows (in source code), the complexity of the system also grows with it. It is not only inherent complexity, but also extrinsic complexity -- for example dependencies on dynamic libraries and also dependencies on the operating system.
With the growing complexity, systems become more and more fragile and difficult to maintain. You can see that when software fails on one computer and runs on another computer with the same operating system, but some little, weird dependency (e.g. on the graphics card) makes the program misbehave on that particular computer.
Sometimes I worry that all the computer scientists of the world are making the world of computers more and more complex, and that one day the software will become unmanageable. Even today, as a normal computer user, I sometimes get the impression that computers take more time (for installing updates and updates of the updates, worrying about threats, getting the best virus scanner ...) than they save us.
On the contrary. Given today's disk sizes, I think man pages should be stored inside executables, in a special section of the executable that doesn't get loaded when the executable is loaded. That makes it easier to keep executable and man page in sync, and would allow us to (eventually) get rid of the man page directories for non-programmers (there are some issues to solve here for zip/unzip, busybox and the like, but I think those are surmountable. Also, users running with small disks could strip such sections from binaries).
I also think there should be another data section in each command-line program that contains an abstract layout of a dialog for entering arguments, as was customary in Macintosh Programmer's Workshop (http://en.m.wikipedia.org/wiki/Macintosh_Programmer%27s_Work...).
Shells could use that layout to make it easier for users to enter less-used commands or command options. Some shells would use curses to lay out these dialogs, others would choose to use a 'real' GUI. Alternatively, separate tools would be written, and shells would use a COMMANDO shell variable to pick the one the user prefers.
I think that the code to handle versioning and help text (except a brief usage message, which is often useful) shouldn't be included in each binary.
I agree that help text should be the purpose of the manpages.
A version number (or pointer to version string) could be one of the fields in the file header, to be read by a tool ("version"?) and displayed that way.
This way you still get versioning and documentation, without a massive duplication of functionality across every single binary.
I created an empty file, a la the original true, named truth and placed it at the beginning of my $PATH.
$ time truth
real 0m0.002s
user 0m0.000s
sys 0m0.003s
I then realized that since true is a builtin, I'd need to call the true binary with its full path. I'd hate to give one an unfair advantage by having to search the path vs being called by name. So....
$ time /usr/bin/true
real 0m0.001s
user 0m0.000s
sys 0m0.000s
$ time /usr/local/sbin/truth
real 0m0.003s
user 0m0.000s
sys 0m0.000s
$ time true # The builtin
real 0m0.000s
user 0m0.000s
sys 0m0.000s
Clearly my 2KB true binary is superior to an empty file. The builtin (obviously) outperforms both, by at least one order of magnitude (estimated).
This post is not intended to be a serious performance comparison.
If you're relying on an empty binary, or exit 0; instead of an actual binary, you have to execute a shell process to evaluate it. The binary doesn't need to load a shell; it can just do a bit of setup, then make a single exit 0 system call.
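A rough way to see this for yourself, as a sketch only: the Python below creates an empty executable file (the name truth is borrowed from the parent comment; run_empty_file is a made-up helper) and runs it once directly and once through /bin/sh. It assumes a POSIX system whose temp directory is not mounted noexec.

```python
import errno
import os
import shlex
import subprocess
import tempfile


def run_empty_file():
    """Create an empty executable file and try to run it two ways.

    Returns (direct_errno, shell_returncode).
    """
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "truth")  # hypothetical name, as in the parent comment
        open(path, "w").close()            # zero bytes, like the original true
        os.chmod(path, 0o755)

        # Direct exec with no shell involved: the kernel finds no
        # recognizable executable format and the exec call fails.
        try:
            subprocess.run([path], check=False)
            direct_errno = None
        except OSError as exc:
            direct_errno = exc.errno

        # Through /bin/sh: the shell sees the exec fail and interprets
        # the file as a script instead. An empty script exits with 0.
        shell_rc = subprocess.run(shlex.quote(path), shell=True).returncode

    return direct_errno, shell_rc


if __name__ == "__main__":
    direct_errno, shell_rc = run_empty_file()
    print("direct exec errno:", errno.errorcode.get(direct_errno, direct_errno))
    print("via shell exit status:", shell_rc)
```

The direct attempt should fail with ENOEXEC ("Exec format error"), while the shell attempt succeeds, which is the extra work an empty-file true costs compared to a real binary.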
Thank you. That makes sense. So basically the empty file is seen by the shell as a "script" that has to be evaluated, so a separate shell process is spun up to handle it? And the difference between my output and the parent's has to do with some difference in our environments?
A good and informative read. While I'm new to programming, I have begun to judge my productivity in terms of the lines of code I remove from a project, not the lines I add.
The same is true for standards. There are often so many non-features that make the standard harder to implement yet bring no real advantage. With standards the problem is that you have to implement everything to be standard-compliant.
And then you have a reluctance to remove the cruft that nobody uses in newer versions of the standard.
I'm currently working a bit with SVG paths. There are some features that aren't really used that much in the wild. For instance quadratic bezier curves, arcs, a shorthand syntax for successive horizontal/vertical lines in a subpath. Those things could be debated but they are okay.
Then we also have things that are really just unneeded. You can also write numbers in the form xxEyy or xx.yyEzz. Scientific notation has limited uses, and computer graphics is not one of them. You can use a comma in addition to optional whitespace, but only at specific locations and at most one. Also, there's exactly one place in the grammar where the whitespace is not marked optional.
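To make the number syntax concrete, here is a small sketch of a tokenizer for a run of path coordinates. It is my own simplification, not the grammar from the spec: parse_numbers, NUMBER and SEPARATOR are invented names, it handles only the numbers (no command letters or arc flags), and it is looser than the real grammar in places (it accepts a trailing comma, for one).

```python
import re

# A number roughly as the comment describes it: optional sign, digits with an
# optional fractional part (or a bare fraction), then an optional exponent.
NUMBER = re.compile(r"[+-]?(?:\d+\.?\d*|\.\d+)(?:[eE][+-]?\d+)?")

# Separator between numbers: optional whitespace containing at most one comma.
SEPARATOR = re.compile(r"[ \t\r\n]*,?[ \t\r\n]*")


def parse_numbers(text):
    """Parse a run of coordinates such as '1e2,.5-3E-1' into floats."""
    values = []
    pos = 0
    while pos < len(text):
        match = NUMBER.match(text, pos)
        if match is None:
            raise ValueError(f"expected a number at offset {pos}: {text[pos:]!r}")
        values.append(float(match.group()))
        # The separator pattern can match the empty string, so this never fails;
        # that is what lets '.5-3E-1' split into two numbers with no delimiter.
        pos = SEPARATOR.match(text, match.end()).end()
    return values


if __name__ == "__main__":
    print(parse_numbers("1e2,.5-3E-1 4."))
```

Even this reduced version has to cope with exponents, bare fractions, a sign acting as a delimiter, and the one-comma rule (a second comma in a row raises ValueError), which is the kind of implementation cost the comment is complaining about.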
"OMG, /bin/sh increased in size by 191x from 1974 to 2015!" seems like the byte-counting equivalent of people who lose their minds about "YOUR FOOD HAS CHEMICALS IN IT WITH COMPLICATED NAMES!". Both sound impressive and might make you worry - and both omit important facts that would greatly reduce that effect.
For instance, 191x growth since 1974 seems steep until you realize that the corresponding storage has grown from, say, 10.5MB (capacity of an RL02 for the PDP11) to ~1TB; a scaling of 100000x over the same period. That's not even really comparing apples-to-apples; the RL02 was not the sort of thing you'd have on your desk.
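For anyone who wants to check the arithmetic, here is a back-of-the-envelope sketch using only the figures quoted above (10.5MB, ~1TB, 191x). Decimal units are my assumption; the variable names are just for illustration.

```python
# Figures taken from the comment; decimal megabytes/terabytes assumed.
rl02_bytes = 10.5e6      # quoted RL02 capacity
modern_bytes = 1e12      # "~1TB"
shell_growth = 191       # quoted growth of /bin/sh, 1974 to 2015

# How much storage grew over the same period.
storage_growth = modern_bytes / rl02_bytes

# How much smaller the shell's share of the disk is now than it was then.
relative_shrink = storage_growth / shell_growth

print(f"storage grew about {storage_growth:,.0f}x")
print(f"the shell's share of the disk shrank about {relative_shrink:,.0f}x")
```

So the "100000x" in the comment is a rounded-up 95,000x or so, and relative to available storage the shell takes up roughly 500 times less of the disk than it did in 1974.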
Ironically, this article looks like it grew organically from the 14 words
"Software tends to grow over time, whether or not there’s a need for it"
that it highlights, until it contained an introduction, a First Law, a table of data about the Unix "true" command, the same data as a figure, a logarithmic chart of bash, and a foray into "dark code" (which can be present at any size).
http://www.youtube.com/watch?v=jB0vBmiTr6o
There's more here: http://www.pouet.net/prodlist.php?type[]=4k