Cramming a tiny program into a tiny ELF file

LegionMammal978 · on Dec 10, 2023

I'm surprised to see this little article of mine on the front page! If anyone has any questions about it, feel free to ask.

Also, if you're interested in more ELF shenanegans, check out the rest of tmp.0ut #3 [0]; the other authors have been working on some really neat stuff!

[0] https://tmpout.sh/3/

o11c · on Dec 10, 2023

Did you consider doing multiple calls to `write` instead reassembling the string?

I wonder if that's enough bytes that implementing a tiny interpreter would work; certainly it would win for slightly larger sizes (say, 4KiB).

LegionMammal978 · on Dec 10, 2023

I did consider that (as well as using writev), but the problem is that every syscall overwrites the number in eax with the return value, so we'd need more logic to reset it to __NR_write twice before setting it to __NR_exit_group. Also, the two parts have two different lengths that we'd have to store in rdx. (Unless we split it into 7+7 bytes, I suppose, but that would waste a perfectly good byte at offset 0xb.) Ultimately, more than 1-2 small instructions of extra logic simply aren't worth it at this scale, unless we can balance it out with fewer bytes of control flow.

Meanwhile, you're right that an ad-hoc interpreter will always be the answer for larger sizes. (It's very common in regular sizecoding to use a compressed executable with some code to unpack itself.) But since there simply isn't enough space here for complex logic, our only remaining avenue is to make the bytes work double and triple duty, whether through "Ouroboros Programming" or through regular subroutines. And of course, these kinds of techniques remain important when golfing the interpreter/decompressor itself, which has to be rooted in bare machine code.

tomcam · on Dec 10, 2023

Great article. What is the meaning of code like this? The jmp instruction seems necessary.

    jmp 0x14
    0x14:

LegionMammal978 · on Dec 10, 2023

In the instruction listings, I've omitted all the data between the basic blocks, and reordered/repeated the blocks to emphasize the control flow. In that particular case, there are actually four bytes, 03 00 3e 00, between the jump and the label. In fact, it's not actually necessary to use a 2-byte 'jmp' to skip past that data, but golfing it further doesn't matter, since it's already at the minimum size for the 80-byte pattern with an undivided string.

upon_drumhead · on Dec 10, 2023

Amazing work and a very entertaining read. Thank you for sharing!

Two9A · on Dec 10, 2023

These kinds of articles are always fun: essentially playing golf with the size of an executable or some other file format.

To plug my own entry in the genre, back in 2006 a few members of the Nintendo DS emulation scene tried to put together the smallest DS ROM that would still do something visible on the system: https://imrannazar.com/hacks/the-smallest-nds-rom

We ended up with a 352-byte file with two minimal executables, for the ARM7 and ARM9 cores. I've yet to see a .nds file smaller than that.

tetris11 · on Dec 10, 2023

Check out the 357-byte x86_64 ELF file to bootstrap the entire Guix software stack (albeit with a guile helper library)

https://guix.gnu.org/en/blog/2023/the-full-source-bootstrap-...

The annotated ELF file contains two types of comment annotations corresponding to the different software that parses it.

https://github.com/oriansj/bootstrap-seeds/blob/master/POSIX...

tromp · on Dec 10, 2023

> However, I think tasks like these provide partial answers to the eternal question of Kolmogorov complexity: "What sorts of programs can be expressed within an ELF file of a given length?"

Or the eternal busy beaver question: how big an output can be expressed with a program of a given number of bits?

We know the answer up to 36 bits (578960446186580977117854925043439539266349923328202820197287920039565648199686) [1] and that within at most 13 more bits, Graham's Number will be surpassed [2].

[1] https://oeis.org/A333479

[2] https://codegolf.stackexchange.com/questions/6430/shortest-t...

spockz · on Dec 10, 2023

Love the content but one snag about the site itself. On iOS by default there is a big black void to the right and the text is constrained to the left 2/3 of the screen. That would be okay but somehow the safari feature to double tap on a paragraph and get it fitted to the viewing page is broken as well.

isodev · on Dec 10, 2023

Indeed, the viewport meta tag isn’t set correctly as well. But that was secondary to the .sh TLD being blocked by my ISP by default :-)

082349872349872 · on Dec 10, 2023

For anyone missing COM files, it's not very difficult to slap a (nearly) constant ELF header on what is basically a glorified COM file to generate a valid executable.

(hmm... I haven't done this since widespread W^X; it might be a little more difficult these days but if so anyone who's had to write ROMable code will know the usual workarounds)

balou23 · on Dec 10, 2023

You can explicitly mark a memory area as write and execute in the elf header. Totally unsafe, but saves important bytes

082349872349872 · on Dec 11, 2023

That's what I had been doing, but I had been guessing that "mandatory W^X" meant a loader should ignore it even if requested?

(for the historians, we're basically discussing the difference between OMAGIC [0407] and NMAGIC [0410] a.out formats, both of which are much simpler than ELF)

PS. salut, balou !

gwern · on Dec 10, 2023

I wonder if you could use AFL to try to fuzz your way to a shorter executable? There may be undocumented glitches or assumptions in how exactly the loader works which would shave some bytes off (and given how fundamental this is, any hits might also be major security vulnerabilities).

LegionMammal978 · on Dec 10, 2023

Well, I've already gone through every undocumented assumption possible in the kernel's loader. It's simple enough that there really are some hard, clear requirements you can establish for what it will refuse to load (or at least, usefully load; there are several combinations that will make the loader succeed but not actually map any executable memory, or that map memory but fail to place the entry point into it, resulting in an instant segfault). Showing that these requirements rule out anything smaller than offset 0xc for the program header is the subject of my unpublished longer article. As for the program itself, 36+ free bytes is quite a lot of space to search exhaustively, and I haven't been able to find any x86 emulator that implements every actual behavior, or allows very rapidly resetting the state.

Fuzzing might be more useful for something like the dynamic loader, which has far more features than the kernel loader, and is far more eager to read user-supplied data.

EGreg · on Dec 10, 2023

So… like an ELF on the shelf? :)