What's in a Linux Executable? (fasterthanli.me)
359 points by todsacerdoti on Oct 29, 2020 | 64 comments



I once wrote an ELF by hand for a CTF challenge. The challenge was to build a shared library such that, when it is passed to LD_PRELOAD, it spawns a shell via execve:

LD_PRELOAD=<ELF> /bin/true

The constraint was that the ELF had to be smaller than 196 bytes, so obviously it could not be created by gcc. In the end I could not believe it ran, considering the number of hacks I had to do to trim it to 193 bytes.

https://github.com/TeamGreyFang/CTF-Writeups/tree/master/Pla...


Here's my writeup/solution for the same challenge, represented as a diagram:

https://twitter.com/David3141593/status/1253122980525334529


Your pinned tweet is quite fascinating.


Cool diagram. Putting the shellcode in headers is very innovative. I didn't have too much assembly knowledge to trim it further during the competition.


Yep, it’s fun to see what things glibc will let you get away with ;)


> Clearly, something was different about these files. Seen from notepad, they were mostly gibberish, but there had to be order in that chaos. 12-year-old me knew that, although he didn't quite know how or where to dig to make sense of it all.

Glad I wasn't the only one! When I was a kid, I had tried several times to somehow create executables myself, without knowing how to program. Long after the failures I had suffered from, I discovered I could make an executable using so-called "programming". I still can't forget the time I created a Hello World executable for the first time. It was a gorgeous experience. Good times.


Same here - I remember when you'd get games as a .COM file with a nag screen/message etc. that you could just skip over by flipping a jump instruction...

I remember being super puzzled when I tried that with a .exe and it didn't work


There's a whole fun game to be played even today trying to defeat copy protection on old DOS games. Programmers of the time resorted to a lot of complicated tricks to keep what you're describing from working properly, but assembly of the era is pretty easy to follow and having a 30 year advantage in tooling helps a lot.

Hyperspeed was an interesting one. The crack that comes with the Steam version doesn't actually work since it triggers a lockup the first time you use the spindrive, so players just disable the crack and play the game normally with a PDF manual. I spent a long time trying to bypass that copy protection screen and all the checks to see if I'd bypassed it, but every time I thought I'd finally nailed it I ran into a new place the game would intentionally lock up. Ultimately I just truncated the word match table to a single entry so the word would always be the same. That was a single-byte change.


It was a whole new world to create a .com file with just debug.com and some assembly (most of the work is already in interrupt 21h). Good times.

I looked at DOS .exe (NE? or was that the format for Win 3.1...) format too. Still relatively simple. Then came the PE which was super complicated.


RE: debug.com

As a teenager who didn't know any better, it was empowering to gain that level of access to a computer, to realize a certain level of control over it that was previously abstracted away. It felt like some insider knowledge that I was becoming aware of.


At 14, I wrote a functional virus on paper, hand-calculated the jump offsets and entered the "code" in debug.com. The first version was 273 bytes long. It did not do much: it printed the BELL character, then looked for non-infected COM files in the current directory and infected one of them. F-Prot identified it as a generic virus and killed it. That made me a little mad at the time, so I added some polymorphism by decoding and encoding the virus code with a different key every time (using XOR). The code size increased by about 60 bytes, and only around 10 bytes were unencrypted: the ones that did the in-place decryption in memory. F-Prot identified it as a generic polymorphic virus and killed it again... After that I tried to make it resident, but intercepting the int 21h calls turned out to be a tough nut to crack, so I tried to do something with EXE files instead. A basic variant was nearly ready when I fucked up my 40MB hard drive and lost everything... But that's another story of my early self-education :)


Your experience already shows the difference between my generation, where doing anything required programming (8 bit home computers), and a couple of years later.

Somehow a lost experience in today's hardware.


Curious how the image below was created. Looks pretty neat.

https://fasterthanli.me/content/series/making-our-own-execut...


Just good ol' draw.io desktop app: https://github.com/jgraph/drawio-desktop

Comments below already figured out the rotate-hue trick :)

Styling inside the SVG is possible... if the SVG is inline, and not in an `<img>` tag. Also, it would be really hard to target which elements to style, as draw.io does not really give you classes for each color family.

Oh also, I have a whole .drawio => .pdf => .svg => optimized .svg pipeline so you can just save the .svg and not worry about having the fonts. (But also the svg no longer contains the text itself — it's a trade-off).


Thanks! Isn't it a lot of effort to draw by hand? Genuine question, as I'm interested to know if there's an easy to write text format that will get converted to such a neat looking image :)


It definitely takes up a good chunk of time, but it's worth it! It's not that bad once you learn the keyboard shortcuts.

I thought about going the "declarative / adhoc tool" way but I make lots of different diagram types, so I would probably spend forever bikeshedding it instead of writing the actual article.


What’s really neat is that when you switch to dark mode in the bottom left corner, the image flips to dark mode too!


That's a "feature" of SVGs. The image is an SVG and since they are transparent it goes dark when switching.


It’s not just that the image is transparent though, because the color of the text changes too. I thought perhaps it’s using the site CSS to style SVG elements (which appears to be possible, though I haven’t tried it), but it turns out it’s just a CSS filter `invert(83%) hue-rotate(180deg)`. Still, it looks great and I’m impressed by the attention to detail.
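
To see why that filter pair reads as "dark mode" rather than as a photo negative, here is a rough sketch of the arithmetic in Python. It assumes the formulas from the CSS Filter Effects spec (a linear ramp for `invert`, the standard hue-rotation matrix for `hue-rotate`) and ignores any intermediate clamping a browser may do; the function names are made up for illustration.

```python
import math

def invert(rgb, amount):
    # CSS invert(amount): each channel is mapped linearly from [0, 1]
    # onto [amount, 1 - amount].
    return tuple(c * (1 - 2 * amount) + amount for c in rgb)

def hue_rotate(rgb, degrees):
    # Hue-rotation matrix as given in the Filter Effects spec.
    a = math.radians(degrees)
    c, s = math.cos(a), math.sin(a)
    m = [
        [0.213 + c * 0.787 - s * 0.213, 0.715 - c * 0.715 - s * 0.715, 0.072 - c * 0.072 + s * 0.928],
        [0.213 - c * 0.213 + s * 0.143, 0.715 + c * 0.285 + s * 0.140, 0.072 - c * 0.072 - s * 0.283],
        [0.213 - c * 0.213 - s * 0.787, 0.715 - c * 0.715 + s * 0.715, 0.072 + c * 0.928 + s * 0.072],
    ]
    return tuple(sum(row[i] * rgb[i] for i in range(3)) for row in m)

def dark_mode(rgb):
    # invert(83%) followed by hue-rotate(180deg), clamped once at the end
    # (a simplification made for this sketch).
    out = hue_rotate(invert(rgb, 0.83), 180)
    return tuple(min(1.0, max(0.0, c)) for c in out)

for name, color in [("white", (1.0, 1.0, 1.0)), ("black", (0.0, 0.0, 0.0)), ("red", (1.0, 0.0, 0.0))]:
    print(name, tuple(round(c, 3) for c in dark_mode(color)))
```

Greys just swap ends (a white background becomes dark grey, black text becomes light grey), while the 180 degree hue rotation pulls coloured elements back toward their original hue after the inversion pushed them to the complement.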


Ah, serves me right for not seeing it myself and assuming it was just the background. Thanks for the additional info.


Okay, this is driving me crazy. Where?


Bottom-left of the article linked in the OP, not the linked SVG image itself.


It's a very low-contrast sun in the bottom left corner.


Agreed, and it's also a vector image which can be zoomed back and forth without any loss of details. I'm not sure what software might have created it, but ploticus' multi column legends seem somewhat similar. http://ploticus.sourceforge.net/doc/welcome.html

Also worth checking is GLE. https://glx.sourceforge.io/index.html


One thing I was wondering recently is what are all the files in a tar.gz that I install. I’ve been messing with my remarkable tablet lately, and it’s a stripped down Linux environment. When I use wget to get packages, I get a usr folder with a bunch of stuff. I’m not really sure how to install thereafter? There’s no package manager. What’s the structure behind: usr/, bin/ etc/ lib/ and moving those to usr/local? Any resources?


Distributed binary packages (.deb, .rpm) are just compressed archives, "like" a tar.gz, with a little bit of metadata too.

> What’s the structure behind: usr/, bin/ etc/ lib/ and moving those to usr/local

A brief summary of what each part of the filesystem structure is for is documented in "man hier" on any non-stripped-down Linux (or online).

When you compile source code to binary format, there is a default PREFIX (i.e. /usr). So libraries will be installed to /usr/lib and binaries to /usr/bin...

Build systems let you change that PREFIX, so you can build a program telling it to live under /usr/local, or any other root directory like /opt/myprog-version...

Then, if you build another program that depends on the libraries of the previous one, you may need to tell it where to find those libraries if they are not in the default place. Each program's build flags vary, but for example Ruby's ./configure accepts --with-openssl-dir=/opt/myopenssl-X.Y


Thanks for the tip about "man hier" :)


There's also file-hierarchy(7) which is more systemd-centric.


Sometimes the tar files are just archives that get extracted at /, which is a fairly primitive but functional way to distribute packages.

For a description of the directories, see: https://en.wikipedia.org/wiki/Filesystem_Hierarchy_Standard

Note that these directories do not actually have many hard and fast rules... while you can't change /proc, you can argue all you like over whether something goes in /bin or /sbin, or argue whether it should go in /usr. There are a ton of differences in the details.
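
If you want to see where such an archive would put things before extracting it anywhere, you can list its members first. A small sketch using Python's `tarfile` module; the tarball here is built in memory with made-up file names, purely so the example is self-contained.

```python
import io
import tarfile
from collections import Counter

def make_demo_tarball() -> bytes:
    # Build a tiny .tar.gz in memory; the member names are invented for the demo.
    files = [
        ("usr/bin/hello", b"#!/bin/sh\necho hi\n"),
        ("usr/lib/libhello.so", b"\x7fELF"),
        ("etc/hello.conf", b"greeting=hi\n"),
    ]
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w:gz") as tar:
        for name, data in files:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return buf.getvalue()

def top_level_summary(blob: bytes) -> Counter:
    # Count regular files under each top-level directory of the archive.
    with tarfile.open(fileobj=io.BytesIO(blob), mode="r:gz") as tar:
        return Counter(m.name.split("/")[0] for m in tar.getmembers() if m.isfile())

print(top_level_summary(make_demo_tarball()))
```

For a real download you would pass the file path to `tarfile.open` instead of an in-memory buffer; the point is only that the top-level directories tell you which prefix the package expects to be unpacked into.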



This looks great!!


I missed that you were talking about non-x86, if you just go up a few levels you can see the docs for other platforms.


The tar, assuming it comes with a Makefile, should have a `make install` target or equivalent. On Debian/Ubuntu based systems, the convention is to use checkinstall to keep track of installation. Another new trend in recent years is to install third party packages to /opt.


Ah I see. In my case I was installing precompiled binaries for an armv7 architecture.


For the most part, you can read the Filesystem Hierarchy Standard to get an idea of what all the different directories do: https://www.pathname.com/fhs/

Distros mostly keep to this, but they implement their own details and you should look into each distribution to know exactly what goes on in any particular one.


Yeah - I was kind of reverse engineering the one on the tablet which is a custom thing on ArmV7 so it was kind of like a guessing game


The Filesystem Hierarchy Standard describes a lot of how Linux filesystem is laid out. Wikipedia has an OK intro https://en.m.wikipedia.org/wiki/Filesystem_Hierarchy_Standar...

For user applications the XDG Base Directory specification is also useful https://specifications.freedesktop.org/basedir-spec/basedir-...


If you are wondering what all those files in /bin are, some of them exist because they are part of the POSIX standard and have to be present in every POSIX compliant system. A good overview of these commands can be found here:

https://shellhaters.org/

The links lead directly into the POSIX documents, which can be quite handy when you write shell scripts, because there you can read which options should be the same on every POSIX compliant OS.


If you want to install such a package into its own prefix (not directly to /), the convention is typically to use /opt/$SOFTWARENAME. So one would have bin, lib, etc. under that directory. If the software is not designed to be relocatable like this, one may have to set up PATH, LD_LIBRARY_PATH, etc. so that it can find its executables, libraries and so on.
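
As a rough illustration of that last step, here is a Python sketch that builds an environment for a package unpacked under its own prefix. The prefix `/opt/mytool` and the bin/lib layout are assumptions for the example, not something any particular package guarantees.

```python
import os

def env_for_prefix(prefix, base=None):
    # Copy the given environment (or the current one) and put the
    # prefix's bin/ and lib/ directories in front of the search paths.
    env = dict(os.environ if base is None else base)

    def prepend(var, path):
        old = env.get(var)
        env[var] = path if not old else f"{path}:{old}"

    prepend("PATH", f"{prefix}/bin")
    prepend("LD_LIBRARY_PATH", f"{prefix}/lib")
    return env

env = env_for_prefix("/opt/mytool", {"PATH": "/usr/bin:/bin"})
print(env["PATH"])
print(env["LD_LIBRARY_PATH"])
```

The resulting dict can be handed to `subprocess.run(..., env=env)` so only that one program sees the modified paths, instead of exporting them for the whole session.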


In general you should do your absolute best to avoid downloading and running binaries.

It's insecure and it's very prone to error on Linux. There's almost always a better way (and anyone who says differently is probably trying to sell something to someone.)


For those interested, the current de facto ELF standard is http://www.sco.com/developers/gabi/latest/contents.html (yes, for historical reasons, that SCO) although there is a rumour it will move to a GitHub project at some unspecified time in the future.

There is, in fact, no de jure standard for the ELF binary format. It is, however, pretty much a universal standard format for executables, dynamic shared objects, and partially-linked object files, used on every platform except Apple and Microsoft systems.


The most recent ones I've found were https://github.com/hjl-tools/x86-psABI/wiki/X86-psABI which have now been moved to GitLab and no longer have pre-built PDFs? It's a hard "standard" to track for sure.

In a pinch, the uClibc website will do: https://uclibc.org/docs/psABI-x86_64.pdf


There are pre-built PDFs on the GitLab repo, but they're pretty well buried. If you view a CI job you can download the generated artifacts.


Besides AIX, mentioned in the sibling comment, there are also mainframes and some non-POSIX embedded OSes.


(... and systems like AIX, which use XCOFF. A major pain point for anyone needing to write cross-platform tooling.)


I actually felt at home with XCOFF, a UNIX that used the same ideas as Symbian and Windows for dynamic link libraries, including import/export definition modules.


Since we're talking about ELF binaries, let me drop a link for my favorite size profiler (that is compatible with ELF and other formats):

Bloaty McBloatface.

https://github.com/google/bloaty


Following this blog post, the author teamed up with Ryan Levick and broadcast a livestream about the subject, turning it into a two-part video series. The videos are available on YouTube.

Part 1: https://www.youtube.com/watch?v=jR2hUhjcAXI

Part 2: https://www.youtube.com/watch?v=Mln3idSVsxg


For me it is worth it if only for the many neat pointers on how to use `xxd` and, for example, `echo` to convert hexadecimal numbers to decimal. Great read!
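
If you don't have a shell handy, the same conversions are one-liners in Python. The sketch below also includes a small hex dump helper; it is only loosely modelled on `xxd` and its output format is not identical.

```python
def hexdump(data: bytes, width: int = 16) -> str:
    # One line per `width` bytes: offset, hex bytes, printable ASCII.
    lines = []
    for off in range(0, len(data), width):
        chunk = data[off:off + width]
        hexpart = " ".join(f"{b:02x}" for b in chunk)
        text = "".join(chr(b) if 32 <= b < 127 else "." for b in chunk)
        lines.append(f"{off:08x}: {hexpart:<{width * 3 - 1}}  {text}")
    return "\n".join(lines)

print(int("0x1000", 16))   # hexadecimal string to decimal
print(hex(4096))           # decimal back to hexadecimal
print(hexdump(b"\x7fELF\x02\x01\x01\x00"))
```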


If you want more file-format goodness, check out Ange Albertini's Corkami Project:

https://github.com/corkami/pics

Here are Windows and Linux executables:

https://raw.githubusercontent.com/corkami/pics/master/binary...

https://raw.githubusercontent.com/corkami/pics/master/binary...


I liked this blog's theme. What is it made of?


Here’s the author (fasterthanlime) writing about how they built their website by hand - https://fasterthanli.me/articles/a-new-website-for-2020

Like most things written by fasterthanlime, it will take more than an hour to read.


https://builtwith.com/detailed/fasterthanli.me This suggests hugo is being used


This used to be true — until June 2020, as that website noted.

It's all custom now, and closed source, since I want to focus on writing rather than maintaining an OSS project that fits everyone's needs


Why does linux only support ELF?

Other executable formats, for instance, support multiple architectures per file. Why does Linux not?


The Linux kernel actually does allow executing non-ELF executables, via the binfmt_misc (https://en.wikipedia.org/wiki/Binfmt_misc) mechanism.

My favourite use for this feature is via `qemu-user-static`, which allows you to transparently "run" executables compiled for other architectures thanks to QEMU emulation. Very handy for embedded systems development.
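
For the curious, a binfmt_misc rule is a single colon-separated line, `:name:type:offset:magic:mask:interpreter:flags`, written to /proc/sys/fs/binfmt_misc/register. Here is a Python sketch that only builds such a line and does not register anything; the rule name and the wrapper path are hypothetical.

```python
def binfmt_rule(name, magic, interpreter, offset=0, mask=b"", flags=""):
    # Build a magic-number ("M") rule in the format
    # :name:type:offset:magic:mask:interpreter:flags
    def esc(raw):
        # Bytes are written as \xNN escapes so ':' or NUL cannot break the line.
        return "".join(f"\\x{b:02x}" for b in raw)

    return f":{name}:M:{offset}:{esc(magic)}:{esc(mask)}:{interpreter}:{flags}"

# Java class files start with the bytes CA FE BA BE.
rule = binfmt_rule("Java", b"\xca\xfe\xba\xbe", "/usr/local/bin/javawrapper")
print(rule)
```

Actually registering it needs root and a mounted binfmt_misc filesystem, which is why the sketch stops at producing the string.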


I remember some people using this to launch java applications.


For one thing, lack of demand. Linux distributions can relatively easily ship different packages for different architectures, and doing so is more space-efficient. It seems to me like shipping multiple binaries within a package is mostly useful for the distribution of proprietary software, and even in that case, it isn't that hard to ship two binaries and a shell script.


That's a function of your runtime dynamic linker, which is most likely provided by glibc (on linux). I believe it supports a.out still; at the very least, it used to.


Nope. It's a function of the kernel, which needs to know the binary format in order to load the executable. The dynamic linker is invoked at a later phase of execution; for an ELF file, the kernel finds it by reading the dynamic linker's path from the PT_INTERP program header (the PT_DYNAMIC segment is consumed later, by the dynamic linker itself). If the kernel doesn't recognize the file (no `\x7fELF` magic in the first 4 bytes, or an ELF built for the wrong architecture), execve fails with ENOEXEC, and callers such as the shell or execvp() then fall back to running the file with /bin/sh, assuming it's a text file containing a shell script.

If the kernel has the binfmt_misc module loaded, it can also recognize other binary formats before that fallback happens.
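
To make the magic check and the PT_INTERP lookup concrete, here is a Python sketch that parses just enough of an ELF file to find the interpreter path. It only handles 64-bit little-endian files, and the image it parses is a synthetic, non-runnable one built by the helper `build_demo_elf` (a name invented for this example); it is not how the kernel implements any of this.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path

def build_demo_elf(interp=b"/lib64/ld-linux-x86-64.so.2\x00"):
    # Synthetic ELF64 image: file header, one PT_INTERP program header, path.
    ehdr_size, phdr_size = 64, 56
    interp_off = ehdr_size + phdr_size
    e_ident = b"\x7fELF" + bytes([2, 1, 1, 0]) + bytes(8)  # 64-bit, little-endian
    ehdr = e_ident + struct.pack(
        "<HHIQQQIHHHHHH",
        3, 62, 1,          # e_type (ET_DYN), e_machine (x86-64), e_version
        0, ehdr_size, 0,   # e_entry, e_phoff, e_shoff
        0, ehdr_size,      # e_flags, e_ehsize
        phdr_size, 1,      # e_phentsize, e_phnum
        0, 0, 0,           # e_shentsize, e_shnum, e_shstrndx
    )
    phdr = struct.pack(
        "<IIQQQQQQ",
        PT_INTERP, 4, interp_off, interp_off, interp_off,
        len(interp), len(interp), 1,
    )
    return ehdr + phdr + interp

def is_elf(image):
    return image[:4] == b"\x7fELF"

def read_interp(image):
    # Return the PT_INTERP path, or None if the file has no such header.
    if not is_elf(image):
        raise ValueError("not an ELF file")
    if image[4] != 2 or image[5] != 1:
        raise ValueError("sketch only handles 64-bit little-endian ELF")
    (e_phoff,) = struct.unpack_from("<Q", image, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", image, 54)
    for i in range(e_phnum):
        p_type, _flags, p_offset, _vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", image, e_phoff + i * e_phentsize
        )
        if p_type == PT_INTERP:
            return image[p_offset:p_offset + p_filesz].rstrip(b"\x00").decode()
    return None

print(read_interp(build_demo_elf()))
print(is_elf(b"#!/bin/sh\necho hi\n"))
```

Pointing `read_interp` at the bytes of a real dynamically linked 64-bit binary should give the same kind of answer as `readelf -l` does in its "Requesting program interpreter" line.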


https://www.phoronix.com/scan.php?page=news_item&px=Linux-Dr... : >20 years after obsoleting it, it's being dropped. Longer than I would expect!


You can only ever run software for one architecture on a single machine, why pay for getting stuff you can't use?


It seems like you’re really asking two questions. My comment for the first one is that you can use binfmt_misc to add your own executable formats. For the second one, is there anything other than Mach-O that does that?


TMDR



