Show HN: A minimal C runtime for Linux i386 and x86_64 in 87 SLOC of C (github.com/lpsantil)
172 points by oso2k on Jan 30, 2015 | 54 comments



For x86_64 Linux only, here's a single-file crt0 (with arbitrary syscalls working from C-land): https://gist.github.com/lunixbochs/462ee21c3353c56b910f

Build with `gcc -std=c99 -ffreestanding -nostdlib`. After -Os and strip, a.out is 1232 bytes on my system. I got it to 640 bytes with `strip -R .eh_frame -R .eh_frame_hdr -R .comment a.out`.

Starting at ~640 bytes, maybe you could come close to asmutils' httpd in binary size. Failing that, take a look at [1].

You can get pretty far without a real libc, keeping in mind:

- You probably want fprintf, and things can be slower without buffered IO (due to syscall overhead)

- `mmap/munmap` is okay as a stand-in allocator, though it has more overhead for many small allocations.

- You don't get libm for math helper functions.

Of course, you can cherry-pick from diet or musl libc if you need individual functions.
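To illustrate the `mmap/munmap` point above, here is a minimal sketch of a stand-in allocator. The names `page_alloc` and `page_free` are hypothetical (they are not from the gist), and the sketch calls the libc `mmap` wrapper for brevity; in a libc-free build the same calls would go through a raw syscall helper. Every allocation costs at least one page and one syscall, which is the overhead mentioned for many small allocations.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE   /* exposes MAP_ANONYMOUS under strict -std= modes */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Hypothetical names for illustration only. */

/* Stored in front of each block so page_free knows the mapping length. */
struct page_hdr {
    size_t len;   /* total bytes mapped, header space included */
};

/* Space reserved for the header; 16 keeps the returned pointer
 * aligned the way callers of malloc usually expect. */
#define HDR_SPACE 16

static void *page_alloc(size_t size)
{
    if (size > (size_t)-1 - HDR_SPACE)
        return NULL;                      /* size + header would overflow */

    size_t len = size + HDR_SPACE;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    ((struct page_hdr *)p)->len = len;
    return (char *)p + HDR_SPACE;
}

static void page_free(void *ptr)
{
    if (!ptr)
        return;
    struct page_hdr *h = (struct page_hdr *)((char *)ptr - HDR_SPACE);
    munmap(h, h->len);
}
```

The kernel rounds each mapping up to whole pages, so a 100-byte request still consumes a full page.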

[1] "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux" http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...


Thanks for this Ryan. I'll reference this gist too. I remember starting with something like this early on (w/o your sysdef) but wanted argc, argv, envp and to separate syscall* from the startup code.

Part of the reason I've started this is that libc suffers from bad design (in von Leitner's definition [0]). I find c0's & djb's APIs better designed [1][2]. *printf is a prime example of bad design. (Heck, pretty much everything in stdio.h & string.h has a design flaw.)

[0] http://www.fefe.de/dietlibc/diet.pdf

[1] http://c0.typesafety.net/tutorial/Strings.html

[2] http://www.fefe.de/djb/


And the good thing about statically linked executables is that the binary can work on every Linux, as the syscall interfaces remain the same. The critical point is not depending on anything platform-specific other than the syscalls, but for some kinds of utilities it's doable.


Pros:

- Portable (as long as you don't need dlopen/dlsym). If you do need to dlopen other libraries, you can run into issues on ARM mobile with mixed soft/hardfp calling conventions. Otherwise you're fine if the ABI is solid.

- No need to atomically update your system, as applications stand alone.

- Slightly faster program startup time.

Cons:

- Slightly larger file size (assuming you linked glibc... doesn't apply so much to things like musl or lib43).

- You can't link in OpenGL client libraries, so games will probably still need the dynamic linker.


On x86 and x86_64 you should be fine to use OpenGL via dlopen/dlsym, right?


I like this a lot more than I should. :)

Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces, and it makes me sad that I'm pulling in some 3MB of library code (libc + libm + pthread; not even counting the dynamically-loaded stuff like nsswitch) that I mostly don't want.

Sadly as a C++ programmer I do at least need libgcc to implement exceptions, which in turn likely pulls in glibc anyway. Sigh. (And I haven't completely cut ties with libstdc++ yet, though I'm close...)

(And yeah, on a typical system these libraries are already resident anyway since other apps are using them, so wanting to avoid them is mostly silly, but it feels nice!)


> And yeah, on a typical system these libraries are already resident anyway since other apps are using them, so wanting to avoid them is mostly silly, but it feels nice!

In principle, shouldn't avoiding those libraries make code friendlier to the instruction cache, even if the libraries are already in memory? (Yes, I'm also trying to justify the inherent niftiness of this...)


The major niftiness of static linking (it doesn't have to be minimally minimal) on Linux is that it's the only way I know of (1) to have a nice portable executable for every 32-bit x86 Linux or for every x86_64 Linux, no matter who made the distro and what they put inside of it.

At the moment, Windows has an advantage: even if some factions in Microsoft don't like to support that (2), Windows has a single C runtime (CRT) dynamic library which is compatible across the different versions of Windows, which can give you an extremely small portable binary that still links the CRT (the starting size of such a binary is typically around 3 KB on Win64, if I remember).

----

(1) If somebody knows anything else please say!

(2) Modern versions of Visual Studio are intentionally made to not link to that C crt lib. You have to jump through the hoops.


I've crashed into an issue where "crtdll.dll" did not manage to get installed on a fresh XP, so software relying on its presence died mysteriously.

First level support for that software didn't understand what I was going on about.


> Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces

I'd rather not re-write fprintf, myself, but I suppose that's up to you. ;)

There are some good functions implemented in the kernel instead of libc, but I personally think fork() and pthread_create() are more intuitive than learning what exactly Linux expects from a clone() system call, even assuming it's fully-documented. (Which, to be fair, it likely is. This isn't Windows; the Linux kernel API is stable and meant for public consumption.)

My main point is that a kernel (as opposed to a VM) is an abstraction anyway, so I might as well pick a convenient abstraction, regardless of where the code to implement that abstraction happens to live.


If you use -ffreestanding -nostdlib, libgcc doesn't pull in libc. You'll need to write the syscall(2) function yourself.


> Somehow I've always found system calls far more pleasant to use than "section 3" C library interfaces

I agree, and I think it has to do with the fact that parameters are always passed in registers, and error returns are quite straightforward (negative of the error number) as opposed to the C interface convention of returning only -1 and putting the error number in errno.
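The two error conventions mentioned above can be bridged in a few lines. This is a sketch only: `raw_to_libc` is a hypothetical helper name, and it relies on the Linux convention that a raw syscall reports failure as a value in the range -4095..-1 (the negated errno).

```c
#include <errno.h>

/* Hypothetical helper: turn a raw Linux syscall return value into the
 * C library convention of "return -1 and set errno". */
static long raw_to_libc(long ret)
{
    /* Only -4095..-1 are error codes. Other negative-looking values
     * (for example a high address returned by mmap) are real results. */
    if (ret < 0 && ret >= -4095) {
        errno = (int)-ret;
        return -1;
    }
    return ret;
}
```

For example, a raw `write` on a bad descriptor returns `-EBADF` directly, while after `raw_to_libc` the caller sees `-1` and has to consult `errno`.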


>>I'm pulling in some 3MB of library code that I mostly don't want.

Doesn't LTO fix that?


If you can get it to work... We've had lots of problems with LTO and arm-none-eabi-gcc.


00_start.c is too hacked on x86_64. it'll work but you're getting a less efficient binary since gcc has to assume _start is called like a normal C function (e.g. it creates a preamble). you should just implement it in assembly.

__init() itself also needs some work. the argument list is weird, linux pushes all of argc, argv, and environ on the stack. why special case argc? also your method of deriving argv and environ from the function argument's address is extremely brittle, and i don't think it actually works on x86_64 (if it does, that's really lucky). you aren't calculating envp using argc, so it's probably wrong. you could get more efficient code from using __attribute__((noreturn)). this would be better:

    /* called from _start */
    void __init(void *initial_stack) __attribute__((noreturn));    
    void __init(void *initial_stack) {
        int argc = *(int *) initial_stack;
        char **argv = ((char **) initial_stack) + 1;
        /* assert(!argv[argc]); */
        char **envp = __environ = argv + argc + 1;

        _exit(main(argc, argv, envp));
    }


What do you mean 00_start.c is too hacked on x86_64? As for efficiency of _start, `objdump -d lib/00_start.o` yields

```
0000000000000000 <_start>:
   0:   48 89 e5                mov    %rsp,%rbp
   3:   48 8b 3c 24             mov    (%rsp),%rdi
   7:   48 8b 74 24 08          mov    0x8(%rsp),%rsi
   c:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  10:   e8 00 00 00 00          callq  15 <_start+0x15>
  15:   c3                      retq
```

I only see one byte of inefficiency, the `retq`. Otherwise, it's exactly as I've specified it.

What I found (by using gdb) is that the stack contains, in order, `argc` at `[RSP+0]`, `argv` at `[RSP+8]`, and `envp` at `[RSP+16]`. I verified this using 'frame' in gdb using the source (RSP) and dest addresses. Honestly, I was surprised since it matched exactly what was presented to the ELF image on i386.

Most of the libc's I've surveyed did something like you've specified for __init. However, gcc generated different code for -O3 & -Os, often breaking one or the other optimization args, by modifying what was stored/pointed to for envp and/or *argv. While argc, argv, envp, and envpc are specified the


good point, the gcc optimizer is smart enough to omit the preamble. it's still hacked though. inline assembly is one thing, messing with compiler-owned registers is another, especially without a clobber list. btw why do you prefix your x86_64 start code with "mov %rsp,%rbp"? "xor %ebp, %ebp" is more idiomatic and efficient.

according to the abi (http://www.x86-64.org/documentation/abi.pdf), the stack frame is set up like this:

    argc = [RSP+0]
    argv = RSP+8
    envp = RSP+8+8*argc+8

the same for x86 but replace 8 with 4. your code mirrors this but retrieves argv by taking the address of the second function argument (because you pass [RSP+8] to __init(), which is actually argv[0]). C provides no guarantees on the stability of addresses of passed argument values between caller and callee, so this makes your code subject to non-standard behavior. i can see this working for x86 since values are passed through the stack, but not for x86_64 where values are passed through registers.

if you're seeing bugs between gcc -O3 and -Os it's likely due to that, or it's due to improper use of inline assembly/clobbering registers.
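The layout described above can be written as plain pointer arithmetic on the initial stack pointer. The following is a sketch under stated assumptions: `struct start_args` and `parse_initial_stack` are hypothetical names, and the initial stack is treated as an array of pointer-sized slots whose first slot holds argc.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical container for the values recovered from the stack. */
struct start_args {
    long   argc;
    char **argv;
    char **envp;
};

/* sp points at the slot holding argc, as at process entry. */
static struct start_args parse_initial_stack(char **sp)
{
    struct start_args a;

    a.argc = (long)(intptr_t)sp[0];   /* [SP+0]: argument count         */
    a.argv = sp + 1;                  /* SP + one slot: argv[0]         */
    a.envp = a.argv + a.argc + 1;     /* skip argv[] and its NULL entry */

    assert(a.argv[a.argc] == NULL);   /* argv is NULL-terminated        */
    return a;
}
```

Because the arithmetic is in slots rather than bytes, the same source covers i386 (4-byte slots) and x86_64 (8-byte slots).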


Point taken. I'll fix up the clobbers and noreturns.

And you're right about me having issues with calculating envp. There are multiple examples of envp=argv+argc+1. But when I examined the stack frames, that led to miscalculations, and with -Os envp was getting clobbered. Try running t/test.exe; what I have now works.


it only works because you're skipping over environment variables:

    char **argv = &stack;
    char **envp = __environ = argv + ( 1 * sizeof( void* ) );

let's say argv = 0x8, then according to this code, on x86_64, envp = 0x48 (0x8 + sizeof(void *) * sizeof(char *)). here's a sample stack where argc = 1:

    stack:
    [0x0]  = 1 (argc)
    [0x8]  = "program path"
    [0x10] = 0
    [0x18] = "FOO1=FOO1"
    [0x20] = "FOO2=FOO2"
    [0x28] = "FOO3=FOO3"
    [0x30] = "FOO4=FOO4"
    [0x38] = "FOO5=FOO5"
    [0x40] = "FOO6=FOO6"
    [0x48] = "FOO7=FOO7"
    ...

since you set envp to 0x48 it now points to FOO7=FOO7, you've inadvertently skipped FOO1-FOO6. if argc = 2, then envp would point to FOO6 and you skipped FOO1-FOO5.

try this with your code, pass 8 arguments to your test. the environment will point to the last element in argv and then terminate, completely missing the actual environment. again, that's only the behavior on x86, on x86_64, passing any argument will cause a segfault.
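The difference between the two calculations is easy to see side by side. In this sketch both function names are hypothetical; the first reproduces the expression quoted above, the second follows the ABI layout.

```c
#include <stddef.h>

/* The quoted expression: always advances sizeof(void *) slots past argv
 * (8 slots on x86_64, 4 on i386), regardless of argc. */
static char **envp_fixed_offset(char **argv)
{
    return argv + (1 * sizeof(void *));
}

/* The ABI-derived expression: skip argc pointers plus the NULL terminator. */
static char **envp_from_argc(char **argv, long argc)
{
    return argv + argc + 1;
}
```

With argc = 1 on x86_64, `envp_from_argc` lands 2 slots past argv (the first environment string), while `envp_fixed_offset` lands 8 slots past it, which is the 0x48 address in the sample stack.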


I think I've fixed it [0]. Ran into the issue where gcc was kindly re-aligning the stack to 16-bytes on i386 for no good reason (stack was already aligned?!?!). But that's fixed. Just hope gcc on i386 doesn't stop doing `sub ESP, 0x1C` upon entry to _start.

[0] https://github.com/lpsantil/rt0/blob/master/lib/00_start.c


Nice! As part of some work that I was doing ages ago, I had to build myself a custom libc to statically link executables that would run on Android and WebOS since they are both essentially ARM Linux under the hood.

You can learn a lot by writing yourself a libc. Even building a simple/stupid malloc from scratch is a learning exercise.


I was shocked how complicated the glibc malloc is. Here's the source for anyone interested: http://repo.or.cz/w/glibc.git/blob/HEAD:/malloc/malloc.c


It's relatively easy to make an allocator, it is really hard to make an efficient one that works well across a large variety of use cases. That's also why it can pay off big time if you know more about your use case than malloc does (and you usually do) to roll your own allocator.
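As an illustration of an allocator tuned to a known use case, here is a sketch of a bump (arena) allocator, which fits workloads where everything is freed at once. The `arena_*` names are hypothetical and the design is only one of many possible choices.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bump allocator: hands out memory from one fixed buffer
 * and releases it all at once. */
struct arena {
    unsigned char *base;  /* start of the backing buffer */
    size_t cap;           /* size of the backing buffer  */
    size_t used;          /* bytes handed out so far     */
};

static void arena_init(struct arena *a, void *buf, size_t cap)
{
    a->base = buf;
    a->cap = cap;
    a->used = 0;
}

/* align must be a power of two. Returns NULL when the arena is full. */
static void *arena_alloc(struct arena *a, size_t size, size_t align)
{
    uintptr_t cur = (uintptr_t)a->base + a->used;
    size_t pad = (size_t)(-cur & (align - 1));   /* bytes to next boundary */

    if (pad > a->cap - a->used || size > a->cap - a->used - pad)
        return NULL;

    void *p = a->base + a->used + pad;
    a->used += pad + size;
    return p;
}

/* Frees every allocation in one step. */
static void arena_reset(struct arena *a)
{
    a->used = 0;
}
```

Allocation is a bounds check and an addition, with no per-block bookkeeping. The trade-off is that individual blocks cannot be freed.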


Memory management is complicated (at least, doing it well is). It's a good thing to discover as well, because it should lead to an appreciation that malloc() isn't cheap, and there can be an awful lot of work behind every single call.

Besides, that code seems pretty well commented.


It's complicated, but there's quite a bit of wisdom embodied in it. It's well worth reading the code and comments carefully, if you're at all interested in memory management.


malloc is a generic allocator, it needs to worry about a lot of things that are not common. A simple allocator will work most of the time, but the devil is in the details.


Have a look at the libc we wrote for Atari ST TOS/MiNT. Nice and small, but still mostly POSIX compliant. Nearly 30 years ago now.


That is very cool. That source public anywhere?


Not right now. I just looked at it again, but it's just way too ugly to open-source without some serious work. It was cobbled together from bionic's headers with a re-implementation of enough of the standard library to get TomCrypt and file I/O calls up and running.

Happy to share it privately (email in profile).


My eventual goal is to rewrite asmutils'[0] httpd [1] in C using librt0 and get a binary about 2-3x in size (2-3K). Malloc unnecessary.

[0] http://asm.sourceforge.net/asmutils.html

[1] https://github.com/leto/asmutils/blob/master/src/httpd.asm


Nice. I went down this rabbit hole a while ago. Didn't get very far though. Have fun!

(my crappy code - https://bitbucket.org/gcmurphy/libc/)


Somewhat related: a minimal OS that just prints "Hello world" to the screen https://github.com/olalonde/minios (interesting code is in kmain.c and loader.s). Wrote it while going through http://littleosbook.github.io/ (which is great by the way if you are interested in learning a bit about OS development).


I am not familiar with embedded asm. Can someone explain what the following line does?

"register long r10 __asm__( "r10" ) = a3"


It declares a variable named r10 and instructs the compiler to store it in the r10 CPU register. It's a GCC extension; the farthest you can get in standards-compliant C is

    register long r10 = a3;

but the register keyword is advisory only (the compiler is free to ignore it) and you cannot specify the exact register you want to be used.

Reference: https://gcc.gnu.org/onlinedocs/gcc/Local-Reg-Vars.html
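To show why such a declaration is useful, here is a sketch of a four-argument raw syscall for x86_64 Linux with GCC or Clang. The name `raw_syscall4` is hypothetical and this is not the code from the repository. The kernel takes the fourth argument in r10, and GCC's inline asm has no constraint letter for that register, so a local register variable is the usual way to place a value there.

```c
/* Hypothetical helper, x86_64 Linux only (GCC/Clang extensions). */
static long raw_syscall4(long n, long a1, long a2, long a3, long a4)
{
    long ret;
    /* Pin a4 to r10, where the kernel expects the fourth argument. */
    register long r10 __asm__("r10") = a4;

    __asm__ volatile ("syscall"
                      : "=a"(ret)                 /* result comes back in rax */
                      : "a"(n),                   /* syscall number in rax    */
                        "D"(a1), "S"(a2), "d"(a3),/* rdi, rsi, rdx            */
                        "r"(r10)                  /* r10 via the register var */
                      : "rcx", "r11", "memory");  /* clobbered by `syscall`   */
    return ret;
}
```

The instruction itself overwrites rcx and r11, which is why they appear in the clobber list and why the fourth argument cannot travel in rcx as it does in the ordinary function-call convention.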


What does "SLOC" stand for in this context?


Source lines of code


Why not just "LOC"---Lines of Code? To exclude Makefiles and the like?


Definitions aren't all that solid, but it's nice to exclude whitespace and comment lines sometimes.


Yes and it also usually excludes things like macros & define'd constants.


Why just not do

gcc -static -Os -fdata-sections -ffunction-sections -Wl,--gc-sections ...

or similar, in the compiler of choice?


I'll admit my ignorance down at this level. Can someone explain what this does and how it can be used?


libc is C's "standard" library. It has a lot of stuff in it that some programs, especially small single purpose ones, don't need. So when a very simple program is linked into an executable, it has a bunch of extra stuff brought along for the ride. If you write helloworld.c (the canonical 4 line program in C) and link it on Ubuntu 14.04LTS it is 6240 bytes stripped, 8511 bytes unstripped. With this version of libc you can make it 1/5th that size.

Not a huge deal on large systems with giant disks and memory but a lot of ARM linux users are rediscovering the joys of small binaries, especially on limited space eMMC storage for the kernel and all the programs and libraries.


I'd be careful calling what I've done a libc. It's really just a little bit of startup code, and syscall for i386 & x86_64.


It's pretty awesome regardless, nice job.


Thank you.


Ancient history question: What was the main problem the creators of the ELF were trying to address at the time the ELF was adopted?


There's a brief explanation here:

http://en.wikipedia.org/wiki/COFF#History

(ELF replaced COFF)


Shared libraries?

Forwarding to the present day, with ample inexpensive memory and secondary storage space available, what if for some strange reason I do not use shared libraries in my project? Is there still a need for ELF? Why or why not?


I'm not following your intent. ELF is the current modern standard file format for specifying executables. It's just how you store executable machine code in a way that Linux/Unix systems know how to load the program into RAM and begin executing it.

As such, generally speaking, yes, ELF is needed for all machine code executables regardless of minor details like shared libraries.
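As a small illustration of ELF being "just a file format", the sketch below checks whether a file begins with the ELF identification bytes. `is_elf` is a hypothetical helper; it uses stdio for brevity and the constants from the Linux `<elf.h>` header.

```c
#include <elf.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: returns 1 if the file starts with the ELF magic
 * ("\177ELF"), 0 if it does not, and -1 if it cannot be opened. */
static int is_elf(const char *path)
{
    unsigned char ident[EI_NIDENT];
    FILE *f = fopen(path, "rb");
    if (!f)
        return -1;

    size_t n = fread(ident, 1, sizeof ident, f);
    fclose(f);

    if (n != sizeof ident)
        return 0;   /* too short to hold an ELF identification block */

    return memcmp(ident, ELFMAG, SELFMAG) == 0;
}
```

On Linux, `is_elf("/proc/self/exe")` inspects the running program's own image. The remaining bytes of `e_ident` record the class (32-bit or 64-bit) and byte order that the loader reads next.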

Before ELF, COFF was used. Before COFF, Unix used multiple generations of so-called "a.out" format.

Except as an exercise in retro-computing, one can no longer use those older file formats because modern tools no longer support them.

If you are unhappy with the complexity of generating ELF, you might look into whatever Fabrice Bellard did with his remarkably tiny "tcc" C compiler.

Incidentally, in recent years more and more people are starting to ask "do we really need shared libraries in today's era of huge amounts of RAM?" -- whatever the answer, it's a reasonable question.


Your last line more or less captures my "intent".

The idea is that if someone can reconsider the need for shared libraries, then could she also reconsider the need for ELF?

But maybe there are other pertinent reasons that ELF exists, besides shared libraries?


I already answered that very directly:

> ELF is the current modern standard file format for specifying executables. It's just how you store executable machine code in a way that Linux/Unix systems know how to load the program into RAM and begin executing it.

There is no other way to implement executables. And if there were, it would look just like ELF, from twenty thousand feet.

ELF has nothing in particular to do with shared libraries, and shared libraries still exist if ELF does not, they just have some implementation difficulties in some cases.

It's like saying, what if you didn't need any networking code, would you still need ELF? Yes you would. What if you didn't need GUI code? Yes, you still need executables.

You seem to have gotten a misconception about ELF really fixed in your mind. It is the one and only way to store executables. Without it, you have no tools, no commands, no apps, no way to run any software, no nothing.

Shared libraries are a small side feature and don't have very much to do with ELF, despite the fact that ELF made it easier to implement them.

Also, I mentioned that people have questioned the need for shared libraries, but I fail to see any motivation whatsoever for why anyone might want to get rid of ELF.


    What is ELF?

    ELF is a binary format designed to support dynamic objects
    and shared libraries. On older COFF and ECOFF systems,
    dynamic support and shared libraries were not naturally
    supported by the underlying format, leading to dynamic
    implementations that were at times complex, quirky, and
    slow.

source: netbsd.org

However you said "ELF has nothing in particular to do with shared libraries..."

Perhaps the concept of a binary format has nothing in particular to do with support for shared libraries. Perhaps that is a design choice. And perhaps ELF is merely one approach to the design of a binary format.

If true, then this choice makes a lot of sense in a world where there is a compelling need to use shared libraries, such as the computing world 20 years ago. But I am not so sure the same holds true in a world where the need for shared libraries is "questionable".

You said "Shared libraries are a small side feature..."

If that is true (cites to articles appreciated), then I would be curious what are ELF's "main" features, and how these differ from the formats that preceded ELF (e.g., a.out). Also I would be curious how frequently ELF's "minor" features have historically been used in practice, as opposed to just adding complexity and taking up space.


For the uninitiated what does this program do and what is its importance?


Attempt at an explanation:

To get a C-program to run, even an empty "int main(){ return 0;}" needs some supplemental code to set things up the way your main() function expects.

This supplemental code is what the github repository provides. You can then write very small, statically linked, programs without pulling in most of the C-library and other conveniences, and you will only be able to use the "raw" interfaces your kernel provides.

E.g. you'll first have to ponder what the "write()" system call actually is, then use write(1,"Hello\n",6); instead of the wrappers and conveniences of the C library such as printf() (which of course additionally gives you formatting, buffered I/O, ...).
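A sketch of that write() call wrapped in a function follows. `say_hello` is a hypothetical name, and `write` here is still the thin libc wrapper from `<unistd.h>`; with a runtime like the one discussed it would resolve to a direct system call instead.

```c
#include <unistd.h>

/* Hypothetical helper: emit "Hello\n" on the given file descriptor
 * without stdio. Returns the number of bytes written, or -1 on error. */
static long say_hello(int fd)
{
    static const char msg[] = "Hello\n";

    /* sizeof msg counts the trailing NUL, which should not be written. */
    return (long)write(fd, msg, sizeof msg - 1);
}
```

There is no formatting and no buffering: each call is one trip into the kernel, and the length has to be supplied by hand.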



