The Shortest Crashing C Program

AlexanderDhoore · on May 24, 2013

This reminds me of "A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux" [1]. The author tries to create the smallest possible ELF executable. You would think it'd be easy... :) Go read it. Very cool!

[1] http://www.muppetlabs.com/~breadbox/software/tiny/teensy.htm...

sublimit · on May 24, 2013

The way he just keeps pushing further and further pleases the hacker inside me. A recommended read for sure.

femto · on May 24, 2013

It depends on the definition. You can do better than this if you define a valid C program as anything that passes through the C compiler and generates an executable. Behold the zero-length program:

    $ touch a.c
    $ gcc -c a.c
    $ ld a.o
    ld: warning: cannot find entry symbol _start; defaulting to 0000000000400078
    $ ./a.out
    Segmentation fault

lgeek · on May 24, 2013

You can build it with a single command:

gcc -nostdlib ./empty.c -o ./empty

Edit: This one actually runs correctly:

    $ touch empty.c
    $ gcc -static -nostartfiles ./empty.c -e_exit -o ./empty
    $ ./empty && echo $?
    > 0

thristian · on May 24, 2013

That's a shame. At least at one point in history, that was the shortest-known quine:

http://www.ioccc.org/years.html#1994_smr

femto · on May 24, 2013

It still is, provided you follow the build procedure prescribed by the author of that quine (check the Makefile from the contest):

    $ rm -rf a
    $ cp a.c a
    $ chmod +x a
    $ ./a
    $

mikeash · on May 24, 2013

Even with the regular procedure, it's a quine if you ignore stderr and only look at stdout, which should definitely count.

emillon · on May 24, 2013

I find it interesting that zsh displays the error from exec, but bash does not:

    zsh: exec format error: ./a

unwind · on May 24, 2013

Why?

The file is marked as executable, so the shell very reasonably tries to execute it by calling some well-chosen member of the exec() family (http://linux.die.net/man/3/exec).

The exec() function then needs to open and parse the file according to the formats it supports, which of course fails since the file is empty.

Do you simply mean that you expected the shell to validate this, and not try to execute empty files?

kps · on May 24, 2013

Traditionally, if the kernel cannot execute the file, then it is treated as a shell (/bin/sh) script. (Somewhere along the line, #! got added to specify an interpreter other than the shell.) I read POSIX as requiring this <http://pubs.opengroup.org/onlinepubs/009695399/utilities/xcu..., so if zsh claims to be a POSIX compatible shell, that's probably a bug.

In Seventh Edition UNIX, /bin/true is an empty file; it is a shell script that succeeds at doing nothing.

Some later commercial UNIXes are noted to have /bin/true contain nothing but comments containing a copyright notice for that nothing.
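
For anyone who wants to see the traditional fallback spelled out, here is a minimal sketch in C of what a shell might do, assuming a POSIX system with /bin/sh. The function name run_with_sh_fallback and the 126/127 exit codes (the usual shell convention) are my own choices for illustration, not taken from any particular shell's source.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: run `path` in a child process. If the kernel rejects
 * the file with ENOEXEC (no recognised executable format), retry it as a
 * /bin/sh script. Returns the child's exit status, or -1 on error. */
int run_with_sh_fallback(const char *path) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        char *direct[] = { (char *)path, NULL };
        execv(path, direct);              /* only returns on failure */

        if (errno == ENOEXEC) {
            /* Traditional behaviour: treat the file as a shell script. */
            char *via_sh[] = { "sh", (char *)path, NULL };
            execv("/bin/sh", via_sh);
        }
        /* Conventional shell codes: 127 = not found, 126 = cannot execute. */
        _exit(errno == ENOENT ? 127 : 126);
    }

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Given an empty file with the execute bit set, the first execv should fail with ENOEXEC and sh then has nothing to do, which is the empty /bin/true trick. A shell that skips the ENOEXEC branch would print the exec error instead.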

nknighthb · on May 25, 2013

The particular version of POSIX you linked to (2004) actually forbids the behavior you describe if you read it strictly. [1] defines a text file as "A file that contains characters organized into one or more lines.".

This was altered for 2008[2] to "A file that contains characters organized into zero or more lines."

The 2008 version is actually broken, since it contradicts itself -- a file cannot "contain characters" on zero lines.

[1] http://pubs.opengroup.org/onlinepubs/009695399/basedefs/xbd_...

[2] http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/V1_...

logilogi · on May 25, 2013

> a file cannot "contain characters" on zero lines.

I disagree. To me this doesn't mean that a file "contains at least one character", but that files are containers and their contained values are characters. Like most containers in computer science, the set of contained values can be empty, but it's still meaningful to say that it's a container that "contains characters".

emillon · on May 24, 2013

> Do you simply mean that you expected the shell to validate this, and not try to execute empty files?

I understand what happens here and why there is an error message in zsh, but I'm surprised by the fact that bash does not signal the error (exec returns -1, after all).

Bash includes logic to parse ELF[1], so I guess that after exec fails it tries to parse the file and has a special case for empty files.

    [1]: http://utcc.utoronto.ca/~cks/space/blog/unix/BashSuperintelligentExec

yuvadam · on May 24, 2013

Does not compute. At least on OS X:

   $ ld a.o
   ld: warning: -macosx_version_min not specified, assuming 10.7
   Undefined symbols for architecture x86_64:
    "start", referenced from:
       implicit entry/start for main executable
   ld: symbol(s) not found for inferred architecture x86_64

lgeek · on May 24, 2013

Does it work if you specify an entry address? Something like this:

gcc -nostdlib ./empty.c -e0 -o ./empty

nwn · on June 3, 2013

How about:

    $ ld a.o -U _main -U start

jejones3141 · on May 25, 2013

Since ANSI/ISO C, a "translation unit" (whatever is left of a file after preprocessing) has to have at least one declaration; a zero length source file won't cut it.

llbit · on May 24, 2013

Originally I thought I'd skip mentioning compiling empty files because, unless you link separately, `gcc` will refuse to link the result. I updated the article with a reference to your comment.

lgeek · on May 24, 2013

Actually you don't have to link it separately if you don't link against stdlib. See my comment here: https://news.ycombinator.com/item?id=5762578

llbit · on May 24, 2013

Cool!

to3m · on May 24, 2013

The explanation is not quite correct - execution starts at &main rather than the address given by the value of main. On VC++, at least - well, on my PC anyway - the process halts because the data segment doesn't have the execute bit set. It isn't trying to run code at address 0.

(If execution of bytes in the data segment were possible, which I'm sure it used to be, then you'd still likely get a crash, but it's not guaranteed. (uint32_t)0 is a valid sequence of instructions - it's ADD BYTE PTR [EAX],AL - and so if EAX contained a valid value then it would execute without a problem. Then, if the following byte were 0xC3 (RET) then the program would execute. OK, so that's all rather unlikely, but you have to bear these things in mind. So I think 0xCC (INT 3) would be a better choice.)
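
To make the value-versus-address distinction concrete, here is a small sketch. The name entry_word is a hypothetical stand-in for the article's `main;` so that the snippet can live in an ordinary program.

```c
#include <stddef.h>

/* Hypothetical stand-in for `main;`: a file-scope int with no initializer. */
int entry_word;

/* Returns 1 when the stored value is 0 while the object's address is not
 * null. The startup code jumps to the address of the symbol, not to the
 * value stored there. */
int value_zero_address_nonnull(void) {
    int *where = &entry_word;
    return entry_word == 0 && where != NULL;
}
```

So the crash comes from jumping to wherever the linker placed the symbol, not from jumping to address 0.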

lgeek · on May 24, 2013

You're right about this. On my GNU/Linux machine, main is in the data segment. The process receives a segfault for trying to execute from a page marked as not executable.

    (gdb) print &main
    $1 = (int *) 0x600864 <main>
    (gdb) run
    Starting program: /tmp/s 
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000600864 in main ()

If we mark the data segment as executable (quite easy for ELF), we can see execution starting at &main, continuing until the end of the page, and then segfaulting when it tries to execute from an unmapped virtual address.

    (gdb) print &main
    $1 = (int *) 0x600864 <main>
    (gdb) run
    Starting program: /tmp/s 
    Program received signal SIGSEGV, Segmentation fault.
    0x0000000000601000 in ?? ()

If we change the source to main=0xC3; as per your suggestion and we mark the data segment as executable, the program exits correctly (but with an exit status we don't control).

    (gdb) x/i &main
    0x600860 <main>:	retq   
    (gdb) run
    Starting program: /tmp/s_ret 
    [Inferior 1 (process 10588) exited with code 0140]
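
Why main=0xC3; decodes as a single ret: on a little-endian machine such as x86, the least significant byte of the int is stored first, so the byte at &main is C3. A rough sketch for inspecting that (first_byte_in_memory is a made-up helper name):

```c
#include <string.h>

/* Hypothetical helper: return the byte stored at the lowest address of an
 * unsigned int. On little-endian targets this is the least significant
 * byte, so passing 0xC3 gives 0xC3 there; big-endian targets differ. */
unsigned char first_byte_in_memory(unsigned int value) {
    unsigned char byte;
    memcpy(&byte, &value, 1);   /* copy only the first byte in memory */
    return byte;
}
```

On a big-endian target the first byte would be 0 instead, so the same source would not start with a ret.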

anarion · on May 24, 2013

No, a ret instruction would probably segfault, depending on the content of the stack. To terminate a program you have to use the corresponding system call. On Linux:

    mov $1, %eax
    int $0x80

to3m · on May 24, 2013

That (or something like it) is true for the process as a whole, but not necessarily for main. It's usual to call main from a library-provided function, so it returns just like any other function. This removes the need to special-case main in any way, and provides a space for any system-specific startup and shutdown code.

If you've got VS2012, you can see this code in the file at something like "c:\Program Files (x86)\Microsoft Visual Studio 11.0\VC\crt\src\crt0.c" (it should be easy to find for other versions - it's been pretty much in that place, with that name, probably with those contents, since VC5 I think).

For glibc, see http://sourceware.org/git/?p=glibc.git;a=blob;f=csu/libc-sta....

My post was a bit x86/VC++-specific but the principles have been common to all the C environments I've used. I don't think I've ever used one that by default called your startup function directly, bypassing C runtime initialisation. (Though it's very easy to set this up with Visual Studio.)

lgeek · on May 24, 2013

to3m is correct. On my GNU/Linux machine main() is called from __libc_start_main(). A ret instruction in main() returns to __libc_start_main(), which in turn calls exit().
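
A toy model of that call chain, for illustration only. run_like_crt and sample_entry are invented names, and a real runtime does a lot more (and calls exit() rather than returning).

```c
/* Signature of the entry point the runtime expects to call. */
typedef int (*entry_fn)(int argc, char **argv);

/* Hypothetical entry point, standing in for a user's main(). */
int sample_entry(int argc, char **argv) {
    (void)argv;
    return argc + 41;
}

/* Toy model of the C runtime startup: call the entry point like any other
 * function and turn its return value into the process exit status. On POSIX
 * systems the parent only sees the low 8 bits, hence the mask. */
int run_like_crt(entry_fn entry, int argc, char **argv) {
    /* runtime initialisation would happen here */
    int status = entry(argc, argv);
    /* a real runtime would call exit(status) here instead of returning */
    return status & 0xff;
}
```

This is also why a bare ret in main "works": the exit status is simply whatever happened to be in the return register at that point.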

dysoco · on May 24, 2013

Seems to work really well: It even crashed the website.

llbit · on May 24, 2013

My shitty server was never meant to handle HN traffic.

kd0amg · on May 24, 2013

But now we can't marvel at how short it is.

tunnuz · on May 24, 2013

I was about to say the same thing :D

Jabbles · on May 24, 2013

Who says it will crash? Could run very nicely, printing a list of prime numbers, or write poetry, or anything else that undefined behaviour encompasses.

ghayes · on May 24, 2013

"global variables in C are initialized to zero implicitly"

NULL pointers will lead to a crash. It would be more interesting to have it as a random pointer, which could do just about anything.

subleq · on May 24, 2013

> NULL pointers will lead to a crash.

Not in C. Dereferencing a NULL pointer is undefined behavior, so any of the actions described by the parent would be correct.

gizmo686 · on May 25, 2013

Is 0 really the same thing as 'NULL' in the context of C? If you actually wanted a pointer to the beginning of memory, you would dereference 0, which has the well-defined meaning of getting whatever is at memory address 0. When the program attempts to get that, it is shut down by the system.

asveikau · on May 25, 2013

> Is 0 really the same thing as 'NULL' in the context of C?

Yes. I don't have chapter and verse handy but it is in the standard. The bit pattern of NULL is not required to be zero (so memset(&p, 0, sizeof(p)) is not guaranteed to yield null) but it must compare equal to 0 and assigning 0 must produce NULL.

[Edit: OK, in C99 this is covered in 6.3.2.3: Pointers. "An integer constant expression with the value 0, or such an expression cast to type void * , is called a null pointer constant." Then 7.17.3 says that NULL expands to a null pointer constant.]
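
A tiny sketch of what those paragraphs guarantee, with names of my own choosing. Note that it only compares pointers and never dereferences them, so it stays within defined behaviour.

```c
#include <stddef.h>

/* Returns 1 when a pointer assigned from the constant 0 is a null pointer:
 * it compares equal to NULL, equal to 0, and is false in a boolean context. */
int zero_constant_is_null(void) {
    int *from_zero = 0;       /* integer constant 0 converts to a null pointer */
    int *from_macro = NULL;   /* NULL expands to a null pointer constant */
    return from_zero == from_macro && from_zero == 0 && !from_zero;
}
```

This holds regardless of the bit pattern the implementation uses for null pointers, because the conversion happens at the language level.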

> If you actually wanted a pointer to the beginning of the memory, you would dereference 0,

Yeah, it's really easy to set up an environment where that happens. At one point I was experimenting with writing a small/toy kernel for x86 and I mapped the virtual address 0 to a valid page, and boom, dereferencing NULL did stuff. Not a great idea to set up the page tables that way for obvious reasons, but I'm going to guess that lots of hardware out there will let you do it...

In the old days of 16-bit x86, linear address 0 had the interrupt vector, so as I recall lots of DOS (maybe even Win9x) environments had dereferencing NULL do meaningful (surely confusing) things.

sltkr · on May 25, 2013

Is there any standard-compliant way to crash in C? A call to abort maybe?

Maybe the task should have been formulated as the shortest valid C program that invokes undefined behaviour instead. (Or maybe the task isn't very interesting either way.)
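
On the abort() idea: abort() is standard C and raises SIGABRT without invoking undefined behaviour. Here is a sketch that observes this from a parent process, assuming a POSIX system (fork and waitpid are not part of standard C, and child_dies_by_abort is an invented name).

```c
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that calls abort(), then report whether
 * the child was terminated by SIGABRT. Returns 1 if so, 0 if not, and -1 if
 * fork or waitpid failed. */
int child_dies_by_abort(void) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0)
        abort();              /* does not return; raises SIGABRT */

    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) && WTERMSIG(status) == SIGABRT;
}
```

Whether abnormal termination by SIGABRT counts as a "crash" is of course a matter of definition.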

alcuadrado · on May 24, 2013

It seems to be down, google cache: http://webcache.googleusercontent.com/search?q=cache:4FhUcns...

lucb1e · on May 24, 2013

Wordpress strikes again; "error establishing database connection"

rabino · on May 24, 2013

A bad hosting strikes again.

lucb1e · on May 24, 2013

I have bad hosting with good software. Ran flawlessly with 8-15ms generation times on #3 of the HN homepage for a couple of hours; only the network latency went up, to ~1.2 seconds at peak (got less than 1mbps upload here). The page also executes multiple database queries for each pageload, just like Wordpress. No caching needed for me, it's all about optimization.

mortehu · on May 25, 2013

Well, the mistake in this particular case is probably allowing more web application processes than database connections from those processes, which is an easy thing to get wrong.

sablezab · on May 24, 2013

What software do you use?

lucb1e · on May 25, 2013

Self-written, no framework used. It's a simple blog with quite custom requirements, so I figured why not just build a custom one. It runs on an Intel Atom with 1GB RAM (and there's more to run than a WAMP stack) and an 832kbps uplink.

As for software, I wrote it for PHP 5.3 (nowadays upgraded to 5.4 though) with MySQL and persistent database connections. The server is Windows 7 with apache 2.4.

bbanyc · on May 24, 2013

The first IOCCC winner declared main as a short[] of VAX machine code: http://www.ioccc.org/1984/mullender.c

You could probably do the same thing in x86 and it'd work on a modern compiler.

ssp · on May 24, 2013

It won't work on modern Linux with modern CPUs because the array will not be in an executable mapping.

deweerdt · on May 24, 2013

> Also, global variables in C are initialized to zero implicitly, so this is equivalent:

EDIT: this is wrong, see below.

That's wrong. 'static' variables are initialized to zero. Non-static variables are un-initialized, so they have a "random" value.

See:

    $ valgrind ./a.out
    ==5118== Memcheck, a memory error detector
    ==5118== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
    ==5118== Command: ./a.out
    ==5118==
    ==5118== Process terminating with default action of signal 11 (SIGSEGV)
    ==5118== Bad permissions for mapped region at address 0x600864
    ==5118== at 0x600864: ??? (in /home/def/a.out)
    ==5118== by 0x4E54A14: (below main) (in /usr/lib/libc-2.17.so)

to3m · on May 24, 2013

See my post: https://news.ycombinator.com/item?id=5762363

main will have a value of zero, and 0x600864 will presumably be &main (it's not the initial arbitrary value of main).

Auto variables are left uninitialized so that they don't have to be given a value when they're allocated. It's for efficiency, and it makes the compiler simpler to have this blanket rule rather than have it try to figure out the minimal initializations necessary (which probably isn't even possible). But this consideration doesn't apply to globals or statics, because the initialization can be done at compile time, or (sometimes, in C++) on program startup.

lgeek · on May 24, 2013

> the initialization can be done at compile time, or (sometimes, in C++) on program startup

With ELF binaries for C programs it's done at startup as well. The data segment is created as having memory size SIZEM and file size SIZEF. If SIZEF < SIZEM, memory from SIZEF to SIZEM is set to 0.
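
Roughly, in C, and purely as a simulation on plain buffers rather than real loader code (load_segment and its parameters are hypothetical names; it assumes filesz <= memsz):

```c
#include <stddef.h>
#include <string.h>

/* Simulated segment load: copy the bytes that exist in the file, then
 * zero-fill the rest of the in-memory segment. The zero-filled tail is
 * where uninitialized globals end up. Assumes filesz <= memsz. */
void load_segment(unsigned char *mem, size_t memsz,
                  const unsigned char *file, size_t filesz) {
    memcpy(mem, file, filesz);                /* initialized data from the file */
    memset(mem + filesz, 0, memsz - filesz);  /* the rest is set to 0 */
}
```

A global like `main;` has no bytes in the file at all; it only exists in the zero-filled part.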

noselasd · on May 24, 2013

Actually it is both. In C, variables with static storage duration are zero initialized.

Globals (variables at file scope) and variables with internal linkage (i.e. the static keyword) both have static storage duration.

deweerdt · on May 24, 2013

That's correct, my bad.

bnegreve · on May 24, 2013

But global variables are static.

deweerdt · on May 24, 2013

No, they're not. In fact, if the program used 'static main;' instead, it wouldn't even compile because the 'main' symbol wouldn't be visible by the linker.

anarion · on May 24, 2013

Yes they are! Global variables have static storage duration and are therefore default-initialized. Be careful with the word 'static', which does not always correspond to the keyword static, which has several meanings! When used with a global variable, the static keyword does not mean "static storage duration". It only means no external linkage.

to3m · on May 24, 2013

Both have so-called "static storage duration", which is what influences the initial value. See C99 standard, section 6.2.4 paragraph 4:

"An object whose identifier is declared with external or internal linkage, or with the storage-class specifier `static' has /static storage duration/. Its lifetime is the entire execution of the program and its stored value is initialized only once, prior to program startup."

The default initial value of objects with static storage duration is dealt with in 6.7.8 paragraph 10. Basically: pointers set to NULL, non-pointers have all bits reset, aggregates thus recursively.
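
A short sketch of those rules in action. All of the names here are made up for the example; none of the objects has an initializer.

```c
#include <stddef.h>

int global_counter;          /* external linkage, static storage duration */
static int private_counter;  /* internal linkage, static storage duration */
static int *global_ptr;      /* pointer: starts out as a null pointer */
static struct {
    int a;
    double b;
    char *c;
} record;                    /* aggregate: each member zeroed recursively */

/* Returns 1 if every object above, plus a block-scope static, starts out
 * zero or null as 6.7.8 paragraph 10 requires. */
int all_zero_at_startup(void) {
    static int local_static; /* block scope, still static storage duration */
    return global_counter == 0 && private_counter == 0 &&
           global_ptr == NULL && record.a == 0 && record.b == 0.0 &&
           record.c == NULL && local_static == 0;
}
```

An automatic (non-static local) variable is the one case that gets no such guarantee, which is why reading one uninitialized is a problem.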

bnegreve · on May 24, 2013

Ok, well I agree. I mean global variables are not created dynamically. There is room reserved for them in the data segment which is initialized to 0. Can you give me an example where a global variable isn't initialized to 0? Your valgrind example doesn't say much about the value in the main variable ..

Edit, @deweerdt: ok :)

deweerdt · on May 24, 2013

@bnegreve can't reply to your post, but i was mistaken. externally visible symbols are also initialized to 0

themattrix · on May 24, 2013

You can go even shorter if you cheat:

     $ cat short.c
     M
     $ gcc -DM='main;' short.c -o short
     $ ./short
     Segmentation fault

danielsamuels · on May 24, 2013

The Shortest Crashing Wordpress Site

_kushagra · on May 24, 2013

The site seems down, "Error establishing a database connection"

tshile · on May 24, 2013

Which has its own irony

BostX · on May 24, 2013

:) works as intended

jstanley · on May 24, 2013

I'm not convinced this is a C89 program. It is only an "accident" that the linker doesn't know about types.

I find it hard to believe that the C89 spec states that an integer called "main" is to be considered the main function, and suspect this is undefined behaviour (though I've not checked).

nitrogen · on May 24, 2013

It compiles and runs with gcc -std=c89 and gcc -std=c99, so even if it's not a true C89 program, it's a compilable GNUC89 program.

  $ gcc -std=c99 -pedantic /tmp/main.c -o /tmp/main
  /tmp/main.c:1:1: warning: data definition has no type
      or storage class [enabled by default]
  /tmp/main.c:1:1: warning: type defaults to ‘int’ in
      declaration of ‘main’ [enabled by default]
  /tmp/main.c:1:1: warning: ‘main’ is usually a function
      [-Wmain]
  
  $ /tmp/main
  Segmentation fault (core dumped)

kghose · on May 24, 2013

Not on mac: a.c -> main;

    gcc -nostdlib -std=c89 a.c -o a

    a.c:1: warning: data definition has no type or storage class
    Undefined symbols for architecture x86_64:
         "start", referenced from:
         -u command line option
    ld: symbol(s) not found for architecture x86_64
    collect2: ld returned 1 exit status

poizan42 · on May 24, 2013

It can't be a valid C89 program. On many Harvard architecture based microprocessors data pointers and code pointers have differing size.

mikeash · on May 24, 2013

Is any crashing C program valid? You can only crash by invoking undefined behavior, and I believe that any program which invokes undefined behavior is "invalid". It's a major pitfall of C that determining "validity" requires solving the halting problem.

dietrichepp · on May 24, 2013

Different size maybe, but different busses^H^H^H^H^H^Haddress spaces definitely, and using an address on the wrong bus is a sure way to cause problems.

poizan42 · on May 24, 2013

Of course, that is the definition of a Harvard architecture. It doesn't say that it won't link, just that it won't work. If compiling for an architecture where code pointers are smaller than data pointers, the linker will most likely refuse to link it at all - otherwise it would have to truncate the addresses.

kmm · on May 24, 2013

^W will delete the previous word.

marshray · on May 24, 2013

How about:

    main(){*(int*)0=0;}

or:

    main(){*""=0;}

or:

    main(){main();}

pdw · on May 24, 2013

The last one is just as likely an infinite loop as a crash. Even C compilers occasionally manage to do tail-call optimization these days.

mikeash · on May 24, 2013

All substantially longer than the example given.

marshray · on May 24, 2013

The site was down when I made my suggestions. Still, I think mine may be a bit more language compliant than the shortest variants in the article.

rwmj · on May 24, 2013

Edit: deleted because I got to the Google cache and that's what the site was suggesting.

marshray · on May 24, 2013

That's essentially where he's going with it, noting that you can even leave off the "=0". But as others here point out there's some question as to how many linkers will actually produce an executable image from that source.

joeyh · on May 24, 2013

Seems appropriate that the default C program, as it were, segfaults.

stinos · on May 25, 2013

"address 0, which is not an address that we have access to"

If I'm not mistaken, there are platforms like TI C600 DSPs for which 0 is the start of the usable address space.

sfvisser · on May 24, 2013

Interesting. We tried to do the same for Haskell. The shortest we could come up with:

import Unsafe.Coerce;main=unsafeCoerce()1

tmhedberg · on May 24, 2013

Why not:

    main=undefined

hesselink · on May 24, 2013

That throws an exception, I believe. We wanted something that actually segfaults.

rpearl · on May 24, 2013

That really depends on what you mean by "crashing"

webreac · on May 24, 2013

With visual studio V6.0 (AFAIR), I made a short program that crashed the compiler:

int a;::a::b();

sbanach · on May 24, 2013

It's also the shortest C program that you can link at all.

efermat · on May 24, 2013

    $ echo "m;" > short.c
    $ gcc -O0 -c short.c
    short.c:1: warning: data definition has no type or storage class
    $ ld -e _m -o short short.o
    ld: warning: -macosx_version_min not specified, assuming 10.7
    $ ./short
    [1] 2040 segmentation fault ./short

qznc · on May 24, 2013

"The shortest crashing C89 program" to be more precise. :)

Trufa · on May 24, 2013

An interesting question: what would be the shortest non-crashing C program?

main(){}

??

lgeek · on May 24, 2013

An empty file would do: https://news.ycombinator.com/item?id=5762578

hkmurakami · on May 24, 2013

reminds me of the recent fad of TAS (tool assisted speed run) videos of "fastest crash" of video games.

stefap2 · on May 24, 2013

Shortest crashing website: Error establishing a database connection

kghose · on May 24, 2013

for those who see the server crashing: The program is in C89

main;