Hacker News
Tearing apart printf() (2018) (maizure.org)
152 points by dreampeppers99 on Nov 12, 2019 | 24 comments



Excellent stuff. A few notes:

1. printf may malloc so don't use it in an out of memory situation. Though I think this problem is vanishingly unlikely these days.

2. Printing floats with printf used to require linking against the math library (-lm) on some platforms.

3. I am pathetically grateful for the nice diagnostics more recent C compilers give you when you use the wrong % conversion specifier. This used to be a rich source of errors and non-portability.


You can also use __attribute__((format(printf, ...))) for functions of yours that ultimately pass their varargs to a member of the printf family.

It is exceedingly rare for malloc() to give you a recovery chance on failure. Folks are so used to the Linux model of overcommit+OOM killer that almost nobody's code can gracefully deal with memory exhaustion.


> printf may malloc so don't use it in an out of memory situation. Though I think this problem is vanishingly unlikely these days.

Alternatively: don’t call printf from your interposed malloc; bad things will happen ;)


> 1. printf may malloc so don't use it in an out of memory situation. Though I think this problem is vanishingly unlikely these days.

I'd like to point out that FreeBSD's printf only does this if:

1. you use a lot of arguments (i.e. a compile-time decision, and quite rare)

2. you use wchar_t support (also a compile-time decision and quite rare)

3. the output is to something that buffers/mallocs (i.e. stdio)

4. locale support does it for some reason (I simply haven't checked this)

That makes it great to use in, e.g., crash handlers. It's actually async-signal-safe if you cut out the stdio bits. And it's available under a BSD license... and actually a nice read (more on that in a separate post).


>1. printf may malloc so don't use it in an out of memory situation. Though I think this problem is vanishingly unlikely these days.

I feel like you've just saved my future self from a terrible bug. Thanks!


Also, don't use printf in a signal handler for the same reason. Any attempt to malloc or free within a signal handler context runs the risk of a deadlock.


4. On GNU/Linux you can add your own %-conversions (e.g. %Y) with register_printf_function.


register_printf_function is deprecated; you should use register_printf_specifier instead.


I strongly advise against using either: both are non-portable, and both break the compiler's ability to warn you when you use the format string wrong.

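The Linux kernel-style suffix extensions (cf. https://www.kernel.org/doc/Documentation/printk-formats.txt) are much better on the latter front, since the compiler sees e.g. %pI4 as a plain %p followed by the literal text "I4" and still type-checks the pointer argument.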


> 1. printf may malloc so don't use it in an out of memory situation. Though I think this problem is vanishingly unlikely these days.

You can avoid this by doing something like:

  setvbuf(stdout, NULL, _IONBF, 0);


You sure about that? This seems to suggest otherwise: https://sourceware.org/git/?p=glibc.git;a=blob;f=stdio-commo...



The registers rdi, rsi and rdx are acknowledged in the article but they don't appear in the disassembly:

  000000000040f9c0 <__libc_write>:
    40f9c0:  83 3d c5 bb 2a 00 00   cmpl   $0x0,0x2abbc5(%rip)  # 6bb58c <__libc_multiple_threads>
    40f9c7:  75 14                  jne    40f9dd <__write_nocancel+0x14>

  000000000040f9c9 <__write_nocancel>:
    40f9c9: b8 01 00 00 00        mov    $0x1,%eax
    40f9ce: 0f 05                 syscall

This is because they are set by the caller before write is called: the System V AMD64 ABI specifies that rdi, rsi and rdx carry the 1st, 2nd and 3rd arguments of a function, which perfectly matches the Linux system call ABI (eax holds the syscall number, 1 for write). The 5th and 6th arguments also match: r8 and r9, respectively. However, the 4th argument doesn't: system calls use r10, while System V uses rcx. I wish I knew why.


The x86 `syscall` instruction stores the return address into `rcx` and `RFLAGS` into `r11`. Both registers are unconditionally clobbered, so neither can carry an argument across a `syscall` transition; that is why the kernel takes the 4th argument in `r10` instead of `rcx`.


Isn't that disassembly a replacement of the call to printf with a call to the write system call? Most compilers have the printf family as builtins to allow removing them from the code entirely: https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html


Compilers will rarely inline all of printf in my experience.


I wish there were a book-length guide with this kind of tutorial-style writing, but I haven’t found much. Everything I’ve found on Linux is more like a reference resource.


This is a good overview of how it all works. On some systems, like AmigaOS, there is no libc at all; standard input/output, for example, is handled by dos.library in ROM. A compiler-specific implementation is delivered as libc.a, which means that only static linking with libc is possible. UNIX® software cannot be compiled at all without modification and without the third-party ixemul.library, downloaded from Aminet and installed in the LIBS: assign (which usually resolves to SYS:LIBS, which in turn usually resolves to either DF0:LIBS or DH0:LIBS).


“Linux/GNU” is a new one. Not sure if it’s a typo or a troll, but if the latter, it’s a pretty decent troll if such a thing exists? If it’s a typo, geez dude. You’re going to give RMS an aneurysm.


> You’re going to give RMS an aneurysm.

He has much more serious problems nowadays...


Surely he has Emacs configured to rewrite all references to the one true name.


VIM?


As far as the formatting part is concerned, if you just want to look at an implementation, FreeBSD's is extremely nice and readable:

https://github.com/lattera/freebsd/blob/master/lib/libc/stdi...

For FRRouting, we decided we want to use Linux kernel style extensions (like "%pI4"), so to get this in a portable way we imported FreeBSD's printf into our code. Since printf() isn't exactly "hot" code getting changed a lot, we considered the maintenance / duplication cost acceptable.

You can see the result here:

https://github.com/FRRouting/frr/tree/master/lib/printf

http://docs.frrouting.org/projects/dev-guide/en/latest/loggi...

stdio support is completely gone in our copy, WCHAR_SUPPORT is disabled at compile time, and locale support is stubbed out/hardcoded to the C locale. None of these matter to us. As a result, we can use this printf even in the SEGV handler. (That's only a bonus, though; the main reason was extensibility of the format specifiers.)




