> " In the above example, you may have noticed that a number &407eef was printed in place of a symbol. This is one of the tradeoffs that needed to be made by Cosmopolitan Libc's kprintf() function, which is too mission critical to be able to call into all the non-privileged code needed to open() + mmap() + etc. the concomitant ELF .com.dbg binary and load the symbol table. The simple solution for this is to have your main function call GetSymbolTable()"
You could just log the base address of the executable and the relative addresses of the functions instead. The conversion from a relative address to the function's name can take place later, as part of a post-processing step. Should further reduce runtime overhead.
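A minimal sketch of what that could look like on Linux (dladdr/addr2line are just one way to do it, and all the names here are made up for illustration):

#define _GNU_SOURCE
#include <dlfcn.h>   /* dladdr */
#include <stdio.h>

/* Log the module path and the callee's offset from the module's load base.
   The offset can be resolved offline later, e.g. with
   addr2line -f -e <module> <offset> for a PIE binary. */
static void log_call(void *fn) {
  Dl_info info;
  if (dladdr(fn, &info) && info.dli_fbase) {
    printf("CALL %s+%#tx\n", info.dli_fname,
           (char *)fn - (char *)info.dli_fbase);
  }
}

void work(void) { /* ... */ }

int main(void) {
  log_call((void *)work);
  return 0;
}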
That's a smart way to do it and it certainly would have made things easier. I'm a big fan of the UNIX philosophy of small simple programs. But please consider that this facility is also used for ShowCrashReports() which prints a backtrace upon a SIGSEGV, SIGBUS, etc. When a crash spontaneously happens, it's not always easy or possible to run it again piped into an addr2line-like tool. It makes life simpler to have reporting facilities baked into each binary, and a lot of effort went into making it lightweight. Right now life.com (i.e. exit(42)) in the default build mode is 68kb (although it's 12kb in MODE=tiny) and that includes --ftrace, --strace, support for six operating systems, and it embeds an operating system in each binary too, so it can run autonomously on bare metal.
Why do you need to run the crashing program again? In the case you bring up you already printed the encoded backtrace. You can just copy the contents printed onto your screen/log file and post-process that as long as you have also output the relevant runtime linking information used to make the encoding.
Is there a reason why you don't call GetSymbolTable() yourself in the runtime init or at least through an __attribute__((constructor)) if you want to keep it C?
It's mostly to avoid imposing the latency on startup unless it's necessary or the user requested it. But now that you mention it, I think it should be updated to call that automatically in cases where the --strace flag is passed. Thanks!
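(For illustration, the constructor version is only a few lines. GetSymbolTable() is the Cosmopolitan function quoted above; the environment-variable gate is purely hypothetical and just shows one way to avoid paying the startup latency unconditionally.)

#include <stdlib.h>

struct SymbolTable;                        /* opaque; defined by Cosmopolitan */
struct SymbolTable *GetSymbolTable(void);  /* declaration assumed for this sketch */

/* Runs before main(); eagerly loads the symbol table so later backtraces
   can print names instead of raw addresses. */
__attribute__((constructor)) static void preload_symbols(void) {
  if (getenv("PRELOAD_SYMBOLS")) {  /* hypothetical opt-in switch */
    GetSymbolTable();
  }
}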
I think Tracealyzer does some of this kind of thing for logging embedded control flow in a way that's ultra compact but requires significant post processing after the trace has been recovered.
In theory, I'd say me too, but in practice, I find that Justine's posts give me a stronger dopamine fix, what with the increased posting frequency and the exposed thought process, and a hefty dosage of tryptamine with the Lisp + Assembly psychedelia.
I mean, honestly, sometimes there's a post that comes up that you've been waiting for for years, right?
I think Justine goes like on an inward self search or something for personally echoing creative landmarks in the overall late eighties early nineties PC scene, the posts we're all eagerly waiting for, and when she finds one then just stops waiting :-)
Very inspiring overall, I'm eagerly waiting for the next one, but in the meantime let me enjoy this one.
Justine here. I'm glad to hear you've been enjoying it! A lot of what I do for fun in my spare time is read old code, since I want to understand the hopes and dreams of each generation, and then find some way to capture the essence of that dream using the advantages of modern tools. For example, https://justine.lol/sectorlisp2/ was a really nice blog post that recreates the idea of LISP as it existed around 1960, except much smaller and beautifully polished since we now have better tools than punch cards. I'm obviously much younger than the original LISP hackers, but I put a lot of effort into understanding and faithfully recreating their ideas. I even colorized the LISP 1.5 listing for my blog https://justine.lol/sectorlisp/ while I was reading it, since you really get to know people when you read their code. In any case, you can expect some more content from me in the upcoming weeks.
In 40 years people will read your code and find it just as legendary as that of the original lisp hackers. From time to time, I reread ape.S and it always brings tears to my eyes. (Too bad that github botches the art, it has to be experienced inside a real text editor.)
> Debuggers aren't very good at handling this situation. Because once the control flow jumps to the NULL page, you'd need to find a way to rewind execution history to figure out how it got there.
This is one place where reversible debuggers shine. Try rr[1].
In my experience debuggers handle this fine. Some archs also have a link register (jump and link), which may help find the way back. This test is from x86-64 Linux.
/*
gcc -g -Wall -o x x.c
gdb ./x
(gdb) r
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000555555554617 in foo () at x.c:6
#2 0x0000555555554628 in main () at x.c:10
(gdb) f 1
#1 0x0000555555554617 in foo () at x.c:6
6 bar();
(gdb) p bar
$1 = (void (*)(void)) 0x0
*/
#include <stddef.h>
void (*bar)(void) = NULL;
void foo() {
  bar();
}
int main() {
  foo();
}
Here's a nastier variant, where the stack itself also gets trashed:
#include <string.h>
void foo() {
  int array[1];
  memset(array, 0, 100); /* Oh no, trash the stack! */
}
int main() {
  foo();
}
This gets us:
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
But with a time travel debugger (I'm using UDB because - disclaimer - it's what I work on. `rr` would work just as well):
Program received signal SIGSEGV, Segmentation fault.
0x0000000000000000 in ?? ()
recording 10,617> backtrace
#0 0x0000000000000000 in ?? ()
#1 0x0000000000000000 in ?? ()
^ Because we returned to NULL we have segfaulted but the stack is also trashed due to the memset. We don't know how we got here.
recording 10,617> reverse-stepi
0x0000000000401146 6 }
99% 10,616> bt
#0 0x0000000000401146 in foo () at smash.c:6
#1 0x0000000000000000 in ?? ()
^ We've stepped back before the return, so we can now see how we got to NULL. Still incomplete stack because it's still trashed.
99% 10,616> reverse-step
5 memset(array, 0, 100); /* Oh no, trash the stack! */
99% 10,586> bt
#0 foo () at smash.c:5
#1 0x0000000000401155 in main () at smash.c:9
99% 10,586>
^ We've gone back before the stack smash happened, so now we get the full backtrace.
It also reminds me of techniques that I've seen for inserting coverage instrumentation.
Use nops (or other placeholders) that get filled in with "counter increment" code on instrumented runs, e.g. in tests; that way you don't have to modify the "production" code to measure test coverage.
Nice, have been using both 'ltrace' and 'strace' for many years on Linux.
If you are using Linux, take a look at the ltrace package, which is most likely included with your distro and will work with glibc, requiring no recompilation against another libc. The ltrace package has been around for many, many years.
When I see superpower utilities like this, my immediate question is usually "why can't/don't the mainstream options do this?".
Based on the opening of TFA, my assumption is that they could, but just haven't yet? Is this something we can hope to be added by more mainstream projects? Or are there technical or cultural blockers I can't see?
> "Based on the opening of TFA, my assumption is that they could, but just haven't yet? Is this something we can hope to be added by more mainstream projects? Or are there technical or cultural blockers I can't see?"
Instrumentation, tracing (for both logging and performance analysis), stack traces and runtime analysis are absolutely not new concepts. There are no "cultural blockers" - this stuff is already used everywhere, and has been for many decades now.
There are countless ways to implement those mechanisms. This blog post presents yet another one. There are numerous tools in this space already - across all programming languages, different kinds of hardware and operating systems, and IDEs.
One typo, which is a bit confusing until you look at the source code to confirm it's actually --ftrace: "The Cosmopolitan Libc _start() function starts by intercepting the --strace flag"
The TLDR summary here is that the compiler allows you to hook into ("instrument") function calls and run custom logic of your choosing on function enter/leave. From that point on, you could use that mechanism to log how your program executes ("tracing").
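To make that concrete, GCC and Clang expose a portable (if slower) flavor of the same idea via -finstrument-functions; this isn't necessarily what Cosmopolitan's --ftrace does under the hood, but it shows the shape of the enter/leave hooks:

/* Build: cc -g -finstrument-functions trace.c
   The compiler inserts calls to these two hooks around every function
   in this translation unit (except ones marked no_instrument_function). */
#include <stdio.h>

__attribute__((no_instrument_function))
void __cyg_profile_func_enter(void *fn, void *callsite) {
  fprintf(stderr, "enter %p (called from %p)\n", fn, callsite);
}

__attribute__((no_instrument_function))
void __cyg_profile_func_exit(void *fn, void *callsite) {
  fprintf(stderr, "leave %p (called from %p)\n", fn, callsite);
}

static int square(int x) { return x * x; }

int main(void) {
  printf("%d\n", square(7));
  return 0;
}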
Thanks, the overall logic is somewhat clear to me but I'd like to understand Justine's work a bit more in depth (like https://justine.lol/cosmopolitan/) but I don't know where to start. Just reading the code or the tutorial requires some background knowledge I don't think I have.
Justine here. We recently started a Discord chatroom https://discord.gg/WH25psU9 for Redbean / Cosmopolitan Libc / etc. You're invited to join us! You're free to ask myself and others for help on using / understanding Cosmopolitan Libc in this chatroom. You can also just come to hang out and meet the authors.
But in return, the overhead per invocation is immensely higher. Here [1] we see a 15 ns function call increase to nearly 1 us with tracing enabled. Given that dtrace's implementation appears to patch in a user-kernel trap, introspect based on the trap location, log, then return, this is very likely representative of the overhead of every probe.
Incurring a 1 us overhead on each function call is very steep if you are doing a function entry/exit trace, and it nearly totally smears the profiling information you could get. In contrast, efficient recompilation-based instrumentation should only incur maybe 100 ns down to maybe around 10 ns, depending on how aggressively you instrument and how much overhead you are willing to incur in the logging-disabled case. In aggregate, an efficient recompilation-based approach should only incur a whole-program overhead in the low double-digit percent range when enabled and at most a low single-digit percent, if even that, when disabled. As a corollary, if 1/10th the per-invocation overhead results in, say, an aggregate 30% overhead, then we can reasonably assume the full-overhead case is around 10x as much overhead, resulting in 300% aggregate overhead, or a program taking 4x as long to run. That is a qualitatively different amount of overhead.
For what it's worth, I believe Cosmopolitan Libc's --ftrace overhead averages out to 280ns per function call. That's the number I arrived at by building in MODE=opt, adding a counter to ftracer, running Python hello world with the trace piped to /dev/null, and then I divided the amount of time the process took to run by the number of times ftracer() was called. Part of what makes it fast is that it doesn't have to issue any system calls (aside from write() in the case where it needs to print). As for the overhead when ftracing isn't enabled, I believe there is zero overhead. The NOP instruction in the function prologue is nearly free. I recall reading reports where the instruction timings for these fat nops is like ~200 picoseconds.
Most of the overhead comes from the fact that it's using kprintf() to print the tracing info, since I'm happy to spend a few extra nanoseconds having more elegant code. So it could totally be improved further. Another thing is that right now it's only line buffered. So if it buffered between lines, it'd go faster.
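(For anyone curious what such a padded prologue looks like: GCC can emit one with -fpatchable-function-entry. This shows the general technique only; it isn't necessarily the exact flag or padding size Cosmopolitan uses.)

/* Build: cc -O2 -c -fpatchable-function-entry=9 nopdemo.c
   objdump -d nopdemo.o then shows each function beginning with NOP padding,
   e.g. something like:
     66 0f 1f 84 00 00 00 00 00   nopw 0x0(%rax,%rax,1)
   A tracer can later mprotect() that page and rewrite the padding into a
   call to its hook, so the disabled case costs only the (nearly free) NOP. */
int add(int a, int b) {
  return a + b;
}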
Probably, but all the top comments are saying "this is magic" or "I go to this blog to feel stupid" or whatever, whereas I just think it's a nice hack, so I thought I'd mention how I'd actually do it in production.
> the cost of calling an empty function can be as high as 14 cycles of overhead per function call
How do you figure? Call and ret should be just a couple of cycles each. And they run in parallel, so if you were waiting for a memory access or otherwise didn't have anything better to do, the overhead is even less.
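A quick (noisy, machine-dependent) way to sanity-check that number is to time an out-of-line call to an empty function in a loop; this is just a sketch of such a micro-benchmark:

/* Build: cc -O2 callcost.c
   Measures the average cost of call+ret plus loop overhead; results vary
   with CPU model, frequency scaling, and what else the core is doing. */
#include <stdio.h>
#include <time.h>

__attribute__((noinline)) static void empty(void) {
  __asm__ volatile("");  /* keep the optimizer from deleting the call */
}

int main(void) {
  enum { N = 100000000 };
  struct timespec t0, t1;
  clock_gettime(CLOCK_MONOTONIC, &t0);
  for (int i = 0; i < N; i++) empty();
  clock_gettime(CLOCK_MONOTONIC, &t1);
  double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
  printf("%.2f ns per call (loop overhead included)\n", ns / N);
  return 0;
}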
Author here. We have 364 separate test programs in the Cosmopolitan repository. However most of our testing comes from running the tests of huge existing projects that use libc, e.g. Python. Try running this on Linux:
git clone https://github.com/jart/cosmopolitan.git
cd cosmopolitan
make -j16 o//third_party/python
That command will build Python and all its dependencies from scratch within the hermetic monorepo in addition to running its unit tests. On my $1,000 Core i9-9900 PC this takes 31.078 seconds.
Justine here. Please contact your system administrator and let them know the restriction is in error. FAANG loves me since I was an employee of theirs for many years, and some workers from FAANG were even generous enough to sponsor me on GitHub today. Thanks guys! As for justine.lol, there isn't any user-submitted or untrusted content on this domain. What most likely happened is a virus scanner got unhappy with the Actually Portable Executable format, which is still very new.
One thing you can generally do to verify the authenticity of the binaries I publish, is go on VirusTotal and check to see if there's an upvote from "howishexeasier" since that's me. It's the closest thing to code signing that a multiplatform binary format allows, and honestly I think all platforms should use the service to check binaries.
That sounds like a problem for the FAANG, not for the author. (My old workplace used a third-party web filter that ended up blocking my own website. I sent a few requests over the years to unblock a few sites; as far as I recall, they all eventually got unblocked.)
I once saw tarsnap.com blocked for being a "file hosting service". Technically correct, but it was funny that a service with only a command-line interface could get blocked. Actually, it was not funny for me, because I needed the scrypt documentation hosted on the same domain...
This probably needs more specifics to be acutely actionable, for example further details could turn up in the author's inbox from an anonymous (eg protonmail) address. Given that at the author's level of specialization the world shrinks somewhat it shouldn't be too hard to verify the information and dismiss duplicate messages/red herrings, and then proactively alert other employees facing the same filtering about what's going on.
It's not hard to make an ftrace facility for C functions. I did it with [1] and [2] (using a concept I call "stackpools") for this comment since I've had it in mind for a while, and this post pushed me to do it.
What jart's does better than mine is that hers is not opt-in; you don't have to add special macros for it, whereas mine requires that. Hers can also instrument all functions; mine does not get external functions.
What mine does better is that it is not a security issue wrt W^X, and it also implements scope- and function-based RAII. For example, now that I have implemented it, I can change all of my direct calls to unlock mutexes at the end of a function into destructors, and change the lock calls to use the stackpools. After doing that, then any function that buys into the system I have, and that takes a lock, will release the lock by the end of the function without anything special, just like using RAII with locks in C++.
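(For readers who haven't seen the scope-based approach in plain C: GCC and Clang's cleanup attribute gives a similar effect. This generic sketch is not the stackpool implementation referenced above, just the same idea.)

#include <pthread.h>
#include <stdio.h>

/* Cleanup handler: receives a pointer to the annotated variable. */
static void unlock_on_exit(pthread_mutex_t **m) {
  pthread_mutex_unlock(*m);
}

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void critical_section(void) {
  pthread_mutex_lock(&lock);
  /* unlock_on_exit(&guard) runs on every exit path from this scope. */
  pthread_mutex_t *guard __attribute__((cleanup(unlock_on_exit))) = &lock;
  puts("holding the lock");
}  /* <- unlocked here, even on an early return */

int main(void) {
  critical_section();
  return 0;
}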
I can turn ftracing on and off at build time as well. [3]
Using the tests/gaml/gaml_fuzz program, which is still incomplete, you can see this. Assuming the build directory is `build/`, running:
cat ../build.gaml | tests/gaml/gaml_fuzz
with ftracing turned on will give you an ftrace that looks like:
I could make it prettier, and I probably will in subsequent commits.
In other words, it's not magic, and in my opinion, it's not really a good idea because of the security implications: while you could argue that it's only for development, it's still another route to gain access to a developer's machine or a CI machine.
Why is it a security risk? The point of the .privileged section is that it remains in the W^X state the whole time. OpenBSD permits doing code morphing that way, so it must be secure. Code morphing is pretty much essential to how things like virtual machines work too. Would you say that JIT is insecure?
Just because OpenBSD does it does not mean it's not a security issue. It just means that they have to. OpenBSD is not some impenetrable vault of an OS.
Yes, JIT is insecure. You best hope that you don't have some kind of vulnerability in your JIT or an attacker can do return-oriented programming by making their own gadgets, i.e., they don't have to find the gadget they want, they can just create it.
In fact, this exact problem with JITs is why I don't implement interpreters with JITs. Instead, I generate bytecode and then run that, because I can more easily sandbox it that way.
But that is ignoring the biggest elephant in the room. You're doing this in a libc. The libc is where attackers usually search for gadgets. It's not a good combination to have a libc, one of the highest-value targets, allow self-modifying code with an easy way to activate it.
I remember trying Cosmopolitan once. Ran into a bug right away. It couldn't handle spaces in a filename. I can only imagine what kinds of bugs it still has and continues to get as you add features. And what bugs of those will allow creation and exploitation of gadgets now that you have this?
We had a conversation on lobste.rs where I talked about why I don't think what you're doing is a good idea. You deleted everything you said. That's not a good look. The bug I ran into trying Cosmopolitan was not a good look. Not understanding the risks of what you are doing is not a good look.
All of that has made Cosmopolitan radioactive to me. I won't touch it.
Then show me the GitHub issue you filed that proves it. If you're talking about the Makefile configuration, I don't care.
> I can only imagine what kinds of bugs it still has and continues to get as you add features. [...] All of that has made Cosmopolitan radioactive to me. I won't touch it.
> Then show me the GitHub issue you filed that proves it. If you're talking about the Makefile configuration, I don't care.
You already fixed it [1], which was good. I left that experience with a neutral feeling; it was good that you fixed it, but it was not good that such a common thing was not handled. I decided to keep a watch on Cosmopolitan with a bit of happy anticipation.
(It was later seeing you on this site and lobste.rs that left a bad taste in my mouth.)
You're right, though, that a Makefile won't be able to handle it, and that that's not Cosmopolitan's problem, though I would suggest a different build system.
> Good. Please stop following me.
Oh, I don't follow you. But I check this site and lobste.rs regularly, and I see you post. Since I believe that Cosmopolitan is the wrong direction for the industry to go, I express that opinion. Nothing wrong with that, especially since you do the opposite by so zealously marketing Cosmopolitan. And it's easy to notice one of your submissions; your domain is instantly recognizable, as is your username.
Basically, Cosmopolitan may be radioactive to me, but that doesn't mean I won't express my opinion about it when I see a submission about it, and that does not mean I'm "following" you; it just means I'm an opportunist.
If expressing my opinion of your software makes you unhappy, that's not my problem, it is yours.