Awesome post! `LD_PRELOAD` is a powerful tool for program instrumentation.
It's worth noting, though, that using `LD_PRELOAD` to intercept syscalls doesn't actually intercept the syscalls themselves -- it intercepts the (g)libc wrappers for those calls. As such, an `LD_PRELOAD`ed function for `open(3)` may actually end up wrapping `openat(2)`. This can produce annoying-to-debug situations where one function in the target program calls a wrapped libc function and another doesn't, leaving us to dig through `strace` for who used `exit(2)` vs. `exit_group(2)` or `fork(2)` vs. `clone(2)` vs. `vfork(2)`.
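To make that concrete, a minimal `LD_PRELOAD` interposer for the `open(3)` wrapper looks roughly like this (a sketch using the usual `dlsym(RTLD_NEXT, ...)` pattern; it only sees callers that go through libc's `open`, not `openat(2)` or raw syscalls):

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stdio.h>

/* Interposes the libc open() wrapper, not the open(2) syscall itself. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...) = NULL;
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    mode_t mode = 0;
    if (flags & O_CREAT) {        /* open() only takes a mode when O_CREAT is set */
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }

    fprintf(stderr, "open(\"%s\", 0x%x)\n", path, flags);
    return real_open(path, flags, mode);
}
```

Build with something like `gcc -shared -fPIC -o log_open.so log_open.c -ldl` and run the target under `LD_PRELOAD=./log_open.so`; whether any given call shows up is exactly the wrapper-vs-syscall lottery described above.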
Similarly, there are myriad cases where `LD_PRELOAD` won't work: statically linked binaries aren't affected, and any program that uses `syscall(3)` or the `asm` compiler intrinsic to make direct syscalls will happily do so without any indication at the loader level. If these are cases that matter to you (and they might not be!), check out this recent blog post I did on intercepting all system calls from within a kernel module[1].

[1]: https://blog.trailofbits.com/2019/01/17/how-to-write-a-rootk...
There is a very recent development in this area -- there is now a way to do it without ptrace and instead entirely using seccomp[1]. It's somewhat more complicated (then again, ptrace is far from simple to get right), but it gives you the benefit of not needing ptrace (which means that debuggers and strace will still work).
It's going to see a lot of use in container runtimes like LXC for faking mounts and kernel module loading (and in tools like remainroot for rootless containers), but it will likely also replace lots of uses of LD_PRELOAD.
There's another way to intercept syscalls without going as far as a kernel module: the ptrace debugging API. There's a pretty neat article about how to implement custom syscalls using ptrace: https://nullprogram.com/blog/2018/06/23/
Yup! I discuss the pros and cons of using `ptrace` within that post.
It's all about the use case: if being constrained to inferior processes and adding 2-3x overhead per syscall doesn't matter, then `ptrace` is an excellent option. OTOH, if you want to instrument all processes and want to keep instrumentation overhead to a bare minimum, you more or less have to go into the kernel.
I've been looking for a while for a way to capture all file opens and network ops to profile unknown production workloads, similar to Process Explorer on Windows, which I believe is implemented using ETW. Unfortunately strace seems to be out of the question purely because of the performance impact. Is the performance impact due to strace or ptrace itself?
It's ptrace itself: every traced syscall requires at least one (but usually 3-4) ptrace(2) calls, plus scattered wait(2)/waitpid(2) calls depending on the operation.
If you want to capture events like file opens and network traffic, I'd take a look at eBPF or the Linux Audit Framework.
You can use [1] from Intel to successfully intercept all syscalls made from a given library, by default libc only. This library actually works by disassembling libc and replacing all `syscall` instructions with a jump to a global intercept function that you can write yourself. Incidentally, it's also an LD_PRELOADed library.
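That sounds like syscall_intercept (pmem/syscall_intercept). From memory -- so treat the exact names as an assumption -- registering a hook looks roughly like this; returning 0 tells the library you handled the syscall yourself, non-zero forwards it to the kernel:

```c
#include <libsyscall_intercept_hook_point.h>
#include <sys/syscall.h>
#include <errno.h>

static int hook(long syscall_number,
                long arg0, long arg1, long arg2,
                long arg3, long arg4, long arg5,
                long *result)
{
    (void)arg0; (void)arg1; (void)arg2; (void)arg3; (void)arg4; (void)arg5;
    if (syscall_number == SYS_getdents64) {
        *result = -ENOTSUP;   /* hide directory listings from the program */
        return 0;             /* handled here, don't run the real syscall */
    }
    return 1;                 /* anything else: let the kernel handle it */
}

/* Runs when the LD_PRELOADed library is loaded. */
static __attribute__((constructor)) void init(void)
{
    intercept_hook_point = hook;
}
```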
At a previous job, we wanted binary reproducibility - that is to say, building the same source code again should result in the same binary. The problem was, a lot of programs embed the build or configuration date, and filesystems (e.g. squashfs) have timestamps too.
Rather than patch a million different packages and create problems, we put together an LD_PRELOAD which overrode the result of time(). Eventually we faked the build user and host too.
End result: near perfect reproducibility with no source changes.
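The core of such a shim is tiny -- a minimal sketch of the idea (the SOURCE_DATE_EPOCH variable name is just illustrative here):

```c
#define _GNU_SOURCE
#include <stdlib.h>
#include <time.h>

/* Sketch: pin time() to a fixed epoch so builds that embed the build
 * date come out bit-for-bit identical. */
time_t time(time_t *tloc)
{
    const char *fake = getenv("SOURCE_DATE_EPOCH");
    time_t t = fake ? (time_t)strtoll(fake, NULL, 10) : 0;
    if (tloc)
        *tloc = t;
    return t;
}
```

A real shim would also want gettimeofday()/clock_gettime(), plus things like gethostname() and getpwuid() for the faked build host and user.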
I've also used it for reasons similar to the GM Onstar example in the article -- adding an "interposer" library to log what's going on.
I've pulled similar stunts with pydbg on a Windows XP virtual machine -- sniffing the traffic between applications and driver DLLs (even going as far as sticking a logger on the ASPI DLLs). That and the manufacturer's debug info got me enough information to write a new Linux driver for a long-unsupported SCSI device which only ever had Win9x/XP drivers.
Well if I could figure out the protocol of the Polaroid Digital Palette (specifically the HR-6000 but the ProPalette and CI-5000S use the same SCSI protocol)...
Look for any debug data you can turn on in the driver and correlate that against whatever you see going to the scanner. Try to save timestamps if you can, then merge the two logs.
I was a little surprised that while Polaroid had stripped the DLL symbols, they'd left a "PrintInternalState()" debug function which completely gave away the majority of the DP_STATE structure fields.
After that, I reverse-engineered and reimplemented the DLL (it's a small DLL), swapped the ASPI side for Linux and wrote a tool that loaded a PNG file and spat the pixels at the reimplemented library.
And then someone sent me a copy of the Palette Developer's Kit...
(Incidentally I'd really love to get hold of a copy of the "GENTEST" calibration tool, which was apparently included on the Service disk and the ID-4000 ID Card System disks)
I shoot 135 film and some medium format, I have tried Super 8 and would love to start shooting 16mm film - but having a film recorder and actually using it, that's something?!
:-D What can you do, what would you do?
If I was filthy rich I'd project 35mm movies in my living room. :)
I'll share my story. I used to work at a popular Linux website hosting control panel company. Back in the early 2000's "frontpage extensions" were a thing that people used to upload their websites.
Unfortunately, frontpage extensions required files to exist in people's Linux home directories, and people would often mess them up or delete them. People would need their frontpage extension files "reset" to fix the problem. Fortunately, Microsoft provided a Linux binary to reset a user's frontpage extension files.
Unfortunately, it required root access to run. Also unfortunately, I discovered that a user could set up symlinks in their home directory to trick the binary into overwriting files like /etc/passwd.
We ended up actually releasing a code change that would override getuid with LD_PRELOAD so that the Microsoft binary would think it was running as root, just to prevent it from being a security hazard.
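For the curious, the shim for that is basically a one-liner -- a sketch of the idea, not their actual code:

```c
#include <sys/types.h>
#include <unistd.h>

/* Lie to the binary: report uid 0 so its "am I root?" check passes,
 * while the process actually runs as an unprivileged user. */
uid_t getuid(void)  { return 0; }
uid_t geteuid(void) { return 0; }
```

This is essentially the same trick fakeroot uses.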
It was very much in keeping with the Microsoft of the era. Not out of maliciousness. Just a general lack of interest in or knowledge of any non-Windows platform, but a recognition that if Frontpage was going to be as dominant as they wanted, they at least needed to vaguely support it.
Think the worst case of "Well it works on my machine"
Here's my friend's LD_PRELOAD hack, it pushes the idea further: hooking gettimeofday() to make a program think that time goes faster or slower. Useful for testing.
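Something in that spirit (a sketch, not the friend's code): scale elapsed time by a factor read from an env variable, so the program sees time pass faster or slower.

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdlib.h>
#include <sys/time.h>

/* Sketch: report start + TIME_FACTOR * elapsed instead of real time. */
int gettimeofday(struct timeval *tv, struct timezone *tz)
{
    static int (*real)(struct timeval *, struct timezone *) = NULL;
    static struct timeval start;
    static double factor = 1.0;

    if (!real) {
        real = (int (*)(struct timeval *, struct timezone *))
               dlsym(RTLD_NEXT, "gettimeofday");
        const char *f = getenv("TIME_FACTOR");
        if (f)
            factor = atof(f);
        real(&start, NULL);
    }

    struct timeval now;
    int rc = real(&now, tz);
    if (rc != 0 || !tv)
        return rc;

    double elapsed = (now.tv_sec - start.tv_sec)
                   + (now.tv_usec - start.tv_usec) / 1e6;
    double faked = elapsed * factor;

    tv->tv_sec  = start.tv_sec + (time_t)faked;
    tv->tv_usec = start.tv_usec
                + (suseconds_t)((faked - (double)(time_t)faked) * 1e6);
    if (tv->tv_usec >= 1000000) { tv->tv_sec++; tv->tv_usec -= 1000000; }
    return rc;
}
```

Programs that use clock_gettime() or time() need those wrapped too -- libfaketime does all of this properly.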
I've implemented some sort of "poor man's Docker" using LD_PRELOAD, back in 2011 when Docker wasn't a thing. It works by overriding getaddrinfo (IIRC) and capturing name lookups of "localhost", which are then answered with an IP address taken from an env variable.

The intended use is parallelizing the automated testing of a distributed system: by creating lots of loopback devices with individual IPs and assigning those to test processes (via the LD_PRELOAD hack), I could suddenly run as many instances of the software system next to each other as I wanted, on the same machine (the test machine is some beefy dual-socket server with lots of CPU cores and RAM).

Each instance consists of clients and several server processes, which are by default configured to bind to specific ports on localhost, as is common for dev and test setups. With the hack, each instance routes its traffic over its own loopback device, and I was spared from having to untangle the server ports of all the different services just to parallelize them on a single machine - and from the configuration hell that would have come with that.
It helped that processes by default inherit the env variables from their parents that spawned them - that made it a lot easier to propagate the preload path and the env variable containing the loopback IP to use. I just had to provide it to the top-most process, basically.
Today, one would use Docker for this exact purpose, putting each test run into its own container (or even multiple containers). But since the LD_PRELOAD hack worked so well, the project in which I implemented the above is still using it (although they're eyeing a switch to Docker, in part because it also makes it easier to separate non-IP-related resources such as files on the filesystem, but mostly because knowledge about Docker is more widespread than about such ancient tech as LD_PRELOAD and how to hack into name resolution of the OS).
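For reference, the getaddrinfo part of such a hack can be quite small. A rough sketch, assuming an env variable like TEST_LOOPBACK_IP carries the per-instance address (illustrative names, not the project's actual code):

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <netdb.h>
#include <stdlib.h>
#include <string.h>

/* Sketch: answer lookups of "localhost" with the loopback alias assigned
 * to this test instance (e.g. 127.0.0.2) instead of the real 127.0.0.1. */
int getaddrinfo(const char *node, const char *service,
                const struct addrinfo *hints, struct addrinfo **res)
{
    static int (*real)(const char *, const char *,
                       const struct addrinfo *, struct addrinfo **) = NULL;
    if (!real)
        real = (int (*)(const char *, const char *,
                        const struct addrinfo *, struct addrinfo **))
               dlsym(RTLD_NEXT, "getaddrinfo");

    const char *redirect = getenv("TEST_LOOPBACK_IP");
    if (redirect && node && strcmp(node, "localhost") == 0)
        node = redirect;   /* numeric addresses are parsed by the real call */

    return real(node, service, hints, res);
}
```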
Here's my LD_PRELOAD hack: rerouting /dev/random to /dev/urandom -- because I disagree with gpg's fears on entropy. Now generating a private key is as fast as with ssh-keygen or openssl.
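The idea, roughly (a sketch of that kind of interposer, not the actual hack):

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <fcntl.h>
#include <stdarg.h>
#include <string.h>

/* Sketch: send readers of /dev/random to /dev/urandom instead. */
int open(const char *path, int flags, ...)
{
    static int (*real_open)(const char *, int, ...) = NULL;
    if (!real_open)
        real_open = (int (*)(const char *, int, ...))dlsym(RTLD_NEXT, "open");

    if (path && strcmp(path, "/dev/random") == 0)
        path = "/dev/urandom";

    mode_t mode = 0;
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        mode = va_arg(ap, mode_t);
        va_end(ap);
    }
    return real_open(path, flags, mode);
}
```

In practice you'd likely need to cover open64() and openat() as well, depending on how the program was built.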
You can also just simply delete /dev/random and symlink it to urandom. Or delete it and create a character device at /dev/random that uses urandom's major/minor numbers.
My layman's understanding of the two is that /dev/urandom will happily output more bits than it has been seeded with, and so is unsuitable for use in cryptography, as it can output correlated values. Is my understanding here incorrect?
(edit: I see my parent post is being downvoted. How can this be? The commenter is just asking a question...)
It is incorrect. Both /dev/urandom and /dev/random are connected to a CSPRNG. Once a CSPRNG is initialized by SUFFICIENT unpredictable inputs, it's forever unpredictable for (practically) unlimited output (something like 2^128). If the CSPRNG algorithm is cryptographically secure, and the implementation doesn't leak its internal state, it would be safe to use it for almost all cryptographic purposes.
However, the original design in the Linux kernel was paranoid enough that it blocks /dev/random (even if a CSPRNG can output unlimited random bytes) if the kernel thinks the output has exceeded the estimated uncertainty from all the random events. Most cryptographers believe that if a broken CSPRNG is something you need to protect yourself from, you already have bigger problems, and that it's unnecessary from a cryptographic point of view to be paranoid about a properly-initialized CSPRNG. /dev/random found on other BSDs is (almost) equivalent to Linux's /dev/urandom.
However, /dev/urandom has its own issues on Linux. Unlike BSD's implementation, it doesn't block even if the CSPRNG is NOT initialized during early boot. If you automatically generate a key for, e.g., SSH at this point, you'll have serious trouble - predictable keys - so reading from /dev/random still has a point, although not for 90% of programs. I think it's a perfect example of being overly paranoid about unlikely dangers while overlooking straightforward problems that are likely to occur.
The current recommended practice is to call getrandom() system call (and arc4random()* on BSDs) when it's available, instead of reading from raw /dev/random or /dev/urandom. It blocks, until the CSPRNG is initialized, otherwise it always outputs something.
*and no, it's not RC4-based, but ChaCha20-based on new systems.
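For reference, the getrandom() path is about as simple as it gets (glibc >= 2.25 exposes the wrapper in <sys/random.h>):

```c
#include <stdio.h>
#include <sys/random.h>

int main(void)
{
    unsigned char key[32];

    /* Blocks only until the kernel CSPRNG has been seeded once,
     * then never again; flags = 0 draws from the urandom source. */
    if (getrandom(key, sizeof key, 0) != (ssize_t)sizeof key) {
        perror("getrandom");
        return 1;
    }

    for (size_t i = 0; i < sizeof key; i++)
        printf("%02x", key[i]);
    printf("\n");
    return 0;
}
```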
> /dev/random found on other BSDs is equivalent to Linux's /dev/urandom.
This isn't quite true. The BSDs random (and urandom) block until initially seeded, unlike Linux's urandom. Then they don't block. (Like the getrandom/getentropy behavior.)
> The current recommended practice is to call getrandom() system call (and arc4random() on BSDs) when it's available, instead of reading from raw /dev/random or /dev/urandom. It blocks when the CSPRNG is initialized, but otherwise it always outputs something.
+1 (I'd phrase that as "blocks until the CSPRNG is initialized," which for non-embedded systems will always be before userland programs can even run, and for embedded should not take long after system start either).
> it blocks /dev/random (even if a CSPRNG can output unlimited random bytes) if the kernel thinks the output has exceeded the estimated uncertainty from all the random events. Most cryptographers believe that if a broken CSPRNG is something you need to protect yourself from, you already have bigger problems,
Not just that, but if you have a threat model where you actually need information theoretic security (e.g. you're conjecturing a computationally unbounded attacker or at least a quantum computer)-- the /dev/random output is _still_ just a CSPRNG and simply rate limiting it doesn't actually make a strong guarantee about the information theoretic randomness of the output. To provide information theoretic security the function design would need to guarantee that at least some known fraction of the entropy going in actually made it to the output. Common CSPRNGs don't do this.
So you could debate whether information theoretic security is something someone ever actually needs -- but if you do need it, /dev/random doesn't give it to you regardless.
[And as you note, urandom doesn't block when not adequately seeded ... so the decision to make /dev/random block probably actually exposed a lot of parties to exploit and probably doesn't provide strong protection even against fantasy land attacks :(]
> simply rate limiting it doesn't actually make a strong guarantee about the information theoretic randomness of the output. To provide information theoretic security the function design would need to guarantee that at least some known fraction of the entropy going in actually made it to the output. Common CSPRNGs don't do this.
This is an interesting point I hadn't thought about before, so thanks for that. I suppose if you're generating a OTP or something like that, there might be some small advantage to using /dev/random, but the probability of it making a difference is pretty remote.
The one thing I haven't been able to figure out is why Linux hasn't "fixed" both /dev/random and /dev/urandom to block until they have sufficient entropy at boot and then never block again. That seems like obviously the optimal behavior.
Blocking could potentially result in the system getting stuck during boot and simply staying that way. Compatibility is a bear. The getentropy syscall does the reasonable thing.
It's important to note here that Linux's behaviour is broken, plain and simple: /dev/random blocks even if properly seeded, and /dev/urandom doesn't block even if improperly seeded.
The Real Solution™ is to make /dev/random and /dev/urandom the same thing, and make them both block until properly seeded. And replace the current ad-hoc CSPRNG with a decent one, e.g. Fortuna. There were patches almost 15 years ago implementing this (https://lwn.net/Articles/103653/), but they were rejected.
There's simply no good reason not to fix Linux's CSPRNG.
I think getrandom(2) is a fine choice, but if you are using the C library (as opposed to using asm directives to make syscalls), getentropy(3) is even better. No need to think about the third `flags` argument or read a long section about interruption by a signal handler.
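Concretely, that interface is about as small as it can be (also needs glibc >= 2.25, or a BSD libc):

```c
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned char key[32];

    /* No flags argument, no signal-handler fine print; the only limit
     * is that a single request must be <= 256 bytes. */
    if (getentropy(key, sizeof key) != 0) {
        perror("getentropy");
        return 1;
    }

    for (size_t i = 0; i < sizeof key; i++)
        printf("%02x", key[i]);
    printf("\n");
    return 0;
}
```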
Yes and no. Much of cryptography is based on pseudorandom number generators, which output more bits than they are seeded with. If these PRNGs are not secure, then almost any piece of cryptography you actually use would be insecure, independent of your choice to use random or urandom.
Unless all of your cryptography is information-theoretically secure, there is no problem using a PRNG.
If you happen to be using an information-theoretically secure algorithm, then you are theoretically weaker using a limited-entropy PRNG; but there are no practical implications of this.
The only information theoretically secure encryption algorithm is a one-time pad seeded with true randomness. In fact, you cannot achieve information theoretic security using a pseudorandom generator of any kind.
I am curious. How do you know if this is secure or not? Is there any publication or article available for this slightly time-saving but potentially dangerous choice?
The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in all use cases, with the exception of applications which require randomness during early boot time; for these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.
Not that I disagree with you, but which are the official man pages for /dev/urandom? It's my recollection that the advice therein varies from OS to OS.
This page is part of release 4.16 of the Linux man-pages project. A description of the project, information about reporting bugs, and the latest version of this page, can be found at https://www.kernel.org/doc/man-pages/.
> this slightly time-saving but potentially dangerous choice?
The one and only danger is during the machine's boot process, because while /dev/random and /dev/urandom use the same data:
* on linux /dev/random has a silly and unfounded entropy estimator and will block at arbitrary points (used to be a fad at some point, but cryptographers have sworn off it e.g. Yarrow had an entropy estimator but Fortuna dropped it)
* also on linux, /dev/urandom never blocks at all, which includes a cold start, which can be problematic as that's the one point where the device might not be seeded and return extremely poor data
In fact the second point is the sole difference between getrandom(2) and /dev/urandom.
If you're in a steady state scenario (not at the machine boot where the cold start entropy problem exists) "just use urandom" is the recommendation of pretty much everyone: tptacek, djb, etc…
> In fact the second point is the sole difference between getrandom(2) and /dev/urandom.
AFAIK, there's another important difference: getrandom(2) doesn't use a file descriptor (so it'll work even if you're out of file descriptors, or in other situations where having an open fd is inconvenient), and it doesn't need access to a /dev directory with the urandom device.
The catch is old programs that depend on the blocking behavior of /dev/random during early boot could be an issue. Unlikely to be a problem on a server, though...
It disables fsync, O_SYNC etc., making them no-ops, essentially making the program's writes unsafe. Very dangerous. But very useful when you're trying to bulk load data into a MySQL database, say as preparation of a new slave (followed by manual sync commands, and very careful checksumming of tables before trusting what happened).
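The shim for the fsync part is almost comically small -- essentially what libeatmydata does:

```c
#include <unistd.h>

/* Turn the durability calls into successful no-ops: great for one-off
 * bulk loads, catastrophic anywhere you actually care about the data. */
int fsync(int fd)     { (void)fd; return 0; }
int fdatasync(int fd) { (void)fd; return 0; }
void sync(void)       { }
```

Stripping O_SYNC/O_DSYNC additionally needs an open() wrapper that masks those flags out.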
I've been using LD_PRELOAD for fun and profit for a long, long time. Its simplicity is due to the simplicity of the C ABI. Its power is due to dynamic linking.
C is one programming language. C w/ ELF semantics and powerful link-editors and run-time linker-loaders is a rather different and much more powerful language.
I won't be sad to see Rust replace C, except for this: LD_PRELOAD is a fantastic code-injection tool for C that is so dependent on the C ABI being simple that I'm afraid we'll lose it completely.
You can easily write and call functions that abide the C ABI in Rust, but the set of types permitted in those signatures is much smaller (only #[repr(C)]-compatible types) than in ordinary Rust functions. The Rust ABI is more complicated and won't be stabilized anytime soon.
This is Jess’ personality - check out her Twitter account. I don’t mind it, but I’ve followed her for a while so I’m used to it. Honestly I find it to be a refreshing break from the typically stiff writing I see. She’s smart and doesn’t need to hide behind stodgy writing in order to make herself seem smarter.
I don't really mind it on twitter, as twitter is anything but serious, and you can't really have any coherent text there.
But such elements in a regular article simply harm its coherence and readability for anyone that does not spend much of their time in (rather noisy and immature, IMHO) communities which feature "meme image macros" heavily.
(Bonus negative points if some of the images are animated. That makes me think that the author actively hates the readers.)
Since I already knew what LD_PRELOAD did, I was rather amused that the author had a similar epiphany over it as I did many years ago, which made it a great read -- something to relate to.
I agree that the article is emotional, but the annoyance or not is so very subjective.
And the intro goes on for about 1/3 of the total article before we even know what it's about (yeah, I click on an article when I’m intrigued by the title, when it seems programming related).
I'll also join in and share my projects using LD_PRELOAD. These also work on macOS through its equivalent DYLD_INSERT_LIBRARIES.
https://github.com/d99kris/stackusage measures thread stack usage by intercepting calls to pthread_create and filling the thread stack with a dummy data pattern. It also registers a callback routine to be called upon thread termination.
https://github.com/d99kris/cpuusage can intercept calls to POSIX functions (incl. syscall wrappers) and provide profiling details on the time spent in each call.
It's a bit more unwieldy to use, because it doesn't just replace all matching symbols (it's not how symbol lookup works for DLLs in Win32) - the injected DLL has to be written specifically with Detours in mind, and has to explicitly override what it needs to override. But in the end, you can do all the same stuff with it.
I didn't phrase that unambiguously -- "injected DLL" in this case means "the DLL with new code that is injected", not "the DLL that the code is being injected into". With LD_PRELOAD, all you need to override a symbol is an .so that exports one with the same name. With Detours, you need to write additional code that actually registers the override as replacing such-and-such function from such-and-such DLL. But yes, the code you're overriding doesn't need to know about any of that.
Librespot uses LD_PRELOAD to find and patch the encryption/decryption functions used in Spotify's client so the protocol can be examined in Wireshark (and ultimately reverse engineered). I am not the original author; he wrote a macOS version using DYLD_INSERT_LIBRARIES to achieve something similar.
I once used LD_PRELOAD to utilize an OpenGL "shim" driver (for an automated test suite). The driver itself was generated automatically from the gl.h header file.
Since everyone is giving examples of LD_PRELOAD use — it has serious production use at scale in HPC, particularly for profiling and tracing. Runtimes such as MPI provide a layer designed for instrumentation to be interposed, typically with LD_PRELOAD (e.g. the standardized PMPI layer for MPI; see the sketch below). Another example is the entirely userspace parallel filesystem that OrangeFS (né PVFS2) provides via the "userint" layer interposing on Unix I/O routines. That sort of facility is a major reason for using dynamic linking, despite the overheads of dynamically loading libraries for parallel applications at scale. I'm not sure if a solution could be hooked in with LD_PRELOAD, but Spindle actually uses LD_AUDIT: https://computation.llnl.gov/projects/spindle
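For anyone who hasn't seen the MPI side: the PMPI convention means every MPI_Foo also exists as PMPI_Foo, so a profiling layer just defines MPI_Foo itself and forwards -- no dlsym needed. A trivial sketch:

```c
#include <mpi.h>
#include <stdio.h>

/* Profiling interposer: the MPI standard guarantees PMPI_Send is the same
 * routine as MPI_Send, so we override the latter and forward to the former. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    double t0 = MPI_Wtime();
    int rc = PMPI_Send(buf, count, datatype, dest, tag, comm);
    fprintf(stderr, "MPI_Send of %d elements to rank %d took %g s\n",
            count, dest, MPI_Wtime() - t0);
    return rc;
}
```

Built as a shared library, it can be dropped in with LD_PRELOAD without relinking the application.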
The authors have a small library that sets up some signal handlers for things like divide by zero and segmentation faults. They LD_PRELOAD this library when starting a buggy binary (they test things like Chromium and the GIMP), and when the program tries to divide by zero or read from a null pointer, their signal handlers step in and pretend that the operation resulted in a value of 0. The program can then carry on without crashing and usually does something meaningful. Tadaa, automatic runtime error repair!
My favorite: https://github.com/musec/libpreopen is a library for adapting existing applications that open() and whatnot from all over everywhere to the super strict capability based Capsicum sandbox on FreeBSD. I'm working on https://github.com/myfreeweb/capsicumizer which is a little wrapper for launching apps with preloaded access to a list of directories from an AppArmor-like "profile".
LD_PRELOAD is extremely helpful in troubleshooting libraries. Around 2007, qsort on RHEL was slower than on SUSE. I raised a case with Redhat along with a test case, but Redhat was not helpful, as it was not reproducible.
So, I copied glibc.so from a SUSE machine to that RHEL machine and ran the test case with LD_PRELOAD, compared with the RHEL glibc. I showed these results to Redhat. Eventually, a patch was applied to glibc on their side.
I personally just hate LD_PRELOAD because it's very difficult to turn it off and keep it off. I am glad others find uses for it and that's great, but I hate the privilege escalation attack surface it opens up. I get that it has uses, but there needs to be a simple way to disable it for hardened systems.
LD_PRELOAD only works for binaries that are dynamically linked (LD_PRELOAD is actually handled by the link loader, not the kernel[1]), and you can only use it to override dynamic symbols IIRC.
It definitely doesn't work with Go, and Rust might work but I'm not sure they use the glibc syscall wrappers.
I'm aware of that, I guess my point was that Rust probably doesn't use a lot of glibc (like most C programs would) so the utility of LD_PRELOAD is quite minimal.
I don't know enough about .rlib to know whether you could overwrite Rust library functions, but that's a different topic.
Right, but does that mean it's only used as a way of getting syscall numbers (without embedding it like Go does) or is it the case that you could actually LD_PRELOAD random things like nftw(3) and it would actually affect Rust programs? I'll be honest, I haven't tried it, but it was my impression that Rust only used glibc for syscall wrappers?
Musl supports dynamic linking. But it also supports static linking (which glibc doesn't really support because of NSS and similarly fun features) -- hence why Rust requires musl to statically link Rust binaries.
We use a sort of similar trick (not via LD_PRELOAD, though) to inject faults in M_NOWAIT malloc() calls in the FreeBSD kernel. FreeBSD kernel code tends to be a bit better than most userspace code I've seen as far as considering OOM conditions, though it is not perfect.
What security issues are those? Is there anything you can do with LD_PRELOAD that you cannot do in other ways such as modifying binaries before executing them?
As a regular ol' GNU/Linux user, you cannot modify binaries in /usr/bin (or /bin), but you can definitely influence their behavior by "LD_PRELOAD=blah /usr/bin/thing".
It depends on assumptions in the way a system is hardened. For example, a home directory mounted noexec. In theory, LD_PRELOAD will not mmap a file in a noexec area. But if you can find an installed library with functions that mirror some other application you have, and you can LD_PRELOAD that library before executing the target application, you might be able to force the library to call unexpected routines. (That's a stretch, granted)
Another would be possible RCE. Say you can get a server-side app to set environment variables, like via header injection. Then say you can upload a file. Can you make that server-side app set LD_PRELOAD to the file, and then wait for it to execute an arbitrary program?
I needed to calculate the potential output file size tar would produce, so what better way than using tar itself to calculate it? It just required hooking read, write, and close.
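Same interposer pattern as elsewhere in this thread -- a simplified sketch of the idea (only write() shown; it assumes tar is run with the archive going to stdout, e.g. `tar -cf - ... > /dev/null`):

```c
#define _GNU_SOURCE
#include <dlfcn.h>
#include <stdio.h>
#include <unistd.h>

static size_t total;   /* bytes tar has written to its archive fd */

ssize_t write(int fd, const void *buf, size_t count)
{
    static ssize_t (*real_write)(int, const void *, size_t) = NULL;
    if (!real_write)
        real_write = (ssize_t (*)(int, const void *, size_t))
                     dlsym(RTLD_NEXT, "write");

    if (fd == 1)                 /* fd 1 carries the archive in this setup */
        total += count;
    return real_write(fd, buf, count);
}

/* Report the would-be archive size when tar exits. */
static __attribute__((destructor)) void report(void)
{
    fprintf(stderr, "archive size: %zu bytes\n", total);
}
```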
Being able to override some library function such that running my text editor does $BADTHING isn't very interesting from a security perspective: if I have the capability to do that, I could also just run a program that does $BADTHING directly. Why bother with additional contortions to involve the text editor?
A malicious program without LD_PRELOAD can still copy the binary to another folder and quietly change the menu entry to point to the copy. Then modify the copy by binary patching to do whatever. Or run it via a modified qemu to do whatever. The main problem is the lack of a proper sandbox and that all programs in a user session generally have the same permissions.
If I were to provide LD_PRELOAD-based security cover (take any binary and secure it with LD_PRELOAD), would that be acceptable to corporates? Or does that increase the attack surface?
On FreeBSD, it's supported but kinda sucks. e.g., porting to a new CPU architecture is hell (I contributed to the FreeBSD/aarch64 go port, someone else picked it up now…)
The libc is the stable ABI on pretty much any OS that's not called Linux, just use it.
Ehh… does it kind of suck? The ABI of libc's syscall wrappers is basically "here's some ELF symbols to call with some arguments using the operating system's preferred calling convention". The only really "C" thing about it, other than the name, is struct layouts of various arguments.