
It is sometimes used to allow one binary to be the symlink target of hundreds of commands.

Android does this for most common shell commands. Toybox and busybox are examples of such implementations.

https://github.com/landley/toybox

https://en.m.wikipedia.org/wiki/BusyBox
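
For anyone who hasn't seen the trick, here is a minimal sketch of name-based dispatch in Python. It is not how toybox or busybox are implemented; the applet names and the `dispatch` helper are made up for illustration. The only idea carried over is "look at the basename of argv[0] and pick the command from that".

```python
import os
import sys

# Hypothetical applets; a real multi-call binary registers many more.
def applet_true(args):
    return 0

def applet_false(args):
    return 1

def applet_echo(args):
    print(" ".join(args))
    return 0

APPLETS = {"true": applet_true, "false": applet_false, "echo": applet_echo}

def dispatch(argv):
    """Pick an applet from the basename of argv[0], as a symlink farm would."""
    name = os.path.basename(argv[0])
    applet = APPLETS.get(name)
    if applet is None:
        print(f"{name}: applet not found", file=sys.stderr)
        return 127
    return applet(argv[1:])
```

A real entry point would end with something like `sys.exit(dispatch(sys.argv))`, and each command name would be a symlink (or hardlink) to the one file.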




I just learned that rustup/rustc/cargo etc. work like this too. I couldn't understand why the Gentoo formula was symlinking the same binary to a bunch of aliases.


On my system, these are hardlinks (regular files with a link count >1 and the same inode) rather than symlinks, though I'm not sure why.
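
If you want to check which kind you have, comparing inode numbers and link counts is enough. A small sketch, assuming a POSIX filesystem that supports both link types; the file names are placeholders, not the real rustup layout.

```python
import os
import tempfile

def describe_links(directory):
    """Create one file plus a hard link and a symlink to it, then report what lstat sees."""
    original = os.path.join(directory, "rustup")   # placeholder names
    hard = os.path.join(directory, "cargo")
    soft = os.path.join(directory, "rustc")
    with open(original, "w") as f:
        f.write("pretend binary\n")
    os.link(original, hard)      # second directory entry for the same inode
    os.symlink(original, soft)   # separate small file that stores a path

    st_orig = os.lstat(original)
    st_hard = os.lstat(hard)
    st_soft = os.lstat(soft)     # lstat does not follow the symlink
    return {
        "same_inode": st_orig.st_ino == st_hard.st_ino,
        "link_count": st_orig.st_nlink,
        "hard_is_symlink": os.path.islink(hard),
        "soft_is_symlink": os.path.islink(soft),
        "soft_has_own_inode": st_soft.st_ino != st_orig.st_ino,
    }

with tempfile.TemporaryDirectory() as tmp:
    info = describe_links(tmp)
```

The hard link is indistinguishable from the "original": both are just names for one inode, which is why the link count goes to 2.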


Maybe to avoid broken links if you move the original files? That's the main benefit of hardlinks vs symlinks in my mind at least.


That can also be a downside: you believe you have moved everything, but you can now end up with different versions of programs that don't expect that to be possible.
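
Both effects are easy to reproduce. A rough sketch with placeholder file names, assuming a POSIX filesystem: moving the original leaves the symlink dangling while the hard link keeps working, and dropping a new file at the old path gives you two versions side by side.

```python
import os
import tempfile

def replace_original(directory):
    original = os.path.join(directory, "tool")
    hard = os.path.join(directory, "tool-hard")
    soft = os.path.join(directory, "tool-soft")
    with open(original, "w") as f:
        f.write("v1\n")
    os.link(original, hard)
    os.symlink(original, soft)

    # Move the original out from under both links.
    os.rename(original, original + ".old")
    # os.path.exists follows symlinks, so a dangling symlink reports False.
    soft_dangling = not os.path.exists(soft)

    # "Upgrade": a new file appears at the old path.
    with open(original, "w") as f:
        f.write("v2\n")

    with open(hard) as f:
        hard_content = f.read()   # still the old inode
    with open(soft) as f:
        soft_content = f.read()   # resolves by path, so it sees the new file
    return soft_dangling, hard_content, soft_content

with tempfile.TemporaryDirectory() as tmp:
    soft_dangling, hard_content, soft_content = replace_original(tmp)
```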


If there is a symlink, a hardlink and an executable, all with the same name, which one will it run? Which one will the shell object to? Which one should the shell object to? If a virus/SUID program overwrites a symlink, no problem, but if it traces the symlink to the executable and then overwrites that...


And that makes a lot of sense, especially for binaries that are statically linked (as Rust binaries usually are), since that could save a lot of disk space!


clang does this too.


Also if you want a program to call itself, which is sometimes useful, this way lets you actually call the same program, rather than assuming the name and path.


Don't do this. If you (reliably) want the path to the current executable, there is no portable way to get it: on Linux you readlink /proc/self/exe, and on macOS you call _NSGetExecutablePath. I forget the API on Windows.
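
A minimal sketch of the Linux half, in Python only to keep it short. The helper name `current_executable` is made up, and the fallback to `sys.executable` is my assumption for platforms without /proc, not a general answer. Note that in a Python script the "current executable" is the interpreter, not the script.

```python
import os
import sys

def current_executable():
    """Best-effort absolute path of the running executable."""
    if sys.platform.startswith("linux"):
        try:
            # The kernel maintains this link; it does not depend on
            # argv[0], PATH or the current working directory.
            return os.readlink("/proc/self/exe")
        except OSError:
            pass  # for example, /proc is not mounted
    # Fallback (assumption): whatever the Python runtime itself reports.
    return os.path.abspath(sys.executable)

exe_path = current_executable()
```

One caveat: if the binary has been deleted or replaced while running, Linux appends " (deleted)" to the readlink result, so the string is not always a usable path.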


I would not put it in such absolute terms: /proc/self/exe has downsides as well. Because it resolves all symlinks, it breaks everything that depends on argv[0], such as nice help messages, Python's virtualenv, name-based dispatch, and detecting whether the program was executed via a symlink.

A lot of times you know you never called chdir(), in which case I'd actually recommend executing argv[0], as this is the nicest thing for admins. If you are really worried, you can use /proc/self/exe for progname and pass argv[0] as-is, but that's overkill a lot of times.


Those are all cases where you're using argv[0] as an argument to the program where it's appropriate. Using it as the path to spawn a child process is incorrect. You're free to re-use it as an argument.

I have fixed enough software that made this mistake that I'm confident to be absolute about it. It's a very easy mistake to make, but it's really annoying when software makes it and someone needs to deal with it at a higher level. It's better for developers to know that argv[0] isn't the path to the executable; it's what was used to invoke the executable.


What’s the issue with using argv[0] as a way to spawn yourself? I don’t recall running into a lot of issues.


If it's a relative path, then changing the working directory will break it (chdir("/") is a very common tactic at the top of main()).

It's possible/desirable for the parent to change the PATH of a child process, particularly one that spawns other processes. So the argv[0] used to spawn the original process may be garbage for spawning children.

Similarly in any kind of chroot jail (which may or may not be docker these days), relative paths and PATH can be garbage even if they don't change.

The real problem is that I've seen in-house and open source frameworks/libraries that have a function like `get_executable_path` that reads `argv[0]` and this is just incorrect behavior. Spawning yourself is one of the less risky things you can do, but there are gotchas and a way to avoid them!
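
The chdir case can be shown in a few lines. This is a sketch with a made-up file name standing in for a relative argv[0] such as `./mytool`; it only checks whether the path still resolves, it does not spawn anything.

```python
import os
import tempfile

def relative_path_survives_chdir(relative, new_cwd):
    """Return (valid_before, valid_after) for a relative path across a chdir."""
    saved = os.getcwd()
    try:
        before = os.path.exists(relative)
        os.chdir(new_cwd)            # what a daemon does at the top of main()
        after = os.path.exists(relative)
    finally:
        os.chdir(saved)
    return before, after

with tempfile.TemporaryDirectory() as tmp:
    saved_cwd = os.getcwd()
    os.chdir(tmp)
    try:
        # Stand-in for the program that was launched as "./argv0-demo-tool".
        with open("argv0-demo-tool", "w") as f:
            f.write("pretend binary\n")
        result = relative_path_survives_chdir("./argv0-demo-tool", "/")
    finally:
        os.chdir(saved_cwd)
```

The same string is fine before the chdir and useless after it, which is the failure a `get_executable_path` built on argv[0] inherits.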


Hmm... I generally have so many issues with chdir (e.g. someone gives you a relative path to a file you need to read and now that's screwed up because you did a previous chdir) that I just avoid all use of it in the first place.

Generally don't run into chroot all that often these days & docker gives you a fully virtualized environment where if a relative path is garbage then you may have other problems too (e.g. given relative paths to files). You certainly have to be careful around chroot / docker anyway as I think resolving /proc/self/exe probably is dangerous too for all the same reasons and you need to be careful to use the literal "/proc/self/exe" string for the spawn command and also require that /proc is mounted and remember to pass through argv[0] unmolested (or mutating as needed depending on use-case).

There are enough corner cases that I'd hesitate to give blanket advice, as it requires knowing your actual execution environment to a degree that there are lots of valid choices that aren't outright "wrong". And some software may be portable where argv[0] is a fine choice that works 90% of the time without worrying about maintaining a better solution on Linux.


It's very common for daemons/servers to chdir("/") at the top of main. Relative paths sent by clients getting broken is a feature, not a bug. (In fact I just fixed a bug related to this an hour ago because a relative path was not being canonicalized before being passed to the daemon I'm working on and it caused a file to be written to the wrong place).

There's no way to create a process such that /proc/self/exe is incorrect except if the process itself performs a chroot, or someone has overwritten what it points to. I'm talking about some other program running the process where those challenges don't show up.

> And some software may be portable where argv[0] is a fine choice that works 90% of the time without worrying about maintaining a better solution on Linux

Except it's broken on MacOS and Windows, too!

I'm pretty confident saying that if you want to get the path to an executable, use the bespoke method for your platform because it ain't argv[0]. I have seen that codepath break so many times that there should just be a standard library method for it (and there often is, depending), and I have written this function at several companies.

There are not any edge cases that I'm aware of, except for a few esoteric ones. But there are quite a few edge cases for using argv[0], they exist on all platforms, and it's very annoying for people that have to fix or work around it because a software author didn't understand what argv[0] was.


For the C programmer, adding a dependency is so difficult that he would rather roll his own 99% solution than use a library. It does protect him from supply chain attacks, I suppose.


> It's very common for daemons/servers to chdir("/") at the top of main. Relative paths sent by clients getting broken is a feature, not a bug.

I instead put that in the launch script / systemd policy. That way when I run the server locally for development weird shit doesn’t happen in my root.


Yes, that's why my post literally said, "A lot of times you know you never called chdir()..."

Sure, don't put this in the library, but there is nothing wrong with using it in the app where you know no one makes this call.


I think you forget the exec system call’s first argument is a path to an executable, followed by an array of arguments, where arg[0] lives.

I can’t find an issue with exec("/proc/self/exe", [program, …]).
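
Roughly what that call looks like with the two pieces kept apart. This is a Linux-only sketch; `respawn_spec` and `respawn` are names I made up. Python's `subprocess` takes the path to exec via `executable=` and uses the list as the child's argv, which mirrors the exec signature. In a Python script /proc/self/exe is the interpreter, so the shape fits a compiled program better; Python is used here only for brevity.

```python
import subprocess

def respawn_spec(argv, extra_args=()):
    """Build (path, child_argv) for re-executing the current program on Linux.

    The path is the literal "/proc/self/exe"; argv[0] is passed through untouched.
    """
    path = "/proc/self/exe"
    child_argv = [argv[0], *extra_args]
    return path, child_argv

def respawn(argv, extra_args=()):
    """Spawn a child of the same binary, keeping the original argv[0]."""
    path, child_argv = respawn_spec(argv, extra_args)
    # `executable` is what gets exec'd; the list becomes the child's argv.
    return subprocess.run(child_argv, executable=path)
```

So for a program started as `./bin/mytool --flag`, `respawn_spec(sys.argv, ["--child"])` gives `("/proc/self/exe", ["./bin/mytool", "--child"])`: the child still sees the name it was invoked under, but the kernel decides which file actually runs.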


Well, it could be, for example, that /proc is not mounted. A lot of software breaks because of this, while really there is no need for it to be so. Also, that approach only works on Linux; if you want to write portable software, what do you do?


I am mainly pointing out that arg[0] is still valid. Writing portable software is an entirely different topic.


Note though that both of these solutions are racy and so should not be done if "someone symlinking really fast and swapping the binaries" is in your threat model. Linux /proc/self is safe though, just not the result from readlink.


Well that's true, but also something that can't be addressed within a currently running process afaik.


There's also this very handy and tiny cross-platform library:

https://github.com/gpakosz/whereami


Four cardinal sins of programming: 1. Self-modifying code. (The word 'recalcitrant' comes to mind.) 2. Calling your own program to execute itself. 3. Interrupting the flow of control with a jump. 4. Non-graceful exit. 5. Renaming 'hack' as 'vi' or 'ps'.


There's no guarantee that the name and the path are still the same executable that is running, or that they even exist anymore.


In most of the variants of exec*() there are separate arguments for the thing to be executed and the *argv[] list. argv[0] being the executable is just a convention. In Perl, $ARGV[0] is the first positional parameter. In

    $ perl myscript.pl a b c

$ARGV[0] is "a".


I mean sure. All software is built on assumptions. Make sure the assumptions you’re making are appropriate in context.


Unless you are on Windows


You can actually rename an executable that is running, on Windows. That's a way to handle self updates: rename the executable, create its replacement, execute the new one to make it remove the old executable.


Beware TOCTOU problems when doing this.


You can do this without assuming the name by execing /proc/$PID/exe. Then you're not vulnerable to the argv[0] spoofing described in the article. (But of course since argv[0] does exist, you should set it properly and pass through your own argv[0] unchanged.)


That's not portable, though. OpenBSD, for example, doesn't have /proc.


That’s Linux only. Wouldn’t even work on macOS, which would likely be a significant number of your users.


coreutils-static did this too. The advantage of shared libraries and multiple-use single static binaries is they're only loaded once.


The article discusses this.



