I think you mean the LGPL? It allows you to "convey a combined work under terms of your choice" as long as the LGPL-covered part can be modified, which can be achieved either via dynamic linking or by providing the proprietary code as bare object files to relink statically. The GPL doesn't have this exception.
First IANACC (I'm not a compiler programmer), but this is my understanding:
What do you mean by interface?
A dynamic library is handled very differently from a static one. A dynamic library is loaded into the process's virtual address space, and the loader keeps a tree of the loaded libraries and their dependencies there. (I would guess this program walks that tree, but there may be better ways I don't know of that this program uses.)
In the world of GNU/Linux a static library is more or less a collection of object files. The linker, to the best of my knowledge, will not treat the contents of static libraries any differently from your own code, and LTO can take place across them.
In the final ELF, the static library's code will be indistinguishable from your own.
My experience with the symbol table in ELF files is limited, and I do not know if it could help to unwrap static library dependencies. (A debug symbol table would of course help.)
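For what it's worth, the dynamic side of that bookkeeping is visible to programs: glibc exposes dl_iterate_phdr, which walks the loaded objects. A minimal sketch (just an illustration, not necessarily what the article's tool does):

    /* list_loaded.c -- cc list_loaded.c -o list_loaded
     * Prints every shared object currently mapped into this process.
     * Assumes glibc/Linux; statically linked code never appears here. */
    #define _GNU_SOURCE
    #include <link.h>
    #include <stdio.h>

    static int print_object(struct dl_phdr_info *info, size_t size, void *data)
    {
        (void)size; (void)data;
        /* The main executable reports an empty name. */
        printf("%s (base 0x%lx)\n",
               info->dlpi_name[0] ? info->dlpi_name : "[main executable]",
               (unsigned long)info->dlpi_addr);
        return 0; /* keep iterating */
    }

    int main(void)
    {
        dl_iterate_phdr(print_object, NULL);
        return 0;
    }

Statically linked code never shows up in that walk, which is exactly the "indistinguishable from your own code" point above.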
The tool is interesting, but it doesn't account for the fact that some shared libraries are only opened via dlopen on demand. So it might miss those if you haven't executed a code path that triggers them to load.
The other side of not accidentally loading more into your process than you thought is breaking shared libraries down into ever smaller pieces. In the limit I imagine it would be akin to one function per shared library, which probably defeats the point a bit.
That is true, but the root comment was specifically referring to dependencies of dlopen'ed libs not getting loaded. That one is 'fixable'.
(Btw, I'm pretty sure dlopen itself can't be lazy, due to needing to run constructors; the root comment is a bit vaguely worded... but ofc that only matters after dlopen is called.)
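A tiny illustration of that point (hypothetical file names, nothing from the article): a library constructor fires during dlopen() itself, even with RTLD_LAZY, because laziness only applies to binding individual function symbols, not to loading the object.

    /* plugin.c -- cc -shared -fPIC plugin.c -o libplugin.so
     * The constructor runs while dlopen() is loading the library. */
    #include <stdio.h>

    __attribute__((constructor))
    static void plugin_init(void)
    {
        puts("libplugin.so: constructor ran (at dlopen time)");
    }

    void plugin_hello(void)
    {
        puts("hello from plugin");
    }

    /* main.c -- cc main.c -o main -ldl
     * RTLD_LAZY only defers resolving plugin_hello until the first call;
     * the library itself, and its constructor, are handled eagerly. */
    #include <dlfcn.h>
    #include <stdio.h>

    int main(void)
    {
        void *h = dlopen("./libplugin.so", RTLD_LAZY);
        if (!h) { fprintf(stderr, "%s\n", dlerror()); return 1; }

        void (*hello)(void) = (void (*)(void))dlsym(h, "plugin_hello");
        if (hello) hello();

        dlclose(h);
        return 0;
    }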
Really the big finding here is that Xlib et al. get pulled into GPU compute tools, because access to GPU contexts has traditionally been mediated by the desktop subsystem, because the GPU was traditionally "owned" by the rendering layers of the device abstraction.
The bug here is much more a changing hardware paradigm than it is an issue with shared library dependencies that recapitulate it. Things moved and the software layers kludged along instead of reworking from scratch.
Obviously what's needed is a layer somewhere in the device stack that "owns" the GPU resources and doles them out to desktop rendering and compute clients as needed, without the two needing to know about each other. But that's a ton more work than just untangling some symbol dependencies!
I was debugging a crash in vlc today - actually in the Intel VDPAU driver - and debuginfod (which dynamically downloads the debuginfo for everything in a coredump) took a good 15 minutes to run. If you look at the 'ldd /usr/bin/vlc' output it's only about 10 libraries, but it loads dozens and dozens more dynamically using dlopen, and I think probably those libraries dlopen even more. This tool could be pretty useful to visualise that.
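If you want to watch that cascade live, one rough trick (a sketch, not the linked tool) is an LD_PRELOAD shim that logs every dlopen the process makes; assumes glibc and that the application doesn't bypass dlopen internally:

    /* dlopen_log.c -- cc -shared -fPIC dlopen_log.c -o dlopen_log.so -ldl
     * Usage: LD_PRELOAD=./dlopen_log.so vlc some-file.mkv 2>dlopens.txt
     * Logs each dlopen the application (and its libraries) performs. */
    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdio.h>

    void *dlopen(const char *filename, int flags)
    {
        /* Forward to the real dlopen, then log what was requested. */
        void *(*real_dlopen)(const char *, int) =
            (void *(*)(const char *, int))dlsym(RTLD_NEXT, "dlopen");
        void *handle = real_dlopen(filename, flags);
        fprintf(stderr, "dlopen(%s) -> %p\n",
                filename ? filename : "(self)", handle);
        return handle;
    }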
The glibc separation into multiple shared libraries is such a weird thing. Anyone happen to know how that happened? See musl for an example where they put it all in one lib and thus avoid a whole pile of failure modes.
However, that "requirement" doesn't prevent you from shipping an empty libm (or other libs listed there.)
(The actual reason is probably that glibc is old enough to have lived in a time where you cared about saving time and space by not linking the math functions when you didn't need them...)
We're in this situation because we're using a model of dynamic linking that's decades out of date. Why aren't we using process-isolated sandboxed components talking over io_uring-based low-latency IPC to express most software dependencies? The vast majority of these dependencies absolutely do not need to be co-located with their users.
Consider liblzma: would liblzma-as-a-service really be that bad, especially if the service client and service could share memory pages for zero-copy data transfer, just as we already do for, e.g. video decode?
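A minimal sketch of the zero-copy half of that idea, assuming Linux memfd_create and hand-waving away the actual service protocol (socket, io_uring ring, whatever); the hypothetical compression service would receive the fd over the IPC channel and map the same pages:

    /* Sketch: a buffer that both the client and a hypothetical
     * liblzma-as-a-service process could map, so the data being
     * compressed never gets copied across the process boundary.
     * Linux-specific (memfd_create); error handling kept minimal. */
    #define _GNU_SOURCE
    #include <string.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        const size_t len = 1 << 20;                    /* 1 MiB job buffer */
        int fd = memfd_create("lzma-job", MFD_CLOEXEC);
        if (fd < 0 || ftruncate(fd, len) < 0)
            return 1;

        /* Client side: map the buffer and fill it with data to compress. */
        unsigned char *buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                  MAP_SHARED, fd, 0);
        if (buf == MAP_FAILED)
            return 1;
        memcpy(buf, "example payload", 16);

        /* Here fd would be handed to the compression service (e.g. via
         * SCM_RIGHTS over a unix socket); the service mmaps the same fd
         * and reads/writes the pages directly, with zero copies. */
        munmap(buf, len);
        close(fd);
        return 0;
    }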
Or consider React Native: RN works by having an application thread send a GUI scene to a renderer thread, which then adjusts a native widget tree to match what the GUI thread wants. Why do these threads have to be in the same process? You're doing a thread switch anyway to jump from the GUI thread to the renderer thread: is switching address spaces at the same time going to kill you? Especially if the two threads live on different cores and nothing has to "switch"?
Both dynamic linking and static linking should be rare in modern software ecosystems. We need to instead reinvigorate the idea of agent-based component systems with strongly isolated components.
And if you have one per core anyway so nothing "switches"? Computers aren't single-core 80486es anymore. We have highly parallel machines nowadays and old intuition about what's expensive and what's cheap decays by the year.
I only have 16 cores. Linux, Windows, and macOS already load 50+ processes at startup. If we moved shared libraries into their own processes, we’d be talking hundreds or thousands of processes running all the time. They don’t get a core each.
But, if you’re interested in this architecture, Smalltalk did something similar. Fire up a Smalltalk VM and play around!
> Why aren't we using process-isolated sandboxed components talking over io_uring-based low-latency IPC to express most software dependencies?
To some extent we are, if what you do is work on backend RPC or web app frameworks.
But the better answer is because sometimes what you actually want is the ability to put a C function in a separate file that can be versioned and updated on its own, which is what a shared library captures. Trying to replace a function call of 2-3 instructions with your io_uring monstrosity is... suboptimal for a lot of applications.
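For concreteness, this is about all the ceremony a shared library asks of you (hypothetical names); the caller just sees an ordinary function call, with no serialization and no scheduler involved:

    /* mathutil.c -- the "C function in a separate file":
     *   cc -shared -fPIC mathutil.c -o libmathutil.so
     * It can be patched and re-shipped without relinking the caller. */
    int clamp(int v, int lo, int hi)
    {
        return v < lo ? lo : (v > hi ? hi : v);
    }

    /* app.c -- cc app.c -o app -L. -lmathutil
     *          LD_LIBRARY_PATH=. ./app
     * The call below is a couple of instructions through the PLT;
     * no serialization, no syscall, no scheduler. */
    #include <stdio.h>

    int clamp(int v, int lo, int hi);   /* normally from mathutil.h */

    int main(void)
    {
        printf("%d\n", clamp(42, 0, 10));
        return 0;
    }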
And in any case, the protocol parsing you'd need to provide to enable all that RPC is going to need to live somewhere, right? What is that going to be, other than a shared library or equivalent?
I'm currently involved professionally in a software architecture based on pretty much raw shared-memory IPC, and it's still too slow compared to in-process calls.
See also VST hosts that allow grouping plug-ins together in one process or separating them into distinct processes, like Bitwig: for just a few dozen plug-ins you can very easily get a 10+% CPU impact (and CPU is an extremely scarce commodity when making pro audio; it's pretty much a constant fight against high CPU usage in larger music-making sessions).
Why? Relative to the in-process case, properly done multi-process data flow pipelines don't necessarily incur extra copies. Sure, switching to a different process is somewhat more expensive than switching to a different thread due to page table changes, but if you're doing bulk data processing, you amortize any process-separation-driven costs across lots of compute anyway. And in a many-core world, you can run different parts of your system on different cores and get away with not paying context-switch costs at all.
Also, 10% is actually a pretty modest price to pay for increased software robustness and modularity. We're paying more than that for speculative execution vulnerability mitigations anyway. Do you run your fancy low-level audio processing pipeline with "mitigations=off" in /proc/cmdline?
> Also, 10% is actually a pretty modest price to pay for increased software robustness and modularity. We're paying more than that for speculative execution vulnerability mitigations anyway.
it's a completely crazy price to pay in a field where people routinely spend thousands of $$$ for <5% improvement
> Do you run your fancy low-level audio processing pipeline with "mitigations=off" in /proc/cmdline?
Obviously yes! Along with disabling power-saving CPU C-states and anything throttling-related, specific real-time IRQ and threading configuration (e.g. making sure that the sound card interrupts aren't going to happen on a core handling network interrupts), and two dozen other optimizations. These do make a difference: I regularly set up new machines from scratch for shows, art installations, etc., and always do this setup step by step to see if things are finally "good enough", and they always make a difference, in a really make-or-break sense.
Someone got a microservice hammer, so everything looks like a nail eh?
What is really needed is a sane memory model where you can easily call any function with buffers (pointer + size) and it is allowed to access only those buffers and nothing else (note). Not this mess coming from C where this is difficult by design.
(note) Since HN likes to split hairs: except for its private storage and other well-thought-out exceptions.
I don't understand why this idea keeps failing to take hold even though it's constantly reintroduced in various forms. Surely now, 30 years after that paper was published, we can bear the "slightly increased execution time for distrusted modules" in return for (as the paper suggests) faster communication between isolated modules?
No modern processor architecture has a proper message passing mechanism. All of them expect you to use interrupts, with their inherent problems of losing cache state, disrupting pipelines, and, well, interrupting your process flow.
All the modern architectures are also so close to having a proper message passing mechanism that it's unsettling. You actually need this to have uniform memory in a multi-core CPU. They have all the mechanisms for zero-copy sharing of memory, enforcing coherence, atomicity, etc. AFAIK, they just lack a userspace mechanism to signal other processes.
> This doesn't seem to touch on any of the points on my comment.
What points did your comment make? You didn't define "signalling" specifically enough to discuss. Can you elaborate on precisely what kind of "signaling" primitive processors or operating systems should provide?
If the goal is to put a security boundary between libraries within a process, there might be better ways to do it than process boundaries. One approach is to sandbox library code with wasm. Firefox apparently does this - compiling some libraries to wasm, then compiling the wasm back to C and linking it. They get all the benefits of wasm but without any need to JIT compile the code.
Another approach would be to leverage a language like Rust. I’d love it if Rust provided a way to deny any sensitive access to part of my dependency tree. I want to pull in a library but deny it the ability to run unsafe code or make any syscall (or maybe, make syscalls but I’ll whitelist what it’s allowed to call). Restrictions should be transitive to all of that library’s dependencies (optionally with further restrictions).
Both of these approaches would stop the library from doing untoward things. Way more so than you’d get running the library in a separate process.
A big issue with IPC is thread scheduling. Thread B needs to get scheduled to see the request from thread A, and thread A needs to get scheduled to see the response from thread B. I think there are WIP solutions to deal with this [1] but I'm not up to date.
This was roughly the dream of DBus. However, outside of desktop-shaped niches it proved to be extremely difficult to secure, standardize, and debug.
Process-level/address-space-level dependency sharing remains both easier to think about and simpler to implement (and capabilities are taking bites out of the security risks entailed by this model as time goes on).
I don't think this stuff is as hard as you make it out to be. Consider that companies like Apple regularly change how things are done successfully. It just requires a good plan, time, and budget. Wayland is one example of what such a migration looks like, and it's not a good story. PulseAudio followed by PipeWire is another example of a migration happening. I suspect this would probably be slightly worse than Wayland unless some kind of transparent shim could be written for each boundary so that it can be slotted in transparently.
Apple can regularly change how things are done because they have absolute control over their platform and use an "our way or the highway" approach to breaking changes, where developers have to go along or lose access to a lucrative market. This approach really really would not work on Linux: consider that the rollout of systemd was one-tenth as dictatorial as is SOP for changes from Apple, and it caused legions of Linux users to scream for Poettering's head on a stick.
That's not a path dependency though. That's just a critique of the bazaar development model. And honestly I think if the big distros got together and agreed this would be a significant security improvement, they could drag the community kicking and screaming, just like they did with systemd (people hated systemd so much at first that they tried to get other init systems to not suck, but over time persistent effort won out).
You would replace function calls with syscalls? Yeah well, if you ignore performance and complexity, why not...
io_uring is nice. Yet my application has to call that lzma function and get the result now: you've just added cross-process synchronization (via the kernel, i.e. a syscall) and inserted the scheduler into the mix.
You can load libs at runtime with dlopen; for example, if you need a feature, you can load the corresponding lib on demand. With plain dynamic linking you'd load everything when the process is launched, slowing down startup.
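Something like this (hypothetical library and symbol names), so the library only enters the process when the feature is actually used:

    /* Hypothetical names throughout. The feature's library is only
     * pulled into the process when the user actually invokes it.
     * Build with -ldl on glibc older than 2.34. */
    #include <dlfcn.h>
    #include <stdio.h>

    int run_pdf_export(const char *path)
    {
        void *h = dlopen("libpdfexport.so", RTLD_NOW | RTLD_LOCAL);
        if (!h) {
            fprintf(stderr, "PDF export unavailable: %s\n", dlerror());
            return -1;
        }
        int (*export_pdf)(const char *) =
            (int (*)(const char *))dlsym(h, "export_pdf");
        int rc = export_pdf ? export_pdf(path) : -1;
        dlclose(h);
        return rc;
    }

    int main(void)
    {
        return run_pdf_export("out.pdf") == 0 ? 0 : 1;
    }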
A more obscure use would be for loading multiple instances of a singleton library. This is especially helpful in something like a unit test suite, where you want each test case to start in a cleanly initialized state. If the code under test has a bunch of globally initialized variables, reloading the library at runtime is one of only a few possible ways of doing it.
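A rough sketch of that trick (hypothetical library name); it works as long as nothing pins the library in memory (RTLD_NODELETE, unique symbols, leaked handles), since dlclose only unloads the object when its reference count drops to zero:

    /* Hypothetical libcounter.so exposing `int bump(void)` that increments
     * a global counter. Reloading the library between test cases resets
     * that global, as long as nothing pins the library in memory. */
    #include <assert.h>
    #include <dlfcn.h>

    static void run_case(void)
    {
        void *h = dlopen("./libcounter.so", RTLD_NOW | RTLD_LOCAL);
        assert(h);

        int (*bump)(void) = (int (*)(void))dlsym(h, "bump");
        assert(bump);
        assert(bump() == 1);   /* fresh globals: the counter started at zero */

        dlclose(h);            /* refcount hits zero -> library is unloaded */
    }

    int main(void)
    {
        run_case();
        run_case();            /* the second case sees a clean state again */
        return 0;
    }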
Just wait until Lennart pushes his idea of doing linking entirely via dlopen() in systemd (see the story from a few days ago). The last bits of sane and efficient means of tracking dependencies will be gone forever after that. Good luck creating any lean Docker/k8s images without pulling in a systemd-based stack after that.
Thank you. So is this a Vulkan emulator which does not send the commands into a software renderer but rather to the host's GPU? Is it also this driver that reserves the resources on the GPU? Can one reserve resources explicitly through the API, or does this happen dynamically, as needed? Because if explicitly, then I'd wonder if this is also part of the library, of the Vulkan spec, or if it is some Mesa offering.
It's basically allowing you to use the Vulkan API on the virtualized guest by writing Vulkan API commands into a ring buffer in memory that is visible to both guest and host. These memory regions are only alive and accessible as long as the allocation lives, which is controlled by virtio control commands (specifically create, map, unmap, and destroy blob, where a blob is the shared memory allocation).
This allows textures, shaders, and generally large amounts of data to skip being copied to and from the virtqueues, which is the usual method of virtio communication.
So to answer your question: if you use the Vulkan API on a guest to, for example, query the available Vulkan devices, and the correct Mesa library is installed and virtio-gpu Venus is available, you will be able to use resources on the host through the Vulkan API.
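For a concrete picture, the guest-side query is just ordinary Vulkan; whether the devices ultimately come from Venus/virtio-gpu or bare metal is invisible to the application (build with -lvulkan):

    /* vk_list.c -- cc vk_list.c -o vk_list -lvulkan
     * Enumerates the Vulkan physical devices visible to this (guest)
     * process; with Mesa's Venus driver these are the host's GPUs. */
    #include <stdio.h>
    #include <vulkan/vulkan.h>

    int main(void)
    {
        VkApplicationInfo app = {
            .sType = VK_STRUCTURE_TYPE_APPLICATION_INFO,
            .apiVersion = VK_API_VERSION_1_1,
        };
        VkInstanceCreateInfo ci = {
            .sType = VK_STRUCTURE_TYPE_INSTANCE_CREATE_INFO,
            .pApplicationInfo = &app,
        };
        VkInstance instance;
        if (vkCreateInstance(&ci, NULL, &instance) != VK_SUCCESS) {
            fprintf(stderr, "no usable Vulkan driver\n");
            return 1;
        }

        uint32_t count = 0;
        vkEnumeratePhysicalDevices(instance, &count, NULL);
        VkPhysicalDevice devs[16];
        if (count > 16) count = 16;
        vkEnumeratePhysicalDevices(instance, &count, devs);

        for (uint32_t i = 0; i < count; i++) {
            VkPhysicalDeviceProperties props;
            vkGetPhysicalDeviceProperties(devs[i], &props);
            printf("device %u: %s\n", i, props.deviceName);
        }

        vkDestroyInstance(instance, NULL);
        return 0;
    }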
Just because the complexity is hidden from you doesn't mean it's not there. You have no idea what is statically bundled into the CUDA libs.