I think there is a prima facie case that nobody* cares about linker speed. bfd is a terrible linker written by weirdos and it's still the default linker on every mainstream Linux distribution. You'd think an organization like Canonical would benefit from faster builds, but they still use bfd. Not even gold, which has been available for 16 years. Definitely not lld, which has been around for 5 years. The amount of performance they're already leaving on the table is huge, so who would expect them to suddenly take an interest in build performance?
Game devs cared, at least up to the PlayStation 4. The Sony linker was much faster than the Microsoft one (and built to an unreasonable QA bar), and that cut build turnaround times enough that it was worth developing against the PlayStation and testing against the Xbox, which Sony considered a competitive advantage.
Those same users also had really strong views on bugs in linkers. If the game is broken because the linker trashed it, you only work that out after debugging through your own code and through the compiler output, by which point you're well past patient and understanding.
Sadly for this project, I consider linkers to be a fundamental design mistake. Or at least obsolete. Lowering to machine code before combining files wins you runtime overhead and implementation obfuscation in exchange for reduced memory consumption. Linking an intermediate form and then writing machine code (in elf if you like) from that single blob is better. I'm pretty sure it can be done with lower memory overhead than linking machine code if you're so inclined.
> Linking an intermediate form and then writing machine code (in elf if you like) from that single blob is better. I'm pretty sure it can be done with lower memory overhead than linking machine code if you're so inclined.
Do you mean it's better for build performance or for the final output?
Better across the board. Currently we translate to machine code plus tables of extra data saying how to patch said code, store that in an elf/coff/other object, then read it back in and use the extra data to work out what to do with it.
If instead you keep it in IR and combine that, e.g. llvm-link style, and then emit machine code from the result, you don't need the tables of information for the linker. You can transform debug info before it has been compressed into dwarf. You can optionally optimise across translation units. Then you finally emit the same format the loader expects.
That should mean simpler tooling (the compiler and linker don't need the side-channel relocation machinery, and the linker doesn't need to disassemble and recombine dwarf), a faster resulting program, and faster build times.
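As a rough sketch of that pipeline, assuming the stock LLVM tools (clang, llvm-link, llc) are installed; the file names here are invented for illustration:

```shell
# Sketch of the IR-first pipeline described above, using stock LLVM tools.

# Compile each translation unit to LLVM bitcode instead of an object file.
clang -c -emit-llvm -o foo.bc foo.c
clang -c -emit-llvm -o bar.bc bar.c

# Combine at the IR level, llvm-link style. No relocation tables are
# involved; symbols are resolved within the IR itself.
llvm-link foo.bc bar.bc -o combined.bc

# (Optionally run optimisation passes across the original translation
# units here, before any machine code exists.)

# Lower to machine code exactly once, from the single merged blob.
llc -filetype=obj combined.bc -o combined.o
```

This is roughly the shape of what LTO already does under `clang -flto`, except that today the merge still happens inside the linker rather than replacing it.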
It's not totally theoretical either: amdgpu on llvm only passes a single file to the linker at a time, but lld is multi-architecture so doesn't get to delete the file-combining code. And a machine learning toolchain gave up on calling the linker at one point in favour of doing the data layout directly from the compiler; the latter because lld is/was too annoying to use as a library, iiuc.
The current system lends itself well to memory constrained systems. Each source/ASM file gets turned into a single binary with just enough metadata to paste it together with another one. It caches nicely too.
Whole-program optimisation is difficult to do without enough memory. With machine-code linking established as the way things are done, things like debug info got spliced into the same model. Shared libraries are a similar sort of incremental memory-saving scheme.
> The current system lends itself well to memory constrained systems. Each source/ASM file gets turned into a single binary with just enough metadata to paste it together with another one. It caches nicely too.
Yeah, I had some idea in my head that early Unix/C and all the constraints they had to deal with must have informed the compilation model like that... interesting!
That's just not true. Every C++ project I've worked on has cared about linker and compiler speed. Unfortunately I work in Windows-land where we use MSVC, but stuff like [0] makes our lives so much easier.
* for values of "nobody" excluding Google.