I think there is a prima facie case that nobody* cares about linker speed. bfd is a terrible linker written by weirdos and it's still the default linker on every mainstream Linux distribution. You'd think an organization like Canonical would benefit from faster builds, but they still use bfd. Not even gold, which has been available for 16 years. Definitely not lld, which has been around for 5 years. The amount of performance they're already leaving on the table is huge, so who would expect them to suddenly take an interest in build performance?
Game devs cared, at least up to the PlayStation 4. The Sony linker was much faster than the Microsoft one (and built to an unreasonable QA bar), and that cut build turnaround times enough that it was worth developing against the PlayStation and testing against the Xbox, which Sony considered a competitive advantage.
Those same users also had really strong views on bugs in linkers. If the game is broken because the linker trashed it, you only work that out after debugging through your own code and through the compiler output, by which point you're well past patient and understanding.
Sadly for this project, I consider linkers to be a fundamental design mistake. Or at least obsolete. Lowering to machine code before combining files wins you runtime overhead and implementation obfuscation in exchange for reduced memory consumption. Linking an intermediate form and then writing machine code (in elf if you like) from that single blob is better. I'm pretty sure it can be done with lower memory overhead than linking machine code if you're so inclined.
> Linking an intermediate form and then writing machine code (in elf if you like) from that single blob is better. I'm pretty sure it can be done with lower memory overhead than linking machine code if you're so inclined.
Do you mean it's better for build performance or for the final output?
Better across the board. Currently we translate to machine code plus tables of extra data saying how to patch said code, store that in an elf/coff/other object, then read it back in and use the extra data to work out what to do with it.
If instead you keep it in IR and combine that, e.g. llvm-link style, and then emit machine code from the result, you don't need the tables of information for the linker. You can transform debug info before it has been compressed into dwarf. You can optionally optimise across translation units. Then you finally emit the same format the loader expects.
That should mean simpler tooling (the compiler and linker don't need the side-channel relocation machinery, and the linker doesn't need to disassemble and recombine dwarf), a faster resulting program, and faster build times.
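As a rough sketch of that pipeline, assuming the stock LLVM tools (clang, llvm-link, llc) are installed; the file names here are invented for illustration:

```shell
# Sketch of the IR-first pipeline described above, using stock LLVM tools.

# Compile each translation unit to LLVM bitcode instead of an object file.
clang -c -emit-llvm -o foo.bc foo.c
clang -c -emit-llvm -o bar.bc bar.c

# Combine at the IR level, llvm-link style. No relocation tables are
# involved; symbols are resolved within the IR itself.
llvm-link foo.bc bar.bc -o combined.bc

# (Optionally run optimisation passes across the original translation
# units here, before any machine code exists.)

# Lower to machine code exactly once, from the single merged blob.
llc -filetype=obj combined.bc -o combined.o
```

This is roughly the shape of what LTO already does under `clang -flto`, except that today the merge still happens inside the linker rather than replacing it.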
It's not totally theoretical either: amdgpu on llvm only passes a single file to the linker at a time, but lld is multi-architecture so doesn't get to delete the file-combining code. And a machine learning toolchain gave up on calling the linker at one point in favour of doing the data layout directly from the compiler; the latter because lld is/was too annoying to use as a library, iiuc.
The current system lends itself well to memory constrained systems. Each source/ASM file gets turned into a single binary with just enough metadata to paste it together with another one. It caches nicely too.
Whole-program optimisation is difficult to do without enough memory. With machine-code linking established as the way things are done, things like debug info got spliced into the same model. Shared libraries are a similar sort of incremental memory-saving scheme.
> The current system lends itself well to memory constrained systems. Each source/ASM file gets turned into a single binary with just enough metadata to paste it together with another one. It caches nicely too.
Yeah, I had some idea in my head that early Unix/C and all the constraints they had to deal with must have informed the compilation model like that... interesting!
That's just not true. Every C++ project I've worked on has cared about linker and compiler speed. Unfortunately I work in Windows-land where we use MSVC, but stuff like [0] makes our lives so much easier.
* for values of "nobody" excluding Google.