I'm pretty sure you could have saved yourself a lot of work by just using profile-guided optimisation (PGO). It's mature and ready to go, in GCC at least.
(Integrating it into your build process is a little bit more challenging, I admit. I've set it up in a large dev environment used by multiple large projects, and it was well, well worth the effort.)