I've always wondered how people actually get the profiles for Profile-Guided-Opt...

mhh__ · 2024-10-05T22:39:57 1728167997

You might be surprised how much speedup you can get from (say) just running a test suite as PGO samples. If I had to guess this is probably because compilers spend a lot of time optimising cold paths which they otherwise would have no information about

CyberDildonics · 2024-10-05T23:57:17 1728172637

That's not how it works. BOLT is mainly about figuring out the most likely instructions that will run after branches and putting them close together in the binary. Unlikely instructions like error and exception paths can be put at the end of the binary. Putting the most used instructions close together leverages prefetching and cache so that unused instructions aren't what is being prefetched and cached.

In short it is better memory access patterns for instructions.

foota · 2024-10-06T01:25:21 1728177921

I suspect you know this based on the detail in your comment and just missed it, but parent is talking about FDO, not BOLT.

mhh__ · 2024-10-06T02:22:42 1728181362

Yes, but I'm not talking about BOLT

pgaddict · 2024-10-05T22:44:34 1728168274

Yeah, getting the profile is obviously a very important step. Because if it wasn't, why collect the profile at all? We could just do "regular" LTO.

I'm not sure there's one correct way to collect the profile, though. ISTM we could either (a) collect one very "general" profile, to optimize for arbitrary workload, or (b) profile a single isolated workload, and optimize for it. In the blog I tried to do (b) first, and then merged the various profiles to do (a). But it's far from perfect, I think.

But even with the very "rough" profile from "make installcheck" (which is the basic set of regression tests), is still helps a lot. Which is nice. I agree it's probably because even that basic profile is sufficient for identifying the hot/cold paths.

foota · 2024-10-06T01:27:42 1728178062

I think you have to be a bit careful here, since if the profiles are too different from what you'll actually see in production, you can end up regressing performance instead of improving it. E.g., imagine you use one kind of compression in test and another in production, and the FDO decides that your production compression code doesn't need optimization at all.

If you set up continuous profiling though (which you can use to get flamegraphs for production) you can use that same dataset for FDO.

pgaddict · 2024-10-06T07:58:36 1728201516

Yeah, I was worried using the "wrong" profile might result in regressions. But I haven't really seen that in my tests, even when using profiles from quite different workloads (like OLTP vs. analytics, different TPC-H queries, etc.). So I guess most optimizations are fairly generic, etc.

mhh__ · 2024-10-06T02:23:32 1728181412

There are some projects (not sure if available to use in anger) to generate PGO data use using AI.

touisteur · 2024-10-07T17:18:59 1728321539

I was working some years ago on 'happy path fuzzing', trying to find heuristics to guide through code avoiding all error handling, runtime checks. Never got better results than afl-go or other targeted fuzzing, but you have to know what's your happy path.

Also tried to use previous-version or previous-previous-version coverage ('precise' through gcov, or intel processor trace, or sampled perf traces, down until poor-man's-profiler samples) coupled with program repair tools, and... never managed to jump from fun small toy examples to actual 100+kloc applications. Maybe one day.

still_grokking · 2024-10-06T13:07:04 1728220024

AI can predict how some code behaves when run?

So AI can predict whether some program halts?

Seriously?

mhh__ · 2024-10-07T06:33:14 1728282794

https://www.researchgate.net/publication/357418662_Profile_G...

What exactly do you think PGO data looks like? The main utility is knowing that (say) your error handling code is cold and your loops are hot, which compilers currently (and so on).

This is indeed unknowable in general but clearly pretty guessable in practice.

MonkeyClub · 2024-10-06T16:01:46 1728230506

Well spotted! :)

mhh__ · 2024-10-07T06:33:37 1728282817

Barely

its_bbq · 2024-10-06T11:26:47 1728214007

If I remember correctly, at Google we would run a sampling profiler on some processes in prod to create these profiles, with some mechanism for additional manual overrides

tdullien · 2024-10-06T09:36:49 1728207409

Google and Meta do in-production profiling. I think that tech is coming to everyone else slowly.