Hey, I just read through this paper. Phase ordering is currently decided by heuristics. I noticed that you're only using the LLVM IR instruction count as the metric. However, that metric might not accurately reflect the program's actual performance or code size: a single LLVM instruction like GEP (getelementptr) can lower to several assembly instructions. Additionally, I'd suggest running some larger benchmarks like SPEC to demonstrate the performance benefits of using the LLM.
Hey, yes that's right, and good callout on GEP instructions. We admit that instruction count is a bit handwavy, but we use it as a starting point because that's what the prior work we compare against optimizes for. We'll be looking at true binary size next, and code runtime after that.