One of the biggest things that seems to be holding back ML in compilers right now is dataset size. This model was only trained on a gigabyte of source code, 30+% of that synthetic. Even on much simpler models, there have been massive performance gains just from throwing more data at them. Some experimentation with the original MLGO inlining model on a much bigger data corpus doubled the code-size wins. LLMs have also been shown to perform better the more data they are fed [1].
It's possible, nay, mandatory to constrain the outputs of the model at each step of generation in order to guarantee that a given structure or grammar is adhered to. If you fine-tune the model with these constraints in place, you can offload much of the effort the LLM would otherwise spend on establishing correctness, leaving more of its capacity for generating good content. To be sure, quality and quantity of data are important, but it's all too easy to introduce subtle bugs that take years to tease out if you don't adhere to the right constraints.
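Here's a minimal sketch of what per-step constraining looks like: before picking the next token, mask out every token the grammar forbids. The vocabulary, the toy expression grammar, and the random-logit stand-in for the model are all invented for illustration, not anything from the paper.

```python
import math
import random

# Toy vocabulary; a real setup would use the model's tokenizer vocabulary.
VOCAB = ["(", ")", "x", "+", "<eos>"]

def fake_logits(prefix):
    """Stand-in for an LM forward pass: one (random) logit per vocab token."""
    return [random.uniform(-2.0, 2.0) for _ in VOCAB]

def allowed_tokens(prefix):
    """Tokens that keep the prefix inside a tiny expression grammar:
    'x' is the only operand, '+' the only operator, parentheses must balance."""
    depth = 0
    expect_operand = True
    for tok in prefix:
        if tok == "(":
            depth += 1
        elif tok == ")":
            depth -= 1
        elif tok == "x":
            expect_operand = False
        elif tok == "+":
            expect_operand = True
    if expect_operand:
        return {"(", "x"}
    legal = {"+"}
    legal.add(")" if depth > 0 else "<eos>")
    return legal

def constrained_decode(max_len=16):
    prefix = []
    for _ in range(max_len):
        logits = fake_logits(prefix)
        legal = allowed_tokens(prefix)
        # Mask disallowed tokens to -inf so they can never be selected.
        masked = [l if tok in legal else -math.inf
                  for tok, l in zip(VOCAB, logits)]
        tok = VOCAB[masked.index(max(masked))]
        if tok == "<eos>":
            break
        prefix.append(tok)
    return " ".join(prefix)

print(constrained_decode())
```

The same masking step is where fine-tuning under the constraint pays off: the model never gets gradient signal for token choices the grammar would reject anyway.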
Most of the work in this space is not focused on neural compilation (having an ML model perform the transformation/entire compilation), but on replacing heuristics or phase ordering, where the issue of correctness falls back onto the compiler. For pretty much exactly the reasons you mentioned, neural compilation isn't really tractable.
This specific paper focuses on phase ordering, which should guarantee correctness, assuming the underlying transformations are correct. They do train the model to perform compilation, but as an auxiliary task.
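To make that division of labor concrete, here's a rough sketch of the phase-ordering setup: the learned component only proposes an ordering of existing LLVM passes, and `opt` does all the transformation, so correctness stays with the compiler. The candidate pipelines and the line-count size proxy below are stand-ins; a real system would query a trained policy per module and measure actual object-code size.

```python
import subprocess

# Hypothetical candidate pass orderings; a trained policy would propose these.
CANDIDATE_PIPELINES = [
    "instcombine,simplifycfg",
    "gvn,instcombine,simplifycfg",
    "simplifycfg,instcombine,gvn,instcombine",
]

def ir_size(ll_text):
    """Crude code-size proxy: non-empty, non-comment lines of textual IR."""
    return sum(1 for line in ll_text.splitlines()
               if line.strip() and not line.strip().startswith(";"))

def run_pipeline(input_ll, passes):
    """Apply one pass ordering with `opt` (new pass manager) and return the IR."""
    result = subprocess.run(
        ["opt", "-S", f"-passes={passes}", input_ll, "-o", "-"],
        capture_output=True, text=True, check=True)
    return result.stdout

def best_ordering(input_ll):
    # Whatever ordering is picked, the output is produced by opt's own
    # (presumed-correct) transformations, never generated by the model.
    scored = [(ir_size(run_pipeline(input_ll, p)), p)
              for p in CANDIDATE_PIPELINES]
    return min(scored)

if __name__ == "__main__":
    print(best_ordering("input.ll"))
```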
[1] https://arxiv.org/abs/2203.15556