
Yes, but this Z3 analysis is not done at runtime. It's done offline, based on JIT traces. A neural network could, in principle, suggest optimizations in the same way, which an expert would then review for possible inclusion in the PyPy JIT.


You'd still have to write a proof of semantic equivalence before implementing the optimization, so I don't see what the neural network gains you here, unless it is actually supplying the proof of correctness along with the optimization.


The idea is that the LLM would provide "intuition" to guide the optimizer to find better optimizations, but a formal proof would be necessary to ensure that those optimizations are actually valid.


I might be incorrect, but I don't believe that most compiler optimizations have formal proofs written out before implementation. Does PyPy do this?


PyPy doesn't do this in general. The same Z3 model that is used to find these missing optimizations is also used to verify some integer optimizations.
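To make that concrete, checking a single rewrite rule with Z3 looks roughly like this. This is a minimal sketch, assuming 64-bit machine integers and the z3-solver Python package; it is not PyPy's actual verification harness:

    import z3

    x = z3.BitVec("x", 64)  # a symbolic 64-bit machine integer

    # proposed peephole rule: x * 2  ->  x << 1
    solver = z3.Solver()
    solver.add(x * 2 != x << 1)  # ask Z3 for an input where the rule breaks

    # unsat means no counterexample exists, i.e. the rule is sound
    assert solver.check() == z3.unsat

If the solver instead returns sat, its model is a concrete input demonstrating the bug.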

But the point is that as long as optimization rules are hand-written, a human has thought about them and convinced themselves (maybe incorrectly) that the rules are correct. If a machine generates them without a human in the loop, some other sort of correctness argument is needed. Hence the reasonable suggestion that they should be formally verified.


PyPy has formally verified the integer abstract domain using Z3, a quite important part of our JIT optimizer (will write about that in the coming weeks).
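To give a rough idea of what verifying an abstract domain means, here is a hedged sketch using a simple known-zeros/known-ones encoding (PyPy's actual domain and proofs are more involved): we ask Z3 whether the abstract transfer function for an operation can ever claim to know a bit that the concrete result contradicts.

    import z3

    BITS = 64

    def contains(zeros, ones, x):
        # x matches the abstract value iff every known-one bit is set
        # in x and every known-zero bit is clear
        return z3.And(x & ones == ones, x & zeros == 0)

    z1, o1, z2, o2 = z3.BitVecs("z1 o1 z2 o2", BITS)
    x1, x2 = z3.BitVecs("x1 x2", BITS)

    # transfer function for bitwise AND in this domain
    res_ones = o1 & o2    # a bit is known 1 only if known 1 in both inputs
    res_zeros = z1 | z2   # a bit is known 0 if known 0 in either input

    s = z3.Solver()
    s.add(z1 & o1 == 0, z2 & o2 == 0)  # well-formed abstract values
    s.add(contains(z1, o1, x1), contains(z2, o2, x2))
    s.add(z3.Not(contains(res_zeros, res_ones, x1 & x2)))
    assert s.check() == z3.unsat       # the transfer function is sound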

We also run a fuzzer regularly to find optimization bugs, using Z3 as a correctness check:

https://pypy.org/posts/2022/12/jit-bug-finding-smt-fuzzing.h...

The peephole optimizations themselves aren't completely formally verified yet. We've verified the very simplest rules, and some of the newer complicated ones, but not systematically all of them. I plan to work on fully and automatically verifying all integer optimizations in the next year or so. But we'll see; I'll need to find students and/or money.
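For a flavor of why mechanical checking beats eyeballing: some rules are true for mathematical integers but wrong for machine words. A hypothetical example (not an actual PyPy rule), again sketched over 64-bit bitvectors:

    import z3

    x = z3.BitVec("x", 64)

    # "x + 1 > x" looks obviously true, but 64-bit integers wrap around
    s = z3.Solver()
    s.add(z3.Not(x + 1 > x))  # search for an input falsifying the rule
    print(s.check())          # sat: the claim is unsound
    print(s.model())          # x = 9223372036854775807, i.e. INT_MAX

Z3's comparison operators on bitvectors are signed by default here, so the single counterexample is INT_MAX, where x + 1 wraps around to INT_MIN.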


Ah, yes, I meant that the LLM could output suggestions, which a human would then think about and convince themselves of, and only then implement in PyPy.


Presumably the LLM would generate a lot of proposed rules for humans to wade through. Reviewing lots of proposed rewrites while catching all possible errors would be tedious and error-prone. We have computers to take care of this kind of work.


Perhaps not, but they're based on heuristics and checks that are known, checked, and understood by humans, and aren't prone to hallucination the way LLMs are. An LLM suggests something that looks plausible, but there's no guarantee that its suggestions actually work as intended, hence the need for a proof.



