Something about IR Optimization

slavapestov · on March 20, 2019

Every JIT that starts out by generating code directly from expression trees or bytecode eventually grows an ad-hoc optimizer and IR. I think its probably better to avoid all the dead ends and lower directly to an SSA IR. You can then implement tried-and-true optimizations from compiler textbooks, and concerns such as "optimizing across expression boundaries" or "my expression tree is actually a DAG" are defined away completely.

brrt · on March 20, 2019

Hi! Author here. Honest question, because I don't know all the facts:

- Given that in this IR, every value has a single definition, and every use must reference that value, I'd argue that this is already SSA form.

- In fact, the concept of 'variable' is mostly missing, there's only a repeated reference to the same value. I think that simplifies things a lot, but maybe too much.

- Optimizing across expression boundaries is in fact very much the point, if we can keep the semantics the same.

Thanks for your interest!

comex · on March 20, 2019

Well... one thing that's usually done as part of converting something to SSA form is computing the dominator tree. If you did that, you could relatively cheaply solve this:

> But it also means that the property we're interested in - a value is computed before it is used in, in all possible code flow paths - isn't really expressible by the IR.

...since you could verify that the blocks a value is used in are dominated by the block it's computed in.

matt0987 · on March 20, 2019

Just a note that this IR sounds a lot like the Testarossa Trees IR: https://github.com/eclipse/omr/blob/master/doc/compiler/il/I...

In that IR the rule is that sharing cannot pass basic-block boundaries because of the exact kind of bug you indicate.

brrt · on March 20, 2019

Thank you for that link! That does look like it is a very similar IR structure indeed. Very interesting.

UncleEntity · on March 20, 2019

> For instance, if two branches of a conditional expression both refer to the same value (represented by name $foo) then the code generator will only emit code to compute its value when it encounters the first reference. When the code generator encounters $foo for the second time in the other branch, no code will be emitted.

Isn't that what phi nodes were designed to fix?

ridiculous_fish · on March 20, 2019

I don't think so - phi nodes solve the "problem" of expressing a value in SSA one-definition form, when that value is control-flow dependent. But this is really an artifact of SSA form.

This seems like a much weirder problem, in which the optimizer is unaware of control flow and so will allow a value to reference an instruction that does not dominate it.

LLVM's IR verifier ensures that instructions dominate all of their uses, which simply means that all users of an instruction must provably execute after that instruction. This verification step seems to be what's missing.

brrt · on March 20, 2019

Hi! Author here. I think you hit the nail on the head here. The optimizer is unaware of the control flow, because the control flow is added by the code generator.(And with it, the distinction between computing a value and referencing it). It's a control-flow-less IR, sort of.

UncleEntity · on March 20, 2019

I was thinking that this just needs the common value "promoted" outside of the conditional block but then it seemed that the different branches were probably modifying it so thought of phi nodes.

Anyways, seems to be some weird block scoping rules going on like with python.