Show HN: Optimizing Higher Order Functions with Hypothetical Inverses (medium.com/pschanely)
55 points by pschanely on April 29, 2016 | 35 comments



This approach is strongly reminiscent of abstract algebra; for example, the Map-Reduce Transformation example seems to be(?) justified exactly when f is an isomorphism over r.

I've been thinking along somewhat similar lines but I've been looking at it from the perspective of introduction and elimination rules, as found in type theory. If your compiler can find a transformation that brings an intro form and an elim form together, then you can apply the computation rule to remove both from the program. My motivation is to remove allocations that aren't really essential. (I do have a functional language compiler that I work on but it doesn't implement any optimizations yet).
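
To illustrate the kind of thing I mean (a toy Haskell sketch, not code from my compiler): pairing is an intro form and fst is an elim form, so if a transformation brings them together, the computation rule fst (a, b) = a removes both, and with them the pair allocation.

    before :: Int -> Int
    before x = fst (x * 2, x + 1)  -- builds a pair, then projects from it

    after :: Int -> Int
    after x = x * 2                -- intro/elim pair eliminated together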

Another interesting idea that seems related is from a Philip Wadler talk[1] I saw which discusses how normalization can be used to achieve something like your "hypothetical inverses", which should always be eliminated from the final result.

> These transformations might only be useful in obvious optimization scenarios (cases that no programmer would intentionally create).

I would be surprised if that turned out to be the case.

[1] "Everything Old is New Again: Quoted Domain Specific Languages" https://www.youtube.com/watch?v=DlBwJ4rvz5c


Yes! I think perhaps that f needs only to be a homomorphism instead of an isomorphism, but I don't trust myself enough to make such a claim definitively :)

Always curious to follow people doing compiler work; loop me in, yeah?

And, just watched that Wadler talk; it's good stuff. Of course, it sounds like his scenario is like: "there is a way to do the transformation, you just have to find it," and in my case, it's more like: "there might or might not be an algebra in which f acts as a homomorphism, and, even if there is, you may not have the transformations you need to determine it!"


That's funny. I was thinking isomorphism because you postulate an inverse, which homomorphisms don't always have. But the identity doesn't depend on f being invertible. You could just have a rule:

If f is a homomorphism with respect to r, then reduce(r, map(f, l)) = f(reduce(r, l)).

The rule can be proven without reference to an inverse for f so isn't it enough to just add such a rewrite rule to your optimizer? (Now I'm not sure what the inversion trick accomplishes?).
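
For example (a quick Haskell sketch; incr is just (+ 1) here, invented for illustration), incr is a homomorphism with respect to max because incr (max a b) == max (incr a) (incr b):

    import Data.List (foldl1')

    incr :: Int -> Int
    incr = (+ 1)

    -- reduce(max, map(incr, l)); non-empty input assumed
    lhs :: [Int] -> Int
    lhs = foldl1' max . map incr

    -- incr(reduce(max, l)): one traversal, no intermediate list
    rhs :: [Int] -> Int
    rhs = incr . foldl1' max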

> Always curious to follow people doing compiler work; loop me in, yeah?

norstrulde.org and twitter.com/ericbb

> "there might or might not be an algebra in which f acts as a homomorphism, and, even if there is, you may not have the transformations you need to determine it!"

The way I interpret it, the algebra is determined by r, which is given. You might not know whether f is a homomorphism with respect to r, and it'd be tricky to figure that out by inspecting the definitions of f and r. But if you have a table built in advance that characterizes functions in relation to each other (incr is a homomorphism with respect to max, etc.), then you can drive your rewrite engine using that table... So I wouldn't say that you want to use transformations to determine whether f is a homomorphism, but rather that you want to use the fact that f is a homomorphism to decide which transformations are applicable. Does that make sense?

Edit: I guess, more generally, you could have a relation with the form homomorphism(f:a->b, q:a->a->a, r:b->b->b) which justifies the transformation reduce(r, map(f, l)) -> f(reduce(q, l)).
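
A concrete instance of that relation, sketched in Haskell: length is a homomorphism taking (++) to (+), since length (xs ++ ys) == length xs + length ys, so homomorphism(length, (++), (+)) justifies reduce((+), map(length, l)) -> length(reduce((++), l)):

    lhs :: [[Int]] -> Int
    lhs = foldr (+) 0 . map length   -- reduce(r, map(f, l)), r = (+)

    rhs :: [[Int]] -> Int
    rhs = length . foldr (++) []     -- f(reduce(q, l)), q = (++)

(Here the left side is the one you'd actually prefer for efficiency, but it shows the case where a and b differ.)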


This is excellent; I was thinking about a similar optimization using inverses:

If one computes some reduction over a data structure, and the reduction is a group operator (so it has an inverse, like addition), then when the data structure is updated, it may be cheaper to update the result through the inverse than to recompute the whole reduction.

For example, if we are calculating the sum of a list and an element is removed, it's easier to subtract that element from the sum than to compute the sum again.
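
In code (a trivial Haskell sketch):

    -- Addition forms a group, so removal is an O(1) update via the inverse:
    removeFromSum :: Int -> Int -> Int
    removeFromSum total x = total - x

    -- removeFromSum (sum [3,5,7]) 5 == 10 == sum [3,7]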

Anyway, all in all, I think knowledge of function inverses (and perhaps also of isomorphisms between types) by the compiler will become very handy in the future, and will enable a huge number of optimizations.


Thank you. And, heck yes, more people need to be talking about how to deal with computations that are largely repeated. That problem comes up all over the place.

CouchDB has an interesting approach for this that does not require inverses: it persists intermediate reduce results, so that you can reuse nodes in the tree that haven't changed. So clever. http://horicky.blogspot.com/2008/10/couchdb-implementation.h...


Database indices (SQL and otherwise) are similar in that regard: they are persisted so that a single insertion or update does not invalidate the entire index. Cache invalidation is a hard problem in general, but it's easier when a closed system (like a database) is the only thing using its own cache, and the interface is much higher level.


I would love to see an analytics database system built on this idea. Let me define my "incoming" tables, and then define how other aggregated or transformed materialized views should be computed, and automatically update the downstream datasets based on changes in the input tables.

Let me define only "what" to compute, instead of spending all my time worrying about "how" and "when". Isn't that the point of declarative languages like SQL?


For the Map-Reduce transformation, you require f(r(a,b)) = r(f(a),f(b)). This is equivalent to requiring f to be a homomorphism with respect to r, I think.

You could probably generalize that condition to situations where f can be lifted over the output of r so you don't have to require that r's output type is the same as its input.


Unfortunately, I took barely enough grad school classes to be dangerous! It's worth clarifying that lifting f through r can change r (to, say, r'), in which case, yes, f is a homomorphism from an algebra with r' to an algebra with r. (Hopefully I got that right?)

I am curious about the idea that the inputs and outputs of r may not need to be of the same type, and want to investigate further (help is appreciated!). Since r is given to reduce(), at least one of its inputs must match its output, depending on how you define reduce(). Haskell's fold(l/r) permit the non-accumulator side to differ in type from the output, but the output must match the accumulator side. I think some similar transformation may apply to folds, but haven't thought about it enough.
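
For reference, the Prelude fold types, specialised to lists (the *List names are just mine): the accumulator side must match the result type b, while the element type a is free to differ.

    foldlList :: (b -> a -> b) -> b -> [a] -> b
    foldlList = foldl

    foldrList :: (a -> b -> b) -> b -> [a] -> b
    foldrList = foldr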



Ah, I knew that HN would tell me what I should be reading! I've only been able to spend a few minutes reading this one, but it looks to me like the first focus there is on eliminating intermediate containing structures, with some interesting opportunities for optimization afterwards. Their stream definition of fold(l/r) is still recursive, so I don't think it would be able to pull, for example, the incr() outside the reduce in my first example of "reduce(max, map(incr, l))". I think the approaches might be complementary, actually; you might apply the higher-level transformations I describe first and use a stream approach to implement the resulting chain of operations. Going to do a deeper investigation later today, but is there anything else I should be looking at?


Referenced in the paper panic suggested below ( https://news.ycombinator.com/item?id=11595273 ), and suggested to me directly by @mattcmd on twitter ( https://twitter.com/mattmcd/status/726342652845809664 ), there is something called the "fusion property of fold," which would explain the transformations I'm discussing here as applied to folds (sided reduces). From the paper ( http://www.cs.nott.ac.uk/~pszgmh/fold.pdf ) (with some minimal editorial substitution from me), we see that:

h(fold(g, w, l)) = fold(f, h(w), l)

if

h(g(x, y)) = f(x, h(y))

This is exciting because (1) it suggests the equivalent transformation for sided reduces (folds), and (2) fold is a more general operator than the others. My somewhat silly use of "hypothetical inverses" effectively just creates a goalpost: (1) constructively find the function f, given g (or vice versa), and (2) prove the necessary condition.
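
A tiny worked instance in Haskell (my choice of h and g, purely for illustration): with h = (+ 1) and g = (+), the side condition holds for f = (+), because (x + y) + 1 == x + (y + 1), and the fusion law gives:

    unfused :: [Integer] -> Integer
    unfused = (+ 1) . foldr (+) 0   -- h (fold(g, w, l))

    fused :: [Integer] -> Integer
    fused = foldr (+) 1             -- fold(f, h(w), l): one pass, no post-step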

Very exciting.


How do we make compilers smart enough to know that "sum(reverse(x))" can be rewritten to "sum(x)"?

That example is pretty easy; you apply a pattern match to the tree which looks for the pattern (sum (reverse ?var)) and rewrites it to (sum ?var).

In Common Lisp, we would use define-compiler-macro on the sum function which looks for the argument form being (reverse whatever), in which case the macro returns the form (sum whatever).
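
In Haskell, the analogous thing might be sketched with a GHC rewrite-rule pragma (a sketch only; it's sound only when reordering the additions preserves semantics, as the next comment points out, and in practice GHC's own list-fusion rules complicate when it fires):

    {-# RULES
    "sum/reverse" forall xs. sum (reverse xs) = sum xs
      #-}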

Not so fast, though; order of summing matters in floating-point! And over integers too, in the presence of overflows (in junk languages without bignums). That is to say, if we add together three or more integers, the arithmetic result can be in range of the type, but some orders might overflow, whereas others do not.


> And over integers too, in the presence of overflows (in junk languages without bignums)

I don't think this is true. Modular addition is commutative and associative. As long as the behavior of overflow is modular (as it is in 2's complement), order shouldn't matter. And if the behavior of overflow is to trap, then it should trap regardless of the order you add the numbers in. I guess if you care about the stack trace you get being the same even under optimizations, then you could argue this isn't a safe optimization. And if the behavior of overflow is undefined, it's undefined whether you reverse the list or not.

Maybe you're thinking of Javascript, which not only doesn't have bignums, but uses floating point even for integers?


Seconded. I can't see any encoding system that has fixed-sized integers (and therefore can overflow) where the result of addition is order-dependent. I mean, I'm sure you could design one. But if there's an example of an actually-used encoding that has this property, I'd like to hear it. (And as you said, using floats doesn't count...)


The result of the addition isn't order-dependent; whether or not there is overflow is order-dependent.

I put that into my original comment for the sake of completeness, in an additional edit after the floating-point remark. Ironically, it was to fend off smart-alecky follow-ups about how I forgot integer overflow.

A higher-level language should have a plus operator/function that traps integer overflows if it doesn't provide bignums.

If a sum function is defined as a simple left fold over the sequence using that plus, then part of the semantics of that sum is to throw the exception as soon as adding the next element from the sequence to the left accumulator overflows.

If an order-permuting operation is removed from the expression, it can introduce an overflow where there was none. Perhaps the programmer had that apparently useless order-permuting operation precisely to prevent overflow.


I was going to ask, once again, how that is possible, but I think I figured it out while reading this, though you don't explicitly state it.

It can happen if some of the numbers are negative. For a simple example, if overflow occurs at 100, 50+60-20 overflows, but 50-20+60 does not.


It's very hard to be aware of the numerous details you find obvious yourself when you're communicating.

Also, by the way, I have an intuition that the issue might not specifically prevent the removal of a reverse operation. That is to say, if a series with some positive or negative integers does/doesn't overflow under a left fold, I have a hunch that it does/doesn't under a right fold.

If this can be proven, then that specific case is fine over integers (optimizing away reverse).


[1, intmax, -1]


So, yes, one direction overflows and one doesn't, but as long as the contract says overflow works as two's complement, the answer is the same. This is why not having undefined behavior is so useful.
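
A quick demonstration in Haskell, where the fixed-width types from Data.Int wrap modulo 2^n (Int8 tops out at 127):

    import Data.Int (Int8)

    -- One order "overflows" along the way, the other doesn't,
    -- but under wrapping both sums agree:
    demo :: (Int8, Int8)
    demo = (sum [1, maxBound, -1], sum [-1, maxBound, 1])  -- (127, 127)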


It doesn't have to be two's complement; for any modular arithmetic, the answer will be the same.

But that comes at the cost of giving up knowledge of whether it overflowed. In this case it happens to overflow and underflow an equal number of times, so there is no problem. In some cases, all you care about is the answer modulo some number that jibes well with your numbers' representation; in that case, there is also no problem.

In many cases, though, if there is a different number of overflows than underflows, you'd get the wrong answer without knowing it, and that answer is likely (but not guaranteed) to be very wrong.

There are other approaches we can take. Overflow could saturate. In that case, we get the wrong answer in one direction here but it's only off by a little instead of a lot. Overflow could also trap, in which case we'd notice that we had risk of giving the wrong answer.

In either of these cases, we could not apply this optimization unless we allow for some measure of undefined behavior - which is why undefined behavior is so useful. It is absolutely the case that undefined behavior in a specification has significant drawbacks as well - but the issue at hand here is not the result of them.


> And if the behavior of overflow is to trap, then it should trap regardless of the order you add the numbers in.

How come? Suppose A + B doesn't trap, and if T0 = A + B, then T0 + C doesn't trap. Why should (A + B) + C trap?


Say you have a register whose range is -9..+9. (5-1)+5 doesn't trap but (5+5)-1 does.


I understand how to make an example of this; the question is why should (5-1)+5 trap, just because (5+5)-1 does.


Ah, good point! I was thinking of unsigned overflow, but signed overflow is a different matter, as pointed out elsewhere in this thread. I retract my statement about trapping overflow.


Great observations. For the purposes of this exploration, I am entirely assuming arbitrary precision integers (and the math geek in me is really enjoying your use of the adjective "junk", above).

Bigger picture, I want to rewrite "sum(reverse(x))" without explicitly saying so. How do we optimize that expression using only the definitions of sum and reverse? Note that I'm inconsistent in my second example: I expand the definition of reverse() but treat sum() as a primitive. That said, I'm pretty sure the example still works if you substitute the definition of sum as "reduce(add, x)"; the derivation is just longer.


> I want to rewrite "sum(reverse(x))" without explicitly saying so.

You can endow functions with attributes (standard functions in the language library, and possibly user-defined ones). The reverse function can carry the attribute that it only changes the order of the elements of its argument. This is encoded as some boolean predicate: (order-identity-p x) is true if x is reverse, and also if x is other such functions, like the identity function, the sort function, and whatnot.

Then our optimizer looks for the pattern/rewrite (sum (f? arg?)) -> (sum arg?), where there is a semantic constraint: the pattern matches subject to the predicate (order-identity-p f?). This is effectively part of the match; if the predicate fails, there is no match.

Now the sum function could also be "patterned out". There could be a predicate which asserts that certain arguments of a function are sequences whose order doesn't matter to the outcome. This predicate could be list-valued: if a function has no such arguments, it returns nil/false; otherwise, a list of the argument positions.

So then we're looking for the pattern (f? (g? arg?)) -> (f? arg?), which is very general and can be met in lots of ways. (For instance, it also covers the case where f? and g? are the same function and f? is idempotent.) One rule that applies here is: g? satisfies (order-identity-p g?), and (order-irrelevant-args f?) returns (1), where we note that (g? ...) is argument 1 of f?. In general, for any argument expression which is an order-identity-p function call, if that argument's position is in the list returned by order-irrelevant-args on the enclosing function, we can remove the order-altering call from the argument.

We code all that up into the tree walking engine, and then just suitably declare attributes on functions.
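
A toy rendering of that scheme in Haskell (the AST, the attribute tables, and all the names are invented for illustration):

    data Expr = Var String | Call String [Expr]
      deriving (Eq, Show)

    -- Attribute: function only permutes the order of its argument.
    orderIdentityP :: String -> Bool
    orderIdentityP f = f `elem` ["reverse", "sort", "identity"]

    -- Attribute: argument positions whose order doesn't affect the result.
    orderIrrelevantArgs :: String -> [Int]
    orderIrrelevantArgs "sum" = [0]
    orderIrrelevantArgs _     = []

    rewrite :: Expr -> Expr
    rewrite (Call f args) = Call f (zipWith step [0 ..] args)
      where
        step i (Call g [a])
          | orderIdentityP g && i `elem` orderIrrelevantArgs f = rewrite a
        step _ a = rewrite a
    rewrite e = e

    -- rewrite (Call "sum" [Call "reverse" [Var "x"]]) == Call "sum" [Var "x"]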


The premise of the optimizations in this article doesn't always hold, unfortunately. If x is floating point, sum(reverse(x)) is unlikely to equal sum(x) for long enough x, and sorting x beforehand will make this effect even more apparent.


Beat me to it wrt floating point! It's not a field.


Interesting article. Not sure if it meets the "Show HN" guidelines, though -- there's nothing for users to try out or play with. If it's still within the window, maybe change the title. If not, maybe email the moderators.


Thanks! I have a link to code written in Maude at the very bottom; the code needs so much explanation, though. Perhaps I'll highlight the code link at the top of the article?


Yes, that is just the trick. Great post.


Or post a clickable link here in the thread? Explaining the code might be an opportunity to create a rough draft of something more polished. A bit of background on Maude might be nice too. Save readers from Googling and maybe help people understand why it was used. All things that tend to be "intellectually interesting."

Anyway, I was a bit philosophically torn, since the content tends toward the mathematical/algorithmic and made me wonder: what does it mean to play around with mathematics/algorithms?


Totally; great point. My code is here: https://github.com/pschanely/wf-optimizer/blob/master/demo.m...

Maude is a programming language based on rewriting: http://maude.cs.illinois.edu/w/index.php

It's pretty wild; your code doesn't "execute" in the traditional sense; instead patterns match the code and transform it repeatedly until it reaches a "normal form" - one that cannot be rewritten further.

Maude is handy for playing with optimizers, because you very often want to specify behavior in terms of patterns (when you see these functions together, replace them with this...)
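
The driving loop amounts to something like this (a toy rendering in Haskell rather than Maude):

    -- Apply one rewriting pass repeatedly until a fixed point
    -- (the normal form) is reached.
    normalize :: Eq a => (a -> a) -> a -> a
    normalize step x
      | x' == x   = x
      | otherwise = normalize step x'
      where x' = step x

    -- normalize (\n -> if even n then n `div` 2 else n) 40 == 5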


I hadn't heard of Maude before, thanks! For those interested in this kind of thing, another well-known term-rewriting language is Pure (http://purelang.bitbucket.org/), which grew out of an earlier project called Q (http://q-lang.sourceforge.net/).



