When I first looked at this, there was a comment stating that (from memory) despite ~10 years of research and numerous PhDs, papers and conference talks, Polly just doesn't work for production code. That comment is now gone.
Was it retracted because it was inaccurate? Seems to me like an important part of the conversation, whether true or not.
Yes, but the same goes for the classical loop optimizations that this is aiming to improve on. They clearly do work (to a lesser extent?) for production code, just not the sort of code that many commentators seem to think is all that matters. Linear algebra does crop up all over the place.
(In contrast, I would say gcc's -floop-nest-optimize "experimental" implementation doesn't work on matmul, at least.)
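For concreteness, the kind of loop nest I have in mind is the textbook triple loop below. This is only a sketch: `matmul_naive` is a name I made up for illustration, and the row-major layout is an assumption. One way to compare would be to build it with something like `clang -O3 -mllvm -polly` and with `gcc -O3 -floop-nest-optimize` (the latter needs a GCC built with isl) and look at what each produces.

```c
#include <stddef.h>

/* Naive row-major matrix multiply: C = A * B, all matrices n x n.
   matmul_naive is a hypothetical name used only for illustration. */
void matmul_naive(size_t n, const double *A, const double *B, double *C)
{
    for (size_t i = 0; i < n; i++) {
        for (size_t j = 0; j < n; j++) {
            double sum = 0.0;
            /* The inner loop walks B with stride n, which is the
               cache-unfriendly access a loop-nest optimizer would
               want to tile or interchange away. */
            for (size_t k = 0; k < n; k++)
                sum += A[i * n + k] * B[k * n + j];
            C[i * n + j] = sum;
        }
    }
}
```

Whether a given compiler actually transforms this is exactly the point in dispute, so I wouldn't read anything into the code beyond "this is the shape of the problem".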
True.
Almost everyone who does matrix/vector operations in production code uses libraries though. That is why it is tricky to see results from these kinds of projects on production code.
I looked up the Polly reference on this: it discusses BLIS missing optimized implementations, and GCC and clang apparently failing on BLIS-type loops. (It mentioned examples of people in national labs not using the relevant libraries, not that I'd defend that.) It might also have been about other types of kernel that don't exist for ARM or POWER, say. Surely it would be good to have an effective general-purpose tool (a compiler) that could replace laborious hand-optimization or the building of special-purpose tools.