I'm sure TFA's conclusion is right; but its argument would be strengthened by providing the codegen for both versions, instead of just the better version. Quote:
"The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower than the original version [...] wasting two multiplications and one or two additions. [...] But don't take my word for it, let's look at the generated machine code for the relevant part of the shader"
—then proceeds to show only one codegen: the one containing no multiplications or additions. That proves the good version is fine; it doesn't yet prove the bad version is worse.
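For context, here is roughly what the two versions under discussion look like, as a minimal GLSL sketch (the article's actual shader isn't quoted in this thread, so the function and variable names here are made up):

    // Conditional version: on GPUs this typically compiles to a
    // compare plus a select instruction, not a divergent branch.
    float pick(float x, float threshold, float a, float b)
    {
        return (x >= threshold) ? a : b;
    }

    // step()-based "optimization": step() returns 0.0 or 1.0, and that
    // value is fed into arithmetic, which is where the extra
    // multiplications and additions the article complains about come from.
    float pick_step(float x, float threshold, float a, float b)
    {
        float s = step(threshold, x);   // 0.0 if x < threshold, else 1.0
        return a * s + b * (1.0 - s);   // two multiplies, an add and a subtract
    }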
The main point is that the conditional didn't actually introduce a branch.
Showing the other generated version would only show that it's longer; it is not expected to have a branch either. So I don't think it would have added much value.
But it's possible that the compiler is smart enough to optimize the step() version down to the same code as the conditional version. If true, that still wouldn't justify using step(), but it would mean that the step() version isn't "wasting two multiplications and one or two additions" as the post says.
(I don't know enough about GPU compilers to say whether they implement such an optimization, but if step() abuse is as popular as the post says, then they probably should.)
Okay, but how does this help the reader? If the worse code happens to optimize to the same thing, it's still awful and you get no benefit. And it's likely not to optimize down unless you have fast-math enabled, because the extra float ops have to be preserved to be IEEE754 compliant.
Fragment and vertex shaders generally don't target strict IEEE754 compliance by default. Transforming a * (b ? 1.0 : 0.0) into b ? a : 0.0 is absolutely something you can expect a shader compiler to do - that only requires assuming a is not NaN.
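For concreteness, that rewrite as a minimal GLSL sketch (illustrative names only):

    // What a shader compiler can reasonably be expected to do:
    //   a * (b ? 1.0 : 0.0)   ==>   b ? a : 0.0
    // Under strict IEEE754 the two can differ when a is NaN or infinite
    // (NaN * 0.0 and Inf * 0.0 are both NaN, not 0.0), so a strict
    // compiler would have to keep the multiply; shader compilers
    // generally don't promise that, which is why the fold is plausible.
    float before(float a, bool b) { return a * (b ? 1.0 : 0.0); }
    float after (float a, bool b) { return b ? a : 0.0; }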
> Unless you’re writing an essay on why you’re right…
He's writing an essay on why they are wrong.
"But here's the problem - when seeing code like this, somebody somewhere will invariably propose the following "optimization", which replaces what they believe (erroneously) are "conditional branches" by arithmetical operations."
Hence his branchless codegen samples are sufficient.
Further, regarding the side-issue "The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower", no amount of codegen is going to show lower /speed/.
You missed the second part, where the article says that it "actually runs much slower than the original version", "wasting two multiplications and one or two additions", based on the idea that the compiler is unable to do a very basic optimization, i.e. implying that the compiler will actually multiply by one. No benchmarks, no checking of assembly, just straightforward misinformation.
"The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower than the original version [...] wasting two multiplications and one or two additions. [...] But don't take my word for it, let's look at the generated machine code for the relevant part of the shader"
—then proceeds to show only one codegen: the one containing no multiplications or additions. That proves the good version is fine; it doesn't yet prove the bad version is worse.