
I'm sure TFA's conclusion is right; but its argument would be strengthened by providing the codegen for both versions, instead of just the better version. Quote:

"The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower than the original version [...] wasting two multiplications and one or two additions. [...] But don't take my word for it, let's look at the generated machine code for the relevant part of the shader"

—then proceeds to show only one codegen: the one containing no multiplications or additions. That proves the good version is fine; it doesn't yet prove the bad version is worse.
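For reference, the two source-level forms being compared look roughly like this (a GLSL sketch with made-up names, not TFA's exact code):

    // original: a conditional, which shader compilers lower to a
    // select instruction rather than a branch
    float result = x > threshold ? value : 0.0;

    // the proposed "optimization": step() plus a multiply
    float result2 = step(threshold, x) * value;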



The main point is that the conditional didn't actually introduce a branch.

Showing the other generated version would only show that it's longer; it's not expected to contain a branch either. So I don't think it would have added much value.
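For what it's worth, step() itself is defined in the GLSL spec as a compare that returns 0.0 or 1.0, so it lowers to the same kind of select. A rough equivalent:

    // step(edge, x) returns 0.0 if x < edge, else 1.0 -- i.e. a
    // branchless compare-and-select, not a branch
    float step_equiv(float edge, float x) {
        return x < edge ? 0.0 : 1.0;
    }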


But it's possible that the compiler is smart enough to optimize the step() version down to the same code as the conditional version. If true, that still wouldn't justify using step(), but it would mean that the step() version isn't "wasting two multiplications and one or two additions" as the post says.

(I don't know enough about GPU compilers to say whether they implement such an optimization, but if step() abuse is as popular as the post says, then they probably should.)


Okay, but how does this help the reader? If the worse code happens to optimize to the same thing, it's still awful and you get no benefit. It's also unlikely to optimize down unless fast-math is enabled, because the extra float ops have to be preserved to stay IEEE 754 compliant.


Fragment and vertex shaders generally don't target strict IEEE 754 compliance by default. Transforming a * (b ? 1.0 : 0.0) into b ? a : 0.0 is absolutely something you can expect a shader compiler to do; it only requires assuming a is not NaN.
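A small sketch of why the not-NaN assumption matters (hypothetical helper functions, GLSL):

    // when b is false and a is NaN, the two forms disagree:
    float mul_form(float a, bool b) { return a * (b ? 1.0 : 0.0); } // NaN * 0.0 == NaN
    float sel_form(float a, bool b) { return b ? a : 0.0; }         // 0.0
    // so the rewrite is valid exactly when the compiler may assume a is
    // not NaN, which shader compilers typically do by default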


How is it awful if it has the same result?


Because it perpetuates a misconception and is harder to read.


Just look at it.


Unless you’re writing an essay on why you’re right…


> Unless you’re writing an essay on why you’re right…

He's writing an essay on why they are wrong.

"But here's the problem - when seeing code like this, somebody somewhere will invariably propose the following "optimization", which replaces what they believe (erroneously) are "conditional branches" by arithmetical operations."

Hence his branchless codegen samples are sufficient.

Further, regarding the side issue "The second wrong thing with the supposedly optimizer [sic] version is that it actually runs much slower", no amount of codegen is going to show lower /speed/.


The other version either optimizes to the same thing or has an additional multiplication, and it's definitely less readable.


Correct: it would show proof instead of leaving it up to the reader to believe them.


You missed the second part, where the article says that "it actually runs much slower than the original version", "wasting two multiplications and one or two additions", based on the idea that the compiler is unable to do a very basic optimization, implying that the compiler will actually multiply by one. No benchmarks, no checking of the assembly, just straightforward misinformation.



There are 10 types of people in this world. Those who can extrapolate from missing data, and


Making assumptions about performance when you can measure is generally not a good idea.


and what? AND WHAT?



