Hacker News

I don’t know enough about these implementations to know whether this can be interpreted as a blanket ‘conditionals are fine’ or, rather, as ‘ternary operations which select between two expressions that are themselves non-branching are fine’.

Like, does this apply if one of the two branches of a conditional is computationally much more expensive? My (very shallow) understanding was that having, e.g., a return statement on one branch and a bunch of work on the other would hamstring the GPU’s ability to optimize execution.



A GPU/SIMT branch works by running both sides, unless all threads in the thread group (warp/wavefront) make the same branch decision. As long as both paths have at least one thread, the GPU will run both paths sequentially and simply set the active mask of threads for each side of the branch. In other words, the threads that don’t take a given branch sit idle while the active threads do their work. (Note “sit idle” might involve doing all the work and throwing away the result.)
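To make the cost model concrete, here is a rough sketch in plain Python (not GPU code). The function name and the cycle counts are made up for illustration, and it assumes a simple lockstep model where a side of the branch costs its full price as soon as one lane takes it.

```python
def warp_branch_cost(takes_branch, cost_then, cost_else):
    """Cost of an if/else for one thread group under a simple lockstep model.

    takes_branch: one boolean per lane (True = lane takes the 'then' side).
    cost_then / cost_else: hypothetical cycle counts for each side.
    """
    cost = 0
    if any(takes_branch):        # at least one lane is active on the 'then' side
        cost += cost_then
    if not all(takes_branch):    # at least one lane is active on the 'else' side
        cost += cost_else
    return cost


WARP = 32  # assumed group size; 64 on some hardware

uniform = [True] * WARP                     # every lane makes the same decision
divergent = [True] + [False] * (WARP - 1)   # a single lane disagrees

print(warp_branch_cost(uniform, 100, 5))    # only the 'then' side runs
print(warp_branch_cost(divergent, 100, 5))  # both sides run, one after the other
```

Under this model a single disagreeing lane is enough to make the whole group pay for both sides.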

If you have two branches, and one is trivial while the other is expensive, and if the compiler doesn’t optimize away the branch already, it may be better for performance to write the code to take both branches unconditionally, and use a conditional assignment at the end.
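A minimal sketch of that rewrite, again in plain Python with made-up function names. The conditional expression at the end stands in for a hardware select; the Python interpreter itself still branches, so this only shows the shape of the transformation, not its performance.

```python
import math


def branchy(x):
    # Early return on one side, a bunch of work on the other.
    if x < 0.0:
        return 0.0
    return math.sqrt(x) * 2.0


def compute_both_then_select(x):
    # Evaluate both sides unconditionally...
    cheap = 0.0
    # ...which means the side a lane does not want must still be safe to run,
    # hence the clamp before the square root.
    expensive = math.sqrt(max(x, 0.0)) * 2.0
    # ...and pick the result with a conditional assignment at the end.
    return cheap if x < 0.0 else expensive


for x in (-4.0, 0.0, 9.0):
    assert branchy(x) == compute_both_then_select(x)
```

The catch is visible in the clamp: once both sides always run, each one has to be valid for inputs it would never have seen behind the branch.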

It’s worth knowing that often there are clever techniques to completely avoid branching. Sometimes these techniques are simple, and sometimes they’re invasive and difficult to implement. It’s easy (for me, anyway) to get stuck thinking in a single-threaded CPU way and not see how to avoid branching until you’ve bumped into and seen some of the ways smart people solve these problems.


A real branch is useful if you can realistically skip a bunch of work, but that requires all the lanes to agree. On a GPU, that means all 32 to 64 lanes in the group have to make the same decision. Also, for something basic like a few arithmetic ops, there is no point in branching at all.
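A back-of-the-envelope way to see how strict "all lanes agree" is, as a small Python calculation. It assumes each lane independently gets to skip with the same probability, which is a simplification: real workloads are usually more coherent than that, so treat these numbers as a pessimistic bound. The function name is made up.

```python
def p_whole_group_skips(p_lane_skips, lanes):
    """Chance that every lane in the group skips, assuming independent lanes."""
    return p_lane_skips ** lanes


for lanes in (32, 64):
    # Even when 90% of individual lanes could skip, the group rarely can.
    print(lanes, p_whole_group_skips(0.9, lanes))
```

With independent lanes and a 90% per-lane skip rate, a 32-wide group skips the work only a few percent of the time, and a 64-wide group far less often.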



