It's always been 10-20% slower than CUDA, and frankly, NVIDIA doesn't have an incentive to make it any faster than that.
On the other hand, I believe Google is working on a CUDA compiler [1], so we may actually see meaningful improvement: in particular, it may become possible to run CUDA on other GPUs. (Edit: And Google actually has an incentive to achieve performance parity, so it might really happen.)
> On the other hand, I believe Google is working on a CUDA compiler [1]
Hi, I'm one of the developers of the open-source CUDA compiler.
It's not actually a separate compiler, despite what that paper says. It's just plain, vanilla, open-source clang. Download or build the latest version of clang, give it a CUDA file, and away you go. That's all there is to it.
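For anyone who wants to try it, a minimal sketch (the file name, GPU arch, and CUDA install path below are assumptions; adjust for your setup per clang's CUDA docs):

```cuda
// axpy.cu -- trivial kernel to check that plain clang can build CUDA code.
// clang pre-includes the CUDA runtime headers when compiling .cu files.
#include <cstdio>

__global__ void axpy(float a, float* x, float* y) {
  int i = threadIdx.x;
  y[i] = a * x[i] + y[i];
}

int main() {
  float x[4] = {1, 2, 3, 4}, y[4] = {0, 0, 0, 0};
  float *dx, *dy;
  cudaMalloc(&dx, sizeof(x));
  cudaMalloc(&dy, sizeof(y));
  cudaMemcpy(dx, x, sizeof(x), cudaMemcpyHostToDevice);
  cudaMemcpy(dy, y, sizeof(y), cudaMemcpyHostToDevice);
  axpy<<<1, 4>>>(2.0f, dx, dy);
  cudaMemcpy(y, dy, sizeof(y), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 4; ++i) printf("%g\n", y[i]);
  return 0;
}
```

Built with stock clang, no nvcc involved (the arch and CUDA path are setup-specific):

clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_60 --cuda-path=/usr/local/cuda -L/usr/local/cuda/lib64 -lcudart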
In terms of compiling CUDA for other GPUs, that's not something I've worked on, but judging from the commits going into clang and LLVM, other people are quite interested in making this work.
This is an untrue yet often-repeated claim. For example, Hashcat migrated its CUDA code to OpenCL some time ago with zero performance hit. What is true is that Nvidia's OpenCL stack is less mature than its CUDA stack. But you can write OpenCL code that performs just as well as CUDA.
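To make the comparison concrete, here is the same trivial kernel in both dialects (names are illustrative; host-side setup omitted). The device code maps almost one-to-one, which is part of why a straight port like Hashcat's need not lose performance:

```cuda
// CUDA version:
__global__ void scale(float* v, float a) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  v[i] *= a;
}
```

```c
/* OpenCL C version: */
__kernel void scale(__global float* v, float a) {
  size_t i = get_global_id(0);
  v[i] *= a;
}
```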
A password-cracking utility, and it was put forth as at least one example of a real-world application purported to perform just as well under OpenCL as under CUDA. If true, it provides evidence against the claim that "[OpenCL]'s always been 10-20% slower than CUDA".
10-20% slower seems an honest delta; I can't blame a company for working more on its own ideas as long as it provides a standardized, non-crippled solution.