It's always been 10-20% slower than CUDA, and frankly, NVIDIA doesn't have an incentive to make it any faster than that.
On the other hand, I believe Google is working on a CUDA compiler [1], so we may actually see meaningful improvement: in particular, it may become possible to run CUDA on other GPUs. (Edit: And Google actually has an incentive to achieve performance parity, so it might really happen.)
> On the other hand, I believe Google is working on a CUDA compiler [1]
Hi, I'm one of the developers of the open-source CUDA compiler.
It's not actually a separate compiler, despite what that paper says. It's just plain, vanilla, open-source clang. Download or build the latest version of clang, give it a CUDA file, and away you go. That's all there is to it.
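For anyone who wants to try it, a minimal sketch (the file name, GPU arch, and CUDA install path below are assumptions; adjust for your setup per clang's CUDA docs):

```cuda
// axpy.cu -- trivial kernel to check that plain clang can build CUDA code.
// clang pre-includes the CUDA runtime headers when compiling .cu files.
#include <cstdio>

__global__ void axpy(float a, float* x, float* y) {
  int i = threadIdx.x;
  y[i] = a * x[i] + y[i];
}

int main() {
  float x[4] = {1, 2, 3, 4}, y[4] = {0, 0, 0, 0};
  float *dx, *dy;
  cudaMalloc(&dx, sizeof(x));
  cudaMalloc(&dy, sizeof(y));
  cudaMemcpy(dx, x, sizeof(x), cudaMemcpyHostToDevice);
  cudaMemcpy(dy, y, sizeof(y), cudaMemcpyHostToDevice);
  axpy<<<1, 4>>>(2.0f, dx, dy);
  cudaMemcpy(y, dy, sizeof(y), cudaMemcpyDeviceToHost);
  for (int i = 0; i < 4; ++i) printf("%g\n", y[i]);
  return 0;
}
```

Built with stock clang, no nvcc involved (the arch and CUDA path are setup-specific):

clang++ axpy.cu -o axpy --cuda-gpu-arch=sm_60 --cuda-path=/usr/local/cuda -L/usr/local/cuda/lib64 -lcudart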
In terms of compiling CUDA for other GPUs, that's not something I've worked on, but judging from the commits going into clang and LLVM, other people are quite interested in making this work.
This is an untrue yet often-repeated claim. For example, Hashcat migrated its CUDA code to OpenCL some time ago with zero performance hit. What is true is that Nvidia's OpenCL stack is less mature than its CUDA stack. But you can write OpenCL code that performs just as well as CUDA.
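To make the comparison concrete, here is the same trivial kernel in both dialects (names are illustrative; host-side setup omitted). The device code maps almost one-to-one, which is part of why a straight port like Hashcat's need not lose performance:

```cuda
// CUDA version:
__global__ void scale(float* v, float a) {
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  v[i] *= a;
}
```

```c
/* OpenCL C version: */
__kernel void scale(__global float* v, float a) {
  size_t i = get_global_id(0);
  v[i] *= a;
}
```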
A password-cracking utility, and it was put forth as at least one example of a real-world application purported to perform just as well under OpenCL as under CUDA. If true, it provides evidence against the claim that "[OpenCL]'s always been 10-20% slower than CUDA".
10-20% slower seems an honest delta; I can't blame a company for working more on its own ideas as long as it provides a standardized, non-crippled solution.