>*I doubt, that you will find it easy to beat well written numeric Python co...

dchichkov · on March 28, 2013

Let's get down to a concrete example and consider a typical numeric problem, say estimating the gradient of a cross-entropy cost of some function on your data.

With Python (and the Theano computation engine) you can write something along these lines:

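    from theano.tensor import dot, exp, grad, log, mean

    def cost(goal, prediction):
        # binary cross entropy between 0/1 goals and predicted probabilities
        crossEntropy = -goal * log(prediction) - (1 - goal) * log(1 - prediction)
        return mean(crossEntropy)

    # x, y, w, b are Theano variables defined elsewhere; prediction stays a
    # probability here, since thresholding it with "> 0.5" is not differentiable
    prediction = 1 / (1 + exp( ....  -dot(x,w) - b + ...))
    gradW, gradB = grad(cost(y, prediction) + (w**2).sum(), [w,b])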

And then just apply that function to a matrix containing your data. That's it. When you apply the function, it is interpreted and converted into a computation graph, the graph is optimized, parts of the computation are offloaded to the GPU with memory transfers between the host and the GPU taken care of, and you get the result back as a user-friendly and efficient NumPy array.
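To make concrete what `grad` has to work out in the snippet above, here is a rough NumPy-only sketch of the same cost with a hand-derived gradient, checked against finite differences. It is not Theano code and does nothing on the GPU. It assumes the elided terms inside `exp(...)` are absent, keeps the `(w**2).sum()` penalty as written, and uses made-up random data; the helper names (`sigmoid`, `cost_grad`, `numeric_grad`) are mine, purely for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cost(w, b, x, y):
    """Mean cross entropy of a logistic model plus the (w**2).sum() penalty."""
    p = sigmoid(x @ w + b)  # same as 1 / (1 + exp(-dot(x, w) - b))
    cross_entropy = -y * np.log(p) - (1 - y) * np.log(1 - p)
    return cross_entropy.mean() + (w ** 2).sum()

def cost_grad(w, b, x, y):
    """Hand-derived gradient of cost with respect to w and b."""
    p = sigmoid(x @ w + b)
    err = (p - y) / len(y)       # d(mean cross entropy) / d(x @ w + b)
    grad_w = x.T @ err + 2 * w   # chain rule through dot(x, w), plus penalty term
    grad_b = err.sum()
    return grad_w, grad_b

def numeric_grad(f, v, eps=1e-6):
    """Central finite differences, used only to check the analytic gradient."""
    v = np.asarray(v, dtype=float)
    g = np.zeros_like(v)
    for i in range(v.size):
        step = np.zeros_like(v)
        step.flat[i] = eps
        g.flat[i] = (f(v + step) - f(v - step)) / (2 * eps)
    return g

# Hypothetical data: 200 rows, 5 features, random 0/1 labels.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 5))
y = (rng.random(200) < 0.5).astype(float)
w = rng.normal(size=5) * 0.1
b = 0.0

grad_w, grad_b = cost_grad(w, b, x, y)
check_w = numeric_grad(lambda v: cost(v, b, x, y), w)
check_b = numeric_grad(lambda v: cost(w, v[0], x, y), np.array([b]))[0]
```

Deriving `cost_grad` by hand is the step that a symbolic `grad` call saves you, and it is also the step that has to be redone every time the model changes.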

The resulting computation will be nearly optimal, limited by memory bandwidth, CPU-GPU bus bandwidth and the GPU's FLOPS rate. With luck you can get close to the theoretical maximum of your GPU's floating-point performance. And all of it done in a few lines of Python.

Now consider the same in C++. Yes, it can be done. But there are just no open-source libraries available that can do that. The closest open-source implementation that I know of is gpumatrix, a port of the C++ Eigen library to the GPU. And it doesn't even come close to what is available in Python. So with C++, if you want to match the performance of these few lines of Python code, good luck studying CUDA or OpenCL and getting the computation engine right the first time.

(disclaimer) I'm not in any way affiliated with OP and I actually use (and like) C/C++ a lot.