
I'd like to compare parallel CPU and GPU versions with Thrust: http://thrust.github.io. When is a vector big enough that it's worth processing on the GPU?



It depends on the type of vector operation(s) you are doing and the machine you are on. For one-off vector operations it is never worth making a transfer to the GPU.

If you are going to have a lot of temporary vectors as part of a larger algorithm, it is usually beneficial to copy the inputs once, do all the computations on the GPU, and copy the results back.
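A minimal Thrust sketch of that pattern, assuming a simple negate-then-sum pipeline as a stand-in for a real algorithm: the input crosses the bus once, every intermediate stays in a `thrust::device_vector`, and only a scalar comes back.

```cuda
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>
#include <cstdio>

int main() {
    thrust::host_vector<float> h(1 << 20, 1.0f);

    // One host-to-device copy for the whole pipeline.
    thrust::device_vector<float> d = h;
    thrust::device_vector<float> tmp(d.size());

    // Several device-side operations; the temporary never leaves the GPU.
    thrust::transform(d.begin(), d.end(), tmp.begin(),
                      thrust::negate<float>());
    float sum = thrust::reduce(tmp.begin(), tmp.end(), 0.0f,
                               thrust::plus<float>());

    // Only the scalar result crosses back over the bus.
    printf("sum = %f\n", sum);
    return 0;
}
```

The same code runs on the CPU by switching Thrust's device backend (e.g. compiling with `-DTHRUST_DEVICE_SYSTEM=THRUST_DEVICE_SYSTEM_OMP`), which makes it convenient for the CPU-vs-GPU comparison the parent asks about.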


With the caveat that integrated GPUs with unified memory can skip the copy.


You would still need to transfer the data from the cores' L1/L2 caches to the GPU (just as for inter-core communication). While cheaper than a copy through the PCI bus, it is not free.


I'm really curious how you would do that. What kind of APIs can express this? Can they detect unified integrated GPUs?
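In CUDA, the runtime exposes this through device properties: `cudaDeviceProp::integrated` is set for GPUs that share physical memory with the CPU, and `canMapHostMemory` indicates that zero-copy mapping is available. A hedged sketch (OpenCL has an analogous query, `CL_DEVICE_HOST_UNIFIED_MEMORY`):

```cuda
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, 0);

    if (prop.integrated && prop.canMapHostMemory) {
        // Pinned, mapped allocation: the GPU reads host memory in
        // place, so no explicit copy is needed.
        float* h;
        cudaHostAlloc(&h, 1024 * sizeof(float), cudaHostAllocMapped);
        float* d;
        cudaHostGetDevicePointer(&d, h, 0);
        // ... launch kernels on d ...
        cudaFreeHost(h);
    } else {
        printf("Discrete GPU: explicit copies (or managed memory) needed\n");
    }
    return 0;
}
```

`cudaMallocManaged` is another option: it gives a single pointer usable from both sides, and on unified-memory hardware the driver can avoid physical copies entirely.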



