I'd like to compare parallel CPU and GPU versions with Thrust: http://thrust.github.io
When is a vector big enough that it is worth processing on the GPU?
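For context, here is a minimal sketch of the kind of comparison I have in mind (the operation and sizes are just placeholders): the same element-wise transform on a `thrust::host_vector` (CPU) and a `thrust::device_vector` (GPU), timed separately so the GPU timing includes the host/device copies.

```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/functional.h>
#include <chrono>
#include <iostream>

int main() {
    const std::size_t n = 1 << 20;  // vector size to vary in the experiment
    thrust::host_vector<float> h_x(n, 1.0f), h_y(n, 2.0f);

    // CPU version: with host iterators, Thrust dispatches to the host backend.
    auto t0 = std::chrono::high_resolution_clock::now();
    thrust::transform(h_x.begin(), h_x.end(), h_y.begin(), h_y.begin(),
                      thrust::plus<float>());
    auto t1 = std::chrono::high_resolution_clock::now();

    // GPU version: the copies to and from the device are included in the
    // timing, since that transfer cost is exactly what the question is about.
    auto t2 = std::chrono::high_resolution_clock::now();
    thrust::device_vector<float> d_x = h_x, d_y = h_y;
    thrust::transform(d_x.begin(), d_x.end(), d_y.begin(), d_y.begin(),
                      thrust::plus<float>());
    thrust::host_vector<float> h_result = d_y;
    auto t3 = std::chrono::high_resolution_clock::now();

    std::cout << "CPU: "
              << std::chrono::duration<double>(t1 - t0).count() << " s\n"
              << "GPU (incl. transfers): "
              << std::chrono::duration<double>(t3 - t2).count() << " s\n";
}
```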
It depends on the type of vector operation(s) you are doing and on the machine you are on. For one-off vector operations, it is never worth the cost of transferring the data to the GPU.
If you are going to have a lot of temporary vectors as part of a larger algorithm, it is usually beneficial to copy the inputs once, do all the computations on the GPU, and copy the results back, as sketched below.
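A sketch of that "copy once, compute everything on the GPU, copy back once" pattern (the specific operations are made up for illustration): intermediates stay in `thrust::device_vector`s, so only the first copies and the final result cross the PCIe bus.

```cpp
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/transform.h>
#include <thrust/reduce.h>
#include <thrust/functional.h>

float pipeline(const thrust::host_vector<float>& h_a,
               const thrust::host_vector<float>& h_b) {
    // One host-to-device copy per input.
    thrust::device_vector<float> d_a = h_a;
    thrust::device_vector<float> d_b = h_b;
    thrust::device_vector<float> d_tmp(d_a.size());  // temporary lives on the GPU

    // Chain several operations without touching host memory in between.
    thrust::transform(d_a.begin(), d_a.end(), d_b.begin(), d_tmp.begin(),
                      thrust::multiplies<float>());
    thrust::transform(d_tmp.begin(), d_tmp.end(), d_a.begin(), d_tmp.begin(),
                      thrust::plus<float>());

    // Only the final scalar result comes back to the host.
    return thrust::reduce(d_tmp.begin(), d_tmp.end(), 0.0f,
                          thrust::plus<float>());
}
```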
You would still need to transfer the data from the CPU's L1/L2 caches to the GPU (much as data moves for inter-core communication). While that is cheaper than a copy over the PCIe bus, it is not free.