With one notable detail: It's much easier to stream data through the GPU than it is through an FPGA.
That's an interesting assertion. FPGAs are better at moving data from one place to another than any other general-purpose device I can think of.
How many GPUs have Xilinx GTx-class transceivers, for instance? A GPU with JESD204B/C connectivity would be an extremely interesting piece of hardware.
"Easier" as I'm using it is a metric of engineering effort more than throughput. I don't disagree: an FPGA has vastly more aggregate bandwidth available, both internally and on external interfaces.
But it's vastly easier to get started with CUDA or OpenCL than it is to get started with a big FPGA.