If I remember correctly, about 80% of a modern FPGA's silicon is used for interconnect. FPGAs have their uses, and very often a big part of that is the field programmability. If that is not required, there is no good reason another solution (ASIC, GPU, etc.) couldn't beat the FPGA in theory. Now, in practice there are some niches where this is not absolutely true, but I agree with GP that I see challenges for deep learning.
An ASIC will always have better performance than an FPGA, but it will have an acceptable cost only if it is produced in large enough numbers. You will always want an ASIC, but only seldom will you be able to afford it.
So the decision of ASIC vs. FPGA is trivial: it is always based on the estimated price of the ASIC, which depends on the number of ASICs that would be needed.
The decision between off-the-shelf components, i.e. GPUs and FPGAs, is made based on performance per dollar and performance per watt, and it depends very strongly on the intended application. If the application must compute many operations on wider number formats, e.g. FP32 or FP16, then it is unlikely that an FPGA can compete with a GPU. When arithmetic does not form the bulk of an algorithm, an FPGA may be competitive, but a detailed analysis must be made for any specific application.
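To make that concrete, here is a minimal sketch of the kind of back-of-the-envelope comparison I mean; the throughput, price, and power figures below are made-up placeholders, not benchmarks, and would have to be replaced by measurements for your actual workload:

```python
# Back-of-the-envelope perf/$ and perf/W comparison.
# All numbers are hypothetical placeholders, not measured results.

candidates = {
    # name: (throughput in inferences/s, price in USD, power in W)
    "GPU":  (10_000, 8_000, 300),
    "FPGA": ( 4_000, 6_000,  75),
}

for name, (throughput, price, power) in candidates.items():
    print(f"{name}: {throughput / price:.2f} inf/s per $, "
          f"{throughput / power:.1f} inf/s per W")
```

With these particular (invented) numbers the GPU wins on perf/$ while the FPGA wins on perf/W, which is exactly why the answer depends on the application and deployment constraints.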
I'm definitely not! I'm a hardware designer and I work with FPGAs all the time, both for work and for personal projects. Like with all things, there's a right tool for every job, and I think for modern DL algorithms like Transformers, GPUs and AI ASICs are the better tools. For rapid hardware prototyping, or for implementing specialized architectures, FPGAs are far better.
Large, fast FPGAs are great but very expensive; small, slow FPGAs are not practical for most solutions, where significantly cheaper ARM microcontrollers are used instead.
500 GB/s is going to limit it to at best 1/4 the DL performance of an NVIDIA GPU. I'm not sure what the floating point performance of these FPGAs is, but I imagine that might also set a fundamental performance limit at a small fraction of a GPU.
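The 1/4 figure follows from a memory-bound argument: for bandwidth-limited DL kernels, achievable throughput scales roughly with memory bandwidth. A quick sketch of the arithmetic, assuming roughly 2 TB/s of HBM on the GPU side (about what an A100 80GB provides):

```python
# Bandwidth-bound ceiling for memory-limited kernels.
fpga_bw_gbs = 500    # GB/s, figure quoted above
gpu_bw_gbs = 2000    # GB/s, ~A100 80GB HBM2e (assumption)

print(fpga_bw_gbs / gpu_bw_gbs)  # 0.25 -> at best ~1/4 of the GPU
```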
Well, I keep seeing all models quantized, and for 2-bit, 4-bit and 1-bit quantizations I had very good inference performance (either throughput or latency) on CNNs and some RNNs on Alveo boards using FINN (so, mostly high level synthesis and very little actual FPGA wrangling). No idea about the current status of all these, will read the paper though :-)
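For anyone unfamiliar with what those low-bit quantizations amount to, here is a minimal generic sketch of symmetric uniform weight quantization in NumPy. This is only an illustration of the idea, not FINN's or Brevitas's actual implementation:

```python
import numpy as np

def quantize_weights(w, bits):
    """Symmetric uniform quantization of a weight tensor to `bits` bits.

    Works for bits >= 2; true 1-bit (binary) nets typically use
    sign(w) times a scale instead.
    """
    qmax = 2 ** (bits - 1) - 1            # e.g. 1 for 2-bit, 7 for 4-bit
    scale = np.max(np.abs(w)) / qmax      # per-tensor scale factor
    q = np.clip(np.round(w / scale), -qmax, qmax).astype(np.int8)
    return q, scale                        # integers + scale for dequant

w = np.random.randn(64, 64).astype(np.float32)
q4, s4 = quantize_weights(w, bits=4)
print("4-bit mean abs error:", np.abs(w - q4 * s4).mean())
```

The point for FPGAs is that once weights are 2- or 4-bit integers, the multiply-accumulate hardware can be built from LUTs and narrow DSP slices instead of full floating point units, which is where the throughput/latency wins come from.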
You can rent high end FPGAs on AWS (https://github.com/aws/aws-fpga); there is no better time to get into FPGAs. On the low end there is the excellent https://hackaday.com/2019/01/14/ulx3s-an-open-source-lattice...
Modern FPGA platforms like the Xilinx Alveo U55C have 35 TB/s of on-chip SRAM bandwidth and 460 GB/s of HBM bandwidth. https://www.xilinx.com/products/boards-and-kits/alveo/u55c.h...