
My day job is ASIC design and we do some prototyping on FPGAs, so the exact same RTL is used as an input. We always benchmark power, performance, etc. between ASIC and FPGA, so this is based on some real designs. A 5x reduction in power is fair for most of what I've seen, and the FPGA is actually better at achieving FMAX than you'd expect - control paths do need a lot more pipelining than on ASIC, but compute-intensive (DSP) datapaths are pretty good with a few tweaks. I think sometimes people throw code at them, get 100 MHz, and say "well, FPGAs are slow, so it's expected", but in my experience with a little tuning you can get most datapaths to run at 500 MHz. You do pay the power penalty vs a dedicated ASIC, but the performance is very good.


I think it depends a great deal on what you're doing. A fully pipelined double-precision floating-point fused multiply-add in FPGA tech will reach well over 500 MHz on current parts, but takes almost 30 cycles of pipelined latency to deliver each result. On the same process node, a well-optimized CPU will run at 6-8x the clock frequency and only require 4 cycles of latency to deliver each result.
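To put rough numbers on that, here is a back-of-the-envelope sketch. It assumes round figures taken from the paragraph above (500 MHz and 30 cycles for the FPGA FMA, 6x to 8x that clock and 4 cycles for the CPU); the function name is just for illustration and the real values depend on the part and the core configuration.

```python
def latency_ns(cycles, clock_mhz):
    """Wall-clock latency of one result: cycles divided by clock frequency."""
    return cycles * 1000.0 / clock_mhz

FPGA_MHZ = 500  # assumed round number ("well over 500 MHz" in the comment)

fpga_ns = latency_ns(30, FPGA_MHZ)          # ~30-cycle pipelined FMA
cpu_slow_ns = latency_ns(4, 6 * FPGA_MHZ)   # CPU at 6x the FPGA clock
cpu_fast_ns = latency_ns(4, 8 * FPGA_MHZ)   # CPU at 8x the FPGA clock

print(f"FPGA FMA latency: {fpga_ns:.2f} ns")
print(f"CPU FMA latency:  {cpu_fast_ns:.2f} to {cpu_slow_ns:.2f} ns")
print(f"Ratio: {fpga_ns / cpu_slow_ns:.0f}x to {fpga_ns / cpu_fast_ns:.0f}x")
```

Under those assumptions a single dependent result takes about 60 ns on the FPGA versus roughly 1 to 1.3 ns on the CPU, which is where the gap in units of time comes from.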

Is this flow filled with divide-and-conquer algorithms with very low work per step? Yes. Is that particularly ill-suited to FPGA logic? Yes. Is it unfair to the FPGA? Not in my opinion.

I stand by my claim: If you normalize a general circuit's speed in units of time instead of cycles, then you'll find that ASICs come out much much farther ahead.


This [0] suggests the Xilinx floating-point core can run at >600 MHz, and the latency of many operations is just a few cycles. Also, as it's pipelined, the throughput could be one result per clock, depending on how you configure the core. Seems closer to the 5x to me.

[0] https://www.xilinx.com/support/documentation/ip_documentatio...


That chart doesn't show you the result latency, only the maximum achievable frequency. You have to use Vivado to instantiate an instance with the specific suite of configurable options. When you do that, it will inform you of the result latency: 27-30 cycles for FMA.
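The throughput and latency views can both be right, depending on whether the operations depend on each other. A minimal sketch, assuming a fully pipelined unit that accepts one new operation per clock (initiation interval of 1) and the roughly 30-cycle FMA latency quoted above; the function name and the operation count are made up for illustration:

```python
def total_cycles(n_ops, latency_cycles, dependent):
    """Cycles to finish n_ops on a pipelined unit that starts one op per clock.

    dependent=True:  each op needs the previous result, so the pipeline
                     drains every time and the full latency is paid per op.
    dependent=False: ops are independent, so the latency is paid once and
                     one result then emerges every cycle.
    """
    if n_ops == 0:
        return 0
    if dependent:
        return n_ops * latency_cycles
    return latency_cycles + (n_ops - 1)

N_OPS = 1000   # hypothetical workload size
LATENCY = 30   # assumed FMA latency in cycles

streaming = total_cycles(N_OPS, LATENCY, dependent=False)
chained = total_cycles(N_OPS, LATENCY, dependent=True)
print(f"independent ops: {streaming} cycles")
print(f"dependent chain: {chained} cycles")
```

With independent operations the pipeline depth barely matters, which matches the one-result-per-clock argument. With a chain of dependent operations, such as a divide-and-conquer flow with little work per step, every step pays the full latency.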



