Everybody seems to view this as AMD mimicking Intel when it acquired Altera. (That acquisition has not borne visible fruit.)
My contrarian speculation is that this is a move driven by Xilinx vs. Nvidia, given Nvidia’s purchase of Arm and Xilinx’s push into AI/ML. Xilinx is threatened by Nvidia’s move given their dependence on Arm processors in their SoC chips and their ongoing fight in the AI/ML (including autonomous vehicles) product space. My speculation is that this gives Xilinx alternative high-performance AMD64 (and possibly lower-performance, lower-power x86) "hard cores" to displace the Arm cores.
I think you're onto something here. AMD is likely seeing the end of the road for their CPU business within the next decade, since it will run up against physics and truly insane cost structures that will come after 5nm. At the same time we're far past the practical limit wrt ISA complexity (as evidenced by periodic lamentations about AVX512 on this site). The only real way to go past all of that right now is specialized compute, reconfigurable on demand, deployment of which is hampered by the fact that it's very expensive and not integrated into anything, so the decision to use it is very deliberate, which in practice means it rarely ever happens at all. Bundle a mid-size FPGA as a standardized chiplet on a CPU, integrate it well, provide less painful tooling, and that will change. Want hardware FFT? You got it. Want hardware TPU for bfloat16? You got it. Want it for int8? You got it. Think of just being able to add whatever specialized instruction(s) you want to your CPU.
I'm not sure this is worth $35B, but if Lisa Su thinks so, it probably is. She's proven herself to be one of the most capable CEOs in tech.
Also, the advantage Altera supposedly got after being acquired by Intel was better fab integration with what was then the best process technology available (High-end FPGAs genuinely need good processes).
1. That's no longer the case, so sucks for Altera / Intel
2. AMD doesn't have a fab, so any advantages are necessarily on the design / architecture / integration side.
My read on the Altera acquisition was that Intel needed to shore up fab volumes in the face of their foundry customers jumping to TSMC first chance they could. As the capital required per node continues to rise exponentially, they need more and more volume to amortize that over. This is also why they're trying to get into GPUs again.
To slightly refine this, Intel didn't have many "foundry customers" before Altera. Via Wikipedia (https://en.wikipedia.org/wiki/Intel#Opening_up_the_foundries...), the need to fill up the manufacturing lines was engendered by poor x86 CPU sales around ~2013, not poor third-party fab runs. In 2013, Intel was still ahead of TSMC with 22 nm.
Didn't they also drag down Panasonic or something? I remember there being a lot of rumors of them being an extremely bad partner at the time: basically no technical support, unreliable capacity due to their core business taking priority, and not giving two shits about the success of the venture in general.
Altera was already an Intel Custom Foundry partner before the acquisition. I'm not sure the 2015 acquisition was necessary to accomplish what you are describing.
I don’t think NVidia/ARM would affect Xilinx much. Given what the bulk of FPGAs are doing in the data center, I think AMD was looking more at NVidia/Mellanox and of course Intel/Altera, but for networking, not compute. For Xilinx, this gives a path to board-level integration with x86.
Intel had some heat problems when they tried this. The FPGAs weren't able to use their heat budget dynamically, and as a result, the whole SiP had bad performance.
It would be interesting to get a CPU, GPU and FPGA in one package. ML abstractions could be sent to the most sensible implementation. Maybe handheld mobile SDR transceivers could benefit from such an arrangement too. AMD has had some great low-power parts, and maybe they'll drive for that here?
> My speculation is that this gives Xilinx alternative high-performance AMD64 (and possibly lower-performance, lower-power x86) "hard cores" to displace the Arm cores.
I'm trying to imagine an x86-based UltraScale+ style processor.
Hopefully AMD can help fix the mess known as Vivado and the PetaLinux tools. lol.
Being early on the ML hardware acceleration boat is going to pay off astronomically. Embedded inferencing is going to be a society defining technology by the time we hit mid-decade. It's already being used for predictive maintenance of machinery in IIoT with huge payoffs via decrease in unforeseen total machine failure or need of heavy overhauls.
It really blows my mind how many people are still bearish on ML. It’s fair to argue timelines (although even that is becoming less true), but I think the evidence is firmly on the side of the bulls now.
Soft cores use the configurable logic matrix of the FPGA. You can choose to implement them or not, depending on your use case. They can also be tuned to the use case, adding or modifying CPU instructions, cache structure, etc. This involves writing RTL code, with all the design, verification and backend synthesis work that comes with that. Tools like Synopsys ASIP Designer try to help with this effort.
Hard cores are not part of the configurable logic matrix but are separate resources on the FPGA. That means they can't be tuned to the use case in the same way as a soft core. The trade-off is that they typically are better optimized with regards to clock frequency and power consumption since the components are made to be a CPU and not generic configurable logic. One example of an FPGA with a hard core CPU would be the Xilinx Zynq devices.
An FPGA is (to a very crude approximation) just a bunch of static RAM organized in an unusual way. If you think of normal static RAM as "address wires go in one side and data wires come out the other", in an FPGA there are no "address wires" -- it's all data in/data out. The memory cells are still just memory cells; what we label the wires is merely a matter of engineering perspective. In a Xilinx memory cell we choose labels for the wires typically used for logic gates.
Anyway in a Xilinx chip the bits of data you put in the memory cells determine what logic function gets executed. That works because in general any particular stored memory -- in any computer, anywhere -- is (conceptually) just a logic function, and conversely all logic is implementable with the stuff we conventionally call memory.
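If it helps, here's a minimal Python sketch of that idea: a hypothetical 4-input LUT, where the "program" is just 16 stored bits and the logic inputs act as the address. (The function names are made up for illustration.)

```python
# A 4-input LUT is just 16 memory bits. The four logic inputs act as the
# "address"; the stored bit at that address is the output. Loading a different
# 16-bit pattern turns the same memory cell into a different logic gate.

def make_lut4(truth_table_bits):
    """truth_table_bits: an int whose bit i is the output for input combination i."""
    def lut(a, b, c, d):
        index = (d << 3) | (c << 2) | (b << 1) | a  # inputs form the "address"
        return (truth_table_bits >> index) & 1      # stored bit is the "data out"
    return lut

and4  = make_lut4(0b1000_0000_0000_0000)  # 1 only for input 1111: a 4-input AND
nand4 = make_lut4(0b0111_1111_1111_1111)  # everything except 1111: a 4-input NAND

print(and4(1, 1, 1, 1), nand4(1, 1, 1, 1), nand4(0, 1, 1, 1))  # prints: 1 0 1
```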
But we typically don't do that, because "real" logic made of fixed-function transistors is much faster than logic built with changeable memory cells. However, there's a market for fully-changeable logic--even if it's slower--and that's what Xilinx chips are.
Every CPU is just a bunch of registers and logic. If you hand me a few million discrete NAND gates, I can use them to build an X86, a RISC-V, an ARM, or whatever. It will be the size of a house and it will be very slow, but it will run the binary code for that processor. With a Xilinx chip, you have a few million NAND gates (or NOR gates or inverters or whatever you like) at your disposal and they're all on one chip and you can wire them up however you want with nothing but software. Bingo: You can build an X86 out of pure logic, and it's all on one chip rather than being the size of a house. That's a soft core.
The nice thing about soft cores is that you can build whatever CPU functions you want and leave off the functions you don't need. If you want to change the design, you just download a bunch of new bits to the Xilinx memory cells. Thus you can change an ARM into an X86 in an instant, without changing any hardware.
Soft cores are very flexible, but they're also slow, because implementing logic with static RAM cells is slower than doing it with dedicated transistors.
That's where hard cores come in: A hard core is a dedicated area of silicon on the Xilinx chip carved out to only implement an ARM chip or a PowerPC or other CPU with fixed-function transistors. So it's fast. The downside is you can't change its functionality on-the-fly. If you decide you'd rather have a PowerPC than an ARM chip you have to change the whole chip.
In both types of cores, you still have a bunch of memory cells left over that you can program to do whatever kind of logic you like.
I've seen numbers like 50 MHz vs 1 GHz, so a factor of 20. But that's just raw clock speed. Here's a very interesting talk by a person named Henry Wong who built an out-of-order instruction-level parallel x86 implementation with a soft core, which compensates somewhat for the much slower clock speed. This is a pretty heroic piece of FPGA wizardry:
The performance of soft cores is significantly lower than that of hard cores.
Xilinx already has a RISC soft core in their MicroBlaze architecture so they don't have a pressing need for a low power, reasonable performance RISC soft core. Ref: https://en.wikipedia.org/wiki/MicroBlaze
AMD has high performance CPUs being fabbed by TSMC (same foundry as Xilinx), so (theoretically) AMD CPUs can be grafted onto the Xilinx FPGA as a hard core.
With AMD and the MicroBlaze, they have the high performance and low power processor spectrum covered with no need for 3rd party licensing costs.
FPGAs are to ASICs as interpreted languages are to compiled languages. I don't mean that literally, but I do mean it in the performance sense. At the same process node, an FPGA is over 50x the power and 1/20th the speed of a dedicated ASIC and it isn't getting any better.
Note however that Xilinx has a DSP slice (UltraScale) which is a prefabbed adder/multiplier. This would be PyTorch in the analogy: precompiled kernels called from the "interpreted" side.
FPGA LUTs cannot compete against ASICs, so modern FPGAs have thousands of dedicated multipliers to compete.
The LUTs compete against software, while the dedicated UltraScale DSP48 slices compete against GPUs or tensor cores.
--------
It's not easy, and it's not cheap. But those UltraScale DSP48 units are competitive vs GPUs.
It's still my opinion that GPUs win in most cases, due to being more software based and easier to understand. It is also cheaper to make a GPU. But I can see the argument for Xilinx FPGAs if the problem is just right...
With one notable detail: It's much easier to stream data through the GPU than it is through an FPGA. And I say that fully knowing how much of a (relative) PITA it is to stream data through a GPU.
I think it also works better to think of the DSP resources as a big systolic array with lots of local connectivity and memory and only sparse remote connectivity. The SIMD model doesn't really apply.
What's annoying is there's no real reason it has to be harder to stream through an FPGA- it's largely just that the ecosystem and FPGA vendor tooling is so utterly garbage that installing it will probably attract raccoons to your /opt.
> With one notable detail: It's much easier to stream data through the GPU than it is through an FPGA.
That's an interesting assertion. FPGAs are better at moving data from one place to another than any other general-purpose device I can think of.
How many GPUs have Xilinx GTx-class transceivers, for instance? A GPU with JESD204B/C connectivity would be an extremely interesting piece of hardware.
"Easier" as I'm using it is a metric of engineering effort more than throughput. I don't disagree - there's vastly more aggregate bandwidth available, both internally and on external interfaces.
But it's vastly easier to get started with CUDA or OpenCL than it is to get started with a big FPGA.
> But those UltraScale DSP48 units are competitive vs GPUs.
Are they really competitive from a price / performance perspective? Based on my limited understanding, nvidia GPUs, for example, are several times cheaper for similar performance?
Mass produced commodity processors will always win in price/performance. That's why x86 won, despite "more efficient" machines (Itanium, SPARC, DEC Alpha, PowerPC, etc. etc.) being developed.
One of the few architectures to beat x86 in price/performance was ARM, because ARM aimed at even smaller and cheaper devices than even x86's ambitions. Ultimately, ARM "out-x86'd" the original x86 business strategy.
-------------
GPUs managed to commoditize themselves thanks to the video game market. Most computers have a GPU in them today, if only for video games (iPhone, Snapdragon, normal PCs, and yes, game consoles). That's an opportunity for GPU coders, as well as for supercomputers that want a "2nd architecture" more suited for a 2nd set of compute problems.
-----
FPGAs will probably never win in price / performance (unless some "commodity purpose" is discovered. I find that highly unlikely). Where FPGAs win is absolute performance, or performance/watt, in some hypothetical tasks that CPUs or GPUs don't do very well. (Ex: BTC Mining, or Deep Learning Systolic Arrays, or... whatever is invented later)
Computers are so cheap, that even a $10,000 FPGA may save more electricity than an equivalent GPU, over the 3 year lifespan of their usage. Electricity costs of data-centers are pretty huge.
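A back-of-the-envelope sketch of that electricity argument (the wattages and the price below are hypothetical placeholders, just to show the arithmetic):

```python
# savings = (GPU watts - FPGA watts) * hours of operation * electricity price.
# All numbers below are hypothetical placeholders.

gpu_watts, fpga_watts = 300, 75      # assumed board powers for the same job
hours = 3 * 365 * 24                 # 3-year lifespan, running 24/7
usd_per_kwh = 0.10

savings = (gpu_watts - fpga_watts) / 1000 * hours * usd_per_kwh
print(f"Raw electricity saved over 3 years: ${savings:,.0f} per device")  # ~$591
# Scale that by fleet size, data-center PUE (cooling overhead), and how many
# GPUs one FPGA replaces before comparing it to the purchase-price difference.
```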
The ultimate winner is of course, ASICs, a dedicated circuit for whatever you're trying to do. (Ex: Deep Blue's chess ASIC. Or Alexa's ASIC to interpret voice commands). But FPGAs serve as a stepping stone between CPUs and ASICs.
------
If you have a problem that's already served by a commodity processor, then absolutely use a standard computer! FPGAs are for people who have non-standard problems: weird data-movement, or compute so DENSE that all those cache-layers in the CPU (or GPU) just gets in the way.
Those AI accelerators aren't really "full CPUs", since there's no cache coherence, really. They're tiny 32kB memory slabs + decoder + ALUs + networking to connect to the rest of the FPGA.
But it's certainly more advanced than a DSP slice (which was only somewhat more complicated than a multiply-and-add circuit).
-------
I guess you can think of it as a tiny 32kB SRAM + CPU though. But it's still missing a bunch of parts that most people would consider "part of a CPU". But even a GPU provides synchronization functions for its cores to communicate and synchronize with each other.
Just looking at raw FLOPs: the 7nm Xilinx Versal series tops out at 8 32-bit TFlops (DSP Cores only), plus whatever the CPU-core and LUTs can do (but I assume CPU-core is for management, and LUTs are for routing and not dense compute).
In contrast: the NVidia A100 has 19 32-bit TFlops. Higher than the Xilinx chip, but the Xilinx chip is still within an order of magnitude, and has the benefits of the LUTs still.
Raw FLOPs is completely misleading, which is why Nvidia focuses on it as a metric. The GPU can't keep those ops active, particularly during inference when most of the data is fresh so caches don't help. It's the roofline model.
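For reference, the roofline model boils down to one line; a minimal sketch (the peak and bandwidth figures below are placeholders, not real device specs):

```python
# Roofline model: attainable throughput is capped either by peak compute or by
# memory bandwidth times arithmetic intensity (FLOPs performed per byte moved).

def roofline_tflops(peak_tflops, mem_bw_tb_per_s, flops_per_byte):
    return min(peak_tflops, mem_bw_tb_per_s * flops_per_byte)

# Placeholder numbers: a 19 TFLOPS device with 1.5 TB/s of memory bandwidth.
for intensity in (1, 4, 16, 64):
    print(intensity, roofline_tflops(19.0, 1.5, intensity))
# At low arithmetic intensity (e.g. inference over "fresh" data with little
# reuse) the bandwidth term wins and the headline FLOPs number is never reached.
```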
In my experience FPGA>GPU for inference, if you have people who can implement good FPGA designs. And inference is more common than training. Much of this is due to explicit memory management and more memory on FPGA.
Well, my primary point is that the earlier assertion, "GPUs are 20x faster than FPGAs", is nowhere close to the theory of operation, let alone reality.
ASICs (in this case: a fully dedicated GPU) obviously wins in the situation it is designed for. The A100, and other GPU designs, probably will have higher FLOPs than any FPGA made on the 7nm node.
But not a "lot" more FLOPs, and the additional flexibility of an FPGA could really help in some problems. It really depends on what you're trying to do.
------
At best, 7nm top-of-the-line GPU is ~2x more FLOPs than 7nm top-of-the-line FPGA under today's environment. In reality, it all comes down to how the software was written (and FPGAs could absolutely win in the right situation)
The question is, how much does your algorithm get from the 19 TFlops for a GPU, and how much from the 8 from the Versal. I'm sure many algos fit GPUs fine, but some don't, and might get more out of an FPGA.
I agree with the sentiment, but the numbers are off. It's about 10x the power worst case (maybe 5x for some DSP-heavy apps) and also around 5 to 10x for speed. An FPGA can easily run at 100s of MHz, up to 500 with good design pipelining, so suggesting an ASIC could do 20x the speed means 500 MHz x 20 = 10 GHz, which is definitely beyond most ASICs, so I think 5x is more reasonable.
My experience is that to get those "high" clock frequencies, the work per cycle has to be extremely small. If you normalize to total circuit delay in units of time then you still end up many times worse, because you need many extra pipeline cycles to get the Fmax that high.
My day job is ASIC design and we do some prototyping on FPGAs, so the exact same RTL is used as an input. We always benchmark power, performance, etc. between ASIC and FPGA, so this is based on some real designs. A 5x reduction in power is fair for most of what I've seen, and the FPGA is actually better at achieving Fmax than you'd expect - control paths do need a lot more pipelining than ASIC, but compute-intensive (DSP) datapaths are pretty good with a few tweaks. I think sometimes people throw code at them, get 100 MHz and say "well, FPGAs are slow, so it's expected", but in my experience with a little tuning you can get most datapaths to run at 500 MHz. You do pay the power penalty vs a dedicated ASIC, but the performance is very good.
I think it depends a great deal on what you're doing. A fully pipelined double-precision floating-point fused multiply-add in FPGA tech will reach well over 500 MHz on current parts, but takes almost 30 cycles of pipelined latency to deliver each result. On the same process node, a well-optimized CPU will run at 6-8x the clock frequency and only require 4 cycles of latency to deliver each result.
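To put those two figures side by side, a quick sketch using the numbers above as given (the 3.5 GHz CPU clock is my assumption for the "7x" clock ratio, not a datasheet value):

```python
# Cycles to produce N results from a pipelined FMA unit:
#   dependent chain (each result feeds the next): N * latency
#   independent, fully pipelined work:            latency + (N - 1)

def chain_us(n, latency_cycles, clock_hz):
    return n * latency_cycles / clock_hz * 1e6

def pipelined_us(n, latency_cycles, clock_hz):
    return (latency_cycles + n - 1) / clock_hz * 1e6

N = 10_000
print("FPGA, dependent :", chain_us(N, 30, 500e6))      # 30-cycle FMA @ 500 MHz
print("CPU,  dependent :", chain_us(N, 4, 3.5e9))       # 4-cycle FMA @ 3.5 GHz
print("FPGA, pipelined :", pipelined_us(N, 30, 500e6))
print("CPU,  pipelined :", pipelined_us(N, 4, 3.5e9))
# Dependent work: the FPGA is ~50x slower. Independent, pipelined work: the gap
# collapses to roughly the clock ratio (~7x) per FMA unit.
```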
Is this flow filled with divide-and-conquer algorithms with very low work per step? Yes. Is that particularly ill-suited to FPGA logic? Yes. Is it unfair to the FPGA? Not in my opinion.
I stand by my claim: If you normalize a general circuit's speed in units of time instead of cycles, then you'll find that ASICs come out much much farther ahead.
From this [0] it suggests the Xilinx floating point core can run at >600 MHz, and the latency of many operations is just a few cycles. Also, as it's pipelined, the throughput could mean one result per clock, depending on how you configure the core. Seems closer to the 5x to me.
That chart doesn't show you the result latency, only the maximum achievable frequency. You have to use Vivado to instantiate an instance with the specific suite of configurable options. When you do that, it will inform you of the result latency: 27-30 cycles for FMA.
Xilinx already has a number of devices with ARM hard cores, though (like the Zynq series). There's no compelling reason for them to switch away from that.
AWS's Arm-based processors look to be widely deployed in the cloud. Nvidia, the leader in GPU compute, is buying ARM. Intel, which has suffered deeply because of their 10nm fab problems, is going to work with TSMC. And AMD's P/E ratio is at 159, higher than Amazon's!
So maybe AMD is looking to convert some inflated stock into a predictable business.
And it's better to invest in a predictable business that may have possible synergies with yours. Otherwise it looks bad to the stock market.
And Xilinx is probably the biggest company AMD can buy.
AMD's P/E is high but that's based on the fact that AMD earnings are $390m (2020Q3) vs. $6Bn (2020Q3) for Intel - essentially people are pricing in AMD being the obvious alternative to Intel in the data centre and the potential profit from that is enormous compared to AMD's current market share.
or another way of putting that is investors are jumping the gun and pricing in years and years of expected marketshare growth that haven't happened yet.
It's a relatively safe bet now that Intel has more or less conceded leadership through 2023, but it's not zero risk. The market generally doesn't have an appreciation of that; P/E was still nuts even before the release of Zen 2, when AMD's success was far less clear (Zen 1/Zen+ were far less appealing products and scaled far less well into server class). It's a lot of amateurs (see: r/AMD_Stock on reddit) buying it because they like the company rather than trading on the fundamentals.
Right now the stock market is just nuts in general though, there's so much money from the Fed's injections sloshing around and looking for any productive asset, and tech companies look like a good bet when everyone is stuck at home, building home offices, consuming tech hardware and electronic media. Housing is getting even more weird as well.
I don't know, but I been told... AMD margins aren't great. AMD is making phenomenal products lately but if they're giving them away to gain market share then they may never be as profitable as investors would like.
If Intel gets their house in order in a couple years, AMD won't have much time to gain market and raise prices. I've rooted for AMD since the K6 days but I think there's a risk that they'll always be #2(or less).
They have raised prices for the 5000 series of Ryzen - although they still have a price-per-performance advantage over Intel, they're pricing themselves as the market leader.
<many years ago> when Intel acquired Altera, and announced Xeon CPUs with on-chip FPGAs, I was optimistic that eventually they would add FPGAs to more low-end desktop CPUs (or at least Xeons in the sub-$1000 zone). But it never materialized. I'm slightly optimistic this time around too, but I suspect that the fact that Intel didn't do it hints at some fundamental difficulty.
Nokia designed their ReefShark 5G SoC chipset with a significant FPGA component and used Intel as their supplier. Intel couldn't deliver what they promised. It was a complete disaster.
They had to redesign ReefShark and cancel dividends. It was a huge setback.
This is utter bullshit.
Nokia f*cked up because they over-engineered their FPGA solution for 5G. They took the largest FPGA on the market and couldn't squeeze their design into it.
It was not a Nokia SoC, just a plain Stratix 10. They moved to their own SoC after that glorious project.
I wonder how much of the delay in FPGA tech adoption is due to the utterly hilarious disaster that are the toolchains. They look like huge brittle proprietary monstrosities, incompatible with modern development methodologies.
I did FPGA development for a few years a little over a decade ago. I recently came back to it for a project after doing software and just wow--the tooling is still absolutely awful. Possibly worse than before. Vivado in particular seems almost designed to foil version control systems. Which files actually contain user input and are necessary to rebuild a project? Why would you want to keep source and configuration files separate from derived objects? Entire swaths of documentation and examples become immediately obsolete with each new tool version. Not to mention infuriating bugs at every turn.
Version control aside, Vivado is very good at what it's intended to do: take RTL and synthesize, place and route, STA, and simulate it all in one tool, with plenty of higher-level abstractions like IPI, etc. It's really good at visualisation and cross-probing. I use it to check my ASIC RTL designs as it's better than the (way) more expensive ASIC tools. All sources needed to rebuild a project are referred to in the .xpr project file. Project rebuilds are completely scriptable; it's really not that opaque.
Oy. I'm a Python guy, but Tcl is NOT that bad. Do not blame the horrible software engineering at Altera and Xilinx on Tcl. Those companies make more than enough money that they could sit down with Tcl and Tk, spend some time on the code, and have a quite decent tool. Instead, they keep their bitstream completely closed to lock out competitors and saddle the world with shitty tools.
I'm really surprised that Lattice hasn't tried to go around Xilinx and Altera by doing exactly that. You would think that an open bitstream format and a couple million dollars thrown at academic researchers (Lattice makes about $200 million per quarter in gross profit) would produce some real progress, but I digress ...
SystemVerilog, on the other hand, was specifically created because Verilog and SystemC got loose to the end users and the EDA companies were not going to make that mistake again. So, yeah, SystemVerilog is pretty bad.
Open source tooling would not materialize if bitstream formats were opened, at least not competitive ones. Why? There are already open source versions for synthesis and PnR, and while functional they are very far off the 'terrible' EDA tools everyone rags on Xilinx for. The reality is SystemVerilog is a huge language, and already an open standard, yet no open source project supports it fully, so I don't believe for a second that if bitstreams were opened we'd see a load of top-class tooling appear for synthesis and PnR. The reason is, if it has not happened for the first (and arguably easiest) step in the chain, i.e. SystemVerilog, then why would it happen for the others?
> There are already open source versions for synthesis and PnR, and while functional they are very far off the 'terrible' EDA tools everyone rags on Xilinx for.
These "tools" have no target so no incentive to improve. To use them you have to basically push their results back into a Cadence/Synopsys/Mentor toolchain anyway, so you might as well stick to the supported toolchain.
> The reality is SystemVerilog is a huge language, and already an open standard yet no open source project supports it fully
Most commercial systems don't support it fully. And it's not clear that SystemVerilog is that superior to VHDL. And, for quite a while, SystemVerilog wasn't open and had some fairly obnoxious patents surrounding it. I don't know when/if that has changed as I have been out of semiconductors for about 20 years now.
Icarus Verilog has been slowly supporting features from SystemVerilog but doesn't have a lot of manpower.
In general, the consolidation of the semiconductor industry and EDA has hurt open-source EDA improvements. There's not very much money coming from companies to fund EDA research. EDA startups can't really get venture funding since VC's all want to fund the next pile of viral social trashware. And anyone with good software skills left the semiconductor industry eons ago because the pay differential is ridiculous.
The commercial FPGA tools have tremendous technological advantages, but the free part is inherently what many FOSS users value, not the other stuff. You're trying to talk about technical QoR between tools but the difference for anyone who really cares is ideological, not technical.
> The reason is, if it has not happened for the first (and arguably easiest) step in the chain, i.e. SystemVerilog, then why would it happen for the others?
Ehhhhh, I don't think I buy this at all. There are dozens of alt-HDLs out there, many of which are quite powerful, designed by solo users. People had working, simple-but-practical PnR for real devices in a ~7k C++ LOC codebase written by an individual (arachne-pnr) and many individuals have independently reverse engineered small-ish scale device families for packing utilities. nextpnr was written by a very small group (solo?) in a year or something. I don't think you could fit an equivalent parser for SV2017 in ~7k LOC, much less elaboration, type checking, a netlist database, to all go along with it. SystemVerilog might actually be the most difficult part of the whole equation because it simply has so much surface area. PnR tools are limited by their target: only targeting small iCE40 devices? Your PNR algorithms don't need to be cutting edge. Targeting SV2017? Your job is hard no matter what device you synthesize for. And I can't think of even a single commercial tool I know from any vendor that supports all of it, up-to-date with SV2017.
All that said, I use SystemVerilog as my "normal" RTL when using commercial tools for stitching together IP, wiring up top modules, etc.
My point about SV was that the two major open source simulation tools (Icarus and Verilator) both only support a subset of SV, and not just SV 2017; a lot of SV 2009 is still not supported. Vivado has a free (not open) SV simulator that supports much more of the language. I agree not all of SV is needed for PnR, but what I'm saying is: if we don't have the gcc or clang version of SV for simulation yet (vs MSVC or ICC), then what makes you think we'd get a near commercial-grade synth/PnR tool? If Xilinx opened up their bitstream format, academics would rejoice, but it would not suddenly spur on a huge improvement in open source PnR tooling. In terms of improving the usability of what is there, given Vivado is scriptable, if you want to make a better open one (like an IDE) you can: just call synth_design, etc. in batch. This was what Hier Design were doing, and what turned into Vivado after they were acquired by Xilinx. So my point is lots of open source tooling could exist without opening the bitstream format, so given it largely does not, I am of the opinion opening the bitstream format would not change much.
> but the free part is inherently what many FOSS users value
The free part is valuable not in that it's cheap, but in that it saves you from having to deal with licensing.
DevOps pioneers hailed from the likes of Google, Amazon and Facebook, who are not exactly short on cash, but you simply couldn't do what they did if you had been nickeled and dimed at every VM and container.
I have not benchmarked the open source PnR tools, but I expect they are orders of magnitude worse QoR than what a commercial one can do. I don't know a LOC comparison between SV and PnR, but I'd say both are huge undertakings at a commercial feature set.
> already an open standard yet no open source project supports it fully
Bitstreams are closed. There's little to no point in doing an open source compiler if the target is not just proprietary, but deliberately opaque.
Overall your comment strikes me as what a proprietary compiler advocate would say in the 90s. "GCC? Lol"
Since then, Microsoft had to include Linux in Windows just because they absolutely needed Docker. DevOps was invented based on free/open source, it just couldn't be done proprietary style by a company as large as Microsoft.
I disagree. By far the largest part of SystemVerilog deals with verification, both simulation and formal property proving. These parts have nothing to do with the bitstream formats and the tooling in that area is quite as lacking as the synthesis and PnR tools.
The limitation here is writing the SystemVerilog parser and compiler.
What's the incentive for free software hackers and startups to even begin to work on this if the rest of the stack is not just proprietary, but held by actively hostile entities?
There are other places to start working on the stack which are not as actively hostile as place and route, e.g. simulation.
As for the incentive I'm fairly pessimistic. There is definitely no money to be made for a start-up in this space; it is way too conservative. Maybe the hobbyist intellectual challenge of working on some hard problems like constraint solving or formal property proving? There is a massive task of writing a SystemVerilog parser before you get there though, and the SAT solving and property proving problems are present elsewhere with lower barriers to entry.
Challenges can be stimulating, but there are diminishing returns. It's not like say, lockpicking or DRM-cracking, in that the subject matter is super hard to begin with, even without the proprietary sabotage.
Having said that, there has been some promising F/OSS work on the small Lattice devices. It allows for a decent, modern workflow, and it's possible because the devices are approachable, but also because Lattice hasn't been hostile. Why they haven't been more supportive is a mystery to me however.
TCL itself is not that bad for the purpose IMO; it's more the stuff around it, the proprietary binary formats, the gooey crap, and the non-open nature thereof.
Tcl is the de facto EDA tool scripting language. It's standard in the HW design world - of course that does not stop there being a second alternative scripting language, but not having Tcl would alienate much of the HW design community, so it must be there. As for the HDL, Vivado supports SystemVerilog, VHDL, and C via HLS. I happen to like SystemVerilog; what about it makes it terrible?
TCL is the only language I have ever worked with where a comment would affect the next line. Might have been an interpreter issue, but it was enough for me never to want to touch it again.
SystemVerilog is a good example of an organically grown language with no 'benevolent dictator'. A few pet peeves:
* Why is the simulation delta cycle split into 17 regions? Exactly when does the Pre-Re-NBA region happen and what assignments take place there?
* Why can't a function return a dynamic/associative array or a queue? This is clearly possible, since the array find functions return a queue, but it's not possible to define a user function with this return type.
* It has way too much cruft. E.g. what problem does the forkjoin keyword solve? Who thought that was necessary and why? Not a fork-join block, the forkjoin keyword.
* Why can't you have a modport inside a modport? This would be great for e.g. register interfaces, but modports are not composable.
* What is the difference between a const variable and a localparam and why does the language need both constructs?
* Is a covergroup a class or what? It behaves very much like it is: it has a constructor, some class-local information and at least one class-local function (the sample() function), but you can't extend it.
* Why are begin-end used for scope delimitation everywhere except in constraints where curly brackets are used? I know it was a Cadence donation, but why wasn't the syntax changed before it was merged? Backwards compatibility can only justify so much...
You're right about Tcl: a comment can mess stuff up, as the comment is a command that says do nothing. It's a terrible language, and that may be its worst flaw, but it's still in every EDA tool. It's kind of like how C is still around despite its foot-shooting ability costing billions every year due to security bugs and buffer overflows, etc. If an EDA tool wanted to break the mold and use, say, Python for scripting, they would still likely need to offer a Tcl option. It's very ingrained in industry.
As for SV - a lot of your gripes are Verilog issues, and SV has tried to fix some of them. I agree the blocking/nonblocking rules are a mess, but most folks just learn the rules to avoid issues; delta cycles can be a pain, though. The syntax limitations/quirks you point out are interesting, though not enough to say the language is terrible. It's extremely powerful, with very good composability of types; constrained random is very powerful, the coverage is extensive, and assertions again are very powerful. In a way it's like a few separate languages bolted together, so sure, there is some duplication, but it works surprisingly well as a whole.
I think pricing is also an issue. Anyone with 5 dollars in their pocket can buy an arduino clone and go to town. And many people do as can be seen by the huge hobbyist scene. You want to try FPGA development and do anything that is not blinking a LED? Good luck shelling out hundreds to thousands of dollars for the shittiest software known to this planet.
A Max10 T-Core board from Terasic is $55 academic and tools are free for the Max10 class.
You only start paying for FPGA tools when you need the really big FPGAs.
And, I'll go out on a limb, but, at this point, I think Arduino causes more harm to beginning embedded developers than good. Yeah, the ecosystem is wonderful if you aren't a developer.
However, Arduino is now weird compared to mainstream embedded development. Most things have converged to 32-bit instead of 8-bit. Arm Cortex-M is now mainstream so your architectural understanding is useless. 5V causes a lot of grief given that everybody else in the world is at 3V/3.3V.
A developer basically has to unlearn a bunch of things to move up from an Arduino. I still recommend Arduino to non-developers or somebody just trying to throw together a project, but I no longer recommend them to someone actually trying to learn embedded development.
Just to clarify, there are many Cortex-M* based Arduino or Arduino compatible boards. There's official Arduino-SAMD BSP support, though they do lack the depth of features, like Timers and such. Though it seems 8-bit procs are still common for super cheap MCU's.
The issue is not whether the end user has to pay; the issue is that this kills incentives for free software tools. gcc and BSD were initially developed on machines costing hundreds of thousands of dollars, and that didn't stop them.
I'm optimistic... not so much because of the merits of the acquisition but more so because of AMD's history with strategic actions. ATI kept them afloat through a CPU performance drought, and divesting GlobalFoundries secured necessary liquidity. These two alone essentially saved AMD, so I've got faith in leadership being able to make the appropriate strategic maneuvers.
But maybe I'm being overly optimistic. (Probably because—disclosure—I'm long AMD. Been long for years.)
I'm hoping/expecting a chip that goes into the Epyc/SP3 socket and has the memory & PCIe & socket crossconnect as hard IP but the CPU cores replaced with programmable logic. If you have a use case for FPGAs, it's more likely you want it in a concentrated form like this... not on low-end or desktop systems :/
If I remember correctly, there was something similar back in the early HyperTransport days...
Yeah I think it's both more effective and cheaper to have dual/quad socket systems with 1 "normal" CPU and the rest filled with FPGAs without CPU cores, just to max out on the raw crunching ability. The PCIe block on the FPGA chips could be flexible enough to (re-?)wire directly into the programmable logic, maybe even reconfigurable to other protocols (e.g. 100GE). Also in "normal" NUMA fashion each FPGA would have the memory channels associated with that socket (presumably through the interconnect as if it were a CPU, so the CPU can access it too.)
I'm just looking at this from a logical chain of "who needs FPGAs in their computers?" => "cases with loooots of specific data crunching" => "want a controlling/driving CPU for the complicated parts, but then just concentrate as much FPGA in as possible." => Multi-socket with 1 CPU & rest FPGAs.
(There currently is no commodity Quad-socket SP3 mainboard, not sure if this is a design limitation or just no one made one yet? I'd still say the approach works great with only 2 sockets.)
I wouldn't expect to see anything like this on SP3 anyway, since it would take some time to do the work and by then the current generation would likely be whatever they replace SP3 with in order to support DDR5.
As much as I agree with you and want one for myself too, I doubt that this market segment is interesting to AMD at all. The kinds of workloads that warrant going FPGA are the kind of workloads where you just give your devs a bunch of high-priced development systems. Those would likely be close to identical to the production boxes, just with more debug pieces plugged in.
This opinion is unlikely to be popular, and it's been decades since I was a full participant in the hardware business, but...I just have never seen the use case for FPGAs beyond niche prototyping / small run applications, which by definition make no money. I suppose there are also scenarios where you want to keep your design secret from the fab and/or change it every week, but those seem very niche too (NSA, GCHQ, ..?).
1) You underestimate how critical prototyping has become, again likely since you say it's been a couple decades. Time to market has become more important, and verification has become harder as CPUs have gotten even more complex. FPGAs enable cosimulation and emulation, leading to faster iteration of both design and verification efforts and thus better TTM.
FPGAs are so important in the hardware development process that I would even say you're not a serious hardware company if you don't have any FPGA frameworks to design silicon.
2) As others have mentioned, FPGAs are also critical for low-latency workloads that require constant tweaks-- high frequency trading (ugh...) comes to mind. The need for "constant tweaks" could also be satisfied with just "normal" software, but that has higher latency as opposed to an FPGA, and FPGAs can get some crazy performance if you're willing to pay the price (south of 7 figures).
Overall sure, usage of FPGAs might be niche compared to, idk, Javascript; but it's commonplace/practically essential in hardware.
Whenever discussion of FPGA comes up on HN, someone inevitably points to low latency workflows but nobody ever mentions video capture and play-out boards using FPGAs. Companies like Blackmagic, Elgato, Matrox, etc.
Hardware like this often uses FPGAs because there’s a need for highly parallel processing that is difficult or even impossible to do on an off-the-shelf CPU, but the volumes are too low to justify a custom ASIC. Being able to fix bugs or add features after shipping is a big bonus too.
It is very likely that the packets of this comment traveled through several FPGAs to get from your computer to my screen. Yes, they are definitely more niche than CPUs. But niche products have really high margins and people willing to pay for them.
FPGAs are already incredibly popular. They're just mostly in things you are unlikely to personally own or know about. You're going to find at minimum one, but probably more FPGAs in things like big routers and other telecom equipment, e.g. cell towers, firewalls, load balancers, enterprise wifi controllers, video conferencing hardware, test equipment like oscilloscopes, sensor buoys, scientific instruments, MRI machines, LIDARs, high end radio equipment, or even just glue logic tying together other components, like in the iphone.
I am not sure whether you are serious or trolling but I will bite ;-)
FPGAs are being used in many types of applications where real-time operation is necessary and non-recurring engineering (NRE) cost needs to be minimized, for example here [1].
One classic example: if you poke under the hood of any signal generator like an AWG, you will probably find an FPGA inside. As you are probably aware, since you are in the hardware business, AWGs are probably one of the most common pieces of equipment in any electronics lab or company.
> I just have never seen the use case for FPGAs beyond niche prototyping / small run applications, which by definition make no money.
You are precisely correct. FPGAs are useful when your volume doesn't reach volumes where an ASIC would get amortized.
Networking companies (Cisco, Juniper, etc.) are classically big consumers of FPGAs.
Tektronix seems to make quite a bit of money and there is at least one FPGA in practically every test instrument they make. This holds true for practically all test instrument manufacturers.
I know a LOT of industrial automation and testing companies that generally have FPGAs in their systems. Both for latency and for legacy support (Yeah, GPIB still exists ...).
Yes, they aren't "Arm in a cell phone" type volumes, but that doesn't mean they aren't quite profitable if you can aggregate them.
Easily changing the design and being the cheaper option to ASICs for small productions are the two main uses for FPGAs. You may be designing a box that can be configured to do different things so you may want to support multiple FPGA images to switch back and forth depending on the mission. You may just want to be able to easily upgrade firmware for a complex design in the future. For Space DSP applications, the FPGA is king and will probably be for a long time simply due to the ability to cram a lot of functionality into a small space (DSP, microcontroller, combinational logic circuits, and massive I/O banks all in one chip)
Likely just simple glue logic. Things like converting one protocol into another, doing some multiplexing or some simple pre-processing or filtering on some sensor data. They're incredibly tiny (2x2mm) and use little power, so they pop up in designs pretty regularly.
It's the usual "fundamental difficulty" with FPGAs -- CPUs and GPUs are faster and more power efficient for compute-intensive tasks. An algorithm on FPGA needs to overcome the 20x worse architectural efficiency just to break even with a CPU or GPU.
The big benefit of having FPGA closely attached to CPU is that you can access the memory and internal buses quickly. Transferring stuff over PCIe hurts a lot. So you could make an argument for jobs using small work units requiring fast turnaround; CUDA kernels take milliseconds to launch.
I worked with some of the early Xeon+FPGA parts and there just wasn't that much we could do with them. There wasn't enough fabric to build anything meaningful and we had an abundance of CPU cores, so the best we could do was specialized I/O accelerators.
I think the more relevant comparison here would be ASICs. Soft cores on FPGAs are indeed terrible, but if you're implementing some algorithm directly at the gate level for cryptography or signal processing or whatever, then being able to arrange inputs and outputs into dataflows is a big win, with no round trips to general-purpose registers or bypass networks. Not having to fetch instructions and not being limited in parallelism are also big wins. And generally, if you're doing something like mining bitcoin, you should expect an FPGA to perform somewhere between an ASIC and a GPU.
The problem is that if a task is common then someone is just going to make an ASIC to do it. And if it's uncommon then the terrible FPGA software ecosystem and low prevalence of general-purpose FPGAs in the wild mean that people will just do it on a CPU or GPU.
> if you're implementing some algorithm directly at the gate level for cryptography or signal processing or whatever, then being able to arrange inputs and outputs into dataflows is a big win, with no round trips to general-purpose registers or bypass networks
This is true, but keep in mind that that sort of algorithm runs insanely well on any CPU or GPU because they, too, do not want to touch main memory. You would be blown away by how much work a CPU can do if you can keep the working set within L1 cache.
Re. ASICs, it's a continuum:
- "flexible, low performance, cheap in small quantities" (CPUs)
- "reasonably flexible, better performance, cheap-ish in small quantities" (GPUs)
- "inflexible, best performance, expensive in small quantities" (ASICs)
FPGAs fit somewhere between GPUs and ASICs -- poor flexibility, maybe great performance, moderate small-quantity price.
If your problem is too big for GPUs, as you say, sometimes it's easiest to jump straight to an ASIC. But it's such a narrow window in the HPC landscape. The vast majority of customers, even with large problems, are just buying a lot of GPUs. They're using off-the-shelf frameworks even though a custom CUDA kernel would give them 10x performance and 10% cost. The cost to go to an FPGA is too great and the performance gain simply isn't there.
I'm skeptical as well. The primary reason IMO is the software. How do you easily reconfigure your FPGA to efficiently run whatever computationally intensive and/or specialized algorithm you have?
It is doable. I've seen it during my Computer Engineering courses 14 years ago.
Basically you analyze the code for candidates, select a candidate, upload your custom hardware design, run your operation on the hardware, and repeat.
The difficult part is that uploading your hardware design to the FPGA is on the order of tenths of seconds, which is ages compared to the nanoseconds and microseconds your CPU works in.
So your specific operation must be worthwhile to upload.
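A minimal sketch of that break-even decision (all the timings below are placeholders, just to show the shape of the trade-off):

```python
# Offloading to the FPGA only pays off if the CPU time saved outweighs the
# one-off cost of loading the bitstream (plus any data-transfer overhead).
# All timings below are placeholders.

def worth_offloading(calls, cpu_s_per_call, fpga_s_per_call, reconfig_s):
    cpu_total  = calls * cpu_s_per_call
    fpga_total = reconfig_s + calls * fpga_s_per_call
    return fpga_total < cpu_total

print(worth_offloading(1_000,   50e-6, 5e-6, 0.2))   # False: too few calls
print(worth_offloading(100_000, 50e-6, 5e-6, 0.2))   # True: reconfig amortized
```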
A bit of FPGA on your CPU makes it more flexible; for example, you could set a profile such as 'crypto' or 'video' to add some specific hardware acceleration to your general-purpose CPU.
Imagine your CPU being able to switch your embedded GPU into another CPU core.
Let's say the current Zen 2 had an FPGA onboard. AMD could sell you an upgraded design with AV1 support for a few dollars. Most people aren't going to buy a new CPU on the basis of a video decoder, but they'll buy an upgrade to the chip that auto-"installs" itself. That's a sale AMD otherwise wouldn't have made.
Also, for the way most modern CPUs are used: how do you task switch? If the hardware is large enough, you can deploy multiple configurations at a time, but does software support that? Is it possible to have relocatable configurations?
In theory, you could even page out code, but I guess the speed of that will be slow. Also, paging in probably would be challenging because the logical units aren’t uniform (if only because not all of them will be connected to external wires)
This can be used with a client-server model, that is if there are enough free cells and I/O available on FPGA it could let it install the configuration and then any application could communicate with it concurrently, maybe with some basic auth.
But from what I understand of FPGAs, fragmentation would be a serious issue. You may have the free cells and I/O you need to implement some circuit, but if they’re dispersed over your FPGA or even connected, but in the wrong shape for the circuit you’re building, that’s useless.
An enormous crossbar could solve that, but I would think that would be way too costly, if practically possible at all.
Even GPUs multitask all the time, even though it's less obvious. Cooperative multitasking in this context means setting up and executing different shaders/kernels. The overhead involved in this is quite manageable.
Repurposing FPGAs to different tasks means loading a new bitstream into the device every time. So it is much more efficient to grant exclusive access to each user of the device for long stretches of time. The proper pattern for that is more like a job queue.
I believe there is some amount of support in OpenCL for FPGAs. If only we could get companies to properly support OpenCL, we'd have a nice software interface to pretty much any kind of compute resource on a machine.
You're not wrong but I expect they'd make it so that the various models would be similar enough (at least within a given CPU generation) so that you could use mostly precompiled artifacts instead of rerouting everything from scratch.
I've always been pretty skeptical of their approach though, in order to be usable they'd need excellent tooling to support the feature, and if there's one thing that existing FPGA software isn't it's "excellent".
Getting FPGAs to perform well is often an art more than a science ("hey guys, let's try a different seed to see if we get better timings") so the idea that non-hardware people would start to routinely generate FPGA bitstreams for their projects is so implausible that it's almost comical to me.
Maybe one day we'll have a GCC/LLVM for FPGAs and it'll be a different story.
Beyond the GCC/LLVM, you also really need a standard library. Nobody is talking about that. Today, if you want a std::map on an FPGA, you have to either pay $100k or build it yourself. That's untenable.
Apparently after the Altera acquisition they sought "synergies" in all the different divisions. My friend was an intern who was tasked with porting some of the network protocol stack to SystemVerilog. Apparently it did work, and SystemVerilog was the right HDL to use because of support for structs that can map to packet headers. I'm not sure it's being used in production.
It'd be interesting to see how AMD will execute and integrate this acquisition, considering they are less of a madhouse company than Intel.
It absolutely seems like there are some incredible opportunities in the high end. But as far as I know, FPGAs are quite area hungry which makes them inherently expensive. It's hard to think you'd find FPGAs of meaningful size included in $60 desktop CPUs, unless the harvesting opportunity is significant.
It is really funny when you find out that Intel uses Xilinx FPGAs for prototyping, as they cannot get what they acquired (Altera) working in house.
I don't work for Intel but I do work for a semiconductor company. While Xilinx FPGAs aren't directly used for prototyping, there are a large number of third-party boxes purchased to accelerate hardware simulations, and they're chock-full of FPGAs.
That's the more likely explanation. Altera vs Xilinx isn't just the hardware, it's an entirely different toolchain. It would be insane of Intel to demand third parties to move all their technology over to Altera's.
I imagine they are using a 3rd party prototyping solution like HAPS from Synopsys, which uses Xilinx FPGAs inside - for good reason; for quite some time Xilinx have had some very large devices built specifically for this market. It must sting a little bit though....
Xilinx are very flexible and the tooling makes them even more effective.
If AMD wants to get serious in the datacenter/AI/ML space they need a Xilinx-like approach to developing tooling. CUDA, NVENC, cuDNN, etc. crap all over AMD's offerings in the same space, where those offerings even exist.
AMD is prepping to take over datacenter turf and this puts them in a good place to bring a bigger offering.
I would rather see Processing in Memory (PIM) become mainstream than FPGAs. FPGAs are basically an assembly line that you can change overnight. Excellent at one task and they minimize end to end latency but if it's actually about performance you are entirely dependent on the DSP slices.
With PIM your CPU resources grow with the size of your memory. All you have to do is partition your data and then just write regular C code with the only difference being that it is executed by a processor inside your RAM.
Having more cores is basically the same thing as having more DSP slices. Since those cores are directly embedded inside memory they have high data locality which is basically the only other benefit FPGAs have over CPUs (assuming same number of DSP and cores). Obviously it's easier to program than either GPUs or FPGAs.
You're comparing two completely different paradigms.
FPGAs are not an assembly line at all; the assembly line analogy applies much more closely to a processor's pipeline.
FPGAs are just a massive set of very simple logic units which can be interconnected in many different ways. FPGAs are best used in situations where you want to perform a series of simple operations on a massive incoming dataset, in parallel, especially in real-time situations. Performing domain transforms on data coming in from sensor arrays is one very good application for FPGAs.
I think GP meant in the sense that reconfiguration time is large. FPGAs cannot be effectively time-division multiplexed, as a full reconfiguration can take up to tens of seconds.
GP is also correct that DSP/SRAM blocks are critical to performance. FPGAs are not very efficient at raw compute if you have to synthesize everything out of LEs.
The performance benefit of FPGAs, which PIMs also share (in theory, there aren't any PIMs ready for real-world deployment AFAIK) is that they can leverage much larger memory bandwidths than general purpose CPUs can. An FPGA might run at a lower clock rate (low 100s of MHz), but be able to operate on several kb per clock cycle. This can work really well when paired with off-chip logic to convert high rate serial interfaces to lower clock rate parallel interfaces, then back after the FPGA is done processing.
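As a rough illustration of that width-vs-clock trade-off (the bus widths and clock rates below are hypothetical):

```python
# Effective bandwidth = bus width * clock rate. A modest FPGA clock on a very
# wide internal bus can beat a much faster but narrower datapath.
# Widths and clocks below are hypothetical.

def gbytes_per_s(bus_bits, clock_hz):
    return bus_bits / 8 * clock_hz / 1e9

print(gbytes_per_s(2048, 300e6))  # wide FPGA fabric bus @ 300 MHz -> ~77 GB/s
print(gbytes_per_s(64,   3.5e9))  # 64-bit datapath @ 3.5 GHz      -> 28 GB/s
```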
There is also a lot of work going on in the space of time-division multiplexing FPGAs effectively. The two main approaches are overlay architectures and partial reconfiguration. The former implements another high-level fabric on top of the FPGA which will be less general-purpose, but can be reconfigured faster. The latter is a feature vendors have added to some high-end chips where specific regions of the FPGA can be reconfigured without affecting other regions.
I agree with your statements regarding reconfiguration and TDM, though I still think GP (and to a lesser extent, your comment) are very focused on traditional computing paradigms. FPGAs are much more promising for real-time systems, particularly those with very large incoming datasets to transform or otherwise process in parallel. Thinking about FPGAs in terms of how 'quickly' they process data is really missing the point IMO.
One common, and very good application for FPGAs is for use in Active Electronically Scanned Array radar, sonar, or camera image processing. You can perform parallel filtering and transforms with various frequency and phase settings, which would be impossible for a similarly-sized processor to do.
FPGAs have the potential to revolutionize sensor arrays, by making them much more useful and affordable.
I agree yes. "Traditional computing paradigms" are (IMO) not all that interesting as research topics at this point. As far as I know, most of the work in that space is in branch prediction and cache replacement policies.
FPGAs are what you really want when you need to deal with high resolution data that is coming in at very high data rates. Often even a very fast general-purpose processor with hand-tuned assembly simply won't have even the theoretical memory throughput to process your data without "dropping frames". They also have the benefit of deterministic performance, which with modern caching/branch prediction systems you can't guarantee (AFAIK, my computer architecture knowledge isn't that cutting edge).
They can also work really well if you have some computation you want to do that is so far off the beaten path for general-purpose processors (or so memory bound) that FPGAs can take the cake.
There is also some work in sprinkling even more hard logic into FPGA dies, like processors or accelerator cores for various applications. FPGAs are great for implementing the glue logic to move data between those.
I think you touched on one of the biggest things about FPGAs in your comment, which is that they are perfect for computation that does not involve branches. If you've got a lot of data, and you're doing transforms, you usually don't need to branch, so being able to crunch everything through in parallel is a massive benefit.
Also agree that additional hard logic or peripherals will be a game-changer for FPGAs, though they would make each design more domain-specific. Alternatively, we may see a shift in how the interconnects are done, which allows for flexible use of these 'modules'. It's also possible that we'll see continual increases in LE counts which make more specialized hardware unnecessary. I don't know which way things will go.
Are FPGAs rewritable at will with almost no degradation (for example a rewrite every minute over many days), or do they suffer the same degradation problems as EEPROMs (like the ones in Arduinos)?
FPGAs use SRAM to store their program, while CPLDs (complex programmable logic devices) use flash. Some clever marketeers here & there will stretch this distinction but it's an established convention. The internal architecture between FPGAs and CPLDs is typically different, based on cost of memory vs. logic and typical use cases. FPGAs tend to be used for higher-capacity computations but require more life support; CPLDs tend to serve smaller, true glue logic applications, where the low config overhead (just apply power) and quicker & simpler power-up is a strong pull.
So CPLDs will have some kind of NVRAM wear-out concern, and this is almost always specified as a number of maximum erase & program cycles.
Though you could presumably just do the same thing with new instructions, i.e. have an instruction for secure zeroing which zeros the data in any memory or cache where it exists but doesn't cause the zeros to be cached anywhere they weren't already.
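A small sketch of that idea from the C side. secure_zero() below is what we can do today: a wipe the compiler may not elide, but one that still drags the cleared lines into cache. The commented-out call marks where the proposed instruction would slot in; it is purely hypothetical and does not exist anywhere.

    #include <stddef.h>

    /* Today's approach: a volatile byte-wise wipe the compiler cannot optimize away.
     * Downside: it pulls the cleared lines into cache, which is exactly what the
     * proposed instruction would avoid. */
    static void secure_zero(void *p, size_t n)
    {
        volatile unsigned char *v = p;
        while (n--)
            *v++ = 0;
    }

    void wipe_key(unsigned char key[32])
    {
        secure_zero(key, 32);
        /* Proposed alternative (NOT a real intrinsic, illustration only):
         *   _hypothetical_zero_line(key, 32);
         * i.e. zero the data wherever it currently lives, in DRAM or in any
         * cache level, without allocating new lines just to hold the zeros. */
    }

    int main(void)
    {
        unsigned char key[32] = { 0xAA };  /* pretend key material */
        wipe_key(key);
        return (int)key[0];                /* 0 after the wipe */
    }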
The surface area for security vulnerabilities is already impossibly high. Do we really want to add "firmware running on a DIMM exfiltrating key material" to that list?
There are security problems with every architecture. There is no fundamental reason PIM should be less secure than what we do now. This is just fear of the unknown talking.
I hope they will not drop their CPLD chips. They were made obsolete at least once, but Xilinx fortunately decided to extend support for a couple more years. CPLDs are very useful for repairing vintage gear where logic components fail and are no longer available (for example, custom-programmed PALs); you can describe the logic in Verilog and often solder the CPLD in place of multiple chips.
If they drop them, then the only way to do it would be to use a full-blown FPGA, which is a bit wasteful.
One that has tools that don't make your users hate you. Seriously, the open-source FPGA toolchains are a breath of fresh air to use, despite being small projects with few contributors (although due to that, and to no vendor support, they are severely limited in supported targets and special features).
Yep. The Icestorm toolchain for Lattice FPGAs is a real breath of fresh air -- fast compile times, multiple sets of interoperable tools, open file formats, development in the open... it's great. I just wish something like this was available for more than just Lattice parts.
FPGA development tools are generally dated, very very expensive, and one-way streets for customisation.
From what I understand, open sourcing the bitstream format in its entirety will only do so much, but it would certainly help. It's not just a matter of building GCC for FPGAs.
Just better tools would be nice (and open-sourcing would bring some hope for that). FPGA tooling is atrocious, especially if you're used to software tooling. And the difference in tooling can sell chips all on its own.
The Xilinx Zynq and Zynq UltraScale+ series pair multi-GHz ARM cores with FPGA fabric. They're incredibly useful for small-volume niche use cases and, to give an example from my industry, are becoming popular in space applications. The reason is that hardware qualification/verification is extremely expensive, but a change to FPGA fabric is not.
My point is Xilinx have already proven ARM CPU+FPGA on one die and I think AMD CPU+FPGA is very likely to be a success.
Between this, ARM adoption, Apple Silicon and similar offerings (which kind of skipped ARM+FPGA and went straight to ARM+ASIC), and RISC-V, it's like 1992 again, with exciting architectures. Only this time software abstraction is much better, so there is not a huge pressure to converge on only 1-2 architectures.
Could be interesting. I prefer an independent Xilinx, but maybe competition with Intel will stimulate the whole reconfigurable computing revolution that fizzled out.
I don't use FPGAs (tooling is too poor, languages are bad, up-front costs are high) but I hang out on FPGA forums and the overwhelming consensus has been bad. Chipmakers and especially high-performance chipmakers have always been focused on high-volume and/or high-margin customers, but the Intel acquisition has made Altera worse in that regard. Their sales and support teams were integrated into Intel and now you can't get any support from them whatsoever even if you spend $MM/yr. You need to funnel even basic questions and bug reports through a distributor contact to have any chance. I forget the specifics but they made tooling even more restrictive/expensive. The only new products out of it are a few Xeons with built-in FPGA ($$$$$), good for HFT guys I guess.
Can you expand on why Intel’s move was smart (what did the Altera acquisition do for them) and why FPGAs have a bright future in the datacenter?
From what little I’ve seen in this space, FPGAs have not made large inroads in the ML space or datacenters in general. This seems partly due to their inefficiency compared to ASICs, and moreover to their software.
Unless AMD is planning something really ambitious (e.g., true software-based hardware reconfiguration that doesn’t require HDL knowledge) and are confident they’ve figured it out, I’m not sure what they hope to achieve here.
Both Altera and Xilinx were on TSMC. Altera wanted an edge over Xilinx; at the time, Intel was committing (on paper) to their Custom Foundry, so Altera switched and bet on Intel Custom Foundry. Nothing ever worked out with Intel Custom Foundry because Intel was not used to working with others on a foundry process. Intel thought the problem was Altera not being part of the company, and they had too much cash, so they might as well buy them for better synergy. And it did help: getting internal access seems to have (on paper or slides) sped things up with product launches and the roadmap, until they hit the Intel 10nm fiasco.
Altera Stratix V FPGAs actually had more market share than Virtex 7s. They were better chips. That said, the production delays around Arria 10 and Stratix 10 and the time lag caused by the Intel acquisition totally killed their market position. The only reasons to use Intel FPGAs now are (1) 64-bit floating point support or (2) if your Intel salesman gives you a really good deal.
It may have been Xilinx not wanting to get into bed with Intel. Xilinx may have wanted a degree of technical independence or freedom to carry out their own strategy that was not forthcoming from Intel.
Word on the street is that this was a vanity project of a VP, and never resulted in performance levels that couldn't be achieved with a little bit of focused optimization of boring old CPU work (threading + SIMD).
There's been a recent trend to increasingly move more compute capabilities into NICs. This has been going on for a while, but has gained a new dimension with cloud providers. For example, with their "Nitro" system, AWS can more or less run their Hypervisor entirely on the NIC and completely offload the network and storage virtualization from their servers. This development is likely to continue. FPGAs are going to play a significant part in that because they allow the customers to reconfigure this hardware according to their needs.
Virtual machines are very much a thing now, and virtualisation has made it into network cards reasonably well ... but pretty well nothing else.
In our future datacentre we want to say how many cores, connected to how much RAM, how much GPU resource, some NVMe, etc., and there's going to be a whole lot of very specialised switching and tunnelling going on. This needs to be as close to the cores/cache as possible, a good order of magnitude faster than we run our present networking, and it's probably an area where there will be a significant pace of development, i.e. a software-defined solution would be nice.
So, a software defined north bridge, in essence. And an FPGA is pretty much the only thing we have right now that could do the job.
Because an FPGA lets you optimize your "hardware" solution to a computing problem without the hassle of fabricating a chip of your own (although the performance with an FPGA is much lower than with a custom chip).
I understand that they need a big push in the DPU market, but I do not understand why companies as big as AMD do not invest and build what they need in house. If anyone can gather the talent, it is AMD. Everyone was talking about future data centers, and as far as I can tell I have been hearing about heterogeneous I/O since 2009 (and that's just me; I was hearing it while working on Xen).
To answer my own question: maybe the market is so volatile that they cannot do strategic planning like that?
With larger purchases like this one that can still be part of the equation, though there is also the matter of lead times needed to bring a significant team and related infrastructure needed for the project(s) online and up to speed.
Also if a company is seen as ripe for buying, it can sometimes be done in part to stop a competitor getting a chance at the above advantages.
Discipline of the company is more important than the talent. Oracle has a lot of talent, but they lack the discipline to generate anything novel; they buy the idea. Anyhow, the semiconductor industry is different. Apple and Amazon played it better, in my opinion, although in vastly different markets.
This is a good point, but it mostly matters when you are in the middle of developing the strategy, so that you can protect it and have a robust plan. Maybe they are!
Hmm, this was rumored but I guess now it is actually happening. Nice bump on the share price there I guess, it’s currently trading at around $115 and it seems to be converted to $143 in AMD. I assume this is to help AMD push more into the server and ML compute spaces?
I would also like AMD to invest in ML tooling while they have the cash.
I hope one day PyTorch, XLA, and Glow will have native AMDGPU support, and I will be able to buy a couple of Radeon 6000 series cards, undervolt them, and make a good ML box.
I think AMD GPUs on TSMC 7nm, and then maybe even 5nm, will have the best performance per watt, even if they might be 10% or 20% slower than the alternative. For me, performance per watt and per dollar is more important.
Anyway, it's sad that they couldn't put together a 5-to-10-person engineering team (I might be too optimistic) that would make their product relevant in this market.
I'd like to see consumer-level CPU + GPU + FPGA products that emulators could take advantage of. I'm thinking of floating point math for PS2 right now, but I'm sure there are other examples where an FPGA could be beneficial.
The apparent driver here isn’t AMD wanting to get into the FPGA business. The real motivation appears to be a combination of platforms and programmable chiplets. There are two problems that programmable chips address: https://semiengineering.com/amd-wants-an-fpga-company-too/
In the stock trading world, HFT on FPGA is closer to the edge than GPU / TPU solutions and they are using ML models, if you count that as AI. When the logic and the NIC are on the same hardware, it's really fast. An ASIC would be even faster, but you can't really iterate on that.
Their new Versal line puts a TPU on the die with the FPGA. Great for inference, especially if you want to use the FPGA to quickly extract features and then infer from feature space.
Perhaps they would not make good competition. FPGAs have been known to be slower than ASICs. But then again, perhaps some other company will find a good use for rapidly changing IC design.
For those in the ASIC and chip design industry, two of the largest chip companies, namely Intel and AMD, buying the two largest FPGA companies was inevitable; it was just a matter of "when" rather than "if".
I think the more interesting news is what they are going to do proactively with these mergers rather than just sitting on them.
I really hope their respective CEOs will take a page from the open source Linux/Android and GCC/LLVM revolutions. I'd say the chip-making companies are the ones that benefit most from these open source movements, not the end users. To understand this situation we need to understand the economics of complementary goods or commodities [1].
In the case of chip makers, if the price of designing/researching/maintaining OSes like Linux/Android and the compiler infrastructure is minimized (i.e. close to zero), they can basically sell their processor hardware at a premium price with handsome profits. If, on the other hand, the OSes and compilers are expensive, their profit will be inversely proportional to the prices of these complementary elements (e.g. OSes & compilers).
Unfortunately as of now, the design tools or CAD software for hardware design and programming, and also parallel processing design tools are prohibitively expensive, disjointed and cumbersome (hence expensive manpower), and if you're in the industry you know that it's not an exaggeration.
Having said that, I think it's best for Intel/AMD and the chip design industry to fund and promote robust free and open source software development tools for their ASIC design, including CPU/GPU/TPU/FPGA combo designs.
IMHO, ETH Zurich's LLHD [2] and Chris Lattner's LLVM effort on MLIR [3] are moving in the right direction for pushing the envelope and consolidating these tools (i.e. one design tool to rule them all). If any Intel or AMD folks are reading this, you need to knock on your CEO/CTO's door and convince them to make these complementary commodities (design and programming tools) as good and as cheap as possible, or better, free.
I don't recall where I read this, but hardware vendors have been trying to commoditize software, and vice versa.
It's really obvious when you think about it. If you sell nails, you want to make sure that everyone has or can afford a hammer, and hammer manufacturers like to make sure that there is a large supply of compatible nails.
As much as I would like to see it, I am not sure the equation is that simple in the case of CAD software. Sure, that would make it easier to use FPGAs, but it would also, at a stretch, make it easier to create competing products.
I still think it's worth it, and wish the bitstream format were documented, at the very least.
May well be. Even if they have no current use for them, one would really want a patent bulletproof vest in case the king-of-the-hill battle intensifies.
This comment is off topic, but while I'm listening to the earnings call, I don't hear anything specific about official PyTorch and TensorFlow support for AMD graphics cards. All the questions and answers are generic, with buzzwords like AI and "doubling down on our software support", but it doesn't give me confidence to change my NVIDIA GPU to an AMD one for the foreseeable future.
I remember the time when Elon Musk said to an analyst that he's asking boring questions to fill in his spreadsheet, and I'm feeling the same thing while listening to the earnings call.
My impression is that the accelerated computing side of AMD is receiving far, far too little attention. For example, their flagship GPU is still not officially supported by ROCm (AMD's answer to CUDA) [1]. Imagine the 2080 Ti not being supported by CUDA.
I've become a huge AMD fan, both because of their hardware and because of their commitment to open source. But while the battles they have won against Intel on the x86 side are impressive, it seems that CUDA is leaving them far behind.
"AMD ROCm is validated for GPU compute hardware such as AMD Radeon Instinct GPUs. Other AMD cards may work, however, they are not officially supported at this time."
It seems like Lisa Su thinks that there's a separate "gamer market" and "accelerator market".
Jensen Huang understands that the same person can like to play games and train machine learning models on the same machine.
I'd love to switch to an AMD CPU to get a portable laptop with low resource usage, as I spend most of my time travelling. But since GPUs in the cloud are overpriced (thanks to Jensen's separate pricing for servers) and hotel internet is unpredictable, I don't want to train models in the cloud.
Anyways, Lisa said that she reads all comments about AMD, so I hope she'll listen :)
It likely will never have support. AMD chose to bifurcate their GPU designs into compute (CDNA) and games (RDNA) lines, with different architectures. RDNA sheds all the fancy features needed to support modern compute, thus gets more efficient in games that do not use it, but also cannot support the modern compute APIs.
NVIDIA is adding more features, like super-scaling, to games, and machine learning models are improving faster than Moore's law. I expect fancy features like tensor cores to be a must for 4K gaming in the future.
What's funny is that the same strategy (leaving out specialized instructions from consumer level hardware) that worked extremely well for CPUs won't work for GPUs in my opinion.
If you look at ray tracing hardware (I have it on my RTX 2070 Max-Q card in my laptop), it sucks right now, but it's improving very fast as machine learning algorithms improve.
One thing that I forgot is that AMD can just focus on inferencing hardware (INT16 operations), and leave out tensor cores...so actually you are right, I'll just stay with NVIDIA GPUs.
Yes, I posted another link [1] for the earnings call, but it didn't get much traction / upvotes. Although I have to admit acquiring Xilinx is much bigger news.
Judging from watching financial news and reading analysts' comments for years, my feeling is that their job is not to push for hard questions or honest answers. Their job is to push whatever interest they have in the company: spin for better long-term prospects and downplay risk.
I was happy with the Enterprise results (+116% YoY) until I read this:
>Revenue was higher year-over-year and quarter-over-quarter due to higher semi-custom product sales and increased EPYC processor sales.
Semi-Custom is definitely PS5 and Xbox.
Basically I still don't see EPYC making enough inroads in the server market. And this is worrying: while the stock, reviews, and hype are all going to AMD, no results so far have shown that Intel is hurting or that AMD is making big gains in market share and revenue share.
The only good part, I guess, is Ryzen Mobile's contribution to the Computing and Graphics segment.
Earnings calls never seem to have any hard questions. There was a projections miss a few quarters ago (very rare for AMD), and even then the questions felt more like PR.
Semi-Custom is indeed PS5 and Xbox. This increase was expected of course. It's lower margins than other segments though.
Epyc adoption is indeed slower than I would've liked. But so far it has matched or beat short and long term projections from AMD.
Enterprise is weird. AMD has better price and performance? Let's buy Intel. Meltdown lowers performance? Let's compensate by buying more Intel. Intel is supply constrained? Let's complain, and still buy Intel.
My guess is Intel is still pressuring OEMs to favor Intel. Cloud providers are increasing adoption though, and there have been a few nice HPC wins.
And let's not forget that AMD is selling every chip they can make; TSMC production is fully booked.
Earnings calls do have hard questions in them, but you might miss them because they're asked with a big dose of circumlocution.
If you look at the incentives a bit, management gets to decide which analysts can ask questions, so analysts need to stay on management's good side.
Analysts know the company's figures inside and out (often have a 10-tab spreadsheet with an extensive operational model of the company), and are asking questions to tweak key model assumptions.
So analysts ask pointed questions in jargon shared with management. You don't ask 'are you seeing a big sales drop because the crypto bubble blew up?'; you say 'can you provide some color on when your excess inventory will clear from the channel?'
Analysts get their answers. Management avoids bad headlines written by casual listeners.
There's an additional layer, which is the analysts know the industry and company very well, so general bad things are already background knowledge (there's no reason to ask about them). If you want to know what the analyst already knows, then pay for their report—they're not reporters fishing for a sound bite.
My investment in AMD is so small that it seems silly to pay for analyst reports.
I don't care about short term, and I have enough faith in my own research to keep believing in a brighter long term.
Are there any public resources dissecting an earnings call? Doesn't have to be recent or AMD.
You don't get much information from earnings calls, only how analysts are thinking (usually short term). Product reviews on HN, tech-oriented in-depth sites, YouTube reviews from gamers, long-term analysts like Ark Invest, and even GitHub issues/comments are much better long-term predictors of success or failure.
Just as an example, I remember reading a lot about AirBnB here on HN when it wasn't even known in Eastern Europe. I suggested he use it to rent out the luxury apartment he had just bought there, and he was the first one to rent out a luxury apartment in the country (somebody from AirBnB's management even flew there personally). He made lots of money from rental fees, of course, but AirBnB also got incredibly successful. There are lots of other examples, of course; this was just the least controversial one that I can write here :)
From a personal point of view of somebody who just upgraded to an AMD-based desktop from an Intel one, the CPUs work great. It's the software support that feels half-baked. Enabling memory integrity blue-screens my computer. I need to update the chipset drivers every month because AMD is still dialing in their scheduler and frequency scaling over a year after release, and the idle power usage is inexcusable.
AMD's performance is fantastic, which is why I was fine getting it for a home gaming desktop, but I'm not sure I'd be willing to pull the trigger on AMD on an enterprise server buildout. Intel still simply (actually or otherwise) feels more reliable.
The server market moves and replaces slowly. Even when Intel was beating AMD by 30% or more it still took years for AMD percentages to drop.
AMD EPYC on AWS is 10.42% cheaper than Intel per hour across the board for m5 instances (9.6% cheaper for t3 instances). 7nm EPYC saves more than 10% power vs Intel 14nm and per-chip savings from buying AMD are way more than 10% vs similar Intel offerings. Why can Amazon spike prices that much? Because people will pay and still consider it a deal.
AMD's main issue still seems to be availability, due to so much competition for 7nm capacity and TSMC being reluctant to build new fabs.
This keeps coming up every time there is "perceived" shortage of capacity.
TSMC doesn't have a capacity problem. They are very much willing to build fabs if their customers commit to it, but no company other than Apple has ever done that. AMD could have placed a large order over the course of the year, basically a bet that they will sell that many chips, and TSMC would adjust or build accordingly. The problem is that no company is willing to place that bet. What if the chips don't sell and they're stuck with a big pile of them?
This is simple Supply Chain Management, it is the same principle in every other industry.
Because outside of very specific cases like solar panels, displays, and LEDs, spinning up new fabs seems to be quite risky. The investment costs are huge, and TSMC seems very much content to make major bucks with the fewer customers chasing the latest and greatest node.
The more fabs they have, the slower their node progression will be, and the cost of each new node these days seems to grow almost exponentially.
I can imagine that switching from Intel to AMD takes a bit of time for servers, as supporting 2 architectures at the same time is usually bad news for 99%-ile latencies for web services. At the same time x86 is a mature instruction set (as long as you don't use 512 bit vectorization, where things are getting tricky), so the transition shouldn't be that hard.
Interesting times.