Everybody seems to view this as AMD mimicking Intel when it acquired Altera. (That acquisition has not borne visible fruit.)
My contrarian speculation is that this is a move driven by Xilinx vs. Nvidia, given Nvidia’s purchase of Arm and Xilinx’s push into AI/ML. Xilinx is threatened by Nvidia’s move given their dependence on Arm processors in their SoC chips and their ongoing fight in the AI/ML (including autonomous vehicles) product space. My speculation is that this gives Xilinx alternative high-performance AMD64 (and possibly lower-performance, lower-power x86) "hard cores" to displace the Arm cores.
I think you're onto something here. AMD is likely seeing the end of the road for their CPU business within the next decade, since it will run up against physics and truly insane cost structures that will come after 5nm. At the same time we're far past the practical limit wrt ISA complexity (as evidenced by periodic lamentations about AVX512 on this site). The only real way to go past all of that right now is specialized compute, reconfigurable on demand, deployment of which is hampered by the fact that it's very expensive and not integrated into anything, so the decision to use it is very deliberate, which in practice means it rarely ever happens at all. Bundle a mid-size FPGA as a standardized chiplet on a CPU, integrate it well, provide less painful tooling, and that will change. Want hardware FFT? You got it. Want hardware TPU for bfloat16? You got it. Want it for int8? You got it. Think of just being able to add whatever specialized instruction(s) you want to your CPU.
I'm not sure this is worth $35B, but if Lisa Su thinks so, it probably is. She's proven herself to be one of the most capable CEOs in tech.
Also, the advantage Altera supposedly got after being acquired by Intel was better fab integration with what was then the best process technology available (High-end FPGAs genuinely need good processes).
1. That's no longer the case, so sucks for Altera / Intel
2. AMD doesn't have a fab, so any advantages are necessarily on the design / architecture / integration side.
My read on the Altera acquisition was that Intel needed to shore up fab volumes in the face of their foundry customers jumping to TSMC first chance they could. As the capital required per node continues to rise exponentially, they need more and more volume to amortize that over. This is also why they're trying to get into GPUs again.
To slightly refine this, Intel didn't have many "foundry customers" before Altera. Via Wikipedia (https://en.wikipedia.org/wiki/Intel#Opening_up_the_foundries...), the need to fill up the manufacturing lines was engendered by poor x86 CPU sales around ~2013, not poor third-party fab runs. In 2013, Intel was still ahead of TSMC with 22 nm.
Didn't they also drag down Panasonic or something? I remember there being a lot of rumors of them being an extremely bad partner at the time: basically no technical support, unreliable capacity due to their core business taking priority, and not giving two shits about the success of the venture in general.
Altera was already an Intel Custom Foundry partner before the acquisition. I'm not sure the 2015 acquisition was necessary to accomplish what you are describing.
I don’t think NVidia/ARM would affect Xilinx much. Given what the bulk of FPGAs are doing in the data center, I think AMD was looking more at NVidia/Mellanox and of course Intel/Altera, but for networking, not compute. For Xilinx, this gives a path to board-level integration with x86.
Intel had some heat problems when they tried this. The FPGAs weren't able to use their heat budget dynamically, and as a result, the whole SiP had bad performance.
It would be interesting to get a CPU, GPU and FPGA in one package. ML abstractions could be sent to the most sensible implementation. Maybe handheld mobile SDR transceivers could benefit from such an arrangement too. AMD has had some great low-power parts, and maybe they'll drive for that here?
> My speculation is that this gives Xilinx alternative high-performance AMD64 (and possibly lower-performance, lower-power x86) "hard cores" to displace the Arm cores.
I'm trying to imagine an x86-based UltraScale+ style processor.
Hopefully AMD can help fix the mess known as Vivado and the PetaLinux tools. lol.
Being early on the ML hardware acceleration boat is going to pay off astronomically. Embedded inferencing is going to be a society defining technology by the time we hit mid-decade. It's already being used for predictive maintenance of machinery in IIoT with huge payoffs via decrease in unforeseen total machine failure or need of heavy overhauls.
It really blows my mind how many people are still bearish on ML. It’s fair to argue timelines (although even that is becoming less true), but I think the evidence is firmly on the side of the bulls now.
Soft cores use the configurable logic matrix of the FPGA. You can choose to implement them or not, depending on your use case. They can also be tuned to the use case, adding or modifying CPU instructions, cache structure, etc. This involves writing RTL code, with all the design, verification and backend synthesis work that comes with that. Tools like Synopsys ASIP Designer try to help with this effort.
Hard cores are not part of the configurable logic matrix but are separate resources on the FPGA. That means they can't be tuned to the use case in the same way as a soft core. The trade-off is that they typically are better optimized with regards to clock frequency and power consumption since the components are made to be a CPU and not generic configurable logic. One example of an FPGA with a hard core CPU would be the Xilinx Zynq devices.
An FPGA is (to a very crude approximation) just a bunch of static RAM organized in an unusual way. If you think of normal static RAM as "address wires go in one side and data wires come out the other", in an FPGA there are no "address wires" -- it's all data in/data out. The memory cells are still just memory cells; what we label the wires is merely a matter of engineering perspective. In a Xilinx memory cell we choose labels for the wires typically used for logic gates.
Anyway in a Xilinx chip the bits of data you put in the memory cells determine what logic function gets executed. That works because in general any particular stored memory -- in any computer, anywhere -- is (conceptually) just a logic function, and conversely all logic is implementable with the stuff we conventionally call memory.
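If it helps, here's a minimal Python sketch of that idea: a hypothetical 4-input LUT, where the "program" is just 16 stored bits and the logic inputs act as the address. (The function names are made up for illustration.)

```python
# A 4-input LUT is just 16 memory bits. The four logic inputs act as the
# "address"; the stored bit at that address is the output. Loading a different
# 16-bit pattern turns the same memory cell into a different logic gate.

def make_lut4(truth_table_bits):
    """truth_table_bits: an int whose bit i is the output for input combination i."""
    def lut(a, b, c, d):
        index = (d << 3) | (c << 2) | (b << 1) | a  # inputs form the "address"
        return (truth_table_bits >> index) & 1      # stored bit is the "data out"
    return lut

and4  = make_lut4(0b1000_0000_0000_0000)  # 1 only for input 1111: a 4-input AND
nand4 = make_lut4(0b0111_1111_1111_1111)  # everything except 1111: a 4-input NAND

print(and4(1, 1, 1, 1), nand4(1, 1, 1, 1), nand4(0, 1, 1, 1))  # prints: 1 0 1
```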
But we typically don't do that, because "real" logic made of fixed-function transistors is much faster than logic built with changeable memory cells. However, there's a market for fully-changeable logic--even if it's slower--and that's what Xilinx chips are.
Every CPU is just a bunch of registers and logic. If you hand me a few million discrete NAND gates, I can use them to build an X86, a RISC-V, an ARM, or whatever. It will be the size of a house and it will be very slow, but it will run the binary code for that processor. With a Xilinx chip, you have a few million NAND gates (or NOR gates or inverters or whatever you like) at your disposal and they're all on one chip and you can wire them up however you want with nothing but software. Bingo: You can build an X86 out of pure logic, and it's all on one chip rather than being the size of a house. That's a soft core.
The nice thing about soft cores is that you can build whatever CPU functions you want and leave off the functions you don't need. If you want to change the design, you just download a bunch of new bits to the Xilinx memory cells. Thus you can change an ARM into an X86 in an instant, without changing any hardware.
Soft cores are very flexible, but they're also slow, because implementing logic with static RAM cells is slower than doing it with dedicated transistors.
That's where hard cores come in: A hard core is a dedicated area of silicon on the Xilinx chip carved out to only implement an ARM chip or a PowerPC or other CPU with fixed-function transistors. So it's fast. The downside is you can't change its functionality on-the-fly. If you decide you'd rather have a PowerPC than an ARM chip you have to change the whole chip.
In both types of cores, you still have a bunch of memory cells left over that you can program to do whatever kind of logic you like.
I've seen numbers like 50 MHz vs 1 GHz, so a factor of 20. But that's just raw clock speed. Here's a very interesting talk by a person named Henry Wong who built an out-of-order instruction-level parallel x86 implementation with a soft core, which compensates somewhat for the much slower clock speed. This is a pretty heroic piece of FPGA wizardry:
The performance of soft cores is significantly lower than that of hard cores.
Xilinx already has a RISC soft core in their MicroBlaze architecture so they don't have a pressing need for a low power, reasonable performance RISC soft core. Ref: https://en.wikipedia.org/wiki/MicroBlaze
AMD has high performance CPUs being fabbed by TSMC (same foundry as Xilinx), so (theoretically) AMD CPUs can be grafted onto the Xilinx FPGA as a hard core.
With AMD and the MicroBlaze, they have the high performance and low power processor spectrum covered with no need for 3rd party licensing costs.
FPGAs are to ASICs as interpreted languages are to compiled languages. I don't mean that literally, but I do mean it in the performance sense. At the same process node, an FPGA is over 50x the power and 1/20th the speed of a dedicated ASIC and it isn't getting any better.
Note however that Xilinx has a DSP slice (UltraScale) which is a prefabbed adder/multiplier. This would be PyTorch in the analogy: precompiled kernels called from the "interpreted" side.
FPGA LUTs cannot compete against ASICs, so modern FPGAs have thousands of dedicated multipliers to compete.
The LUTs compete against software, while the dedicated UltraScale DSP48 slices compete against GPUs or tensor cores.
--------
It's not easy, and it's not cheap. But those UltraScale DSP48 units are competitive vs GPUs.
It's still my opinion that GPUs win in most cases, due to being more software based and easier to understand. It is also cheaper to make a GPU. But I can see the argument for Xilinx FPGAs if the problem is just right...
With one notable detail: It's much easier to stream data through the GPU than it is through an FPGA. And I say that fully knowing how much of a (relative) PITA it is to stream data through a GPU.
I think it also works better to think of the DSP resources as a big systolic array with lots of local connectivity and memory and only sparse remote connectivity. The SIMD model doesn't really apply.
What's annoying is there's no real reason it has to be harder to stream through an FPGA- it's largely just that the ecosystem and FPGA vendor tooling is so utterly garbage that installing it will probably attract raccoons to your /opt.
> With one notable detail: It's much easier to stream data through the GPU than it is through an FPGA.
That's an interesting assertion. FPGAs are better at moving data from one place to another than any other general-purpose device I can think of.
How many GPUs have Xilinx GTx-class transceivers, for instance? A GPU with JESD204B/C connectivity would be an extremely interesting piece of hardware.
"Easier" as I'm using it is a metric of engineering effort more than throughput. I don't disagree - there's vastly more aggregate bandwidth available, both internally and on external interfaces.
But it's vastly easier to get started with CUDA or OpenCL than it is to get started with a big FPGA.
> But those UltraScale DSP48 units are competitive vs GPUs.
Are they really competitive from a price / performance perspective? Based on my limited understanding, nvidia GPUs, for example, are several times cheaper for similar performance?
Mass produced commodity processors will always win in price/performance. That's why x86 won, despite "more efficient" machines (Itanium, SPARC, DEC Alpha, PowerPC, etc. etc.) being developed.
One of the few architectures to beat x86 in price/performance was ARM, because ARM aimed at even smaller and cheaper devices than even x86's ambitions. Ultimately, ARM "out-x86'd" the original x86 business strategy.
-------------
GPUs managed to commoditize themselves thanks to the video game market. Most computers have a GPU in them today, if only for video games (iPhone, Snapdragon, normal PCs, and yes, game consoles). That's an opportunity for GPU coders, as well as for supercomputers that want a "2nd architecture" more suited for a 2nd set of compute problems.
-----
FPGAs will probably never win in price / performance (unless some "commodity purpose" is discovered. I find that highly unlikely). Where FPGAs win is absolute performance, or performance/watt, in some hypothetical tasks that CPUs or GPUs don't do very well. (Ex: BTC Mining, or Deep Learning Systolic Arrays, or... whatever is invented later)
Computers are so cheap, that even a $10,000 FPGA may save more electricity than an equivalent GPU, over the 3 year lifespan of their usage. Electricity costs of data-centers are pretty huge.
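A back-of-the-envelope sketch of that electricity argument (the wattages and the price below are hypothetical placeholders, just to show the arithmetic):

```python
# savings = (GPU watts - FPGA watts) * hours of operation * electricity price.
# All numbers below are hypothetical placeholders.

gpu_watts, fpga_watts = 300, 75      # assumed board powers for the same job
hours = 3 * 365 * 24                 # 3-year lifespan, running 24/7
usd_per_kwh = 0.10

savings = (gpu_watts - fpga_watts) / 1000 * hours * usd_per_kwh
print(f"Raw electricity saved over 3 years: ${savings:,.0f} per device")  # ~$591
# Scale that by fleet size, data-center PUE (cooling overhead), and how many
# GPUs one FPGA replaces before comparing it to the purchase-price difference.
```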
The ultimate winner is of course, ASICs, a dedicated circuit for whatever you're trying to do. (Ex: Deep Blue's chess ASIC. Or Alexa's ASIC to interpret voice commands). But FPGAs serve as a stepping stone between CPUs and ASICs.
------
If you have a problem that's already served by a commodity processor, then absolutely use a standard computer! FPGAs are for people who have non-standard problems: weird data-movement, or compute so DENSE that all those cache-layers in the CPU (or GPU) just gets in the way.
Those AI accelerators aren't really "full CPUs", since there's no cache coherence, really. They're tiny 32kB memory slabs + decoder + ALUs + networking to connect to the rest of the FPGA.
But it's certainly more advanced than a DSP slice (which was only somewhat more complicated than a multiply-and-add circuit).
-------
I guess you can think of it as a tiny 32kB SRAM + CPU though. But it's still missing a bunch of parts that most people would consider "part of a CPU". But even a GPU provides synchronization functions for its cores to communicate and synchronize with each other.
Just looking at raw FLOPs: the 7nm Xilinx Versal series tops out at 8 32-bit TFlops (DSP Cores only), plus whatever the CPU-core and LUTs can do (but I assume CPU-core is for management, and LUTs are for routing and not dense compute).
In contrast: the NVidia A100 has 19 32-bit TFlops. Higher than the Xilinx chip, but the Xilinx chip is still within an order of magnitude, and has the benefits of the LUTs still.
Raw FLOPs is completely misleading, which is why Nvidia focuses on it as a metric. The GPU can't keep those ops active, particularly during inference when most of the data is fresh so caches don't help. It's the roofline model.
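For reference, the roofline model boils down to one line; a minimal sketch (the peak and bandwidth figures below are placeholders, not real device specs):

```python
# Roofline model: attainable throughput is capped either by peak compute or by
# memory bandwidth times arithmetic intensity (FLOPs performed per byte moved).

def roofline_tflops(peak_tflops, mem_bw_tb_per_s, flops_per_byte):
    return min(peak_tflops, mem_bw_tb_per_s * flops_per_byte)

# Placeholder numbers: a 19 TFLOPS device with 1.5 TB/s of memory bandwidth.
for intensity in (1, 4, 16, 64):
    print(intensity, roofline_tflops(19.0, 1.5, intensity))
# At low arithmetic intensity (e.g. inference over "fresh" data with little
# reuse) the bandwidth term wins and the headline FLOPs number is never reached.
```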
In my experience FPGA>GPU for inference, if you have people who can implement good FPGA designs. And inference is more common than training. Much of this is due to explicit memory management and more memory on FPGA.
Well, my primary point is that the earlier assertion, "GPUs are 20x faster than FPGAs", is nowhere close to the theory of operation, let alone reality.
ASICs (in this case: a fully dedicated GPU) obviously wins in the situation it is designed for. The A100, and other GPU designs, probably will have higher FLOPs than any FPGA made on the 7nm node.
But not a "lot" more FLOPs, and the additional flexibility of an FPGA could really help in some problems. It really depends on what you're trying to do.
------
At best, 7nm top-of-the-line GPU is ~2x more FLOPs than 7nm top-of-the-line FPGA under today's environment. In reality, it all comes down to how the software was written (and FPGAs could absolutely win in the right situation)
The question is, how much does your algorithm get from the 19 TFlops for a GPU, and how much from the 8 from the Versal. I'm sure many algos fit GPUs fine, but some don't, and might get more out of an FPGA.
I agree with the sentiment, but the numbers are off. It's about 10x the power worst case (maybe 5x for some DSP-heavy apps) and also around 5 to 10x for speed. An FPGA can easily run at 100s of MHz, up to 500 with good design pipelining, so suggesting an ASIC could do 20x the speed means 500 MHz x 20 = 10 GHz, which is definitely beyond most ASICs, so I think 5x is more reasonable.
My experience is that to get those "high" clock frequencies, the work per cycle has to be extremely small. If you normalize to total circuit delay in units of time then you still end up many times worse, because you need many extra pipeline cycles to get the Fmax that high.
My day job is ASIC design and we do some prototyping on FPGAs, so the exact same RTL is used as an input. We always benchmark power, performance, etc. between ASIC and FPGA, so this is based on some real designs. A 5x reduction in power is fair for most of what I've seen, and the FPGA is actually better at achieving Fmax than you'd expect - control paths do need a lot more pipelining than ASIC, but compute-intensive (DSP) datapaths are pretty good with a few tweaks. I think sometimes people throw code at them, get 100 MHz and say "well, FPGAs are slow, so it's expected", but in my experience with a little tuning you can get most datapaths to run at 500 MHz. You do pay the power penalty vs a dedicated ASIC, but the performance is very good.
I think it depends a great deal on what you're doing. A fully pipelined double-precision floating-point fused multiply-add in FPGA tech will reach well over 500 MHz on current parts, but takes almost 30 cycles of pipelined latency to deliver each result. On the same process node, a well-optimized CPU will run at 6-8x the clock frequency and only require 4 cycles of latency to deliver each result.
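To put those two figures side by side, a quick sketch using the numbers above as given (the 3.5 GHz CPU clock is my assumption for the "7x" clock ratio, not a datasheet value):

```python
# Cycles to produce N results from a pipelined FMA unit:
#   dependent chain (each result feeds the next): N * latency
#   independent, fully pipelined work:            latency + (N - 1)

def chain_us(n, latency_cycles, clock_hz):
    return n * latency_cycles / clock_hz * 1e6

def pipelined_us(n, latency_cycles, clock_hz):
    return (latency_cycles + n - 1) / clock_hz * 1e6

N = 10_000
print("FPGA, dependent :", chain_us(N, 30, 500e6))      # 30-cycle FMA @ 500 MHz
print("CPU,  dependent :", chain_us(N, 4, 3.5e9))       # 4-cycle FMA @ 3.5 GHz
print("FPGA, pipelined :", pipelined_us(N, 30, 500e6))
print("CPU,  pipelined :", pipelined_us(N, 4, 3.5e9))
# Dependent work: the FPGA is ~50x slower. Independent, pipelined work: the gap
# collapses to roughly the clock ratio (~7x) per FMA unit.
```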
Is this flow filled with divide-and-conquer algorithms with very low work per step? Yes. Is that particularly ill-suited to FPGA logic? Yes. Is it unfair to the FPGA? Not in my opinion.
I stand by my claim: If you normalize a general circuit's speed in units of time instead of cycles, then you'll find that ASICs come out much much farther ahead.
From this [0] it suggests the Xilinx floating point core can run at >600 MHz, and the latency of many operations is just a few cycles. Also, as it's pipelined, the throughput could mean one result per clock, depending on how you configure the core. Seems closer to the 5x to me.
That chart doesn't show you the result latency, only the maximum achievable frequency. You have to use Vivado to instantiate an instance with the specific suite of configurable options. When you do that, it will inform you of the result latency: 27-30 cycles for FMA.
Xilinx already has a number of devices with ARM hard cores, though (like the Zynq series). There's no compelling reason for them to switch away from that.
AWS's Arm-based processors look to be widely deployed in the cloud. Nvidia, the leader in GPU compute, is buying ARM. Intel, which has suffered deeply because of their 10nm fab problems, is going to work with TSMC. And AMD's P/E ratio is at 159, higher than Amazon's!
So maybe AMD is looking to convert some inflated stock into a predictable business.
And it's better to invest in a predictable business that may have possible synergies with yours. Otherwise it looks bad to the stock market.
And Xilinx is probably the biggest company AMD can buy.
AMD's P/E is high but that's based on the fact that AMD earnings are $390m (2020Q3) vs. $6Bn (2020Q3) for Intel - essentially people are pricing in AMD being the obvious alternative to Intel in the data centre and the potential profit from that is enormous compared to AMD's current market share.
or another way of putting that is investors are jumping the gun and pricing in years and years of expected marketshare growth that haven't happened yet.
It's a relatively safe bet now that Intel has more or less conceded leadership through 2023, but it's not zero risk. The market generally doesn't have an appreciation of that; P/E was still nuts even before the release of Zen 2, when AMD's success was far less clear (Zen 1/Zen+ were far less appealing products and scaled far less well into server class). It's a lot of amateurs (see: r/AMD_Stock on reddit) buying it because they like the company rather than trading on the fundamentals.
Right now the stock market is just nuts in general though, there's so much money from the Fed's injections sloshing around and looking for any productive asset, and tech companies look like a good bet when everyone is stuck at home, building home offices, consuming tech hardware and electronic media. Housing is getting even more weird as well.
I don't know, but I been told... AMD margins aren't great. AMD is making phenomenal products lately but if they're giving them away to gain market share then they may never be as profitable as investors would like.
If Intel gets their house in order in a couple years, AMD won't have much time to gain market and raise prices. I've rooted for AMD since the K6 days but I think there's a risk that they'll always be #2(or less).
They have raised prices for the 5000 series of Ryzen - although they still have a price-per-performance advantage over Intel, they're pricing themselves as the market leader.
<many years ago> when Intel acquired Altera, and announced Xeon CPUs with on-chip FPGAs, I was optimistic that eventually they would add FPGAs to more low-end desktop CPUs (or at least Xeons in the sub-$1000 zone). But it never materialized. I'm slightly optimistic this time around too, but I suspect that the fact that Intel didn't do it hints at some fundamental difficulty.
Nokia designed their ReefShark 5G SoC chipset with a significant FPGA component and used Intel as their supplier. Intel couldn't deliver what they promised. It was a complete disaster.
They had to redesign ReefShark and cancel dividends. It was a huge setback.
This is utter bullshit.
Nokia f*cked up because they over-engineered their FPGA solution for 5G. They took the largest FPGA on the market and couldn't squeeze their design into it.
It was not a Nokia SoC, just a plain Stratix 10. They moved to their own SoC after that glorious project.
I wonder how much of the delay in FPGA tech adoption is due to the utterly hilarious disaster that are the toolchains. They look like huge brittle proprietary monstrosities, incompatible with modern development methodologies.
I did FPGA development for a few years a little over a decade ago. I recently came back to it for a project after doing software and just wow--the tooling is still absolutely awful. Possibly worse than before. Vivado in particular seems almost designed to foil version control systems. Which files actually contain user input and are necessary to rebuild a project? Why would you want to keep source and configuration files separate from derived objects? Entire swaths of documentation and examples become immediately obsolete with each new tool version. Not to mention infuriating bugs at every turn.
Version control aside, Vivado is very good at what it's intended to do: take RTL and synthesize, place and route, STA, and simulate it all in one tool, with plenty of higher-level abstractions like IPI, etc. It's really good at visualisation and cross-probing. I use it to check my ASIC RTL designs as it's better than the (way) more expensive ASIC tools. All sources needed to rebuild a project are referred to in the .xpr project file. Project rebuilds are completely scriptable; it's really not that opaque.
Oy. I'm a Python guy, but Tcl is NOT that bad. Do not blame the horrible software engineering at Altera and Xilinx on Tcl. Those companies make more than enough money that they could sit down with Tcl and Tk, spend some time on the code, and have a quite decent tool. Instead, they keep their bitstream completely closed to lock out competitors and saddle the world with shitty tools.
I'm really surprised that Lattice hasn't tried to go around Xilinx and Altera by doing exactly that. You would think that an open bitstream format and a couple million dollars thrown at academic researchers (Lattice makes about $200 million per quarter in gross profit) would produce some real progress, but I digress ...
SystemVerilog, on the other hand, was specifically created because Verilog and SystemC got loose to the end users and the EDA companies were not going to make that mistake again. So, yeah, SystemVerilog is pretty bad.
Open source tooling would not materialize if bitstream formats were opened, at least not competitive ones. Why? There are already open source versions for synthesis and PnR, and while functional they are very far off the 'terrible' EDA tools everyone rags on Xilinx for. The reality is SystemVerilog is a huge language, and already an open standard, yet no open source project supports it fully, so I don't believe for a second that if bitstreams were opened we'd see a load of top-class tooling appear for synthesis and PnR. The reason is, if it has not happened for the first (and arguably easiest) step in the chain, i.e. SystemVerilog, then why would it happen for the others?
> There are already open source versions for synthesis and PnR, and while functional they are very far off the 'terrible' EDA tools everyone rags on Xilinx for.
These "tools" have no target so no incentive to improve. To use them you have to basically push their results back into a Cadence/Synopsys/Mentor toolchain anyway, so you might as well stick to the supported toolchain.
> The reality is SystemVerilog is a huge language, and already an open standard yet no open source project supports it fully
Most commercial systems don't support it fully. And it's not clear that SystemVerilog is that superior to VHDL. And, for quite a while, SystemVerilog wasn't open and had some fairly obnoxious patents surrounding it. I don't know when/if that has changed as I have been out of semiconductors for about 20 years now.
Icarus Verilog has been slowly supporting features from SystemVerilog but doesn't have a lot of manpower.
In general, the consolidation of the semiconductor industry and EDA has hurt open-source EDA improvements. There's not very much money coming from companies to fund EDA research. EDA startups can't really get venture funding since VC's all want to fund the next pile of viral social trashware. And anyone with good software skills left the semiconductor industry eons ago because the pay differential is ridiculous.
The commercial FPGA tools have tremendous technological advantages, but the free part is inherently what many FOSS users value, not the other stuff. You're trying to talk about technical QoR between tools but the difference for anyone who really cares is ideological, not technical.
> The reason is, if it has not happened for the first (and arguably easiest) step in the chain, i.e. SystemVerilog, then why would it happen for the others?
Ehhhhh, I don't think I buy this at all. There are dozens of alt-HDLs out there, many of which are quite powerful, designed by solo users. People had working, simple-but-practical PnR for real devices in a ~7k C++ LOC codebase written by an individual (arachne-pnr) and many individuals have independently reverse engineered small-ish scale device families for packing utilities. nextpnr was written by a very small group (solo?) in a year or something. I don't think you could fit an equivalent parser for SV2017 in ~7k LOC, much less elaboration, type checking, a netlist database, to all go along with it. SystemVerilog might actually be the most difficult part of the whole equation because it simply has so much surface area. PnR tools are limited by their target: only targeting small iCE40 devices? Your PNR algorithms don't need to be cutting edge. Targeting SV2017? Your job is hard no matter what device you synthesize for. And I can't think of even a single commercial tool I know from any vendor that supports all of it, up-to-date with SV2017.
All that said, I use SystemVerilog as my "normal" RTL when using commercial tools for stitching together IP, wiring up top modules, etc.
My point about SV was that the two major open source simulation tools (Icarus and Verilator) both only support a subset of SV, and not just SV 2017; a lot of SV 2009 is still not supported. Vivado has a free (not open) SV simulator that supports much more of the language. I agree not all of SV is needed for PnR, but what I'm saying is: if we don't have the gcc or clang version of SV for simulation yet (vs MSVC or ICC), then what makes you think we'd get a near commercial-grade synth/PnR tool? If Xilinx opened up their bitstream format, academics would rejoice, but it would not suddenly spur on a huge improvement in open source PnR tooling. In terms of improving the usability of what is there, given Vivado is scriptable, if you want to make a better open one (like an IDE) you can: just call synth_design, etc. in batch. This was what Hier Design were doing, and what turned into Vivado after they were acquired by Xilinx. So my point is lots of open source tooling could exist without opening the bitstream format, so given it largely does not, I am of the opinion opening the bitstream format would not change much.
> but the free part is inherently what many FOSS users value
The free part is valuable not in that it's cheap, but in that it saves you from having to deal with licensing.
DevOps pioneers hailed from the likes of Google, Amazon and Facebook, who are not exactly short on cash, but you simply couldn't do what they did if you had been nickeled and dimed at every VM and container.
I have not benchmarked the open source PnR tools, but I expect they are orders of magnitude worse QoR than what a commercial one can do. I don't know a LOC comparison between SV and PnR, but I'd say both are huge undertakings at a commercial feature set.
> already an open standard yet no open source project supports it fully
Bitstreams are closed. There's little to no point in doing an open source compiler if the target is not just proprietary, but deliberately opaque.
Overall your comment strikes me as what a proprietary compiler advocate would say in the 90s. "GCC? Lol"
Since then, Microsoft had to include Linux in Windows just because they absolutely needed Docker. DevOps was invented based on free/open source, it just couldn't be done proprietary style by a company as large as Microsoft.
I disagree. By far the largest part of SystemVerilog deals with verification, both simulation and formal property proving. These parts have nothing to do with the bitstream formats and the tooling in that area is quite as lacking as the synthesis and PnR tools.
The limitation here is writing the SystemVerilog parser and compiler.
What's the incentive for free software hackers and startups to even begin to work on this if the rest of the stack is not just proprietary, but held by actively hostile entities?
There are other places to start working on the stack which are not as actively hostile as place and route, e.g. simulation.
As for the incentive I'm fairly pessimistic. There is definitely no money to be made for a start-up in this space; it is way too conservative. Maybe the hobbyist intellectual challenge of working on some hard problems like constraint solving or formal property proving? There is a massive task of writing a SystemVerilog parser before you get there though, and the SAT solving and property proving problems are present elsewhere with lower barriers to entry.
Challenges can be stimulating, but there are diminishing returns. It's not like say, lockpicking or DRM-cracking, in that the subject matter is super hard to begin with, even without the proprietary sabotage.
Having said that, there has been some promising F/OSS work on the small Lattice devices. It allows for a decent, modern workflow, and it's possible because the devices are approachable, but also because Lattice hasn't been hostile. Why they haven't been more supportive is a mystery to me however.
TCL itself is not that bad for the purpose IMO; it's more the stuff around it, the proprietary binary formats, the gooey crap, and the non-open nature thereof.
Tcl is the de facto EDA tool scripting language. It's standard in the HW design world - of course that does not stop there being a second alternative scripting language, but not having Tcl would alienate much of the HW design community, so it must be there. As for the HDL, Vivado supports SystemVerilog, VHDL, and C via HLS. I happen to like SystemVerilog; what about it makes it terrible?
TCL is the only language I have ever worked with where a comment would affect the next line. Might have been an interpreter issue, but it was enough for me never to want to touch it again.
SystemVerilog is a good example of an organically grown language with no 'benevolent dictator'. A few pet peeves:
* Why is the simulation delta cycle split into 17 regions? Exactly when does the Pre-Re-NBA region happen and what assignments take place there?
* Why can't a function return a dynamic/associative array or a queue? This is clearly possible, since the array find functions return a queue, but it's not possible to define a user function with this return type.
* It has way too much cruft. E.g. what problem does the forkjoin keyword solve? Who thought that was necessary and why? Not a fork-join block, the forkjoin keyword.
* Why can't you have a modport inside a modport? This would be great for e.g. register interfaces, but modports are not composable.
* What is the difference between a const variable and a localparam and why does the language need both constructs?
* Is a covergroup a class or what? It behaves very much like it is: it has a constructor, some class-local information and at least one class-local function (the sample() function), but you can't extend it.
* Why are begin-end used for scope delimitation everywhere except in constraints where curly brackets are used? I know it was a Cadence donation, but why wasn't the syntax changed before it was merged? Backwards compatibility can only justify so much...
You're right about Tcl: a comment can mess stuff up, as the comment is a command that says do nothing. It's a terrible language, and that may be its worst flaw, but it's still in every EDA tool. It's kind of like how C is still around despite its foot-shooting ability costing billions every year due to security bugs and buffer overflows, etc. If an EDA tool wanted to break the mold and use, say, Python for scripting, they would still likely need to offer a Tcl option. It's very ingrained in industry.
As for SV - a lot of your gripes are Verilog issues, and SV has tried to fix some of them. I agree the blocking/nonblocking rules are a mess, but most folks just learn the rules to avoid issues; delta cycles can be a pain, though. The syntax limitations/quirks you point out are interesting, though not enough to say the language is terrible. It's extremely powerful, with very good composability of types; constrained random is very powerful, the coverage is extensive, and assertions again are very powerful. In a way it's like a few separate languages bolted together, so sure, there is some duplication, but it works surprisingly well as a whole.
I think pricing is also an issue. Anyone with 5 dollars in their pocket can buy an arduino clone and go to town. And many people do as can be seen by the huge hobbyist scene. You want to try FPGA development and do anything that is not blinking a LED? Good luck shelling out hundreds to thousands of dollars for the shittiest software known to this planet.
A Max10 T-Core board from Terasic is $55 academic and tools are free for the Max10 class.
You only start paying for FPGA tools when you need the really big FPGAs.
And, I'll go out on a limb, but, at this point, I think Arduino causes more harm to beginning embedded developers than good. Yeah, the ecosystem is wonderful if you aren't a developer.
However, Arduino is now weird compared to mainstream embedded development. Most things have converged to 32-bit instead of 8-bit. Arm Cortex-M is now mainstream so your architectural understanding is useless. 5V causes a lot of grief given that everybody else in the world is at 3V/3.3V.
A developer basically has to unlearn a bunch of things to move up from an Arduino. I still recommend Arduino to non-developers or somebody just trying to throw together a project, but I no longer recommend them to someone actually trying to learn embedded development.
Just to clarify, there are many Cortex-M* based Arduino or Arduino compatible boards. There's official Arduino-SAMD BSP support, though they do lack the depth of features, like Timers and such. Though it seems 8-bit procs are still common for super cheap MCU's.
The issue is not whether the end user has to pay; the issue is that this kills incentives for free software tools. gcc and BSD were initially developed on machines costing hundreds of thousands of dollars, and that didn't stop them.
I'm optimistic... not so much because of the merits of the acquisition but more so because of AMD's history with strategic actions. ATI kept them afloat through a CPU performance drought, and divesting GlobalFoundries secured necessary liquidity. These two alone essentially saved AMD, so I've got faith in leadership being able to make the appropriate strategic maneuvers.
But maybe I'm being overly optimistic. (Probably because—disclosure—I'm long AMD. Been long for years.)
I'm hoping/expecting a chip that goes into the Epyc/SP3 socket and has the memory & PCIe & socket crossconnect as hard IP but the CPU cores replaced with programmable logic. If you have a use case for FPGAs, it's more likely you want it in a concentrated form like this... not on low-end or desktop systems :/
If I remember correctly, there was something similar back in the early HyperTransport days...
Yeah I think it's both more effective and cheaper to have dual/quad socket systems with 1 "normal" CPU and the rest filled with FPGAs without CPU cores, just to max out on the raw crunching ability. The PCIe block on the FPGA chips could be flexible enough to (re-?)wire directly into the programmable logic, maybe even reconfigurable to other protocols (e.g. 100GE). Also in "normal" NUMA fashion each FPGA would have the memory channels associated with that socket (presumably through the interconnect as if it were a CPU, so the CPU can access it too.)
I'm just looking at this from a logical chain of "who needs FPGAs in their computers?" => "cases with loooots of specific data crunching" => "want a controlling/driving CPU for the complicated parts, but then just concentrate as much FPGA in as possible." => Multi-socket with 1 CPU & rest FPGAs.
(There currently is no commodity Quad-socket SP3 mainboard, not sure if this is a design limitation or just no one made one yet? I'd still say the approach works great with only 2 sockets.)
I wouldn't expect to see anything like this on SP3 anyway, since it would take some time to do the work and by then the current generation would likely be whatever they replace SP3 with in order to support DDR5.
As much as I agree with you and want one for myself too, I doubt that this market segment is interesting to AMD at all. The kinds of workloads that warrant going FPGA are the kind of workloads where you just give your devs a bunch of high-priced development systems. Those would likely be close to identical to the production boxes, just with more debug pieces plugged in.
This opinion is unlikely to be popular, and it's been decades since I was a full participant in the hardware business, but...I just have never seen the use case for FPGAs beyond niche prototyping / small run applications, which by definition make no money. I suppose there are also scenarios where you want to keep your design secret from the fab and/or change it every week, but those seem very niche too (NSA, GCHQ, ..?).
1) You underestimate how critical prototyping has become, again likely since you say it's been a couple decades. Time to market has become more important, and verification has become harder as CPUs have gotten even more complex. FPGAs enable cosimulation and emulation, leading to faster iteration of both design and verification efforts and thus better TTM.
FPGAs are so important in the hardware development process that I would even say you're not a serious hardware company if you don't have any FPGA frameworks to design silicon.
2) As others have mentioned, FPGAs are also critical for low-latency workloads that require constant tweaks-- high frequency trading (ugh...) comes to mind. The need for "constant tweaks" could also be satisfied with just "normal" software, but that has higher latency as opposed to an FPGA, and FPGAs can get some crazy performance if you're willing to pay the price (south of 7 figures).
Overall sure, usage of FPGAs might be niche compared to, idk, Javascript; but it's commonplace/practically essential in hardware.
Whenever discussion of FPGA comes up on HN, someone inevitably points to low latency workflows but nobody ever mentions video capture and play-out boards using FPGAs. Companies like Blackmagic, Elgato, Matrox, etc.
Hardware like this often uses FPGAs because there’s a need for highly parallel processing that is difficult or even impossible to do on an off-the-shelf CPU, but the volumes are too low to justify a custom ASIC. Being able to fix bugs or add features after shipping is a big bonus too.
It is very likely that the packets of this comment traveled through several FPGAs to get from your computer to my screen. Yes, they are definitely more niche than CPUs. But niche products have really high margins and people willing to pay for them.
FPGAs are already incredibly popular. They're just mostly in things you are unlikely to personally own or know about. You're going to find at minimum one, but probably more FPGAs in things like big routers and other telecom equipment, e.g. cell towers, firewalls, load balancers, enterprise wifi controllers, video conferencing hardware, test equipment like oscilloscopes, sensor buoys, scientific instruments, MRI machines, LIDARs, high end radio equipment, or even just glue logic tying together other components, like in the iphone.
I am not sure whether you are serious or trolling but I will bite ;-)
FPGAs are being used in many types of applications where real-time operation is necessary and non-recurring engineering (NRE) cost needs to be minimized, for example here [1].
One classic example: if you poke under the hood of any signal generator like an AWG, you will probably find an FPGA inside. As you are probably aware, since you are in the hardware business, AWGs are probably one of the most common pieces of equipment in any electronics lab or company.
> I just have never seen the use case for FPGAs beyond niche prototyping / small run applications, which by definition make no money.
You are precisely correct. FPGAs are useful when your volume doesn't reach volumes where an ASIC would get amortized.
Networking companies (Cisco, Juniper, etc.) are classically big consumers of FPGAs.
Tektronix seems to make quite a bit of money and there is at least one FPGA in practically every test instrument they make. This holds true for practically all test instrument manufacturers.
I know a LOT of industrial automation and testing companies that generally have FPGAs in their systems. Both for latency and for legacy support (Yeah, GPIB still exists ...).
Yes, they aren't "Arm in a cell phone" type volumes, but that doesn't mean they aren't quite profitable if you can aggregate them.
Easily changing the design and being the cheaper option to ASICs for small productions are the two main uses for FPGAs. You may be designing a box that can be configured to do different things so you may want to support multiple FPGA images to switch back and forth depending on the mission. You may just want to be able to easily upgrade firmware for a complex design in the future. For Space DSP applications, the FPGA is king and will probably be for a long time simply due to the ability to cram a lot of functionality into a small space (DSP, microcontroller, combinational logic circuits, and massive I/O banks all in one chip)
Likely just simple glue logic. Things like converting one protocol into another, doing some multiplexing or some simple pre-processing or filtering on some sensor data. They're incredibly tiny (2x2mm) and use little power, so they pop up in designs pretty regularly.
It's the usual "fundamental difficulty" with FPGAs -- CPUs and GPUs are faster and more power efficient for compute-intensive tasks. An algorithm on FPGA needs to overcome the 20x worse architectural efficiency just to break even with a CPU or GPU.
The big benefit of having FPGA closely attached to CPU is that you can access the memory and internal buses quickly. Transferring stuff over PCIe hurts a lot. So you could make an argument for jobs using small work units requiring fast turnaround; CUDA kernels take milliseconds to launch.
I worked with some of the early Xeon+FPGA parts and there just wasn't that much we could do with them. There wasn't enough fabric to build anything meaningful and we had an abundance of CPU cores, so the best we could do was specialized I/O accelerators.
I think the more relevant comparison here would be ASICs. Soft cores on FPGAs are indeed terrible, but if you're implementing some algorithm directly at the gate level for cryptography or signal processing or whatever, then being able to arrange inputs and outputs into dataflows is a big win, with no round trips to general-purpose registers or bypass networks. Not having to fetch instructions and not being limited in parallelism are also big wins. And generally, if you're doing something like mining bitcoin, you should expect an FPGA to perform somewhere between an ASIC and a GPU.
The problem is that if a task is common then someone is just going to make an ASIC to do it. And if it's uncommon then the terrible FPGA software ecosystem and low prevalence of general-purpose FPGAs in the wild mean that people will just do it on a CPU or GPU.
> if you're implementing some algorithm directly at the gate level for cryptography or signal processing or whatever, then being able to arrange inputs and outputs into dataflows is a big win, with no round trips to general-purpose registers or bypass networks
This is true, but keep in mind that that sort of algorithm runs insanely well on any CPU or GPU because they, too, do not want to touch main memory. You would be blown away by how much work a CPU can do if you can keep the working set within L1 cache.
Re. ASICs, it's a continuum:
- "flexible, low performance, cheap in small quantities" (CPUs)
- "reasonably flexible, better performance, cheap-ish in small quantities" (GPUs)
- "inflexible, best performance, expensive in small quantities" (ASICs)
FPGAs fit somewhere between GPUs and ASICs -- poor flexibility, maybe great performance, moderate small-quantity price.
If your problem is too big for GPUs, as you say, sometimes it's easiest to jump straight to an ASIC. But it's such a narrow window in the HPC landscape. The vast majority of customers, even with large problems, are just buying a lot of GPUs. They're using off-the-shelf frameworks even though a custom CUDA kernel would give them 10x performance and 10% cost. The cost to go to an FPGA is too great and the performance gain simply isn't there.
I'm skeptical as well. The primary reason IMO is the software. How do you easily reconfigure your FPGA to efficiently run whatever computationally intensive and/or specialized algorithm you have?
It is doable. I've seen it during my Computer Engineering courses 14 years ago.
Basically you analyze the code for candidates, select a candidate, upload your custom hardware design, run your operation on the hardware, and repeat.
The difficult part is that uploading your hardware design to the FPGA is on the order of tenths of seconds, which is ages compared to the nanoseconds and microseconds your CPU works in.
So your specific operation must be worthwhile to upload.
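A minimal sketch of that break-even decision (all the timings below are placeholders, just to show the shape of the trade-off):

```python
# Offloading to the FPGA only pays off if the CPU time saved outweighs the
# one-off cost of loading the bitstream (plus any data-transfer overhead).
# All timings below are placeholders.

def worth_offloading(calls, cpu_s_per_call, fpga_s_per_call, reconfig_s):
    cpu_total  = calls * cpu_s_per_call
    fpga_total = reconfig_s + calls * fpga_s_per_call
    return fpga_total < cpu_total

print(worth_offloading(1_000,   50e-6, 5e-6, 0.2))   # False: too few calls
print(worth_offloading(100_000, 50e-6, 5e-6, 0.2))   # True: reconfig amortized
```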
A bit of FPGA on your CPU makes it more flexible; for example, you could set a profile such as 'crypto' or 'video' to add some specific hardware acceleration to your general-purpose CPU.
Imagine your CPU being able to switch your embedded GPU into another CPU core.
Let's say the current Zen 2 had an FPGA onboard. AMD could sell you an upgraded design with AV1 support for a few dollars. Most people aren't going to buy a new CPU on the basis of a video decoder, but they'll buy an upgrade to the chip that auto-"installs" itself. That's a sale AMD otherwise wouldn't have made.
Also, for the way most modern CPUs are used: how do you task switch? If the hardware is large enough, you can deploy multiple configurations at a time, but does software support that? Is it possible to have relocatable configurations?
In theory, you could even page out code, but I guess the speed of that will be slow. Also, paging in probably would be challenging because the logical units aren’t uniform (if only because not all of them will be connected to external wires)
This can be used with a client-server model, that is if there are enough free cells and I/O available on FPGA it could let it install the configuration and then any application could communicate with it concurrently, maybe with some basic auth.
But from what I understand of FPGAs, fragmentation would be a serious issue. You may have the free cells and I/O you need to implement some circuit, but if they’re dispersed over your FPGA or even connected, but in the wrong shape for the circuit you’re building, that’s useless.
An enormous crossbar could solve that, but I would think that would be way too costly, if practically possible at all.
Even GPUs multitask all the time, even though it's less obvious. Cooperative multitasking in this context means setting up and executing different shaders/kernels. The overhead involved in this is quite manageable.
Repurposing FPGAs to different tasks means loading a new bitstream into the device every time. So it is much more efficient to grant exclusive access to each user of the device for long stretches of time. The proper pattern for that is more like a job queue.
I believe there is some amount of support in OpenCL for FPGAs. If only we could get companies to properly support OpenCL, we'd have a nice software interface to pretty much any kind of compute resource on a machine.
You're not wrong but I expect they'd make it so that the various models would be similar enough (at least within a given CPU generation) so that you could use mostly precompiled artifacts instead of rerouting everything from scratch.
I've always been pretty skeptical of their approach though, in order to be usable they'd need excellent tooling to support the feature, and if there's one thing that existing FPGA software isn't it's "excellent".
Getting FPGAs to perform well is often an art more than a science ("hey guys, let's try a different seed to see if we get better timings") so the idea that non-hardware people would start to routinely generate FPGA bitstreams for their projects is so implausible that it's almost comical to me.
Maybe one day we'll have a GCC/LLVM for FPGAs and it'll be a different story.
Beyond the GCC/LLVM, you also really need a standard library. Nobody is talking about that. Today, if you want a std::map on an FPGA, you have to either pay $100k or build it yourself. That's untenable.
Apparently after the Altera acquisition they sought "synergies" in all the different divisions. My friend was an intern who was tasked with porting some of the network protocol stack to SystemVerilog. Apparently it did work, and SystemVerilog was the right HDL to use because of support for structs that can map to packet headers. I'm not sure it's being used in production.
It'd be interesting to see how AMD will execute and integrate this acquisition, considering they are less of a madhouse company than Intel.
It absolutely seems like there are some incredible opportunities in the high end. But as far as I know, FPGAs are quite area hungry which makes them inherently expensive. It's hard to think you'd find FPGAs of meaningful size included in $60 desktop CPUs, unless the harvesting opportunity is significant.
It is really funny when you find out that Intel uses Xilinx FPGAs for prototyping, as they cannot get what they acquired (Altera) working in house.
I don't work for Intel but I do work for a semiconductor company. While Xilinx FPGAs aren't directly used for prototyping, there are a large number of third-party boxes purchased to accelerate hardware simulations, and they're chock-full of FPGAs.
That's the more likely explanation. Altera vs Xilinx isn't just the hardware, it's an entirely different toolchain. It would be insane of Intel to demand third parties to move all their technology over to Altera's.
I imagine they are using a 3rd party prototyping solution like HAPS from Synopsys, which uses Xilinx FPGAs inside - for good reason; for quite some time Xilinx have had some very large devices built specifically for this market. It must sting a little bit though....
Xilinx are very flexible and the tooling makes them even more effective.
If AMD wants to get serious in the datacenter/AI/ML space they need a Xilinx-like approach to developing tooling. CUDA, NVENC, cuDNN, etc. crap all over AMD's offerings in the same space, where those offerings even exist.
AMD is prepping to take over datacenter turf and this puts them in a good place to bring a bigger offering.
I would rather see Processing in Memory (PIM) become mainstream than FPGAs. FPGAs are basically an assembly line that you can change overnight. Excellent at one task and they minimize end to end latency but if it's actually about performance you are entirely dependent on the DSP slices.
With PIM your CPU resources grow with the size of your memory. All you have to do is partition your data and then just write regular C code with the only difference being that it is executed by a processor inside your RAM.
Having more cores is basically the same thing as having more DSP slices. Since those cores are directly embedded inside memory they have high data locality which is basically the only other benefit FPGAs have over CPUs (assuming same number of DSP and cores). Obviously it's easier to program than either GPUs or FPGAs.
You're comparing two completely different paradigms.
FPGAs are not an assembly line at all; the assembly line analogy applies much more closely to a processor's pipeline.
FPGAs are just a massive set of very simple logic units which can be interconnected in many different ways. FPGAs are best used in situations where you want to perform a series of simple operations on a massive incoming dataset, in parallel, especially in real-time situations. Performing domain transforms on data coming in from sensor arrays is one very good application for FPGAs.
I think GP meant in the sense that reconfiguration time is large. FPGAs cannot be effectively time-division multiplexed, as a full reconfiguration can take up to tens of seconds.
GP is also correct that DSP/SRAM blocks are critical to performance. FPGAs are not very efficient at raw compute if you have to synthesize everything out of LEs.
The performance benefit of FPGAs, which PIMs also share (in theory, there aren't any PIMs ready for real-world deployment AFAIK) is that they can leverage much larger memory bandwidths than general purpose CPUs can. An FPGA might run at a lower clock rate (low 100s of MHz), but be able to operate on several kb per clock cycle. This can work really well when paired with off-chip logic to convert high rate serial interfaces to lower clock rate parallel interfaces, then back after the FPGA is done processing.
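As a rough illustration of that width-vs-clock trade-off (the bus widths and clock rates below are hypothetical):

```python
# Effective bandwidth = bus width * clock rate. A modest FPGA clock on a very
# wide internal bus can beat a much faster but narrower datapath.
# Widths and clocks below are hypothetical.

def gbytes_per_s(bus_bits, clock_hz):
    return bus_bits / 8 * clock_hz / 1e9

print(gbytes_per_s(2048, 300e6))  # wide FPGA fabric bus @ 300 MHz -> ~77 GB/s
print(gbytes_per_s(64,   3.5e9))  # 64-bit datapath @ 3.5 GHz      -> 28 GB/s
```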
There is also a lot of work going on in the space of time-division multiplexing FPGAs effectively. The two main approaches are overlay architectures and partial reconfiguration. The former implements another high-level fabric on top of the FPGA which will be less general-purpose, but can be reconfigured faster. The latter is a feature vendors have added to some high-end chips where specific regions of the FPGA can be reconfigured without affecting other regions.
I agree with your statements regarding reconfiguration and TDM, though I still think GP (and to a lesser extent, your comment) are very focused on traditional computing paradigms. FPGAs are much more promising for real-time systems, particularly those with very large incoming datasets to transform or otherwise process in parallel. Thinking about FPGAs in terms of how 'quickly' they process data is really missing the point IMO.
One common, and very good application for FPGAs is for use in Active Electronically Scanned Array radar, sonar, or camera image processing. You can perform parallel filtering and transforms with various frequency and phase settings, which would be impossible for a similarly-sized processor to do.
FPGAs have the potential to revolutionize sensor arrays, by making them much more useful and affordable.
I agree yes. "Traditional computing paradigms" are (IMO) not all that interesting as research topics at this point. As far as I know, most of the work in that space is in branch prediction and cache replacement policies.
FPGAs are what you really want when you need to deal with high resolution data that is coming in at very high data rates. Often even a very fast general-purpose processor with hand-tuned assembly simply won't have even the theoretical memory throughput to process your data without "dropping frames". They also have the benefit of deterministic performance, which with modern caching/branch prediction systems you can't guarantee (AFAIK, my computer architecture knowledge isn't that cutting edge).
They can also work really well if you have some computation you want to do that is so far off the beaten path for general-purpose processors (or so memory bound) that FPGAs can take the cake.
There is also some work in sprinkling even more hard logic into FPGA dies, like processors or accelerator cores for various applications. FPGAs are great for implementing the glue logic to move data between those.
I think you touched on one of the biggest things about FPGAs in your comment, which is that they are perfect for computation that does not involve branches. If you've got a lot of data, and you're doing transforms, you usually don't need to branch, so being able to crunch everything through in parallel is a massive benefit.
Also agree that additional hard logic or peripherals will be a game-changer for FPGAs, though they would make each design more domain-specific. Alternatively, we may see a shift in how the interconnects are done, which allows for flexible use of these 'modules'. It's also possible that we'll see continual increases in LE counts which make more specialized hardware unnecessary. I don't know which way things will go.
Are FPGAs rewritable at will with almost no degradation (for example a rewrite every minute over many days), or do they suffer the same degradation problems as EEPROMs (like the ones in Arduinos)?
FPGAs use SRAM to store their program, while CPLDs (complex programmable logic devices) use flash. Some clever marketeers here & there will stretch this distinction but it's an established convention. The internal architecture between FPGAs and CPLDs is typically different, based on cost of memory vs. logic and typical use cases. FPGAs tend to be used for higher-capacity computations but require more life support; CPLDs tend to serve smaller, true glue logic applications, where the low config overhead (just apply power) and quicker & simpler power-up is a strong pull.
So CPLDs will have some kind of NVRAM wear-out concern, and this is almost always specified as a number of maximum erase & program cycles.
Though you could presumably just do the same thing with new instructions, i.e. have an instruction for secure zeroing which zeros the data in any memory or cache where it exists but doesn't cause the zeros to be cached anywhere they weren't already.
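A small sketch of that idea from the C side. secure_zero() below is what we can do today: a wipe the compiler may not elide, but one that still drags the cleared lines into cache. The commented-out call marks where the proposed instruction would slot in; it is purely hypothetical and does not exist anywhere.

    #include <stddef.h>

    /* Today's approach: a volatile byte-wise wipe the compiler cannot optimize away.
     * Downside: it pulls the cleared lines into cache, which is exactly what the
     * proposed instruction would avoid. */
    static void secure_zero(void *p, size_t n)
    {
        volatile unsigned char *v = p;
        while (n--)
            *v++ = 0;
    }

    void wipe_key(unsigned char key[32])
    {
        secure_zero(key, 32);
        /* Proposed alternative (NOT a real intrinsic, illustration only):
         *   _hypothetical_zero_line(key, 32);
         * i.e. zero the data wherever it currently lives, in DRAM or in any
         * cache level, without allocating new lines just to hold the zeros. */
    }

    int main(void)
    {
        unsigned char key[32] = { 0xAA };  /* pretend key material */
        wipe_key(key);
        return (int)key[0];                /* 0 after the wipe */
    }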
The surface area for security vulnerabilities is already impossibly high. Do we really want to add "firmware running on a DIMM exfiltrating key material" to that list?
There are security problems with every architecture. There is no fundamental reason PIM should be less secure than what we do now. This is just fear of the unknown talking.
I hope they will not drop their CPLD chips. They were made obsolete at least once, but Xilinx fortunately decided to extend support for a couple more years. CPLDs are very useful for repairing vintage gear where logic components fail and are no longer available (for example, custom-programmed PALs); you can describe the logic in Verilog and often solder the CPLD in place of multiple chips.
If they drop them, then the only way to do it would be to use a full-blown FPGA, which is a bit wasteful.
One that has tools that don't make your users hate you. Seriously, the open-source FPGA toolchains are a breath of fresh air to use, despite being small projects with few contributors (although due to that, and to no vendor support, they are severely limited in supported targets and special features).
Yep. The Icestorm toolchain for Lattice FPGAs is a real breath of fresh air -- fast compile times, multiple sets of interoperable tools, open file formats, development in the open... it's great. I just wish something like this was available for more than just Lattice parts.
FPGA development tools are generally dated, very very expensive, and one-way streets for customisation.
From what I understand, open sourcing the bitstream format in its entirety will only do so much, but it would certainly help. It's not just a matter of building GCC for FPGAs.
Just better tools would be nice (and open-sourcing would bring some hope for that). FPGA tooling is atrocious, especially if you're used to software tooling. And the difference in tooling can sell chips all on its own.
The Xilinx Zynq and Zynq UltraScale+ series pair multi-GHz ARM cores with FPGA fabric. They're incredibly useful for small-volume niche use cases and, to give an example from my industry, are becoming popular in space applications. The reason is that hardware qualification/verification is extremely expensive, but a change to FPGA fabric is not.
My point is Xilinx have already proven ARM CPU+FPGA on one die and I think AMD CPU+FPGA is very likely to be a success.
Between this, ARM adoption, Apple Silicon and similar offerings (which kind of skipped ARM+FPGA and went straight to ARM+ASIC), and RISC-V, it's like 1992 again, with exciting architectures. Only this time software abstraction is much better, so there is not a huge pressure to converge on only 1-2 architectures.
Could be interesting. I prefer an independent Xilinx, but maybe competition with Intel will stimulate the whole reconfigurable computing revolution that fizzled out.
I don't use FPGAs (tooling is too poor, languages are bad, up-front costs are high) but I hang out on FPGA forums and the overwhelming consensus has been bad. Chipmakers and especially high-performance chipmakers have always been focused on high-volume and/or high-margin customers, but the Intel acquisition has made Altera worse in that regard. Their sales and support teams were integrated into Intel and now you can't get any support from them whatsoever even if you spend $MM/yr. You need to funnel even basic questions and bug reports through a distributor contact to have any chance. I forget the specifics but they made tooling even more restrictive/expensive. The only new products out of it are a few Xeons with built-in FPGA ($$$$$), good for HFT guys I guess.
Can you expand on why Intel’s move was smart (what did the Altera acquisition do for them) and why FPGAs have a bright future in the datacenter?
From what little I’ve seen in this space, FPGAs have not made large inroads in the ML space or datacenters in general. This seems partly due to their inefficiency compared to ASICs, and moreover to their software.
Unless AMD is planning something really ambitious (e.g., true software-based hardware reconfiguration that doesn’t require HDL knowledge) and are confident they’ve figured it out, I’m not sure what they hope to achieve here.
Both Altera and Xilinx were on TSMC. Altera wanted an edge over Xilinx; at the time, Intel was committing (on paper) to their Custom Foundry, so Altera switched and bet on Intel Custom Foundry. Nothing ever worked out with Intel Custom Foundry because Intel was not used to working with others on a foundry process. Intel thought the problem was Altera not being part of the company, and they had too much cash, so they might as well buy them for better synergy. And it did help: getting internal access seems to have (on paper or slides) sped things up with product launches and the roadmap, until they hit the Intel 10nm fiasco.
Altera Stratix V FPGAs actually had more market share than Virtex 7s. They were better chips. That said, the production delays around Arria 10 and Stratix 10 and the time lag caused by the Intel acquisition totally killed their market position. The only reasons to use Intel FPGAs now are (1) 64-bit floating point support or (2) if your Intel salesman gives you a really good deal.
It may have been Xilinx not wanting to get into bed with Intel. Xilinx may have wanted a degree of technical independence or freedom to carry out their own strategy that was not forthcoming from Intel.
Word on the street is that this was a vanity project of a VP, and never resulted in performance levels that couldn't be achieved with a little bit of focused optimization of boring old CPU work (threading + SIMD).
There's been a recent trend to increasingly move more compute capabilities into NICs. This has been going on for a while, but has gained a new dimension with cloud providers. For example, with their "Nitro" system, AWS can more or less run their Hypervisor entirely on the NIC and completely offload the network and storage virtualization from their servers. This development is likely to continue. FPGAs are going to play a significant part in that because they allow the customers to reconfigure this hardware according to their needs.
Virtual machines are very much a thing now, and virtualisation has made it into network cards reasonably well ... but pretty well nothing else.
In our future datacentre we want to say how many cores, connected to how much RAM, how much GPU resource, some NVMe, etc., and there's going to be a whole lot of very specialised switching and tunnelling going on. This needs to be as close to the cores/cache as possible, a good order of magnitude faster than we run our present networking, and it's probably an area where there will be a significant pace of development, i.e. a software-defined solution would be nice.
So, a software defined north bridge, in essence. And an FPGA is pretty much the only thing we have right now that could do the job.
Because an FPGA lets you optimize your "hardware" solution to a computing problem without the hassle of fabricating a chip of your own (although the performance with an FPGA is much lower than with a custom chip).
I understand that they need a big push in the DPU market, but I do not understand why companies as big as AMD do not invest and build what they need in house. If anyone can gather the talent, it is AMD. Everyone was talking about future data centers, and as far as I can tell I have been hearing about heterogeneous I/O since 2009 (and that's just me; I was hearing it while working on Xen).
To answer my own question: maybe the market is so volatile that they cannot do strategic planning like that?
With larger purchases like this one that can still be part of the equation, though there is also the matter of lead times needed to bring a significant team and related infrastructure needed for the project(s) online and up to speed.
Also if a company is seen as ripe for buying, it can sometimes be done in part to stop a competitor getting a chance at the above advantages.
Discipline of the company is more important than the talent. Oracle has a lot of talent, but they lack the discipline to generate anything novel; they buy the idea. Anyhow, the semiconductor industry is different. Apple and Amazon played it better, in my opinion, although in vastly different markets.
This is a good point, but it mostly matters when you are in the middle of developing the strategy, so that you can protect it and have a robust plan. Maybe they are!
Hmm, this was rumored but I guess now it is actually happening. Nice bump on the share price there I guess, it’s currently trading at around $115 and it seems to be converted to $143 in AMD. I assume this is to help AMD push more into the server and ML compute spaces?
I would also like AMD to invest in ML tooling while they have the cash.
I hope one day PyTorch, XLA, and Glow will have native AMDGPU support, and I will be able to buy a couple of Radeon 6000 series cards, undervolt them, and make a good ML box.
I think AMD GPUs on TSMC 7nm, and then maybe even 5nm, will have the best performance per watt, even if they might be 10% or 20% slower than the alternative. For me, performance per watt and per dollar is more important.
Anyway, it's sad that they couldn't put together a 5-to-10-person engineering team (I might be too optimistic) that would make their product relevant in this market.
I'd like to see consumer-level CPU + GPU + FPGA products that emulators could take advantage of. I'm thinking of floating point math for PS2 right now, but I'm sure there are other examples where an FPGA could be beneficial.
The apparent driver here isn’t AMD wanting to get into the FPGA business. The real motivation appears to be a combination of platforms and programmable chiplets. There are two problems that programmable chips address: https://semiengineering.com/amd-wants-an-fpga-company-too/
In the stock trading world, HFT on FPGA is closer to the edge than GPU / TPU solutions and they are using ML models, if you count that as AI. When the logic and the NIC are on the same hardware, it's really fast. An ASIC would be even faster, but you can't really iterate on that.
Their new Versal line puts a TPU on the die with the FPGA. Great for inference, especially if you want to use the FPGA to quickly extract features and then infer from feature space.
Perhaps they would not make good competition. FPGAs have been known to be slower than ASICs. But then again, perhaps some other company will find a good use for rapidly changing IC design.
For those in the ASIC and chip design industry, two of the largest chip companies, namely Intel and AMD, buying the two largest FPGA companies was inevitable; it was just a matter of "when" rather than "if".
I think the more interesting news is what they are going to do proactively with these mergers rather than just sitting on them.
I really hope their respective CEOs will take a page from the open source Linux/Android and GCC/LLVM revolutions. I'd say the chip-making companies are the ones that benefit most from these open source movements, not the end users. To understand this situation we need to understand the economics of complementary goods or commodities [1].
In the case of chip makers, if the price of designing/researching/maintaining OSes like Linux/Android and the compiler infrastructure is minimized (i.e. close to zero), they can basically sell their processor hardware at a premium price with handsome profits. If, on the other hand, the OSes and compilers are expensive, their profit will be inversely proportional to the prices of these complementary elements (e.g. OSes & compilers).
Unfortunately as of now, the design tools or CAD software for hardware design and programming, and also parallel processing design tools are prohibitively expensive, disjointed and cumbersome (hence expensive manpower), and if you're in the industry you know that it's not an exaggeration.
Having said that, I think it's best for Intel/AMD and the chip design industry to fund and promote robust free and open source software development tools for their ASIC design, including CPU/GPU/TPU/FPGA combo designs.
IMHO, ETH Zurich's LLHD [2] and Chris Lattner's LLVM effort on MLIR [3] are moving in the right direction for pushing the envelope and consolidating these tools (i.e. one design tool to rule them all). If any Intel or AMD folks are reading this, you need to knock on your CEO/CTO's door and convince them to make these complementary commodities (design and programming tools) as good and as cheap as possible, or better, free.
I don't recall where I read this, but hardware vendors have been trying to commoditize software, and vice versa.
It's really obvious when you think about it. If you sell nails, you want to make sure that everyone has or can afford a hammer, and hammer manufacturers like to make sure that there is a large supply of compatible nails.
As much as I would like to see it, I am not sure the equation is that simple in the case of CAD software. Sure, that would make it easier to use FPGAs, but it would also, at a stretch, make it easier to create competing products.
I still think it's worth it, and wish the bitstream format were documented, at the very least.
May well be. Even if they have no current use for them, one would really want a patent bulletproof vest in case the king-of-the-hill battle intensifies.
This comment is off topic, but while I'm listening to the earnings call, I don't hear anything specific about official PyTorch and TensorFlow support for AMD graphics cards. All the questions and answers are generic, with buzzwords like AI and "doubling down on our software support", but it doesn't give me confidence to change my NVIDIA GPU to an AMD one for the foreseeable future.
I remember the time when Elon Musk said to an analyst that he's asking boring questions to fill in his spreadsheet, and I'm feeling the same thing while listening to the earnings call.
My impression is that the accelerated computing side of AMD is receiving far, far too little attention. For example, their flagship GPU is still not officially supported by ROCm (AMD's answer to CUDA) [1]. Imagine the 2080 Ti not being supported by CUDA.
I've become a huge AMD fan, both because of their hardware and because of their commitment to open source. But while the battles they have won against Intel on the x86 side are impressive, it seems that CUDA is leaving them far behind.
"AMD ROCm is validated for GPU compute hardware such as AMD Radeon Instinct GPUs. Other AMD cards may work, however, they are not officially supported at this time."
It seems like Lisa Su thinks that there's a separate "gamer market" and "accelerator market".
Jensen Huang understands that the same person can like to play games and train machine learning models on the same machine.
I'd love to switch to an AMD CPU to get a portable laptop with low resource usage, as I spend most of my time travelling. But since GPUs in the cloud are overpriced (thanks to Jensen's separate pricing for servers) and hotel internet is unpredictable, I don't want to train models in the cloud.
Anyways, Lisa said that she reads all comments about AMD, so I hope she'll listen :)
It likely will never have support. AMD chose to bifurcate their GPU designs into compute (CDNA) and games (RDNA) lines, with different architectures. RDNA sheds all the fancy features needed to support modern compute, thus gets more efficient in games that do not use it, but also cannot support the modern compute APIs.
NVIDIA is adding more features, like super-scaling, to games, and machine learning models are improving faster than Moore's law. I expect fancy features like tensor cores to be a must for 4K gaming in the future.
What's funny is that the same strategy (leaving out specialized instructions from consumer level hardware) that worked extremely well for CPUs won't work for GPUs in my opinion.
If you look at ray tracing hardware (I have it on my RTX 2070 Max-Q card in my laptop), it sucks right now, but it's improving very fast as machine learning algorithms improve.
One thing that I forgot is that AMD can just focus on inferencing hardware (INT16 operations), and leave out tensor cores...so actually you are right, I'll just stay with NVIDIA GPUs.
Yes, I posted another link [1] for the earnings call, but it didn't get much traction / upvotes. Although I have to admit acquiring Xilinx is much bigger news.
Judging from watching financial news and reading analysts' comments for years, my feeling is that their job is not to push for hard questions or honest answers. Their job is to push whatever interest they have in the company: spin for better long-term prospects and downplay risk.
I was happy with the Enterprise results (+116% YoY) until I read this:
>Revenue was higher year-over-year and quarter-over-quarter due to higher semi-custom product sales and increased EPYC processor sales.
Semi-Custom is definitely PS5 and Xbox.
Basically I still don't see EPYC making enough inroads in the server market. And this is worrying: while the stock, reviews, and hype are all going to AMD, no results so far have shown that Intel is hurting or that AMD is making big gains in market share and revenue share.
The only good part, I guess, is Ryzen Mobile's contribution to the Computing and Graphics segment.
Earnings calls never seem to have any hard questions. There was a projections miss a few quarters ago (very rare for AMD), and even then the questions felt more like PR.
Semi-Custom is indeed PS5 and Xbox. This increase was expected of course. It's lower margins than other segments though.
Epyc adoption is indeed slower than I would've liked. But so far it has matched or beat short and long term projections from AMD.
Enterprise is weird. AMD has better price and performance? Let's buy Intel. Meltdown lowers performance? Let's compensate by buying more Intel. Intel is supply constrained? Let's complain, and still buy Intel.
My guess is Intel is still pressuring OEMs to favor Intel. Cloud providers are increasing adoption though, and there have been a few nice HPC wins.
And let's not forget that AMD is selling every chip they can make; TSMC production is fully booked.
Earnings calls do have hard questions in them, but you might miss them because they're asked with a big dose of circumlocution.
If you look at the incentives a bit, management gets to decide which analysts can ask questions, so analysts need to stay on management's good side.
Analysts know the company's figures inside and out (often have a 10-tab spreadsheet with an extensive operational model of the company), and are asking questions to tweak key model assumptions.
So analysts ask pointed questions in jargon shared with management. You don't ask 'are you seeing a big sales drop because the crypto bubble blew up?'; you say 'can you provide some color on when your excess inventory will clear from the channel?'
Analysts get their answers. Management avoids bad headlines written by casual listeners.
There's an additional layer, which is the analysts know the industry and company very well, so general bad things are already background knowledge (there's no reason to ask about them). If you want to know what the analyst already knows, then pay for their report—they're not reporters fishing for a sound bite.
My investment in AMD is so small that it seems silly to pay for analyst reports.
I don't care about short term, and I have enough faith in my own research to keep believing in a brighter long term.
Are there any public resources dissecting an earnings call? Doesn't have to be recent or AMD.
You don't get much information from earnings calls, only how analysts are thinking (usually short term). Product reviews on HN, tech-oriented in-depth sites, YouTube reviews from gamers, long-term analysts like Ark Invest, and even GitHub issues/comments are much better long-term predictors of success or failure.
Just as an example, I remember reading a lot about AirBnB here on HN when it wasn't even known in Eastern Europe. I suggested he use it to rent out the luxury apartment he had just bought there, and he was the first one to rent out a luxury apartment in the country (somebody from AirBnB's management even flew there personally). He made lots of money from rental fees, of course, but AirBnB also got incredibly successful. There are lots of other examples, of course; this was just the least controversial one that I can write here :)
From a personal point of view of somebody who just upgraded to an AMD-based desktop from an Intel one, the CPUs work great. It's the software support that feels half-baked. Enabling memory integrity blue-screens my computer. I need to update the chipset drivers every month because AMD is still dialing in their scheduler and frequency scaling over a year after release, and the idle power usage is inexcusable.
AMD's performance is fantastic, which is why I was fine getting it for a home gaming desktop, but I'm not sure I'd be willing to pull the trigger on AMD on an enterprise server buildout. Intel still simply (actually or otherwise) feels more reliable.
The server market moves and replaces slowly. Even when Intel was beating AMD by 30% or more it still took years for AMD percentages to drop.
AMD EPYC on AWS is 10.42% cheaper than Intel per hour across the board for m5 instances (9.6% cheaper for t3 instances). 7nm EPYC saves more than 10% power vs Intel 14nm and per-chip savings from buying AMD are way more than 10% vs similar Intel offerings. Why can Amazon spike prices that much? Because people will pay and still consider it a deal.
AMD's main issue still seems to be availability, due to so much competition for 7nm capacity and TSMC being reluctant to build new fabs.
This keeps coming up every time there is "perceived" shortage of capacity.
TSMC doesn't have a capacity problem. They are very much willing to build fabs if their customers commit to it, but no company other than Apple has ever done that. AMD could have placed a large order over the course of the year, basically a bet that they will sell that many chips, and TSMC would adjust or build accordingly. The problem is that no company is willing to place that bet. What if the chips don't sell and they're stuck with a big pile of them?
This is simple Supply Chain Management, it is the same principle in every other industry.
Because outside of very specific cases like solar panels, displays, and LEDs, spinning up new fabs seems to be quite risky. The investment costs are huge, and TSMC seems very much content to make major bucks with the fewer customers chasing the latest and greatest node.
The more fabs they have, the slower their node progression will be, and the cost of each new node these days seems to grow almost exponentially.
I can imagine that switching from Intel to AMD takes a bit of time for servers, as supporting 2 architectures at the same time is usually bad news for 99%-ile latencies for web services. At the same time x86 is a mature instruction set (as long as you don't use 512 bit vectorization, where things are getting tricky), so the transition shouldn't be that hard.
Interesting times.