OpenCAPI Unveiled: AMD, IBM, Google, Xilinx, Micron and Mellanox Join Forces (anandtech.com)
133 points by ajdlinux on Oct 14, 2016 | 54 comments



I really enjoy the way AMD and Nvidia fight.

Nvidia makes G-Sync, so the GPU can control and adjust the display refresh rate on the fly. Proprietary, closed source, requires a private Nvidia license. AMD makes FreeSync: open, royalty-free, and added to the DisplayPort and HDMI standards.

Now Nvidia makes NVLink: proprietary, very fast, requires a private Nvidia license. AMD partners with IBM, Google, etc. to make OpenCAPI, an open standard that can revise/replace PCIe 3.0.

Why does this feel like Microsoft vs. Linux, but with hardware standards?


> I really enjoy the way AMD and Nvidia fight

That's why I was so sad AMD was having trouble lately, and I hope they pull through. Competition is valuable and benefits the consumer!


Linus gives the best answer: https://youtu.be/IVpOyKCNZYw

tl;dr: Nvidia, fuck you.


It was buried so deep in the article that it's worth pointing out: OpenCAPI == "Open Coherent Accelerator Processor Interface"


Thanks for posting this. What does "coherent" mean in this context, and why is it special?


Basically, https://en.wikipedia.org/wiki/Cache_coherence. You want to ensure all the devices on the bus have the same view of memory contents.
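
To make "same view of memory" concrete, here's a minimal illustrative C sketch (mine, not from the article): with a coherent attach like CAPI/OpenCAPI, a CPU and an accelerator can hand off data through ordinary loads and stores, with no explicit DMA copies or cache flushes, because the interconnect keeps their cached copies of those lines in sync. (Here the "accelerator" is just a second function, to show the programming model rather than real hardware.)

    #include <stdatomic.h>
    #include <stdio.h>

    static int data;          /* payload produced by the "CPU" side   */
    static atomic_int ready;  /* flag the "accelerator" side polls    */

    void cpu_side(void) {
        data = 42;                                               /* produce */
        atomic_store_explicit(&ready, 1, memory_order_release);  /* publish */
    }

    void accel_side(void) {
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                             /* spin until the flag is set   */
        printf("saw %d\n", data);         /* coherence: fresh value seen  */
    }

    int main(void) { cpu_side(); accel_side(); return 0; }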


Thank you. I wasn't about to dig just to find that out. :) +1


This is what I was looking for when deciding whether or not to read the article. Didn't find it, came back here, and now I just won't bother. Unfortunate.


AnandTech seems quite excited about OpenPOWER, and it does tend to make me more interested in the Talos platform. https://www.raptorengineering.com/TALOS/prerelease.php


Was "anyone" as "annoyed" as I was about the "scare quotes" littered "throughout" the article? It was "hard" to read the "story".

I've been guilty of using too many quotes in the recent past. Someone on HN called me on it, and I've since toned it down. Now it's something that sticks out at me.


It makes it seem like the "journalist" doesn't believe his own "reporting".


Related post on the Google Cloud Platform Blog: https://cloudplatform.googleblog.com/2016/10/introducing-Zai...


Hopefully this effort will rid us of PCIe for good, unlike the version of CAPI available on POWER8.


What's wrong with PCIe? The devices are ubiquitous. It's point-to-point, allowing for device-to-device connectivity. We have external PCIe enclosures and cables. We have a healthy set of PCIe switches.

This is similar. It's also point-to-point, and it has an external connectivity story. Skimming the spec, they've even thought about higher-latency (200 ns) links and optical. Even with these advantages, I'm unsure how it'll work out, given the lack of available IP compared to PCIe.

Interesting, though. It will be fun to see it play out. It bums me out a little that POWER isn't easily available yet.


PCIe is still rather slow. A single-packet transaction on PCIe costs ~120 ns.

The lack of IP is the reason P8/CAPI stuck with PCIe. In the original design, CAPI was simply to reuse the PCIe link layer, not the transaction layer.

With a cache-coherent system based on PCIe, especially when the coherency point sits at the L3 as on POWER8, you are looking at ~500 ns of latency for a single cache line. That kind of latency is just too much for many applications.
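
Back-of-envelope (my own numbers, just to illustrate why): at ~500 ns per 64-byte cache line, a strictly dependent access pattern (one line in flight at a time, e.g. pointer chasing) moves only about 64 B / 500 ns ≈ 128 MB/s, versus roughly 100 ns and tens of GB/s for local DRAM. You only get reasonable throughput if the workload can keep many lines in flight.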


It depends on what they mean by 'low latency'. If it can't drive fat gaming GPUs to full utilization, PCIe will be around for a while still. Also, Intel isn't joining up, so PCIe is absolutely sticking around.


In the spec, they say the maximum acceptable network delay is 200 ns. The smallest network delay is 5 ns.


It is really hard to believe that OpenCAPI can achieve such low latency with a 25 Gbps-per-lane interface and off-chip connectivity.


Xilinx offers 25 Gbps single lanes that can bond up to 4x to get IEEE 802.3-2012 spec compliance for free* with their suite. Sure, you're going to need to control those trace impedances and your board won't be something coming out of OSH Park, but those are definitely attainable speeds for the consumer (e.g., in the single thousands of dollars; not $800k Cisco VXR tier-1 infrastructure).

You can configure it as CAUI-10 (10 lanes x 10.3125G) or CAUI-4 (4 lanes x 25.78125G); either way, it's been production-ready for quite some time now. (The docs have numbers, but trust me, you can get full throughput within that 200 ns.)
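
For anyone checking the math: both configurations come out to the same raw rate, 10 x 10.3125 Gbps = 4 x 25.78125 Gbps = 103.125 Gbps, which is 100 Gbps of payload after 64b/66b line coding (103.125 x 64/66 = 100), i.e. standard 100GbE.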

There's even production Agilent off-the-shelf test equipment out there that can fully sample at those speeds (none of that over-sampling tomfoolery, we're talking live, Bill O'Reilly style).

In 1989, Sun's SPARCstations had similar facilities (SBus) to push 100 Mbit between machines, so I mean, not too insane comparatively.

* Free with purchase of a Virtex® UltraScale™ or Kintex® UltraScale™ FPGA, haha.


Thanks for the info.

I would like to look up the term and understand how they can achieve 200 ns latency. Am I the only one who reads "200 ns latency" as completion latency?


It sounds like 200 ns is the upper bound on the expected network delay. If 9 network delays at 200 ns are experienced, it'll retrain the link.


Still, many devices haven't been able to fully utilize PCIe bandwidth, and latency is an issue as well: turn-around time is 1 µs to 1.5 µs for typical PCIe IP.

IMO, I'm a bit pessimistic about OpenCAPI reducing this latency much, unless it connects directly inside the chip.


What is wrong with PCIe? Hasn't it been extremely successful?



> "NVIDIA is a member of the OpenCAPI consortium, at the "contributor level", which is the same level Xilinx has. The same is true for HPE (HP Enterprise)"

Awesome.


Hmmm, interesting. I wonder what this means for the new Intel Xeon Phi Knights Landing? I liked the approach of many cores on one bootable chip, all having a reasonable amount of local memory, and high-bandwidth interconnects: no need to offload data to a peripheral (GPU) device. However, with this standard the currently limited bandwidth between peripherals and the main CPU will improve a lot.

To me it is obvious why Intel is not joining this party.


Well, I think it's a bit more about market segments and interoperability than anything else. Currently, overall systems from IBM, AMD, and Intel are fundamentally incompatible. The PCIe bus that Knights Landing hangs off of is a qualitatively different thing than the kinds of memory-coherent inter-CPU buses being addressed by the CAPI proposal. Intel has their own proprietary QPI. AMD has the quasi-open HyperTransport (still?).

If this is done properly, it means you could make generic motherboards, generic memory controllers, and all sorts of different accelerators, and mix and match them from various vendors. So it's no surprise that the smaller players in the market are trying to gang together while the largest player is trying to keep its lock-in.

Knights Landing as it stands would already integrate into systems better if it were on an inter-CPU bus rather than a peripheral bus, as would GPUs, FPGAs, and certainly RDMA/memory-window systems like Mellanox's.

Inherent distrust of standards aside, this could be a great win for people putting together bespoke systems in interesting configurations (e.g., Google). I don't think there is any downside in theory for Intel except more competition.


There are KNL variants that sit in a CPU socket and talk to the rest of the system via QPI.


I haven't caught up with the latest Phi releases, but I'm really interested to do so. I haven't been able to find any discussion of coherent QPI on KNL, but have found references to Omni-Path, which looks like a non-coherent, large-scale memory network. Is that what you were thinking of, or could you maybe post a reference?


You are right, I was misremembering. The socketed KNL does not have QPI at all, so no multi-socket boards are possible. Still, the socketed KNL doesn't hang off the PCIe bus, as it is its own host processor and has a dedicated link to memory.

Omni-Path is used to drive both the Ethernet and PCIe.


No Intel? Without them on board, this might not be as useful.


It is precisely because Intel has developed its own next-generation buses, and is licensing them very restrictively, that CAPI exists. Intel has QPI (QuickPath) for internal use and Omni-Path for external.

Nvidia would have loved to have a GPU with a QPI interconnect, but Intel wouldn't let them, because Intel has its own GPU ambitions in Xeon Phi. So Nvidia came up with NVLink, which is kind of like PCIe but faster. They don't have any switch ASICs as yet, so they are limited to fully connected topologies. Details on NVLink (without NDA) are scant, but I don't think it has the ability to be multi-node (it really is a bus).

Intel is now making Xeon Phi with onboard Omni-Path (more like InfiniBand), which is curious.


This is meant as a competitor, at least in the high-performance computing market, to Intel's Xeon Phi (aka Knights Landing) based systems, which will start including the Omni-Path network fabric on-package (based on QLogic's InfiniBand tech).

If those gain widespread adoption, that leaves little room for Mellanox's IB platform, hurts NVIDIA's sales of accelerator cards, and cuts AMD's and IBM's processors out of the picture entirely.


We have every big name in this, including an x86 vendor. We don't need Intel. That's not a statement I can make often. :)


Intel has 99% of the server market.


Do they need the market leader to take on the market?

In the end, that's also what is at stake, and the reason Intel might not be that interested. Maybe they will see the light and realize they need to steer the collaboration in a direction more beneficial to them, but at first they will probably wait for it to gain momentum before deciding they have to invest money to push the effort in the right direction (for them).

They have to be a threat for the established company to make a move. What nickpsecurity is saying, by noting that they don't need Intel, is that they are not yet a sufficient threat but that they are capable of becoming one.


Which they will be if they have a competitive technology with a migration path for both the x86 ISA and just about everything else dominating the acceleration space. Intel might respond by supporting it or deploying their own thing. They're sure to lose market share, though, if the coalition's standardization commoditizes accelerators more than Intel's side does. They could lose some profit margin in the niche along with the market share.


Yes, but the commoditization of UNIX-like OSes and processor-agnostic runtimes in the data center means that processor architectures are no longer a way for processor manufacturers to keep developers captive.

So any HPC application that makes use of abstraction libraries for SIMD and GPGPU code can be easily moved to non-Intel processors.


The people involved in, or wanting to use, this buy so many chips from Intel that if Intel doesn't get on board, it's likely going to turn out badly for Intel.

I note that Facebook and Intel are missing, which makes me wonder if they are off in a corner somewhere doing their own thing.


Or it could turn out badly for the others; it will not gain traction if Intel products are not supported.

Inertia is a very strong deciding factor, especially when you need to make sure that 30+ year-old code still works, as is the case in HPC.


"especially when you need to make sure that 30+ year-old code still work like it's the case in HPC"

Most code that works on Intel works on AMD. It's rare that it doesn't. The HPC vendor would face low migration risk plus get a bunch of competing accelerators at various price points for their problem. That's quite an incentive to move, even if there would be stragglers.


I have no doubt they do, but still...

The question is: are the DoD/DoE (or their Chinese/European equivalents) going to risk millions on a new architecture for their next supercomputers, or prefer good old Intel?

Especially when we know that the jump to exascale computation will only be possible if we get rid of the buses and have everything on the same chip (which is one direction Intel is taking).

My point is that the article's conclusion, "Intel has to come up with an answer...", is just plain wrong. This new architecture is an answer to Intel's new developments. We will have to wait and see what sticks to the wall.


They're the people that bought all the POWERs (SP2 onward), Alphas, Itaniums (e.g., SGI), Cells, and so on. They'll take risks, especially if they think the firm will be around to supply the upgrades.


IBM is already building two new supercomputers for the DoE based on POWER. (Look up the CORAL project.)

(disclaimer: I work for IBM, opinions my own)


Err, again, the people involved, on their own, buy enough chips to form a completely self-supporting ecosystem. Even if it only ever stayed the current set of participants, it'd be enough money to be worth it.


Plus, the suppliers are already making enough to justify building these things. It's existing products being extended with a new interface to support new products. Whatever Intel does probably won't kill the existing products. With that in mind, I think the move can only be beneficial for Intel's competitors, given they're already in an uphill battle and need differentiators. What do you think on that angle?


AMD wants to gain market share. The way to do that is to develop technology that competes with existing market leaders.

If Intel were involved, then AMD would gain no strategic advantage.


I would argue that in some respects they would have *less* of a strategic advantage, but that consumers would see the most benefit from everyone getting along and competing on merit on an open playing field. Even in the latter case AMD would still have *an* advantage, as they would at least still be in the primary game, and would also be restricting Intel to targeting that same hardware interface.


That's a good point. They did that with 64-bit x86 IIRC.


Exciting news on the hardware bus side.

Now waiting for AMD POWER9 processors.


Given AMD's involvement, what is the difference between this and coherent HyperTransport?


The difference is that NVIDIA, Mellanox, and Xilinx never adopted coherent HyperTransport.


There was at one point cHT IP available for Xilinx. But that is what I am kinda getting at. Did cHT fail for reasons of pure timing? Too far ahead of the market?


Hewlett Packard Enterprise is also a member of this consortium (the source article was updated to reflect this).




