Nvidia’s Integration Dreams (stratechery.com)
279 points by kaboro on Sept 15, 2020 | 170 comments



Huang's first reason for why they would acquire Arm is really, really interesting (and serves as a big counterpoint to my pessimistic take on yesterday's announcement).

> Number one is this: as you know, we would love to take Nvidia’s IP through ARM’s network. Unless we were one company, I think the ability for us to do that, and to do that with all of our might, is very challenging. I don’t take other people’s products through my channel! I don’t expose my ecosystem to other companies’ products. The ecosystem is hard-earned — it took 30 years for Arm to get here — and so we have an opportunity to offer that whole network, that vast ecosystem of partners and customers, Nvidia’s IP. You can do some simple math and the economics there should be very exciting.

If Nvidia truly sees ARM as an opportunity to kill off Mali and expand Geforce's install base, they might create an incentive for themselves to keep the ARM ecosystem alive. Note this is a pretty credible take, I think - Samsung's next Exynos processor will have Radeon graphics, and Nvidia can quickly nip that in the bud this way (assuming Geforce is better and cheaper). If it plays out this way, it would simply be great for the ARM ecosystem. If Nvidia can sell Geforce like ARM sells Mali and leave the ARM ecosystem truly intact, I don't think many will lament the demise of Mali (although I expect some counterviews for this on HN :) ).

Having Geforce be a commanding presence in the ARM ecosystem might be a big problem for the future diversity of GPU vendors, but that's something I'm interested in seeing play out at least. I do hope AMD can take the battle to Nvidia on ARM too, and that Qualcomm and PowerVR find ways to stay relevant.


> If Nvidia truly sees ARM as an opportunity to kill off Mali and expand Geforce's install base,

Perfect replacement: from one unsupported, hard-to-use, non-open-source-Linux-drivered GPU to another hard-to-use, non-open-source-Linux-drivered GPU.

> I do hope AMD can take the battle to Nvidia on ARM too, and that Qualcomm and PowerVR find ways to stay relevant.

Qualcomm's Adreno amusingly enough came from AMD, as Imageon, in 2009. I definitely hope AMD can get back in the mobile game though. Good luck to PowerVR & anyone else too. You are probably in for some hard competition soon!!

https://en.m.wikipedia.org/wiki/Imageon


The first version of Adreno had some fixed-function blocks from AMD (and Bitboys), but the programmable shader core came from Qualcomm's never-commercialized Qshader architecture. The result was a buggy mess which took a ton of effort on our drivers to debug and correct.

In hindsight, I'm amazed at how well it worked given the schedule and the magnitude of the work involved in fusing those two architectures.


I really appreciate this kind of mention, thank you. I want to know so much more, but this history feels like it evaporates so quickly; only a few people have any idea what happened.


> non-open-source Linux-drivered gpu

I have seen this concern countless times, but not why it matters to them. I can understand it matters from Linus' perspective as kernel maintainer, but from users' perspective I can't really get the issue. Anyways, not all code that runs on your system is open source. Why not demand open source from your bootloader manufacturer with the same intensity? If, say, NVIDIA wants the driver to contain a malicious backdoor, open source is not going to stop them.


> from users' perspective I can't really get the issue ... If, say, NVIDIA wants the driver to contain a malicious backdoor, open source is not going to stop them.

No, but if such a backdoor were discovered, it would be possible to do something about it. The quote from the article in the top comment here says it well: https://news.ycombinator.com/item?id=23944954

> Anyways, not all code that runs on your system is open source.

Not yet, but it is my goal. If/when that's achieved, I'd also like to run it exclusively on free/libre/open (FLO) hardware.

> Why not demand open source from your bootloader manufacturer with the same intensity?

My bootloader plays a much smaller role in my computing endeavors than my gpu. And less importantly, as a practical matter, there are many more major motherboard vendors, and few FLO alternatives; whereas both nvidia alternatives (amd, integrated intel) do have FLO drivers.


It means that when people are trying to do things like experiment with how to make, for example, frame timing more useful, such that specs like Vulkan can advance[1], we can't experiment & try things & figure out what might work, because closed proprietary software doesn't allow mankind to explore & progress.

We basically have to keep going back to Nvidia & relying on them to be authorities on their own system & to be acting in everyone's interest when we try to develop extensions like VK_EXT_present_timing. This greatly injures the development of good standards, preventing the collaborative, healthy environment where people can work together to make standards that work well.

Another example is EGLStreams, which is not that bad but is a very different approach to handling video buffers from what everyone else does, and it has been obstructing the use of the newer Wayland display server on Nvidia hardware for 6 years now[2]. Nvidia wants their thing, & closed drivers mean no one can play around & attempt to make their hardware work if they wanted to. Ridiculously harsh limitations, no choice, no experimenting.

[1] https://www.phoronix.com/scan.php?page=news_item&px=VK_EXT_p...

[2] https://www.phoronix.com/scan.php?page=news_item&px=MTgxMDE

This creates a science-free vacuum where research & experimentation & progress wither, where peership dies.


Putting a driver into the Linux project presumably means better integration with the rest of the kernel?

Also: less package management work.


Oh yeah, users don't give a shit. But it further shrinks the playing field to corporate giants that'll only share details with other giants to develop products.


> Why not demand open source from your bootloader manufacturer with the same intensity?

Folks want that too, and lots of ARM platforms use open source bootloaders already, mainly u-boot.


The problem isn't inherently that the drivers are closed source, it's that Nvidia is actively hostile towards the ecosystem. For instance, Mesa added a generic buffer management API (GBM) allowing compositors like Weston to be hardware accelerated using OpenGL. Nvidia could have followed suit and supported GBM but instead went their own route with EGLStreams. So now Wayland, XWayland and every single Wayland compositor has to implement Nvidia-specific code to support their hardware.


Fun little fact: Adreno is an anagram of Radeon.

Honestly I really don't see the Geforce play. Nvidia tried it with Tegra and failed pretty miserably. Mali and Adreno pretty much cornered the market from that era (with PowerVR pivoting over to Apple). I just don't see their IP really hitting home with the type of workloads you see in SoCs.

The primary driver for most SoC GPUs since the qHD days is to push pixels for the UI layers, which has a different set of requirements and features compared to modern GPUs used for gaming or ML. They're almost exclusively heavily tile-based and biased more towards low power consumption than raw horsepower.


> Honestly I really don't see the Geforce play. Nvidia tried it with Tegra and failed pretty miserably. Mali and Adreno pretty much cornered the market from that era

I want to be nice, but I don't know what rock you've been sleeping under. The TX2 is 3 years old & it's not just Nvidia cornering the entire hapless AI/ML market with proprietary CUDA that keeps it & the Jetson platform the #1 most obvious go-to for robotics, in spite of having a frankly not very good ARM CPU: those couple of NV cores are still way better than the rest of the Arm offerings. Even outside ML, the NV GPU Arm offerings radically outstrip everyone else. No one else has the RAM bandwidth to begin to compete, much less the cores. 3 years have passed & the only thing to beat the TX2's ~60GB/s is NV's own top-end Xavier part at 137GB/s. No one else is playing in the same league as NV when it comes to Arm GPUs. I don't know how you could call this roaring colossal success a failure. To say nothing of the Nintendo Switch.


The Tegra thing is interesting, and I wrote about it in the other thread too.

It is a failure for NVidia in that they launched it as a mainstream mobile phone/tablet part, and it's not used in anything outside the NVidia Shield in that market that I'm aware of (and the Switch of course).

But it has seen success in robotics and self driving cars, because NVidia makes it easy to use and it has great performance.

So it's not obvious how to judge it. Commercially, compared to their initial goals it is probably a failure. But it has opened new markets that didn't exist so that's successful?


Horsepower isn't everything; usually cost and power consumption come first in SoC selection, followed by feature set (of which your GPU is one part of a larger picture).

If you want to really succeed in the SoC space (which is where Arm has succeeded) then what you need is volume, and I don't think Tegra ever really made any serious inroads there.

The switch is a game console and so it sits somewhat outside of the traditional high-volume SoC market.


Feels like you are arguing that boring mainstream success is the only thing we can judge by. Disagree.


Linux and Mesa have open source drivers for Mali GPUs now (lima & panfrost).


> from one unsupported, hard-to-use, non-open-source-Linux-drivered GPU to another hard-to-use, non-open-source-Linux-drivered GPU.

I thought panfrost was pretty good these days


Do any Android phones use Panfrost?


Bet you ChromeOS runs on Panfrost first. :)


Oh well look, some actual support from the owners.

https://www.phoronix.com/scan.php?page=news_item&px=Arm-Panf...


How is NVIDIA's Linux driver hard to use? Are you referring just to the fact that using the driver means installing and maintaining more packages?


You can’t just upgrade your kernel when you want

Sway refuses to support their nonstandard apis

Any kernel bugs you report are tainted

I’m sure there’s more


It's a great line and one that I would expect Jensen to take - and it's almost certainly true - to an extent.

I'd expect Nvidia to keep the Arm ecosystem alive, but only where they don't see an opportunity to take control. So they keep Radeon off Exynos (and incidentally why couldn't they do that anyway?) by offering GeForce. But elsewhere they can deny Arm IP to other firms where they have a competitive SoC.

Take one example. So Nvidia / Arm invest heavily in data center focused designs - are they really going to offer this IP to Ampere / Amazon on equal terms when compared to an Nvidia CPU? As Ben says 'color me skeptical'.

Essentially they will have a full overview of the Arm ecosystem - with lots of confidential information - and pick and choose where they drive out competitors whilst farming license fees from the rest.


> (and incidentally why couldn't they do that anyway?)

Why try to compete on merit when you can swoop in and dictate the market to do your bidding? Coming at it from this perspective, it's looking like almost the same old Nvidia again :)

> Take one example. So Nvidia / Arm invest heavily in data center focused designs - are they really going to offer this IP to Ampere / Amazon on equal terms when compared to an Nvidia CPU? As Ben says 'color me skeptical'.

That I don't believe, indeed. They will crush their datacenter competition, but I think the server market here might ironically be a bit more flexible in moving to different ISAs when compared to mobile.


Absolutely! I think actually the biggest constraint on Nvidia will be 'bandwidth' (management not memory!). Where do they focus their energies and where do they leave the Arm ecosystem alone.

Incidentally Intel's stock seems to have risen over the last day or so which is a bit surprising given the datacenter story?


> Essentially they will have a full overview of the Arm ecosystem - with lots of confidential information

This is the real danger among everything.


I think my biggest concern is that Nvidia is looking to become another Qualcomm, with all of the issues there.


But Mali is an also-ran in a market where the one making all the money has their own solution. What does "sell Geforce like ARM sells Mali" mean? Make it an utter commodity, a bunch of transistors in one much bigger chip, like the GPU in a console but valued even less? Nobody knows the name of what you are selling and 90% of the market would rather have more runtime on their phone than ever care in the slightest for 3D game performance?

Mobile GPUs are like integrated graphics on CPUs, those didn't kill AMD or Nvidia because that entire market doesn't want to spend a dime to begin with.


Because it pushes the CUDA ecosystem into dominance of yet another platform. You can run your acceleration routine on anything from a smartphone/Raspberry Pi to an enterprise accelerator, one algorithm. And it will be everywhere, a de facto capability of most reference-implementation ARM devices.

(and sure, OpenCL too, but that's too loose a standard to have any platform effect; it's just a standard that everyone implements a little differently and that needs to be ported to their own compiler/hardware/etc, so there is no common codebase and toolchain that everyone can use like with CUDA.)
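To make the "one algorithm" point concrete, here is a minimal sketch (mine, not the commenter's; the kernel name is just illustrative) of how the same CUDA source can target a Jetson board or a datacenter GPU, with only the nvcc architecture flag changing:

  // Builds unchanged for e.g. a Jetson TX2 (nvcc -arch=sm_62) or an A100
  // (nvcc -arch=sm_80); the source itself does not know or care which.
  __global__ void scale_add(int n, float a, const float *x, float *y) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;  // one thread per element
      if (i < n)
          y[i] = a * x[i] + y[i];                     // same math on every device
  }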

Everyone laughed at Huang saying that NVIDIA is a software company. He was right.


Nobody wants to run CUDA anymore. All the mobile SoCs jumped right over that one and have AI coprocessors now. CUDA is what runs on the developer workstation, not the "edge device" as they call it. Like there was a tiny window of software supremacy there, then everyone remembered how matrix multiplication works.


What are you even talking about? Jumped over? It was literally never an option (and still isn't). That said, compute shaders are indeed used on mobile SOCs but even then the install base is pretty abysmal.

Maybe nVidia will decide to push this type of tech a lot harder.


>Mobile GPUs are like integrated graphics on CPUs

I really don't think so. For the high end PC consumer, integrated performance can be ignored because you'll get a dedicated card. Not so with mobile SOCs.

It makes sense that nVidia might want to squeeze out Qualcomm and get their GPUs in Samsung flagships, for example.


I think they need a Qualcomm modem more than they will ever miss a Nvidia GPU.


The majority of the handsets Samsung sells use their own Exynos. I believe they only use Qualcomm in the US anyway. Is this referring to some US-specific 5G band or something?


Can't Nvidia do this without buying ARM?


A comment yesterday in the other thread mentions that ARM currently sells their ARM CPU plus Mali GPU IP in a single package. Buying the IP together is much cheaper than buying just the CPU IP. This is why nearly every ARM CPU maker uses the Mali cores, and why PowerVR as a company is nearly dead.

Reading this comment from Huang, I read it as if Nvidia wants to sell this package, but with Geforce IP instead of Mali IP.


Minor nit: architecture and not "package", the latter has a specific meaning. ARM defines not just the instruction set, but also the architecture - core, memory layout, peripheral buses, etc.


Ahh, the old comcast TV+Internet is cheaper than just Internet trick....


They did, and failed:

https://www.notebookcheck.net/Nvidia-to-licence-GPU-technolo...

My guess is they failed because Geforce at the time was too much of a "desktop architecture", and my guess is that it still is. But now that they own Arm they can just tell all Mali customers "take it or leave it."


A less cynical take is that they can now ask Mali customers "how can we modify this better and more mature architecture to be appropriate for your use case"


I'm sad to say I'd call that take more naive rather than less cynical.


I hope this is true.


AMD is going to "own" x86 for the foreseeable future. AMD will have to either produce cut down x86 cores or start using RISC-V to compete with NVidia across their larger portfolio.


Hopefully they would need to divest Mali to an independent spinoff rather than just squash it in the acquisition.


Does Nvidia even have a product which would work at the power envelope that Mali works in?


Not all the way down to <5W TDP but Orin goes down to 5W.

They had tablets at one point, and they've continued to refresh their product line for automotive and set-top use.

https://www.anandtech.com/show/15800/nvidia-announces-new-dr...


The Jetson Nano has a predefined 5W mode (see page 33 of [1]), so presumably the SoC could go even lower.

[1] https://info.nvidia.com/rs/156-OFN-742/images/Jetson_Nano_We...


I think that if Nvidia open sourced its graphics drivers, 80℅ of the backlash against Nvidia would disappear suddenly.

I think that the backlash against Nvidia is due to its somewhat hostile nature towards open source (to be accurate, they didn't open source Nvidia's graphics drivers).

But Nvidia's drivers work well (Nvidia's Linux drivers are somewhat on par with the Windows ones). I heard that until AMD open sourced their drivers, a lot of people were preaching that Nvidia was the only way to play high-end games on Linux (also CUDA).

Personally, I prefer AMD GPUs. I find their drivers work great with Linux.


Yes, nvidia (the proprietary GPU driver) was way better than fglrx (the proprietary amd driver) and even nouveau (the open source reverse engineered driver for Nvidia) worked better than radeon for me in those days. It was intel for smooth sailing or nvidia for acceptable performance.

I think their unpopularity these days come from three factors:

1. For the gamer crowd, RTX 2000 was a price hike and even the higher end cards were not that impressive for performance. It looks like they were resting on their lead against AMD.

2. People love to root for the underdog and AMD was behind for the longest time.

3. For the developer crowd, the closed source Nvidia drivers were not great for Linux compatibility, and the Mac crowd couldn't care because they couldn't have Nvidia after their fight with Apple.

I think for a sample of the general public who have opinions on Nvidia, it's 1,2 and 3 in that order while for HN users, it's 3, 2 and 1 in that order.


> 2. People love to root for the underdog and AMD was behind for the longest time.

AMD has pretty much always had a solid mid-tier offering in the last five years, but people would still rather buy an nVidia card (like, say, a 1050) instead of the better AMD card at the price point because of the "1080 Ti" halo effect.

nVidia's software stack is pretty bad. Sure, the driver core works pretty well. The nVidia control panel looks like it was last updated in 2004, which is actually true (go find some screenshots of it from 15 years ago running on XP; it looks the same). Now, that doesn't mean it has to be bad (don't fix what isn't broken), but the NCP is actually clunky to use. Not to imply AMD's variant is necessarily better, but at least they're working on it.

The nVidia "value-adds" like ShadowPlay and so on are all extremely buggy. And while their core driver may be good, you still get somewhat frequent driver resets and hangs with certain interactive GPU compute applications.


>AMD has pretty much always had a solid mid-tier offering in the last five years, but people would still rather buy an nVidia card (like, say, a 1050) instead of the better AMD card at the price point because of the "1080 Ti" halo effect.

Do people actually think this way? "I'll buy the Nvidia 2060 Super rather than the AMD 5700 XT (which costs the same and benchmarks higher), because Nvidia has the 2080 Ti"


The people who look at benchmarks aren't the ones deciding that way. It's more like "I heard Nvidia makes the fastest GPUs, I'll buy the Nvidia one I can afford."


Sure, but not consciously. People hear that nvidia XX80 is top-of-the-line and then will associate nvidia with higher performance generally.


Have you seen the NVidia RTX 3090? US$1500 and up.


people say that but much of the time AMD has excluded themselves from consideration for other reasons. 5000 series, 7000/200 series, and 480/Vega series all saw major cryptomining booms that raised prices far beyond those of the equivalent NVIDIA cards, so for much of the time they were simply not priced competitively. Like, for almost a year you would have to pay $1000+ for a Vega that barely matched a 1080 while you could get a 1080 Ti for $800 or so.

Also, there have been constant driver problems literally since the ATI days: just recently drivers basically crippled Navi for the entire duration of the product generation, drivers crippled Vega previously, Fiji was a driver mess, and Hawaii and Tahiti were not problem-free either (the famous "company A vs B vs C" article [0] notes that "company B's drivers are bad and are on a downtrend", and that was during the heyday of GCN, in 2014; those were the "stable drivers" that everyone rhapsodizes about in comparison to the mess that is Navi/Vega). Terascale was a fucking mess too.

AMD products often have weaker feature sets. It took them years to catch up with G-Sync (offering low-quality products with poor quality control for years until NVIDIA cleaned things up with G-Sync Compatible certification). NVENC is far better than AMD's equivalent (Navi's H264 encoder is still broken entirely as far as I know, it provides less than realtime encoding speed and extremely poor quality), so if you want to stream with AMD you have to use your CPU and crush your framerate, or purchase a much more expensive CPU. AMD has no answer for DLSS, and is only implementing their first generation of ray tracing support with the next generation.

This is what I refer to as the "NVIDIA mind control field" theory. That consumers are just so wowed by the NVIDIA brand that they can't help themselves. The reality is AMD simply has not been that compelling an offering for that much of the time. There has been a lot of poor execution over the years from the Radeon team and prices have often not been as good as people remember them to be.

And a few good products don't change that either. Like, it took something like 8 years of AMD slacking off (Raja mentioned in a presentation that around 2012 AMD management thought "discrete GPUs were going away" [1] and pulled the plug on R&D - certainly a very attractive idea given their budgetary problems at the time) for NVIDIA to reach their current level of dominance. AMD has never led the market for 8 years at a time. Maybe a year tops; usually NVIDIA has responded pretty quickly with price cuts and new products. NVIDIA has never let off the pedal the way AMD did (and yes, money was the reason, but that doesn't matter to consumers; they're buying products, not giving to charity).

[0] http://richg42.blogspot.com/2014/05/the-truth-on-opengl-driv...

[1] https://youtu.be/590h3XIUfHg?t=1956


I've heard everyone complain about drivers for amd, and (though I realize anecdotal evidence is no evidence at all) in my experience those people are full of crap.

I used the Fury (Fiji) and the VII (Vega 20) on Linux and Windows, and haven't experienced any of the crazy shit people have claimed. Such as....

>(Navi's H264 encoder is still broken entirely as far as I know, it provides less than realtime encoding speed and extremely poor quality), so if you want to stream with AMD you have to use your CPU and crush your framerate

Where did you hear that? Not even slightly true. I used GPU encoding on both cards (@1080p, 60fps, 5500kbps, 1 second keyframe interval, 'quality' preset, full deblocking filters, and pre-pass turned off because I'm not insane) with zero issues. I would have liked to see more of an improvement in the VII's quality vs the Fury, but I wouldn't go as far as calling it non-functional.

I don't know who managed to get any of those to encode at less than realtime. I've done 1440p at 15500kbps to another machine for re-encoding and I can still play Star Citizen on High @ 60 fps with the same PC that's encoding.

And for the record, I also have a GTX 1660, 1080, and had a 770 before the Fury. I'm not fanboying, just relaying my experience.

Edit: Just to be clear, I'm not saying Navi is good to go. I'm refuting the general statement "if you want to stream with AMD you have to use your CPU and crush your framerate"

edit2: Full config for reference

  {"CodingType": 0,
      "DeblockingFilter": 1,
      "Debug": false,
      "EnforceHRD": 0,
      "FillerData": 1,
      "FrameSkipping": 1,
      "Interval.Keyframe": 1.0,
      "MaximumReferenceFrames": 1,
      "MotionEstimation": 3,
      "MultiThreading": 1,
      "OpenCL.Conversion": 1,
      "OpenCL.Transfer": 0,
      "PrePassMode": 0,
      "Profile": 77,
      "ProfileLevel": 51,
      "QP.Maximum": 42,
      "QP.Minimum": 12,
      "QualityPreset": 2,
      "QueueSize": 16,
      "VBAQ": 1,
      "VBVBuffer": 0,
      "Version": 562967133290498,
      "Video.API": "Direct3D 11",
      "Video.Adapter": 63990,
      "View": 3,
      "bitrate": 18000,
      "lastBFrame.Pattern": 0,
      "lastBFrame.Reference": 0,
      "lastRateControlMethod": 3,
      "lastVBVBuffer": 0,
      "lastVideo.API": "Direct3D 11",
      "lastVideo.Adapter": 63990,
      "lastView": 3}
edit3 (lol): I 100% have no way to refute anything Navi-related. Looks like they completely changed the encoding engine in Navi: https://en.wikipedia.org/wiki/Video_Coding_Engine#GPUs. Equally so, that chart incorrectly lists the VII as having version 4.0 instead of 4.1, so it may not be that trustworthy. I can't say until I have a Navi card to play with.


I had a 5770 and a 290x and both had periods (2013 for the 5770, 2018 for the 290x) of about a year where the latest drivers would randomly BSOD on Windows if you played a game on one monitor and watched a hardware-accelerated video on the other.

It turns out anecdotes are different for everyone, that doesn't make them full of crap.


>It turns out anecdotes are different for everyone, that doesn't make them full of crap.

Indeed, wouldn't have included it if I didn't think the same.

I do however think making a blanket statement which has direct anecdotal evidence refuting it is pretty much bunk.


> I've heard everyone complain about drivers for amd, and (though I realize anecdotal evidence is no evidence at all) in my experience those people are full of crap.

> I used the Fury (Fiji) and the VII (Vega 20) on Linux and Windows, and haven't experienced any of the crazy shit people have claimed. Such as....

I don't know what kind of reply you're expecting. You've managed to use two AMD cards and not have driver crashes? Uh, good for you, but that doesn't mean the rest of us are "full of crap". I've also owned two AMD cards, and for both of them the drivers were flaky, on Linux and Windows. I'm sure they're not broken for everyone, but they were broken for me. Maybe if I bought another AMD card I'd get lucky this time around, but I'm not going to take the risk.


That was an intro to the second half of the comment, which exclusively focuses on video encode capability. I didn't directly refute that driver quality sucks. From the original:

>I'm refuting the general statement "if you want to stream with AMD you have to use your CPU and crush your framerate"


"those people are full of crap" is a pretty direct insult. You shouldn't make that kind of accusation when you're not able to substantiate it or even willing to stand by it.


I still don't really understand the context around the Apple fight and it's a huge bummer, since Apple hardware with Nvidia GPUs would be the best combination.

When I used Linux, the closed source Nvidia drivers were better than anything else and easily available; the complaints about them seemed mostly ideological?

The price complaints seemed mostly about 'value' since the performance was still better than the competition in absolute terms.


Nvidia had some GPUs that ran hot and had an above-average failure rate, so Apple was unhappy because it made a couple of models of Macs look bad. Nvidia also had enough revenue of their own that they didn't care enough to invent some SKUs so people couldn't compare Macs to PC laptops.

The big issue with Nvidia GPUs on Linux these days is with Wayland. There are some graphics APIs that are the current way to create contexts, manage GPU resources, etc., but Nvidia went their own way, which would require compositors to have driver-specific code.

Many smaller compositors (such as the most popular tiling one for Wayland) don't want to write or support one implementation for Intel/AMD and one for Nvidia, so they either don't support Nvidia or require snarky-sounding CLI options to enable Nvidia at the cost of support.


Interesting - makes sense, thanks for the context.

I'd suspect part of the reason Nvidia went their own way is because their way is better? Is that the case - or is it more about just keeping things proprietary? Probably both?

If I had to guess, some mixture of ability to improve things faster with tighter integration at the expense of an open standard (pretty much what has generally happened across the industry in most domains).

Though this often leads to faster iteration and better products (at least in the short term, probably not long term).


I'd take a look at this post from a few years ago about why SwayWM does not support nvidia gpus using proprietary drivers (nouveau works).

https://drewdevault.com/2017/10/26/Fuck-you-nvidia.html


"When people complain to me about the lack of Nvidia support in Sway, I get really pissed off. It is not my fucking problem to support Nvidia, it’s Nvidia’s fucking problem to support me."

I'd suggest his users asking for Nvidia support are evidence of this being wrong.

That aside though, it seems like Nvidia's proprietary driver doesn't have support for some kernel APIs that the other vendors (AMD, Intel) do support?

I wonder why I've always had a better experience with basic OS usage using the Nvidia proprietary driver over AMD on Linux. Maybe I just didn't use any applications relying on these APIs. Nouveau has never been good though.

Not really a surprise given the tone of that blog post that Nvidia doesn't want to collaborate with OSS community.

Don't people rely on Nvidia for deep learning workflows? I thought that stuff ran on Linux? Maybe this is just about different dev priorities for what the driver supports?


> Don't people rely on Nvidia for deep learning workflows? I thought that stuff ran on Linux?

It all comes down to there always being two ways to do things when interacting with a GPU under Linux: The FOSS/vendor-neutral way, and the Nvidia way.

The machine learning crowd has largely gone the Nvidia way. Good luck getting your CUDA codebase working on any other GPU.

The desktop Linux crowd has largely gone the FOSS route. They have software that works well with AMD, Intel, VIA, and other manufacturers. Nvidia is the only one that wants vendor-specific code.
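To make the CUDA-lock-in point above concrete, a small hedged example (mine, not the commenter's): CUDA codebases tend to accumulate warp-level intrinsics like __shfl_down_sync, which have no drop-in OpenCL equivalent, so moving such code to another GPU is a rewrite rather than a recompile.

  // Sum a value across the 32 lanes of a warp using Nvidia-specific shuffles.
  __device__ float warp_reduce_sum(float val) {
      // Each step folds the upper half of the active lanes onto the lower half.
      for (int offset = 16; offset > 0; offset >>= 1)
          val += __shfl_down_sync(0xffffffff, val, offset);
      return val;  // lane 0 ends up holding the warp-wide sum
  }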


Thanks - makes sense.

> Nvidia is the only one that wants vendor-specific code.

Isn't that because CUDA is better and that tight software/hardware integration is more powerful?

If it wasn't then presumably people would be using AMD GPUs for their deep learning, but they're not.


Even when it comes to graphics and CUDA is not being used, Nvidia does not support the standard Linux APIs that every other GPU supports.


While AMD has been great at open source and contributing to the kernel, they also (from what I can remember) have been subpar with their reliability (both in proprietary and open source).

NVIDIA has been more or less good with desktop Linux + Xorg for the last 5-7 years (not accounting for the lack of support for hybrid graphics on Linux laptops).

I think you can use an NVIDIA GPU as a pure accelerator without it driving a display very easily.


> The price complaints seemed mostly about 'value' since the performance was still better than the competition in absolute terms.

Just because company A makes the fastest card does not imply that company A makes the fastest card at every price point.


True, but in Nvidia's case I think they did?

Well - they charged more at each price point because they were faster.

At some prices I think it wasn't enough to justify the extra cost from a price to performance ratio, but that doesn't seem like a reason to think they're bad.

It's possible I'm a little out of date on this, I only keep up to date on the hardware when it's relevant for me to do a new build.


> Well - they charged more at each price point...

Are you joking? How is it the same price point when they are charging more?


If vendor A's cards are $180, $280 and $380 and vendor B's cards are $150, $250 and $350, it's common practice to group them into three price points of $150-199, $250-299 and $350-399 so that each card gets compared to its nearest in price.


The prices are way closer together than that, because both companies sell way more than 3 cards. There are three variants of the 2080 priced differently and two of the 2070 and 2060 each. That's seven price points above $300 alone without looking at the lower segment (2 of those cards are EOL but still available a bit cheaper at some vendors). nVidia and AMD have always had enough cards that are at the same MSRP.

E.g. the bottom of this page: https://www.anandtech.com/show/14618/the-amd-radeon-rx-5700-...

At the lower price point nVidias line up is similarly crowded: https://www.anandtech.com/show/15206/the-amd-radeon-rx-5500-...

Either way, there would be no reason to group the $350 and the $400 card but not the $300 and the $350 card.

BTW, AMD definitely didn't always have a price/performance advantage, e.g. the nice scatter plots here from ten years ago (that I randomly found): https://techreport.com/review/19342/gpu-value-in-the-directx...


Nouveau is extremely slow, and was slow on several of my machines for at least 5 years. I'm not sure how it's any better.


Nouveau is only slow because about 5 years ago Nvidia started signing the card firmware, excluding Nouveau from using some features like setting appropriate clock speeds. They release some pared-down firmware that Nouveau is allowed to use, but only years after a card's release.

Before that mess (around the 900 series), nouveau was fast.


Can confirm; I had a laptop (Dell XPS 14) with a GT630m and Nouveau worked well enough that I never felt the need to install the proprietary drivers over 4 years of use.


I'm replying to the parent comment about AMD's pre-open-source drivers. So we're already talking about 5 years ago.

I had my Linux install in a VM for a number of years because fglrx was an aggravating experience to use (unstable, often broke on kernel updates) and radeon had abysmal performance on my 290x. To the point that my laptop's 540m (on nouveau) would outperform the 290x under Linux.

I still have that 290x in one of my systems, and since amdgpu started supporting it, it has been a very pleasant experience. But even that took a while, as they started with the 3xx and supported the 4xx cards before going back for the 2xx cards.


Why do you use the Unicode "CARE OF" symbol?

https://util.unicode.org/UnicodeJsps/character.jsp?a=2105


Wow, didn't notice until you pointed it out. Rebinding someone's % to that symbol would be a good prank...


The GP does not appear to be a native English speaker so I suspect a key mapping error.


What does GP mean?


Grandparent, 2 posts up the tree.


> I think that the backlash against Nvidia is due to its somewhat hostile nature towards open source

I would also add Nvidia's hostility towards standards. They are pretty much the Apple of GPUs: OptiX, RTX, CUDA, etc. They are screwing up the whole ecosystem by making closed APIs that only work on their platform.

If you are a graphics or ML dev, you can only hate Nvidia for how they are hurting standards and making our job so much harder when we could just agree on Khronos standards.

Hopefully some standards catch up (ray tracing in Vulkan is coming) but others don't (ML is CUDA-only).


> I would also add Nvidia's hostility towards standards.

With CUDA I get a comfortable C++ dev environment that integrates with their products AND Visual Studio, FOR FREE. I played a bit with raw OpenCL a couple of years ago, and it's... uncomfortable. I've looked for free (as in beer) SYCL implementations... couldn't find any. IDEs with integrated debugging for OpenCL... couldn't find any, at least not for free. (Intel charged $$$ for their full-fledged OpenCL tools.)

CUDA is popular because the development environment is free AND comfortable. Has the situation changed in the last few years?
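For comparison's sake, here is roughly what that "comfortable" baseline looks like: a complete CUDA program with none of the platform/context/queue/program-build boilerplate that raw OpenCL requires (a sketch under the obvious assumptions, not production code):

  #include <cstdio>
  #include <cuda_runtime.h>

  __global__ void add_one(int n, float *data) {
      int i = blockIdx.x * blockDim.x + threadIdx.x;
      if (i < n) data[i] += 1.0f;
  }

  int main() {
      const int n = 1 << 20;
      float *data = nullptr;
      cudaMallocManaged(&data, n * sizeof(float));  // one pointer, usable on CPU and GPU
      for (int i = 0; i < n; ++i) data[i] = float(i);
      add_one<<<(n + 255) / 256, 256>>>(n, data);   // kernel launch, no queue/context setup
      cudaDeviceSynchronize();                      // wait for the GPU before reading back
      printf("data[42] = %f\n", data[42]);          // expect 43.0
      cudaFree(data);
      return 0;
  }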

> Khronos standards

You could argue the other way: CUDA has become the de facto standard in certain areas and other vendors should release CUDA-compatible tooling for their products. I as a developer want to get work done and I don't care about ideals like "standards". I choose and recommend tooling that gives the shortest way to the result.

---

I'm mostly working with REST these days and I curse the HTTP protocol and URIs, etc., which are a mishmash of hacks to get them to work with 7-bit charsets. On one occasion I said: "Just because something has a 30-year-old RFC doesn't mean it's suitable for use in _today's_ applications." Standards can just as much hold back and prevent better ideas from emerging as they can help with interoperability.


> I as a developer want to get work done and I don't care about ideals like "standards".

The "I'm just doing my job" mentality often gets us in very bad situations down the road (be it for Privacy, Ecology, Standards, etc.)

Totally agree that for "getting the job done" and having free tools then yes CUDA is better.

For your REST example, I understand your frustration, but now think about what it would be like without standards:

You would have to pay $100 to use Apple.iHTTP©®™ + implement a version for Google G.HTTP + implement yet another version for Microsoft Visual.HTTP™. For each you would have to get a licence and agree to conditions, you would have to buy both a Windows and a Mac machine because of course they wouldn't be cross-platform, each one would only work in one browser, they could decide overnight to remove your right to use HTTP, and the Microsoft Visual.HTTP dev tools would be crap but you couldn't use the cool ones from another company because they wouldn't be compatible, etc.

Might seem like an exaggeration, but when you look at Epic vs. Apple, Google AMP, DirectX & Metal, Apple developer fees & conditions, web DRM, the lack of cross-compatibility, etc., well, that's what HTTP would look like if people didn't care about standards.

Also, standards don't always have to be backward compatible (e.g. OpenGL vs. Vulkan); that's specific to the web, not to standardisation.


In what way is NVIDIA hurting Vulkan? I suspect the opposite is true, AMD designed the API Vulkan is based on, but the whole Vulkan thing would be meaningless without NVIDIA also standing behind it.

Are you blaming NVIDIA for Vulkan not being on par with their proprietary APIs? Presumably Nvidia is not sabotaging Vulkan, rather the issue could be that Vulkan is an abstraction that supports multiple GPUs, and that improving Vulkan is resource intensive.


Well, parity with Windows is the right wording here.

Windows drivers are starting to work better now that Microsoft adopted the Linux model of centralizing them. But they still aren't great. For any unusual hardware the drivers still come bundled with all kinds of shit, stop working after a while, and get bugs at random.


> Windows drivers are starting to work better now that Microsoft adopted the Linux model of centralizing them.

No. Intel ships WHQL-certified but broken WiFi drivers via Windows Update. It took me three weeks to try all available versions, settings, etc. to troubleshoot. In the end, I rolled back the driver to the earliest version and pinned it there. Now the wireless card is working as intended.

The device is an HP Spectre X2 convertible tablet, so everything is a package.

Edit: Use kinder words.


I've installed Intel's Driver Update Utility and it keeps the wifi up to date from them rather than Windows Update. I remember conflicts with Windows Update when trying to use Intel's own drivers years ago but it hasn't been an issue with my current device.


> I've installed Intel's Driver Update Utility and it keeps the wifi up to date from them rather than Windows Update.

The problem is, all of Intel's newer drivers are broken for that particular card, even the ones installed by Intel's Driver Update Utility (I've tried that too, yes).

It's around the same time that Intel's e1000e Linux drivers started breaking older cards, so there's something off in that department IMHO.

To add insult to injury, both the Windows and Linux drivers' release notes claim that the cards in question are supported by the respective newer drivers.


> I think that if Nvidia open sourced its graphics drivers, 80℅ of the backlash against Nvidia would disappear suddenly

They don't even have to do that. All they have to do is allow nouveau to actually drive the graphics card. Maybe it won't be as performant as the proprietary driver but it will actually work out of the box with no system instability.


"I don’t know if it will work; data centers are about the density of processing power, which is related to but still different than performance-per-watt, ARM’s traditional advantage relative to Intel, and there are a huge amount of 3rd-parties involved in such a transition." But isn't performance-per-watt starting to become more important than the density of processing power which is limited by your power consumption?

They have been buying some other data center/HPC companies as well this year, like Mellanox and Cumulus; to me it almost seems like they want to own the entire data center stack and eventually provide an offering similar to AWS, Azure and Google Cloud.


They'd rather sell the picks & shovels to AWS, Azure and GCP.


Isn't this more selling the ore to the pick and shovel makers? The hapless gold miners in this case being all the Uber for street sweeper startups


Totally OT I know, but Uber for street sweeper -- I want this. Give me an app to request a quick cleanup and maybe try to get my neighbors to chip in. Or to report roadway debris to the city. Or for driveway snow removal during the winter.


Your city might have an app for reporting issues to the city. https://www.sandiego.gov/get-it-done has it for example.


The comments about edge, and combining Nvidia / graphics tech with Arm, make a lot of sense to me. I suspect that's where most of the value of this merger will come from. When it comes to Arm in the datacenter, my question is: how do they actually plan to extract money from ARM displacing x86?

AWS has been deploying at least one ARM chip (Annapurna/Nitro) on every single EC2 server for 5+ years. Surely they would have made sure their license ensures future rights to keep using ARM at a reasonable price. Their Graviton instances are effectively just one more ARM chip, so an Intel instance has 2 Arm chips (1 for EBS, 1 for networking) while a Graviton/Arm instance has 3 (EBS, networking, CPU).

Unless the Arm license reads something like "If you make a really big multi-core chip we get to charge you more"... how does the x86 -> Arm transition actually move the needle for Arm Holdings / Nvidia?


1. Licensing costs are higher for more complex chips. So that one extra CPU chip is likely to cost a lot more.

2. If it's Nvidia / Arm actually selling the CPU (and not just the ISA) you'd expect margins to be higher again.


Arm licensing of their processor includes a royalty. More processor sales make more money for Arm.


This estimate has 6 billion Arm chips shipped in Q4 2019: https://www.statista.com/statistics/1131983/arm-based-chip-u...

The hyperscale/cloud datacenter market is single-digit millions of CPUs shipped per year. So if the whole cloud went to Arm tomorrow it would be less than 1% of total Arm chip shipments.

My point is that unless there's a way to extract higher $ per chip in the datacenter then actually it doesn't make a difference for Arm.

Edit: By datacenter market I'm really referring to AWS/GCP/Azure... the rest of the market ain't going en-masse to Arm anytime soon.


What makes you think ARM charges the same for a Neoverse N1 CPU as a Cortex M0 CPU?

That wouldn't make business sense.


I don't know what's in those licenses. Feel free to enlighten me :)

But it also wouldn't make business sense for Amazon to hang their hat on ARM without assurances or contractual guarantees that they can keep selling ARM chips for a reasonable price into the future. Where I define "reasonable" as Arm taking a much, much smaller cut of the final price than what Intel can do.


On the Arm in the datacenter point, Nvidia doesn't actually have to cut off IP from competitors to achieve dominance; just the ability to do so is enough. Would anyone continue to fund Ampere (the company, not the graphics architecture) knowing that they are dependent on one of their biggest competitors for IP?

I'm prepared to make a prediction: In 5 years Nvidia will be the only significant Arm supplier in the datacenter and all the other entrants will have given up - and that this explains why Intel's stock has risen over the last couple of days.


I agree with this except to point out that Nuvia and Marvell are not dependent on Arm.


Back in May 2017, a talk was given at a RISC-V conference (https://riscv.org/wp-content/uploads/2017/05/Tue1345pm-NVIDI... ). Here are the conclusions of the talk:

- NVIDIA will use RISC-V processors in many of its products

- We are contributing because RISC-V and our interests align

- Contribute to the areas that you feel passionate about!

Since then, I haven't observed much NVIDIA activity in the RISC-V software community. But some NVIDIA people do participate in the Virtual Memory Task Group.

Apparently, the stance has changed.


I wonder what the world would've looked like if companies with a valuation greater than 100 billion (adjusted for inflation) just weren't allowed to purchase and/or merge with other companies.


Hard to wrap my mind around what a "data center software platform" means in practice. Isn't that covered by things like AWS and Azure?


I read it as a software platform for programming in a data centre with nvidia's stack.

I.e. a library that's optimized to drive a bunch of hosts with ARM cores, strapped to Nvidia GPUs, connected by Mellanox interconnects.

Sell it as a unit to the public clouds for a massive markup, or maybe offer it on their own cloud for an unbeatably low cost.


Own NVidia GPUs. Own ARM chips. Connect ARM chips to NVidia GPUs in big racks of racks connected to power.

ARM is effectively embarrassingly parallel, low-power compute, and GPUs are parallel high-power compute.

If NVidia got into datacenter design (with ARM+GPU), dropped K8S+S3+Postgres onto it, charged $1999.95/mo for access, what could they deliver to customers?

What would a vertically integrated datacenter look like where you are the designer + manufacturer for everything from the concrete up?


> What would a vertically integrated datacenter look like where you are the designer + manufacturer for everything from the concrete up?

A lot like AWS, Oracle Cloud, and soon to be Google Cloud and Azure?

people are losing their shit but this isn't even that novel lol, quite a few other players already own their full stack.


None of those companies make their CPUs or GPUs.


AWS has Graviton, although it's an Arm design... Google has the TPU and an affinity for experiments like POWER... Microsoft is hiring people with custom CPU/SoC experience (and they have experience with the SQ1/Snapdragon).


Go down a layer or two. What platform do you want for the actual hardware in a data center that is a part of AWS or Azure?


My guess would be tight software integration between Nvidia Arm CPUs and Nvidia GPUs that would not be available to other Arm vendors or users of Nvidia GPUs with Intel CPUs.

CUDA provides software lock-in to Nvidia GPUs. This would provide software lock in to the combined Nvidia Arm / Nvidia GPU platform.


This sounds like a blessing in disguise for Intel. Nvidia trying to compete in the server space effectively means that most of the other companies designing Arm servers will go out of business. Since Nvidia will not be interested in any of the other markets Arm is in, they will be neglected.


Old cynic:

This all sounds like buzzword bingo and vaporware. It is one possible future. It also highly depends on the definition of "our time" --

We are joining arms with Arm to create the leading computing company for the age of AI.

AI is the most powerful technology force of our time. Learning from data, AI supercomputers can write software no human can.

Amazingly, AI software can perceive its environment, infer the best plan, and act intelligently.

This new form of software will expand computing to every corner of the globe


NVidia will make CUDA cores available across the whole ARM ecosystem, from Cortex-M? on up.

NVidia will integrate Arm cores into their GPUs, where the GPU itself will start to become a server hosted on a PCIe bus, potentially with its own InfiniBand connection.


“Owning it all ... from cloud to edge” is an exciting ambition, not only for Nvidia but also for developers and consumers. It’s going to be hard to pull it off, of course, but it’s worth a shot.

EDIT - explaining the benefits for developers, and therefore consumers.

A "cloud to edge" stack from hardware to the application layer could create new application patterns that can tremendously accelerate autonomous driving, everyday robots, gaming. It could democratize this for small (maybe indie) developer teams. Wouldn't this have a great impact on consumers?


The idea of a company "owning it all" does not sound like something that I am going to benefit from as a consumer long term.


AMD and Intel also already "own it all" in this sense. NVIDIA is merely a third competitor in this space, with a much more undesirable CPU IP.

You should welcome a third competitor to the duopoly that has strangled CPU development. We surely could have made much more progress if x86 were not limited to only two (really three) competitors; you can already see how much change AMD getting back in the game has made.

And I'm not sure Huang is going to burn it all down anyway. That seems like it would be a shortsighted move that would negatively affect the long-term value of ARM.

But I mean - I don't think anyone can deny that Huang would do great things with ARM. Terrible, perhaps, but also great.

(and it says a lot that a lot of people are probably nodding along with a comparison of one of the greatest tech CEOs of all time to literal Voldemort, the public opinions on NVIDIA and Huang are just ridiculously hyperbolic)


AMD and Intel don't actually own it all, certainly not from a software perspective. That's the difference here. The upshot of this article is that Nvidia basically wants to "CUDAify" the entire datacenter software stack. Intel and AMD absolutely do not have that kind of lock-in. You don't need to use their proprietary language to write programs that run on their systems.


This is such an important point.

Plus, CUDA shows what can happen even when alternatives are available (OpenCL, for example): you just have to use the hardware/software integration to be sufficiently ahead and establish a virtuous circle.


CUDA is basically just C/C++ with parallel-programming concepts. The "alternatives" like OpenCL are still tied to their graphics API origins.


I'm no fan of Apple, but they now own everything from chips and hardware to the OS and 30% of app store revenue, and they are the most valuable company on Earth.


> I'm no fan of Apple

This suggests that as a consumer, you haven't benefited from this.


Benefited from a price that's 30% higher than it could have been?


That's what I'm saying. The above comment was implying that due to Apple's worth, they were benefiting consumers.

I was pointing out that the above commentors personal experience showed otherwise.


The fact that they are the most valuable company on Earth suggest that many consumers do benefit from it.


Many consumers benefit from a good product they enjoy using, regardless of ownership of the stack behind it

The fact that they own their full stack has resulted in some of the most anti-consumer parts of the business, allowing monopolization of repairs, part pricing, etc.


Or it suggests patent laws prevent consumer friendly companies from competing with Apple.


Don’t forget their competitors in the space are Amazon, Microsoft and Google. They won’t be able to do more than make a dent in that market share.


Yes, but owning the IP that runs Amazon CPUs gives them a clear advantage?


If you are licensing the ISA but have your own implementation design, pivoting to RISC-V should be easy.


Market calls for competition...


Will they buy micron or another major memory company?


I hope this fails miserably.

NVidia doubly fucks over desktop Linux with its proprietary drivers and CUDA beating the Khronos Group stuff, and I don't want their abilities to grow.


The post author is very bullish on commoditization and modularization winning in the end, and points out this would dampen or sink NVidia's plans, and I very much hope he is right.


> The most advanced versions of Nvidia’s just-announced GeForce RTX 30 Series, for example, has an incredible 10,496 cores.

I'm so tired of Nvidia getting away with this blatant falsehood.

They weren't even actual cores when Nvidia started counting SIMT lanes as "cores" (how many hands do you have, 2 or 10? And somebody who writes twice as fast as you must obviously have 20 hands, yes?), and now that the cores can, under some conditions, do dual-issue, they are counting each one double.

What's next, calling each bit in a SIMT lane a "core"?


> What's next, calling each bit in a SIMT lane a "core"?

No, because that would break the convention they've used since the introduction of the term CUDA core: it's the number of FP32 multiply-accumulate instances in the shader cores. Nothing more, nothing less. If you read more into it, then that's on you.

You may not like that they define core this way (all other GPU vendors do it the same way, of course), but they've never used a different definition.

BTW, I checked the 8800 GTX review and there's no trace of 'core' being used in that way. It was AMD who started calling FP32 ALUs 'shader cores' or 'shaders' with the introduction of the ill fated 2900 XT. Since comparing the number of same-function resources is one of the favorite hobbies of GPU buyers, Nvidia subsequently started using the same counting method and came up with the term "CUDA core."
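As a rough sketch of what that marketing number actually counts (my example, not the parent's): the driver reports the SM count, and you multiply by an assumed number of FP32 lanes per SM for the architecture in question; for Ampere GeForce parts that is 128, which is where figures like 82 SMs x 128 = 10,496 come from.

  #include <cstdio>
  #include <cuda_runtime.h>

  int main() {
      cudaDeviceProp prop;
      cudaGetDeviceProperties(&prop, 0);   // query the first GPU
      const int fp32_lanes_per_sm = 128;   // assumed for GA10x; differs on other architectures
      printf("%s: %d SMs x %d FP32 lanes = %d \"CUDA cores\"\n",
             prop.name, prop.multiProcessorCount, fp32_lanes_per_sm,
             prop.multiProcessorCount * fp32_lanes_per_sm);
      return 0;
  }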


> they've never used a different definition

The term "core" had a concrete meaning before Nvidia defined it to mean number of FMA units, and it's obviously no accident: they get to claim they have 32x more cores than would be the case by common definition. They are the odd one out; should Intel and AMD start multiplying their number of cores by the SIMD width? You would accept that as a "core"?

Now, as I wrote above, they have changed it AGAIN: they are now double-counting each "core" because it has some (limited) superscalar capability.


> The term "core" had a concrete meaning before Nvidia defined it to mean number of FMA units

It was AMD/ATI who started doing it. Nvidia followed, and they didn't really have a choice if they wanted to avoid an "AMD has 320 shader cores yet Nvidia only has 16" marketing nightmare.

> Now, as I wrote above, they have changed it AGAIN: they are now double-counting each "core" because it has some (limited) superscalar capability.

They did not. In Turing, there's one FP32 and one INT32 pipeline with dual issue; in Ampere, there's one FP32 and one (INT32+FP32) pipeline, allowing dual issue of 2 FP32 when INT32 is not being used.

That can only be done if there are 2 physical FP32 instances. There is no double counting.

If your point is that this second FP32 unit can't always be used at 100%, e.g. because the INT32 is used, then see my initial comment: it's the number of physical instances, nothing more, nothing less. It doesn't say anything about their occupancy. The same was obviously always the case for AMD as well, since they had a VLIW5 ISA when they introduced the term, and I'm pretty sure that those were not always fully occupied either.


> If your point is that this second FP32 unit can't always be used at 100%

My point is that the second FP32 unit is not a core, in the sense of https://www.amazon.com/Computer-Architecture-Quantitative-Jo... which, it is my understanding, was a well-established standard; nothing more, nothing less.


Is that it? According to that definition, the first FP32 unit is not a core either...

But, again, if you want to blame someone for this terrible, terrible marketing travesty, start with AMD. They started it all...

My point was simply that, contrary to your assertion, Nvidia and AMD have never changed the definition of what they consider to be a core, even if that definition doesn't adhere to computer science dogma.


Well, cloud vendors also use the term "core" when they mean a CPU thread. With SMT8 you now get 8 "CPU cores" per actual core.

I would love to see people push back on that too. Unfortunately that ship has sailed.


One "CUDA core" is indeed one GPU thread. The lane of a GPU SIMD is nothing like CPU SIMD, and can independently branch (even if that branching can be much more expensive than on a CPU).


> One "CUDA core" is indeed one GPU thread.

This is not true, just like a shader core with AMD was not a GPU thread.

For example, the 2900 XT had 320 shader cores, but since it used a VLIW5 ISA, that corresponds to 64 GPU threads.

Similarly, an RTX 3080 has 8704 CUDA cores, but there are 2 FP32 ALUs per thread, so that's 8704 / 2 = 4352 threads, and 68 SMs since, just like Turing, there are 64 threads per SM (4352 / 64 = 68).
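
As an aside, if you want to see which of these numbers the hardware actually reports, the CUDA runtime exposes the SM count and warp size directly; the "CUDA core" figure is just a per-architecture multiple of the SM count. A minimal sketch (device 0 assumed, error handling omitted):

    #include <cstdio>
    #include <cuda_runtime.h>

    int main() {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, 0);
        // The API reports physical SMs and the warp size; the marketing
        // "CUDA core" count is SMs x FP32 lanes per SM (e.g. 128 per SM
        // on the Ampere gaming chips).
        printf("SMs: %d, warp size: %d, max resident threads per SM: %d\n",
               prop.multiProcessorCount, prop.warpSize,
               prop.maxThreadsPerMultiProcessor);
        return 0;
    }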


I think at this point it's just Azure using "vCores" and everybody else has migrated to less dishonest terminology.


Yes, they used to be called vCores (but you would be surprised how many developers and industry experts didn't know that meant a thread).

They then dropped the "v", so it is now simply "core".


I can't find any cloud providers using the term "core". Azure is using "vCore" and the others are using "vCPU".


> It was AMD who started calling FP32 ALUs 'shader cores' or 'shaders' with the introduction of the ill fated 2900 XT

IIRC, ATI started to use proper FP32 ALUs for both pixel shading and video processing/acceleration [0] around the same time. I guess doing this stuff needs more than simple MUL/ACC instructions.

So, there's no misdirection here.

[0]: https://en.wikipedia.org/wiki/Video_Shader


Yes.

If you're a computer science fundamentalist like pixelpoet who wants to stick to the definition of core as what AMD and Nvidia call a "CU" and an "SM", using "threads and warps" instead of "strands and threads", then "CUDA core" is obviously jarring.

It's simply a case where marketing won. At least Nvidia and AMD, and now Intel, are using the same term. You can go on with your life, or you can whine about it, but it's not going to change.


If we're going to stick to literal computer-science definitions, we should get the forks out and march on the storage manufacturers' HQs to force them to accept that 1 kB is actually 1024 bytes, not 1000.

From my PoV, any functionally complete computational unit (in the context of the device) can be called a core. Should we say that GPUs have no cores because they don't support 3DNow! or SSE or AES instructions?

Consider an FPGA. I can program it to have many small cores which can each do a small set of operations, or I can program it to be a single, more capable core. Which one is a real core then?


A CU is not a computationally complete unit: it has no independent thread of execution, and all cores in a warp follow the same execution path (indeed all execution paths, even those not applicable to that CU).

That is in contrast to your FPGA example, where either the one big core or the many small cores can each execute their own thread.

It's not quite that simple either; you've got stuff like SMT and CMT where multiple threads execute on a single set of execution resources, but CUs are clearly on the "not a self-contained core" side of the line.


Thanks for the clarification. I read up on Nvidia's architecture back in the day, but it seems I'm pretty rusty on the GPU front.

Is it possible for you to point me in the right direction so I can read about how these things work and bring myself up to speed?

SMT is more like cramming two threads into a core and hoping they don't compete for the same ports/resources in the core. CMT is well... we've seen how that went.


https://en.wikipedia.org/wiki/Single_instruction,_multiple_t...

https://course.ece.cmu.edu/~ece740/f13/lib/exe/fetch.php?med...

The short of it is that the GP is right: SIMT is more or less "syntactic sugar" that provides a convenient programming model on top of SIMD. You have a "processor thread" that runs one instruction on an AVX-like unit with 32 lanes. What they call a "CUDA core" or a "thread" is analogous to an AVX lane; the software thread is called a "warp" and is executed using SMT on the actual processor core (the "SM", or Streaming Multiprocessor). The SM is designed so that a lot of SMT threads (warps) can be resident on the processor at once (potentially dozens per core); a warp is put to sleep when it needs to do a long-latency data access, and the SM swaps in some other warp to work on while it waits. This covers for the very long latency of GDDR memory accesses.

The distinction between SIMT and SIMD is that, instead of writing instructions for the high-level AVX unit itself, you write instructions for what you want each AVX lane to be doing, and the compiler maps that into control flow for the processor. It's more or less like a pixel-shader-type language - since that's what it was originally designed for.

In other words, under AVX you would load some data into the registers, then run an AVX mul. Maybe a gather, AVX mul, and then a store.

In SIMT, you would write: outputArr[threadIdx] = a[threadIdx] * b[threadIdx]; or perhaps otherLocalVar = a[threadIdx] * threadLocalVar; The compiler then maps that into loads and stores, allocates registers, and schedules ALU operations for you. And of course, like any "auto-generator" type thing, this is a leaky abstraction: it behooves the programmer to understand the behavior of the underlying processor, since the compiler will otherwise faithfully generate code with suboptimal performance.
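
To make that concrete, here is a rough CUDA sketch of the same multiply (the kernel and variable names are made up for illustration):

    __global__ void mulKernel(const float* a, const float* b, float* out, int n) {
        // Each SIMT lane ("CUDA core" in marketing terms) computes one index;
        // the SM actually executes them 32 at a time as a warp.
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n)
            out[i] = a[i] * b[i];  // compiler emits the per-lane loads, multiply, and store
    }

    // Launched with enough blocks to cover n, e.g.:
    // mulKernel<<<(n + 255) / 256, 256>>>(d_a, d_b, d_out, n);

The chores of picking a block size and guarding the tail of the array are exactly the kind of thing the abstraction leaks back onto the programmer.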

In particular, in order to handle control flow, basically any time you have a code branch (an "if/else" statement, etc.), the thread polls all its lanes. If they all go one way, that's fine, but if some go each way then it has to run both sides, so it takes twice as long. The warp turns off the lanes that took branch B (so they just run NOPs) and runs branch A for the first set of lanes; then it turns off that first set and runs branch B. This is an artifact of the way the processor is built - it is one thread with an AVX unit, and each "CUDA core" has no independent control, it is just an AVX lane. So if you have, say, 8 different ways through a block of code, and all 8 conditions exist in a given warp, then you have to run it 8 times, reducing your performance to 1/8th - or potentially exponentially worse if there is further branching in subfunctions, etc.

(obviously in some cases you can structure your code so that branching is avoided - for example replacing "if" statements with multiplication by a value, and you just multiply-by-1 the elements where "if" is false, or whatever. But in others you can't avoid branching, and regardless you have to manually provide such optimizations yourself in most cases.)
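
For example, a purely illustrative pair of kernels (names invented) showing the branchy form and the multiply-by-a-flag form described above:

    // Divergent form: if a warp holds lanes where x[i] < 0 and lanes where it
    // isn't, the hardware runs both paths back to back with lanes masked off.
    __global__ void scaleBranchy(const float* x, float* y, float scale, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        if (x[i] < 0.0f)
            y[i] = 0.0f;
        else
            y[i] = x[i] * scale;
    }

    // Branch-free form: every lane does the same arithmetic and the condition
    // becomes a 0/1 factor, so the warp never has to split.
    __global__ void scaleBranchless(const float* x, float* y, float scale, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i >= n) return;
        float keep = (x[i] >= 0.0f) ? 1.0f : 0.0f;  // typically compiles to a select, not a jump
        y[i] = x[i] * scale * keep;
    }

(In fairness, a branch this small will usually be predicated by the compiler anyway; the divergence penalty really shows up when the two paths are long.)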

AMD broadly works the same way but they have their own marketing names, the AVX lane is a "Stream Processor", the "warp" is a "wavefront", and so on.

AVX-512 actually introduces this programming model to the CPU side, where it is called "Opmask Registers". Same idea, there is a flag bit for each lane that you can use to set which lanes an operation will apply to, then you run some control flow on it.

https://en.wikipedia.org/wiki/AVX-512#Opmask_registers


1 kB is 1000 bytes, though.

1 kiB, on the other hand, is 1024 :)


1 KiB (kibibyte) was standardized as 1024 bytes in 1998. Before that, "kilobyte" was commonly used to mean 1024 bytes [0].

> Prior to the definition of the binary prefixes, the kilobyte generally represented 1024 bytes in most fields of computer science, but was sometimes used to mean exactly one thousand bytes. When describing random access memory, it typically meant 1024 bytes, but when describing disk drive storage, it meant 1000 bytes. The errors associated with this ambiguity are relatively small (2.4%).

[0]: https://en.wikipedia.org/wiki/Kibibyte


As a GPU buyer, I don't understand Nvidia's marketing at all. They list all their specs as "X ray tracing cores, Y tensor cores, Z CUDA cores" and I have no idea how that translates to real-world performance. Which of those does my game use? Which of those does Blender use when ray tracing? (Those are the two applications I use a GPU for, so the ones I personally care about.) And the cores are listed as separate things, but I have the feeling they're not; if you're using all Y tensor cores, then there aren't Z unused CUDA cores sitting around, right?

I think it all ends up being useless at best and completely misleading at worst. The reality is that I don't even know if I want to buy their product or not. I guess that's what reviews are for? But why do reviewers have to do the job of Nvidia's marketing department? Seems strange to me.


This is a criticism that has been made of consumer computing since its inception: Even if you're versed in the technical details, you can still get misled.

In all cases it helps to know your must-haves and prioritize accordingly, given that it's rarely the case that you can "just" get the new one and be happy: even if it benchmarks well, if it isn't reliable or the software you want to run isn't compatible, it'll be a detriment. So you might as well wait for reviews to flesh out those details unless you are dead set on being an early adopter. The specs say very little about the whole of the experience.

I actually hate the idea of having a high-end GPU for personal use these days. It imposes larger power and cooling requirements, which just adds more problems. I am looking at APUs for my next buy - the AMD 4000G series looks to bring graphics performance somewhere between a GT 1030 and a GTX 1050, which is fine for me, since I mostly bottleneck on CPU load in the games I play now (hello, Planetside 2's 96 vs. 96 battles, still too intense for my 2017 gaming laptop), and these APUs now come in 6- and 8-core versions with the competitive single-thread performance of more recent Zen chips. I already found recordings of the 2000G chips running this game at playable framerates, so two generations forward I can count on a straight-up improvement. The only problem is availability - OEMs are getting these chips first.


> But why do reviewers have to do the job of Nvidia's marketing department?

Are you suggesting you would rather read reviews of Nvidia products that are written by Nvidia, and you would trust them more than 3rd party reviews?

> I don't understand Nvidia's marketing at all.

Do read @TomVDB's comments; this isn't Nvidia, this is industry-wide marketing terminology.

Cores are important to developers, so what you're really running into is that some of the marketing is (unsurprisingly) not targeted at you. If you care most about Blender and games, you should definitely seek out the benchmarks for Blender and the games you play. Even if you understood exactly what the cores are, that wouldn't change anything here; you would still want to focus on the apps you use and not on the specs, right?

> I have the feeling they're not; if you're using all Y tensor cores, then there aren't Z unused CUDA cores sitting around, right?

FWIW, that's a complicated question. There's more going on than just whether these cores are separate things. The short answer is that they are, but there are multiple subsystems that both types of cores have to share, memory being one of the more critical examples. The better answer here is to compare the performance of the applications you care about on Nvidia cards versus AMD cards at the same price point. That's how to decide which to buy, not by worrying about the internal engineering.


I wouldn't trust them more, but it would be a good rule of thumb for whether or not I need to be awake during this product cycle. For example, if they're like "3% more performance on the 3090 vs. the RTX Titan" then I can just ignore it and not even bother reading the reviews. Instead, they're just like "well it has GDDR6X THE X IS FOR XTREME" which is totally meaningless.


> Instead, they're just like "well it has GDDR6X THE X IS FOR XTREME" which is totally meaningless.

That's referring to memory and not cores; is that a realistic example? I'm not very aware of Nvidia marketing that does what you said specifically - the example feels maybe a little exaggerated? I will totally grant that there is marketing speak, and understanding the marketing speak for all tech hardware can be pretty frustrating at times.

> if they're like "3% more performance on the 3090 vs. the RTX Titan" then I can just ignore it and not even bother reading the reviews.

Nvidia does publish some perf ratios, benchmarks, and peak perf numbers with each GPU, including for specific applications like Blender. Your comment makes it sound like you haven't seen any of those?

Anyway, I think it would be a bad idea to ignore the reviews and benchmarks of Blender and your favorite games, even if you saw the headline you want. There is no single performance-improvement number. There never has been, but it's even more true now with the distinction between ray tracing cores and CUDA cores. It's very likely that your Blender perf ratio will be different from your Battlefield perf ratio.


I haven't seen any of those. All I've seen is a green-on-black graph where the Y axis has no 0, they don't say what application they're testing, and they say that the 3070 is 2x faster than the 2080 Ti. Can you link me to their performance numbers? As you can tell, I'm somewhat interested. (And I know that real reviews arrive tomorrow, so... I guess I can wait :)


The official press release of the 3000 series[1] has a graph that seems to be what you're looking for. Look for the section named "GeForce RTX 30 Series Performance".

It has a list of applications, each GPU and their relative performance. Y=0 is even on the graph!

[1]: https://www.nvidia.com/en-us/geforce/news/introducing-rtx-30...


Ah, OK, I saw that. I am just very suspicious of the fact that they didn't use "fps" as the units, and instead chose "relative performance".


Nvidia does publicize benchmarks in its marketing, but many people (correctly) are skeptical of benchmarks published by the same company that makes the product. The number of CUDA cores and other hardware resources is a number that feels a bit more objective, even though it is hard to understand the direct implications on performance.

Apple is a good example of a company that just doesn't really talk too much about the low-level details of their chips. People buy their products anyway.


Is HT/SMT on CPU any different?



