An interview with AMD CEO Lisa Su about solving hard problems (stratechery.com)
247 points by wallflower 7 months ago | 195 comments



I was today years old when I found out that Lisa Su (CEO of AMD) and Jensen Huang (co-founder and CEO of NVIDIA) are relatives! If you can't do a merger, it's good to have family onboard ;-)


They didn't know about their relation until much later, but if they had, Jensen would have been the "cousin you don't want to be like" - he went to Oregon State and worked at a Denny's while Lisa Su went to Bronx Science and on to MIT.


> he went to Oregon State and worked at a Denny's while Lisa Su went to Bronx Science and on to MIT.

Seems like a bad vibe to imply someone shouldn't aspire to go to state school or work a humble job for money to get through it, even though given both options, indeed they may dream about the fancy one. Denny's has the best milkshakes anyway and state school is probably a much more sensible place to attend.


This makes Jensen one of those people that climbed all the way from the bottom to the top, which is more admirable.


And yet this is exactly how families talk about their cousins in hushed tones, as bad and immoral as it is.


Although he did graduate high school two years early, so he had the intuition. Maybe his parents thought working food service for a bit was a rite of passage.


Hm, if he graduated early perhaps I can't use him as a positive example to stop me from killing myself. I guess I need to read his early-years biography.


I have to take this comment at face value. Contact NIMH[1] or your local equivalent.

If you or someone you know is in crisis: call or text the 988 Suicide & Crisis Lifeline at 988 (para ayuda en español, llame al 988). The Lifeline provides 24-hour, confidential support to anyone in suicidal crisis or emotional distress. Call 911 in life-threatening situations. If you are worried about a friend’s social media updates, you can contact safety teams at the social media company. They will reach out to connect the person with the help they need.

[1] https://www.nimh.nih.gov/health/topics/suicide-prevention


The "cousin you don't want to be like" is someone with an electrical engineering degree at 21?


Graduating with an EE degree in 3.5 to 4 years is basically the norm? I know dozens of people from my own state school that did that.


And then he went to Stanford so hardly a failure...


Not a failure at all, but Stanford Masters isn't nearly as selective as MIT undergrad or Stanford undergrad.

He also graduated at the age of ~29. Not sure if it was a full-time MS or a part-time program paid for by his employer.


I'd check your numbers on that from 30 years ago. They weren't even in the same universe of selectivity as they are now. Full-time/part-time is totally irrelevant. What, are you the most elitist credentialist of all time lol? Jesus.


It's relevant - some companies have seats effectively reserved for them at good grad schools for masters programs for their employees, even today at less prestigious companies like Carrier and GE - the selectivity isn't based on who won beauty pageants or had 7 first author NeurIPS papers like it is for typical MBA and PhD programs at the same institutions.

Getting a Stanford MS while working was somewhat normal then (possible for mere mortals and not superhumans) if you worked at the right company, not really the same as getting into undergrad at all.


(-; Indeed always warming to the core to see productive emulation between parallel lines! I'm sure after all their achievements, neither of them wastes time pondering woulda, shoulda, cuda…


You made my day with woulda shoulda cuda.

I’m going to watch finding NeMo now


I couldn't believe this so I had to Google it. First cousins once removed. I don't really know what to think about this...


All US presidents are directly related to each other: https://curiousmindmagazine.com/all-us-presidents-including-...

The world's a stage. :)


That means they're descendants of arguably the most powerful woman in the world's history, Eleanor of Aquitaine [1].

She was married to both a King of France and a King of England, and was the mother of three kings during her lifetime, including King John Lackland Plantagenet (progenitor of all 43 US presidents except one).

This gives new meaning to the popular idiom "the apple doesn't fall far from the tree".

[1] Eleanor of Aquitaine:

https://en.wikipedia.org/wiki/Eleanor_of_Aquitaine


Ok, being ancestor to 42 presidents is impressive, but how many loser great^30 grand kids does she have?


In the grand scheme of things the losers do not really matter. For example, imagine that the most successful VC today has countless failed ventures but invested in 99% of the Fortune top 100 US companies in their early startup days. I'd say that's a much better return than Warren Buffett's Berkshire Hathaway (and BH is not a VC company).


I disagree - there are two independent systems at play here - human success, and evolutionary success.

It's unlikely that there are many descendants of Eleanor of Aquitaine that achieved more human success than George Washington. However, George left this world with zero descendants and his line effectively ended with him. Whereas I'm sure there are a bunch of duds out there who are the same relation to Eleanor of Aquitaine and produced 9 children who each produced a handful of children.


And we are always being reminded that human is well within evolutionary paradigm, are we not?

I think there's very much underestimation that the so called modern world is more civilized and successful in their worldly endeavours compared to the earlier generations. It's probable that for Eleanor of Aquitaine, Alexander the Great and/or Aristotle are the ancestors. Comparing George Washington against Alexander and Aristotle, I'd say there is no contest for the former.


The bottom of that link has a video that is far more meaningful[0]

It asks the question if the presidents are more related to one another than to another random group. The answer to this is no.

This is probably pretty obvious if you actually look at population sizes through time[1]. There are 8 billion people alive today, and we had a billion fewer at each of 2010, 1999, 1986 (5 billion), 1974, 1960, 1927, and 1800. So in the last 100 years we grew by 6 billion people! But in the last 200 years only 7 billion. In 1200 (approximately the time of King John) there were 360 million people in the world, which is like taking the entire US and distributing it across the globe. For reference, there were only 68 million people in Europe[2], which is about the current population of the UK[3] or about the combined population of Tokyo and Delhi (add Shanghai and Sao Paulo if you require city proper)[4].

So you can probably just guess through how quickly population exploded that you're going to have convergent family trees without going back very far. Honestly, I'm more surprised it takes 800 years and not less.

[0] https://www.youtube.com/watch?v=9shzqqcfvfw

[1] https://www.worldometers.info/world-population/world-populat...

[2] https://en.wikipedia.org/wiki/Medieval_demography

[3] https://www.worldometers.info/world-population/population-by...

[4] https://en.wikipedia.org/wiki/List_of_largest_cities


A relative from 800 years ago doesn't honestly seem that impressive. That's ~30 generations? That's a whole lot of people. Good luck finding a venue large enough for that family reunion.


Agreed—all that it really shows is that all of them but Van Buren have some form of British ancestry. Even the fact that they're all descended from a King of England isn't especially noteworthy at those kinds of time scales.

That said, what's incredibly impressive, if true, is that someone managed to track each of their genealogies that far back along that line. That's not an easy feat. And this isn't just an internet legend: at the very least there really was a girl who really did put together a chart, and she got taken seriously enough to have it included in a Library of Congress exhibit on the Magna Carta [0].

I'd be interested to see the actual chart she made. One possible explanation for how she was able to do this is if each of the presidents connects up to English aristocracy fairly recently, which would account for the records being intact and would be more interesting than just the fact of a shared ancestor in ~1200.

[0] https://loc.gov/exhibits/magna-carta-muse-and-mentor/magna-c...


> I'd be interested to see the actual chart she made. One possible explanation for how she was able to do this is if each of the presidents connects up to English aristocracy fairly recently, which would account for the records being intact and would be more interesting than just the fact of a shared ancestor in ~1200.

Obama's most recent aristocratic ancestor appears to be a "Sir Henry Bold", his 16x great-grandfather.

A quick look at some of the genealogies makes it appear that many rely on the same source, which lists 900 royal descendants that immigrated to the new world[1].

With 900 of them several generations back, it wouldn't be too surprising that many politicians would be related.

1: https://www.amazon.com/gp/product/0806320745/ref=as_li_tl?ie...


It would be clearer whether this is impressive if we could see what fraction of the USA population is related to this one person.


Oh, it's certainly an impressive feat. I too would like to see the chart she made.

> One possible explanation for how she was able to do this is if each of the presidents connects up to English aristocracy fairly recently

That would honestly be a more interesting finding personally, discovering that all of America's greatest political leaders are just imported British aristocracy.


Where do you think the Founding Fathers came from?

In the early US "democracy", only British aristocrats had a vote.


It’s more meaningful than you might think because the majority of people on earth probably aren’t descendants of this guy. Especially in the 1700’s when our first presidents were born.

Essentially zero US presidents are Asian, Hispanic, etc. Pick a Native American from 30 generations ago and you don’t see this kind of family tree. It’s an expanded circle of privilege through time.


Every European is related to Charlemagne.

"In 2004 mathematical modeling and computer simulations ... indicated that our most recent common ancestor probably lived no earlier than 1400 B.C. and possibly as recently as A.D. 55."

https://www.scientificamerican.com/article/humans-are-all-mo...


2004 - 800 is a lot longer than 1778 (or even 2024) - 1200.


Sure, but that's probably still just sample bias of presidents all being at least mostly anglo-saxon. Probably everyone on the British Isles is related to King John.


Seems unlikely; in the 1700’s we’re talking ~20 generations and a population of 8+ million. It’s more believable if we are talking genetics, but ancestry.com is working off of official records.


It's certainly a point, but I don't think "the US presidents are white guys" is going to win any awards for research or journalism. Even Obama, famously the first black president, has a family history of white guys.


Today most white people are descendants of that guy, but go back to 1732 when George Washington was born and you’re looking at a much tighter family tree.


It's true, but knowing that Washington's great grandparents were English certainly cuts that down a fair bit.


Washington probably thought of himself as English, for that matter.


England already had 8 million people at the time, so we are still talking class within England, not just anyone from England.


It's a short history, considering Obama's mother is white. Or if you're only counting guys, his maternal grandfather is white.


There were several US presidents who had modest beginnings. I think this is more a consequence of exponential growth. 2**30 is ~1 billion ancestors, so by the pigeonhole principle and some fairly weak assumptions on mixing, you can count nearly every member of your ethnic group from that time as a distant ancestor. I think this girl landed on Plantagenet mostly because he would have a well-documented lineage compared to your average serf.
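A rough sketch of the arithmetic above, in Python. It assumes two parents per generation with no overlap between branches, and uses a world population of about 360 million around the year 1200, a figure quoted elsewhere in this thread. The function name is made up for illustration.

```python
def ancestor_slots(generations: int) -> int:
    """Number of slots in a family tree `generations` levels up (2 parents each)."""
    return 2 ** generations

# Assumed figure quoted elsewhere in the thread: world population around 1200.
WORLD_POPULATION_1200 = 360_000_000

slots = ancestor_slots(30)
# More slots than people alive means many slots must be filled by the same person.
repeats_forced = slots > WORLD_POPULATION_1200
print(slots, repeats_forced)
```

The no-overlap assumption is exactly what breaks down in practice, which is the pigeonhole point: the tree has more slots than there were people to fill them.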


~2^30 describes the situation today, not in the 1700’s. ~2^20 would only be a million or so people at the time this country was founded.

Give it another 250 years and we’ll probably have a largely Asian president that’s also a descendant. IE: It’s an expanding circle of political elites combined with an expanding circle of descendants.


Did you mean “that’s also a descendant?”


… and you ain’t on it.


> please don’t think of AMD as an x86 company, we are a computing company, and we will use the right compute engine for the right workload.

Love that quote.


Yep, I think that's the best quote of the interview and the runner up is the Napster quote.

People seem to have forgotten that Intel used to have StrongARM in its lineup; by the same logic Intel is a computing company, not an x86 company, similar to AMD [1].

For one of the latest trends in the computing landscape, please check the new Embedded+ platform recently introduced by AMD [2]. I'm biased towards IoT, but I have a very strong feeling that IoT is the killer application of AI, much as AI is a killer application of HPC, as Lisa Su said during the interview. The turning point for IoT is yet to happen, however; it will probably come when the number of machine nodes talking directly to each other surpasses the number of machine nodes talking to humans. That will be its ChatGPT moment.

[1] StrongARM:

https://en.wikipedia.org/wiki/StrongARM

[2] AMD Unveils Embedded+ Architecture; Combines Embedded Processors with Adaptive SoCs to Accelerate Time-to-Market for Edge AI Applications:

https://www.amd.com/en/newsroom/press-releases/2024-2-6-amd-...


Crazy that they need to write such things because incompetent investors think that ISA has huge implications


It has huge implications because it makes competing in that market much harder due to licensing issues. Intel and AMD have a duopoly on the x86 market, which comprises a huge chunk of server and personal computing, but that is changing fast. If they go ARM (or RISC-V or whatever) they will have more competition to contend with, including their existing cloud computing clients designing their own chips and fabbing them with other foundries.


I was going to correct you on the duopoly, but apparently the Cyrix/Centaur/Via agreement expired some time ago.


Perf/Energy implications


The shocker is: a lot of engineers think that the ISA has huge implications.


There was an interview with one of the SPARC creators who said that a huge benefit of control of the instruction set was the ability to take the platform in directions that (Intel) resellers could not.

He was otherwise largely agnostic on the benefits of SPARC.


ISA shouldn’t be more than a 10% or 20% performance impact or so, per Jim Keller


What this means is that a better ISA might permanently be one generation ahead in terms of performance


That's not trivial...


ISAs do have huge implications. A poorly designed ISA can balloon the instruction count, which has a direct impact on program size and the call stack.


Investors don't care about those nerdy details

The talk is about perf and energy usage, where diff between x86 and arm is not even close to what they believe


It is worth mentioning AMD's Steam Deck CPU [1] running in the Steam Deck [2] and, no less important, that the Steam Deck also has Linux (and KDE) incorporated [3].

[1] https://www.techpowerup.com/cpu-specs/steam-deck-cpu-lcd.c33...

[2] https://store.steampowered.com/steamdeck

[3] https://help.steampowered.com/en/faqs/view/671A-4453-E8D2-32...


I have endless admiration for Lisa Su, but let's be honest, the reason AMD and Nvidia are so big today is that Intel has had amazingly bad management since about 2003.

They massively botched the 32-bit to 64-bit transition....I did some work for the first Itanium, and everybody knew even before it came out that it would never show a profit. And they were contractually obligated to HP to not make 64-bit versions of the x86....so we just had to sit there and watch while AMD beat us to 1 Gigahertz, and had the 64-bit x86 market to itself....

When they fired Pat Gelsinger, their doom was sealed. Thank God they hired him back, but now they are in the same position AMD and Nvidia used to be in: Intel just has to wait for Nvidia and AMD to have bad management for two straight decades....


You're talking about very old events. Intel made mistakes during 2000-2005 (which allowed the rise of Opteron) but they crushed it from 2006-2016. Then they had different problems from 2016-2024.


Intel went into Rest & Vest mode after Haswell (4000 series) in 2013 - they hardly improved until 2020 - 6 lost generations of relabeled CPUs like 13000 - 14000 series (lots of money paid for innovative new iGPU marketing names like 520 and 620!). They found out that community college talent does not make good employees in the C-Suite ... The Iris Pro 5200 (2013) wasn't improved upon until 2020!


> They found out that community college talent does not make good employees in the C-Suite

This is new to me, what is this referring to?


This is probably wrong. The 10 nm problems were due to Intel being too ambitious, not "resting".


Thanks for making me feel old :-)

There was a palpable change when Andy Grove retired. It was like the elves leaving Middle earth.


I don't mean to take away from Intel's underwhelming management.

But regardless, Keller's Athlon 64 or Zen are great competitors.

Likewise, CUDA is Nvidia's massive achievement. The growth strategy of that product (involving lots of free engineer hours given to clients on-site) deserves credit.


// I don't mean to take away from Intel's underwhelming management

chuckle, let's give full credit where credit is due :-)

Athlon was an epochal chip. Here's the thing though---if you are a market leader, one who was as dominant as Intel was, it doesn't matter what the competition does, you have the power to keep dominating them by doing something even more epochal.

That's why it can be so frustrating working for a #2 or #3 company....you are still expected to deliver epochal results like clockwork. But even if you do, your success is completely out of your hands. Bringing out epochal products doesn't get you ahead, it just lets you stay in the game. Kind of like the Red Queen in Through the Looking-Glass. You have to run as fast as you can just to stay still.

All you can do is try to stay in the game long enough until the #1 company makes a mistake. If #1 is dominant enough, they can make all kinds of mistakes and still stay on top, just by sheer market inertia. Intel was so dominant that it took DECADES of back-to-back mistakes to lose its dominant position.

Intel flubbed the 32-64 bit transition. On the low end, it flubbed the desktop to mobile transition. On the high end, it flubbed the CPU-GPU transition.

Intel could have kept its dominant position if it had only flubbed one of them. But from 2002 to 2022, Intel flubbed every single transition in the market.

It's a measure of just how awesome Intel used to be that it took 20 years....but there's only so many of those that you can do back-to-back and still stay #1.


I began supporting AMD as my choice for a gaming CPU when they began trouncing Intel in terms of performance vs. total draw power with an attractive price point around 2016 or so.

Then, a wave of speculative execution vulnerabilities were discovered / disclosed, resulting in an even larger differential for performance and power use after the SPECTRE patches were applied.

Considering this, I'm not sure that it's fair to cast the successes of AMD as mere failures from Intel. Dr. Su is simply a brilliant engineer and business leader.


Well geez, Intel might have amazingly bad management but where on the scale do we put AMD not giving any fucks about challenging the CUDA monopoly for going on 10 years now?

Instead they put mediocre people on things like OpenCL which even university students forced to use the mess could tell was going nowhere.


AMD basically did not have any money from 2010 until 2020. They were teetering near death. No money for R&D, no money for new features in their GPUs, no money no money no money. Their Excavator architecture was extremely foolish, trying to support multiple instruction dispatches with only 1 ALU and FPU per core! This was corrected with Ryzen, and in the last 4 years they have been able to pay off their massive debts, especially with the 5000 series!

Going forward I expect we will see much more innovation from them because now they have some spare cash to spend on real innovations hopefully in hardware AND software! Note that Intel is much worse than AMD in software!


I have endless admiration for Lisa Su, but let's be honest, the reason Nvidia is so big today is that AMD has had amazingly bad management.


Was I the only one who was expecting Lisa Su to attempt Leetcode hard in this interview?


It is an interview with a CEO - expect platitudes and fortune cookie wisdom that contradict their own (corporate) behaviours.


As an IC this is exactly what makes 'fireside chats' with executives so unsatisfying.


There's also never an actual fire. Like at least put a video of a fireplace on in the background.


It's called "fireside chat" because of the layoffs. :-/


For the people downvoting this joke: the joke is that "fireside" means "the side of the people who are doing the firing". I think it's funny.


Not really. I wouldn't classify leetcode as "hard problems". Maybe dumb problems that don't really help anyone, but no, not "hard problems"


Sounds like you have beef with leetcode. I think that comment was referencing the problems on leetcode tagged with the "hard" label.


This article inspired me to check out the respective market caps for Intel and AMD. Hell of a turnaround! I remember the raging wars between the Pentiums and the Athlons. Intel won. The GPU wars between Nvidia and ATI; Nvidia won. Thereafter AMD, the supposed loser to Intel, absorbed ATI, the supposed loser to Nvidia. But I love that the story didn't end there. Look at what AMD did with Sony PlayStation (extensively discussed in this interview)...and that's without getting into the contemporary GPU AI revolution that's likely driving AMD's $250+ billion market cap. Epic!


And yet they still can't solve the problem of their GPU driver/software stack for ML being much worse than NVidia's. It seems like the first step is easy: pay more for engineers. AMD pays engineers significantly less than NVidia, and it's presumably quite hard to build a competitive software stack while paying so much less. You get what you pay for.


Everyone does software poorly, hardware companies more so.


Well, this is glaringly obvious to the whole world, and Nvidia managed to get it right. Surely a feat that can be repeated elsewhere when enough will is spread over some time. And it would make them grow massively, something no shareholder ever frowns upon.


> Nvidia managed to get it right

I don't think they did. If you work in the space and watched it develop over the years, you can see that there's been (and still are) plenty of jank and pain points to be found. Their spread mostly comes from their dominant market position before, afaik.


They succeeded not because they are perfect but because they are the least bad. By far.


Also, CUDA was first released in 2007. I remember researching GPUs at the time and wondering what the hell CUDA was (I was a teenager). They were VERY early to the party and have had A LOT of time to improve. Everyone is catching up to ~10 years of a headstart.


Or people are just used to the same kind of bad


Exactly. ROCm is only available for the top tier RX 7900 GPUs and you’re expected to run Linux.

AMD was (is) tinkering with a translation layer for CUDA, much like how WINE translates DirectX. Great idea but it’s been taking a while in this fast paced market.


> AMD was (is) tinkering with a translation layer for CUDA

From what I understand, they dropped the contract with the engineer who was working on it.

Fortunately, as part of the contract, said engineer stipulated that the project would become open source, so now it is, and is still being maintained by that engineer.


> ROCm is only available for the top tier RX 7900 GPUs and you’re expected to run Linux.

Fixed it for you: ROCm is only officially supported for the top tier RX 7900 GPUs and you’re expected to run Linux.

Desktop class cards work if you apply a "HSA version override".


Cool. I was thinking of getting a 7800 XT over a 4070. Hope I can get llama 70B working nearly as well.


Came here to say this. They only just recently got an AMD GPU on MLPerf thanks to a different company, Tinycorp by George Hotz. I guess basic ML performance is too hard a problem.


I dunno, a world where hardware companies, like, sold hardware, and then software companies wrote software and sold that, could be pretty nice. It is cool that Hotz is doing something other than contributing to an anticompetitive company’s moat.


A couple of thoughts here.

* AMD's traditional target market for its GPUs has been HPC as opposed to deep learning/"AI" customers.

For example, look at the supercomputers at the national labs. AMD has won quite a few high profile bids with the national labs in recent years:

- Frontier (deployment begun in 2021) (https://en.wikipedia.org/wiki/Frontier_(supercomputer)) - used at Oak Ridge for modeling nuclear reactors, materials science, biology, etc.

- El Capitan (2023) (https://en.wikipedia.org/wiki/El_Capitan_(supercomputer)) - Livermore national lab

AMD GPUs are pretty well represented on the TOP500 list (https://top500.org/lists/top500/list/2024/06/), which tends to feature computers used by major national-level labs for scientific research. AMD CPUs are even more represented.

* HPC tends to focus exclusively on FP64 computation, since rounding errors in that kind of use-case are a much bigger deal than in DL (see for example https://hal.science/hal-02486753/document). NVIDIA innovations like TensorFloat, mixed precision, custom silicon (e.g., the "transformer engine") are of limited interest to HPC customers. It's no surprise that AMD didn't pursue similar R&D, given who they were selling GPUs to.

* People tend to forget that less than a decade ago, AMD as a company had a few quarters of cash left before the company would've been bankrupt. When Lisa Su took over as CEO in 2014, AMD market share for all CPUs was 23.4% (even lower in the more lucrative datacenter market). This would bottom out at 17.8% in 2016 (https://www.trefis.com/data/companies/AMD,.INTC/no-login-req...).

AMD's "Zen moment" didn't arrive until March 2017. And it wasn't until Zen 2 (July 2019), that major datacenter customers began to adopt AMD CPUs again.

* In interviews with key AMD figures like Mark Papermaster and Forrest Norrod, they've mentioned how in the years leading up to the Zen release, all other R&D was slashed to the bone. You can see (https://www.statista.com/statistics/267873/amds-expenditure-...) that AMD R&D spending didn't surpass its previous peak (on a nominal dollar, not even inflation-adjusted, basis) until 2020.

There was barely enough money to fund the CPUs that would stop the company from going bankrupt, much less fund GPU hardware and software development.

* By the time AMD could afford to spend on GPU development, CUDA was the entrenched leader. CUDA was first released in 2007(!), ROCm not until 2016. AMD is playing from behind, and had to make various concessions. The ROCm API is designed around CUDA API verbs/nouns. AMD funded ZLUDA, intended to be a "translation layer" so that CUDA programs can run as a drop-in on ROCm.

* There's a chicken-and-egg problem here.

1) There's only one major cloud (Azure) that has ready access to AMD's datacenter-grade GPUs (the Instinct series).

2) I suspect a substantial portion of their datacenter revenue still comes from traditional HPC customers, who have no need for the ROCm stack.

3) The lack of a ROCm developer ecosystem means that development and bug fixes come much slower than they would for CUDA. For example, the mainline TensorFlow release was broken on ROCm for a while (you had to install the nightly release).

4) But, things are improving (slowly). ROCm 6 works substantially better than ROCm 5 did for me. PyTorch and TensorFlow benchmark suites will run.

Trust me, I share the frustration around the semi-broken state that ROCm is in for deep learning applications. As an owner of various NVIDIA GPUs (from consumer laptop/desktop cards to datacenter accelerators), in 90% of cases things just work on CUDA.

On ROCm, as of today it definitely doesn't "just work". I put together a guide for Framework laptop owners to get ROCm working on the AMD GPU that ships as an optional add-in (https://community.frame.work/t/installing-rocm-hiplib-on-ubu...). This took a lot of head banging, and the parsing of obscure blogs and Github issues.

TL;DR, if you consider where AMD GPUs were just a few years ago, things are much better now. But, it still takes too much effort for the average developer to get started on ROCm today.
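To illustrate the point above about why HPC cares about FP64, here is a minimal Python sketch using only the standard library. It emulates single precision by rounding through `struct` after every addition. The helper names are hypothetical, and this is a toy accumulation, not how any real HPC code or GPU kernel works.

```python
import struct

def to_f32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]

def accumulate(step: float, n: int, single: bool) -> float:
    """Add `step` to a running total n times, rounding to FP32 each time if asked."""
    total = 0.0
    step = to_f32(step) if single else step
    for _ in range(n):
        total = total + step
        if single:
            total = to_f32(total)
    return total

n = 100_000
# The mathematically exact answer for n additions of 0.1 is 10,000.
err64 = abs(accumulate(0.1, n, single=False) - 10_000.0)
err32 = abs(accumulate(0.1, n, single=True) - 10_000.0)
print(f"FP64 error: {err64:.3e}, FP32 error: {err32:.3e}")
```

Real codes reduce this with compensated summation or mixed-precision accumulation, but the gap between the two errors gives a feel for why long-running simulations default to FP64.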


Summary: AMD works if you spend 500m USD+ with them. Then they'll throw an army of their own software engineers into the contract who will hold your hand every step of the way, and remove all the jank for you. By contrast, since at least 10 years ago, I could buy any GTX card and CUDA worked out of the box, and that applied right down to a $99 Jetson Nano.

AMD's strategy looks a lot like IBM's mainframe strategy of the 80s. And that didn't go well.


No, not really?

The customers at the national labs are not going to be sharing custom HPC code with AMD engineers, if for no other reason than security clearances. Nuclear stockpile modeling code, or materials science simulations are not being shared with some SWE at AMD. AMD is not “removing jank”, for these customers. It’s that these customers don’t need a modern DL stack.

Let’s not pretend like CUDA works/has always worked out of the box. There’s forced obsolescence (“CUDA compute capability”). CUDA didn’t even have backwards compatibility for minor releases (.1,.2, etc.) until version 11.0. The distinction between CUDA, CUDA toolkit, cuDNN, and the actual driver is still inscrutable to many new devs (see the common questions asked on r/localLlama and r/StableDiffusion).

Directionally, AMD is trending away from your mainframe analogy.

The first consumer cards got official ROCm support in 5.0. And you have been able to run real DL workloads on budget laptop cards since 5.4 (I’ve done so personally). Developer support is improving (arguably too slowly), but it’s improving. Hugging Face, Cohere, MLIR, Lamini, PyTorch, TensorFlow, DataBricks, etc all now have first party support for ROCm.


> customers at the national labs are not going to be sharing custom HPC code with AMD engineers

There are several co-design projects in which AMD engineers are interacting on a weekly basis with developers of these lab-developed codes as well as those developing successors to the current production codes. I was part of one of those projects for 6 years, and it was very fruitful.

> I suspect a substantial portion of their datacenter revenue still comes from traditional HPC customers, who have no need for the ROCm stack.

HIP/ROCm is the prevailing interface for programming AMD GPUs, analogous to CUDA for NVIDIA GPUs. Some projects access it through higher level libraries (e.g., Kokkos and Raja are popular at labs). OpenMP target offload is less widespread, and there are some research-grade approaches, but the vast majority of DOE software for Frontier and El Capitan relies on the ROCm stack. Yes, we have groaned at some choices, but it has been improving, and I would say the experience on MI-250X machines (Frontier, Crusher, Tioga) is now similar to large A100 machines (Perlmutter, Polaris). Intel (Aurora) remains a rougher experience.


> The customers at the national labs are not going to be sharing custom HPC code with AMD engineers, if for no other reason than security clearances. Nuclear stockpile modeling code, or materials science simulations are not being shared with some SWE at AMD. AMD is not “removing jank”, for these customers.

I work closely with OLCF and Frontier (I have a job running on Frontier right now). This is incorrect. The overwhelming majority of compute and resource allocation are not "nuclear stockpile modeling code" projects or anything close to it. AMD often gets directly involved with various issues (OLCF staff has plenty of stories about this). I know because I've spoken with them and AMD.

Speaking of Frontier, you get fun things like compiling an AWS project just to get RCCL to kind of work decently with Slingshot interconnect via libfabric[0] vs NCCL that "just works", largely due to Nvidia's foresight with their acquisition of Mellanox over five years ago.

> Let’s not pretend like CUDA works/has always worked out of the box.

It is and has been miles beyond the competition and that's clearly all you need. Nvidia has > 90% market share and is worth ~10x AMD. 17 years of focus and investment (30% of their R&D spend is software) when your competitors are wandering all over the place in fits and starts will do that. I'm also of the personal opinion that AMD just doesn't have software in its DNA and doesn't seem to understand that people don't want GPUs; they want solutions that happen to work best on GPUs, and that entails broad and significant investment in the accompanying software stacks.

AMD has truly excellent hardware that is significantly limited by their lack of investment in software.

> There’s forced obsolescence (“CUDA compute capability”).

Compute capability is why code targeting a given lineage of hardware just works. You can target 8.0 (for example) and as long as your hardware is 8.0 it will run on anything with Nvidia stamped on it from laptop to Jetson to datacenter and the higher-level software doesn't know the difference (less VRAM, which is what it is). Throw in "+PTX" when building and it will run on anything newer too (albeit not taking full advantage of new hardware). With official support, without setting various environment variables and compiler hacks to end up with code that often randomly crashes (I know from personal experience). It is extremely common for projects to target SM 7.x, 8.x and 9.x. The stack just figures it out from there.

It's the PTX intermediate representation, available with CUDA and the driver, that makes this possible, whereas in AMD land you have some pretty drastic differences within the CDNA or RDNA families, not to mention CDNA vs RDNA in the first place.
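
To make the rule concrete, here's a minimal sketch of the compatibility logic as I understand it (plain Python, no CUDA needed). It assumes the usual rules: a native binary built for sm_XY runs on devices with the same major version and an equal or higher minor version, while embedded PTX for compute_XY can be JIT-compiled by the driver for any device with an equal or higher compute capability. The function names are made up for illustration and are not part of any NVIDIA API.

```python
from typing import Iterable, Tuple

CC = Tuple[int, int]  # compute capability as (major, minor), e.g. (8, 0)


def sass_runs_on(built_for: CC, device: CC) -> bool:
    """Native binary: same major version, device minor >= build minor."""
    return built_for[0] == device[0] and device[1] >= built_for[1]


def ptx_runs_on(built_for: CC, device: CC) -> bool:
    """PTX is forward compatible: the driver can JIT it for any newer device."""
    return device >= built_for


def binary_runs_on(sass_targets: Iterable[CC], ptx_targets: Iterable[CC], device: CC) -> bool:
    """True if any embedded native binary or PTX target covers the device."""
    return any(sass_runs_on(t, device) for t in sass_targets) or any(
        ptx_runs_on(t, device) for t in ptx_targets
    )


# Hypothetical build: native code for SM 7.0 and 8.0, plus PTX for 8.0 (the "+PTX" case).
sass = [(7, 0), (8, 0)]
ptx = [(8, 0)]

print(binary_runs_on(sass, ptx, (8, 6)))  # consumer Ampere: covered by the 8.0 native binary
print(binary_runs_on(sass, ptx, (9, 0)))  # Hopper: covered only via PTX JIT
print(binary_runs_on(sass, [], (9, 0)))   # same build without PTX: not covered
print(binary_runs_on(sass, ptx, (6, 1)))  # Pascal: older than every target, not covered
```

The third case is the one that bites people: leave out the PTX and the build stops working on the next hardware generation.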

IMO it's an elegant solution that works and makes it simple, even more so than CPUs (AVX, etc). How would you suggest they divide something like eight year old Pascal vs Blackwell? In terms of obsolescence, Pascal is a great example - it's supported by up to and including latest drivers, CUDA 12, and everything in their frameworks support matrix[1] of which AMD doesn't have an equivalent. Like we saw with CUDA 11, CUDA 12 will be supported by major projects for years, resulting in at least a decade of support for Pascal. Please show me an AMD GPU with even eight years of support. Back to focus, ROCm isn't even that old and AMD is infamous for removing support for GPUs, often within five years if not less.

> CUDA didn’t even have backwards compatibility for minor releases (.1,.2, etc.) until version 11.0.

Yes but they have it and CUDA 11 is four years old. They also do nice things like when they added Hopper support in 11.7 so on the day of release it "just worked" with whatever you were already running (PTX again). Same for their consumer GPUs, it "just works" the day of release. AMD took over a year to officially support their current flagship desktop GPU (7900 XTX) and even that is dicey in practice due to CDNA vs RDNA. Even when they did they were doing bizarre things like supporting Python 3.10 with ROCm 5.7 docker containers and Python 3.9 in ROCm 6 docker containers for the first few months.

Python 3.10 is pretty much the de-facto standard for these stacks, cue my surprise when I was excited for ROCm 6 only to find out Python code with popular projects was blowing up all over the place because 3.9. It just screams "we don't get this".

> The distinction between CUDA, CUDA toolkit, CUDNN, and the actual driver is still inscrutable to many new devs (see the common questions asked on r/localLlama and r/StableDiffusion).

Yes, and AMD has direct equivalents that are even less clear. The reddit communities you mention are not the best examples (I would not call those users "devs"). Even so, look at any post of someone coming along asking what hardware to buy. The responses are overwhelmingly "AMD is a world of pain, if you want it to just work buy Nvidia". IMO the only "AMD is fine, don't believe the FUD" responses are an effect of the cult-like "team red vs team green" bleeding over from hobbyist/gamer subs on Reddit because it's just not accurate. I don't know a single dev or professional in the space (whose livelihood depends on it) who agrees.

They will also often point out that, because Nvidia's software is significantly better, AMD hardware is often bested by previous-generation Nvidia hardware with dramatically inferior paper specs [2]. I like to say that AMD is at the "get it to work" stage while Nvidia and the broader CUDA ecosystem have been at the "squeeze every last penny out of it" stage for many years.

> And you have been able to run real DL workloads on budget laptop cards since 5.4 (I’ve done so personally).

Depends on what you mean by "real DL workloads". Vanilla torch? Yes. Then start looking at flash attention, triton, xformers, and production inference workloads...

> Developer support is improving (arguably too slowly), but it’s improving.

Generally agree but back to focus and discipline it's a shame that it took a massive "AI" goldrush over the past ~18 months for them to finally take it vaguely seriously. Now you throw in the fact that Nvidia has absurdly more resources, their 30% R&D spend on software is going to continue to rocket CUDA ahead of ROCm.

For Frontier and elsewhere I really want AMD to succeed, I just don't think it does them (or anyone) any favors by pretending that all is fine in ROCm land.

[0] - https://www.olcf.ornl.gov/wp-content/uploads/OLCF_AI_Trainin...

[1] - https://docs.nvidia.com/deeplearning/frameworks/support-matr...

[2] - https://www.reddit.com/r/LocalLLaMA/comments/191srof/amd_rad...


(Split into two parts due to comment length restrictions)

> I work closely with OLCF and Frontier (I have a job running on Frontier right now). This is incorrect. The overwhelming majority of compute and resource allocation are not "nuclear stockpile modeling code" projects or anything close to it. AMD often gets directly involved with various issues (OLCF staff has plenty of stories about this). I know because I've spoken with them and AMD.

I don't have any experience running a job on one of these national supercomputers, so I'll defer to you on this. (Atomic Canyon looks very cool!)

Just two follow-ups then: is it the case that any job, small or large, enjoys this kind of AMD optimization/debugging support? Does your typical time-grant/node-hour academic awardee get that kind of hands-on support?

And, for nuclear modeling (be it weapons or civilian nuclear), do you know if AMD engineers can get involved? (https://insidehpc.com/2023/02/frontier-pushes-boundaries-86-... this article claims "86% of nodes" were used on at least one modeling run, which I imagine is among the larger jobs)

> It is and has been miles beyond the competition and that's clearly all you need. Nvidia has > 90% market share and is worth ~10x AMD. 17 years of focus and investment (30% of their R&D spend is software) when your competitors are wandering all over the place in fits and starts will do that.

No dispute here that NVIDIA is the market leader today, deservedly so. NVIDIA to its credit has invested in CUDA for many years, even when it wasn't clear there was an immediate ROI.

But, I bristle at the narrative fallacy that it was some divine inspiration and/or careful planning (“focus”) that made CUDA the perfect backbone for deep learning.

In 2018, NVIDIA was chasing crypto mining, and felt the need to underplay (i.e., lie) to investors about how large that segment was (https://wccftech.com/nvidia-sued-cryptocurrency-mining-reven...). As late as 2022, NVIDIA was diverting wafer supply from consumer, professional, and datacenter GPUs to produce crippled "LHR" mining cards.

Jensen has at various points pumped (during GTC and other high profile events):

- Ray tracing (2018) (https://www.youtube.com/watch?v=95nphvtVf34)

- More ray tracing (2019) (https://youtu.be/Z2XlNfCtxwI)

- "Omniverse" (2020) (https://youtu.be/o_XeGyg2NIo?list=PLZHnYvH1qtOYOfzAj7JZFwqta...)

- Blockchain, NFTs, and the metaverse (2021) (https://cointelegraph.com/news/nvidia-ceo-we-re-on-the-cusp-...) (https://blockonomi.com/nvidiz-ceo-talks-crypto-nfts-metavers...)

- ETH (2021) (https://markets.businessinsider.com/currencies/news/nvidia-c...)

- "Omniverse"/digital twins (2022) (https://www.youtube.com/watch?v=PWcNlRI00jo)

- Autonomous vehicles (2022) (https://www.youtube.com/watch?v=PWcNlRI00jo)

Most of these predictions about use cases have not panned out at all. The last GTC keynote prior to the "ChatGPT moment" took place just 2 months before the general availability of ChatGPT. And, if you click through to the video, you'll see that LLMs got under 7 minutes of time at the very end of a 90 minute keynote. Clearly, Jensen + NVIDIA leadership had no idea that LLMs would get the kind of mainstream adoption/hype that they have.

On the business side, it hasn't exactly always been a smooth ride for NVIDIA either. In Q2 2022 (again right before the "ChatGPT moment"), the company missed earnings estimates by 18%(!) due to inventory writedowns (https://www.pcworld.com/article/828754/nvidia-preannounces-l...).

The end markets that Jensen forecasts/predicts on quarterly earnings calls (I’ve listened to nearly every one for the last decade) are comically disconnected from what ends up happening.

It's a running joke among buy-side firms that there'll always be an opportunity to buy the NVDA dip, given the volatility of the company's performance + stock.

NVIDIA's "to the moon" run as a company is due in large part to factors outside of its design or control. Of course, how large is up for debate.

If/when it turns out that most generative products can't turn a profit, and NVIDIA revenues decline as a result, it wouldn't be fair to place the blame for the collapse of those end markets at NVIDIA’s feet. Similarly, the fact that LLMs and generative AI turned out to be hit use cases has little to do with NVIDIA's decisions.

AMD is a company that was on death’s door until just a few years ago (2017). It made one of the most incredible corporate comebacks in the history of capitalism on the back of its CPUs, and is now dipping its toes into GPUs again.

NVIDIA had a near-monopoly on non-console gaming. It parlayed that into a dominant software stack.

It’s possible to admire both, without papering over the less appealing aspects of each’s history.

> Depends on what you mean by "real DL workloads". Vanilla torch? Yes. Then start looking at flash attention, triton, xformers, and production inference workloads...

As I mentioned above, this is a chicken-and-egg phenomenon with the developer ecosystem. I don't think we really disagree.

CUDA is an "easy enough" GPGPU backbone that, due to incumbency and the lack of real competition from AMD and Intel for a decade, led to the flourishing of a developer ecosystem.

Tri Dao (sensibly) decided to write his original Flash Attention paper with an NVIDIA focus, for all the reasons you and I have mentioned. Install base size, ease of use of ROCm vs CUDA, availability of hardware on-prem & in the cloud, etc.

Let's not forget that Xformers is a Meta project, and that non-A100 workloads (i.e., GPUs without 8.0 compute capability) were not officially supported by Meta for the first year of Xformers (https://github.com/huggingface/diffusers/issues/2234) (https://github.com/facebookresearch/xformers/issues/517#issu...). This is the developer ecosystem at work.

AMD right now is forced to put in the lion's share of the work to get a sliver of software parity. It took years to get mainline PyTorch and TensorFlow support for ROCm. The lack of a ROCm developer community (hello chicken and egg) means that AMD ends up being responsible for first-party implementations of most of the hot new ideas coming from research.

Flash Attention for ROCm does exist (https://github.com/ROCm/flash-attention) (https://llm-tracker.info/howto/AMD-GPUs#flash-attention-2), albeit only on a subset of cards.

Triton added (initial) support for ROCm relatively recently (https://github.com/triton-lang/triton/pull/1983).

Production-scale LLM inference is now entirely possible with ROCm, via first-party support for vLLM (https://rocm.blogs.amd.com/artificial-intelligence/vllm/READ...) (https://community.amd.com/t5/instinct-accelerators/competiti...).

> Compute capability is why code targeting a given lineage of hardware just works. You can target 8.0 (for example) and as long as your hardware is 8.0 it will run on anything with Nvidia stamped on it from laptop to Jetson to datacenter and the higher-level software doesn't know the difference (less VRAM, which is what it is).

This in theory is the case. But, even as an owner of multiple generations of NVIDIA hardware, I find myself occasionally tripped up.

Case in point:

RAPIDS (https://rapids.ai/) is one of the great non-deep learning success stories to come out of CUDA, a child of the “accelerated computing” push that predates the company’s LLM efforts. The GIS and spatial libraries are incredible.

Yet, I was puzzled when earlier this year I updated cuSpatial to the newest available version (24.02) (https://github.com/rapidsai/cuspatial/releases/tag/v24.02.00) via my package manager (Mamba/Conda), and started seeing pretty vanilla functions start breaking on my Pascal card. Logs indicated I needed a Volta card (7.0 CC or newer). They must've reimplemented certain functions altogether.

There’s nothing in the release notes that indicates this bump in minimum CC. The consumer-facing page for RAPIDS (https://rapids.ai/) has a mention under requirements.

So I’m led to wonder, did the RAPIDS devs themselves not realize that certain dependencies experienced a bump in CC?
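
For what it's worth, a guard along these lines would have made the failure obvious up front. This is only a sketch: `MIN_CC` and the function names are mine, not anything from RAPIDS, and it assumes a driver recent enough that `nvidia-smi --query-gpu=compute_cap` is available.

```python
import subprocess
from typing import List, Tuple

MIN_CC = (7, 0)  # Volta, per the log message I saw; hypothetical constant, not a RAPIDS name


def parse_compute_caps(text: str) -> List[Tuple[int, int]]:
    """Parse lines like '6.1' (one per GPU) into (major, minor) tuples."""
    caps = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        major, minor = line.split(".")
        caps.append((int(major), int(minor)))
    return caps


def query_compute_caps() -> List[Tuple[int, int]]:
    """Ask nvidia-smi for each GPU's compute capability (assumes a recent driver)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_compute_caps(out)


def unsupported_gpus(caps, minimum=MIN_CC):
    """Return the compute capabilities that fall below the minimum."""
    return [cc for cc in caps if cc < minimum]


if __name__ == "__main__":
    try:
        caps = query_compute_caps()
    except (FileNotFoundError, subprocess.CalledProcessError, ValueError):
        caps = []  # no NVIDIA tooling here, or output in an unexpected format
    for major, minor in unsupported_gpus(caps):
        print(f"GPU with compute capability {major}.{minor} is below {MIN_CC[0]}.{MIN_CC[1]}")
```

On my Pascal card this would have printed a warning at import time instead of leaving me to dig through logs.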


(Part 2 of 2)

> Please show me an AMD GPU with even eight years of support. Back to focus, ROCm isn't even that old and AMD is infamous for removing support for GPUs, often within five years if not less.

As you yourself noted, CDNA vs RDNA makes things more complicated in AMD land. I also think it’s unfair to ask about “eight years of support” when the first RDNA card didn’t launch until 2019, and the first CDNA “accelerator” in 2020.

The Vega and earlier generation is so fundamentally different that it would’ve been an even bigger lift for the already small ROCm team to maintain compatibility.

If we start seeing ROCm removing support for RDNA1 and CDNA1 cards soon, then I’ll share your outrage. But I think ROCm 6 removing support for Radeon VII was entirely understandable.

> Generally agree but back to focus and discipline it's a shame that it took a massive "AI" goldrush over the past ~18 months for them to finally take it vaguely seriously. Now you throw in the fact that Nvidia has absurdly more resources, their 30% R&D spend on software is going to continue to rocket CUDA ahead of ROCm.

> For Frontier and elsewhere I really want AMD to succeed, I just don't think it does them (or anyone) any favors by pretending that all is fine in ROCm land.

The fact is that the bulk of AMD profits is still coming from CPUs, as it always has. AMD wafer allotment at TSMC has to first go towards making its hyperscaler CPU customers happy. If you promise AWS/Azure/GCP hundreds of thousands of EPYC CPUs, you better deliver.

I question how useful it is to dogpile (not you personally, but generally) on AMD, when the investments in people and dollars are trending in the right direction. PyTorch and TensorFlow were broken on ROCm until relatively recently. Now that they work, you (not unreasonably) ask where the other stuff is.

The reality is that NVIDIA will likely forever be the leader with CUDA. I doubt we’ll ever see PhD students and university labs making ROCm their first choice when having to decide where to conduct career-making/breaking research.

But, I don’t think it’s really debatable that AMD is closing the relative gap, given the ROCm ecosystem didn’t exist at all until relatively recently. I’m guessing the very credible list of software partners now at least trying ROCm (https://www.amd.com/en/corporate/events/advancing-ai.html#ec...) are not committing time + resources to an ecosystem that they see as hopeless.

---

Final thoughts:

A) It was completely rational for AMD to focus on devoting the vast majority of R&D spend to its CPUs (particularly server/EPYC), particularly after the success of Zen. From the day that Lisa Su took over (Oct 8, 2014), the stock is up 50x+ (even more earlier in 2024), not that share price is reflective of value in the short term. AMD revenue for calendar year 2014 was $5.5B, operating income negative 155 million. Revenue for 2023 was $22.68B, operating income $401 million. Operating income was substantially higher in 2022 ($1.2B) and 2021 ($3.6B), but AMD has poured that money into R&D spending (https://www.statista.com/statistics/267873/amds-expenditure-...), as well as the Xilinx acquisition.
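
A quick sanity check on the revenue figures in (A), as a sketch. The nine-year span (calendar 2014 to 2023) and the constant-growth framing are my own simplifications.

```python
rev_2014 = 5.5    # $B, calendar-year 2014 revenue quoted above
rev_2023 = 22.68  # $B, 2023 revenue quoted above
years = 2023 - 2014

multiple = rev_2023 / rev_2014
cagr = multiple ** (1 / years) - 1  # compound annual growth rate

print(f"revenue multiple: {multiple:.2f}x")
print(f"implied CAGR over {years} years: {cagr:.1%}")
```

That works out to a bit over 4x in revenue, or roughly 17% a year compounded.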

B) It was completely rational for NVIDIA to build out CUDA, as a way to make it possible to do what they initially called "scientific computing" and eventually "GPU-accelerated computing". There's also the reality that Jensen, the consummate hype man, had to sell investors a growth story. The reality is that gaming will always be a relatively niche market. Cloud gaming (GeForce Now) never matched up to revenue expectations.

C) It’s difficult for me to identify any obvious “points of divergence” that in an alternate history would’ve led to better outcomes with AMD. Without the benefit of “future knowledge”, at what point should AMD have ramped up ROCm investment? Given, as I noted above, in the months before ChatGPT went viral, Jensen’s GTC keynote gave only a tiny mention to LLMs.

D) If anything, the company that missed out was Intel. Beyond floundering on the transition from 14nm to 10nm (allowing TSMC and thus AMD to surpass them), Intel wasted its CPU-monopoly years and the associated profits. Projects like Larrabee (https://www.anandtech.com/show/3738/intel-kills-larrabee-gpu...) and Xe (doomed in part by internal turf wars) (https://www.tomshardware.com/news/intel-axes-xe-hp-gpus-for-...) were killed off. R&D spending was actually comparable to the amount spent on share buybacks in 2011 (14.1B in buybacks vs 8.3B in R&D spending), 2014 (10.7B vs 11.1B), 2018 (10.8B vs 13.B), 2019 (13.5B vs 13.3B) and 2020 (14.1B vs 13.55B). (See https://www.intc.com/stock-info/dividends-and-buybacks and https://www.macrotrends.net/stocks/charts/INTC/intel/researc...).


lol AMD flogged its floundering foundry waaay before Intel ran into any problems.

in fact most of your points about AMD's lack of dough can be traced back to that disaster. The company wasn't hit by some meteorite. It screwed up all by itself.

Then lucky it had that duopolistic X86 licence to lean on or it would have gone the way of Zilog or Motorola. 'Cos it sure can't rely on its janky compute offering.


Assuming you're not just here to troll (doubtful given your comment history, but hey I'm feeling generous):

> lol AMD flogged its floundering foundry waaay before Intel ran into any problems.

Not wanting/being able to spend to compete on the leading edge nodes is an interesting definition of "floundering". Today there is exactly 1 foundry in the world that's on that leading edge, TSMC. We'll see how Intel Foundry works out, but they're years behind their revenue/ramp targets at this point.

It's fairly well known that Brian Krzanich proposed spinning out Intel's foundry operations, but the board said no.

The irony is that trailing edge fabs are wildly profitable, since the capex is fully amortized. GloFo made $1 billion in net income in FY2023.

> in fact most of your points about AMD's lack of dough can be traced back to that disaster. The company wasn't hit by some meteorite. It screwed up all by itself

Bulldozer through Excavator were terrible architectures. What does this have to do with what's now known as Global Foundries?

GloFo got spun out with Emirati money in March 2009. Bulldozer launched in Q4 2011. What's the connection?

AMD continued to lose market share (and was unprofitable) for years after the foundry was spun out. Bad architectural choices, and bad management, sure. Overpaying for ATI, yep. "Traced back" to GloFo? How?

> Then lucky it had that duopolistic X86 licence to lean on or it would have gone the way of Zilog or Motorola. 'Cos it sure can't rely on its janky compute offering.

"Janky" when? "Rely" implies present tense. You're saying AMD compute offerings are janky today?


Small correction: CUDA was first released in 2007 and of course Nvidia was also aiming at HPC before the AlexNet moment.


Good summary. There were also the multivendor HSA and OpenCL efforts of the 2010s, which lost other vendors along the way, while many customers turned out to accept the proprietary CUDA.


And yet people seem to work just fine with ML on AMD GPUs when they aren’t thinking about Jensen.


I have a 7900 XTX. There's a known firmware crash issue with ComfyUI. It was reported about a year ago. Every ROCm patch release I check the notes, and every release it goes unfixed. That's not to go into the intense jank that is the ROCm Debian repo. If we need DL at work, I'll recommend Nvidia, no question.


Which AMD GPUs? Most consumer AMD GPUs don't even support ROCm.


Debian, Arch and Gentoo have ROCm built for consumer GPUs. Thus so do their derivatives. Anything gfx9 or later is likely to be fine and gfx8 has a decent chance of working. The https://github.com/ROCm/ROCm source has build scripts these days.
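
If you want that rule of thumb in code form, here is a minimal sketch. It assumes LLVM-style target names, where the last two characters are the minor version and stepping (the stepping can be a hex letter, as in gfx90a) and whatever precedes them is the major generation. The function names are made up, and this is my rough guide, not an official support matrix.

```python
import re


def gfx_major(target: str) -> int:
    """Extract the major generation from a target name like 'gfx906' or 'gfx1030'."""
    m = re.fullmatch(r"gfx(\d+)([0-9a-f])([0-9a-f])", target.lower())
    if not m:
        raise ValueError(f"not a gfx target name: {target!r}")
    return int(m.group(1))


def rule_of_thumb(target: str) -> str:
    """Apply the 'gfx9 or later is likely fine, gfx8 has a decent chance' heuristic."""
    major = gfx_major(target)
    if major >= 9:
        return "likely fine"
    if major == 8:
        return "decent chance"
    return "not covered by this rule of thumb"


for name in ("gfx803", "gfx906", "gfx90a", "gfx1030", "gfx1100"):
    print(name, gfx_major(name), rule_of_thumb(name))
```

For reference, the Radeon VII is gfx906, the 6900 XT is gfx1030, and the 7900 XTX is gfx1100.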

At least some of the internal developers largely work on consumer hardware. It's not as solid as the enterprise gear but it's also very cheap so overall that seems reasonable to me. I'm using a pair of 6900XT, with a pair of VII's in a backup machine.

For turn key proprietary stuff where you really like the happy path foreseen by your vendor, in classic mainframe style, team green is who you want.


> For turn key proprietary stuff where you really like the happy path foreseen by your vendor

there really was no way for AMD to foresee that people might want to run GPGPU workloads on their polaris cards? isn't that a little counterfactual to the whole OpenCL and HSA Framework push predating that?

Example: it's not that things like Bolt didn't exist to try and compete with Thrust... it's that the NVIDIA one has had three updates in the last month and Bolt was last updated 10 years ago.

You're literally reframing "having working runtime and framework support for your hardware" as being some proprietary turnkey luxury for users, as well as an unforeseeable eventuality for AMD. It wasn't a development priority, but users do like to actually build code that works etc.

That's why you got kicked to the curb by Blender - your OpenCL wasn't stable even after years of work from them and you. That's why you got kicked to the curb by Octane - your Vulkan Compute support wasn't stable enough to even compile their code successfully. That's the story that's related by richg42 about your OpenGL driver implementation too - that it's just paper features and resume-driven development by developers 10 years departed all the way down.

The issues discussed by geohotz aren't new, and they aren't limited to ROCm or deep learning in general. This is, broadly speaking, the same level of quality that AMD has applied to all its software for decades. And the social-media "red team" loyalism strategy doesn't really work here, you can't push this into "AMD drivers have been good for like 10 years now!!!" fervor when the understanding of the problems is that broad and that collectively shared and understood. Every GPGPU developer who's tried has bounced off this AMD experience for literally an entire generation running now. The shared collective experience is that AMD is not serious in the field, and it's difficult to believe it's a good-faith change and interest in advancing the field rather than just a cashgrab.

It's also completely foreseeable that users want broad, official support for all their architectures, and not one or two specifics etc. Like these aren't mysteries that AMD just accidentally forgot about, etc. They're basic asks that you are framing as "turnkey proprietary stuff", like a working opencl runtime or a working spir-v compiler.

What was it linus said about the experience of working with NVIDIA? That's been the experience of the GPGPU community working with AMD, for decades. Shit is broken and doesn't work, and there's no interest in making it otherwise. And the only thing that changed it is a cashgrab, and a working compiler/runtime is still "turnkey proprietary stuff" they have to be arm-twisted into doing by Literally Being Put On Blast By Geohotz Until It's Fixed. "Fuck you, AMD" is a sentiment that there are very valid reasons to feel given the amount of needless suffering you have generated - but we just don't do that to red team, do we?

But you guys have been more intransigent about just supporting GPGPU, no matter what framework, please just pick one, get serious and start working already than NVIDIA ever was about wayland etc. You've blown decades just refusing to ever shit or get off the pot (without even giving enough documentation for the community to just do it themselves). And that's not an exaggeration - I bounced off the AMD stack in 2012, and it wasn't a new problem then either. It's too late for "we didn't know people wanted a working runtime or to develop on gaming cards" to work as an excuse, after decades of overt willing neglect it's just patronizing.

Again, sorry, this is ranty, it's not that I'm upset at you personally etc, but like, my advice as a corporate posture here is don't go looking for a ticker-tape parade for finally delivering a working runtime that you've literally been advertising support for for more than a decade like it's some favor to the community. These aren't "proprietary turnkey features" they're literally the basics of the specs you're advertising compliance with, and it's not even just one it's like 4+ different APIs that have this problem with you guys that has been widely known, discussed in tech blogs etc for more than a decade (richg42). I've been saying it for a long time, so has everyone else who's ever interacted with AMD hardware in the GPGPU space. Nobody there cared until it was a cashgrab, actually half the time you get the AMD fan there to tell you the drivers have been good for a decade now (AMD cannot fail, only be failed). It's frustrating. You've poisoned the well with generations of developers, with decades of corporate obstinance that would make NVIDIA blush, please at least have a little contrition about the whole experience and the feelings on the other side here.


You're saying interesting things here. It's not my perspective but I can see how you'd arrive at it. Worth noting that I'm an engineer writing from personal experience, the corporate posture might be quite divergent from this.

I think Cuda's GPU offloading model is very boring. An x64 thread occasionally pushes a large blob of work into a stream and sometime later finds out if it worked. That does however work robustly, provided you don't do anything strange from within the kernel. In particular allocating memory on the host from within the kernel deadlocks the kernel unless you do awkward things with shuffling streams. More ambitious things like spawning a kernel from a kernel just aren't available - there's only a hobbled nested lifetime thing available. The volta threading model is not boring but it is terrible, see https://stackoverflow.com/questions/64775620/cuda-sync-funct...
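
To illustrate what I mean by "pushes work into a stream and sometime later finds out if it worked", here's a toy model in plain Python. It is an analogy only: a single worker thread stands in for the GPU, and `ToyStream`, `launch` and `synchronize` are made-up names, not the CUDA API.

```python
import queue
import threading


class ToyStream:
    """In-order work queue drained by one worker thread (a stand-in for a GPU stream)."""

    def __init__(self):
        self._work = queue.Queue()
        self._error = None
        self._worker = threading.Thread(target=self._drain, daemon=True)
        self._worker.start()

    def _drain(self):
        while True:
            fn, args = self._work.get()
            try:
                if self._error is None:  # in this toy, work queued after a failure is skipped
                    fn(*args)
            except Exception as exc:
                self._error = exc
            finally:
                self._work.task_done()

    def launch(self, fn, *args):
        """Enqueue work and return immediately; no result or error is visible yet."""
        self._work.put((fn, args))

    def synchronize(self):
        """Block until the queue is drained, then surface the first error, if any."""
        self._work.join()
        if self._error is not None:
            err, self._error = self._error, None
            raise err


results = []
stream = ToyStream()
stream.launch(results.append, 1)
stream.launch(results.append, 2)
stream.synchronize()
print(results)  # work ran in submission order


def bad_kernel():
    raise RuntimeError("kernel failed")


stream.launch(bad_kernel)  # returns without complaint
try:
    stream.synchronize()   # the failure only shows up here
except RuntimeError as exc:
    print("found out later:", exc)
```

The host side never talks to the work while it is running, which is exactly why the model is robust and also why it is boring.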

HSA puts the x64 cores and the gpu cores on close to equal footing. Spawning a kernel from a kernel is totally fine and looks very like spawning one from the host. Everything is correctly thread safe so calling mmap from within a kernel doesn't deadlock things. You can program the machine as a large cluster of independent cores passing messages to one another. For the raw plumbing, I wrote https://github.com/jonchesterfield/hostrpc. That can do things like have an nvidia card call a function on an amd one. That's the GPU programming model I care about - not passing blobs of floating point math onto some accelerator card, I want distributed graph algorithms where the same C++ runs on different architectures transparently to the application. HSA lends itself to that better than Cuda does. But it is rather bring your own code.

That is, I think the more general architecture amdgpu is shipping is better than the specialised one cuda implements, despite the developer experience being rather gnarlier. I can't express the things I want to on nvptx at all so it doesn't matter much that simpler things would work more reliably.

Maybe more relevant to your experience, I can offer some insight into the state of play at AMD recently and some educated guesses at the earlier state. ATI didn't do compute as far as I know. Cuda was announced in 2006, same year AMD acquired ATI. Intel Core 2 was also 2006 and I remember that one as the event that stopped everyone buying AMD processors. Must have been an interesting year to be in semiconductors, was before my time. So in the year cuda appears, ATI is struggling enough to be acquired, AMD mortgages itself to the limit to make the acquisition and Intel obsoletes AMD's main product.

I would guess that ~2007 marked the beginning of the really bad times for AMD. Even if they could guess what cuda would become they were in no position to do anything about it. There is scar tissue still evident from that experience. In particular, the games console being the breadwinner for years can be seen in some of the hardware decisions, and I've had an argument with someone whose stance was that semi-custom doesn't need a feature so we shouldn't do it.

What turned the corner is the DoE labs being badly burned by reliance on a single vendor for HPC. AMD proposed a machine which looks suspiciously like a lot of games consoles with the power budget turned way up and won the Frontier bid with it. That then came with a bunch of money to try to write some software to run on it which in a literal sense created the job opening I filled five years back. Intel also proposed a machine which they've done a hilariously poor job of shipping. So now AMD has built a software stack which was razor focused on getting the DoE labs to sign the cheques for functionally adequate on the HPC machines. That's probably the root of things like the approved hardware list for ROCm containing the cards sold to supercomputers and not so much the other ones.

It turns out there's a huge market opportunity for generative AI. That's not totally what the architecture was meant to do but whatever, it likes memory bandwidth and the amdgpu arch does do memory bandwidth properly. The rough play for that seems to be to hire a bunch of engineers and buy a bunch of compiler consultancies and hope working software emerges from that process, which in fairness does seem to be happening. The ROCm stack is irritating today but it's a whole different level of QoI relative to before the Frontier bring up.

Note that there's no apology nor contrition here. AMD was in a fight to survive for ages and rightly believed that R&D on GPU compute was a luxury expense. When a budget to make it work on HPC appeared it was spent on said HPC for reasonable fear that they wouldn't make the stage gates otherwise. I think they've done the right thing from a top level commercial perspective for a long time - the ATI and Xilinx merges in particular look great.

Most of my colleagues think the ROCm stack works well. They use the approved Ubuntu kernel and a set of ROCm libraries that passed release testing to iterate on their part of the stack. I suspect most people who treat the kernel version and driver installation directions as important have a good experience. I'm closer to the HN stereotype in that I stubbornly ignore the binary ROCm release and work with llvm upstream and the linux driver in whatever state they happen to be in, using gaming cards which usually aren't on the supported list. I don't usually have a good time but it has definitely got better over the years.

I'm happy with my bet on AMD over Nvidia, despite the current stock price behaviour making it a serious financial misstep. I believe Lisa knows what she's doing and that the software stack is moving in the right direction at a sufficient pace.


> I think Cuda's GPU offloading model is very boring.

> That is, I think the more general architecture amdgpu is shipping is better than the specialised one cuda implements, despite the developer experience being rather gnarlier.

This reminded me of that Twitter thread that was linked on HN yesterday, specifically the part about AMD's "true" dual core compared to Intel's "fake" dual core.

> We did launch a “true” dual core, but nobody cared. By then Intel’s “fake” dual core already had AR/PR love. We then started working on a “true” quad core, but AGAIN, Intel just slapped 2 dual cores together & called it a quad-core. How did we miss that playbook?! AMD always launched w/ better CPUs but always late to mkt. Customers didn’t grok what is fake vs real dual/quad core. If you do cat /proc/cpu and see cpu{0-3} you were happy.

https://news.ycombinator.com/item?id=40696384

What is the currently available best way to write GPGPU code to be able to ship a single install.exe to end users that contains compiled code that runs on their consumer class AMD, Nvidia, and Intel graphics cards? Would AdaptiveCpp work?


Shipping compiled code works fine if you have a finite set of kernels. Just build everything for every target, gzip the result and send it out. People are a bit reluctant to do that because there are lots of copies of essentially the same information in the result.

I suspect every solution you'll find which involves sending a single copy of the code will have a patched copy of llvm embedded in said install.exe, which ideally compiles the kernels to whatever is around locally at install time, but otherwise does so at application run time. It's not loads of fun deriving a program from llvm but it has been done a lot of times now.
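A minimal sketch of the "build everything for every target, gzip the result" approach described above. The bundle format, `pack_bundle`, and `load_kernel` are hypothetical names made up for illustration, the byte strings are placeholders for real compiler output, and `sm_86` is just an example of a non-AMD target name:

```python
import gzip
import json
from typing import Dict


def pack_bundle(binaries: Dict[str, bytes]) -> bytes:
    """Pack one compiled kernel blob per GPU target into a single gzip'd bundle."""
    # Hex-encode so the blobs survive JSON; a real bundle would use a binary format.
    payload = {target: blob.hex() for target, blob in binaries.items()}
    return gzip.compress(json.dumps(payload).encode("utf-8"))


def load_kernel(bundle: bytes, target: str) -> bytes:
    """Return the blob built for `target`, or raise if the bundle has none."""
    payload = json.loads(gzip.decompress(bundle).decode("utf-8"))
    if target not in payload:
        # This is the "new GPU a year from now" case: nothing was built for it.
        raise LookupError(f"no kernel built for {target}")
    return bytes.fromhex(payload[target])


# Placeholder bytes stand in for real per-target machine code.
bundle = pack_bundle({
    "gfx1030": b"\x01\x02",
    "gfx1100": b"\x03\x04",
    "sm_86": b"\x05\x06",
})
```

The lookup failing for an unknown target is the cost of this approach: the bundle only covers hardware that existed when it was built.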


> Shipping compiled code works fine if you have a finite set of kernels. Just build everything for every target, gzip the result and send it out. People are a bit reluctant to do that because there are lots of copies of essentially the same information in the result.

That's kind of the point, you have to build everything for a lot of different targets. And what happens a year from now when the client has bought the latest GPU and wants to run the same program on that? Not having an intermediary compile target like PTX is a big downside, although I guess it didn't matter for Frontier.

I can't find any solution, AdaptiveCpp seems like the best option but they say Windows support is highly experimental because they depend on a patched llvm, and they only mention OpenMP and Cuda backends anyway. Seems like Cuda is still the best Windows option.


There's a degree of moving the goalposts there.

Shipping some machine code today to run on a GPU released tomorrow doesn't work anywhere. Cuda looks like it does provided someone upgrades the cuda installation on the machine after the new GPU is released because the ptx is handled by the cuda runtime. HSAIL was meant to do that on amdgpu but people didn't like it.

That same trick would work on amdgpu - compile to spir-v, wait for a new GPU, upgrade the compiler on the local machine, now you can run that spir-v. The key part is the installing a new JIT which knows what the new hardware is, even if you're not willing to update the program itself. Except that compile to spir-v is slightly off in the weeds for compute kernels at present.

It's tempting to view that as a non-issue. If someone can upgrade the cuda install on the machine, they could upgrade whatever program was running on cuda as well. In practice this seems to annoy people though which is why there's a gradual move toward spir-v, or to shipping llvm IR and rolling the die on the auto-upgrade machinery handling it. Alternatively ship source code and compile it on site, that'll work for people who are willing to apt install a new clang even if they aren't willing to update your program.
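The fallback logic described in the last few paragraphs can be sketched roughly as follows. This is a toy model, not any vendor's runtime: `pick_code`, `fake_jit`, and the device name `gfx-next` are all hypothetical, and the "JIT" just tags the IR with the device name.

```python
from typing import Callable, Dict, Optional, Set


def pick_code(
    device: str,
    native: Dict[str, bytes],
    portable_ir: Optional[bytes],
    jit_targets: Set[str],
    jit: Callable[[bytes, str], bytes],
) -> bytes:
    """Choose what to run on `device`: prebuilt machine code, else JIT-compiled IR."""
    # Machine code built ahead of time for this exact device wins.
    if device in native:
        return native[device]
    # Otherwise the shipped IR only helps if the locally installed JIT
    # already knows about the device, i.e. the JIT was upgraded after release.
    if portable_ir is not None and device in jit_targets:
        return jit(portable_ir, device)
    raise RuntimeError(f"cannot run on {device}: upgrade the JIT or the program")


def fake_jit(ir: bytes, device: str) -> bytes:
    """Stand-in for a real compiler: prefixes the IR with the device name."""
    return device.encode("utf-8") + b":" + ir


native = {"gfx1030": b"native-gfx1030"}
ir = b"portable-ir"
```

With an old JIT (`jit_targets={"gfx1030"}`), the unchanged program fails on `gfx-next`; adding `gfx-next` to `jit_targets` makes it run. That is the "upgrade the compiler on the local machine" step.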


> Cuda looks like it does

And that's what matters. It might be seen like moving the goal posts by someone who knows how it works in the background and what kind of work is necessary to support the new architecture, but that's irrelevant to end users. Just like end users didn't care about "true" or "fake" dual cores.

> If someone can upgrade the cuda install on the machine, they could upgrade whatever program was running on cuda as well.

No, because that would mean that all GPGPU developers have to update their code to support the new hardware, instead of just the runtime taking care of it. I think you're more focused on HPC, data centers, and specialized software with active development and a small user base, but how would that work if I wanted to run an image processing program, video encoder, game, photogrammetry, etc, and the developer lost interest in it years ago? Or if I have written some software and don't want to have to update it because there's a new GPU out? And isn't the cuda runtime installed by default when installing the driver, which auto updates?

> there's a gradual move toward spir-v

It was introduced 9 years ago, what's taking so long?

> Alternatively ship source code and compile it on site, that'll work for people who are willing to apt install a new clang

Doesn't seem to work well on Windows since you need to use a patched Clang, and some developers have a thing about shipping their source code.

On the whole, both the developer and the Windows user experience are still very unergonomic, and I really expected the field to have progressed further by now. Nvidia is rightly reaping the reward from their technical investments, but I still hope for a future where I can easily run the same code on any GPU. But I hoped for that 15 years ago.


I mean, you say there was “just no money” but AMD signed a deal over three years ago to acquire Xilinx for $50b. They’ve been on an acquisition spree in fact. Just not anything related to gpgpu, because that wasn’t a priority.

Yes, after you spend all your money there’s nothing left. Just like after refusing the merger with nvidia and then spending all the cash buying ati there was nothing left. Times were very tough, ATI and consoles kept the company afloat after spending all their money overpaying for ATI put you there in the first place. Should have done the merger with nvidia and not depleted your cash imo.

More recently could easily have spent 0.5% of the money you spent on Xilinx and 10x’d your spend on GPGPU development for 10 years instead. That was 2020-2021 - it’s literally been 5+ years since things were good enough to spend $50 billion on a single acquisition.

You also spent $4b on stock buybacks in 2021... and $8 billion in 2022... and geohotz pointed out your runtime still crashed on the sample programs on officially-supported hardware/software in 2023, right?

Like the assertion that a single dime spent in any other fashion than the way it happened would have inevitably led to AMD going under while you spend an average of tens of billions of dollars a year on corporate acquisitions is silly. Maybe you legitimately believe that (and I have no doubt times were very very bad) but I suggest that you’re not seeing the forest for the trees there. Software has never been a priority and it suffered from the same deprioritizing as the dGPU division and Radeon generally (in the financial catastrophe in the wake of the ATI debacle). Raja said it all - gpus were going away, why spend money on any of it? You need some low end stuff for apus, they pulled the plug on everything else. And that was a rational, albeit shortsighted, decision to keep the company afloat. But that doesn’t mean it’s the only course which could have done that, that’s a fallacy/false logic.

https://youtu.be/590h3XIUfHg?t=1956

Nothing drives this home more than this very interview with Lisa Su where she is repeatedly probed around her priorities and it always comes back to hardware. Even when she is directly asked multiple sequential questions probing at her philosophy around software, she literally flatly says hardware is what’s important and refuses to even say it’s a priority today. “I don’t know that. I am a hardware person”, and that’s how she’s allocated resources too.

https://news.ycombinator.com/item?id=40703420

I agree Xilinx is a very important piece for building systems, and it’s led to a lot of innovation around your packaging etc, but it’s also been a massive miss on systems engineering too. Nvidia has picked PHYs with higher bandwidth-density per era and built the hardware to scale the system up, as well as the software to enable it all. The network is hardware too, there is much more to hardware than just single-package performance and AMD is falling behind at most of it, even with Xilinx.

Again, though, from a developer perspective every single person’s formative experience with gpgpu has been getting excited to try opencl, APP, HSA, ROCm, or HIP, and then AMD shattering it, followed by “wow CUDA just works”. You guys have been an awful company to work with from the dev side.

And again, I’m sure you do look at it in terms of “we target the commercial stuff because they pay us money to” but you really should be thinking in terms of where your addressable market is. Probably 70% of your market (including iGPUs) is using GCN era gaming hardware. You simply choose not to target these users. This includes a large number of current-gen products which you continue to sell into the market btw - those 5700G are Vega. Another 25% or so is using rdna1/2 and you didn’t target these until very recently (still not on Linux, the dominant platform for this). And you are continuing to add numbers to this market with your zen4 APUs - you simply choose not to support these on ROCm at all, in fact.

Your install base is backwards-weighted, and in fact is only becoming moreso as you lose steam in the desktop market - 85% of dGPUs sold this gen were the competition's. And literally going into this generation you completely kicked all legacy hardware to the curb anyway, except for Radeon vii, the weird fragile card that nobody bought because it was worse than a 2080 at a higher price a year later. You have no long-term prospects because you stubbornly refuse to support the hardware that is going to be accessible to the people who want to write the next 20 years of software for you, you are literally targeting the inverse of your install base.

Sorry to say it, and again not mad at you personally etc, but the other comments are right that this is seemingly a problem that everyone can see except for the people who work at AMD. The emperor has no software.

It is hard to think of a more perfect example of the Innovator's Dilemma: you targeted the known, stable markets while a smaller, more agile, more focused (in R&D spend, etc) competitor visibly created a whole new segment, that you continued to ignore because there still wasn't Big Money in it yet. It's a tale as old as time, and it always comes from a place of execs making safe, justifiable, defensible decisions that would have been completely rational if things had gone differently and GPGPU compute didn't become a thing. But it's not a defense to the question of "why did you miss the boat and what are you going to do differently going forward", either. Understanding why the decision was made doesn't mean it's a good one.

One of the points I've been making is that the reason the mining booms keep coming back, and AI booms, etc is that "dense compute" has obviously been a thing for a while now. Not just HPC, not just crypto, but there are lots of things which simply need extreme arithmetic/bandwidth density above all else, and as various fields discover that need we get surges into the consumer gaming market. Those are the wakeup calls, on top of HPC and AI markets visibly and continuously expanding for a decade now, while AMD watched and fiddled. And again, as of 5+ years ago AMD was not nearly so destitute they couldn't start moving things forward a bit - you were so destitute you were looking to start acquisitions to the tune of tens of billions a year, tens of billions of dollars in stock buybacks, etc.

I just flatly reject the idea that this was the best possible allocation of every dollar within the company such that this was flatly not achievable. You had money, you just didn't want to spend it on this. Especially when the CEO is Hardware Mafia and not the Software Gang.

(and I'm nodding intentionally to the "Bomber Mafia" in WW2 - yeah, the fighter aircraft probably can't escort the bombers into germany, you're right, and that's mostly because the bomber mafia blocked development of drop tanks, but hindsight is 20/20 and surely it seemed like a rational decision at the time!)

https://www.youtube.com/watch?v=I7aGC6Sp8zQ

I also frankly think there is a real concerning problem with AMD and locus-of-control, it's a very clear PTSD symptom both for the company and the fans. Some spat with Intel 20 years ago didn't make AMD spend nearly a hundred billion dollars on acquisitions and stock buybacks instead of $100m on software. Everything constantly has to tie back to someone else rather than decisions that are being made inside the company - you guys are so battered and broken that (a) you can't see that you're masters of your own destiny now, and (b) that times are different now and you both have money to spend now and need to spend it. You are the corporate equivalent of a grandma eating rotten food despite having an adequate savings/income, because that's how things were during the formative years for you. You have money now, stop eating rotten food, and stop insisting that eating rotten food is the only way to survive. Maybe 20 years ago, but not today.

I mean, it's literally been over 20 years now. At what point is it fair to expect AMD leadership to stand by their own decisions in their own right? Will we see decisions made in 2029 be justified with "but 25 years ago..."? 30 years? More? It's a problem with you guys: if the way you see it is nothing is ever your responsibility or fault, then why would you ever change course? Which is exactly what Lisa Su is saying there. I don't expect a deeply introspective postmortem of why they lost this one, but at least a "software is our priority going forward" would be important signaling to the market etc. Her answer isn't that, her answer is everything is going great and why stop when they're winning. Except they're not.


it's also worth pointing out that you have abdicated driver support on those currently-sold Zen2/3 APUs with Vega as well... they are essentially legacy-support/security-update-only. And again, I'm sure you see it as "2017 hardware" but you launched hardware with it going into 2021 and that hardware is still for sale, and in fact you continue to sell quite a few Zen2/3 APUs in other markets as well.

if you want to get traction/start taking ground, you have to actually support the hardware that's in people's PCs, is what I'm saying. The "we support CDNA because it is a direct sale to big customers who pay us money to support it" is good for the books, but it leads to exactly this place you've found yourselves in terms of overall ecosystem. You will never take traction if you don't have the CUDA-style support model both for hardware support/compatibility and software support/compatibility.

it is telling that Intel, who is currently in equally-dire financial straits, is continuing to double-down on their software spending. At one point they were running -200% operating margins on the dGPU division, because they understand the importance. Apple understands that a functional runtime and a functional library/ecosystem are table stakes too. It literally, truly is just an AMD problem, which brings us back to the vision/locus-of-control problems with the leadership. You could definitely have done this instead of $12 billion of stock buybacks in 2021/2022 if you wanted to, if absolutely nothing else.

(and again, I disagree with the notion that every single other dollar was maximized and AMD could not have stretched themselves a dollar further in any other way - they just didn't want to do that for something that was seen as unimportant.)


ROCm 6.0 and 6.1 list RDNA3 (gfx1100) and RDNA2 (gfx1030) in their supported architectures list: https://rocm.docs.amd.com/en/latest/compatibility/compatibil...

Although "official" / validated support^ is only for PRO W6800/V620 for RDNA2 and RDNA3 RX 7900's for consumer. Based on lots of reports you can probably just set HSA_OVERRIDE_GFX_VERSION for other RDNA2/3 cards and it'll probably just work. I can get GPU-accelerated ROCm for LLM inferencing on my Radeon 780M iGPU, for example, w/ ROCm 6.0 and HSA_OVERRIDE_GFX_VERSION=11.0.0
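For anyone launching a ROCm workload from a script, the override above amounts to setting one environment variable for the child process. A minimal sketch: `rocm_env` and `run_with_override` are hypothetical helper names, and the right version string for a given card is an assumption you have to check yourself (11.0.0 is simply the value reported above for the 780M).

```python
import os
import subprocess
from typing import Dict, List, Optional


def rocm_env(gfx_version: str, base: Optional[Dict[str, str]] = None) -> Dict[str, str]:
    """Copy the environment and add the HSA_OVERRIDE_GFX_VERSION override."""
    env = dict(os.environ if base is None else base)
    env["HSA_OVERRIDE_GFX_VERSION"] = gfx_version
    return env


def run_with_override(cmd: List[str], gfx_version: str) -> subprocess.CompletedProcess:
    """Run `cmd` with the override applied only to that child process."""
    return subprocess.run(
        cmd,
        env=rocm_env(gfx_version),
        check=True,
        capture_output=True,
        text=True,
    )
```

Scoping the variable to the child process keeps the override from leaking into other programs started from the same shell.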

(In the past some people also built custom versions of ROCm for older architectures (eg ROC_ENABLE_PRE_VEGA=1) but I have no idea if those work still or not.)

^ https://rocm.docs.amd.com/projects/install-on-linux/en/lates...


Omg. I know this is mostly marketing speaking, but this is her reply when asked about AMD's reticence to software:

> Well, let me be clear, there’s no reticence at all. [...] I think we’ve always believed in the importance of the hardware-software linkage and really, the key thing about software is, we’re supposed to make it easy for customers to use all of the incredible capability that we’re putting in these chips, there is complete clarity on that.

I'm baffled how clueless these CEOs sometimes seem about their own product. Like, do you even realize that this is the reason why Nvidia is mopping the floor with your stuff? Have you ever talked to a developer who had to work with your drivers and stack? If you don't start massively investing on that side, Nvidia will keep dominating despite their outrageous pricing. I really want AMD to succeed here, but with management like that I'm not surprised that they can't keep up. Props to the interviewer for not letting her off the hook on this one after she almost dodged it.


What is she supposed to say? Perhaps "our products have bad software, don't buy them, go buy Nvidia instead"?


She could admit that they fell behind on this one and really need to focus on closing the gap now. But instead she says it's all business as usual, which assures me that I won't give their hardware another shot for quite a while.


I wouldn't blame that on the CEO, that's just regular media training and their job is to keep the stock market happy.

Companies really only admit to failure if there's no other option and the pressure is too high ("Antenna gate" and others come to mind).


Well, while they desperately try to keep shareholders happy instead of developers, Nvidia rakes in trillions in market cap. But I agree in the sense that thinking ahead further than the next quarter is not a great strength of most publicly traded companies.


Just because something was said in public to some interviewer doesn't mean they are thinking exactly the same internally.


Supporting... I remember this Freakonomics interview with Ballmer about his statements about the iPhone: "There’s two things: What would I have said differently and what would I have thought differently? They’re actually quite different."

From: https://freakonomics.com/podcast/hoopers-hoopers-hoopers/


Well, technically yes. To see why that's the case you'd actually have to work with their products.


> that's just regular media training and their job is to keep the stock market happy.

The investors already know AMD's SW is bad. Now they know not to trust AMD's CEO.


She could say "We are in a compute gold rush and yet AMD's stock didn't gain anything in the last 6 months, so I hereby submit my resignation". That would work.


She managed to pull them out of the garbage bin after Bulldozer but I guess she hasn’t managed to hook them up to the AI bubble yet.


I'd love to know if any domain experts have a write-up on what talent, time, and financial investment it would take for AMD to come up with something that is a worthy rival to CUDA. Very curious to understand what the obstacles are.


~5 years. Medium-sized team in-house + hordes (hundreds, thousands) of engineers in the field helping clients on-site, writing code for them directly upstreamed to drivers, core libs, etc. (iteratively optimized in-house, ship feature, rinse and repeat). Story of the PlayStation SDKs, of DX too, but above all CUDA (they really outdid this strategy), now for cuDNN and so much more.

It takes incompressible time because you have to explore the whole space, cover most bases; and it takes an industry several years (about one "gen" / hardware cycle) to do that meaningfully. It helps when your platform is disruptive and customers move fast.

Maybe 3 years at best if you start on a new ideal platform designed for it from scratch. And can throw ungodly amount of money fast at it (think 5K low-level engineers roaming your installed base).

Maybe 10+ yrs (or never) if you're alone, poor, and Radeon (j/k but to mean it's non-trivial).


I’d say it mainly needs persistence and good execution (library support). NVIDIA has co-developed CUDA with their hardware, and largely stayed compatible with it, since around 2009, and around 2012 it first started taking off in the HPC space. Years later this enabled first their boom in crypto and then an even bigger one in AI. I don’t think this amount of R&D would be out of reach of today’s AMD (as NVIDIA wasn’t any bigger back then), but the backing of it needs to come from the very top.


First, they need to work with kernel devs to finally fix their drivers. Like, Nvidia used to be a "pain in the ass" here as well (that's a literal quote from Torvalds), so simply by contributing more than nothing, they could have taken the lead. But they definitely screwed this one up.

Second, they need to fix their userspace stack. ROCm being open source and all is great in principle, but simply dropping your source to the masses doesn't make it magically work. They need to stop letting it linger by either working with the open source community (huge time investment) or do it themselves (huge money investment).


The code is all on GitHub, the ISA docs are public, the driver is in upstream Linux with the work in progress on Gitlab. You can build whatever you want on AMD's hardware with total disregard to their software if you're so inclined. One or two companies seem to be doing so.

This has been true since roughly the opencl days, where the community could have chosen open standards over subservience to team green. Then again for the HSA movement, a really solid heterogeneous programming model initially supported by a bunch of companies. Also broadly ignored.

Today the runtime code is shipping in Linux distributions. Decent chance your laptop has an AMD CPU in it, that'll have a built in GPU that can run ROCm with the kernel you're already using and packages your distribution ships.

I'm not sure what more AMD could be doing here. What more do you want them to do?


> the community could have chosen open standards over subservience to team green

i think most people would rather have proprietary software that works rather than opensource that doesn't


>The code is all on GitHub, the ISA docs are public, the driver is in upstream Linux with the work in progress on Gitlab

That's exactly what I meant by dumping the source and hoping that someone else turns it into plug and play magic - for free. This simply doesn't work.


The code is there and they're stoically implementing everything themselves.

The current ML ecosystem is people write papers and frameworks using cuda and then people complain that amd hasn't implemented them all on rocm, without really acknowledging that nvidia didn't implement them either. All the code is out there so that people could implement their work on amd and then complain at nvidia for it missing from their ecosystem, but that's not the done thing.

What would you have amd do differently here?


I wonder if they really need a CUDA rival.

This AI stuff has progressed a bit. Intel has been working on interesting stuff with OneAPI. It might be the case that things have progressed to the point where the primitives are well enough understood that you need something more like a good library rather than a good compiler.

In the end, more people seem to love BLAS than Fortran, after all.


That library (Triton) sits on top of the compiler and drivers (ROCm). If the driver kernel panics, no high-level library can fix that.


I don’t have direct experience, so I could be wrong. But, I believe a lot of the nice stuff that CUDA brings along is profiling and performance related, that is, most useful if you are writing the code yourself. Plus, if the ecosystem is not quite as stable, but it is mostly AMD’s engineers writing the library that have to deal with it, they have more latitude to just not go down the buggy or bad-performance code-paths.


My theory is that someone came up with the bright idea of allowing more open source in the stack and that that would allow them to get it all done via crowd sourcing and on the cheap. But if true it was a quite naive view of how it might work.

If instead they had said "let's take the money we should have invested in internal development and build an open developer community that will leverage our hardware to build a world class software stack", it might have been a little better.


AMD has just never had good developer software. For ages the best BLAS on AMD was… Intel MKL, as long as you figured out how to dispatch the right kernels.

Actually, it could be really cool if everybody acted like AMD. The fact that Intel and Nvidia put out the best number libraries for free means you can’t sell a number crunching library!


I don't want a CUDA rival. I want to get the entire pile of CUDA code that is already written and run it on AMD GPUs without any kind of tweak or rewrite, and have it just work every time

Compatibility with existing code is very important. People can't afford to rewrite their stuff just to support AMD, and thus they don't

AMD is kind of trying to do this with rocm and HIP, but whatever they are doing it's not enough


I spotted this recent post https://www.reddit.com/r/LocalLLaMA/comments/1deqahr/comment... that was pretty interesting:

> When I was working on TVM at Qualcomm to port it to Hexagon a few years ago we had 12 developers working on it and it was still a multiyear long and difficult process.

> This is also ignoring the other 20 or so developers we had working on Hexagon for LLVM, which did all of the actual hardware enablement; we just had to generate good LLVM IR. You have conveniently left out all of the LLVM support that this all requires as AMD also uses LLVM to support their GPU architectures.

> Funny enough, about a half dozen of my ex coworkers left Qualcomm to go do ML compilers at AMD and they're all really good at it; way better than I am, and they haven't magically fixed every issue

> It's more like "hire 100 additional developers to work on the ROCM stack for a few years"

This last statement sounds about right. Note that ROCm has over 250 repos on Github, a lot of them pretty active: https://github.com/orgs/ROCm/repositories?type=all - I'm sure an enterprising analyst who was really interested could look at the projects active over the past year and find unique committers. I'd guess it's in the hundreds already.
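The committer count suggested above could be approximated like this. It is only a sketch: it assumes the repos are already cloned locally, it counts author emails (`%ae`) rather than committers strictly speaking, and `author_emails` / `unique_committers` are hypothetical helper names.

```python
import subprocess
from typing import List


def author_emails(repo_path: str, since: str = "1 year ago") -> List[str]:
    """List author emails of commits newer than `since` in a local clone."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--since={since}", "--format=%ae"],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return out.splitlines()


def unique_committers(emails_per_repo: List[List[str]]) -> int:
    """Count distinct authors across repos, ignoring case and blank lines."""
    seen = {
        email.strip().lower()
        for emails in emails_per_repo
        for email in emails
        if email.strip()
    }
    return len(seen)
```

Usage would be something like `unique_committers([author_emails(p) for p in local_clone_paths])`. People who commit under several addresses get counted more than once, so treat the result as an upper bound.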

I think if you click through the ROCm docs https://rocm.docs.amd.com/en/latest/ (and maybe compare to the CUDA docs https://docs.nvidia.com/cuda/ ) you might get a good idea of the differences. ROCm has made huge strides over the past year, but to me, the biggest fundamental problem is still that CUDA basically runs OOTB on every GPU that Nvidia makes (with impressive backwards and in some cases even forwards compatibility to boot https://docs.nvidia.com/deploy/cuda-compatibility/ ) on both Linux and Windows, and... ROCm simply doesn't.

I think AMD's NPUs complicate things a bit as well. It looks like they're currently running on their own ONNX/Vitis (Xilinx) stack https://github.com/amd/RyzenAI-SW , and really that should either get folded into ROCm or a new SYCL/oneAPI-ish layer needs to be adopted to cover everything.


> is a worthy rival to CUDA

Vulkan Compute already exists

But as long as developers keep buying NVIDIA for CUDA, because developers only target CUDA for their applications, it is a chicken-and-egg scenario, similar to Linux vs Windows.


> Like, do you even realize that this is the reason why Nvidia is mopping the floor with your stuff?

Are they?

My experience has been that AMD products generally just work even if they're a bit buggy sometimes, while Nvidia struggles to get a video signal at all.

Seems perfectly reasonable to focus on what matters, while Nvidia is distracted by the AI fad.


I'm not talking about gaming, I'm talking about general purpose computing (although even for gaming your statement is pretty bold). Since she's CEO of a publicly traded company, it seems pretty weird that she would ignore the fields where the money is at, while Nvidia becomes the most valuable company in the world. So she's not just ignoring developers' wants but also her stockholders'.


GPGPU is largely a non-presence outside of a few niche fields like video encoding (which, uh, seems to work fine enough for me?).


AI is not exactly niche.


https://en.wikipedia.org/wiki/AI_winter

It keeps being hyped up again every few years, but it has never panned out in practice, and never will.


I don't know if you're out of the loop but the AI scene has dramatically changed in the past few years. We are solidly out of the AI winter.


Talking about having left "the" AI winter because we're in the middle of a hype cycle makes about as much sense as saying we can get rid of all the snow plows because the trees are currently green.

There's a lot of money in the space, yes. There's still nothing of substance.


I regret to inform you that companies like AMD run on money, not substance.


even if the LLM thing is an "AI fad" there are many other things that ML is used for that matter (to the people spending real money on GPUs - think A6000, H100, not gamer cards)


Such as…?


recommender systems for social media, automatic transcription, autonomous driving, computational photography, semantic search for photo albums, circuit board design, semiconductor mask production


I’m not anti-CEO, I think they play an important role, but why would you interview a CEO about hard technological problems?


She worked as a researcher in the field for decades. Moore's law only kept up because of some of the techniques she developed.

> During her time at IBM,[6] Su played a "critical role"[7] in developing the "recipe"[2] to make copper connections work with semiconductor chips instead of aluminum, "solving the problem of preventing copper impurities from contaminating the devices during production".[7] Working with various IBM design teams on the details of the device, Su explained, "my specialty was not in copper, but I migrated to where the problems were".[6] The copper technology was launched in 1998,[7] resulting in new industry standards[22] and chips that were up to 20% faster than the conventional versions.[6][7]


AMD was close to ruin when Lisa took over. It had completely lost the graphics and x64 wars and was limping by on low margin games consoles.

Since then Epyc has broken Intel. Anyone buying Xeons today is at credible risk of being fired over it when someone notices the power bill relative to their competition.

The graphics ramp is behind Nvidia in mind share and market share. The ROCm software stack gets a lot of grief on here. Nevertheless, Nvidia lost the Frontier and El Capitan bids and now Microsoft is running GPT-4 on AMD hardware. Sounds pretty good to me given it's one product line of many.

If turning a multinational semiconductor firm away from the edge of bankruptcy and into the profitable conglomerate it is today doesn't qualify as "solving a hard problem" I don't know what would. It's way beyond what I'd be capable of doing.


Major clouds have been investing in ARM alternatives. x86 is still king; compatibility matters a lot. But it is not as simple as Su paints for teams of x86 chip designers to become ARM chip designers and reach top place, specifically because fabs also matter and AMD is the player in the market with less money to pay the fabs.

The GPU market would be hard to recover, and the reason to use AMD (for the money-printing AI) is just budget. The software is not good enough; it’s understandably not in AMD’s DNA, and it was simply lacking in budget, as AMD was close to bankruptcy when CUDA started taking off.

Emphasis on top: of course great designers would still design great chips no matter the ISA, but best and great are different.


Lisa Su is one of the (surprisingly rare) tech CEOs who comes from an engineering background.


You mean like the CEOs of Intel, Nvidia, TSMC, ASML, Micron etc?


I've noticed that it's all the chip companies that have CEOs that understand to a deep level the stuff that the company is actually doing.

Compare and contrast the big software / services tech firms...

It feels like companies like Intel, AMD, Nvidia and TSMC are much more about delivering good technical solutions to hard problems than software companies like Google or Microsoft that try to deliver "concepts" or "ideas".

I do sometimes wonder though why decision making at a company like AMD benefits so much from having a highly competent engineer as the CEO compared to let's say Oracle...


You don't want to hear this, but it's because what we programmers do is the easy part of software. In the hardware business the goal is well-defined (just be faster+) and all the magic is in how to do it. In software the difficult part is figuring out what the customer actually wants and then getting your organization to actually align on building that. The coding itself is easy.

+There is obviously some nuance to what faster means. Faster at what? But finding some popular workloads and defining benchmarks is a lot easier than what software people have to do


Who doesn't like hearing that we (programmers) get all the fun, easy work while the other suckers in the economy have to put in the hard work for our benefit?


Agreed, most hard coding problems are due to requirements thrashing and business vision churn. All things related to figuring out what the customer wants.


Calling all the “ad-ware” web companies “Tech companies” is what leads to this apparent contradiction. It is largely identity appropriation across the board.


In general it's very common for CEOs to be of a business/MBA background. Apple and Google both have this, along with tons of other companies outside the semiconductor industry.


Apple as a hardware company needed someone who was good at operations and logistics. Tim Cook is one of the best in the business.

On the other hand, I have no idea what the purpose of Google’s CEO is. Google has no vision and can’t produce new products to save their lives.

Google is unique among the current BigTech companies in being completely incapable of diversifying or pivoting to new market opportunities


Google might be the worst run unicorn. How they fell so low from that high is truly an achievement. They will continue to print money, but to think Google was once programmer Utopia and Googlers were regarded as demigods... inconceivable now.


Isn't the CEO of Intel a bit crazy with regard to leading his company through prayer?


And so is her first cousin (once removed) Jensen Huang of Nvidia!


Many of the hard problems these days require scale and, with it, coordination.


It says in the article that he was asked by readers to do a Lisa Su interview. The headline is a little misleading, they don't talk much about technological problems. The interview is a soft light tour of her career and a few attempts to get her to talk about the present day business. Interviews with someone like Musk or Jensen are much more technically intense.

Honestly this interview feels bearish for AMD. Su's performance is not good. Thompson repeatedly pushes her to reflect on past mistakes, but it's just not happening. The reasons why AMD has fallen so far behind NVIDIA shine through clear as day and it doesn't look like it's going to get fixed anytime soon.

Su's problem and therefore AMD's problem is that she doesn't want to think about software at all. Hardware is all she knows and she states that openly. Nor does she seem to consider this a weakness. The problem goes back to the very start of her career. The interview opens with Thompson saying she faced a choice between computer science and electronics engineering at MIT, and she picked EE because it was harder. Is that true? She's nowhere in AI due to lack of sufficient skilled devs so now would be a good time to talk up the importance of software, but no, she laughs and says sure! CS seemed easy to her because you "just" write software instead of "building things", whereas in electronics your stuff "has to work". End of answer.

He tries to get a comment on the (in hindsight) not great design tradeoffs made by the Cell processor, which was hard to program for and so held back the PS3 at critical points in its lifecycle. It was a long time ago so there's been plenty of time to reflect on it, yet her only thought is "Perhaps one could say, if you look in hindsight, programmability is so important". That's it! In hindsight, programmability of your CPU is important! Then she immediately returns to hardware again, saying how proud she was of the leaps in hardware made over the PS generations.

He asks her if she'd stayed at IBM and taken over there, would she have avoided Gerstner's mistake of ignoring the cloud? Her answer is "I don’t know that I would’ve been on that path. I was a semiconductor person, I am a semiconductor person." - again, she seems to just reject on principle the idea that she would think about software, networking or systems architecture because she defines herself as an electronics person.

Later Thompson tries harder to ram the point home, asking her "Where is the software piece of this? You can’t just be a hardware cowboy ... What is the reticence to software at AMD and how have you worked to change that?" and she just point-blank denies AMD has ever had a problem with software. Later she claims everything works out of the box with AMD and seems to imply that ROCm hardly matters because everyone is just programming against PyTorch anyway!

The final blow comes when he asks her about ChatGPT. A pivotal moment that catapults her competitor to absolute dominance, apparently catching AMD unaware. Thompson asks her what her response was. Was she surprised? Maybe she realized this was an all-hands-on-deck moment? What did NVIDIA do right that you missed? Answer: no, we always knew and have always been good at AI. NVIDIA did nothing different to us.

The whole interview is just astonishing. Put under pressure to reflect on her market position, again and again Su retreats to outright denial and management waffle about "product arcs". It seems to be her go-to safe space. It's certainly possible she just decided to play it all as low key as possible and not say anything interesting to protect the share price, but if I was an analyst looking for signs of a quick turnaround in strategy there's no sign of that here.


From my point of view AMD is going down not because of Nvidia, but because of ARM and Qualcomm. AMD's Ryzen x64 cash cow is going to start declining soon in both the server and consumer space.

I saw this clear as day when M1 Macbooks came out and Amazon AWS Graviton servers became more popular and cheaper. It was inevitable that the PC world was going to move to ARM soon; in fact I am surprised that it took this long to get viable ARM PC laptops (only this year).

So unless AMD has some secret ARM or RISC-V research division close to launch a product I don't see how it is going to survive long term.


> So unless AMD has some secret ARM or RISC-V research division close to launch a product I don't see how it is going to survive long term.

AMD has already built ARM chips and uses a modified ARM core for their Platform Security Processor. They have an architectural license and have already committed to launching ARM chips by 2025.

Why would they need to make it a secret? And what makes you think that they are?


An ARM frontend was already a thing at AMD a decade ago.

Btw, as of now ARM has a bigger market share in people's minds than in actual sales, and it ain't going to eat x86 even by 2030.


Won’t these ARM products kill Intel first? AMD’s parts are more competitive, correct?


The GPU scheduler is an arm chip. As in taped out, shipping in volume, has been for years. I think there was a project decades ago that put an arm front end on an x86 core. If industry decides they like the aarch64 isa more than x64 I doubt it would cause much trouble to the silicon team.


> If industry decides they like the aarch64 isa more than x64 I doubt it would cause much trouble to the silicon team.

the problem isn't ISA so much as AMD not having a moat around its CPU revenue anymore. Yes, AMD will have ARM chips... and so will Qualcomm, and Mediatek/NVIDIA, and Marvell. AWS and Facebook and Google have already largely dumped x86 chips entirely. But they haven't dumped them for ARM chips from AMD, or Intel, or Qualcomm, they've dumped them for ones they develop themselves at rates that would be commercially unviable for AMD to compete with as a semicustom product (and those companies would not want that in the first place).

It's not about Intel vs AMD and which of the two is faster anymore. It’s about the fact that Intel plus AMD is going to shrink to be a minority player in the market as ARM commoditizes a market from which Intel plus AMD extracted economic rents for decades. That free money train is coming to a stop. Absolutely there is going to be a market for AMD as a consultancy business helping others integrate chips (just like semicustom today) but even that is going to be something that many others do etc, and the IP cores themselves certainly don't have to be licensed from AMD anymore, there will be many viable offerings and AMD's margins will be thinner when they provide less of the stack and design services etc. Absolutely there will be people willing to pay for the cores too, but it's going to be a lot less than the people who want something that's Good Enough to just hook up their custom accelerator thing to a CPU (just like AWS Graviton and Google TPU etc).

And while consoles are still going to have legacy support as a major inertia factor, even that is not forever, the FTC leak showed Microsoft bid the next-gen console out as a “ML super resolution upscaling” and RTGI focused ARM thing, right? Especially with how far AMD has fallen behind in software and accelerator integration etc. ARM will eventually break that moat too - especially with Nvidia pushing hard on that with Switch 2 as well etc. Even these ancillary services that AMD provides are largely important because of the moat that x86 provides around anyone else having easy interoperation.

Margins are going to come down, and overall market will grow but AMD will command less of it, and also large customers are going to depart that market entirely and do it themselves (which AMD will also get a chunk of, but a much smaller pie etc than selling the whole CPU etc).

Again, like, the problem isn't that AMD doesn't have a Snapdragon X Elite. It isn't even about whether they're faster than Snapdragon X Elite. It's the fact that Snapdragon X Elite is going to do to client revenue what Graviton and TPU are already doing to your datacenter revenue. It's what ARM and Qualcomm and NVIDIA are going to do to your x86 and graphics SIP licensing revenue, and your semicustom integration revenue etc. Even if they are flatly not as good, they don't have to be, in order for them to collapse your x86 rents and your margin on the resulting design+integration services and products. We will come to realize we didn’t need to pay as much for these things as we did, that integration isn’t as expensive when you have five similar blocks from 5 different companies and three places that can integrate the SOC. That inefficiency is all part of the x86 rents/x86 tax too.

This isn't to say you're doooomed but like, Intel and AMD are not winners from the powerplay that Microsoft is making against x86 right now. You literally are going to be facing 3-4 more well-funded large competitors who can now offer services that are now substitutable for yours in a way that nobody appreciated would happen a year ago. Windows on ARM being a viable proposition in terms of support and emulation changes the game on the x86 rents in client/gaming markets, and companies are moving in to take advantage of that. Intel and AMD have literally nowhere to go but down on that one - they will not control the resulting market anymore, actually they already are losing control of large parts of the datacenter market due to cloud ARM offerings from hyperscale partners. Now it's client too, and foreseeable risk to consoles and semicustom etc.

All of that ancillary revenue was just gated by x86 in the final measure. Now that x86 is no longer a necessity, we’re going to see a significant increase in market efficiency, and parties that are dependent on the largesse of those rents are going to have problems. You can’t rely on graphics being a loss-leader (or low-margin-leader) to sell x86 SOCs, for example.


It'll be sad if the x64 lines wind down entirely, or end up on IBM style life support, in an end-of-an-era sense. Also fatal to Intel.

AMD would be fine though. They're currently in first or second place across x64, gpu, fpga. There's dedicated networking stuff and the ai accelerator/ npu things as well. Semi-custom has been a big thing for ages. That's a clean sweep across all forms of magic sand.

They're also the most open source & collaboration themed of all the hardware players. I think Broadcom building on AMD's infinity fabric is a really big deal; that's collaboration at the hardware IP level.

Right now everyone is obsessed with generative AI and Nvidia building artificial intelligence factories, but if that proves to be marketing nonsense it'll kill Nvidia and AMD will shake it off.

The moat isn't the x64 patents. It's diversification and execution competence.


Apparently someone at AMD (Su herself?) had mentioned that making an arm frontend for ryzen isn't that impossible. Perhaps they already have prototypes lying around their labs?


Even if that is the case and they can just port their designs to ARM they will still be facing much more competition, not only from Qualcomm but also from the cloud vendors in the server space (AWS, Azure, GCP).

All the cloud server hardware is getting more and more vertically integrated, why would the cloud vendors pay for AMD hardware when they can build their own?


Did you read my post from 4 years ago?

  Is AMD the king of the Titanic (x86)?
https://www.reddit.com/r/AMD_Stock/comments/kg4e8j/is_amd_th...

I basically outlined why I would not invest in AMD and it's inevitable that ARM would take over servers and personal computers.


No I did not read it, we just arrived at the same conclusions although you were a bit earlier than me to realise this. What opened my eyes was the ease of the transition to the ARM-based Macs. I fully agree with all your points and that has been my view since around 2021 (when I got an M1 Mac).

Once dev computers are running ARM at large no one is going to bother cross-compiling their server code to x64, they will just compile to ARM which will tear through AMD server demand. In fact my own org already started migrating to AWS graviton servers.

And this bodes poorly for Nvidia as well, I bet all cloud providers are scrambling to design their own in-house alternatives to nVidia hardware. Maybe alternatives to CUDA as well to either remove the nVidia lock-in or create their own lock-ins. Although Nvidia is much better positioned to stay ahead in the space.


The problem with the Nvidia replacement goal of big tech is that they don't have an ARM-like organization to design cores for them. Big tech use their own ARM CPUs because they use stock ARM core designs and its ISA. The hard work was already done for big tech.

Big tech must design their own GPUs. From the looks of it, it's much harder to do it on your own than license cores from ARM.

https://www.businessinsider.com/amazon-nvidia-aws-ai-chip-do...


Apple does and other big tech are just an acquisition away from being able to do it as well. In fact if memory serves me right Apple in-house chips were originally derived from an acquisition.

I wouldn't be surprised if Microsoft or Google would buy AMD or Intel (or subdivisions of them) at some point in the future.

This is all speculation of course, and you are not wrong about Nvidia being harder to replace. I mentioned this myself in my previous post, but I don't discard the possibility of it happening.


Electrical engineers do generally think software is easy. Even when their day is a horror show of TCL and verilog. In fairness I think hardware is horrendously difficult, so maybe they're not wrong.


*chuckle* the fact that they can solve their problems with TCL and verilog should be all the proof it takes to conclude that their jobs are easier.

But the guys who really have it hard are guys writing the EDA software. You have to be expert in both chip design and software development. You are writing software to design tomorrow's computers, and it has to run on today's computers. Virtually every problem is NP-complete, and the problem size is growing with Moore's law.

And all your customers just LOOOOVE TCL :-) so you end up writing a lot of that too :-)


Most bachelor's and master's level CS is comparatively easier than EE because EE requires much more hard math. The theory at least, but project-wise CS is more demanding. I had two EE roommates in college, their exams were HARD, but their home-projects were easy compared to CS (less projects overall as well).

I remember one exam my roommate complained about came down to getting all the formulas he needed into his scientific calculator before the exam even started. If you understood how to derive all the formulas and knew how to put them in the calculator and how to use them you passed the exam. I think it was an analog circuit processing exam but I might be wrong.

Research-level in computer science can get very hard as well though. A lot of it is more pure mathematics than engineering.


As far as undergraduate work goes EE is harder due to the math background required, indeed. However, the thing is, if you take the same brilliant minds who would ace EE, and reallocate them to software, they won't magically throttle down and limit themselves to undergrad CS concepts; they will find and tackle all the complexity they can withstand. You end up with the JS ecosystem, CI/CD, IaC, columnar databases, and so on. So I wonder if some of this is happening at AMD: thinking that undergrad CS-level effort is all there is, when there is actually invisible complexity being missed that NVidia managed to tackle.


Yeah I agree with you, if you see these programmers doing very advanced stuff they often have degrees in physics and the like. The hardest problems in CS today are all very math-focused.

The original quote from the OP was about why Lisa Su decided to go EE bachelor though and EE bachelor indeed is one of the hardest ones out there.

What I was trying to highlight is that although EE is harder, CS has a bigger workload due to all the personal projects you have to build. But given enough time and persistence you can get a CS degree even if you are not the sharpest (speaking as someone who is not the sharpest). To be honest in my own degree I didn't have much trouble with the content (besides a few math-heavy classes), just the workload.

I remember my and my roommates' study time: mine was usually spent implementing algorithms on my PC while theirs was usually spent poring over textbooks and solving differential equations. Although one of my roommates was a huge nerd who had to get max grades, he spent much more time than me studying. The other one spent about the same amount of time as me.


> I had two EE roommates in college, their exams were HARD, but their home-projects were easy compared to CS (less projects overall as well).

Maybe that's just a result of EE take-home projects being less practical? Hold on, let me walk on over to my wire bonding station ...

In my applied EM class in college, we had a year-end project in which we built an antenna of a specified type (e.g., helical, corner reflector, etc ... ). The final exam was essentially a do or die transmitter hunt. We had a lot of open lab time to do it. But that project was an exception, not the norm.


I think it is because CS is such a broad field, every little nook has a ton of different algorithms to study and, like you said, it is not that hard to tell students to implement algorithm X as projects.

While my EE roommates would often just spend their time solving the same types of problems from their text books over and over again. My roommates had a few lab classes where they did get assignments, but they were usually pretty small and couldn't be done at home.


They are quite different skills, I think. Being good at one doesn't tend to mean being good at the other (and this also applies to organisations: hardware companies will tend to suck at software by default and vice-versa. Not because they're necessarily anti-correlated but because they're sufficiently uncorrelated that they will tend to be average, i.e. bad in the area they aren't doing well in). But then there's a pretty wide spread of difficulty and skills in the individual parts of either which dwarfs any difference in the average.



