AMD GPUs Go Brrr

homarp · 2025-11-15T05:31:45 1763184705

see https://news.ycombinator.com/item?id=45923188 for HipKittens discussion

sorenjan · 2025-11-15T12:48:11 1763210891

See also this post about the same work: HipKittens: Fast and furious AMD kernels [0], with comments from George Hotz and AMD employees.

[0] https://news.ycombinator.com/item?id=45923188

DeathArrow · 2025-11-15T07:26:34 1763191594

I think many people have tried making AMD GPUs go brrr for the mass of developers, but no one has succeeded.

I don't get why AMD doesn't solve their own software issues. Now they have a lot of money so not having money to pay for developers is not an excuse.

And data center GPUs are not the worst. Using GPU compute for things like running inference at home is a much, much better experience with Nvidia. My 5-year-old RTX 3090 is better than any consumer GPU AMD has released to date, at least for experimenting with ML and AI.

cyberax · 2025-11-15T09:13:09 1763197989

I recently switched from an NVidia card (5090) to a couple of AMD cards (R9700 32GB) for my inference server.

I must say it's been a completely positive experience. The mainline Fedora kernel just worked without any need to mess with the DKMS. I just forwarded /dev/dri/* devices to my containers, and everything worked fine with ROCm.

I needed to grab a different image (-rocm instead of -cuda) for Ollama, change the type of whisper build for Storyteller. And that was it! On the host, nvtop works fine to visualize the GPU state, and VAAPI provides accelerated encoding for ffmpeg.

Honestly, it's been an absolutely pleasant experience compared to getting NVidia CUDA to work.

logicchains · 2025-11-15T09:08:34 1763197714

> Now they have a lot of money so not having money to pay for developers is not an excuse.

NVidia is the exception to the rule when it comes to hardware companies paying competitive salaries for software engineers. I imagine AMD is still permeated by the attitude that software "isn't real work" and doesn't deserve more compensation, and that kind of inertia is very hard to overcome.

qcnguy · 2025-11-15T10:56:29 1763204189

[flagged]

darkteflon · 2025-11-15T17:59:23 1763229563

My man, your world view is twisted by dogma. You may not personally like how she runs AMD, but Lisa Su is eminently qualified for the job. Her gender has nothing to do with this. You need to check yourself.

qcnguy · 2025-11-17T15:58:59 1763395139

At the CEO level there is no "qualified for the job". It's not like you can get a PhD in being a successful CEO. There is only actual success.

And it's not me who's twisted by dogma. I'm just predicting what would happen. Do you seriously argue Su could be chucked out (very likely replaced by a man) without a giant screaming fest from the usual suspects? No way. It's the NYT journos who'd go on the warpath and be twisted by dogma.

acdha · 2025-11-17T16:06:08 1763395568

You’re bringing a lot of emotion to this but not much information or a compelling argument. Perhaps you shouldn’t be leveling accusations of “screaming fest”.

Also, the NYT recently ran a piece asking whether women ruined the workplace. It’s unclear why you think they would “go on the warpath” over a CEO being pushed out for business reasons.

sethops1 · 2025-11-15T11:17:28 1763205448

That "bad leadership" dug AMD out of a hole and transformed the company into a behemoth. From under $2 a share to around $250 in eight years. I'll invest in that kind of bad leadership all day, every day.

happycube · 2025-11-15T12:30:02 1763209802

Yup... the first Ryzen/EPYC chips were literally a saving throw.

AMD's driver/software woes compared to nVidia make more sense when you realize they barely made it here at all.

qcnguy · 2025-11-15T17:57:25 1763229445

You should compare AMD vs its peers, not its even worse prior state.

AMD should by all rights be a strong competitor to NVIDIA with a big chunk of the AI market. They have nearly nothing. The buck should stop at the top, but with AMD it doesn't.

KetoManx64 · 2025-11-16T04:17:24 1763266644

This used to be impressive, but then you look at the gains that Bitcoin investors have had and this is quite paltry, especially when you consider that inflation is 8-10% per year.

_zoltan_ · 2025-11-15T13:42:04 1763214124

the two statements can be true at the same time: they can still view software developers as second class while having a great hardware vision.

what OP is saying is that after a point it doesn't matter how good your HW is if your SW stack is bad.

jacobgorm · 2025-11-15T07:51:53 1763193113

And the developer experience is horrible when working with AMD. They don’t even accept driver crash bug reports.

donaldihunter · 2025-11-15T11:22:56 1763205776

People say that as if the Nvidia experience is better. Nvidia also has a horrible developer experience.

kg · 2025-11-15T11:37:50 1763206670

YMMV but I reported a crash in Nvidia's vulkan driver and they responded promptly and fixed it.

_zoltan_ · 2025-11-15T13:40:35 1763214035

Huh? I've been developing against the Nvidia ecosystem for years. Just build a container and you're done. They even provide base containers.

Anything specific related to DC level computing?

Nathanba · 2025-11-15T09:35:37 1763199337

I just saw that Nvidia even maintains their own fork of Unreal Engine. AMD isn't even competing.

moomin · 2025-11-15T11:58:14 1763207894

nVidia has been deeply involved in the software side, first with gaming, forever. It’s written into their DNA. Even when ATI/AMD could outperform them in raw hardware, nVidia worked well with every last game and worked with individual developers even writing some of their code for them.

skeptrune · 2025-11-15T07:35:34 1763192134

I appreciate that there are people in academia working on this problem, but it seems like something AMD would have to fix internally if they were serious.

amelius · 2025-11-15T10:58:59 1763204339

I personally prefer the hardware companies making just hardware.

Keeps the incentives pure.

I'm even willing to accept a 20% performance hit for this requirement, should someone bring that up.

jack_tripper · 2025-11-15T11:41:04 1763206864

>I personally prefer the hardware companies making just hardware. Keeps the incentives pure.

That's self contradictory. Their incentive is to sell more HW and at higher prices using whatever shady practices they can get away with, software or no software. There's nothing pure about that, it's just business. High end chips aren't commodity HW like lawnmowers, they can't function without the right SW.

And this isn't the 90s anymore, when Hercules or S3 would only make the silicon, and then system integrators would write the drivers for it, which was basically MS-DOS calls to read/write to registers via the PCI bus, by the devs reading a 300 page manual, those days are long gone. Modern silicon is orders of magnitude more complex, to the point that nobody else besides the manufacturer could write the drivers for it to extract the most performance out of it.

>I'm even willing to accept a 20% performance hit for this requirement, should someone bring that up.

I'm also willing to accept arbitrary numbers I make up, as a tradeoff, but the market does not work like that.

LeifCarrotson · 2025-11-15T12:58:26 1763211506

> ...by the devs reading a 300 page manual, those days are long gone. Modern silicon is orders of magnitude more complex, to the point that nobody else besides the manufacturer could write the drivers for it...

The 300 page manual would be 3,000 or 30,000 pages long, if modern ARM ISR manuals are any indication. Independent developers could totally write performant drivers if they had the documents, but those manuals do not exist - or if they do, they're proprietary.

bigbadfeline · 2025-11-15T18:31:01 1763231461

> Independent developers could totally write performant drivers if they had the documents,

Surely they could, but at that complexity level, they wouldn't put the necessary amount of effort into it without being paid, and at that point, it's better to hire them.

> but those manuals do not exist - or if they do, they're proprietary.

And there are market-related reasons for that; it's not done because of some arbitrary paranoia. Another important issue is that good documents are hard to write. With regard to driver coding, it's much easier to make a quick call or message the hardware people about some unclear aspect of the chip's operation than to go through the formal process of modifying the official documents. Waiting for external developers to reverse engineer that is slow and leads to serious competitive disadvantages, and AMD is an example of it.

amelius · 2025-11-15T19:00:20 1763233220

> Surely they could, but at that complexity level, they wouldn't put the necessary amount of effort into it without being paid, and at that point, it's better to hire them.

The assumption that no good software will be written without pay is outdated as FOSS disproved it many times over.

bigbadfeline · 2025-11-16T01:15:26 1763255726

> The assumption that no good software will be written without pay is outdated as FOSS disproved it many times over.

I think I made it clear that the necessary effort has to measure up to the complexity level, modification volume and time constraints typical for competitive GPU hardware - hasn't happened without pay for any GPU.

That may change if most of the GPU drivers are moved on-chip, which should have happened earlier but there's a lot of politics involved there too, so who knows.

MaxBarraclough · 2025-11-15T13:50:22 1763214622

> nobody else besides the manufacturer could write the drivers for it to extract the most performance out of it

Let's not go too far here. Reverse engineering and independent development of usable drivers are not impossible, they're 'merely' extremely challenging. Alyssa Rosenzweig in particular had great success reverse engineering the Apple M1 GPU and writing drivers for it, and that was just a few years ago.

https://en.wikipedia.org/wiki/Alyssa_Rosenzweig#Career

https://news.ycombinator.com/item?id=45034537

jack_tripper · 2025-11-15T16:57:40 1763225860

The M1 launched in 2019 and FOSS drivers are still not on par with macOS. You can't stay in business waiting for 6 years for someone else to make drivers for your HW.

This is just a HN fantasy that's not compatible with the business of making money. That's why everyone here makes money working in SW.

amelius · 2025-11-15T18:54:38 1763232878

> The M1 launched in 2019 and FOSS drivers are still not on par with macOS.

That's mostly because the documentation was never released.

Honestly, it makes no sense to try to suggest that FOSS can't write decent software when reality shows otherwise.

surajrmal · 2025-11-15T14:48:06 1763218086

It's extremely expensive and you will be late to the market by at least one generation. It's not economically viable outside of hobbyist situations.

amelius · 2025-11-15T12:06:17 1763208377

> Their incentive is to sell more HW and at higher prices using whatever shady practices they can get away with, software or no software.

And you don't think these shady practices will leak into the software?

> Modern silicon is orders of magnitude more complex, to the point that nobody else besides the manufacturer could write the drivers for it...

The hardware people at the manufacturer are not the software people. So there __must__ be documentation.

jack_tripper · 2025-11-15T16:55:27 1763225727

>So there __must__ be documentation.

YES, internal documentation, full of proprietary IP.

amelius · 2025-11-15T18:53:45 1763232825

But not releasing said documentation is a choice. And that goes against my original point.

arcbyte · 2025-11-15T12:53:35 1763211215

> the market does not work like that.

That depends on whether OP is buying/renting AMD gpu machines.

jack_tripper · 2025-11-15T16:13:53 1763223233

Can you elaborate on that?

andruby · 2025-11-15T13:26:48 1763213208

Unfortunately hardware can’t exist anymore without software. Everything non-trivial needs firmware or microcode.

And depending on others to write firmware for your hardware, I don’t think that’s a recipe for success.

amelius · 2025-11-15T13:50:54 1763214654

Software team at AMD to hardware team at AMD: "Give us the hardware with the docs then we will write software for it"

Hardware team at AMD: "Sorry, hardware can't exist without software; we'll first have to write the software"

Software team: "But we're the software team ..."

Hardware team: "Uhm yeah ... seems we have a nasty chicken and egg problem here"

matt-p · 2025-11-15T14:08:24 1763215704

That means 25% more datacentre/grid capacity. Genuinely I think most companies are not happy to fund that, in order to save marginally in other areas.
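A rough sketch of where the 25% figure comes from, assuming (as a simplification not stated in the thread) that total throughput scales linearly with the amount of hardware deployed. The function name `extra_capacity_fraction` is hypothetical and used only for illustration.

```python
def extra_capacity_fraction(perf_hit: float) -> float:
    """Extra hardware (as a fraction of the original fleet) needed to keep
    total throughput constant after a relative per-device performance hit.

    Assumes throughput scales linearly with device count.
    """
    if not 0 <= perf_hit < 1:
        raise ValueError("perf_hit must be in [0, 1)")
    # Each device now delivers (1 - perf_hit) of its former throughput,
    # so the fleet must grow by a factor of 1 / (1 - perf_hit).
    return 1 / (1 - perf_hit) - 1


if __name__ == "__main__":
    print(f"{extra_capacity_fraction(0.20):.0%}")  # prints 25%
```

Under that assumption, a 20% hit per device means each device does 0.8 of the work, so the fleet has to be 1/0.8 = 1.25 times as large, which is the 25% quoted above.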

ngcc_hk · 2025-11-15T13:57:58 1763215078

Is Apple a HW or SW company… or is that the wrong question? Why does a company have to be a HW or SW one?

If Nvidia dominates because of CUDA, why can it do that but AMD should not?

tester756 · 2025-11-15T11:58:35 1763207915

You alone are... a pretty small market niche, I'd say.

amelius · 2025-11-15T13:23:41 1763213021

This is a silly thing to say. Right now there are probably thousands of hackers dying to get their hands on the M-series CPU documentation from Apple.

tester756 · 2025-11-15T19:46:01 1763235961

Thousands is niche. There are tens of millions of Apple laptops with Mx CPUs sold.

_zoltan_ · 2025-11-15T13:37:53 1763213873

This thread is about DC level HW, not consumer electronics.

amelius · 2025-11-15T13:42:44 1763214164

Apple's silicon would be used in DCs if they were more open. But sadly, Apple's offerings are a combination of hardware and software, and therefore not open by necessity.

musebox35 · 2025-11-15T13:59:05 1763215145

In certain contexts 20% is a lot of bucks; leaving that on the table would be very wasteful ;-)

amelius · 2025-11-15T14:02:45 1763215365

Yes, it would be 20% wasteful. But giving up freedom can be more costly.

Also, the 20% would be open to further optimization by the community, so it wouldn't be that bad in practice, probably.

musebox35 · 2025-11-15T16:12:12 1763223132

In some commercial contexts, with the savings from that 20% you can buy a lot of freedom, and then with the freedom you bought you can make more free things :)

amelius · 2025-11-15T16:32:44 1763224364

Could all be true, maybe, somehow. But I sleep better when my castle is not in someone else's kingdom. That alone is enough for me to accept the small performance penalty.

_zoltan_ · 2025-11-15T13:37:01 1763213821

20% performance on a 10GW DC? Suuuuuuure....

amelius · 2025-11-15T13:43:14 1763214194

What do absolute numbers have to do with it?

aabhay · 2025-11-15T09:23:49 1763198629

Except that this same team built a similarly named software package for Nvidia GPUs as well. It’s bright researchers doing what they do best if you ask me.

sigmoid10 · 2025-11-15T10:35:43 1763202943

Except that this other package also only came out last year and has contributed zero to Nvidia's current status. If AMD ever wants to be taken seriously in this market, they will need to start making their own software good instead of relying on "open source" in the mistaken belief that someone else will fix their bad code for free. Nvidia spent more than a decade hiring top talent and getting their proprietary software environment right before they really took off. And some of the older ML researchers here will certainly remember it wasn't pain-free either. But they didn't just turn the ship around, they turned it into a nuclear aircraft carrier that dominates the entire world.

stingraycharles · 2025-11-15T12:30:26 1763209826

Yeah, honestly I’m dumbfounded why, after all these years, AMD still hasn’t declared an internal “code red” and gotten their developer experience up to par with CUDA.

martinpw · 2025-11-15T14:21:00 1763216460

Check out several interesting comments here from AMDAnon for insider insights on this question:

https://news.ycombinator.com/item?id=45923188

stingraycharles · 2025-11-15T17:43:33 1763228613

Man if that is anything close to the truth, it would explain a lot and be pretty depressing. It would imply leadership doesn’t understand software at all, and considers it a liability rather than an asset.

elteto · 2025-11-15T13:24:41 1763213081

It seems that AMD, like many other companies, doesn’t “get” software. It’s a cost-center, a nuisance, not really hard engineering, the community will take care of that, etc. It’s pretty ironic.

jstummbillig · 2025-11-15T13:12:55 1763212375

Yes. Why is that? Somebody here must have an informed opinion. It seems ludicrous, but also too obvious. What's up?

stingraycharles · 2025-11-15T13:54:49 1763214889

I took their acquisition of Xilinx (the FPGA company) as a sign that they were going all in on a uniform FPGA / GPU / AI ecosystem, but… that doesn’t seem to have yielded any integration benefits?

I’m genuinely dumbfounded by what’s up at AMD at this point.

_zoltan_ · 2025-11-15T13:38:20 1763213900

Honestly they should be hired by NVIDIA or AMD.

reactordev · 2025-11-15T13:39:42 1763213982

Fully agree. They punted 10 years ago and are now playing catchup. They have the hardware but can’t manage to unlock its full potential due to them not knowing how to write firmware that does.

Ecko123 · 2025-11-15T09:55:13 1763200513

AFAIK, they are already doing it at various levels including working with tinycorp

colordrops · 2025-11-15T08:07:21 1763194041

It's insane to me that AMD is not spending billions and billions trying to fix their software. Nvidia is the most valuable company in the world and AMD is the only one poised to compete.

aabhay · 2025-11-15T09:20:20 1763198420

They are, but the problem is that shifting an organization whose lifeblood is yearly hardware refreshes and chip innovation towards a ship-daily software culture is challenging. And software doesn’t “make money” the way hardware does so it can get deprioritized by executives. And vendors are lining up to write and even open source lots of software for your platform in exchange for pricing, preference, priority (great on paper but bad for long term quality). And your competitors will get ahead of you if you miss even a single hardware trend/innovation.

keyringlight · 2025-11-15T10:48:26 1763203706

There was a podcast episode linked here a while ago about how the software industry in Japan never took off as it did in America and it was a similar conclusion. According to the host, the product being sold was hardware, and software was a means to fulfill and then conclude the contract. After that you want the customer to buy the new model, primarily for the hardware and software comes along for the ride.

It should be obvious by now though that there's symbiosis between software and hardware, and that support timescales are longer. Another angle is that it's about more than just AMD's own software developers: there are also the developers making products for their own customers, who in turn buy AMD's hardware if everyone works together to make those products run well. It's that second group of developers AMD needs to engage with, in a way that makes their efforts feel welcome.

nikanj · 2025-11-15T11:03:36 1763204616

Hardware is a profit center, software is a cost center, and they get treated accordingly

david-gpu · 2025-11-15T08:23:21 1763195001

I worked at a number of GPU vendors, and it felt like Nvidia was the only one that took software as an asset worth investing in, rather than as a cost center. Massively different culture.

LarsDu88 · 2025-11-15T09:31:20 1763199080

From this writeup it does sound like the architecture of the AMD GPU makes it a bit harder to optimize. It also seems like the AMD approach may scale better in the long run: 8 chiplets rather than 2 for the Nvidia offering, along with all the associated cache and memory locality woes.

The future will probably see more chiplets rather than fewer, so I wonder if dealing with the complexity here will pay more dividends over time.

WithinReason · 2025-11-15T09:39:45 1763199585

AMD doesn't need warp specialisation for high performance while nvidia does, which simplifies programming AMD

boxerab · 2025-11-15T13:54:10 1763214850

This is a great project, but the bigger question is: why isn't AMD doing this themselves? It continues to boggle my mind how much they don't seem to get the importance of a mature software stack when it is so obviously the key to the success of team red. A stack that can be used for EVERY card they produce, like CUDA, not just a select few. I used to believe that AMD the underdog would catch up some day, but I've more or less given up on them.

alex1138 · 2025-11-15T05:46:31 1763185591

It's not my favorite internet meme but I'm tickled to see "go brr" on a website/university like Stanford

microtonal · 2025-11-15T06:56:12 1763189772

They already "went brr" when they announced ThunderKittens a year ago: https://hazyresearch.stanford.edu/blog/2024-05-12-tk

Skunkleton · 2025-11-15T07:38:35 1763192315

This meme is tired. Let it rest boss.

rightbyte · 2025-11-15T09:04:05 1763197445

Usually a sign that it is not cool anymore and the kids need to make something new up.

bsaul · 2025-11-15T10:59:49 1763204389

Side question: how is Mojo doing in that regard? I thought their idea was to improve devX on AMD's GPUs?

badgersnake · 2025-11-15T09:08:20 1763197700

I quite like that AMD aren’t so popular with the AI bubble. It means I can play games without getting a mortgage.

J_Shelby_J · 2025-11-15T10:14:03 1763201643

How about those AMD crashes though? All my friends on AMD CPUs have had a hell of a last two years, with constant crashes in Unreal Engine games. Meanwhile, I made fun of myself for buying an ancient 11 series, which is a decade-old arch at this point but is rock solid.

Balinares · 2025-11-15T10:51:58 1763203918

AMD CPU, AMD GPU, zero crashes here. No crashes on the Steam Deck either, which is also 100% AMD.

The common denominator to the crashes you mention might possibly not be AMD? Do your friends perchance play on Windows?

viktorcode · 2025-11-15T10:24:27 1763202267

Just to point out that those crashes are specific to Windows: current generation of consoles run the same UE games with no crashes.

badgersnake · 2025-11-15T17:32:49 1763227969

RX 9070 here. Works fine for me on Linux, also runs ollama-rocm quite well, but don’t tell anyone.

akazantsev · 2025-11-15T15:54:33 1763222073

Had those due to insufficient cooling in the case. Tell him to run the games without the side panel. I installed additional fans later and have had no such issue ever since. xt 7900

tryauuum · 2025-11-15T13:10:01 1763212201

it took me around half a year to get an AMD integrated GPU working on linux

    AMD Ryzen 7 PRO 8700GE w/ Radeon 780M Graphics

the solution was adding amdgpu.ppfeaturemask=0xffff7fff to the command line. Before that I could reliably crash the driver with firefox.
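To make the workaround above less opaque, here is a small sketch that shows which bit that mask turns off relative to an all-ones 32-bit mask. The helper `cleared_bits` is hypothetical. To my understanding bit 15 (0x8000) is the GFXOFF power feature flag in the kernel's amdgpu headers, but treat that naming as an assumption and check it against your own kernel sources.

```python
ALL_FEATURES = 0xFFFFFFFF  # every bit of a 32-bit feature mask set
MASK = 0xFFFF7FFF          # value passed as amdgpu.ppfeaturemask above


def cleared_bits(mask: int, width: int = 32) -> list[int]:
    """Return the positions of the bits that are 0 in a fixed-width mask."""
    return [bit for bit in range(width) if not (mask >> bit) & 1]


if __name__ == "__main__":
    print(cleared_bits(MASK))         # [15]
    print(hex(ALL_FEATURES & ~MASK))  # 0x8000
    # Assumption: 0x8000 is the GFXOFF feature bit; verify in your kernel headers.
```

In other words, the parameter leaves every power-play feature bit set except a single one.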

tester756 · 2025-11-15T12:13:20 1763208800

My AMD CPU died after 9 months. I got a refund, but it still leaves a bad taste.

snovv_crash · 2025-11-15T19:06:58 1763233618

Contrast that with Intel's last generation of chips, all of which started failing after a similar time period. AMD only need to be better than the competition.

x3n0ph3n3 · 2025-11-15T09:13:17 1763197997

Their linux driver support isn't so great, though. I really considered an AMD GPU for my most recent build, and based on the driver support for just the integrated graphics on my new AMD CPU (7900X), I opted for an NVidia card instead.

esseph · 2025-11-15T10:02:53 1763200973

I have a 9060 in one PC and a 9070 in another, on Fedora 43.

It runs great. I run all my Steam stuff through them. The days you mention have been gone for quite a while.

mkayokay · 2025-11-15T10:07:39 1763201259

I'm running a 6900XT on Arch and have had no problems so far. Steam, Heroic launcher and every game I've tried so far worked like a charm. You can even OC with LACT [1] if you want to.

[1] https://github.com/ilya-zlobintsev/LACT

abenga · 2025-11-15T12:09:30 1763208570

How so? Switching from an Nvidia card to an AMD one I am now able to upgrade my kernel whenever without getting a blinking cursor after reboot. How are in-tree drivers worse than whatever Nvidia does?

blactuary · 2025-11-15T13:53:19 1763214799

Quite the opposite these days. AMD just works and Nvidia is a crapshoot

avidphantasm · 2025-11-15T12:59:58 1763211598

This is great, but why does the write-up read like it was written by someone with brain damage?

gortok · 2025-11-15T13:17:46 1763212666

The writing is laughably bad. I can’t tell if it’s someone that over relied on AI or if they just mimic the structure and mannerisms of AI produced writing because that’s what they see.

A few choice examples:

> Checkout part one of this series for an intro to HipKittens and checkout this post for a technical deep dive.

> Unsurprisingly, making AMD GPUs go brr boils down to keeping the “matrix cores” (tensor cores on NVIDIA) fed.

> These two patterns tradeoff programmability and performance, where 8-wave and its large tile primitives lead to compact code and 4-wave fine-grained interleaving expands code size. Surprisingly, the 8-wave schedule is sufficient to achieve SoTA-level performance on GEMMs and attention forwards. For GQA non-causal attention backwards, 8-wave also outperforms all AMD baselines by 1.8×, and our HK 4-wave further outperforms by 2.3×.

And I could go on. And on.

But overall, besides the overuse of cliche/memespeak in places where it doesn’t make sense, the entire section that deals with the hot loop describes something that should be explained in a graph and is instead explained in 100 lines of source code.

beepbooptheory · 2025-11-15T13:32:39 1763213559

Am I crazy? What is wrong with any of those quotes?