NVidia is making boatloads of money because their driver works and they have a software library called CUDA that accelerates neural networks.

Nobody expects AMD to match them, but George thought they could at least write a GPU driver. If AMD can get that driver out, then George could provide a competitor to CUDA (for neural networks only). They'd both make boatloads of money.

However, AMD was less capable than expected and their drivers were too buggy to run neural networks like those needed for the MLPerf benchmark. So now, it appears that AMD, Tinybox, and investors like me won't be making boatloads of money.




>However, AMD was less capable than expected and their drivers were too buggy to run neural networks like those needed for the MLPerf benchmark. So now, it appears that AMD, Tinybox, and investors like me won't be making boatloads of money.

This is where the melodrama kicks in.

They reversed that decision less than a week later and now they're back on AMD again.

AMD plans go "on hold" - March 19: https://twitter.com/__tinygrad__/status/1770151484363354195

AMD plans restarted - March 25: https://twitter.com/__tinygrad__/status/1772139983731831051


> This is where the melodrama kicks in.

The complaint was that they couldn't debug the GPU.

Now they can, so they're soldiering on.

Seems reasonable to me.


Surely if they had done a good job communicating with AMD they could have figured that out beforehand. The repo was already public.


He talks in his streams about how horrible the communication he's received from AMD is. From what I listened to, that seemed to be more of why he was giving up initially. Why do a bunch of free work for a massive company that won't even communicate with you? Especially when he's doing them a massive favor that requires next to no investment on their end?


It's not just this, though. AMD cards are priced better, and NVIDIA is overpriced because of the lack of competition.

I would love to use AMD for my ML experiments and I would love to see healthy competition benefit this field.


Why is it that AMD seemingly can't act the way a sane individual would?


There's a chance they (maybe specifically the lawyers) know something we don't. I mean, maybe they are absurdly incompetent in listening to feedback, while at the same time achieving technically great things in hardware. But after so many years and seeing all the AI money going to the competitor... That seems less and less likely every day.


What might that be, for example?

"Organizationally unable to make competent software, perfectly able to make great hardware" seems to be the common case with hardware companies, if not de facto standard. Exceptions are rare.


Apart from "trying to implement this will cost us more in CUDA API copying lawsuits then it could earn", I don't know.

But it's not just that they can't make competent software. It's that everyone tells them they should try, that it looks like a pile of money ready to pick up, that people try doing it on their own... and AMD does nothing. They're not even taking the chance to fail/succeed. Can you imagine that Lisa Su doesn't get asked about this at least once a week?


They are already trying to copy the CUDA API with ROCm and HIP. If that were lawsuit-worthy, they could already be hit with a lawsuit at any moment.
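
To make the "copying" concrete, here is a minimal sketch of what HIP code looks like (a hypothetical vector-add; assumes hipcc and the ROCm HIP runtime are installed). It's essentially CUDA with the cuda* prefixes renamed to hip*:

    #include <hip/hip_runtime.h>

    // Same __global__ kernel syntax as CUDA; only the runtime prefix changes.
    __global__ void add(const float* a, const float* b, float* c, int n) {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) c[i] = a[i] + b[i];
    }

    int main() {
        const int n = 1 << 20;
        float *a, *b, *c;
        // hipMalloc / hipFree / hipDeviceSynchronize mirror their cuda* twins.
        // (Host<->device copies omitted for brevity.)
        hipMalloc((void**)&a, n * sizeof(float));
        hipMalloc((void**)&b, n * sizeof(float));
        hipMalloc((void**)&c, n * sizeof(float));
        add<<<(n + 255) / 256, 256>>>(a, b, c, n);  // identical launch syntax
        hipDeviceSynchronize();
        hipFree(a); hipFree(b); hipFree(c);
        return 0;
    }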


Well, the CEO of AMD and the CEO of that competitor are in the same family. Literally. Maybe they just don't care which entrance the money takes.


One individual doesn't suffer from the problem of being pulled in multiple different directions by multiple people. A company is not typically led by one person "dictator-style" but instead by groups of people who try to make decisions together, sometimes without agreeing.


Sure, but I don't think this is "too many cooks in the kitchen," I think it's the opposite: hardware companies tend to be structurally incapable of spending as much as they should on software because everyone in the hardware space has the same bias. The economics of the space select for it in the short term and against it in the long term, creating the neverending foot-gun party we observe.


AMD has simply never invested into software. Their code has been atrocious since even before AMD bought ATI. ATI "Catalyst Control Center" was their consumer driver code before Vista and into Windows 7 IIRC, and that was utter trash. Granted, nVidia's drivers were ALSO trash back then, accounting for literally 65% of ALL Vista BSODs.

nVidia decided to redouble their efforts; their drivers might still crash occasionally, but they are now far better on stability, and they brought CUDA into the world at the same time.

AMD decided that shitty software didn't seem to stop them from selling GPUs, and besides, they were too busy desperately surviving a decade of Intel's anti-competitive practices that nearly killed the company, so they bet everything on Ryzen. They also worked to make pretty good physical GPU hardware.

Meanwhile, their GPUs still couldn't run Blender as fast as a similarly specced nVidia card because their OpenCL implementation was god-awful: they ran at literally half the render speed of a comparable nVidia GPU. They were stuck on OpenCL 1.x the whole time because the 2.x implementation was literally broken. They nearly didn't have ANY hardware render solution for the Blender 3.x rendering engine update, because OpenCL 1.x literally couldn't do what Blender needed, and ROCm is a joke. AMD engineers helped put together an emergency, late-breaking fix to create a HIP implementation, and that works, at least mostly.

My pet theory is that not only does AMD not give a fuck about software, but they saw how nVidia was struggling with market segmentation from consumer cards being effective compute cards, and didn't want to run into those same struggles if they had a real CUDA competitor. Instead, they got to rub nVidia's face into the dirt with their GPUs that had way more VRAM, and not worry that it would chew into their professional GPU profit margins, because you can't compute on consumer cards.

Oh, and I forgot to mention: the whole time this nonsense was going on, AMD was pushing really hard to get their professional GPUs into supercomputer clusters, and they have several premier supercomputer deployments where their GPUs have no problem being used for top-level compute tasks, almost like they CAN actually write GPU compute software and just don't give it to consumers.


Interesting, thanks.


Slight mistake in your description: CUDA is an out of date API that was replaced by Khronos's own official compute APIs. Khronos is a standards consortium that Nvidia is a founding member of.

Although the marketing department at Nvidia still pushes for greenfield CUDA codebases, no new code should be written in it; developers should opt for open-source international standards only. Khronos APIs are implemented by over 120 vendors.


From my past experience in Khronos, NVIDIA is indeed a member and they sent decent people to the meetings -- but for strategic reasons, rather than to "advocate for open standards" as you described. My experience there actually told me the opposite: they will never drop CUDA. Objectively, they also have an incentive to keep it: fighting with 100-ish companies to ratify something is always slower than rolling out a feature in an ecosystem you control completely.


Not just that, but they get to abuse their near monopoly to strongarm many companies into using Cuda. Can't say more unfortunately...


CUDA is a vendor lock-in mechanism and they’ll happily endorse whatever Khronos says is standard… after they lose dominance.

It took 30 years for that to happen with x86, and only just barely.


That's outright false and just wishful thinking from you.


Show me the announcement deprecating CUDA


Why do I have the feeling that CUDA will outlive whatever Khronos has proposed (are you referring to SYCL)?


How many standards has Khronos endorsed for compute over the years/decades? From an uneducated, outside view, it seems like there's a new standard every 2-5 years.


Ultimately two: OpenCL, and Vulkan (via its compute shaders).

SYCL's job isn't that; it's meant to abstract implementations of common components across different kinds of hardware, and it doesn't force you into any particular style of implementation. As in, I could write a component for SYCL for my GPU in OpenCL, and what SYCL would abstract away from the consumer of my component would be the entire usage of OpenCL itself; but I could also write a component for a DSP using an entirely closed-source, entirely opaque SDK for that hardware, and a SYCL user could use that implementation for that function if they owned that DSP (instead of a CPU-based or GPU-based implementation).
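
For illustration, here is a minimal SYCL 2020 sketch of that idea (assuming a conforming implementation such as DPC++ or AdaptiveCpp): the kernel is written once, and the backend behind the queue -- OpenCL, CUDA, HIP, or a plain CPU -- is picked by the runtime.

    #include <sycl/sycl.hpp>
    #include <vector>

    int main() {
        std::vector<float> a(1024, 1.0f), b(1024, 2.0f), c(1024);
        // The queue's backend is chosen by the SYCL runtime at startup.
        sycl::queue q{sycl::default_selector_v};
        {
            sycl::buffer<float> A(a.data(), sycl::range<1>(a.size()));
            sycl::buffer<float> B(b.data(), sycl::range<1>(b.size()));
            sycl::buffer<float> C(c.data(), sycl::range<1>(c.size()));
            q.submit([&](sycl::handler& h) {
                sycl::accessor ra(A, h, sycl::read_only);
                sycl::accessor rb(B, h, sycl::read_only);
                sycl::accessor wc(C, h, sycl::write_only);
                // Plain vector add, expressed once for any device.
                h.parallel_for(sycl::range<1>(a.size()),
                               [=](sycl::id<1> i) { wc[i] = ra[i] + rb[i]; });
            });
        } // buffers go out of scope here; results are copied back into c
        return 0;
    }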

Also, Vulkan's compute doesn't replace OpenCL (not even in the sense that Vulkan, as a graphics API, replaces OpenGL); they're different levels of abstraction. Most Vulkan games are written almost entirely in compute shaders (e.g. the powerhouse that is the Doom 2016 and Doom Eternal engines, which is why they perform so fucking amazingly on paltry hardware like the original-revision Xbox One, or hell, even the Switch).

In addition, I almost consider DX12 a flavor of Vulkan. Same job, written largely by the same people from the same companies, but instead of being OpenGL C-dialect flavored, it's D3D C++-dialect flavored; they both have entirely equivalent APIs that often call the same driver internals and produce nearly identical MIR. Microsoft did this on purpose to reflect the nature of how modern GPUs are almost entirely software renderers, sans certain parts of the texture units.


That’s not a mistake, you just have a different preference.



