AMD Instinct MI325X in Q4 2024, 288GB of HBM3E

shrubble · 2024-06-03T04:52:02 1717390322

The claim that the next generation would be 35x faster, felt like an "Osborne moment" to me, but if demand is robust enough...

Netcob · 2024-06-03T09:18:03 1717406283

In AI, that doesn't sound too surprising to me right now.

I just experiment with some local LLMs, but the differences are pretty huge:

Llama 3 8B, Raspberry Pi 5: 2-3 Tokens/second (but it works!)

Llama 3 8B, RTX 4080: ~60 Tokens/second

Llama 3 8B, groq.com LPU, ~1300 Tokens/second

Llama 3 70B, AMD 7800X3D: 1-2 Tokens/second

Llama 3 70B, groq.com LPU, ~330 Tokens/second

There seem to be huge gaps between CPU, GPU and specialized inference ASICs. I'm guessing that right now there aren't many genius-level architecture breakthroughs, and that it's more about how much memory and silicon real estate you're willing to dedicate to AI inference.

SushiHippie · 2024-06-03T11:12:20 1717413140

What quantization levels did you use?

I think groq doesn't use quantization, so the gap between your hardware and groq would be even further apart.

kkielhofner · 2024-06-03T12:41:11 1717418471

> I think groq doesn't use quantization, so the gap between your hardware and groq would be even further apart.

To my knowledge this isn't (absolutely) publicly known but users on /r/LocalLLaMA and elsewhere have provided some pretty clear examples that Groq is almost certainly quantized. Which makes sense considering their memory situation...

An entire GroqRack (42U cabinet) has 14GB of RAM which means it likely can't even reasonably run llama3 8b in BF16/FP16. Let alone 70b, Mixtral, etc.

The amount of hardware required to run their public-facing hosted product likely takes up an obscene amount of floor space, even in int4. Their docs for GrowFlow describe int8 quantization but their toolkit is heavily dependent on ONNX, which has had recent tremendous work in terms of different post training quantization strategies and precisions.

However, the power efficiency vs performance is very good, potentially to the point of being able to use very cheap datacenter/co-location space that isn't capable of meeting the power and (air) cooling densities of datacenter AMD and Nvidia GPU products.

Interestingly I have access to a GroqRack system that I'm hoping to be able to spend some time on this week.

SushiHippie · 2024-06-03T13:36:58 1717421818

Ah TIL, thanks for the insights!

Netcob · 2024-06-04T07:58:44 1717487924

I don't remember exactly, whatever came out first on Huggingface I guess. Some Q4 variant probably.

zozbot234 · 2024-06-03T13:39:16 1717421956

> Llama 3 70B, AMD 7800X3D: 1-2 Tokens/second

How much RAM is required for this result? It's quite impressive that it even works as well as it does.

Netcob · 2024-06-04T07:57:00 1717487820

I have 64 GB, but it really depends on the quantization. Looking at LM Studio I see versions ranging from 15 GB to 49 GB, and that's roughly how much RAM they will require.

LM Studio will also let you do partial GPU offloads, but I've only started experimenting with that. The 1-2 Tokens/second value is what I got using GPT4All.

wmf · 2024-06-03T05:08:09 1717391289

Nvidia is doing the same thing. They announced B100 before H200 shipped and a few hours ago they started talking about R100 before B100 shipped.

ipsum2 · 2024-06-03T07:08:15 1717398495

(Re: Osborne effect) It's going to be released in 2 years. Rarely can businesses wait that long, they're going to be ordering the MI300 now.

karma_pharmer · 2024-06-03T08:55:59 1717404959

Or they're trying to distract attention from the fact that they've already sold out 100% of the fab capacity available to produce these chips for the next two years.

So really, they lose nothing. They've already booked sales of everything there is to sell. So might as well now turn attention to those who might be customers two years from now, and make them feel like the wait will be worth it.

latchkey · 2024-06-03T07:23:31 1717399411

[deleted] See below, I did not understand the Osborne effect comment.

ipsum2 · 2024-06-03T07:26:20 1717399580

You're going to wait for the MI350 and not order any more MI300s?

latchkey · 2024-06-03T07:29:05 1717399745

Weird that I got downvoted on the above. I'm buying and deploying MI300x's today and will buy whatever AMD comes out with next.

ipsum2 · 2024-06-03T07:30:26 1717399826

You were probably downvoted because you were shilling your company, and you misunderstood the comment.

"The Osborne effect is a social phenomenon of customers canceling or deferring orders for the current, soon-to-be-obsolete product as an unexpected drawback of a company's announcing a future product prematurely. It is an example of cannibalization."

latchkey · 2024-06-03T08:06:40 1717402000

Shilling is ok on a topic directly related to my business.

You're right on the Osborne effect though! Thanks for that. We are definitely not doing that.

To clarify: When we started, MI300x was not officially announced yet, so we were planning on buying MI250's. Due to everything taking longer than expected around starting the business and receiving funding, by the time we had money in the bank, it was time to buy MI300x. Going forward, we are buying MI300x today and will continue to buy AMD MI series as they are released in the future.

krasin · 2024-06-03T08:36:20 1717403780

Since we're on the topic of your business: I am training a decent amount of neural nets nowadays (mostly, around the new-gen robotics policies) and use vast.ai instances with 8x RTX 4090 cards.

I've been interested to give 8x MI300x a try, as they are supposed to be cheaper per FLOPs, but it looks like your service does not provide on-demand pay-per-second instances. Any plans to change that?

latchkey · 2024-06-03T08:56:17 1717404977

I would love nothing more than to be able to enable on-demand GPUs, but unfortunately this is a limitation from AMD right now. We can't do PCIe pass through to a virtual machine, it just doesn't work. This is why our minimum is 8 right now. If you look at all of our competitors, they have the same issue. Even Azure "VM", is 8 at a time, but they are all sold out due to high demand.

It kind of makes sense since their history is only supporting the high end GPUs in their HPC solutions, where they don't use VM's. They've committed to us directly that they will fix this issue.

I updated our pricing page to note this.

krasin · 2024-06-03T19:15:06 1717442106

> I would love nothing more than to be able to enable on-demand GPUs, but unfortunately this is a limitation from AMD right now. We can't do PCIe pass through to a virtual machine, it just doesn't work. This is why our minimum is 8 right now. If you look at all of our competitors, they have the same issue. Even Azure "VM", is 8 at a time, but they are all sold out due to high demand.

Thank you for the response!

Renting 8 GPUs at once is fine and desired; it's not the issue.

The issue is that right now one has to commit to at least 1 week of use; my use patterns are bursty and it does not map well to the current proposition.

latchkey · 2024-06-03T23:33:37 1717457617

This is very good feedback. We are just getting off the ground and on/off-boarding is still a bit of work for us.

Right now, we are trying to attract people who are a mix of wanting to kick the tires on a new product, as well as take compute for longer term.

I did mention in the pricing section that we can store your data locally, as part of the advertised pricing. This is our effort to recognize your use case.

I also understand that you want to optimize and don't want to pay for something that you're not using. We will eventually get to that point, but honestly just not there yet.

Think also from our end, we have these GPUs and if you're not using them... then who is? We've put out the capex/opex to make these available to you at any time, so the only way to be efficient on our side is to do a week long block right now.

Regardless, if you want to reach out to me directly, please do so. Maybe there is a middle ground we can both work from. Happy to consider all options and getting in early with us will always have first mover advantages.

krasin · 2024-06-04T04:56:48 1717477008

Thank you for being open; if my spend on vast.ai grows another 10X, I will consider reaching out. Right now, I am still a fairly small fish for your supercomputer.

trogdor · 2024-06-03T17:12:18 1717434738

>I updated our pricing page to note this

Nice job on your website!

Sincerely,

Someone who recently gave you a hard time for not having info on your website.

latchkey · 2024-06-03T23:42:45 1717458165

Thank you! That is awesome feedback. Your hard time helped motivate me to work on it as more of a priority.

Havoc · 2024-06-03T08:45:48 1717404348

> 35x increase in AI inference performance compared to AMD Instinct MI300 Series

Even for marketing claims that’s pretty wild.

Still lots of trajectory left in just scale up plan it seems

layoric · 2024-06-03T09:34:48 1717407288

I think there is a close limit considering most of these gains are coming from the reduced memory bandwidth consumption that comes with the smaller data types. This would line up with Nvidia’s crazy graph from yesterday where data types were specified.

How much lower can these go though? 2bit? 1.58bit? 1bit? It seems that these massive gains have a very hard stop to gains that AMD and Nvidia will use to raise their stock price before it all comes to a sudden end.

jauntywundrkind · 2024-06-03T03:25:29 1717385129

Such a weird & cruel modernity, where these releases are purely in the abstract. No, you still won't be able to buy a MI300X in Q4 2024. The enhanced edition will absolutely not be available.

(I miss the old PC era where the world at large was benefiting in tandem from new things happening (or falling behind from not adapting)).

latchkey · 2024-06-03T08:10:18 1717402218

The focus is on hyperscalers and cloud service providers now. Even Groq stopped selling to retail.

karma_pharmer · 2024-06-03T08:56:32 1717404992

Welcome (back) to the age of the mainframe.

Except we call it "cloudframe" now.

mpreda · 2024-06-03T05:19:49 1717391989

I think that's where short-sighted financial gain leads AMD to. Where's the money? -- datacenter. So let's focus the good stuff on datacenter exclusivelly. What about "the rest" (gamers, hobbist, students)? There's no money there, let's give theme crap RDNA that we make sure can't be used for any real work; just pretent we're catering for their needs.

I think their "consumer GPU" did so bad recently that AMD could just as well, you know, simply liquidate the "consumer GPU" division and stop pretending.

I'm in the "consumer GPU" market myself; what AMD GPU do I buy today? -- Radeon Pro VII, launched in 2020 and the best AMD consumer GPU I can find today.

It's such a divide. I could optimize my software for such powerful GPUs as the Mi300 line.. but why do that, given that probably I won't even see one such GPU in my lifetime.

atq2119 · 2024-06-03T05:25:58 1717392358

The RX 7900s are pretty good. You get 24GB of RAM in a consumer GPU. If you're interested in GenAI that's a good offering for your "gamers, students, hobbyists" category.

And they announced a workstation version with 48GB: https://www.phoronix.com/news/AMD-Radeon-PRO-W7900-Dual-Slot

3abiton · 2024-06-03T07:09:15 1717398555

Indeed for me that's the only standout from this release, 48gb not too overpriced.

jacobgorm · 2024-06-03T07:29:37 1717399777

I’ve filed a detailed AMD Windows driver crash bug months ago that is getting totally ignored because all their devs have been moved to working on AI.

re-thc · 2024-06-03T04:53:16 1717390396

> (I miss the old PC era where the world at large was benefiting in tandem from new things happening (or falling behind from not adapting))

Paper launches aren't anything new. It's always been a thing especially in hardware.

wmf · 2024-06-03T05:10:37 1717391437

What is kind of new is that hyperscalers sometimes get chips 1-2 years before the rest. If the new chip is 3x-4x faster that looks pretty unfair.

re-thc · 2024-06-03T05:29:12 1717392552

> What is kind of new is that hyperscalers sometimes get chips 1-2 years before the rest.

That's not new. Replace hyperscalers with crypto miners and we've had a similar situation for a while.

latchkey · 2024-06-03T07:24:25 1717399465

Crypto miners no longer buy up large quantities of GPUs. That market died off when ETH switched from PoW to PoS.

DEADMINCE · 2024-06-03T08:25:37 1717403137

> Crypto miners no longer buy up large quantities of GPUs.

That's news to crypto miners.

latchkey · 2024-06-03T08:30:35 1717403435

I no longer have the 150,000 GPUs I was mining with.

The only ones left standing are the small ones and most of them are just using what they already purchased.

There is no liquidity in the shitcoins to be able to support daily large dumps of their tokens and unless you have almost free power, there is zero profitability.

lukevp · 2024-06-03T15:17:08 1717427828

You need to start a blog latchkey, I see you on here in all sorts of different contexts and it’s always interesting to see what you’re up to these days!

latchkey · 2024-06-03T23:24:28 1717457068

Thanks luke! Super appreciate it. Company website was the first priority, but I've also started on a blog component for it as well. Using bullet.so/notion.so and really enjoying how easy it is to build everything.

forrestthewoods · 2024-06-03T04:14:55 1717388095

> No, you still won't be able to buy a MI300X in Q4 2024

Why not? Because they’re sold out to hyperscalers?

latchkey · 2024-06-03T08:13:19 1717402399

They are not sold out. It is just a lot more work to support retail on a novel new product, so they are focused on hyperscalers and CSP's. Don't forget that high end GPUs are US export controlled as well. They are considered weapons by the government. [See 88 Fed. Reg. 73458 (Oct. 25, 2023) and the Export administration Regulations (EAR)].

almostgotcaught · 2024-06-03T04:50:01 1717390201

> No, you still won't be able to buy a MI300X in Q4 2024.

they're 15k - who exactly is disappointed they won't be able to buy one?

krasin · 2024-06-03T05:21:19 1717392079

People on reddit.com/r/localllama build even more expensive rigs sometimes. Everyone wants to run llama3 70B and eventually the 405B version.

latchkey · 2024-06-03T07:22:42 1717399362

As someone who's buying them, that is not the price. Nor can you just buy one at a time. They are on a OAM/UBB, which is 8.

ein0p · 2024-06-03T07:31:20 1717399880

I’d buy a few at that price. But I can’t.

ai_what · 2024-06-03T05:26:07 1717392367

I mean it's the price of 2 * Apple Mac Pro's. It isn't exactly unheard of.

Just a few weeks ago I spoke to someone who shelled out $10k for running LLM's locally. I've seen more expensive builds as well.

bayindirh · 2024-06-03T09:06:18 1717405578

> who exactly is disappointed they won't be able to buy one?

HPC centers and research clusters.

nabla9 · 2024-06-03T10:22:35 1717410155

AMD comparison:

  8x AMD MI300X (192GB, 750W) GPU
  8x H100 (80GB, 700W) GPU

What would be the result against

  8x H100 NVL (188GB, <800W) GPU

?

DrNosferatu · 2024-06-03T12:57:33 1717419453

Is the software stack working (for practical use)?

AMD still has to prove themselves in this.