Max RAM for Strix Halo is 128GB. It's not a competitor to the Mac Ultra which goes up to 512GB.
You shouldn't need another GPU to do prompt processing on Strix Halo, since the biggest model it can realistically run is around 70B. Offloading prompt processing wouldn't help much anyway: its GPU is good enough for that, but its memory bandwidth is only 256 GB/s (~210 GB/s effective), and that's the real bottleneck.
Despite the hype, the 512GB Mac is not really a good buy for LLMs. The ability to run a giant model on it is a novelty that will wear off quickly... it's just too slow at that size, and in practice it has the same 30-70B sweet spot you'd get from a much cheaper machine with a discrete GPU, except without the advantage of running smaller models at full GPU-accelerated speed.
2 to 3 tokens per second was actually probably fine for most things last year.
Now, with reasoning and deep-research models, you're going to generate 1,000 or more tokens just from the model talking to itself to figure out what to do for you.
So while everyone's focused on how big a model you can fit in your RAM, inference speed is now more important than it was.
The thinking models really hurt. I was happy with anything that ran at least as fast as I could read, then "thinking" became a thing and now I need it to run ten times faster.
I guess code is tough too. If I'm talking to a model I'll read everything it says, so 10-20 tok/s is well and good, but that's molasses slow if it's outputting code and I'm scanning it to see if it looks right.
counterpoint: thinking models are good since they give similar quality at smaller RAM sizes. if a 16b thinking model is as good as a 60b one-shot model, you can spend more compute without as much of a RAM bottleneck
The $2000 strix halo with 128 GB might not compete with the $9000 Mac Studio with 512 GB but is a competitor to the $4000 Mac Studio with 96 GB. The slow memory bandwidth is a bummer, though.
Not really. The M4 Max has 2x the GPU power, 2.13x the memory bandwidth, and a faster CPU.
The $2000 M4 Pro Mini is more of a direct comparison. The Mini only goes up to 64 GB of RAM, but realistically a 32B model is the biggest you want to run with less than 300 GB/s of bandwidth anyway.
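That rule of thumb falls out of a simple bandwidth calculation: decode on these machines is roughly memory-bound, since each generated token has to stream every active weight through memory once. A minimal sketch (the bandwidth and quantization figures here are my own illustrative assumptions, not from the thread):

```python
# Rough ceiling on decode speed for a memory-bandwidth-bound LLM:
# tokens/sec <= memory bandwidth / bytes of weights read per token.

def decode_tps_ceiling(bandwidth_gbs: float, params_b: float,
                       bytes_per_param: float) -> float:
    """Upper bound on tokens/sec, ignoring compute and overhead."""
    model_gb = params_b * bytes_per_param  # weights streamed per token
    return bandwidth_gbs / model_gb

# 32B model at 4-bit (~0.5 bytes/param) on ~273 GB/s (M4 Pro-class):
print(round(decode_tps_ceiling(273, 32, 0.5), 1))  # ~17 tok/s ceiling

# 70B model at 4-bit on Strix Halo's ~210 GB/s effective bandwidth:
print(round(decode_tps_ceiling(210, 70, 0.5), 1))  # ~6 tok/s ceiling
```

Real throughput lands below these ceilings, but the ratio explains the sweet spot: past ~32B on sub-300 GB/s hardware you drop under reading speed, and thinking models make that much worse.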
The original poster arrogantly and confidently proclaims that a device that costs around $2000 isn't going to be able to compete with a $10,000 SKU of another device.
I'm wondering how you arrive at such a conclusion?