How does LPDDR5 (this Xe3P) compare with GDDR7 (Nvidia's flagships) when it comes to inference performance?
Local inference is an interesting proposition because today, in real life, the NVIDIA H100/H200 and AMD MI300 clusters operated by OpenAI and Anthropic run in batching mode, which slows users down: you're forced to wait until enough similarly sized queries arrive to fill a batch. For local inference there's no queue to wait in, so an individual user can potentially see lower latency and better per-stream throughput.
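As a rough illustration of that trade-off, here's a toy calculation (all numbers are made-up assumptions, purely illustrative) of how waiting for a batch to fill adds latency that a local, batch-of-one setup never pays:

```python
# Toy model of the batching wait (all numbers are assumptions, purely illustrative).
arrival_rate = 4.0   # shared-cluster request arrivals per second (assumption)
batch_size   = 8     # requests the scheduler collects before running (assumption)
gen_time     = 2.0   # seconds to generate a response once a batch runs (assumption)

avg_batch_fill_wait = (batch_size / 2) / arrival_rate  # average wait for the batch to fill
print(f"cloud (batched): ~{avg_batch_fill_wait + gen_time:.1f}s per response")
print(f"local (batch of 1): ~{gen_time:.1f}s per response")
```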
I think the better comparison, for consumers, is how fast LPDDR5 is compared to the regular DDR5 attached to your CPU.
Or, to be more specific: what speed do you get when your GPU runs out of VRAM and has to read from main memory over the PCIe bus?
PCIe 5.0: 64GB/s @ x16 or 32GB/s @ x8
2x 48GB (96GB) of DDR5 in an AM5 rig: ~50GB/s
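Those figures can be sanity-checked with a quick back-of-envelope calculation. These are theoretical peaks; sustained real-world bandwidth is lower, which is where the ~50GB/s DDR5 number comes from, and the DDR5 speed grade below is an assumption:

```python
# PCIe 5.0: 32 GT/s per lane with 128b/130b encoding -> ~3.9 GB/s per lane, per direction
pcie5_lane_gbs = 32e9 * (128 / 130) / 8 / 1e9
print(f"PCIe 5.0 x16: {pcie5_lane_gbs * 16:.0f} GB/s, x8: {pcie5_lane_gbs * 8:.0f} GB/s")

# Dual-channel DDR5-6000 (a common AM5 configuration -- assumption): 6000 MT/s * 8 B * 2 channels
ddr5_peak_gbs = 6000e6 * 8 * 2 / 1e9
print(f"DDR5-6000 dual-channel theoretical peak: {ddr5_peak_gbs:.0f} GB/s (sustained is well under this)")
```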
A card like this, at ~300GB/s+, is a lot faster than that, which matters for large 'dense' models. Yes, an NVIDIA 3090 has ~900GB/s of bandwidth, but only 24GB of VRAM, so a card like this Xe3P is still likely to 'win' because of the larger memory capacity.
Even if it's a third of the speed of an old NVIDIA card, it's still 6x+ the memory bandwidth you can get in a desktop today.
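To put that in tokens per second: batch-1 decoding of a dense model is roughly memory-bandwidth bound, so tokens/s is on the order of bandwidth divided by the bytes of weights read per token. A sketch with assumed numbers (the model size and bandwidths are illustrative, not measurements):

```python
# Back-of-envelope decode speed for a dense model at batch size 1, where generation
# is roughly memory-bandwidth bound: tokens/s ~ bandwidth / bytes of weights read per token.
model_gb = 70 * 0.5  # e.g. a 70B-parameter model quantized to ~4 bits ≈ 35 GB of weights (assumption)

for name, bw_gbs in [("CPU DDR5 (~50 GB/s)", 50),
                     ("LPDDR5 card (~300 GB/s)", 300),
                     ("RTX 3090 GDDR6X (~900 GB/s)", 900)]:
    print(f"{name}: ~{bw_gbs / model_gb:.1f} tokens/s")
# Note: the 3090 row is hypothetical here -- 35 GB of weights doesn't fit in its 24GB
# of VRAM, so in practice it falls back to the much slower PCIe path above.
```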
I asked GPT to pull real stats on both. It looks like the 50-series' GDDR7 bandwidth is about 3x that of the Xe3P, but it reminded me that this new Intel card is designed for data centers and draws much less power, and that the comparable Nvidia server cards (e.g. the H200 with HBM) have even faster memory than GDDR7, so the gap would be even larger for cloud compute.