How does LPDDR5 (this Xe3P) compare with GDDR7 (Nvidia's flagships) when it comes to inference performance?
Local inference is an interesting proposition because today, in real life, the NVIDIA H100/H200 and AMD MI300 clusters operated by OpenAI and Anthropic run in batching mode, which slows users down: you're forced to wait until enough similarly sized queries arrive to fill a batch. For local inference there's no queue to wait in, so an individual user can potentially see lower latency and better per-stream throughput.
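As a rough illustration of that trade-off, here's a toy calculation (all numbers are made-up assumptions, purely illustrative) of how waiting for a batch to fill adds latency that a local, batch-of-one setup never pays:

```python
# Toy model of the batching wait (all numbers are assumptions, purely illustrative).
arrival_rate = 4.0   # shared-cluster request arrivals per second (assumption)
batch_size   = 8     # requests the scheduler collects before running (assumption)
gen_time     = 2.0   # seconds to generate a response once a batch runs (assumption)

avg_batch_fill_wait = (batch_size / 2) / arrival_rate  # average wait for the batch to fill
print(f"cloud (batched): ~{avg_batch_fill_wait + gen_time:.1f}s per response")
print(f"local (batch of 1): ~{gen_time:.1f}s per response")
```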
I think the better comparison, for consumers, is how fast LPDDR5 is compared to the regular DDR5 attached to your CPU.
Or, to be more specific: what speed do you get when your GPU runs out of VRAM and has to read from main memory over the PCIe bus?
PCIe 5.0: 64GB/s @ x16 or 32GB/s @ x8
2x 48GB (96GB) of DDR5 in an AM5 rig: ~50GB/s
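Those figures can be sanity-checked with a quick back-of-envelope calculation. These are theoretical peaks; sustained real-world bandwidth is lower, which is where the ~50GB/s DDR5 number comes from, and the DDR5 speed grade below is an assumption:

```python
# PCIe 5.0: 32 GT/s per lane with 128b/130b encoding -> ~3.9 GB/s per lane, per direction
pcie5_lane_gbs = 32e9 * (128 / 130) / 8 / 1e9
print(f"PCIe 5.0 x16: {pcie5_lane_gbs * 16:.0f} GB/s, x8: {pcie5_lane_gbs * 8:.0f} GB/s")

# Dual-channel DDR5-6000 (a common AM5 configuration -- assumption): 6000 MT/s * 8 B * 2 channels
ddr5_peak_gbs = 6000e6 * 8 * 2 / 1e9
print(f"DDR5-6000 dual-channel theoretical peak: {ddr5_peak_gbs:.0f} GB/s (sustained is well under this)")
```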
A card like this, at ~300GB/s+, is a lot faster than that, which matters for large 'dense' models. Yes, an NVIDIA 3090 has ~900GB/s of bandwidth, but only 24GB of VRAM, so a card like this Xe3P is still likely to 'win' because of the larger memory capacity.
Even if it's a third of the speed of an old NVIDIA card, it's still 6x+ the memory bandwidth you can get in a desktop today.
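To put that in tokens per second: batch-1 decoding of a dense model is roughly memory-bandwidth bound, so tokens/s is on the order of bandwidth divided by the bytes of weights read per token. A sketch with assumed numbers (the model size and bandwidths are illustrative, not measurements):

```python
# Back-of-envelope decode speed for a dense model at batch size 1, where generation
# is roughly memory-bandwidth bound: tokens/s ~ bandwidth / bytes of weights read per token.
model_gb = 70 * 0.5  # e.g. a 70B-parameter model quantized to ~4 bits ≈ 35 GB of weights (assumption)

for name, bw_gbs in [("CPU DDR5 (~50 GB/s)", 50),
                     ("LPDDR5 card (~300 GB/s)", 300),
                     ("RTX 3090 GDDR6X (~900 GB/s)", 900)]:
    print(f"{name}: ~{bw_gbs / model_gb:.1f} tokens/s")
# Note: the 3090 row is hypothetical here -- 35 GB of weights doesn't fit in its 24GB
# of VRAM, so in practice it falls back to the much slower PCIe path above.
```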
I asked GPT to pull real stats on both. It looks like the 50-series' GDDR7 bandwidth is about 3x that of the Xe3P, but it reminded me that this new Intel card is designed for data centers and draws much less power, and that the comparable Nvidia server cards (e.g. the H200 with HBM) have even faster memory than GDDR7, so the gap would be even larger for cloud compute.