> are we stuck waiting for the 20-25 years for GPU improvements If this turns ou...

coolspot · 2024-12-20T23:46:45 1734738405

LLMs need efficient matrix multiiliers. GPUs are specialized ASICs for massive matrix multiplication.

vlovich123 · 2024-12-21T00:31:56 1734741116

LLMs get to maybe ~20% of the rated max FLOPS for a GPU. It’s not hard to imagine that a purpose built ASIC with maybe adjusted software stack gets us significantly more real performance.

boroboro4 · 2024-12-21T03:35:30 1734752130

They get more than this. For prefill we can get 70% matmul utilization, for generation less than this but we’ll get to >50 too eventually.

bjornsing · 2024-12-22T08:22:10 1734855730

And even when you get to 100% utilization you’ll still be wasting a crazy amount of gates / die area, plus you’re paying the Nvidia tax. There is no way in hell that will go on for 10 years if we have good AGI but inference is too expensive.