I do a lot of open source LLM research/dev work on a Mac Studio. While it doesn't quite compete in terms of speed with a GPU for standard transformers models, I can run pretty huge models locally. When I'm working with llama.cpp, the speed and the model sizes I can run are very impressive.
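For concreteness, here's a minimal sketch of the kind of local inference I mean, assuming the llama-cpp-python bindings and an already-downloaded quantized GGUF file (the model path and prompt are placeholders, not my actual setup):

    from llama_cpp import Llama  # Python bindings for llama.cpp

    # Placeholder path; any quantized GGUF model works.
    llm = Llama(
        model_path="models/llama-2-70b.Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload every layer to the Metal GPU on Apple Silicon
        n_ctx=4096,       # context window
    )

    out = llm("Explain unified memory in one paragraph:", max_tokens=200)
    print(out["choices"][0]["text"])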
I can also run Stable Diffusion XL in reasonable time frames on an iPad Pro. The current-gen MacBook Pro can perform almost as well as the Mac Studio with an M2 Ultra; the M3 Max just has about half the memory bandwidth (though it's still wild that you can run good-sized local LLMs on a laptop).
If local generative AI becomes a major part of computing in the future, Apple has a huge advantage over the other players out there. This was obvious the second I started working on my Mac Studio. I have spent plenty of time using a traditional GPU setup for LLM work, and yes, it is faster, but the complexity of getting things running is way beyond the average user's ability. Never having to fight with CUDA is amazing, and so far everything else has 'just worked', as is typical of Apple.
If Apple has a team of talented people working to get gen AI performance tuned specifically to their hardware, I suspect we'll see some very competitive offerings in this space.
I feel like there's going to be a lot of movement of AI compute towards the CPU, and Apple's processors show the possibilities.
GPUs happened to have a lot of throughput lying around, so they got put to work, but the importance of having lots of memory to hold huge models, even just for inference, is already clear. I also think future AI will have a lot more going on in 'conventional' compute rather than just large arrays of simple tensor ops.
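A rough back-of-envelope (my own illustrative numbers, not the parent's) shows why memory dominates: a 70B-parameter model quantized to roughly 4 bits per weight already needs close to 40 GB just for the weights, before any KV cache.

    # Back-of-envelope memory estimate for holding a model in memory for inference.
    # The figures below are illustrative assumptions.
    params = 70e9          # 70B parameters
    bits_per_weight = 4.5  # ~4-bit quantization plus per-block scale overhead
    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"~{weights_gb:.0f} GB for weights alone")  # ~39 GB
    # Too big for a 24 GB consumer GPU, comfortable in 64-192 GB of unified memory.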
CPUs will increasingly gain specialist hardware to accelerate AI workloads, beyond what we have now, and it will be less monolithic too: there will probably be a variety of kinds of accelerators.
That will combine well with big main memory and storage that sits ever closer to the CPU, enabling very fast virtual memory. I wouldn't be surprised if we soon see CPUs with high-bandwidth storage as well as high-bandwidth memory.
> CPUs will increasingly gain specialist hardware to accelerate AI workloads
Maybe, but then you're describing a coprocessor rather than the CPU. The CPU portion of the M1 SoC should be simpler than Apple's Intel processors, considering it no longer supports the wide bevy of AVX/SSE instructions in hardware. The goal of the ARM transition is to keep the CPU side as simple as possible to optimize for power.
Personally, I think we're going to see more GPU-style accelerators in the future. People want high-throughput SIMD units, ideally with a good programming framework à la CUDA to tie them together. It makes very little sense to design dedicated inferencing hardware when a perfectly usable and powerful GPU already exists on most phones. It's practically redundant to try anything else.
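As an illustration of that 'one programming model, many accelerators' idea (my example, not the commenter's), PyTorch already lets the same tensor code dispatch to CUDA or to Apple's Metal backend:

    import torch

    # Pick whichever GPU-style accelerator is available: NVIDIA via CUDA,
    # Apple Silicon via Metal ("mps"), otherwise fall back to the CPU.
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    x = torch.randn(4096, 4096, device=device)
    w = torch.randn(4096, 4096, device=device)
    y = x @ w  # the same high-throughput matmul, whichever accelerator runs it
    print(device, y.shape)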