I do a lot of open source LLM research/dev work on a Mac Studio. While it doesn't quite compete in terms of speed with a GPU for standard transformers models, I can run pretty huge models locally. When I'm working with llama.cpp, the speed and the model sizes I can run are very impressive.
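For concreteness, here's a minimal sketch of the kind of local inference I mean, assuming the llama-cpp-python bindings and an already-downloaded quantized GGUF file (the model path and prompt are placeholders, not my actual setup):

    from llama_cpp import Llama  # Python bindings for llama.cpp

    # Placeholder path; any quantized GGUF model works.
    llm = Llama(
        model_path="models/llama-2-70b.Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload every layer to the Metal GPU on Apple Silicon
        n_ctx=4096,       # context window
    )

    out = llm("Explain unified memory in one paragraph:", max_tokens=200)
    print(out["choices"][0]["text"])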
I can also run Stable Diffusion XL in reasonable time frames on an iPad Pro. The current-gen MacBook Pro can perform almost as well as the Mac Studio with an M2 Ultra; the M3 Max just has about half the memory bandwidth (though it's still wild that you can run good-sized local LLMs on a laptop).
If local generative AI becomes a major part of computing in the future, Apple has a huge advantage over the other players out there. This was obvious the second I started working on my Mac Studio. I have spent plenty of time using a traditional GPU setup for LLM work, and yes, it is faster, but the complexity of getting things running is way beyond the average user's ability. Never having to fight with CUDA is amazing, and so far everything else has 'just worked', as is typical of Apple.
If Apple has a team of talented people working to get gen AI performance tuned specifically to their hardware, I suspect we'll see some very competitive offerings in this space.
I feel like there's going to be a lot of movement of AI compute towards the CPU, and Apple's processors show the possibilities.
GPUs happened to have a lot of throughput lying around, so they got put to work, but the importance of having lots of memory to hold huge models, even just for inference, is already clear. I also think future AI will have a lot more going on in 'conventional' compute rather than just large arrays of simple tensor ops.
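A rough back-of-envelope (my own illustrative numbers, not the parent's) shows why memory dominates: a 70B-parameter model quantized to roughly 4 bits per weight already needs close to 40 GB just for the weights, before any KV cache.

    # Back-of-envelope memory estimate for holding a model in memory for inference.
    # The figures below are illustrative assumptions.
    params = 70e9          # 70B parameters
    bits_per_weight = 4.5  # ~4-bit quantization plus per-block scale overhead
    weights_gb = params * bits_per_weight / 8 / 1e9
    print(f"~{weights_gb:.0f} GB for weights alone")  # ~39 GB
    # Too big for a 24 GB consumer GPU, comfortable in 64-192 GB of unified memory.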
CPUs will increasingly gain specialist hardware to accelerate AI workloads, beyond what we have now, and it will be less monolithic too: there will probably be a variety of kinds of accelerators.
That will combine well with big main memory and storage that sits ever closer to the CPU, enabling very fast virtual memory. I wouldn't be surprised if we soon see CPUs with high-bandwidth storage as well as high-bandwidth memory.
> CPUs will increasingly gain specialist hardware to accelerate AI workloads
Maybe, but then you're describing a coprocessor rather than the CPU. The CPU portion of the M1 SoC should be simpler than Apple's Intel processors, considering it no longer supports the wide bevy of AVX/SSE instructions in hardware. The goal of the ARM transition is to keep the CPU side as simple as possible to optimize for power.
Personally, I think we're going to see more GPU-style accelerators in the future. People want high-throughput SIMD units, ideally with a good programming framework à la CUDA to tie them together. It makes very little sense to design dedicated inferencing hardware when a perfectly usable and powerful GPU already exists on most phones. It's practically redundant to try anything else.
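As an illustration of that 'one programming model, many accelerators' idea (my example, not the commenter's), PyTorch already lets the same tensor code dispatch to CUDA or to Apple's Metal backend:

    import torch

    # Pick whichever GPU-style accelerator is available: NVIDIA via CUDA,
    # Apple Silicon via Metal ("mps"), otherwise fall back to the CPU.
    if torch.cuda.is_available():
        device = torch.device("cuda")
    elif torch.backends.mps.is_available():
        device = torch.device("mps")
    else:
        device = torch.device("cpu")

    x = torch.randn(4096, 4096, device=device)
    w = torch.randn(4096, 4096, device=device)
    y = x @ w  # the same high-throughput matmul, whichever accelerator runs it
    print(device, y.shape)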