Even the M3 Max seems to be slower than my 3090 for LLMs that fit onto the 3090, but it’s hard to find comprehensive numbers. The primary advantage is that you can spec out more memory with the M3 Max to fit larger models, but with the exception of CodeLlama-70B today, it really seems like the trend is for models to be getting smaller and better, not bigger. Mixtral runs circles around Llama2-70B and arguably ChatGPT-3.5. Mistral-7B often seems fairly close to Llama2-70B.
Microsoft accidentally leaked that ChatGPT-3.5-Turbo is apparently only 20B parameters.
24GB of VRAM is enough to run ~33B parameter models, and enough to run Mixtral (which is a MoE, which makes direct comparisons to “traditional” LLMs a little more confusing.)
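For a rough sense of why 24GB is about the cutoff for ~33B models, here's a back-of-the-envelope sketch. The ~4.5 bits/weight and the 2GB overhead figure are my own assumptions for a typical 4-bit quant plus KV cache, not measurements:

```python
# Back-of-the-envelope VRAM estimate for a quantized model.
# Assumptions: ~4.5 bits/weight (roughly a 4-bit GGUF-style quant)
# plus ~2 GB for KV cache and runtime buffers.

def vram_estimate_gb(n_params_billion, bits_per_weight=4.5, overhead_gb=2.0):
    weight_gb = n_params_billion * bits_per_weight / 8  # billions of bytes ~= GB
    return weight_gb + overhead_gb

print(f"33B: ~{vram_estimate_gb(33):.0f} GB")  # ~21 GB -> fits in 24 GB
print(f"70B: ~{vram_estimate_gb(70):.0f} GB")  # ~41 GB -> does not fit
```

(Mixtral muddies this kind of comparison because its total parameter count is much larger than the parameters active per token, so memory needs track the total while speed tracks the active set.)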
I don’t think there’s a clear answer of what hardware someone should get. It depends. Should you give up performance on the models most people run locally in hopes of running very large models, or give up the ability to run very large models in favor of prioritizing performance on the models that are popular and proven today?
The M3 Max is actually less than ideal because its memory bandwidth peaks at 400 GB/s. What you really want is an M1 or M2 Ultra, which offers up to 800 GB/s (for comparison, the RTX 3090 runs at 936 GB/s). A Mac Studio suitable for running 70B models at speeds fast enough for real-time chat can be had for ~$3K.
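For intuition on why bandwidth is the headline number: single-stream token generation is roughly bandwidth-bound, since every generated token requires reading (more or less) all of the weights once. So an upper bound on tokens/sec is bandwidth divided by the quantized model size. A quick sketch, where the ~40GB figure for a 4-bit 70B quant is my assumption:

```python
# Rule of thumb: decode speed is capped by memory bandwidth / weight size.
# Bandwidth figures are the ones quoted above; the ~40 GB size for a
# 4-bit 70B quant is an assumption, and real throughput will be lower.

MODEL_SIZE_GB = 40  # assumed: 70B model at ~4.5 bits/weight

for name, bw_gb_s in [("M3 Max", 400), ("M1/M2 Ultra", 800), ("RTX 3090", 936)]:
    print(f"{name:12s} ~{bw_gb_s / MODEL_SIZE_GB:.0f} tok/s theoretical ceiling")
```

Even as a loose ceiling, it shows why the Ultra's 800 GB/s is what makes 70B chat comfortable, while 400 GB/s is borderline.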
The downside of Apple's hardware at the moment is that the training ecosystem is very much focused on CUDA; llama.cpp has an open issue about Metal-accelerated training: https://github.com/ggerganov/llama.cpp/issues/3799 - but no work on it so far. This is likely because training at any significant scale requires enough compute that it's currently almost always better to do it in the cloud, where, again, CUDA is the well-established ecosystem, and it's cheaper and easier for datacenter operators to scale. But, in principle, much faster training on Apple hardware should be possible, and eventually someone will get it done.
Yep, I seriously considered a Mac Studio a few months ago when I was putting together an “AI server” for home usage, but I had my old 3090 just sitting around, and I was ready to upgrade the CPU on my gaming desktop… so then I had that desktop’s previous CPU. I just had too many parts already, and it deeply annoys me that Apple won’t put standard, user-upgradable NVMe SSDs on their desktops. Otherwise, the Mac Studio is a very appealing option for sure.