
>On the consumer side, Apple silicon is also quite capable.

I am not sure that is true. A glance (or a long stay) at the r/LocalLLaMA subreddit basically shows a bunch of frustrated CPU users trying their absolute best to get anything to work at useful speeds.

When you can get an Nvidia GPU for a few hundred dollars, or a full-blown gaming laptop with a 4050 and 6 GB of VRAM for $900, it's hard to call CPU-based AI capable.

Heck, we don't have GPUs at work, and CPU-based inference is just not reasonable without using tiny models and waiting around. We ended up requesting GPU machines.

I think there is a 'this is technically possible', and there is a 'this is really nice'. Nvidia has been really nice to use; CPU has been miserable and frustrating.




Actually, llama.cpp running on Apple silicon uses the GPU (via Metal compute shaders) for LLM inference. Token generation is also heavily memory-bandwidth bottlenecked. High-end Apple silicon offers about 400 GB/s to 800 GB/s, comparable to an NVIDIA RTX 4090, which has roughly 1,000 GB/s of memory bandwidth. Not to mention that Apple silicon has a unified memory architecture and comes in high-memory configurations (128 GB, up to 192 GB), which is necessary to run large LLMs like Llama 3 70B, which takes roughly 40–75 GB of RAM to work reasonably.
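
For a rough sense of why decode speed tracks memory bandwidth, here is a back-of-envelope sketch. The bandwidth figures and the assumption that every generated token streams the full set of weights once are illustrative simplifications, not benchmarks:

    # Rough, memory-bandwidth-bound estimate of decode speed (tokens/s).
    # Assumes each generated token streams all model weights from memory once;
    # ignores KV-cache traffic and compute, so treat results as upper bounds.
    def decode_tokens_per_sec(params_b, bytes_per_param, bandwidth_gb_s):
        model_gb = params_b * bytes_per_param   # e.g. 70B at 4-bit ~= 35 GB
        return bandwidth_gb_s / model_gb

    # Illustrative bandwidth numbers (assumptions, check your own hardware):
    for name, bw in [("M2 Max ~400 GB/s", 400),
                     ("M2 Ultra ~800 GB/s", 800),
                     ("RTX 4090 ~1000 GB/s", 1000)]:
        print(name, round(decode_tokens_per_sec(70, 0.5, bw), 1),
              "tok/s for a 4-bit 70B model")

By that estimate a 4-bit 70B model tops out somewhere around 11–29 tok/s on those machines, which matches the intuition that bandwidth, not raw compute, is the ceiling for single-stream generation.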


[flagged]


I use it all the time?


The number of people running Llama 3 70B on Nvidia gaming GPUs is absolutely tiny. You're going to need at least two of the highest-end 24 GB VRAM GPUs, and even then you're still reliant on 4-bit quantization with almost nothing left for your context window.
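
The arithmetic, under hedged assumptions about a Llama-3-70B-style architecture (80 layers, GQA with 8 KV heads, head dim 128, fp16 KV cache), looks roughly like this:

    # Back-of-envelope VRAM check for a 70B model on 2x 24 GB GPUs (48 GB total).
    # Architecture numbers below are assumptions for a Llama-3-70B-like model.
    params_b = 70
    weight_gb = params_b * 0.5              # 4-bit weights ~= 35 GB: too big for one 24 GB card

    layers, kv_heads, head_dim = 80, 8, 128
    kv_bytes_per_token = layers * kv_heads * head_dim * 2 * 2  # K and V, fp16
    ctx = 8192
    kv_cache_gb = kv_bytes_per_token * ctx / 1e9

    print(f"weights: {weight_gb:.0f} GB, KV cache @ {ctx} ctx: {kv_cache_gb:.1f} GB")
    print(f"left on 48 GB for activations, buffers, longer context: "
          f"{48 - weight_gb - kv_cache_gb:.1f} GB")

The remainder gets eaten quickly by the split across two cards, runtime overhead, and activation buffers, and the KV cache grows linearly with context length, which is why long contexts get tight.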


The cognitive dissonance here.

70B models aren't better than 7B models outside roleplay. The logic all sucks the same. No one even cares about 70B models.


I don't think NVIDIA's reign will last long. The recent AI resurgence is not even a decade old. We can't expect the entire industry to shift overnight, but we are seeing rapid improvements in the capability of non-GPU hardware to run AI workloads. The architecture change has been instrumental for this, and Apple is well positioned to move the field forward, even if their current-gen hardware is lacking compared to traditional GPUs. Their silicon is not even 5 years old, yet it's unbeatable for traditional workloads and power efficiency, and competitive for AI ones. What do you think it will be capable of 5 years from now? Same for Groq and other NPU manufacturers. Betting on NVIDIA doesn't seem like a good long-term strategy, unless they also shift their architecture.



