
So iOS LLM apps don't use the Neural Engine? Lol



None of the current iOS and macOS LLM Apps use the Neural Engine. They use the CPU and the GPU.

nb: I'm the author of a fairly popular app in that category.


How would you know that none of the Apple apps use the Neural Engine? Is the key word in that statement “LLM”?


Yes, I specifically meant autoregressive LLMs. BERT-style encoder-only models, ViTs, and CNNs run perfectly fine on the ANE. Yesterday's coremltools update[1] changes that.

[1]: https://github.com/apple/coremltools/pull/2232


Why do they not?


AFAIK there is no general-purpose "do this on the ANE" API. You have to use specific higher-level APIs like CoreML or VisionKit for the work to end up on the ANE.
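For example, the closest thing to an ANE knob is CoreML's compute-unit preference. A minimal Swift sketch (the "MyModel" class is a hypothetical one generated by Xcode from an .mlpackage):

    import CoreML

    // Ask Core ML to prefer the Neural Engine. This is only a hint;
    // Core ML still decides per-layer where each op actually runs.
    let config = MLModelConfiguration()
    config.computeUnits = .cpuAndNeuralEngine   // or .all (the default)

    // "MyModel" is a hypothetical Xcode-generated model class.
    let model = try MyModel(configuration: config)

Even then it's just a preference; Xcode's Core ML performance report is the usual way to see which ops actually landed on the ANE versus the GPU or CPU.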


This, plus Metal acceleration works quite well. 7-8B-parameter models quantized to ~3 bpw run at a good tok/s on my iPhone 15 Pro.
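As a rough sanity check on why that fits on a phone, a back-of-the-envelope estimate of the weight memory (assumed numbers, ignoring KV cache and runtime overhead):

    // Rough weight-memory estimate for a ~7B model at ~3 bpw (assumed numbers).
    let parameters = 7.0e9          // ~7B weights
    let bitsPerWeight = 3.0         // ~3 bpw quantization
    let weightBytes = parameters * bitsPerWeight / 8.0
    let gib = 1024.0 * 1024.0 * 1024.0
    print(weightBytes / gib)        // ≈ 2.4 GiB of weights, well under the 8 GB of RAM on an iPhone 15 Pro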


It works quite well as long as you don't care about battery.


If they use llama.cpp, they probably run on the GPU. Apple hasn't published much about their Neural Engine, so you kinda have to use it through CoreML. I assume they have some aces up their sleeves for running LLMs efficiently that they haven't told anyone about yet.


Probably not. The CoreML LLM stuff only works on Macs AFAIK, so the phone app most likely uses the GPU.



