During my time at Apple the bigger issue with personalized, on-device models was the file size. At the time, each model was a significant amount of data to push to a device, and with lots of teams wanting an on-device model and the desire to update them regularly, it was definitely a big discussion.
They’ve gone with a single 3B model and a separate “adapter” for each use case: one adapter is good at summarising, while another is good at generating message replies.
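For the curious, here's roughly what that base-model-plus-adapters pattern looks like in code, sketched with Hugging Face PEFT. The model and adapter paths are placeholders, not Apple's actual stack; the point is just that the large base weights are shared and only the small adapter weights get swapped per task:

```python
# Minimal sketch of "one base model + per-task LoRA adapters" using Hugging Face PEFT.
# All names/paths below are placeholders for illustration, not Apple's models.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

BASE = "your-org/some-3b-base-model"  # placeholder: any ~3B causal LM

tokenizer = AutoTokenizer.from_pretrained(BASE)
base_model = AutoModelForCausalLM.from_pretrained(BASE)

# Attach one adapter per use case; the base weights stay shared in memory,
# only the comparatively tiny adapter weights differ between tasks.
model = PeftModel.from_pretrained(base_model, "adapters/summarization",
                                  adapter_name="summarize")
model.load_adapter("adapters/message-replies", adapter_name="reply")

def run(task: str, prompt: str) -> str:
    model.set_adapter(task)  # switch which adapter is active for this request
    inputs = tokenizer(prompt, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=128)
    return tokenizer.decode(out[0], skip_special_tokens=True)

print(run("summarize", "Summarize: ..."))
print(run("reply", "Draft a reply to: ..."))
```

Shipping updates also gets cheaper this way: pushing a new adapter is a few tens of megabytes instead of re-downloading the whole base model.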
AI noob here. Is every single model in iOS really just a thin adapter on top of one base model? Can everything they announced today really be built on top of one base LLM with a specific architecture? What about image generation? What about text-to-speech?
If they’re obviously different models, they can’t load them all at once into RAM. If they have to load from storage every time an app is opened, how will they do this fast enough to maintain low latency?
They'll have plenty of time to load the model; it still needs to wait for the user to actually voice or type their request. Invoking Siri happens well before the request is ready.
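In code terms, the trick is just to kick off the load as soon as the assistant is invoked and only wait on it once the request text is actually in. A toy asyncio sketch (timings and function names are made up to show the overlap, not any real API):

```python
# Toy illustration of hiding model-load latency behind user input.
import asyncio

async def load_model_from_storage():
    await asyncio.sleep(1.5)   # stand-in for reading weights off flash
    return "model"

async def wait_for_user_request():
    await asyncio.sleep(3.0)   # stand-in for the user speaking/typing
    return "summarize my unread emails"

async def handle_invocation():
    load_task = asyncio.create_task(load_model_from_storage())  # start loading immediately
    request = await wait_for_user_request()                     # the user is usually slower
    model = await load_task                                     # typically already finished
    print(f"running {request!r} on {model}")

asyncio.run(handle_invocation())
```

By the time the user has finished dictating or typing, the weights are usually already resident, so the perceived latency is dominated by generation, not by loading from storage.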