
During my time at Apple, the bigger issue with personalized, on-device models was the file size. At the time, each model was a significant amount of data to push to a device, and with lots of teams wanting an on-device model and wanting to update them regularly, it was definitely a big discussion.



They’ve gone with a single 3B model and a separate “adapter” for each use case. One adapter is good at summarising while another is good at generating message replies.
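Roughly, a LoRA-style adapter just adds a small low-rank delta on top of the frozen base weights, so each feature only ships a few megabytes instead of another multi-gigabyte model. A made-up numpy sketch of the idea (sizes and names are illustrative, not Apple's actual implementation):

    import numpy as np

    d_model, rank = 4096, 16   # hypothetical dimensions
    # Frozen base weight, shipped once with the OS
    W_base = np.random.randn(d_model, d_model).astype(np.float16)

    # Per-feature adapter: two small matrices, ~2*d_model*rank params
    # instead of d_model^2 -- this is all that changes between use cases
    A = np.random.randn(rank, d_model).astype(np.float16)
    B = np.zeros((d_model, rank), dtype=np.float16)

    def forward(x, use_adapter=True):
        y = x @ W_base.T                  # shared base computation
        if use_adapter:
            y += (x @ A.T) @ B.T          # low-rank update, swapped per task
        return y

    x = np.random.randn(1, d_model).astype(np.float16)
    print(forward(x).shape)               # (1, 4096)

The "summarise" and "reply" adapters would just be different (A, B) pairs loaded over the same resident base model.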


AI noob here. Is every single model in iOS really just a thin adapter on top of one base model? Can everything they announced today really be built on top of one base LLM with a specific architecture? What about image generation? What about text-to-speech? If those are in fact different models, they can’t load them all into RAM at once. And if they have to load from storage every time an app is opened, how do they do that fast enough to keep latency low?


The main LLM is only about 1.5 GB, so it should take only around half a second to load. Or they could keep it loaded. The other models may be even smaller.
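Back of the envelope, assuming roughly NVMe-class flash read speeds (a guess, not a measured number):

    # purely illustrative load-time estimate
    model_bytes = 1.5e9            # ~1.5 GB of quantized weights
    read_bytes_per_s = 3e9         # assumed ~3 GB/s sequential flash read
    print(model_bytes / read_bytes_per_s)   # ~0.5 seconds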


Maybe they use the "Siri is waking up and the screen wobbles" animation time for loading the model. That would be clever.


They'll have plenty of time to load the model; it still needs to wait for the user to actually voice/type their request. Invoking Siri happens well before the request is ready.
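i.e. kick off the weight load the moment Siri is invoked and only block on it once the request is actually in. Toy asyncio sketch with made-up timings, just to show the overlap:

    import asyncio

    async def load_model():
        await asyncio.sleep(0.5)          # stand-in for mapping ~1.5 GB of weights
        return "model"

    async def wait_for_user_request():
        await asyncio.sleep(1.0)          # stand-in for the user finishing speaking/typing
        return "summarise this email"

    async def on_siri_invoked():
        load_task = asyncio.create_task(load_model())   # start loading immediately
        request = await wait_for_user_request()         # overlaps with the load
        model = await load_task                         # usually already finished by now
        print(model, "->", request)

    asyncio.run(on_siri_invoked())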



