Apple Intelligence Foundation Language Models (machinelearning.apple.com)
24 points by sebg 58 days ago | 14 comments



It's interesting that they optimize so heavily for safety at the expense of performance


Makes sense to me. I assume the first versions of these models will make their debut in iOS 18 and will represent the most mainstream exposure to date of LLM technology outside of technical audiences (although I'm not sure how much traction LLM integration in Office 365 has at this point). If the story is "Wow, Siri just got a whole lot smarter, even if she sometimes weirdly refuses to answer questions," that's a whole lot better than "Siri just told my kid the recipe for meth" — no matter how fast or accurately she provides it :)

Performance in this first public release will be considered a baseline and can be improved over time. A PR disaster is a much bigger issue.


The paper says they used 8192 TPUv4 chips for training. Interesting that another big shop didn't use Nvidia for its model training.


Weird they compare their model + adapter to other models without an adapter.


purely for optics


Apple's New Foundation Language Models (AFMs)

1. Two Main Models:
- AFM-on-device: ~3 billion parameters, for efficient on-device use
- AFM-server: larger model for Private Cloud Compute

2. Architecture and Training:
- Based on a Transformer with optimizations
- Three-stage training: core, continued, and context-lengthening
- LoRA adapters for task-specific fine-tuning
- Innovative quantization: 3.5-3.7 bits per weight
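For anyone unfamiliar with how LoRA adapters bolt onto a frozen base model, here's a pure-Python toy sketch. It only illustrates the general y = Wx + alpha * B(Ax) shape; the ranks, placements, and scaling Apple actually uses are whatever the paper specifies, not this:

```python
# Toy LoRA-style adapter on a linear layer (illustrative only).
# W is the frozen base weight; A and B are the small trainable
# low-rank matrices that make up the adapter.

def matmul(a, b):
    # a: m x k, b: k x n -> m x n, lists of lists
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

class LoRALinear:
    def __init__(self, W, A, B, alpha=1.0):
        self.W = W          # frozen base weight (out x in)
        self.A = A          # low-rank down-projection (r x in), trainable
        self.B = B          # low-rank up-projection (out x r), trainable
        self.alpha = alpha  # adapter scaling factor

    def forward(self, x):
        # y = W x + alpha * B (A x)
        base = matmul(self.W, x)
        low = matmul(self.B, matmul(self.A, x))
        return [[base[i][j] + self.alpha * low[i][j]
                 for j in range(len(base[0]))] for i in range(len(base))]

# Swapping adapters (A, B) per task while keeping W fixed is what makes
# per-feature specialization cheap: only the small matrices change.
layer = LoRALinear([[1, 0], [0, 1]], [[1, 1]], [[1], [0]], alpha=0.5)
print(layer.forward([[2], [3]]))  # -> [[4.5], [3.0]]
```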

3. Performance and Benchmarks:
- AFM-on-device outperforms larger models (e.g., Gemma-7B, Mistral-7B)
- AFM-server competitive with GPT-3.5
- HELM MMLU (5-shot): AFM-on-device 61.4%, AFM-server 75.4%
- GSM8K (8-shot CoT): AFM-server 83.3%
- Strong in instruction following (IFEval)
- Best overall on the Berkeley Function Calling Leaderboard

4. Capabilities:
- Excels in instruction following, tool use, writing, math
- Long-context support up to 32k tokens
- Specialized for tasks like summarization

5. Responsible AI:
- Focus on user privacy and responsible AI principles
- Extensive safety measures (red teaming, human evaluations)
- Lower violation rates on safety prompts vs. other models

6. Unique Aspects:
- "Accuracy-recovery adapters" post-quantization
- Novel RLHF framework: "Iterative Teaching Committee" (iTeC)
- New RL algorithm: MDLOO
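On the fractional 3.5-3.7 bits per weight: a non-integer average like that typically comes from mixing bit widths across weight blocks. Here's a hedged pure-Python sketch of round-to-nearest low-bit quantization and the resulting average — an illustration of the general idea, not the paper's actual palettization or recovery scheme:

```python
# Sketch: symmetric round-to-nearest quantization of a block of weights
# to `bits` bits, plus the average bit width over mixed-precision blocks.
# (Illustrative only; not Apple's actual quantization scheme.)

def quantize(ws, bits):
    qmax = 2 ** (bits - 1) - 1
    # One scale per block; `or 1.0` guards an all-zero block.
    scale = max(abs(w) for w in ws) / qmax or 1.0
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in ws]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

def avg_bits(block_bits):
    # Mixing e.g. three 4-bit blocks with one 2-bit block averages 3.5.
    return sum(block_bits) / len(block_bits)

q, s = quantize([1.0, -0.5, 0.25], bits=4)
print(dequantize(q, s))        # values close to the originals
print(avg_bits([4, 4, 4, 2]))  # -> 3.5
```

The "accuracy-recovery adapters" bit then amounts to training small adapters on top of the quantized base to claw back the quality lost to rounding, rather than re-training the full model.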


Tried to find the model size for AFM-server, but didn't succeed...


Just went through the 47-page ad (disguised as a whitepaper), but I'm unable to find what makes this so responsible. Could anyone point me to the part that I missed? Must admit I mostly used Ctrl-F plus reading about 30% of it.


Did you catch section 7 "Responsible AI"? It starts on page 26 and ends on page 31, covering topics from categories of censored topics to excluding user/sensitive data to red teaming.


Well, sure, there's a section with that title, but based on it, "Responsible AI" reads more like a trademark than anything that even remotely resembles the ordinary definition of "responsible". Looking closer, the capitalization of the term only reinforces that impression for me.

It's like that recent news story about how "boneless" chicken can actually contain bones (and possibly be made of non-chicken too).


It's laughably lacking in information. They claim "12 primary categories comprised of 51 subcategories" and name exactly five.


Of course it's a useless marketing term. But, the paper lists 4 "Responsible AI principles": (1) Empower users with intelligent tools (2) Represent our users (3) Design with care (4) Protect privacy.

Doing things on-device and not using users' data seems to be the most concrete thing.


macOS and iOS as operating systems are dumb as bricks, but they let developers cook, so it doesn't matter. I don't understand why Apple Intelligence is seen as world-changing when ChatGPT and Claude are a click away on both platforms. Sure, the privacy will be higher with Apple's on-device stuff, but Apple's offering can't compete with the vast knowledge of the likes of ChatGPT. In the end, you're still going to be interacting with outside LLMs most of the day and only use Apple's stuff 5% of the time.


I'm not sure they have to compete.

If they:

* Show local performance

* Provide a way for other LLM providers to receive compensation for providing local models

Then there are a lot of use cases that get better with a local LLM instead of a remote one:

* Offline search

* Faster text suggestion and transformation

* Client-side AI assist for video games (this is better from a cost perspective for the game dev)

* Pushing the LLM runtime cost to the user
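The local-vs-remote trade-off above could be sketched as a tiny client-side router. Everything here is hypothetical — the task names and the `route` function are made up for illustration, not any real Apple API:

```python
# Hypothetical sketch: route a request to a local on-device model when
# offline, or when the task is cheap/latency-sensitive enough that
# pushing runtime cost to the user's device makes sense.

LOCAL_OK = {"offline_search", "text_suggestion", "game_npc"}  # assumed task names

def route(task: str, online: bool) -> str:
    if not online or task in LOCAL_OK:
        return "local"   # on-device model: free for the provider, works offline
    return "remote"      # bigger hosted model for everything else

print(route("offline_search", online=True))   # -> local
print(route("code_review", online=True))      # -> remote
print(route("code_review", online=False))     # -> local (degraded but available)
```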



