You may be able to make a case that Apple's model has less parameters or perform...

You may be able to make a case that Apple's model has less parameters or performs worse than other models on standardized tests.

Thats far from being "inferior" when you are talking about tuning for specific tasks, let alone when taking into account real-world constraints - like running as a local always-running task on resource-constrained mobile devices.

Running third party models means requiring them to accomplish the same tasks. Since the adapters are LORA-based, they are not adaptable to a different base model. This pushes a lot of specialized requirements onto someone hoping to replace the on-device portion.

This is different from say externally hosted models such as their announced ChatGPT integration. They announced an intention to integrate with other providers, but it is not clear yet how that is intended to work (none of this stuff is released yet even in alpha form).