He put into words something that I've always had a hunch about: that small specialized models will have their own niche, especially when it comes to crafting delightful UX... those tiny models might be able to push us over that extra 5% that separates a "meh" product from a "WOW" product.
Same thing with GPUs vs. CPUs: an orchestration of thousands of simple cores is better suited to some workloads than a few large general-purpose cores. Even if the raw performance ends up identical for a given task, there's something to be said about cost and latency, especially when you're orchestrating thousands of them.
Just imagine what thousands of tiny models could achieve versus one giant model.
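To make the orchestration idea a bit more concrete, here's a minimal sketch of the fan-out pattern using Python's asyncio. Everything here is hypothetical: `tiny_model` stands in for a call to one small specialized model endpoint (the source doesn't name any specific API). The point it illustrates is that with many cheap, low-latency models, total wall-clock time for a batch is roughly the slowest single call rather than the sum of all calls:

```python
import asyncio
import random

# Hypothetical stand-in for a call to one small specialized model.
# In practice this would be an HTTP/gRPC request to a tiny model endpoint.
async def tiny_model(task_id: int) -> str:
    await asyncio.sleep(random.uniform(0.01, 0.05))  # simulate low-latency inference
    return f"result-{task_id}"

async def orchestrate(n: int) -> list[str]:
    # Fan out n independent requests concurrently; the batch finishes
    # when the slowest individual call finishes, not after n sequential calls.
    return await asyncio.gather(*(tiny_model(i) for i in range(n)))

if __name__ == "__main__":
    results = asyncio.run(orchestrate(1000))
    print(len(results), "tasks completed")
```

A single giant model handling the same 1000 tasks would typically process them through one queue, so the concurrency (and the cost profile) of many tiny models is a genuinely different shape, not just a smaller version of the same thing.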