Thank you for the info. This post is just further noodling, not any sort of disagreement.
"Responses" streamed from the cloud differ a lot in terms of whether or not it's some sort of "API" response that could be a couple of cheap kilobytes of JSON, or a full text-to-speech audio file.
I can't speak to Alexa, but I do use Google Maps, and it clearly has two modes for its TTS: a slightly nicer voice that streams from the cloud when you're on the network, and a perfectly serviceable lower-quality one generated on the device when it can't reach the cloud. I've often wondered why they don't just put the nicer cloud model on my device. Maybe when Maps was first starting out the gap was big, but by now I'm sure my phone could run that "nice" model just fine. It's not like they're running a massive multi-billion-parameter model that only fits on the latest GPUs (and if they are, they aren't getting much additional value from it); it's almost certainly something we could run locally.
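To be concrete, the pattern I'm imagining is just "try the cloud voice, fall back to the local one". A minimal sketch in Python, where cloud_tts, local_tts, and the timeout are invented placeholders, not anything from Maps itself:

    def speak(text, cloud_tts, local_tts, timeout_s=0.5):
        # Prefer the nicer cloud voice, but don't let a slow network block navigation.
        try:
            return cloud_tts(text, timeout=timeout_s)  # hypothetical network call
        except Exception:
            # Offline or timed out: fall back to the on-device voice.
            return local_tts(text)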
Learning could always be done by sampling some interactions to send upstream.
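Something along these lines, assuming a made-up upload hook and an arbitrary 1% sample rate, just to show the idea:

    import random

    SAMPLE_RATE = 0.01  # send roughly 1% of interactions upstream

    def maybe_upload(example, upload):
        # 'upload' is a hypothetical hook that ships the example to the
        # training pipeline; everything else stays on the device.
        if random.random() < SAMPLE_RATE:
            upload(example)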
For sure I agree with your overarching assessment though.
The irony, as you noted, is that even as consumer devices have gotten more and more powerful, computation and processing keep being pushed to the backend/cloud. For example, to get an EC2 instance comparable to my laptop you have to pay through the nose, yet most of my machine's CPU cycles sit idle or go toward rendering webpages. To make matters worse, most tools nowadays are web-only; even if I wanted a native app, there isn't one. There are exceptions of course, such as IntelliJ and Figma, but sadly what should be the norm isn't.
To me, it feels as if bandwidth got cheap enough that engineers stopped having to think about taking advantage of the client device's capabilities. It's gotten so bad that most apps won't even start without an internet connection.
"Responses" streamed from the cloud differ a lot in terms of whether or not it's some sort of "API" response that could be a couple of cheap kilobytes of JSON, or a full text-to-speech audio file.
I can't speak to Alexa, but I do use Google Maps, and it obviously has two modes for its TTS; a slightly nicer mode that streams from the cloud if you're on the network, but a perfectly serviceable lower quality one generated from the device if it can't get to the cloud at the time. I've often wondered why they don't just put the nicer cloud model on my device. Maybe when Maps was first starting it was a big difference, but now I'm sure my cell can run that "nice" model just as well. It's not like they're running a massive multi-billion parameter thing that only runs on the latest graphics cards (or, if they are, they aren't getting that much additional value for it), I'm sure it's something we could run locally just fine.
Learning could always be done by sampling some things to send up.