> or, better yet, the open ears of a friendly audio assistant
It’s interesting you mention this. I’ve been wondering about this for a while now - there have been huge leaps recently in LLMs, speech synthesis, and speech recognition. We have sophisticated language models, computer voices that are hard to distinguish from real humans, and software that can reliably understand even the worst recording of someone speaking.
Yet those three components still haven’t been integrated into a next-generation Alexa. But why? It doesn’t even sound particularly complicated (compared to all the prior art it would build on) - the basic plumbing is sketched below.
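For what it’s worth, the naive version of that pipeline really is small. Here’s a rough sketch wiring the three pieces together in Python - the library choices (SpeechRecognition, pyttsx3) and the `ask_llm` stub are just placeholders I picked for illustration, not anyone’s actual stack:

```python
# Rough sketch of the "three components" pipeline:
# speech recognition -> language model -> speech synthesis.
import speech_recognition as sr  # pip install SpeechRecognition
import pyttsx3                   # pip install pyttsx3


def ask_llm(prompt: str) -> str:
    # Placeholder for whatever LLM backend you'd wire in (local or API-based).
    return f"You said: {prompt}"


def assistant_loop() -> None:
    recognizer = sr.Recognizer()
    tts = pyttsx3.init()
    with sr.Microphone() as mic:
        recognizer.adjust_for_ambient_noise(mic)
        while True:
            audio = recognizer.listen(mic)                 # wait for speech
            try:
                text = recognizer.recognize_google(audio)  # speech -> text
            except sr.UnknownValueError:
                continue                                   # couldn't parse, keep listening
            reply = ask_llm(text)                          # text -> LLM reply
            tts.say(reply)                                 # reply -> speech
            tts.runAndWait()


if __name__ == "__main__":
    assistant_loop()
```

The hard parts are obviously everything this glosses over (wake words, latency, barge-in, privacy, on-device vs. cloud), but the core loop is genuinely this short.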
BingChat already takes voice input and gives voice replies, but it still requires the push of a button in the UI to start; it can't run as a voice assistant in the background.
Principal–agent problem! Previous-generation assistants have been frozen in time by managerial capitalism. This is evident in literally all the incumbents that matter in the Western world: Google, Amazon, Apple, Microsoft and Samsung.
It took founder-led OpenAI to kick everyone in the butt. Thankfully the wheels are moving again to get to what you're describing, an inevitability in the very near future.
Sam Altman is the principal. The agents are MAMAA middle managers, who are often also smart (though I'm clearly biased here, having been one of those in my previous life) but highly incentivized to be obedient and risk-averse.