
> I’ve never understood why voice recognition has always attempted to be complete understanding of arbitrary input, rather than follow a simple command language

Because the UI affordances (in this case the control language) wouldn’t be discoverable or memorable across a large range of devices or apps. Moreover, speaking is an activity that allows for an arbitrary range of symbol patterns, and a feedback loop between two parties in dialog can resolve complex matters even when they start from different positions.




I mean, right now the current state is effectively an undiscoverable control language: somewhat flexible, but generally failing or unreliable unless you restrict yourself to very specific language, language that differs based on the task being executed, often with similar-but-different formats required to do similar actions.

I’d argue that if the current state is at all acceptable, then a consistent, teachable and specific language format would be an improvement in every way. You could also have an “actual” feedback loop, because there’s a much more limited set of valid inputs, so your errors can be much more precise (and made human-friendly rather than merely programmer-friendly, I think).
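To make that concrete, here’s a minimal sketch in Python of what “limited set of valid inputs, precise errors” could look like; the verbs and targets are hypothetical, not any real assistant’s vocabulary:

    VERBS = {"play", "pause", "set", "turn"}
    TARGETS = {"music", "timer", "lights", "volume"}

    def parse(utterance):
        # Tokenize the spoken input and check it against the fixed grammar,
        # returning either an action tuple or a human-friendly correction.
        words = utterance.lower().split()
        if not words:
            return None, "Say a command, e.g. 'turn lights on'."
        verb, rest = words[0], words[1:]
        if verb not in VERBS:
            return None, "'%s' is not a command. Try: %s." % (verb, ", ".join(sorted(VERBS)))
        if not rest or rest[0] not in TARGETS:
            return None, "'%s' needs a target: %s." % (verb, ", ".join(sorted(TARGETS)))
        return (verb, rest[0], rest[1:]), None

    parse("turn lights on")    # (('turn', 'lights', ['on']), None)
    parse("illuminate room")   # (None, "'illuminate' is not a command. Try: pause, play, set, turn.")

Because every rejection names what was wrong and what would have been accepted, the errors teach the language instead of just saying “could not understand”.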

As it stands, I’ve never managed a dialogue with Siri/Alexa; it either ingests my input correctly, rejects it as an invalid action, does something completely wrong, or produces a “could not understand... did you mean <gibberish>?”.

Having the smart-AI dialogue would be great if I could have it, but for the last decade that simply isn’t a thing that occurs. Perhaps with GPT and its peers, but afaik GPT doesn’t have a response->object model that could be actioned on, so the conversation would sound smoother but be just as incompetent at actually understanding whatever you’re looking to do. I think this is basically the “sufficiently smart compiler” problem, which never comes to fruition in practice.
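For what it’s worth, the missing response->object layer doesn’t need to be exotic; the hard part is forcing the model into it. A rough sketch of what “actionable” could mean (the verb names and JSON shape are my own assumptions, not any vendor’s API): prompt the model to emit JSON, then validate it into a typed action before anything executes.

    import json
    from dataclasses import dataclass

    @dataclass
    class Action:
        verb: str    # e.g. "set_timer"
        args: dict   # e.g. {"minutes": 10}

    ALLOWED_VERBS = {"set_timer", "play_music", "send_message"}

    def to_action(model_output):
        # Refuse to act on anything that isn't valid JSON with a known verb.
        try:
            data = json.loads(model_output)
        except json.JSONDecodeError:
            raise ValueError("Model reply was not valid JSON; nothing to execute.")
        if data.get("verb") not in ALLOWED_VERBS:
            raise ValueError("Unknown verb %r; refusing to act." % data.get("verb"))
        return Action(verb=data["verb"], args=data.get("args", {}))

The conversation layer can be as fluent as it likes; the action layer only ever sees a small, checkable vocabulary.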


It's like using a CLI where the argument structure is inconsistent and there is no way to list commands and their arguments in a practical way.


Close your eyes and imagine that CLI system is instead voice / dialog based. The tedium. For bonus points, imagine you’re in a space shared with others. Doesn’t work that well…


What? No, I think it'd be great! I'd love to be able to say out loud "kube get pods pipe grep service" and the output to be printed on the terminal. I _don't_ want to say "Hey Google, list the pods in kubernetes and look for customer service".

The mapping between what I’d say and what I can type is already close. It starts becoming more complex once you need to add countless flags, but again, a structured approach can fix this.
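As a toy illustration of that structured approach, a spoken-CLI front end could be little more than a fixed table of spoken words to shell tokens (the mappings below are my own guesses, not an existing tool):

    # Map a handful of spoken words onto shell syntax; everything else passes through.
    SPOKEN = {"kube": "kubectl", "pipe": "|", "dash": "-", "redirect": ">"}

    def transcribe(utterance):
        return " ".join(SPOKEN.get(w, w) for w in utterance.lower().split())

    print(transcribe("kube get pods pipe grep service"))
    # kubectl get pods | grep service

Flags could get the same treatment: a spoken keyword per flag, expanded deterministically, rather than a model guessing intent.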



