Is there free/open software like this, but for the entire desktop environment? I'm looking for something highly scriptable, so I can define some simple magic phrases and link them to specific scripts. If not, are there any non-free ones? (I know about Dragon, but not much.)
I want commands like "next", "back", and "exit". If the trigger is there I can write the scripts myself.
Ideally it would be context-aware, so it could check whether a browser is the active window and, if so, map "next" and "back" to the next and previous tabs. If an IDE is active, they would cycle through files, and so on.
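To make the context-aware idea concrete, here is a rough Python sketch of how the dispatch side might look once something hands you a recognized phrase. Everything in it is an assumption for illustration: `CommandRouter`, the context names "browser" and "ide", and the string results are hypothetical, and detecting the active window is left to whatever your desktop environment provides.

```python
from typing import Callable, Dict, Optional, Tuple

# An action is any zero-argument callable; here each returns a description
# string so the sketch is easy to check. A real one would run a script.
Action = Callable[[], str]


class CommandRouter:
    """Hypothetical router: maps (context, phrase) pairs to actions."""

    def __init__(self) -> None:
        self._actions: Dict[Tuple[Optional[str], str], Action] = {}

    def register(self, phrase: str, action: Action,
                 context: Optional[str] = None) -> None:
        # context=None means the command applies in any window.
        self._actions[(context, phrase.strip().lower())] = action

    def dispatch(self, phrase: str,
                 active_context: Optional[str]) -> Optional[str]:
        key = phrase.strip().lower()
        # Prefer a context-specific binding, then fall back to a global one.
        action = (self._actions.get((active_context, key))
                  or self._actions.get((None, key)))
        return action() if action else None


router = CommandRouter()
router.register("next", lambda: "browser: next tab", context="browser")
router.register("back", lambda: "browser: previous tab", context="browser")
router.register("next", lambda: "ide: next file", context="ide")
router.register("exit", lambda: "close active window")
```

With this, `router.dispatch("next", "browser")` and `router.dispatch("next", "ide")` resolve to different actions, while "exit" works everywhere because it was registered without a context. The recognizer and the active-window lookup would plug in on either side of `dispatch`.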
This idea was similarly proposed by a couple people when I posted my voice-recognition resume/website (https://benwasser.com) to Hacker News. Glad to see it taking off.
Also, I didn't see it mentioned, but I found that I needed to use SSL to get Chrome to not re-ask for permission after a brief timeout of silence. I haven't had a chance to test out their implementation, but my guess is that SSL is a big unstated requirement for decent usage.
The SSL requirement is built into Chrome; it's one of the security requirements for safe microphone usage. Annyang and anything else using the microphone will always be subject to this requirement.
I remember, in 2006, doing a full multimodal (touch, mouse and voice) web application with W3C standards: VoiceXML, XHTML and a glue dialect called X+V (meaning "XHTML plus VoiceXML"). The browser at the time was Opera, and the voice recognition was handled client side, with a free IBM ViaVoice plugin that Opera could download with a single click. It already had many things built in to sync voice events to DOM events.
It's nice to finally see it embedded in major browsers.
It is not clear to me, however, whether it works offline or uses a web service.
This is really awesome. Sometimes I get annoyed hunting for settings or controls on websites, or going through lots of clicks to perform certain tasks. This API can certainly make those easier, and fun too.
If it can do something like "Facebook deactivate my account with reason its temporary" then this lib can certainly redefine user experience too.
The first few queries worked, but when I told it "Show me tacos", it honestly responded with "Searching for porn...".
Either way, interesting idea - I'm curious how this might be used not only for interacting with websites in novel ways but for ADA purposes where the user has difficulty controlling a physical input device but can still speak.
On a more serious note, I still wish we had things like this combined with the intelligence and semantic deciphering that Ubiquity did [1]. I'm disappointed that Ubiquity didn't go further.
Love seeing this. The other day there was a thread about how there is so much focus on supporting IE browsers with relatively low market share, yet not much on accessibility issues within all browsers.
I think this is a great step towards highlighting those issues, and offering up a solution to address some of them programmatically.
Chrome on Android 4.4.2 asks for permission to use the microphone and plays the "listening" beep à la Google Now, but does not seem to respond or recognize anything. Is anyone getting this to work on a mobile device?
I'm not in a place where I can try this right now, but how does this work? Is it constantly listening, or do you need to trigger / toggle on/off the listening function?