Annyang.js – Let visitors control your site with voice commands

borplk · on June 20, 2014

Is there free/open-source software like this, but for the entire desktop environment? I'm looking for something highly scriptable, so I can define a few simple magic phrases and link them to specific scripts. If not, are there any non-free ones? (I know about Dragon, but not much.)

I want commands like "next", "back", and "exit". If the trigger is there, I can write the scripts myself.

Ideally it would be context-aware: it could check whether a browser is the active window and, if so, map "next" and "back" to the next and previous tabs or something similar. If an IDE is active, they would cycle through files, etc.
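A minimal sketch of that context-aware dispatch, assuming the speech recognizer already delivers a text transcript and something else (OS-specific, not shown) reports which kind of application has focus. The context names, action strings, and the `ownValue`/`resolveAction` helpers are all hypothetical; in a real setup each action would launch one of your scripts.

```typescript
// Hypothetical action identifiers; in practice each would launch a user script.
type Action = string;
type CommandTable = { [phrase: string]: Action };

// Per-context tables (assumed context names): the same phrase maps to
// different actions depending on which kind of application has focus.
const contextCommands: { [context: string]: CommandTable } = {
  browser: { next: "browser:next-tab", back: "browser:previous-tab" },
  ide: { next: "ide:next-file", back: "ide:previous-file" },
};

// Phrases that work regardless of which window is active.
const globalCommands: CommandTable = { exit: "global:close-window" };

// Own-property lookup, so phrases like "constructor" never hit Object.prototype.
function ownValue<T>(table: { [key: string]: T }, key: string): T | undefined {
  return Object.prototype.hasOwnProperty.call(table, key) ? table[key] : undefined;
}

// Normalize the transcript, try the active context's table first, then fall
// back to the global table; returns undefined when nothing matches.
function resolveAction(activeContext: string, transcript: string): Action | undefined {
  const phrase = transcript.trim().toLowerCase();
  const table = ownValue(contextCommands, activeContext);
  const contextAction = table === undefined ? undefined : ownValue(table, phrase);
  return contextAction !== undefined ? contextAction : ownValue(globalCommands, phrase);
}
```

Checking the focused application's table before the global one lets "next" mean different things per application while "exit" keeps working everywhere.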

rjbwork · on June 21, 2014

You might be interested in watching this talk:

https://www.youtube.com/watch?v=8SkdfdXWYaI

Although, if I recall correctly, he uses Dragon NaturallySpeaking, which is a commercial product.

ddod · on June 20, 2014

This idea was similarly proposed by a couple of people when I posted my voice-recognition resume/website (https://benwasser.com) to Hacker News. Glad to see it taking off.

Also, I didn't see it mentioned, but I found that I needed to use SSL to keep Chrome from re-asking for permission after a brief period of silence. I haven't had a chance to test their implementation, but my guess is that SSL is a big unstated requirement for decent usability.

brentjanderson · on June 20, 2014

The SSL requirement is built into Chrome; it's one of the security requirements for safe microphone usage. Annyang, and anything else using the microphone, will always be subject to this requirement.

fredbfr · on June 20, 2014

I remember, in 2006, building a full multimodal (touch, mouse, and voice) web application with W3C standards: VoiceXML, XHTML, and a glue dialect called X+V ("XHTML plus VoiceXML"). The browser at the time was Opera, and voice recognition was handled client side by a free IBM ViaVoice plugin that Opera could download with a single click. It already had many things built in to sync voice events to DOM events. It's nice to finally see this embedded in major browsers. It is not clear to me, however, whether it works offline or uses a web service.

kumarishan · on June 20, 2014

This is really awesome. Sometimes I get annoyed hunting for settings or controls on a website to perform certain tasks, or clicking through lots of pages. This API could certainly make those tasks easier, and fun too.

If it can handle something like "Facebook, deactivate my account, reason: it's temporary", then this library could redefine the user experience as well.

Looking forward to integrating it into my app soon.

robertnealan · on June 20, 2014

The first few queries worked, but when I told it "Show me tacos", it honestly responded with "Searching for porn...".

Either way, it's an interesting idea. I'm curious how this might be used not only for interacting with websites in novel ways but also for ADA purposes, where the user has difficulty controlling a physical input device but can still speak.

oso2k · on June 20, 2014

You weren't looking for pink?

On a more serious note, I still wish we had things like this combined with the intelligence and semantic deciphering that Ubiquity had [1]. I'm disappointed that Ubiquity didn't go further.

[1] https://blog.mozilla.org/labs/2008/08/introducing-ubiquity/

onassar · on June 20, 2014

Love seeing this. The other day there was a thread about how much focus goes into supporting IE versions with relatively low market share, while accessibility issues across all browsers get comparatively little attention.

I think this is a great step toward highlighting those issues, and it offers a way to address some of them programmatically.

RyanMcGreal · on June 20, 2014

I really hope they stop development at version 0.7734.

Kiro · on June 20, 2014

ErnestedCode · on June 20, 2014

Arrested Development reference -- 07734 => hello (read upside down on a calculator) => Annyong

delgaudm · on June 20, 2014

Chrome on Android 4.4.2 asks for permission to use the microphone and plays the "listening" beep à la Google Now, but doesn't seem to respond or recognize anything. Is anyone getting this to work on a mobile device?

uptown · on June 20, 2014

I'm not in a place where I can try this right now, but how does this work? Is it constantly listening, or do you need to trigger it or toggle listening on and off?

eggbrain · on June 20, 2014

It uses the WebKit Speech API [1] and, from what I can tell, it's constantly listening [2].

[1] http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-...
[2] https://github.com/TalAter/annyang/blob/master/annyang.js#L9...
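Once the Speech API hands back a transcript, a library like this still has to match it against the command phrases you registered. A minimal sketch of that matching step, assuming a simplified syntax where `:name` captures exactly one word and `*name` captures the rest of the phrase. annyang's documentation describes named variables and splats along these lines, but this is not its source code, and `phraseToRegExp`, `matchCommand`, and the example commands are hypothetical.

```typescript
// Hypothetical handler type: receives the captured words, returns a description.
type Handler = (...captures: string[]) => string;

// Turn a command phrase into an anchored, case-insensitive regex.
// Assumed syntax: ":name" captures one word, "*name" captures one or more words.
function phraseToRegExp(phrase: string): RegExp {
  const parts = phrase.trim().split(/\s+/).map((word) => {
    if (word.charAt(0) === ":") return "(\\S+)"; // named variable: one word
    if (word.charAt(0) === "*") return "(.+)"; // splat: rest of the phrase
    return word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&"); // literal word, escaped
  });
  return new RegExp("^" + parts.join("\\s+") + "$", "i");
}

// Try commands in order and run the first handler whose pattern matches;
// returns undefined if no command matches the transcript.
function matchCommand(commands: Array<[string, Handler]>, transcript: string): string | undefined {
  for (const [phrase, handler] of commands) {
    const match = phraseToRegExp(phrase).exec(transcript.trim());
    if (match !== null) {
      return handler(...match.slice(1));
    }
  }
  return undefined;
}

// Example command set (hypothetical phrases and handlers).
const commands: Array<[string, Handler]> = [
  ["hello", () => "greeting"],
  ["show me *term", (term) => "searching for " + term],
  ["calculate :month stats", (month) => "stats for " + month],
];
```

With this sketch, `matchCommand(commands, "Show me tacos")` returns `"searching for tacos"`: the splat swallows everything after "show me", which is also why a misheard word lands straight in the search term.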

borplk · on June 20, 2014

Does the web speech API expose the raw audio feed?

VikingCoder · on June 20, 2014

Interesting... there's a project I was thinking about that this might work for. Or at least, I can learn a lot from their code!

on June 20, 2014

[deleted]

yebyen · on June 20, 2014

"https://www.talater.com wants to use your microphone." Allow / Deny

Using Chromium

techpeace · on June 20, 2014

Yup, that's the HTML Speech Input API asking for permission to access your microphone in order to enable the voice recognition.

yebyen · on June 20, 2014

Yeah, the parent [now deleted] was wondering why random websites could access his microphone; apparently his browser didn't ask.

tbh · on June 20, 2014

The Arrested Development reference in the footer makes me wonder if the name ought to be Annyong instead.

whatthemick · on June 20, 2014

Perhaps they named it for Annyong but changed it for the easier search results?

notastartup · on June 20, 2014

Sounds like "hello" in Korean, except that it's commonly spelled Annyung or even Annyong.