Annyang.js – Let visitors control your site with voice commands (talater.com)
86 points by danso on June 20, 2014 | 23 comments



Is there free/open software like this, but for the entire desktop environment? I'm looking for something highly scriptable, so I can define a few simple magic phrases and link them to specific scripts. If not, are there any non-free options? (I know about Dragon, but not much.)

I want commands like "next", "back", and "exit". If the trigger is there, I can write the scripts myself.

Ideally it would be context-aware, so it could check whether a browser is the active window and, if so, have "next" and "back" go to the next and previous tabs or something. If an IDE is active, they would cycle through files, etc.
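
A minimal sketch of the context-aware dispatch described above, assuming the recognizer hands back a plain phrase and that some platform-specific helper (not shown) reports which application is active. The `ActiveApp` values, the `bindings` table, and the action names are all hypothetical placeholders, not part of any real tool:

```typescript
// Hypothetical app categories; a real setup would derive this from the
// window manager (that lookup is platform-specific and not shown here).
type ActiveApp = "browser" | "ide" | "other";

// Spoken phrase -> per-app action name. The action names are placeholders
// for whatever scripts you would actually run.
const bindings: Record<string, Partial<Record<ActiveApp, string>>> = {
  next: { browser: "next-tab", ide: "next-file" },
  back: { browser: "previous-tab", ide: "previous-file" },
  exit: { browser: "close-window", ide: "close-window", other: "close-window" },
};

// Returns the action bound to a phrase for the active app, or null when the
// phrase is unknown or has no binding in that context.
function resolveCommand(phrase: string, app: ActiveApp): string | null {
  const key = phrase.trim().toLowerCase();
  // Own-property check so inherited keys such as "toString" never match.
  const perApp = Object.prototype.hasOwnProperty.call(bindings, key)
    ? bindings[key]
    : undefined;
  return perApp?.[app] ?? null;
}
```

Keeping the bindings in a plain table means adding a new phrase or a new context is a one-line change, and the function that picks the action stays separate from whatever actually runs the script.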


You might be interested in watching this talk:

https://www.youtube.com/watch?v=8SkdfdXWYaI

Although, if I recall correctly, he uses Dragon NaturallySpeaking, which is a commercial product.


This idea was similarly proposed by a couple people when I posted my voice-recognition resume/website (https://benwasser.com) to Hacker News. Glad to see it taking off.

Also, I didn't see it mentioned, but I found that I needed to use SSL to get Chrome to not re-ask for permission after a brief timeout of silence. I haven't had a chance to test out their implementation, but my guess is that SSL is a big unstated requirement for decent usage.


The SSL requirement is built into Chrome; it's one of the security requirements for safe microphone usage. Annyang and anything else using the microphone will always be subject to this requirement.


I remember, in 2006, doing a full multimodal (touch, mouse and voice) web application with W3C standards: VoiceXML, XHTML and a glue dialect called X+V (meaning "XHTML plus VoiceXML"). The browser at the time was Opera, and the voice recognition was handled client-side, with a free IBM ViaVoice plugin that Opera could download with a single click. It already had many things built in to sync voice events to DOM events. It's nice to finally see it embedded in major browsers. It is not clear to me, however, whether it works offline or uses a web service.


This is really awesome. Sometimes I get annoyed hunting for settings or controls on a website to perform certain tasks, or having to go through lots of clicks. This API can certainly make that easier, and fun too.

If it can do something like "Facebook deactivate my account with reason its temporary" then this lib can certainly redefine user experience too.

Looking forward to integrating it into my app soon.


The first few queries worked, but when I told it "Show me tacos", it honestly responded with "Searching for porn...".

Either way, interesting idea - I'm curious how this might be used not only for interacting with websites in novel ways but for ADA purposes where the user has difficulty controlling a physical input device but can still speak.


You weren't looking for pink?

On a more serious note, I still wish we had things like this combined with the intelligence and semantic deciphering that Ubiquity did [1]. I'm disappointed that Ubiquity didn't go further.

[1] https://blog.mozilla.org/labs/2008/08/introducing-ubiquity/


Love seeing this. The other day there was a thread about how there is so much focus on supporting IE browsers with relatively low market share, yet not much on accessibility issues within all browsers.

I think this is a great step towards highlighting those issues, and offering up a solution to address some of them programmatically.


I really hope they stop development at version 0.7734.


Why?


Arrested Development reference -- 07734 => hello => Annyong


Chrome on Android 4.4.2 asks for permission to use the microphone and plays the "listening" beep a la Google Now, but does not seem to respond or recognize anything. Is anyone getting this to work on a mobile device?


I'm not in a place where I can try this right now, but how does this work? Is it constantly listening, or do you need to trigger / toggle on/off the listening function?


It uses the Webkit Speech API[1], and (from what I can tell) it's constantly listening[2]

[1] http://updates.html5rocks.com/2013/01/Voice-Driven-Web-Apps-...

[2] https://github.com/TalAter/annyang/blob/master/annyang.js#L9...
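
For anyone wondering what "constantly listening" can look like in practice, here is a rough sketch. It is not annyang's actual code. It assumes a recognizer with the `continuous` flag, `onend` handler, and `start()` method that the Web Speech API's recognition object exposes (Chrome ships it as `webkitSpeechRecognition`). The `RecognizerLike` interface and the `keepListening` helper are made-up names for illustration:

```typescript
// The subset of the speech recognition object this sketch relies on.
interface RecognizerLike {
  continuous: boolean;
  onend: (() => void) | null;
  start(): void;
}

// Keeps a recognizer running: asks for continuous results and restarts it
// whenever the session ends (for example after a stretch of silence).
// Returns a function that disables the auto-restart; it does not abort a
// session that is already in progress.
function keepListening(rec: RecognizerLike): () => void {
  let active = true;
  rec.continuous = true;
  rec.onend = () => {
    if (active) rec.start();
  };
  rec.start();
  return () => {
    active = false;
  };
}
```

In Chrome you would pass in a `webkitSpeechRecognition` instance. Restarting after each session ends is also where the repeated permission prompts can come from if the browser does not remember the microphone permission for the page.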


Does the web speech API expose the raw audio feed?


Interesting... there's a project I was thinking about that this might work for. Or at least, I can learn a lot from their code!


[deleted]


"https://www.talater.com wants to use your microphone." Allow / Deny

Using Chromium


Yup, that's the HTML Speech Input API asking for permission to access your microphone in order to enable the voice recognition.


Yeah, the parent [now deleted] was wondering why random websites can access his microphone; apparently his browser didn't ask.


The Arrested Development reference in the footer makes me wonder if the name ought to be Annyong instead.


Perhaps they named it for Annyong but changed it for the easier search results?


It sounds like "hello" in Korean, except that it's commonly spelled Annyung or even Annyong.



