Google launches voice typing in Google Docs (googledocs.blogspot.com)
331 points by nandaja on Feb 25, 2016 | 130 comments



It's pretty impressive. I haven't tried anything like this for quite a few years, not since I last had a quick play with Dragon Naturally Speaking.

It's getting PRETTY close, with a basic cheap laptop microphone, my Australian accent, and no training.

But I did say pretty close. Not exactly right. And I'm only speaking a few sentences.

I read a passage aloud, and apart from not quite getting everything right (but getting some remarkably difficult words/names correct), the thing that struck me is how the written word, and the art form of it, is in fact quite different from a stream of spoken words. The little sigils and the breaking up of a written text convey a whole bunch of meaning and subtlety that a word-for-word stream doesn't quite capture.

For me, it's just inaccurate enough to still be irritating. Everything's working fine, and then I hit a mis-dictated word, have to do a bit of a double take and ask myself "What the... what on earth does it think I'm trying to say here... oh... I said THAT!", which breaks the flow of anything but the most basic and primitive sentence.

But then that makes me wonder, you know what... how many humans can actually follow everything I'm actually saying? It's not like I can peer into their heads and verify the transcription of voice-to-words inscribed on their brains.


Humans don't follow what you say :)

On this topic: I have never managed to get my typing above 60 women because that's about as fast as I can think ahead while writing.


Autocorrect misinterpreting what you intended to type is a delicious irony in this context!


I wouldn't complain if you can type as fast as 60 women! ;)


Lol. Got to love the auto correct of wpm to women :)


Indeed, he probably utilizes superior techniques with multiple keyboards simultaneously!


Also known as the "room full of typewriter-armed secretaries" technique?


Hahaha, this made my day. 60 women indeed.


Dragon Naturally Speaking has been great for years. Developers have even used it to cobble together voice programming solutions:

https://www.extrahop.com/community/blog/2014/programming-by-...

http://ergoemacs.org/emacs/using_voice_to_code.html

I've got more notes on Github:

https://github.com/melling/ErgonomicNotes/blob/master/progra...

We really seem to be at the point where voice is ready to become part of our daily use once a real API becomes available. People who use Dragon, for example, need to use the Windows version because of the hooks to Python.


Wasn't there a case where the core engineer on Dragon was screwed out of being able to continue working on the product / in the field?


Dragon was sold to L&H, which had been engaging in fraud and went into bankruptcy; some of the technology was apparently sold to Visteon, some to ScanSoft (which then acquired Nuance and took on its name).

The founder, Jim Baker, lost out and unsuccessfully tried to sue Goldman Sachs (who had been advising him).

And yet, even today what happened next to the Bakers seems remarkable. With Goldman Sachs on the job, the corporate takeover of Dragon Systems in an all-stock deal went terribly wrong. Goldman collected millions of dollars in fees — and the Bakers lost everything when Lernout & Hauspie was revealed to be a spectacular fraud. L.& H. had been founded by Jo Lernout and Pol Hauspie, who had once been hailed as stars of the 1990s tech boom. Only later did the Bakers learn that Goldman Sachs itself had at one point considered investing in L.& H. but had walked away after some digging into the company.

- a 2012 NY Times article "Goldman Sachs and the $580 Million Black Hole" http://www.nytimes.com/2012/07/15/business/goldman-sachs-and...

recent discussion here: https://news.ycombinator.com/item?id=10994707#up_11014194


> I read a passage aloud, and apart from not quite getting everything right (but getting some remarkably difficult words/names correct), the thing that struck me is how the written word, and the art form of it, is in fact quite different from a stream of spoken words. The little sigils and the breaking up of a written text convey a whole bunch of meaning and subtlety that a word-for-word stream doesn't quite capture.

Worth noting that kids who start using this young will have much less trouble adapting their speech for automated transcription. Likewise, adults were often drawn to the "natural-language capable" Ask Jeeves, but kids just searched AltaVista using grammatically bizarre but effective terms.


> But then that makes me wonder, you know what... how many humans can actually follow everything I'm actually saying? It's not like I can peer into their heads and verify the transcription of voice-to-words inscribed on their brains.

As far as I know, I have good hearing, and I often struggle to understand people. It would be interesting to see what the human error rate for transcription is.


I have historically done my own transcription for closed captions on my company's YouTube videos (we have a policy to never post a video without captions, as accessibility matters to us), but recently started outsourcing it to a friend on fiverr. We're close and she knows my accent well, and hears me talk about the subjects of the videos now and then, but even she has to verify some stuff with me along the way. I don't have an exact number for errors, or how often she has to verify words and phrases with me, but it's higher than I would have expected.

I also recall when I worked in television, we'd have someone come in to do captions for the Rockets basketball games...they made a lot of mistakes. Maybe every other sentence had a mistake in it. Partly this was the speed at which they had to type (using a Stenotype style device with chording keyboard), but I suspect they were also mis-hearing things quite often.

In particular, technical terms (whether for computers or basketball) are elusive if you don't speak the language. "WYSIWYG" was a tricky one, in one of my videos.


Do the voice commands work for you? I just read it a letter from my insurance company and it was not far off word perfect, including URLs and email addresses. I can't get most of the voice commands to work though, which is pretty annoying. Punctuation and new paragraphs work, but none of the new commands such as deletion and formatting. It says English-only, but does that mean US English? I'm UK English.

"delete delete last word delete line select line delete paragraph oh I give up"


Not accurately enough (given I haven't read much of the documentation).

I verified that it was kind of picking them up, but it did this weird thing where, if I remember correctly, I managed to insert a period, but then it took it out after I kept talking?


Curious, are you using the English (Australia) setting?


Yep, I used the English (Australia) setting. Though I wonder whether it might be an issue with the types of Australian accents as well.

We don't have the accent diversity of the Brits/North Americans, but we do still have 2 or 3 kinds, with some local variation.

Plus I know some Australians have asked where my accent is from (err...Australia?), so I accept the possibility that my own little idiosyncrasies might be throwing it off a bit, even though when I travel overseas and talk, I think I sound like Steve bloody Irwin...


Super impressive.

My wife just spoke full-speed Vietnamese into a Google Doc, 100% correct, even the tone marks.


Works pretty well with Hindi, too, neat!


Yup! Just tried. Now if only it worked in Firefox too.


Do you have to tell it what language you're using in advance?


Yes, there's a dropdown where you select the language you're going to speak.


Wow! I was just able to dictate English, German, French, and Russian digits, with only a few mistakes in Russian.

It's just another demonstration that Google and other similar companies are fundamentally on a different trajectory. Let's ride this rocket...


Is there something particular about Vietnamese that makes it easier than other languages for dictation? Like, does it lack homophones? Are letters always pronounced the same way? 100% correct sounds incredible.


> Is there something particular about Vietnamese that makes it easier than other languages for dictation?

Phonemic orthographies (these languages should work better): https://en.wikipedia.org/wiki/Phonemic_orthography

English is highly non-phonemic; the only languages I'd say are worse are French and Greek, with their silent letters and varied accents. In Modern Greek, /i/ can be written in six different ways: ι, η, υ, ει, οι and υι. Italian and most Eastern European languages would be the easier European languages for voice transcription.

This seriously reminds me of my days in college and grad school learning/teaching dead languages. For the dead languages (Latin, ancient Hebrew, Classical/Koine Greek and Aramaic), we really don't know how they were pronounced, so we just treat them as phonetic; that alone tells you we don't speak them correctly, since no language is 100% phonemic.

Vietnamese is interesting because there are several dialects that you'd expect would trip this up, but they don't appear to. https://en.wikipedia.org/wiki/Vietnamese_language


What other languages are like this?


The linked Wikipedia page on phonemic orthography uses Serbian as an exemplar. This section looks like what you're after - https://en.wikipedia.org/wiki/Phonemic_orthography#Compariso....


Vietnamese writing is pretty much phonetic. Technically, you could learn the alphabet and read Vietnamese almost perfectly without prior understanding of word pronunciation (obviously there are some exceptions, but there's much, much less variation than in English). Also, different regions in Vietnam pronounce certain alphabet letters differently, but within-region per-letter pronunciation is very similar.

I suspect Vietnamese tones (6 tones in total) give a stronger, more consistent, "orthogonal" signal that helps speech-to-text. For example, a word spoken with an "up" tone will always be spoken with an up tone. Whereas in English, depending on the word's position in a sentence, the speaker's emotion, etc., you might have great across-word tonal variation, even within the same speaker's multiple utterances of the word.

When you listen to Vietnamese it's very "short and choppy" with a lot of tonal variation. I suspect that the recurrent neural nets Google is training for speech recognition purposes have better inference over these separable, tonally-consistent utterances compared with harder to separate and highly variable English speech.


I first realized how much voice-to-text had improved while watching my three-year-old daughter search on my Android tablet... she can't spell (correctly) but used Google voice search quite effectively.

This should enable the very young, the very old, the disabled, etc. to digitize (and therefore share) their words, stories and worlds. Powerful.


I've had the same experience watching my (now 4yo) son voice-search on my phone. It picks up what he's saying very reliably... even when he's still laughing uncontrollably from the results of his last search :)

In case you were unaware, you can listen to past voice searches here: https://history.google.com/history/audio

Sometimes my wife and I will go back and listen to my son's searches for good laughs all around ;)


I'm sure Google is mining (anonymized) data from this -- I wonder if they can use it to improve the transcriptions for Youtube automatic captioning by seeing what kind of revisions people make to the voice transcription.


I've often wondered this but it feels like they're very siloed. Google Now seems to improve every year; it almost never gets anything wrong that I speak nowadays but just a few years ago it could get rough if I'm trying to get it to type a bunch of stuff.

But YouTube auto captions? They seem just as horrible today as they were years ago.


There are a lot of checks and balances internally re: who's allowed access to user data, even in cases where it seems logical & obvious. Regardless how it appears externally, Google does take data privacy & security very seriously (to the point where sometimes it's painful trying to get things done).


For talking heads speaking clearly in English, YouTube speech recognition is improving noticeably.


I recently started paying someone on fiverr to caption all of the videos I make. Costs practically nothing ($5 for five minutes plus tip), and is so much better than the YouTube auto-captions that it's not even comparable.


The Chrome integration is what's new, right? I have been using voice typing in Google Docs on my phone for a LONG time now.

In fact, when I wanted to use voice typing at my desk, I would just open the document on both the phone and in Chrome, dictate into the phone, and watch the text appear on screen. Then edit it from the keyboard.


Cool idea. You won't get voice commands though


This is brilliant. I will try it out.


I've been using Google Now as a voice reminder keeper for a long time!

There are cases where it can still miss by a large margin, but general usability has greatly improved over the past 2 years.

I would love to see this thing become smarter, so that it can detect when you are saying a list of things and then format it automatically. That would be truly AMAZING!


I am sure that the team who owns voice recog at G is working on it.

I wonder organizationally if it's structured that way, but I digress...

This particular integration is really impressive after some cursory testing.

If I were CTO at, say, Dragon... I would be having a sleepless night.


This had been announced back in September 2015, albeit from a Google Docs for Education point of view:

https://googleblog.blogspot.co.nz/2015/09/google-docs-classr...

Under the first video, see "With Voice typing, you can record ideas or even compose an entire essay without touching your keyboard."

---

It seems that what's new are the "Voice commands" to Select Text, Format, etc. etc. That's really great! Take a look at the commands here:

https://support.google.com/docs/answer/4492226


I can't stress how important this is for people like my daughter, who at age 13 reads at a Grade 3 level thanks to a genetic disorder that affects her learning. She's a great verbal communicator, but reading and writing are an immense challenge. Things like this will change her life and open up opportunities that were previously not open to her.


Voice dictation is HUGELY useful but few people use it anymore. Most of us tried Dragon in years past, it sucked, and we gave up.

I believe so much in this that I made a Siri->Windows bridge and dictate 80% of my text with it. (except that hash rocket) http://myechoapp.com/


David Pogue has been using Dragon for over a decade:

https://www.youtube.com/watch?v=x0GXX-SJuQM

John Siracusa used Dragon to write his 20,000 word Mac reviews.

Developers have even used Dragon to assist with programming: http://ergoemacs.org/emacs/using_voice_to_code.html

I think voice accuracy is there but we need more integration with our apps and operating systems. Consistency would help too. Google recognizes "undo" while Dragon recognizes "scratch that"


Point taken, but I would guess that they use nice headset mics and extensively train (and maintain, and carry around) their voice databases. With Siri you can get 99% accuracy if you dictate punctuation, and there's no training or fancy mics.


I don't believe you need to extensively train the voice database; most don't require any training to start. I don't think you need expensive microphones either. Maybe you should just read some of the links already posted.


Dragon strongly recommends training and using a good mic, but mine is good and cost 30 GBP, which I don't think is expensive in this context. Initial training takes about 15 minutes.


A very niche use-case, and an anecdote at that, but I'd guess every hospital radiology department in the UK employs at least one full-time transcriber who types up radiologists' recorded notes.

In the hospital I worked in, there were four full-time staff doing just this. Just in radiology.

When I had this job for a couple of months most of a decade ago, it was obvious what would eventually happen to those jobs.


I like the look of your product, but I think you are being disingenuous having this text on your main page "We don't store any of your words and your data never touches our disks.". I don't think you would put people off by making the Siri note in the privacy policy more prominent.


Good feedback. Not intending to be disingenuous. I'll talk to my partner about how we can make that clearer. It's intending to explain that our software (iOS, Windows, Cloud) never logs or saves any of the text.


Is there a technical reason this only works in Chrome? It wouldn't work for me in my main browser (Safari 9.1).


Safari doesn't support the getUserMedia/Stream API, so it can't work in Safari.


Thanks. A quick search shows the getUserMedia/Stream API is supported in Firefox, though, and people on this thread don't seem to be able to use it in Firefox (I haven't personally tested).


Hmm, then maybe it's using the speech recognition API, which is really only supported in Chrome right now (it's in Firefox, but behind a flag).

Although I just assumed that Google would be running the speech recognition themselves.
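To make the guess above concrete, here's a hypothetical feature-detection sketch (the function name detectSpeechApis and the fake window objects are mine, not anything Google Docs actually runs). Circa 2016, Chrome exposed both microphone capture and speech recognition (the latter under a webkit prefix), Firefox had getUserMedia but kept SpeechRecognition behind a flag, and Safari 9 had neither:

```javascript
// Hypothetical sketch: check which of the two browser APIs voice typing
// plausibly depends on are present. Pass in `window` (or a stand-in).
function detectSpeechApis(win) {
  var nav = win.navigator || {};
  return {
    // Microphone capture: modern mediaDevices path or legacy prefixed forms.
    getUserMedia: Boolean(
      (nav.mediaDevices && nav.mediaDevices.getUserMedia) ||
      nav.getUserMedia || nav.webkitGetUserMedia || nav.mozGetUserMedia
    ),
    // Web Speech API: unprefixed, or Chrome's webkit-prefixed constructor.
    speechRecognition: Boolean(
      win.SpeechRecognition || win.webkitSpeechRecognition
    )
  };
}
```

In a Chrome of that era, `detectSpeechApis(window)` would report both as true; in Firefox, only `getUserMedia`; in Safari 9, neither, which matches the behaviour people describe in this thread.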


Or Firefox?


Google seriously needs to build an Amazon Echo competitor NOW, using Nest (if that division can muster actually creating a new product) and its learning chops.

Google: please don't mess this up as a science experiment as you do so many products - build it for real.


Amazing work, it is good to see Google Docs moving forward. I have written plenty of essays in Google Docs in the past, hopefully this new feature is just what some have been wanting. Thanks for the great work guys!


Can't wait till Google starts data-mining background noises to "improve" your experience. Yes, they can do it with their other services, but this one is the most likely to provide the most personal information. It will also capture much more of it, since you will be running it over longer periods of time.


Looks like it's still not great for writing fiction. I didn't see anything for "Quotation Marks" for dialogue in their Voice Commands, and even if it did, I sure hope I wouldn't have to say "Quotation Mark What was that Question Mark Quotation Mark asked Billy Period. Quotation Mark I said Poop Nuts Exclamation Point Quotation Mark said Daniel Period."

This is much much slower than I can type. I know it'd be hard to infer punctuation, but even just a one syllable shorthand command for each common bit of punctuation would be nice (you can disable it by default).

I really want to use this stuff too, because it means I could write stories while exercising at home, which is apparently how Randy Pausch wrote his last novel (he dictated it and had someone else transcribe it... I'd like Google to do that last bit for me).


If you're willing to pay, Dragon Naturally Speaking has more features:

http://whatsnext.nuance.com/connected-living/thursday-tip-ho...

If you simply want a better free product, you'll have to wait a few more years. In the meantime, the solution will work well for tens of millions of people and Google can learn from them.


How are they nowadays? I tried their free demo app a while back and had to correct pretty much every third spoken word, using an annoying user interface to go back to my previous text, so it easily took less time just to type it in the first place.


Siri is powered by Nuance technology. They've been pretty good for a decade now. David Pogue and John Siracusa have been using it to do their writing.

http://arstechnica.com/apple/2013/10/os-x-10-9/23/


They have my emails, they have my documents, they have my pictures, they have my location, they have my browsing history, and now they can have my voice.

All that's missing is DNA, fingerprints, retina scans and non-verbal communication.

Oh wait, it's only a matter of time.


I was waiting for this comment. What's the alternative? No voice-recognition? Voice recognition that only runs client-side and lacks all the advantages a centralised service can provide? An equivalent provided by someone else that, for whatever reason, you trust more than Google?

Edit: I didn't downvote you, though, and I've upvoted to counteract whoever did. The point you're raising can clearly contribute to an interesting discussion.


> Voice recognition that only runs client-side and lacks all the advantages a centralised service can provide?

By the way, my impression from talking to someone in the know was that Google has been developing client-side voice recognition that is only about a couple of years behind their cloud voice recognition in quality. (The client-side version still requires huge amounts of voice data for training, but it can run offline on a cell-phone processor.) Right now, being two years behind in quality is quite noticeable, but that will change once quality plateaus near perfect, and client-side voice recognition will make sense again.

Just rumors though.


My prediction is client side voice recognition, as good as this, in >2 years.

Powered either by mobile GPUs and more efficient ML algorithms than we currently have, or by specialist chips like http://www.movidius.com/


I understand that my comment makes people uncomfortable, as in "ha, this is the typical anti-Google-paranoid guy". I respect that, and I used a specifically skeptical tone on purpose.

I love Google, they make great products, but at the same time I can't help being afraid of this marvelous monster.


Oh, I hear you. Part of me totally shares your concern, and, just like pretty much anyone else, I wouldn't want Google recording everything I say into my microphone. But that's pure speculation right now; let's wait until we know whether or not they're actually doing that, or until someone's at least gone through the Ts&Cs with a fine-tooth comb.



>What's the alternative? No voice-recognition? Voice recognition that only runs client-side and lacks all the advantages a centralised service can provide?

Distributed self-hosting would be far safer than having all of your most personal information owned by a remote, centralized authority.


The alternative would be for Google to take end-to-end encryption seriously, even for their databases, so even they can't access most of the information gathered from the users.

But they haven't even done that for Hangouts yet, let alone for their databases.


If the voice recognition were strictly client-side, I might actually be willing to try it. The advantages of a centralized service would appear to accrue largely to the centralizer.


Thanks for the edit. It is quite amusing to see the up and down votes going, my karma keeps moving but stays about the same. Maybe I should have been even more sarcastic!


This is what really concerns me, and why I've avoided using 'google now' on my phone. With this data, google could easily construct a 'voice signature' for every one of their users. And those voice signatures are just one FISA warrant away from falling into the hands of various national security agencies.

I would rather google had my DNA on file if I had to choose between the two. Why? Because voice signatures can be used to passively authenticate/identify someone without them even realising they've been authenticated/identified. You just need control of a nearby microphone while they happen to be talking. At least with DNA, I'd probably notice someone jabbing a needle in my arm or sticking a cotton swab in my mouth.


I could not envision living like Vincent Freeman (Ethan Hawke) in Gattaca, making sure you leave no trace of yourself anywhere.

If you are alive, you already leave traces everywhere. No need for a needle or cotton swab here.

Of course I am being paranoid, but we all should be :)


I must admit I don't do a morning scrub-down a la Vincent in Gattaca :)

It's true that DNA can also be 'non-obviously' collected. But there are two key differences between DNA and voice auth, when looked at from the perspective of a national surveillance agency:

1. 'Non-obvious' DNA collection occurs after the fact, and is much more costly. You have to send someone to the physical location, collect samples, and process them in a lab.

2. Because of (1), DNA could never be used as an 'always on' mass surveillance system. As opposed to voice auth, where you just need enough microphones, covering enough area, streaming data back to a server for processing. Sorta like how Batman finds the Joker in 'The Dark Knight' :)


Your actual voice is one regular warrant away, with a wiretap on your phone.


True, but that warrant can only be obtained from the judicial branch of government by law enforcement officers conducting a criminal investigation, and only if it meets the following criteria (among many others which I've excluded for brevity, see: https://www.law.cornell.edu/uscode/text/18/2518):

- a full and complete statement of the facts and circumstances relied upon by the applicant, to justify his belief that an order should be issued, including (i) details as to the particular offense that has been, is being, or is about to be committed, (ii) except as provided in subsection (11), a particular description of the nature and location of the facilities from which or the place where the communication is to be intercepted, (iii) a particular description of the type of communications sought to be intercepted, (iv) the identity of the person, if known, committing the offense and whose communications are to be intercepted;

- a full and complete statement as to whether or not other investigative procedures have been tried and failed or why they reasonably appear to be unlikely to succeed if tried or to be too dangerous;

- there is probable cause for belief that an individual is committing, has committed, or is about to commit a particular offense enumerated in section 2516 of this chapter;

- there is probable cause for belief that particular communications concerning that offense will be obtained through such interception;

- normal investigative procedures have been tried and have failed or reasonably appear to be unlikely to succeed if tried or to be too dangerous;

- there is probable cause for belief that the facilities from which, or the place where, the wire, oral, or electronic communications are to be intercepted are being used, or are about to be used, in connection with the commission of such offense, or are leased to, listed in the name of, or commonly used by such person.

- No order entered under this section may authorize or approve the interception of any wire, oral, or electronic communication for any period longer than is necessary to achieve the objective of the authorization, nor in any event longer than thirty days.

In contrast, FISA warrants are indiscriminate and the criteria for acceptance seems to be 'asking for one'. Of the 35,529 FISA warrant applications made up to 2013, only 12 were rejected.


> that warrant can only be obtained from the judicial branch of government by law enforcement officers conducting a criminal investigation

You seem to be laboring under the assumption that your phone company is more benign than Google, which is an assumption I am not willing to make (for Verizon or ATT among others)


I think you have more faith than I do in the process. Couldn't the relevant LEs pick & choose the most friendly judge?


Biometrics [1]. Googling for Google's biometrics privacy policy. [2]

HN likes to talk about the lay non-techie's disregard for technology's impact on our rights, but a grep of this thread turned up no hits on "biometrics".

[1]: https://en.wikipedia.org/wiki/Biometrics

[2]: https://www.google.com/?gws_rd=ssl#q=inurl:google.com+privac...


There's some conversation about this fairly far down the page. It would appear that the HN crowd are not even slightly concerned about the privacy implications of 'passive' biometric identification. Given HN is a fairly intelligent and tech-adept demographic, I suspect the general population would care even less. Pretty worrying IMHO.

Short of surveillance cameras in our homes, I can't think of anything worse for privacy than government coming into possession of passive biometrics like voice signatures (e.g. a Google voice DB dump collected through a FISA warrant). Just in case people genuinely don't comprehend the danger:

1. They allow a third party to identify you against your will.

2. They allow a third party to identify you without your knowledge that you have been identified/authenticated.

3. You can't revoke a voice sig credential like you can with a password credential. Once the government has it, they can do (1) and (2) at will for the rest of your life.

http://www.scientificamerican.com/article/biometric-security...


This thread doesn't hit on "chemtrails" either.


For obvious reasons.

Your voice is a biometric. Innovation is fine, but Google needs to address our digital rights.


Why isn't this just built into the browser or the OS?


Windows does have a built-in dictation tool; you can even use it from .NET (I am, for my side project).


I've often wondered about this. There are any number of desktop environments for linux, but I haven't come across any that are speech oriented.

One reason I can think of is that there was no incentive to build speech recognition because in most organizations that use computers, people talking to their computers is likely to disturb everyone around. But with more and more jobs becoming work from home, this may not be such a problem in future.


I wouldn't be surprised to see it shipping with Chrome sometime in the near future.


Dictation is built into recent versions of OSX.

https://support.apple.com/en-us/HT202584


This is totally cool. Their transcriptions are pretty good. Regular desk microphones (condenser) work just fine.

Thank you for posting this.


I would like to use this kind of thing even if I had to type the commands... anything to avoid mousing, and holding down arrow keys, and all that tedious stuff. Typing sentences is totally natural for me, I can do it all day. For years I've been dreaming of a renaissance of ed-like editors.


Welcome to the future (of 2013): https://youtu.be/8SkdfdXWYaI?t=9m5s https://youtu.be/8SkdfdXWYaI?t=16m22s

Don't forget to see the entire talk (28 minutes): https://www.youtube.com/watch?v=8SkdfdXWYaI


One important note is that your throat gets tired talking, just like your hands do typing.


I think it is an issue of convenience. Anecdotal evidence: I have never had trouble doing either of those for 8 hours straight (or more).

Voice will need some serious software support if it is going to take off. There is zero chance that users will learn those mnemonics, just as now almost nobody knows how to touch type. The usage will be more in the form of general commands, than specifying every action like we do it now with the keyboard+mouse.


Seriously folks, this is awesome. Does anyone have a recommendation of a USB microphone/headset that will work with Ubuntu so that I can start using this a bit more comfortably?

I did not know voice typing was possible before, this has made my day. All I need now is to invest in a decent microphone!


The Microsoft LX-3000 is unidirectional, has noise cancelling, and is cheap. Caveat: I haven't tried it with Ubuntu.


Get a Blue Yeti.


Blue Yeti is good (and not too expensive), but isn't it a bit overkill for voice typing?

If you generally feel like having a better microphone would help in other cases as well (Skype, vlog, what-have-you), then I'd agree with the above post.


Not available for me, menu item grayed out. Austria/Europe here. Language is set to English/USA.


I have that on Firefox; it works with Chrome.


Alright, it was with Firefox.


Am I the only one that finds speech to text cumbersome and more of an impediment than anything else?

I applaud the effort, but spreadsheets are difficult enough to navigate at the best of times, let alone trying to manipulate with your voice.


Depends on the context.

Running precise commands? Keyboard wins.

No keyboard? I love the voice search of my roku 4 over trying to search via virtual keyboard and arrow keys, or using my Amazon Echo to find artists/songs

Long text? While I prefer typing, I remember one author I heard (Kevin Anderson, I think) who dictates all his books (first drafts) into a mini-recorder and pays someone to transcribe them. I find that hard to comprehend, but I bet he's not unique.

Spreadsheets? Keyboard wins for data entry, but I bet voice would be convenient for analysis. "Sum column B" "What is the average of column D, excluding 0 values"
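A hypothetical sketch of how such spoken commands might map onto column aggregations. The command grammar, function names, and data shape here are all invented for illustration; this is not anything Google actually ships:

```typescript
// Toy spreadsheet: column letter -> values.
type Sheet = Record<string, number[]>;

// Parse a spoken command like "sum column B" or
// "average column D excluding 0 values" and run the aggregation.
function runCommand(cmd: string, sheet: Sheet): number {
  const m = cmd.toLowerCase().match(/^(sum|average) column (\w+)( excluding 0 values)?$/);
  if (!m) throw new Error(`unrecognized command: ${cmd}`);
  const [, op, col, excludeZeros] = m;
  let values = sheet[col.toUpperCase()] ?? [];
  if (excludeZeros) {
    values = values.filter(v => v !== 0);
  }
  const total = values.reduce((a, b) => a + b, 0);
  return op === 'sum' ? total : total / values.length;
}
```

For example, `runCommand('sum column B', { B: [1, 2, 3] })` returns 6. A real system would of course need a much richer grammar, but the mapping from utterance to operation is the easy part; the hard part is the recognition itself.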


This sounds like it could be a massive step up for disabled and low literacy people worldwide. The fact that it's in so many languages, and accessible anywhere with internet, really distinguishes it.


On the contrary: while I believe it will be helpful in a number of ways, orthography skills will likely suffer from lack of exercise as the computer takes on the task. Writing things out yourself helps a lot by making you think about the spelling.


Amazing, I always wanted a feature like this. Sometimes we're too tired to move our fingers to type something. I hope they make it as convenient as possible, like introducing voice typing in Google Keep, and creating a complete note just by saying OK Google.

Sometimes things pop up in mind suddenly and we need to note them down right away. This is definitely the coolest feature. On the other side, this is also helpful to people who face problems with typing, or who somehow can't type (a temporary hand injury, slow typing speed, etc.).


Is there any information about an API for this? I know that they abandoned their old Voice to Text API, which is sad, because there aren't that many alternatives out there. I'm using Watson right now, but it's really bad, and the language range is very narrow. I'm also open to any suggestions for Voice to Text online APIs or local server side solutions.
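One option that does still exist is the Web Speech API that Chrome exposes in the browser, which is presumably what Docs voice typing builds on. A minimal sketch; the event shape is simplified here, and the wiring is guarded so the transcript helper stays usable outside a browser:

```typescript
// Minimal sketch of Chrome's Web Speech API (webkitSpeechRecognition).
// Recognition itself only runs in a supporting browser; the transcript
// helper below is pure and runs anywhere.

interface ResultLike {
  resultIndex: number;
  results: { isFinal: boolean; 0: { transcript: string } }[];
}

// Collect only the final (non-interim) hypotheses from a result event.
function finalTranscript(event: ResultLike): string {
  let text = '';
  for (let i = event.resultIndex; i < event.results.length; i++) {
    const result = event.results[i];
    if (result.isFinal) text += result[0].transcript;
  }
  return text;
}

// Browser wiring, guarded so this file also loads under Node.
const g = globalThis as any;
if (typeof g.webkitSpeechRecognition === 'function') {
  const rec = new g.webkitSpeechRecognition();
  rec.lang = 'en-US';
  rec.continuous = true;      // keep listening across pauses
  rec.interimResults = true;  // stream partial hypotheses as you speak
  rec.onresult = (e: ResultLike) => console.log(finalTranscript(e));
  rec.start();
}
```

It's browser-only and Chrome-centric, so it doesn't help server-side, but for in-page dictation it's free and surprisingly capable.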


Regarding the multiple language support: an alternative use for the voice recognition is learning to speak another language well. I've already improved my native English by noticing which words get mistranscribed, assuming the problem is my imprecise speech and not the bot.
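That trick can even be automated: diff a reference passage you read aloud against what the recognizer produced, and collect the words that didn't survive. A sketch using a word-level longest common subsequence (all names here are my own invention, not any real API):

```typescript
// Lowercase word tokens only; punctuation differences shouldn't count.
function tokenize(s: string): string[] {
  return s.toLowerCase().match(/[a-z']+/g) ?? [];
}

// Words of the reference that the transcript dropped or mangled.
function missedWords(reference: string, transcript: string): string[] {
  const a = tokenize(reference);
  const b = tokenize(transcript);
  // Classic LCS table: lcs[i][j] = LCS length of a[i..] vs b[j..].
  const lcs: number[][] = Array.from({ length: a.length + 1 }, () =>
    new Array(b.length + 1).fill(0)
  );
  for (let i = a.length - 1; i >= 0; i--) {
    for (let j = b.length - 1; j >= 0; j--) {
      lcs[i][j] = a[i] === b[j]
        ? lcs[i + 1][j + 1] + 1
        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
    }
  }
  // Walk the table; reference words off the LCS path were mistranscribed.
  const missed: string[] = [];
  let i = 0, j = 0;
  while (i < a.length) {
    if (j < b.length && a[i] === b[j]) { i++; j++; }
    else if (j < b.length && lcs[i][j + 1] >= lcs[i + 1][j]) { j++; }
    else { missed.push(a[i]); i++; }
  }
  return missed;
}
```

For example, `missedWords('the quick brown fox', 'the quick crown fox')` yields `['brown']`: a word you might want to practice pronouncing.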


I tried it a few days ago and it was amazing, even in Spanish (latam accent) and English with my non-native accent.

Does anyone know if this is available through an API? I would love something like this when writing text in Emacs & org-mode :-)


This is really nice. I think laptops will need to start having array mics for better far-field capture.

Tested it on a quick todo list and it was about 95% accurate. I can see myself using this.


I suspect that array mics can be important for accurate voice recognition. My anecdotal impression, for example, is that the Amazon Echo is MUCH better than Siri on an iPhone at recognizing what I'm saying. I can tell the Echo, from across the room with music playing, to add something to a shopping list, and it gets things mostly right.


The menu item "Voice Typing" in "tools" is disabled for me, with no explanation or hint as to why.


This actually works; finally a 'machine' which understands my eastern European New York accent. Bravo!


dictation.io is another voice recognition system that impresses me, and it understands tons of languages.


It's the same Google ASR on the backend.


We may be programming by voice in the near future.


Hello (originally in Arabic: مرحبا)


Tried it with Romanian and got a bunch of random words... about 50% right, and the resulting text is totally unintelligible. I was expecting a lot more accuracy. Apple's built-in dictation does way better than this.


I bet it doesn't support Portuguese in all its variants.


It does support both "Brasil" and "Portugal". Have you tried it out? Does it work well in Portuguese?


Not in a place I can try it.

But usually such dictating software does a very bad job regarding Portuguese, regardless of the language variant.

Apparently I offended some people, thanks for the downvotes!


The offensive part is that what you're stating is something you haven't even bothered to examine before deciding to write down an opinion.


Because in 30 years of computing experience, I am used to my mother tongue being forgotten by American companies when they announce voice dictation software.

I am not sorry if others cannot take criticism from those they are used to ignoring.


Even if you'd just asked the question "How good is Portuguese support?", whilst you probably wouldn't have received too many upvotes, I think you at least wouldn't have been downvoted into oblivion. Unfortunately, what might have been an important point will now just get buried.


If you consider that your comment is directed at the same company behind translate.google.com, which has (wonderful) Portuguese support, you'll probably see how thoughtless it was.

Heck, Google even has specific doodles for Brazil, and even has an office here in my city.

I'm a native Portuguese speaker and I'm not pissed off as you are.


Not offended. You made a baseless assertion. Have you actually tried it yet?


Even if what you say is true, there are a lot of Portuguese variants. You can't expect Google to support every one of them.

https://en.wikipedia.org/wiki/Portuguese_dialects


What fun:

I have recently been reading The Hobbit with my 6 year old son Benjamin stop he has become engrossed in the book somewhat and then just hearing about the Adventures of Bilbo Baggins and Gandalf as well of course as mine growing are a diary Norrie tilly tilly Bailey and wailing not forgetting the king Under the Mountain himself starring oakenshield Gollum Gollum



