
I understand why people use it but as a recipient I really hate voice messages. Here in Spain they are becoming more common too.

The problems are as the article mentions: difficulty searching back, not being able to use Google Translate (I'm still learning Spanish), and the time it takes to listen to them. Sending a voice message is faster than typing for the sender but much slower than reading for the recipient, which makes it especially annoying in group chats, where you end up forcing this extra time on all participants. For this reason I don't think it's very social.

And then there's the issue of listening to them in public, having to dig around for headphones if you don't want to bother your surroundings.

So in general if someone audio messages me I just ignore it until they type their message.

In most languages swipe typing is not a lot slower than talking anyway. And if people really want to talk they can use speech recognition.

I run all my chats through my own Matrix server (with bridges) and I was actually thinking of making something that automatically tries to transcribe them. The problem though is that I speak 3 languages :)




> In most languages swipe typing is not a lot slower than talking anyway. And if people really want to talk they can use speech recognition.

Not really, I disagree with "not a lot slower". It is much slower and much more frustrating than talking when you have to redraw a word for the fifth time and the keyboard insists on "Chegamiknit" when you want to write "Champagne".

Anyway, it's not an all-or-nothing thing. Sometimes an audio message is faster, more convenient and more appropriate for both parties, because tone and articulation convey meaning that text does not. Edit: and when you know the other party can act on it. E.g. they can listen and can't talk back but can write back, or the inverse: you have to send them text messages because they can't listen (rushing to catch a metro or get to their car), but they can read, and they can talk back but can't write fast enough (because they're rushing to catch a metro or get to their car) (yeah, it happened :). So, use when appropriate?

I also suspect voice message usage differs across cultures, subcultures, groups of peers, and contexts.

It's also a different way to be with the other person, sharing an audio space. Of course there are limits.


>is much slower and much more frustrating than talking when you have to redraw a word for the fifth time and the keyboard insists on "Chegamiknit" when you want to write "Champagne".

Don't use autocompletion then. You can abbreviate or shorten some words too. Champgn and so on.


> Don't use autocompletion then.

That just negates the swipe typing argument from parent for text not being much slower than talking.


> don't want to bother your surroundings

All the apps with voice messages that I have used support playback through the earpiece speaker if you hold the phone to your ear as if you were on a call.

But I don't really remember how I discovered that, and it seems that the majority of users don't even know about this feature.


Depending on how loud that speaker is, people around you would still be able to hear it, even if it's up to your ear. Years ago when I used public transport more, it was almost daily that I'd be able to listen in on people's phone conversations because of how loud their speaker was, even when not using the speakerphone setting.


On my Android (not even particularly recent) the phone earpiece speaker responds to the volume buttons when I'm in a phone call or use it to join a Discord (group) voice chat. The latter works very well with voice activation, unless I'm in a loud place, in which case the incompletely filtered background noise causes intelligibility problems when people talk over one another.


I'm sure it does, but I speak from experience where there wasn't much of a difference in volume between the earpiece speaker, and the speakerphone. Most people had their earpiece loud enough that if I was close enough I could hear it just as loud as when they used speakerphone mode.

With a lot of (cheaper) phones the speaker is also terrible enough that the different modes don't make much of a difference.


Yeah I know that's an option too but it's annoying... It means it's even more time-consuming because it ties up one hand as well as my attention for the duration of the message.


I feel the same about consuming any kind of information. Most of the time, a well-written article trumps a podcast or video 6:1. It's the same for a text message vs. a voice message.


I explicitly avoid video tutorials and explanations unless there is literally (lol) no other option. It's almost never the appropriate medium -- not searchable, not indexed, and rewinding to repeat earlier information is a terrible experience.


I have exactly the opposite preference. I find I learn much faster from video tutorials and actively seek them out.


None of that is related to the parent's comment - even if you prefer them, they're still not searchable or indexed, and rewinding is still a bad experience.

Given that many people (including GP and me) prefer document content (not "text", because you can put interactive tools and diagrams in documents but not in text) and it has all of those advantages, it seems pretty clearly the superior medium.

Plus, human brains are notoriously bad at introspection, so I wouldn't be surprised if you don't actually learn faster from videos, and just prefer them because they're e.g. more engaging.


I don't understand what is meant by "not indexable". Many YouTube videos have timestamps embedded in the timeline, and you can also link to a video at a certain timestamp; it's only a right click away on desktop.


"Indexable" here means that the content is made accessible in a form such that it can be indexed for searching. You can't search through a YouTube video, and this makes them significantly worse than textual documents that can be.


The parent's comment referenced preference, so it does relate to that comment.


I wrote the above post and I agree 100%

My work has been trying to promote e-learning with TED talk-like videos and I absolutely hate it. It's not just about the speed.

I can learn so much quicker with text. I can skim through the parts I already know and spend more time on the parts I need to carefully consider. With video I need to skip around and it's hard to keep track of what's being discussed then.

I think it's really the younger people in the organisation who ask for it, because they're used to YouTube. I rarely watch YouTube, probably once a month or so. That makes sense, but they try to push it on everyone by setting a goal of so many videos to do.

However, even when I was young I thought that classroom teaching was inefficient and that I could learn much better by myself. One of the problems (also with video) is that you have to slow down to the speed of the slowest participant.


For computer stuff text is usually better than video, except maybe for how to do complex stuff in some desktop application, or for game walkthroughs. But for learning how to put up a fence a video is so much better than text.


> I run all my chats through my own matrix server (with bridges) and I was actually thinking of making something that automatically tries to transcribe them. The problem though is that I speak 3 languages :)

I'd be very interested in something like this! I also deal with 3 different languages on a daily basis in different group chats, and many people use voice messages. If you get started I'd for sure contribute to your effort, as having to switch from "reading" to "listening" so often really sucks, and it would save me a lot of time if I could have them transcribed.


Thanks, I had it on my list to investigate, will let you know!

Part of the problem is that I'd like to use a service that isn't too privacy invasive. Unfortunately I doubt there is one that's good and doesn't do that :)


WeChat handles this really well in Chinese. It will transcribe voice messages silently complete with suitable emojis if the message is particularly emotional.


That's super impressive, and extremely useful!


When sending audio messages you are exchanging your convenience for the inconvenience of others.

I just refuse to listen to any and don't receive them any more.

Yes, there are valid use-cases for one-way voice messages. There just aren't many of those.


> When sending audio messages you are exchanging your convenience for the inconvenience of others.

This is exactly what I wanted to say in my post, but perfectly phrased and to the point. I'll remember this one if it comes up again, thanks!!


I agree, I disabled my voicemail a decade ago for this exact reason (it's a lot slower and more of a faff if you're in public) but unfortunately some people in my life are adopting voice messaging and there's not a lot I can do beyond a polite "I'd rather you didn't do that".

It's definitely a much worse experience for the recipient, I think new social norms of asking first are probably the best solution. I wouldn't ring someone who wasn't expecting a call without asking them if they were free to call first for example, and I think the same should apply to voice messages.


There are ways to regulate the burden.

If I can't privately listen right now, I don't, and if they get annoyed, I'll tell them why.

If something would require me to replay it multiple times to take notes, depending on the situation I'll tell the sender to type it up.

It isn't a matter of trying to punish, it is negotiating preferences. Send me something I have to interact with, and it may well take me some time to get to it. This is no different than emailing me a PDF.


Phew! For a split second, I thought you were suggesting that the government should regulate this.


> It's definitely a much worse experience for the recipient, I think new social norms of asking first are probably the best solution. I wouldn't ring someone who wasn't expecting a call without asking them if they were free to call first for example, and I think the same should apply to voice messages.

True, the latter is definitely a good change. I noticed it at work too: people don't call without asking first. It's really a great norm because you can say "just give me 5 minutes to complete this" when you're concentrating.

Of course it was never a norm before because text messaging didn't exist but it makes sense.

It would be great if people started doing that too for voice messages.


I used to hate voice messages, but I no longer think it is that black and white.

What I still highly dislike are voice messages without prior context or prompt. E.g. just receiving a message out of nowhere, not having any idea what it is about, and having to dig around for headphones just to find out that it says: "hello! how's it going?"

On the other hand, if I ask a person a question that can't be answered in just a sentence, I'm perfectly fine with the other person quickly recording a 1 min voice message instead of having to type it out.


That is when you undermine the concept of voice messaging. "hey I got your message but it was all distorted, could you txt me the details"


Love this.


I'm really surprised that no chat apps seem to auto-transcribe voice messages.

It would seem obvious for the transcription and the sound file to be available to both sender and receiver of the message.

I think so far the sticking point is that message transcription is still hard to do with any accuracy on-device, and if you use a server then you can't claim e2e encryption.


WeChat has been doing it for quite a while, and does a great job. Almost certainly server-side and obviously very far from E2EE though :)


I agree. I don't like receiving more than 1 voice message per week from a person. It's terrible when they start sending voice messages as 50% or more of their messages.

- Text messages require brevity.

- Depending on who you're talking to, voice messages ramble on and on without getting to the point. As a result, it can take minutes to understand the main point, rather than seconds as with a text message.

Just learned via another comment that playback can apparently be sped up to 1.5x or 2x. That's good, I'll have to look into that.


I'm also not a big fan of voice messages in chats, but got used to them. Lately, speeding up playback (the "1.5x" or "2x" in WhatsApp) has been a true blessing on the receiving end.

Voice messages are also useful when writing is not physically viable (e.g. while cooking). I even tend to prefer them when I want to develop an idea while delivering the message and have the development process itself recorded as part of what I'm sending.


I tried the speeding up but it feels so rushed. The same way people always seem to be in a rush to pack their stuff in Aldi/Lidl stores (because the staff are always in a hurry). It feels like pressure and I avoid it for that reason.


I use Telegram with Voicy with my friends, it's pretty decent at transcribing voice messages from multiple languages. You can even hook it into the Speech-to-Text service from Google Cloud (however the integration doesn't support custom parameters).


One thing I really hate about WhatsApp voice notes is that they give the sender a "seen" indicator. My privacy settings disable all of the "seen" and "currently active" features, and I'd really love it if the voice notes would just respect that.


> So in general if someone audio messages me I just ignore it until they type their message.

This should be the norm for everyone.


Yes, I know, no market, no interest in it, but there is this: https://en.wikipedia.org/wiki/Literacy

It correlates with the wealth in a country. An-alphabets will prefer voice chat.


Is that why people from the marketing department would rather call IT people on the phone than type in a messenger? Or, god forbid, use email.


Very likely, the answer to your question may be close to a Yes


Nitpick: Perhaps you meant "illiterate"? I guess you were thinking "analfabeta" :)


Indeed, but the article in particular mentions that this was originally the suspected reason but turned out not to be the case.


I think on-device transcription is getting good enough that hopefully it catches on. I've been impressed by how well it works on cheap Android phones for English. I have no idea how it is progressing for other languages, but I have to imagine it will; there is still a question of whether senders prefer voice snippets for some other reason. Transcribing on the receiving end could work too, but unfortunately has disadvantages like not having access to the sender's dictionary and corrections.


Spanish seems to work quite well also. I know a couple of people who dictate their messages regularly. I even had a situation where for the life of me I couldn't understand a local because of his very strong dialect (I'm a fluent speaker, but not native). After some attempts he got frustrated, picked up his cheap Android phone to talk into, and the transcription worked perfectly.


Don't search around for headphones, just put your phone to your ear and it will play through the earpiece as if it were a phone call, not through the speakerphone.

Also, WhatsApp and Signal allow you to play the voice note at faster speeds, like 1.5x.


More convenient than voicemail at least. No waiting to dial in to a phone system or needing to remember a PIN you hardly use.



