Codec 2 700C (rowetel.com)
70 points by BuuQu9hu on Jan 14, 2017 | 19 comments



Adobe has demonstrated technology that converts text into somebody's voice from a few samples, so that the person can be made to say whatever you type. Are we not eventually going to use that approach for optimum voice compression, and just send text that is converted into voice at the other end? Given that most phone conversations are with people you have already spoken to, that would certainly be a solution for low-bandwidth voice comms, or at least an approach more viable today than previously.

But low-bandwidth voice comms research is useful on many levels. One aspect is that it tends to push latency down, and in some areas that is becoming more and more important.


Even if we assume that the sender's text-to-speech model is somehow already cached at the recipient, you still have to encode that "text" in a format that can handle both sentences whose pronunciation can't be deduced from regular written English (is "I read a lot" present or past tense?) and at least a decent amount of the intonation and pauses of the original speech.

You could probably use a phonetic alphabet with intonation data. That also completely solves the problem of words that are not in the system's dictionary. You still lose a lot of character (by not transferring breathing, laughter etc.), but you don't get much of that in these ultra-low-bandwidth codecs anyway.


Are we not eventually going to use that approach for the optimum voice compression and just send text which is converted into voice.

That's basically dictionary compression, except in this case the dictionary might need to be extremely large.


Got a link to that research? Sounds like doing key verification over the phone just became a bad idea.



Very cool achievement! IIUC, the author was targeting a digital replacement for analog SSB voice comms.

I don't expect to see widespread adoption in other areas though, such as cellular phones and similar. To my ears, I doubt the codec will compete on common voice intelligibility metrics, such as the Diagnostic Rhyme Test [1] or PESQ [2].

However, it may be an improvement over SSB voice intelligibility!

[1] http://www.dynastat.com/Speech%20Intelligibility.htm

[2] https://en.wikipedia.org/wiki/PESQ


Keep in mind that 700 b/s is an extremely low bitrate.

I worked on Opus, which bottoms out at around 6000 b/s. A typical IP+UDP+RTP packet has 40 bytes of headers, which with 40 ms frames is 8000 b/s of overhead. Cell networks will use header compression to reduce that, but even so. 8000 b/s is almost eleven and a half 700C codec2 streams.

codec2 is just in another league. It's seriously impressive work.
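
The header-overhead arithmetic above can be checked with a few lines of Python. This is only a sketch: the function name is made up for illustration, and it assumes one packet per 40 ms frame with no header compression. The 40-byte figure is taken from the comment (it matches IPv4 20 + UDP 8 + RTP 12).

```python
def header_overhead_bps(header_bytes: int, frame_ms: float) -> float:
    """Bits per second spent on headers when one packet is sent per frame.

    Hypothetical helper for illustration; assumes no header compression.
    """
    packets_per_second = 1000.0 / frame_ms
    return header_bytes * 8 * packets_per_second


# 40 bytes of IP+UDP+RTP headers, one packet every 40 ms (figures from the comment)
overhead = header_overhead_bps(header_bytes=40, frame_ms=40)

# How many 700 b/s Codec 2 700C streams would fit in that overhead alone
streams = overhead / 700

print(f"header overhead: {overhead:.0f} b/s")
print(f"equivalent 700 b/s streams: {streams:.2f}")
```

Under those assumptions the overhead comes to 8000 b/s, or roughly 11.4 streams at 700 b/s.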


I think the old Crichton novel, Congo, mentioned using a very low bitrate codec over a digital satellite link. So maybe something already exists off the shelf that one could try over amateur radio.


AFAIK there aren't any commercial codecs with such low bit rates, as there simply isn't the demand. The minimum bit rate of GSM-AMR is 4.75kbps. Some military codecs designed for use over HF are in the same ballpark as Codec2 - STANAG 4479 operates at 800bps. There is some academic research on voice codecs below 500bps.


Fun fact: export of very low bitrate (<2400 bps) codecs has been restricted under the same munitions regulations as crypto (and, when open source, is presumably subject to the same court outcomes in the US).


Gotta talk to those subs somehow; those ultra-long-wavelength comms aren't exactly high bandwidth. (I'm just making this up, but it feels plausible, at least as of a few decades ago.)


Well, I listened to the samples, and I've done a whole lot of SSB on HF over the past 40 years. The codec is a huge achievement, but most SSB signals sound much more natural. I'm not ready to say the codec is equally intelligible, but it certainly is good.

I think a better metric is spectral power for equal intelligibility. Take an HF channel with 2700 Hz bandwidth (nominal for SSB, although you can go narrower) and a given level of transmitter power, say 100W PEP on the SSB side. Then take 100W PEP and whatever error-correcting modem you can come up with that fits in 2700 Hz on the digital side. What does it sound like? What is the intelligibility comparison?

There are a lot of factors in play here: BER, power level, channel fading, etc. It's not clear to me that a system that is good when it works but suffers total dropouts if it loses bits is better than an analog system that suffers noise floor variations but at least gives you something.


This signal, with suitable FEC in a 2700 Hz bandwidth, should be copyable at a power level where, with SSB, you wouldn't be able to tell that someone was talking.

There have been some examples showing that for older versions of the codec running at twice this rate.

RE: fading, there isn't really any great way to fix that short of higher delays (longer FEC) or increasing the bandwidth to be wider than the fade-- there is no other way around the signal just not being there.



It sounds a bit like 80s-style speech synthesis:

https://www.youtube.com/watch?v=sV3pYZZ2jEw

(This example is the Dr. Sbaitso demo that shipped with Sound Blaster 1.0/1.5 or so and onwards.)


It'd be awesome if this could be encoded and decoded in JavaScript in real time, so you could do extremely low-bandwidth podcasts (1.3 megabytes per half hour) and play them in a browser or send them by e-mail. With HCCB color coding (Microsoft Tag) you could back that up on an A4 sheet and send it by mail!


It is even better; the ‘b’ is ‘bits’, not ‘bytes’.

700 bits/sec = 700 * 60 * 60 / 8 bytes/hour = 315,000 bytes/hour, or slightly under 308 kilobytes per hour, which is about 154 kilobytes for that 30-minute podcast.
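
The same calculation as a small Python sketch, for anyone who wants to try other bitrates or durations. The function name is hypothetical, "kilobytes" here means 1024 bytes, and the result covers the raw codec payload only (no container, framing or FEC overhead is assumed).

```python
def stream_size_bytes(bitrate_bps: int, seconds: int) -> float:
    """Raw payload size in bytes for a constant-bitrate stream.

    Hypothetical helper; ignores any container, framing or FEC overhead.
    """
    return bitrate_bps * seconds / 8


one_hour = stream_size_bytes(700, 60 * 60)   # 700 bits/s for one hour
half_hour = stream_size_bytes(700, 30 * 60)  # the 30-minute podcast

print(f"one hour:  {one_hour:.0f} bytes = {one_hour / 1024:.1f} KiB")
print(f"half hour: {half_hour:.0f} bytes = {half_hour / 1024:.1f} KiB")
```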


Wow, yes! And someone is actually offering codec2 audio files: https://lowbitnet.wordpress.com/audio-books/


The website is at 700b/sec too, it seems...



