I'm working on Roc, a toolkit for real-time streaming over the network. Among other things, it provides command-line tools and PulseAudio modules that can be used for home audio. It can be used with PA, with bare ALSA, and with macOS CoreAudio.
The main difference from other transports, including PulseAudio TCP and RTP streaming, is the better quality of service when the latency is low (100 to 300 ms) and the network is unreliable (Wi-Fi). The post explains why and provides some comparison, usage instructions, and future plans.
There is still a long way to go, and now we're looking for people's thoughts and feedback. Do you find the project useful? How would you use it? What features would you like to see?
It's very disappointing that 300ms is considered "low" latency. The circumference of the Earth is 40075km, and the speed of light in optical fiber is slightly faster than 2/3rds of that in vacuum, so it's physically possible to send a signal to any place on Earth within 100ms, and get a reply within 200ms.
IMO "low latency" should mean low enough that it's very unlikely to be noticed, which most musicians seem to accept as 5ms. (Theoretically, even audio delayed by microseconds can be noticed if mixed with the original signal because of comb filtering.)
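For anyone who wants to check the propagation arithmetic above, here is a minimal sketch. The constant and function names are made up for illustration, and it assumes a straight fiber run along half the circumference with no routing or switching delay, which no real network achieves.

```python
# Back-of-the-envelope propagation delay to the far side of the Earth.
# Assumptions: ideal fiber path along the surface, no equipment delay.

EARTH_CIRCUMFERENCE_KM = 40075.0
C_VACUUM_KM_S = 299792.458      # speed of light in vacuum
FIBER_FRACTION = 2.0 / 3.0      # assumed lower bound for fiber

def one_way_delay_ms(distance_km, speed_fraction=FIBER_FRACTION):
    """Propagation delay in milliseconds over the given distance."""
    return 1000.0 * distance_km / (C_VACUUM_KM_S * speed_fraction)

# The farthest point on Earth is half the circumference away.
antipode_km = EARTH_CIRCUMFERENCE_KM / 2
one_way = one_way_delay_ms(antipode_km)
round_trip = 2 * one_way

print(f"one way: {one_way:.1f} ms, round trip: {round_trip:.1f} ms")
```

At exactly 2/3 of c the one-way figure lands a fraction of a millisecond over 100 ms, so the "slightly faster than 2/3rds" part is what brings it under.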
Many audio streaming apps require 1-2 seconds of latency (especially on Wi-Fi); that's why I called the 100-300 ms range "low". 100ms is the minimum I've seriously tested on Wi-Fi so far. 300ms is, roughly, the maximum UI delay that feels acceptable (you press "play" and hear the sound).
300ms is still noticeably laggy when the audio is part of a video. Some media players can delay their audio to account for playback delay in the audio device, if the audio stack supports that. Does Roc support that, or if not, is it on your roadmap?
> 300ms is still noticeably laggy when the audio is part of a video.
Agree.
> Some media players can delay their audio to account for playback delay in the audio device, if the audio stack supports that. Does Roc support that, or if not, is it on your roadmap?
We have an open issue for implementing correct latency reports in PA modules. Once we fix it, players that support that feature should automatically start taking the latency into account.
Thanks for reminding me, I'll test this feature specifically.
Many years ago we did some unscientific testing of how latency affected a telephony application. Already at around 80ms there was a measurable impact on the dialogue, with callers interrupting each other's sentences more often. Even modern VoIP applications can still be problematic in this regard, and additional latency from the software stack wouldn't help.
I see you've got Opus on your to-do list. I would really appreciate that! I find Opus (appropriately configured) to be audibly indistinguishable from CD audio, and it would really help with the bandwidth requirements.
I've always been really excited by the possibilities implied by PulseAudio's network capabilities, but disappointed by their latency and bandwidth requirements. Roc + Opus would be amazing.
Check out https://github.com/eugenehp/trx for Opus streaming inspiration; I've played around with their code and found it easy to work with. Opus would be great with ROC because in case of buffer over/under runs the codec provides features to mask dropouts based on previous content. This is critical when using Wi-Fi.
> Opus would be great with ROC because in case of buffer over/under runs the codec provides features to mask dropouts based on previous content. This is critical when using Wi-Fi.
Are you talking about its PLC or FEC? I haven't tested it yet, and I'm interested in whether people are using both of them with music.
BTW it would also be interesting to combine our FECFRAME support with Opus.
Excellent; a few years ago I even started hacking on my own transport, very roughly as a PA module, but life got in the way and it never got very far. So I'm very pleased to see this great project. Thanks, and good luck!
Fun fact: end-to-end latency in midrange digital wireless microphone systems is 2.7ms [0]. People sometimes wonder why we bother with 5-6 figure specialty audio and RF gear when we could "just" use general purpose computers and WiFi. This is one of the reasons.
I've got a home HifiBerry streaming setup over Ethernet. I am using TCP streaming and the latencies are low enough not to be noticeable at all while watching YouTube or playing games and streaming the audio output to my speaker setup on the RPi.
1) Would this make any difference?
2) Does it currently support online plug-unplug the way RTP works without restarting pulseaudio?
If you have no issues with 1) latency, 2) packet losses, and 3) clock differences, there would be no difference, at least until Roc offers some new encodings.
(If you're using PA, it handles the clock differences for you. Its RTP transport sometimes behaved strangely for me, but its "native" tunnels handled it well.)
> Does it currently support online plug-unplug the way RTP works without restarting pulseaudio?
Roc sinks and sink inputs may be loaded and unloaded at any time without restarting PA. But there is no service discovery yet, which means that 1) when a remote sink input appears, the sink is not automatically added, and 2) when a remote sink input disappears, the sink is not automatically removed. (We will add this in upcoming releases.) Currently the remote sink input can appear and disappear at any time, and the local sink will just continue streaming packets to the specified address.
This is exactly what I did: creating an ALSA plugin and leveraging snd-loopback to pass PCM to a streaming process. I would be interested in incorporating your protocol into SlimStreamer. Currently it uses SlimProto, which is TCP based (so the sync part is a nightmare to get working at a reasonable level). How far are you with supporting multiple sampling rates and multiple receivers?
> How far are you with supporting multiple sampling rates
Roc currently supports arbitrary input/output rates but only a single network rate (44100 Hz). If the network rate differs from the input/output rate, Roc performs resampling.
We're now finishing the 0.1 release, and I was planning to add support for more network encodings, including more rates, in 0.2. Feel free to file an issue or mail us with a list of the encodings/rates you need.
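To illustrate what converting between an input rate and the 44100 network rate involves, here is a toy linear-interpolation resampler. This is not Roc's resampler, and `resample_linear` is a hypothetical name; linear interpolation is swapped in only because it is short, and a real audio resampler would use a proper band-limited filter.

```python
def resample_linear(samples, in_rate, out_rate):
    """Resample a mono signal by linear interpolation (illustrative only)."""
    if not samples:
        return []
    out_len = int(len(samples) * out_rate / in_rate)
    step = in_rate / out_rate  # input samples consumed per output sample
    out = []
    for n in range(out_len):
        pos = n * step
        i = int(pos)
        frac = pos - i
        # Clamp at the end so the last output sample has a right neighbour.
        nxt = samples[min(i + 1, len(samples) - 1)]
        out.append(samples[i] * (1.0 - frac) + nxt * frac)
    return out

# 10 ms of a ramp captured at 48000 Hz, converted to the 44100 Hz network rate.
captured = [float(i) for i in range(480)]
network = resample_linear(captured, 48000, 44100)
print(len(captured), "->", len(network), "samples")
```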
> and multiple receivers?
No support yet. If you use a multicast address, it would probably just work though.
Again, feel free to file an issue and describe what you would expect from such support. I'll be happy to implement it if someone needs it.
Another question is how Roc will interact with your sync part. How do you perform synchronization?
How did you measure latency? I'm thinking of contributing to a similar project that has 1-2 seconds of delay, but I don't know what tools I need to use to benchmark latency.
That small experiment in my post does not include a correct latency estimation. I just configured all three transports with the desired latency. Actually, I'm thinking about writing a tool for such benchmarks that will measure the overall latency (PA + network + PA).
I wish you luck with your project! Sorry if my comment was snarky, but a lot of modern software seems to essentially not care about latency or responsiveness, and it's something that bothers me more and more these days.
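One common way such a benchmark could work is to play a known noise burst, record what comes out at the other end, and find the lag that maximizes the cross-correlation between the two. The sketch below is not an existing Roc tool; `estimate_delay` is a hypothetical helper, and the "recording" is simulated by shifting the reference by a known delay.

```python
import random

def estimate_delay(reference, recorded, max_lag):
    """Return the lag (in samples) that maximizes the cross-correlation."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(max_lag + 1):
        n = min(len(reference), len(recorded) - lag)
        score = sum(reference[i] * recorded[i + lag] for i in range(n))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag

SAMPLE_RATE = 44100

# White noise makes a good probe: it correlates strongly only with itself.
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]

# Simulated capture: the same burst arriving 441 samples (10 ms) late.
true_delay = 441
recorded = [0.0] * true_delay + reference

lag = estimate_delay(reference, recorded, max_lag=1000)
latency_ms = 1000.0 * lag / SAMPLE_RATE
print(f"estimated latency: {latency_ms:.1f} ms")
```

With real hardware the recorded signal would come from a loopback cable or a microphone, and the result would include the capture and playback buffers on both ends, which is the "PA + network + PA" total.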