The Linux audio stack demystified (and more) (rtrace.io)
155 points by ruffyx64 6 months ago | 135 comments



I hear many people complaining (even here) about "the mess" of Linux audio.

First, in the article, [1] shows in one single diagram where the complexity is coming from; the audio system has to handle a good deal of different hardware on many different systems and also provide extra functionality for multiplexing, network features, wireless headsets and their codecs, etc. All this: open source.

Second: Linux is the only platform where everything works flawlessly for me right now. My Bose joins without any problems, switches to headset mode during Zoom calls, and switches back to high-definition audio otherwise. I can select a different sink, even a networked one, whenever I want the output to appear on a different device. macOS sometimes needs a reboot before the Bluetooth subsystem works again, what the hell.

And all this worked with PulseAudio, and now works with Pipewire, which is an even higher quality iteration of PA.

I don't complain. I wish macOS/Windows had as versatile, configurable, yet sanely-working-out-of-the-box an audio system as an off-the-shelf Fedora, or even freaking Arch Linux, has.

HTH

[1]: https://blog.rtrace.io/images/linux-audio-stack-demystified/...


I wholeheartedly agree, but the frustrations are quite understandable. If you want to do something somewhat advanced and have to care about JACK, ALSA, and then the Pulse or PipeWire layers, it's quite overwhelming (why should you have to know Linux audio history just to work with it?).

PulseAudio was quite buggy when Ubuntu deemed it ready for the general public. I kept using scripts to do what it could theoretically do for two or three years, until it didn't have any quirks anymore.

Now that it has been replaced by the PipeWire stack, audio bugs are back. For instance, on a laptop, if system audio is set to 'output' instead of duplex, switching between speakers and headphones does not work, and volume changes sometimes get stuck. And sometimes it duplicates the streams on another system, which can be solved by killing the daemon.

Even with this less stable state, I agree that this is nothing compared to how bad the situation is in the Mac or Windows world. In the professional environments I've been in, basically everyone has given up on the system switching their device parameters correctly. So calls often need a few minutes to reconfigure audio. Same with external screen switching.


For me it's been nothing but problems. The primary reason is that I'm probably doing things (almost) nobody else is doing, though I'd assume countless people are and that it should work fine. Once I hook up some MIDI devices, want things to be recorded, run a synthesizer stack through pw-jack, expect the MIDI clock to go down the USB bus, etc., all kinds of interesting behavior starts happening. It has even completely locked up my machine a few times. Like a hard freeze. I had Audacity just nope right out. It somehow corrupted the local configuration, and I had to blow it away to get it to start up again.

And then there's the whole new suite of programs you have to learn, whose implementations are constantly in flux, so the documentation isn't exactly accurate. pw-record, for instance: that "--list-targets" option it tells you to use is long gone (that princess is now in the "wpctl status" castle). You have to check the date of everything written online, because the month it was written matters. It's still far from great.
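For what it's worth, a sketch of the current equivalents (names checked against a recent PipeWire, but as noted, these tools drift, so verify against --help on your installed version; the node id below is a placeholder):

```shell
# List devices/nodes and their ids (the modern home of the --list-targets info)
wpctl status

# Inspect and set the default sink volume
wpctl get-volume @DEFAULT_AUDIO_SINK@
wpctl set-volume @DEFAULT_AUDIO_SINK@ 0.8

# Record from a specific node by id (57 is a placeholder from wpctl status)
pw-record --target 57 capture.wav
```

These are interactive commands against a running PipeWire session, so treat them as a reference sketch rather than a script.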

I used an Amiga about 30 years ago to do similar things. Now that was something that genuinely just worked. People are still using it. That's how functional it was.

But like with all these things, I should find the motivation to shut up the complaining and get cracking on the code to make it suck less.

When it comes to Linux and things being broken, your assumption about how many people have seen the bug, and how many are working on the section of code that causes it, is invariably an order of magnitude or two too high. That's why you can't find any fixes on the web. You're one of the first to see it. Exciting, isn't it?


It would be helpful if folks on here who say "works for me"/"doesn't work for me" would indicate which flavour/version of Linux they use.

Some distros don't have the latest pipewire stack, and you're left to fend for yourself, having to follow some incomplete or poorly written blog post to side-step what your distro does and put pipewire on top.

Me: Ubuntu 24.04 LTS, and happily using carla, ardour, lmms and a bunch of midi devices with pw-jack. (I'm aware pw-jack is not required anymore, but that was my old workflow, and it speaks to the backwards-compatibility that the stack offers.)


Understood but I thought it was clear at least from my post that this isn't a distribution problem. I've certainly tried the distribution version, building from source, and various other things.

I have a counter-request: It's really frustrating when people are like "It just works! I had no problem! It's so easy!" in response to someone who has clearly struggled to get things working. It'd be like someone telling you they just had a car crash from a mechanical failure and you responding "Well I didn't! I drove home just fine!"

Instead, if you're going to respond at all, something like "I'm sorry, don't give up. I hope you figure it out" would be nice.


Have you tried the Linux audio forums/community? I think the problem is that, in my experience, regular musicians will just say "you need to use Windows (or Mac) to do anything serious", so it's complicated to find enough people who are competent in both skill sets.


> Have you tried Linux audio forums/community

Actually, I've gotten everything to work adequately, but claiming the ALSA/Pulse/JACK-to-PipeWire transition wasn't just a new nightmare would be wrong, well, for me at least.

> "you need to use windows (or mac) to do anything serious"

Correct and maybe! Serious as in commercial or production? Yes!

Serious as in exploration in HCI and digital instrument creation and what kind of new sounds can come from that? Now we're back into Linux!

I try (poorly, in my opinion) to explore the uncharted; I'm not really looking to make a single penny.

Take for instance, the classic chaotic pendulum (https://m.youtube.com/watch?v=yQeQwwXXa7A), you can hook that up to an Arduino sensor pack and convert the values to midi notes that get piped through a synthesizer or they can be the filter control of the synthesizer.

How can the arrangement of the chaotic magnet surface affect the aesthetics of the sound?

For instance, if you hook the values up to a sequencer playing arpeggiators and limit the chord choice wisely, it kinda sounds like Bach. Especially if you do time dilation and don't try for things to be real-time.

Recording a composition is a sequence of 2D diagrams with time signatures.

Here's another one, this time with synesthesia. You take a number of sticky notes in various colors and aim a camera at a wall and then assign different roles and rules to the colors and their adjacencies and do a similar pipeline but this time you're playing a concert by sticking post-its to a wall on top of each other.

And yet a third: you take a couple-hour capture of rush hour from a freeway traffic camera, assign instruments to the lanes and scale signatures to their densities, and then you can hear an orchestration of Friday traffic.

In all these you're still "playing" music because you're taking an active role in a bunch of aesthetic decisions and constraints, it's just a new relationship.
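A minimal sketch of the first idea, quantizing a normalized chaotic-sensor reading onto a MIDI note number (the C major scale, the 0..1 normalization, the C3 base note and the two-octave range are all my own arbitrary choices, not details from the setup described above):

```shell
# Quantize a sensor value in [0, 1] to a MIDI note on the C major scale,
# starting at C3 (MIDI note 48), spanning two octaves.
to_note() {
  awk -v v="$1" 'BEGIN {
    split("0 2 4 5 7 9 11", scale, " ")  # C major semitone offsets
    steps = 7 * 2                        # 7 scale degrees x 2 octaves
    d = int(v * steps); if (d > steps - 1) d = steps - 1
    print 48 + 12 * int(d / 7) + scale[d % 7 + 1]
  }'
}

for v in 0.0 0.25 0.5 0.99; do to_note "$v"; done
# prints (one note per line): 48 53 60 71
```

A real pipeline would then wrap these numbers as MIDI note-on events and route them to a synth node.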


Do you have some repos and/or write-ups on these projects? That sounds pretty cool.


I really want people to do their own!

Maybe I'll do a write-up on the setup to make it easier.

Really, I look at things like musique concrète, the BBC Radiophonic Workshop, and John Cage and think that's where, a few decades removed, all the modern sound that's been so dominant for the past 40 years or so has come from.

I want to embrace the new and weird so 30 years from now I can help be a historical part of building whatever is coming next.

Tomorrow's all we got!


> which flavour/version of Linux they use.

> this isn't a distribution problem

Exactly. This all shouldn't be distro dependent when the only things involved are ALSA (kernel) and pipewire/pulseaudio.

Unfortunately, distros do "things" to both of them. Some default to dmix, some to direct hardware access with PW/PA dynamically spawned and thus getting exclusive access for a single Unix user while the others are SOL, and other painful conundrums, because they thought "we'll just do this and then it just works" for a single basic use case.
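For reference, the kind of distro-shipped dmix default being described looks roughly like this (a sketch; the device name, rate and IPC key are assumptions and vary per distro):

```
# /etc/asound.conf (sketch): route the default PCM through the dmix plugin
pcm.!default {
    type plug
    slave.pcm "dmixed"
}
pcm.dmixed {
    type dmix
    ipc_key 1025          # any unique key; lets clients share one mix
    slave {
        pcm "hw:0,0"      # first card, first device (assumption)
        rate 48000
    }
}
```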


I hope you figure it out :)

I think the stars just aligned for me, and it worked. Not an expert at all on these things.


Would you use a distribution that doesn't prioritize your intended use case?

Use a Linux distribution that is intended for professional audio use.

>Once I hook up some midi devices, want things to be recorded, run a synthesizer stack through pw-jack, expect the midi clock to go down the USB bus, etc, all kinds of interesting behavior starts happening.

On Ubuntu Studio [0]/[1] - this Just Plain Works™, you know. I've been doing exactly this for years on my Ubuntu Studio machine, and I just don't have any of the issues you've encountered.

[0] - note that ubuntustudio is a metapackage you can install on most Ubuntu instances, which will set up audio for professional use.

[1] - see also, Zynthian: https://zynthian.org/


I’d say a lot of the complexity comes from the fact that there are at least four things — Jack, PulseAudio, Pipewire and the ALSA userland stuff, which the article fails to mention — that try to solve more or less the same problems (probably less, in the case of the ALSA userland). Add to this the fact that everything can be, and often is, run as a compatibility layer on top of (or below) everything else, and the naïve user who just wants to get some music though the speakers can be excused for feeling a bit dizzy.


I'm sure the Windows audio stack is many times more complex, and it's pretty opaque.

OEMs also love to include all kinds of audio-related bloatware, which makes getting audio hardware to work reliably(!) quite challenging. The HP laptop I'm typing this on has an "Intel microphone array" (just 2 mics) with its own Intel drivers, but there's also some HP control panel, Realtek and 'Sound Research' branded stuff, Fortemedia SAMsoft effects(?), Intel Smart Sound...

If I'm recording seriously, I usually just go to the device manager (devmgmt.msc) and disable as much as I can and enable devices in a trial-and-error way to see what the minimum is to get audio to work again. Otherwise, all kinds of 'enhancements' end up in the audio path.


Intel Smart Audio is the worst; it takes over normal USB audio devices and does its own weird processing on top, sometimes resulting in "hilarious" crashes.

And on a corporate laptop you might not be able to disable its driver :/

Fortunately, it's too dumb to deal with devices that are behind more hubs than the root one...


If Windows could do what Linux can out of the box VoiceMeter would not exist.


I assume you meant VB-Audio's "Voicemeeter"? If so, yeah, that's solid software, and it's NUTS that Windows hasn't made it unnecessary yet.

(For those who may be reading this comment and wondering what the shit this software is, go give it a look. (I use it regularly and don't do Pro Audio stuff... I just want to be able to independently adjust relative volumes of groups of software (like Voice Chat and Video Games).) If the software's feature set looks intriguing, do give it a try and use Voicemeeter Potato. IIRC, all "flavors" of Voicemeeter have the same trial period, and I can't think of any reason not to use the "flavor" with the most knobs and interconnects.)


100% agree with you.

I have had a Linux-based DAW running in my studio, alongside the requisite MacOS and Windows machines, for decades now. It runs Ubuntu Studio, has superlative audio performance (72 channels of digital audio), and is a rock solid workhorse for doing large edits on tracks.

The key to it is in using Ubuntu Studio, which is a well-tuned distribution focused on superlative audio performance, and in choosing your hardware wisely. In my case, it's all PreSonus - because they have been Linux-friendly for a long time - and it easily delivers latency numbers that outperform even the Mac in the room.


That's the very first time I've read "sanely-working-out-of-the-box" about Linux audio. I have a very different experience, of course, such as "why do my BT headphones suddenly play everything at an 8000 Hz sampling rate when I never asked for this, and why won't the UI let me switch it back to 24 kHz?"


This happens exactly the same on Mac or Windows with BT headphones; it's simply a fact of Bluetooth. If at any point you open an app that accesses the microphone of the earbuds/headphones, the format will downgrade from the high-quality, playback-only BT profile to a low-quality duplex profile. There's no high-quality duplex audio profile in the Bluetooth protocol yet AFAIK, and certainly none implemented by any vendor. Just don't use Bluetooth if you care about sound quality is the answer.


Never had such an issue on macOS with the same headphones and the same apps running.


Sounds like HFP. For me on Ubuntu you can choose between the audio playback profile and HFP in the sound settings UI, but maybe you have an app running that is changing that setting.
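If it is HFP, the profile can also be flipped from the command line (the card name below is a placeholder from `pactl list cards short`; profile names are spelled slightly differently between classic PulseAudio and PipeWire):

```shell
# Find the Bluetooth card and its available profiles
pactl list cards short

# Playback-only, high-quality profile
# (PipeWire spelling: a2dp-sink; classic PulseAudio spelling: a2dp_sink)
pactl set-card-profile bluez_card.00_11_22_33_44_55 a2dp-sink

# Duplex headset profile with microphone (low quality)
pactl set-card-profile bluez_card.00_11_22_33_44_55 headset-head-unit
```

These run against a live daemon and Bluetooth card, so treat them as a reference sketch.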

(Apologies for injecting advice into a grumbling thread)


No need to apologize, advice is good. Even if for now I'll keep Linux to what it does best: servers.


BT headphones? BT is false dharma. Mobile shit. RF headphones work great on Linux because RF is true desktop dharma.


Well, I am glad it works for you. My Ubuntu setup at work, on the other hand, has recently picked up a habit of randomly switching to "Family 17h (Models 00h-0fh) HD Audio controller" which has nothing plugged into it instead of my actual headphones.


I'm trying right now to connect a BT speaker to an RPi, using mpg123 as the player. At least from the command line, it is anything but easy. And what I'm doing is not far from the most basic scenario.

Right now after coupling I have to restart mpg123.


I mean... it should work of course, and if it's not working for you then improvement is needed. But anything involving BT is most definitely not the most basic scenario. The most basic scenario is "plug speaker into sound card".


For me it all sometimes falls apart after suspend/resume, and I'm on a desktop. Other than that, I enjoy watching a movie with my wife through two pairs of Bluetooth headphones, with a USB-attached microphone input overlaid so we can hear if our daughter comes out of her room.


It's a mess not because it's complex, but because of bad design.


I think it's fair to say that your experience is extremely atypical.

PulseAudio consistently stands out as the most common frustration in all of Linux. systemd is pretty frustrating too, but it's only frustrating the 1% of the time when it breaks (because then your whole system is broken), whereas the 70% of the time PulseAudio doesn't work, it's more mildly infuriating.

For anything except a single audio input and a single audio output, I would not bet on a beginner-to-intermediate user ever getting it working, and PulseAudio is probably the main reason I, in 2024, cannot recommend Linux on the desktop for non-engineers.


I have been using pipewire for a few years already. Which distro are you using that is still using pulseaudio?

I can't complain, really; in the last decade audio has worked really well on my computer overall, bar the occasional Bluetooth pairing/connection issue. But I've seen people struggling with BT on all OSes/platforms anyway. PulseAudio was a PITA in the early years but matured a lot, and PipeWire has been flawless on my distro since it took over. And I can do a lot of stuff out of the box that requires third-party tools on other systems.


> whereas the 70% of the time PulseAudio doesn't work, it's more mildly infuriating.

I have installed Linux on I don't know how many systems at this point. My desktop, my work laptop, other people's work laptops, etc. Audio just worked flawlessly on each and every one. I cannot believe that PulseAudio has issues 70% of the time. There's simply no way I've been that lucky to have never seen an issue.


I get things like PulseAudio deciding only one ear is enough, haha, but emotionally I'm just grateful that it works at all sometimes, since I really like the idea of a Linux studio. Reboots help. I use a slightly older Ubuntu Studio; I should probably upgrade. Back in the day when I only had ALSA, I loved the experience.


The "mess" of Linux audio is due to ONE reason: single-client ALSA driver model.

Every other layer is a coping mechanism, and the plural and divergent FOSS community responds in various ways:

- JACK
- PulseAudio
- PipeWire

I am unclear why Jaroslav Kysela chose to make ALSA single-client, but Apple's CoreAudio multi-client driver model is the right way to do digital audio on general-purpose computing devices running multi-tasking OSes on application processors, in my opinion.

Current issues this article does not address that actually constitute large parts of the "mess" of Linux Audio:

- channel mapping that is not transparent nor clearly assigned anywhere in userspace. (aka, why does my computer insist that my multi-input pro-audio interface is a surround-sound interface? I don't WANT high-pass-filters on the primary L/R pair of channels. I am not USING a subwoofer. WTF)

- the lack of a STANDARD for channel-mapping, vs the Alsa config standards, /etc/asound.conf etc.

- the lack of friendly nomenclature on hardware inputs/outputs for DAW software, whether on the ALSA layer, or some sound-server layer. (not to mention that ALSA calls an 8-channel audio-interface "4 stereo devices")

- probably more, but I can't remember. My current audio production systems have the DAW software directly opening an ALSA device. I cannot listen to audio elsewhere until I quit my DAW. This works and I can set my latency as low as the hardware will allow it.

This is the thing: more than about 10 ms of latency is unacceptable for audio recording in the multitrack fashion, as one does.
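That 10 ms budget maps directly onto buffer size at a given sample rate: ignoring driver and converter overhead, one buffer of latency is simply frames / rate. A quick check at 48 kHz:

```shell
# One-way buffer latency in milliseconds at 48 kHz
for frames in 64 128 256 512; do
  awk -v f="$frames" 'BEGIN { printf "%d frames -> %.2f ms\n", f, f / 48000 * 1000 }'
done
# prints:
# 64 frames -> 1.33 ms
# 128 frames -> 2.67 ms
# 256 frames -> 5.33 ms
# 512 frames -> 10.67 ms
```

So at 48 kHz a 512-frame buffer already blows the 10 ms budget on buffering alone, before any daemon overhead is counted.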


> The "mess" of Linux audio is due to ONE reason: single-client ALSA driver model.

This is one of the major reasons why Linux accessibility sucks IMO.

Audio is one thing that you need to "just work™" if you want to get accessibility right, as there's no way for a screen reader user to fix it without having working audio in the first place[1]. On Linux, it does not "just work", and different screen readers have different ideas on how they want audio to be handled. In particular, the terminal Speakup screen reader (with a softsynth) wants exclusive control of your device through ALSA IIRC, while the Orca screen reader for the GUI goes through Pulse. That makes it impossible to use both of them at the same time.

[1] Well, you can sort of fix it by having a second machine and SSHing into the broken one, but that's not what I mean.


> the terminal Speakup screen reader (with a softsynth) wants exclusive control of your device through ALSA

If you have PulseAudio or PipeWire, they add a plugin to the ALSA library that reroutes audio to the audio daemon, so ALSA applications should work correctly.


I would be surprised if Orca used Pulse directly; it uses speech-dispatcher (IIRC), which then uses PA if configured that way.

Also, accessibility != audio. I, for instance, use Braille only; no need for speech synthesis. So equating accessibility issues with the crazy audio stack is a little bit too simple.


I mean... I've never seen a single audio issue on Linux. It does "just work" in my experience. I realize the people citing issues in this thread aren't just making shit up for the fun of it, but I think there's a lot of going too far and saying it sucks for everyone when it seems to work just fine for most.


I disagree.

Applications want to receive/provide a stream (X sample-rate, Y sample format, Z channels) and have it routed to the right destination, that probably is not configured with the same parameters. Having all applications responsible for handling this conversion is not doable. Having the kernel handle this conversion is not a good idea. The routing decision-making needs to be implemented somewhere as well. Let's not ignore the complexity involved in format negotiation as well.

The scenario of a DAW (pro-audio usage) is too specific to generalise from that. That is the only kind of software that really cares about codec configuration, latencies and picking its own routing (or rather to let the user pick routing from the DAW GUI).


> I am unclear why Jaroslav Kysela chose to make ALSA single-client, but Apple's CoreAudio multi-client driver model is the right way to do digital audio on general-purpose computing devices running multi-tasking OSes on application processors, in my opinion.

Because ALSA is a different layer in the audio stack than CoreAudio.

ALSA corresponds to MacOS drivers and I/O Kit.

CoreAudio (Audio Toolbox / Audio Unit) corresponds to Pipewire / Pulseaudio.

But on the Mac side everyone is OK with using CoreAudio (with the accompanying set of daemons), while on Linux, for some reason, everyone wants to go as low-level as possible, "just open the device file", and then wonders why something is missing. Because you skipped that layer, that's why.


> current audio production systems have the DAW software directly opening an ALSA device.

I mean, I remember this being the case for a very long time on Windows with ASIO too, which is the only reasonable way to run a DAW with acceptable latency there. macOS has multi-client, but I was never able to get latency as low as on fine-tuned Windows and Linux systems, and in the end that's what matters: you just use your motherboard's chip for OS audio and your pro sound card for the actual workload. PipeWire is very close to giving a good experience, but there'll always be some overhead. I'm making some art installations running various chains of audio effects on a Raspberry Pi Zero, and the difference between going through PipeWire, even if my app (https://ossia.io) is the only process doing any sound, and going straight to ALSA is night and day in terms of "how many reverbs I can stupidly chain before I hear a crack".


My PreSonus interface allows multiple applications to access it over ASIO simultaneously, while letting regular Windows audio through, at 16 samples of latency. ASIO does not mandate exclusive access; bad drivers do.


The single-client model is not bad, because it doesn't require the kernel to do the mixing and sample-rate conversion; these can be moved to userspace (which Windows does these days as well [1]). The less code in the kernel, the better.

[1] https://learn.microsoft.com/en-us/windows/win32/coreaudio/us...


Is this planned to be addressed/fixed? (single-client model) Maybe there were previous attempts?


No, because there's nothing to fix (at the system side).

Apps should use the right API from the right layer; when they skip something, no wonder they will miss whatever the skipped layer provides. When they do not need exclusive access to the device and want to play nice with the other apps, they should use pipewire/pulseaudio.

For 99% of apps, using ALSA directly is the wrong approach. You don't use IOKit directly in Mac apps either.


PipeWire/PulseAudio install a plugin for the ALSA library so that the audio of ALSA applications is rerouted to the audio daemon. So apps using ALSA can work on systems both with and without an audio daemon.


In this day and age there should not be a system without an audio daemon, at least not one that isn't broken. Apps certainly should not accommodate broken systems and force workarounds on the correct ones.


Here's my background

1. I have to modify my audio settings every time I start a call in Teams on Linux because it keeps losing my audio device.

2. In my audio settings UI, half the time I switch my devices the speaker test doesn't work.

3. In my audio settings UI, whenever I switch my mic I hear myself. The mic feedback only disappears 30 seconds after I close the settings UI.

4. My work headsets have a robotic sound (likely caused by an incorrect bitrate or buffer size). I can only use work bluetooth headsets via their dedicated dongle.

This was my default experience on a popular Debian-based distro. And it mirrors the general experience I see online. Things are unstable and a mess.

I started reading this article and it's embellished with phrases like "is a professional-grade audio server" and "widely used in professional audio production environments", and general language that sounds like a sales pitch. This does not fit with anything I'm familiar with.

I would have preferred a neutral and semi technical approach, with 10% of the buzzwords. As written, I trust nothing.


That would be your headset being in headset mode. I'm on Debian Testing, and I'm finally able to exit headset mode and use a high quality audio codec instead. I hope that's the direction Linux is heading.


Oh no, it's not. I've swapped modes more times than imaginable. And even if it were in headset mode, the quality would be unacceptable compared to what I was getting on Windows. I even manually tried to change the PulseAudio settings for that device, with no luck. And I don't feel like turning this thread into a debugging session. But, like, correctly figuring out reasonable bitrates should work by default.


1. is a known problem with Teams.


Yeah, Teams definitely shares in the blame there. That one hurts more, and I blame both Microsoft and Linux.


The Teams web app works very well in Chrome. I only miss being able to set a custom background to my video and popping out presentations into their own window. It seems much lighter on resources too, maybe because Chrome does more of the video encoding or decoding using hardware.


PipeWire has pretty much unified the userland Linux audio stack (and supports video as well, as a bonus). Kernel-side it has always been ALSA. There's TinyALSA so you don't have to use libasound to interface with kernel ALSA (userland ALSA is quite a PITA).


> Kernel side it has always been alsa.

Well, you probably didn't mean it in a literal sense, but it was OSS up to kernel 2.4, and 2.6 had both OSS and ALSA.


Whenever someone mentions Linux and audio, I always remember this image (made by Adobe, I think): https://harmful.cat-v.org/software/operating-systems/linux/a.... It is missing PipeWire, but it should be easy to add a dozen new lines. This is the reason why I simply use plain ALSA without any sound daemons.


That's about maximum mess historically but thankfully most stuff on that diagram is no longer present in a modern install. And that's before pipewire which has further unified the stack.


PipeWire hasn't really unified anything; it's just pw -> alsa -> hardware instead of pa -> alsa -> hardware for the common case.

And not much has fallen out of use. OpenAL, libao, JACK, PortAudio, libcanberra, GStreamer and Phonon, at least, are still widely used, and a bunch of others keep cropping up occasionally in cross-platform software.


My understanding is that pipewire finally unifies JACK and Pulseaudio. You no longer have to decide if you want a general audio setup or a low-latency one, there's a single audio server now that does everything well.

So from that list the unification is now:

- JACK and Pulseaudio are both replaced by pipewire

- OSS is long gone, ALSA is now the only low-level interface

- ESD, NAS, ClanLib, xine, portaudio, allegro and Phonon are not present in the Ubuntu install I just checked

Basically we're down to a unified stack that has BlueZ and ALSA to access actual hardware and pipewire as the single audio (and video) daemon. Everything else is either shims so apps don't need to change interface or cross platform APIs like SDL and OpenAL. We are much better than what this diagram shows.


There is still ALSA OSS emulation in the kernel; distros probably don't enable it, but I have had it enabled for years. All it needs is some ioctl system calls, which do everything for me internally. ALSA with just system calls and without libalsa can work for some cards, but it would be hit and miss. I like that I can use OSS in Go without C/cgo, i.e. https://github.com/gen2brain/oss.


On an infinite featureless plain with spherical cows, yes, in theory we've "unified" everything, as soon as closed source software stops existing (together with BSDs) and everyone rewrites their audio stacks everywhere. (Just like we had unified almost everything before pipewire was invented.)

In practice, OSS and Jack still stick around, as do portaudio, libao and others.


Aren't most of the libs you mentioned cross-platform or with a very specific use in mind which have direct analogues in Windows world?


Correct. You can draw just as messy a diagram for Windows. DirectSound versus OpenAL versus gstreamer(win32) versus jack(win32) versus XAudio2 versus DirectShow versus DirectMusic versus WDM Audio versus WASAPI versus UAA versus …

If there's any difference it's that under Windows, the only debugging and error handling you have is "lol reinstall all drivers and codecs and pray".


And then there is PipeWire with the PulseAudio plugin, and not to mention you can compile PipeWire with support for GStreamer, JACK, etc. I guess we can only remove the aRts and ESD daemons from that list, but the mess just adds up.


Well, we have had a lot of layers in Linux audio over the last 30 years. But when PulseAudio was forced into the world by a C-section, with everyone in the LA community already knowing it was a stillbirth, I kind of lost trust in coordinated project creation. PA makes me so unhappy that I totally uninstall it wherever I see it. Good for me that I am just a console user, because the damn beast is all over the GUI space.

Fact is, RT audio is hard, and the people behind JACK have cared about the underlying problems for a long time already.

Maybe PipeWire, but to be honest, it reminds me too much of PA.

I guess I will stay with plain JACK and SuperCollider as my toolbelt, and not care about PA or PW. Like the grumpy old hacker I am.


> ALSA is the core layer of the Linux audio stack. It provides low-level audio hardware control, including drivers for sound cards and basic audio functionality.

> ...

> Unlike PulseAudio and JACK, PipeWire does not require ALSA on a system; in fact, if ALSA is installed, the output of ALSA is very likely pushed through PipeWire

I don't get this part. If ALSA represents the kernel level hardware drivers for audio, how does Pipewire bypass it? Does it implement an alternative set of kernel drivers? I assumed Pipewire still relies on ALSA base.


ALSA has both kernel-space drivers and a user-space API layer.

I think what they're getting at is that PipeWire speaks the ALSA API, so an app or game that linked against ALSA will connect to PipeWire and should Just Work, without needing to be rewritten to target PipeWire's API.

PipeWire does the same trick with the PulseAudio API as well. on my PipeWire-using NixOS box, for example, I can connect through the `pavucontrol` GUI, my `pactl`-based keybindings work the same, etc. it's a clever design that allows them to avoid what would otherwise be a nasty pile of backwards-compatibility issues and poor desktop user experience.
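A quick way to see that shim in action (assuming pipewire-pulse is running; the exact version string will vary):

```shell
# The Pulse-compatible daemon identifies itself through the Pulse API
pactl info | grep 'Server Name'
# typically prints something like: Server Name: PulseAudio (on PipeWire 1.0.0)
```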


That part is simply wrong. PipeWire uses ALSA to drive the sound card. ALSA is a hard requirement.


This. Also, how does PulseAudio not support videoconferencing?


Well. PulseAudio is a plain sound server. It does not handle video at all.


Maybe it's about the higher-level ALSA API, not about the actual hardware drivers.


yeah, I think there are two ALSAs: one in the kernel as drivers and one as a userspace library.


This is probably wrong. ALSA is a low-level API that allows direct access to audio hardware, but only for one application at a time. ALSA has a kernel component and a userspace library that can be configured.

Pipewire uses ALSA to interface with audio hardware. But Pipewire also adds a plugin to the ALSA userspace library so that audio from apps that use the ALSA API is rerouted to Pipewire. Pulseaudio did the same trick.
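For the curious, that plugin trick boils down to an ALSA config override along these lines (a sketch of what the pipewire-alsa package typically ships under `/usr/share/alsa/alsa.conf.d/`; exact file contents vary by distro):

```
# Redirect the ALSA "default" device into PipeWire instead of the raw hardware.
# The "pipewire" plugin type is provided by pipewire-alsa (libasound_module_pcm_pipewire).
pcm.!default {
    type pipewire
}
ctl.!default {
    type pipewire
}
```

So an app that opens the ALSA `default` PCM never touches `hw:0` directly; the userspace library hands its stream to PipeWire, which then mixes it and drives the card through kernel ALSA itself.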


I was also confused by this bit. If ALSA is part of the kernel, how can it not be installed?


The article doesn't distinguish between kernel-ALSA and the old userspace ALSA libraries that apps use to push audio.


Well, you could compile your kernel without ALSA.


I think it could only make sense for sending sound over the network.


The text has a strong GPT-ish flavor.


GPT just emulates the best writing quality. This article is very well written. If it was GPT-generated, I'd be happy to read more like it.


I'd argue GPT emulates the lowest (i.e. cheapest to mass-produce) passable quality of writing, optimized for the longest page time/views.


I genuinely think humans are going to degrade their writing style in order to sound less like an LLM


It's a pretty nice article, but for me just for introductory purposes. It shows how sound and digital audio work and what basic libraries and tools we have in Linux to deal with sound. But I'm still lost when it comes to details and user-facing tools. The article(s) I'd really love to see are, on one hand, more technically detailed and, on the other hand, a broad map of the options I have as an end user. I mean, from the basic tools (CLI, GUI) available for simple purposes like volume control, stream selection, etc., up to pro-audio, complex scenarios. For me there are too many tools and options for audio in Linux, and this is the reason for being lost sometimes. Of course for daily use I have pipewire with the pulse, alsa and jack "plugins", which gives me seamless cooperation with lots of apps and controls, but maybe I could get rid of some module or app...


I don't really understand how JACK was supposed to be used. On Windows or Mac you typically run a DAW and load plugins into it. But on Linux the user is supposed to run every plugin as a separate application and connect them using JACK? Doesn't this mean there would be a lot of context switches? Also, in a DAW you can save your configuration, but how do you do that with JACK and a bunch of independent applications?

Also, given that PulseAudio and Pipewire both support ALSA clients, does it mean that the preferred API for applications should be ALSA? That way they can play sound on any system, even where there is no audio daemon.
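For what it's worth, the JACK model is just named ports that you wire together, with the standalone CLI tools or a patchbay GUI. A sketch (the port names below are hypothetical; a real session would use whatever `jack_lsp` prints):

```shell
# List every port currently registered in the JACK graph
jack_lsp

# Wire a standalone synth's outputs into a standalone recorder's inputs
# (port names are made up for illustration)
jack_connect "synth:out_left"  "recorder:in_left"
jack_connect "synth:out_right" "recorder:in_right"
```

As for saving the setup: that's what session managers and snapshot tools exist for (e.g. New Session Manager, or aj-snapshot for dumping/restoring the connection graph), so the "DAW project file" role is handled by a separate program rather than the server itself.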


An issue I've experienced very often is that sometimes when my laptop goes to sleep and I wake it up, the speakers aren't switched to unless I restart pipewire. Same thing for headphones: sometimes when I plug them in, they aren't switched to unless I replug them a couple of times. Might be hardware-related, but situations like this make me feel like I should just use Linux for servers instead of for a personal computer.


What version of Linux did your laptop ship with?


It had Fedora 36. So maybe Linux kernel 5.17

I have the latest kernel version now.


It's more about what distro the hardware vendor supports.

What vendor and model laptop was it? I'll make a note to avoid them in the future.


A Lenovo ThinkPad X1 Carbon Gen 10.


I'm shocked to hear that Lenovo has buggy hardware. Shocked!

Well, not that shocked.

This tracks with what I've experienced and heard, though I don't think that one came with Linux. But I wasn't given a choice, unfortunately.


Am I the only one that sees unfriendly input/output channel names with Pipewire in client software?

(Bitwig, Ardour, Reaper, more? I would like to see "Input 1" or "Channel 1" and not some strange ciphers when trying to assign things in a little dropdown selector in a DAW)


It seems to try to name the outputs based on the hardware name, but with multiple audio devices it can become quite confusing indeed. I've resorted to creating a wireplumber[1] config to rename things more reasonably and also disable a bunch of the inputs/outputs that I never use so they don't clutter up the lists.

[1] https://pipewire.pages.freedesktop.org/wireplumber/index.htm...
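In case it saves someone the digging: with the newer WirePlumber (0.5-style) config, the rename can be a small drop-in rule like this (a sketch; the file name and `node.name` below are examples; match against whatever `wpctl status` / `wpctl inspect` shows on your machine):

```
# ~/.config/wireplumber/wireplumber.conf.d/51-rename.conf  (example path)
monitor.alsa.rules = [
  {
    matches = [
      { node.name = "alsa_output.pci-0000_00_1f.3.analog-stereo" }
    ]
    actions = {
      update-props = {
        node.description = "Built-in Speakers"
      }
    }
  }
]
```

A similar rule with `node.disabled = true` in `update-props` is how you can hide the inputs/outputs you never use.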


This sounds (ha) like broken hardware.


I remember there was OSS and ALSA. Then on top of that you had esd or artsd, with incompatible APIs and userland delays. Then at some point they were replaced with PulseAudio. I've just noticed that PulseAudio has also been replaced with Pipewire in the latest Ubuntu.

That does not instill confidence in that they have any idea what they are doing.

I remember that as far back as 2012 some Linux game ports on Steam (before Proton was a thing) failed to play sound.

Other than that, it took them a decade to figure out that sound should switch to HDMI when it is plugged in. It may still require arcane config changes and may break down.

I've opened the post to read about pipewire, and it seems that clicking an anchor does nothing. So it's not only sound they can't get right.


Pipewire unifies all the APIs and is fully backward compatible. It has its own API too, but you don't have to use it. You can write an app using any of the APIs (alsa, pulse, jack) and pipewire can work with it. You only need the pipewire API if you need more control or want to pipe video through as well.

JACK is a very flexible approach to audio (think of Unix pipes, but for audio), and this is what pipewire is inspired by, but it targets general desktop use and not just audio production as JACK does.

Generally pipewire stays out of the way and works as pulseaudio replacement, but with the pipelines you can do a lot as a power user.

Personally I think pipewire is one of the best things that has happened to linux, and since it shims all the previous audio apis instead of trying to tape the existing solutions into the package, it actually removes bloat from your system and you don't have to write weird libasound config files anymore.


Don't worry, Pipewire is a very good development.


I want to move my DAW to Linux, but giving up TotalMix is quite a blow...

Not sure how well the NI Native Access crap is working on WINE either. Does anyone know?


There's a page on WineHQ [0] with a method to launch Native Access 2; works perfectly.

[0] https://appdb.winehq.org/objectManager.php?sClass=version&iI...


I now have flashbacks to when I tried to get sound working on my Gentoo install 15 years ago. It seems that now everything has changed (wtf is pipewire) and still requires arcane knowledge to get everything working smoothly...


To be honest, I wouldn't say it requires arcane knowledge these days. When setting up both Arch and Gentoo, I did a bit of research, determined that Pipewire was probably the best for me (because it seems to be largely a superset of the other options), and installed it following the wiki instructions. That's it. I haven't had to configure things in any sort of detail, it just worked smoothly the first time without any config.


Let me guess, you don't have a Realtek controller for audio.


Realtek audio has gotten to a pretty good place on Linux. I've had several over the past few years, and they just work for the most part. Fedora, especially, has been a wonderful out-of-the-box experience for audio.

Bluetooth on the other hand...


I must be the only unlucky one, because my Linux-approved motherboard (an Intel DQ77MK with a Realtek ALC892 8-channel codec) kept switching audio between the front and back panel (although nothing was plugged into the front) on Fedora 39 :( . It does work fine on Windows.


Actually, now that you mention it, I did have a similar problem on one particular motherboard. I don't attribute it to Linux though, so I hadn't thought of it. It turned out to be a hardware issue (with the motherboard): the card kept resetting due to too little input power. I had hoped a BIOS update would fix it, but it didn't. I have also had a couple of built-in audio modules die and stop being seen by the kernel on old laptops.

Honestly I would just buy a USB interface. I got a Focusrite Scarlett 2i2 and have been very happy with it.


The only thing I can say is that it does not happen in Windows 10.

Do you have any more information on how it can be diagnosed?

The motherboard does have the latest BIOS update.


Did you try OSX on the same hardware?


Unfortunately no; I don't think I have the knowledge to try a hackintosh. Also, I'm not very sure the system would be compatible with it. It is something to try eventually, though.

What do you think would happen?


I do have one, whatever HP figured was good enough for their "high-end" EliteBooks. Worked perfectly on Arch without any fiddling. Even the mute LEDs on the keyboard work as expected.


I believe I do, though I'm not certain. I have whatever is built into my Asus motherboard, which seems like it is ultimately a Realtek chip. Not sure though.


Pipewire is a major development that's especially important for Wayland desktop. It was hard to miss unless you haven't used Linux in a long time.


I don't know what Pipewire is either. I've used Linux daily for 10+ years, but I've never bothered to use Wayland.


Pipewire ends up being used to paper over Wayland deficiencies in some areas, thus why it's important for Wayland desktop ;)

By itself it's essentially a grand unification of audio servers that actually works better and is way less... opinionated about the only true way some things work, which was a problem with pulseaudio at times.


Separation of responsibilities is good, instead of the mix-of-everything-but-nothing-done-well situation with X11.


Separation of responsibilities is something Wayland fails at hard, by effectively hardcoding the coexistence of a significant part of the display driver, window management, and simple things like window decorations (which, thanks to Gnome's insistence, are by default only client-side, so your concerns invade the internals of client apps!), etc.


Just don't use Gnome. KDE is fine with server-side decorations, and it's false that Wayland mandates client-side ones. Wayland ≠ Gnome, and Gnome itself indeed made a bunch of pretty questionable decisions.

But if anything it's X11 that tries to lump a ton of stuff together. Wayland is very minimalistic in comparison. That's why stuff like libinput and Pipewire are part of making a functional desktop.


Client-side decorations are made effectively essential by the core protocol (given that the core protocol doesn't even support the existence of "windows"), server-side decorations aren't, and some applications will display weirdly because of that.

Moreover, every "WM" in Wayland's case needs to implement the entire stack, even if it uses a common library for some of it.

And after a similar length of development time, I'd say the result is still worse in many aspects than X11, and I say that as someone both using and praising a Wayland-based compositor, while lamenting that it locks me in more than Windows used to.


> and some applications will display weirdly because of that.

Well, if the compositor supports server-side decorations (and normal compositors should), I don't see how applications can behave weirdly because of that unless they are just buggy. It's up to applications to figure out if server-side decorations work and use them if they do. I.e. even if it were a core feature, buggy applications could still behave incorrectly.

And I don't see Wayland being worse in many aspects; on the contrary, I see it being better in aspects which X11 didn't or can't address. I recently started using Wine Wayland for gaming and it's a significantly better experience than using it with XWayland.


Well, time to move with the progress. I've been using KDE Wayland session for several years already.


I use i3-wm. I know there is Sway, but it will require some effort to migrate. Also there are fresh reports that Wayland+Sway has problems with NVidia (even worse, I have AMD + NVidia). I'd wait till that gets resolved or my current setup stops working.


Not personally using a tiling setup, but I think KDE has some scripts for KWin that allow doing that. So you could use KDE for that purpose.


nah I'm okay thanks. X11 still works flawlessly for me.


The main issue is that X11 is increasingly unmaintained and all new development happens on Wayland anyway. So as long as you can deal with the lack of support, I guess there's no need to move; but otherwise Wayland with KDE has been quite usable for a while already.


Yeah I'll move when I have hardware that really needs it. I use a 12 year old laptop.


I've been a WSL user for years now. Way less effort.


Until vmmem starts using 100% of your CPU and you need to kill the WSL service to get your computer back (losing your unsaved work, of course).

But yeah, WSL is just too easy. Especially with native VS Code support it has become a favourite for many developers.


If I had been using Linux as my main OS in that situation, I would have had to reboot. How is that different?


It's different because this is Windows Hyper-V (or something related) crashing and bringing down your WSL session with it. Native Linux is far more stable.


PipeWire is a drop-in replacement for pshshaudio with additional features (while sometimes preserving the pshsh part). The only difference is that 15 years ago half of the problems with sound were fixed by uninstalling pulseaudio, as it was in its early stages, full of bugs, but still pushed to many distros, similar to systemd. Today's pipewire is as easy as uninstalling pulseaudio and installing pipewire, and most of the time things will just continue working.


To be fair, Pulseaudio in its early days exposed tons of audio driver bugs that have since been fixed. Bluetooth audio was barely working back then. Now Bluetooth works so well that it makes other operating systems look bad.


Pulseaudio had a great GUI mixer/settings app. That does not seem to come with pipewire, and installing it explicitly tries to pull in PA.


I just install fedora and everything works out of the box.


Beautiful work


Linux distros would be best off lifting as much as possible from the Android Open Source Project, i.e. a professional and streamlined Linux-based system that actually works and isn't just a mishmash of incompatible, poorly designed hobbyist trash.

The year of the Linux desktop never arrived but most of the world has Linux sitting in our palms. Let's build on that instead of the dead ends of Debian, Slackware, etc.


Hasn't Android audio always struggled with input latency?


I use several audio apps on Android, on a mid-to-low-end phone. The most intensive use comes from Cubase: audio and synths, mixer, inputs and outputs, MIDI/phone input program. After 30 years on Atari and then Windows I've had my ups and downs with the program, but I can easily say that my Android version is A1 solid, connected to/from my hardware synths.


Wow. No. Debian is the opposite of a dead end. Now, Ubuntu and Lubuntu and Xubuntu, I agree. Android, however, is garbage:

- it most certainly is not consistent across devices and editions
- having to do phone-like stuff on a desktop/laptop is painful
- I hate Android. Its settings bother me the most: there is ZERO organization there! There are DARK patterns galore; it's built to dissuade you from doing anything serious.

I can only take your comment as either a joke, or you literally only use a phone/tablet in life.

News flash: Android audio is ALSA at its core. Additionally, native Android audio is ultra-high-latency, completely unacceptable for pro audio.



