You don't generate 1470 samples of audio per frame.

You generate N channels worth of 1470 samples per frame, and mix (add) them together. Make N large enough, and make the computation processes associated with generating those samples complex enough, and the difference between audio and video is not so great.
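
A minimal sketch of that per-frame generate-and-mix loop, assuming 44.1 kHz audio and a 30 fps frame (hence 1470 samples per channel); the type and function names are illustrative, not from any particular engine:

    #include <stddef.h>

    #define SAMPLES_PER_FRAME 1470   /* 44100 Hz / 30 fps */

    /* Each channel has some generator that fills a buffer for this frame. */
    typedef void (*generate_fn)(float *out, size_t nsamples, void *state);

    void mix_frame(float *master, generate_fn *gens, void **states, size_t n_channels)
    {
        float scratch[SAMPLES_PER_FRAME];
        size_t i, ch;

        for (i = 0; i < SAMPLES_PER_FRAME; i++)
            master[i] = 0.0f;

        /* 600 channels x 1470 samples is roughly 880k generated samples per frame. */
        for (ch = 0; ch < n_channels; ch++) {
            gens[ch](scratch, SAMPLES_PER_FRAME, states[ch]);  /* generate */
            for (i = 0; i < SAMPLES_PER_FRAME; i++)
                master[i] += scratch[i];                       /* mix (add) */
        }
    }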

Jacob Collier routinely uses 300-600 tracks in his mostly-vocal overdubs, and so for sections where there's something going on in all tracks (rare), it's more in the range of 400k-900k samples to be dealt with. This sort of track count is also typical in movie post-production scenarios. If you were actually synthesizing those samples rather than just reading them from disk, the workload could exceed the video workload.

And then there's the result of missing the audio buffer deadline (CLICK! on every speaker ever made) versus missing the video buffer deadline (some video nerds claiming they can spot a missing frame :)




> You generate N channels worth of 1470 samples per frame, and mix (add) them together. Make N large enough, and make the computation processes associated with generating those samples complex enough, and the difference between audio and video is not so great.

Sure, but graphics pipelines don't only touch each pixel once either. :)

> Jacob Collier routinely uses 300-600 tracks in his mostly-vocal overdubs, and so for sections where there's something going on in all tracks (rare), it's more in the range of 400k-900k samples to be dealt with.

Sure, track counts in modern DAW productions are huge. But like you note, in practice most tracks are empty most of the time, and it's pretty easy to architect a mixer that can optimize for that. There's no reason to iterate over a list of 600 tracks and add 0.0 to the accumulated sample several hundred times.
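
A sketch of that optimization, assuming the playback engine can mark which tracks actually have material in the current block; the struct and field names are made up for illustration:

    #include <stdbool.h>
    #include <stddef.h>

    typedef struct {
        bool   active;    /* does this track have material in the current block? */
        float *samples;   /* the track's rendered block, only valid if active */
    } track;

    void mix_block(float *master, const track *tracks, size_t n_tracks, size_t block_len)
    {
        size_t i, t;

        for (i = 0; i < block_len; i++)
            master[i] = 0.0f;

        for (t = 0; t < n_tracks; t++) {
            if (!tracks[t].active)
                continue;                  /* empty track: skip, don't add 0.0 */
            for (i = 0; i < block_len; i++)
                master[i] += tracks[t].samples[i];
        }
    }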

> If you were actually synthesizing those samples rather than just reading them from disk, the workload could exceed the video workload.

Yes, but my point is that you aren't. Consumers do almost no real-time synthesis, just a little mixing. And producers are quite comfortable freezing tracks when the CPU load gets too high.

I guess the interesting point to focus on is that with music production, most of it is not real-time and interactive. At any point in time, the producer is usually only tweaking, recording, or playing a single track or two, and it's fairly natural to freeze the other tracks to lighten the CPU load.

This is somewhat analogous to how game engines bake lighting into static background geometry. They partition the world into things that can change and things that can't, and use a separate pipeline for each.

> And then there's the result of missing the audio buffer deadline (CLICK! on every speaker ever made)

Agreed, the failure mode is catastrophic with audio. With video, renderers will simply use as much CPU and GPU as they can and players will max everything out. With audio, you set aside a certain amount of spare CPU as headroom so you never get too close to the wall.
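
A rough illustration of that headroom arithmetic; the sample rate, buffer size, timing figure, and 70% threshold below are example numbers, not anyone's actual settings:

    #include <stdio.h>

    int main(void)
    {
        const double sample_rate = 48000.0;   /* Hz */
        const double buffer_size = 256.0;     /* samples per callback */

        /* The audio callback must finish within buffer_size / sample_rate seconds. */
        const double deadline_ms = 1000.0 * buffer_size / sample_rate;  /* ~5.33 ms */

        const double measured_dsp_ms = 3.1;   /* e.g. from timing the callback */
        const double load = measured_dsp_ms / deadline_ms;              /* ~58% */

        printf("deadline %.2f ms, DSP load %.0f%%\n", deadline_ms, load * 100.0);
        if (load > 0.7)   /* keep well under the deadline, or risk the CLICK */
            printf("too close to the wall: freeze tracks or raise the buffer size\n");
        return 0;
    }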


I think the most concrete argument in support of your position is that typical high-performance audio processing doesn't take an external specialized supercomputer-for-your-computer with 12.3 gazillion parallel cores, a Brazil-like network of shiny ducts hotter than a boiler-room pipe and a memory bus with more bits than an Atari Jaguar ad.


> doesn't take an external specialized supercomputer-for-your-computer with 12.3 gazillion parallel cores,

Not anymore at least. :)

In the early days of computer audio production, it was very common to rely on external PCI cards to offload the DSP (digital signal processing) because CPUs at the time couldn't handle it.


You're restating the grandparent's point: the only people doing real heavyweight real-time audio are music producers. Once Jacob Collier is done rendering his song, it's distributed as a YouTube video with a single stereo audio track.

And, talking about learning lessons from other fields, there's no particular reason that Jacob has to render his audio at full fidelity in real time. Video editors usually do their interactive work with a relatively low-fidelity representation, and do the high-fidelity rendering in a batch process that can take hours or days. As I'm sure you're aware.


I don't know much about video editors. Film sound editors in movie post-production do not work the way you describe.


> Film sound editors in movie post-production do not work the way you describe.

They don't generally lower fidelity. That is needed for video simply because the data sizes for video footage are so huge that they are hard to work with.

But DAWs do let users "freeze", "consolidate" or otherwise pre-render effects so that everything does not need to be calculated in real-time on the fly.
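
A sketch of the idea behind freezing, under the assumption that a track's effect chain can be run offline once and its output stored; the function and type names are hypothetical, not any DAW's API:

    #include <stdlib.h>
    #include <string.h>

    /* A track's (possibly expensive) effect chain, applied to n samples. */
    typedef void (*effect_chain_fn)(const float *in, float *out, size_t n);

    typedef struct {
        float  *frozen;   /* pre-rendered output of the effect chain */
        size_t  length;
    } frozen_track;

    /* Offline step, not in the audio callback: run the plugins once, keep the result. */
    int freeze(frozen_track *t, const float *source, size_t n, effect_chain_fn chain)
    {
        t->frozen = malloc(n * sizeof *t->frozen);
        if (!t->frozen)
            return -1;
        t->length = n;
        chain(source, t->frozen, n);
        return 0;
    }

    /* Real-time path: no DSP at all, just a copy out of the frozen buffer. */
    void play_frozen(const frozen_track *t, size_t offset, float *out, size_t block_len)
    {
        memcpy(out, t->frozen + offset, block_len * sizeof *out);
    }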


I'm the original author of a DAW, so I'm more than familiar with what they do :)

Film sound editors do not do what you're describing. They work with typically 600-1000 tracks of audio. They do not lower fidelity, they do not pre-render. Ten years ago, one of the biggest post-production studios in Hollywood used TEN ProTools systems to be able to function during this stage of the movie production process.


Sure, but that's an extreme use case.

Typical DAW users are using a pro-sumer machine and bouncing tracks as necessary to fit within their CPU limits.



