Web Audio API has a similar mechanism for generating samples with JS and it's ju...

cromwellian · on Feb 5, 2014

>Web Audio API has a similar mechanism for generating samples with JS and it's just as bad. In fact, Web Audio is worse for playback of JS-synthesized audio, not better. And lots of use cases demand playback of JS-synthesized audio: emulators, custom audio codecs, synths, etc.

Generating audio samples via JS is an escape hatch, the same way rasterizing to a raw framebuffer is an escape hatch. If your system had audio acceleration hardware, and many systems do (e.g. DirectSound/OpenAL), you want to leverage that.

If you are deploying a game to a mobile device, the last thing you want to do is waste CPU cycles burning JS to do DSP effects in software. This is especially awful for low end phones like the ones that Firefox OS runs on. Latency on audio is already terrible, you don't want the browser UI event loop involved IMHO in processing it to feed it. Maybe if you had speced it being available to Web Workers without needing the UI thread involved it would make more sense.

>The Web Audio API introduced an interconnected web of dozens of poorly tested, poorly specified components that covered some arbitrary subset of cases users care about

The arbitrary subset being, those that have been on OSX for years? AFAIK, Chris Rogers developed Web Audio based on years of experience working on Core Audio at Apple, and therefore, at minimum, it represents at least some feedback from the use cases of the professional audio market, at the very least, Apple's own products like Garage Band and Logic Express which sit atop Core Audio.

You assert that the other use cases could have been handled by just extending existing APIs, but to me this argument sounds like the following:

"You've just dumped this massively complex WebGL API on us, it has hundreds of untested functions. It would be better to just have <canvas> or <img> with a raw array that you manipulate with JS. Any other functions could be done [hand-wave] by just bolting on additional higher level apis. Like, if you want to draw a polygon, we'd add that."

APIs like Core Audio, Direct Sound, OpenGL have evolved because of low level optimization to match the API to what the hardware can accelerate efficiently. In many cases, bidirectional, so that HW influences the API and vice-versa. Putting the burden on the WG to reinvent what has already been done for years is the wrong way to go about it. Audio is a solved problem outside JS, all that was needed was good idiomatic bindings for either Core Audio, OpenAL, or DirectSound.

Whenever I see these threads on HN, I always get a sense of a big dose of NIH from Mozilla. Whenever anyone else proposes a spec, there's always a complaint about complexity, like Republicans complaining about laws because they are too long, when in reality, they don't like them either for ideological reasons, or political ones.

Mozilla is trying to build a web-based platform for competing with native apps. You can see it in asm.js and Firefox OS. And they are not going to get there if they shy away from doing the right things because they are complex. Mobile devices need all the hardware acceleration they can get, and specing out a solution that requires JS to do DSP sound processing is just an egregious waste of battery life and cycles IMHO for a low end web-OS based mobile HW.

_delirium · on Feb 6, 2014

> Generating audio samples via JS is an escape hatch, the same way rasterizing to a raw framebuffer is an escape hatch.

But that escape hatch is where all the interesting innovation happens! It's great that canvas exists, and it's much more widely used than WebGL is, because it's more flexible and depends on less legacy cruft. You don't have to use it, but I do, and I'd like a "canvas for audio" too.

To make matters worse, the Web Audio stuff is much less flexible than OpenGL. You can at least write more or less arbitrary graphics code in OpenGL: it's not just an API for playing movie clips filtered through a set of predefined MovieFilter nodes. You can generate textures procedurally, program shaders, render arbitrary meshes with arbitrary lighting, do all kinds of stuff. If this were still the era of the fixed-function OpenGL 1.0 pipeline, it'd be another story, but today's OpenGL at least is a plausible candidate for a fully general programmable graphics pipeline.

Web Audio seems targeted more at just being an audio player with a fixed chain of filter/effect nodes, not a fully programmable audio pipeline. How are you going to do real procedural music on the web, something more like what you can do in Puredata or SuperCollider or even Processing, without being able to write to something resembling an audio sink? Apple cares about stuff like Logic Express, yes, but that isn't a programmable synth or capable of procedural music; while I care about is the web becoming a usable procedural-music platform. One alternative is to do DSP in JS; another is to require you to write DSP code in a domain-specific language, like how you hand off shaders to WebGL. But Web Audio does the first badly and the 2nd not at all!

> Audio is a solved problem outside JS

Yeah, and the way it's solved is that outside JS, you can just write a synth that outputs to the soundcard...

cromwellian · on Feb 6, 2014

WebGL is less used than Canvas for the most part, because 3D and linear algebra are much more difficult to work with for most people than 2D. Also, people work with raw canvas image arrays much more rarely than they do the high level functions (stroke/fill/etc)

OpenGL was still a better API than a raw framebuffer even when it was just a fixed function pipeline. Minecraft for example is implemented purely with fixed-function stuff, no shaders. It isn't going to work if done via JS rasterization.

Yes, there are people on the edge cases doing procedural music, but that is a rare use case compared to the more general case of people writing games and needing audio with attenuation, 3D positional HRTF, doppler effects, etc. That's the sweet spot that the majority of developers need. Today's 3D hardware includes features like geometry shaders/tessellation, but most games don't use them.

OpenSL/AL would work a lot better if it had "audio shaders". Yes. But if your argument is that you want to write a custom DSP, then you don't want Data Audio API, what you want is some form of OpenAL++ that exposes an architecture neutral shader language for audio DSPs, that actually compiles your shader and uploads it to the DSP. Or, you want OpenCL plus a pathway to schedule running the shaders and copying the data to the HW that does not involve the browser event loop.

That said, if there was a compelling need for the stuff you're asking for, it would have been done years ago. None of the professional apps, nor game developers, have been begging for Microsoft Direct Sound, Apple, or Khronos to make audio shaders. There was a company not to long ago, Aureal 3D, which tried to be the "3dfx of audio", but failed, but it turns out, most people just need a set of basic sound primitives they can change together.

I have real sympathy for your use case. For years, I dreamed of sounds being generated in games ala PhysX, really simulating sound waves in the environment, and calculating true binaural audio, the way Oculus Rift wants to deliver video to your senses, taking into account head position. To literally duplicate the quality of binaural audio recordings programmatically.

But we're not there, the industry seems to have let us down, there is no SGI, nor 3dfx, nor Nvidia/AMD "of audio" to lead the way, and we certainly aren't going to get there by dumping a frame buffer from JS.

Right now, the target for all this stuff, Web GL, Web Audio, et al, it exposing APIs to bring high performance, low latency games to the web. I just don't see doing attenuation or HRTF in JS as compatible with that.

_delirium · on Feb 6, 2014

I agree that for games the market hasn't really been there, and they're probably served well enough by the positional-audio stuff plus a slew of semi-standard effects. And I realize games are the big commercial driver of this stuff, so if they don't care, we won't get the "nVidia of audio".

I'm not primarily interested in games myself, though, but in computer-music software, interactive audio installations, livecoding, real-time algorithm and data sonification, etc. And for those use cases I think the fastest way forward really just is: 1) a raw audio API; and 2) fast JS engines. Some kind of audio shader language would be even better perhaps, but not strictly necessary, and I'd rather not wait forever for it. I mean to be honest I'd be happy if I could do on the web platform today what I could do in 2000 in C, which is not that demanding a level of performance. V8 plus TypedArrays brings us pretty close, from just a code-execution perspective, certainly close enough to do some interesting stuff.

Two interesting things I've run across in that vein that are starting to move procedural-audio stuff onto the web platform:

* http://charlie-roberts.com/gibber/info/?page_id=6

* http://www.bfxr.net/

There are already quite a few interactive-synth type apps on mobile, so mobile devices can do it, hardware-wise. They're just currently mostly apps rather than web apps. But if you can do DSP in Dalvik, which isn't really a speed demon, I don't see why you can't do it in V8.

Edit: oops, the 2nd one is in Flash rather than JS. Take it instead then as example of the stuff that would be nice to not have to do in Flash...

kevingadd · on Feb 6, 2014

Your argument for the superiority of the Web Audio API seems to be 'it's like Core Audio', and you seem to argue that WebGL is great just because it's like OpenGL. What actually supports this argument? Would you be a big fan of WebDirect3D just because it was exactly like Direct3D? After all, virtually all Windows games and many windows desktop apps use Direct3D, so it must be the best. 3D games on Linux use OpenGL so it must be the best. If you're going to argue that it's good to base web APIs on existing native APIs, why not OpenAL -> WebAL, like OpenGL -> WebGL?

Specs need to be evaluated on the merits. The merits for the Web Audio API at time of release:

* Huge

* Poorly-specified (the API originally specified two ways to load sounds, the simplest of which blocked the main thread for the entire decode! very webby. Spec was full of race conditions from day 1 and some of them still aren't fixed.)

* Poorly-tested

* Large, obvious functionality gaps (you can't pause playback of sounds! you can't stitch buffers together! you can't do playback of synthesized audio at rates other than the context rate!)

* Incompatible with existing <audio>-based code (thanks to inventing a new, inferior way for loading audio assets), making all your audio code instantly browser-specific

* Large enough in scope to be difficult to implement from scratch, even given a good specification (which was lacking)

* A set of shiny, interesting DSP/filter chain features, like convolution and delay and HRTF panning and so on, useful for specific applications

* Basic support for playback and mixing roughly on par with that previously offered by <audio>, minus some feature gaps

The merits for the old Mozilla audio data API at the time of Web Audio's release:

* Extends the <audio> element's API to add support for a couple specific features that solve an actual problem

* Narrow scope means that existing audio code remains cross-browser compatible as long as it does not use this specific API

* The specific features are simple enough to trivially implement in other browsers

You keep making insane leaps like 'Web Audio is good because it's like Core Audio' and 'Mozilla wants you to write DSPs in JavaScript because ... ????' even though there's no coherent logic behind them and there's no evidence to actually support these things. A way to synthesize audio in JavaScript does not prevent the introduction of an API for hardware DSP mixing or whatever random oddball feature you want; quite the opposite: it allows you to introduce those new APIs while offering cross-browser compatible polyfills based on the older API. The web platform has been built on incremental improvement and graceful degradation.

P.S. Even if the Web Audio API were not complete horseshit at the point of its introduction, when it was introduced the Chrome team had sat on their asses for multiple versions, shipping a completely broken implementation of <audio> in their browser while other vendors (even Microsoft!) had support that worked well enough for playing sound effects in games. It's no coincidence that web developers were FORCED to adopt Google's new proprietary API when the only alternative was games that crashed tabs and barely made sound at all.

cromwellian · on Feb 6, 2014

Isn't the point of introducing something actually getting it fixed from feedback in the WG? So you're complaint is, someone introduced a draft of an idea with a prototype implementation, and you're pissed it wasn't perfect the first time around?

Calling something someone worked on, who happens to be a domain expert, "horseshit" seems a little extreme don't you think? Were not most of the initial problems resolved by WG feedback or not? If yes, they hurray, the WG fulfilled it's purpose. If every feature arrived complete with no problems, there's be little need for a WG, emphasis on the 'W'.

Also "* A set of shiny, interesting DSP/filter chain features, like convolution and delay and HRTF panning and so on, useful for specific applications" Specific, as in, the vast majority of applications. This would be like pissing all over CSS or SVG filters because they don't include a pixel shader spec. 3D positional sound and attenuation are the two features used by the vast majority of games. Neither most applications nor games resort to hand written DSP effects.

As for the <audio> tag playback. Here's a thread where I already had this debate with Microsoft (http://cromwellian.blogspot.com/2011/05/ive-been-having-twit...). Even Microsoft's implementation of <audio> was not sufficient for games or for music synthesis. First of all, their own demo had audible pops because the JS event loop could not schedule the sounds to play on queue. For games like Quake2, which we ported to the Web using GWT, some sound samples were extremely short (the machine gun sound) and were required to be played back-to-back seamlessly as long as the trigger was pulled to get a nice constant machine gun sound. This utterly fails with <audio> tag playback, even on IE (in wireframe canvas2d mode of course). Another port I did, which was a Commodore 64 SID player had the same issue. So let's dispense with the myth that using basic <audio> is sufficient for games. It lacks the latency control to time playback properly even on the best implementation. For Quake2, which features distance attenuation and stereo-positioning,

On the issue of Web Audio / Core Audio, my point there is merely that all of your bashing ignores the point that it is in fact, derived from a mature API that was developed from professional industry requirements over a long time. You keep bashing the built in filters, but those are the tried and true common cases, it's like bashing Porter-Duff blending modes because it's not as general as a shader.

As for Direct3D. You do realize that OpenGL for a long time sucked and was totally outclassed by DirectX? Shaders were introduced by Microsoft. A lot of advanced features arrived on Windows first, because the ARB was absolutely parallelized. So yes, if someone created a "Web3D" API based on Direct3D, it would still be better than Canvas2D, even if you had to write a Direct3D->OpenGL mapping layer. I don't have many bad things to say about DirectX8+, Microsoft did a good job pushing the whole industry forward. DirectX was the result of the best and brightest 3D IHVs and ISVs contributing to it, and so it would be unwise to discount it just because it is proprietary Microsoft.

And Web developers were not forced to adopt Web Audio. For a long time, people used flash shims. In fact, when we shipped Angry Birds for the Web, it used a flash-shim. If Data Audio API lost at the WG, you can't blame Google, people on the WG have free will, and they could have voted down Web Audio in favor of Data Audio, regardless of what Chrome ships.

What I'm hearing in this context however is that you are content in ignoring what most developers wanted. People trying to build HTML5 games needed a way to do the same things people do in games on the Desktop or on consoles with native APIs. The Mozilla proposal did not satisfy these, with no ability to easily do simple things like 3D positioning or distance fallout without dumping in a giant ball of expensive JS into games that developers were already having performance issues with.