
This is a well-known problem. The noise is due to mu-law compression: the 16-bit audio samples are compressed to 8, 9, or 10 bits before being fed to the neural net, because predicting a categorical distribution over 2^16 values requires too many parameters. The same noise was audible in samples from DeepMind's famous WaveNet (which used 8-bit mu-law).
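For reference, here is a minimal sketch of mu-law companding in the 8-bit form WaveNet used (mu = 255); the function names are illustrative, not taken from any particular codebase:

```python
import numpy as np

def mu_law_encode(x, bits=8):
    """Map float audio in [-1, 1] to integer codes in [0, 2**bits - 1]."""
    mu = 2**bits - 1
    y = np.sign(x) * np.log1p(mu * np.abs(x)) / np.log1p(mu)  # logarithmic compression to [-1, 1]
    return ((y + 1) / 2 * mu + 0.5).astype(np.int32)          # quantize to integer codes

def mu_law_decode(codes, bits=8):
    """Invert the companding; the quantization noise from the coarse grid remains."""
    mu = 2**bits - 1
    y = 2 * codes.astype(np.float32) / mu - 1
    return np.sign(y) * ((1 + mu) ** np.abs(y) - 1) / mu

samples = np.linspace(-1, 1, 5)
print(mu_law_decode(mu_law_encode(samples)))  # close to the input, but not exact
```

The round trip is lossy by design; that residual quantization error is exactly the hiss being discussed.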

There are two ways to avoid this: 1. predict the 8 high (coarse) bits and the 8 low (fine) bits separately, as in the original WaveRNN paper (rough sketch below); 2. use a mixture of logistic distributions as the predictive output, as in the recent Lyra vocoder from Google.
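A rough illustration of option 1 (not the WaveRNN authors' code), just showing how a 16-bit sample splits into two 8-bit targets so that each softmax covers 256 classes instead of 65536:

```python
def split_coarse_fine(sample_16bit):
    """sample_16bit: unsigned int in [0, 65535]."""
    coarse = sample_16bit >> 8   # high 8 bits, predicted first
    fine = sample_16bit & 0xFF   # low 8 bits, predicted conditioned on the coarse part
    return coarse, fine

def combine(coarse, fine):
    return (coarse << 8) | fine

c, f = split_coarse_fine(54321)
assert combine(c, f) == 54321
```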

How does the number of parameters scale with resolution?

Specifically, how much slower would this be if the audio were, say, 10 bits?

I recall a lab exercise in college where we were supposed to increase the resolution of a quantizer until the tone sounded decent; 10 bits was the point at which we reached satisfying quality.


It is a single matrix multiplication to predict the probabilities of all possible outputs. For example, with a 1024-dimensional hidden state and an 8-bit output, that is 1024x256 parameters; a 10-bit output needs 1024x1024 params.
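A quick back-of-the-envelope check of those numbers, assuming a plain hidden_dim x 2^bits output projection with the bias ignored:

```python
# Output-layer size scales linearly with the number of classes,
# i.e. exponentially with the bit depth.
hidden_dim = 1024
for bits in (8, 10, 16):
    classes = 2 ** bits
    print(f"{bits:>2} bits -> {classes:>5} classes -> {hidden_dim * classes:,} params")
# 8 bits  ->   256 classes ->    262,144 params
# 10 bits ->  1024 classes ->  1,048,576 params
# 16 bits -> 65536 classes -> 67,108,864 params
```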
