Hacker News

Much of the work in speech synthesis has been about closing the gap in vocoders, which take a generated spectrogram and output a waveform. There's a clear gap between practical online implementations and computational behemoths like WaveNet. As you implied, it's hard to judge quantitatively which result is better, so papers usually rely on listening surveys.
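To make the spectrogram-to-waveform step concrete: the classical non-neural baseline is Griffin-Lim phase reconstruction, which neural vocoders like WaveNet replaced. A minimal sketch using scipy (toy parameters, purely illustrative; not the method of any system mentioned here):

```python
import numpy as np
from scipy.signal import stft, istft

def griffin_lim(mag, n_fft=256, hop=64, n_iter=50, seed=0):
    """Reconstruct a waveform from a magnitude spectrogram by
    iteratively re-estimating phase (Griffin & Lim, 1984)."""
    rng = np.random.default_rng(seed)
    # Start from random phase, then alternate istft/stft projections.
    phase = np.exp(2j * np.pi * rng.random(mag.shape))
    for _ in range(n_iter):
        _, x = istft(mag * phase, nperseg=n_fft, noverlap=n_fft - hop)
        _, _, Z = stft(x, nperseg=n_fft, noverlap=n_fft - hop)
        phase = np.exp(1j * np.angle(Z))
    _, x = istft(mag * phase, nperseg=n_fft, noverlap=n_fft - hop)
    return x

# Toy target: magnitude spectrogram of a 440 Hz sine at fs = 8000.
fs, n_fft, hop = 8000, 256, 64
t = np.arange(fs) / fs
target = np.sin(2 * np.pi * 440 * t)
_, _, Z = stft(target, nperseg=n_fft, noverlap=n_fft - hop)
recon = griffin_lim(np.abs(Z), n_fft=n_fft, hop=hop)
```

Neural vocoders exist precisely because this kind of phase guessing produces audible artifacts on real speech; they learn the waveform distribution directly instead.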

Here's a recent work that has a good comparison of some vocoders: https://wavenode-example.github.io/

Edit: WaveRNN struck a good balance for me in the past, but it isn't shown in the link. Tons of new work coming out though!




WaveRNN, and even slimmer variants like LPCNet, are great, and run at a tiny fraction of the compute of the original WaveNet. Pruning is also a good way to reduce model size.
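For anyone unfamiliar, pruning in its simplest form just zeroes the smallest-magnitude weights so the model can be stored and run sparsely (WaveRNN's authors used a scheduled variant of this). A hypothetical numpy sketch, not taken from any of these codebases:

```python
import numpy as np

def magnitude_prune(w, sparsity):
    """Unstructured magnitude pruning: zero out the fraction
    `sparsity` of weights with the smallest absolute value."""
    k = int(sparsity * w.size)
    if k == 0:
        return w.copy()
    # k-th smallest |w| becomes the pruning threshold.
    threshold = np.partition(np.abs(w).ravel(), k - 1)[k - 1]
    return np.where(np.abs(w) <= threshold, 0.0, w)

rng = np.random.default_rng(0)
w = rng.normal(size=(64, 64))
pw = magnitude_prune(w, 0.9)  # roughly 90% of entries are now zero
```

In practice this is done gradually during training with fine-tuning in between, since pruning 90% in one shot usually hurts quality.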

I'm not sure what's up with the WaveGlow (17.1M) example in the linked wavenode comparison... The base WaveGlow sounds reasonable, though. They're also using all female voices, which strikes me as dodgy; pitch tracking on lower male voices is often harder to get right, and a bunch of comparisons that never touch harder cases or failure modes makes it seem like they're covering something up.

(I've run into a bunch of comparisons for papers in the past where they clearly just did a bad job of implementing the prior art. There should be a special circle of hell...)


Agreed. I didn't have a better comparison at hand.

I'm looking at you, GAN papers.



