Is it possible to use the "deep dream" methods with a network trained on audio, such as this one? I wonder what that would sound like, e.g., starting from a speech signal and enhancing it with a network trained on music, or vice versa.
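For the curious, here is a minimal sketch of how the image deep-dream recipe might transpose to a 1-D waveform: run gradient ascent on the input signal itself so as to maximize the activations of a chosen layer. Everything here (the PyTorch framing, the function and parameter names, the normalized gradient step) is my own assumption, not anything from the post:

```python
import torch

def deep_dream_audio(model, layer, signal, steps=100, lr=1e-3):
    """Gradient-ascent 'dreaming' on a raw audio waveform.

    Repeatedly nudges the input signal along the gradient that
    maximizes the mean squared activation of `layer` in `model`.
    """
    model.eval()
    for p in model.parameters():
        p.requires_grad_(False)  # only the input gets gradients

    x = signal.clone().detach().requires_grad_(True)
    activations = {}

    # Capture the target layer's output with a forward hook.
    handle = layer.register_forward_hook(
        lambda mod, inp, out: activations.update(value=out)
    )
    for _ in range(steps):
        model(x.unsqueeze(0))  # forward pass populates the hook
        loss = activations["value"].pow(2).mean()
        loss.backward()
        with torch.no_grad():
            # Normalized gradient step, as in image deep dream.
            x += lr * x.grad / (x.grad.abs().mean() + 1e-8)
            x.grad.zero_()
    handle.remove()
    return x.detach()
```

Starting from speech and dreaming through a layer of a music-trained network (or vice versa) would then just mean swapping in the other model and layer.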
Interesting! So if I understand correctly, much of the noise in the generated audio is due to the noise in the learned filters?
I assume some regularization is added to the weights during training, say L1 or L2? If so, that is essentially equivalent to assuming the weight values are distributed i.i.d. Laplacian or Gaussian. It seems you could learn less noisy filters by using a prior that assumes dependency between values within each filter, thereby enforcing smoothness or piecewise smoothness of each filter during training; something like the sketch below.
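To make that concrete, here is a rough sketch (my own PyTorch framing and names, not from the post) of a smoothness penalty on the taps of a 1-D conv filter. An L2 penalty on first differences corresponds to a Gaussian prior on the *differences* between neighboring taps rather than on the values themselves, i.e., a random-walk prior that couples adjacent values and favors smooth filters, instead of the i.i.d. Gaussian prior implied by plain weight decay:

```python
import torch

def smoothness_penalty(conv_weight, strength=1e-4):
    """L2 penalty on first differences between adjacent filter taps.

    `conv_weight` is assumed shaped (out_channels, in_channels, kernel_size),
    so the last axis is the time axis of the filter.
    """
    diffs = conv_weight[..., 1:] - conv_weight[..., :-1]
    return strength * diffs.pow(2).sum()

# Hypothetical usage inside a training loop:
# loss = reconstruction_loss + sum(
#     smoothness_penalty(m.weight)
#     for m in model.modules()
#     if isinstance(m, torch.nn.Conv1d)
# )
```

A second-difference penalty instead of a first-difference one would favor piecewise-linear filters, if strict smoothness turns out to be too strong a constraint.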
The piano stuff already seemed like 'dream music', as did the 'babble' examples. I found myself terribly frustrated by how short all those examples were. I wanted lots more :)