Yes, I meant to run a 2D CNN over generated spectrograms. We do something similar ...

throwqwerty · on Feb 28, 2020

why run a CNN over a spectrogram? besides what I said (convolutions in the frequency domain are multiplications in the time domain), an FT is linear. if classifying using those features were effective, then your CNN would've learned the DFT matrix weights from the original signal.
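To make the linearity point concrete, here is a minimal pure-Python sketch. The helper names and sample vectors are illustrative only, not from the thread. The DFT is multiplication by a fixed n x n matrix F, so a single dense linear layer can represent it. The same code checks the duality mentioned above: the product of two signals in time corresponds to a circular convolution of their spectra, scaled by 1/n.

```python
import cmath

def dft_matrix(n):
    """n x n DFT matrix with F[k][t] = exp(-2*pi*i*k*t/n)."""
    return [[cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)] for k in range(n)]

def matvec(m, v):
    """Dense matrix-vector product: the DFT as one linear layer."""
    return [sum(row[t] * v[t] for t in range(len(v))) for row in m]

def circular_convolve(a, b):
    """Circular convolution: out[k] = sum_m a[m] * b[(k - m) mod n]."""
    n = len(a)
    return [sum(a[m] * b[(k - m) % n] for m in range(n)) for k in range(n)]

def close(u, v, tol=1e-9):
    return all(abs(p - q) < tol for p, q in zip(u, v))

n = 8
x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]
y = [0.5, 0.0, 1.0, 2.0, -1.0, 0.0, 1.0, 0.25]
F = dft_matrix(n)
X, Y = matvec(F, x), matvec(F, y)

# Linearity: DFT(2x + 3y) == 2*DFT(x) + 3*DFT(y)
lhs = matvec(F, [2 * a + 3 * b for a, b in zip(x, y)])
rhs = [2 * p + 3 * q for p, q in zip(X, Y)]
print(close(lhs, rhs))  # True

# Multiplication in time <-> circular convolution in frequency, scaled by 1/n
prod = matvec(F, [a * b for a, b in zip(x, y)])
conv = [c / n for c in circular_convolve(X, Y)]
print(close(prod, conv))  # True
```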

kd5bjo · on Feb 28, 2020

A spectrogram has time on one axis and frequency on the other, so the ultimate result is a multiplication in one dimension and a convolution in the other. It can be used to show things like when a note starts and stops in a piece of music, which is difficult in either purely-time or purely-frequency space.
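A minimal sketch of the note start/stop point, assuming a rectangular window and a made-up test signal: silence, then a tone, then silence. Each row of the spectrogram is the DFT magnitude of one short frame. The tone's frequency bin therefore lights up only in the frames where the note is sounding. The DFT of the whole signal reports the same frequency but says nothing about when it occurred.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of the DFT of one frame, bins 0..n/2."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]

def spectrogram(signal, frame_len, hop):
    """Short-time Fourier transform magnitudes: rows are time frames,
    columns are frequency bins. A rectangular window keeps this simple;
    real pipelines usually taper each frame (e.g. a Hann window)."""
    return [dft_magnitudes(signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, hop)]

# Hypothetical signal: 64 zeros, 64 samples of a tone completing
# exactly 4 cycles per 16-sample frame, then 64 zeros.
frame_len, hop, cycles = 16, 16, 4
tone = [math.sin(2 * math.pi * cycles * t / frame_len) for t in range(64)]
signal = [0.0] * 64 + tone + [0.0] * 64

spec = spectrogram(signal, frame_len, hop)
active = [i for i, row in enumerate(spec) if row[cycles] > 1.0]
print(active)  # [4, 5, 6, 7]: the note starts at frame 4 and stops after frame 7

# Whole-signal DFT: the peak bin gives the frequency, with no timing information.
whole = dft_magnitudes(signal)
peak = max(range(len(whole)), key=whole.__getitem__)
print(peak)  # 48
```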

Also, it’s computationally intractable to individually train 2^N weights. What a CNN does instead is train a convolution kernel which is passed over the whole domain to produce the input for the next layer; by operating in frequency space, it’s considering the basis functions e^{j(ω ± ε)x} instead of δ(x ± ε).
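A rough illustration of the weight sharing described above, using a hand-picked kernel in place of a learned one and a toy spectrogram (both hypothetical). The same small kernel slides over every time/frequency position, so its weight count does not depend on the input size. A dense n x n matrix such as the DFT matrix has n² entries.

```python
def conv2d_valid(image, kernel):
    """'Valid' 2D cross-correlation, which is what CNN layers compute.
    The same kernel is reused at every position, so the trainable weight
    count is len(kernel) * len(kernel[0]) regardless of image size."""
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(w - kw + 1)]
            for r in range(h - kh + 1)]

# Toy spectrogram (rows = time frames, columns = frequency bins):
# energy appears in bin 2 at frame 2 and is gone after frame 3.
spec = [
    [0, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 0, 5, 0],
    [0, 0, 5, 0],
    [0, 0, 0, 0],
]
# Hand-picked 2x1 kernel (a CNN would learn such values): a difference along time.
onset_kernel = [[-1], [1]]
out = conv2d_valid(spec, onset_kernel)
print(out)  # [[0, 0, 0, 0], [0, 0, 5, 0], [0, 0, 0, 0], [0, 0, -5, 0]]

# Weight counts: a dense n x n layer (the DFT matrix's shape) vs. this kernel.
n = 1024
dense_weights = n * n
kernel_weights = len(onset_kernel) * len(onset_kernel[0])
print(dense_weights, kernel_weights)  # 1048576 2
```

A positive response marks where the note starts (frame 1 to frame 2) and a negative one where it stops (frame 3 to frame 4).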

throwqwerty · on Feb 28, 2020

my mistake i didn't realize spectrogram and spectrum were distinct objects.

>Also, it’s computationally intractable to individually train 2^N weights.

that's a good point - i'd forgotten for a moment (because i'm so used to the Cooley-Tukey FFT) that in principle getting the spectrum involves a matmul against the entire vector. which brings up a potentially interesting question: can you get a DNN to simulate the Cooley-Tukey FFT (stride permutations and all)?
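For reference, here is a textbook radix-2 Cooley-Tukey FFT in pure Python. It is a sketch that assumes the input length is a power of 2. Each recursion level does an even/odd stride permutation, then twiddle factors and butterflies. That is why the dense DFT matrix factors into roughly log2(n) sparse, structured linear stages. Whether a network trained from data would discover that factorization is the open question here; this only shows the target structure, checked against the O(n²) matrix version.

```python
import cmath

def fft(x):
    """Radix-2 decimation-in-time Cooley-Tukey FFT; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of 2")
    # Stride permutation: recurse on even- and odd-indexed samples.
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        # Twiddle factor applied to the odd half, then a butterfly.
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out

def naive_dft(x):
    """Direct O(n^2) DFT: every output touches every input."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 0.5, 0.0, 1.0]
print(all(abs(a - b) < 1e-9 for a, b in zip(fft(x), naive_dft(x))))  # True
```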

kaoD · on Feb 28, 2020

A spectrogram isn't an FT. It's an FT sampled over time: one spectrum per short window of the signal.

(Not sure if that matters; I don't know much about NNs, but it seemed like an important distinction.)

Muller20 · on Feb 28, 2020

because convolutions are faster, both during training and after deployment, and they are also easier to train.