Hacker News

Valid-only convolution (in the MATLAB sense) by itself reduces the dimensionality of the input; for images, each plane goes from (h x w) to (h - kh + 1) x (w - kw + 1).
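A quick sketch of that shape arithmetic in NumPy (the function name `conv2d_valid` and the sample sizes are illustrative, not from any particular library):

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def conv2d_valid(image, kernel):
    """'Valid' 2-D convolution of one (h x w) plane with a (kh x kw) kernel."""
    kh, kw = kernel.shape
    # Flip the kernel so this is true convolution (MATLAB conv2 convention).
    flipped = kernel[::-1, ::-1]
    # windows has shape (h - kh + 1, w - kw + 1, kh, kw)
    windows = sliding_window_view(image, (kh, kw))
    return np.einsum('ijkl,kl->ij', windows, flipped)

img = np.random.rand(8, 10)   # h = 8, w = 10
ker = np.random.rand(3, 3)    # kh = kw = 3
out = conv2d_valid(img, ker)
# out.shape is (h - kh + 1, w - kw + 1) = (6, 8)
```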

You can think of a convnet as a series of feature transformations, each consisting of a normalization/whitening stage, a filter bank that projects into a higher dimension (on an overcomplete basis), a non-linear operation in the higher-dimensional space, and then possibly pooling back down to a lower-dimensional space.
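Those stages can be sketched as a toy NumPy pipeline. All function names and sizes here are illustrative assumptions, not any framework's API:

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

def normalize(x):
    # Crude whitening: zero mean, unit variance.
    return (x - x.mean()) / (x.std() + 1e-8)

def filter_bank(x, kernels):
    # x: (inPlanes, h, w); kernels: (outPlanes, inPlanes, kh, kw).
    # Projects 3 planes into 8: an overcomplete basis.
    # (Cross-correlation form, as most convnet code uses.)
    kh, kw = kernels.shape[2:]
    windows = sliding_window_view(x, (kh, kw), axis=(1, 2))
    return np.einsum('ihwkl,oikl->ohw', windows, kernels)

def relu(x):
    # Pointwise non-linearity in the higher-dimensional space.
    return np.maximum(x, 0.0)

def max_pool2x2(x):
    # x: (planes, h, w) with h, w even; pools back to a lower dimension.
    p, h, w = x.shape
    return x.reshape(p, h // 2, 2, w // 2, 2).max(axis=(2, 4))

x = np.random.rand(3, 9, 9)          # 3 input planes
k = np.random.rand(8, 3, 4, 4)       # 8 output planes
y = max_pool2x2(relu(filter_bank(normalize(x), k)))
# shapes: (3, 9, 9) -> (8, 6, 6) -> (8, 6, 6) -> (8, 3, 3)
```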

The “filter bank” (aka convolution) and non-linearity produce a non-linear embedding of the input in a higher dimension; in convnets, the “filter bank” itself is learned. Classes or features are easier to separate in the higher-dimensional space. There are still-developing ideas on putting all of this on firmer mathematical ground (connections to wavelet theory and the like), but for the most part, it just works "really well".

For an image network, at each layer there are (input planes x output planes) convolution kernels of size (kh x kw).

Each output plane `j` is a sum over all input planes `i` individually convolved using the filter (i, j); the reduction dimension is the input plane.




see https://github.com/facebook/fbcunn/blob/master/test/Referenc...

for a loop nest that shows what the forward pass of a 2-d image convnet convolution module does. That's gussied up with convolution stride and padding and a bunch of C++11 mumbo jumbo, but you should be able to see what it is doing.
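Stripped of stride, padding, and the C++11 machinery, that loop nest comes down to something like the following sketch (names like `conv_forward` are my own; this is the plain "each output plane sums over all input planes" structure described above):

```python
import numpy as np

def conv_forward(inp, filt):
    # inp:  (nInputPlanes, h, w)
    # filt: (nInputPlanes, nOutputPlanes, kh, kw)
    n_in, h, w = inp.shape
    _, n_out, kh, kw = filt.shape
    out = np.zeros((n_out, h - kh + 1, w - kw + 1))
    for j in range(n_out):                    # each output plane j
        for i in range(n_in):                 # reduce over input planes i
            for y in range(h - kh + 1):       # valid output rows
                for x in range(w - kw + 1):   # valid output cols
                    # cross-correlation form, as most convnet code uses
                    out[j, y, x] += (inp[i, y:y+kh, x:x+kw] * filt[i, j]).sum()
    return out
```

Each `out[j]` is the sum over `i` of input plane `i` filtered by kernel `(i, j)`, exactly the reduction over the input-plane dimension.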



