>With a sufficiently magical kernel function, indeed you can get great results with a kernel machine. But it's not so easy to write a kernel function for a domain like image processing, where shifts, scales, and small rotations shouldn't affect similarity much. Let alone for text processing, where it would need to recognize two sentences with similar meanings as similar.
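For instance, a plain RBF kernel on raw pixel vectors has no notion of shift invariance at all; here's a rough numpy sketch of the problem (toy random "image", made-up gamma):

```python
import numpy as np

def rbf_kernel(x, y, gamma=0.01):
    """Standard RBF kernel on flattened pixel vectors."""
    return np.exp(-gamma * np.sum((x - y) ** 2))

rng = np.random.default_rng(0)
img = rng.random((28, 28))          # a toy "image"
shifted = np.roll(img, 1, axis=1)   # same content, shifted one pixel right

# The raw-pixel RBF kernel compares pixels position-wise, so the shifted
# copy looks much less similar to the original than it "should".
print(rbf_kernel(img.ravel(), img.ravel()))      # 1.0 (identical)
print(rbf_kernel(img.ravel(), shifted.ravel()))  # noticeably smaller
```

Getting shift/scale/rotation invariance into the kernel itself is exactly the hard part.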
I think the key issue is that a model trained with gradient descent is easier to optimize than one built on hand-crafted kernel functions. Someone could absolutely devise a way to backpropagate errors through the kernel functions themselves, but at that point it's basically a neural network.
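As a rough sketch of what that would look like (hypothetical, not any particular library's recipe): if you make the kernel centers, bandwidth, and weights all learnable and fit them with gradient descent, what you've written is just a one-hidden-layer RBF network.

```python
import torch

# A tiny "trainable kernel machine": f(x) = sum_i alpha_i * k(x, c_i),
# where the centers c_i, the bandwidth, and the weights alpha_i are all
# learned by gradient descent. Structurally this is an RBF network,
# i.e. a shallow neural net -- which is the point above.
class TrainableKernelMachine(torch.nn.Module):
    def __init__(self, n_centers=16, dim=2):
        super().__init__()
        self.centers = torch.nn.Parameter(torch.randn(n_centers, dim))
        self.log_gamma = torch.nn.Parameter(torch.zeros(1))
        self.alpha = torch.nn.Parameter(torch.randn(n_centers))

    def forward(self, x):                               # x: (batch, dim)
        d2 = ((x[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        k = torch.exp(-self.log_gamma.exp() * d2)       # (batch, n_centers)
        return k @ self.alpha

model = TrainableKernelMachine()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x = torch.randn(64, 2)
y = (x[:, 0] * x[:, 1] > 0).float()                    # toy target
for _ in range(100):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()
```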
Wouldn't that (that similarity should be invariant to those transformations) be very useful information to encode in the NN though? Kinda like how you can do non-convolutional deep learning on images, but it takes vastly more training data than convolutional deep learning does.
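Roughly, convolution bakes that prior in through weight sharing; a quick parameter-count comparison (arbitrary toy sizes) shows how much structure you'd otherwise have to learn from data:

```python
import torch

# A conv layer encodes "shifts don't matter" via weight sharing, so it needs
# far fewer parameters than a dense layer mapping the same 28x28 image to a
# same-sized stack of feature maps.
conv = torch.nn.Conv2d(1, 8, kernel_size=3, padding=1)
dense = torch.nn.Linear(28 * 28, 8 * 28 * 28)

print(sum(p.numel() for p in conv.parameters()))   # 80
print(sum(p.numel() for p in dense.parameters()))  # ~4.9 million
```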