I'm a data scientist who uses Gaussian processes all the time. They are:
1. Typically very accurate.
2. Built on sound theory, with good uncertainty estimates (quick sketch below).
3. Very easy to tune, since they're Bayesian models.
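To make points 2 and 3 concrete, here's a minimal sketch of GP regression, assuming scikit-learn and a toy sin-curve dataset (neither comes from this thread):

```python
# Minimal GP regression sketch: predictive mean plus uncertainty.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, WhiteKernel

rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(50, 1))
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(50)

# RBF kernel for a smooth latent function, WhiteKernel for observation noise.
# fit() tunes the kernel hyperparameters by maximizing the marginal likelihood,
# which is the "easy tuning" of point 3.
kernel = 1.0 * RBF(length_scale=1.0) + WhiteKernel(noise_level=0.1)
gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(X, y)

X_test = np.linspace(0, 10, 200).reshape(-1, 1)
mean, std = gp.predict(X_test, return_std=True)  # predictive mean and std dev
```

The predictive standard deviation is the uncertainty estimate of point 2; you get it for free alongside the mean.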
The main competing models for some of the same tasks are gradient boosted decision trees and sometimes neural networks. GBTs win over NNs for most tasks in practice, although they don't get much hype. GPs do well with smooth data in my experience, while GBTs win on data where a limited number of bespoke decision-tree splitting rules can represent the data well.
Interestingly, damn near anything, including neural networks, linear regression, and GBTs, can be interpreted as a Gaussian process (or an approximation of a GP) by a certain choice of covariance function. GPs are just functions in the reproducing kernel Hilbert space defined by the covariance function. That can include most anything.
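To make the linear regression case concrete, here's a small NumPy sketch (toy data, my own variable names) showing that Bayesian linear regression with a unit Gaussian prior on the weights makes exactly the same predictions as a GP with the linear (dot-product) covariance k(x, x') = x·x':

```python
# Linear regression as a GP: weight-space and function-space views agree.
import numpy as np

rng = np.random.default_rng(1)
n, d, noise = 30, 3, 0.1
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + noise * rng.standard_normal(n)
X_test = rng.standard_normal((5, d))

# GP view: covariance k(x, x') = x . x', i.e. K = X X^T.
K = X @ X.T
K_star = X_test @ X.T
gp_mean = K_star @ np.linalg.solve(K + noise**2 * np.eye(n), y)

# Weight-space view: Bayesian linear regression / ridge posterior mean.
w_post = np.linalg.solve(X.T @ X + noise**2 * np.eye(d), X.T @ y)
lin_mean = X_test @ w_post

print(np.allclose(gp_mean, lin_mean))  # True: identical predictions
```

The neural network connection is the same idea taken to a limit: an infinitely wide network with Gaussian priors on its weights converges to a GP with a corresponding covariance function.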
GPs with full covariance matrices don't scale to more than a few thousand examples (O(n^3) time), but approximations can be made to scale to large datasets.
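For a feel of how those approximations work, here's a minimal sketch of one of them, the Nyström / subset-of-regressors idea with m inducing points (toy setup, not any particular library's API). The n x n solve is replaced by m x m solves, so cost drops from O(n^3) to roughly O(n m^2):

```python
# Sparse GP sketch: subset-of-regressors with m randomly chosen inducing points.
import numpy as np

def rbf(A, B, lengthscale=1.0):
    """Squared-exponential kernel matrix between rows of A and rows of B."""
    sq = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * sq / lengthscale**2)

rng = np.random.default_rng(2)
n, m, noise = 20_000, 200, 0.1
X = rng.uniform(0, 10, size=(n, 1))
y = np.sin(X).ravel() + noise * rng.standard_normal(n)

Z = X[rng.choice(n, size=m, replace=False)]   # m inducing points
K_nm = rbf(X, Z)                              # n x m cross-covariance
K_mm = rbf(Z, Z)                              # m x m inducing covariance

# Predictive mean uses only m x m linear systems, never the full n x n kernel.
A = K_nm.T @ K_nm + noise**2 * K_mm
alpha = np.linalg.solve(A + 1e-8 * np.eye(m), K_nm.T @ y)

X_test = np.linspace(0, 10, 5).reshape(-1, 1)
mean = rbf(X_test, Z) @ alpha                 # approximate GP posterior mean
```

Libraries usually pick inducing points more carefully (k-means centers, or learning them); random selection here just keeps the sketch short.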
> GBTs win over NN for most tasks in practice, although they don't get much hype
I've always thought GBDTs get too much hype. As a data scientist, it seems like everyone wants to immediately throw a random forest or GBDT at the problem without knowing anything else about it.
Yeah, I think in the data science community, GBDTs are appropriately hyped, since their dominant performance on Kaggle has been well known for some time now. In addition, GBDTs are so easy to run; taken together, it's probably always correct to run a GBDT as one of the first things you do after you've got the data wrangled. Of course, as a PhD-in-training data scientist, I feel disappointed (either in myself or in the task) if I can't think of a more interesting and better-performing method than a GBDT :)
How big are the datasets you usually deal with? From my experience, you hit a computational bottleneck pretty quickly with GPs (100k datapoints of maybe 100 dimensions is pretty much the max you can deal with).
And if you are talking about datasets of that scale, then I agree with you that GPs are better than NNs. However, people are excited about NNs' capability of dealing with immensely huge and high-dimensional datasets, not small-scale ones.
Thousands to millions. There are approximations that work very well with millions of examples.
Neural networks are empirically outperformed by gradient boosted trees (look at Kaggle competitions) on most practical tasks except for image, sound, and video problems.
Neural networks can be very slow on large datasets. Training can often take days or weeks, even with a GPU. GBTs and GP approximations are faster.