One of their more interesting applications is in Bayesian Optimization (BO). You'd use BO when evaluating the objective function is expensive - as opposed to, say, gradient descent, where you navigate the space via repeated evaluations of the objective function.
BO essentially builds its own model of what the objective function looks like as a function of its parameters, and picks a few promising points at which to evaluate it. It thus shifts the burden from evaluating a costly objective many times to identifying those highly promising points.
A BO algorithm can use a GP as this model, since the uncertainty estimates that come with a GP are valuable in seeking out promising points. For example, the GP can say that at some value of the parameter the objective function is expected to do well, and that it's confident about this, so that point should be evaluated next ... or it can say that it's extremely uncertain about its estimates in a particular region, so that region needs exploring. BO keeps updating its model every time the objective function is evaluated. Since at any instant the model encodes knowledge from all past evaluations, this is also sometimes known as Sequential Model-Based Optimization (SMBO).
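Here's a minimal sketch of that loop, assuming scikit-learn's GaussianProcessRegressor as the surrogate and a simple upper-confidence-bound rule for picking the next point; the toy objective, the search range, and the exploration constant are all made up for illustration, not taken from the papers below.

    import numpy as np
    from sklearn.gaussian_process import GaussianProcessRegressor
    from sklearn.gaussian_process.kernels import Matern

    def expensive_objective(x):          # stand-in for a costly evaluation
        return -(x - 2.0) ** 2 + np.sin(5 * x)

    X = np.array([[0.0], [4.0]])                     # initial evaluations
    y = np.array([expensive_objective(x[0]) for x in X])

    for _ in range(10):                              # each iteration = one costly evaluation
        gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
        gp.fit(X, y)                                 # surrogate model of the objective

        candidates = np.linspace(0.0, 4.0, 200).reshape(-1, 1)
        mu, sigma = gp.predict(candidates, return_std=True)
        ucb = mu + 2.0 * sigma                       # favour good mean OR high uncertainty

        x_next = candidates[np.argmax(ucb)]          # the single most promising point
        y_next = expensive_objective(x_next[0])      # the only expensive call this round

        X = np.vstack([X, [x_next]])                 # the model now encodes this evaluation too
        y = np.append(y, y_next)

    print("best x found:", X[np.argmax(y)][0], "objective:", y.max())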
If you need a practical example of its use within Machine Learning, BO with GPs has been used to find the right hyperparameters for a model - [1], [2]. Traditionally you'd perform a grid search in the space of hyperparameters, building a model at every point on the grid; BO tells you that you don't really need to do that --- you can be smart about which points in the grid you actually build your model at.
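As a hedged sketch of the hyperparameter use case, here's the same idea expressed with scikit-optimize's gp_minimize (a GP-backed BO routine) rather than the spearmint library from [2], just to keep it short; the model, dataset, search ranges and number of calls are illustrative, not taken from the papers above.

    from sklearn.datasets import load_digits
    from sklearn.model_selection import cross_val_score
    from sklearn.svm import SVC
    from skopt import gp_minimize

    X, y = load_digits(return_X_y=True)

    def objective(params):                 # one "expensive" evaluation = one CV run
        C, gamma = params
        score = cross_val_score(SVC(C=C, gamma=gamma), X, y, cv=3).mean()
        return -score                      # gp_minimize minimizes, so negate accuracy

    # Instead of a full grid over (C, gamma), let BO pick ~20 points to try.
    result = gp_minimize(objective,
                         dimensions=[(1e-3, 1e3, "log-uniform"),    # C
                                     (1e-6, 1e0, "log-uniform")],   # gamma
                         n_calls=20, random_state=0)

    print("best (C, gamma):", result.x, "CV accuracy:", -result.fun)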
The problem with GP-based BO is scaling. There has been a fair amount of research (it's an active area) to address this - sometimes by performing BO without GPs, and sometimes by keeping the GPs but speeding them up [3].
[1] A good tutorial on the approach - https://arxiv.org/abs/1012.2599
[2] A library for this - https://github.com/JasperSnoek/spearmint . The corresponding paper is linked there.
[3] This paper is a good example - it suggests doing BO both using GPs (in this case suggesting speed-ups, Section 3) and without them (Section 4): https://papers.nips.cc/paper/4443-algorithms-for-hyper-param... In fact, [2] is also a good example of speeding up GP-based BO.