
As for practical use cases, one is finding an approximate optimum of a function:

- You want to find the min/max of some probability distribution P(x)

- P(x) is too complicated to find a closed-form min, but you can draw samples from it.

- So instead, you carefully construct some OTHER probability distribution Q(x|θ) that you claim is structurally similar "enough" to P(x), parameterized by θ.

- Now you find the θ that minimizes the KL divergence KL(P(x) ∥ Q(x|θ)), which is equivalent to finding the parameters θ of Q(x|θ) that make it [approximately] "most" similar to P(x), without ever having optimized P(x) directly (see the code sketch below).

It was a trick that came up a lot when AI consisted of giant Bayesian plate models for each specific task that you had to hand-optimize.
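To make that concrete, here's a toy sketch in numpy/scipy (my own example, so the mixture I use for P, the single-Gaussian Q, and all the names are just assumptions for illustration). Since KL(P ∥ Q) = E_P[log P(x)] − E_P[log Q(x|θ)] and the first term doesn't depend on θ, you can just minimize a Monte Carlo estimate of −E_P[log Q(x|θ)] over your samples:

    import numpy as np
    from scipy.optimize import minimize
    from scipy.stats import norm

    rng = np.random.default_rng(0)

    # "Complicated" P(x): a two-component Gaussian mixture.
    # We pretend we can't work with its density, only its samples.
    n = 5000
    left = rng.random(n) < 0.3
    samples = np.where(left, rng.normal(-2.0, 0.5, n), rng.normal(1.0, 1.0, n))

    # Q(x|theta): a single Gaussian, theta = (mu, log_sigma).
    def neg_expected_log_q(theta):
        mu, log_sigma = theta
        # Monte Carlo estimate of -E_P[log Q(x|theta)], i.e. KL(P||Q)
        # up to the constant E_P[log P(x)] that doesn't depend on theta.
        return -np.mean(norm.logpdf(samples, loc=mu, scale=np.exp(log_sigma)))

    result = minimize(neg_expected_log_q, x0=np.array([0.0, 0.0]))
    mu_hat, sigma_hat = result.x[0], np.exp(result.x[1])
    print(mu_hat, sigma_hat)  # parameters of the Q that best approximates P

The point is that nothing in the optimization ever evaluates P(x) itself, only samples drawn from it.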




Note that "drawing samples from P(x)" means to have training data drawn from P(x).

You can form the 'empirical' probability distribution P'(x) from your n training samples {x_i}, with P'(x_i) = 1/n and P'(x) = 0 for all other x.

Then finding the θ which minimizes KL(P'(x) ∥ Q(x|θ)) is equivalent to finding the maximum likelihood estimate (MLE) given your training data.
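Spelling out why, since P' puts mass 1/n on each training sample:

    KL(P'(x) ∥ Q(x|θ)) = Σ_i (1/n) · log( (1/n) / Q(x_i|θ) )
                       = −log n − (1/n) Σ_i log Q(x_i|θ)

The −log n term doesn't depend on θ, so minimizing the KL over θ is exactly maximizing Σ_i log Q(x_i|θ), the log-likelihood of the training data.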

(Note: I don't know what's meant by "the min/max of some probability distribution P(x)" and suggest ignoring that)


MLE | training data

Just writing hand-wavily :)



