
"Typical statistics work is to use a known good model and estimate its parameters. [...] For statistics, the parameters are your bread and butter" Ever heard of non-parametric statistics?

"For machine learning, they are the afterthought to be automated away with lots of GPU power." You seem to reduce statistics to undergraduate statistics and machine learning to Deep Learning.

"A well-designed ML model can have competitive performance with randomly initialized parameters, because the structure is far more important than the parameters. In statistics, random parameters are usually worthless." This is blatantly false see Frankle & Carbin, 2019 on the lottery ticket hypothesis.



Yes, I have reduced both statistics and ML to the subsets that are usually used when working in the field, because the blog post was about employment options.

I would wager that people doing non-parametric statistics are both very rare and likely to advertise themselves as machine learning experts rather than as statisticians.

As for the random network, I was referring to https://arxiv.org/abs/1911.13299, and I have seen similar effects in my own work, where a new architecture performed significantly better before training than the old one did after training.

If you want a generally agreed-upon example, it'd be conv nets with a cost volume for optical flow. What the conv nets implement is a glorified hashing function for a block of pixels, and that works almost equally well with random parameters. As a result, PWC-Net already has strong performance before you even start training it.
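
To make the random-parameter claim concrete, here is a toy sketch (my own construction, not PWC-Net; W_rand and the features helper are made-up names): randomly project raw patches of two frames and correlate them into a cost volume. The untrained projection is already enough to read off the displacement.

    import numpy as np

    # Toy illustration, not PWC-Net: build a cost volume by correlating
    # *randomly projected* patch features between two frames.
    rng = np.random.default_rng(0)

    H, W, P, D, R = 32, 32, 7, 16, 4           # image size, patch size, feature dim, search radius
    frame1 = rng.random((H, W))
    frame2 = np.roll(frame1, shift=2, axis=1)  # frame2 is frame1 shifted right by 2 px

    W_rand = rng.standard_normal((D, P * P))   # random, untrained "conv" weights

    def features(img, y, x):
        patch = img[y:y + P, x:x + P].reshape(-1)
        f = W_rand @ patch                     # random projection of the patch
        return f / (np.linalg.norm(f) + 1e-8)

    # Cost volume at one pixel: correlate the frame1 feature with frame2
    # features over all displacements in [-R, R]^2.
    y, x = 12, 12
    f1 = features(frame1, y, x)
    cost = np.full((2 * R + 1, 2 * R + 1), -np.inf)
    for dy in range(-R, R + 1):
        for dx in range(-R, R + 1):
            yy, xx = y + dy, x + dx
            if 0 <= yy <= H - P and 0 <= xx <= W - P:
                cost[dy + R, dx + R] = f1 @ features(frame2, yy, xx)

    best = np.unravel_index(np.argmax(cost), cost.shape)
    print("estimated displacement:", (best[0] - R, best[1] - R))  # expect (0, 2)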


>As for the random network, I was referring to https://arxiv.org/abs/1911.13299, and I have seen similar effects in my own work, where a new architecture performed significantly better before training than the old one did after training.

The fact that a dense neural network with 20M parameters performs as well as a model with 20M random values and 20M _bits_ worth of parameters means nothing more than that the parameter space is ridiculously large.
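
To see why those mask bits still carry real information, here is a toy sketch (my own construction, not the edge-popup method from that paper): freeze random weights and search over binary masks on the hidden units. The best mask found does better than keeping every random unit, simply because choosing a mask is itself an optimization over N bits.

    import numpy as np

    # Toy illustration: fixed random weights, all learned information lives
    # in a binary mask over hidden units.
    rng = np.random.default_rng(1)

    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    y = np.array([0, 1, 1, 0], dtype=float)      # XOR target

    H = 12                                       # hidden units; their weights stay random
    W1 = rng.standard_normal((H, 2))
    b1 = rng.standard_normal(H)
    W2 = rng.standard_normal(H)

    def mse(mask):
        hidden = np.maximum(X @ W1.T + b1, 0.0)  # ReLU layer with frozen random weights
        pred = (hidden * mask) @ W2              # mask selects which units to keep
        return np.mean((pred - y) ** 2)

    full = np.ones(H)
    best_mask, best_err = full, mse(full)
    for _ in range(20000):                       # crude random search over 2^H masks
        m = rng.integers(0, 2, H).astype(float)
        e = mse(m)
        if e < best_err:
            best_mask, best_err = m, e

    print("error, all random units kept:", round(float(mse(full)), 3))
    print("error, best mask found:      ", round(float(best_err), 3))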

The only models that perform well given random parameters are those that are sufficiently restrictive, like weather forecasts, where perturbations of the initial conditions give a distribution of possible outcomes. Machine learning models are almost never restrictive.
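
For the weather analogy, a minimal sketch (using the Lorenz-63 system purely as a stand-in for a restrictive model): the structure and parameters are fixed, only the initial conditions are perturbed, and the result is a distribution of outcomes rather than a single trajectory.

    import numpy as np

    # Toy ensemble "forecast": fixed model, perturbed initial conditions.
    rng = np.random.default_rng(2)

    def lorenz_step(state, dt=0.005, sigma=10.0, rho=28.0, beta=8.0 / 3.0):
        x, y, z = state
        dxdt = sigma * (y - x)
        dydt = x * (rho - z) - y
        dzdt = x * y - beta * z
        return state + dt * np.array([dxdt, dydt, dzdt])   # simple Euler step

    base = np.array([1.0, 1.0, 1.0])
    ensemble = base + 1e-3 * rng.standard_normal((50, 3))   # 50 perturbed starts

    for _ in range(4000):                                   # integrate ~20 time units
        ensemble = np.array([lorenz_step(s) for s in ensemble])

    # Same model everywhere; only the initial conditions differ, yet the
    # outcomes have spread into a broad distribution.
    print("spread of x across the ensemble:", round(float(ensemble[:, 0].std()), 2))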


Of course, I agree with you that the parameter space is ridiculously large. But sadly, that's what people work with in practice. And at 20 million parameters, their example is still small compared to GPT-3's 175 billion.

I disagree with you on the restrictive part. ML models that are inspired by biology tend to be restrictive, the same way that the development of mammal brains is assumed to be restricted by genetically determined structure. Pretty much all SOTA optical flow algorithms are restricted in what they can learn. And those restrictions are what make convergence and unsupervised learning possible, because the problem itself is very ill-posed.


Non-parametric statistics blurs the lines a bit, and prequential statistics (à la Dawid) blurs them even more, but he is not wrong. A traditional statistician will be excited about a method because it can recover the parameter (be it finite-dimensional or infinite-dimensional). An ML person, on the other hand, will be excited about a method because it does well on the prediction task even if it sucks at recovering the parameters (if it can be shown to approach the theoretical limit of the best one can do, no matter what the distribution of the data, and to do so with efficient use of compute power, that would be the holy grail).
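
A small toy example of that distinction (my own, not from the thread): with two nearly collinear features, heavily regularized ridge regression recovers the individual coefficients badly, yet predicts almost as well as the true model. A statistician would judge the method by the first number, an ML practitioner by the second.

    import numpy as np

    # Toy contrast: parameter recovery vs. prediction quality.
    rng = np.random.default_rng(3)

    n = 200
    z = rng.standard_normal(n)
    X = np.column_stack([z, z + 0.01 * rng.standard_normal(n)])  # two nearly identical features
    beta_true = np.array([3.0, -2.0])
    y = X @ beta_true + 0.1 * rng.standard_normal(n)

    lam = 10.0                                                   # heavy ridge penalty
    beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)

    z2 = rng.standard_normal(n)
    X_test = np.column_stack([z2, z2 + 0.01 * rng.standard_normal(n)])
    y_test = X_test @ beta_true + 0.1 * rng.standard_normal(n)

    rmse = np.sqrt(np.mean((X_test @ beta_ridge - y_test) ** 2))
    print("true coefficients: ", beta_true)
    print("ridge coefficients:", np.round(beta_ridge, 2))        # strongly biased toward each other
    print("test RMSE:         ", round(float(rmse), 3))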



