Hacker News

Could you point to the pages where you saw Gaussian assumed? To me it seemed to talk more heavily about multidimensional non-parametric methods than parametric ones, let alone Gaussian. It does use the Gaussian as a warm-up motivating example at the beginning.

The rest of what you say is pretty standard stuff and should be elementary knowledge for anyone working seriously in this domain (*). Am I missing something?

> Getting a hypothesis test from those algorithms will usually mean doing a derivation to find the probability of Type I error, and such a derivation is commonly difficult or not yet done

Sure it's difficult, but that doesn't mean it hasn't been done. This is bread and butter in ML and non-parametric stats. In fact Type I is easy, and Type I alone is nowhere near enough: setting up an RNG (random number generator) as the test will satisfy the Type I error requirement trivially. The key is Type II, or in other words, the key is a proof that the proposed method is better than the bespoke RNG; stated yet another way, if the power is abysmal the test counts for squat.
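To make the RNG point concrete, here is a minimal sketch (the function name and setup are my own, not from the thread): a "test" that ignores the data and rejects with probability alpha has exact Type I error alpha by construction, but its power against every alternative is also just alpha, so it is useless.

```python
import numpy as np

def rng_test(data, alpha=0.05, rng=None):
    # "Bespoke RNG" test: ignore the data entirely and reject H0 with
    # probability alpha. Type I error is exactly alpha by construction,
    # but the power against ANY alternative is also alpha -- valid
    # significance level, worthless test.
    rng = rng if rng is not None else np.random.default_rng()
    return rng.random() < alpha

# The rejection rate is ~alpha no matter what 'data' contains:
rng = np.random.default_rng(0)
rejections = sum(rng_test(data=None, alpha=0.05, rng=rng) for _ in range(10_000))
print(rejections / 10_000)  # close to 0.05
```

This is why a power argument against such a trivial baseline is the substantive part of validating a proposed test.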

(*) Gratuitous assumptions of Gaussianity and incorrect applications of the CLT do not qualify as "working seriously in this domain" for me




What you said about a random number generator is a trivial test and has big problems with the power (one minus the probability of Type II error) of the test.

For power, I did mention Neyman-Pearson.

In one paper I wrote on such things, multidimensional and distribution-free (the paper was published in Information Sciences), I did some derivations, not easy to do, for the probability of Type I error, that is, the significance level of the test, and used a classic result of S. Ulam, which Le Cam called "tightness," to show that the power of the test was better than that of a trivial test. Ulam's result is in P. Billingsley, Convergence of Probability Measures. Billingsley was long at the University of Chicago.


> What you said about a random number generator is a trivial test and has big problems with power (probability of Type II error) of the test.

Erm, that was exactly the point of the last paragraph of my comment.

In ML, distribution-free multivariate tests have a slightly different flavor. The way those work is that you compare the empirical expectations over all functions in a class (typically the unit ball of a Hilbert space). This is reminiscent of Cramér-Wold. The separability properties of a reproducing kernel Hilbert space make it tractable to compare these.


Good to know. That is WAY beyond anything in the statistics in the pure/applied math department where I got my Ph.D. But in that department I ignored their interests in statistics as much as I could, which was nearly totally.

As I mentioned, I got an admittedly weak power result using Ulam's tightness. Yes, it's weak, but it's quite general.

If I return to the mathematical statistics of hypothesis testing, I'll look into what you mentioned.

From what you mentioned, it appears that some of ML is actually being mathematical. I continue to believe that computer science is 'out of gas' and needs some new directions, and those just about have to come from pure/applied math, applied to problems newly important because of computing, but still pure/applied math.


The "learning theory" part of ML has always been rigorous in its arguments. The same goes for the top-tier conferences and journals. ML is really applied math that stands on the shoulders of functional analysis, optimization, probability theory, and algorithms. The key way in which the flavor of ML differs from the flavor of statistics is that ML theorems give guarantees on prediction quality, whereas in statistics the guarantees are on recovering parameters (which could be infinite-dimensional). This is a broad generalization, so there will be edge cases.

BTW, when is the launch? All the best.



