Hacker News

Could you point to the pages where you saw Gaussian assumed? To me it seemed to talk more heavily about multidimensional non-parametric methods than parametric ones, let alone Gaussian. It does use the Gaussian as a warm-up motivating example at the beginning.

The rest of what you say is pretty standard stuff and should be elementary knowledge for anyone working seriously in this domain (*). Am I missing something?

> Getting a hypothesis test from those algorithms will usually mean doing a derivation to find the probability of Type I error, and such a derivation is commonly difficult or not yet done

Sure it's difficult, but that doesn't mean it hasn't been done. This is bread and butter in ML and non-parametric stats. In fact Type I is easy, and Type I alone is nowhere near enough: setting up an RNG (random number generator) as the test will satisfy the Type I error requirement trivially. The key is Type II, or in other words, the key is a proof that the proposed method is better than the bespoke RNG; stated yet another way, if the power is abysmal the test counts for squat.
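To make the RNG point concrete, here is a minimal sketch (the function name and setup are my own, not from the thread): a "test" that ignores the data and rejects with probability alpha has exact Type I error alpha by construction, but its power against every alternative is also just alpha, so it is useless.

```python
import numpy as np

def rng_test(data, alpha=0.05, rng=None):
    # "Bespoke RNG" test: ignore the data entirely and reject H0 with
    # probability alpha. Type I error is exactly alpha by construction,
    # but the power against ANY alternative is also alpha -- valid
    # significance level, worthless test.
    rng = rng if rng is not None else np.random.default_rng()
    return rng.random() < alpha

# The rejection rate is ~alpha no matter what 'data' contains:
rng = np.random.default_rng(0)
rejections = sum(rng_test(data=None, alpha=0.05, rng=rng) for _ in range(10_000))
print(rejections / 10_000)  # close to 0.05
```

This is why a power argument against such a trivial baseline is the substantive part of validating a proposed test.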

(*) Gratuitous assumptions of Gaussianity and incorrect applications of the CLT do not qualify as "working seriously in this domain" for me




What you said about a random number generator is a trivial test and has big problems with the power (one minus the probability of Type II error) of the test.

For power, I did mention Neyman-Pearson.

In one paper I wrote on such things, multidimensional and distribution-free (the paper was published in Information Sciences), I did some derivations, not easy to do, for the probability of Type I error, that is, the significance level of the test, and used a classic result of S. Ulam, which Le Cam called "tightness," to show that the power of the test was better than that of a trivial test. Ulam's result is in P. Billingsley, Convergence of Probability Measures. Billingsley was long at the University of Chicago.


> What you said about a random number generator is a trivial test and has big problems with power (probability of Type II error) of the test.

Erm, that was exactly the point of the last paragraph of my comment.

In ML, distribution-free multivariate tests have a slightly different flavor. The way those work is that you compare the empirical expectations over all functions in a class (typically the unit ball of a Hilbert space). This is reminiscent of Cramér-Wold. The separability properties of a reproducing kernel Hilbert space make it tractable to compare these.


Good to know. That is WAY beyond anything in the statistics in the pure/applied math department where I got my Ph.D. But in that department I ignored their interests in statistics as much as I could, which was nearly totally.

As I mentioned, I got an admittedly weak power result using Ulam's tightness. Yes, it's weak, but it's quite general.

If I return to the mathematical statistics of hypothesis testing, I'll look into what you mentioned.

From what you mentioned, it appears that some of ML is actually being mathematical. I continue to believe that computer science is 'out of gas' and needs some new directions, and those just about have to come from pure/applied math, applied to problems newly important because of computing, but still pure/applied math.


The "learning theory" part of ML has always been rigorous in its arguments. The same goes for the top-tier conferences and journals. ML is really applied math that stands on the shoulders of functional analysis, optimization, probability theory, and algorithms. The key way in which the flavor of ML differs from the flavor of statistics is that ML theorems give guarantees on prediction quality, whereas in statistics the guarantees are on recovering parameters (which could be infinite-dimensional). This is a broad generalization, so there will be edge cases.

BTW, when is the launch? All the best.



