Agreed -- very odd to use a parameter (bin width) in nonparametric estimation...

abhgh · 2024-04-12T03:06:25 1712891185

"Nonparametric" is somewhat of a (confusing!) misnomer: it doesn't mean no parameters, but rather lots of them, where the number of parameters grows with the number of instances [1]. In all of these cases the models have some general parameter(s) as well.

Some simple examples would be the bin width in the histogram and the bandwidth in the kernel density estimator. A more complex example would be Dirichlet Process-based Mixture Models [2], which have a "concentration" parameter. The terminology is used outside of density estimation too, e.g., Support Vector Machines (SVM) and k-Nearest Neighbors are considered nonparametric [3].

[1] For example, see https://stats.stackexchange.com/a/268646, or https://youtu.be/I7bgrZjoRhM?si=VOEENs773SXlEMxm&t=300

[2] https://www.gatsby.ucl.ac.uk/~ywteh/research/npbayes/dp.pdf

[3] https://stats.stackexchange.com/a/237704
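
To make the histogram and kernel density examples above concrete, here is a rough sketch assuming NumPy and a Gaussian kernel (the kernel choice and the helper names `histogram_density` and `gaussian_kde_density` are my own, purely for illustration). In both cases the estimate is built from all n data points, while the bin width or bandwidth is the single "general" parameter you still have to pick.

```python
import numpy as np

def histogram_density(data, bin_width):
    """Histogram density estimate; bin_width is the one tuning parameter."""
    data = np.asarray(data, dtype=float)
    lo = data.min()
    # Enough equal-width bins to cover the whole sample.
    n_bins = int(np.floor((data.max() - lo) / bin_width)) + 1
    edges = lo + bin_width * np.arange(n_bins + 1)
    counts, _ = np.histogram(data, bins=edges)
    # Normalize so the bar areas sum to 1.
    heights = counts / (counts.sum() * bin_width)
    return edges, heights

def gaussian_kde_density(data, bandwidth):
    """Gaussian KDE; every data point contributes one kernel bump."""
    data = np.asarray(data, dtype=float)
    norm = len(data) * bandwidth * np.sqrt(2.0 * np.pi)

    def pdf(x):
        x = np.atleast_1d(np.asarray(x, dtype=float))
        z = (x[:, None] - data[None, :]) / bandwidth
        return np.exp(-0.5 * z**2).sum(axis=1) / norm

    return pdf

# Illustrative data only: 500 draws from a standard normal.
rng = np.random.default_rng(0)
sample = rng.normal(size=500)
edges, heights = histogram_density(sample, bin_width=0.5)
kde = gaussian_kde_density(sample, bandwidth=0.3)
```

Note that the KDE has to keep the whole sample around to evaluate `pdf`, which is the sense in which the "parameter count" grows with the number of instances.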

vvanirudh · 2024-04-12T01:08:35 1712884115

If sampling from the density is the only goal, then you are absolutely right. You can directly estimate the empirical CDF, as you pointed out below. But histograms can still be useful to approximate the PDF itself? (Taking the derivative of the empirical CDF to estimate the PDF is wild, as you said.)
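
For the sampling-only case, a minimal sketch of what I mean, assuming NumPy (the helper names `ecdf` and `sample_from_ecdf` are hypothetical): build the empirical CDF from the sorted data and push uniform draws through its inverse. No bin width or bandwidth is needed, and it amounts to resampling the data with replacement.

```python
import numpy as np

def ecdf(data):
    """Return F(x) = fraction of data points <= x."""
    xs = np.sort(np.asarray(data, dtype=float))

    def F(x):
        return np.searchsorted(xs, x, side="right") / len(xs)

    return F

def sample_from_ecdf(data, size, rng):
    """Inverse-transform sampling through the empirical CDF."""
    xs = np.sort(np.asarray(data, dtype=float))
    u = rng.random(size)  # uniform draws on [0, 1)
    # Map each u to the order statistic whose CDF step covers it.
    idx = np.minimum((u * len(xs)).astype(int), len(xs) - 1)
    return xs[idx]

# Illustrative data only.
rng = np.random.default_rng(0)
observed = rng.exponential(size=200)
F = ecdf(observed)
draws = sample_from_ecdf(observed, size=1000, rng=rng)
```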

andrewla · 2024-04-12T16:39:07 1712939947

Yes, if you need an approximate pdf. I've found that when I'm working non-parametrically (or in robust statistics) I like to stay in cdf space or use quantile functions more than trying to use those nasty derivatives.
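
As a small illustration of working with quantiles instead of derivatives, here is a sketch assuming NumPy's `np.quantile` with its default linear interpolation (`quantile_summary` is a made-up helper name). A single wild outlier moves the mean a lot but barely moves the median.

```python
import numpy as np

def quantile_summary(data, probs=(0.25, 0.5, 0.75)):
    """Empirical quantiles at the requested probabilities."""
    q = np.quantile(np.asarray(data, dtype=float), probs)
    return dict(zip(probs, q))

# Illustrative data only: the same sample with and without one outlier.
clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
with_outlier = np.append(clean, 1000.0)

clean_summary = quantile_summary(clean)
outlier_summary = quantile_summary(with_outlier)
iqr_clean = clean_summary[0.75] - clean_summary[0.25]  # interquartile range
```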