Why estimate the PDF with a histogram and then convert it to a CDF, when you can estimate the CDF directly? Doing so also avoids having to choose a bin width, which can have a substantial impact on the result.
Agreed -- very odd to use a parameter (bin width) in a nonparametric estimation. Just use the raw data. In numerical analysis, broadly speaking, integrals are stable while derivatives are wild; an empirical cdf is a nice, stable integral of the messy pdf (it is a step function rather than a smooth curve, but a well-behaved one).
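For what it's worth, here is a minimal sketch of what "just use the raw data" could look like, using only the Python standard library. The helper name `make_ecdf` and the simulated normal data are my own assumptions for illustration, not something from the article.

```python
import bisect
import random


def make_ecdf(samples):
    """Return F where F(x) is the fraction of samples <= x.

    Hypothetical helper: no bin width or other tuning parameter is needed.
    """
    xs = sorted(samples)
    n = len(xs)

    def ecdf(x):
        # bisect_right counts how many sorted samples are <= x.
        return bisect.bisect_right(xs, x) / n

    return ecdf


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]  # assumed toy data
F = make_ecdf(data)
print(F(-1.0), F(0.0), F(1.0))
```

The only cost is keeping the sorted samples around; each evaluation is a binary search.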
"nonparametric" is somewhat of a (confusing!) misnomer in that it doesn't mean no parameters, but rather that the number of parameters is not fixed in advance and typically grows with the number of instances [1]. In all of these cases the models have some general parameter(s) as well.
Simple examples are the bin width of a histogram and the bandwidth of a kernel density estimator. A more complex example is Dirichlet Process-based Mixture Models [2], which have a "concentration" parameter. The terminology is used outside of density estimation too, e.g., Support Vector Machines (SVMs) and k-Nearest Neighbors are considered nonparametric [3].
If sampling from the density is the only goal, then you are absolutely right. You can directly estimate the empirical CDF, as you pointed out. But histograms can still be useful to approximate the PDF itself? (Taking the derivative of the empirical CDF to estimate the PDF is wild, as you said.)
Yes, if you need an approximate pdf. I've found that when I'm working non-parametrically (or in robust statistics) I like to stay in cdf space or use quantile functions more than trying to use those nasty derivatives.
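To make the "stay in cdf space" idea concrete, here is a rough sketch of an empirical quantile function and inverse-transform sampling from it, standard library only. The name `make_quantile` and the toy data are assumptions of mine. Note that sampling this way is equivalent to resampling the observed points with replacement.

```python
import math
import random


def make_quantile(samples):
    """Return Q where Q(p) is the smallest sample x with ECDF(x) >= p.

    Hypothetical helper; p must lie in (0, 1].
    """
    xs = sorted(samples)
    n = len(xs)

    def quantile(p):
        if not 0.0 < p <= 1.0:
            raise ValueError("p must be in (0, 1]")
        rank = math.ceil(p * n)  # 1-based rank of the order statistic
        return xs[rank - 1]

    return quantile


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(1000)]  # assumed toy data
Q = make_quantile(data)

# Inverse-transform sampling: push uniform draws through the quantile function.
# 1 - random.random() lies in (0, 1], which matches the domain of Q.
draws = [Q(1.0 - random.random()) for _ in range(5)]
print(Q(0.5), draws)
```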
If the data is continuous, use kernel density estimation (KDE) instead of histograms to visualize the probability density, since KDE will give a smoother fit. A similar idea is to fit a mixture of normals -- there are numerous R packages for this, and sklearn.mixture.GaussianMixture in scikit-learn.
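A bare-bones sketch of a 1-D Gaussian KDE, to show what the bandwidth does. This is a hand-rolled illustration using only the standard library, not the scikit-learn or R implementations; the name `kde_gaussian` is hypothetical, and the default bandwidth uses the normal-reference rule of thumb (1.06 * sd * n^(-1/5)), which is just one common choice that I am assuming here.

```python
import math
import random
import statistics


def kde_gaussian(samples, bandwidth=None):
    """Return a density estimate built from one Gaussian bump per sample."""
    n = len(samples)
    if bandwidth is None:
        # Assumed default: normal-reference rule of thumb.
        bandwidth = 1.06 * statistics.stdev(samples) * n ** (-1 / 5)
    norm = 1.0 / (n * bandwidth * math.sqrt(2.0 * math.pi))

    def pdf(x):
        # Average of Gaussian kernels centered at each sample.
        return norm * sum(
            math.exp(-0.5 * ((x - s) / bandwidth) ** 2) for s in samples
        )

    return pdf


random.seed(0)
data = [random.gauss(0.0, 1.0) for _ in range(500)]  # assumed toy data
pdf = kde_gaussian(data)
print(pdf(0.0), pdf(2.0))
```

Every evaluation loops over all the samples, which is why KDE needs to keep the data around.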
Yep! The next post will be on kernel density estimation -- I wanted to start from histograms as they are still a useful tool in 1-D and 2-D density estimation, and you don't have to store the data either (unlike KDE).
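As a rough illustration of the "don't have to store the data" point, here is a sketch of a fixed-width histogram that keeps only bin counts and can be updated one point at a time. The class name `StreamingHistogram`, the range [-5, 5), and the 40 bins are all assumptions for the example; the bin count is exactly the kind of choice discussed upthread.

```python
import random


class StreamingHistogram:
    """Fixed-width histogram storing only counts, never the raw samples."""

    def __init__(self, lo, hi, n_bins):
        self.lo, self.hi, self.n_bins = lo, hi, n_bins
        self.width = (hi - lo) / n_bins
        self.counts = [0] * n_bins
        self.total = 0

    def _index(self, x):
        return min(int((x - self.lo) / self.width), self.n_bins - 1)

    def add(self, x):
        # Points outside [lo, hi) are simply ignored in this sketch.
        if self.lo <= x < self.hi:
            self.counts[self._index(x)] += 1
            self.total += 1

    def density(self, x):
        # Piecewise-constant density: count / (n * bin width).
        if self.total == 0 or not (self.lo <= x < self.hi):
            return 0.0
        return self.counts[self._index(x)] / (self.total * self.width)


random.seed(0)
hist = StreamingHistogram(-5.0, 5.0, 40)
for _ in range(10_000):
    hist.add(random.gauss(0.0, 1.0))  # assumed toy data stream
print(hist.density(0.0))
```

Memory use is fixed by the number of bins, regardless of how many points are seen.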
'I will describe a very popular nonparametric method, Kernel Density Estimation, that also follows strategy 1 and is much more scalable to higher dimensions than histograms.'