A statistical estimation problem where you are not trying to estimate a function of a distribution of random observations, but the distribution itself. In a sense, all of statistics implicitly does density estimation, but it is often merely instrumental in the course of discovering some actual parameter of interest. (Although perhaps you work in Bayesian statistics and care a lot about the shape of the posterior density in particular.)

So, estimating distributions nonparametrically is not too weird a function approximation problem. We wish to find a density function \(f:\mathcal{X}\to\mathbb{R}\) such that \(\int_{\mathcal{X}}f(x)\,dx=1\) and \(f(x)\geq 0\) for all \(x \in \mathcal{X}\).
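These two constraints are easy to enforce numerically: clip a candidate function at zero and divide by its integral. A minimal sketch in Python (the function name is my own, not from any particular library):

```python
import numpy as np

def normalise_to_density(g, grid):
    """Turn a function g into a valid density on a uniform grid:
    clip it at zero, then divide by its trapezoidal-rule integral
    so that it integrates to 1."""
    vals = np.maximum(g(grid), 0.0)
    dx = grid[1] - grid[0]
    total = (vals.sum() - 0.5 * (vals[0] + vals[-1])) * dx
    return vals / total

grid = np.linspace(-5.0, 5.0, 2001)
# An unnormalised Gaussian bump becomes a proper density.
f = normalise_to_density(lambda x: np.exp(-x ** 2), grid)
```
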

We might set ourselves different loss functions than in the usual statistical regression problems; instead of, e.g., expected \(L_p\) prediction error, we might use a traditional function-approximation \(L_p\) loss, or a probability divergence measure.

The most common distribution estimate, which we use implicitly all the time, is to *not* work with densities as such but with distributions: we take the empirical distribution as the estimate; that is, we take the data as a model for itself. This has various non-useful features, such as being rough and rather hard to visualise as a density.
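For concreteness, the empirical distribution puts mass \(1/n\) on each observation, so its CDF at \(x\) is just the fraction of observations at or below \(x\). A minimal sketch (function names are illustrative):

```python
import numpy as np

def empirical_cdf(data):
    """Return the empirical CDF of a 1-D sample as a callable.

    The empirical distribution puts mass 1/n on each observation,
    so the CDF at x is the fraction of observations <= x.
    """
    data = np.sort(np.asarray(data, dtype=float))
    n = data.size

    def F(x):
        # searchsorted with side="right" counts observations <= x
        return np.searchsorted(data, x, side="right") / n

    return F

sample = [0.2, 0.5, 0.5, 0.9]
F = empirical_cdf(sample)
# F is a proper CDF: 0 below the smallest observation, 1 beyond the
# largest, jumping by 1/n (here 1/4) at each data point.
```

The roughness complained about above is visible here: \(F\) is a step function, so its "density" is a sum of Dirac spikes at the data points.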

Question:
When would I *actually* want to estimate, specifically, a density?

Visualisation, sure. Nonparametric regression without any better ideas. As a latent parameter in a deep probabilistic model.

What about non-parametric *conditional* density estimation? Are there any
general ways to do this?

## Divergence measures/contrasts

There are many choices for loss functions between densities here; any of the probability metrics will do. For reasons of tradition or convenience, when the object of interest is the density itself, certain choices dominate:

- \(L_2\) with respect to the density over Lebesgue measure on the state space, which gives the MISE (mean integrated squared error), and which works out nicely for convolution kernels.
- KL divergence. (May not do what you want if you care about performance near 0; see Hall (1987).)
- Hellinger distance.
- Wasserstein divergences.
- …

But having chosen the divergence you wish to minimise, you now have to choose with respect to which criterion you wish to minimise it. Minimax? In probability? In expectation? …? Every combination is a different publication. Hmf.

## Minimising Expected (or whatever) MISE

This works fine for kernel density estimators, where it turns out to just be a Wiener filter for which you have to choose a bandwidth. How do you do this for other estimators, though?
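For the Gaussian-kernel case, minimising the asymptotic MISE under a Gaussian reference density collapses the whole problem to one number, giving the classic Silverman rule-of-thumb bandwidth \(h^* = (4/(3n))^{1/5}\,\sigma \approx 1.06\,\sigma\, n^{-1/5}\). A sketch (function name is my own):

```python
import numpy as np

def silverman_bandwidth(x):
    """AMISE-minimising bandwidth for a Gaussian kernel, assuming the
    unknown density is itself roughly Gaussian (Silverman's rule of thumb):
        h* = (4 / (3 n))**(1/5) * sigma  ~  1.06 * sigma * n**(-1/5).
    """
    x = np.asarray(x, dtype=float)
    n = x.size
    sigma = x.std(ddof=1)
    return (4.0 / (3.0 * n)) ** 0.2 * sigma

rng = np.random.default_rng(0)
x = rng.normal(size=1000)
h = silverman_bandwidth(x)  # roughly 0.27 for n = 1000 standard normal draws
```
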

## Connection to point processes

There is a connection between spatial point process intensity estimation and density estimation. See Densities and intensities.

## Spline/wavelet estimations


## Mixture models

See mixture models.

## Gaussian processes

Gaussian processes can provide posterior densities over densities, somehow? (Tokdar 2007; Lenk 2003)

## Renormalizing flow models

A.k.a. measure transport, normalising flows, etc., where one uses reparameterisation.
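The change-of-variables identity behind these models is easy to check numerically. A minimal sketch (names my own): push a standard normal \(z\) through \(x=\exp(z)\); the density of \(x\) is the base density evaluated at the inverse map, times the Jacobian \(|dz/dx| = 1/x\), recovering the log-normal:

```python
import math

def base_logpdf(z):
    # standard normal log-density
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)

def flow_logpdf(x):
    """Density of x = exp(z), z ~ N(0, 1), by change of variables:
    log p_X(x) = log p_Z(log x) + log |dz/dx| = log p_Z(log x) - log x.
    """
    z = math.log(x)
    return base_logpdf(z) - math.log(x)

# At x = 1 this is exactly the standard normal log-density at 0,
# i.e. -0.5 * log(2 * pi), matching the closed-form log-normal.
```

Flow models stack many such invertible maps, with parameters learned so that the transported base density fits the data.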

## k-NN estimates

Filed here because too small to do elsewhere.

To use nearest-neighbour methods, the integer \(k\) must be selected. This is similar to bandwidth selection, although here \(k\) is discrete, not continuous. K. C. Li (Annals of Statistics, 1987) showed that, for the k-NN regression estimator under conditional homoskedasticity, it is asymptotically optimal to pick \(k\) by Mallows' criterion, generalized CV, or CV. Andrews (Journal of Econometrics, 1991) generalized this result to the case of heteroskedasticity and showed that CV is asymptotically optimal.
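For density estimation specifically, the basic 1-D k-NN estimator inverts the kernel logic: instead of fixing a bandwidth and counting points, fix a count \(k\) and measure the window needed to capture it, \(\hat f(x) = k / (n \cdot 2 r_k(x))\) with \(r_k(x)\) the distance to the \(k\)-th nearest observation. A sketch (names my own):

```python
import numpy as np

def knn_density(x, data, k):
    """1-D k-nearest-neighbour density estimate:
        f(x) = k / (n * 2 * r_k(x)),
    where r_k(x) is the distance from x to its k-th nearest observation,
    so 2 * r_k(x) is the width of the smallest window about x holding k points.
    """
    data = np.asarray(data, dtype=float)
    n = data.size
    r_k = np.sort(np.abs(data - x))[k - 1]
    return k / (n * 2.0 * r_k)

rng = np.random.default_rng(1)
data = rng.uniform(0.0, 1.0, size=2000)
# On Uniform(0, 1) the true density is 1 on the interior,
# so the estimate at 0.5 should be close to 1.
est = knn_density(0.5, data, k=100)
```
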

## Kernel density estimators

a.k.a. kernel smoothing.
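The vanilla estimator places a scaled kernel at each observation and averages: \(\hat f(x) = \frac{1}{nh}\sum_i K\!\big((x-x_i)/h\big)\). A minimal from-scratch sketch with a Gaussian kernel (function names are my own):

```python
import numpy as np

def gaussian_kde(x_eval, data, h):
    """Evaluate a Gaussian kernel density estimate at points x_eval:
        f(x) = (1 / (n h)) * sum_i K((x - x_i) / h),
    with K the standard normal pdf and h the bandwidth.
    """
    data = np.asarray(data, dtype=float)
    u = (np.asarray(x_eval, dtype=float)[:, None] - data[None, :]) / h
    kernels = np.exp(-0.5 * u ** 2) / np.sqrt(2.0 * np.pi)
    return kernels.mean(axis=1) / h

rng = np.random.default_rng(2)
data = rng.normal(size=500)
# The true N(0, 1) density at 0 is 1/sqrt(2*pi), about 0.399;
# the KDE should land nearby (slightly low, since smoothing flattens peaks).
f_hat = gaussian_kde(np.array([0.0]), data, h=0.3)
```
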

### Fancy ones

HT Gery Geenens for a lecture he gave on convolution kernel density estimation, in which he drew a parallel between additive noise in KDE and multiplicative noise in KDE for non-negative-valued variables.

## References

*Journal of Econometrics* 47 (2): 359–77.

*Conditional Specification of Statistical Models*. Springer Science & Business Media.

*The Annals of Statistics* 19 (3): 1347–69.

*Computational Statistics & Data Analysis* 36 (3): 279–98.

*arXiv:1308.3968 [Stat]*, August.

*Journal of the Royal Statistical Society. Series B (Methodological)* 51 (1): 81–92.

*arXiv:0808.1416 [Math, Stat]*, August.

*Nonparametric Statistics for Stochastic Processes: Estimation and Prediction*. 2nd ed. Lecture Notes in Statistics 110. New York: Springer.

*Journal of the Royal Statistical Society: Series B (Methodological)* 27 (2): 332–37.

*Proceedings of the 25th International Conference on Machine Learning*, 192–99. ICML '08. New York, NY, USA: ACM Press.

*Combinatorial Methods in Density Estimation*. Springer Series in Statistics. New York: Springer.

*NIPS*.

*The Annals of Statistics* 35 (6): 2504–35.

*Statistical Science* 11 (2): 89–121.

*Stochastic Processes and Their Applications* 39 (2): 345–58.

*Simulation Conference, 2008. WSC 2008. Winter*, 560–68.

*Journal of the American Statistical Association* 88 (422): 495–504.

*The Annals of Statistics* 15 (4): 1491–1519.

*Journal of the American Statistical Association* 99 (468): 1015–26.

*Unpublished Manuscript*.

*The Annals of Statistics* 18 (3): 999–1010.

*Institute of Mathematical Statistics Lecture Notes - Monograph Series*, 359–83. Beachwood, OH: Institute of Mathematical Statistics.

*Advances in Statistical Modeling and Inference*, 613–34.

*Computational Statistics & Data Analysis* 12 (3): 327–47.

*Journal of Computational and Graphical Statistics* 1 (4): 301–28.

*arXiv:1702.07028 [Cs]*.

*Journal of Computational and Graphical Statistics* 12 (3): 548–65.

*The Annals of Statistics* 15 (3): 958–75.

*Methodology and Computing in Applied Probability* 14 (3): 567–78.

*The Annals of Statistics* 38 (3): 1733–66.

*The Annals of Statistics* 44 (2): 771–812.

*Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete* 28 (3): 207–26.

*Journal of Statistical Planning and Inference* 141 (1): 115–39.

*Scandinavian Journal of Statistics* 37 (2): 321–37.

*Journal of Statistical Planning and Inference* 128 (1): 79–93.

*arXiv:1905.11255 [Cs, Math, Stat]*, May.

*Journal of Computational Neuroscience* 29 (1-2): 171–82.

*Journal of Machine Learning Research* 18 (57).

*International Conference on Artificial Intelligence and Statistics*, 781–88.

*Communications on Pure and Applied Mathematics* 66 (2): 145–64.

*Communications in Mathematical Sciences* 8 (1): 217–33.

*Journal of Computational and Graphical Statistics* 16 (3): 633–55.

*Neural Networks: The Official Journal of the International Neural Network Society* 10 (1): 99–109.
