# Density estimation

## Especially non- or semiparametrically

A statistical estimation problem where you are not trying to estimate a function of a distribution of random observations, but the distribution itself. In a sense, all of statistics implicitly does density estimation, but it is often merely instrumental in the course of discovering some actual parameter of interest. (Although maybe your interest is in Bayesian statistics and you care a lot about the shape of the posterior density in particular.)

So, estimating distributions nonparametrically is not too weird a function approximation problem. We wish to find a density function $$f:\mathcal{X}\to\mathbb{R}$$ such that $$\int_{\mathcal{X}}f(x)dx=1$$ and $$\forall x \in \mathcal{X},f(x)\geq 0$$.

We might set ourselves different loss functions than usual in statistical regression problems; instead of, e.g., expected $$L_p$$ prediction error, we might use a traditional function-approximation $$L_p$$ loss, or a probability divergence measure.

The most common density estimate, the one we use implicitly all the time, does not work with densities as such but with distributions: we take the empirical distribution as the distribution estimate; that is, we take the data as a model for itself. This has various non-useful features, such as being rough and rather hard to visualise as a density.
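For concreteness, the empirical distribution estimate is just the ECDF; a minimal sketch (all names are mine):

```python
import numpy as np

def ecdf(data):
    """Return the empirical CDF of a sample as a callable step function."""
    xs = np.sort(np.asarray(data))
    n = xs.size
    def F(t):
        # proportion of observations <= t
        return np.searchsorted(xs, t, side="right") / n
    return F

rng = np.random.default_rng(0)
sample = rng.normal(size=1000)
F = ecdf(sample)
# F is a perfectly good distribution estimate, but its "density" is a sum of
# point masses at the observations: rough, and useless for visualisation.
```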

Question: When would I actually want to estimate, specifically, a density?

Visualisation, sure. Nonparametric regression without any better ideas. As a latent parameter in a deep probabilistic model.

What about non-parametric conditional density estimation? Are there any general ways to do this?
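One generic recipe is kernel conditional density estimation, estimating $$f(y\mid x)=f(x,y)/f(x)$$ with product kernels; this is the estimator whose bandwidth selection Bashtannyk and Hyndman (2001) study. A sketch, with all names mine and the bandwidths taken as given:

```python
import numpy as np

def conditional_kde(x, y, hx, hy):
    """Kernel conditional density estimate f(y|x) = f_hat(x,y) / f_hat(x),
    using Gaussian product kernels. Bandwidths hx, hy are assumed given;
    choosing them is the hard part (cf. Bashtannyk and Hyndman 2001)."""
    x, y = np.asarray(x), np.asarray(y)
    def f(y0, x0):
        wx = np.exp(-0.5 * ((x - x0) / hx) ** 2)        # kernel weights in x
        ky = np.exp(-0.5 * ((y - y0) / hy) ** 2) / (hy * np.sqrt(2 * np.pi))
        # the x-kernel normalisation cancels in the ratio f(x,y)/f(x)
        return np.sum(wx * ky) / np.sum(wx)
    return f

rng = np.random.default_rng(1)
x = rng.uniform(-1, 1, 2000)
y = x + rng.normal(scale=0.3, size=2000)                # y | x ~ N(x, 0.3^2)
f = conditional_kde(x, y, hx=0.1, hy=0.1)
# the estimated density of y given x = 0.5 should peak near y = 0.5
```

Note that for each fixed $$x_0$$ the estimate integrates to one in $$y$$ by construction, since the $$y$$-kernel is itself a density.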

## Divergence measures/contrasts

There are many choices for loss functions between densities here; any of the probability metrics will do. For reasons of tradition or convenience, when the object of interest is the density itself, certain choices dominate:

- $$L_2$$ distance between densities with respect to Lebesgue measure on the state space; its expected squared value is the MISE, which works out nicely for convolution kernels.
- KL divergence. (May not do what you want if you care about performance near 0; see Hall (1987).)
- Hellinger distance.
- Wasserstein distances.
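As a toy illustration of how these contrasts differ, here is a sketch (my own example, assuming two discretised Gaussian densities on a shared grid) computing each of the above numerically:

```python
import numpy as np

# Two densities discretised on a common grid; each divergence below is a
# different notion of "distance" and can rank estimates differently.
grid = np.linspace(-5, 5, 1001)
dx = grid[1] - grid[0]

def gaussian(mu, sigma):
    return np.exp(-0.5 * ((grid - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))

p, q = gaussian(0.0, 1.0), gaussian(0.5, 1.2)

ise = np.sum((p - q) ** 2) * dx                 # integrated squared (L2) error
kl = np.sum(p * np.log(p / q)) * dx             # KL(p || q); sensitive to q -> 0
hellinger = np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2) * dx)
# 1-Wasserstein distance in 1D, via the CDFs
w1 = np.sum(np.abs(np.cumsum(p) - np.cumsum(q))) * dx * dx
```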

But having chosen the divergence you wish to minimise, you still have to choose the criterion with respect to which you minimise it. Minimax? In probability? In expectation? …? Every combination is a different publication. Hmf.

## Minimising Expected (or whatever) MISE

This works fine for kernel density estimators, where it turns out to be just a Wiener filter for which you have to choose a bandwidth. How do you do this for other estimators, though?
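For a Gaussian kernel, minimising the asymptotic MISE gives the familiar $$n^{-1/5}$$ bandwidth scaling; a minimal sketch of the rule-of-thumb version (Silverman's, which assumes the target density is roughly Gaussian; all names here are mine):

```python
import numpy as np

def silverman_bandwidth(x):
    """Rule-of-thumb bandwidth approximately minimising asymptotic MISE
    for a Gaussian kernel, assuming a roughly Gaussian target density."""
    x = np.asarray(x)
    n = x.size
    spread = min(np.std(x, ddof=1),
                 (np.percentile(x, 75) - np.percentile(x, 25)) / 1.34)
    return 0.9 * spread * n ** (-1 / 5)

def kde(x, h):
    """Gaussian kernel density estimator with bandwidth h."""
    x = np.asarray(x)
    def f(t):
        z = (t - x[:, None]) / h
        return np.mean(np.exp(-0.5 * z ** 2), axis=0) / (h * np.sqrt(2 * np.pi))
    return f

rng = np.random.default_rng(2)
sample = rng.normal(size=500)
h = silverman_bandwidth(sample)
f = kde(sample, h)
```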

## Connection to point processes

There is a connection between spatial point process intensity estimation and density estimation. See Densities and intensities.

🏗

## Mixture models

See mixture models.

## Gaussian processes

Gaussian processes can do this, apparently (Tokdar 2007; Lenk 2003).

## Normalizing flow models

A.k.a. measure transport etc., where one uses reparameterisation. 🏗
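The mechanism underneath all of these is the change-of-variables formula: push a simple base density through an invertible map and track the Jacobian. A one-dimensional sketch (my own toy example, with $$g=\exp$$ so the pushforward of a standard normal is lognormal):

```python
import numpy as np

# Change of variables: if y = g(x) with g invertible and x ~ f_X, then
# f_Y(y) = f_X(g^{-1}(y)) * |d g^{-1}(y) / dy|.
def base_density(x):
    """Standard normal base density."""
    return np.exp(-0.5 * x ** 2) / np.sqrt(2 * np.pi)

g = np.exp                       # transport map: x -> exp(x)
g_inv = np.log
def g_inv_deriv(y):
    return 1.0 / y

def pushforward_density(y):
    """Density of y = exp(x), x ~ N(0,1): the lognormal density."""
    return base_density(g_inv(y)) * np.abs(g_inv_deriv(y))

# sanity check: total mass over the positive half-line should be ~1
ys = np.linspace(1e-4, 30, 20000)
mass = np.sum(pushforward_density(ys)) * (ys[1] - ys[0])
```

Flow models compose many such maps and learn their parameters by maximum likelihood; the one-dimensional formula above is the whole trick, applied repeatedly.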

## k-NN estimates

Filed here because the topic is too small to file elsewhere.

To use nearest-neighbour methods, the integer $$k$$ must be selected. This is similar to bandwidth selection, although here $$k$$ is discrete, not continuous. Li (1987) showed that for the k-NN regression estimator under conditional homoskedasticity, it is asymptotically optimal to pick $$k$$ by Mallows' criterion, generalized CV, or CV. Andrews (1991) generalized this result to the case of heteroskedasticity, and showed that CV is asymptotically optimal.
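A minimal sketch of that selection procedure (my own toy implementation: one covariate, leave-one-out CV over a grid of candidate $$k$$):

```python
import numpy as np

def knn_predict(x_train, y_train, x0, k):
    """k-NN regression: average the y-values of the k nearest x's."""
    idx = np.argsort(np.abs(x_train - x0))[:k]
    return np.mean(y_train[idx])

def loo_cv_k(x, y, ks):
    """Pick k by leave-one-out cross-validation, the selector shown to be
    asymptotically optimal in Li (1987) / Andrews (1991)."""
    n = x.size
    scores = []
    for k in ks:
        errs = [
            (y[i] - knn_predict(np.delete(x, i), np.delete(y, i), x[i], k)) ** 2
            for i in range(n)
        ]
        scores.append(np.mean(errs))
    return ks[int(np.argmin(scores))]

rng = np.random.default_rng(3)
x = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=200)
best_k = loo_cv_k(x, y, ks=[1, 2, 4, 8, 16, 32])
```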

## Kernel density estimators

### Fancy ones

HT Gery Geenens for a lecture he gave on convolution kernel density estimation, in which he drew a parallel between additive noise in KDE for real-valued variables and multiplicative noise for non-negative-valued variables.
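One simple instance of that multiplicative viewpoint (my gloss, not necessarily Geenens's estimator) is to smooth on the log scale and transform back, so that no mass leaks onto the negative half-line:

```python
import numpy as np

def log_kde(x, h):
    """KDE for a positive-valued variable: smooth log(x) with a Gaussian
    kernel, then map the density back via the change-of-variables
    Jacobian 1/t. A sketch of the multiplicative-smoothing idea only."""
    lx = np.log(np.asarray(x))
    def f(t):
        t = np.asarray(t, dtype=float)
        z = (np.log(t) - lx[:, None]) / h
        return np.mean(np.exp(-0.5 * z ** 2), axis=0) / (h * np.sqrt(2 * np.pi) * t)
    return f

rng = np.random.default_rng(4)
sample = rng.lognormal(mean=0.0, sigma=0.5, size=500)
f = log_kde(sample, h=0.2)
# unlike a plain Gaussian KDE, the estimate is supported on (0, inf)
```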

Andrews, Donald W. K. 1991. “Asymptotic Optimality of Generalized CL, Cross-Validation, and Generalized Cross-Validation in Regression with Heteroskedastic Errors.” Journal of Econometrics 47 (2): 359–77. https://doi.org/10.1016/0304-4076(91)90107-O.

Arnold, Barry C., Enrique Castillo, and Jose M. Sarabia. 1999. Conditional Specification of Statistical Models. Springer Science & Business Media. https://books.google.com.au/books?hl=en&lr=&id=lKeKu_HtMdQC&oi=fnd&pg=PA1&dq=arnold+castillo+sarabia+conditional+specification+of+statistical+models&ots=gxWoVEdsde&sig=p0BJlEeB5yQ052m5YhfQ_A6Kmoo.

Barron, Andrew R., and Chyong-Hwa Sheu. 1991. “Approximation of Density Functions by Sequences of Exponential Families.” The Annals of Statistics 19 (3): 1347–69. https://doi.org/10.1214/aos/1176348252.

Bashtannyk, David M., and Rob J. Hyndman. 2001. “Bandwidth Selection for Kernel Conditional Density Estimation.” Computational Statistics & Data Analysis 36 (3): 279–98. https://doi.org/10.1016/S0167-9473(00)00046-3.

Battey, Heather, and Han Liu. 2013. “Smooth Projected Density Estimation,” August. http://arxiv.org/abs/1308.3968.

Berman, Mark, and Peter Diggle. 1989. “Estimating Weighted Integrals of the Second-Order Intensity of a Spatial Point Process.” Journal of the Royal Statistical Society. Series B (Methodological) 51 (1): 81–92. https://publications.csiro.au/rpr/pub?list=BRO&pid=procite:d5b7ecd7-435c-4dab-9063-f1cf2fbdf4cb.

Birgé, Lucien. 2008. “Model Selection for Density Estimation with L2-Loss,” August. http://arxiv.org/abs/0808.1416.

Bosq, Denis. 1998. Nonparametric Statistics for Stochastic Processes: Estimation and Prediction. 2nd ed. Lecture Notes in Statistics 110. New York: Springer.

Cox, D. R. 1965. “On the Estimation of the Intensity Function of a Stationary Point Process.” Journal of the Royal Statistical Society: Series B (Methodological) 27 (2): 332–37. https://doi.org/10.1111/j.2517-6161.1965.tb01500.x.

Cunningham, John P., Krishna V. Shenoy, and Maneesh Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning, 192–99. ICML ’08. New York, NY, USA: ACM Press. https://doi.org/10.1145/1390156.1390181.

Devroye, Luc, and Gábor Lugosi. 2001. Combinatorial Methods in Density Estimation. Springer Series in Statistics. New York: Springer.

Dinh, Vu, Lam Si Tung Ho, Duy Nguyen, and Binh T. Nguyen. 2016. “Fast Learning Rates with Heavy-Tailed Losses.” In NIPS. http://arxiv.org/abs/1609.09481.

Efromovich, Sam. 2007. “Conditional Density Estimation in a Regression Setting.” The Annals of Statistics 35 (6): 2504–35. https://doi.org/10.1214/009053607000000253.

Eilers, Paul H. C., and Brian D. Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” Statistical Science 11 (2): 89–121. https://doi.org/10.1214/ss/1038425655.

Ellis, Steven P. 1991. “Density Estimation for Point Processes.” Stochastic Processes and Their Applications 39 (2): 345–58. https://doi.org/10.1016/0304-4149(91)90087-S.

Giesecke, K., H. Kakavand, and M. Mousavi. 2008. “Simulating Point Processes by Intensity Projection.” In Simulation Conference, 2008. WSC 2008. Winter, 560–68. https://doi.org/10.1109/WSC.2008.4736114.

Gu, Chong. 1993. “Smoothing Spline Density Estimation: A Dimensionless Automatic Algorithm.” Journal of the American Statistical Association 88 (422): 495–504. https://doi.org/10.1080/01621459.1993.10476300.

Hall, Peter. 1987. “On Kullback-Leibler Loss and Density Estimation.” The Annals of Statistics 15 (4): 1491–1519. https://doi.org/10.1214/aos/1176350606.

Hall, Peter, Jeff Racine, and Qi Li. 2004. “Cross-Validation and the Estimation of Conditional Probability Densities.” Journal of the American Statistical Association 99 (468): 1015–26. https://doi.org/10.1198/016214504000000548.

Hansen, Bruce E. 2004. “Nonparametric Conditional Density Estimation.” Unpublished Manuscript. http://www.ssc.wisc.edu/~bhansen/papers/ncde.pdf.

Hasminskii, Rafael, and Ildar Ibragimov. 1990. “On Density Estimation in the View of Kolmogorov’s Ideas in Approximation Theory.” The Annals of Statistics 18 (3): 999–1010. https://doi.org/10.1214/aos/1176347736.

Ibragimov, I. 2001. “Estimation of Analytic Functions.” In Institute of Mathematical Statistics Lecture Notes - Monograph Series, 359–83. Beachwood, OH: Institute of Mathematical Statistics. http://projecteuclid.org/euclid.lnms/1215090078.

Koenker, Roger, and Ivan Mizera. 2006. “Density Estimation by Total Variation Regularization.” Advances in Statistical Modeling and Inference, 613–34. http://ysidro.econ.uiuc.edu/~roger/research/densiles/Doksum.pdf.

Kooperberg, Charles, and Charles J. Stone. 1992. “Logspline Density Estimation for Censored Data.” Journal of Computational and Graphical Statistics 1 (4): 301–28. https://doi.org/10.2307/1390786.

———. 1991. “A Study of Logspline Density Estimation.” Computational Statistics & Data Analysis 12 (3): 327–47. https://doi.org/10.1016/0167-9473(91)90115-I.

Lee, Holden, Rong Ge, Tengyu Ma, Andrej Risteski, and Sanjeev Arora. 2017. “On the Ability of Neural Nets to Express Distributions.” http://arxiv.org/abs/1702.07028.

Lenk, Peter J. 2003. “Bayesian Semiparametric Density Estimation and Model Verification Using a Logistic–Gaussian Process.” Journal of Computational and Graphical Statistics 12 (3): 548–65. https://doi.org/10.1198/1061860032021.

Li, Ker-Chau. 1987. “Asymptotic Optimality for $C_p, C_L$, Cross-Validation and Generalized Cross-Validation: Discrete Index Set.” The Annals of Statistics 15 (3): 958–75. https://doi.org/10.1214/aos/1176350486.

Lieshout, Marie-Colette N. M. van. 2011. “On Estimation of the Intensity Function of a Point Process.” Methodology and Computing in Applied Probability 14 (3): 567–78. https://doi.org/10.1007/s11009-011-9244-9.

Norets, Andriy. 2010. “Approximation of Conditional Densities by Smooth Mixtures of Regressions.” The Annals of Statistics 38 (3): 1733–66. https://doi.org/10.1214/09-AOS765.

Panaretos, Victor M., and Yoav Zemel. 2016. “Separation of Amplitude and Phase Variation in Point Processes.” The Annals of Statistics 44 (2): 771–812. https://doi.org/10.1214/15-AOS1387.

Papangelou, F. 1974. “The Conditional Intensity of General Point Processes and an Application to Line Processes.” Zeitschrift Für Wahrscheinlichkeitstheorie Und Verwandte Gebiete 28 (3): 207–26. https://doi.org/10.1007/BF00533242.

Reynaud-Bouret, Patricia, Vincent Rivoirard, and Christine Tuleau-Malot. 2011. “Adaptive Density Estimation: A Curse of Support?” Journal of Statistical Planning and Inference 141 (1): 115–39. https://doi.org/10.1016/j.jspi.2010.05.017.

Sardy, Sylvain, and Paul Tseng. 2010. “Density Estimation by Total Variation Penalized Likelihood Driven by the Sparsity ℓ1 Information Criterion.” Scandinavian Journal of Statistics 37 (2): 321–37. https://doi.org/10.1111/j.1467-9469.2009.00672.x.

Schoenberg, Frederic Paik. 2005. “Consistent Parametric Estimation of the Intensity of a Spatial–Temporal Point Process.” Journal of Statistical Planning and Inference 128 (1): 79–93. https://doi.org/10.1016/j.jspi.2003.09.027.

Schuster, Ingmar, Mattes Mollenhauer, Stefan Klus, and Krikamol Muandet. 2019. “Kernel Conditional Density Operators,” May. http://arxiv.org/abs/1905.11255.

Shimazaki, Hideaki, and Shigeru Shinomoto. 2010. “Kernel Bandwidth Optimization in Spike Rate Estimation.” Journal of Computational Neuroscience 29 (1-2): 171–82. https://doi.org/10.1007/s10827-009-0180-4.

Sriperumbudur, Bharath, Kenji Fukumizu, Arthur Gretton, Aapo Hyvärinen, and Revant Kumar. 2017. “Density Estimation in Infinite Dimensional Exponential Families.” Journal of Machine Learning Research 18 (57). http://arxiv.org/abs/1312.3516.

Sugiyama, Masashi, Ichiro Takeuchi, Taiji Suzuki, Takafumi Kanamori, Hirotaka Hachiya, and Daisuke Okanohara. 2010. “Conditional Density Estimation via Least-Squares Density Ratio Estimation.” In International Conference on Artificial Intelligence and Statistics, 781–88. http://machinelearning.wustl.edu/mlpapers/paper_files/AISTATS2010_SugiyamaTSKHO10.pdf.

Tabak, E. G., and Cristina V. Turner. 2013. “A Family of Nonparametric Density Estimation Algorithms.” Communications on Pure and Applied Mathematics 66 (2): 145–64. https://doi.org/10.1002/cpa.21423.

Tabak, Esteban G., and Eric Vanden-Eijnden. 2010. “Density Estimation by Dual Ascent of the Log-Likelihood.” Communications in Mathematical Sciences 8 (1): 217–33. https://projecteuclid.org/euclid.cms/1266935020.

Tokdar, Surya T. 2007. “Towards a Faster Implementation of Density Estimation with Logistic Gaussian Process Priors.” Journal of Computational and Graphical Statistics 16 (3): 633–55. https://doi.org/10.1198/106186007X210206.

Zeevi, Assaf J., and Ronny Meir. 1997. “Density Estimation Through Convex Combinations of Densities: Approximation and Estimation Bounds.” Neural Networks: The Official Journal of the International Neural Network Society 10 (1): 99–109. https://doi.org/10.1016/S0893-6080(96)00037-8.