Estimating densities by considering the observations drawn from that density as a point process. In one dimension this gives us the particularly lovely trick of survival analysis, but the relations are much more general.
Consider the problem of estimating the common density $f$ of indexed i.i.d. random variables $X_1, X_2, \dots, X_N \sim F$ from realisations $\{x_1, \dots, x_N\}$ of those variables, where $F$ is a (cumulative) distribution. We assume the law is absolutely continuous with respect to the Lebesgue measure, i.e. $F \ll \mathrm{Leb}$. This implies that $F$ has no atoms and that the density $f = \mathrm{d}F/\mathrm{d}\mathrm{Leb}$ exists as a standard function (i.e. we do not need to consider generalised functions such as distributions to handle atoms in $F$ etc.)

Here we parameterise the density with some finite dimensional parameter vector $\theta$, i.e. $f(x) \equiv f(x;\theta)$, whose value completely characterises the density; the problem of estimating the density is then the same as the one of estimating $\theta$.

In the method of maximum likelihood estimation, we seek to maximize the likelihood of the observed data. That is, we choose a parameter estimate $\hat\theta$ to satisfy

$$\hat\theta = \operatorname*{argmax}_\theta \sum_{i=1}^N \log f(x_i;\theta).$$
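To make this concrete, here is a minimal sketch of maximum likelihood estimation by direct numerical optimisation, assuming a synthetic Gaussian sample and a two-parameter location-scale family; the data, the parametric family and the optimiser are all illustrative assumptions, not part of the argument above.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=500)  # synthetic observations

def neg_log_lik(params):
    # Negative log-likelihood of a Gaussian family, parameterised by
    # (mu, log sigma) so the optimisation is unconstrained.
    mu, log_sigma = params
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_lik, x0=[0.0, 0.0])
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])
print(mu_hat, sigma_hat)  # should land near the true values 1.0 and 2.0
```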
Basis function method for density
Let’s consider the case where we try to estimate this function by constructing it from some given basis of $K$ functions $\{\phi_k\}_{k=1}^K$, so that

$$f(x;\theta) = \sum_{k=1}^K \theta_k \phi_k(x)$$

with $\theta_k \geq 0$. We keep this simple by requiring $\phi_k \geq 0$ and $\int \phi_k(x)\,\mathrm{d}x = 1$, so that they are all valid densities. Then the requirement that $\int f(x;\theta)\,\mathrm{d}x = 1$ will imply that $\sum_k \theta_k = 1$, i.e. we are taking a convex combination of these basis densities.

Then the maximum likelihood estimator can be written

$$\hat\theta = \operatorname*{argmax}_\theta \sum_{i=1}^N \log \sum_{k=1}^K \theta_k \phi_k(x_i).$$

A moment’s thought reveals that this objective has no finite optimum as written, since it is strictly increasing in each $\theta_k$. However, we are missing a constraint, which is that for $f(\cdot;\theta)$ to be a well-defined probability density it must integrate to unity, i.e. $\int f(x;\theta)\,\mathrm{d}x = 1$, and therefore $\sum_k \theta_k = 1$.

By messing around with Lagrange multipliers to enforce that constraint we eventually find that at the optimum

$$\sum_{i=1}^N \frac{\phi_k(x_i)}{\sum_j \hat\theta_j \phi_j(x_i)} = N \quad \text{for every } k \text{ with } \hat\theta_k > 0,$$

the multiplier itself turning out to equal $N$.
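As a sanity check, here is a minimal sketch of solving that constrained problem numerically, assuming a hypothetical basis of fixed Gaussian bumps on a grid; the multiplicative (EM-style) weight update below has the stationarity condition above as its fixed point (cf. Saul and Lee 2001), but the specific basis and the synthetic data are assumptions for illustration only.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=400)  # observations

# Hypothetical basis: K fixed Gaussian "bumps" on a grid, each a valid density.
centres = np.linspace(-3, 3, 7)
Phi = np.stack([norm.pdf(x, loc=c, scale=1.0) for c in centres], axis=1)  # (N, K)

N, K = Phi.shape
theta = np.full(K, 1.0 / K)  # start from the uniform mixture

# Multiplicative / EM-style updates for the weights: the fixed point of this
# iteration is exactly the Lagrange stationarity condition above.
for _ in range(200):
    resp = Phi * theta                       # unnormalised responsibilities
    resp /= resp.sum(axis=1, keepdims=True)  # normalise over components
    theta = resp.mean(axis=0)                # average responsibility per component

print(theta, theta.sum())  # weights stay nonnegative and sum to 1 by construction
```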
Intensities
Consider the problem of estimating the intensity of a simple, non-interacting inhomogeneous point process on some compact window $W \subset \mathbb{R}^d$ from a realisation $\{x_1, \dots, x_N\} \subset W$, which we summarise by the counting function $N(A)$ that counts the number of points falling in a set $A \subseteq W$.

The intensity is (in the simple non-interacting case; see Daley and Vere-Jones (2003) for other cases) a function $\lambda : W \to [0,\infty)$ such that, for any box $A \subseteq W$,

$$\mathbb{E}[N(A)] = \int_A \lambda(x)\,\mathrm{d}x,$$

and, for any disjoint boxes $A$ and $B$, the counts $N(A)$ and $N(B)$ are independent.

After some argumentation about intensities we can find a log-likelihood for the observed point pattern:

$$\ell(\lambda) = \sum_{i=1}^N \log \lambda(x_i) - \int_W \lambda(x)\,\mathrm{d}x.$$

Say that we wish to find the inhomogeneous intensity function by the method of maximum likelihood. We allow the intensity function to be described by a parameter vector $\omega$, which we write $\lambda(x;\omega)$, and we once again construct an estimate:

$$\hat\omega = \operatorname*{argmax}_\omega \left[ \sum_{i=1}^N \log \lambda(x_i;\omega) - \int_W \lambda(x;\omega)\,\mathrm{d}x \right].$$
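The following sketch fits such a parametric intensity by numerically maximising that log-likelihood, handling the $\int_W \lambda$ term by quadrature; the log-linear intensity, the thinning simulation and the unit window are assumptions made up for illustration.

```python
import numpy as np
from scipy.integrate import quad
from scipy.optimize import minimize

rng = np.random.default_rng(2)
W = (0.0, 1.0)  # observation window

# Hypothetical parametric intensity on W: log-linear in x.
def lam(x, w):
    return np.exp(w[0] + w[1] * x)

# Simulate an inhomogeneous Poisson pattern by thinning a homogeneous one.
w_true = (np.log(50.0), 1.5)
lam_max = lam(W[1], w_true)                      # intensity is increasing on W
n_prop = rng.poisson(lam_max * (W[1] - W[0]))
props = rng.uniform(*W, size=n_prop)
pts = props[rng.uniform(0, lam_max, size=n_prop) < lam(props, w_true)]

def neg_log_lik(w):
    integral, _ = quad(lambda x: lam(x, w), *W)  # the ∫_W λ term, by quadrature
    return -(np.sum(np.log(lam(pts, w))) - integral)

w_hat = minimize(neg_log_lik, x0=[0.0, 0.0]).x
print(len(pts), w_hat)  # w_hat should land near w_true
```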
Basis function method for intensity
Now consider the case where we assume that the intensity can be written in a basis as above, so that

$$\lambda(x;\omega) = \sum_{k=1}^K \omega_k \phi_k(x)$$

with $\omega_k \geq 0$. Then our estimate may be written

$$\hat\omega = \operatorname*{argmax}_\omega \left[ \sum_{i=1}^N \log \sum_{k=1}^K \omega_k \phi_k(x_i) - \sum_{k=1}^K \omega_k \int_W \phi_k(x)\,\mathrm{d}x \right].$$

We have a similar log-likelihood to the density estimation case.

Under the constraint that $\int_W \phi_k(x)\,\mathrm{d}x = 1$ for each $k$ we have $\int_W \lambda(x;\omega)\,\mathrm{d}x = \sum_k \omega_k$, and therefore

$$\hat\omega = \operatorname*{argmax}_\omega \left[ \sum_{i=1}^N \log \sum_{k=1}^K \omega_k \phi_k(x_i) - \sum_{k=1}^K \omega_k \right].$$

Note that if we consider the points as i.i.d. draws from a density in the same basis, we find the maximised log-likelihood is the same as the one obtained by considering the points as an inhomogeneous spatial point pattern, up to an offset of $N \log N - N$, and the maximisers coincide up to scale, i.e.

$$\hat\omega_k = N \hat\theta_k.$$
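Here is a minimal numerical check of that correspondence, reusing the assumed Gaussian-bump basis from the earlier sketch: the density weights are fitted with the sum-to-one (EM-style) update, the intensity weights with the same update left unnormalised, and the two agree up to the factor $N$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(3)
x = rng.normal(0.0, 1.0, size=300)
N = x.size

# Hypothetical basis of Gaussian bump densities, each integrating to one.
centres = np.linspace(-3, 3, 7)
Phi = np.stack([norm.pdf(x, loc=c, scale=1.0) for c in centres], axis=1)  # (N, K)
K = Phi.shape[1]

# Density weights: EM-style updates under the constraint sum(theta) == 1.
theta = np.full(K, 1.0 / K)
for _ in range(500):
    resp = Phi * theta
    resp /= resp.sum(axis=1, keepdims=True)
    theta = resp.mean(axis=0)

# Intensity weights: the same update without normalisation of the total mass.
omega = np.full(K, float(N) / K)
for _ in range(500):
    resp = Phi * omega
    resp /= resp.sum(axis=1, keepdims=True)
    omega = resp.sum(axis=0)

print(np.abs(omega - N * theta).max())  # ≈ 0: omega_k = N * theta_k
print(omega.sum())                      # ≈ N: total intensity mass equals the count
```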
Count regression
From the other direction, we can formulate density estimation as a count regression; for “nice” distributions this will be the same as estimating the correct Poisson intensity for every given small region of the state space (e.g. Gu 1993; Eilers and Marx 1996). 🏗
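As a sketch of what that looks like in practice: bin the observations, treat the bin counts as Poisson, fit a log-linear model to the counts, then renormalise to recover a density. The synthetic Gaussian data and the quadratic log-mean below are assumptions for illustration, not something prescribed above.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(4)
x = rng.normal(0.0, 1.0, size=1000)

# Bin the observations: small bins turn density estimation into count regression.
edges = np.linspace(-4, 4, 41)
counts, _ = np.histogram(x, bins=edges)
mids = 0.5 * (edges[:-1] + edges[1:])
width = edges[1] - edges[0]

# Hypothetical log-linear model: log mean count is quadratic in the bin midpoint.
X = np.vander(mids, 3)  # columns: x^2, x, 1

def neg_poisson_log_lik(beta):
    # Poisson log-likelihood for the bin counts, dropping the constant log(n!) term.
    log_mu = X @ beta
    return np.sum(np.exp(log_mu)) - np.sum(counts * log_mu)

beta_hat = minimize(neg_poisson_log_lik, x0=np.zeros(3)).x

# Recover a density estimate by normalising the fitted bin means.
mu_hat = np.exp(X @ beta_hat)
f_hat = mu_hat / (mu_hat.sum() * width)
print(f_hat[:5])
```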
Probability over boxes
Consider a box $A$ in $W$. The probability of any one $X_i$ falling within that box is

$$\mathbb{P}[X_i \in A] = \int_A f(x)\,\mathrm{d}x.$$

We know that the expected number of points to fall within that box is $N$ times the probability of any one falling in that box, i.e.

$$\mathbb{E}[N(A)] = N \int_A f(x)\,\mathrm{d}x,$$

and thus the implied intensity is $\lambda(x) = N f(x)$. …Where was I going with this? Something to do with linear point process estimation perhaps? 🏗
Interacting point processes
Interacting point processes have intensities too, which may also be re-interpreted as densities. What kind of relations are implied between the RVs which would have this “dynamically evolving” density? Clearly not i.i.d. But useful somewhere?
References
Andersen, Borgan, Gill, et al. 1997. Statistical models based on counting processes. Springer series in statistics.
Berman, and Diggle. 1989. “Estimating Weighted Integrals of the Second-Order Intensity of a Spatial Point Process.” Journal of the Royal Statistical Society. Series B (Methodological).
Brown, Cai, and Zhou. 2010. “Nonparametric Regression in Exponential Families.” The Annals of Statistics.
Castellan. 2003. “Density Estimation via Exponential Model Selection.” IEEE Transactions on Information Theory.
Cox. 1965. “On the Estimation of the Intensity Function of a Stationary Point Process.” Journal of the Royal Statistical Society: Series B (Methodological).
Cunningham, Shenoy, and Sahani. 2008. “Fast Gaussian Process Methods for Point Process Intensity Estimation.” In Proceedings of the 25th International Conference on Machine Learning. ICML ’08.
Eilers, and Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” Statistical Science.
Ellis. 1991. “Density Estimation for Point Processes.” Stochastic Processes and Their Applications.
Giesecke, Kakavand, and Mousavi. 2008. “Simulating Point Processes by Intensity Projection.” In Simulation Conference, 2008. WSC 2008. Winter.
Heigold, Schlüter, and Ney. 2007. “On the Equivalence of Gaussian HMM and Gaussian HMM-Like Hidden Conditional Random Fields.” In Eighth Annual Conference of the International Speech Communication Association.
Kooperberg, and Stone. 1991. “A Study of Logspline Density Estimation.” Computational Statistics & Data Analysis.
———. 1992. “Logspline Density Estimation for Censored Data.” Journal of Computational and Graphical Statistics.
Marteau-Ferey, Bach, and Rudi. 2020. “Non-Parametric Models for Non-Negative Functions.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS ’20.
Saul, and Lee. 2001. “Multiplicative Updates for Classification by Mixture Models.” In Advances in Neural Information Processing Systems.
Sha, and Saul. 2006a. “Large Margin Hidden Markov Models for Automatic Speech Recognition.” In Advances in Neural Information Processing Systems.
Sha, and Saul. 2006b. “Large Margin Gaussian Mixture Modeling for Phonetic Classification and Recognition.” In 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.
Sugiyama, Takeuchi, Suzuki, et al. 2010. “Conditional Density Estimation via Least-Squares Density Ratio Estimation.” In International Conference on Artificial Intelligence and Statistics.
Tüske, Tahir, Schlüter, et al. 2015. “Integrating Gaussian Mixtures into Deep Neural Networks: Softmax Layer with Hidden Variables.” In 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
van Lieshout. 2011. “On Estimation of the Intensity Function of a Point Process.” Methodology and Computing in Applied Probability.
Willett, and Nowak. 2007. “Multiscale Poisson Intensity and Density Estimation.” IEEE Transactions on Information Theory.