# Empirical estimation of information

This is an empirical probability metric estimation problem with especially cruel error properties. There are several versions of the problem, corresponding to different information quantities (mutual information between two variables, KL divergence between two distributions, information of one variable) and to different data types (discrete or continuous variables). In the mutual information case, estimation amounts to an independence test, since the mutual information is zero exactly when the two variables are independent.

Say I would like to know the mutual information between the laws of the processes generating two streams $$X,Y$$ of observations, with weak assumptions on those laws. Better, suppose further that the observations from each process are i.i.d. In the case that they have a continuous state space with joint density $$p_{X,Y}$$ and marginal densities $$p_{X},p_{Y}$$, the mutual information is

$$\operatorname{I}(X;Y)=\int_{\mathcal{Y}}\int_{\mathcal{X}} p_{X,Y}(x,y)\log\left(\frac{p_{X,Y}(x,y)}{p_{X}(x)\,p_{Y}(y)}\right)\,dx\,dy$$
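As a sanity check for any estimator, it helps to have a case where this integral is known in closed form. A standard one, assuming (my choice of example, not part of the setup above) that $$(X,Y)$$ is bivariate Gaussian with correlation coefficient $$\rho$$:

$$\operatorname{I}(X;Y) = -\frac12 \log\left(1-\rho^2\right)$$

This is zero when $$\rho=0$$ and diverges as $$|\rho|\to 1$$, so simulated Gaussian pairs give a cheap ground truth against which to compare an estimate.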

Information is harder to estimate than many metrics, because observations with low frequency have high influence on the value but are, by definition, rarely observed. It is easy to get a uselessly biased, or even inconsistent, estimator, especially in the nonparametric case.

## Histogram estimator

The obvious one for discrete data. For continuous data, where the histogram bins must also be learned, this method is highly sensitive to the choice of bins and can be inconsistent if you don’t do it right.
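For the discrete case, a minimal sketch of the plug-in version of this estimator might look like the following: replace the densities in the mutual information formula with empirical frequencies from a contingency table. The function name `plugin_mutual_information` and the toy data are my own illustrative choices, and no bias correction is attempted.

```python
import numpy as np


def plugin_mutual_information(x, y):
    """Plug-in estimate of I(X;Y), in nats, from paired discrete samples."""
    x = np.asarray(x).ravel()
    y = np.asarray(y).ravel()
    # Map each observed symbol to an integer index 0..K-1.
    _, xi = np.unique(x, return_inverse=True)
    _, yi = np.unique(y, return_inverse=True)
    # Contingency table of joint counts.
    counts = np.zeros((xi.max() + 1, yi.max() + 1))
    np.add.at(counts, (xi, yi), 1)
    p_xy = counts / counts.sum()
    p_x = p_xy.sum(axis=1, keepdims=True)  # column vector of X marginals
    p_y = p_xy.sum(axis=0, keepdims=True)  # row vector of Y marginals
    nz = p_xy > 0  # treat 0 log 0 as 0
    return float(np.sum(p_xy[nz] * np.log(p_xy[nz] / (p_x @ p_y)[nz])))


rng = np.random.default_rng(0)
n = 200
x = rng.integers(0, 10, size=n)
y = rng.integers(0, 10, size=n)  # independent of x, so the true value is 0

mi_indep = plugin_mutual_information(x, y)  # positive despite independence
mi_self = plugin_mutual_information(x, x)   # equals the plug-in entropy of x
print(mi_indep, mi_self)
```

With only 200 samples spread over 100 joint cells, the estimate for the independent pair comes out strictly positive even though the true value is zero, which is the small-sample bias complained about above.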

🏗

## Monte Carlo parametric

One case in which you might want to estimate this value is when there is no nonparametric estimation problem per se, but the integral that defines it is inconvenient to solve. In that case, we might use a Monte Carlo method.

John Schulman explicates a good trick for estimating the KL divergence in the case that you can simulate $$x_i\sim q$$ and calculate $$p(x_i)$$ and $$q(x_i)$$. The following estimator is good despite looking unrelated:

$$\begin{aligned} KL[q, p] &= \int_x q(x) \log \frac{q(x)}{p(x)} \mathrm{d}x\\ &= E_{x \sim q}\left[\log \frac{q(x)}{p(x)} \right]\\ &\approx \frac1N \sum_{i=1}^N \frac12\left(\log p(x_i)-\log q(x_i)\right)^2 \end{aligned}$$

He also introduced a simple debiased one that does even better. The mechanics are interesting. (If you actually want a mutual information, this notionally calculates it if we take the KL divergence between the joint density and the product of the marginal densities; but that is not totally trivial, I concede.)
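To see how these behave, here is a rough sketch comparing three Monte Carlo estimators of $$KL[q,p]$$ on a pair of unit-variance Gaussians, where the true value $$(\mu_p-\mu_q)^2/2$$ is known. The labels `k1`, `k2`, `k3` and the form $$(r-1)-\log r$$ with $$r=p(x)/q(x)$$ are how I recall Schulman's note; I am assuming `k3` is the "debiased one" mentioned above. The Gaussian test case and helper names are mine.

```python
import numpy as np


def log_normal_pdf(x, mean, std):
    """Log density of a univariate Gaussian."""
    return -0.5 * np.log(2 * np.pi) - np.log(std) - 0.5 * ((x - mean) / std) ** 2


def kl_estimators(log_p, log_q):
    """Monte Carlo estimates of KL[q, p] from log densities at samples x_i ~ q."""
    log_r = log_p - log_q           # log of the ratio r = p(x_i) / q(x_i)
    k1 = -log_r                     # naive: unbiased but high variance
    k2 = 0.5 * log_r ** 2           # the squared-log-ratio estimator: biased, low variance
    k3 = np.expm1(log_r) - log_r    # (r - 1) - log r: unbiased, every term non-negative
    return float(k1.mean()), float(k2.mean()), float(k3.mean())


rng = np.random.default_rng(0)
n = 100_000
mu_q, mu_p = 0.0, 0.1                  # assumed toy example: q = N(0, 1), p = N(0.1, 1)
x = rng.normal(mu_q, 1.0, size=n)      # samples from q

k1, k2, k3 = kl_estimators(log_normal_pdf(x, mu_p, 1.0), log_normal_pdf(x, mu_q, 1.0))
true_kl = 0.5 * (mu_p - mu_q) ** 2     # closed form for equal-variance Gaussians
print(true_kl, k1, k2, k3)
```

The squared-log-ratio estimator is only a good approximation when $$p$$ and $$q$$ are close, as they are here; for distributions that are far apart its bias grows, which is one reason to prefer the third form.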

