Log-concave distributions

The probabilist’s equivalent to convex optimisation objectives

August 28, 2017 — December 2, 2024

Bayes
estimator distribution
Markov processes
Monte Carlo
optimization
probabilistic algorithms
probability
stochastic processes

A log-concave density is a probability density function \(f: \mathbb{R}^d \to [0, \infty)\) for which \(\log f\) is concave. Written without the logarithm, \(f\) is log-concave if, for all \(x, y \in \mathbb{R}^d\) and all \(\lambda \in [0,1]\),

\[ f(\lambda x + (1 - \lambda)y) \geq f(x)^\lambda f(y)^{1 - \lambda}. \]

Equivalently (adopting the convention \(\log 0 = -\infty\)), \(f\) is log-concave if and only if \(f = e^{-V}\) for some convex function \(V: \mathbb{R}^d \to (-\infty, \infty]\).

Log-concave distributions pop up a lot in the literature because it is easy to prove things about them and they feel fairly general. I do not use the terminology much myself, because the class is too restrictive for most of the purposes I have in practice, but this is background terminology we all need to have.

1 Examples

1.1 Gaussian Distribution

The density \(f(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)\) is log-concave because \(\log f(x)\) is a concave quadratic function.
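Explicitly,

\[ \log f(x) = -\frac{(x - \mu)^2}{2\sigma^2} - \log\left(\sqrt{2\pi}\,\sigma\right), \]

which is a downward-opening parabola in \(x\).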

1.2 Exponential Distribution

\(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) is log-concave since \(\log f(x) = \log \lambda - \lambda x\) is linear (hence concave).

1.3 Uniform Distribution

On any convex set, the uniform density is log-concave because \(\log f\) is constant (and thus concave).
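As a sanity check on these examples, the defining inequality is easy to test numerically at the midpoint \(\lambda = 1/2\). The following Python sketch is illustrative only (the helper name is_midpoint_log_concave, the grid, and the tolerance are my own arbitrary choices); it can refute log-concavity on a grid but not prove it.

```python
import numpy as np

def is_midpoint_log_concave(f, xs, tol=1e-9):
    """Check f((x + y) / 2) >= sqrt(f(x) * f(y)) for all pairs on a grid.

    This probes only the lambda = 1/2 case of the definition, on finitely
    many points, so it can refute log-concavity but not prove it."""
    fx = f(xs)
    for i, x in enumerate(xs):
        for j, y in enumerate(xs):
            mid = f(np.array([(x + y) / 2.0]))[0]
            if mid + tol < np.sqrt(fx[i] * fx[j]):
                return False
    return True

xs = np.linspace(-3.0, 3.0, 61)

# The three example densities above: standard normal, Exp(1), Uniform(-1, 1).
gaussian = lambda x: np.exp(-x ** 2 / 2.0) / np.sqrt(2.0 * np.pi)
exponential = lambda x: np.where(x >= 0, np.exp(-x), 0.0)
uniform = lambda x: np.where(np.abs(x) <= 1.0, 0.5, 0.0)

for name, f in [("gaussian", gaussian), ("exponential", exponential), ("uniform", uniform)]:
    print(name, is_midpoint_log_concave(f, xs))
```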

2 Useful Properties

2.1 Closure

The class of log-concave densities is closed under affine transformations, marginalization, and convolution. For example, the sum of two independent random variables with log-concave densities again has a log-concave density.

2.2 Unimodality

Log-concave densities are unimodal, which simplifies statistical inference and optimization tasks, but which also tells us that the class is not especially flexible.

2.3 Maximum Likelihood Estimation

Relatedly, estimating a log-concave density via maximum likelihood is a well-posed problem: the nonparametric log-concave MLE exists and is unique under mild conditions.

2.4 Concentration Inequalities

Heaps of the standard concentration inequalities are specifically for log-concave distributions.

2.5 Connection to Convex Optimization

The negative log-density \(-\log f\) of a log-concave distribution is a convex function, so finding its mode (e.g. a MAP estimate) is a convex optimisation problem, and many results about sampling from log-concave densities are borrowed directly from convex analysis.
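To make the connection concrete, here is a minimal Python sketch of mode-finding for a toy log-concave target, a logistic-regression posterior with a Gaussian prior, whose negative log-density is convex. The data and names are invented for illustration; this is not any particular method from the references.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)

# Invented toy data; the posterior of a logistic regression with a
# Gaussian prior has a log-concave density, so its negative log is convex.
X = rng.normal(size=(100, 2))
y = rng.integers(0, 2, size=100)

def neg_log_posterior(w):
    """Convex objective: logistic log-loss plus a Gaussian (ridge) prior."""
    z = X @ w
    log_loss = np.sum(np.logaddexp(0.0, z) - y * z)  # negative log-likelihood
    prior = 0.5 * np.dot(w, w)                       # negative log N(0, I) prior, up to a constant
    return log_loss + prior

# Convexity means any local minimiser found by a generic optimiser is the
# global mode (MAP estimate) of the posterior.
result = minimize(neg_log_posterior, x0=np.zeros(2), method="BFGS")
print("posterior mode:", result.x)
```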

3 Langevin MCMC

Rob Salomone explains this well; see Hodgkinson, Salomone, and Roosta (2019).
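For a concrete picture of what these algorithms do, here is a minimal sketch of the unadjusted Langevin algorithm (the ULA analysed by Durmus and Moulines (2016)) targeting a simple log-concave Gaussian. The step size, target, and function names are my own arbitrary choices; a practical implementation would tune the step size or add a Metropolis correction (MALA).

```python
import numpy as np

def ula_sample(grad_log_density, x0, step_size, n_steps, rng):
    """Unadjusted Langevin algorithm.

    One Euler-Maruyama step of the Langevin diffusion:
        x <- x + (h/2) * grad log f(x) + sqrt(h) * standard normal noise.
    For (strongly) log-concave targets this recursion has well-studied
    non-asymptotic guarantees."""
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps, x.size))
    for t in range(n_steps):
        noise = rng.normal(size=x.size)
        x = x + 0.5 * step_size * grad_log_density(x) + np.sqrt(step_size) * noise
        samples[t] = x
    return samples

# Toy target: a zero-mean 2-d Gaussian with covariance diag(1, 4), which is log-concave.
precision = np.diag([1.0, 0.25])
grad_log_f = lambda x: -precision @ x  # gradient of the log-density

rng = np.random.default_rng(1)
draws = ula_sample(grad_log_f, x0=[3.0, -3.0], step_size=0.1, n_steps=5000, rng=rng)
print("sample mean:", draws.mean(axis=0))     # should be near (0, 0)
print("sample variance:", draws.var(axis=0))  # should be near (1, 4), up to ULA bias
```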

4 Connection to Exponential Families

I had to write this out for myself.

Many common exponential family distributions (Gaussian, Poisson) are log-concave, but there are exceptions within the exponential family that are not globally log-concave.

Recall, an exponential family of distributions has densities (or mass functions) of the form

\[ f(x \mid \theta) = h(x) \exp\left( \eta(\theta)^\top T(x) - A(\theta) \right), \]

where

  • \(h(x)\) is the base measure,
  • \(\eta(\theta)\) is the natural parameter,
  • \(T(x)\) is the sufficient statistic,
  • \(A(\theta)\) is the log-partition function.

What stops non-concavity in \(x\) from entering via either the \(T\) term or the \(h\) term? Nothing, as far as I can see: \(\log f(x \mid \theta) = \log h(x) + \eta(\theta)^\top T(x) - A(\theta)\) is concave in \(x\) precisely when \(\log h(x) + \eta(\theta)^\top T(x)\) is concave in \(x\), and nothing in the exponential-family form forces that.

4.1 Gamma Distribution

The gamma distribution has the density

\[ f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0, \]

where \(\alpha > 0\) (shape parameter) and \(\beta > 0\) (rate parameter).

  • When \(\alpha \geq 1\): The function \(\log f(x)\) is concave in \(x\), making the density log-concave.
  • When \(0 < \alpha < 1\): The function \(\log f(x)\) is not concave in \(x\), because the term \((\alpha - 1)\log x\) in the log-density is convex (and the density is unbounded as \(x \to 0^+\)); see the check below.
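To verify this directly, drop the terms constant in \(x\) and differentiate the log-density twice:

\[ \log f(x \mid \alpha, \beta) = (\alpha - 1)\log x - \beta x + \text{const}, \qquad \frac{\mathrm{d}^2}{\mathrm{d}x^2} \log f(x \mid \alpha, \beta) = -\frac{\alpha - 1}{x^2}, \]

which is non-positive for every \(x > 0\) exactly when \(\alpha \geq 1\).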

4.2 Beta Distribution

The beta distribution has the density

\[ f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1, \]

where \(\alpha, \beta > 0\).

  • When \(\alpha, \beta \geq 1\): The density is log-concave.
  • When either \(\alpha < 1\) or \(\beta < 1\): The density is not log-concave, because the corresponding term \((\alpha - 1)\log x\) or \((\beta - 1)\log(1 - x)\) in the log-density is convex and the density is unbounded at the corresponding endpoint.
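Again, the second derivative of the log-density makes this explicit:

\[ \log f(x \mid \alpha, \beta) = (\alpha - 1)\log x + (\beta - 1)\log(1 - x) + \text{const}, \qquad \frac{\mathrm{d}^2}{\mathrm{d}x^2} \log f(x \mid \alpha, \beta) = -\frac{\alpha - 1}{x^2} - \frac{\beta - 1}{(1 - x)^2}, \]

which is non-positive on all of \((0, 1)\) exactly when \(\alpha \geq 1\) and \(\beta \geq 1\).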

5 References

Bagnoli, and Bergstrom. 1989. “Log-Concave Probability and Its Applications.”
Brosse, Moulines, and Durmus. 2018. “The Promises and Pitfalls of Stochastic Gradient Langevin Dynamics.” In Proceedings of the 32nd International Conference on Neural Information Processing Systems. NIPS’18.
Castellani, and Cavagna. 2005. “Spin-Glass Theory for Pedestrians.” Journal of Statistical Mechanics: Theory and Experiment.
Domke. 2017. “A Divergence Bound for Hybrids of MCMC and Variational Inference and an Application to Langevin Dynamics and SGVI.” In PMLR.
Duane, Kennedy, Pendleton, et al. 1987. “Hybrid Monte Carlo.” Physics Letters B.
Durmus, and Moulines. 2016. “High-Dimensional Bayesian Inference via the Unadjusted Langevin Algorithm.” arXiv:1605.01559 [Math, Stat].
Garbuno-Inigo, Hoffmann, Li, et al. 2020. “Interacting Langevin Diffusions: Gradient Structure and Ensemble Kalman Sampler.” SIAM Journal on Applied Dynamical Systems.
Ge, Lee, and Risteski. 2020. “Simulated Tempering Langevin Monte Carlo II: An Improved Proof Using Soft Markov Chain Decomposition.” arXiv:1812.00793 [Cs, Math, Stat].
Girolami, and Calderhead. 2011. “Riemann Manifold Langevin and Hamiltonian Monte Carlo Methods.” Journal of the Royal Statistical Society: Series B (Statistical Methodology).
Hodgkinson, Salomone, and Roosta. 2019. “Implicit Langevin Algorithms for Sampling From Log-Concave Densities.” arXiv:1903.12322 [Cs, Stat].
Mandt, Hoffman, and Blei. 2017. “Stochastic Gradient Descent as Approximate Bayesian Inference.” JMLR.
Mangoubi, and Smith. 2017. “Rapid Mixing of Hamiltonian Monte Carlo on Strongly Log-Concave Distributions.” arXiv:1708.07114 [Math, Stat].
Norton, and Fox. 2016. “Tuning of MCMC with Langevin, Hamiltonian, and Other Stochastic Autoregressive Proposals.” arXiv:1610.00781 [Math, Stat].
Saumard, and Wellner. 2014. “Log-Concavity and Strong Log-Concavity: A Review.” arXiv:1404.5886 [Math, Stat].
Welling, and Teh. 2011. “Bayesian Learning via Stochastic Gradient Langevin Dynamics.” In Proceedings of the 28th International Conference on International Conference on Machine Learning. ICML’11.
Xifara, Sherlock, Livingstone, et al. 2014. “Langevin Diffusions and the Metropolis-Adjusted Langevin Algorithm.” Statistics & Probability Letters.