Log-concave distributions
The probabilist’s equivalent to convex optimisation objectives
August 28, 2017 — December 2, 2024
A log-concave density is a probability density function \(f: \mathbb{R}^d \to [0, \infty)\) that satisfies the condition of log-concavity, meaning that the logarithm of \(f\) is a concave function. Formally, a density \(f\) is log-concave if for all \(x, y \in \mathbb{R}^d\) and for all \(\lambda \in [0,1]\),
\[ f(\lambda x + (1 - \lambda)y) \geq f(x)^\lambda f(y)^{1 - \lambda}. \]
Equivalently, \(f\) is log-concave if \(\log f\) is a concave function.
Log-concave distributions pop up a lot in the literature because it is easy to prove things about them and they feel general. I do not use the terminology much myself because they are too limited for most purposes I have in practice, but this is background terminology we all need to have.
1 Examples
1.1 Gaussian Distribution
The density \(f(x) = \frac{1}{\sqrt{2\pi}\sigma} \exp\left( -\frac{(x - \mu)^2}{2\sigma^2} \right)\) is log-concave because \(\log f(x)\) is a concave quadratic function.
1.2 Exponential Distribution
\(f(x) = \lambda e^{-\lambda x}\) for \(x \geq 0\) is log-concave since \(\log f(x) = \log \lambda - \lambda x\) is linear (hence concave).
1.3 Uniform Distribution
On any convex set, the uniform density is log-concave because \(\log f\) is constant (and thus concave).
2 Useful Properties
2.1 Closure
The class of log-concave densities is closed under affine transformations, marginalization, and convolution.
2.2 Unimodality
Log-concave densities are unimodal, which simplifies statistical inference and optimization tasks. But also tells us that they are not that flexible.
Relatedly,
2.3 Maximum Likelihood Estimation
Estimating a log-concave density via MLE is a well-posed problem with unique solutions under mild conditions.
2.4 Concentration Inequalities
Heaps of the standard concentration inequalities are specifically for log-concave distributions.
2.5 Connection to Convex Optimization
See
- Holden Lee and Andrej Risteski
- the connection between log-concavity and convex optimisation.
- “a Markov Chain reminiscent of noisy gradient descent”.
3 Langevin MCMC
Rob Salomone explains this well; see Hodgkinson, Salomone, and Roosta (2019).
4 Connection to Exponential Families
I had to write this out for myself.
Many common exponential family distributions (Gaussian, Poisson) are log-concave, but there are exceptions within the exponential family that are not globally log-concave.
Recall, an exponential family of distributions has densities (or mass functions) of the form
\[ f(x \mid \theta) = h(x) \exp\left( \eta(\theta)^\top T(x) - A(\theta) \right), \]
where: - \(h(x)\) is the base measure, - \(\eta(\theta)\) is the natural parameter, - \(T(x)\) is the sufficient statistic, - \(A(\theta)\) is the log-partition function.
What stops log-convexity from entering by either the \(T\) term or the \(h\) term? Nothing, AFAICS.
4.1 Gamma Distribution
The gamma distribution has the density
\[ f(x \mid \alpha, \beta) = \frac{\beta^\alpha}{\Gamma(\alpha)} x^{\alpha - 1} e^{-\beta x}, \quad x > 0, \]
where \(\alpha > 0\) (shape parameter) and \(\beta > 0\) (rate parameter).
- When \(\alpha \geq 1\): The function \(\log f(x)\) is concave in \(x\), making the density log-concave.
- When \(0 < \alpha < 1\): The function \(\log f(x)\) is not concave in \(x\) because the term \(x^{\alpha - 1}\) introduces convexity
4.2 Beta Distribution
The beta distribution has the density
\[ f(x \mid \alpha, \beta) = \frac{\Gamma(\alpha + \beta)}{\Gamma(\alpha)\Gamma(\beta)} x^{\alpha - 1} (1 - x)^{\beta - 1}, \quad 0 < x < 1, \]
where \(\alpha, \beta > 0\).
- When \(\alpha, \beta \geq 1\): The density is log-concave.
- When either \(\alpha < 1\) or \(\beta < 1\): The density is not log-concave because \(\log f(x)\) is not a concave function of \(x\).