(Discrete-measure)-valued stochastic processes

\(\renewcommand{\var}{\operatorname{Var}} \renewcommand{\dd}{\mathrm{d}} \renewcommand{\pd}{\partial} \renewcommand{\bb}[1]{\mathbb{#1}} \renewcommand{\bf}[1]{\mathbf{#1}} \renewcommand{\vv}[1]{\boldsymbol{#1}} \renewcommand{\mm}[1]{\mathrm{#1}} \renewcommand{\cc}[1]{\mathcal{#1}} \renewcommand{\oo}[1]{\operatorname{#1}} \renewcommand{\gvn}{\mid} \renewcommand{\II}{\mathbb{I}}\)

Stochastic processes indexed by time whose state is a discrete (possibly only countable) measure. Popular in, for example, mathematical models of alleles in biological evolution.

Or, to be starker: evolving categorical distributions.

Population genetics keywords, in approximate order of generality and chronology: Fisher-Wright diffusion, Moran process, Viot-Fleming process. Obviously there are many other processes matching this broad description.

For continuous state see measure processes.

Discrete univariate state

In the simplest case, imagine that we wish to construct a process that is a distribution over \(\{0,1\},\) e.g. for some process \(\{P_t\}_t, P_t\in[0,1] \forall t\) we could imagine that this provides a probability for Bernoulli observation likelihood. We have a lot of candidates for \(P_t\). Here are some examples:

  1. Take two Gamma processes, \(G_t^1, G_t^2\), we could take \(P_t:=\frac{G_t^1}{G_t^1+ G_t^2}\). This would be a marginally Beta distribution.
  2. Take any process \(\{B_t\}_t\) which has instantaneous values \(B_t\in\mathbb{R},\) e.g. a Wiener process or Ornstein-Uhlenbeck process. Now \(\sigma(B_t)\) is the desired process, for \(\sigma:\mathbb{R}\to[0,1].\)

Both these could be generalised in various ways to be Dirichlet distributions of a desired dimensionality.

Option (1) is the most interesting to me for various reasons. We can come back to (2) later with Shakir Mohammed’s classic post on Tricks with Sticks.

Let us look at a basic case, cribbing shamelessly from the Gamma processes notebook:

# generate a stationary thinned autoregressive Gamma series
gamp = function(T, alpha, lambda, rho) {
  g = rgamma(1, alpha, rate=lambda)
  b = rbeta(T, alpha*rho, alpha*(1-rho))
  zeta = rgamma(T, alpha*(1-rho), rate=lambda)
  gs = numeric(T)
  for (i in 1:T) {
    g = b[i] * g + zeta[i]
    gs[i] = g
betap = function(T, alpha, lambda, rho) {
    g1 = gamp(T, head(alpha,1), head(lambda,1), head(rho,1))
    g2 = gamp(T, tail(alpha,1), tail(lambda,1), tail(rho,1))
    g1 / (g1+g2)
T = 10000
ts = 1:T
plot(ts, betap(T, 1.0, 0.1, 0.999),
    type = "l", col = 2,
    ylim = c(0, 1), ylab="", xlab = "time")
lines(ts, betap(T, 10, 1.0, 0.999),
    type = "l", col = 3)
lines(ts, betap(T, 100, 10.0, 0.999),
    type = "l", col = 4)
legend("bottomright", c("lambda=0.1", "lambda=1", "lambda=10"),
    lty = 1, col = 2:4)
Some stationary Beta processes

Figure 1: Some stationary Beta processes

Note that we are not actually required to keep the parameters the same for both latent Gamma processes; we can have different \(\alpha\) and introduce a bias away from \(\mathbb{E}P_t=0.5\):

plot(ts, betap(T, c(1, 1), 1.0, 0.999),
    type = "l", col = 2,
    ylim = c(0, 1), ylab="", xlab = "time")
lines(ts, betap(T, c(1, 10), 1.0, 0.999),
    type = "l", col = 3)
lines(ts, betap(T, c(10, 1), 1.0, 0.999),
    type = "l", col = 4)
legend("bottomright", c("alpha1=1,alpha2=1", "alpha1=1,alpha2=10", "alpha1=10,alpha2=1"),
    lty = 1, col = 2:4)
Some stationary Beta processes with asymmetric latents

Figure 2: Some stationary Beta processes with asymmetric latents

If the \(\lambda\) parameter is not the same between them, then it is not longer marginally Beta, but it is still in the correct \([0,1]\) range to constitute a valid probability. We can get some interesting bias and variance structures this way:

plot(ts, betap(T, c(1, 1), c(1,1), 0.999),
    type = "l", col = 2,
    ylim = c(0, 1), ylab="", xlab = "time")
lines(ts, betap(T, c(1,3), c(1, 1/3), 0.999),
    type = "l", col = 3)
lines(ts, betap(T, c(3, 1), c(1/3, 1), 0.999),
    type = "l", col = 4)
legend("bottomright", c("lambda1=1,lambda2=1","lambda1=1,lambda2=3", "lambda1=3,lambda2=1"),
    lty = 1, col = 2:4)
Some stationary  processes on the unit interval

Figure 3: Some stationary processes on the unit interval

You get the idea.

It would be nice if we had a formulation for which it were easy to marginalise out the implied latent processes and various other statistical operations of use. Can we get there using Gamma-Dirichlet relations?

One obvious recipe that might work in discrete time is

  1. Given \(P_{t-1}\sim\operatorname{Beta}(\alpha,K-\alpha),\) draw some \(G_t\sim\operatorname{Gamma}(K, \lambda)\). Now \(G_t^1:=P_{t-1}G_t\sim \operatorname{Gamma}(\alpha, \lambda)\) and \(G_t^1:=(1-P_{t-1})G_t\sim \operatorname{Gamma}(K-\alpha, \lambda)\) independently (TODO: check this).
  2. We can perturb either or both of those Gamma variates to another one by thinning and addition, as in thinned autoregressive Gamma processes with some correlation param \(\rho\), giving us \(G_t^1{}',G_t^2{}'\).
  3. Finally, \(P_t=\frac{G_t^1{}'}{G_t^1{}'+G_t^2{}'}\sim\operatorname{Beta}(\alpha,K-\alpha),\) i.e. it has the same distribution.

This circuitous method has the virtue that there are no dependencies on unobserved latent dependent time series, and the process thus depends only on the previous timestep, i.e. it is \(\{P_{t-1}\}\)-measurable, whereas process (1) is \(\{G_{t-1}^1,G_{t-1}^2\}\)-measurable. However, that generation process is weird and lengthy.

Exercise: What is the density of \(P_t|P_{t-1}\) in this formulation?

Discrete multivariate state

How do we generalise the discrete model to multivariate observations? Imagine our observations are \(d\)-dimensional lattice-structured. This looks contrived to the statistician, but is bread-and-butter to computer scientists, who deal with binary vectors all the time. In the maximally general case relationships between variables can be… arbitrary of course.

But there are models that assume some structure and I reckon some of those might be useful for inference.

Classics include

  1. Discrete state graphical models which can exploit knowledge of relationships, and which have a well-developed and reliable belief propagation theory.
  2. Spin glasses-type stochastic physics models, which impose an undirected graphical structure on the variates. Classically this is a lattice structure (the Ising model) but I many other structures can be found now, including random structures. Spin glasses can be computationally impractical, as with many undirected graphical model problems. Physicists have spent a lot of sweat and blood developing rather complicated vocabulary for these models, which is similar but not quite the same as the vcabulary that statisticians use in inference over such models.
  3. Network models, stochastic block models etc, would probably fit naturally in here.

I would like something that uses spatial adjacency in a non-parametric way and which has convenient inference. Are there models in this class, which make the β€œright” assumptions to be tractable for use in inference?

Discrete stick breaking process

We can break up a discrete vector of length \(d\) into regions of equal mass, and then use a stick-breaking process to sample those regions; the stick breaking distribution in this case would naturally be, say, Binomial. We can also imagine Indian Buffet/Chinese restaurant style constructions. In fact, maybe that would be even easier.

Does this get us anywhere? What dependency could we naturally introduce here? One candidate would be to randomly allocate a small number of cells in each partition to random other partitions at each timestep.


Asselah, Amine, Pablo A. Ferrari, and Pablo Groisman. 2011. β€œQuasistationary Distributions and Fleming-Viot Processes in Finite Spaces.” Journal of Applied Probability 48 (2): 322–32.
Donnelly, Peter, and Thomas G. Kurtz. 1996. β€œA Countable Representation of the Fleming-Viot Measure-Valued Diffusion.” The Annals of Probability 24 (2): 698–742.
Dunson, David B, and Ju-Hyun Park. 2008. β€œKernel Stick-Breaking Processes.” Biometrika 95 (2): 307–23.
Ethier, S. N., and R. C. Griffiths. 1993. β€œThe Transition Function of a Fleming-Viot Process.” The Annals of Probability 21 (3): 1571–90.
Ethier, S. N., and Thomas G. Kurtz. 1993. β€œFleming–Viot Processes in Population Genetics.” SIAM Journal on Control and Optimization 31 (2): 345–86.
Fleming, Wendell H, and Michel Viot. 1979. β€œSome Measure-Valued Markov Processes in Population Genetics Theory.” Indiana University Mathematics Journal 28 (5): 817–43.
Griffin, J. E., and M. F. J. Steel. 2006. β€œOrder-Based Dependent Dirichlet Processes.” Journal of the American Statistical Association 101 (473): 179–94.
Ishwaran, Hemant, and Lancelot F James. 2001. β€œGibbs Sampling Methods for Stick-Breaking Priors.” Journal of the American Statistical Association 96 (453): 161–73.
Konno, N., and T. Shiga. 1988. β€œStochastic Partial Differential Equations for Some Measure-Valued Diffusions.” Probability Theory and Related Fields 79 (2): 201–25.
Marzen, S. E., and J. P. Crutchfield. 2020. β€œInference, Prediction, and Entropy-Rate Estimation of Continuous-Time, Discrete-Event Processes.” arXiv:2005.03750 [Cond-Mat, Physics:nlin, Stat], May.
Moran, P. a. P. 1958. β€œRandom Processes in Genetics.” Mathematical Proceedings of the Cambridge Philosophical Society 54 (1): 60–71.
Nowak, Martin A. 2006. Evolutionary Dynamics: Exploring the Equations of Life. Belknap Press of Harvard University Press.

No comments yet. Why not leave one?

GitHub-flavored Markdown & a sane subset of HTML is supported.