The classical regression setup: your process of interest generates observations conditional on certain predictors. The observations (but not the predictors) are corrupted by noise.

Hierarchical setup: there is a directed graph of interacting random processes generating the data you observe, and you would like to reconstruct the parameters, possibly even the conditional distributions of the parameters, accounting for those interactions.
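To make the contrast concrete, here is one minimal way to write the two setups. The symbols ($f$, $\theta_j$, $\phi$, $\sigma$) and the Gaussian noise are illustrative assumptions, not notation used elsewhere in this notebook.

$$
\begin{aligned}
\text{classical:}\quad & y_i = f(x_i; \theta) + \varepsilon_i, & \varepsilon_i &\sim \mathcal{N}(0, \sigma^2),\\
\text{hierarchical:}\quad & y_{ij} = f(x_{ij}; \theta_j) + \varepsilon_{ij}, & \theta_j &\sim p(\theta \mid \phi),\quad \phi \sim p(\phi),
\end{aligned}
$$

with $\varepsilon_{ij}$ distributed like $\varepsilon_i$ in the first line. In the second line the group-level parameters $\theta_j$ are themselves random, drawn from a distribution governed by the hyperparameters $\phi$, so the graph has one more layer than in the classical case.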

Also known as mixed effects models, hierarchical models, nested models (careful! that term has many definitions), random coefficient models, or errors-in-variables models.

Directed graphical models provide the formalism for such models. When people mention graphical models, the emphasis is frequently on the independence graph itself and on rather general framings. When they mention hierarchical models, it seems to be assumed that you wish to estimate parameters, or sample from posteriors, or what-have-you.
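As a rough illustration of the "estimate parameters" flavour, here is a minimal sketch of a two-level Gaussian model with random group means. Everything in it is an assumption made for the example: the group count, the values of `mu`, `tau` and `sigma`, and the simplification that `tau` and `sigma` are known while the grand mean is plugged in for `mu`. A full treatment would estimate or integrate over those too.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-level model: group means are drawn from a shared population,
# and observations are drawn around each group mean.
n_groups, mu, tau, sigma = 8, 0.0, 1.0, 2.0
group_sizes = rng.integers(3, 30, size=n_groups)
theta = rng.normal(mu, tau, size=n_groups)  # latent group-level parameters
y = [rng.normal(t, sigma, size=n) for t, n in zip(theta, group_sizes)]

# No pooling: each group is estimated on its own.
group_means = np.array([obs.mean() for obs in y])
# Complete pooling: a single estimate shared by every group.
grand_mean = np.concatenate(y).mean()

# Partial pooling: precision-weighted compromise between the two,
# treating tau and sigma as known and plugging in the grand mean for mu.
weight = (group_sizes / sigma**2) / (group_sizes / sigma**2 + 1 / tau**2)
partial = weight * group_means + (1 - weight) * grand_mean

for n, gm, pp in zip(group_sizes, group_means, partial):
    print(f"n={n:2d}  no pooling={gm:+.2f}  partial pooling={pp:+.2f}")
```

Small groups get a small `weight` and are pulled strongly towards the grand mean; large groups mostly keep their own mean.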

In certain cute cases (i.e. linear, homoskedastic) these problems become deconvolution. (🏗 explain what I mean here and why I bothered to say it.) See ANOVA for an important special case. More generally, we sometimes find it convenient to use hierarchical generalised linear models, which have all manner of nice properties for inference.
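One possible reading of the deconvolution remark, offered as a sketch rather than as the intended explanation: suppose each observation is a latent quantity plus independent additive noise whose distribution is the same everywhere (the homoskedastic case). The symbols $g$ and $h$ are illustrative.

$$
y_i = \theta_i + \varepsilon_i,\quad \theta_i \sim g,\quad \varepsilon_i \sim h \ \text{independently of } \theta_i
\quad\Longrightarrow\quad
p(y) = (g * h)(y) = \int g(\theta)\, h(y - \theta)\, \mathrm{d}\theta .
$$

Recovering the latent distribution $g$ from samples of $y$ when $h$ is known is then literally a deconvolution problem. When the noise distribution depends on $\theta_i$, the marginal is a more general mixture and this convolution structure is lost.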

In the case that you have many layers of hidden variables and don’t expect any of them to correspond to a “real” state so much as simply to approximate the unknown function better, you just discovered a deep neural network, possibly even a probabilistic neural network. Ranzato (2013), for example, explicitly discusses them in this way.

Thomas Wiecki wrote:

The Best Of Both Worlds: Hierarchical Linear Regression in PyMC3

Why hierarchical models are awesome, tricky, and Bayesian:

[…] want to take the opportunity to make another point that is not directly related to hierarchical models but can be demonstrated quite well here. Usually when talking about the perks of Bayesian statistics we talk about priors, uncertainty, and flexibility when coding models using Probabilistic Programming. However, an even more important property is rarely mentioned because it is much harder to communicate. @rosstaylor touched on this point in his tweet:

It’s interesting that many summarize Bayes as being about priors; but real power is its focus on integrals/expectations over maxima/modes

Michael Betancourt makes a similar point when he says “Expectations are the only thing that make sense.”

But what’s wrong with maxima/modes? Aren’t those really close to the posterior mean (i.e. the expectation)? Unfortunately, that’s only the case for the simple models we teach to build up intuitions. In complex models, like the hierarchical one, the MAP can be far away and not be interesting or meaningful at all. […]

This strong divergence of the MAP and the Posterior Mean does not only happen in hierarchical models but also in high dimensional ones, where our intuitions from low-dimensional spaces get twisted in serious ways. …

[…] Final disclaimer: This might provide the impression that this is a property of being in a Bayesian framework, which is not true. Technically, we can talk about Expectations vs Modes irrespective of that. Bayesian statistics just happens to provide a very intuitive and flexible framework for expressing and estimating these models.
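A small numerical sketch of the mode-versus-expectation point. It uses Neal's funnel, a standard toy hierarchical density that is swapped in here for illustration and does not come from the quoted post. The joint density stands in for a posterior, and `n`, `scale` and the grid limits are arbitrary choices.

```python
import numpy as np

# Neal's funnel, a toy hierarchical model (an assumption for illustration):
#   v ~ Normal(0, scale^2),   x_i | v ~ Normal(0, exp(v)),   i = 1..n
n, scale = 9, 3.0

def joint_log_density(v, x):
    """Unnormalised log p(v, x) for the funnel; v may be a scalar or an array."""
    x = np.asarray(x, dtype=float)
    log_p_v = -0.5 * (v / scale) ** 2
    log_p_x_given_v = -0.5 * x.size * v - 0.5 * np.sum(x**2) * np.exp(-v)
    return log_p_v + log_p_x_given_v

# For any v the density is maximised over x at x = 0,
# so the joint mode lies on that line; search over v on a grid.
v_grid = np.linspace(-80.0, 20.0, 100_001)
v_mode = v_grid[np.argmax(joint_log_density(v_grid, np.zeros(n)))]

# The mean of v is estimated by Monte Carlo from its marginal, Normal(0, scale^2).
rng = np.random.default_rng(1)
v_mean = rng.normal(0.0, scale, size=200_000).mean()

print(f"v at the joint mode: {v_mode:.2f}")  # analytically -n * scale**2 / 2
print(f"mean of v:           {v_mean:.2f}")  # analytically 0
```

The mode sits at `v = -n * scale**2 / 2`, deep in the neck of the funnel where almost no probability mass lives, and it moves further from the mean as the number of lower-level variables `n` grows.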

Some of Andrew Gelman’s blog posts on hierarchical models provide helpful context (1, 2, 3).

## Teaching

See this nice animated demonstration.

## Cluster randomized trials

Melanie Bell, Cluster Randomized Trials

Cluster randomized trials (CRTs) are studies where groups of people, rather than individuals, are randomly allocated to intervention or control. While these types of designs can be appropriate and useful for many research settings, care must be taken to correctly design and analyze them. This talk will give an overview of cluster trials, and various methodological research projects on cluster trials that I’ve undertaken: designing CRTs, the use of GEE with a small number of clusters, handling missing data in CRTs, and analysis using mixed models.
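To see why the analysis needs care, here is a minimal simulation sketch. It is not taken from the talk: the trial dimensions, the variance components and the helper names `design_effect` and `se_of_difference` are all hypothetical. It compares a standard error that wrongly treats individuals as independent with one computed from cluster means, alongside the usual design effect $1 + (m - 1)\rho$ for equal cluster size $m$ and intra-cluster correlation $\rho$.

```python
import numpy as np

def design_effect(cluster_size, icc):
    """Variance inflation from cluster randomisation with equal cluster sizes."""
    return 1.0 + (cluster_size - 1) * icc

# Hypothetical trial: 10 clusters per arm, 50 people per cluster, no true effect.
rng = np.random.default_rng(0)
clusters_per_arm, cluster_size = 10, 50
sd_cluster, sd_individual = 1.0, 1.0
arm = np.repeat([0, 1], clusters_per_arm)  # treatment arm of each cluster
cluster_effect = rng.normal(0.0, sd_cluster, size=arm.size)
noise = rng.normal(0.0, sd_individual, size=(arm.size, cluster_size))
y = cluster_effect[:, None] + noise  # rows are clusters, columns are individuals

def se_of_difference(a, b):
    """Standard error of a difference in means of two independent samples."""
    return np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)

# Naive analysis: pretend all individuals are independent.
naive_se = se_of_difference(y[arm == 0].ravel(), y[arm == 1].ravel())
# Cluster-level analysis: one summary value per cluster.
cluster_se = se_of_difference(y[arm == 0].mean(axis=1), y[arm == 1].mean(axis=1))

icc = sd_cluster**2 / (sd_cluster**2 + sd_individual**2)
print(f"design effect: {design_effect(cluster_size, icc):.1f}")
print(f"naive SE: {naive_se:.3f}, cluster-level SE: {cluster_se:.3f}")
```

The naive standard error is much too small because individuals in the same cluster share a random effect. A mixed model with a random intercept per cluster is one way to account for that while still using individual-level data.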

## Implementations

Just see probabilistic programming.

## References

*Sociological Methods & Research*, June, 0049124115589052.

*Trends in Ecology & Evolution* 24 (3): 127–35.

*Journal of the American Statistical Association* 88 (421): 9–25.

*The R Journal* 10 (1): 395–411.

*Journal of Time Series Analysis*, January, n/a–.

*arXiv:2011.07051 [Econ, Stat]*, November.

*Journal of the American Statistical Association* 104 (487): 1015–28.

*Technometrics* 48 (3): 432–35.

*Regression and other stories*. Cambridge, UK: Cambridge University Press.

*Journal of Educational and Behavioral Statistics* 40 (5): 530–43.

*Journal of Econometrics* 140 (2): 670–94.

*Computer* 42 (8): 30–37.

*Biometrika* 88 (4): 987–1006.

*Journal of the Royal Statistical Society: Series C (Applied Statistics)* 55 (2): 139–85.

*Bernoulli* 13 (3): 601–22.

*Biometrika* 73 (3): 645–56.

*Mathematical Models of Social Evolution: A Guide for the Perplexed*. University Of Chicago Press.

*The Chicago Guide to Writing about Multivariate Analysis*. Second edition. Chicago Guides to Writing, Editing, and Publishing. Chicago: University of Chicago Press.

*IEEE Transactions on Pattern Analysis and Machine Intelligence* 35 (9): 2206–22.

*Econometrica* 18 (4): 375.

*Electronic Journal of Statistics* 8 (1): 201–25.

*Journal of Ornithology* 152 (2): 393–408.

*Fisheries Research*, Models in Fisheries Research: GLMs, GAMS and GLMMs, 70 (2–3): 319–37.
