- TODO
- Classic linear models
- Generalised linear models
- Hierarchical generalised linear models
- Generalised additive models
- Generalised additive models for location, scale and shape
- Vector generalised additive models
- Vector generalised hierarchical additive models for location, scale and shape
- Generalised estimating equations
- GGLLM
- References

Using the machinery of linear regression to predict in somewhat more general settings, via least-squares or quasi-likelihood approaches.
This means you are still doing something *like* familiar linear regression, but outside the setting of, e.g., a linear response with homoskedastic Gaussian noise.

## TODO

Discover the magical powers of log-concavity and what they enable.

## Classic linear models

Consider the original linear model. We have a (column) vector \(\mathbf{y}=[y_1,y_2,\dots,y_n]^T\) of \(n\) observations and an \(n\times p\) matrix \(\mathbf{X}\) of \(p\) covariates, where each column corresponds to a different covariate and each row to a different observation.

We assume the observations are related to the covariates by \[ \mathbf{y}=\mathbf{Xb}+\mathbf{e} \] where \(\mathbf{b}=[b_1,b_2,\dots,b_p]^T\) gives the parameters of the model, which we don’t yet know. We call \(\mathbf{e}\) the “residual” vector. Legendre and Gauss pioneered the estimation of the parameters of a linear model by minimising the squared residuals, \(\mathbf{e}^T\mathbf{e}\), i.e. \[ \begin{aligned}\hat{\mathbf{b}} &=\operatorname{arg min}_\mathbf{b} (\mathbf{y}-\mathbf{Xb})^T (\mathbf{y}-\mathbf{Xb})\\ &=\operatorname{arg min}_\mathbf{b} \|\mathbf{y}-\mathbf{Xb}\|_2^2\\ &=\mathbf{X}^+\mathbf{y} \end{aligned} \] where we find the pseudo-inverse \(\mathbf{X}^+\) using a numerical solver of some kind, using one of the many carefully optimised methods that exist for least squares.

So far there is no statistical argument, merely function approximation.

However, it turns out that if you assume the \(e_i\) are i.i.d. random errors (or at least independent with constant variance), then there is also a statistical justification for this procedure: least squares is then the best linear unbiased estimator by the Gauss–Markov theorem, and under Gaussian noise it coincides with maximum likelihood.
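As a concrete sketch, the least-squares solution is one line in most numerical environments (numpy here; the data are simulated purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3
X = rng.normal(size=(n, p))                # design matrix of covariates
b_true = np.array([2.0, -1.0, 0.5])
y = X @ b_true + 0.1 * rng.normal(size=n)  # observations with additive noise

# Least-squares estimate: b_hat minimises ||y - Xb||_2^2
b_hat, *_ = np.linalg.lstsq(X, y, rcond=None)

# Equivalently, apply the pseudo-inverse X^+ directly
b_pinv = np.linalg.pinv(X) @ y
```

`lstsq` solves the problem via an SVD rather than forming the pseudo-inverse explicitly, which is the numerically preferable route.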

TODO: more exposition of these. Linkage to maximum likelihood.

## Generalised linear models

The original extension. Kenneth Tay’s explanation is simple and efficient.

To learn:

- When can we do this? e.g. must the response really come from an exponential family? What happens if it doesn’t?
- Does anything funky happen with regularisation? If so, what?
- model selection theory

### Response distribution

TODO: What constraints do we have here?

### Linear Predictor

TODO

### Link function

An invertible (typically monotonic) function relating the linear predictor to the mean of the response distribution.
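To make the three ingredients concrete, here is a hand-rolled sketch of iteratively reweighted least squares for a Poisson response with log link (simulated data; in practice you would reach for, e.g., R’s `glm` or statsmodels):

```python
import numpy as np

def irls_poisson(X, y, n_iter=25):
    """Fit a Poisson GLM with log link by iteratively reweighted least squares."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ beta            # linear predictor
        mu = np.exp(eta)          # inverse link maps eta to the response mean
        W = mu                    # Poisson variance function: Var(Y) = mu
        z = eta + (y - mu) / mu   # working response
        XtW = X.T * W             # weighted least-squares step:
        beta = np.linalg.solve(XtW @ X, XtW @ z)  # (X'WX) beta = X'Wz
    return beta

rng = np.random.default_rng(0)
n = 500
x = rng.uniform(-1, 1, n)
X = np.column_stack([np.ones(n), x])
beta_true = np.array([0.5, 0.8])
y = rng.poisson(np.exp(X @ beta_true))

beta_hat = irls_poisson(X, y)
```

Each iteration is just a weighted linear regression on a linearised (“working”) response, which is why so much classic linear-model machinery transfers.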

### Quasi-likelihood

A generalisation of likelihood of use in some tricky corners of GLMs. (Wedderburn 1976) used it to provide a unified GLM/ML rationale. I don’t yet understand it. Heyde says (Heyde 1997):

> Historically there are two principal themes in statistical parameter estimation theory
>
> It is now possible to unify these approaches under the general description of quasi-likelihood and to develop the theory of parameter estimation in a very general setting. […]
>
> It turns out that the theory needs to be developed in terms of estimating functions (functions of both the data and the parameter) rather than the estimators themselves. Thus, our focus will be on functions that have the value of the parameter as a root rather than the parameter itself.
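A toy illustration of that root-finding view (my own sketch, not Heyde’s notation): with variance function \(V(\mu)=\mu\), the quasi-score below has the sample mean as its root.

```python
import numpy as np

def quasi_score(mu, y, var_fn):
    """Estimating function: zero in expectation at the true mean."""
    return np.sum((y - mu) / var_fn(mu))

def bisect_root(f, lo, hi, tol=1e-10):
    # the estimator is defined as a root of the estimating function
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = np.random.default_rng(0)
y = rng.poisson(3.0, size=200).astype(float)
# Poisson-like variance function V(mu) = mu: the root is exactly the sample mean
mu_hat = bisect_root(lambda m: quasi_score(m, y, lambda m_: m_), 0.01, 20.0)
```

Note that `quasi_score` only uses the first two moments of the response; no full likelihood is ever written down.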

## Hierarchical generalised linear models

GLM + hierarchical model = HGLM.

## Generalised additive models

Generalised generalised linear models. Semiparametric simultaneous discovery of some non-linear predictors and their response curve under the assumption that the interaction is additive in the transformed predictors \[ g(\operatorname{E}(Y))=\beta_0 + f_1(x_1) + f_2(x_2)+ \cdots + f_m(x_m). \]

These have now also been generalised in the obvious way.

## Generalised additive models for location, scale and shape

Folding GARCH and other regression models into GAMs.

GAMLSS is a modern distribution-based approach to (semiparametric) regression models, where all the parameters of the assumed distribution for the response can be modelled as additive functions of the explanatory variables.
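A minimal sketch of the idea in the Gaussian case: regress both the mean and the (log) scale on the covariate by maximising the likelihood directly. This is plain gradient descent in numpy; real GAMLSS software uses smooth terms and much cleverer fitting.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x = rng.uniform(-1, 1, n)
# Both distribution parameters depend on x: mu = 1 + 2x, log(sigma) = -1 + 0.5x
y = rng.normal(1.0 + 2.0 * x, np.exp(-1.0 + 0.5 * x))

theta = np.zeros(4)  # (a, b, c, d) for mu = a + b*x, log(sigma) = c + d*x
for _ in range(5000):
    a, b, c, d = theta
    r = y - (a + b * x)
    sig2 = np.exp(2.0 * (c + d * x))
    g_mu = -r / sig2              # d NLL / d mu
    g_ls = 1.0 - r**2 / sig2      # d NLL / d log(sigma)
    grad = np.array([g_mu.mean(), (g_mu * x).mean(),
                     g_ls.mean(), (g_ls * x).mean()])
    theta = theta - 0.05 * grad   # gradient step on the mean NLL
```

The point is that the scale gets its own “linear predictor” with its own (log) link, exactly parallel to the mean.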

## Vector generalised hierarchical additive models for location, scale and shape

Exercise for the student.

## Generalised estimating equations

TODO

But see Johnny Hong and Kellie Ottoboni. Is this just the quasi-likelihood thing again?

## GGLLM

Generalized² Linear² models (Gordon 2002) unify GLMs with non-linear matrix factorisations.
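The flavour of the idea in the Bernoulli case (sometimes called logistic PCA): factorise the natural-parameter matrix of an exponential-family likelihood. A small gradient-descent sketch in numpy — not Gordon’s actual algorithm:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def logistic_pca(Y, rank=2, lr=0.1, n_iter=1000, seed=0):
    """Bernoulli factorisation: Y_ij ~ Bernoulli(sigmoid((U V^T)_ij))."""
    rng = np.random.default_rng(seed)
    n, m = Y.shape
    U = 0.1 * rng.normal(size=(n, rank))
    V = 0.1 * rng.normal(size=(m, rank))
    for _ in range(n_iter):
        G = sigmoid(U @ V.T) - Y   # NLL gradient in the natural parameters
        U, V = U - lr * (G @ V) / m, V - lr * (G.T @ U) / n
    return U, V

rng = np.random.default_rng(1)
logits = rng.normal(size=(40, 2)) @ rng.normal(size=(30, 2)).T  # low-rank truth
Y = (rng.uniform(size=(40, 30)) < sigmoid(logits)).astype(float)

U, V = logistic_pca(Y)
```

Swapping the Bernoulli likelihood and sigmoid for a Gaussian and the identity recovers ordinary low-rank matrix approximation, which is the “generalised squared” punchline.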

## References

*IEEE Signal Processing Magazine* 23 (2): 154–61.

*arXiv:1708.03395 [Cond-Mat, Physics:math-Ph]*, August.

*Trends in Ecology & Evolution* 24 (3): 127–35.

*arXiv:1609.06764 [Stat]*, September.

*Journal of the American Statistical Association* 88 (421): 9–25.

*Annals of Statistics* 17 (2): 453–510.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 68 (2): 259–80.

*Journal of Time Series Analysis*, January, n/a–n/a.

*arXiv:1606.08650 [Stat]*, June.

*Journal of Statistical Software* 33 (1): 1–22.

*Proceedings of the 15th International Conference on Neural Information Processing Systems*, 593–600. NIPS’02. Cambridge, MA, USA: MIT Press.

*arXiv:1003.0848 [Math, Stat]*, March.

*Generalized Additive Models*. Vol. 43. CRC Press.

*Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation*. New York: Springer.

*Generalized Linear Models with Random Effects*. Monographs on Statistics and Applied Probability 106. Boca Raton, FL: Chapman & Hall/CRC.

*Journal of the Royal Statistical Society: Series C (Applied Statistics)* 61 (3): 403–27.

*European Journal of Operational Research* 16 (3): 285–92.

*Encyclopedia of Statistical Sciences*. John Wiley & Sons, Inc.

*Journal of the Royal Statistical Society. Series A (General)* 135 (3): 370–84.

*International Journal of Financial Engineering and Risk Management* 1 (1): 6–19.

*Journal of Statistical Software* 23 (7): 1–46.

*Flexible Regression and Smoothing: Using GAMLSS in R*.

*Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 3402–10. Curran Associates, Inc.

*Fisheries Research*, Models in Fisheries Research: GLMs, GAMS and GLMMs, 70 (2–3): 319–37.

*Biometrika* 61 (3): 439–47.

*Biometrika* 63 (1): 27–32.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70 (3): 495–518.

*Statistics* 48 (4): 778–86.

*Vector Generalized Linear and Additive Models*. Springer Series in Statistics. New York, NY: Springer New York.

*2007 5th International Symposium on Image and Signal Processing and Analysis*, 435–40. Istanbul, Turkey: IEEE.
