Using the machinery of linear regression to predict in
somewhat more general regressions, using least-squares or
quasi-likelihood approaches.
This means you are still doing something *like* familiar linear regression,
but outside the classic setting of, e.g., a linear response with homoskedastic
Gaussian noise.

## TODO

Discover the magical powers of log-concavity and what they enable.

## Classic linear models

Consider the original linear model. We have a (column) vector \(\mathbf{y}=[y_1,y_2,\dots,y_n]^T\) of \(n\) observations and an \(n\times p\) matrix \(\mathbf{X}\) of \(p\) covariates, where each column corresponds to a different covariate and each row to a different observation.

We assume the observations are related to the covariates by

\[ \mathbf{y}=\mathbf{Xb}+\mathbf{e} \]

where \(\mathbf{b}=[b_1,b_2,\dots,b_p]^T\) gives the parameters of the model, which we do not yet know. We call \(\mathbf{e}\) the “residual” vector. Legendre and Gauss pioneered the estimation of the parameters of a linear model by minimising the sum of squared residuals, \(\mathbf{e}^T\mathbf{e}\), i.e.

\[ \begin{aligned}\hat{\mathbf{b}} &=\operatorname{arg min}_\mathbf{b} (\mathbf{y}-\mathbf{Xb})^T (\mathbf{y}-\mathbf{Xb})\\ &=\operatorname{arg min}_\mathbf{b} \|\mathbf{y}-\mathbf{Xb}\|_2\\ &=\mathbf{X}^+\mathbf{y} \end{aligned} \]

where in practice we do not form the pseudo-inverse \(\mathbf{X}^+\) explicitly, but compute \(\mathbf{X}^+\mathbf{y}\) with one of the many carefully optimised numerical methods that exist for least squares (e.g. QR- or SVD-based solvers).
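
A minimal NumPy sketch on synthetic data (the sizes, seed and "true" parameters are illustrative assumptions) comparing three routes to \(\hat{\mathbf{b}}\): the dedicated solver `numpy.linalg.lstsq`, the explicit pseudo-inverse `numpy.linalg.pinv`, and the normal equations \(\mathbf{X}^T\mathbf{X}\mathbf{b}=\mathbf{X}^T\mathbf{y}\).

```python
import numpy as np

rng = np.random.default_rng(42)
n, p = 200, 3
X = rng.normal(size=(n, p))              # n observations of p covariates
b_true = np.array([1.5, -2.0, 0.5])      # hypothetical "true" parameters
y = X @ b_true + 0.1 * rng.normal(size=n)

# Preferred route: a dedicated least-squares solver (SVD-based in NumPy)
b_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Same estimate via the explicit pseudo-inverse, X^+ y
b_pinv = np.linalg.pinv(X) @ y

# Same estimate via the normal equations X^T X b = X^T y
b_normal = np.linalg.solve(X.T @ X, X.T @ y)

# The fitted residuals are orthogonal to every column of X
resid = y - X @ b_lstsq
```

All three agree on a well-conditioned \(\mathbf{X}\) like this one. The normal equations square the condition number of \(\mathbf{X}\), which is one reason the dedicated solvers are preferred when \(\mathbf{X}\) is nearly collinear.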

So far there is no statistical argument, merely function approximation.

However, it turns out that if you assume the \(e_i\) are i.i.d. random errors in the observations (or at least independent, with zero mean and constant variance), then there is also a statistical justification for this idea.
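
A sketch of the link to maximum likelihood, under the stronger assumption that the errors are Gaussian, \(e_i\sim\mathcal{N}(0,\sigma^2)\) i.i.d., writing \(\mathbf{x}_i^T\) for the \(i\)th row of \(\mathbf{X}\). The log-likelihood is a constant minus a positive multiple of the squared residuals, so for any fixed \(\sigma^2\) its maximiser over \(\mathbf{b}\) is exactly the least-squares \(\hat{\mathbf{b}}\). (Under the weaker assumption of uncorrelated, zero-mean, constant-variance errors, the Gauss–Markov theorem instead says that \(\hat{\mathbf{b}}\) is the best linear unbiased estimator.)

\[ \begin{aligned}\log p(\mathbf{y}\mid\mathbf{X},\mathbf{b},\sigma^2) &=\sum_{i=1}^n \left[-\frac{1}{2}\log(2\pi\sigma^2)-\frac{(y_i-\mathbf{x}_i^T\mathbf{b})^2}{2\sigma^2}\right]\\ &=-\frac{n}{2}\log(2\pi\sigma^2)-\frac{1}{2\sigma^2}(\mathbf{y}-\mathbf{Xb})^T(\mathbf{y}-\mathbf{Xb}) \end{aligned} \]

\[ \operatorname{arg max}_\mathbf{b} \log p(\mathbf{y}\mid\mathbf{X},\mathbf{b},\sigma^2)=\operatorname{arg min}_\mathbf{b} (\mathbf{y}-\mathbf{Xb})^T(\mathbf{y}-\mathbf{Xb})=\hat{\mathbf{b}} \]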

🏗 More exposition of these. Linkage to maximum likelihood.

## Generalised linear models

The original extension. 🏗 explain.

To learn:

- When can we do this? E.g. must the response be from an exponential family for really real? What happens if not?
- Does anything funky happen with regularisation? What?
- When you combine all these fancy GLM extensions, how do you work out if your parameters are identifiable?
- Non-monotonic relations between predictors: how does one handle these?
- Model selection?

### Response distribution

🏗 What constraints do we have here?

### Linear predictor

🏗

### Link function

An invertible (monotonic?) function \(g\) relating the linear predictor \(\boldsymbol{\eta}=\mathbf{Xb}\) to the mean \(\boldsymbol{\mu}\) of the response distribution, via \(g(\boldsymbol{\mu})=\boldsymbol{\eta}\).
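
Fitting a GLM reuses the linear-regression machinery through iteratively reweighted least squares (IRLS): linearise the link around the current fit, then solve a weighted least-squares problem for a "working response". A minimal sketch, assuming a Poisson response with the canonical log link \(g(\mu)=\log\mu\); the function name `poisson_irls`, its defaults and the synthetic data are illustrative, not a library API.

```python
import numpy as np

def poisson_irls(X, y, n_iter=50, tol=1e-10):
    """Fit a Poisson GLM with log link, g(mu) = log(mu), by IRLS (hypothetical helper)."""
    b = np.zeros(X.shape[1])
    for _ in range(n_iter):
        eta = X @ b                     # linear predictor
        mu = np.exp(eta)                # inverse link: mu = g^{-1}(eta)
        # Log link: d eta / d mu = 1 / mu, and Poisson variance V(mu) = mu,
        # so the working weights are mu and the working response is
        # z = eta + (y - mu) / mu.
        w = mu
        z = eta + (y - mu) / mu
        sw = np.sqrt(w)
        # One weighted least-squares solve, done as ordinary least squares
        # on rows rescaled by sqrt(w)
        b_new, *_ = np.linalg.lstsq(X * sw[:, None], z * sw, rcond=None)
        if np.max(np.abs(b_new - b)) < tol:
            return b_new
        b = b_new
    return b

rng = np.random.default_rng(0)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])   # intercept + one covariate
b_true = np.array([0.5, 0.3])                            # hypothetical true coefficients
y = rng.poisson(np.exp(X @ b_true))
b_hat = poisson_irls(X, y)
```

For a canonical link such as this one, IRLS coincides with Newton's method on the log-likelihood, and at convergence the score equations \(\mathbf{X}^T(\mathbf{y}-\hat{\boldsymbol{\mu}})=\mathbf{0}\) hold.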

### Quasi-likelihood

A generalisation of likelihood, useful in some tricky corners of GLMs. Wedderburn (1974) used it to provide a unified GLM/ML rationale.

I don’t yet understand it.

Heyde (1997) says:

> Historically there are two principal themes in statistical parameter estimation theory
>
> It is now possible to unify these approaches under the general description of quasi-likelihood and to develop the theory of parameter estimation in a very general setting.
>
> …It turns out that the theory needs to be developed in terms of estimating functions (functions of both the data and the parameter) rather than the estimators themselves. Thus, our focus will be on functions that have the value of the parameter as a root rather than the parameter itself.
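
A sketch of an estimating function in action. In Wedderburn's quasi-likelihood one specifies only the mean, through a link, and a variance function, \(\operatorname{Var}(y_i)=\phi V(\mu_i)\); the estimate is a root of the quasi-score \(\mathbf{U}(\mathbf{b})=\sum_i \frac{\partial\mu_i}{\partial\mathbf{b}}\frac{y_i-\mu_i}{\phi V(\mu_i)}\), whose root does not depend on \(\phi\). Here, as an assumed example, the responses are continuous proportions in \((0,1)\), so no binomial likelihood applies, with a logit link and \(V(\mu)=\mu(1-\mu)\). The root is found with `scipy.optimize.root`; the helper names and the Beta-distributed synthetic data are hypothetical.

```python
import numpy as np
from scipy.optimize import root

def expit(eta):
    """Inverse logit link."""
    return 1.0 / (1.0 + np.exp(-eta))

def quasi_score(b, X, y):
    """Quasi-score U(b) for a logit link with V(mu) = mu (1 - mu), constant phi dropped.
    d mu_i / d eta_i = mu_i (1 - mu_i) cancels V(mu_i), leaving X^T (y - mu)."""
    return X.T @ (y - expit(X @ b))

def quasi_score_jac(b, X, y):
    """Jacobian of U(b): -X^T diag(mu (1 - mu)) X."""
    mu = expit(X @ b)
    return -(X * (mu * (1.0 - mu))[:, None]).T @ X

def fit_quasi_binomial(X, y):
    sol = root(quasi_score, np.zeros(X.shape[1]), args=(X, y), jac=quasi_score_jac)
    b_hat = sol.x
    mu = expit(X @ b_hat)
    n, p = X.shape
    # Pearson estimate of phi, which the estimating equation leaves unspecified
    phi_hat = np.sum((y - mu) ** 2 / (mu * (1.0 - mu))) / (n - p)
    return b_hat, phi_hat

rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.uniform(-2, 2, size=n)])
b_true = np.array([-0.5, 0.8])           # hypothetical true coefficients
mu_true = expit(X @ b_true)
kappa = 20.0                             # Beta precision: Var(y) = mu (1 - mu) / (kappa + 1)
y = rng.beta(mu_true * kappa, (1 - mu_true) * kappa)
b_hat, phi_hat = fit_quasi_binomial(X, y)
```

Only the first two moments enter the estimating equation; the Beta distribution is used purely to simulate data whose variance really is \(\phi\,\mu(1-\mu)\), here with \(\phi=1/21\).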

## Hierarchical generalised linear models

GLM + hierarchical model = HGLM.

## Generalised additive models

Generalised generalised linear models.

Semiparametric, simultaneous discovery of some non-linear transformations of the predictors and their response curve, under the assumption that the effects are additive in the transformed predictors:

\[ g(\operatorname{E}(Y))=\beta_0 + f_1(x_1) + f_2(x_2)+ \cdots + f_m(x_m). \]

These have now also been generalised in the obvious way.
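
Additive models of this form are classically fitted by backfitting (Buja, Hastie, and Tibshirani 1989 analyse it for linear smoothers): cycle over the components, smoothing the partial residuals against one predictor at a time. A minimal sketch, assuming an identity link (so strictly an additive rather than a generalised additive model) and a Gaussian-kernel Nadaraya–Watson smoother; the function names, bandwidth and synthetic data are illustrative assumptions.

```python
import numpy as np

def kernel_smooth(x, r, bandwidth):
    """Nadaraya-Watson smoother: Gaussian-kernel weighted average of r at each x_i."""
    d = (x[:, None] - x[None, :]) / bandwidth
    w = np.exp(-0.5 * d ** 2)
    return (w @ r) / w.sum(axis=1)

def backfit(X, y, bandwidth=0.2, n_iter=50, tol=1e-8):
    """Fit y ~ beta0 + sum_j f_j(x_j) with identity link by backfitting (hypothetical helper)."""
    n, m = X.shape
    beta0 = y.mean()
    f = np.zeros((n, m))                # f[:, j] holds f_j evaluated at the data
    for _ in range(n_iter):
        f_old = f.copy()
        for j in range(m):
            # Partial residual: remove the intercept and every other component
            r = y - beta0 - f.sum(axis=1) + f[:, j]
            f[:, j] = kernel_smooth(X[:, j], r, bandwidth)
            f[:, j] -= f[:, j].mean()   # centre each f_j for identifiability
        if np.max(np.abs(f - f_old)) < tol:
            break
    return beta0, f

rng = np.random.default_rng(0)
n = 400
X = rng.uniform(-2, 2, size=(n, 2))
truth = np.sin(X[:, 0]) + X[:, 1] ** 2  # hypothetical additive signal
y = truth + 0.1 * rng.normal(size=n)
beta0, f = backfit(X, y)
fitted = beta0 + f.sum(axis=1)
```

For a non-identity link, backfitting is usually nested inside an IRLS-style outer loop ("local scoring"). Production GAM software such as Wood's `mgcv` instead uses penalised regression splines with automatic smoothness selection.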

## Generalised additive models for location, scale and shape

Folding GARCH and other regression models into GAMs.

GAMLSS is a modern distribution-based approach to (semiparametric) regression models, where all the parameters of the assumed distribution for the response can be modelled as additive functions of the explanatory variables.
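
A deliberately simplified, fully parametric sketch of the idea: a Gaussian response whose location \(\mu\) and log-scale \(\log\sigma\) are each linear in covariates, fitted by maximum likelihood with `scipy.optimize.minimize`. Real GAMLSS fits replace the linear terms with additive smoothers; the identity/log link choices, function names and synthetic data here are assumptions for illustration, not the interface of the `gamlss` package.

```python
import numpy as np
from scipy.optimize import minimize

def neg_log_lik(theta, X, Z, y):
    """Mean Gaussian negative log-likelihood, mu = X b and log(sigma) = Z c, with gradient."""
    n, p = X.shape
    b, c = theta[:p], theta[p:]
    mu = X @ b                          # location, identity link
    log_sigma = Z @ c                   # scale, log link keeps sigma > 0
    sigma = np.exp(log_sigma)
    z = (y - mu) / sigma
    nll = np.mean(log_sigma + 0.5 * z ** 2) + 0.5 * np.log(2 * np.pi)
    grad_b = -X.T @ (z / sigma) / n
    grad_c = Z.T @ (1.0 - z ** 2) / n
    return nll, np.concatenate([grad_b, grad_c])

def fit_location_scale(X, Z, y):
    theta0 = np.zeros(X.shape[1] + Z.shape[1])
    res = minimize(neg_log_lik, theta0, args=(X, Z, y), jac=True, method="BFGS")
    return res.x[:X.shape[1]], res.x[X.shape[1]:]

rng = np.random.default_rng(2)
n = 3000
x = rng.uniform(-1, 1, size=n)
X = np.column_stack([np.ones(n), x])    # covariates for the location
Z = X                                   # here the same covariates drive the scale
b_true, c_true = np.array([1.0, 2.0]), np.array([-1.0, 0.5])  # hypothetical values
y = X @ b_true + np.exp(Z @ c_true) * rng.normal(size=n)
b_hat, c_hat = fit_location_scale(X, Z, y)
```

Averaging rather than summing the negative log-likelihood keeps the gradient of order one, which makes the optimiser's default tolerances behave sensibly.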

## Generalised hierarchical additive models for location, scale and shape

Exercise for the student.

## References

Atal, B. S. 2006. “The History of Linear Prediction.” *IEEE Signal Processing Magazine* 23 (2): 154–61. https://doi.org/10.1109/MSP.2006.1598091.

Barbier, Jean, Florent Krzakala, Nicolas Macris, Léo Miolane, and Lenka Zdeborová. 2017. “Phase Transitions, Optimal Errors and Optimality of Message-Passing in Generalized Linear Models,” August. http://arxiv.org/abs/1708.03395.

Bolker, Benjamin M., Mollie E. Brooks, Connie J. Clark, Shane W. Geange, John R. Poulsen, M. Henry H. Stevens, and Jada-Simone S. White. 2009. “Generalized Linear Mixed Models: A Practical Guide for Ecology and Evolution.” *Trends in Ecology & Evolution* 24 (3): 127–35. https://doi.org/10.1016/j.tree.2008.10.008.

Boyd, Nicholas, Trevor Hastie, Stephen Boyd, Benjamin Recht, and Michael Jordan. 2016. “Saturating Splines and Feature Selection,” September. http://arxiv.org/abs/1609.06764.

Breslow, N. E., and D. G. Clayton. 1993. “Approximate Inference in Generalized Linear Mixed Models.” *Journal of the American Statistical Association* 88 (421): 9–25. https://doi.org/10.2307/2290687.

Buja, Andreas, Trevor Hastie, and Robert Tibshirani. 1989. “Linear Smoothers and Additive Models.” *The Annals of Statistics* 17 (2): 453–510.

Currie, I. D., M. Durban, and P. H. C. Eilers. 2006. “Generalized Linear Array Models with Applications to Multidimensional Smoothing.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 68 (2): 259–80. https://doi.org/10.1111/j.1467-9868.2006.00543.x.

Stasinopoulos, Dimitrios, Robert Anthony Rigby, Gillian Heller, Vlasios Voudouris, and Fernanda De Bastiani. n.d. *Flexible Regression and Smoothing: Using GAMLSS in R*. http://www.gamlss.org/wp-content/uploads/2015/07/FlexibleRegressionAndSmoothingDraft-1.pdf.

Eichler, Michael, Rainer Dahlhaus, and Johannes Dueck. 2016. “Graphical Modeling for Multivariate Hawkes Processes with Nonparametric Link Functions.” *Journal of Time Series Analysis*, January. https://doi.org/10.1111/jtsa.12213.

Finke, Axel, and Sumeetpal S. Singh. 2016. “Approximate Smoothing and Parameter Estimation in High-Dimensional State-Space Models,” June. http://arxiv.org/abs/1606.08650.

Friedman, Jerome, Trevor Hastie, and Rob Tibshirani. 2010. “Regularization Paths for Generalized Linear Models via Coordinate Descent.” *Journal of Statistical Software* 33 (1): 1–22. https://doi.org/10.18637/jss.v033.i01.

Hansen, Niels Richard. 2010. “Penalized Maximum Likelihood Estimation for Generalized Linear Point Processes,” March. http://arxiv.org/abs/1003.0848.

Hastie, Trevor J., and Robert J. Tibshirani. 1990. *Generalized Additive Models*. Vol. 43. CRC Press. https://books.google.com.au/books?hl=en&lr=&id=qa29r1Ze1coC&oi=fnd&pg=PR13&ots=j32OnmAYkL&sig=uIjcDemVVYQpa1hDj4ip8OK4gcE.

Heyde, C. C. 1997. *Quasi-Likelihood and Its Application: A General Approach to Optimal Parameter Estimation*. New York: Springer. http://site.ebrary.com/id/10015678.

Lee, Youngjo, John A. Nelder, and Yudi Pawitan. 2006. *Generalized Linear Models with Random Effects*. Monographs on Statistics and Applied Probability 106. Boca Raton, FL: Chapman & Hall/CRC.

Mayr, Andreas, Nora Fenske, Benjamin Hofner, Thomas Kneib, and Matthias Schmid. 2012. “Generalized Additive Models for Location, Scale and Shape for High Dimensional Data—a Flexible Approach Based on Boosting.” *Journal of the Royal Statistical Society: Series C (Applied Statistics)* 61 (3): 403–27. https://doi.org/10.1111/j.1467-9876.2011.01033.x.

McCullagh, Peter. 1984. “Generalized Linear Models.” *European Journal of Operational Research* 16 (3): 285–92. https://doi.org/10.1016/0377-2217(84)90282-0.

Nelder, J. A., and R. J. Baker. 2004. “Generalized Linear Models.” In *Encyclopedia of Statistical Sciences*. John Wiley & Sons, Inc. http://onlinelibrary.wiley.com/doi/10.1002/0471667196.ess0866.pub2/abstract.

Nelder, J. A., and R. W. M. Wedderburn. 1972. “Generalized Linear Models.” *Journal of the Royal Statistical Society. Series A (General)* 135 (3): 370–84. https://doi.org/10.2307/2344614.

Scandroglio, Giacomo, Andrea Gori, Emiliano Vaccaro, and Vlasios Voudouris. 2013. “Estimating VaR and ES of the Spot Price of Oil Using Futures-Varying Centiles.” *International Journal of Financial Engineering and Risk Management* 1 (1): 6–19. https://doi.org/10.1504/IJFERM.2013.053713.

Stasinopoulos, D. Mikis, Robert A. Rigby, and others. 2007. “Generalized Additive Models for Location Scale and Shape (GAMLSS) in R.” *Journal of Statistical Software* 23 (7): 1–46. https://doi.org/10.18637/jss.v023.i07.

Thrampoulidis, Christos, Ehsan Abbasi, and Babak Hassibi. 2015. “LASSO with Non-Linear Measurements Is Equivalent to One with Linear Measurements.” In *Advances in Neural Information Processing Systems 28*, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 3402–10. Curran Associates, Inc. http://papers.nips.cc/paper/5739-lasso-with-non-linear-measurements-is-equivalent-to-one-with-linear-measurements.pdf.

Venables, W. N., and C. M. Dichmont. 2004. “GLMs, GAMs and GLMMs: An Overview of Theory for Applications in Fisheries Research.” *Fisheries Research*, Models in Fisheries Research: GLMs, GAMS and GLMMs, 70 (2–3): 319–37. https://doi.org/10.1016/j.fishres.2004.08.011.

Wedderburn, R. W. M. 1974. “Quasi-Likelihood Functions, Generalized Linear Models, and the Gauss—Newton Method.” *Biometrika* 61 (3): 439–47. https://doi.org/10.1093/biomet/61.3.439.

———. 1976. “On the Existence and Uniqueness of the Maximum Likelihood Estimates for Certain Generalized Linear Models.” *Biometrika* 63 (1): 27–32. https://doi.org/10.1093/biomet/63.1.27.

Wood, Simon N. 2008. “Fast Stable Direct Fitting and Smoothness Selection for Generalized Additive Models.” *Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70 (3): 495–518. https://doi.org/10.1111/j.1467-9868.2007.00646.x.

Xia, Tian, Xue-Ren Wang, and Xue-Jun Jiang. 2014. “Asymptotic Properties of Maximum Quasi-Likelihood Estimator in Quasi-Likelihood Nonlinear Models with Misspecified Variance Function.” *Statistics* 48 (4): 778–86. https://doi.org/10.1080/02331888.2013.829060.