Regression estimation with penalties on the model parameters. I am especially interested in the case where the penalties are sparsifying, and I have more notes on sparse regression.

Here I consider general penalties: ridge and so on. At least in principle; at the moment I have no active projects using penalties that are not sparsifying.

Why might I use such penalties? One reason is that \(L_2\) penalties admit simple forms for their information criteria (Konishi and Kitagawa 2008, sec. 5.2.4).
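To make that concrete, here is a minimal numpy sketch (synthetic data, all variable names mine) of the ridge estimator in closed form, with the trace of the hat matrix giving the effective degrees of freedom that feeds into such criteria:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.0, 0.5]
y = X @ beta + 0.1 * rng.standard_normal(n)

lam = 1.0
# Closed-form ridge estimate: (X'X + lam I)^{-1} X'y
beta_hat = np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

# Effective degrees of freedom = trace of the hat matrix
# H = X (X'X + lam I)^{-1} X', which shrinks below p as lam grows
H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
df = np.trace(H)
```

For \(\lambda > 0\) the effective degrees of freedom is strictly less than \(p\), which is exactly the quantity that replaces the raw parameter count in the penalised information criteria.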

See also matrix factorisations, optimisation, multiple testing, concentration inequalities, sparse flavoured icecream.

To discuss: ridge penalties, the relationship with robust regression, statistical learning theory, etc.

In nonparametric statistics we might simultaneously estimate what look like many, many parameters, constrained in some clever fashion that usually boils down to something we can interpret as a “penalty” on the parameters.
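A toy instance of this: a Whittaker-style smoother fits one parameter per observation, which would be hopeless without a roughness penalty taming them all at once. A sketch in numpy (synthetic data, my own naming), using a second-difference penalty:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
t = np.linspace(0, 1, n)
y = np.sin(2 * np.pi * t) + 0.3 * rng.standard_normal(n)

# One fitted value per observation: n "parameters" for n data points,
# constrained by penalising squared second differences (roughness).
D = np.diff(np.eye(n), n=2, axis=0)   # (n-2, n) second-difference operator
lam = 50.0
# Minimise ||y - f||^2 + lam * ||D f||^2  =>  (I + lam D'D) f = y
f_hat = np.linalg.solve(np.eye(n) + lam * D.T @ D, y)
```

The fitted curve necessarily has less roughness (smaller second differences) than the raw data, which is the penalty doing its constraining job.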

“Penalisation” has a genealogy unknown to me, but it is probably the least abstruse term for common, general usage.

The “regularisation” nomenclature claims descent from Tikhonov (e.g. Tikhonov and Glasko 1965), who wanted to solve ill-conditioned integral and differential equations, so it is somewhat more general. “Smoothing” seems to be common in the spline and kernel estimation communities (Silverman 1982, 1984; Wahba 1990, et al.), whose members usually do actually want to smooth curves. When you say “smoothing” you usually mean that you can express your predictions via a “linear smoother”, i.e. a hat matrix, which has certain nice properties under generalised cross-validation.
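Concretely, for a linear smoother with hat matrix \(H\), the Golub–Heath–Wahba GCV score is \(n^{-1}\|y - Hy\|^2 / (1 - \operatorname{tr}(H)/n)^2\), which we can minimise over the penalty weight. A sketch using a ridge smoother on synthetic data (names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 60, 8
X = rng.standard_normal((n, p))
y = X[:, 0] - 0.5 * X[:, 1] + 0.2 * rng.standard_normal(n)

def gcv(lam):
    # For a linear smoother, yhat = H y with H = X (X'X + lam I)^{-1} X'
    H = X @ np.linalg.solve(X.T @ X + lam * np.eye(p), X.T)
    resid = y - H @ y
    # Generalised cross-validation score (Golub-Heath-Wahba form)
    return np.mean(resid ** 2) / (1 - np.trace(H) / n) ** 2

# Pick the penalty weight by minimising GCV over a log-spaced grid
lams = np.logspace(-3, 2, 30)
best = min(lams, key=gcv)
```

The point of the linear-smoother structure is that the whole score comes from one trace and one matrix-vector product, with no refitting per held-out fold.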

“Smoothing” is not a great general term either, since penalisation does not necessarily cause smoothness; for example, some penalties cause the coefficients to become sparse, which, from the perspective of the coefficient vector, is the opposite of smooth.
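The sparsifying case is easy to see in one line: the proximal operator of the \(L_1\) penalty is soft thresholding, which maps small coefficients to exactly zero rather than smoothing them. For example:

```python
import numpy as np

def soft_threshold(z, t):
    # Proximal operator of the L1 penalty with weight t:
    # shrinks every entry toward zero, and anything in [-t, t]
    # becomes exactly zero -- hence sparsity, not smoothness.
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

z = np.array([3.0, -0.4, 0.1, -2.5, 0.0])
print(soft_threshold(z, 0.5))  # the small entries come out exactly zero
```

Contrast this with the ridge proximal operator, which merely rescales every coefficient and never produces an exact zero.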

In every case you wish to solve an ill-conditioned inverse problem, so you tame it by penalising solutions that one should be reluctant to accept.
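A tiny demonstration of the taming, assuming nothing beyond numpy: two nearly collinear columns make least squares ill-conditioned, and a small Tikhonov penalty stabilises the solution by penalising the enormous, delicately cancelling coefficients that the unpenalised fit is prone to produce.

```python
import numpy as np

# An ill-conditioned design: two nearly collinear columns
rng = np.random.default_rng(2)
n = 40
x = rng.standard_normal(n)
X = np.column_stack([x, x + 1e-6 * rng.standard_normal(n)])
y = x + 0.1 * rng.standard_normal(n)

# Unpenalised least squares is numerically fragile here and tends
# to produce huge coefficients of opposite sign...
beta_ls, *_ = np.linalg.lstsq(X, y, rcond=None)

# ...while a small Tikhonov penalty keeps the solution modest.
lam = 1e-3
beta_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ y)
```

The penalty does not pretend the problem is well posed; it encodes a preference (small coefficients) among the near-indistinguishable solutions.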

🏗 specifics

## Adaptive regularization

What should we regularize to attain specific kinds of solutions?

Here’s one thing I saw recently:

Venkat Chandrasekaran, *Learning Semidefinite Regularizers via Matrix Factorization*:

Abstract: Regularization techniques are widely employed in the solution of inverse problems in data analysis and scientific computing due to their effectiveness in addressing difficulties due to ill-posedness. In their most common manifestation, these methods take the form of penalty functions added to the objective in optimization-based approaches for solving inverse problems. The purpose of the penalty function is to induce a desired structure in the solution, and these functions are specified based on prior domain-specific expertise. We consider the problem of learning suitable regularization functions from data in settings in which prior domain knowledge is not directly available. Previous work under the title of ‘dictionary learning’ or ‘sparse coding’ may be viewed as learning a polyhedral regularizer from data. We describe generalizations of these methods to learn semidefinite regularizers by computing structured factorizations of data matrices. Our algorithmic approach for computing these factorizations combines recent techniques for rank minimization problems along with operator analogs of Sinkhorn scaling. The regularizers obtained using our framework can be employed effectively in semidefinite programming relaxations for solving inverse problems. (Joint work with Yong Sheng Soh)

## References

*Proceedings of the Second International Symposium on Information Theory*, edited by B. N. Petrov and F. Csáki, 199–213. Budapest: Akademiai Kiado.

*Biometrika* 60 (2): 255–65.

*arXiv:1506.00898 [Cs, Math, Stat]*, June.

*arXiv:0901.3202 [Cs, Stat]*.

*Advances in Neural Information Processing Systems 27*, edited by Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, 1556–64. Curran Associates, Inc.

*Information Theory Workshop, 2008. ITW’08. IEEE*, 247–57. IEEE.

*Neural Computation* 4 (2): 141–66.

*arXiv:1609.06675 [Math, Stat]*, September.

*Test* 15 (2): 271–344.

*The Annals of Statistics* 32 (4): 1723–43.

*Statistics for High-Dimensional Data*, 77–97. Springer Series in Statistics. Springer Berlin Heidelberg.

*arXiv:1503.06426 [Stat]* 9 (1): 1449–73.

*Biometrika* 82 (4): 877–86.

*Journal of Fourier Analysis and Applications* 19 (6): 1229–54.

*Proceedings of the IEEE* 98 (6): 925–36.

*Statistics & Probability Letters* 33 (2): 201–8.

*Annual Review of Economics* 7 (1): 649–88.

*Journal of the American Statistical Association* 99 (467): 619–32.

*The Annals of Statistics* 32 (2): 407–99.

*arXiv:1302.2068 [Stat]*, February.

*Journal of Statistical Software* 33 (1): 1–22.

*Journal of the American Statistical Association* 114 (525): 445–52.

*Scandinavian Journal of Statistics* 41 (1): 72–86.

*arXiv:1409.8557 [Math, Stat]*, September.

*arXiv:1412.5896 [Cs, Math, Stat]*, December.

*Technometrics* 21 (2): 215–23.

*The Annals of Statistics* 18 (2): 758–78.

*IEEE Transactions on Medical Imaging* 9 (1): 84–93.

*Journal of the Royal Statistical Society. Series B (Methodological)* 52 (3): 443–52.

*Journal of the American Statistical Association* 88 (422): 495–504.

*Bioinformatics* 21 (13): 3001–8.

*Generalized Additive Models*. Vol. 43. CRC Press.

*Statistical Learning with Sparsity: The Lasso and Generalizations*. Boca Raton: Chapman and Hall/CRC.

*IEEE Transactions on Image Processing* 22 (6): 2138–50.

*Proceedings of the 32nd International Conference on Machine Learning (ICML-15)*, 928–37.

*Technometrics* 12 (1): 55–67.

*Biometrika* 93 (1): 85–98.

*Biometrika* 102 (2): 479–85.

*Journal of Machine Learning Research* 15 (1): 2869–909.

*Biometrika* 101 (4): 771–84.

*Machine Learning and Knowledge Discovery in Databases*, edited by José Luis Balcázar, Francesco Bonchi, Aristides Gionis, and Michèle Sebag, 66–81. Lecture Notes in Computer Science. Springer Berlin Heidelberg.

*Advances in Statistical Modeling and Inference*, 613–34.

*Information Criteria and Statistical Modeling*. Springer Series in Statistics. New York: Springer.

*Biometrika* 83 (4): 875–90.

*IEEE Transactions on Medical Imaging* 9 (4): 439–46.

*Advances in Neural Information Processing Systems 23*, edited by J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, 1432–40. Curran Associates, Inc.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 72 (4): 417–73.

*The Annals of Applied Statistics* 2 (3): 1013–33.

*Compressed Sensing: Theory and Applications*, 394–438.

*arXiv:0803.2392 [Cs, Math]*, March.

*Advances in Neural Information Processing Systems*, 1313–20. Curran Associates, Inc.

*Proceedings of ICML*.

*Journal of the American Statistical Association* 101 (474): 554–68.

*Technometrics* 46 (3): 306–17.

*Journal of the American Statistical Association* 97 (457): 210–21.

*The Annals of Statistics* 10 (3): 795–810.

*The Annals of Statistics* 12 (3): 898–916.

*Journal of Statistical Software* 39 (5).

*Neural Networks* 11 (4): 637–49.

*arXiv:1609.07415 [Cs, Math, Stat]*, September.

*The Annals of Statistics* 9 (6): 1135–51.

*arXiv:1411.6144 [Stat]*, November.

*USSR Computational Mathematics and Mathematical Physics* 5 (3): 93–107.

*arXiv:1504.06706 [Math, Stat]*, April.

*Spline Models for Observational Data*. SIAM.

*The Annals of Statistics* 46 (6A): 3099–129.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 62 (2): 413–28.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 70 (3): 495–518.

*The Annals of Applied Statistics* 2 (1): 224–44.

*arXiv:1611.03131 [Cs, Stat]*, November.

*Journal of the American Statistical Association* 93 (441): 120–31.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 76 (1): 217–42.

*Journal of the American Statistical Association* 105 (489): 312–23.

*Journal of the Royal Statistical Society: Series B (Statistical Methodology)* 67 (2): 301–20.

*The Annals of Statistics* 35 (5): 2173–92.
