Neural mixtures of experts

Switching regression, mixture of experts

March 29, 2016 — June 11, 2024

Bayes
classification
clustering
compsci
density
ensemble
information
linear algebra
model selection
nonparametric
optimization
probability
regression
sparser than thou
statistics

Mixtures or model combinations in which the gating/mixing function is itself learned, e.g. by a neural network.

Placeholder.
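To make the idea concrete in the meantime, here is a minimal sketch of a mixture of experts with a learned softmax gate, in plain NumPy: two linear experts and a linear gating network trained jointly by gradient descent on squared error. Everything in it (the variable names, the toy piecewise-linear data, the learning rate) is illustrative rather than drawn from any particular reference above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy piecewise-linear data: a different linear regime on each half-line.
X = rng.uniform(-2, 2, size=(512, 1))
y = np.where(X[:, 0] < 0, -1.5 * X[:, 0] - 1.0, 2.0 * X[:, 0] + 0.5)
y = y + 0.05 * rng.standard_normal(512)

K = 2                                   # number of experts
W = 0.1 * rng.standard_normal((K, 1))   # expert slopes
b = np.zeros(K)                         # expert intercepts
V = 0.1 * rng.standard_normal((K, 1))   # gate weights (the learned mixing function)
c = np.zeros(K)                         # gate biases

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

lr = 0.1
for step in range(2000):
    experts = X @ W.T + b               # (n, K) per-expert predictions
    gate = softmax(X @ V.T + c)         # (n, K) input-dependent mixing weights
    pred = (gate * experts).sum(axis=1)
    err = pred - y

    # Gradients of mean squared error through the mixture.
    d_pred = 2 * err / len(y)                                   # (n,)
    d_experts = d_pred[:, None] * gate                          # (n, K)
    # Softmax Jacobian: d pred / d logit_k = gate_k * (expert_k - pred).
    d_logits = d_pred[:, None] * gate * (experts - pred[:, None])

    W -= lr * d_experts.T @ X
    b -= lr * d_experts.sum(axis=0)
    V -= lr * d_logits.T @ X
    c -= lr * d_logits.sum(axis=0)

# Recompute predictions with the final parameters.
experts = X @ W.T + b
gate = softmax(X @ V.T + c)
pred = (gate * experts).sum(axis=1)
print("final MSE:", np.mean((pred - y) ** 2))
```

The gate learns a soft partition of the input space (here, roughly at x = 0), so each expert specialises in one regime; this is the "switching regression" view in the subtitle.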


References

Barron, Rissanen, and Yu. 1998. “The Minimum Description Length Principle in Coding and Modeling.” IEEE Transactions on Information Theory.
Battey, and Sancetta. 2013. “Conditional Estimation for Dependent Functional Data.” Journal of Multivariate Analysis.
Bernardo, and Smith. 2000. Bayesian Theory.
Bishop, and Svensen. 2012. “Bayesian Hierarchical Mixtures of Experts.” arXiv:1212.2447 [Cs, Stat].
Boyd, Hastie, Boyd, et al. 2016. “Saturating Splines and Feature Selection.” arXiv:1609.06764 [Stat].
Chamroukhi, Pham, Hoang, et al. 2024. “Functional Mixtures-of-Experts.” Statistics and Computing.
Claeskens, and Hjort. 2008. Model Selection and Model Averaging. Cambridge Series in Statistical and Probabilistic Mathematics.
Clarke. 2003. “Comparing Bayes Model Averaging and Stacking When Model Approximation Error Cannot Be Ignored.” The Journal of Machine Learning Research.
Clyde, and Iversen. 2013. “Bayesian Model Averaging in the M-Open Framework.” In Bayesian Theory and Applications.
Dempster. 1973. “Alternatives to Least Squares in Multiple Regression.” Multivariate Statistical Inference.
Draper. 1995. “Assessment and Propagation of Model Uncertainty.” Journal of the Royal Statistical Society: Series B (Methodological).
Eilers, and Marx. 1996. “Flexible Smoothing with B-Splines and Penalties.” Statistical Science.
Fragoso, and Neto. 2015. “Bayesian Model Averaging: A Systematic Review and Conceptual Classification.” arXiv:1509.08864 [Stat].
Hansen. 2007. “Least Squares Model Averaging.” Econometrica.
Hinton, Vinyals, and Dean. 2015. “Distilling the Knowledge in a Neural Network.” arXiv:1503.02531 [Cs, Stat].
Hjort, and Claeskens. 2003. “Frequentist Model Average Estimators.” Journal of the American Statistical Association.
Hoeting, Madigan, Raftery, et al. 1999. “Bayesian Model Averaging: A Tutorial.” Statistical Science.
Hurn, Justel, and Robert. 2003. “Estimating Mixtures of Regressions.” Journal of Computational and Graphical Statistics.
Le, and Clarke. 2022. “Model Averaging Is Asymptotically Better Than Model Selection For Prediction.” Journal of Machine Learning Research.
Leung, and Barron. 2006. “Information Theory and Mixing Least-Squares Regressions.” IEEE Transactions on Information Theory.
Marin, Mengersen, and Robert. 2005. “Bayesian Modelling and Inference on Mixtures of Distributions.” In Handbook of Statistics.
Masegosa. 2020. “Learning Under Model Misspecification: Applications to Variational and Ensemble Methods.” In Proceedings of the 34th International Conference on Neural Information Processing Systems. NIPS’20.
Minka. 2002. “Bayesian Model Averaging Is Not Model Combination.”
Nguyen, TrungTin, Forbes, Arbel, et al. 2023. “Bayesian Nonparametric Mixture of Experts for Inverse Problems.”
Nguyen, TrungTin, Nguyen, Chamroukhi, et al. 2024. “Non-Asymptotic Oracle Inequalities for the Lasso in High-Dimensional Mixture of Experts.”
Nott, Tan, Villani, et al. 2012. “Regression Density Estimation With Variational Methods and Stochastic Approximation.” Journal of Computational and Graphical Statistics.
Piironen, and Vehtari. 2017. “Comparison of Bayesian Predictive Methods for Model Selection.” Statistics and Computing.
Raftery, and Zheng. 2003. “Discussion: Performance of Bayesian Model Averaging.” Journal of the American Statistical Association.
Tan, and Nott. 2014. “Variational Approximation for Mixtures of Linear Mixed Models.” Journal of Computational and Graphical Statistics.
Viele, and Tong. 2002. “Modeling with Mixtures of Linear Regressions.” Statistics and Computing.
Wang, Zhang, and Zou. 2009. “Frequentist Model Averaging Estimation: A Review.” Journal of Systems Science and Complexity.
Waterhouse, MacKay, and Robinson. 1995. “Bayesian Methods for Mixtures of Experts.” In Advances in Neural Information Processing Systems.
Yao, Pirš, Vehtari, et al. 2022. “Bayesian Hierarchical Stacking: Some Models Are (Somewhere) Useful.” Bayesian Analysis.
Zeevi, Meir, and Maiorov. 1998. “Error Bounds for Functional Approximation and Estimation Using Mixtures of Experts.” IEEE Transactions on Information Theory.
Zhang, Tianfang, Bokrantz, and Olsson. 2021. “A Similarity-Based Bayesian Mixture-of-Experts Model.” arXiv:2012.02130 [Cs, Stat].
Zhang, Xinyu, and Liang. 2011. “Focused Information Criterion and Model Averaging for Generalized Additive Partial Linear Models.” The Annals of Statistics.
Zhang, Kaiqi, and Wang. 2022. “Deep Learning Meets Nonparametric Regression: Are Weight-Decayed DNNs Locally Adaptive?”