AutoML



The sub-field of optimisation that aims to automate model selection in machine learning (and occasionally also ensemble construction).

I am aware of two major approaches, which are related in the abstract but differ in practice:

  1. Finding the right architecture for your neural net, a.k.a. architecture search
  2. Hyperparameter optimisation, which I have made into a separate notebook.

The first one I might cover.

TODO: work out if this is the same as “meta learning”? I think not; I suspect that of being transfer learning.

Reinforcement learning approaches

Quoc Le & Barret Zoph discuss using reinforcement learning to learn neural models:

Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large – a typical 10-layer network can have ~10^10 candidate networks! […]

To make this process of designing machine learning models much more accessible, we’ve been exploring ways to automate the design of machine learning models. […] in this blog post, we’ll focus on our reinforcement learning approach and the early results we’ve gotten so far.

In our approach (which we call “AutoML”), a controller neural net can propose a “child” model architecture, which can then be trained and evaluated for quality on a particular task. That feedback is then used to inform the controller how to improve its proposals for the next round.
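
The propose, evaluate, update loop in that quote can be sketched in a few lines. This is a toy illustration and not the method from the blog post: the "controller" here is just a table of logits updated with a basic REINFORCE rule, and `SEARCH_SPACE`, `evaluate_child` and `run_controller` are hypothetical names invented for this sketch. In particular `evaluate_child` returns a made-up score instead of training a network.

```python
import math
import random

# Hypothetical search space: one categorical choice per design decision.
SEARCH_SPACE = {
    "n_layers": [2, 4, 8],
    "width": [16, 64, 256],
    "activation": ["relu", "tanh"],
}


def evaluate_child(arch, rng):
    """Stand-in for training a child model and measuring its quality.

    Assumption: a made-up noisy score that favours 4 layers, width 64
    and relu. A real system would train and validate the network here.
    """
    score = 0.5
    score += 0.2 if arch["n_layers"] == 4 else 0.0
    score += 0.2 if arch["width"] == 64 else 0.0
    score += 0.1 if arch["activation"] == "relu" else 0.0
    return score + rng.gauss(0.0, 0.01)


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def run_controller(n_rounds=3000, lr=0.2, seed=0):
    rng = random.Random(seed)
    # The "controller": one vector of logits per decision.
    logits = {name: [0.0] * len(opts) for name, opts in SEARCH_SPACE.items()}
    baseline = None  # running mean of rewards, used to reduce variance
    for _ in range(n_rounds):
        # 1. The controller proposes a child architecture.
        picks = {}
        for name, opts in SEARCH_SPACE.items():
            probs = softmax(logits[name])
            picks[name] = rng.choices(range(len(opts)), weights=probs)[0]
        arch = {name: SEARCH_SPACE[name][i] for name, i in picks.items()}
        # 2. The child is "trained" and evaluated.
        reward = evaluate_child(arch, rng)
        if baseline is None:
            baseline = reward
        # 3. The feedback nudges the controller (REINFORCE update).
        advantage = reward - baseline
        baseline = 0.9 * baseline + 0.1 * reward
        for name, chosen in picks.items():
            probs = softmax(logits[name])
            for j, p in enumerate(probs):
                grad = (1.0 if j == chosen else 0.0) - p
                logits[name][j] += lr * advantage * grad
    # Report the currently most probable option for each decision.
    return {
        name: SEARCH_SPACE[name][max(range(len(v)), key=v.__getitem__)]
        for name, v in logits.items()
    }


best = run_controller()
print(best)
```

On this toy reward the controller should tend to drift towards the options that `evaluate_child` favours, though a policy-gradient search like this can settle on a suboptimal choice. The expensive part in practice is step 2, since every proposal means training a network.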

Implementations

Lightwood

  • George - Epistemink, Lightwood

    Specifically, George is interested in thinking about the incentive structures of designing, using and contributing to ML software, and trying to think about what useful and rigorous results can actually be achieved by benchmarking of algorithm performance.

  • mindsdb/lightwood: Lightwood is Legos for Machine Learning.

auto-sklearn

  • auto-sklearn, the implementation of hyperparameter optimization by Feurer et al. (2015):

    auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator:

    import autosklearn.classification
    cls = autosklearn.classification.AutoSklearnClassifier()
    cls.fit(X_train, y_train)
    predictions = cls.predict(X_test)

    auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction.
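
To make "algorithm selection and hyperparameter tuning" concrete, here is a minimal sketch of searching over both at once. It is not how auto-sklearn works internally: plain random search stands in for Bayesian optimization, there is no meta-learning or ensemble construction, and the two tiny classifiers, the synthetic data and every name below (`ALGORITHMS`, `random_search`, and so on) are assumptions made up for illustration.

```python
import random


def make_data(n, rng):
    """Toy 1-D binary data: label is 1 when x > 0.3, with 10% label noise."""
    data = []
    for _ in range(n):
        x = rng.uniform(-1.0, 1.0)
        y = int(x > 0.3)
        if rng.random() < 0.1:
            y = 1 - y
        data.append((x, y))
    return data


def fit_threshold(train, t):
    """Classifier with one hyperparameter: a fixed cut-off t (ignores train)."""
    return lambda x: int(x > t)


def fit_knn(train, k):
    """k-nearest-neighbour classifier with majority vote."""
    def predict(x):
        nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
        votes = sum(y for _, y in nearest)
        return int(2 * votes > k)
    return predict


# Each candidate algorithm: (fit function, sampler for its hyperparameter).
ALGORITHMS = {
    "threshold": (fit_threshold, lambda rng: rng.uniform(-1.0, 1.0)),
    "knn": (fit_knn, lambda rng: rng.choice([1, 3, 5, 9, 15])),
}


def accuracy(model, data):
    return sum(model(x) == y for x, y in data) / len(data)


def random_search(train, valid, n_trials=30, seed=0):
    """Sample (algorithm, hyperparameter) pairs; keep the best on validation."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        name = rng.choice(sorted(ALGORITHMS))
        fit, sample = ALGORITHMS[name]
        hp = sample(rng)
        score = accuracy(fit(train, hp), valid)
        if best is None or score > best[0]:
            best = (score, name, hp)
    return best


data_rng = random.Random(42)
train, valid = make_data(200, data_rng), make_data(100, data_rng)
best_score, best_name, best_hp = random_search(train, valid)
print(best_name, best_hp, round(best_score, 3))
```

The point of the sketch is only that the choice of algorithm is treated as one more hyperparameter, so a single search loop covers both.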

References

Abdel-Gawad, Ahmed, and Simon Ratner. 2007. “Adaptive Optimization of Hyperparameters in L2-Regularised Logistic Regression.”
Bengio, Yoshua. 2000. “Gradient-Based Optimization of Hyperparameters.” Neural Computation 12 (8): 1889–1900.
Bergstra, James S., Rémi Bardenet, Yoshua Bengio, and Balázs Kégl. 2011. “Algorithms for Hyper-Parameter Optimization.” In Advances in Neural Information Processing Systems, 2546–54. Curran Associates, Inc.
Bergstra, J, D Yamins, and D D Cox. 2013. “Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures.” In ICML, 9.
Domke, Justin. 2012. “Generic Methods for Optimization-Based Modeling.” In International Conference on Artificial Intelligence and Statistics, 318–26.
Eggensperger, Katharina, Matthias Feurer, Frank Hutter, James Bergstra, Jasper Snoek, Holger H. Hoos, and Kevin Leyton-Brown. n.d. “Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters.”
Eigenmann, R., and J. A. Nossek. 1999. “Gradient Based Adaptive Regularization.” In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468), 87–94.
Elsken, Thomas, Jan Hendrik Metzen, and Frank Hutter. 2019. “Neural Architecture Search: A Survey.” arXiv:1808.05377 [Cs, Stat], April.
Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. “Efficient and Robust Automated Machine Learning.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2962–70. Curran Associates, Inc.
Foo, Chuan-sheng, Chuong B. Do, and Andrew Y. Ng. 2008. “Efficient Multiple Hyperparameter Learning for Log-Linear Models.” In Advances in Neural Information Processing Systems 20, edited by J. C. Platt, D. Koller, Y. Singer, and S. T. Roweis, 377–84. Curran Associates, Inc.
Fu, Jie, Hongyin Luo, Jiashi Feng, Kian Hsiang Low, and Tat-Seng Chua. 2016. “DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks.” In Proceedings of IJCAI, 2016.
Gelbart, Michael A., Jasper Snoek, and Ryan P. Adams. 2014. “Bayesian Optimization with Unknown Constraints.” In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 250–59. UAI’14. Arlington, Virginia, United States: AUAI Press.
Grünewälder, Steffen, Jean-Yves Audibert, Manfred Opper, and John Shawe-Taylor. 2010. “Regret Bounds for Gaussian Process Bandit Problems.” In, 9:273–80.
Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. 2011. “Sequential Model-Based Optimization for General Algorithm Configuration.” In Learning and Intelligent Optimization, 6683:507–23. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
Hutter, Frank, Holger Hoos, and Kevin Leyton-Brown. 2013. “An Evaluation of Sequential Model-Based Optimization for Expensive Blackbox Functions.” In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, 1209–16. GECCO ’13 Companion. New York, NY, USA: ACM.
Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2017. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.” The Journal of Machine Learning Research 18 (1): 6765–6816.
Liu, Hanxiao, Karen Simonyan, and Yiming Yang. 2019. “DARTS: Differentiable Architecture Search.” arXiv:1806.09055 [Cs, Stat], April.
Maclaurin, Dougal, David Duvenaud, and Ryan Adams. 2015. “Gradient-Based Hyperparameter Optimization Through Reversible Learning.” In Proceedings of the 32nd International Conference on Machine Learning, 2113–22. PMLR.
Močkus, J. 1975. “On Bayesian Methods for Seeking the Extremum.” In Optimization Techniques IFIP Technical Conference: Novosibirsk, July 1–7, 1974, edited by G. I. Marchuk, 400–404. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer.
Real, Esteban, Chen Liang, David R. So, and Quoc V. Le. 2020. “AutoML-Zero: Evolving Machine Learning Algorithms From Scratch,” March.
Salimans, Tim, Diederik Kingma, and Max Welling. 2015. “Markov Chain Monte Carlo and Variational Inference: Bridging the Gap.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15), 1218–26. ICML’15. Lille, France: JMLR.org.
Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Advances in Neural Information Processing Systems, 2951–59. Curran Associates, Inc.
Snoek, Jasper, Kevin Swersky, Rich Zemel, and Ryan Adams. 2014. “Input Warping for Bayesian Optimization of Non-Stationary Functions.” In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 1674–82.
Srinivas, Niranjan, Andreas Krause, Sham M. Kakade, and Matthias Seeger. 2012. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design.” IEEE Transactions on Information Theory 58 (5): 3250–65.
Swersky, Kevin, Jasper Snoek, and Ryan P Adams. 2013. “Multi-Task Bayesian Optimization.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 2004–12. Curran Associates, Inc.
Thornton, Chris, Frank Hutter, Holger H. Hoos, and Kevin Leyton-Brown. 2013. “Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 847–55. KDD ’13. New York, NY, USA: ACM.
