AutoML

2017-07-17 — 2020-10-01

Suspiciously similar content

The sub-field of optimization specifically aims to automate model selection in machine learning (and also occasionally ensemble construction).

There are two major approaches that I am aware of, both of which are related in an abstract way but are different in practice:

Finding the right architecture for your neural net, a.k.a architecture search
Hyperparameter optimization, which I have made into a separate notebook.

TODO: When does “AutoML” become “meta-learning”?

1 Reinforcement learning approaches

Quoc Le & Barret Zoph discuss using reinforcement learning to learn neural models:

Typically, our machine learning models are painstakingly designed by a team of engineers and scientists. This process of manually designing machine learning models is difficult because the search space of all possible models can be combinatorially large — a typical 10-layer network can have ~1010 candidate networks! […]

To make this process of designing machine learning models much more accessible, we’ve been exploring ways to automate the design of machine learning models. […] in this blog post, we’ll focus on our reinforcement learning approach and the early results we’ve gotten so far.

In our approach (which we call “AutoML”), a controller neural net can propose a “child” model architecture, which can then be trained and evaluated for quality on a particular task. That feedback is then used to inform the controller how to improve its proposals for the next round.

2 Differentiable architecture search

Various tricks I am not so familiar with. Overview in Elsken, Metzen, and Hutter (2019). See, e.g. Weight sharing for neural architecture search. 🏗️

I think there is a bunch of stuff implemented in archai (blogged).

3 Implementations

3.1 Lightwood

George - Epistemink, Lightwood

Specifically, George is interested in thinking about the incentive structures of designing, using and contributing to ML software, and trying to think about what useful and rigorous results can actually be achieved by benchmarking of algorithm performance.
mindsdb/lightwood: Lightwood is Legos for Machine Learning.

3.2 auto-sklearn

auto-sklearn, The implementation of hyperparameter optimization by Feurer et al. (2015):
auto-sklearn is an automated machine learning toolkit and a drop-in replacement for a scikit-learn estimator:
```
import autosklearn.classification
cls = autosklearn.classification.AutoSklearnClassifier()
cls.fit(X_train, y_train)
predictions = cls.predict(X_test)
```
auto-sklearn frees a machine learning user from algorithm selection and hyperparameter tuning. It leverages recent advantages in Bayesian optimization, meta-learning and ensemble construction.

4 Incoming

Here’s what we need to do to fix AutoML

5 References

Abdel-Gawad, and Ratner. 2007. “Adaptive Optimization of Hyperparameters in L2-Regularised Logistic Regression.”

Bengio. 2000. “Gradient-Based Optimization of Hyperparameters.” Neural Computation.

Bergstra, James S., Bardenet, Bengio, et al. 2011. “Algorithms for Hyper-Parameter Optimization.” In Advances in Neural Information Processing Systems.

Bergstra, J, Yamins, and Cox. 2013. “Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures.” In ICML.

Domke. 2012. “Generic Methods for Optimization-Based Modeling.” In International Conference on Artificial Intelligence and Statistics.

Eggensperger, Feurer, Hutter, et al. n.d. “Towards an Empirical Foundation for Assessing Bayesian Optimization of Hyperparameters.”

Eigenmann, and Nossek. 1999. “Gradient Based Adaptive Regularization.” In Neural Networks for Signal Processing IX: Proceedings of the 1999 IEEE Signal Processing Society Workshop (Cat. No.98TH8468).

Elsken, Metzen, and Hutter. 2019. “Neural Architecture Search: A Survey.” arXiv:1808.05377 [Cs, Stat].

Feurer, Klein, Eggensperger, et al. 2015. “Efficient and Robust Automated Machine Learning.” In Advances in Neural Information Processing Systems 28.

Foo, Do, and Ng. 2008. “Efficient Multiple Hyperparameter Learning for Log-Linear Models.” In Advances in Neural Information Processing Systems 20.

Fu, Luo, Feng, et al. 2016. “DrMAD: Distilling Reverse-Mode Automatic Differentiation for Optimizing Hyperparameters of Deep Neural Networks.” In PRoceedings of IJCAI, 2016.

Gelbart, Snoek, and Adams. 2014. “Bayesian Optimization with Unknown Constraints.” In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence. UAI’14.

Grünewälder, Audibert, Opper, et al. 2010. “Regret Bounds for Gaussian Process Bandit Problems.” In.

Hutter, Hoos, and Leyton-Brown. 2011. “Sequential Model-Based Optimization for General Algorithm Configuration.” In Learning and Intelligent Optimization. Lecture Notes in Computer Science.

Hutter, Hoos, and Leyton-Brown. 2013. “An Evaluation of Sequential Model-Based Optimization for Expensive Blackbox Functions.” In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation. GECCO ’13 Companion.

Li, Jamieson, DeSalvo, et al. 2017. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization.” The Journal of Machine Learning Research.

Liu, Simonyan, and Yang. 2019. “DARTS: Differentiable Architecture Search.” arXiv:1806.09055 [Cs, Stat].

Maclaurin, Duvenaud, and Adams. 2015. “Gradient-Based Hyperparameter Optimization Through Reversible Learning.” In Proceedings of the 32nd International Conference on Machine Learning.

Močkus. 1975. “On Bayesian Methods for Seeking the Extremum.” In Optimization Techniques IFIP Technical Conference: Novosibirsk, July 1–7, 1974. Lecture Notes in Computer Science.

Real, Liang, So, et al. 2020. “AutoML-Zero: Evolving Machine Learning Algorithms From Scratch.”

Salimans, Kingma, and Welling. 2015. “Markov Chain Monte Carlo and Variational Inference: Bridging the Gap.” In Proceedings of the 32nd International Conference on Machine Learning (ICML-15). ICML’15.

Snoek, Larochelle, and Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Advances in Neural Information Processing Systems.

Snoek, Swersky, Zemel, et al. 2014. “Input Warping for Bayesian Optimization of Non-Stationary Functions.” In Proceedings of the 31st International Conference on Machine Learning (ICML-14).

Srinivas, Krause, Kakade, et al. 2010. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design.” In Proceedings of the 27th International Conference on International Conference on Machine Learning. ICML’10.

Swersky, Snoek, and Adams. 2013. “Multi-Task Bayesian Optimization.” In Advances in Neural Information Processing Systems 26.

Thornton, Hutter, Hoos, et al. 2013. “Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms.” In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. KDD ’13.