# Surrogate optimisation methods for optimising optimisations

Closely related are AutoML and certain types of stochastic optimisation.

## Problem statement

We are interested in solving

$$x^* = \arg\min_x f(x)$$

under the constraints that

• $$f$$ is a black box for which neither a closed form nor its gradients are known;
• $$f$$ is expensive to evaluate;
• evaluations of $$y=f(x)$$ may be noisy.
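A minimal sketch of this setting in Python, assuming additive Gaussian observation noise and a fixed evaluation budget as a stand-in for "expensive". The class `NoisyBlackBox` and the toy quadratic objective are hypothetical illustrations, not part of any library.

```python
import random
from typing import Callable, List, Tuple


class NoisyBlackBox:
    """Exposes only noisy values of a hidden objective, up to a fixed number of calls."""

    def __init__(self, f: Callable[[float], float], noise_sd: float, budget: int, seed: int = 0):
        self._f = f                 # hidden objective: no formula or gradients are exposed
        self.noise_sd = noise_sd    # standard deviation of the additive Gaussian noise
        self.budget = budget        # maximum number of (expensive) evaluations
        self.history: List[Tuple[float, float]] = []
        self._rng = random.Random(seed)

    def __call__(self, x: float) -> float:
        if len(self.history) >= self.budget:
            raise RuntimeError("evaluation budget exhausted")
        y = self._f(x) + self._rng.gauss(0.0, self.noise_sd)
        self.history.append((x, y))
        return y


# Toy objective with minimiser x* = 2; an optimiser would only see the noisy outputs.
box = NoisyBlackBox(lambda x: (x - 2.0) ** 2, noise_sd=0.1, budget=3)
ys = [box(x) for x in (0.0, 2.0, 4.0)]
```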

We might sometimes even have access to gradients, in which case we additionally assume that, rather than observing $$\nabla f$$ and $$\nabla^2 f$$ directly, we observe random variables $$G(x)$$ and $$H(x)$$ with $$\mathbb{E}\,G(x)=\nabla f(x)$$ and $$\mathbb{E}\,H(x)=\nabla^2 f(x)$$, as in stochastic optimisation.
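A sketch of the stochastic-gradient case, assuming $$G(x)$$ is the true gradient plus zero-mean Gaussian noise, so that $$\mathbb{E}\,G(x)=\nabla f(x)$$ holds by construction. The functions `grad_f`, `noisy_grad` and `sgd` and the quadratic objective are hypothetical; the optimiser is plain stochastic gradient descent with Robbins-Monro step sizes, one standard way of consuming such an oracle.

```python
import random


def grad_f(x: float) -> float:
    # True gradient of the toy objective f(x) = (x - 2)^2; hidden from the optimiser.
    return 2.0 * (x - 2.0)


def noisy_grad(x: float, rng: random.Random, sd: float = 1.0) -> float:
    # G(x) = grad f(x) + zero-mean noise, hence E[G(x)] = grad f(x).
    return grad_f(x) + rng.gauss(0.0, sd)


def sgd(x0: float, steps: int, seed: int = 0) -> float:
    rng = random.Random(seed)
    x = x0
    for t in range(1, steps + 1):
        step = 0.5 / t  # Robbins-Monro: sum of steps diverges, sum of squared steps converges
        x -= step * noisy_grad(x, rng)
    return x


x_hat = sgd(x0=10.0, steps=2000)
```

On this particular quadratic with step size $$0.5/t$$, the final iterate is exactly $$2$$ minus half the average of the noise draws, so the error shrinks like $$1/\sqrt{T}$$.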

This resembles the typical framing of reinforcement learning problems, which face a similar explore/exploit tradeoff, although I do not know precisely where the disciplinary boundaries between these areas lie.

The most common method seems to be “Bayesian optimisation”, which is typically based on Gaussian process regression of the loss surface. However, Gaussian processes are not a requirement; many possible wacky regression models can serve as the optimisation surrogate.
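A minimal from-scratch sketch of the Gaussian-process variant, assuming a 1-D search interval, a squared-exponential kernel with fixed hyperparameters (length scale 0.5, unit variance, small noise jitter), expected improvement as the acquisition function, and a fixed grid of candidate points. Practical implementations also fit the kernel hyperparameters and optimise the acquisition function properly; all names here are hypothetical.

```python
import math
import random
from typing import Callable, List, Tuple

Matrix = List[List[float]]


def rbf(a: float, b: float, length: float = 0.5, var: float = 1.0) -> float:
    # Squared-exponential covariance between two scalar inputs.
    return var * math.exp(-0.5 * ((a - b) / length) ** 2)


def cholesky(K: Matrix) -> Matrix:
    # Lower-triangular L with L L^T = K, for symmetric positive-definite K.
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            L[i][j] = math.sqrt(s) if i == j else s / L[j][j]
    return L


def solve_lower(L: Matrix, b: List[float]) -> List[float]:
    # Forward substitution for L z = b.
    z: List[float] = []
    for i in range(len(b)):
        z.append((b[i] - sum(L[i][k] * z[k] for k in range(i))) / L[i][i])
    return z


def solve_upper_t(L: Matrix, z: List[float]) -> List[float]:
    # Back substitution for L^T a = z.
    n = len(z)
    a = [0.0] * n
    for i in reversed(range(n)):
        a[i] = (z[i] - sum(L[k][i] * a[k] for k in range(i + 1, n))) / L[i][i]
    return a


def fit_gp(X: List[float], y: List[float], noise: float = 1e-4) -> Tuple[Matrix, List[float]]:
    # Factorise K + noise*I once per iteration; alpha = (K + noise*I)^{-1} y.
    K = [[rbf(a, b) + (noise if i == j else 0.0) for j, b in enumerate(X)] for i, a in enumerate(X)]
    L = cholesky(K)
    return L, solve_upper_t(L, solve_lower(L, y))


def gp_predict(X: List[float], L: Matrix, alpha: List[float], x: float) -> Tuple[float, float]:
    # Posterior mean and standard deviation of a zero-mean GP at x.
    k_star = [rbf(x, a) for a in X]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve_lower(L, k_star)
    var = max(rbf(x, x) - sum(vi * vi for vi in v), 1e-12)
    return mean, math.sqrt(var)


def expected_improvement(mean: float, sd: float, best: float) -> float:
    # E[max(best - Y, 0)] for Y ~ N(mean, sd^2): expected improvement when minimising.
    z = (best - mean) / sd
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (best - mean) * cdf + sd * pdf


def bayes_opt(f: Callable[[float], float], lo: float, hi: float,
              n_init: int = 3, n_iter: int = 12, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    X = [rng.uniform(lo, hi) for _ in range(n_init)]        # random initial design
    y = [f(x) for x in X]
    grid = [lo + (hi - lo) * i / 200 for i in range(201)]   # candidate points for the acquisition
    for _ in range(n_iter):
        L, alpha = fit_gp(X, y)
        best = min(y)
        x_next = max(grid, key=lambda x: expected_improvement(*gp_predict(X, L, alpha, x), best))
        X.append(x_next)
        y.append(f(x_next))
    i_best = min(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]


x_best, y_best = bayes_opt(lambda x: (x - 0.7) ** 2, 0.0, 2.0)
```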

It is of renewed interest for hyperparameter and model selection, e.g. when regularising complex models, which these days goes by the compact name AutoML.
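In the hyperparameter setting, $$f$$ is typically a validation loss viewed as a function of the hyperparameters. A toy sketch, assuming synthetic data and 1-D ridge regression without intercept, so the fitted weight has a closed form; for genuinely complex models each evaluation means retraining, which is what makes $$f$$ expensive. The names and data are hypothetical, and the grid search at the end is only a placeholder for a surrogate optimiser.

```python
import random

rng = random.Random(0)
# Synthetic data y = 1.5 x + noise, split into a small training set and a validation set.
xs = [rng.uniform(-1.0, 1.0) for _ in range(30)]
ys = [1.5 * x + rng.gauss(0.0, 0.5) for x in xs]
x_train, y_train = xs[:10], ys[:10]
x_val, y_val = xs[10:], ys[10:]


def validation_loss(log10_lam: float) -> float:
    """The black box f: validation MSE of 1-D ridge regression at penalty 10**log10_lam."""
    lam = 10.0 ** log10_lam
    # Closed-form ridge weight for a single feature: w = sum(x*y) / (sum(x^2) + lambda).
    w = sum(x * y for x, y in zip(x_train, y_train)) / (sum(x * x for x in x_train) + lam)
    return sum((y - w * x) ** 2 for x, y in zip(x_val, y_val)) / len(x_val)


# Placeholder search over log10(lambda) in [-3, 3]; a surrogate optimiser would replace this.
grid = [-3.0 + 0.5 * i for i in range(13)]
best_log_lam = min(grid, key=validation_loss)
```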

You could obviously also use it in industrial process control, which is where I originally saw this kind of thing, in the form of sequential ANOVA design. That is an incredible idea in itself, although it is now years old and so not nearly as hip. Since this is effectively an attempt at optimal, nonlinear, heteroskedastic sequential ANOVA, I am led to wonder whether we can dispense with ANOVA now. Does this stuff actually work well enough? Or is it the same thing, repackaged?

## Implementation

### PySOT

pySOT, the Surrogate Optimization Toolbox, targets global deterministic optimization problems. It is hosted on GitHub: https://github.com/dme65/pySOT.

The main purpose of the toolbox is the optimization of computationally expensive black-box objective functions with continuous and/or integer variables. All variables are assumed to have bound constraints in some form, and none of the bounds may be infinite. The tighter the bounds, the more efficient the algorithms, since tighter bounds shrink the search region and improve the quality of the constructed surrogate. This toolbox may not be very efficient for problems with computationally cheap function evaluations; surrogate models are intended for cases where function evaluations take from several minutes to several hours or more.
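Finite bounds let an optimiser rescale every variable to the unit cube, a common preprocessing step for surrogate models, and integer variables can be handled by rounding. The helpers below are a hypothetical sketch of that idea, not pySOT's API; they assume integer variables have integer bounds.

```python
import math
from typing import List, Sequence


def to_unit_cube(x: Sequence[float], lb: Sequence[float], ub: Sequence[float]) -> List[float]:
    # Rescale each coordinate from [lb_i, ub_i] to [0, 1]; this needs finite bounds with lb_i < ub_i.
    out = []
    for xi, lo, hi in zip(x, lb, ub):
        if not (math.isfinite(lo) and math.isfinite(hi)) or lo >= hi:
            raise ValueError("every variable needs finite bounds with lb < ub")
        out.append((xi - lo) / (hi - lo))
    return out


def from_unit_cube(u: Sequence[float], lb: Sequence[float], ub: Sequence[float],
                   int_vars: Sequence[int] = ()) -> List[float]:
    # Map back to the original box; integer variables are rounded, then clipped to their bounds.
    x = [lo + ui * (hi - lo) for ui, lo, hi in zip(u, lb, ub)]
    for i in int_vars:
        x[i] = min(max(round(x[i]), lb[i]), ub[i])
    return x


lb, ub = [0.0, 1], [1.0, 10]          # the second variable is an integer in {1, ..., 10}
u = to_unit_cube([0.25, 5.5], lb, ub)
x = from_unit_cube([0.5, 0.52], lb, ub, int_vars=(1,))
```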

### skopt

skopt (aka scikit-optimize) is a simple and efficient library to minimize (very) expensive and noisy black-box functions. It implements several methods for sequential model-based optimization.
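The generic sequential model-based optimization loop is: evaluate an initial design, then repeatedly score candidate points with a cheap surrogate, spend one expensive evaluation at the best-scoring candidate, and add the result to the data. skopt offers this loop with several surrogates (`gp_minimize`, `forest_minimize`, `gbrt_minimize`); the sketch below is not skopt code but a from-scratch toy, assuming a 1-D domain, random candidate sampling, and a deliberately crude surrogate: the value at the nearest observed point, minus an exploration bonus proportional to the distance to it. All names are hypothetical.

```python
import random
from typing import Callable, List, Tuple

# A surrogate score maps (observed inputs, observed values, candidate) to a number; lower is better.
Score = Callable[[List[float], List[float], float], float]


def nearest_neighbour_score(X: List[float], y: List[float], x: float, kappa: float = 1.0) -> float:
    # Predict f(x) by the value at the nearest observed input, then subtract an
    # exploration bonus that grows with the distance to that input.
    i = min(range(len(X)), key=lambda j: abs(X[j] - x))
    return y[i] - kappa * abs(X[i] - x)


def smbo(f: Callable[[float], float], lo: float, hi: float, score: Score,
         n_init: int = 4, n_iter: int = 20, n_cand: int = 500, seed: int = 0) -> Tuple[float, float]:
    rng = random.Random(seed)
    X = [rng.uniform(lo, hi) for _ in range(n_init)]   # initial design
    y = [f(x) for x in X]
    for _ in range(n_iter):
        # Score many cheap random candidates with the surrogate and keep the best one.
        cands = [rng.uniform(lo, hi) for _ in range(n_cand)]
        x_next = min(cands, key=lambda c: score(X, y, c))
        # Spend one expensive evaluation there and add it to the data.
        X.append(x_next)
        y.append(f(x_next))
    i_best = min(range(len(y)), key=y.__getitem__)
    return X[i_best], y[i_best]


x_best, y_best = smbo(lambda x: abs(x - 3.3), 0.0, 10.0, nearest_neighbour_score)
```

Swapping in another regression model only changes the `score` function; the loop itself stays the same.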

### spearmint

Spearmint is a package to perform Bayesian optimization according to the algorithms outlined in the paper (Snoek, Larochelle, and Adams 2012).

The code consists of several parts. It is designed to be modular, so that various ‘driver’ and ‘chooser’ modules can be swapped out. The ‘chooser’ modules are implementations of acquisition functions such as expected improvement, UCB or random. The drivers determine how experiments are distributed and run on the system. As the code is designed to run experiments in parallel (spawning a new experiment as soon as a result comes in), this requires some engineering.
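The chooser idea, that the acquisition function is a swappable component, can be sketched as a table of interchangeable scoring rules over a model's predictive mean and standard deviation. This assumes Gaussian predictive distributions and minimisation, and it is not Spearmint's actual module interface; `CHOOSERS`, `choose` and the numbers are hypothetical. The UCB rule appears in its lower-confidence-bound form, $$\mu - \kappa\sigma$$, negated so that every chooser is maximised.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

Posterior = Tuple[float, float]   # (predictive mean, predictive standard deviation)


def norm_pdf(z: float) -> float:
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z: float) -> float:
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ei(post: Posterior, best: float, rng: random.Random) -> float:
    # Expected improvement over the incumbent `best`, for minimisation.
    m, s = post
    z = (best - m) / s
    return (best - m) * norm_cdf(z) + s * norm_pdf(z)


def ucb(post: Posterior, best: float, rng: random.Random, kappa: float = 2.0) -> float:
    # Lower confidence bound m - kappa*s, negated so that larger scores are better.
    m, s = post
    return -(m - kappa * s)


def random_choice(post: Posterior, best: float, rng: random.Random) -> float:
    # Ignores the model: taking the max of random scores is uniform random search.
    return rng.random()


CHOOSERS: Dict[str, Callable[[Posterior, float, random.Random], float]] = {
    "ei": ei, "ucb": ucb, "random": random_choice,
}


def choose(name: str, candidates: List[float], posts: List[Posterior], best: float, seed: int = 0) -> float:
    # Return the candidate that maximises the named acquisition function.
    rng = random.Random(seed)
    acq = CHOOSERS[name]
    scores = [acq(p, best, rng) for p in posts]
    return candidates[max(range(len(candidates)), key=scores.__getitem__)]


# One candidate looks good with little uncertainty; the other looks worse but is uncertain.
cands = [0.0, 1.0]
posts = [(0.40, 0.01), (0.90, 0.30)]
picked = {name: choose(name, cands, posts, best=0.45) for name in CHOOSERS}
```

With these numbers EI picks the confidently good candidate at 0.0, while the confidence-bound rule with `kappa=2` picks the uncertain one at 1.0: the explore/exploit tradeoff in miniature.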

Spearmint2 is similar, but more recently updated and fancier; however, it has a restrictive license prohibiting wide redistribution without the payment of fees. You may or may not wish to trust the implied level of development and support of four Harvard professors, depending on your application.

Both of the Spearmint options (especially the latter) make opinionated choices of technology stack for their optimizations, which means they can do more work for you but require more setup than a simple little thing like skopt. Depending on your computing environment, this might be an overall plus or a minus.

### SMAC

SMAC (sequential model-based algorithm configuration; AGPLv3) is, in its authors' words, a versatile tool for optimizing algorithm parameters (or the parameters of some other process we can run automatically, or a function we can evaluate, such as a simulation).

SMAC has helped us speed up both local search and tree search algorithms by orders of magnitude on certain instance distributions. Recently, we have also found it to be very effective for the hyperparameter optimization of machine learning algorithms, scaling better to high dimensions and discrete input dimensions than other algorithms. Finally, the predictive models SMAC is based on can also capture and exploit important information about the model domain, such as which input variables are most important.
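SMAC's surrogate is a random forest (Hutter, Hoos, and Leyton-Brown 2011): the spread of the per-tree predictions gives an uncertainty estimate for the acquisition function, and tree splits handle discrete inputs without special treatment. The toy below bags depth-1 trees (stumps) over one continuous and one categorical input, and counts splits per input as a crude importance measure. It is a hypothetical illustration of the idea, not SMAC's implementation, which uses deeper trees and random feature subsets.

```python
import random
import statistics
from typing import Callable, List, Optional, Tuple

Row = Tuple[float, str]                                        # (continuous input, categorical input)
Stump = Tuple[Optional[int], Callable[[Row], bool], float, float]  # (feature, split test, left mean, right mean)


def fit_stump(X: List[Row], y: List[float]) -> Stump:
    # Choose the single split (threshold on the number, or equality on the category)
    # that minimises squared error; each side predicts the mean of its samples.
    best = None
    for j in (0, 1):
        for v in sorted(set(r[j] for r in X)):
            test = (lambda r, v=v: r[0] <= v) if j == 0 else (lambda r, v=v: r[1] == v)
            left = [t for r, t in zip(X, y) if test(r)]
            right = [t for r, t in zip(X, y) if not test(r)]
            if not left or not right:
                continue
            ml, mr = statistics.fmean(left), statistics.fmean(right)
            sse = sum((t - ml) ** 2 for t in left) + sum((t - mr) ** 2 for t in right)
            if best is None or sse < best[0]:
                best = (sse, j, test, ml, mr)
    if best is None:                                           # no valid split: constant prediction
        m = statistics.fmean(y)
        return (None, lambda r: True, m, m)
    return best[1:]


def fit_forest(X: List[Row], y: List[float], n_trees: int = 50, seed: int = 0) -> List[Stump]:
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in X]               # bootstrap resample
        forest.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return forest


def predict(forest: List[Stump], r: Row) -> Tuple[float, float]:
    # Mean and spread of the per-tree predictions; the spread is the uncertainty estimate.
    preds = [ml if test(r) else mr for _, test, ml, mr in forest]
    return statistics.fmean(preds), statistics.pstdev(preds)


def split_counts(forest: List[Stump]) -> List[int]:
    # Crude importance: how often each input is chosen for the split.
    counts = [0, 0]
    for j, *_ in forest:
        if j is not None:
            counts[j] += 1
    return counts


# Toy "runtime" that depends mostly on a categorical choice ("a" or "b") and only weakly on x.
X = [(i / 10, c) for i in range(10) for c in ("a", "b")]
y = [0.1 * x + (10.0 if c == "b" else 1.0) for x, c in X]
forest = fit_forest(X, y)
mean_a, sd_a = predict(forest, (0.5, "a"))
```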

We hope you find SMAC similarly useful. Ultimately, we hope that it helps algorithm designers focus on tasks that are more scientifically valuable than parameter tuning.

A Python interface is available through pysmac.

Allen-Zhu, Zeyuan, Yuanzhi Li, Aarti Singh, and Yining Wang. 2017. “Near-Optimal Design of Experiments via Regret Minimization.” In PMLR, 126–35. http://proceedings.mlr.press/v70/allen-zhu17e.html.

Feurer, Matthias, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. 2015. “Efficient and Robust Automated Machine Learning.” In Advances in Neural Information Processing Systems 28, edited by C. Cortes, N. D. Lawrence, D. D. Lee, M. Sugiyama, and R. Garnett, 2962–70. Curran Associates, Inc. http://papers.nips.cc/paper/5872-efficient-and-robust-automated-machine-learning.pdf.

Franceschi, Luca, Michele Donini, Paolo Frasconi, and Massimiliano Pontil. 2017. “On Hyperparameter Optimization in Learning Systems.” https://arxiv.org/abs/1703.01785.

Gelbart, Michael A., Jasper Snoek, and Ryan P. Adams. 2014. “Bayesian Optimization with Unknown Constraints.” In Proceedings of the Thirtieth Conference on Uncertainty in Artificial Intelligence, 250–59. UAI’14. Arlington, Virginia, United States: AUAI Press. http://hips.seas.harvard.edu/files/gelbart-constrained-uai-2014.pdf.

Grünewälder, Steffen, Jean-Yves Audibert, Manfred Opper, and John Shawe-Taylor. 2010. “Regret Bounds for Gaussian Process Bandit Problems.” In Proceedings of the Thirteenth International Conference on Artificial Intelligence and Statistics, 9:273–80. https://hal-enpc.archives-ouvertes.fr/hal-00654517/document.

Hutter, Frank, Holger H. Hoos, and Kevin Leyton-Brown. 2011. “Sequential Model-Based Optimization for General Algorithm Configuration.” In Learning and Intelligent Optimization, 6683:507–23. Lecture Notes in Computer Science. Berlin, Heidelberg: Springer. https://doi.org/10.1007/978-3-642-25566-3_40.

Hutter, Frank, Holger Hoos, and Kevin Leyton-Brown. 2013. “An Evaluation of Sequential Model-Based Optimization for Expensive Blackbox Functions.” In Proceedings of the 15th Annual Conference Companion on Genetic and Evolutionary Computation, 1209–16. GECCO ’13 Companion. New York, NY, USA: ACM. https://doi.org/10.1145/2464576.2501592.

Li, Lisha, Kevin Jamieson, Giulia DeSalvo, Afshin Rostamizadeh, and Ameet Talwalkar. 2016. “Hyperband: A Novel Bandit-Based Approach to Hyperparameter Optimization,” March. http://arxiv.org/abs/1603.06560.

Močkus, J. 1975. “On Bayesian Methods for Seeking the Extremum.” In Optimization Techniques IFIP Technical Conference, edited by G. I. Marchuk, 400–404. Lecture Notes in Computer Science. Springer Berlin Heidelberg. https://doi.org/10.1007/978-3-662-38527-2_55.

Snoek, Jasper, Hugo Larochelle, and Ryan P. Adams. 2012. “Practical Bayesian Optimization of Machine Learning Algorithms.” In Advances in Neural Information Processing Systems, 2951–59. Curran Associates, Inc. http://papers.nips.cc/paper/4522-practical-bayesian-optimization-of-machine-learning-algorithms.

Snoek, Jasper, Kevin Swersky, Rich Zemel, and Ryan Adams. 2014. “Input Warping for Bayesian Optimization of Non-Stationary Functions.” In Proceedings of the 31st International Conference on Machine Learning (ICML-14), 1674–82. http://www.jmlr.org/proceedings/papers/v32/snoek14.pdf.

Srinivas, Niranjan, Andreas Krause, Sham M. Kakade, and Matthias Seeger. 2012. “Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design.” IEEE Transactions on Information Theory 58 (5): 3250–65. https://doi.org/10.1109/TIT.2011.2182033.

Staines, Joe, and David Barber. 2012. “Variational Optimization,” December. http://arxiv.org/abs/1212.4507.

Swersky, Kevin, Jasper Snoek, and Ryan P Adams. 2013. “Multi-Task Bayesian Optimization.” In Advances in Neural Information Processing Systems 26, edited by C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani, and K. Q. Weinberger, 2004–12. Curran Associates, Inc. http://papers.nips.cc/paper/5086-multi-task-bayesian-optimization.pdf.