<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Bayesian Optimization for More Automatic Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frank Hutter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Based on joint work with Tobias Domhan</institution>
          ,
          <addr-line>Holger Hoos, Kevin Leyton-Brown, Jost Tobias Springenberg, and Chris Thornton</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>381</volume>
      <issue>2013</issue>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Bayesian optimization (see, e.g., [2]) is a framework for the
optimization of expensive blackbox functions that combines prior
assumptions about the shape of a function with evidence gathered by
evaluating the function at various points. In this talk, I will briefly
describe the basics of Bayesian optimization and how to scale it up to
handle structured high-dimensional optimization problems in the
sequential model-based algorithm configuration framework SMAC [6].</p>
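      <p>As a concrete illustration of this framework, the basic loop alternates between fitting a probabilistic surrogate model to the evaluations gathered so far and maximizing an acquisition function, such as expected improvement, to select the next point to evaluate. The following is a minimal sketch only, not SMAC itself, which uses a random-forest model so that categorical and conditional parameters can be handled:</p>
      <preformat>
# Minimal sketch of a Bayesian optimization loop for an expensive blackbox
# function f on the unit hypercube: Gaussian-process surrogate plus
# expected improvement.  SMAC replaces the GP with a random forest.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(model, X_cand, y_best):
    """EI of candidate points X_cand given the incumbent value y_best."""
    mu, sigma = model.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, dim, n_init=5, n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n_init, dim))            # initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        model = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.random((1000, dim))       # random candidate pool
        ei = expected_improvement(model, cand, y.min())
        x_next = cand[np.argmax(ei)]         # maximize the acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
      </preformat>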
      <p>Then, I will discuss applications of SMAC to two structured
high-dimensional optimization problems from the growing field of
automatic machine learning:</p>
      <p>Feature selection, learning algorithm selection, and optimization
of the selected algorithm's hyperparameters are crucial for achieving good
performance in practical applications of machine learning. We demonstrate
that a combined optimization over all of these choices can be carried
out effectively by formulating the problem of finding a good
instantiation of the popular WEKA framework as a 768-dimensional
optimization problem. The resulting Auto-WEKA framework [7]
allows non-experts with some available compute time to achieve
state-of-the-art learning performance at the push of a button.</p>
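      <p>The key idea behind this combined formulation can be sketched as follows. This is an illustrative toy version only: the algorithm names, hyperparameter ranges, and the use of random search in place of SMAC are assumptions made for brevity. The choice of learning algorithm becomes a top-level categorical parameter, and each algorithm's hyperparameters are conditional on that choice, so the whole problem collapses into a single structured blackbox objective:</p>
      <preformat>
# Illustrative sketch (not Auto-WEKA itself) of combined algorithm selection
# and hyperparameter optimization: one structured configuration is sampled
# and evaluated as a blackbox.  Names and ranges are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sample_configuration(rng):
    """Draw one point of the joint (algorithm, hyperparameters) space."""
    algo = rng.choice(["svm", "random_forest"])
    if algo == "svm":
        # these hyperparameters are active only when the SVM branch is chosen
        return algo, {"C": 10.0 ** rng.uniform(-3, 3),
                      "gamma": 10.0 ** rng.uniform(-4, 1)}
    return algo, {"n_estimators": int(rng.integers(10, 500)),
                  "max_depth": int(rng.integers(2, 20))}

def evaluate(algo, params, X, y):
    """Blackbox objective: cross-validated error of the chosen configuration."""
    model = SVC(**params) if algo == "svm" else RandomForestClassifier(**params)
    return 1.0 - cross_val_score(model, X, y, cv=3).mean()

def random_search_cash(X, y, n_trials=50, seed=0):
    """Random search stands in for SMAC to keep the sketch self-contained."""
    rng = np.random.default_rng(seed)
    trials = [sample_configuration(rng) for _ in range(n_trials)]
    errors = [evaluate(a, p, X, y) for a, p in trials]
    return trials[int(np.argmin(errors))], min(errors)
      </preformat>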
      <p>Deep learning has celebrated many recent successes, but its
performance is known to be very sensitive to architectural choices and
hyperparameter settings; so far, its potential could therefore only
be unleashed by deep learning experts. We formulated the
combined problem of selecting the right neural network architecture
and its associated hyperparameters as an 81-dimensional
optimization problem and showed that an automated procedure could find
a network whose performance exceeded the previous
state-of-the-art achieved by human domain experts using the same building
blocks [3]. Computational time remains a challenge, but this
result is a step towards deep learning for non-experts.</p>
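      <p>The encoding of the neural network search space follows the same pattern. The sketch below uses assumed parameter names and ranges rather than the exact 81-dimensional space of [3]; it shows how per-layer hyperparameters become active only when the sampled number of layers makes them relevant:</p>
      <preformat>
# Sketch of flattening a neural-network architecture and its hyperparameters
# into one conditional configuration, in the spirit of the 81-dimensional
# space mentioned above (parameters here are illustrative assumptions).
import numpy as np

MAX_LAYERS = 4  # per-layer parameters beyond n_layers are simply inactive

def sample_network_config(rng):
    cfg = {
        "learning_rate": 10.0 ** rng.uniform(-5, -1),
        "momentum": rng.uniform(0.0, 0.99),
        "dropout_input": rng.uniform(0.0, 0.5),
        "n_layers": int(rng.integers(1, MAX_LAYERS + 1)),
    }
    # conditional block: only the first n_layers per-layer entries matter
    for i in range(cfg["n_layers"]):
        cfg[f"units_layer_{i}"] = int(2 ** rng.integers(5, 12))
        cfg[f"dropout_layer_{i}"] = rng.uniform(0.0, 0.7)
    return cfg
      </preformat>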
      <p>To stimulate discussion, I will finish by highlighting several
further opportunities for combining meta-learning and Bayesian
optimization:</p>
      <list list-type="bullet">
        <list-item><p>Prediction of learning curves [3] (see the sketch below),</p></list-item>
        <list-item><p>Learning the importance of hyperparameters (and of meta-features) [4, 5], and</p></list-item>
        <list-item><p>Using meta-features to generalize hyperparameter performance across datasets [1, 8], providing a prior for Bayesian optimization.</p></list-item>
      </list>
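      <p>For the first of these opportunities, a minimal sketch of learning curve prediction fits a parametric curve to the first epochs of a run and extrapolates it, so that unpromising hyperparameter settings can be terminated early. A single "pow3" functional form is used here for illustration; the probabilistic model ensemble of [3] is considerably more involved:</p>
      <preformat>
# Sketch of learning-curve extrapolation: fit a saturating power law to the
# observed partial curve and predict the accuracy at a later epoch.
import numpy as np
from scipy.optimize import curve_fit

def pow3(t, c, a, alpha):
    """Saturating power law: accuracy approaches c as the epoch t grows."""
    return c - a * np.power(t, -alpha)

def extrapolate_accuracy(observed_acc, horizon):
    """Fit pow3 to the observed partial curve and predict accuracy at horizon."""
    t = np.arange(1, len(observed_acc) + 1, dtype=float)
    params, _ = curve_fit(pow3, t, observed_acc,
                          p0=(0.9, 0.5, 0.5), maxfev=10000)
    return pow3(float(horizon), *params)

# example: decide whether to keep training after 10 of 100 epochs
partial = [0.52, 0.60, 0.65, 0.68, 0.70, 0.715, 0.725, 0.73, 0.735, 0.74]
predicted_final = extrapolate_accuracy(partial, horizon=100)
      </preformat>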
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>