Bayesian Optimization for More Automatic Machine Learning (extended abstract for invited talk)

Frank Hutter 1

Bayesian optimization (see, e.g., [2]) is a framework for the optimization of expensive blackbox functions that combines prior assumptions about the shape of the function with evidence gathered by evaluating the function at various points. In this talk, I will briefly describe the basics of Bayesian optimization and how to scale it up to handle structured high-dimensional optimization problems in the sequential model-based algorithm configuration framework SMAC [6]. Then, I will discuss applications of SMAC to two structured high-dimensional optimization problems from the growing field of automatic machine learning:

• Feature selection, learning algorithm selection, and optimization of the chosen algorithm's hyperparameters are crucial for achieving good performance in practical applications of machine learning. We demonstrate that a combined optimization over all of these choices can be carried out effectively by formulating the problem of finding a good instantiation of the popular WEKA framework as a 768-dimensional optimization problem. The resulting Auto-WEKA framework [7] allows non-experts with some available compute time to achieve state-of-the-art learning performance at the push of a button.

• Deep learning has celebrated many recent successes, but its performance is known to be very sensitive to architectural choices and hyperparameter settings. As a result, so far its potential could only be unleashed by deep learning experts. We formulated the combined problem of selecting the right neural network architecture and its associated hyperparameters as an 81-dimensional optimization problem and showed that an automated procedure could find a network whose performance exceeded the previous state of the art achieved by human domain experts using the same building blocks [3]. Computational time remains a challenge, but this result is a step towards deep learning for non-experts. (Sketches of the basic Bayesian optimization loop and of the kind of conditional search space underlying both applications are given below.)
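As a concrete illustration of the loop described in the opening paragraph, the following Python sketch runs plain Bayesian optimization with a Gaussian-process surrogate and the expected-improvement acquisition criterion. It is a minimal sketch rather than the SMAC procedure discussed in the talk; the blackbox function, bounds, and evaluation budget are invented for illustration.

# Minimal Bayesian optimization sketch (illustrative only, not SMAC):
# a Gaussian-process surrogate models the blackbox function, and the
# expected-improvement criterion selects the next point to evaluate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expensive_blackbox(x):
    """Stand-in for an expensive function, e.g. validation error of a model."""
    return np.sin(3 * x) + 0.1 * x ** 2


def expected_improvement(candidates, gp, best_y):
    """Expected improvement (for minimization) at each candidate point."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # avoid division by zero
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
bounds = (-3.0, 3.0)

# Prior evidence: a few initial design points.
X = rng.uniform(*bounds, size=(3, 1))
y = expensive_blackbox(X).ravel()

for _ in range(20):                          # sequential evaluations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                             # update beliefs with all evidence so far
    candidates = rng.uniform(*bounds, size=(1000, 1))
    ei = expected_improvement(candidates, gp, y.min())
    x_next = candidates[np.argmax(ei)]       # most promising point under the model
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_blackbox(x_next))

print("best x: %.3f, best f(x): %.3f" % (X[np.argmin(y)][0], y.min()))

The essential structure, fitting a model to all evaluations so far and then picking the point that optimizes an acquisition function, is the same in SMAC, which replaces the Gaussian process with a random-forest surrogate that can handle the categorical and conditional parameters arising in the applications above.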
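Both applications optimize over structured, conditional spaces: which hyperparameters are active depends on higher-level choices such as the selected classifier or the number of network layers. The following sketch samples from a toy conditional space of this kind; all algorithm names, hyperparameters, and ranges are made up for illustration and are far smaller than the 768- and 81-dimensional spaces mentioned above.

# Toy conditional search space (illustrative only): top-level categorical
# choices activate different subsets of hyperparameters, as in Auto-WEKA
# and the joint architecture/hyperparameter search for deep networks.
import random

def sample_configuration(rng):
    """Draw one point from a small conditional space."""
    config = {"classifier": rng.choice(["svm", "random_forest", "neural_net"])}
    if config["classifier"] == "svm":
        config["svm.C"] = 10 ** rng.uniform(-3, 3)            # log-scale range
        config["svm.kernel"] = rng.choice(["rbf", "linear"])
        if config["svm.kernel"] == "rbf":                      # nested condition
            config["svm.gamma"] = 10 ** rng.uniform(-4, 0)
    elif config["classifier"] == "random_forest":
        config["rf.n_trees"] = rng.randint(10, 500)
        config["rf.max_depth"] = rng.randint(2, 30)
    else:  # neural_net: architecture and hyperparameters chosen jointly
        config["nn.n_layers"] = rng.randint(1, 4)
        for layer in range(config["nn.n_layers"]):             # per-layer parameters
            config[f"nn.layer{layer}.units"] = rng.randint(16, 1024)
        config["nn.learning_rate"] = 10 ** rng.uniform(-5, -1)
    return config

rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(rng))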
To stimulate discussion, I will finish by highlighting several further opportunities for combining meta-learning and Bayesian optimization:

• Prediction of learning curves [3],
• Learning the importance of hyperparameters (and of meta-features) [4, 5], and
• Using meta-features to generalize hyperparameter performance across datasets [1, 8], providing a prior for Bayesian optimization.

Based on joint work with Tobias Domhan, Holger Hoos, Kevin Leyton-Brown, Jost Tobias Springenberg, and Chris Thornton.

1 University of Freiburg, Germany. Email: fh@cs.uni-freiburg.de.

REFERENCES

[1] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, 'Collaborative hyperparameter tuning', in Proc. of ICML, (2013).
[2] E. Brochu, V. M. Cora, and N. de Freitas, 'A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning', CoRR, abs/1012.2599, (2010).
[3] T. Domhan, T. Springenberg, and F. Hutter, 'Extrapolating learning curves of deep neural networks', in ICML 2014 AutoML Workshop, (June 2014).
[4] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Identifying key algorithm parameters and instance features using forward selection', in Learning and Intelligent Optimization, pp. 364–381, (2013).
[5] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'An efficient approach for assessing hyperparameter importance', in Proc. of ICML, (2014).
[6] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Sequential model-based optimization for general algorithm configuration', in Proc. of LION-5, (2011).
[7] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms', in Proc. of KDD'13, (2013).
[8] D. Yogatama and G. Mann, 'Efficient transfer learning method for automatic hyperparameter tuning', in Proc. of AISTATS, (2014).