Bayesian Optimization for More Automatic Machine Learning (extended abstract for invited talk)

Frank Hutter 1

Bayesian optimization (see, e.g., [2]) is a framework for the optimization of expensive blackbox functions that combines prior assumptions about the shape of the function with evidence gathered by evaluating the function at various points. In this talk, I will briefly describe the basics of Bayesian optimization and how to scale it up to handle structured high-dimensional optimization problems in the sequential model-based algorithm configuration framework SMAC [6]. Then, I will discuss applications of SMAC to two structured high-dimensional optimization problems from the growing field of automatic machine learning:

• Feature selection, learning algorithm selection, and optimization of the chosen algorithm's hyperparameters are crucial for achieving good performance in practical applications of machine learning. We demonstrate that a combined optimization over all of these choices can be carried out effectively by formulating the problem of finding a good instantiation of the popular WEKA framework as a 768-dimensional optimization problem. The resulting Auto-WEKA framework [7] allows non-experts with some available compute time to achieve state-of-the-art learning performance at the push of a button.

• Deep learning has celebrated many recent successes, but its performance is known to be very sensitive to architectural choices and hyperparameter settings. As a result, so far its potential could only be unleashed by deep learning experts. We formulated the combined problem of selecting the right neural network architecture and its associated hyperparameters as an 81-dimensional optimization problem and showed that an automated procedure could find a network whose performance exceeded the previous state of the art achieved by human domain experts using the same building blocks [3]. Computational time remains a challenge, but this result is a step towards deep learning for non-experts. (Sketches of the basic Bayesian optimization loop and of the kind of conditional search space underlying both applications are given below.)
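As a concrete illustration of the loop described in the opening paragraph, the following Python sketch runs plain Bayesian optimization with a Gaussian-process surrogate and the expected-improvement acquisition criterion. It is a minimal sketch rather than the SMAC procedure discussed in the talk; the blackbox function, bounds, and evaluation budget are invented for illustration.

# Minimal Bayesian optimization sketch (illustrative only, not SMAC):
# a Gaussian-process surrogate models the blackbox function, and the
# expected-improvement criterion selects the next point to evaluate.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern


def expensive_blackbox(x):
    """Stand-in for an expensive function, e.g. validation error of a model."""
    return np.sin(3 * x) + 0.1 * x ** 2


def expected_improvement(candidates, gp, best_y):
    """Expected improvement (for minimization) at each candidate point."""
    mu, sigma = gp.predict(candidates, return_std=True)
    sigma = np.maximum(sigma, 1e-9)          # avoid division by zero
    z = (best_y - mu) / sigma
    return (best_y - mu) * norm.cdf(z) + sigma * norm.pdf(z)


rng = np.random.default_rng(0)
bounds = (-3.0, 3.0)

# Prior evidence: a few initial design points.
X = rng.uniform(*bounds, size=(3, 1))
y = expensive_blackbox(X).ravel()

for _ in range(20):                          # sequential evaluations
    gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)
    gp.fit(X, y)                             # update beliefs with all evidence so far
    candidates = rng.uniform(*bounds, size=(1000, 1))
    ei = expected_improvement(candidates, gp, y.min())
    x_next = candidates[np.argmax(ei)]       # most promising point under the model
    X = np.vstack([X, x_next])
    y = np.append(y, expensive_blackbox(x_next))

print("best x: %.3f, best f(x): %.3f" % (X[np.argmin(y)][0], y.min()))

The essential structure, fitting a model to all evaluations so far and then picking the point that optimizes an acquisition function, is the same in SMAC, which replaces the Gaussian process with a random-forest surrogate that can handle the categorical and conditional parameters arising in the applications above.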
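Both applications optimize over structured, conditional spaces: which hyperparameters are active depends on higher-level choices such as the selected classifier or the number of network layers. The following sketch samples from a toy conditional space of this kind; all algorithm names, hyperparameters, and ranges are made up for illustration and are far smaller than the 768- and 81-dimensional spaces mentioned above.

# Toy conditional search space (illustrative only): top-level categorical
# choices activate different subsets of hyperparameters, as in Auto-WEKA
# and the joint architecture/hyperparameter search for deep networks.
import random

def sample_configuration(rng):
    """Draw one point from a small conditional space."""
    config = {"classifier": rng.choice(["svm", "random_forest", "neural_net"])}
    if config["classifier"] == "svm":
        config["svm.C"] = 10 ** rng.uniform(-3, 3)            # log-scale range
        config["svm.kernel"] = rng.choice(["rbf", "linear"])
        if config["svm.kernel"] == "rbf":                      # nested condition
            config["svm.gamma"] = 10 ** rng.uniform(-4, 0)
    elif config["classifier"] == "random_forest":
        config["rf.n_trees"] = rng.randint(10, 500)
        config["rf.max_depth"] = rng.randint(2, 30)
    else:  # neural_net: architecture and hyperparameters chosen jointly
        config["nn.n_layers"] = rng.randint(1, 4)
        for layer in range(config["nn.n_layers"]):             # per-layer parameters
            config[f"nn.layer{layer}.units"] = rng.randint(16, 1024)
        config["nn.learning_rate"] = 10 ** rng.uniform(-5, -1)
    return config

rng = random.Random(0)
for _ in range(3):
    print(sample_configuration(rng))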
To stimulate discussion, I will finish by highlighting several further opportunities for combining meta-learning and Bayesian optimization:

• Prediction of learning curves [3],
• Learning the importance of hyperparameters (and of meta-features) [4, 5], and
• Using meta-features to generalize hyperparameter performance across datasets [1, 8], providing a prior for Bayesian optimization.

Based on joint work with Tobias Domhan, Holger Hoos, Kevin Leyton-Brown, Jost Tobias Springenberg, and Chris Thornton.

1 University of Freiburg, Germany. Email: fh@cs.uni-freiburg.de.

REFERENCES

[1] R. Bardenet, M. Brendel, B. Kégl, and M. Sebag, 'Collaborative hyperparameter tuning', in Proc. of ICML, (2013).
[2] E. Brochu, V. M. Cora, and N. de Freitas, 'A tutorial on Bayesian optimization of expensive cost functions, with application to active user modeling and hierarchical reinforcement learning', CoRR, abs/1012.2599, (2010).
[3] T. Domhan, T. Springenberg, and F. Hutter, 'Extrapolating learning curves of deep neural networks', in ICML 2014 AutoML Workshop, (June 2014).
[4] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Identifying key algorithm parameters and instance features using forward selection', in Learning and Intelligent Optimization, pp. 364–381, (2013).
[5] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'An efficient approach for assessing hyperparameter importance', in Proc. of ICML, (2014).
[6] F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Sequential model-based optimization for general algorithm configuration', in Proc. of LION-5, (2011).
[7] C. Thornton, F. Hutter, H. H. Hoos, and K. Leyton-Brown, 'Auto-WEKA: combined selection and hyperparameter optimization of classification algorithms', in Proc. of KDD'13, (2013).
[8] D. Yogatama and G. Mann, 'Efficient transfer learning method for automatic hyperparameter tuning', in Proc. of AISTATS, (2014).