<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Bayesian Optimization for More Automatic Machine Learning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frank Hutter</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Based on joint work with Tobias Domhan</institution>
          ,
          <addr-line>Holger Hoos, Kevin Leyton-Brown, Jost Tobias Springenberg, and Chris Thornton</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2014</year>
      </pub-date>
      <volume>381</volume>
      <issue>2013</issue>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <p>Bayesian optimization (see, e.g., [2]) is a framework for the
optimization of expensive blackbox functions that combines prior
assumptions about the shape of a function with evidence gathered by
evaluating the function at various points. In this talk, I will briefly
describe the basics of Bayesian optimization and how to scale it up to
handle structured high-dimensional optimization problems in the
sequential model-based algorithm configuration framework SMAC [6].</p>
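      <p>As a concrete illustration of this framework, the basic loop alternates between fitting a probabilistic surrogate model to the evaluations gathered so far and maximizing an acquisition function, such as expected improvement, to select the next point to evaluate. The following is a minimal sketch only, not SMAC itself, which uses a random-forest model so that categorical and conditional parameters can be handled:</p>
      <preformat>
# Minimal sketch of a Bayesian optimization loop for an expensive blackbox
# function f on the unit hypercube: Gaussian-process surrogate plus
# expected improvement.  SMAC replaces the GP with a random forest.
import numpy as np
from scipy.stats import norm
from sklearn.gaussian_process import GaussianProcessRegressor

def expected_improvement(model, X_cand, y_best):
    """EI of candidate points X_cand given the incumbent value y_best."""
    mu, sigma = model.predict(X_cand, return_std=True)
    sigma = np.maximum(sigma, 1e-9)
    z = (y_best - mu) / sigma
    return (y_best - mu) * norm.cdf(z) + sigma * norm.pdf(z)

def bayes_opt(f, dim, n_init=5, n_iter=25, seed=0):
    rng = np.random.default_rng(seed)
    X = rng.random((n_init, dim))            # initial design
    y = np.array([f(x) for x in X])
    for _ in range(n_iter):
        model = GaussianProcessRegressor(normalize_y=True).fit(X, y)
        cand = rng.random((1000, dim))       # random candidate pool
        ei = expected_improvement(model, cand, y.min())
        x_next = cand[np.argmax(ei)]         # maximize the acquisition
        X = np.vstack([X, x_next])
        y = np.append(y, f(x_next))
    return X[np.argmin(y)], y.min()
      </preformat>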
      <p>Then, I will discuss applications of SMAC to two structured
high-dimensional optimization problems from the growing field of
automatic machine learning:</p>
      <p>Feature selection, learning algorithm selection, and optimization
of the selected algorithm's hyperparameters are crucial for achieving good
performance in practical applications of machine learning. We demonstrate
that a combined optimization over all of these choices can be carried
out effectively by formulating the problem of finding a good
instantiation of the popular WEKA framework as a 768-dimensional
optimization problem. The resulting Auto-WEKA framework [7]
allows non-experts with some available compute time to achieve
state-of-the-art learning performance at the push of a button.</p>
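      <p>The key idea behind this combined formulation can be sketched as follows. This is an illustrative toy version only: the algorithm names, hyperparameter ranges, and the use of random search in place of SMAC are assumptions made for brevity. The choice of learning algorithm becomes a top-level categorical parameter, and each algorithm's hyperparameters are conditional on that choice, so the whole problem collapses into a single structured blackbox objective:</p>
      <preformat>
# Illustrative sketch (not Auto-WEKA itself) of combined algorithm selection
# and hyperparameter optimization: one structured configuration is sampled
# and evaluated as a blackbox.  Names and ranges are assumptions.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC

def sample_configuration(rng):
    """Draw one point of the joint (algorithm, hyperparameters) space."""
    algo = rng.choice(["svm", "random_forest"])
    if algo == "svm":
        # these hyperparameters are active only when the SVM branch is chosen
        return algo, {"C": 10.0 ** rng.uniform(-3, 3),
                      "gamma": 10.0 ** rng.uniform(-4, 1)}
    return algo, {"n_estimators": int(rng.integers(10, 500)),
                  "max_depth": int(rng.integers(2, 20))}

def evaluate(algo, params, X, y):
    """Blackbox objective: cross-validated error of the chosen configuration."""
    model = SVC(**params) if algo == "svm" else RandomForestClassifier(**params)
    return 1.0 - cross_val_score(model, X, y, cv=3).mean()

def random_search_cash(X, y, n_trials=50, seed=0):
    """Random search stands in for SMAC to keep the sketch self-contained."""
    rng = np.random.default_rng(seed)
    trials = [sample_configuration(rng) for _ in range(n_trials)]
    errors = [evaluate(a, p, X, y) for a, p in trials]
    return trials[int(np.argmin(errors))], min(errors)
      </preformat>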
      <p>Deep learning has celebrated many recent successes, but its
performance is known to be very sensitive to architectural choices and
hyperparameter settings; so far, its potential could therefore only
be unleashed by deep learning experts. We formulated the
combined problem of selecting the right neural network architecture
and its associated hyperparameters as an 81-dimensional
optimization problem and showed that an automated procedure could find
a network whose performance exceeded the previous
state-of-the-art achieved by human domain experts using the same building
blocks [3]. Computational time remains a challenge, but this
result is a step towards deep learning for non-experts.</p>
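      <p>The encoding of the neural network search space follows the same pattern. The sketch below uses assumed parameter names and ranges rather than the exact 81-dimensional space of [3]; it shows how per-layer hyperparameters become active only when the sampled number of layers makes them relevant:</p>
      <preformat>
# Sketch of flattening a neural-network architecture and its hyperparameters
# into one conditional configuration, in the spirit of the 81-dimensional
# space mentioned above (parameters here are illustrative assumptions).
import numpy as np

MAX_LAYERS = 4  # per-layer parameters beyond n_layers are simply inactive

def sample_network_config(rng):
    cfg = {
        "learning_rate": 10.0 ** rng.uniform(-5, -1),
        "momentum": rng.uniform(0.0, 0.99),
        "dropout_input": rng.uniform(0.0, 0.5),
        "n_layers": int(rng.integers(1, MAX_LAYERS + 1)),
    }
    # conditional block: only the first n_layers per-layer entries matter
    for i in range(cfg["n_layers"]):
        cfg[f"units_layer_{i}"] = int(2 ** rng.integers(5, 12))
        cfg[f"dropout_layer_{i}"] = rng.uniform(0.0, 0.7)
    return cfg
      </preformat>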
      <p>To stimulate discussion, I will finish by highlighting several
further opportunities for combining meta-learning and Bayesian
optimization:</p>
      <list list-type="bullet">
        <list-item><p>Prediction of learning curves [3] (see the sketch below),</p></list-item>
        <list-item><p>Learning the importance of hyperparameters (and of meta-features) [4, 5], and</p></list-item>
        <list-item><p>Using meta-features to generalize hyperparameter performance across datasets [1, 8], providing a prior for Bayesian optimization.</p></list-item>
      </list>
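      <p>For the first of these opportunities, a minimal sketch of learning curve prediction fits a parametric curve to the first epochs of a run and extrapolates it, so that unpromising hyperparameter settings can be terminated early. A single "pow3" functional form is used here for illustration; the probabilistic model ensemble of [3] is considerably more involved:</p>
      <preformat>
# Sketch of learning-curve extrapolation: fit a saturating power law to the
# observed partial curve and predict the accuracy at a later epoch.
import numpy as np
from scipy.optimize import curve_fit

def pow3(t, c, a, alpha):
    """Saturating power law: accuracy approaches c as the epoch t grows."""
    return c - a * np.power(t, -alpha)

def extrapolate_accuracy(observed_acc, horizon):
    """Fit pow3 to the observed partial curve and predict accuracy at horizon."""
    t = np.arange(1, len(observed_acc) + 1, dtype=float)
    params, _ = curve_fit(pow3, t, observed_acc,
                          p0=(0.9, 0.5, 0.5), maxfev=10000)
    return pow3(float(horizon), *params)

# example: decide whether to keep training after 10 of 100 epochs
partial = [0.52, 0.60, 0.65, 0.68, 0.70, 0.715, 0.725, 0.73, 0.735, 0.74]
predicted_final = extrapolate_accuracy(partial, horizon=100)
      </preformat>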
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>