<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Extensive Checklist for Building AutoML Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Thiloshon Nagarajah</string-name>
          <email>thiloshon@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Guhanathan Poravi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Informatics Institute of Technology</institution>
          ,
          <addr-line>Ramakrishna Road, Colombo 6</addr-line>
          ,
          <country country="LK">Sri Lanka</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Westminster</institution>
          ,
          <addr-line>New Cavendish Street, London</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Automated Machine Learning is a research area which has gained a lot of focus in the recent past. However, the components required to build an autoML system are neither properly documented nor very clear, owing to the differences among, and the recentness of, existing studies. If the required steps are analyzed and brought under a common survey, it will assist ongoing research. This paper presents an analysis of the components and technologies in the domains of autoML, hyperparameter tuning and meta-learning, and presents a checklist of steps to follow while building an AutoML system. This paper is part of an ongoing research effort, and the findings presented will assist in developing a novel architecture for an autoML system.</p>
      </abstract>
      <kwd-group>
        <kwd>AutoML</kwd>
        <kwd>Hyperparameter</kwd>
        <kwd>Meta-learning</kwd>
        <kwd>Algorithm-Selection</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>AutoML</title>
      <p>
        The umbrella term AutoML coined from ‘Automated Machine Learning’ [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] refers to
the large scale automation of a wide spectrum of the machine learning process beyond
the traditional model-creation such as data pre-processing, meta-learning [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2–5</xref>
        ], feature
learning, model searching, hyperparameter optimization [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], training [
        <xref ref-type="bibr" rid="ref7 ref8 ref9">7–9</xref>
        ], workflows
generation [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9–12</xref>
        ], data acquisition and reporting. These black-box learning machines
gained popularity after ChaLearn initiated its AutoML competitions [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] in 2015. Started
as a ‘benchmark for automated machine learning systems that can be operated without
any human intervention’, the challenge focused on automating hyperparameter tuning
and model selection for classification tasks.
      </p>
      <p>
        Even though many promising systems emerged from these competitions,
and we have recently been introduced to commercial-grade AutoML systems by
Google [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and H2O.ai [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], the majority of the concepts and research are in very
early stages [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Researchers have used a variety of statistical techniques such as
regularization, Bayesian priors [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], Minimum Description Length (MDL), Structural Risk
Minimization (SRM) and genetic programming, while further research is required to find
the best-suited techniques that are generic and work consistently. In this paper, the
different components required to build an autoML system and the technologies
available to develop them are explored.
      </p>
      <p>In Section 2, we present the methodology we used to conduct this survey and collect
data. Section 3 covers the architecture designs proposed by researchers. Section 4
covers the preprocessing techniques that can be automated. Section 5 deals with algorithm
selection and meta-learning methods to find the best candidate algorithms. Section 6 covers
hyperparameter optimization techniques used in this domain and their reviews. Section
7 covers how to automate the evaluation of models to choose the best one. Section 8
deals with how to benchmark the developed autoML system, and the last section
provides the conclusion.</p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>We started gaining domain knowledge with a literature survey in the domain of autoML
systems. We came across 48 such primary studies and identified the different
approaches used for autoML in the available work. We shortlisted the techniques used by the
available work to achieve hyperparameter optimization, meta-learning and
algorithm selection. These techniques were selected for discussion in this paper
based on their frequency of use.</p>
    </sec>
    <sec id="sec-3">
      <title>AutoML Architecture</title>
      <p>
        Throughout research in the autoML domain, the final goal has always been to
automate the entire machine learning pipeline. However, this has proved to be a
difficult task as a whole. Thus, several works have been conducted on automating at least
some part of machine learning, with the intention of putting all the pieces together in the
end. Liu [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] recognized this limitation and proposed two categories to differentiate
the work.
1. Narrow AutoML deals with partial automation of, or a concentration on, parts of the
pipeline, and is mainly fueled by commercial needs.
2. Generalized AutoML aims to automate the entire process, which would lead the way to
Artificial General Intelligence, and is predominantly seen in academic research.
      </p>
      <p>
        According to him, even though most of the available work is narrow autoML, it is
essential to achieve pivotal progress in generalized autoML. The same concept has
been covered by Guyon et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] as well, who termed these two categories
semi-automated and fully-automated autoML. With this understanding, more focus is given
to generalized autoML in this paper.
      </p>
      <p>
        An autoML system will need to automate all the parts of machine learning in its
architecture. Das and Cakmak [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] came up with a requirement list that reflects this. It
has the moving and interconnected components of machine learning that need to be
automated, including feature preprocessing, feature selection, model selection and
hyperparameter optimization. They defined all the components required to achieve full
automation as follows,
      </p>
      <sec id="sec-3-1">
        <title>Phases: Data Sources, Data Processing, Model Training, Deployment</title>
        <p>Though this covers all major aspects of machine learning, a few important steps such as
train/test splitting, model ensembling and reporting are missing from this list. Most of the
architectures proposed in the autoML domain (AutoWeka, Hyperopt-Sklearn,
AutoSklearn, TPOT, H2O.ai) contain at least some subset of these components. Olson et al
proposed an architecture [48] where these systems are arranged into different engines. In
addition to visualization and graph engines that offer insights not covered in
the above list, they also added a ‘Human Engine’, which takes human input for
maintenance of the autoML system. With this understanding of the architecture, let us analyze how
each component can be developed.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Automated preprocessing</title>
      <p>Raw data used in machine learning is often unclean, skewed and noisy. Cleaning data
and transforming features have been proven to improve the accuracy of machine learning
systems substantially. Thus the first step of an AutoML system is data preprocessing. The
following section covers some important preprocessing steps that can be automated, at
least to a certain extent.</p>
      <p>Data transformation. For numerical data, some of the common processes to automate
are:
- scaling: standardization and normalization
- missing-value imputation: using a global constant, using the mean / median, using an
indicator variable, predicting the most probable value, or simply removing the record
- outlier detection: univariate - interquartile range and filtering, ‘Winsorizing’ or
trimming; multivariate - one-class SVM, Local Outlier Factor (LOF) and isolation forest
- binning: equal-width binning, equal-frequency binning; log and power
transformations
- identifier detection</p>
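      <p>As an illustration, a few of these numerical steps can be sketched in plain Python; the function names and the 1.5-IQR cutoff are our own choices for the sketch, not taken from any particular autoML system:</p>

```python
import statistics

def standardize(xs):
    """Scale values to zero mean and unit variance (z-scores)."""
    mu, sigma = statistics.mean(xs), statistics.pstdev(xs)
    return [(x - mu) / sigma for x in xs] if sigma else [0.0] * len(xs)

def impute_median(xs):
    """Replace missing values (None) with the median of the observed values."""
    med = statistics.median([x for x in xs if x is not None])
    return [med if x is None else x for x in xs]

def iqr_outliers(xs, k=1.5):
    """Flag univariate outliers outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    spread = k * (q3 - q1)
    return [x for x in xs if x < q1 - spread or x > q3 + spread]

col = [5.0, None, 7.0, 6.0, 120.0]   # toy column: one gap, one extreme value
filled = impute_median(col)          # None becomes the median, 6.5
print(iqr_outliers(filled))          # → [120.0]
```

      <p>An automated system would apply such transforms per column and record which were used, so the same pipeline can be replayed at prediction time.</p>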
      <p>For categorical variables, some of the common processes are:
- encoding: label encoding, one-hot encoding, frequency-based encoding, target
mean encoding, binary encoding, hash encoding
- replacing missing values with the mode</p>
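      <p>Two of these categorical steps, mode imputation followed by one-hot encoding, can be sketched as follows (a toy sketch, not a reference implementation):</p>

```python
def impute_mode(values):
    """Replace missing categorical values (None) with the mode."""
    observed = [v for v in values if v is not None]
    mode = max(set(observed), key=observed.count)
    return [mode if v is None else v for v in values]

def one_hot(values):
    """One-hot encode a categorical column into 0/1 indicator columns,
    one column per category in sorted order."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

col = ["red", None, "blue", "red"]
print(one_hot(impute_mode(col)))     # → [[0, 1], [0, 1], [1, 0], [0, 1]]
```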
      <p>Additionally, for other data types such as text or video, several other preprocessing
techniques like tokenization, normalization or substitution are available. An interesting
point to note is that tree-based supervised models such as random forests are able to
handle feature or data abnormalities by themselves, whereas non-tree-based supervised
learning algorithms are much more sensitive to abnormalities.</p>
      <p>Feature selection. Some of the feature selection steps that can be performed before creating a
model are as follows:
- identifying highly correlated variables and treating them
- excluding features with low variance, or univariate feature selection
- recursive feature elimination: measuring information gain for the available set of
features and choosing the top N features accordingly
- dimensionality reduction with PCA: transforming the data in the high-dimensional
space to a space of fewer dimensions</p>
      <p>And, if we are to do feature selection after creating a baseline model:
- using linear regression and selecting variables based on p-values
- using stepwise selection for linear regression and selecting the important variables
- feature selection using random forest: fitting a random forest and selecting the top N
important variables
- feature generation: it is also possible to generate new features from the intrinsic
data with techniques such as numerical feature generation, pairwise feature creation,
categorical feature creation, temporal feature creation, etc.</p>
      <p>These preprocessing steps can improve the efficiency and accuracy of the subsequent
machine learning workflows. Whether to apply several of them can be decided
based on simple statistical metrics. For example, multi-collinearity between features
can be found with Pearson’s correlation coefficient, and the affected variables can be treated
with stepwise regression or principal component analysis. Such automatic
decision-execution pairs can help build a powerful autoML system.</p>
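      <p>One such decision-execution pair, detecting collinear features with Pearson’s correlation coefficient and then dropping one feature of each offending pair, might look like the sketch below; the 0.95 threshold is an illustrative choice:</p>

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length columns."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

def drop_collinear(columns, threshold=0.95):
    """Decision-execution pair: if |r| exceeds the threshold for a pair of
    features, keep only the first feature of the pair."""
    names = list(columns)
    dropped = set()
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if a not in dropped and b not in dropped:
                if abs(pearson(columns[a], columns[b])) > threshold:
                    dropped.add(b)
    return {n: columns[n] for n in names if n not in dropped}

cols = {"x": [1, 2, 3, 4], "x2": [2, 4, 6, 8], "y": [4, 1, 3, 2]}
print(list(drop_collinear(cols)))    # → ['x', 'y']
```

      <p>An alternative treatment, as noted above, is projecting the correlated group with PCA instead of dropping a feature.</p>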
    </sec>
    <sec id="sec-5">
      <title>Automated Algorithm Selection and Model Initiation</title>
      <p>The next step is to automatically find candidate algorithms suitable for the dataset. The
categories of machine learning algorithms that need to be considered are as follows.</p>
      <sec id="sec-5-1">
        <title>Machine Learning</title>
        <p>
          There are other aspects data scientists worry about in algorithm selection, such as
computational complexity, differences in training and scoring time, linearity versus
nonlinearity, etc., and it is useful if these are considered while automating. The main
quantitative techniques in this paradigm can be categorized as rules-based and
meta-learning. The following section discusses these techniques.
In rules-based systems, we try to mimic how a data scientist manually performs algorithm
selection, which is a mixture of initial exploration of the dataset and their experience.
Certain characteristics of the dataset and the domain of the dataset can suggest
possible candidate algorithms to build machine learning experiments with. A rules
system can be implemented to reflect these characteristics with the help of the many cheat
sheets of algorithms publicly available on the internet. For example, the Scikit-learn Python
package has a map [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] to help its users find the best classifier available in the package.
Rules-Based Machine Learning is a “method that identifies, learns, or evolves 'rules' to
store, manipulate or apply” [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] for decision-making mechanisms. A set of rules in the
format {IF 'condition' THEN 'result'} makes up this rules system or knowledge base. In the
autoML space, characteristics can be modelled as conditions and algorithms as results.
For example, in the Python language, Skope-rules can be used to perform RBML. Under RBML,
Learning Classifier Systems [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] use a genetic algorithm as the discovery component and
usual machine learning classifiers as the learning component.
        </p>
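        <p>A minimal sketch of such a rules system follows; the conditions and suggested algorithm names are illustrative assumptions loosely modelled on public cheat sheets, not rules taken from any cited system:</p>

```python
# Each rule: (predicate over dataset characteristics, suggested algorithms).
RULES = [
    (lambda d: d["task"] == "classification" and d["n_rows"] < 100_000,
     ["LinearSVC", "KNeighborsClassifier"]),
    (lambda d: d["task"] == "classification" and d["n_rows"] >= 100_000,
     ["SGDClassifier"]),
    (lambda d: d["task"] == "regression" and d["n_features"] > d["n_rows"],
     ["Lasso", "ElasticNet"]),
    (lambda d: d["task"] == "regression",
     ["Ridge", "RandomForestRegressor"]),
]

def suggest(dataset_profile):
    """Return the candidate algorithms of the first matching IF-THEN rule."""
    for condition, result in RULES:
        if condition(dataset_profile):
            return result
    return ["RandomForest"]          # fallback when no rule fires

profile = {"task": "classification", "n_rows": 5_000, "n_features": 20}
print(suggest(profile))              # → ['LinearSVC', 'KNeighborsClassifier']
```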
        <p>
          Case-Based Reasoning [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ] solves new problems based on the solutions of similar past
problems. It follows a four-step method of retrieving, reusing, revising and
retaining. Rule induction denotes the general area of ML where formal rules are
extracted from a set of observations. In the autoML landscape, all of these techniques
can help build a rules system that suggests algorithms based on the dataset properties
and the domain the data is from.
With meta-learning (‘learning to learn’) [
          <xref ref-type="bibr" rid="ref23 ref24">23, 24</xref>
          ], we try to gain
insights from the metadata of machine learning experiments. The result of each model
training is stored along with its dataset and performance details and used in future
runs. There has been substantial interest in the meta-learning space in the recent past,
and many autoML systems (TPOT, Auto-Sklearn) have integrated it.
The first step in any meta-learning solution is creating a meta-database. OpenML [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ]
is one such prominent database available now. These databases contain information
about datasets, such as the number of features, number of records, correlation of features and
number of missing values; information about models, such as algorithms and
hyperparameter spaces; and performance information, such as running time and accuracy. This
metadata can be learned with machine learning algorithms, and the best algorithm settings
can be predicted for new datasets [
          <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
          ]. It can also be used to suggest the initial
hyperparameter settings with which to start modelling. Feurer et al. [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] in their research gathered
140 datasets from the OpenML repository and created instantiation settings based on
meta-features, using Bayesian optimization (SMAC with cross-validation) that checks
empirical performance for each dataset. The meta-features of a new dataset are then extracted and
compared, using L1 distances, with the offline datasets gathered before, to choose the 25 nearest
datasets and their parameter spaces. Optimization is then done starting from these
initializations to get high model accuracies.
        </p>
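        <p>That warm-starting scheme can be sketched as below; the meta-database entries and the choice of meta-features are invented for illustration, and in practice the meta-features would be normalized before computing distances:</p>

```python
# Hypothetical meta-database: meta-feature vectors of past datasets
# (rows, features, missing-value ratio) with the hyperparameter settings
# that performed best on each of them.
META_DB = [
    {"meta": [1000, 20, 0.05], "best_params": {"C": 1.0}},
    {"meta": [50000, 300, 0.30], "best_params": {"C": 0.01}},
    {"meta": [1200, 25, 0.02], "best_params": {"C": 2.0}},
]

def l1(a, b):
    """L1 (Manhattan) distance between two meta-feature vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))

def warm_start(new_meta, k=2):
    """Pick the k nearest past datasets by L1 distance on meta-features and
    return their best settings as initializations for the optimizer."""
    ranked = sorted(META_DB, key=lambda e: l1(e["meta"], new_meta))
    return [e["best_params"] for e in ranked[:k]]

print(warm_start([1100, 22, 0.04]))  # → [{'C': 1.0}, {'C': 2.0}]
```

        <p>Hyperparameter optimization would then start from these settings rather than from a cold start.</p>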
        <sec id="sec-5-1-1">
          <title>Graphical Methods</title>
          <p>Algorithm selection is almost always supported by visualizing the dataset with graphical
methods. These Exploratory Data Analysis (EDA) methods can be used in conjunction
with other statistical methods. They help one understand data beyond the statistical
modelling or hypothesis-testing procedures. The only issue with these methods is that
they cannot be automated; they rely solely on the supervision of a human.
However, offering these in the AutoML system as part of the configuration step or
report-generation step will give additional insight to the user. The following table gives
an overview of such exploratory methods.</p>
        </sec>
        <table-wrap id="tbl-exploratory">
          <table>
            <thead>
              <tr>
                <th>Method</th>
                <th>Description</th>
                <th>Used To</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>Ordination</td>
                <td>Mainly used in data clustering; groups similar multivariate objects near each other and dissimilar objects farther apart. The most common ordination technique is Principal Component Analysis (PCA).</td>
                <td>Very common and widely used.</td>
              </tr>
              <tr>
                <td>Median polish</td>
                <td>Uses the medians of the rows and columns to iteratively fit a model to the data.</td>
                <td>Not sensitive to outliers, but a very simple method.</td>
              </tr>
              <tr>
                <td>Box plot</td>
                <td>Used on numerical data to depict quartiles and variability outside the quartiles.</td>
                <td>Find spread, skewness and outliers in the data.</td>
              </tr>
              <tr>
                <td>Histogram</td>
                <td>Used on numerical data to depict the distribution of the data. It represents the probability distribution of a continuous variable, but is limited to a single variable per graph.</td>
                <td>Find the density of distributions; one of the seven tools of quality control [<xref ref-type="bibr" rid="ref28">28</xref>].</td>
              </tr>
              <tr>
                <td>Run chart</td>
                <td>Used on time series data to display data in time sequence; a univariate graphical method.</td>
                <td>Validate univariate data assumptions and find anomalies / outliers over time.</td>
              </tr>
              <tr>
                <td>Scatter plot</td>
                <td>Drawn in Cartesian coordinates to compare two variables of the data. It is possible to add more dimensions in terms of colour codes or point shapes.</td>
                <td>Find correlation in data; one of the seven tools of quality control.</td>
              </tr>
              <tr>
                <td>Parallel coordinates</td>
                <td>Used to visualize high-dimensional geometry and multivariate data. Closely related to time series graphs, but has no time variable and thus no natural order.</td>
                <td>Find relationships between dimensions.</td>
              </tr>
              <tr>
                <td>Targeted projection pursuit</td>
                <td>Used on very complex data to find features or patterns of interest.</td>
                <td>Find ‘interestingness’ in data; also used as a feature selection method.</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
      <sec id="sec-5-9">
        <title>Model Initiation</title>
        <p>Now that we have analyzed several algorithm selection methods, the following table gives a
summary of these techniques grouped under the three types.
After algorithm selection, the machine learning model and the features it will learn from are
customized. There are a few quantitative methods that data scientists use manually and that can
be automated, as discussed below.</p>
        <sec id="sec-5-9-1">
          <title>Information Criteria</title>
          <p>
            The Akaike information criterion (AIC) / Watanabe–AIC (WAIC) calculates the relative
quality of models for a dataset compared to other models, and thus can be used for
model selection [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ]. It uses the amount of information lost by the model as the
measure of quality. It is very common and widely used.
          </p>
          <p>
            The Bayesian information criterion (BIC) [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ] is very similar to AIC but is based on the
likelihood function. The model with the smaller BIC value is considered the best. BIC cannot
handle high-dimensional model selection tasks and is at times less effective than AIC [
            <xref ref-type="bibr" rid="ref29">29</xref>
            ].
The focused information criterion (FIC) is yet another method for selecting the best model
among possible competitors; the model with the best estimated precision is chosen. Unlike
AIC or BIC, FIC [
            <xref ref-type="bibr" rid="ref30">30</xref>
            ] doesn’t measure the overall fitness of models but rather focuses on the parameter
of primary interest, which gives different estimates across the candidates.
          </p>
          <p>Mallows's Cp calculates the fit of regression models: the model with the best subset of
predictors, among the predictor variables available for some outcome, is chosen. A smaller
value of Cp is considered more precise. Cp only works well with large sample sizes
and can’t handle complex collections of models.</p>
          <p>
            Stepwise Regression (SR) adds each feature in the dataset incrementally and finds the
accuracy of the resulting models [
            <xref ref-type="bibr" rid="ref33">33</xref>
            ]. By following this for every feature, it chooses the set of
features that increases the accuracy of the models and removes the others. This can be used as
a feature selection mechanism. Typically, AIC, BIC, FDR or Mallows's Cp is used as
the selection criterion. Stepwise regression is often criticized as data dredging and as
biased because it works on the data itself, and ensembles are often favored over it.
          </p>
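          <p>As a worked sketch for least-squares models, AIC and BIC can be computed from the residual sum of squares; the two candidate models below use hypothetical numbers:</p>

```python
from math import log

def aic(n, rss, k):
    """AIC for a least-squares fit: n*ln(RSS/n) + 2k,
    where k is the number of estimated parameters."""
    return n * log(rss / n) + 2 * k

def bic(n, rss, k):
    """BIC replaces AIC's 2k penalty with k*ln(n)."""
    return n * log(rss / n) + k * log(n)

# Hypothetical candidates fitted on n = 100 points: the second model adds
# five parameters for only a small drop in residual sum of squares.
candidates = {"simple": (100, 40.0, 3), "complex": (100, 38.5, 8)}
best = min(candidates, key=lambda m: aic(*candidates[m]))
print(best)                          # → simple
```

          <p>Both criteria penalize the extra parameters more than the small improvement in fit, so the simpler model is selected.</p>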
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Automated Hyperparameter optimization</title>
      <p>In the entire machine learning landscape there are two types of parameters:
- model parameters, which are learned by the algorithm while learning and thus do not
need to be automated
- hyperparameters, which need to be set before learning begins and thus need to
be automated</p>
      <p>Hyperparameter optimization has the objective of minimizing
the loss / cost of the algorithm, which in turn helps keep the balance between model
bias and variance. This is essential for getting a low cross-validation error at the end of
the experiment. While automating the machine learning process, it is also expected to
automate tuning or optimizing the hyperparameters that best fit the dataset. In the following
section, an analysis of hyperparameter optimization techniques is provided.</p>
      <sec id="sec-6-1">
        <title>Simple Search Approaches</title>
        <p>The most trivial techniques of hyperparameter tuning are grid search and randomized
search.</p>
        <p>Grid Search. Grid search expects a small set of values per hyperparameter as the parameter
space and tries all combinations of these values in a brute-force manner. The search is guided
by a metric, which is often the cross-validation error on the training data or an evaluation on
the test data. Grid search suffers from the curse of dimensionality: even with just two
hyperparameters and five distinct values for each, it requires twenty-five rounds of
modelling and evaluation. Besides, there is no feedback or adjustment
mechanism, so the algorithm is highly unintelligent.</p>
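        <p>The brute-force loop can be sketched as follows, with a toy quadratic loss standing in for cross-validation error:</p>

```python
from itertools import product

def grid_search(param_grid, evaluate):
    """Try every combination of hyperparameter values and return the
    combination with the lowest validation loss."""
    names = list(param_grid)
    best, best_loss = None, float("inf")
    for values in product(*(param_grid[n] for n in names)):
        params = dict(zip(names, values))
        loss = evaluate(params)      # e.g. cross-validation error
        if loss < best_loss:
            best, best_loss = params, loss
    return best, best_loss

# Toy loss with its minimum at C=1.0, gamma=0.1.
toy_loss = lambda p: (p["C"] - 1.0) ** 2 + (p["gamma"] - 0.1) ** 2
grid = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
print(grid_search(grid, toy_loss))   # 3 x 3 = 9 evaluations
```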
        <p>
          Random Search. Random search [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] is very similar to grid search and does pretty
much the same, but tries random combinations of hyperparameters. It has been shown to
outperform grid search, but it performs poorly in real cases as there is no adjustment or
feedback in the learning process based on the results of previous runs.
Because of the limitations of simple search approaches, a second family of techniques called
‘Sequential Model-based global Optimization’ (SMBO) emerged [
          <xref ref-type="bibr" rid="ref32 ref33">32, 33</xref>
          ].
        </p>
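        <p>Random search has the same shape, but samples combinations instead of enumerating them; note that there is still no feedback between iterations (the budget of 50 draws is an arbitrary choice):</p>

```python
import random

def random_search(param_space, evaluate, n_iter=50):
    """Evaluate randomly sampled hyperparameter combinations and keep
    the best one seen; no information flows between iterations."""
    best, best_loss = None, float("inf")
    for _ in range(n_iter):
        params = {name: random.choice(values)
                  for name, values in param_space.items()}
        loss = evaluate(params)
        if loss < best_loss:
            best, best_loss = params, loss
    return best, best_loss

random.seed(0)
space = {"C": [0.1, 1.0, 10.0], "gamma": [0.01, 0.1, 1.0]}
toy_loss = lambda p: (p["C"] - 1.0) ** 2 + (p["gamma"] - 0.1) ** 2
print(random_search(space, toy_loss))
```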
      </sec>
      <sec id="sec-6-2">
        <title>Heuristic Search</title>
      </sec>
      <sec id="sec-6-3">
        <title>Sequential Model-based Global Optimization</title>
        <p>In scenarios where the evaluation of the
fitness function is expensive, these model-based systems evaluate fitness
with a surrogate that is cheaper to compute [34]. Among other options, Expected
Improvement (EI) has turned out to be a good candidate for the acquisition criterion.
The concept is to use a probabilistic model such as a Gaussian process to choose good
hyperparameter values and then sequentially update the model based on results. This makes
use of the results of the previous iteration to find better hyperparameter values to try in
the next iteration and is thus considered smart.</p>
        <p>
          Bayesian-based hyperparameter optimization is one SMBO technique that is
widely used in autoML systems. Bayesian optimization has been shown to work much
better than the alternatives, as it is able to reason about the quality of runs before they
even start. Gaussian-process-based BO [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] (e.g. Spearmint [35]) has been shown to
perform better on low-dimensional data, and tree-based BO to perform better on
high-dimensional data [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ]. Within tree-based systems, the random-forest-based SMAC (Sequential
Model-based Algorithm Configuration) works better than another high-performing
system, TPE (Tree-structured Parzen Estimator). SMAC is also faster, as it uses
cross-validation fold-wise, removing poor parameter settings early in the optimization.
        </p>
        <p>Building on top of Bayesian optimization, there have been other advancements as
well. For example, the concept of meta-learning has been used to build the initial model
for BO to optimize. By referring to the metadata of hyperparameters and their
performance on past similar datasets, new parameter spaces are developed that are more
likely to fit well. Other than BO, there is also Random Online Aggressive Racing (ROAR).
In evolutionary optimization [34], evolutionary algorithms follow a process inspired by
the biological concept of evolution. First, a random hyperparameter population of as
many as one hundred members is created. These are evaluated to get their fitness values, and
based on these relative fitness values the parameters are ranked. The worst-performing tuples
of parameters are replaced by new ones generated through crossover and mutation. This is
repeated until the performance no longer improves. Though mainly used in deep learning
tasks, these methods have started to be used in typical machine learning as well.</p>
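        <p>The evolutionary loop described above can be sketched for a single hyperparameter in [0, 1]; the population size, the averaging crossover and the Gaussian mutation noise are illustrative choices for the sketch:</p>

```python
import random

def evolve(evaluate, n_pop=20, n_gen=30, mut=0.1):
    """Toy evolutionary search: rank the population by fitness, replace the
    worst half with mutated crossovers of the best half, repeat."""
    pop = [random.uniform(0, 1) for _ in range(n_pop)]
    for _ in range(n_gen):
        pop.sort(key=evaluate)               # lower loss = fitter
        elite = pop[: n_pop // 2]            # survivors
        children = []
        while len(children) < n_pop - len(elite):
            a, b = random.sample(elite, 2)
            child = (a + b) / 2              # crossover: average two parents
            child += random.gauss(0, mut)    # mutation: small noise
            children.append(min(1.0, max(0.0, child)))
        pop = elite + children
    return min(pop, key=evaluate)

random.seed(1)
toy_loss = lambda x: (x - 0.42) ** 2         # toy 1-D objective
best = evolve(toy_loss)
print(round(best, 2))
```

        <p>Real systems evolve whole parameter tuples, or even whole pipelines as TPOT does, but the rank-replace-mutate cycle is the same.</p>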
        <p>In recent times, there has been interest in developing techniques outside this
standard scope to achieve hyperparameter optimization. Genetic programming [36],
transfer learning [37] and reinforcement learning [38] are some of the techniques used.
Genetic programming is mainly used with neural networks and SVMs. Bandit-based
approaches have been developed which use small subsets of the data to find promising
settings to try on the complete data, making the process much more efficient.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Automated Evaluation</title>
      <p>Typically, a model evaluation can contain statistical methods as well as business
rules specific to the problem. While the statistical methods are general to all learning
problems, business rules are tailor-made to the question at hand. Even though
business rules can be skipped in an autoML system, it is essential to automate the statistical
evaluation techniques. Along with them, there can also be other processes, like refining
models, re-training and deployment, to be automated.</p>
      <p>The following table gives an overview of few evaluation metrics that is used while
training models.</p>
      <sec id="sec-7-1">
        <title>Description</title>
      </sec>
      <sec id="sec-7-2">
        <title>Advantage / Limitation Used if</title>
        <p>Hold-Out Validation: In a dataset with independently and identically distributed
(IID) records, a small subset of random records is held out for validation. Model
training is done with the large portion and the evaluation metrics are calculated with
the smaller potion. A common practice is to subset 20% as validation set.</p>
        <p>Very easy to subset, but since validation is done on If the dataset is big
the smaller subset, generalization error can be less reli- enough to break into
able and higher variance. subsets.</p>
      </sec>
      <sec id="sec-7-3">
        <title>Cross-Validation / Out-of-Sample Testing: In this method, dataset is first divided</title>
        <p>into k number of folds. Iteratively we consider each fold as the held-out validation
set and training is done on the rest of the folds. The overall performance is the
average of all k folds.</p>
        <p>Much better metric than hold out validation. Can even If the dataset is very
be used in hyperparameter tuning to calculate perfor- small or computer power
mance of tuples. is limited.</p>
        <p>The following table gives an overview of few evaluation metrics that is used while
testing models in regression tasks.
Root Mean Square Error (RMSE / MSE): The most commonly used metric. It is
defined as square root of the average squared distance between the actual score and
the predicted score. In other words, sample standard deviation between predicted and
observed values. It gives us the sense of how far the predictions were from actual
values. Lower the RMSE better the model.</p>
        <p>Very common and widely used. Always
Mean Absolute Error (MAE): Similar to RMSE, but the absolute values of
distances are taken. Thus all the individual differences are weighted equally which
makes it a linear score.</p>
        <p>Easy to interpret and understand than RMSE. More If interpretation is
imrobust to outliers. portant.</p>
        <p>R Squared (R2): R2 tells how the selected independent variables explain variability
in the dependent variables. Simply said, it tells how close the data are to the fitted
regression line, which is also known as coefficient of determination. The higher R
Squared means model fits well.</p>
        <p>Doesn’t necessarily tell if the model is bad or good. In exploratory analysis.</p>
        <p>Just gives the relationship.</p>
        <p>The following table gives an overview of a few evaluation metrics that are used while testing models in classification tasks.</p>
        <p>Accuracy: Tells how often the classifier makes a correct prediction. Calculated as the ratio of the number of correct predictions to the total number of predictions.</p>
        <p>Very simple and widely used. Use when a simple metric suffices.</p>
        <p>Confusion Matrix: Shows a detailed per-class breakdown of correct and incorrect classifications. Presented as a table of ground-truth labels versus predictions.</p>
        <p>More detailed than accuracy, and can thus diagnose issues in the dataset. Use when a per-class breakdown of the metric is needed.</p>
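        <p>Accuracy and the confusion matrix can be sketched as below (an illustrative example, not from the original text; the labels are hypothetical):</p>

```python
from collections import Counter

def accuracy(truth, preds):
    """Fraction of predictions that match the ground truth."""
    return sum(t == p for t, p in zip(truth, preds)) / len(truth)

def confusion_matrix(truth, preds):
    """Counts of (ground-truth label, predicted label) pairs."""
    return Counter(zip(truth, preds))

truth = ["cat", "cat", "dog", "dog", "dog"]
preds = ["cat", "dog", "dog", "dog", "cat"]
print(accuracy(truth, preds))  # 0.6
print(confusion_matrix(truth, preds))
```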
        <p>Logarithmic Loss: Used when the classifier outputs a numeric probability instead of a class label. It is considered a soft measurement because it captures how incorrect or how correct a prediction is, not just whether it is correct.</p>
        <p>More tolerant of confidence values. Use when the output is a numeric probability.</p>
        <p>Area Under the Curve (AUC): Shows the sensitivity of the classifier by plotting the rate of true positives against the rate of false positives. Used mainly in binary classification. The greater the AUC, the better the model.</p>
        <p>Harder to interpret. Use for binary classification.</p>
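        <p>The two probabilistic metrics above admit a compact sketch; the following is illustrative (not from the original text), with AUC computed via the rank-based formulation rather than by tracing the ROC curve:</p>

```python
import math

def log_loss(truth, probs, eps=1e-15):
    """Binary log loss: penalises confident wrong predictions heavily."""
    total = 0.0
    for y, p in zip(truth, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(truth)

def auc(truth, probs):
    """Rank-based AUC: probability a random positive scores above a random negative."""
    pos = [p for y, p in zip(truth, probs) if y == 1]
    neg = [p for y, p in zip(truth, probs) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

truth = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.6, 0.4]
print(auc(truth, probs))  # 1.0 (every positive outranks every negative)
print(log_loss(truth, probs))
```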
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Benchmarking AutoML systems</title>
      <p>Once an autoML system is developed, it has to be benchmarked against existing
systems and manual procedures. Though benchmarking is not a component of the system itself
and does not need to be automated, it helps evaluate the built system. The following
section discusses three research groups and their benchmarking methods.</p>
      <p>Thornton et al. [39], in their system Auto-WEKA, used 21 prominent benchmark
datasets, including 15 from the UCI repository, with 70%-30% train-test splitting. Intel
Xeon X5650 six-core processors at 2.66 GHz were used, with a RAM limit of 3 GB for
classification datasets, to mimic a typical data scientist's setting. Bootstrap sampling and
cross-validation were used to choose the best setting.</p>
      <p>
        Feurer [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] in his project used 140 binary and multiclass classification datasets, each with more
than 1000 data points, from OpenML. These datasets were of varied types such as text, digit,
gene, telescope and advertisement data. He used the balanced classification error rate instead of the
standard classification error since the datasets had imbalanced class distributions.
Trainings were done under multiple controlled schemes: with and without meta learning,
and with and without ensembles. When testing a dataset, meta-data of only the 139 other
datasets was used, according to the leave-one-dataset-out method.
      </p>
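      <p>The leave-one-dataset-out protocol mentioned above can be sketched as follows (an illustrative example, not from the original text; the dataset names and the evaluate callback are hypothetical):</p>

```python
def leave_one_dataset_out(datasets, evaluate):
    """For each dataset, evaluate using meta-data from all *other* datasets only,
    so the dataset under test never informs its own warm-start."""
    results = {}
    for name in datasets:
        others = {k: v for k, v in datasets.items() if k != name}
        results[name] = evaluate(name, others)
    return results

# toy check: each evaluation sees every dataset except the one under test
meta = {"iris": 1, "wine": 2, "digits": 3}
seen = leave_one_dataset_out(meta, lambda name, others: sorted(others))
print(seen["iris"])  # ['digits', 'wine']
```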
      <p>Balaji and Allen [40] in their benchmarking project used mean squared error and weighted
F1 score for regression and classification tasks respectively. They chose 57 classification and 30
regression OpenML datasets. As the process was computationally extensive, taking as much as
10,440 compute hours, they opted for an Amazon Web Services distributed setup.
The average of several pairwise comparisons with different seed values was taken
as the final performance score.</p>
    </sec>
    <sec id="sec-9">
      <title>Future Avenues and Conclusion</title>
      <p>This paper presented the steps to build an automated machine learning system, starting
from data preprocessing and ending at model deployment. The different methods and technologies
available to develop these systems were also reviewed. From the
findings, it is clear that the algorithm selection and feature preprocessing components require
more refinement from the research community, while the hyperparameter tuning and meta
learning spaces are being explored actively. Technologies and statistical concepts still
unexplored in autoML systems will make up the majority of future efforts, while the
knowledge of previous efforts needs to be accumulated into knowledge hubs or meta
databases. So far most of the research is in the Python and Java languages, while
statistical languages like R could open up more options to explore. Since executing
multiple model trainings can demand high computational power, autoML systems need to
actively explore distributed task-offloading mechanisms.</p>
      <p>With this knowledge of the autoML components and how they can be improved, we
hope to come up with an architectural style in the near future, towards an efficient
automated machine learning system.</p>
      <p>34. Bergstra, J., Bardenet, R., Bengio, Y., Kegl, B.: Algorithms for Hyper-Parameter Optimization. Curran Associates Inc., USA. 2546-2554 (2011).</p>
      <p>35. Rasmussen, C.E., Williams, C.K.I.: Gaussian Processes for Machine Learning. MIT Press, Cambridge, Mass (2006).</p>
      <p>36. Paris, L.: Genetic Programming/Auto-ML for One-Shot Learning. 5 (2018).</p>
      <p>37. Quanming, Y., Mengshuo, W., Hugo, J.E., Isabelle, G., Yi-Qi, H., Yu-Feng, L., Wei-Wei, T., Qiang, Y., Yang, Y.: Taking Human out of Learning Applications: A Survey on Automated Machine Learning. arXiv:1810.13306 [cs, stat]. (2018).</p>
      <p>38. Li, Y.-F., Wang, H., Wei, T., Tu, W.-W.: Towards Automated Semi-Supervised Learning. Association for the Advancement of Artificial Intelligence Conference on Artificial Intelligence (AAAI-19). 33, 8 (2019).</p>
      <p>39. Thornton, C., Hutter, F., Hoos, H.H., Leyton-Brown, K.: Auto-WEKA: Combined Selection and Hyperparameter Optimization of Classification Algorithms. arXiv:1208.3719 [cs]. (2012).</p>
      <p>40. Balaji, A., Allen, A.: Benchmarking Automatic Machine Learning Frameworks. arXiv:1808.06492 [cs, stat]. (2018).</p>
      <p>41. Cashman, D., Humayoun, S.R., Heimerl, F., Park, K., Das, S., Thompson, J., Saket, B., Mosca, A., Stasko, J., Endert, A., Gleicher, M., Chang, R.: Visual Analytics for Automated Model Discovery. arXiv:1809.10782 [cs]. (2018).</p>
      <p>42. Shen, W.: DARPA's Data Driven Discovery of Models (D3M) and Software Defined Hardware (SDH) Programs. In: Proceedings of the 2018 Great Lakes Symposium on VLSI (GLSVLSI '18). pp. 1-1. ACM Press, Chicago, IL, USA (2018).</p>
      <p>43. Bergstra, J., Yamins, D., Cox, D.D.: Making a Science of Model Search: Hyperparameter Optimization in Hundreds of Dimensions for Vision Architectures. Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA. JMLR: W&amp;CP. 28, 9 (2013).</p>
      <p>44. Liu, Z., Bousquet, O., Elisseeff, A., Escalera, S., Guyon, I., Jacques, J., Pavao, A., Silver, D., Sun-Hosoya, L., Treguer, S., Tu, W.-W., Wang, J., Yao, Q.: AutoDL Challenge Design and Beta Tests - Towards Automatic Deep Learning. CiML workshop @ NIPS 2018, Montreal, Canada. 7 (2018).</p>
      <p>45. Mahpod, S., Keller, Y.: Auto-ML Deep Learning for Rashi Scripts OCR. arXiv:1811.01290 [cs]. (2018).</p>
      <p>46. Gijsbers, P.: Automatic Construction of Machine Learning Pipelines. 65 (2017).</p>
      <p>47. He, Y., Lin, J., Liu, Z., Wang, H., Li, L.-J., Han, S.: AMC: AutoML for Model Compression and Acceleration on Mobile Devices. 17 (2018).</p>
      <p>48. Olson, R.S., Sipper, M., La Cava, W., Tartarone, S., Vitale, S., Fu, W., Orzechowski, P., Urbanowicz, R.J., Holmes, J.H., Moore, J.H.: A System for Accessible Artificial Intelligence. arXiv:1705.00594 [cs]. (2017).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bennett</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cawley</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalante</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Escalera</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Tin Kam Ho, Macia,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Saeed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Statnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Viegas</surname>
          </string-name>
          , E.:
          <article-title>Design of the 2015 ChaLearn AutoML challenge</article-title>
          .
          <source>Presented at the July</source>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bardenet</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brendel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kégl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>Collaborative hyperparameter tuning</article-title>
          .
          <source>Proceedings of the 30th International Conference on International Conference on Machine Learning</source>
          .
          <volume>28</volume>
          ,
          II-199-II-207
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cha</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>J.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cho</surname>
          </string-name>
          , D.-Y.,
          <string-name>
            <surname>Kim</surname>
          </string-name>
          , J.: Auto-Meta:
          <source>Automated Gradient Based Meta Learner Search. 32nd Conference on Neural Information Processing Systems (NIPS</source>
          <year>2018</year>
          ), Montréal, Canada. (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Sun-Hosoya</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Algorithm Recommendation with Active Meta Learning</article-title>
          . IAL 2018 workshop, ECML PKDD,
          <year>Sep 2018</year>
          , Dublin, Ireland.
          <volume>12</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Akimoto</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>D.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Udell</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>OBOE: Collaborative Filtering for AutoML Initialization</article-title>
          . arXiv:
          <year>1808</year>
          .03233 [cs, stat]. (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Guyon</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun-Hosoya</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boulle</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Es</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          :
          <source>Analysis of the AutoML Challenge series 2015-2018</source>
          .
          <volume>46</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Costa</surname>
            ,
            <given-names>V.O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>C.R.</given-names>
          </string-name>
          :
          <article-title>Hierarchical Ant Colony for Simultaneous Classifier Selection and Hyperparameter Optimization</article-title>
          .
          <source>In: 2018 IEEE Congress on Evolutionary Computation (CEC)</source>
          . pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          . IEEE, Rio de Janeiro (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>de Sá</surname>
            ,
            <given-names>A.G.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Freitas</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pappa</surname>
            ,
            <given-names>G.L.</given-names>
          </string-name>
          :
          <article-title>Automated Selection and Configuration of Multi-Label Classification Algorithms with Grammar-Based Genetic Programming</article-title>
          . In: Auger,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Fonseca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.M.</given-names>
            ,
            <surname>Lourenço</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Paquete</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            , and
            <surname>Whitley</surname>
          </string-name>
          ,
          <string-name>
            <surname>D.</surname>
          </string-name>
          <article-title>(eds.) Parallel Problem Solving from Nature -</article-title>
          PPSN XV. pp.
          <fpage>308</fpage>
          -
          <lpage>320</lpage>
          . Springer International Publishing,
          <string-name>
            <surname>Cham</surname>
          </string-name>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Wever</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohr</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hüllermeier</surname>
          </string-name>
          , E.:
          <source>Automated Multi-Label Classification based on ML-Plan</source>
          . arXiv:
          <year>1811</year>
          .04060 [cs, stat]. (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gil</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yao</surname>
          </string-name>
          , K.-T.,
          <string-name>
            <surname>Ratnakar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garijo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steeg</surname>
            ,
            <given-names>G.V.</given-names>
          </string-name>
          :
          <article-title>P4ML: A Phased Performance-Based Pipeline Planner for Automated Machine Learning</article-title>
          .
          <source>Proceedings of Machine Learning Research</source>
          <volume>1</volume>
          :
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          ,
          <year>2018</year>
          ICML 2018 AutoML Workshop. 8 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Mohr</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wever</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hüllermeier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>ML-Plan: Automated machine learning via hierarchical planning</article-title>
          .
          <source>Machine Learning</source>
          .
          <volume>107</volume>
          ,
          <fpage>1495</fpage>
          -
          <lpage>1515</lpage>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Wever</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mohr</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hullermeier</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>ML-Plan for Unlimited-Length Machine Learning Pipelines</article-title>
          .
          <source>ICML 2018 AutoML Workshop</source>
          . 8 (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zoph</surname>
            ,
            <given-names>B.:</given-names>
          </string-name>
          <article-title>Using Machine Learning to Explore Neural Network Architecture</article-title>
          , https://ai.googleblog.com/
          <year>2017</year>
          /05/
          <article-title>using-machine-learning-to-explore</article-title>
          .html, (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <article-title>AutoML: Automatic Machine Learning</article-title>
          , http://docs.h2o.ai/h2o/latest-stable/h2odocs/automl.html, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Bogatinovski</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Automating machine learning for structured output prediction</article-title>
          .
          <volume>25</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Snoek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Larochelle</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adams</surname>
            ,
            <given-names>R.P.</given-names>
          </string-name>
          :
          <article-title>Practical Bayesian Optimization of Machine Learning Algorithms</article-title>
          . arXiv:
          <volume>1206</volume>
          .2944 [cs, stat]. (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A Very Brief and Critical Discussion on AutoML</article-title>
          . arXiv:
          <year>1811</year>
          .03822 [cs]. (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cakmak</surname>
          </string-name>
          , U.M.:
          <article-title>Hands-on automated machine learning: a beginner's guide to building automated machine learning systems using AutoML and Python</article-title>
          . (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <article-title>Scikit-learn: Choosing the right estimator</article-title>
          , https://scikit-learn.org/stable/tutorial/machine_learning_map/index.html.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Weiss</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Indurkhya</surname>
          </string-name>
          , N.:
          <article-title>Rule-based Machine Learning Methods for Functional Prediction</article-title>
          . arXiv:cs/9512107. (
          <year>1995</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Urbanowicz</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moore</surname>
            ,
            <given-names>J.H.</given-names>
          </string-name>
          :
          <article-title>Learning Classifier Systems: A Complete Introduction, Review, and Roadmap</article-title>
          .
          <source>Journal of Artificial Evolution and Applications</source>
          .
          <year>2009</year>
          ,
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Agnar</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Enric</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Case-Based Reasoning: Foundational Issues, Methodological Variations, and System Approaches</article-title>
          .
          <source>AI Communications</source>
          .
          <volume>39</volume>
          -
          <fpage>59</fpage>
          (
          <year>1994</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Feurer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Springenberg</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Initializing Bayesian Hyperparameter Optimization via Meta-Learning</article-title>
          .
          <source>AAAI'15 Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence</source>
          .
          <fpage>1128</fpage>
          -
          <lpage>1135</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Lemke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Budka</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabrys</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Metalearning: a survey of trends and technologies</article-title>
          .
          <source>Artificial Intelligence Review</source>
          .
          <volume>44</volume>
          ,
          <fpage>117</fpage>
          -
          <lpage>130</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Vanschoren</surname>
            , J., van Rijn,
            <given-names>J.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bischl</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Torgo</surname>
          </string-name>
          , L.:
          <article-title>OpenML: networked science in machine learning</article-title>
          .
          <source>ACM SIGKDD Explorations Newsletter</source>
          .
          <volume>15</volume>
          ,
          <fpage>49</fpage>
          -
          <lpage>60</lpage>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Komer</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergstra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eliasmith</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          Hyperopt-Sklearn:
          <article-title>Automatic Hyperparameter Configuration for Scikit-Learn</article-title>
          .
          <source>PROC. OF THE 13th PYTHON IN SCIENCE CONF. (SCIPY</source>
          <year>2014</year>
          ).
          <volume>7</volume>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Feurer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eggensperger</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Springenberg</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <source>Efficient and Robust Automated Machine Learning. NIPS'15 Proceedings of the 28th International Conference on Neural Information Processing Systems</source>
          .
          <volume>2</volume>
          ,
          <fpage>2755</fpage>
          -
          <lpage>2763</lpage>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Tague</surname>
            ,
            <given-names>N.R.:</given-names>
          </string-name>
          <article-title>The quality toolbox</article-title>
          . ASQ Quality Press, Milwaukee, Wis (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Burnham</surname>
            ,
            <given-names>K.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anderson</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          :
          <article-title>Multimodel Inference: Understanding AIC and BIC in Model Selection</article-title>
          .
          <source>Sociological Methods &amp; Research</source>
          .
          <volume>33</volume>
          ,
          <fpage>261</fpage>
          -
          <lpage>304</lpage>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Claeskens</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hjort</surname>
            ,
            <given-names>N.L.</given-names>
          </string-name>
          :
          <article-title>The Focused Information Criterion</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          .
          <volume>98</volume>
          ,
          <fpage>900</fpage>
          -
          <lpage>916</lpage>
          (
          <year>2003</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Bergstra</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Random Search for Hyper-Parameter Optimization</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          .
          <volume>13</volume>
          ,
          <fpage>281</fpage>
          -
          <lpage>305</lpage>
          (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Hutter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hoos</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leyton-Brown</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Sequential Model-Based Optimization for General Algorithm Configuration</article-title>
          . In:
          <string-name>
            <surname>Coello</surname>
            ,
            <given-names>C.A.C.</given-names>
          </string-name>
          (ed.)
          <source>Learning and Intelligent Optimization</source>
          . pp.
          <fpage>507</fpage>
          -
          <lpage>523</lpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Daning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hanping</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fen</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shigang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yunquan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Using Known Information to Accelerate HyperParameters Optimization Based on SMBO</article-title>
          . arXiv:1811.03322 [cs, stat]. (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>