<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>AutoXplain: Towards Automated Interpretable Model Selection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tessel Haagen</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Heysem Kaya</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Joop Snijder</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Melchior Nierman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Atalmedial Medical Diagnostics Centres</institution>
          ,
          <addr-line>Jan Tooropstraat 138, 1061 AD Amsterdam</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Info Support</institution>
          ,
          <addr-line>Kruisboog 42, 3905 TG Veenendaal</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Utrecht University</institution>
          ,
          <addr-line>Heidelberglaan 8, 3584 CS Utrecht</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Machine learning (ML) algorithms are increasingly used in high-stakes domains like healthcare. While ML systems frequently outperform humans in specific tasks, ensuring safety and transparency is critical in these domains. Interpretability, therefore, plays a crucial role in understanding the decision-making process, in auditing and correcting ML models, and in establishing trust. Furthermore, there is a growing demand for automated machine learning (AutoML) to facilitate model development without expert intervention. However, the combination of interpretability and AutoML has received limited attention thus far. In this study, we propose two objective model-agnostic measures of interpretability that quantify model compactness and explanation stability, embedded within an automated interpretable ML pipeline. We experiment with a set of interpretable models on medical classification tasks, reporting the proposed measures along with the predictive performances. We further conduct a user study with domain experts to evaluate the correlation between these measures and the subjective concept of interpretability. Our findings indicate that the proposed measures are effective and support their utility in creating an interpretable automated pipeline.</p>
      </abstract>
      <kwd-group>
        <kwd>Interpretable automated pipeline</kwd>
        <kwd>Interpretability measures</kwd>
        <kwd>Automated Machine Learning (AutoML)</kwd>
        <kwd>Model-agnostic measures</kwd>
        <kwd>Machine Learning for health-care</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Motivation</title>
      <p>
        Machine learning (ML) algorithms, known for their superior performance [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], are widely
used in high-stakes domains such as healthcare [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This rapid growth calls for two things:
interpretability and hands-free solutions. Safety and transparency are crucial considerations,
emphasizing the need for interpretability [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Interpretability plays a vital role in establishing
trust [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], but its definition and measurement are challenging. In this study, we adopt Molnar’s
definition of interpretability as the degree to which a human can consistently predict the model’s
result [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To achieve interpretability, models must have intelligible features, lower complexity,
and incorporate a small number of features to account for human working memory limitations
[
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Hands-free solutions are needed to speed up model development, which has led to Automated Machine
Learning (AutoML) [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. AutoML pipelines consist of data pre-processing steps such as encoding [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ]
and discretization [
        <xref ref-type="bibr" rid="ref12 ref13">12, 13</xref>
        ], optimization including random search or genetic algorithms [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ],
and result-generation steps. AutoML tools, both open-source and commercial, are available
[
        <xref ref-type="bibr" rid="ref15 ref16 ref17 ref18">15, 16, 17, 18, 19, 20, 21</xref>
        ], but all are black-box solutions.
      </p>
      <p>Integrating interpretability into AutoML remains largely unexplored. This study proposes AutoXplain,
an automated pipeline that uses interpretable models and model-agnostic interpretability
measures for model selection. AutoXplain can improve decision-making in high-stakes domains by
delivering good task performance while providing interpretability.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Proposed Measures of Interpretability</title>
      <p>To compare interpretability across models within the automated pipeline, model-agnostic
objective measures are essential. Because such measures are currently lacking in the literature,
we introduce two: compactness and stability.</p>
      <sec id="sec-2-1">
        <title>2.1. Compactness</title>
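        <p>The compactness of a single explanation quantifies its ability to convey relevant information
using a concise set of features. It is defined as:
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math>$$\text{Explanation compactness} = 1 - \frac{|E| - 1}{|F|}$$</tex-math>
          </disp-formula>
where <inline-formula><tex-math>$F$</tex-math></inline-formula> is the set of features used in the model and <inline-formula><tex-math>$E$</tex-math></inline-formula> is the subset of features used in the
explanation. To avoid penalizing single-feature explanations, 1 is subtracted from <inline-formula><tex-math>$|E|$</tex-math></inline-formula>. A higher
compactness score (between 0 and 1) indicates a more compact explanation. The compactness of a
model is obtained by applying this equation to each individual explanation and averaging the results:
          <disp-formula id="eq-2">
            <label>(2)</label>
            <tex-math>$$\text{Model compactness} = \frac{1}{|\mathcal{E}|} \sum_{E \in \mathcal{E}} \left( 1 - \frac{|E| - 1}{|F|} \right)$$</tex-math>
          </disp-formula>
where <inline-formula><tex-math>$\mathcal{E}$</tex-math></inline-formula> is the set of explanations generated by the model.</p>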
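        <p>As an illustration, the two compactness definitions above can be sketched in a few lines of Python. This is a minimal sketch rather than the AutoXplain implementation: the function names and the feature names in the usage example are hypothetical, and each explanation is assumed to be given as the collection of feature names it uses.</p>
        <preformat preformat-type="code">
```python
from typing import Iterable, Sequence


def explanation_compactness(explanation_features: Iterable[str],
                            model_features: Iterable[str]) -> float:
    """Compactness of one explanation: 1 - (|E| - 1) / |F|."""
    e = set(explanation_features)
    f = set(model_features)
    if not f:
        raise ValueError("the model must use at least one feature")
    if not e or not e.issubset(f):
        raise ValueError("E must be a non-empty subset of F")
    return 1.0 - (len(e) - 1) / len(f)


def model_compactness(explanations: Sequence[Iterable[str]],
                      model_features: Iterable[str]) -> float:
    """Mean explanation compactness over all explanations of a model."""
    if not explanations:
        raise ValueError("at least one explanation is required")
    f = list(model_features)
    return sum(explanation_compactness(e, f) for e in explanations) / len(explanations)


if __name__ == "__main__":
    # Hypothetical feature names, for illustration only.
    features = ["inr", "age", "dose", "events"]
    rules = [["inr"], ["inr", "age"], ["inr", "age", "dose"]]
    print(model_compactness(rules, features))  # (1.0 + 0.75 + 0.5) / 3 = 0.75
```
        </preformat>
        <p>In this sketch a single-feature explanation scores exactly 1, and an explanation that uses every model feature scores 1/|F|, which matches the stated range of the measure.</p>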
        <p>For decision rules and decision trees, the number of splits in each rule, or on the path to the
leaf node, is used as the number of features in each explanation. For linear models, a two-step
process is followed. First, the importance of each feature in an explanation is determined using the
absolute value of its t-statistic. Then, a softmax function is applied to these importances to obtain a
probability distribution over the features. A threshold is then used to select the most important
features, which yields the subset of features used in the explanation (<inline-formula><tex-math>$E$</tex-math></inline-formula>).</p>
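        <p>The two-step selection for linear models can be sketched as follows. The threshold value, the function names and the example t-statistics are assumptions made for illustration; the text does not state which threshold is used in AutoXplain.</p>
        <preformat preformat-type="code">
```python
import math
from typing import Dict, List


def softmax(values: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [x / total for x in exps]


def select_explanation_features(t_statistics: Dict[str, float],
                                threshold: float) -> List[str]:
    """Keep features whose softmax-normalised |t| is at least `threshold`."""
    names = list(t_statistics)
    weights = softmax([abs(t_statistics[n]) for n in names])
    return [n for n, w in zip(names, weights) if w >= threshold]


if __name__ == "__main__":
    # Hypothetical t-statistics and threshold, for illustration only.
    t_stats = {"inr": 4.0, "age": -3.5, "dose": 0.5}
    print(select_explanation_features(t_stats, threshold=0.1))
```
        </preformat>
        <p>The size of the returned list is the number of explanation features that enters the compactness computation for such a model.</p>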
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Stability</title>
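        <p>An explanation method is considered stable if it provides similar explanations for similar
instances. The proposed methodology builds upon the work of Turney [22] and Zafar and Khan
[23]. To evaluate stability, we examine the agreement between the instances and the
explanations generated by a single classification algorithm. The proposed measure consists of
the following steps:</p>
        <list list-type="order">
          <list-item>
            <p><bold>Finding similar instances per instance:</bold> We use the <inline-formula><tex-math>$k$</tex-math></inline-formula>-nearest neighbour method
to identify neighbours for each instance in the training dataset. In our experiments, <inline-formula><tex-math>$k$</tex-math></inline-formula>
is set to 9; however, the value of <inline-formula><tex-math>$k$</tex-math></inline-formula> can be adjusted based on the dataset size and the desired
strictness for stability. By averaging the radius <inline-formula><tex-math>$r_i$</tex-math></inline-formula> needed to reach the <inline-formula><tex-math>$k$</tex-math></inline-formula> neighbours
of each instance in the training set, with a training set size of <inline-formula><tex-math>$N$</tex-math></inline-formula>, we establish the
radius threshold <inline-formula><tex-math>$r$</tex-math></inline-formula>:
              <disp-formula id="eq-3">
                <label>(3)</label>
                <tex-math>$$r = \frac{1}{N} \sum_{i=1}^{N} r_i$$</tex-math>
              </disp-formula>
            </p>
          </list-item>
          <list-item>
            <p><bold>Creating the instance space:</bold> Using the calculated radius threshold <inline-formula><tex-math>$r$</tex-math></inline-formula>, we identify
neighbouring instances in the test data, creating the instance space.</p>
          </list-item>
          <list-item>
            <p><bold>Creating the explanation space:</bold> For decision trees and decision rules, each leaf or rule
is its own explanation. For models with feature importances, we use the same method as for
the instance space, but with the feature importances instead of the feature values. The
explanation space consists of the neighbours of each instance that share the same decision
rule or leaf (for decision trees and decision rules) or the same set of important features
(for models providing feature importances).</p>
          </list-item>
          <list-item>
            <p><bold>Calculating the stability measurement:</bold> From the instance space (<inline-formula><tex-math>$I$</tex-math></inline-formula>) and the
explanation space (<inline-formula><tex-math>$X$</tex-math></inline-formula>), we calculate the agreement of neighbours, which results in the
stability measurement <inline-formula><tex-math>$S$</tex-math></inline-formula>:
              <disp-formula id="eq-4">
                <label>(4)</label>
                <tex-math>$$S = \frac{|I \cap X|}{|I|}$$</tex-math>
              </disp-formula>
            </p>
          </list-item>
        </list>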
        <p>A higher stability score (0-1) indicates greater explanation stability.</p>
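        <p>The steps above can be sketched for tree- or rule-based models as follows. This is a simplified reading under stated assumptions, not the authors' implementation: Euclidean distance is assumed, both spaces are represented as ordered pairs of test instances, each explanation is represented by a hashable leaf or rule identifier, and all function names are hypothetical. Models with feature importances would need the neighbour search repeated on the importance vectors, which is not shown.</p>
        <preformat preformat-type="code">
```python
import math
from typing import Hashable, List, Sequence, Set, Tuple

Vector = Sequence[float]
Pair = Tuple[int, int]


def radius_threshold(train: List[Vector], k: int) -> float:
    """Mean distance from each training instance to its k-th nearest neighbour."""
    if k >= len(train):
        raise ValueError("k must be smaller than the number of training instances")
    radii = []
    for i, x in enumerate(train):
        dists = sorted(math.dist(x, y) for j, y in enumerate(train) if j != i)
        radii.append(dists[k - 1])
    return sum(radii) / len(radii)


def instance_space(test: List[Vector], radius: float) -> Set[Pair]:
    """Ordered pairs (i, j) of distinct test instances lying within `radius`."""
    return {
        (i, j)
        for i, x in enumerate(test)
        for j, y in enumerate(test)
        if i != j and radius >= math.dist(x, y)
    }


def explanation_space(explanations: List[Hashable]) -> Set[Pair]:
    """Ordered pairs (i, j) of distinct test instances sharing one explanation."""
    return {
        (i, j)
        for i, a in enumerate(explanations)
        for j, b in enumerate(explanations)
        if i != j and a == b
    }


def stability(train: List[Vector], test: List[Vector],
              explanations: List[Hashable], k: int = 9) -> float:
    """Share of neighbouring test pairs that also share an explanation."""
    neighbours = instance_space(test, radius_threshold(train, k))
    if not neighbours:
        raise ValueError("no test instances fall within the radius threshold")
    agreeing = neighbours.intersection(explanation_space(explanations))
    return len(agreeing) / len(neighbours)
```
        </preformat>
        <p>The pairwise distance computation is quadratic in the number of instances, which is acceptable for a sketch; a spatial index would be the natural choice for larger datasets.</p>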
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Results</title>
      <sec id="sec-3-1">
        <title>3.1. Data and Machine Learning Experiments</title>
        <p>Our pipeline primarily focuses on tabular datasets in the medical domain. Specifically, we used
the Atalmedial Anticoagulation Clinic (AAC) dataset, consisting of de-identified patient records
of individuals on oral anticoagulation therapy with vitamin K antagonists (VKA) for the prevention
of thrombo-embolic events. The dataset includes details such as blood values, medical events, and
medication history spanning the previous 60 days. Each entry is labelled as S (severe bleeding) or
N (non-severe bleeding). Notably, the dataset is highly imbalanced, with 47 S and 5544 N samples.
Because the features capture temporal data collected over the last 60 days, they are structured as
lists.</p>
        <p>AutoXplain explores a range of configurations for three model families: Decision Trees (DT) [24],
with variations in maximum depth, maximum number of leaf nodes, and minimum samples per leaf;
Explainable Boosting Machines (EBM) [25], with different settings for the number of interactions,
learning rate, and early stopping rounds; and Dominance Classifier Predictors (DCP) [26], with
different settings for the ratio threshold and voting method. Each model is trained using different
subsets of features. In total, 222 models were considered during the optimisation stage. The pipeline
selects the top three models per method based on a weighted score of F1-score, compactness, and
stability, computed with cross-validation. The parameter setting per ML method with
the best weighted score on the test set is presented in Table 1. Our top-performing model is a
DT, which achieved a weighted score of 0.864 and an F1-score of 0.984.</p>
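        <p>The selection step can be illustrated with a short sketch. The weights, candidate names and scores below are hypothetical, since the text does not report the weighting used; the sketch only shows how a weighted combination of F1-score, compactness and stability can rank candidates within each method.</p>
        <preformat preformat-type="code">
```python
from typing import Dict, List, NamedTuple


class Candidate(NamedTuple):
    method: str          # e.g. "DT", "EBM" or "DCP"
    name: str            # hypothetical identifier of one configuration
    f1: float
    compactness: float
    stability: float


def weighted_score(c: Candidate, weights: Dict[str, float]) -> float:
    """Weighted mean of the three criteria; the weights are an assumption."""
    total = sum(weights.values())
    combined = (weights["f1"] * c.f1
                + weights["compactness"] * c.compactness
                + weights["stability"] * c.stability)
    return combined / total


def top_per_method(candidates: List[Candidate], weights: Dict[str, float],
                   n: int = 3) -> Dict[str, List[Candidate]]:
    """Return the n best-scoring candidates for every method."""
    grouped: Dict[str, List[Candidate]] = {}
    for c in candidates:
        grouped.setdefault(c.method, []).append(c)
    return {
        method: sorted(group, key=lambda c: weighted_score(c, weights), reverse=True)[:n]
        for method, group in grouped.items()
    }
```
        </preformat>
        <p>In practice the three criteria would be averaged over cross-validation folds before being passed to such a function.</p>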
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Human evaluation</title>
        <p>To assess the explanations generated by our models on the AAC dataset, we conducted a user
study with Atalmedial employees, including dosing advisors and medical doctors specialized in
VKA anticoagulation. This diverse group of participants ensured a comprehensive evaluation.
Participants were presented with explanations generated by our top models for instances in
the AAC dataset, specifically focusing on severe bleeding incidents. Participants then filled
in the System Causability Scale [27]. This user study allowed us to gather valuable feedback
and insights from professionals in the medical field. A paired t-test revealed
a significant difference between DT (<italic>M</italic> = 0.6, <italic>SD</italic> = 0.07) and DCP (<italic>M</italic> = 0.4, <italic>SD</italic> = 0.1),
<italic>t</italic>(7) = 3.5, <italic>p</italic> = .010, as well as between EBM (<italic>M</italic> = 0.6, <italic>SD</italic> = 0.08) and DCP, <italic>t</italic>(7) = 4.7,
<italic>p</italic> = .002. However, no significant difference was found between DT and EBM, <italic>t</italic>(7) = 2.2,
<italic>p</italic> = .061.</p>
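        <p>For readers who want to reproduce this kind of comparison, the paired t-statistic can be computed as sketched below. The function name is hypothetical and the sketch returns only the statistic and its degrees of freedom; a p-value additionally requires the t-distribution, for example via <monospace>scipy.stats.ttest_rel</monospace>. No data from the user study is used here.</p>
        <preformat preformat-type="code">
```python
import math
import statistics
from typing import Sequence, Tuple


def paired_t_statistic(a: Sequence[float], b: Sequence[float]) -> Tuple[float, int]:
    """Return (t, degrees of freedom) for paired samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    if 2 > n:
        raise ValueError("at least two pairs are required")
    sd = statistics.stdev(diffs)          # sample standard deviation of the differences
    if sd == 0:
        raise ValueError("all differences are identical; t is undefined")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```
        </preformat>
        <p>With eight participants rating two models each, such a function would report seven degrees of freedom, consistent with the <italic>t</italic>(7) values given above.</p>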
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion</title>
      <p>The results of our study indicate that the automated pipeline we developed performs well
in terms of both model performance and interpretability, which supports the feasibility of an
automated interpretable machine learning pipeline.</p>
      <p>By incorporating interpretable models and assessing their interpretability using quantitative
measures, our automated pipeline allows for the selection of models that strike a balance between
performance and interpretability. These results highlight the potential of the automated pipeline
in high-stakes decision-making domains, where both accuracy and transparency are essential.
The pipeline’s ability to generate highly performant models while offering interpretability
empowers decision-makers to gain insights into the decision-making process and make
well-informed judgments.
</p>
      <ref-list>
        <ref>
          <mixed-citation>[18] B. Komer, J. Bergstra, C. Eliasmith, Hyperopt-sklearn: automatic hyperparameter configuration for scikit-learn, in: ICML workshop on AutoML, volume 9, Citeseer, 2014, p. 50.</mixed-citation>
        </ref>
        <ref id="ref19">
          <mixed-citation>[19] R. S. Olson, N. Bartley, R. J. Urbanowicz, J. H. Moore, Evaluation of a tree-based pipeline optimization tool for automating data science, in: Proceedings of the genetic and evolutionary computation conference 2016, 2016, pp. 485–492.</mixed-citation>
        </ref>
        <ref id="ref20">
          <mixed-citation>[20] E. LeDell, S. Poirier, H2o automl: Scalable automatic machine learning, 7th ICML Workshop on Automated Machine Learning (AutoML) (2020). URL: https://www.automl.org/wp-content/uploads/2020/07/AutoML_2020_paper_61.pdf.</mixed-citation>
        </ref>
        <ref id="ref21">
          <mixed-citation>[21] N. Erickson, J. Mueller, A. Shirkov, H. Zhang, P. Larroy, M. Li, A. Smola, Autogluon-tabular: Robust and accurate automl for structured data, arXiv preprint arXiv:2003.06505 (2020).</mixed-citation>
        </ref>
        <ref id="ref22">
          <mixed-citation>[22] P. Turney, Bias and the quantification of stability, Machine Learning 20 (1995) 23–33.</mixed-citation>
        </ref>
        <ref id="ref23">
          <mixed-citation>[23] M. R. Zafar, N. Khan, Deterministic local interpretable model-agnostic explanations for stable explainability, Machine Learning and Knowledge Extraction 3 (2021) 525–541.</mixed-citation>
        </ref>
        <ref id="ref24">
          <mixed-citation>[24] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, E. Duchesnay, Scikit-learn: Machine learning in Python, Journal of Machine Learning Research 12 (2011) 2825–2830.</mixed-citation>
        </ref>
        <ref id="ref25">
          <mixed-citation>[25] H. Nori, S. Jenkins, P. Koch, R. Caruana, Interpretml: A unified framework for machine learning interpretability, arXiv preprint arXiv:1909.09223 (2019).</mixed-citation>
        </ref>
        <ref id="ref26">
          <mixed-citation>[26] B. Kovalerchuk, N. Neuhaus, Toward efficient automation of interpretable machine learning, in: 2018 IEEE International Conference on Big Data (Big Data), 2018, pp. 4940–4947. doi:10.1109/BigData.2018.8622433.</mixed-citation>
        </ref>
        <ref id="ref27">
          <mixed-citation>[27] A. Holzinger, A. Carrington, H. Müller, Measuring the quality of explanations: the system causability scale (scs), KI-Künstliche Intelligenz 34 (2020) 193–198.</mixed-citation>
        </ref>
      </ref-list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>K.</given-names>
            <surname>Grace</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Salvatier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dafoe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <article-title>When will AI exceed human performance? Evidence from AI experts</article-title>
          ,
          <source>Journal of Artificial Intelligence Research</source>
          <volume>62</volume>
          (
          <year>2018</year>
          )
          <fpage>729</fpage>
          -
          <lpage>754</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Safdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Zafar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. F.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <article-title>Machine learning based decision support systems (dss) for heart disease diagnosis: a review</article-title>
          ,
          <source>Artificial Intelligence Review</source>
          <volume>50</volume>
          (
          <year>2018</year>
          )
          <fpage>597</fpage>
          -
          <lpage>623</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Towards a rigorous science of interpretable machine learning</article-title>
          ,
          <year>2017</year>
          . URL: https://arxiv.org/abs/1702.08608. doi:10.48550/ARXIV.1702.08608.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Rudin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Semenova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <article-title>Interpretable machine learning: Fundamental principles and 10 grand challenges</article-title>
          ,
          <source>Statistics Surveys</source>
          <volume>16</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>85</lpage>
          . URL: https://doi.org/10.1214/21-SS133. doi:10.1214/21-SS133.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <article-title>Interpretable machine learning</article-title>
          ,
          <source>Lulu.com</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>I.</given-names>
            <surname>Lage</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>He</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Narayanan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. J.</given-names>
            <surname>Gershman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Doshi-Velez</surname>
          </string-name>
          ,
          <article-title>Human evaluation of models built for interpretability</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Human Computation and Crowdsourcing</source>
          <volume>7</volume>
          (
          <year>2019</year>
          )
          <fpage>59</fpage>
          -
          <lpage>67</lpage>
          . URL: https://ojs.aaai.org/index.php/HCOMP/article/view/5280.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G. A.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <article-title>The magical number seven, plus or minus two: Some limits on our capacity for processing information</article-title>
          ,
          <source>Psychological Review</source>
          <volume>63</volume>
          (
          <year>1956</year>
          )
          <fpage>81</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Kotthoff</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanschoren</surname>
          </string-name>
          ,
          <source>Automated machine learning: methods, systems, challenges</source>
          , Springer Nature,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , Springer,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Micci-Barreca</surname>
          </string-name>
          ,
          <article-title>A preprocessing scheme for high-cardinality categorical attributes in classification and prediction problems</article-title>
          ,
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>3</volume>
          (
          <year>2001</year>
          )
          <fpage>27</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>J. Van den Bossche</surname>
          </string-name>
          , J. De Bock, J. De Brabanter,
          <article-title>Efficient categorical variable encoding for multiclass classification</article-title>
          ,
          <source>Machine Learning and Knowledge Extraction</source>
          <volume>1</volume>
          (
          <year>2015</year>
          )
          <fpage>101</fpage>
          -
          <lpage>121</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lichman</surname>
          </string-name>
          ,
          <article-title>Machine learning in r: using caret</article-title>
          ,
          <source>Journal of Statistical Software</source>
          <volume>58</volume>
          (
          <year>2013</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Setiono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Feature selection in knowledge discovery and data mining</article-title>
          ,
          <source>Data Mining and Knowledge Discovery</source>
          <volume>2</volume>
          (
          <year>1996</year>
          )
          <fpage>359</fpage>
          -
          <lpage>394</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <article-title>Random search for hyper-parameter optimization</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>13</volume>
          (
          <year>2012</year>
          )
          <fpage>281</fpage>
          -
          <lpage>305</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Feurer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Klein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Eggensperger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Springenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <article-title>Efficient and robust automated machine learning</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>28</volume>
          (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>C.</given-names>
            <surname>Thornton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Hutter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Hoos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Leyton-Brown</surname>
          </string-name>
          ,
          <article-title>Auto-weka: Combined selection and hyperparameter optimization of classification algorithms</article-title>
          ,
          <source>in: Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>847</fpage>
          -
          <lpage>855</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Auto-keras: An efficient neural architecture search system</article-title>
          ,
          <source>in: Proceedings of the 25th ACM SIGKDD international conference on knowledge discovery &amp; data mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1946</fpage>
          -
          <lpage>1956</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Komer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bergstra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Eliasmith</surname>
          </string-name>
          ,
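          <article-title>Hyperopt-sklearn: automatic hyperparameter configuration for scikit-learn</article-title>
          ,
          <source>in: ICML workshop on AutoML</source>
          , volume 9, Citeseer,
          <year>2014</year>
          , p.
          <fpage>50</fpage>
          .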
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>