<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Hyperparameter Optimization in Keras for the MediaEval 2018 Medico Multimedia Task</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Rune Johan Borgli, Pål Halvorsen, Michael Riegler, Håkon Kvale Stensland Simula Research Laboratory</institution>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <fpage>29</fpage>
      <lpage>31</lpage>
      <abstract>
        <p>This paper details the approach of the Rune team to the MediaEval 2018 Medico Multimedia Task. The approach uses a work-in-progress hyperparameter optimization system called Saga. Saga searches for well-performing hyperparameter configurations in Keras [5], a popular machine learning framework, using Bayesian optimization and transfer learning [3]. In addition to optimizing the Keras classifier configuration, we try manipulating the dataset by adding extra images to a class with few images and by splitting a commonly misclassified class into two classes.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        We developed the following approach as a submission to the MediaEval
2018 Medico Multimedia Task. The task contains several sub-tasks,
but we have focused solely on the detection sub-task. The dataset,
task, and evaluation are described in the
Medico Multimedia Task overview paper [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In short, the task is
to create a classifier for a 16-class dataset containing a total of 5277
images from the medical domain of the digestive system.
      </p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>
        Our approach has two parts. First, our main contribution is
to use automatic hyperparameter optimization, building on Borgli’s
thesis [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to find a hyperparameter configuration in Keras for our
classifier submission. Second, we tried a few alterations of the
dataset to see if we could improve the classifier performance further.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Hyperparameter optimization</title>
      <p>
        Automatic hyperparameter optimization was done using an early,
unpublished, work-in-progress system called Saga. Saga is a tool
with a web interface providing developers of image-based machine
learning applications an easy, customizable workflow for creating
well-performing classifiers for image data. The user only needs
to supply the training and validation dataset. Everything else is
provided by either a Keras [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or a PyTorch [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] back-end. We split
the system into three parts: (1) preprocessing of the dataset and a metric
for the predicted effectiveness of the dataset, (2) running any valid
configuration of hyperparameters or hyperparameter optimizations
available in Keras or PyTorch for training of the classifier on a
given dataset, and (3) visualization and analysis of the training
and its outcome. For this submission, Keras was used together with
TensorFlow [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] with no preprocessing of the dataset and no analysis
of the results, as the PyTorch back-end, preprocessing, and analysis
had not been implemented at the time of writing.
      </p>
      <p>
        Because this is a working paper, we only briefly
describe the system. More details about how we implemented the
hyperparameter optimization can be found in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Our target domain is image datasets. Convolutional neural
networks (CNNs) are a type of machine learning model that excels at
classifying images. However, such models need a sufficiently large
training set to generalize well. The
provided dataset for the Medico Multimedia Task [
        <xref ref-type="bibr" rid="ref11 ref12">11, 12</xref>
        ] contains
about 5000 images, which is a low number compared to common
benchmark datasets such as ImageNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. To compensate for this,
we use transfer learning. Transfer learning is a technique where,
instead of training a model from scratch, we pre-train the model
using a different dataset in a similar domain, then fine-tune the
model to our dataset. The idea behind transfer learning is to transfer
relevant knowledge learned from the pre-training instead of
learning it from scratch. Transfer learning works across different datasets
because they often share basic image features, and it has the added
benefit that training is significantly faster. This is easy to achieve
in Keras as we can use models pre-trained on ImageNet available
from the framework.
      </p>
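      <p>The following is a minimal sketch of how such a pre-trained model could be set up in Keras. It is an illustration under stated assumptions, not the Saga implementation. The helper names <monospace>set_trainable_from</monospace> and <monospace>build_pretrained_classifier</monospace>, the 224x224 input size, and the single softmax output layer are hypothetical choices. We read the delimiting layer discussed in the results as the index of the first layer that is fine-tuned, so that a value of 0 fine-tunes all layers.</p>
      <preformat>
```python
from types import SimpleNamespace


def set_trainable_from(layers, delimiting_layer):
    """Freeze every layer before `delimiting_layer` and unfreeze the rest.

    Works on any objects with a boolean `trainable` attribute, which
    Keras layers provide. Returns the number of trainable layers.
    """
    for index, layer in enumerate(layers):
        layer.trainable = index >= delimiting_layer
    return sum(1 for layer in layers if layer.trainable)


def build_pretrained_classifier(num_classes, weights="imagenet"):
    """Assumed setup: ImageNet-pre-trained DenseNet169 plus a new softmax layer."""
    # Imported here so the helper above can be used without TensorFlow installed.
    from tensorflow import keras

    base = keras.applications.DenseNet169(
        include_top=False,           # drop the 1000-class ImageNet classifier
        weights=weights,             # "imagenet" downloads pre-trained weights
        input_shape=(224, 224, 3),   # assumed input size
        pooling="avg",
    )
    outputs = keras.layers.Dense(num_classes, activation="softmax")(base.output)
    return keras.Model(inputs=base.input, outputs=outputs), base


# Stand-in layers so the freezing logic can be tried without Keras.
toy_layers = [SimpleNamespace(name=f"layer_{i}", trainable=True) for i in range(5)]
print(set_trainable_from(toy_layers, 3))   # only layers 3 and 4 stay trainable
print(set_trainable_from(toy_layers, 0))   # delimiting layer 0: all layers trainable
```
      </preformat>
      <p>With a real model, one would call <monospace>set_trainable_from(base.layers, delimiting_layer)</monospace> and compile the model again afterwards, since Keras picks up changes to <monospace>trainable</monospace> when the model is compiled.</p>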
      <p>
        We use Bayesian optimization for the automatic hyperparameter
optimization. Bayesian optimization uses a surrogate function to
approximate the function we try to optimize. Based on sequential
observations, where one observation is a training run with hyperparameters
chosen by the optimization, Bayesian optimization fits the
surrogate function to the function to optimize. An acquisition function
decides where in the search space to try the next observation based
on a balance between exploration and exploitation. Exploration
samples unexplored parts of the search space, and exploitation
refines the search in those parts of the search space
where results are good. Bayesian optimization can optimize any
number and type of hyperparameters, but observations are costly,
so we limit the dimensionality and size of the search space. We use
a framework called GPyOpt for our implementation of Bayesian
optimization [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and use default parameters for the optimization
function. The optimization stops after a given number of
observations, and the best hyperparameters are the ones from the
observation achieving the highest validation accuracy.
      </p>
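      <p>To make the loop concrete, the sketch below implements a small Bayesian optimization by hand with NumPy. It deliberately does not use GPyOpt and does not reproduce its defaults: the Gaussian process surrogate with a squared-exponential kernel, the expected improvement acquisition function, the one-dimensional grid over the logarithm of the learning rate, and the function <monospace>toy_validation_accuracy</monospace> are all assumptions made for illustration. In practice, each call to the objective would be a full training run.</p>
      <preformat>
```python
import math

import numpy as np


def rbf_kernel(a, b, length_scale=1.0):
    """Squared-exponential kernel between two 1-D arrays of points."""
    diff = a[:, None] - b[None, :]
    return np.exp(-0.5 * (diff / length_scale) ** 2)


def gp_posterior(x_obs, y_obs, x_new, noise=1e-6):
    """Posterior mean and standard deviation of a zero-mean GP surrogate."""
    k_obs = rbf_kernel(x_obs, x_obs) + noise * np.eye(len(x_obs))
    k_cross = rbf_kernel(x_obs, x_new)
    solved = np.linalg.solve(k_obs, k_cross)
    mean = solved.T @ y_obs
    variance = 1.0 - np.sum(k_cross * solved, axis=0)
    return mean, np.sqrt(np.maximum(variance, 1e-12))


def expected_improvement(mean, std, best):
    """Acquisition function: expected gain over the best observation so far."""
    z = (mean - best) / std
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    return (mean - best) * cdf + std * pdf


def bayesian_optimize(objective, candidates, n_initial=3, n_observations=12, seed=0):
    """Maximize `objective` over a 1-D grid of candidate values."""
    rng = np.random.default_rng(seed)
    x_obs = rng.choice(candidates, size=n_initial, replace=False)
    y_obs = np.array([objective(x) for x in x_obs])
    for _ in range(n_observations - n_initial):
        centered = y_obs - y_obs.mean()           # the GP assumes zero mean
        mean, std = gp_posterior(x_obs, centered, candidates)
        scores = expected_improvement(mean, std, centered.max())
        x_next = candidates[int(np.argmax(scores))]
        x_obs = np.append(x_obs, x_next)
        y_obs = np.append(y_obs, objective(x_next))
    best = int(np.argmax(y_obs))                  # observation with the highest score
    return float(x_obs[best]), float(y_obs[best])


def toy_validation_accuracy(log10_lr):
    """Made-up stand-in for a training run; peaks at a learning rate of 1e-3."""
    return 0.95 - 0.02 * (log10_lr + 3.0) ** 2


grid = np.linspace(-6.0, -1.0, 101)               # log10 of the learning rate
best_log_lr, best_acc = bayesian_optimize(toy_validation_accuracy, grid)
print(f"best log10(lr) = {best_log_lr:.2f}, toy accuracy = {best_acc:.4f}")
```
      </preformat>
      <p>The final two lines of <monospace>bayesian_optimize</monospace> mirror the selection rule described above: the returned configuration is simply the observation with the highest score, not the maximum of the surrogate.</p>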
    </sec>
    <sec id="sec-4">
      <title>Dataset manipulation</title>
      <p>Besides the hyperparameters, the performance of the classifier is
dependent on the training set. Therefore, to try to improve our
classifier’s performance, we made two separate tweaks to the dataset.
First, based on an observation that the esophagitis class contains
images of both the upper and middle esophagus and the diseased
z-line, we wanted to see if extracting the esophagitis z-line
images into a separate class could improve the separation between
esophagitis and normal z-line. The idea behind this is that the
classifier could become "confused" by seeing very different images of
the same class. Second, we added images to the out-of-patient
class to improve its performance. These images were of medical
equipment in medical rooms.</p>
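      <p>The class split can be sketched as a relabelling step, shown below under stated assumptions. The function names, the label <monospace>esophagitis-z-line</monospace>, and the file names are hypothetical. Mapping the auxiliary class back to esophagitis before scoring is also our assumption about how such a split would fit a task with 16 official classes; it is not a description of the submitted pipeline.</p>
      <preformat>
```python
def split_class(samples, source_label, new_label, moved_ids):
    """Return a relabelled copy of (image_id, label) pairs.

    Only samples labelled `source_label` whose id is in `moved_ids`
    receive `new_label`; every other sample is left unchanged.
    """
    moved = set(moved_ids)
    return [
        (image_id, new_label if label == source_label and image_id in moved else label)
        for image_id, label in samples
    ]


def merge_predictions(predicted_labels, new_label, source_label):
    """Map the auxiliary class back to its parent class (assumed post-processing)."""
    return [source_label if label == new_label else label for label in predicted_labels]


# Hypothetical file names; the selection of z-line images is done by hand.
training_set = [
    ("img_001.jpg", "esophagitis"),
    ("img_002.jpg", "esophagitis"),
    ("img_003.jpg", "normal-z-line"),
]
relabelled = split_class(
    training_set, "esophagitis", "esophagitis-z-line", moved_ids=["img_002.jpg"]
)
print(relabelled)
print(merge_predictions(["esophagitis-z-line", "normal-z-line"],
                        "esophagitis-z-line", "esophagitis"))
```
      </preformat>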
    </sec>
    <sec id="sec-5">
      <title>RESULTS AND DISCUSSION</title>
      <p>The hyperparameters we optimized were the model, the gradient
descent optimizer, the learning rate and the delimiting layer. First,
we ran automatic hyperparameter optimization for the gradient
descent optimizer and delimiting layer for each model to have an
initial map of the performance of each model. The results are listed
in Table 1. Next, we picked the best-performing
hyperparameter configuration and ran a new optimization, optimizing only
the learning rate for the first and second step of the transfer
learning. The configuration was model DenseNet169, gradient descent
optimizer SGD, and delimiting layer of 0, meaning we fine-tune
all of the layers. The best classifier’s learning rate for the block
optimization ended at 0.0001, and for the fine-tuning at 0.00067,
with a validation accuracy of 0.954. The second best classifier’s
learning rate for the block optimization ended at 0.000046, and for the
fine-tuning at 0.004, with a validation accuracy of 0.952.</p>
      <p>After hyperparameter optimization, we trained classifiers on a
dataset where esophagitis images of the z-line were extracted into
a separate class. We trained the classifiers using DenseNet169, SGD,
a delimiting layer of 0, and the default learning rate of SGD, which is
0.01. The results for the best classifier were a validation accuracy
of 0.928 and for the second best classifier a validation accuracy of
0.925. For the classifier where we added more images to the out-of-patient
class, we ran the same classifier setup and got a validation
accuracy of 0.925.</p>
      <p>All submissions can be found with their resulting Matthews
correlation coefficient (MCC) score in figure 2. For both the
hyperparameter optimization and the split class, we see that the second
best submission performed better. This observation indicates that
the best runs overfitted on the validation set. For future
experiments, measures such as holding out a separate test set or using
k-fold cross-validation should be applied to avoid this
issue.</p>
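      <p>As a sketch of the k-fold alternative, the code below splits sample indices into k folds and averages a score over them, so that no single validation split decides which configuration looks best. It was not part of our submission. The function names and the <monospace>train_and_score</monospace> callback, which stands in for one training run returning a validation score, are hypothetical.</p>
      <preformat>
```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]     # k nearly equal-sized folds
    for held_out in range(k):
        validation = folds[held_out]
        train = [i for j, fold in enumerate(folds) if j != held_out for i in fold]
        yield train, validation


def mean_validation_score(n_samples, k, train_and_score):
    """Average the score returned by `train_and_score(train, validation)`."""
    scores = [train_and_score(train, val) for train, val in k_fold_indices(n_samples, k)]
    return sum(scores) / len(scores)


# Dummy callback that only reports the validation fold size.
print(mean_validation_score(10, 5, lambda train, val: len(val)))
```
      </preformat>
      <p>The cost is k training runs per observation, which matters here because each observation of the Bayesian optimization is already expensive.</p>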
      <p>Lastly, Table 3 shows the confusion matrix for the best result on
the test set. We can see that many of the classes have very high
accuracy, and a few pairs of classes have many misclassifications
between them. We speculate that this is due to the nature of the
dataset where several classes are very different while others are
very similar. An example of misclassification and the similarities
between classes can be observed in figure 1.</p>
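      <p>Both the MCC and the most frequently confused class pairs can be read off a confusion matrix. The sketch below assumes rows are true classes and columns are predicted classes, and uses the common multi-class generalization of the MCC. Whether the official evaluation uses exactly this form is an assumption, and the 3x3 matrix and its labels are made-up illustration data, not our results.</p>
      <preformat>
```python
import math


def multiclass_mcc(confusion):
    """Multi-class MCC from a square confusion matrix (rows: true, columns: predicted)."""
    n = len(confusion)
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(n))
    true_counts = [sum(confusion[k]) for k in range(n)]
    pred_counts = [sum(confusion[r][k] for r in range(n)) for k in range(n)]
    numerator = correct * total - sum(t * p for t, p in zip(true_counts, pred_counts))
    denominator = math.sqrt(
        (total ** 2 - sum(p * p for p in pred_counts))
        * (total ** 2 - sum(t * t for t in true_counts))
    )
    return numerator / denominator if denominator else 0.0


def most_confused_pairs(confusion, labels, top=3):
    """Class pairs with the most misclassifications between them, in both directions."""
    pairs = [
        (confusion[i][j] + confusion[j][i], labels[i], labels[j])
        for i in range(len(labels))
        for j in range(i + 1, len(labels))
    ]
    pairs.sort(reverse=True)
    return [(a, b, count) for count, a, b in pairs[:top] if count]


# Made-up example: classes "a" and "b" are confused, class "c" is not.
toy_matrix = [[5, 2, 0], [1, 6, 0], [0, 0, 7]]
print(multiclass_mcc(toy_matrix))
print(most_confused_pairs(toy_matrix, ["a", "b", "c"]))
```
      </preformat>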
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Martín</given-names>
            <surname>Abadi</surname>
          </string-name>
          , Ashish Agarwal, Paul Barham, Eugene Brevdo, Zhifeng Chen, Craig Citro, Greg S. Corrado, Andy Davis,
          <string-name>
            <given-names>Jeffrey</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Matthieu</given-names>
            <surname>Devin</surname>
          </string-name>
          , Sanjay Ghemawat, Ian Goodfellow, Andrew Harp, Geoffrey Irving, Michael Isard, Yangqing Jia, Rafal Jozefowicz, Lukasz Kaiser, Manjunath Kudlur, Josh Levenberg, Dandelion Mané, Rajat Monga, Sherry Moore, Derek Murray, Chris Olah, Mike Schuster, Jonathon Shlens, Benoit Steiner, Ilya Sutskever, Kunal Talwar, Paul Tucker, Vincent Vanhoucke, Vijay Vasudevan, Fernanda Viégas, Oriol Vinyals, Pete Warden, Martin Wattenberg, Martin Wicke,
          <string-name>
            <given-names>Yuan</given-names>
            <surname>Yu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Xiaoqiang</given-names>
            <surname>Zheng</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <source>TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems</source>
          . (
          <year>2015</year>
          ). https://www.tensorflow.org/ Software available from tensorflow.org.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>[2] The GPyOpt authors</article-title>
          .
          <year>2016</year>
          .
          <article-title>GPyOpt: A Bayesian Optimization framework in python</article-title>
          . http://github.com/SheffieldML/GPyOpt. (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Rune</given-names>
            <surname>Johan Borgli</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Hyperparameter optimization using Bayesian optimization on transfer learning for medical image classification</article-title>
          . (
          <year>2018</year>
          ).
          <source>Master thesis</source>
          at University of Oslo. https://www.duo.uio.no/handle/10852/64146.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Xception: Deep Learning with Depthwise Separable Convolutions</article-title>
          .
          <source>CoRR abs/1610</source>
          .02357 (
          <year>2016</year>
          ). arXiv:
          <volume>1610</volume>
          .02357 http://arxiv.org/abs/1610.02357
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>François</given-names>
            <surname>Chollet</surname>
          </string-name>
          and others.
          <source>2015</source>
          . Keras. https://keras.io. (
          <year>2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Deep Residual Learning for Image Recognition</article-title>
          .
          <source>CoRR abs/1512</source>
          .03385 (
          <year>2015</year>
          ). arXiv:
          <volume>1512</volume>
          .03385 http://arxiv.org/abs/1512.03385
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Gao</given-names>
            <surname>Huang</surname>
          </string-name>
          , Zhuang Liu, Laurens van der Maaten, and
          <string-name>
            <surname>Kilian Q Weinberger</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Densely connected convolutional networks</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision</source>
          and Pattern Recognition.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jia</given-names>
            <surname>Deng</surname>
          </string-name>
          , Wei Dong, Richard Socher,
          <string-name>
            <surname>Li-Jia</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kai</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
          </string-name>
          Fei-Fei.
          <year>2009</year>
          .
          <article-title>ImageNet: A large-scale hierarchical image database</article-title>
          .
          <source>In 2009 IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          . https://doi.org/10.1109/CVPRW.
          <year>2009</year>
          .5206848
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Diederik</surname>
            <given-names>P.</given-names>
          </string-name>
          <string-name>
            <surname>Kingma</surname>
            and
            <given-names>Jimmy</given-names>
          </string-name>
          <string-name>
            <surname>Ba</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Adam: A Method for Stochastic Optimization</article-title>
          .
          <source>CoRR abs/1412</source>
          .6980 (
          <year>2014</year>
          ). arXiv:
          <volume>1412</volume>
          .6980 http://arxiv.org/abs/1412.6980
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Adam</surname>
            <given-names>Paszke</given-names>
          </string-name>
          , Sam Gross, Soumith Chintala, Gregory Chanan, Edward Yang,
          <string-name>
            <surname>Zachary</surname>
            <given-names>DeVito</given-names>
          </string-name>
          , Zeming Lin, Alban Desmaison, Luca Antiga, and
          <string-name>
            <given-names>Adam</given-names>
            <surname>Lerer</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Automatic differentiation in PyTorch</article-title>
          .
          <source>In Conference on Neural Information Processing Systems</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Konstantin</surname>
            <given-names>Pogorelov</given-names>
          </string-name>
          , Kristin Ranheim Randel, Thomas de Lange, Sigrun Losada Eskeland, Carsten Griwodz, Dag Johansen, Concetto Spampinato, Mario Taschwer, Mathias Lux, Peter Thelin Schmidt,
          <string-name>
            <given-names>Michael</given-names>
            <surname>Riegler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Pål</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Nerthus: A Bowel Preparation Quality Video Dataset</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM</source>
          ,
          <volume>170</volume>
          -
          <fpage>174</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Konstantin</surname>
            <given-names>Pogorelov</given-names>
          </string-name>
          , Kristin Ranheim Randel, Carsten Griwodz, Sigrun Losada Eskeland, Thomas de Lange, Dag Johansen, Concetto Spampinato,
          <string-name>
            <surname>Duc-Tien</surname>
            Dang-Nguyen, Mathias Lux, Peter Thelin Schmidt,
            <given-names>Michael</given-names>
          </string-name>
          <string-name>
            <surname>Riegler</surname>
            , and
            <given-names>Pål</given-names>
          </string-name>
          <string-name>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Kvasir: A MultiClass Image Dataset for Computer Aided Gastrointestinal Disease Detection</article-title>
          .
          <source>In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM</source>
          ,
          <volume>164</volume>
          -
          <fpage>169</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Konstantin</surname>
            <given-names>Pogorelov</given-names>
          </string-name>
          , Michael Riegler, Pål Halvorsen, Thomas de Lange, Kristin Ranheim Randel,
          <string-name>
            <surname>Duc-Tien</surname>
            Dang-Nguyen,
            <given-names>Mathias</given-names>
          </string-name>
          <string-name>
            <surname>Lux</surname>
            , and
            <given-names>Olga</given-names>
          </string-name>
          <string-name>
            <surname>Ostroukhova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Medico Multimedia Task at MediaEval 2018</article-title>
          .
          <article-title>In CEUR Proceeding of the MediaEval Benchmark</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>CoRR abs/1409</source>
          .1556 (
          <year>2014</year>
          ). arXiv:
          <volume>1409</volume>
          .1556 http://arxiv.org/abs/1409.1556
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Christian</surname>
            <given-names>Szegedy</given-names>
          </string-name>
          , Sergey Ioffe, and
          <string-name>
            <given-names>Vincent</given-names>
            <surname>Vanhoucke</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning</article-title>
          .
          <source>CoRR abs/1602</source>
          .07261 (
          <year>2016</year>
          ). arXiv:
          <volume>1602</volume>
          .07261 http://arxiv.org/abs/1602.07261
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Christian</surname>
            <given-names>Szegedy</given-names>
          </string-name>
          , Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and
          <string-name>
            <given-names>Zbigniew</given-names>
            <surname>Wojna</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Rethinking the Inception Architecture for Computer Vision</article-title>
          . CoRR abs/1512.00567 (
          <year>2015</year>
          ). arXiv:
          <volume>1512</volume>
          .00567 http://arxiv.org/abs/1512.00567
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>T.</given-names>
            <surname>Tieleman</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <source>2012. Lecture 6</source>
          .5
          <article-title>-RmsProp: Divide the gradient by a running average of its recent magnitude</article-title>
          .
          <source>COURSERA: Neural Networks for Machine Learning</source>
          . (
          <year>2012</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Matthew</surname>
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Zeiler</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>ADADELTA: An Adaptive Learning Rate Method</article-title>
          .
          <source>CoRR abs/1212</source>
          .5701 (
          <year>2012</year>
          ). arXiv:
          <volume>1212</volume>
          .5701 http://arxiv.org/abs/1212.5701
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>