<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Sketch2Code: Automatic hand-drawn UI elements detection with Faster R-CNN</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aleš Zita</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lukáš Picek</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonín Říha</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Czech Academy of Sciences, Institute of Information Theory and Automation</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Cybernetics, Faculty of Applied Sciences, University of West Bohemia</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Information Technology, Czech Technical University</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Faculty of Mathematics and Physics, Charles University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Transcribing hand drawings of User Interface (UI) elements into computer code is a tedious and repetitive task, which creates a need for a system capable of automating the process. This paper describes a deep learning-based method for the detection and localization of hand-drawn user interface elements. The proposed method took 1st place in the ImageCLEFdrawnUI competition, achieving an overall precision of 0.9708. The final method is based on the Faster R-CNN object detection framework with a ResNet-50 backbone, trained with advanced regularization techniques. The code is available at: https://github.com/picekl/ImageCLEF2020-DrawnUI.</p>
      </abstract>
      <kwd-group>
        <kwd>Web Design</kwd>
        <kwd>Object Detection</kwd>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>User Interface</kwd>
        <kwd>Deep Learning</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-1">
        <title>1.1 Motivation</title>
        <p>The main motivation for this task is to simplify website creation:
people can build a web page by drawing its UI elements on a whiteboard
or on a piece of paper, which makes the process more accessible.
In this context, the task of detecting and recognizing hand-drawn UI elements
addresses the problem of automatically transcribing the UI into computer code.</p>
        <p>The complete dataset consists of 1,000 hand-drawn templates captured multiple
times with different cameras, resulting in 2,950 high-resolution images. These
data were further randomly split into 2,363 training and 587 test images. The
training part includes 65,993 UI elements belonging to 21 classes. All images
were annotated with bounding boxes and class labels by human experts. A more
detailed description of the class distribution is given in Table 1. Example
images are depicted in Figure 1.</p>
      </sec>
      <sec id="sec-1-2">
        <title>Solution</title>
        <p>
          The proposed solution combines a standard object detection network
architecture with consistent data preparation and augmentation. In
particular, the Faster R-CNN [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] framework with the ResNet-50 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] feature
extractor was used. The system was implemented and fine-tuned using the
TensorFlow Object Detection API [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] (https://github.com/tensorflow/models/blob/master/research/object_detection),
starting from publicly available checkpoints. All networks in our experiments
shared the same optimizer settings: RMSProp [13] with a momentum of 0.9. The
initial architecture was based on our work [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] submitted to the
ImageCLEFcoral competition [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which included, for instance, the data augmentation
methods and the Accumulated Gradient Normalization technique [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. During our
follow-up research, we considered and tested several approaches, including new
data synthesis, different network architectures, and network ensemble variants.
        </p>
        <p>
          Dataset splitting for validation - To create a set for continuous network
performance evaluation, the provided dataset needed to be split into training and
validation sets. After careful examination of the content, it became apparent that
a random split of the dataset could cause discrepancies between the validation
and training set performance, because less frequent classes could
end up without comparable representation in the training and validation
sets. The split was therefore carefully engineered, resulting in a final
approximate ratio of 11:1 between the training and validation sets.
        </p>
        <p>
          Data distribution - To better understand the problem at hand, we
performed a frequency analysis of the UI element type distribution and concluded
that some of the element types are represented by very few occurrences in the
training dataset, namely 'stepper input', 'text area' and 'table' (see Table 1).
Reviewing the training dataset further revealed that it contains multiple images
of the same drawings. This is caused by the fact that the whole dataset (training
and testing) consists of 2,950 images of only 1,000 templates, i.e., the templates
were each captured by several different cameras. The subsequent random split
of the dataset into training and test parts placed some of the rarer elements
in the training set multiple times and others not at all. This worsens the
uneven distribution of the UI element classes to the point where, for example,
the rarest element appears on only two templates (6 images) in the training
dataset. For a deep network to learn to recognize such an element, a much
higher number of examples is needed.
        </p>
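        <p>The validation split described above can be illustrated with a minimal sketch. It is not the authors' actual procedure, which the text does not specify. The sketch assumes that every image is tagged with the template it depicts, keeps all photos of one template in the same set, and stratifies templates by their rarest class so that each such class is seen by both sets whenever at least two templates carry it. The function name split_by_template, the record layout and the val_every parameter are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import Counter, defaultdict


def split_by_template(records, val_every=12):
    """Split image records into training and validation image lists.

    records: iterable of (image_id, template_id, classes) tuples, where
    classes is the set of UI element classes drawn on that image. This
    layout is hypothetical, not the competition's annotation format.
    val_every=12 sends roughly one template in twelve to validation,
    which gives an approximate 11:1 ratio.
    """
    # Group photos by template so that no drawing ends up in both sets.
    templates = defaultdict(lambda: {"images": [], "classes": set()})
    for image_id, template_id, classes in records:
        templates[template_id]["images"].append(image_id)
        templates[template_id]["classes"].update(classes)

    # Count on how many templates each class occurs.
    class_freq = Counter()
    for info in templates.values():
        class_freq.update(info["classes"])

    # Stratify: put each template into the group of its rarest class.
    groups = defaultdict(list)
    for template_id in sorted(templates, key=str):
        classes = templates[template_id]["classes"]
        rarest = min(classes, key=lambda c: (class_freq[c], c), default=None)
        groups[rarest].append(template_id)

    train, val = [], []
    for members in groups.values():
        # Give validation at least one template per group, but never
        # the only template of a group.
        n_val = max(1, len(members) // val_every) if len(members) > 1 else 0
        for template_id in members[:n_val]:
            val.extend(templates[template_id]["images"])
        for template_id in members[n_val:]:
            train.extend(templates[template_id]["images"])
    return train, val
```
]]></preformat>
        <p>Because rare groups always contribute one template to validation, the resulting ratio only approximates 11:1; the exact ratio depends on how many small groups the data contains.</p>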
        <p>Synthetic dataset - To compensate for the uneven distribution of the UI
element types, we decided to expand the training dataset with synthetic data
containing such elements. The data were generated by augmenting segmented UI
elements, which were subsequently pasted onto paper of random size and
very light random color. The augmentation consisted mainly of constrained
random affine transformations. We added 500 synthetically generated images
containing the least frequent classes. Examples of the synthetic data are
depicted in Figure 2. The UI element classes that were artificially added are:
datepicker, rating, slider, textarea, table and stepperinput.</p>
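        <p>The geometry of this synthesis step can be sketched as follows. The sketch only plans a page (paper size, paper color and the bounding box of each transformed element) and leaves out the pixel-level pasting. All parameter ranges, the function names and the use of rotation, uniform scale and shear as the "constrained" transform are assumptions made for illustration; the original text does not state them.</p>
        <preformat><![CDATA[
```python
import math
import random


def random_affine(rng, max_rotation_deg=10.0, scale_range=(0.8, 1.2), max_shear=0.1):
    """Return a 2x2 matrix for a constrained random rotation, scale and shear."""
    angle = math.radians(rng.uniform(-max_rotation_deg, max_rotation_deg))
    scale = rng.uniform(*scale_range)
    shear = rng.uniform(-max_shear, max_shear)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    # Rotation matrix multiplied by a horizontal shear, scaled uniformly.
    return (
        (scale * cos_a, scale * (cos_a * shear - sin_a)),
        (scale * sin_a, scale * (sin_a * shear + cos_a)),
    )


def transformed_box(width, height, matrix):
    """Axis-aligned size of a width x height element after the transform."""
    corners = [(0, 0), (width, 0), (0, height), (width, height)]
    xs = [matrix[0][0] * x + matrix[0][1] * y for x, y in corners]
    ys = [matrix[1][0] * x + matrix[1][1] * y for x, y in corners]
    return max(xs) - min(xs), max(ys) - min(ys)


def plan_synthetic_page(element_sizes, seed=0):
    """Plan one synthetic page: paper size, paper color and element boxes.

    element_sizes: list of (class_name, (width, height)) of segmented elements.
    """
    rng = random.Random(seed)
    paper_w, paper_h = rng.randint(1500, 2500), rng.randint(1500, 2500)
    # Very light random paper color: every RGB channel stays close to white.
    paper_color = tuple(rng.randint(235, 255) for _ in range(3))
    boxes = []
    for class_name, (width, height) in element_sizes:
        box_w, box_h = transformed_box(width, height, random_affine(rng))
        if box_w > paper_w or box_h > paper_h:
            continue  # the element does not fit on this page
        x = rng.uniform(0, paper_w - box_w)
        y = rng.uniform(0, paper_h - box_h)
        boxes.append((class_name, x, y, x + box_w, y + box_h))
    return (paper_w, paper_h), paper_color, boxes
```
]]></preformat>
        <p>The returned boxes double as ground-truth annotations for the generated image. The sketch does not prevent pasted elements from overlapping; a real generator would likely need such a check.</p>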
        <p>
          The experiment performed with the ResNet-50 backbone and grayscale data
with 1000 × 1000 input size was evaluated on the validation set and showed an
improvement over RGB images in all measured scores: mean average precision at
an Intersection over Union (IoU) greater than 0.5 (mAP0.5) improved by 0.0081,
mAP by 0.0222, and Recall@100 (recall calculated using the best 100 detections)
by 0.0315. Although we were able to flatten the UI element distribution curve,
the overall performance of the original network was marginally better on
grayscale images.
        </p>
        <p>
          Data preprocessing - While testing, we experimented with three different
approaches to input preprocessing. First, images without any augmentations
were used. Second, images were converted to grayscale. Third, grayscale images
were used with the borders around the captured drawing removed and with the
lines dilated. The third effect
was achieved by finding contours in the image using OpenCV [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. These contours
were then sorted by area, and the largest of them was used to crop the image.
In some cases, this approach did not work correctly, especially where parts of
the image were covered with a shadow, so this crop was only applied when the
area of the contour was at least 70% of the image. After the crop, we utilized
CLAHE [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] for histogram equalization and applied morphological erosion to
accentuate the lines on paper. This approach is denoted Gray++ in our results.
As Table 2 shows, Gray++ performed worse in our testing,
so our submissions mainly used grayscale images.
        </p>
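        <p>A rough sketch of the Gray++ steps described above, written against the OpenCV Python bindings. Only the order of operations and the 70% area rule come from the text. The Otsu threshold used to find the paper, the CLAHE settings, the 3 × 3 kernel and the names should_crop and gray_plus_plus are assumptions, so the sketch should not be read as the authors' implementation.</p>
        <preformat><![CDATA[
```python
def should_crop(contour_area, image_width, image_height, min_fraction=0.7):
    """Crop only when the largest contour covers enough of the image."""
    return contour_area >= min_fraction * image_width * image_height


def gray_plus_plus(image_bgr, kernel_size=3):
    """Sketch of the Gray++ preprocessing for a BGR image (e.g. from cv2.imread)."""
    # Imported here so that should_crop stays usable without OpenCV installed.
    import cv2
    import numpy as np

    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    height, width = gray.shape

    # Separate the bright paper from the darker surroundings (Otsu threshold,
    # an assumption) and look for the largest outer contour.
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    # [-2] selects the contour list in both OpenCV 3.x and 4.x return layouts.
    contours = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)[-2]
    if len(contours) > 0:
        largest = max(contours, key=cv2.contourArea)
        if should_crop(cv2.contourArea(largest), width, height):
            x, y, w, h = cv2.boundingRect(largest)
            gray = np.ascontiguousarray(gray[y:y + h, x:x + w])

    # CLAHE performs contrast-limited local histogram equalization.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    equalized = clahe.apply(gray)

    # Erosion shrinks bright regions, so the dark pen strokes become thicker.
    kernel = np.ones((kernel_size, kernel_size), np.uint8)
    return cv2.erode(equalized, kernel, iterations=1)
```
]]></preformat>
        <p>Skipping the crop when the largest contour is small mirrors the safeguard in the text: a shadow can split the paper into several regions, and cropping to one of them would cut away part of the drawing.</p>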
      </sec>
      <sec id="sec-1-3">
        <title>2.2 Object Detection Networks</title>
        <p>
          Several network architectures were considered for this task, in
particular Faster R-CNN [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and EfficientDet [12]. As an
initial performance test, each architecture was trained with its
recommended configuration. These tests revealed the overall suitability of
each architecture for the task at hand. Among the architectures with comparable
backbones, Faster R-CNN achieved the best performance (see
Table 4).
        </p>
        <p>
          Next, we needed to decide on the object detection network backbone. We
tested several widely used backbone architectures, namely
ResNet-50 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], ResNet-101 [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], Inception V2 and Inception-ResNet-V2 [11] (Table 3).
        </p>
        <p>
          In this competition, the AICrowd platform was used to evaluate the participants'
submissions. Each participating team was allowed to submit up to 10 text files
with detection bounding boxes in a specific format for each image. We
created 7 submissions using the configurations listed below.
        </p>
        <p>
          Baseline configuration - As a baseline for all our experiments we used Faster
R-CNN with ResNet-50 as a backbone. For training we used parameters and
augmentations described in Table 5 and [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], respectively. Finally,
thresholding was used to select only detections with high confidence.
        </p>
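        <p>The final thresholding step amounts to a simple filter over the detector output. The sketch below assumes a hypothetical list-of-dicts layout for the detections; the example labels, scores and boxes are made up for illustration, and only the threshold values 0.8 and 0.95 are taken from the submissions.</p>
        <preformat><![CDATA[
```python
def filter_detections(detections, threshold=0.8):
    """Keep only detections whose confidence score reaches the threshold.

    detections: list of dicts with "label", "score" and "box" keys
    (a hypothetical layout chosen for this example).
    """
    return [d for d in detections if d["score"] >= threshold]


# Made-up detector output for a single image.
detections = [
    {"label": "button", "score": 0.97, "box": (10, 10, 120, 60)},
    {"label": "image", "score": 0.85, "box": (30, 90, 300, 280)},
    {"label": "table", "score": 0.40, "box": (0, 300, 500, 700)},
]

kept_at_080 = filter_detections(detections, threshold=0.8)
kept_at_095 = filter_detections(detections, threshold=0.95)
print([d["label"] for d in kept_at_080])  # ['button', 'image']
print([d["label"] for d in kept_at_095])  # ['button']
```
]]></preformat>
        <p>A higher threshold trades recall for precision: fewer boxes are submitted, but those that remain are more likely to be correct.</p>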
        <p>Submission 1 - Baseline experiment trained on RGB images. Tested on
original-size RGB images. The detection confidence threshold was set to 0.8.</p>
        <p>Submission 2 - Submission 1 trained and tested on the grayscale images.</p>
        <p>Submission 3 - Submission 2 trained on the whole training set, with no data held out for
validation, and with a confidence threshold of 0.95.</p>
        <p>Submission 4 - Submission 3 trained for 80 epochs.</p>
        <p>Submission 5 - Submission 1 with Inception-ResNet-V2 as backbone. Trained
and tested on grayscale images with confidence threshold of 0.8.</p>
        <p>Submission 6 - Voting ensemble created by combining models used in
Submissions 2, 3 and 5 with confidence threshold of 0.8.</p>
        <p>Submission 7 - Submission 6 with a confidence threshold of 0.45.</p>
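        <p>The text does not say how the voting ensemble of Submissions 6 and 7 combines the individual models, so the following is only one plausible scheme. Detections of the same class from different models are grouped when their boxes overlap with an IoU of at least 0.5, a group is kept when at least two models agree, and the kept box and score are averaged. The function names, the tuple layout and the min_votes and iou_threshold values are assumptions.</p>
        <preformat><![CDATA[
```python
def iou(box_a, box_b):
    """Intersection over Union of two (x_min, y_min, x_max, y_max) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def voting_ensemble(per_model_detections, min_votes=2,
                    iou_threshold=0.5, score_threshold=0.8):
    """Merge the detections of several models by majority vote.

    per_model_detections: one list per model of (label, score, box) tuples.
    """
    clusters = []  # each cluster: {"label": str, "members": [(model, score, box)]}
    for model_idx, detections in enumerate(per_model_detections):
        for label, score, box in detections:
            for cluster in clusters:
                members = cluster["members"]
                same_label = cluster["label"] == label
                new_model = all(m != model_idx for m, _, _ in members)
                # Compare against the first box that opened the cluster.
                if same_label and new_model and iou(members[0][2], box) >= iou_threshold:
                    members.append((model_idx, score, box))
                    break
            else:
                clusters.append({"label": label, "members": [(model_idx, score, box)]})

    merged = []
    for cluster in clusters:
        members = cluster["members"]
        if len(members) < min_votes:
            continue  # not enough models agree on this element
        score = sum(s for _, s, _ in members) / len(members)
        if score < score_threshold:
            continue  # the averaged confidence is too low
        box = tuple(sum(b[i] for _, _, b in members) / len(members) for i in range(4))
        merged.append((cluster["label"], score, box))
    return merged
```
]]></preformat>
        <p>Under this scheme, lowering score_threshold from 0.8 to 0.45, as in Submission 7, keeps agreed-upon boxes on which the individual models were less confident.</p>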
      </sec>
    </sec>
    <sec id="sec-2">
      <title>4 Competition Results and Discussion</title>
      <p>The official ImageCLEFdrawnUI competition results are displayed in Figure 3.
The proposed system achieved the best Overall Precision score of 0.9708 and
outperformed the 2 other participating teams as well as the baseline solution
proposed by the organizers. The best-scoring submission was produced by the Faster R-CNN
model with the ResNet-50 backbone architecture and an input resolution of 1000×1000,
trained for 80 epochs with the parameters and augmentations described in Table 5
and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], respectively. The resulting predictions were filtered with a confidence
threshold of 0.95 to maximize the official metric of mAP. The AICrowd evaluation
platform is available at https://www.aicrowd.com.
      </p>
      <p>In our opinion, the winning submission is not the best of our submissions.
According to the widely accepted performance metrics (mAP@0.5 and Recall@0.5),
our Submission 5 (run ID 68003), which took 3rd place overall, is superior to
the winning submission. It lowers the ImageCLEF Overall Precision by only
0.0144, while increasing mAP@0.5 by 0.111 and Recall@0.5 by 0.074.</p>
    </sec>
    <sec id="sec-3">
      <title>5 Conclusion</title>
      <p>In this paper, we have presented a system for automatic hand-drawn UI
element detection and localization. To achieve this goal, we had to gain a deep
understanding of the provided dataset and perform many experiments to craft
the best data preprocessing and augmentation methods, as well as objectively
adjust the network parameters.</p>
      <p>The final method was based on the Faster R-CNN detection network with
ResNet-50 as the backbone architecture. The presented method took first
place in the ImageCLEFdrawnUI competition, with an overall precision of 0.9708.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgements</title>
      <p>Lukáš Picek was supported by the Ministry of Education, Youth and Sports of
the Czech Republic project No. LO1506, and by the grant of the UWB project
No. SGS-2019-027.</p>
      <p>11. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, inception-resnet
and the impact of residual connections on learning. In: Thirty-first AAAI
conference on artificial intelligence (2017)</p>
      <p>12. Tan, M., Pang, R., Le, Q.V.: EfficientDet: Scalable and efficient object detection.
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern
Recognition. pp. 10781–10790 (2020)</p>
      <p>13. Tieleman, T., Hinton, G.: Lecture 6.5-rmsprop: Divide the gradient by a running
average of its recent magnitude. COURSERA: Neural networks for machine
learning 4(2), 26–31 (2012)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bradski</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The OpenCV Library</article-title>
          .
          <source>Dr. Dobb's Journal of Software Tools</source>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clift</surname>
            ,
            <given-names>L.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , A.:
          <article-title>Overview of the ImageCLEFcoral 2020 task: Automated coral reef image annotation</article-title>
          .
          <source>In: CLEF2020 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ştefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Overview of ImageCLEFdrawnUI 2020: The Detection and Recognition of Hand Drawn Website UIs Task</article-title>
          .
          <source>In: CLEF2020 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;, Thessaloniki, Greece (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noordhuis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wesolowski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyrola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tulloch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Accurate, large minibatch sgd: Training imagenet in 1 hour</article-title>
          .
          <source>arXiv preprint arXiv:1706.02677</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rathod</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korattikara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fathi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>Speed/accuracy trade-offs for modern convolutional object detectors</article-title>
          .
          <source>In: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          . pp.
          <fpage>7310</fpage>
          -
          <lpage>7311</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Péteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halvorsen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tran</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lux</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gurrin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang-Nguyen</surname>
            ,
            <given-names>D.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fichou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berari</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brie</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dogariu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ştefan</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>M.G.</given-names>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in lifelogging, medical, nature, and internet applications</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020)</source>
          , vol.
          <volume>12260</volume>
          . LNCS Lecture Notes in Computer Science, Springer, Thessaloniki, Greece (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Picek</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Říha</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zita</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Coral reef annotation, localisation and pixel-wise classification using mask-rcnn and bag of tricks</article-title>
          .
          <source>In: CLEF (Working Notes)</source>
          . CEUR-WS.org &lt;http://ceur-ws.org&gt;, Thessaloniki, Greece (September 22-25,
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pizer</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amburn</surname>
            ,
            <given-names>E.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Austin</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cromartie</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Geselowitz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Greer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>ter Haar Romeny</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zimmerman</surname>
            ,
            <given-names>J.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zuiderveld</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Adaptive histogram equalization and its variations</article-title>
          .
          <source>Computer Vision, Graphics, and Image Processing</source>
          <volume>39</volume>
          (3)
          ,
          <fpage>355</fpage>
          -
          <lpage>368</lpage>
          (
          <year>1987</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>
          . In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.)
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          , pp.
          <fpage>91</fpage>
          -
          <lpage>99</lpage>
          . Curran Associates, Inc. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>