<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coral Reef annotation, localisation and pixel-wise classification using Mask R-CNN and Bag of Tricks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Lukáš Picek</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonín Říha</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aleš Zita</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Cybernetics, Faculty of Applied Sciences, University of West Bohemia</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Faculty of Information Technology, Czech Technical University</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Faculty of Mathematics and Physics, Charles University</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>The Czech Academy of Sciences, Institute of Information Theory and Automation</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>This article describes an automatic system for detection, classification and segmentation of individual coral substrates in underwater images. The proposed system achieved the best performance in both tasks of the second edition of the ImageCLEFcoral competition. Specifically, it reached a mean average precision at Intersection over Union (IoU) greater than 0.5 (mAP@0.5) of 0.582 in the coral reef image annotation and localisation task, and an mAP@0.5 of 0.678 in the coral reef image pixel-wise parsing task. The system is based on the Mask R-CNN object detection and instance segmentation framework boosted by advanced training strategies, pseudo-labeling, test-time augmentations, and Accumulated Gradient Normalisation. To support future research, the code has been made available at: https://github.com/picekl/ImageCLEF2020-DrawnUI.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Computer Vision</kwd>
        <kwd>Instance Segmentation</kwd>
        <kwd>Convolutional Neural Networks</kwd>
        <kwd>Machine Learning</kwd>
        <kwd>Object Detection</kwd>
        <kwd>Corals</kwd>
        <kwd>Biodiversity</kwd>
        <kwd>Conservation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The ImageCLEFcoral [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] challenge was organized in conjunction with the
ImageCLEF 2020 evaluation campaign [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] at the Conference and Labs of the
Evaluation Forum (CLEF). The main goal of this competition was to create
an algorithm or system that can automatically detect and annotate a variety
of benthic substrate types in image collections taken from multiple coral reefs
as part of a coral reef monitoring project with the Marine Technology Research
Unit at the University of Essex.
Live corals are an important biological class that contributes massively to
the biodiversity of the ocean ecosystem. Corals are a key habitat for thousands of marine
species [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and provide an essential source of nutrition and income for people in
developing countries [
        <xref ref-type="bibr" rid="ref2 ref3">3,2</xref>
        ]. Therefore, automatic monitoring of coral reef
condition plays a crucial part in understanding future threats and prioritizing
conservation efforts.
      </p>
      <sec id="sec-1-1">
        <title>1.2 Datasets</title>
        <p>This section briefly describes the provided data and their subsets: an
annotated dataset that contains 440 images, and a testing dataset with 400 images
without annotations. Additionally, we introduce a precisely engineered
training/validation split of the annotated dataset for training purposes.
Annotated dataset - The annotated dataset consists of 440 images
containing 12,082 individual coral objects. Each coral was annotated with expert-level
knowledge, including a segmentation mask, a bounding box, and a class that
represents 1 out of 13 substrate types. The dataset is heavily unbalanced (refer
to Table 1), with almost 50% of the objects belonging to a single class (Soft Coral) and
approximately 8% to the eight least frequent classes. Moreover, the images have
different colour variations, are heavily blurred, and come from different locations
and geographical regions. Furthermore, coral substrates belonging to the same
class can be observed with different morphologies, colour variations, or patterns.
Finally, some images contain a measurement tape that partially covers objects
of interest.</p>
        <p>For evaluation of the network training process, the annotated dataset needed to
be divided into two parts: one used for network optimization and the other for
validation of network performance. To create these subsets, every tenth image was
designated for the validation set and the rest was used for training. As the validation
set class distribution did not match the training one, particular images from the
validation set needed to be replaced by carefully cherry-picked images from the
training set. This resulted in an almost perfect split with similar distributions for
both the training and the validation set. This similarity ensured a representative
validation process.</p>
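        <p>A minimal Python sketch of the split described above is shown below. It assumes a list of (image_id, [class_label, ...]) pairs parsed from the provided annotations; the helper names are illustrative and not part of the released code.</p>
        <preformat>
# Sketch of the training/validation split: every tenth image goes to validation,
# and the class distributions of the two subsets are then compared so that
# individual images can be swapped until the distributions roughly match.
from collections import Counter

def split_every_tenth(samples):
    """samples: list of (image_id, [class_label, ...]) tuples."""
    train, val = [], []
    for i, sample in enumerate(samples):
        (val if i % 10 == 0 else train).append(sample)
    return train, val

def class_distribution(samples):
    """Relative frequency of each substrate class within a subset."""
    counts = Counter(label for _, labels in samples for label in labels)
    total = sum(counts.values())
    return {cls: n / total for cls, n in counts.items()}
        </preformat>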
        <p>
          Testing dataset - The testing dataset contains 400 images from four different
locations: the same location as in the training set, a location similar to the
training set, a geographically similar location, and a geographically distinct
location.</p>
        <p>
          The proposed object detection and instance segmentation system extends a recent
state-of-the-art Convolutional Neural Network (CNN) object detection
framework (Mask R-CNN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]) with an additional Bag of Tricks that considerably
increased the performance. The TensorFlow Object Detection API
(https://github.com/tensorflow/models/blob/master/research/object_detection) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] was used
as a deep learning framework for fine-tuning the publicly available checkpoints.
All bells and whistles are further described in Section 2. Additionally, approaches
that did not contribute positively but could have some potential for future
editions of the ImageCLEFcoral competition are discussed.
        </p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Methodology</title>
      <p>This section describes all approaches and techniques used in the benthic
substrate detection, annotation and segmentation tasks. Modern object
detection and instance segmentation methods are summarized, followed by a
description of the chosen system and its configuration. Furthermore, all the used
bells and whistles (Bag of Tricks) are introduced and described.</p>
      <sec id="sec-2-1">
        <title>2.1 Object Detection</title>
        <p>
          Although conventional digital image processing methods are capable of detecting
particular local features, modern object detectors based on Deep Convolutional
Neural Networks (DCNN) achieve superior performance in object detection and
instance segmentation tasks. Several network architectures were pre-selected
based on the study published by Huang et al. [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], namely Faster R-CNN [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ],
SSD [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and Mask R-CNN [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The initial performance experiment was to train
these detection frameworks with default or recommended configurations. This
experiment revealed the most suitable framework for both tasks of the
ImageCLEFcoral competition: Mask R-CNN.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Network parameters</title>
        <p>Experiments on the validation set revealed the best optimizer settings for the
framework. These settings were shared across all of our experiments, unless
stated otherwise. For a detailed description, refer to Table 2.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Bag of Tricks</title>
        <p>Augmentations - The provided dataset contains 440 images. Considering that
44 were used for validation, 396 images are too few for robust network
optimization. To alleviate this issue, multiple data augmentation techniques were utilized.
The following methods were included in the final training pipeline (a code sketch of
the colour distortions is given below):
Colour Distortions - Brightness variations with a max delta of 0.2, contrast
and saturation variations that each scale by a random value in the range of 0.8 - 1.25,
hue variations that offset by a random value of up to 0.02, and random RGB to
grayscale conversion with 10% probability.</p>
        <p>Image Flips - Random horizontal and vertical flips, and 90 degree rotations,
each with a 50% chance.</p>
        <p>Random Jitter - Every bounding box corner can be randomly shifted by an
amount corresponding to up to 2% of the bounding box width and height in the x
and y coordinates, respectively.</p>
        <p>
          Cut Out [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] - Random black square patches are added into the image. More
precisely, up to 10 patches are added with a 50% occurrence probability, each
with a side length corresponding to 10% of the image height or width, whichever
is smaller.
        </p>
        <p>By utilizing the techniques mentioned above, we increased the model's mAP@0.5
by 0.0392, as measured on the validation set.</p>
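        <p>The colour distortions above can be expressed, for example, with plain TensorFlow image operations. The following is a minimal sketch under the assumption of float32 images in the [0, 1] range; the geometric augmentations (flips, rotations, box jitter, Cut Out) are not shown because they must also transform the bounding boxes and masks, which the detection framework handles internally.</p>
        <preformat>
import tensorflow as tf

def colour_distortions(image):
    """image: float32 tensor of shape (H, W, 3) with values in [0, 1]."""
    image = tf.image.random_brightness(image, max_delta=0.2)
    image = tf.image.random_contrast(image, 0.8, 1.25)
    image = tf.image.random_saturation(image, 0.8, 1.25)
    image = tf.image.random_hue(image, max_delta=0.02)
    # Random RGB to grayscale conversion with 10% probability.
    def to_gray(img):
        return tf.tile(tf.image.rgb_to_grayscale(img), [1, 1, 3])
    image = tf.cond(tf.less(tf.random.uniform([]), 0.1),
                    lambda: to_gray(image), lambda: image)
    return tf.clip_by_value(image, 0.0, 1.0)
        </preformat>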
        <p>Input Resolution - In the task of object detection, especially where small
objects occur, the input resolution plays a crucial role. Theoretically, the higher the
resolution, the more objects will be detected. Unfortunately, detection on
high-resolution images is limited by GPU memory. Hence, it is always a trade-off
between performance and hardware requirements.</p>
        <p>
          Backbone - To find the best backbone architecture for the Mask R-CNN
framework, we performed an experiment over three different backbone models including
ResNet-50 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], ResNet-101 [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and Inception-ResNet-V2 [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. A detailed
performance comparison is included in Table 3.
Pseudo Labels - The performance of DCNNs heavily depends on the size of the
training set. To alleviate this issue, we developed a naive pseudo-labelling
approach inspired by [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. In short, an already trained network is used to label the
unlabelled testing data with so-called weak labels. Only the overconfident
detections were used; the rest of the image was blurred out. Even though there
is a high chance of overfitting to incorrect pseudo-labels due to
confirmation bias, pseudo-labels can significantly improve the performance of the CNN
if pseudo-labelled images are added carefully.
        </p>
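        <p>The pseudo-labelling step can be sketched as follows. The detection interface is illustrative (a hypothetical list of detections with pixel boxes and scores), not the actual API of the released code.</p>
        <preformat>
# Naive pseudo-labelling: keep only highly confident detections of an already
# trained model on an unlabelled test image and blur everything outside their
# boxes before adding the image (with its weak labels) to the training set.
import numpy as np
from scipy.ndimage import gaussian_filter

def make_pseudo_labelled_image(image, detections, conf_threshold=0.95, sigma=15):
    """image: uint8 array (H, W, 3); detections: dicts with 'box' = (y0, x0, y1, x1)
    in pixel coordinates, 'score' and 'class'."""
    confident = [d for d in detections if d["score"] >= conf_threshold]
    if not confident:
        return None, []  # nothing reliable enough to learn from
    result = gaussian_filter(image.astype(np.float32), sigma=(sigma, sigma, 0))
    for det in confident:
        y0, x0, y1, x1 = det["box"]
        result[y0:y1, x0:x1] = image[y0:y1, x0:x1]  # keep confident regions sharp
    return result.astype(np.uint8), confident
        </preformat>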
        <p>
          Transfer Learning - Big transfer [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], or transfer learning, is a fine-tuning
technique commonly used in deep learning. Rather than initializing the weights
of a neural network randomly, pretrained weights are used. Furthermore, the final
model could benefit from weights trained on a similar domain. To evaluate the potential
of such an approach for the purposes of this competition, we experimented with
fine-tuning of the publicly available checkpoints, including ImageNet and iNaturalist
(from the TF1 detection model zoo,
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf1_detection_zoo.md),
COCO [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], PlantCLEF2018 [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and PlantCLEF2019 [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The idea was that
fine-tuning checkpoints trained on nature-oriented datasets would outperform
the non-nature-oriented ones, assuming that coral imagery differs significantly
from other domains. In the end, the COCO pretrained checkpoint, which includes both
the backbone and the region proposal weights, was used.
Test Time Augmentations - Test time augmentation is a method of
applying transformations to a given image to generate several slightly different
variations of it, whose predictions, when combined, can improve the
final prediction. Our submissions utilized augmentations consisting of simple
horizontal and vertical flips of the image. Their combinations produced four sets
of detections for each image. These sets were then joined using the voting strategy
described by Moshkov et al. [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
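        <p>A minimal sketch of this test-time augmentation and voting scheme is given below. The detect callable stands in for the trained Mask R-CNN inference step and is assumed to return boxes in normalised (y0, x0, y1, x1) coordinates with a score and a class; the exact merging procedure of [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] differs in details, so this is only an illustration of the idea.</p>
        <preformat>
import numpy as np

def flip_boxes(boxes, h_flip, v_flip):
    """boxes: (N, 4) array of normalised (y0, x0, y1, x1) coordinates."""
    boxes = boxes.copy()
    if h_flip:
        boxes[:, [1, 3]] = 1.0 - boxes[:, [3, 1]]
    if v_flip:
        boxes[:, [0, 2]] = 1.0 - boxes[:, [2, 0]]
    return boxes

def iou(a, b):
    """Intersection over union of two (y0, x0, y1, x1) boxes."""
    y0, x0 = max(a[0], b[0]), max(a[1], b[1])
    y1, x1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, y1 - y0) * max(0.0, x1 - x0)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def tta_detect(image, detect, min_votes=3, iou_threshold=0.5):
    """Run detection on all four flip variants and merge by majority voting."""
    all_dets = []
    for h_flip in (False, True):
        for v_flip in (False, True):
            aug = image[:, ::-1] if h_flip else image
            aug = aug[::-1, :] if v_flip else aug
            for box, score, cls in detect(aug):
                box = flip_boxes(np.asarray(box)[None], h_flip, v_flip)[0]
                all_dets.append((box, score, cls))
    merged = []
    while all_dets:
        box, score, cls = all_dets.pop(0)
        group, rest = [(box, score, cls)], []
        for other in all_dets:
            if other[2] == cls and iou(box, other[0]) >= iou_threshold:
                group.append(other)
            else:
                rest.append(other)
        all_dets = rest
        if len(group) >= min_votes:  # most of the four variants agree
            merged.append(max(group, key=lambda d: d[1]))
    return merged
        </preformat>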
        <p>
          Ensembles - Ensemble methods combine predictions from multiple models to
obtain the final output [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ]. These methods can be used to improve accuracy in
machine learning tasks. In our work, we utilize a simple method for combining
outputs from multiple detection networks based on voting [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ]. Detections of the
same class describing one object are grouped together based on the size of their
overlap region. Instances where the majority of the detectors agree on the class label
and position are replaced by the single detection with the highest score.</p>
        <p>
          Accumulated Gradient Normalization - In order to achieve the best
performance possible, we aimed to maximize the resolution of the input data. Therefore,
we decided to train the network on mini-batches of size 1. To overcome the
disadvantages that come with using a minimal mini-batch size [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], the Accumulated
Gradient Normalization [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] technique was utilized. This approach resulted in a
considerable performance gain.
        </p>
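        <p>In effect, gradients of several consecutive single-image steps are summed, normalised by the number of accumulated steps, and only then applied, approximating a larger effective batch without the GPU memory cost. A minimal sketch follows; the model, loss function and dataset iterator are placeholders rather than the actual training loop of the released code, which builds on the TensorFlow Object Detection API.</p>
        <preformat>
import tensorflow as tf

def train_with_accumulation(model, loss_fn, dataset, optimizer, accum_steps=8):
    """Accumulate gradients over accum_steps mini-batches of size 1,
    normalise the sum, and apply it as a single optimizer update."""
    accumulated = [tf.zeros_like(v) for v in model.trainable_variables]
    for step, (images, targets) in enumerate(dataset, start=1):
        with tf.GradientTape() as tape:
            loss = loss_fn(model(images, training=True), targets)
        grads = tape.gradient(loss, model.trainable_variables)
        accumulated = [a + g if g is not None else a
                       for a, g in zip(accumulated, grads)]
        if step % accum_steps == 0:
            # Normalise by the number of accumulated mini-batches before applying.
            normalised = [a / accum_steps for a in accumulated]
            optimizer.apply_gradients(zip(normalised, model.trainable_variables))
            accumulated = [tf.zeros_like(v) for v in model.trainable_variables]
        </preformat>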
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Submissions</title>
      <p>For evaluation of the participants' submissions, the AICrowd platform
(https://www.aicrowd.com) was used. Each participating team was allowed to submit up to
10 submission files following specific requirements for both tasks. We used the allowed
maximum for both tasks. Because we utilized a single architecture for both the detection
and segmentation tasks, multiple submissions were produced using the same
network. Therefore, in the following part, we denote annotation and localisation
task submissions by D and pixel-wise parsing task submissions by S. Finally,
thresholding was used to discard predictions with low confidence.
Baseline configuration - As a baseline for all our experiments we used Mask
R-CNN with ResNet-50 as a backbone. For training we used the parameters and
augmentations described in Table 2 and Section 2.3, respectively. The input
resolution was 1000 × 1000 pixels.</p>
      <p>Submission 1D/1S - Baseline experiment using a confidence threshold that
corresponded to the best F1 score on our validation dataset (0.58).
Submission 2D - Submission 1D with a fixed programming bug that had caused
a few detections to be incorrectly generated.</p>
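      <p>For illustration, the threshold used in submission 1D/1S can be chosen by a simple sweep on the validation set, as sketched below; the evaluate callable is a placeholder returning precision and recall of the detector at a given threshold and does not correspond to a function in the released code.</p>
      <preformat>
# Sketch of picking a confidence threshold that maximises F1 on the validation set.
import numpy as np

def best_f1_threshold(evaluate, thresholds=np.linspace(0.05, 0.99, 95)):
    best_t, best_f1 = None, -1.0
    for t in thresholds:
        precision, recall = evaluate(t)          # placeholder evaluation call
        f1 = 2 * precision * recall / (precision + recall + 1e-9)
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
      </preformat>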
      <p>Submission 3D - Submission 2D with the confidence threshold set to 0.95.
Submission 4D/2S - Baseline configuration that used pseudo-labels as
described in Section 2.3. The confidence threshold was set to 0.95.</p>
      <p>Submission 5D/3S - Baseline configuration that utilized test time
augmentations as described in Section 2.3, with a confidence threshold of 0.9.
Submission 6D/4S - Submission 5D/3S with a confidence threshold of 0.999.
Submission 7D/5S - Ensemble of two checkpoints of the baseline configuration
model, taken after 40 and 50 epochs, with a confidence threshold of 0.9.
Submission 8D/6S - Submission 7D/5S with a confidence threshold of 0.999.
Submission 9D/8S - Submission 7D/5S with test time augmentations and
a confidence threshold of 0.999.</p>
      <p>Submission 10D/10S - Submission 7D/5S with a confidence threshold of 0.95.
Submission 7S - Submission 9D/8S with a confidence threshold of 0.9.
Submission 9S - Submission 9D/8S with a modified voting ensemble, in which a single
detection is sufficient, as opposed to majority voting.</p>
    </sec>
    <sec id="sec-results">
      <title>4 Results</title>
      <p>[Figure 2: Official results (mAP@0.5) of the coral reef image annotation and localisation task.]</p>
      <p>
The official competition results are shown in Figure 2 for the annotation and
localisation task, and in Figure 3 for pixel-wise parsing. Our system achieved the
best performance in both tasks of the second edition of the ImageCLEFcoral
competition. Specifically, an mAP@0.5 of 0.582 in the coral reef image
annotation and localisation task (Run ID 68143), and an mAP@0.5 of 0.678 in the coral reef
image pixel-wise parsing task (Run ID 67864). Results of all our submissions are
listed in Table 5. Table 6 illustrates the performance over different subsets of the
test dataset. The system performed comparably over the Same Location (SL),
Similar Location (SiL) and Geographically Similar Location (GS) subsets. The
performance drops significantly for the Geographically Distinct Location (GD) subset.
This is probably caused by a lack of diverse training data.</p>
      <p>The best scoring submission for the pixel-wise parsing task was a single Mask
R-CNN with a ResNet-50 backbone architecture and an input resolution of 1000 × 1000.
The system was trained for 50 epochs while using heavy augmentations as
described in Section 2.3. Additionally, pseudo-labeling (refer to Section 2.3) was
used to increase the training dataset size with overconfident detections from the
test set. Finally, the predictions were filtered with a confidence threshold of 0.95
to maximize the official mAP metric while still having a decent recall score.</p>
      <p>The best scoring submission for the annotation and localisation task was an
ensemble of two checkpoints of the same Mask R-CNN model with a ResNet-50
backbone architecture and an input resolution of 1000 × 1000, one taken after 40 and
the other after 50 epochs. The system was trained using heavy augmentations.
Furthermore, the predictions were filtered with a confidence threshold of 0.999 to
maximize the official mAP metric.</p>
      <table-wrap id="tab5">
        <label>Table 5</label>
        <caption>
          <p>Official results (mAP@0.5 and mAP@0) of our annotation and localisation task submissions.</p>
        </caption>
        <table>
          <thead>
            <tr>
              <th>Run ID</th>
              <th>1D</th><th>2D</th><th>3D</th><th>4D</th><th>5D</th><th>6D</th><th>7D</th><th>8D</th><th>9D</th><th>10D</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>mAP@0.5</td>
              <td>0.347</td><td>0.357</td><td>0.439</td><td>0.565</td><td>0.349</td><td>0.530</td><td>0.377</td><td>0.582</td><td>0.517</td><td>0.415</td>
            </tr>
            <tr>
              <td>mAP@0</td>
              <td>0.728</td><td>0.712</td><td>0.774</td><td>0.851</td><td>0.709</td><td>0.825</td><td>0.721</td><td>0.853</td><td>0.814</td><td>0.747</td>
            </tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-4">
      <title>5 Conclusion and Discussion</title>
      <p>The proposed system, designed for automatic pixel-wise detection of 13 coral
substrates, achieved an impressive mAP@0.5 of 0.582 in the localisation task and 0.678
in the instance segmentation task of the ImageCLEFcoral competition. The system
is built around Mask R-CNN, a state-of-the-art instance
segmentation framework, extended with known as well as some unique techniques, e.g.,
a detection ensemble, test-time data augmentations, accumulated gradient
normalisation, and pseudo-labelling. Surprisingly, the results for pixel-wise parsing are
considerably better. This is unexpected mainly because the test set is the same
for both tasks, and our submissions used the same set of detections. Therefore,
more similar scores were expected. This led us to believe that the annotations for
the two tasks are not the same.
</p>
      <p>[Figure 3: Official results (mAP@0.5) of the coral reef image pixel-wise parsing task.]</p>
      <p>A more in-depth examination of our submissions revealed a limited
generalisation capability with respect to geographical regions and specific locations.
This is an indication that the network could be over-fitted to the training dataset
locations, which have a specific distribution of coral species. The system could achieve
better performance with class priors corresponding to the desired location. If
location transfer is essential, location generalisation should be the main goal for
future challenges.</p>
      <p>When comparing the model performance with the top results from the previous
edition of this challenge (mAP@0.5 of 0.2427 and 0.0419), our model achieved
superior performance. Even though the test datasets are not identical, such a
difference shows the increasing trend of machine learning model performance. This
increase is probably related to the higher number of training images.</p>
      <p>Lastly, due to our GPU memory constraints, we were limited to an input
image resolution of 1000 × 1000 combined with a ResNet-50 backbone. The conducted
experiments showed that an input resolution of 1200 × 1200 and ResNet-101 would
yield better results; therefore, using GPUs with more memory would lead to
a considerable increase in the system's performance.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>Lukáš Picek was supported by the Ministry of Education, Youth and Sports of
the Czech Republic project No. LO1506, and by the grant of the UWB project
No. SGS-2019-027.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arazo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ortego</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Albert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>O'Connor</surname>
            ,
            <given-names>N.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Pseudolabeling and confirmation bias in deep semi-supervised learning</article-title>
          .
          <source>arXiv preprint arXiv:1908</source>
          .
          <volume>02983</volume>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Birkeland</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Global status of coral reefs: In combination, disturbances</article-title>
          and stressors become ratchets pp.
          <fpage>35</fpage>
          -
          <lpage>56</lpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Brander</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rehdanz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tol</surname>
            , R.S., Van Beukering,
            <given-names>P.J.:</given-names>
          </string-name>
          <article-title>The economic impact of ocean acidification on coral reefs</article-title>
          .
          <source>Climate Change Economics</source>
          <volume>3</volume>
          (
          <issue>01</issue>
          ),
          <volume>1250002</volume>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clift</surname>
            ,
            <given-names>L.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>García Seco de Herrera</surname>
          </string-name>
          , A.:
          <article-title>Overview of the ImageCLEFcoral 2020 task: Automated coral reef image annotation</article-title>
          .
          <source>In: CLEF2020 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org &lt;http://ceur-ws.org&gt;
          (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Coker</surname>
            ,
            <given-names>D.J.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <given-names>S.K.</given-names>
            ,
            <surname>Pratchett</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.S.:</surname>
          </string-name>
          <article-title>Importance of live coral habitat for reef fishes</article-title>
          .
          <source>Reviews in Fish Biology and Fisheries</source>
          <volume>24</volume>
          (
          <issue>1</issue>
          ),
          <fpage>89</fpage>
          -
          <lpage>126</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. DeVries,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Taylor</surname>
          </string-name>
          , G.W.:
          <article-title>Improved regularization of convolutional neural networks with cutout</article-title>
          .
          <source>arXiv preprint arXiv:1708.04552</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noordhuis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wesolowski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyrola</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tulloch</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jia</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Accurate, large minibatch sgd: Training imagenet in 1 hour</article-title>
          .
          <source>arXiv preprint arXiv:1706.02677</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkioxari</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
          </string-name>
          , R.:
          <article-title>Mask R-CNN</article-title>
          .
          <source>In: The IEEE International Conference on Computer Vision</source>
          (ICCV) (
          <year>Oct 2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
          </string-name>
          , J.:
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In: The IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          (
          <year>June 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Hermans</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spanakis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Möckel</surname>
          </string-name>
          , R.:
          <article-title>Accumulated gradient normalization</article-title>
          .
          <source>arXiv preprint arXiv:1710.02368</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rathod</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korattikara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fathi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>Speed/accuracy trade-offs for modern convolutional object detectors</article-title>
          .
          <source>In: Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          . pp.
          <fpage>7310</fpage>
          -
          <lpage>7311</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Müller</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Péteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Abacha</surname>
            ,
            <given-names>A.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datla</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasan</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Demner-Fushman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kozlovski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelka</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Friedrich</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Herrera</surname>
            ,
            <given-names>A.G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ninh</surname>
            ,
            <given-names>V.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>T.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Piras</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riegler</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Halvorsen,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            ,
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Dang-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.T.</given-names>
            ,
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Fichou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Berari</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            ,
            <surname>Brie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            ,
            <surname>Dogariu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Ştefan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.D.</given-names>
            ,
            <surname>Constantin</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.G.</surname>
          </string-name>
          :
          <article-title>Overview of the ImageCLEF 2020: Multimedia retrieval in medical, lifelogging, nature, and internet applications</article-title>
          .
          <source>In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF</source>
          <year>2020</year>
          ), vol.
          <volume>12260</volume>
          .
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Thessaloniki,
          <source>Greece (September</source>
          <volume>22</volume>
          - 25
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kolesnikov</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Puigcerver</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yung</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gelly</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Houlsby</surname>
          </string-name>
          , N.:
          <article-title>Big transfer (BiT): General visual representation learning</article-title>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hays</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollár</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Microsoft coco: Common objects in context</article-title>
          .
          <source>In: European conference on computer vision</source>
          . pp.
          <fpage>740</fpage>
          -
          <lpage>755</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anguelov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erhan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szegedy</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reed</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berg</surname>
            ,
            <given-names>A.C.</given-names>
          </string-name>
          :
          <article-title>SSD: Single shot multibox detector</article-title>
          .
          <source>In: European conference on computer vision</source>
          . pp.
          <fpage>21</fpage>
          -
          <lpage>37</lpage>
          . Springer (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. Moshkov, N., Mathe, B., Kertesz-Farkas, A., Hollandi, R., Horvath, P.: Test-time augmentation for deep learning-based cell segmentation on microscopy images. Scientific Reports 10(1), 1-7 (2020)</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Picek, L., Sulc, M., Matas, J.: Recognition of the Amazonian flora by Inception networks with test-time class prior estimation. In: Working Notes of CLEF 2019 - Conference and Labs of the Evaluation Forum (2019)</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>18. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. In: Cortes, C., Lawrence, N.D., Lee, D.D., Sugiyama, M., Garnett, R. (eds.) Advances in Neural Information Processing Systems 28, pp. 91-99. Curran Associates, Inc. (2015)</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>19. Sulc, M., Picek, L., Matas, J.: Plant recognition by Inception networks with test-time class prior estimation. In: Working Notes of CLEF 2018 - Conference and Labs of the Evaluation Forum (2018)</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>20. Szegedy, C., Ioffe, S., Vanhoucke, V., Alemi, A.A.: Inception-v4, Inception-ResNet and the impact of residual connections on learning. In: Thirty-First AAAI Conference on Artificial Intelligence (2017)</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>21. Zhang, C., Ma, Y.: Ensemble machine learning: methods and applications. Springer (2012)</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>