=Paper=
{{Paper
|id=Vol-3180/paper-108
|storemode=property
|title=Monitoring Coral Reefs Using Faster R-CNN
|pdfUrl=https://ceur-ws.org/Vol-3180/paper-108.pdf
|volume=Vol-3180
|authors=Felix Kerlin,Kirill Bogomasov,Stefan Conrad
|dblpUrl=https://dblp.org/rec/conf/clef/KerlinB022
}}
==Monitoring Coral Reefs Using Faster R-CNN==
Felix Kerlin, Kirill Bogomasov and Stefan Conrad
Heinrich-Heine-Universität Düsseldorf, Universitätsstraße 1, 40225 Düsseldorf, Germany

'''Abstract.''' Monitoring coral reefs is an important procedure for protecting the persistence of many marine species. The ImageCLEFcoral 2022 challenge aims to identify and annotate corals in underwater images. These images vary strongly in quality and are therefore highly complex. During our investigation, we focused on the data set and searched for ways to improve image quality. Specifically, we minimized the impact of color casts and erratic annotations through a color balancing strategy, as well as by combining the prediction results of deep learning architectures trained on the preprocessed and the original images. Object detection was handled entirely by deep learning; in particular, Faster R-CNN with a ResNet+FPN backbone network was the architecture of choice. The merging strategy is based on Non-maximum Suppression (NMS) and therefore reduces overlapping predictions. Additionally, we analyzed the impact of the depth of the chosen backbone network and identified a connection between increasing network depth and increasing accuracy for underwater imaging. Overall, our best approach achieved a MAP0.5 value of 0.396.

'''Keywords:''' Computer Vision, Object Detection, Neural Networks, Coral Reefs Detection, Deep Learning

===1. Introduction===
The CLEF Initiative's [1] ImageCLEFcoral 2022 challenge [2] addresses the destruction of coral reefs due to climate change and human activities. The reefs and their entire surrounding ecosystems are threatened with extinction within the next 30 years [3]. This would lead not only to the end of many marine species, but also to a humanitarian crisis of global proportions, as many regions depend on coral reefs. A quick change in the near future is essential, and for this reason an intervention is indispensable. An appropriate intervention, in terms of environmental protection, can only take place if it is known which steps need to be taken. These depend on the current state of the coral landscape, i.e. coral distribution, stocks and more. For this reason, the entire area of coral reefs needs to be analyzed and subsequently monitored on a regular basis. Manual monitoring by experts such as marine biologists is expensive and not feasible at all, keeping in mind the total area of 255,000 km² covered by corals [4]. Therefore automation is necessary. Our aim is to investigate how well we can locate and classify corals. For this purpose, we use efficient technologies from the field of deep learning. In the following chapters we describe the procedures used for our submissions to the challenge in detail.

===2. Related Work===
The annual coral reef image annotation and localisation task is taking place for the fourth time in a row. The results in the recent past have always shown potential for future developments: last year's winning team achieved a MAP0.5 value of just 0.121 [5].
Even though the data sets have been revised this year, so that the results cannot be compared directly, they are the results that can best serve as a benchmark for our work. Over the years, we have seen different approaches in submissions to the challenge, both classic feature engineering [6], as was commonly used in computer vision, and newer deep learning methods [7, 8, 9]. Our preliminary work compared and combined both [10, 11]. A key lesson learned from previous investigations is the need to balance the highly unbalanced data, which is not trivial, and the necessity of improving the quality of the underwater images with regard to typical issues such as cloudiness and shifted color distributions. This year we want to benefit from these findings once again. Additionally, as proposed in [10], we build our investigations mainly around regions with CNN features. Although [9] already experimented with Faster R-CNN and achieved a MAP0.5 value of 0.13996, we see potential for improvement.

===3. Data set===
The data set was provided by the CLEF initiative. The training data set consisted of 1,374 images from a total of four different locations, with a total of 31,517 annotations and 13 different classes. Additionally, the evaluation data for the submissions consisted of 200 images from one location and was not available for our own investigations until the final submissions. The quality of the data varied and was therefore challenging: many images had a severe color shift, and some images were blurry. Another complexity in a multi-label classification task is the number of objects. The number of corals in an image varied from 1 to 116. In addition, in groups of multiple corals of the same coral species, the corals are sometimes annotated in one bounding box and sometimes in multiple boxes, as shown in Figure 2. As can be seen in Table 1, the individual classes are distributed very unevenly, e.g. the substrate type "c_soft_coral" alone comprises 24.65% of the annotations. The three most frequent classes account for 67.37% of the annotations, while the three least frequent classes account for just 1.28%.

Figure 1 illustrates four images of the data set with visualized annotations. The color and quality differences of the images are easily observed. In particular, images b) and c) have a strong blue cast, while image a) is blurred. Especially in image d), the problem of delineating identical corals within an assemblage becomes clear: the corals of the type "c_hard_coral_branching" are divided into a total of three annotations, whereas in the same image the three large corals of type "c_hard_coral_table" in the lower right corner are combined into one annotation, although they are clearly separable. Another example is shown in Figure 2. In image a), many corals of type "c_soft_coral" are annotated by a single bounding box per coral. In image b), a group of corals of type "c_soft_coral" is annotated by one large bounding box. Keeping in mind the main evaluation metric MAP0.5, the impact of varying annotation strategies becomes clear. For example, splitting a group of the same coral species among more or fewer annotations in the submission would have a negative impact on the score.

Table 1: Distribution of the individual classes in the training data set
| Class | Absolute occurrence | Relative occurrence |
|---|---|---|
| c_soft_coral | 7769 | 24.65% |
| c_hard_coral_boulder | 7373 | 23.39% |
| c_sponge | 6091 | 19.33% |
| c_hard_coral_branching | 3132 | 9.94% |
| c_hard_coral_submassive | 2637 | 8.37% |
| c_algae_macro_or_leaves | 1870 | 5.93% |
| c_hard_coral_table | 920 | 2.92% |
| c_sponge_barrel | 606 | 1.92% |
| c_hard_coral_encrusting | 380 | 1.21% |
| c_hard_coral_mushroom | 335 | 1.06% |
| c_hard_coral_foliose | 233 | 0.74% |
| c_soft_coral_gorgonian | 171 | 0.54% |
| c_fire_coral_millepora | 0 | 0.00% |

[Figure 1: Different images from the data set with visualized annotations (panels a–d)]
[Figure 2: Different annotation strategies for assemblages of the same coral species (panels a–b)]
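To make the effect of the annotation granularity on MAP0.5 concrete, the following illustrative sketch (with invented box coordinates) compares one large bounding box around an assemblage with three individual boxes around its members: neither representation reaches an IoU of 0.5 with the other, so predictions in the "wrong" granularity are counted as misses.

```python
# Illustrative sketch with invented coordinates: why annotation granularity
# matters when matching predictions to ground truth at an IoU threshold of 0.5.
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

group_box = (0, 0, 300, 100)                 # one annotation around the whole assemblage
individual_boxes = [(0, 0, 100, 100),        # the same corals annotated one by one
                    (100, 0, 200, 100),
                    (200, 0, 300, 100)]

print([round(iou(group_box, b), 2) for b in individual_boxes])  # [0.33, 0.33, 0.33] - all below 0.5
```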
===4. Approach===
The core strategy for object detection uses the state-of-the-art convolutional neural network Faster R-CNN [12]. It is built with different ResNet backbone networks using the detectron2 framework [13] for PyTorch [14]. We first observe the effect of network depth on coral detection and secondly try to compensate for the previously mentioned weaknesses of the data set through image enhancement.

===4.1. Network architecture===
For the network, we chose Faster R-CNN as described in the related work section. As a backbone network, we chose ResNet+FPN [15]. This approach achieved the best results on the COCO data set [16] in the FPN paper [15] and in detectron2's Model Zoo baselines [13]. In addition to the commonly used ResNet-50 and ResNet-101, and following He et al. [17], who showed that residual networks gain precision with increasing depth, we included ResNet-152.

===4.2. Training hyperparameters===
Because of the small data set, we divided it into 90% training data and 10% validation data. To monitor overfitting, we calculated the evaluation metric of the challenge, MAP0.5, in small intervals of 250 iterations (≈ 104 epochs). Figure 7 shows the total loss of the training process and the MAP0.5 of the checkpoints as an example. Although the training loss decreased over the complete 100,000 iterations (≈ 41,727 epochs), the network started to overfit at 70,000 iterations (≈ 29,209 epochs) and the MAP0.5 decreased from there on. Therefore, in the end, we chose the checkpoint with the best MAP0.5 on the validation data for the final submission. For the learning process, we chose a base learning rate of 0.0005 for the first 25,000 iterations (≈ 10,431 epochs). After that, we lowered the learning rate to 0.0001 for the next 25,000 iterations (≈ 10,431 epochs) and to 0.00005 after 50,000 iterations (≈ 20,863 epochs). Figure 8 shows a comparison of the different batch sizes. The ResNet-50 and ResNet-101 networks were each trained with batch sizes of 32, 64, 128, 256 and 512. In both cases, the networks with larger batch sizes performed better than those with smaller batch sizes. Therefore, we chose a batch size of 512 for the final submissions.
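The exact detectron2 configuration is not reproduced here; the following minimal sketch shows how the reported hyper-parameters could be set on top of a COCO-pretrained ResNet-50+FPN baseline from the Model Zoo (the deeper backbones are configured analogously). The data set names are hypothetical, mapping the reported "batch size" to SOLVER.IMS_PER_BATCH is an assumption inferred from the iteration-to-epoch conversion above, and detectron2's multi-step scheduler with a single GAMMA only approximates the exact 0.0005/0.0001/0.00005 schedule.

```python
# Hedged sketch of a detectron2 training setup with the hyper-parameters from Section 4.2.
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-Detection/faster_rcnn_R_50_FPN_3x.yaml")

cfg.DATASETS.TRAIN = ("coral_train",)    # hypothetical names of the registered 90%/10% splits
cfg.DATASETS.TEST = ("coral_val",)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 13     # 13 substrate classes in the training data

cfg.SOLVER.IMS_PER_BATCH = 512           # "batch size" read as images per batch (assumption)
cfg.SOLVER.BASE_LR = 0.0005              # base learning rate for the first 25,000 iterations
cfg.SOLVER.STEPS = (25000, 50000)        # lower the learning rate at 25,000 and 50,000 iterations
cfg.SOLVER.GAMMA = 0.2                   # 0.0005 -> 0.0001; the second drop to 0.00005 would need a custom scheduler
cfg.SOLVER.MAX_ITER = 100000
cfg.TEST.EVAL_PERIOD = 250               # evaluate every 250 iterations to monitor overfitting

trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
# trainer.train()
```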
===4.3. Color balancing===
To counteract the problem of color casts in the images, a function was written that removes blue and green casts. For this purpose, the average values for red, green and blue were calculated for each image. Then the image was shifted slightly into the red range until the average red value reached a certain threshold. Since the termination criterion depends only on the red component of the image, both images with a blue cast and images with a green cast can be improved in this way [18]. A comparison of a random image before and after color balancing, including the corresponding histograms, is shown in Figure 3. Contrary to the histogram of the original image, the green channel of the processed image is much less dominant; its values were shifted by the image enhancement. As a result, the histogram of the processed image shows a much more balanced distribution across all three channels.

[Figure 3: Training image before and after image enhancement — (a) original image, (b) enhanced image, (c) histogram of original image, (d) histogram of enhanced image]
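A simplified sketch of this balancing rule is shown below. It is not the cited implementation [18]; the threshold and step size are assumed values for illustration, and raising only the red channel is one possible reading of "shifting the image into the red range" (an implementation could equally attenuate the green and blue channels).

```python
import numpy as np

def balance_color_cast(image, red_mean_threshold=90.0, step=2.0):
    """Shift an RGB image (H x W x 3, uint8) slightly towards red until the mean of
    the red channel reaches a threshold. Because the termination criterion only
    looks at the red channel, both blue and green casts are reduced."""
    img = image.astype(np.float32)
    while img[..., 0].mean() < red_mean_threshold:   # channel 0 = red
        img[..., 0] += step                          # nudge the red channel upwards
        np.clip(img, 0.0, 255.0, out=img)            # stay within the valid 8-bit range
    return img.astype(np.uint8)
```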
===4.4. Dual network approach===
Both image variants are subsequently used for training. For a better comparison, all tested networks share the same settings and hyper-parameters. The training setup is illustrated in Figure 4. The finally submitted predictions were then calculated for both types of data and combined into a common result. This process is illustrated in Figure 5. All potential duplicates were removed using Non-maximum Suppression: NMS iteratively removes the boxes with lower confidence among all overlapping boxes that have an IoU greater than 0.8 and keeps the box with the highest confidence. If two boxes of different classes overlap with an IoU greater than 0.8, the box with the smaller confidence is likewise discarded.

[Figure 4: Training process of the combined network — the input data set and its enhanced counterpart each train a network with a shared configuration]
[Figure 5: Use of the combined network — the box predictions on the input image and on its enhanced version are merged by NMS into the final annotations]
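The sketch below illustrates this merging step, assuming (x1, y1, x2, y2) boxes and confidence scores as produced by detectron2 and using the class-agnostic NMS from torchvision; function and variable names are illustrative.

```python
import torch
from torchvision.ops import nms

def merge_predictions(boxes_a, scores_a, labels_a,
                      boxes_b, scores_b, labels_b, iou_threshold=0.8):
    """Merge the predictions of the network trained on the original images (a) with
    those of the network trained on the color-balanced images (b) via class-agnostic
    NMS. Boxes are (N, 4) float tensors in (x1, y1, x2, y2) format."""
    boxes = torch.cat([boxes_a, boxes_b])
    scores = torch.cat([scores_a, scores_b])
    labels = torch.cat([labels_a, labels_b])
    # Class-agnostic suppression: overlapping boxes (IoU > 0.8) are reduced to the
    # box with the highest confidence, even if their predicted classes differ.
    keep = nms(boxes, scores, iou_threshold)
    return boxes[keep], scores[keep], labels[keep]
```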
===5. Submissions===
Nine runs were submitted to the "Coral reef image annotation and localisation" task. They consist of combinations of the backbone construction, i.e. the depth variation, and the image type used for training. Each run configuration is listed in Table 2. According to the challenge organization, the main evaluation metric is mean average precision. Furthermore, we added the submissions' mean average recall for a better comparison.

Table 2: Results of the submission runs
| Run-ID | Backbone | Image type | Precision | Recall |
|---|---|---|---|---|
| 183911 | ResNet-50+FPN | default | 0.365 | 0.269 |
| 183912 | ResNet-50+FPN | color balanced | 0.318 | 0.256 |
| 183913 | ResNet-50+FPN | combined | 0.297 | 0.337 |
| 183914 | ResNet-101+FPN | default | 0.371 | 0.246 |
| 183916 | ResNet-101+FPN | color balanced | 0.305 | 0.270 |
| 183918 | ResNet-101+FPN | combined | 0.291 | 0.344 |
| 183919 | ResNet-152+FPN | default | 0.396 | 0.292 |
| 183920 | ResNet-152+FPN | color balanced | 0.366 | 0.292 |
| 183922 | ResNet-152+FPN | combined | 0.336 | 0.393 |

===6. Results and discussion===
The results of the submitted runs are shown in Table 2. The best result according to the challenge's evaluation metric was the run with the ID 183919, which achieved a MAP0.5 of 0.396. In view of the precision increasing along with the depth, we can assume that precision could be improved further using an even deeper backbone. The same applies to recall. With the combined method using both the original and the color-balanced images, the recall could be improved significantly: for example, the MAR0.5 for the ResNet-152+FPN network with combined images was 0.393, while the MAR0.5 of the network trained on the original images was 0.292. The results of all three image-type approaches are shown in Figure 6. In total, 5 corals could be identified in the exemplary image section by combining the predictions using NMS. In comparison, the network on the original images found only 3 corals and the network on the enhanced images found only 4 corals. The duplicate annotations found on both images were correctly sorted out by NMS. Using the MAR0.5 values from Table 2, it can be seen that this method was able to find more corals for all three network architectures by combining the predictions on the original images with the predictions on the enhanced images.

[Figure 6: Predictions for (a) the original image, (b) the enhanced image and (c) the combined predictions]

The average precision per substrate for the ResNet-152 approach on the original images and on the enhanced images is given in Table 3. For most coral species the difference is less than 0.01, but the species "c_hard_coral_boulder", "c_hard_coral_mushroom" and "c_hard_coral_foliose" could be detected much better by the network with the enhanced images. In contrast, it performed significantly worse especially on the species "c_soft_coral_gorgonian". Because both networks had strengths and weaknesses for certain coral species, the combined network was able to benefit from the strengths of both.

Table 3: Average precision per substrate on validation data
| Class | AP original images | AP enhanced images | Difference | Difference (%) |
|---|---|---|---|---|
| c_soft_coral | 0.237 | 0.233 | -0.004 | -1.7% |
| c_hard_coral_boulder | 0.220 | 0.242 | +0.022 | +10% |
| c_sponge | 0.121 | 0.117 | -0.004 | -3.3% |
| c_hard_coral_branching | 0.208 | 0.199 | -0.009 | -4.3% |
| c_hard_coral_submassive | 0.197 | 0.197 | 0 | 0% |
| c_algae_macro_or_leaves | 0.049 | 0.057 | +0.008 | +16.3% |
| c_hard_coral_table | 0.318 | 0.316 | -0.002 | -0.6% |
| c_sponge_barrel | 0.354 | 0.326 | -0.028 | -7.9% |
| c_hard_coral_encrusting | 0.461 | 0.467 | +0.006 | +1.3% |
| c_hard_coral_mushroom | 0.251 | 0.316 | +0.065 | +25.9% |
| c_hard_coral_foliose | 0.103 | 0.184 | +0.081 | +78.6% |
| c_soft_coral_gorgonian | 0.254 | 0.142 | -0.112 | -44.1% |
| c_fire_coral_millepora | 0 | 0 | 0 | 0% |

===7. Conclusion and Perspective===
Overall, the results of the challenge are satisfying, and the improvements we made to the models had the desired effects. Some image quality issues, as described in Section 3, were mitigated. However, the quality of the annotation data should be addressed in future versions of the challenge. For instance, it is a bad starting point to have inaccurate bounding boxes containing in some cases a set of individual corals and in other cases grouping these objects into a single annotation. It appears to make much more sense to have more precise annotations, similar to the "coral reef image pixel-wise parsing" subtask. Furthermore, the dense coverage of the ground surface as well as the fluctuating image quality make it difficult to distinguish the different substrate types. However, a deeper network seems to be more capable of handling these difficulties. The image enhancements did not produce better results on their own, but led to the detection of different corals than found by the network trained on the original images. By combining the bounding boxes of both approaches and applying Non-maximum Suppression, the best overall MAR0.5 was obtained.

===References===
[1] B. Ionescu, H. Müller, R. Péteri, J. Rückert, A. Ben Abacha, A. G. S. de Herrera, C. M. Friedrich, L. Bloch, R. Brüngel, A. Idrissi-Yaghir, H. Schäfer, S. Kozlovski, Y. D. Cid, V. Kovalev, L.-D. Ştefan, M. G. Constantin, M. Dogariu, A. Popescu, J. Deshayes-Chossart, H. Schindler, J. Chamberlain, A. Campello, A. Clark, Overview of the ImageCLEF 2022: Multimedia retrieval in medical, social media and nature applications, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 13th International Conference of the CLEF Association (CLEF 2022), LNCS Lecture Notes in Computer Science, Springer, Bologna, Italy, 2022.
[2] J. Chamberlain, A. Garcia Seco de Herrera, A. Campello, A. Clark, ImageCLEFcoral task: Coral reef image annotation and localisation, in: Experimental IR Meets Multilinguality, Multimodality, and Interaction, Proceedings of the 13th International Conference of the CLEF Association (CLEF 2022), LNCS Lecture Notes in Computer Science, Springer, Bologna, Italy, 2022.
[3] O. Hoegh-Guldberg, The Impact of Climate Change on Coral Reef Ecosystems, Springer Netherlands, Dordrecht, 2011, pp. 391–403. URL: https://doi.org/10.1007/978-94-007-0114-4_22. doi:10.1007/978-94-007-0114-4_22.
[4] M. Spalding, A. Grenfell, New estimates of global and regional coral reef areas, Coral Reefs 16 (1997) 225–230.
[5] L. Soukup, Automatic coral reef annotation, localization and pixel-wise parsing using Mask R-CNN (2021).
[6] C. M. Caridade, A. R. Marçal, Automatic classification of coral images using color and textures, in: CLEF (Working Notes), 2019.
[7] L. Picek, A. Ríha, A. Zita, Coral reef annotation, localisation and pixel-wise classification using Mask R-CNN and bag of tricks, in: CLEF (Working Notes), 2020.
[8] A. Steffens, A. Campello, J. Ravenscroft, A. Clark, H. Hagras, Deep segmentation: using deep convolutional networks for coral reef pixel-wise parsing, in: CLEF (Working Notes), 2019.
[9] S. Jaisakthi, P. Mirunalini, C. Aravindan, Coral reef annotation and localization using Faster R-CNN, in: CLEF (Working Notes), 2019.
[10] K. Bogomasov, P. Grawe, S. Conrad, A two-staged approach for localization and classification of coral reef structures and compositions, in: CLEF (Working Notes), 2019.
[11] K. Bogomasov, P. Grawe, S. Conrad, Enhanced localization and classification of coral reef structures and compositions, in: CLEF (Working Notes), 2020.
[12] S. Ren, K. He, R. B. Girshick, J. Sun, Faster R-CNN: towards real-time object detection with region proposal networks, CoRR abs/1506.01497 (2015). URL: http://arxiv.org/abs/1506.01497. arXiv:1506.01497.
[13] Y. Wu, A. Kirillov, F. Massa, W.-Y. Lo, R. Girshick, Detectron2, https://github.com/facebookresearch/detectron2, 2019.
[14] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019, pp. 8024–8035. URL: http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf.
[15] T.-Y. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, S. Belongie, Feature pyramid networks for object detection, 2016. URL: https://arxiv.org/abs/1612.03144. doi:10.48550/ARXIV.1612.03144.
[16] T.-Y. Lin, M. Maire, S. Belongie, L. Bourdev, R. Girshick, J. Hays, P. Perona, D. Ramanan, C. L. Zitnick, P. Dollár, Microsoft COCO: Common objects in context, 2014. URL: https://arxiv.org/abs/1405.0312. doi:10.48550/ARXIV.1405.0312.
[17] K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, 2015. URL: https://arxiv.org/abs/1512.03385. doi:10.48550/ARXIV.1512.03385.
[18] N. B. Andersen, underwater-image-color-correction, https://github.com/nikolajbech/underwater-image-color-correction, 2020.

===A. Figures===
[Figure 7: Total training loss (a) and MAP0.5 of the checkpoints (b)]
[Figure 8: Total training loss for different batch sizes — (a) ResNet-50 and (b) ResNet-101, each with batch sizes 32, 64, 128, 256 and 512]