<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Coral Reef Annotation and Localization using Faster R-CNN</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>S M Jaisakthi</string-name>
          <email>jaisakthi.murugaiyan@vit.ac.in</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>P Mirunalini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chandrabose Aravindan</string-name>
          <email>aravindancg@ssn.edu.in</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>SSN College of Engineering, Kalavakkam</institution>
          ,
          <addr-line>Chennai</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Vellore Institute of Technology</institution>
          ,
          <addr-line>Vellore</addr-line>
          ,
          <country country="IN">India</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Coral reefs are among the most diverse and valuable ecosystems in the world; because of this diversity they are also called the rainforests of the sea. Coral reefs are important because they provide shelter and food to many marine species and act as a source of nitrogen and other essential nutrients for marine food chains. Recent studies show that coral reef ecosystems are severely threatened by pollution, sedimentation, unviable fishing practices, and climate change. Coral reefs should therefore be protected and monitored to safeguard the marine ecosystem. To support such monitoring, a task was introduced in ImageCLEF 2019 to automatically identify and label different types of benthic substrate with bounding boxes in a given image. This paper presents a Convolutional Neural Network (CNN) based method to locate and detect different types of benthic substrate. We use the Faster R-CNN architecture to detect the substrate because this method is fast and accurate in detecting objects.</p>
      </abstract>
      <kwd-group>
        <kwd>Coral Reef</kwd>
        <kwd>Object Detection</kwd>
        <kwd>Convolutional Neural Network (CNN)</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Coral reefs [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] are large underwater structures composed of the
skeletons of colonial marine invertebrates called corals. These colonies are groups
of individual animals called polyps. The reef structures on which the polyps live
are formed from their secretions of calcium carbonate. Coral reefs make significant
contributions to the well-being of people, animals, and plants in marine and coastal
environments. They protect coastal land from erosion caused by waves and storms.
Coral reefs are not only important for worldwide tourism; they also serve as an
important indicator of the health of our planet. In addition, they are an essential
source of food and protein for millions of people throughout the world and also
provide medical benefits. Yet today, humans are the ones threatening reefs.
      </p>
      <p>Around the world, roughly 50 percent of coral reefs have died in just the past
few decades. The Great Barrier Reef has itself suffered severe bleaching in recent
years. Coral reefs should be protected, and many organisations are working hard to
protect them. An automatic system to locate and detect coral reefs in the sea
would therefore help to conserve them. In the ImageCLEFcoral 2019 task, coral
reefs in a given image are localised and annotated automatically using CNN-based
object detection methods.</p>
      <p>Predicting the location of an object along with its class label is called
object detection. This can be achieved with deep learning or computer vision
techniques by localizing the objects and classifying them in each image.
Traditional object detection methods rely on block-wise orientation histogram
features. These methods use low-level characteristics of the objects and hence
cannot discriminate well between objects of different labels. Deep learning based
methods, in contrast, construct a hierarchical representation from low- to
high-level features extracted by neural networks, which improves detection
accuracy.</p>
      <p>In deep learning, object detection can be treated as a classification
problem by classifying image patches extracted from the images. In general, CNNs
used for classification in this way were too slow and computationally expensive
because they had to run on the many patches generated by a sliding-window
detector. This problem can be addressed with R-CNN, which uses selective search
to reduce the number of bounding boxes passed to the classifier. Selective search
uses local cues such as texture, intensity, color and/or a measure of insideness
to generate candidate object locations. The selected object regions are warped to
a fixed size and fed to a classifier, which gives the probability of each region
belonging to the background or to one of the classes. To locate and detect coral
reefs in an input image, we use the Faster R-CNN technique to achieve good
accuracy.</p>
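      <p>To give a rough sense of why exhaustive sliding-window classification is costly, the sketch below counts the patches produced by a single-scale sliding window. The image size, window size and stride are illustrative assumptions, not values taken from this work.</p>
      <preformat><![CDATA[
```python
def count_sliding_windows(image_h, image_w, window, stride):
    """Count the square window positions that fit fully inside the image."""
    if window > image_h or window > image_w:
        return 0
    rows = (image_h - window) // stride + 1
    cols = (image_w - window) // stride + 1
    return rows * cols


# Hypothetical setting: a 600 x 1024 image scanned with a 64-pixel window.
dense = count_sliding_windows(600, 1024, window=64, stride=8)
print(dense)
```
]]></preformat>
      <p>Every additional window size or aspect ratio multiplies this count, which is the cost that region proposals are meant to avoid.</p>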
    </sec>
    <sec id="sec-2">
      <title>Proposed Methodology</title>
      <p>
        Our method for substrate detection is based on the Faster R-CNN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
architecture. Faster R-CNN uses a CNN-based Region Proposal Network (RPN) to
generate region proposals. The architecture consists of three components:
convolutional layers to extract feature maps, an RPN to obtain region proposals
with the help of anchor boxes, and a detection network to predict object classes
and bounding boxes. The proposed method consists of three stages: preprocessing,
training and substrate detection. In this work we use five variants of
COCO-pretrained object detection models: (1) Faster R-CNN with NASNet
(with augmentation), (2) Faster R-CNN with NASNet (without augmentation),
(3) Faster R-CNN with Inception V2 (with augmentation), (4) Faster R-CNN with
Inception V2 (without augmentation) and (5) Faster R-CNN with ResNet-101 (with
augmentation).
      </p>
      <sec id="sec-2-1">
        <title>Dataset</title>
        <p>The dataset for this task is taken from coral reefs around the world as part of a
coral reef monitoring project with the Marine Technology Research Unit at the
University of Essex. The images contain the following 13 types of substrate:
Hard Coral – Branching, Hard Coral – Submassive, Hard Coral – Boulder, Hard
Coral – Encrusting, Hard Coral – Table, Hard Coral – Foliose, Hard Coral –
Mushroom, Soft Coral, Soft Coral – Gorgonian, Sponge, Sponge – Barrel, Fire
Coral – Millepora and Algae – Macro or Leaves. The data set contains 240
training images with 6670 annotated substrates, with ground truth annotations
given as bounding boxes.</p>
      </sec>
      <sec id="sec-2-2">
        <title>Preprocessing</title>
        <p>
          To reduce the computational complexity we have scaled down the input images.
To build a strong object detector we have applied image augmentation [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] to
improve the accuracy. We have created more training images using augmentation
by applying horizontal and vertical flips, rotating by 90 degrees and randomly
adjusting the contrast and brightness of the images.
        </p>
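        <p>The augmentations listed above also require the bounding boxes to be transformed together with the pixels. The following is a minimal NumPy sketch of that bookkeeping, not the code used in this work; the box layout [xmin, ymin, xmax, ymax] in pixel edges, the function names and the brightness and contrast ranges are all assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
import numpy as np


def hflip(image, boxes):
    """Flip left-right; boxes are [xmin, ymin, xmax, ymax] pixel edges."""
    w = image.shape[1]
    out = boxes.copy()
    out[:, 0] = w - boxes[:, 2]
    out[:, 2] = w - boxes[:, 0]
    return image[:, ::-1], out


def vflip(image, boxes):
    """Flip top-bottom and mirror the y coordinates of the boxes."""
    h = image.shape[0]
    out = boxes.copy()
    out[:, 1] = h - boxes[:, 3]
    out[:, 3] = h - boxes[:, 1]
    return image[::-1, :], out


def rot90_ccw(image, boxes):
    """Rotate by 90 degrees counter-clockwise (the np.rot90 default)."""
    w = image.shape[1]
    out = np.empty_like(boxes)
    # An edge at (x, y) moves to (y, w - x) in the rotated image.
    out[:, 0] = boxes[:, 1]
    out[:, 1] = w - boxes[:, 2]
    out[:, 2] = boxes[:, 3]
    out[:, 3] = w - boxes[:, 0]
    return np.rot90(image), out


def jitter_brightness_contrast(image, rng, max_delta=0.2, contrast=(0.8, 1.2)):
    """Randomly scale contrast around the mean and shift brightness.

    The ranges are assumed values; boxes are unaffected by this transform.
    """
    img = image.astype(np.float32) / 255.0
    factor = rng.uniform(contrast[0], contrast[1])
    delta = rng.uniform(-max_delta, max_delta)
    mean = img.mean()
    img = (img - mean) * factor + mean + delta
    return (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)
```
]]></preformat>
        <p>Each geometric function returns the transformed image together with the matching boxes, so the augmented pair can be written out as an extra training example.</p>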
      </sec>
      <sec id="sec-2-3">
        <title>Training</title>
        <p>
          We have used five variants of the Faster R-CNN architecture that come with the
TensorFlow Object Detection API [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. The architectures were pre-trained using
the COCO dataset [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] that contains 300k images from 80 categories of animals,
furniture, vehicles, etc. for general object detection. To make the
pretrained models learn the characteristics of benthic substrates, we have
fine-tuned them using the data set provided by the ImageCLEFcoral 2019 task [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>Coral Reef Image Annotation and Localisation</title>
        <p>To localize the coral reef we have trained the models on the dataset provided
by the organizers, using the hyper-parameters recommended in the TensorFlow
Object Detection API. The different models used in this task are discussed
below.</p>
        <p>Faster R-CNN with NASNet. In this model we have trained Faster R-CNN
with NASNet as the backbone. To study the performance of this model we have
conducted two experiments, one with image augmentation and the other
without. This architecture uses NASNet with an L2 regularizer to extract the
features in the first stage. Since NASNet requires a very large amount of memory,
we ran into resource allocation problems during the training phase. We therefore
downscaled the input images to 300 × 300 and trained the architecture
on the dataset for 120000 epochs.</p>
        <p>Faster R-CNN with Inception V2. The Faster R-CNN with Inception V2 model
extracts the features from the input images using Inception ResNet V2 during
the first stage. To reduce the computational complexity the input images are
reduced to a size of 600 × 1024. The model is trained for 100000 epochs with an
L2 regularizer and a truncated normal initializer in the first stage. Anchors are
generated for 4 scales with 3 different aspect ratios. For the box predictor, the
model is trained with an L2 regularizer and a variance scaling initializer. The
model performance is evaluated and analysed both with and without image
augmentation.</p>
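        <p>As an illustration of how 4 scales and 3 aspect ratios yield 12 anchor shapes per feature-map location, the sketch below enumerates them. The base size and the concrete scale and ratio values are assumptions chosen for the example; the text above states only how many of each were used.</p>
        <preformat><![CDATA[
```python
import numpy as np


def make_anchors(base_size, scales, aspect_ratios):
    """Return anchors centred at the origin as [xmin, ymin, xmax, ymax].

    The aspect ratio is width / height, and every anchor of a given scale
    has the same area (base_size * scale) ** 2.
    """
    anchors = []
    for scale in scales:
        side = base_size * scale
        for ratio in aspect_ratios:
            w = side * np.sqrt(ratio)
            h = side / np.sqrt(ratio)
            anchors.append([-w / 2, -h / 2, w / 2, h / 2])
    return np.array(anchors)


# Assumed values, for illustration only.
anchors = make_anchors(
    base_size=256,
    scales=[0.25, 0.5, 1.0, 2.0],
    aspect_ratios=[0.5, 1.0, 2.0],
)
print(anchors.shape)
```
]]></preformat>
        <p>In a region proposal network these shapes are shifted to every position of the feature map, and each shifted anchor is scored and refined.</p>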
        <p>Faster R-CNN with ResNet-101. In this architecture we have used Faster
R-CNN with ResNet-101 to extract features in the first stage, together with
image augmentation. The model is trained on the coral reef dataset for
150000 epochs.</p>
      </sec>
      <sec id="sec-2-5">
        <title>Results and Discussion</title>
        <p>The proposed methods were evaluated using intersection over union (IoU): the
area of intersection between the foreground in the output segmentation and the
foreground in the ground-truth segmentation, divided by the area of their union.
The final results were calculated as the average performance over all images of
all concepts, and also as the per-concept performance over all images. The following
table shows the results of the proposed methods presented in this paper.</p>
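        <p>For axis-aligned bounding boxes the IoU described above reduces to a few lines of arithmetic. The sketch below assumes boxes given as (xmin, ymin, xmax, ymax); it illustrates the measure and is not the official evaluation code of the task.</p>
        <preformat><![CDATA[
```python
def box_iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))
```
]]></preformat>
        <p>Two 2 × 2 boxes that overlap in a unit square share 1 of 7 units of combined area, so the example evaluates to 1/7.</p>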
        <p>MAP 50 is the localised mean average precision (MAP) of each submitted
method, where a detection counts as correct if its IoU with the ground truth is
&gt;= 0.5; R 50 is the localised mean recall of each submitted method under the
same IoU &gt;= 0.5 criterion; and MAP 0 is the image annotation average of each
method, where a concept counts as a success if it is simply detected in the
image, without any localisation. From the above table it is clear that all the
methods trained with image augmentation produced a better mean average
precision than the methods trained without augmentation. However, mean average
recall is slightly higher for Faster R-CNN with NASNet without augmentation than
for NASNet with augmentation. This variation may be because we downscaled the
input images too much, so further experiments with larger input images are
needed. Among the three architectures, Faster R-CNN with NASNet produced good
results in terms of both precision and recall.</p>
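        <p>The role of the IoU &gt;= 0.5 criterion can be sketched for one image and one substrate class as follows. This is a simplified illustration with greedy matching in order of confidence; it computes a single precision and recall pair rather than the averaged MAP and mean recall reported by the task, and the function names and data layout are assumptions.</p>
        <preformat><![CDATA[
```python
def iou(a, b):
    """IoU of two (xmin, ymin, xmax, ymax) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def precision_recall_at_iou(predictions, ground_truth, threshold=0.5):
    """predictions: list of (score, box); ground_truth: list of boxes."""
    matched = set()
    true_pos = 0
    # Highest-confidence predictions claim ground-truth boxes first.
    for _, box in sorted(predictions, key=lambda p: p[0], reverse=True):
        best_iou, best_idx = 0.0, None
        for idx, gt in enumerate(ground_truth):
            if idx in matched:
                continue
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_idx = overlap, idx
        if best_idx is not None and best_iou >= threshold:
            matched.add(best_idx)
            true_pos += 1
    precision = true_pos / len(predictions) if predictions else 0.0
    recall = true_pos / len(ground_truth) if ground_truth else 0.0
    return precision, recall
```
]]></preformat>
        <p>Each ground-truth box can be matched at most once, so duplicate detections of the same substrate lower precision without raising recall.</p>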
        <p>In terms of per-substrate accuracy, Faster R-CNN with NASNet produced
good accuracy compared with the methods presented by the other
participants. Table 2 shows the per-substrate accuracy reported by the other
teams.</p>
        <p>From Table 2 it is evident that our method produced better accuracy in
identifying many substrate types.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgements</title>
      <p>The authors would like to thank the management of VIT University, Vellore,
India and SSN College of Engineering, Chennai, India for funding the respective
research labs where the research work is being carried out. One of the authors, S
M Jaisakthi, would like to thank NVIDIA for providing a GPU grant in support
of this research work; similarly, P Mirunalini and Chandrabose Aravindan
would like to thank the management for providing the GPU machine on which this
research is carried out.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. Introduction to Coral Reefs. http://www.deepbluediscoveries.com/introduction-to-coral-reefs</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chamberlain</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Campello</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wright</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clift</surname>
            ,
            <given-names>L.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clark</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , García Seco de Herrera, A.:
          <article-title>Overview of ImageCLEFcoral 2019 task</article-title>
          .
          <source>In: CLEF2019 Working Notes. CEUR Workshop Proceedings</source>
          , CEUR-WS.org (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Gray</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Coral Reefs: An introduction</article-title>
          . https://www.edgeofexistence.org/blog/coral-reefs-an-introduction/ (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rathod</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Korattikara</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fathi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischer</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wojna</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guadarrama</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Speed/accuracy trade-offs for modern convolutional object detectors</article-title>
          .
          <source>CoRR abs/1611.10012</source>
          (
          <year>2016</year>
          ), http://arxiv.org/abs/1611.10012
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ionescu</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Muller, H.,
          <string-name>
            <surname>Peteri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cid</surname>
            ,
            <given-names>Y.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liauchuk</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kovalev</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klimuk</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tarasau</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ben</surname>
            <given-names>Abacha</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Hasan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.A.</given-names>
            ,
            <surname>Datla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Demner-Fushman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Dang-Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.T.</given-names>
            ,
            <surname>Piras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Riegler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.T.</given-names>
            ,
            <surname>Lux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Gurrin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Pelka</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            ,
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.M.</given-names>
            ,
            <surname>de Herrera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.G.S.</given-names>
            ,
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Kavallieratou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>del Blanco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.R.</given-names>
            , Rodríguez, C.C.,
            <surname>Vasillopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            ,
            <surname>Karampidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            ,
            <surname>Chamberlain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            ,
            <surname>Clark</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Campello</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature</article-title>
          . In:
          <article-title>Experimental IR Meets Multilinguality, Multimodality, and Interaction</article-title>
          .
          <source>Proceedings of the 10th International Conference of the CLEF Association (CLEF</source>
          <year>2019</year>
          ),
          <source>LNCS Lecture Notes in Computer Science</source>
          , Springer, Lugano,
          <source>Switzerland (September 9-12</source>
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maire</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Belongie</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hays</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perona</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ramanan</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zitnick</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Microsoft COCO: Common objects in context</article-title>
          .
          <source>In: European conference on computer vision</source>
          . pp.
          <fpage>740</fpage>
          –
          <lpage>755</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mikolajczyk</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grochowski</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Data augmentation for improving deep learning in image classification problem</article-title>
          .
          <source>In: 2018 International Interdisciplinary PhD Workshop</source>
          (IIPhDW). pp.
          <fpage>117</fpage>
          –
          <lpage>122</lpage>
          (May
          <year>2018</year>
          ). https://doi.org/10.1109/IIPHDW.2018.8388338
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Faster R-CNN: towards real-time object detection with region proposal networks</article-title>
          .
          <source>CoRR abs/1506.01497</source>
          (
          <year>2015</year>
          ), http://arxiv.org/abs/1506.01497
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>