
Coral Reef Annotation and Localization using Faster R-CNN

S M Jaisakthi

jaisakthi.murugaiyan@vit.ac.in 1

P Mirunalini

Chandrabose Aravindan

aravindancg@ssn.edu.in 0

0 SSN College of Engineering, Kalavakkam, Chennai, India
1 Vellore Institute of Technology, Vellore, India

Coral reefs are among the most diverse and valuable ecosystems in the world, which is why they are also called the rainforests of the sea. They are important because they provide shelter and food for many marine species and act as a source of nitrogen and other essential nutrients for marine food chains. Recent studies show that coral reef ecosystems are severely threatened by pollution, sedimentation, unsustainable fishing practices, and climate change, so coral reefs must be protected and monitored to preserve marine ecosystems. To support such monitoring, a task was introduced in ImageCLEF 2019 to automatically identify and label different types of benthic substrate with bounding boxes in a given image. This paper presents a Convolutional Neural Network (CNN) based method to locate and detect different types of benthic substrate. We used the Faster R-CNN architecture to detect the substrates because it is fast and accurate at detecting objects.

Keywords: Coral Reef · Object Detection · Faster R-CNN · Convolutional Neural Network (CNN)

Coral reefs [1], [3] are large underwater structures composed of the skeletons of colonial marine invertebrates called corals. These colonies are groups of individual animals called polyps. The polyps secrete calcium carbonate, which forms the reef structure on which they live. Corals make significant contributions to the well-being of people, animals, and plants in marine and coastal environments. They protect coastal land from erosion caused by waves and storms. Coral reefs are not only important for tourism worldwide, but also serve as an important indicator of the health of our planet. In addition, they are an essential source of food and protein for millions of people throughout the world and provide medical benefits. Today, however, humans are the ones threatening reefs.

Around the world, roughly 50 percent of coral reefs have died in just the past few decades, and the Great Barrier Reef has suffered severe mass bleaching in recent years. Many organisations are working hard to protect coral reefs, and an automatic system that locates and identifies reef substrates in underwater images would support these conservation efforts. In the ImageCLEFcoral 2019 task, coral reef substrates in a given image are localised and annotated automatically; we address it with CNN-based object detection methods.

Object detection is the task of predicting the location of each object in an image together with its class label. It can be achieved with computer vision or deep learning techniques that combine localization with image classification. Traditional object detection methods rely on hand-crafted features such as block-wise histograms of oriented gradients. These features capture only low-level characteristics of objects and hence cannot discriminate well between objects of different classes. Deep learning methods instead build a hierarchical representation, from low-level to high-level features extracted by a neural network, which substantially improves detection accuracy.

In deep learning, object detection can be treated as a classification problem by classifying patches extracted from the image. However, running a CNN classifier on every patch generated by a sliding-window detector is slow and computationally expensive. R-CNN addresses this by using selective search to reduce the number of bounding boxes passed to the classifier. Selective search uses local cues such as texture, intensity, color and/or a measure of insideness to generate candidate object locations. The selected regions are warped to a fixed size and fed to a classifier, which outputs the probability that each region belongs to the background or to each object class. To locate and detect coral reef substrates in an input image, we used Faster R-CNN, a later and faster member of the R-CNN family, to achieve good accuracy.
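To make the cost argument concrete, the sketch below counts how many patches an exhaustive multi-scale sliding-window detector would have to classify. The image size, window sizes and stride are hypothetical values chosen only for illustration, not settings used in this work.

```python
def count_sliding_windows(img_w, img_h, window_sizes, stride):
    """Count the patches an exhaustive sliding-window detector would classify.

    window_sizes is a list of (w, h) window shapes; windows larger than the
    image are skipped. Every window slides with the same stride in x and y.
    """
    total = 0
    for w, h in window_sizes:
        if w > img_w or h > img_h:
            continue
        nx = (img_w - w) // stride + 1  # horizontal positions
        ny = (img_h - h) // stride + 1  # vertical positions
        total += nx * ny
    return total

# Hypothetical setting: a 1024x768 image, square windows at five scales, stride 8.
sizes = [(s, s) for s in (32, 64, 128, 256, 512)]
print(count_sliding_windows(1024, 768, sizes, stride=8))  # 39997
```

Even with a single aspect ratio this is roughly 40,000 CNN evaluations per image, compared with the roughly 2,000 region proposals per image that R-CNN takes from selective search.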

Proposed Methodology

Our method for substrate detection is based on the Faster R-CNN [8] architecture. Faster R-CNN generates region proposals with a Region Proposal Network (RPN), which is itself a CNN. The architecture consists of three components: convolutional layers that extract feature maps, the RPN that obtains region proposals with the help of anchor boxes, and a detection network that predicts object classes and bounding boxes. The proposed method consists of three stages: preprocessing, training and substrate detection. In this work we used five variants of Faster R-CNN models pre-trained for object detection on COCO: (1) Faster R-CNN with NASNet (with augmentation), (2) Faster R-CNN with NASNet (without augmentation), (3) Faster R-CNN with Inception V2 (with augmentation), (4) Faster R-CNN with Inception V2 (without augmentation) and (5) Faster R-CNN with ResNet-101 (with augmentation).
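Both the RPN and the detection network predict boxes as offsets relative to a reference box (an anchor for the RPN, a proposal for the detection head). The sketch below shows the centre/size parameterisation described in the Faster R-CNN paper [8]: encoding a ground-truth box against an anchor, and decoding predicted offsets back into a box. Boxes are assumed to be (x_min, y_min, x_max, y_max) in pixels and the function names are hypothetical; real implementations, including the TensorFlow Object Detection API, may also divide the offsets by fixed scale factors, which is omitted here.

```python
import math

def to_center(box):
    """Convert (x_min, y_min, x_max, y_max) to (cx, cy, w, h)."""
    x0, y0, x1, y1 = box
    return (x0 + x1) / 2.0, (y0 + y1) / 2.0, x1 - x0, y1 - y0

def encode(gt_box, anchor):
    """Regression targets (tx, ty, tw, th) of gt_box relative to anchor."""
    gx, gy, gw, gh = to_center(gt_box)
    ax, ay, aw, ah = to_center(anchor)
    return ((gx - ax) / aw, (gy - ay) / ah,
            math.log(gw / aw), math.log(gh / ah))

def decode(deltas, anchor):
    """Apply predicted offsets to an anchor and return a corner-format box."""
    tx, ty, tw, th = deltas
    ax, ay, aw, ah = to_center(anchor)
    cx, cy = tx * aw + ax, ty * ah + ay
    w, h = aw * math.exp(tw), ah * math.exp(th)
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)

anchor = (0.0, 0.0, 100.0, 100.0)
gt = (10.0, 20.0, 90.0, 140.0)
t = encode(gt, anchor)
print(decode(t, anchor))  # recovers gt up to floating-point rounding
```

Predicting log-scale width and height offsets keeps the decoded sizes positive and makes the targets roughly independent of the anchor's absolute size.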

Dataset

The images for this task were collected from coral reefs around the world as part of a coral reef monitoring project with the Marine Technology Research Unit at the University of Essex. They contain the following 13 types of substrate: Hard Coral - Branching, Hard Coral - Submassive, Hard Coral - Boulder, Hard Coral - Encrusting, Hard Coral - Table, Hard Coral - Foliose, Hard Coral - Mushroom, Soft Coral, Soft Coral - Gorgonian, Sponge, Sponge - Barrel, Fire Coral - Millepora and Algae - Macro or Leaves. The training set contains 240 images with 6670 substrate instances, each annotated with a ground-truth bounding box.
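The TensorFlow Object Detection API reads class names from a label map in protobuf text format, with ids starting at 1 because 0 is reserved for the background class. A minimal sketch of generating such a label map for the 13 substrate classes follows; the class strings come from the list above, but the id order and the helper name `label_map_pbtxt` are assumptions, not the files used for the submitted runs. It also prints the average annotation density implied by the dataset figures.

```python
SUBSTRATES = [
    "Hard Coral - Branching", "Hard Coral - Submassive", "Hard Coral - Boulder",
    "Hard Coral - Encrusting", "Hard Coral - Table", "Hard Coral - Foliose",
    "Hard Coral - Mushroom", "Soft Coral", "Soft Coral - Gorgonian", "Sponge",
    "Sponge - Barrel", "Fire Coral - Millepora", "Algae - Macro or Leaves",
]

def label_map_pbtxt(names):
    """Render class names as label-map text; ids start at 1 (0 = background)."""
    items = []
    for idx, name in enumerate(names, start=1):
        items.append("item {\n  id: %d\n  name: '%s'\n}" % (idx, name))
    return "\n".join(items) + "\n"

print(len(SUBSTRATES))          # 13 classes
print(round(6670 / 240, 1))     # 27.8 annotated substrates per image on average
print(label_map_pbtxt(SUBSTRATES[:2]))
```

With almost 28 boxes per image on average, the images are densely annotated, which matters for how many proposals the RPN is allowed to keep per image.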

Preprocessing

To reduce the computational complexity, we scaled down the input images. To build a robust object detector, we applied image augmentation [7]: we created additional training images by applying horizontal and vertical flips, rotating by 90 degrees, and randomly adjusting the contrast and brightness of the images.
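Flips and 90-degree rotations move the substrates, so the ground-truth boxes must be transformed together with the pixels, whereas brightness and contrast changes leave the boxes untouched. The sketch below shows the box arithmetic for an image of width W and height H, with boxes as (x_min, y_min, x_max, y_max) in pixels. The clockwise rotation direction, the jitter ranges and all helper names are assumptions for illustration, and pixels are modelled as a flat list of grayscale values to keep the example dependency-free.

```python
import random

def hflip_box(box, W):
    """Mirror a box left-right in an image of width W."""
    x0, y0, x1, y1 = box
    return (W - x1, y0, W - x0, y1)

def vflip_box(box, H):
    """Mirror a box top-bottom in an image of height H."""
    x0, y0, x1, y1 = box
    return (x0, H - y1, x1, H - y0)

def rot90cw_box(box, H):
    """Rotate a box with the image by 90 degrees clockwise.

    A pixel (x, y) moves to (H - y, x); the rotated image is H wide and W tall.
    """
    x0, y0, x1, y1 = box
    return (H - y1, x0, H - y0, x1)

def jitter_pixels(pixels, max_delta=32, contrast_range=(0.8, 1.2), rng=random):
    """Random contrast (scaling around the mean), then a brightness shift,
    clipped to [0, 255]. Bounding boxes are unaffected by this change."""
    mean = sum(pixels) / len(pixels)
    c = rng.uniform(*contrast_range)
    b = rng.uniform(-max_delta, max_delta)
    return [min(255.0, max(0.0, (p - mean) * c + mean + b)) for p in pixels]

W, H = 400, 300
box = (50, 60, 150, 120)
print(hflip_box(box, W))    # (250, 60, 350, 120)
print(vflip_box(box, H))    # (50, 180, 150, 240)
print(rot90cw_box(box, H))  # (180, 50, 240, 150)
```

Each geometric transform turns one annotated image into a new, equally valid training example, which is how augmentation enlarges the small 240-image training set.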

Training

We used five variants of the Faster R-CNN architecture provided with the TensorFlow Object Detection API [4]. The models were pre-trained for general object detection on the COCO dataset [6], which contains about 300k images of 80 object categories such as animals, furniture and vehicles. To make the pre-trained models learn the characteristics of benthic substrates, we fine-tuned them on the dataset provided by the ImageCLEFcoral 2019 task [2], [5].

Coral Reef Image Annotation and Localisation

To localize the substrates, we trained the models on the dataset provided by the organizers, using the hyper-parameters recommended in the TensorFlow Object Detection API. The models used in this task are described below.

Faster R-CNN with NASNet. In this model we trained Faster R-CNN with NASNet as the backbone. To study its performance we conducted two experiments, one with image augmentation and one without. NASNet extracts the features in the first stage, with an L2 regularizer. Since NASNet requires a very large amount of memory, we ran into resource allocation problems during training, so we downscaled the input images to 300 × 300 and trained the model for 120,000 training steps.
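Downscaling the images to 300 × 300 also rescales every ground-truth box, independently in x and y when the aspect ratio changes. A small sketch of that arithmetic, assuming a plain (non-aspect-preserving) resize; the source image size and box in the example are hypothetical.

```python
def resize_box(box, src_size, dst_size=(300, 300)):
    """Scale a pixel box (x_min, y_min, x_max, y_max) from src_size=(W, H)
    to dst_size=(W', H') using independent x and y scale factors."""
    sx = dst_size[0] / src_size[0]
    sy = dst_size[1] / src_size[1]
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)

# Hypothetical 4000x3000 photo containing a 120x80-pixel substrate:
# after resizing it covers only about 9x8 pixels.
print(resize_box((1000, 600, 1120, 680), (4000, 3000)))
```

Such a drastic reduction leaves small substrates with very few pixels for the feature extractor, which is one reason to revisit the input size in further experiments.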

Faster R-CNN with Inception V2. This model extracts features from the input images using Inception ResNet v2 in the first stage. To reduce the computational complexity, the input images are resized to 600 × 1024. The model is trained for 100,000 training steps with an L2 regularizer and a truncated normal initializer in the first stage. Anchors are generated at 4 scales with 3 different aspect ratios. The box predictor is trained with an L2 regularizer and a variance scaling initializer. The performance of the model is evaluated both with and without image augmentation.
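With 4 scales and 3 aspect ratios, the RPN places 12 anchors at every position of the feature map. The sketch below generates them in the style of a grid anchor generator. The base size 256, the stride 16, the particular scale and ratio values, and centring anchors on cell centres are assumptions based on commonly used Faster R-CNN defaults, not values reported in this paper.

```python
import math

def generate_anchors(feat_h, feat_w, stride=16, base=256,
                     scales=(0.25, 0.5, 1.0, 2.0), ratios=(0.5, 1.0, 2.0)):
    """Return (x_min, y_min, x_max, y_max) anchors for every feature-map cell.

    Aspect ratio is width / height; every anchor of a given scale has area
    (scale * base) ** 2. Anchors are centred on the cell centres.
    """
    anchors = []
    for i in range(feat_h):
        for j in range(feat_w):
            cx, cy = (j + 0.5) * stride, (i + 0.5) * stride
            for s in scales:
                for r in ratios:
                    w = s * base * math.sqrt(r)
                    h = s * base / math.sqrt(r)
                    anchors.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return anchors

# A 600x1024 input with stride 16 gives roughly a 38x64 feature map.
anchors = generate_anchors(38, 64)
print(len(anchors))  # 38 * 64 * 12 = 29184
```

The RPN scores each of these anchors as object or background and regresses box offsets for it, so the scale and ratio choices determine which substrate sizes and shapes are easy to propose.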

Faster R-CNN with ResNet-101. In this architecture we used ResNet-101 as the first-stage feature extractor of Faster R-CNN, together with image augmentation. The model was trained on the coral reef dataset for 150,000 training steps.

Results and Discussion

The proposed methods were evaluated using intersection over union (IoU): the area of intersection between a predicted bounding box and the ground-truth bounding box, divided by the area of their union. The final results were computed both as the average performance over all images and all concepts, and as the per-concept performance over all images. The following table shows the results of the methods presented in this paper.
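For axis-aligned bounding boxes, the IoU definition reduces to a few lines. A minimal sketch, assuming corner-format (x_min, y_min, x_max, y_max) boxes; it is an illustration of the measure, not the organisers' evaluation code.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)  # 0 if boxes do not overlap
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50 / 150, about 0.333
```

Two boxes that are each shifted by half their width overlap on only a third of their union, so an IoU ≥ 0.5 threshold already demands fairly tight localisation.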

MAP_50 is the localised mean average precision (MAP) of each submitted run, where a detection counts as correct if its IoU with the ground truth is at least 0.5; R_50 is the localised mean recall of each run under the same IoU ≥ 0.5 criterion; and MAP_0 is the image annotation average of each run, where a concept counts as a success if it is simply detected in the image, without any localisation. The table above shows that all methods trained with image augmentation achieved a higher mean average precision than the methods trained without augmentation. However, the mean recall of Faster R-CNN with NASNet is slightly higher without augmentation than with augmentation. This may be because we downscaled the input images too much, so further experiments with larger input images are needed. Among the three architectures, Faster R-CNN with NASNet produced the strongest results in terms of both precision and recall.
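A sketch of how an AP and recall at IoU ≥ 0.5 could be computed for one substrate class; MAP_50 and R_50 would then be means over classes. It follows the common PASCAL VOC-style convention (greedy matching by descending score, each ground-truth box matched at most once, all-point interpolated precision); the organisers' exact protocol may differ, and the function name and input format are hypothetical.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def ap_recall_at_50(detections, ground_truth, thr=0.5):
    """AP and recall for one class at IoU >= thr.

    detections: list of (image_id, score, box).
    ground_truth: dict mapping image_id -> list of boxes.
    A detection is a true positive only if its best-overlapping ground-truth
    box reaches thr and has not been matched yet; duplicates are false positives.
    """
    n_gt = sum(len(v) for v in ground_truth.values())
    used = {k: [False] * len(v) for k, v in ground_truth.items()}
    flags = []
    for img, _score, box in sorted(detections, key=lambda d: -d[1]):
        best, best_j = 0.0, -1
        for j, g in enumerate(ground_truth.get(img, [])):
            o = iou(box, g)
            if o > best:
                best, best_j = o, j
        if best_j >= 0 and best >= thr and not used[img][best_j]:
            used[img][best_j] = True
            flags.append(1)
        else:
            flags.append(0)
    tp = fp = 0
    precisions, recalls = [], []
    for f in flags:
        tp += f
        fp += 1 - f
        precisions.append(tp / (tp + fp))
        recalls.append(tp / n_gt if n_gt else 0.0)
    # Interpolate: precision at each point is the best precision at equal or higher recall.
    for k in range(len(precisions) - 2, -1, -1):
        precisions[k] = max(precisions[k], precisions[k + 1])
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_r) * p  # area under the interpolated PR curve
        prev_r = r
    return ap, (recalls[-1] if recalls else 0.0)

gt = {"img1": [(0, 0, 10, 10), (20, 20, 30, 30)]}
dets = [("img1", 0.9, (0, 0, 10, 10)),
        ("img1", 0.8, (1, 1, 10, 10)),   # duplicate of the first box: false positive
        ("img1", 0.7, (20, 20, 30, 30))]
print(ap_recall_at_50(dets, gt))  # AP about 0.833, recall 1.0
```

In the example every substrate is eventually found (recall 1.0), but the duplicate detection ranked above a true positive lowers the precision and hence the AP.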

In terms of per-substrate accuracy, Faster R-CNN with NASNet achieved good accuracy compared with the methods submitted by the other participants. Table 2 shows the per-substrate accuracy of the runs submitted by the other teams.

Table 2 shows that our method achieved higher accuracy than the other teams in identifying many substrate types.

Acknowledgements

The authors would like to thank the management of VIT University, Vellore, India and SSN College of Engineering, Chennai, India for funding the respective research labs where this research was carried out. S M Jaisakthi would like to thank NVIDIA for providing a GPU grant in support of this research, and P Mirunalini and Chandrabose Aravindan would like to thank the management for providing the GPU machine on which this research was carried out.

References

1. Introduction to Coral Reefs. http://www.deepbluediscoveries.com/introduction-to-coral-reefs

2. Chamberlain, J., Campello, A., Wright, J.P., Clift, L.G., Clark, A., García Seco de Herrera, A.: Overview of ImageCLEFcoral 2019 task. In: CLEF2019 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org (2019)

3. Gray, C.: Coral Reefs: An introduction. https://www.edgeofexistence.org/blog/coral-reefs-an-introduction/ (2012)

4. Huang, J., Rathod, V., Sun, C., Zhu, M., Korattikara, A., Fathi, A., Fischer, I., Wojna, Z., Song, Y., Guadarrama, S., Murphy, K.: Speed/accuracy trade-offs for modern convolutional object detectors. CoRR abs/1611.10012 (2016), http://arxiv.org/abs/1611.10012

5. Ionescu, B., Müller, H., Péteri, R., Cid, Y.D., Liauchuk, V., Kovalev, V., Klimuk, D., Tarasau, A., Ben Abacha, A., Hasan, S.A., Datla, V., Liu, J., Demner-Fushman, D., Dang-Nguyen, D.T., Piras, L., Riegler, M., Tran, M.T., Lux, M., Gurrin, C., Pelka, O., Friedrich, C.M., de Herrera, A.G.S., Garcia, N., Kavallieratou, E., del Blanco, C.R., Rodríguez, C.C., Vasillopoulos, N., Karampidis, K., Chamberlain, J., Clark, A., Campello, A.: ImageCLEF 2019: Multimedia retrieval in medicine, lifelogging, security and nature. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 10th International Conference of the CLEF Association (CLEF 2019), LNCS Lecture Notes in Computer Science, Springer, Lugano, Switzerland (September 9-12 2019)

6. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft COCO: Common objects in context. In: European Conference on Computer Vision. pp. 740–755. Springer (2014)

7. Mikolajczyk, A., Grochowski, M.: Data augmentation for improving deep learning in image classification problem. In: 2018 International Interdisciplinary PhD Workshop (IIPhDW). pp. 117–122 (May 2018). https://doi.org/10.1109/IIPHDW.2018.8388338

8. Ren, S., He, K., Girshick, R.B., Sun, J.: Faster R-CNN: Towards real-time object detection with region proposal networks. CoRR abs/1506.01497 (2015), http://arxiv.org/abs/1506.01497