Automatic Coral Reef Annotation, Localization and Pixel-wise Parsing Using Mask R-CNN

Lukáš Soukup, lsoukup@kky.zcu.cz
Faculty of Applied Sciences, Department of Cybernetics, University of West Bohemia

Univerzitní 8, 301 00 Plzeň, Czech Republic

CEUR Workshop Proceedings, ISSN 1613-0073

Keywords: Object detection, Semantic segmentation, Neural networks, Deep learning, Machine learning, Coral reefs detection, Coral reefs segmentation

This paper describes the methods used for annotation, localization and pixel-wise parsing of coral reefs in underwater images. The proposed system achieved competitive results in ImageCLEFcoral 2021, the third edition of the challenge. Specifically, in the annotation and localization task it achieved a mean average precision with Intersection over Union (IoU) greater than 0.5 (mAP@0.5) of 0.121, and in the pixel-wise parsing task an mAP@0.5 of 0.075.

Introduction

The ImageCLEFcoral 2021 challenge [1] is motivated by the impact of recent climate change on coral reefs and the ecosystems they support. Coral reefs are in danger of being lost within the next 30 years, which would lead not only to the extinction of many marine species but also to a humanitarian crisis on a global scale for people who rely on reef services. Monitoring changes in the reefs can help prioritize conservation efforts.

The goal of the challenge is to create an automatic system for monitoring coral reefs using the provided dataset. The challenge consists of two tasks: (1) annotation and localization, and (2) pixel-wise parsing.

The proposed solution uses the state-of-the-art object detection model Mask R-CNN [2], which provides both object detections (bounding boxes with class labels) and per-instance segmentation masks.

The provided dataset [1] consists of 1,052 training images and 485 test images taken from coral reefs around the world as part of a coral reef monitoring project with the Marine Technology Research Unit at the University of Essex. Each coral object in the training set was annotated by an expert with a bounding box, a segmentation polygon and a class representing one of 13 substrate types. In total, 21,749 objects were annotated in the dataset.

The dataset is challenging from several perspectives. Each image contains many different coral objects; on average there are over 20 corals in a single image. The dataset is highly unbalanced: 33.5% of all objects belong to class c_soft_coral and only 0.12% to class c_fire_coral_millepora. Additionally, image quality is very inconsistent: some images are heavily blurred and there are noticeable color variations. Fig. 1 shows an example from the dataset with the annotations for both tasks drawn in.
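The class shares quoted above are plain frequency counts over the object annotations. The minimal sketch below assumes the labels are available as a flat list of class-name strings (how they are loaded is not shown, and the helper names are hypothetical); it computes per-class percentages and the mean number of objects per image. The toy labels, including the placeholder class "other_class", are invented for illustration and do not reproduce the dataset statistics.

```python
from collections import Counter


def class_shares(labels):
    """Percentage of annotated objects per class, most frequent class first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {cls: 100.0 * n / total for cls, n in counts.most_common()}


def mean_objects_per_image(objects_per_image):
    """Average number of annotated objects per image (one count per image)."""
    return sum(objects_per_image) / len(objects_per_image)


# Toy labels for illustration only (hypothetical, NOT the challenge statistics).
toy_labels = ["c_soft_coral"] * 6 + ["other_class"] * 3 + ["c_fire_coral_millepora"]
shares = class_shares(toy_labels)
```

On the real annotations, the same two functions would give the class percentages and the "over 20 corals per image" figure mentioned in the text.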

Dataset split

To optimize the network parameters, the provided training dataset [1] was divided into a train set and a validation set. For the validation results to be meaningful, the validation set must have the same data distribution as the train set. To preserve the distribution, the split was made with respect to the location where each image was taken: from each location, 80% of the images were added to the train set and the rest to the validation set.
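A location-stratified split of this kind can be sketched in a few lines. This is an illustration under assumptions, not the author's code: the mapping `location_of` from image id to location is hypothetical (the text does not say how locations were recorded), the 80% ratio comes from the text, and rounding the per-location train count with Python's `round` is an arbitrary choice.

```python
import random
from collections import defaultdict


def split_by_location(image_ids, location_of, train_fraction=0.8, seed=0):
    """Split images into train/validation lists separately within each location.

    `location_of` (hypothetical input) maps an image id to the location where
    the image was taken, so every location keeps roughly the same 80/20 ratio.
    """
    by_location = defaultdict(list)
    for image_id in image_ids:
        by_location[location_of[image_id]].append(image_id)

    rng = random.Random(seed)  # fixed seed makes the split reproducible
    train, val = [], []
    for location in sorted(by_location):
        ids = sorted(by_location[location])
        rng.shuffle(ids)
        n_train = round(train_fraction * len(ids))
        train.extend(ids[:n_train])
        val.extend(ids[n_train:])
    return train, val
```

With 10 images from one location and 5 from another, this puts 8 + 4 images in the train set and 2 + 1 in the validation set.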

Data Preprocessing

The Mask R-CNN [2] implementation used in the experiments expects the target data in a specific format. For the annotation and localization subtask, the expected targets are bounding boxes specified by 4 numbers: the coordinates of the upper left corner and the width and height of the box. The provided bounding-box annotations for this subtask were given by the coordinates of the upper left and bottom right corners. Thus, the preprocessing of the target data was simple: the width and height are the differences between the corresponding corner coordinates.
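The conversion amounts to subtracting corner coordinates; a minimal sketch follows, with hypothetical function names. The required box format depends on the implementation: torchvision's detection models, for example, document their box targets in the corner format [x1, y1, x2, y2], whereas COCO-style annotations use [x, y, width, height], so it is worth checking which one a given PyTorch implementation expects.

```python
def corners_to_xywh(box):
    """(x1, y1, x2, y2) corners -> (x, y, width, height), (x, y) = upper-left corner."""
    x1, y1, x2, y2 = box
    return (x1, y1, x2 - x1, y2 - y1)


def xywh_to_corners(box):
    """Inverse conversion: (x, y, width, height) -> (x1, y1, x2, y2)."""
    x, y, w, h = box
    return (x, y, x + w, y + h)
```

For example, a box with corners (10, 20) and (50, 80) becomes (10, 20, 40, 60).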

The preprocessing of the target data for the second subtask (pixel-wise parsing) was more involved. The network expects the segmentation targets to be binary masks, whereas the annotations provided in the challenge mark every object as a set of points forming a polygon around the coral (as shown in Fig. 1). Each polygon therefore had to be rasterized into a binary mask.
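Rasterizing a polygon into a binary mask can be sketched with an even-odd (ray casting) point-in-polygon test evaluated at each pixel center. This dependency-free version is only an illustration and is slow for large images; in practice a library routine such as Pillow's `ImageDraw.Draw.polygon` would normally be used, and its handling of boundary pixels may differ slightly. The function names and the `mask[y][x]` layout are assumptions.

```python
def point_in_polygon(px, py, polygon):
    """Even-odd (ray casting) test: is point (px, py) inside the polygon?"""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Only edges that straddle the horizontal line y = py can be crossed.
        if (y1 > py) != (y2 > py):
            # x coordinate where this edge crosses the line y = py.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if px < x_cross:  # crossing lies to the right of the point
                inside = not inside
    return inside


def polygon_to_mask(polygon, width, height):
    """Rasterize a polygon [(x, y), ...] into a binary mask indexed mask[y][x].

    A pixel is foreground when its center (x + 0.5, y + 0.5) is inside the polygon.
    """
    return [
        [1 if point_in_polygon(x + 0.5, y + 0.5, polygon) else 0 for x in range(width)]
        for y in range(height)
    ]
```

For a 4 × 4 square polygon with corners (0, 0) and (4, 4) in a 6 × 6 image, exactly the 16 pixels in the upper-left corner become foreground.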

To create a submission to the challenge, the predicted segmentation masks had to be converted back into sets of points. Several polygon-extraction methods were tested; the only one that did not produce self-intersecting polygons was to find the contours in each binary mask and then take the convex hull of each contour (a convex hull is by construction a simple polygon).
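Because a convex hull never self-intersects, the conversion back to points can be sketched as the convex hull of a mask's foreground pixels, computed here with Andrew's monotone chain algorithm. This is a simplified, dependency-free stand-in for the contour-then-hull procedure described above; the text does not name a library (OpenCV's `findContours` and `convexHull` are a common choice), and the function names here are hypothetical. For a mask with a single connected region, the hull of all foreground pixels should match the hull of its outer contour; a mask with several separate regions is merged into one polygon here, whereas hulling each contour separately yields one polygon per region.

```python
def _cross(o, a, b):
    """z-component of (a - o) x (b - o); positive for a counter-clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices without repeating the first.

    Collinear points on hull edges are dropped (the `<= 0` tests).
    """
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:  # build the lower hull from left to right
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):  # build the upper hull from right to left
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]  # last point of each chain starts the other


def mask_to_polygon(mask):
    """Polygon [(x, y), ...] enclosing the foreground of a mask indexed mask[y][x]."""
    foreground = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    return convex_hull(foreground)
```

For an L-shaped mask, the result is the enclosing triangle rather than the L itself: the price paid for never producing a self-intersecting polygon.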

Method

The proposed object detection and pixel-wise parsing method is the state-of-the-art convolutional neural network Mask R-CNN [2] pretrained on the ImageNet dataset [3]. Specifically, a PyTorch [4] implementation of this network was used in the experiments. The model provides predictions for both subtasks: bounding boxes for annotation and localization, and binary segmentation masks for pixel-wise parsing.
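The text does not say which PyTorch implementation was used. In torchvision, for instance, a Mask R-CNN model in evaluation mode returns for each image a dictionary with `boxes`, `labels`, `scores` and `masks`, where the masks are soft per-pixel probabilities that the documentation suggests thresholding at 0.5. Turning such output into the two kinds of predictions can be sketched as below. The sketch works on plain Python lists (e.g. after calling `.tolist()` on the tensors and dropping each mask's singleton channel dimension) so that it runs without PyTorch; the function name and the detection `score_threshold` are hypothetical, since the text gives no score threshold.

```python
def postprocess(prediction, score_threshold=0.5, mask_threshold=0.5):
    """Keep confident detections and binarize their soft masks.

    `prediction` holds parallel lists under "boxes", "labels", "scores" and
    "masks"; each mask is a nested list of probabilities indexed mask[y][x].
    `score_threshold` is a hypothetical choice, not a value from the text.
    """
    kept = []
    for box, label, score, soft_mask in zip(
        prediction["boxes"], prediction["labels"],
        prediction["scores"], prediction["masks"],
    ):
        if score < score_threshold:
            continue  # drop low-confidence detections
        # Binary mask for pixel-wise parsing: 1 where probability >= threshold.
        binary = [[1 if p >= mask_threshold else 0 for p in row] for row in soft_mask]
        kept.append({"box": box, "label": label, "score": score, "mask": binary})
    return kept
```

The kept boxes serve the annotation and localization subtask, and the binarized masks feed the polygon extraction for pixel-wise parsing.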

Experimental Setup

Although input resolution is crucial for object detection, because some of the objects are relatively small, all training images were resized to 1000 × 1000 pixels due to GPU memory limitations. The model was trained with a batch size of 2 and gradients accumulated over 4 batches (an effective batch size of 8), using the SGD optimizer with an initial learning rate of 0.005 and a step decay of 0.0005 after 3 epochs. The best model was chosen by early stopping over the last 5 epochs, evaluated on the validation set.
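The training schedule can be made concrete with a small sketch, but two parts of it are assumptions rather than statements from the text. First, the learning rate is assumed to be multiplied by gamma = 0.1 every 3 epochs, so that it drops from 0.005 to 0.0005 after 3 epochs; this is one possible reading of the step decay above. The TorchVision object detection fine-tuning tutorial uses lr=0.005, weight_decay=0.0005 and StepLR with step_size=3 and gamma=0.1, which suggests 0.0005 could instead be a weight decay. Second, early stopping is interpreted as stopping after 5 epochs without improvement of the validation score. Gradient accumulation itself is represented only by the resulting effective batch size; all names are hypothetical.

```python
BATCH_SIZE = 2          # images per forward/backward pass (from the text)
ACCUMULATION_STEPS = 4  # batches whose gradients are accumulated per SGD update
EFFECTIVE_BATCH_SIZE = BATCH_SIZE * ACCUMULATION_STEPS  # images per update


def step_lr(epoch, base_lr=0.005, step_size=3, gamma=0.1):
    """Step decay: multiply the rate by `gamma` every `step_size` epochs (gamma assumed)."""
    return base_lr * gamma ** (epoch // step_size)


def select_with_early_stopping(val_scores, patience=5):
    """Return (best_epoch, best_score), stopping once `patience` epochs pass without improvement."""
    best_epoch, best_score = 0, float("-inf")
    for epoch, score in enumerate(val_scores):
        if score > best_score:
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            break  # no improvement for `patience` epochs
    return best_epoch, best_score
```

With validation scores 0.10, 0.15, 0.18, 0.17, 0.16, 0.16, 0.15, 0.14, 0.20 per epoch, training stops at epoch 7 and keeps the epoch-2 model (score 0.18); the later 0.20 is never reached.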

Augmentations

To enrich the training set, data augmentation was applied online, i.e. during the training process. When an image was loaded, a random horizontal flip was first applied with probability 0.5. Then random brightness, contrast and saturation variations were used to simulate the color inconsistency in the training data. Specifically, brightness was varied with a delta of 0.15 with probability 0.6, and contrast and saturation were scaled by a random factor in the range from 0.85 to 1.15 with probability 0.8.
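The important detail of the geometric augmentation is that a horizontal flip must be applied to the targets as well as to the image, while the color variations leave the targets unchanged. The minimal sketch below assumes boxes in (x1, y1, x2, y2) corner format, masks stored as `mask[y][x]` lists, a symmetric additive brightness delta in [-0.15, 0.15], and contrast and saturation factors drawn independently under one shared probability-0.8 draw. These are interpretations of the text, not the author's code; the function names are hypothetical and the color changes are only sampled, not applied to pixels.

```python
import random


def hflip_box(box, image_width):
    """Mirror an (x1, y1, x2, y2) box horizontally in an image of the given width."""
    x1, y1, x2, y2 = box
    return (image_width - x2, y1, image_width - x1, y2)


def hflip_mask(mask):
    """Mirror a binary mask indexed mask[y][x] horizontally."""
    return [row[::-1] for row in mask]


def sample_augmentation(rng):
    """Draw one set of augmentation parameters with the probabilities from the text."""
    params = {"hflip": rng.random() < 0.5, "brightness_delta": 0.0,
              "contrast": 1.0, "saturation": 1.0}
    if rng.random() < 0.6:  # brightness variation (assumed additive delta)
        params["brightness_delta"] = rng.uniform(-0.15, 0.15)
    if rng.random() < 0.8:  # contrast and saturation scaling factors
        params["contrast"] = rng.uniform(0.85, 1.15)
        params["saturation"] = rng.uniform(0.85, 1.15)
    return params
```

Calling `sample_augmentation(random.Random(seed))` once per loaded image yields a fresh set of parameters each time, which is what makes the augmentation online.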

Evaluation

The trained models were evaluated on the validation set, and the best models were used for prediction on the test set. The evaluation criterion for both subtasks is the mean average precision with Intersection over Union (IoU) greater than 0.5 (mAP@0.5). The model chosen for prediction on the test set achieved an mAP@0.5 of 0.18 for object detection and an mAP@0.5 of 0.35 for instance segmentation on the validation set.
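The text does not spell out how mAP@0.5 is computed, so the sketch below shows one common, PASCAL VOC-style definition as an illustration. Detections of a class are sorted by score and each is compared with the ground-truth object of highest IoU; it counts as a true positive if that IoU exceeds 0.5 and the object has not already been matched, otherwise as a false positive. AP is the area under the interpolated precision-recall curve, and mAP averages AP over the classes present in the ground truth. The official challenge evaluation may differ in details; boxes use (x1, y1, x2, y2), and for pixel-wise parsing `mask_iou` can be passed instead of `box_iou`. All names are hypothetical.

```python
def box_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def mask_iou(a, b):
    """IoU of two equally sized binary masks indexed mask[y][x]."""
    pairs = [(va, vb) for ra, rb in zip(a, b) for va, vb in zip(ra, rb)]
    inter = sum(1 for va, vb in pairs if va and vb)
    union = sum(1 for va, vb in pairs if va or vb)
    return inter / union if union else 0.0


def average_precision(preds, gts, iou_fn=box_iou, iou_threshold=0.5):
    """AP for one class. preds: [(image_id, score, region)]; gts: {image_id: [region]}."""
    n_gt = sum(len(regions) for regions in gts.values())
    if n_gt == 0:
        return 0.0
    used = {img: [False] * len(regions) for img, regions in gts.items()}
    hits = []
    for img, _score, region in sorted(preds, key=lambda p: p[1], reverse=True):
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gts.get(img, [])):
            iou = iou_fn(region, gt)
            if iou > best_iou:
                best_iou, best_j = iou, j
        if best_iou > iou_threshold and not used[img][best_j]:
            used[img][best_j] = True  # true positive
            hits.append(1)
        else:
            hits.append(0)  # false positive (low IoU or duplicate detection)
    precisions, recalls, tp = [], [], 0
    for k, hit in enumerate(hits, start=1):
        tp += hit
        precisions.append(tp / k)
        recalls.append(tp / n_gt)
    # All-point interpolation: precision at rank k is the max precision at ranks >= k.
    ap, prev_recall = 0.0, 0.0
    for k in range(len(hits)):
        ap += (recalls[k] - prev_recall) * max(precisions[k:])
        prev_recall = recalls[k]
    return ap


def mean_average_precision(preds, gts, iou_fn=box_iou, iou_threshold=0.5):
    """preds: [(image_id, cls, score, region)]; gts: [(image_id, cls, region)]."""
    aps = []
    for c in sorted({gc for _, gc, _ in gts}):
        class_gts = {}
        for img, gc, region in gts:
            if gc == c:
                class_gts.setdefault(img, []).append(region)
        class_preds = [(img, s, r) for img, pc, s, r in preds if pc == c]
        aps.append(average_precision(class_preds, class_gts, iou_fn, iou_threshold))
    return sum(aps) / len(aps) if aps else 0.0
```

A correct detection ranked below a false positive halves the AP of a class with one object, which is why confidence ordering matters for this metric.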

Submissions

The submissions to the challenge were the predictions of the proposed method on the test set provided in the challenge [1]. Participants' submissions were evaluated on the AICrowd platform. Each team was allowed to submit up to 10 runs. I used 2 runs for the annotation and localization task and only 1 run for the pixel-wise parsing task.

Tables 1 and 2 describe each submission to the two subtasks and its result. Fig. 2 shows an example of object detection on one of the test images.

Competition results

The official competition results are shown in Table 3 for the pixel-wise parsing task and in Table 4 for the annotation and localization task. The proposed method achieved the best score in both tasks of the ImageCLEFcoral 2021 competition. Specifically, it achieved an mAP@0.5 of 0.075 in pixel-wise parsing (run id 139084) and an mAP@0.5 of 0.121 in annotation and localization (run id 138115) of coral reefs.

Conclusion

This paper presents an automatic system for annotation, localization and segmentation of coral reefs, which was used in the ImageCLEFcoral 2021 challenge. The detection method based on Mask R-CNN achieved an mAP@0.5 of 0.121 in the annotation and localization task and 0.075 in the pixel-wise parsing task. Although these absolute scores are unsatisfying, I believe that more advanced methods and the use of knowledge distillation could significantly improve the results in the future.

Figure 1: Example image with drawn (a) detection annotations and (b) pixel-wise parsing annotations.

Figure 2: Example result of object detection on the test set.

Table 1: Annotation and localization results on the test set.
Setup                         mAP@0.5   Recall
Mask R-CNN                    0.105     0.055
Mask R-CNN + augmentations    0.121     0.059

Table 2: Pixel-wise parsing results on the test set.
Setup                         mAP@0.5   Recall
Mask R-CNN + augmentations    0.075     0.048

Table 3: Comparison with other participants in the pixel-wise parsing task.
Group                         mAP@0.5
MTRU                          0.011
MTRU                          0.017
MTRU                          0.018
MTRU                          0.021
University of West Bohemia    0.075

Table 4: Comparison with other participants in the annotation and localization task.
Group                         mAP@0.5
UAlbany                       0.001
University of West Bohemia    0.105
University of West Bohemia    0.121

Acknowledgments

This work was supported by the grant of the University of West Bohemia, project No. SGS-2019-027. Computational resources were supplied by the project "e-Infrastruktura CZ" (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.

References

[1] J. Chamberlain, A. García Seco de Herrera, A. Campello, A. Clark, T. A. Oliver, H. Moustahfid, Overview of the ImageCLEFcoral 2021 task: Coral reef image annotation of a 3D environment, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS, Bucharest, Romania, 2021.

[2] K. He, G. Gkioxari, P. Dollár, R. B. Girshick, Mask R-CNN, CoRR abs/1703.06870 (2017). URL: http://arxiv.org/abs/1703.06870. arXiv:1703.06870.

[3] J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, L. Fei-Fei, ImageNet: A large-scale hierarchical image database, in: 2009 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 2009.

[4] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Kopf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai, S. Chintala, PyTorch: An imperative style, high-performance deep learning library, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32, Curran Associates, Inc., 2019.