<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Automatic Coral Reef Annotation, Localization and Pixel-wise Parsing Using Mask R-CNN</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Lukáš</forename><surname>Soukup</surname></persName>
							<email>lsoukup@kky.zcu.cz</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Faculty of Applied Sciences</orgName>
								<orgName type="department" key="dep2">Department of Cybernetics</orgName>
								<orgName type="institution">University of West Bohemia</orgName>
								<address>
									<addrLine>Univerzitní 8</addrLine>
									<postCode>301 00</postCode>
									<settlement>Plzeň</settlement>
									<country key="CZ">Czech Republic</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Automatic Coral Reef Annotation, Localization and Pixel-wise Parsing Using Mask R-CNN</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">F90CDC6EC66F7B4C2DA38223BDFA16B5</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T20:41+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Object detection</term>
					<term>Semantic segmentation</term>
					<term>Neural networks</term>
					<term>Deep learning</term>
					<term>Machine learning</term>
					<term>Coral reefs detection</term>
					<term>Coral reefs segmentation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>This paper describes the methods used for annotation, localization and pixel-wise parsing of coral reefs in underwater images. The proposed system achieved competitive results in the third edition of the ImageCLEFcoral challenge, held in 2021. Specifically, in the annotation and localization task it achieved mean average precision with Intersection over Union (IoU) greater than 0.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The ImageCLEFcoral 2021 challenge <ref type="bibr" target="#b0">[1]</ref> is motivated by the impact of recent climate change on coral reefs and the ecosystems they support. Coral reefs are in danger of being lost within the next 30 years, which would lead not only to the extinction of many marine species but also to a humanitarian crisis on a global scale for people who rely on reef services. Monitoring changes in the reefs could help prioritize conservation efforts.</p><p>The goal of the challenge is to create an automatic system for monitoring coral reefs using the provided dataset. The challenge consists of two tasks: (1) annotation and localization, and (2) pixel-wise parsing.</p><p>The proposed solution uses the state-of-the-art object detection model Mask R-CNN <ref type="bibr" target="#b1">[2]</ref>, which provides both detection and semantic segmentation information.</p><p>The provided dataset <ref type="bibr" target="#b0">[1]</ref> consists of 1052 training images and 485 test images taken from coral reefs around the world as part of a coral reef monitoring project with the Marine Technology Research Unit at the University of Essex. Each coral object in the training set was annotated by an expert with a bounding box, a segmentation polygon and a class representing one of 13 substrate types. In total, 21,749 objects were annotated in the dataset.</p><p>The dataset is challenging in several respects. Each image contains many different coral objects; on average there are over 20 corals in a single image. The dataset is highly unbalanced: 33.5% of all objects belong to class c_soft_coral and only 0.12% to class c_fire_coral_millepora. Additionally, the quality of the images is very inconsistent: some images are heavily blurred and there are noticeable color variations. Fig. <ref type="figure" target="#fig_0">1</ref> shows an example from the dataset with drawn annotations for both tasks.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Dataset split</head><p>To optimize the network parameters, the provided training dataset <ref type="bibr" target="#b0">[1]</ref> was divided into a train set and a validation set. To evaluate performance on the validation set correctly, it is crucial that the validation set has the same data distribution as the train set. To preserve the distribution, the split was made with respect to the location where each image was taken: from each location, 80% of the images were added to the train set and the rest to the validation set.</p></div>
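<div xmlns="http://www.tei-c.org/ns/1.0"><p>The per-location split described above can be sketched as follows. This is a minimal illustration, not the code used in the experiments: the function name split_by_location, the mapping from image identifiers to location labels, and the fixed random seed are all assumptions introduced here.</p>
<p><![CDATA[
```python
import random
from collections import defaultdict


def split_by_location(image_locations, train_fraction=0.8, seed=0):
    """Split image ids into train and validation lists, location by location.

    image_locations is a (hypothetical) dict mapping an image id to the
    label of the location where the image was taken.
    """
    by_location = defaultdict(list)
    for image_id, location in image_locations.items():
        by_location[location].append(image_id)

    rng = random.Random(seed)
    train, val = [], []
    for location in sorted(by_location):
        ids = sorted(by_location[location])
        rng.shuffle(ids)
        # Take the requested fraction of this location for training.
        n_train = int(round(train_fraction * len(ids)))
        train.extend(ids[:n_train])
        val.extend(ids[n_train:])
    return train, val
```
]]></p>
<p>Because the fraction is applied inside every location rather than to the dataset as a whole, each location contributes to both sets in roughly the same proportion.</p></div>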
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Data Preprocessing</head><p>The implementation of the CNN <ref type="bibr" target="#b1">[2]</ref> used for the experiments expects the target data in a specific format. For the annotation and localization subtask, the expected targets are bounding boxes specified by four numbers: the coordinates of the upper left corner, and the width and height of the bounding box. The provided bounding box annotations for this subtask were given by the coordinates of the upper left and bottom right corners. Thus, preprocessing the target data was simple: the width and height are the differences between the corner coordinates.</p><p>The preprocessing of target data for the second subtask (pixel-wise parsing) was more involved. The CNN expects the segmentation targets to be binary segmentation masks, whereas the annotations provided in the challenge mark every segmentation object as a set of points forming a polygon around the coral (as shown in Fig. <ref type="figure" target="#fig_0">1</ref>).</p><p>To create a submission to the challenge, the predicted segmentation masks had to be converted back to sets of points. Several methods of creating polygons were tested. The only one that did not create self-intersecting polygons turned out to be searching for the contours in every binary mask and then creating a convex hull of the contour.</p></div>
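<div xmlns="http://www.tei-c.org/ns/1.0"><p>Both conversions can be illustrated with a short standard-library sketch. It rests on assumptions: the function names are hypothetical, the mask is a plain nested list, and the hull is computed with Andrew's monotone chain over all foreground pixels instead of running a contour search first. For a single connected object the result is the same, since the convex hull of a pixel set is determined by its boundary pixels. The paper does not state which library routines were actually used.</p>
<p><![CDATA[
```python
def corners_to_xywh(x_min, y_min, x_max, y_max):
    """Convert upper-left and bottom-right corners to (x, y, width, height)."""
    return x_min, y_min, x_max - x_min, y_max - y_min


def _cross(o, a, b):
    """Z-component of the cross product (a - o) x (b - o)."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns the hull vertices without repeats."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower, upper = [], []
    for p in pts:
        # Drop points that would make a non-left turn (or are collinear).
        while len(lower) >= 2 and _cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and _cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def mask_to_polygon(mask):
    """Convex polygon, as (x, y) vertices, around the foreground of a binary mask."""
    points = [(x, y)
              for y, row in enumerate(mask)
              for x, value in enumerate(row) if value]
    return convex_hull(points)
```
]]></p>
<p>A convex polygon cannot self-intersect, which is consistent with the observation above, at the cost of overestimating the area of concave corals.</p></div>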
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Method</head><p>The proposed object detection and pixel-wise parsing method is the state-of-the-art convolutional neural network Mask R-CNN <ref type="bibr" target="#b1">[2]</ref>, pretrained on the ImageNet dataset <ref type="bibr" target="#b2">[3]</ref>. Specifically, the PyTorch <ref type="bibr" target="#b3">[4]</ref> implementation of this network was used in the experiments. The model provides predictions useful for both subtasks: bounding boxes for annotation and localization, and binary segmentation masks for pixel-wise parsing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Experimental Setup</head><p>Although the resolution of the input images is crucial for object detection, since some of the objects are relatively small, all training images were resized to 1000 × 1000 due to GPU memory limitations. The model was trained with batch size 2 and gradient accumulation over 4 batches, using the SGD optimizer with an initial learning rate of 0.005 and a step decay of 0.0005 after 3 epochs. The best model was chosen by early stopping over the last 5 epochs, evaluated on the validation set.</p></div>
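<div xmlns="http://www.tei-c.org/ns/1.0"><p>Assuming that an accumulated gradient of 4 means that gradients from four consecutive batches of size 2 are averaged before each weight update, the effective batch size is 2 × 4 = 8. The toy example below, a one-parameter linear model with hypothetical data rather than Mask R-CNN, illustrates why the accumulated gradient matches the gradient of a single batch of 8.</p>
<p><![CDATA[
```python
def mse_gradient(w, batch):
    """Gradient of the mean squared error of y = w * x over (x, y) pairs."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def accumulated_gradient(w, samples, batch_size=2, accumulation_steps=4):
    """Average the gradients of consecutive small batches before one update."""
    total = 0.0
    for step in range(accumulation_steps):
        start = step * batch_size
        batch = samples[start:start + batch_size]
        # Scale each partial gradient so that the sum is an average.
        total += mse_gradient(w, batch) / accumulation_steps
    return total


# Hypothetical toy data, roughly following y = 2 * x.
samples = [(1.0, 2.0), (2.0, 3.5), (3.0, 6.5), (4.0, 8.0),
           (5.0, 9.0), (6.0, 12.5), (7.0, 14.0), (8.0, 15.5)]
w = 0.5
small_batches = accumulated_gradient(w, samples)
one_large_batch = mse_gradient(w, samples)

learning_rate = 0.005  # initial learning rate from the text
w_updated = w - learning_rate * small_batches  # one plain SGD step
```
]]></p>
<p>The equivalence holds because all small batches have the same size; the memory cost stays that of a batch of 2.</p></div>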
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Augmentations</head><p>To enrich the training set, data augmentations were applied to the training images online, that is, during the training process. When loading an image, first, a random horizontal flip was applied with probability 0.5. Second, random brightness and contrast variations were used to simulate the color inconsistency in the training data. Specifically, a brightness variation with a delta of 0.15 was applied with probability 0.6, and contrast and saturation variations scaled by a random value in the range from 0.85 to 1.15 were applied with probability 0.8.</p></div>
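<div xmlns="http://www.tei-c.org/ns/1.0"><p>The sampling of augmentation parameters for one image might look like the sketch below. Several details are assumptions, since the text leaves them open: the brightness shift is drawn uniformly from [-0.15, 0.15], contrast and saturation share one probability draw but get independent factors, and the function and key names are hypothetical. Applying the parameters to pixels, and mirroring the boxes and masks when the image is flipped, is left out.</p>
<p><![CDATA[
```python
import random


def sample_augmentation(rng):
    """Draw the augmentation parameters for a single training image."""
    params = {
        "hflip": rng.random() < 0.5,  # horizontal flip with probability 0.5
        "brightness_delta": 0.0,      # additive shift, 0.0 means unchanged
        "contrast": 1.0,              # multiplicative factor, 1.0 means unchanged
        "saturation": 1.0,
    }
    if rng.random() < 0.6:
        params["brightness_delta"] = rng.uniform(-0.15, 0.15)
    if rng.random() < 0.8:
        params["contrast"] = rng.uniform(0.85, 1.15)
        params["saturation"] = rng.uniform(0.85, 1.15)
    return params
```
]]></p>
<p>Passing a seeded random.Random instance as rng makes the sequence of augmentations reproducible across runs.</p></div>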
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Evaluation</head><p>The trained models were evaluated on the validation set and the best models were used for prediction on the test set. The evaluation criterion for both subtasks is mean average precision with Intersection over Union (IoU) greater than 0.5 (mAP@0.5). The model chosen for prediction on the test set achieved mAP@0.5 of 0.18 for object detection and mAP@0.5 of 0.35 for instance segmentation on the validation set.</p></div>
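<div xmlns="http://www.tei-c.org/ns/1.0"><p>The IoU test underlying mAP@0.5 can be sketched for bounding boxes as follows, assuming boxes in the (x, y, width, height) format from the preprocessing section. The function names are hypothetical, and the official evaluation script of the challenge may differ in details; the full metric additionally requires matching classes and ranking predictions by confidence, which is not shown.</p>
<p><![CDATA[
```python
def box_iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x, y, width, height)."""
    ax, ay, aw, ah = box_a
    bx, by, bw, bh = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    inter_h = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    intersection = inter_w * inter_h
    union = aw * ah + bw * bh - intersection
    return intersection / union if union > 0 else 0.0


def is_match(predicted_box, ground_truth_box, threshold=0.5):
    """A prediction counts as correct when its IoU exceeds the threshold."""
    return box_iou(predicted_box, ground_truth_box) > threshold
```
]]></p>
<p>For instance segmentation the same ratio is computed over mask pixels instead of box areas.</p></div>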
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Submissions</head><p>The submissions to the challenge were created from the predictions of the proposed method on the test set provided in the challenge <ref type="bibr" target="#b0">[1]</ref>. The AICrowd platform was used to evaluate the participants' submissions. Each team was allowed to submit up to 10 runs. I used 2 runs for the annotation and localization task and only 1 run for the pixel-wise parsing task.</p><p>Tables <ref type="table" target="#tab_0">1 and 2</ref> show the description and result of each submission to both subtasks of the challenge. Fig. <ref type="figure" target="#fig_1">2</ref> shows an example of object detection on one of the images from the test set.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Competition results</head><p>The official competition results are shown in Table <ref type="table" target="#tab_1">3</ref> for the pixel-wise parsing task and in Table <ref type="table" target="#tab_2">4</ref> for the annotation and localization task. The proposed method achieved the best score in both tasks of the ImageCLEFcoral 2021 competition. Specifically, it achieved mAP@0.5 of 0.075 for pixel-wise parsing (run id 139084) and mAP@0.5 of 0.121 for annotation and localization (run id 138115) of coral reefs.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>This paper presents an automatic system for annotation, localization and segmentation of coral reefs, which was used in the ImageCLEFcoral 2021 challenge. The detection method based on Mask R-CNN achieved mAP@0.5 of 0.121 in the annotation and localization task and 0.075 in the pixel-wise parsing task. Despite the unsatisfactory results, I believe that more advanced methods and the use of knowledge distillation could significantly improve the results in the future.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Example image with drawn (a) detection annotation, (b) pixel-wise parsing annotation</figDesc><graphic coords="2,89.29,84.17,187.49,140.62" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example result of object detection on the test set.</figDesc><graphic coords="5,193.47,84.18,208.33,156.25" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Annotation and localization results on the test set.</figDesc><table><row><cell>Setup</cell><cell>mAP@0.5</cell><cell>recall</cell></row><row><cell>Mask R-CNN</cell><cell>0.105</cell><cell>0.055</cell></row><row><cell>Mask R-CNN + augmentations</cell><cell>0.121</cell><cell>0.059</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table"><head>Table 2</head><label>2</label><figDesc>Pixel-wise parsing results on the test set.</figDesc><table><row><cell>Setup</cell><cell>mAP@0.5</cell><cell>recall</cell></row><row><cell>Mask R-CNN + augmentations</cell><cell>0.075</cell><cell>0.048</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 3</head><label>3</label><figDesc>Comparison with other participants in pixel-wise parsing task.</figDesc><table><row><cell>Group</cell><cell>mAP@0.5</cell></row><row><cell>MTRU</cell><cell>0.011</cell></row><row><cell>MTRU</cell><cell>0.017</cell></row><row><cell>MTRU</cell><cell>0.018</cell></row><row><cell>MTRU</cell><cell>0.021</cell></row><row><cell>University of West Bohemia</cell><cell>0.075</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 4</head><label>4</label><figDesc>Comparison with other participants in annotation and localization task.</figDesc><table><row><cell>Group</cell><cell>mAP@0.5</cell></row><row><cell>UAlbany</cell><cell>0.001</cell></row><row><cell>University of West Bohemia</cell><cell>0.105</cell></row><row><cell>University of West Bohemia</cell><cell>0.121</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by the grant of University of West Bohemia, project No. SGS-2019-027. Computational resources were supplied by the project "e-Infrastruktura CZ" (e-INFRA LM2018140) provided within the program Projects of Large Research, Development and Innovations Infrastructures.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Overview of the ImageCLEFcoral 2021 task: Coral reef image annotation of a 3D environment</title>
		<author>
			<persName><forename type="first">J</forename><surname>Chamberlain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>García Seco De Herrera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Campello</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">A</forename><surname>Oliver</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Moustahfid</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org</title>
				<meeting><address><addrLine>Bucharest, Romania</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">Mask R-CNN</title>
		<author>
			<persName><forename type="first">K</forename><surname>He</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gkioxari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dollár</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">B</forename><surname>Girshick</surname></persName>
		</author>
		<idno>CoRR abs/1703.06870</idno>
		<idno type="arXiv">arXiv:1703.06870</idno>
		<ptr target="http://arxiv.org/abs/1703.06870" />
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Imagenet: A large-scale hierarchical image database</title>
		<author>
			<persName><forename type="first">J</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Dong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Socher</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fei-Fei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE conference on computer vision and pattern recognition</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="248" to="255" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Pytorch: An imperative style, high-performance deep learning library</title>
		<author>
			<persName><forename type="first">A</forename><surname>Paszke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Massa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Lerer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bradbury</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Chanan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Killeen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Lin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Gimelshein</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Antiga</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Desmaison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Kopf</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Devito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Raison</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Tejani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chilamkurthy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Steiner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Bai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chintala</surname></persName>
		</author>
		<ptr target="http://papers.neurips.cc/paper/9015-pytorch-an-imperative-style-high-performance-deep-learning-library.pdf" />
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems 32</title>
				<editor>
			<persName><forename type="first">H</forename><surname>Wallach</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">H</forename><surname>Larochelle</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Beygelzimer</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Alché-Buc</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Fox</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="8024" to="8035" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
