Pixelwise annotation of coral reef substrates
Jessica Wright1 , Ioana-Lia Palosanu1 , Louis Clift1 , Alba García Seco de Herrera1 and
Jon Chamberlain1
1 School of Computer Science and Electronic Engineering, University of Essex, UK


Abstract
Coral reef substrate composition is regularly surveyed for ecosystem health monitoring. The current method of visual assessment is slow and limited in scale. ImageCLEFcoral aims to identify reef areas of interest and annotate them appropriately. We present an adaptation of a submission to the 2019 ImageCLEFcoral task that uses a semantic segmentation model, DeepLabV3, with a ResNet-101 backbone. We implemented pre-training image colour enhancement and supplemented the available training data with NOAA NCEI data for specific runs. Our runs showed no overall improvement over the 2019 code, though they did predict submassive corals and table corals with greater accuracy (+3.022% and +0.353%). Although none of our model runs had the highest overall precision or accuracy, across 3 of our 4 runs we achieved the best prediction accuracy for submassive corals (3.022%), boulder corals (12.787%), table corals (0.353%), foliose corals (0.097%), gorgonian soft corals (0.002%) and algae (0.027%). Image colour enhancement benefited the prediction accuracy of boulder corals (+1.209 to +5.026%), encrusting corals (+1.7 to +2.578%) and algae (+0.027%), most likely by making them more distinct from their surroundings. Adding NOAA data enhanced the precision of encrusting coral, soft coral and gorgonian predictions despite only providing additional annotations for encrusting and foliose corals. Our results suggest that a more balanced approach to data augmentation combined with image-specific colour improvements may provide a more desirable outcome, particularly when paired with a model that is fine-tuned to the data set used.

Keywords
Image segmentation, automatic annotation, coral reef annotation, semantic segmentation




1. Introduction
Coral reefs are vital marine systems that are known to provide many ecosystem functions and
services [1] while supporting one third of marine species [2]. Their decline has been widely
reported and tracked [3]. Current monitoring of coral reef benthic communities relies on in-situ
data collection, sometimes followed by ex-situ video analysis, requiring time and expertise to
analyse correctly [4]. Automatic annotation from video stills or photographs would greatly
increase the speed and scale of feasible monitoring, and could free up reef experts to focus on
other areas to gain a wider view of shifting coral reef dynamics.
   Deep learning algorithms provide an answer to automatic annotation [5]. The underlying
architecture of most is the Convolutional Neural Network (CNN), widely used for image and
pattern recognition [6]. Image segmentation models have been the most successful in the
ImageCLEFcoral pixel-wise parsing task [7, 8], where each pixel is predicted as belonging to a
particular class.
   This is the third iteration of an annual ImageCLEF task [9, 10, 11], which has subtasks looking
into (1) Coral reef image annotation and localisation and (2) Coral reef image pixel-wise parsing.
Considering the value of each subtask in terms of practical use in monitoring reef systems
accurately, we focused on subtask 2.

CLEF 2021 – Conference and Labs of the Evaluation Forum, September 21–24, 2021, Bucharest, Romania
Email: jpwriga@essex.ac.uk (J. Wright); ip17484@essex.ac.uk (I. Palosanu)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

Figure 1: Examples of reef images (a) without and (b) with morphological substrate annotations.


2. Data
The data provided were split into training and test sets of coral reef images. The training set
was annotated (Fig. 1) with the morphological substrate classes defined in the task [9], while
the test set was not annotated. The training set was provided first to build and train a network,
with the test set released later to generate submission runs. More details about the dataset can
be found in Chamberlain et al. [11].

2.1. The training set
A total of 879 images with a combined set of 21,748 annotations were provided as the training
data. The annotations were not evenly split across classes, likely because some substrates are
more prevalent than others in reef systems (Fig. 2).
   Each substrate morphology can be indistinct from others due to the variation among the
species within that class. This is particularly true of classes that are not broken down into
morphological groups, e.g. “soft coral”, and less of an issue with classes that are split, e.g. each
“hard coral” group.
   The use of NOAA NCEI1 and/or CoralNet2 data was recommended for the task. We chose to
utilise the NOAA data set for some experiments. Of a possible 15,019 NOAA images, 3,032 were
downloaded due to time limitations on our machines. The NOAA data set contains a greater
number of classification labels than the ImageCLEFcoral classes. These classifications also refer
to a single pixel (10 pixels per image), so did not provide enough information for our image
analysis and recognition algorithms. We developed a NOAA Translation processor to capture
the classification types within the data set and translate them, via an expert-defined translation
matrix, into the ImageCLEFcoral classes; we made this matrix available to other participants
through the ImageCLEFcoral website. The processor then created an adjustable Region Of
Interest (ROI) around each annotated pixel to provide an image patch, typically a 10x10 pixel
area, that enabled our machine learning routines to adapt to the NOAA data sets.

    1 https://www.ncei.noaa.gov/access/metadata/landing-page/bin/iso?id=gov.noaa.nodc:0211063
    2 https://coralnet.ucsd.edu/

Figure 2: Substrate annotations in the ImageCLEF training set of 879 images (n = 21,748; green), and
in the training set when combined with an additional 502 NOAA images (n = 22,403; orange).
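To make the translation step concrete, the sketch below shows how a single-pixel NOAA label could be mapped to an ImageCLEFcoral class and expanded to a 10x10 pixel ROI. This is an illustrative sketch only: the label strings, the example mappings and the helper names are assumptions, and the real expert-defined matrix covered the full NOAA label set.

```python
# Illustrative sketch of the NOAA translation step: a single-pixel NOAA label is
# mapped to an ImageCLEFcoral class via a translation table and expanded to a
# small square ROI. Label strings and mappings here are assumptions; the real
# expert-defined matrix covered many more categories.
TRANSLATION = {
    "Encrusting hard coral": "Hard Coral - Encrusting",
    "Foliose hard coral": "Hard Coral - Foliose",
    "Turf algae": None,  # no usable ImageCLEFcoral equivalent
}

def point_to_roi(x, y, noaa_label, size=10):
    """Translate a NOAA point annotation into a class label plus a size x size ROI."""
    clef_class = TRANSLATION.get(noaa_label)
    if clef_class is None:
        return None  # label not translatable, skip this annotation
    half = size // 2
    return {"class": clef_class,
            "box": (x - half, y - half, x + half, y + half)}
```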
   Five substrate classes were then selected to refine the number of images to a more manageable
amount: Fire Coral - Millepora, Hard Coral - Foliose, Hard Coral - Table, Hard Coral - Sub-
Massive, and Hard Coral - Encrusting. These classes had fewer annotations than others and
were chosen to increase accuracy. Despite their low incidence, Soft Coral - Gorgonian, Hard
Coral - Mushroom, and Sponge - Barrel were not chosen from the NOAA data set because they
have more distinct morphologies than the selected classes, so were more likely to be predicted
despite relatively few occurrences. Algae - Macro or Leaves was also not selected from the
NOAA data set despite its low incidence: the algae classification of the ImageCLEF set only
accounts for large-leaf macroalgae, whereas the NOAA data set also includes other types such
as turf and crustose coralline algae (CCA), so conflicting annotations could have hampered the
model predictions.
   In total, 502 viable NOAA images were found, containing annotations for 2 of the 5 selected
classes: Hard Coral - Encrusting and Hard Coral - Foliose. Adding these images almost doubled
the processing time per epoch, pushed the entire model training time from 10 hours to 17.5
hours (10 epochs in total), and increased the total number of substrate annotations from 21,748
to 22,403 (Fig. 2).
Figure 3: Transformation of (a) a green and (d) a blue image through two stages to an image used in
training. Each image had (b,e) its RGB levels balanced, then (c,f) went through a generalised channel
mixing process to balance the colours while maintaining image contrast. The levelling and mixing were
selected to optimise substrate colour and contrast, with less focus on the background and water colouring.


2.2. Image enhancement
Underwater imagery is often of lower quality than that taken on land. Light attenuation distorts
colour detection, water turbidity can reduce image quality, and there is a greater chance of
blurred or unfocused photographs. Taking steps to investigate, process and augment the
provided data is expected to improve the data quality and subsequent network results [12, 7].
   Images were visually assessed and split into those with accurate colouring and contrast, those
with a heavy green tint and those with a heavy blue tint. Accurate images were not altered in
any way. Green and blue images were passed through an RGB histogram leveller followed by
an RGB channel mixer, generalised to green or blue images for speed (Fig. 3). This allowed all
the images to be processed easily but did not allow for image-specific editing.
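As an illustration of this generalised correction, the sketch below applies per-channel histogram levelling followed by a fixed channel mix; the percentile limits and mixing weights are assumptions for demonstration, not the values used in our runs.

```python
# Sketch of the generalised colour correction: per-channel histogram levelling
# followed by a fixed RGB channel mix. Percentiles and mixing weights below are
# illustrative assumptions, not the exact values used for the submitted runs.
import numpy as np
from PIL import Image

def level_channels(img, low_pct=1, high_pct=99):
    """Stretch each RGB channel so its histogram spans the full 0-255 range."""
    out = np.empty_like(img, dtype=np.float32)
    for c in range(3):
        lo, hi = np.percentile(img[..., c], (low_pct, high_pct))
        out[..., c] = np.clip((img[..., c] - lo) / max(hi - lo, 1e-6), 0, 1) * 255
    return out

# One generalised mixing matrix per tint (rows: output R, G, B; columns: input R, G, B).
GREEN_MIX = np.array([[1.0, 0.2, 0.0],
                      [0.0, 0.8, 0.0],
                      [0.0, 0.1, 1.0]])

def correct(path, mix=GREEN_MIX):
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32)
    mixed = np.clip(level_channels(img) @ mix.T, 0, 255).astype(np.uint8)
    return Image.fromarray(mixed)
```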

2.3. Data augmentation
Before training the model, each image was cropped into 12 squares, each of which was then
cropped at a random point to a 480px square. Random horizontal flips were also utilised due
to the limited amount of data. These pre-processing techniques present the model with different
iterations of the same images, increasing the size of the data set available.
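A minimal sketch of this tiling and augmentation is given below, assuming a 3 x 4 grid of tiles (the text only states 12 squares per image); in practice the same spatial transforms must also be applied to the corresponding annotation masks.

```python
# Sketch of the pre-processing described above: tile each image into 12 squares,
# randomly crop each tile to 480 px and randomly flip it horizontally.
# The 3x4 grid layout is an assumption; only "12 squares" is stated in the text.
import random
from PIL import Image

def tile_and_augment(img, rows=3, cols=4, crop=480):
    tile_w, tile_h = img.width // cols, img.height // rows
    patches = []
    for r in range(rows):
        for c in range(cols):
            tile = img.crop((c * tile_w, r * tile_h,
                             (c + 1) * tile_w, (r + 1) * tile_h))
            # Random crop to a fixed square; tiles smaller than `crop`
            # are padded with black by PIL's crop.
            x = random.randint(0, max(tile.width - crop, 0))
            y = random.randint(0, max(tile.height - crop, 0))
            tile = tile.crop((x, y, x + crop, y + crop))
            if random.random() < 0.5:
                tile = tile.transpose(Image.FLIP_LEFT_RIGHT)
            patches.append(tile)
    return patches
```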
2.4. The test set
The provided test set included 485 unannotated images from 4 different regions:
    Region 1:       The training set location.
    Region 2:       A geographically and biologically similar region.
    Region 3:       A geographically distinct but biologically similar region.
    Region 4:       A region that is both geographically and biologically distinct.
  The test images were also cropped into 12 squares to match the training images used on the
model. Each test image was then resized to a 520px square, which allowed us to predict all test
images despite system limitations.
  The predicted pixel array of each test image had to be resized to its original dimensions
before submission to match the ground truth annotation mask. This was carried out using
spline interpolation through the zoom function in SciPy3 .
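A sketch of this resizing step is shown below. The text describes spline interpolation via zoom; nearest-neighbour interpolation (order=0) is assumed here so that integer class labels are not blended into new values, and this is an assumption rather than the exact setting used.

```python
# Sketch of resizing a predicted label mask back to the original image size using
# scipy.ndimage.zoom, as described above. order=0 (nearest neighbour) is assumed
# here for integer class-label masks; higher spline orders suit probability maps.
import numpy as np
from scipy.ndimage import zoom

def resize_prediction(pred, orig_h, orig_w):
    factors = (orig_h / pred.shape[0], orig_w / pred.shape[1])
    return zoom(pred, factors, order=0)
```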


3. The model
We used the DeepLabV3 model based on a previous submission [8]. It makes use of a ResNet-101
backbone and the application of both atrous convolution and Atrous Spatial Pyramid Pooling
(ASPP). ResNet-101 is used for feature extraction before atrous convolution and ASPP are
applied. Atrous convolution increases the field of view in the last layer of ResNet-101 by
inserting 0-values between filter values used in the network layer [13]. The atrous rate utilised
corresponds to the number of 0-values inserted: the higher the rate, the larger the field of view
becomes. ASPP is then applied to assign a label to each pixel using 4 atrous convolution rates.
This enables the model to utilise different aspects of the objects it is identifying, ensuring that
when pixels are labelled the network has seen them at multiple fields of view.
   The model was evaluated using different crop and batch sizes during training. Batch size
4 was used in each run as it had the best performance within our system limitations. A crop
size of 480px was selected as, when combined with batch size 4, it provided the greatest overall
accuracy (per mAP0.0 and mAP0.5) of all tested crop size combinations.
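A minimal sketch of the model setup is shown below, assuming torchvision's DeepLabV3/ResNet-101 implementation and 13 substrate classes plus background; the class count and the use of pretrained weights are assumptions for illustration.

```python
# Minimal sketch: DeepLabV3 with a ResNet-101 backbone from torchvision, with the
# classifier head replaced to predict the ImageCLEFcoral substrate classes.
# NUM_CLASSES and the pretrained weights are assumptions for illustration.
import torch
from torchvision.models.segmentation import deeplabv3_resnet101
from torchvision.models.segmentation.deeplabv3 import DeepLabHead

NUM_CLASSES = 14  # 13 substrate classes + background (assumed)

model = deeplabv3_resnet101(pretrained=True)        # older torchvision API
model.classifier = DeepLabHead(2048, NUM_CLASSES)   # new ASPP + prediction head

batch = torch.randn(4, 3, 480, 480)                 # batch size 4, 480 px crops
logits = model(batch)["out"]                        # (4, NUM_CLASSES, 480, 480)
```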


4. Submission
Each team in the competition was allowed to submit up to 10 runs per task using the collaboration
platform AICrowd4. We chose to submit 4 files to the pixelwise-parsing subtask only,
representing 4 individual runs:
    MTRU1: the “baseline” run, using the 2019 submission [8] that was rewritten and
           fine-tuned by experimenting with crop × batch size combinations. Batch size
           4 with crop size 480 was found to give the best results and was used in
           this run.
    MTRU2: the edited ImageCLEF run, using the same settings as MTRU1. Poorly
           coloured training images were enhanced to represent more accurate colouring
           of the coral reefs.
    MTRU3: the NOAA run, using additional data from NOAA in two different substrates.
           The images were not enhanced or edited in any way, and the same settings
           from MTRU1 were used.
    MTRU4: the fully edited run, using the same settings as MTRU1, with both the
           additional NOAA data and image colour enhancements where needed.

    3 https://docs.scipy.org/doc/scipy/reference/generated/scipy.ndimage.zoom.html
    4 https://www.aicrowd.com/


4.1. Self-intersecting polygons
All 4 runs predicted some images containing self-intersecting polygons. These polygons invali-
date a run and are not permitted in the submission file so must be removed. The evaluation
script was used to identify any images with self-intersecting polygons and the substrate type of
the polygon. This process involved removing each polygon of the relevant substrate type one
by one, re-running the evaluation script each time to check if the error was resolved.
   Initial images were checked polygon by polygon in this manner to minimise any impact on
model accuracy, but the time constraints of the challenge required faster processing of the later
runs. Images in these runs were checked in polygon "batches", where several at a time would be
deleted before running the evaluation script. While this did increase the speed of evaluation
deleted before running the evaluation script. While this did increase the speed of evaluation
before submission, it is likely that a significant proportion of the deleted polygons were not
self-intersecting and as such the mean average precision (mAP) of the runs would be both lower
and less accurate.
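An equivalent, faster check could be performed programmatically; the sketch below uses Shapely's validity test and is an assumption about how such filtering could be done, not the procedure we actually followed (we relied on the challenge's evaluation script).

```python
# Hypothetical sketch: drop self-intersecting polygons from a run before
# submission using Shapely's validity check. In practice we used the challenge
# evaluation script to locate offending polygons, so this is an assumed shortcut.
from shapely.geometry import Polygon

def keep_valid(polygons):
    """polygons: list of vertex lists, e.g. [[(x1, y1), (x2, y2), ...], ...]."""
    return [pts for pts in polygons
            if len(pts) >= 3 and Polygon(pts).is_valid]
```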

4.2. Blank predictions
3 of the 4 runs (MTRU2, MTRU3 and MTRU4) predicted images with no substrate classes. While
clearly an error as all images were of coral reef substratum, these predictions were a part of our
model outcome and therefore our submitted runs. The evaluation script used upon submission
blocks these images and deems runs containing them a failure, so each had to be altered. As all
images from the test set must be used, the blank images could not be removed. Our solution
was to include a small square annotation in the centre of each blank image and label it as Fire
Coral - Millepora. This class was used as it had the lowest number of annotations and had no
additional annotations added from the NOAA images, so was likely to be the least accurate
class, limiting the effect on overall accuracy as much as possible.
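A sketch of this workaround is given below; the square size and the exact class identifier string are assumptions for illustration.

```python
# Sketch of the blank-prediction workaround: place one small square annotation at
# the image centre, labelled as Fire Coral - Millepora. The 10 px square size and
# the class identifier format are assumptions.
def fill_blank_prediction(width, height, size=10):
    cx, cy = width // 2, height // 2
    half = size // 2
    square = [(cx - half, cy - half), (cx + half, cy - half),
              (cx + half, cy + half), (cx - half, cy + half)]
    return {"class": "Fire Coral - Millepora", "polygon": square}
```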


5. Results and discussion
Results provided by ImageCLEFcoral after submission used 2 metrics. mAP0.5 shows the
localised mean average precision using IoU ≥ 0.5. Accuracy per substrate calculates the
segmentation accuracy as the number of correctly labelled pixels of a class over the number of
pixels labelled with that class in the ground truth.
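For a single image, the per-substrate accuracy can therefore be computed as in the sketch below (the function name is illustrative):

```python
# Sketch of the per-substrate accuracy: correctly labelled pixels of a class
# divided by the number of ground-truth pixels of that class.
import numpy as np

def per_class_accuracy(pred, gt, cls):
    gt_mask = gt == cls
    if not gt_mask.any():
        return float("nan")  # class absent from the ground truth of this image
    return float(((pred == cls) & gt_mask).sum() / gt_mask.sum())
```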
  Overall results of the pixel-wise parsing subtask (Table 1) show that our runs were less
accurate and precise than those of the other participating team. When considering the accuracy
per class, however, some substrate categories were better predicted by our model.
   Across the MTRU runs, we saw the highest accuracy for submassive coral, table coral and
foliose coral predictions when images were run unedited and without additional NOAA data. The
greatest prediction accuracy of boulder corals and algae across the subtask occurred when
images were colour corrected, and of gorgonian soft corals occurred when unedited ImageCLEF
data and NOAA data were used. MTRU3 was the only instance of gorgonian predictions with
positive accuracy (0.002%) across all submissions. Similarly, MTRU1 was the only instance of
positive accuracy in foliose coral prediction (0.097%). None of our runs predicted mushroom
corals, sponges, barrel sponges or fire coral accurately.
   Of our runs, the greatest precision was seen in MTRU1 (mAP0.5 = 0.021), though it did not
have the highest accuracy (2.767%). MTRU4 was most accurate (2.951%) despite having the
lowest overall precision (mAP0.5 = 0.011).

Table 1
Overall precision (mAP0.5), average accuracy (%) and substrate class accuracy (%) of pixel-wise pars-
ing subtask submissions to ImageCLEFcoral 2021 (4 MTRU runs, 1 run from team Pilsen Eyes that
performed best overall). The best scores for each category are shown in orange.
   Category                      MTRU1        MTRU2        MTRU3         MTRU4        Pilsen Eyes
   mAP0.5                        0.021        0.018        0.017         0.011        0.075
   Average accuracy              2.767        2.531        1.942         2.951        6.147
   Hard Coral - Branching        1.090        2.299        0.536         5.562        11.095
   Hard Coral - Submassive       3.022        0.279        1.036         0.039        2.704
   Hard Coral - Boulder          9.607        12.787       7.601         8.827        5.385
   Hard Coral - Encrusting       0.017        2.595        0.729         2.429        2.615
   Hard Coral - Table            0.353        0            0             0            0.008
   Hard Coral - Foliose          0.097        0            0             0            0
   Hard Coral - Mushroom         0            0            0             0            0
   Soft Coral                    0            0            0.228         0            50.433
   Gorgonian                     0            0            0.002         0            0
   Sponge                        0            0            0             0            1.625
   Barrel Sponge                 0            0            0             0            0.329
   Fire Coral - Millepora        0            0            0             0            0
   Algae                         0            0.027        0             0            1.0e-4

  Overall precision and average accuracy were also lower than the 2019 run of this model [8];
however, we did show improvement in the prediction of submassive corals (MTRU1 = 0.030221,
2019 = 0) and table corals (MTRU1 = 0.003534, 2019 = 0), neither of which were predicted
with any accuracy in 2019.

5.1. Image colour enhancement
The colour adjustments made to the images increased the prediction accuracy of boulder
corals, encrusting corals and algae (Table 1). For boulder corals, colour enhancement may have
distinguished them from other reef substrates and enabled greater recognition of the coral over
rocks and other substratum that they can easily resemble. Encrusting corals would benefit for
the same reasons. Algae would likely show improvement with colour enhancement due to the
removal of green image tints, which would allow the natural green of the algae to become more
defined and clear. Brown and red algae would also benefit from the red channel correction to
make them more distinct from surrounding substrate.
  Submassive corals, as well as table corals, foliose corals, soft corals and gorgonians, were less
accurately predicted with image enhancement. Any loss in predictive power is likely due to
the general nature of the colour correction performed. While some images would improve with
the balancing and mixing at the levels set, others may have had colour blow-outs or excessive
input from one or more RGB channels. This could have a blur-like effect, wherein neighbouring
substrate categories look indistinct from each other due to a lack of colour definition.

5.2. Augmenting annotations with NOAA data
The NOAA data used was from a different location than the ImageCLEF data which could
greatly impact mAP0.5 and prediction accuracy as substrates from different geographic regions
can show vastly different morphologies. Of the 2 categories with increased annotations from
the NOAA data set, encrusting corals saw a greater accuracy while foliose corals had less
prediction accuracy. Encrusting corals are very similar globally despite varying conditions, so
increasing the number of annotations would likely improve the model's predictive power by
adding distinctive pixels to train on. This is not the case with foliose corals, which are more
likely to show differing morphologies as they are not flat to the substrate. Foliose corals also
have extensive structures that often appear layered and carry many shadows, which could
hamper the training capabilities of the model. Any shadows would appear as black, probably
flat-textured, regions of the image. These would provide no benefit to the model and may cause
it to relate any dark spots to foliose corals or to fail to recognise them at all.
   Adding NOAA data had a detrimental effect on the accuracy of most other substrate categories.
Where prediction accuracy was > 0 without NOAA data (MTRU1 and MTRU2), adding NOAA
annotations reduced the prediction accuracy of submassive, boulder, and table corals as well
as algae. This could occur if the additional NOAA annotations skewed the model's perception
of each category and altered the predictions made as a result. Accuracy also decreased for
branching corals between MTRU1 and MTRU3 (unedited images), but increased between the
colour enhanced runs (MTRU2 and MTRU4) by 5.026%. Predictions were also more accurate for
soft coral (+0.228%) and gorgonians (+0.002%) when NOAA data was added but no colour
enhancement was performed. These substrate categories form distinctive morphologies across
all locations, which may have become easier to recognise with an increasingly balanced data set,
at the expense of the other classes. Although the soft coral category encompasses several distinct
organisms with different morphologies, the abundance of annotations likely compensated by
providing many examples of each structure.


6. Conclusions
Image colour enhancement can increase the accuracy of coral reef substrate predictions when
those substrates are otherwise difficult to distinguish from the surrounding environment. It
can also be detrimental when the editing performed is generalised instead of image-specific.
Similarly, augmenting the training data set with NOAA annotations can improve the predictions
of substrates that are either morphologically general across different geographical regions or
those that form distinct structures despite changing geography. Large increases in the number
of annotations should be reflected in a subsequent increase of accuracy in the represented
classes. When this does not occur, the abundance of data can impair the predictive power of the
model by blurring the line between substrate categories through incorrect annotation or by
skewing the predictions made as a result of an imbalanced data set.
   A combination of an augmented data set with distinct image enhancement pathways for
either different geographic locations or substrate categories may provide a more accurate and
precise prediction array. Combining these steps with improved hyperparameters would enhance
model performance and provide a coral reef substrate prediction tool that would be applicable
to reefs across the globe.

6.1. Limitations of the model
The use of a dedicated GPU greatly increases the computational power available to machine
learning models: training time can be reduced and hyperparameters can be tuned more thoroughly.
The machine we used to run our model was limited by a lack of GPU memory, which can only be
rectified by changing the graphics card to a more powerful one. The memory limitation heavily
impacted batch size testing, limiting tests to batch size 4 at most. DeepLabV3 works best with
a batch size of 16 (demonstrated on the PASCAL VOC data set [13]). Using a computer with a
better GPU would allow for a greater batch size to be used which would improve the model
parameters and strengthen the power of the predictions.
   In the future we plan to include a greater volume of NOAA data when training the model,
which would increase the number of annotations per class across the training data. More
specific pixel expansion would also have enabled us to be more precise in training and may
have provided more pixels per class than otherwise achieved. A potential method could set
different expansion shapes by class (e.g. boulder coral expands as a circle) and use a pixel
selection/rejection threshold based on the annotated pixel value.

6.2. Improving the approach
Images and predictions would likely benefit from a more tailored colour correction approach.
This could be performed with the commonly used Rayleigh distribution [12, 14, 15] or with a
different approach such as red channel weighted compensation [16], which leverages the other
colour input channels to colour-balance an image accurately.
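As one possible direction, a hedged sketch of Rayleigh stretching via histogram specification is shown below; the target scale parameter and per-channel application are assumptions, and the cited works describe more elaborate pipelines.

```python
# Hedged sketch of Rayleigh stretching for one colour channel: map the empirical
# pixel-value CDF onto a Rayleigh distribution (histogram specification). The
# scale parameter is an assumption; cited works use more elaborate pipelines.
import numpy as np
from scipy.stats import rayleigh

def rayleigh_stretch(channel, scale=0.3):
    flat = channel.ravel().astype(np.float64)
    ranks = flat.argsort().argsort()              # rank of each pixel value
    cdf = (ranks + 1) / (flat.size + 1)           # empirical CDF in (0, 1)
    stretched = rayleigh.ppf(cdf, scale=scale)    # map onto Rayleigh quantiles
    stretched = stretched / stretched.max() * 255
    return stretched.reshape(channel.shape).astype(np.uint8)
```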
   Leveraging the results from this approach, developing a staggered pipeline may improve
prediction accuracy in the future. A bounding box approach could first provide a generalised
location of each substrate, and images could then be sent through different processing steps,
such as colour correction, blur reduction or contrast changes, based on the class found. This
could then feed into a pixel-wise prediction model to find the precise location of substrate
classes within an image.
Acknowledgments
We would like to thank the team that developed the 2019 base code that we used [8], particularly
Antonio Campello for his support and advice throughout this process. We would also like to
thank NOAA and the MTRU team of participants at the 2020 NOAA hackathon (https://www.
gpuhackathons.org/event/noaa-gpu-hackathon), where we began working on this project.


References
 [1] F. Moberg, C. Folke, Ecological goods and services of coral reef systems, Ecological
     Economics 29 (1999) 215–233.
 [2] B. Bowen, L. Rocha, R. Toonen, S. Karl, The origins of tropical marine biodiversity, Trends
     in Ecology and Evolution 28 (2013) 359–366.
 [3] L. Jones, P. Mannion, A. Farnsworth, P. Valdes, S. Kelland, P. A. Allison, Coupling of
     palaeontological and neontological reef coral data improves forecasts of biodiversity
     responses under global climatic change, Royal Society Open Science 6 (2019).
 [4] J. Hill, C. Wilkinson, Methods for Ecological Monitoring of Coral Reefs, 1 ed., Australian
     Institute of Marine Science, Townsville, Australia, 2004.
 [5] A. Mahmood, M. Bennamoun, S. An, F. Sohel, F. Boussaid, R. Hovey, G. Kendrick, R. Fisher,
     Automatic annotation of coral reefs using deep learning, in: OCEANS 2016 MTS/IEEE
     Monterey, 2016.
 [6] K. O’Shea, R. Nash, An Introduction to Convolutional Neural Networks, 2015.
     arXiv:1511.08458.
 [7] L. Picek, A. Říha, A. Zita, Coral reef annotation, localisation and pixel-wise classification
     using Mask R-CNN and Bag of Tricks, in: CLEF2020 Working Notes, volume 2696 of CEUR
     Workshop Proceedings, CEUR-WS.org, 2020.
 [8] A. Steffens, A. Campello, J. Ravenscroft, A. Clark, H. Hagras, Deep segmentation: Using
     deep convolutional networks for coral reef pixel-wise parsing, in: CLEF2019 Working
     Notes, volume 2380 of CEUR Workshop Proceedings, CEUR-WS.org, 2019.
 [9] J. Chamberlain, A. Campello, J. P. Wright, L. G. Clift, A. Clark, A. García Seco de Herrera,
     Overview of ImageCLEFcoral 2019 task, in: CLEF2019 Working Notes, volume 2380 of
     CEUR Workshop Proceedings, CEUR-WS.org, 2019.
[10] J. Chamberlain, A. Campello, J. P. Wright, L. G. Clift, A. Clark, A. García Seco de Herrera,
     Overview of the ImageCLEFcoral 2020 task: Automated coral reef image annotation, in:
     CLEF2020 Working Notes, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org,
     2020.
[11] J. Chamberlain, A. García Seco de Herrera, A. Campello, A. Clark, T. A. Oliver, H. Mous-
     tahfid, Overview of the ImageCLEFcoral 2021 task: Coral reef image annotation of a 3D
     environment, in: CLEF2021 Working Notes, CEUR Workshop Proceedings, CEUR-WS.org,
     Bucharest, Romania, 2021.
[12] M. Arendt, J. Rückert, R. Brüngel, C. Brumann, C. Friedrich, The effects of colour enhance-
     ment and IoU optimisation on object detection and segmentation of coral reef structures,
     in: CLEF2020 Working Notes, volume 2696 of CEUR Workshop Proceedings, CEUR-WS.org,
     2020.
[13] L.-C. Chen, G. Papandreou, F. Schroff, H. Adam, Rethinking Atrous Convolution for
     Semantic Image Segmentation, 2017. arXiv:1706.05587.
[14] A. Abdul Ghani, N. Mat Isa, Underwater image quality enhancement through composition
     of dual-intensity images and Rayleigh-stretching, SpringerPlus 3 (2014) 757.
[15] A. Abdul Ghani, N. Mat Isa, Underwater image quality enhancement through integrated
     color model with Rayleigh distribution, Applied Soft Computing 27 (2014) 219–230.
[16] W. Xiang, P. Yang, S. Wang, B. Xu, H. Liu, Underwater image enhancement based on red
     channel weighted compensation and gamma correction model, Opto-Electronic Advances
     1 (2018) 180024.