<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The effects of colour enhancement and IoU optimisation on object detection and segmentation of coral reef structures</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Applied Sciences and Arts Dortmund</institution>
          ,
          <addr-line>Emil-Figge-Str. 42, 44227 Dortmund</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Medical Informatics, Biometry and Epidemiology (IMIBE), University Hospital Essen</institution>
          ,
          <addr-line>Essen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Kairos GmbH</institution>
          ,
          <addr-line>Bochum</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper considers approaches used to localise and annotate coral reef structures in underwater images. Besides the actual localisation and annotation, the focus lay on image pre-processing and its evaluation. Underwater images differ from terrestrial images in illumination, acuity and colour, which makes them more blurred and gives them a green and blue cast. To enhance these physical properties, Image Blurriness and Light Absorption (IBLA) with additional Rayleigh optimisation or additional colour reduction was used. Afterwards, Mask R-CNN was used for both competition tasks, involving on-the-fly data augmentation and oversampling to combat the coral class imbalances. Several types of post-processing were applied to the generated boxes and polygons, mostly to account for the evaluation methodologies. IBLA and Rayleigh pre-processing improved accuracy for the localisation and annotation task, while colour reduction led to overall worse results than the original images; oversampling led to an even worse mean Average Precision (mAP) and only a slightly better average accuracy. For pixel-wise parsing, IBLA achieved a better mAP score but worse accuracy, and Rayleigh achieved worse results for both mAP and accuracy. Colour reduction worked well, and oversampling reduced mAP but strongly improved average accuracy. Concluding, image pre-processing (in particular IBLA and Rayleigh) improved accuracy for both tasks and only achieved better mAP on the pixel-wise parsing task. In future work, the results could be improved by using larger images, trying other types of oversampling, and training separate models for different classes and object sizes.</p>
      </abstract>
      <kwd-group>
        <kwd>underwater colour correction</kwd>
        <kwd>box optimisation</kwd>
        <kwd>Mask R-CNN</kwd>
        <kwd>deep learning</kwd>
        <kwd>Jaccard index</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>This paper considers the results of the 2020 ImageCLEFcoral challenge, which took place for the second time [3]. The main focus lies on automatic detection and classification of coral reef structures in images. ImageCLEFcoral is a subtask of ImageCLEF [7]. Underwater images differ from terrestrial images in terms of colour, illumination and acuity, which can cause problems in automatic detection. Nevertheless, automatic detection and classification is needed, as manual detection is cost and time intensive [15]. To address the issues with underwater images, the approaches focused on image pre-processing to enhance image structure, illumination and colour, and on testing the effect of these steps on detection and segmentation.</p>
      <p>After actual training of the models, several types of post-processing were
applied to the generated boxes and polygons, mostly to account for the
evaluation methodologies. For this, generated polygons were validated and boxes were
shrunk to achieve a better Jaccard index, also known as Intersection over Union
(IoU).</p>
      <p>The paper is organised into sections on data set description and pre-processing, showing the main issues with underwater images, followed by annotation and localisation (object detection, task 1) and pixel-wise parsing (segmentation, task 2). The results and discussion build the main part and are completed by the conclusion. All scripts used during this project are available in a GitHub repository4.</p>
    </sec>
    <sec id="sec-2">
      <title>Image and annotation data</title>
      <p>All details about the task and data can be found in the task overview paper [3]. The provided version 4 of the data set consists of n = 440 images in the development part (training set) and m = 400 images in the test part. All images are provided in the Joint Photographic Experts Group (JPEG) format with a resolution of 4032 × 3024 px. An annotation file for the development part holds k = 12,082 annotations, structured in eight variables:
image_id: Data set-wide unique image identifier, equal to the image filename.
substrate: Image identifier-wide unique substrate index, starting at 0.
c_class: Name of one of the 13 present coral classes.
confidence: Confidence of annotation, always 1 for 100% confidence.
x_min: Minimum x-axis bounding box coordinate5.
y_max: Maximum y-axis bounding box coordinate5.
x_max: Maximum x-axis bounding box coordinate5.
y_min: Minimum y-axis bounding box coordinate5.</p>
      <p>In the following, the results of explorative analyses on annotation data and
applied image pre-processing methods are presented.</p>
      <sec id="sec-2-1">
        <title>4 https://github.com/saviola777/fhdo-imageclef2020-coral/, accessed 2020-07-07</title>
        <p>5 On the image level the coordinate system origin (x_min = 0, y_min = 0) is located in the upper left corner.</p>
        <sec id="sec-2-1-1">
          <title>Explorative data analyses on annotations</title>
          <p>Explorative analyses on annotation data of the training set were conducted using
the statistical language R6 [12] in version 4.0.1 and the integrated development
environment RStudio7 [14] in version 1.2.5033. Spatial analysis of substrate
bounding boxes was performed using the Simple Features for R (sf) package8
[9] in version 0.9-2.</p>
          <p>During a first screening, minor inconsistencies were identified. These comprise (i) a single negative coordinate value of x_min = −1 present in one row (substrate 10 in 2018 0714 112502 024), and (ii) five dot-sized bounding boxes with x_min = x_max and y_min = y_max (substrate 17 in 2018 0714 112534 047; substrate 14 in 2018 0714 112535 042; substrate 1 in 2018 0714 112535 050; substrate 21 in 2018 0729 112613 064; substrate 5 in 2018 0729 112458 039). For the negative coordinate the respective substrate was checked manually on the respective image; a sign flip was performed and its bounding box was kept. Dot-sized bounding boxes were removed. Cleansing of annotation data resulted in a total of k = 12,077 entries, still related to n = 440 images. Presented results are based on the cleansed annotations.</p>
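          <p>The cleansing itself is easy to reproduce. The original analysis was done in R; the following is a minimal sketch of the same two rules in Python with pandas, assuming the annotation file is available as a CSV named annotations.csv and that the column names match the variables listed above.</p>
          <preformat>
import pandas as pd

# Assumed file and column names (see the variable list above).
ann = pd.read_csv("annotations.csv")

# (i) Repair the single negative coordinate by a sign flip.
ann["x_min"] = ann["x_min"].abs()

# (ii) Drop the dot-sized bounding boxes (x_min == x_max and y_min == y_max).
dot_sized = (ann["x_min"] == ann["x_max"]) & (ann["y_min"] == ann["y_max"])
ann = ann[~dot_sized].reset_index(drop=True)

print(len(ann), "annotations kept")   # 12,077 entries after cleansing
          </preformat>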
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>6 https://www.r-project.org/, accessed 2020-07-09</title>
      </sec>
      <sec id="sec-2-3">
        <title>7 https://rstudio.com/, accessed 2020-07-09</title>
      </sec>
      <sec id="sec-2-4">
        <title>8 https://github.com/r-spatial/sf, accessed 2020-07-09</title>
        <p>Annotations comprise 13 substrate classes, listed in Table 1 together with their frequencies, overall percentages, presence in images and the maximum count they occurred with in an image. The top two classes in regard to their frequency account for 60.1% of all annotations while the top five classes account for 92.1%. Soft Coral is the most common class and represents 47.2% of all annotations.</p>
        <p>Statistics on substrate bounding box per-image frequency, aspect ratio (x:y) and area (px) are listed in Table 2. The substrate density in images varies strongly. While 2018 0712 073801 116 is the only image with a single substrate, 2018 0712 073920 154 shows the maximum of 96 substrates in a single image. The median number of substrates in an image is 24; rather few images contain a vast amount. The aspect ratio of substrate bounding boxes also shows a wide span, with a minimum of 0.12 and a maximum of 8.51. Highly elongated bounding boxes are present; however, a median of 1.08 and an interquartile range of 0.30 suggest a moderate elongation in most cases. The areas of substrate bounding boxes also show a wide span, with partially extreme low and high values. For a better understanding, square area values are discussed, assuming substrate bounding boxes with an aspect ratio of 1:1. Here, the minimum area is 12.73 px² while also a maximum of 3,249.23 px² is present. The median square area is 241.94 px².</p>
        <p>The spatial analysis of substrate bounding boxes revealed a notable amount of overlaps, where up to five boxes shared an intersecting area. The most common overlap scenarios involve two and three substrate bounding boxes. For two substrate bounding boxes the mean (median) is 14.90 (12) overlaps per image, for three substrate bounding boxes it is 2.46 (3). A maximum of up to 81 overlaps between two and up to 32 between three substrate bounding boxes indicates numerous redundancies of annotated areas for several images. Overlaps between four and five substrate bounding boxes are rare. Full overlaps between substrate bounding boxes have also been found, where a bounding box of a substrate fully covers that of another. For intra-class overlaps 146 full overlaps have been identified, while for inter-class overlaps 509 were found. Especially inter-class overlaps and full overlaps may present a challenging condition for object detection tasks.</p>
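        <p>The overlap statistics were computed with the sf package in R. For illustration, a minimal Python sketch with shapely (an assumption, not the original analysis code) that counts pairwise and full overlaps among the bounding boxes of one image could look like this.</p>
        <preformat>
from itertools import combinations
from shapely.geometry import box

def count_overlaps(bboxes):
    """Count intersecting pairs and full overlaps among (x_min, y_min, x_max, y_max) boxes."""
    rects = [box(x0, y0, x1, y1) for x0, y0, x1, y1 in bboxes]
    overlaps, full_overlaps = 0, 0
    for a, b in combinations(rects, 2):
        if a.intersection(b).area > 0:
            overlaps += 1
            # Full overlap: one bounding box completely covers the other.
            if a.contains(b) or b.contains(a):
                full_overlaps += 1
    return overlaps, full_overlaps
        </preformat>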
        <sec id="sec-2-4-1">
          <title>Pre-processing</title>
          <p>Underwater images differ from other images in their physical properties. The deeper the images were taken, the darker they get; in addition, red light is absorbed more strongly than green and blue light. This often results in blurred images with a green and blue cast [10].</p>
          <p>The idea was to enhance image quality prior to segmentation and parsing, which should lead to enhanced segmentation and parsing results. The best pre-processing steps were chosen by visual inspection. The pre-processing functions [16] that have been used are described in the following sections. The images were processed first by Image Blurriness and Light Absorption (IBLA), followed by either a transformation towards a Rayleigh distribution or an octree colour reduction. Figure 1 (a) and (b) are examples showing the problems with underwater images described above.</p>
          <p>To visualise the colour distribution, normalised histograms were computed using the NumPy9 package. Figure 2 shows the histograms of images 2018 0714 112438 016 and 2018 0729 112414 024. Clearly visible is either the high intensity of the green channel or the high intensity of the blue channel, in combination with the very low intensity of the red channel typical for underwater images.</p>
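          <p>The histograms can be reproduced with a few lines of NumPy. The sketch below is an assumption of how this could be done (loading via OpenCV and the example file name are not taken from the original scripts); it computes a density-normalised 256-bin histogram per colour channel.</p>
          <preformat>
import cv2
import numpy as np

def channel_histograms(path):
    """Return density-normalised 256-bin histograms for the R, G and B channels."""
    bgr = cv2.imread(path)                       # OpenCV loads images in BGR order
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)
    hists = {}
    for i, name in enumerate(("red", "green", "blue")):
        hist, _ = np.histogram(rgb[..., i], bins=256, range=(0, 256), density=True)
        hists[name] = hist
    return hists

# Example (hypothetical file name): histograms of one of the images shown in Figure 2.
# h = channel_histograms("2018_0714_112438_016.jpg")
          </preformat>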
          <p>Image Blurriness and Light Absorption (IBLA). Underwater image restoration based on IBLA was conducted on both the training and test set [10,16]. The IBLA transformation is based on four main steps. First, the image blurriness is analysed and a smoothed and refined blurriness map is generated to optimise the image. Second, the background light pixels are estimated from image blurriness and variance via a quad-tree algorithm. Third, the actual enhancement uses depth estimation based on light absorption and blurriness, which results in an optimised depth map. Last, the transmission map is estimated, leading to restoration rather than estimation [10]. The results are shown in Figure 1c and Figure 1d.</p>
          <p>Enhancement based on Rayleigh distribution. The method for image enhancement with a Rayleigh distribution is separated into two main steps [5]. First, the contrast is corrected and second, the colour is corrected. For contrast correction, a global histogram stretching is implemented, followed by a division into a lower and an upper side at the average point. Both parts are then Rayleigh-stretched to the full gray-scale range from 0 to 255 and recombined. For colour enhancement the image is transformed into the Hue, Saturation, and Value (HSV) colour model. The saturation and value levels are stretched and the image is converted back to the Red, Green, Blue (RGB) colour space. This led to an enhancement in contrast and details and reduced image artefacts [5,16]. Results are shown in Figure 1e and Figure 1f.
9 https://numpy.org/, accessed 2020-07-10</p>
          <p>Fig. 1: Comparison of the original images 2018 0714 112438 016 (a) and 2018 0729 112414 024 (b) from the training set with their three different transformations (c)–(h): (c), (d) IBLA transformed; (e), (f) Rayleigh transformed; (g), (h) colour reduced. (a) and (b) contain the main problems of underwater images: they are blurry and their histograms are strongly shifted towards green or blue. The original images (a) and (b) are under copyright of the organisers [3,7].</p>
          <p>Fig. 2: Normalised colour-channel histograms (density over intensity) of images 2018 0714 112438 016 (a) and 2018 0729 112414 024 (b).</p>
          <p>Colour reduction. The colour reduction was conducted using the octree process, reducing all IBLA-transformed images to a maximum of 256 colours, as implemented in the code linked in footnote 10. The octree colour reduction (for instance described in [2, p. 333 sqq.]) results in an image with 256 colours and a harmonised colour distribution [2]. The results are shown in Figure 1g and Figure 1h.</p>
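          <p>The submissions used the octree quantizer linked in footnote 10. As a stand-in, a minimal sketch with Pillow's built-in fast-octree quantisation (an assumption, not the cited implementation) reduces an IBLA-transformed image to at most 256 colours.</p>
          <preformat>
from PIL import Image

def reduce_colours(in_path, out_path, colours=256):
    """Quantise an image to at most `colours` colours using Pillow's fast octree method."""
    img = Image.open(in_path).convert("RGB")
    quantised = img.quantize(colors=colours, method=Image.FASTOCTREE)
    quantised.convert("RGB").save(out_path)

# Hypothetical file names:
# reduce_colours("image_ibla.jpg", "image_ibla_256colours.png")
          </preformat>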
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Methods</title>
      <p>Mask R-CNN [6] is an instance segmentation framework which extends Faster R-CNN [13] with a parallel branch for instance segmentation on regions of interest.</p>
      <p>The models described in this paper were trained on a Mask R-CNN implementation using TensorFlow and Keras in Python 3 (footnote 11), which was patched to support TensorFlow 2.1 (footnote 12). All models used weights pre-trained on the MS COCO data set [8].</p>
      <p>To speed up on-the-fly pre-processing and avoid padding, all images were resized to 1536 × 1536 beforehand.
10 https://github.com/delimitry/octree_color_quantizer, accessed 2020-07-07
11 Abdulla, W.: Mask R-CNN for object detection and instance segmentation on Keras and TensorFlow. https://github.com/matterport/Mask_RCNN, accessed 2020-07-07
12 https://github.com/DiffPro-ML/Mask_RCNN, accessed 2020-07-07</p>
      <p>The training was split into two phases: first, only the newly added layers were trained for one epoch. Second, the complete network was trained for the remaining epochs. Then, Polyak averaging [11] was performed on the top five models based on their .632 error [4]. For the submission models, early stopping was used based on the average epoch number of the top five models during cross-validation training.</p>
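      <p>Polyak averaging over the top five checkpoints amounts to a plain average of their weight tensors. A minimal Keras sketch, assuming the five selected checkpoints have already been loaded into models of identical architecture:</p>
      <preformat>
import numpy as np

def average_weights(models):
    """Average the weights of Keras models with identical architecture (Polyak averaging)."""
    weight_sets = [m.get_weights() for m in models]
    return [np.mean(layer_weights, axis=0) for layer_weights in zip(*weight_sets)]

# top5 is assumed to hold the five best models selected by their .632 error;
# the averaged weights are then loaded into a fresh model of the same architecture.
# averaged_model.set_weights(average_weights(top5))
      </preformat>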
      <p>For on-the-fly data augmentation, the images were randomly rotated (up to 180° in each direction), flipped (up/down or left/right) with a 33% chance each, and blur (0 to 5), sharpen, random crop (up to 20% on each side), Gaussian noise, brightness, hue/saturation, and contrast were varied.</p>
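      <p>The Mask R-CNN implementation referenced in footnote 11 accepts an imgaug augmenter for on-the-fly augmentation. The following sketch mirrors the operations listed above; the exact probabilities and parameter ranges are assumptions, not the values used for the submissions.</p>
      <preformat>
import imgaug.augmenters as iaa

augmentation = iaa.Sequential([
    iaa.Fliplr(0.33),                                   # left/right flip with 33% chance
    iaa.Flipud(0.33),                                   # up/down flip with 33% chance
    iaa.Affine(rotate=(-180, 180)),                     # rotation up to 180° in each direction
    iaa.Sometimes(0.5, iaa.Crop(percent=(0, 0.2))),     # random crop up to 20% on each side
    iaa.Sometimes(0.5, iaa.GaussianBlur(sigma=(0, 5))),
    iaa.Sometimes(0.5, iaa.Sharpen(alpha=(0, 0.5))),
    iaa.Sometimes(0.5, iaa.AdditiveGaussianNoise(scale=(0, 10))),
    iaa.Sometimes(0.5, iaa.Multiply((0.8, 1.2))),       # brightness
    iaa.Sometimes(0.5, iaa.AddToHueAndSaturation((-20, 20))),
    iaa.Sometimes(0.5, iaa.LinearContrast((0.8, 1.2))),
])

# model.train(..., augmentation=augmentation)
      </preformat>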
      <p>To combat the class imbalance in the data set, oversampling was performed, which entails iterative optimisation of the Shannon entropy of the data set by adding images until the number of images is tripled, with constraints on the number of times a single image can appear in the final data set.</p>
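      <p>A minimal sketch of this oversampling idea: greedily duplicate the image whose addition maximises the Shannon entropy of the class distribution until the number of images is tripled. The per-image cap and the scoring details are simplified assumptions.</p>
      <preformat>
import numpy as np

def shannon_entropy(class_counts):
    p = class_counts / class_counts.sum()
    p = p[p > 0]
    return -(p * np.log2(p)).sum()

def oversample(image_class_counts, max_copies=5):
    """image_class_counts: dict image_id -> per-class annotation counts (NumPy array)."""
    ids = list(image_class_counts)
    counts = np.sum(list(image_class_counts.values()), axis=0).astype(float)
    dataset, copies = list(ids), {i: 1 for i in ids}
    while len(dataset) < 3 * len(ids):               # stop once the data set is tripled
        best_id, best_entropy = None, -np.inf
        for i in ids:
            if copies[i] >= max_copies:              # cap repetitions of a single image
                continue
            e = shannon_entropy(counts + image_class_counts[i])
            if e > best_entropy:
                best_id, best_entropy = i, e
        if best_id is None:
            break
        dataset.append(best_id)
        copies[best_id] += 1
        counts += image_class_counts[best_id]
    return dataset
      </preformat>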
      <p>Restricting training to the five most frequent classes was also considered due to the imbalance of the data set and achieved good results in our cross-validation runs.</p>
      <p>Table 3 lists the most important parameters used for the different training runs. The training run parameters are listed in Table 4. It includes the submission ID, the run name, which data set was used (original images, IBLA pre-processing, IBLA plus Rayleigh pre-processing, colour reduced, see Section 2.2), whether on-the-fly data augmentation as described above was applied, whether images of size 1024 × 1024 or 1536 × 1536 were used, whether oversampling was used, as well as the number of epochs. The focus of the approach described in this paper was on the annotation and localisation task, and models were trained using the bounding box annotations and optimised against the PASCAL Visual Object Classes (VOC)-style mean Average Precision (mAP) implementation in Mask R-CNN [6].</p>
      <p>The different configurations that were analysed (as seen in Table 4) included different data sets based on the pre-processing described in Section 2.2, different levels of on-the-fly data augmentation, two different image sizes, oversampling, as well as training models for varying numbers of epochs. For the second task, similar approaches as for the annotation and localisation task were applied, only using the polygon annotations for training. The colour reduction run was skipped due to its poor results for the first task; instead, a number of different runs with larger images were included to see the results of more and less training as well as a lower confidence threshold. When the evaluation code for the challenge was published and it turned out that the submitted bounding boxes would be evaluated against the polygons instead of the bounding box annotations, it became clear that training with the polygon annotations would be much more effective. For example, this led to an evaluation F1 score of 0.8 for the bounding box training ground truth, whereas the score was 0.99 for the polygon training ground truth. The loss of 0.01 was due to invalid polygons in the ground truth annotations.</p>
      <p>This value can be increased from 0.8 to 0.9 simply by reducing the size of the bounding boxes by 7.5% on each side, as seen in Figure 3. This post-processing step was therefore used for all submissions of the first task.</p>
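      <p>The reduction is a plain affine shrink of each predicted box towards its centre; a minimal helper (coordinate convention with the origin in the upper left corner, as in Section 2):</p>
      <preformat>
def shrink_box(x_min, y_min, x_max, y_max, factor=0.075):
    """Shrink a bounding box by `factor` of its width and height on each side."""
    dx = (x_max - x_min) * factor
    dy = (y_max - y_min) * factor
    return x_min + dx, y_min + dy, x_max - dx, y_max - dy

# In the example of Figure 3, this step raised the IoU from 0.534 to 0.593.
      </preformat>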
      <p>Additionally, an iterative algorithm was created to approximate the best possible rectangular box for a given polygon according to its IoU with the polygon. This algorithm is described in the next section. To make use of this algorithm, a model was trained on the polygon annotations and the resulting polygons were used to generate boxes.</p>
      <p>For the second task, the polygons generated from binary masks by OpenCV13 [1] were validated against the shapely library14, which was used in the evaluation script, since about 1% of the generated polygons were not valid according to the shapely library's definition of a valid polygon and would be ignored by the evaluation script. A valid polygon may not cross itself and may only touch itself at a single point.</p>
      <p>To clean up invalid polygons, duplicate points were removed first, then the polygons were split into several separate polygons at the touching / self-crossing points and the biggest polygon was kept. Separately, the buffer function provided by the shapely library was used to generate a valid polygon. Then the polygon with the overall least absolute area difference compared to the original, invalid polygon was used.</p>
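      <p>A condensed sketch of this repair step with shapely is shown below. It is a simplification of the procedure described above: buffer(0) is used both to split a self-crossing polygon into valid parts (of which the biggest is kept) and as the buffer-based candidate, and the candidate closest in area to the original contour is returned.</p>
      <preformat>
from shapely.geometry import MultiPolygon, Polygon

def repair_polygon(points):
    """Return a valid polygon close in area to the (possibly invalid) input contour."""
    # Remove consecutive duplicate points first.
    deduped = [p for i, p in enumerate(points) if i == 0 or p != points[i - 1]]
    original = Polygon(deduped)
    if original.is_valid:
        return original

    fixed = original.buffer(0)           # resolves self-crossings and touching points
    candidates = []
    if isinstance(fixed, MultiPolygon):
        candidates.append(max(fixed.geoms, key=lambda g: g.area))   # biggest part
    elif isinstance(fixed, Polygon) and not fixed.is_empty:
        candidates.append(fixed)
    if not candidates:
        return original                  # give up; the evaluation script would skip it

    # Keep the candidate with the least absolute area difference to the original.
    return min(candidates, key=lambda g: abs(g.area - original.area))
      </preformat>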
      <p>Fig. 3: IoU values for a detected polygon. The minimum bounding box shows an IoU of 0.534. After reducing its size by 7.5% in all dimensions, the IoU increased to 0.593, as the dotted box shows. Applying the iterative optimisation algorithm led to the highest IoU of 0.73.
13 https://opencv.org/, accessed 2020-07-10
14 https://github.com/Toblerity/Shapely, accessed 2020-07-09</p>
      <p>Bounding box IoU optimisation. Given a solid polygon P, defined by a contour as a set of points, the corresponding minimal enclosing rectangle can be defined by four parameters: R0 := [x, y, w, h]. Here x and y define the rectangle's top left corner and w and h its width and height, respectively. Calculating the IoU of the polygon with this rectangle showed that it is not necessarily the best achievable value, as can be seen in Figure 3. This became particularly clear for thin, long polygon arms that run parallel to the rectangle edges. In these cases a slight reduction of the corresponding side often led to a higher IoU. Starting from the minimum bounding box, however, an increase in edge length can only lead to an IoU deterioration. Therefore the polygon's minimum bounding box was chosen as the algorithm's starting point for maximising the IoU. The parameter space was thus defined by the rectangle's parameters.</p>
      <p>The optimised objective function was IoU(P, R) = |P ∩ R| / |P ∪ R| for the given polygon P and the current rectangle R_k at optimisation step k. To maximise the target function, R was iteratively changed in all parameters via translation and scaling, and the best resulting rectangle was used for the next iteration. This process was continued until no further improvement of the objective function was achieved; the rectangle was then accepted as optimised. During the optimisation process, the step sizes for translation, shrinkage and growth are given by t, s and g. In this application all values were set to four. Each optimisation iteration was then performed by calculating:</p>
      <p>R_{k+1}^{T←} = R_k + [−t, 0, 0, 0]   R_{k+1}^{T→} = R_k + [t, 0, 0, 0]
R_{k+1}^{T↑} = R_k + [0, −t, 0, 0]   R_{k+1}^{T↓} = R_k + [0, t, 0, 0]   (1)</p>
      <p>R_{k+1}^{S←} = R_k + [s, 0, −s, 0]   R_{k+1}^{S→} = R_k + [0, 0, −s, 0]
R_{k+1}^{S↑} = R_k + [0, s, 0, −s]   R_{k+1}^{S↓} = R_k + [0, 0, 0, −s]   (2)</p>
      <p>R_{k+1}^{G←} = R_k + [−g, 0, g, 0]   R_{k+1}^{G→} = R_k + [0, 0, g, 0]
R_{k+1}^{G↑} = R_k + [0, −g, 0, g]   R_{k+1}^{G↓} = R_k + [0, 0, 0, g]   (3)</p>
      <p>As can be seen in Equation 1, the rectangle movement was performed in every possible direction. It should be noted that the rectangle never exceeds the dimensions of the initial bounding box. Equation 2 shows the corresponding contraction of the rectangle, while Equation 3 calculates its enlargement. Once all possible rectangles Rk+1 were calculated, the corresponding objective function with P was evaluated, so that the corresponding improvement factor was known. Subsequently, the operation with the most promising improvement factor was performed by setting the corresponding rectangle as the new Rk, thus setting the starting point for the next iteration. As soon as no rectangle showed an improvement, Rk was no longer updated and was assumed to be optimised with regard to the objective function. An example of an optimised rectangle is shown in Figure 3.</p>
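      <p>A compact sketch of this optimisation loop using shapely geometries is given below, with step sizes t = s = g = 4 as above. The constraint that the rectangle never exceeds the initial bounding box is only noted in a comment rather than enforced.</p>
      <preformat>
from shapely.geometry import box

def iou(polygon, rect):
    union = polygon.union(rect).area
    return polygon.intersection(rect).area / union if union > 0 else 0.0

def optimise_box(polygon, t=4, s=4, g=4):
    """Translate/shrink/grow the minimum bounding box of `polygon` to maximise its IoU."""
    x0, y0, x1, y1 = polygon.bounds                  # start from the minimum bounding box
    x, y, w, h = x0, y0, x1 - x0, y1 - y0
    moves = [(-t, 0, 0, 0), (0, -t, 0, 0), (t, 0, 0, 0), (0, t, 0, 0),    # translations (1)
             (s, 0, -s, 0), (0, s, 0, -s), (0, 0, -s, 0), (0, 0, 0, -s),  # contractions (2)
             (-g, 0, g, 0), (0, -g, 0, g), (0, 0, g, 0), (0, 0, 0, g)]    # enlargements (3)
    best = iou(polygon, box(x, y, x + w, y + h))
    improved = True
    while improved:
        improved, candidate = False, None
        for dx, dy, dw, dh in moves:                 # evaluate all twelve operations
            nx, ny, nw, nh = x + dx, y + dy, w + dw, h + dh
            if nw <= 0 or nh <= 0:                   # (growth should additionally be clipped
                continue                             #  to the initial bounding box; omitted)
            score = iou(polygon, box(nx, ny, nx + nw, ny + nh))
            if score > best:
                best, candidate, improved = score, (nx, ny, nw, nh), True
        if improved:                                 # apply the most promising operation
            x, y, w, h = candidate
    return (x, y, w, h), best
      </preformat>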
    </sec>
    <sec id="sec-4">
      <title>Results and Discussion</title>
      <p>For the image annotation and localisation task (see Table 5), on-the-fly data augmentation expectedly reduced over-fitting and led to better results for both mAP and average accuracy. Pre-processing using IBLA and Rayleigh led to lower mAP values but increased average accuracy, while colour reduction produced overall worse results compared to the original images. The Rayleigh run had the highest average accuracy of all runs submitted for the first task, followed by the IBLA and augmentation runs, showing that while the models do not detect objects as accurately as some of the competitors, they classify the detected objects very well.
Table 5 lists the submitted runs: 67914 Baseline, 67919 5 classes, 68188 Augmentation, 68187 IBLA, 68186 IBLA + Rayleigh, 68184 Colour reduction, 68185 Oversampling, 68183 Larger images, 68182 Segmentation, 68181 Segmentation + IoU optimisation.</p>
      <p>Considering only the 5 most frequent classes achieved worse results than the
baseline in both mAP and accuracy, unlike in the cross-validation runs, where
it achieved the best F1 score on the validation data.</p>
      <p>Surprisingly, oversampling led to even worse mAP results than the baseline model, while having only a slightly better average accuracy. In our cross-validation runs, on the other hand, oversampling achieved results on par with the data augmentation run. This may be due to a slightly different class distribution in the test data set, or too many epochs of training for the submission models.</p>
      <p>Using larger images led to better mAP scores but worse average accuracy.
Using polygon annotations for training clearly improved the mAP, but did not
improve the average accuracy.</p>
      <p>Looking at the per-substrate accuracies, the IoU-optimised run produced the worst results, with seven classes having 0% accuracy. The run without IoU optimisation and the run with larger images still have six classes without any correct detections, meaning that all runs with larger images have trouble with the less frequent classes. The baseline, for comparison, produced no correct detections for five classes. The oversampling run only has three classes without any correct detections, but three more classes with accuracies below 0.05.</p>
      <p>The runs without oversampling performed better: each of them produced correct detections for all but one class, which was the class that had 0% accuracy across all submitted runs of all participants and was represented by fewer than 30 instances in the training data set.</p>
      <p>The evaluation code was released only several weeks before the submission deadline for the models, and it produced very different results than the PASCAL VOC-style mAP evaluation recommended in the task's description and implemented in Mask R-CNN.</p>
      <p>The evaluation strategy was mainly the same for both tasks. The focus of this work would have been much more on the second task and on training instance segmentation models if the evaluation code had been published earlier. It is easier and more effective to generate suitable boxes from polygons than to guess suitable boxes based on the bounding box or to just use the bounding box itself.</p>
      <p>In effect, a model which predicts perfect bounding boxes would never be able to exceed an mAP score of about 0.8, because some objects are shaped in a way that produces bounding boxes with a very low IoU relative to the annotated polygon. This effect was demonstrated by the bounding box refinement and IoU optimisation, which led to a significant improvement in mAP score.</p>
      <p>For the image pixel-wise parsing task (see Table 6), IBLA achieved a better mAP score than the original data set, but a worse average accuracy than the baseline. Rayleigh performed even worse, with an mAP score similar to the baseline but a worse average accuracy.</p>
      <p>Oversampling once again reduced the mAP score, but strongly improved the average accuracy. Similar to the first task, using larger images increased the mAP score but reduced the average accuracy.</p>
      <p>Using larger images and the data set with reduced colours further improved
the mAP score, reaching the overall best value for the second task among the
submitted runs but produced a much lower average accuracy.</p>
      <p>Training for more or fewer epochs led to overall worse results. Reducing the confidence threshold led to a very low mAP score but slightly improved average accuracy, and once again resulted in the highest average accuracy out of all runs submitted for the second task.</p>
      <p>Looking at the overall and per-substrate accuracies, the models with larger images interestingly had better accuracies than the ones with smaller images, unlike in the first task, where it was the other way around. Excluding the run with a 0.4 minimum detection threshold, which achieved the highest overall accuracy with a much lower mAP value, the oversampling run had the best accuracy.</p>
      <p>The runs without oversampling achieved comparatively worse results, with four to five classes having no correct detections, whereas all other runs had only the one class without correct detections.</p>
      <p>Post-processing as described in Section 2.1 did not achieve the expected impact on classification quality. Hence, those results are not shown, as they were not finally implemented.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>Concluding, image pre-processing using IBLA and Rayleigh improved accuracy for the localisation and annotation task while achieving better mAP on the pixel-wise parsing task. Colour reduction worked well with larger images for the second task in terms of mAP, but falls behind in accuracy.</p>
      <p>Oversampling was overall not successful, even though it led to better accuracy in the second task. This outcome was not reflected in the cross-validation analysis on the training data; as a consequence, oversampling was used in the majority of submitted runs, which decreased performance, especially for the runs with larger images.</p>
      <p>Larger images performed worse in early runs that were not properly fine-tuned, and hence enlargement was not considered in most of the further analysis. All runs with larger images used oversampling, which most likely hurt the model performance. A stronger focus on larger images would have been useful, since the results are promising at least for the second task.</p>
      <p>Nevertheless, this work has produced competitive models, especially in terms of classification accuracy, that could be improved in the future, for example by using larger images, trying other types of oversampling, or training separate models for different classes or object sizes.</p>
      <p>3. Chamberlain, J., Campello, A., Wright, J.P., Clift, L.G., Clark, A., García Seco de Herrera, A.: Overview of the ImageCLEFcoral 2020 Task: Automated Coral Reef Image Annotation. In: CLEF2020 Working Notes. CEUR Workshop Proceedings, CEUR-WS.org (2020)
4. Efron, B., Tibshirani, R.: Improvements on cross-validation: the .632+ bootstrap method. Journal of the American Statistical Association 92(438), 548–560 (1997)
5. Ghani, A.S.A., Isa, N.A.M.: Underwater image quality enhancement through composition of dual-intensity images and Rayleigh-stretching. SpringerPlus 3(1) (2014). https://doi.org/10.1186/2193-1801-3-757
6. He, K., Gkioxari, G., Dollár, P., Girshick, R.: Mask R-CNN. In: 2017 IEEE International Conference on Computer Vision (ICCV). IEEE (2017). https://doi.org/10.1109/iccv.2017.322
7. Ionescu, B., Müller, H., Péteri, R., Abacha, A.B., Datla, V., Hasan, S.A., Demner-Fushman, D., Kozlovski, S., Liauchuk, V., Cid, Y.D., Kovalev, V., Pelka, O., Friedrich, C.M., de Herrera, A.G.S., Ninh, V.T., Le, T.K., Zhou, L., Piras, L., Riegler, M., Halvorsen, P., Tran, M.T., Lux, M., Gurrin, C., Dang-Nguyen, D.T., Chamberlain, J., Clark, A., Campello, A., Fichou, D., Berari, R., Brie, P., Dogariu, M., Stefan, L.D., Constantin, M.G.: Overview of the ImageCLEF 2020: Multimedia Retrieval in Lifelogging, Medical, Nature, and Internet Applications. In: Experimental IR Meets Multilinguality, Multimodality, and Interaction. Proceedings of the 11th International Conference of the CLEF Association (CLEF 2020), vol. 12260. LNCS Lecture Notes in Computer Science, Springer (2020)
8. Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.: Microsoft COCO: Common Objects in Context. In: Computer Vision – ECCV 2014. pp. 740–755. Springer International Publishing (2014). https://doi.org/10.1007/978-3-319-10602-1_48</p>
      <p>9. Pebesma, E.: Simple Features for R: Standardized Support for Spatial Vector Data. The R Journal 10(1), 439–446 (2018). https://doi.org/10.32614/RJ-2018-009
10. Peng, Y.T., Cosman, P.C.: Underwater Image Restoration Based on Image Blurriness and Light Absorption. IEEE Transactions on Image Processing 26(4), 1579–1594 (2017). https://doi.org/10.1109/tip.2017.2663846
11. Polyak, B.T., Juditsky, A.B.: Acceleration of stochastic approximation by averaging. SIAM Journal on Control and Optimization 30(4), 838–855 (1992)
12. R Core Team: R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria (2020), https://www.R-project.org/
13. Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. In: Advances in Neural Information Processing Systems. pp. 91–99 (2015)
      </p>
      <p>14. RStudio Team: RStudio: Integrated Development Environment for R. RStudio, Inc., Boston, MA (2019), https://www.rstudio.com/
15. Srividhya, K., Ramya, M.M.: Object classification in underwater images using adaptive fuzzy neural network. In: 2017 13th International Conference on Natural Computation, Fuzzy Systems and Knowledge Discovery (ICNC-FSKD). pp. 142–148. IEEE (2017). https://doi.org/10.1109/fskd.2017.8392973
16. Wang, Y., Song, W., Fortino, G., Qi, L.Z., Zhang, W., Liotta, A.: An Experimental-Based Review of Image Enhancement and Image Restoration Methods for Underwater Imaging. IEEE Access 7, 140233–140251 (2019). https://doi.org/10.1109/access.2019.2932130</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bradski</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The OpenCV Library</article-title>
          .
          <source>Dr. Dobb's Journal of Software Tools</source>
          <volume>25</volume>
          ,
          <issue>120</issue>
          –
          <fpage>125</fpage>
          (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Burger</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Burge</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <source>Digital Image Processing - An Algorithmic Introduction Using Java</source>
          . Springer, Berlin, Heidelberg, 2nd edn. (
          <year>2016</year>
          ). https://doi.org/10.1007/978-1-
          <fpage>4471</fpage>
          -6684-9
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>