    Automatic classification of coral images using
                colour and textures

          Cristina M.R. Caridade1[0000−0003−3667−5328] and André R.S.
                         Marcal2[0000−0002−8501−0974]
                   1
                    Coimbra Polytechnic - ISEC, Coimbra, Portugal
                                  caridade@isec.pt
               2
                 Faculdade de Ciências, Universidade do Porto, Portugal
                              andre.marcal@fc.up.pt



        Abstract. The purpose of this work is to address the ImageCLEF 2019
        coral challenge: to develop a system for the detection and identification
        of substrates in coral images. Initially, a revision of the 13 classes was
        carried out by identifying a number of sub-classes for some substrates.
        Four features were considered: three computed from the greyscale image
        (one intensity and two texture measures) and one related to the colour
        content. Breiman's random forest algorithm was used to classify the
        corals into one of the 13 classes defined. A classification accuracy of
        about 49% was obtained.

        Keywords: Image classification · classification methods · image processing.


1     Introduction

Coral reefs are large underwater structures composed of the skeletons of colonial
marine invertebrates called coral. The coral species that build reefs are known
as hermatypic, or "hard," corals because they extract calcium carbonate from
seawater to create a hard, durable exoskeleton that protects their soft, sac-like
bodies. Other species of corals that are not involved in reef building are known
as “soft” corals. These types of corals are flexible organisms often resembling
plants and trees and include species such as sea fans and sea whips, according to
the Coral Reef Alliance (CORAL), a non-profit environmental organization [1].
    Coral reefs support immense biodiversity and provide important ecosystem
services to many millions of people, yet they are degrading rapidly in response
to numerous anthropogenic drivers [2]. In fact, coral reefs are in danger of being
lost within the next 30 years, and with them the ecosystems they support [3].
This catastrophe will not only see the extinction of many marine species, but
also create a humanitarian crisis on a global scale for those who rely on reef
services. By monitoring the changes and composition of coral reefs, conservation
    Copyright © 2019 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0). CLEF 2019, 9-12 Septem-
    ber 2019, Lugano, Switzerland.
efforts can be better implemented and prioritised [1]. The ImageCLEF 2019
initiative addresses this issue by proposing a challenge based on the detection
and identification of substrates in coral images. The aim is to define a set of
bounding boxes around the substrates found and to identify the classes to which
they belong [4, 5].
    In this paper, we propose to solve this challenge using a fully automatic
process to identify coral substrates in digital images. The method developed
uses colour and texture to identify regions of interest, and Breiman's random
forest algorithm [6] to classify each coral into one of 13 classes. The digital
images used in this study are 3024 × 4032 pixels, in RGB (Red, Green and Blue)
format.


2   Data

The data for the ImageCLEF2019 Coral task originates from a growing, large-
scale collection of images taken from coral reefs around the world as part of a
coral reef monitoring project with the Marine Technology Research Unit at the
University of Essex [3]. Substrates of the same type can have very different
morphologies, colour variations and patterns. Some of the images contain a
white line (scientific measurement tape) that may occlude part of the entity.
The quality of the images is variable: some are blurry and some have poor
colour balance. This is representative of the Marine Technology Research Unit
dataset, and all images
are useful for data analysis. The images contain annotations of the following 13
types of substrates: Hard Coral – Branching, Hard Coral – Submassive, Hard
Coral – Boulder, Hard Coral – Encrusting, Hard Coral – Table, Hard Coral –
Foliose, Hard Coral – Mushroom, Soft Coral, Soft Coral – Gorgonian, Sponge,
Sponge – Barrel, Fire Coral – Millepora and Algae - Macro or Leaves [1, 4]. A
more detailed description of the dataset is presented in [1].
     The training set contains 240 images with 6430 annotated substrates. Two
files are provided with ground truth annotations:

 – one based on bounding boxes,
   "imageCLEFcoral2019 annotations training task 1";
 – and a more detailed annotation based on bounding polygons,
   "imageCLEFcoral2019 annotations training task 2".

The test set contains 200 images [1, 4].


3   Methodology

The methodology proposed to detect and identify substrates in coral images
is presented schematically in Figure 1. In a first phase, the training images are
processed and the regions that define each coral substrate are identified (Coral
substrates). Then, the features of each of these substrates (Coral features) are
used to train the classifier (Classification). In a second phase, the classifier is
applied to the test images, thus obtaining images of classified corals. These
images are post-processed, their corals (Connected components) are identified,
and a text file with the relevant information is produced.




                  Fig. 1. Diagram of the proposed methodology.



    The 13 types of substrates were identified in the 240 training images. A total
of 6430 substrate annotations are available, as listed in Table 1 and illustrated
in Figure 2. A colour is assigned to each substrate, presented in the second
column of Table 1. The number of occurrences of each substrate in the training
images is presented in the third column of Table 1.
    Figure 2 shows two training images and the number and type of substrates
identified in them. In the left image (2018 0714 11244 018), the following
substrates were identified: "hard coral branching" (4 times), "hard coral
boulder" (2 times), "hard coral encrusting" (3 times), "soft coral" (8 times),
"sponge" (7 times), "sponge barrel" (1 time) and "algae macro or leaves"
(3 times). For image 2018 0729 112525 048 (right): "hard coral branching" (4
times), "hard coral boulder" (4 times), "hard coral encrusting" (2 times), "soft
coral" (11 times) and "sponge barrel" (1 time).


3.1   Coral substrates

By visual observation of several training images, it was verified that within
the same substrate there are different types of corals, differing in shape, colour
and texture. Therefore, different types (sub-classes) were identified within each
substrate. Another difficulty in the analysis of these images is the overlap
between the substrates identified in the training image. For example, in the left
image of Figure 2, the substrate "sponge" (number 11, in green) is inside the
region defined by the substrate "hard coral boulder" (number 0, in pink).
                 Table 1. Substrates annotated in training images.




Fig. 2. Training images with bounding box of the substrates annotated in the respective
color.



    Table 2 shows the different types of corals identified in each substrate. The
types defined in this table reflect the different textures and shapes that each
substrate presents. For example, two different types were identified for hard
coral branching (first row), while six were identified for soft coral (row 7), since
the latter substrate has a huge variety of types. This procedure was performed
visually, by observing different images with different types of substrates.
    For each type of coral in each substrate, several replicates were identified
manually in different training images. These replicates were selected, where
possible, so that the substrate would fill the entire region defined. More
replicates were identified for the types of coral that appear more frequently.
              Table 2. The types of corals defined for each substrate.




3.2   Coral features

When we look at the coral images, we find colours ranging from green, blue
and red to orange, brown and white. However, these colours are not unique
identifiers of a substrate. Texture is also present in the substrates: some corals
are harder, rugged and sharper, while others are smoother and softer. Hue is
also a relevant characteristic of substrates.
    With the replicates identified, the 4 most relevant features (selected
empirically) were calculated in a 5×5 neighbourhood of each pixel: mean (M),
standard deviation (STD) and entropy (E) of the greyscale image, and hue ratio
(HR). The mean and the standard deviation of the neighbourhood describe the
spatial arrangement of intensities in a selected region of an image and are
defined in Equation 1 and Equation 2, respectively:

$$ M = \frac{\sum p}{N} \tag{1} $$

$$ STD = \sqrt{\frac{\sum (p - M)^2}{N - 1}}, \tag{2} $$
    where p is the intensity of the pixels and N is the neighbourhood size.
Entropy is a statistical measure of randomness that can be used to characterize
the image texture. Entropy (Equation 3) is defined as

$$ E = -\sum p \log_2(p), \tag{3} $$

    where p here denotes the probability of each grey level (normalized
histogram count) in the neighbourhood.
    The RGB image is converted to the HSV (Hue, Saturation, Value) colour
model. Only values of Hue (H) below 0.34 or above 0.73 are considered for this
feature. These threshold values were obtained by visual inspection of the
training images. The hue ratio (HR) in Equation 4 is then calculated by dividing
the number of pixels in the neighbourhood belonging to the range (H < 0.34 or
H > 0.73) by the total number of pixels in the neighbourhood region:

$$ HR = \frac{\#\text{pixels}(H < 0.34 \text{ or } H > 0.73)}{\#\text{pixels}}. \tag{4} $$
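    A minimal MATLAB sketch of this per-pixel feature computation, assuming
the Image Processing Toolbox is available (the function name coralFeatures and
the filtering details are illustrative, not the authors' code):

    function [M, STD, E, HR] = coralFeatures(rgb)
        % Four per-pixel features in a 5x5 neighbourhood (Equations 1-4).
        g = im2double(rgb2gray(rgb));           % greyscale intensity image
        w = ones(5);                            % 5x5 neighbourhood
        M   = imfilter(g, w/25, 'symmetric');   % local mean (Equation 1)
        STD = stdfilt(g, w);                    % local std. dev. (Equation 2)
        E   = entropyfilt(g, w);                % local entropy (Equation 3)
        hsv = rgb2hsv(rgb);                     % convert to HSV colour model
        hueMask = hsv(:,:,1) < 0.34 | hsv(:,:,1) > 0.73;    % hue thresholds
        HR  = imfilter(double(hueMask), w/25, 'symmetric'); % hue ratio (Eq. 4)
    end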
    Table 3 shows the types of corals defined for each substrate, the number of
replicates identified for each type and their features (mean, standard deviation,
entropy and hue ratio). For the hard coral branching substrate, the first and
second types were defined by 24 replicates, while for the algae macro or leaves
substrate only 8 replicates were defined for the types.
    The features are intended to characterize each type uniquely, distinguishing
it from the other types. Figure 3 shows the features of a test image
(2018 0712 073252 024). In the first row, the original image (a) is on the left,
the mean image (b) in the centre and the standard deviation (c) on the right.
In the second row, the entropy (d) is in the centre and the hue ratio (e) on the
right.


3.3   Classification method

Using the Classification Learner app available in the MATLAB environment [7],
it is possible to classify the training coral data using various algorithms and
compare the results in the same environment. After training, multiple models
were compared based on their validation errors. The classification models
available in this app are: decision trees, discriminant analysis, Support Vector
Machines (SVM), logistic regression, nearest neighbours, and ensemble
classifiers. With the data
                      Table 3. Feature values for the substrates.




Fig. 3. Original image (a) and its features: mean (b), standard deviation (c), entropy
(d) and hue ratio (e).


described previously and the 4 features used, the best model was random forest,
which obtained a classification accuracy of about 49%.
    Random forest consists of a collection of classifiers based on decision trees,
in which each tree casts a vote for the output prediction. Random forest
produces a model consisting of n decision trees (an ensemble), where each tree
is built from a number of randomly selected instances of the training set. Each
node of each tree is constructed from a random subset of the attributes. Upon
receiving a test instance, each tree votes on the class it belongs to, and the class
with the most votes is the class provided by the model. The most widely used
random forest algorithm is Breiman's algorithm [6].
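    As a minimal illustration, such a forest could be trained in MATLAB with
the TreeBagger class (a bagged decision tree ensemble following Breiman's
algorithm); the number of trees and the variable names below are assumptions,
not taken from the paper:

    % Each row of Xtrain holds the 4 features of one training pixel and
    % Ytrain the corresponding class code (0-12); 50 trees is an assumption.
    Xtrain = [Mr(:) STDr(:) Er(:) HRr(:)];
    forest = TreeBagger(50, Xtrain, Ytrain, 'Method', 'classification');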
    The application of the random forest classifier pixel by pixel to a test image
of 3024 × 4032 pixels has a large computational cost, since four 3024 × 4032
matrices are constructed for the 4 features of each image pixel, plus another
matrix of the same size with the confidence values. Moreover, as 200 test images
had to be classified, the initial images were reduced by a factor of 4 to make the
processing faster.
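    A hypothetical sketch of this step, reusing the coralFeatures function and
the forest model sketched above (the reduction factor follows the text; numeric
class codes are an assumption):

    % Reduce the test image by a factor of 4 and classify each pixel.
    I = imresize(imread('test.jpg'), 0.25);
    [M, STD, E, HR] = coralFeatures(I);
    X = [M(:) STD(:) E(:) HR(:)];
    [labels, scores] = predict(forest, X);           % per-pixel class votes
    labelMap = reshape(str2double(labels), size(M)); % class code per pixel
    confMap  = reshape(max(scores, [], 2), size(M)); % largest vote fraction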

3.4   Connected components
After classification, it is necessary to find connected components of pixels
belonging to the same substrate (area). As the average area of the substrates
identified in the training images is 43,865 pixels, in the classification of the
test images only areas with more than 500 pixels (in the reduced images, 20
percent of one sixteenth of the average substrate size: 20% × (43,865/16) ≈ 500)
are validated as substrate areas, provided the confidence value is greater than
or equal to 0.5.
For the regions identified in the images, it is necessary to collect the image and
class to which they belong, the confidence of the classification, and the position
of the bounding box that surrounds each region (x minimum, y minimum, width
and height). Finally, this information is written to a text file, one line per
detected substrate (an example is shown in Figure 6).
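    A hypothetical post-processing sketch for one class k, assuming the labelMap
and confMap arrays from the previous step (the thresholds follow the text; the
output format is illustrative, see Figure 6 for the actual one):

    mask = (labelMap == k);                      % pixels assigned to class k
    CC = bwconncomp(mask);                       % connected components
    stats = regionprops(CC, 'BoundingBox', 'Area');
    for i = 1:CC.NumObjects
        c = mean(confMap(CC.PixelIdxList{i}));   % mean confidence of region
        if stats(i).Area > 500 && c >= 0.5       % validation thresholds
            bb = round(stats(i).BoundingBox);    % [xmin ymin width height]
            fprintf(fid, '%s;%d;%.2f;%d;%d;%d;%d\n', imgName, k, c, bb);
        end
    end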




4     Results
The 200 test images were processed according to the methodology described in
Section 3. Figures 4 and 5 show the results obtained for two different test
images: Figure 4 for image 2018 0712 073252 024 and Figure 5 for image
2018 0712 073534 067. Each figure presents the original image (a) and the
classified images (b) and (c), where the pixels identified within a substrate are
marked. Black pixels were not classified because they failed one of the validation
criteria: minimum substrate area or confidence value (see Section 3.4). The
corals identified (d) show only the coral substrates present in the test image.
The colours shown in the images correspond to the colours assigned to each
substrate in Table 1. In the case of
Fig. 4. Original image (a), classified image in grey (b) and colour (c) representations,
and the corals identified (d).




Fig. 5. Original image (a), classified image in grey (b) and colour (c) representations,
and the corals identified (d).
Figure 4, the substrates identified are: "soft coral" (light blue), "sponge barrel"
(red), "sponge" (light green), "hard coral encrusting" (dark blue), "hard coral
boulder" (pink) and "hard coral mushroom" (white). With this classification,
the relevant information is exported to a text file. An example (for image
2018 0712 073252 024) is presented in Figure 6.




                       Fig. 6. Line information in the text file.




5   Conclusions

The methodology developed makes it possible to identify substrates of different
classes in digital images of corals. The preliminary results are not very
favourable, but there are many potential improvements that can be
implemented. The most promising lines of future work would be to focus on a
better identification of sub-classes and on the use of additional features related
to both colour and texture.


References
1. Jon Chamberlain, Antonio Campello, Jessica P. Wright, Louis G. Clift, Adrian
   Clark and Alba García Seco de Herrera: Overview of ImageCLEFcoral 2019 Task,
   CLEF 2019 working notes. CEUR Workshop Proceedings (CEUR-WS.org), ISSN
   1613-0073, http://ceur-ws.org/Vol-2380/.
2. Hughes T.P., Barnes M.L., Bellwood D.R., Cinner J.E., Cumming G.S., Jackson
   J.B.C., Kleypas J., van de Leemput I.A., Lough J.M., Morrison T.H., Palumbi
   S.R., van Nes E.H., Scheffer M.: Coral reefs in the Anthropocene, Nature, 546
   (7656), 82-90 (2017). DOI: 10.1038/nature22901
3. LiveScience, https://www.livescience.com/40276-coral-reefs.html. Last accessed 22
   May 2019.
4. ImageCLEFcoral 2019, https://www.imageclef.org/2019/coral. Last accessed 23
   May 2019.
5. Bogdan Ionescu, Henning Müller, Renaud Péteri, Yashin Dicente Cid, Vitali Li-
   auchuk, Vassili Kovalev, Dzmitri Klimuk, Aleh Tarasau, Asma Ben Abacha, Sadid
   A. Hasan, Vivek Datla, Joey Liu, Dina Demner-Fushman, Duc-Tien Dang-Nguyen,
   Luca Piras, Michael Riegler, Minh-Triet Tran, Mathias Lux, Cathal Gurrin, Obioma
   Pelka, Christoph M. Friedrich, Alba García Seco de Herrera, Narciso Garcia, Ergina
   Kavallieratou, Carlos Roberto del Blanco, Carlos Cuevas Rodríguez, Nikos Vasil-
   lopoulos, Konstantinos Karampidis, Jon Chamberlain, Adrian Clark and Antonio
   Campello, ImageCLEF 2019: Multimedia Retrieval in Medicine, Lifelogging, Se-
   curity and Nature In: Experimental IR Meets Multilinguality, Multimodality, and
   Interaction. Proceedings of the 10th International Conference of the CLEF Associa-
   tion (CLEF 2019), Lugano, Switzerland, LNCS Lecture Notes in Computer Science,
   Springer (September 09-12 2019).
6. Breiman, L.: Random forests, Machine Learning, 45(1), 5-32 (2001).
7. MATLAB R2017a, The MathWorks, Inc., Natick, Massachusetts, United States.