Machine Learning for Automated Seabed Mapping

Umberto Di Laudo1,2,*, Silvia Ceramicola2 and Luca Manzoni1,2

1 Dipartimento di Matematica, Informatica e Geoscienze, Università degli Studi di Trieste, Via Alfonso Valerio 12/1, 34127 Trieste, Italy
2 Istituto Nazionale di Oceanografia e Geofisica Sperimentale, Borgo Grotta Gigante 42/c, 34010 Sgonico, Italy

Abstract
Interpreting morphological features of the seabed is a labor-intensive task for marine geologists, especially when it concerns extensive portions of the seabed. By applying Machine Learning (ML) techniques from the field of computer vision, it is possible to significantly streamline this process. In this paper we present a model capable of automatically categorizing seabed features, identifying different morphological elements such as submarine canyons, escarpments, canyon headwalls, and mass movements. This model will serve as the basis for new tools to assist geologists, as well as stakeholders dealing with the management of coastal or offshore areas, providing them with efficient support for seabed analysis and characterization.

Keywords
Deep Learning, Seabed Mapping, Image Segmentation

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
udilaudo@ogs.it (U. Di Laudo); sceramicola@ogs.it (S. Ceramicola); lmanzoni@units.it (L. Manzoni)
ORCID: 0000-0002-1318-1272 (S. Ceramicola); 0000-0001-6312-7728 (L. Manzoni)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

One important task for marine geologists is the identification of the morphological characteristics of the seabed. Underwater morphological features, such as canyons and escarpments, are components of the seabed environment. These elements are typically shaped by geological processes, including erosion, sedimentation, and tectonic activity, over extended periods. Detecting their occurrence and characteristics plays a crucial role in assessing marine hazards or when placing communication cables on the seabed [1, 2]. However, the detection and mapping of these underwater morphological elements require specialized domain expertise, such as that of a marine geologist. Despite this process being very time-consuming for scientists, there is still no automated method to detect and classify the various elements of the underwater environment.

Hence, any support for the automatic identification of these features via machine learning (ML) would produce a significant benefit for marine geologists, making the entire process smoother and less time-consuming. The overarching aim of this project is to develop a model capable of autonomously classifying seabed features, thus assisting geologists in their analysis. This model will be designed to identify various morphological elements present in seabed data, such as submarine canyons, escarpments, and mass movements. To do so, the task of seabed interpretation can be considered a special case of image segmentation [3], where images are replaced with a map of seabed features and the label associated with each pixel corresponds to the morphological feature of the seabed at that specific latitude and longitude. Hence, standard image segmentation techniques can be employed and adapted for this task.

In this paper we present the first results of using a U-Net [4] architecture for seabed classification, trained on data obtained via the MaGIC project (Marine Geohazards along the Italian Coasts) [5, 6], which produced maps of the Italian coasts of Central and South Italy, Sicily, Sardinia, and Liguria in a five-year time frame starting from 2007.

The paper is structured as follows: in Section 2 the current state of the art in image segmentation and the main groups of existing techniques are presented. In Section 3 the available data are presented and the specific task to be solved is further detailed. The architecture of the network and the training process are detailed in Section 4. The results are then presented in Section 5. Finally, in Section 6 we present the planned research directions.

2. Image Segmentation

Image segmentation is a computer vision technique that involves partitioning a digital image into multiple segments or regions, separating meaningful objects or structures within an image from the background or other objects; i.e., it can be considered a classification problem at the pixel level. The segmentation process divides images into different regions based on certain characteristics such as color, intensity, texture, or other features. In particular, with the advent of powerful ML techniques, the features are learned from the data instead of being handcrafted by experts.

According to [3], there exist three groups of image segmentation:

• Semantic segmentation: classifying each pixel in an image into a specific category or class.
• Instance segmentation: it extends semantic segmentation by not only classifying each pixel into categories but also distinguishing between different object instances of the same category.
• Panoptic segmentation: it aims to unify semantic segmentation and instance segmentation into a single framework. It divides an image into semantically meaningful regions and assigns a unique label to each region, regardless of whether it corresponds to an object instance or a background category.
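The pixel-level view of segmentation can be made concrete with a short sketch (illustrative only, not from the paper): given one score map per class, the predicted label map assigns to each pixel the class with the highest score.

```python
import numpy as np

# Illustrative only: semantic segmentation reduces to per-pixel classification.
# `scores` stands in for a model output with one score map per class
# (shape: classes x height x width); the values here are random placeholders.
rng = np.random.default_rng(0)
num_classes, h, w = 4, 5, 5
scores = rng.normal(size=(num_classes, h, w))

# The predicted label map takes, at each pixel, the class with the highest score.
label_map = scores.argmax(axis=0)

assert label_map.shape == (h, w)
assert label_map.min() >= 0 and label_map.max() < num_classes
```

Instance and panoptic segmentation enrich this output with per-object identities, but the per-pixel classification core is the same.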
The main traditional image segmentation techniques include several methods such as thresholding, histograms, watersheds, region growing, and clustering-based segmentation [7].

As stated above, with the advent of deep learning, new techniques for image segmentation were developed [7], with the main distinction usually being the architecture of the neural network used. One of the first architectures used for semantic segmentation was the Fully Convolutional Network (FCN) proposed by Long et al. in 2015 [8]. That network includes only convolutional layers and is able to take an image of a certain size as input and return a segmentation map of the same dimensions. Other deep learning-based models for segmentation were proposed following the encoder-decoder architecture: Badrinarayanan et al. proposed SegNet [9]. Inspired by the FCN and the encoder-decoder architecture, V-Net [10] and U-Net [4] were proposed, initially mainly for medical and biomedical purposes. Over the years, several modifications of the U-Net architecture were made to adapt it to different kinds of images. For example, Zhou et al. [11] proposed a nested U-Net architecture. Furthermore, Çiçek et al. [12] built a U-Net architecture for 3D images. Nowadays U-Nets are also used in other fields, e.g., road segmentation [13], face detection [14], and autonomous driving [15].

3. Seabed Data

In this section we present the main characteristics of the data used for training and testing the proposed model.

3.1. Input Data

The data, provided by the Italian MaGIC project, are GIS data specifying the depth of the shallow coastal regions of Italy [16, 17, 18]. Starting from the depth data, it is possible to derive via standard GIS tools two additional features that are considered useful by domain experts: the slope and the profile curvature of the seabed, which are respectively the first and the second derivative of the depth. The corresponding data can then be interpreted as 2D fields with three features/channels (i.e., they can be directly interpreted/visualized as an image), each one describing a specific feature of the same seabed area. The data are organized in a first image of about 2800 × 2400 pixels and a second one of about 3000 × 3800 pixels. Each of them is cut into smaller square windows of 100 × 100 pixels for the training process, for a total of more than 2100 squares.

3.2. Ground Truth

All the data obtained by the MaGIC project also received a human interpretation of the seabed structure, i.e., a label map of the same dimensions as the input 2D fields, indicating the positions of all the elements present in that region. In particular, the labeling is done by drawing lines of different types over the depth maps. The different types of lines correspond to different classes, of which there are 97 in the original data. Due to the large number of classes and the fact that many of them were represented by only a small number of samples, a first pre-processing step reduced the number of classes to 15. These 15 classes correspond to groupings of the original 97 classes, with the partitioning done according to the morphological and geological similarity of the seabed features and with the help of marine geologists. Hence, the output of the model has to be a 2D field in which each coordinate (each pixel) belongs to one of 16 classes (15 related to the different morphological elements plus 1 for the background).

4. Architecture and Training

In this work we perform semantic segmentation using a U-Net architecture [19]. The network architecture is comprised of three main elements (Figure 1):

• A contracting path that reduces the spatial dimensions of the input image;
• An expansive path that increases the spatial dimensions;
• Skip connections between corresponding layers in the contracting and expansive paths.

The specific architecture used in this study is presented in Figure 1, with the parameters of the different layers given in Table 1. The contracting path has the typical structure of a Convolutional Neural Network (CNN), which in our case consists of five layers. The first has three input channels (corresponding to the three input features) and performs two convolutions, each followed by batch normalization and a ReLU activation function. As stated above, the input data have a size of 100 × 100 pixels. The remaining layers of this part have the same structure, with the addition of a max pooling operation at the beginning.

Figure 1: U-Net architecture used in this work. The yellow and blue boxes represent convolutional and deconvolutional layers, respectively. The ReLU functions are represented by the orange part of the boxes. The red squares correspond to max pooling layers.

Table 1
Convolutional and deconvolutional layers structure

Type            Kernel size   Padding   Stride
Convolution     3             1         1
Deconvolution   2             0         2

Then there is the expansive path, composed of five layers in which deconvolutions are performed in order to upscale the image. The last layer has 16 output channels, one for each class of seabed elements. The output values are logits, which, if necessary, can be transformed into a probability distribution over the different classes via softmax.

4.1. Training

As stated before, the network is trained with 100 × 100 pixel images cut from the total feature map. Hence, the network takes a 100 × 100 3-channel image as input and returns a 100 × 100 16-channel tensor. In this framework, for each pixel we have 16 numbers, each one related to a specific class; the class related to the channel with the largest number is assigned to that pixel.

The loss between the output of the U-Net and the transformed label map is computed using a traditional cross-entropy loss in addition to the Dice loss [20]. The latter is a loss commonly used in segmentation problems, related to the intersection between the prediction and the target images.

We partitioned the dataset into an 80%-20% split for training and testing, respectively, and used a batch size of 128. We employed the Adam optimizer with a learning rate of 0.001, and the training continued for 60 epochs.

Notice that the actual classification is not done directly by assigning the most probable class, but by applying a threshold to the logit values of the 15 classes corresponding to actual morphological features of the seabed. If one of them is above the threshold, then the assigned class is the corresponding one, even if the background (i.e., no feature present) has a higher logit value and would have been the most probable. This is done because of the strong imbalance of the dataset (the "no feature" class is present for more than 98% of the pixels), which would otherwise lead the model to prefer that class.
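The derivation of the two extra input channels can be sketched as follows. This is a simplified stand-in for the GIS processing described in Section 3.1, not the authors' pipeline: the slope is approximated by the gradient magnitude and the curvature by a Laplacian, whereas true profile curvature is computed along the direction of steepest slope; the array sizes are synthetic.

```python
import numpy as np

def depth_features(depth, cell=1.0):
    """Simplified stand-in for the GIS-derived features: gradient magnitude
    as 'slope' and the Laplacian as a curvature proxy. True profile
    curvature is taken along the slope direction; this sketch only
    illustrates the first/second-derivative idea."""
    dy, dx = np.gradient(depth, cell)
    slope = np.hypot(dx, dy)
    d2y, _ = np.gradient(dy, cell)
    _, d2x = np.gradient(dx, cell)
    curvature = d2x + d2y
    # Stack depth, slope, and curvature into a 3-channel field,
    # mirroring the three input channels described in Section 3.1.
    return np.stack([depth, slope, curvature])

def tile(field, size=100):
    """Cut a (channels, H, W) field into non-overlapping size x size
    windows, discarding incomplete border windows."""
    c, h, w = field.shape
    return [field[:, i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

# Small synthetic grid for demonstration (the paper's maps are ~2800 x 2400).
depth = np.random.default_rng(1).normal(size=(300, 200)).cumsum(axis=0)
x = depth_features(depth)       # shape (3, 300, 200)
windows = tile(x)               # 3 x 2 = 6 windows of shape (3, 100, 100)
```

On the real ~2800 × 2400 and ~3000 × 3800 maps this tiling yields the roughly 2100 windows reported in Section 3.1.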
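A minimal PyTorch sketch of a U-Net consistent with Figure 1 and Table 1 (3×3 convolutions with padding 1 and stride 1, 2×2 deconvolutions with stride 2, five levels from 64 to 1024 channels, 3 input channels, 16 output logits) might look as follows. This is our reconstruction under those stated parameters, not the authors' code; the implementation they cite as [19] is the Pytorch-UNet repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(c_in, c_out):
    # Two 3x3 convolutions (padding 1, stride 1), each followed by
    # batch normalization and ReLU, as described in Section 4 / Table 1.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    # Channel widths follow Figure 1 (64 up to 1024 at the bottleneck).
    def __init__(self, in_ch=3, n_classes=16, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.down = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.down.append(double_conv(prev, w))
            prev = w
        self.up, self.dec = nn.ModuleList(), nn.ModuleList()
        for w in reversed(widths[:-1]):
            # 2x2 deconvolution with stride 2 (Table 1) doubles the resolution.
            self.up.append(nn.ConvTranspose2d(prev, w, kernel_size=2, stride=2))
            self.dec.append(double_conv(2 * w, w))
            prev = w
        # A 1x1 convolution maps to the 16 class logits; the paper states
        # 16 output channels, but the exact head is our assumption.
        self.head = nn.Conv2d(prev, n_classes, kernel_size=1)

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.down):
            if i > 0:
                x = F.max_pool2d(x, 2)
            x = block(x)
            skips.append(x)
        x = skips.pop()  # bottleneck
        for up, dec in zip(self.up, self.dec):
            x = up(x)
            skip = skips.pop()
            # Odd spatial sizes (e.g. 25 -> 12 after pooling) leave the
            # upsampled map slightly smaller than the skip; pad to match.
            dh, dw = skip.shape[-2] - x.shape[-2], skip.shape[-1] - x.shape[-1]
            x = F.pad(x, (0, dw, 0, dh))
            x = dec(torch.cat([skip, x], dim=1))
        return self.head(x)  # raw logits, one channel per class

model = UNet().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 100, 100))  # shape (1, 16, 100, 100)
```

The padding inside the decoder is needed precisely because the 100 × 100 windows are not divisible by 2 at every level; reference implementations such as [19] handle the same mismatch.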
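The combined objective of Section 4.1 (cross-entropy plus Dice [20]) can be sketched as below. The soft-Dice formulation and the equal weighting of the two terms are our assumptions; the paper does not spell them out.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1.0):
    # Soft Dice over the class probabilities: one minus the Dice
    # coefficient, which measures the overlap (intersection) between
    # the predicted and target segmentations.
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    total = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    return 1.0 - ((2 * inter + eps) / (total + eps)).mean()

def segmentation_loss(logits, target):
    # Cross-entropy plus Dice, as in Section 4.1; the 1:1 weighting
    # of the two terms is an assumption.
    return F.cross_entropy(logits, target) + dice_loss(logits, target)

logits = torch.randn(2, 16, 100, 100, requires_grad=True)
target = torch.randint(0, 16, (2, 100, 100))
loss = segmentation_loss(logits, target)
loss.backward()  # differentiable, so usable for training
```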
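The threshold rule described above can be sketched as follows. Choosing the best-scoring feature class when several logits exceed the threshold is our reading of the paper's description.

```python
import torch

def assign_labels(logits, threshold=-1.5):
    """Threshold rule from Section 4.1: a pixel is assigned the
    best-scoring feature class (1-15) whenever that class's logit exceeds
    the threshold, even if the background (class 0) has a higher logit;
    otherwise it stays background. `logits` has shape (batch, 16, H, W)."""
    feature_logits = logits[:, 1:]                  # drop background channel
    best_val, best_idx = feature_logits.max(dim=1)  # best feature class per pixel
    return torch.where(best_val > threshold,
                       best_idx + 1,                # shift back to labels 1-15
                       torch.zeros_like(best_idx))

logits = torch.randn(1, 16, 100, 100)
labels = assign_labels(logits, threshold=-1.5)
# Lowering the threshold can only add non-background pixels.
assert (labels != 0).sum() >= (assign_labels(logits, 0.0) != 0).sum()
```

This mirrors the behavior analyzed in Section 5, where lowering the threshold increases the density of non-background pixels.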
5. Results

In this section we present an initial analysis of the results obtained.

In Figure 2 two label maps are presented. They correspond to the first region into which the total dataset was split: on the left is the human interpretation of the underwater environment provided by the MaGIC project, which maps every relevant element in the area; on the right is the map reconstructed by the network after training. The white squares denote the test set, which represents 20% of the region.

Figure 2: On the left, the human interpretation of the seabed of one of the two regions into which the dataset is split. On the right, the reconstructed map; the white squares represent the test set, which is not used during the training.

In Table 2 a comparison between the frequencies of the labels of the reconstructed map and the human interpretations (the ground truth) is given. The frequencies exclude the background pixels, i.e., where no feature is present.

Table 2
Comparison between the frequencies of the labels of the reconstructed map and the ground truth, without considering the background. Only the test areas are considered.

Label   Ground Truth freq.   Reconstructed map freq.
1       11.93%               9.46%
2       1.18%                1.19%
3       22.56%               33.93%
4       17.33%               17.30%
5       0.00%                0.18%
6       0.00%                0.00%
7       5.82%                1.10%
8       0.40%                0.00%
9       14.50%               13.64%
10      4.86%                0.35%
11      0.00%                0.00%
12      15.39%               21.74%
13      0.14%                0.00%
14      0.00%                0.00%
15      5.89%                1.10%

The model trained on the data of the region shown in Figure 2, which is only a part of the entire dataset, was then used on another region to test the model. The results are shown in Figure 3: the image on the left is the ground truth, while the image on the right is the reconstructed map with no threshold applied. We can easily notice a predominance of the background. In Figure 4, results with different values of the threshold acting on the logits of the non-zero classes are presented. Reducing the threshold value corresponds to an increase in the number of pixels belonging to non-zero labels. Clearly, the choice of a suitable threshold is essential for obtaining a good labeling of the seabed.

Figure 3: Comparison between the ground truth and the reconstructed map with no threshold. We can notice that there is a predominance of the zero class (background).

Figure 4: Comparison between the reconstructed maps with different values of the threshold. The values of the threshold used are, from left to right, 1, −1, and −3. Pixel density increases as the threshold value decreases.

Recall that, due to the predominance of the "no feature" class over the others, the model tends to favour this class, as is typical for highly imbalanced datasets. To show that the use of a threshold actually helps in improving the classification, we compared the class frequencies with and without a threshold. In Table 3 the frequencies of the classes of the ground truth are compared with those of the output map with no threshold and with a threshold of −1.5, now also considering the background (label 0). Notice that the frequency of the background pixels decreases when the threshold is added, and the other frequencies also become more similar to those of the ground truth.

Table 3
Comparison between the frequencies of the labels of the reconstructed map with no threshold, with a threshold value of −1.5, and the ground truth. Only the test areas are considered.

Label   Ground Truth   Output map   Threshold of −1.5
0       98.05%         99.75%       98.37%
1       0.23%          0.02%        0.15%
2       0.02%          0.00%        0.02%
3       0.44%          0.08%        0.50%
4       0.34%          0.04%        0.20%
5       0.00%          0.00%        0.00%
6       0.00%          0.00%        0.00%
7       0.11%          0.00%        0.07%
8       0.01%          0.00%        0.00%
9       0.28%          0.03%        0.31%
10      0.09%          0.00%        0.03%
11      0.00%          0.00%        0.00%
12      0.30%          0.05%        0.30%
13      0.00%          0.00%        0.00%
14      0.00%          0.00%        0.00%
15      0.11%          0.00%        0.05%

One important aspect to notice is that, for the proposed results, the actual loss is not a good indicator of the usefulness of the results. In fact, it represents only a proxy of the real usefulness of the proposed labeling, since labeling by experts is itself a subjective and noisy act. Thus, obtaining a feature that is shifted by a small amount might be immaterial (the original lines were themselves drawn by hand), as might a non-continuous line (the expert interpretation would be that there should be a "connection" between two features). Hence, most of the evaluation is still qualitative and based on discussion with experts, who are particularly interested in the ability of the model to produce the correct "general shape" of the features, more than in specific details.

6. Conclusions and Future Work

The aim of this work was to construct a model that can identify and recognize the relevant morphological elements of the seabed in an automatic way. The current results show that, by using a U-Net, it is possible to produce good results from a qualitative point of view, as long as thresholding is applied to the output of the network. Future goals are to improve the performance of the existing model in order to obtain more precise results. In particular, a new quantitative measure encoding the expert knowledge should be devised in order to speed up the evaluation of the model (and, possibly, to serve as a loss function for the training). Additionally, a future project will be to construct a model capable not only of identifying and mapping morphological elements of the seabed, but also of detecting potentially dangerous zones within it, i.e., geohazards.

References

[1] F. Chiocci, D. Ridente, Regional-scale seafloor mapping and geohazard assessment. The experience from the Italian Project MaGIC (Marine Geohazards along the Italian Coasts), Marine Geophysical Research, 2011. doi:10.1007/s11001-011-9120-6.
[2] F. L. Chiocci, A. Cattaneo, R. Urgeles, Seafloor mapping for geohazard assessment: state of the art, Marine Geophysical Research, 2011. doi:10.1007/s11001-011-9139-8.
[3] A. Kirillov, K. He, R. Girshick, C. Rother, P. Dollár, Panoptic segmentation, 2019. doi:10.1109/CVPR.2019.00963.
[4] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, volume 9351, 2015. doi:10.1007/978-3-319-24574-4_28.
[5] Progetto MaGIC - Marine Geohazards along the Italian Coasts, 2007-2012. URL: https://www.protezionecivile.gov.it/en/approfondimento/progetto-magic-marine-geohazards-along-italian-coasts-0/, accessed on 2024-04-15.
[6] S. Ceramicola, D. Praeg, M. Coste, E. Forlin, A. Cova, E. Colizza, S. Critelli, Submarine Mass-Movements Along the Slopes of the Active Ionian Continental Margins and Their Consequences for Marine Geohazards (Mediterranean Sea), Springer International Publishing, Cham, 2014, pp. 295–306. doi:10.1007/978-3-319-00972-8_26.
[7] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, D. Terzopoulos, Image segmentation using deep learning: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2022) 3523–3542. doi:10.1109/TPAMI.2021.3059968.
[8] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, 2015. doi:10.1109/CVPR.2015.7298965.
[9] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (2017) 2481–2495. doi:10.1109/TPAMI.2016.2644615.
[10] F. Milletari, N. Navab, S. A. Ahmadi, V-Net: Fully convolutional neural networks for volumetric medical image segmentation, 2016. doi:10.1109/3DV.2016.79.
[11] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, UNet++: A nested U-Net architecture for medical image segmentation, volume 11045 LNCS, 2018. doi:10.1007/978-3-030-00889-5_1.
[12] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: Learning dense volumetric segmentation from sparse annotation, volume 9901 LNCS, 2016. doi:10.1007/978-3-319-46723-8_49.
[13] Z. Zhang, Q. Liu, Y. Wang, Road extraction by deep residual U-Net, IEEE Geoscience and Remote Sensing Letters 15 (2018) 749–753. doi:10.1109/LGRS.2018.2802944.
[14] K. Luu, C. Zhu, C. Bhagavatula, T. H. N. Le, M. Savvides, A deep learning approach to joint face detection and segmentation, 2016. doi:10.1007/978-3-319-25958-1_1.
[15] Ç. Kaymak, A. Uçar, A brief survey and an application of semantic image segmentation for autonomous driving, volume 136, 2019. doi:10.1007/978-3-030-11479-4_9.
[16] S. Ceramicola, F. Fanucci, C. Corselli, E. Colizza, D. Morelli, A. Cova, A. Savini, D. Praeg, M. Zecchin, A. Caburlotto, O. Candoni, D. Civile, M. Coste, D. Cotterle, S. Critelli, A. Cuppari, M. Deponte, R. Dominici, E. Forlin, E. Gordini, C. Tessarolo, F. Marchese, F. Muto, S. Palamara, R. Riccardo, L. Facchin, R. Romeo, Calabria Ionica (Tavola 8, Fogli 35-39), in: Chiocci F.L., Budillon F., Ceramicola S., Gamberi F., Orrù P. (Eds.), Atlante dei lineamenti di pericolosità geologica dei mari italiani - Risultati del progetto MaGIC, CNR Edizioni, 2021, pp. 174–195.
[17] S. Ceramicola, M. R. Senatore, A. Cova, A. Meo, M. Zecchin, D. Praeg, C. Diego, C. Salvatore, A. Caburlotto, D. Civile, M. Coste, R. Dominici, E. Forlin, F. Muto, A. Bosman, C. Francesco Latino, E. Lai, D. Casalbore, E. Morelli, O. Candoni, E. Gordini, D. Michele, R. Riccardo, L. Facchin, R. Romeo, Golfo di Taranto (Tavola 9, Fogli 40-46), in: Chiocci F.L., Budillon F., Ceramicola S., Gamberi F., Orrù P. (Eds.), Atlante dei lineamenti di pericolosità geologica dei mari italiani - Risultati del progetto MaGIC, CNR Edizioni, 2021, pp. 196–225.
[18] F. L. Chiocci, F. Budillon, S. Ceramicola, F. Gamberi, P. Orrù (Eds.), Atlante dei lineamenti di pericolosità geologica dei mari italiani - Risultati del progetto MaGIC, CNR Edizioni, 2021.
[19] A. Milesi, Pytorch-UNet: PyTorch implementation of the U-Net for image semantic segmentation with high quality images, 2024. URL: https://github.com/milesial/Pytorch-UNet.
[20] R. Zhao, B. Qian, X. Zhang, Y. Li, R. Wei, Y. Liu, Y. Pan, Rethinking Dice loss for medical image segmentation, 2020. doi:10.1109/ICDM50108.2020.00094.