Machine Learning for Automated Seabed Mapping

Umberto Di Laudo1,2,*, Silvia Ceramicola2 and Luca Manzoni1,2

1 Dipartimento di Matematica, Informatica e Geoscienze, Università degli Studi di Trieste, Via Alfonso Valerio 12/1, 34127 Trieste, Italy
2 Istituto Nazionale di Oceanografia e Geofisica Sperimentale, Borgo Grotta Gigante 42/c, 34010 Sgonico, Italy

Abstract
Interpreting morphological features of the seabed is a labor-intensive task for marine geologists, especially when it concerns extensive portions of the seabed. By applying Machine Learning (ML) techniques from the field of computer vision, it is possible to significantly streamline this process. In this paper we present a model capable of automatically categorizing seabed features, identifying different morphological elements such as submarine canyons, escarpments, canyon headwalls, and mass movements. This model will serve as the basis for new tools to assist geologists, as well as stakeholders dealing with the management of coastal or offshore areas, providing them with efficient support for seabed analysis and characterization.

Keywords
Deep Learning, Seabed Mapping, Image Segmentation

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
* Corresponding author.
† These authors contributed equally.
udilaudo@ogs.it (U. Di Laudo); sceramicola@ogs.it (S. Ceramicola); lmanzoni@units.it (L. Manzoni)
ORCID: 0000-0002-1318-1272 (S. Ceramicola); 0000-0001-6312-7728 (L. Manzoni)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction

One important task for marine geologists is the identification of the morphological characteristics of the seabed. Underwater morphological features, such as canyons and escarpments, are components of the seabed environment. These elements are typically shaped by geological processes, including erosion, sedimentation, and tectonic activity, over extended periods. Detecting their occurrence and characteristics plays a crucial role in assessing marine hazards or when placing communication cables on the seabed [1, 2]. However, the detection and mapping of these underwater morphological elements require specialized domain expertise, such as that of a marine geologist. Despite this process being very time-consuming for scientists, there is still no automated method to detect and classify the various elements of the underwater environment.

Hence, any support for the automatic identification of these features via machine learning (ML) would produce a significant benefit for marine geologists, making the entire process smoother and less time-consuming. The overarching aim of this project is to develop a model capable of autonomously classifying seabed features, thus assisting geologists in their analysis. This model will be designed to identify various morphological elements present in seabed data, such as submarine canyons, escarpments, and mass movements. To do so, the task of seabed interpretation can be considered a special case of image segmentation [3], where images are replaced with a map of seabed features and the label associated with each pixel corresponds to the morphological feature of the seabed at that specific latitude and longitude. Hence, standard image segmentation techniques can be employed and adapted for this task.

In this paper we present the first results of using a U-Net [4] architecture for seabed classification, trained on data obtained via the MaGIC project (Marine Geohazards along the Italian Coasts) [5, 6], which produced maps of the Italian coasts of Central and South Italy, Sicily, Sardinia, and Liguria in a five-year time frame starting from 2007.

The paper is structured as follows: in Section 2 the current state of the art in image segmentation and the main groups of existing techniques are presented. In Section 3 the available data are presented and the specific task to be solved is further detailed. The architecture of the network and the training process are detailed in Section 4. The results are then presented in Section 5. Finally, in Section 6 we present the planned research directions.

2. Image Segmentation

Image segmentation is a computer vision technique that involves partitioning a digital image into multiple segments or regions, separating meaningful objects or structures within an image from the background or other objects; i.e., it can be considered a classification problem at the pixel level. The segmentation process divides images into different regions based on certain characteristics such as color, intensity, texture, or other features. In particular, with the advent of powerful ML techniques, the features are learned from the data instead of being handcrafted by experts.

According to [3], there exist three groups of image segmentation:

• Semantic segmentation: classifying each pixel in an image into a specific category or class.
• Instance segmentation: it extends semantic segmentation by not only classifying each pixel into categories but also distinguishing between different object instances of the same category.
• Panoptic segmentation: it aims to unify semantic segmentation and instance segmentation into a single framework. It divides an image into semantically meaningful regions and assigns a unique label to each region, regardless of whether it corresponds to an object instance or a background category.
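The pixel-level view of segmentation can be made concrete with a short sketch (illustrative only, not from the paper): given one score map per class, the predicted label map assigns to each pixel the class with the highest score.

```python
import numpy as np

# Illustrative only: semantic segmentation reduces to per-pixel classification.
# `scores` stands in for a model output with one score map per class
# (shape: classes x height x width); the values here are random placeholders.
rng = np.random.default_rng(0)
num_classes, h, w = 4, 5, 5
scores = rng.normal(size=(num_classes, h, w))

# The predicted label map takes, at each pixel, the class with the highest score.
label_map = scores.argmax(axis=0)

assert label_map.shape == (h, w)
assert label_map.min() >= 0 and label_map.max() < num_classes
```

Instance and panoptic segmentation enrich this output with per-object identities, but the per-pixel classification core is the same.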
The main traditional image segmentation techniques include several methods such as thresholding, histograms, watersheds, region growing, and clustering-based segmentation [7].

As stated above, with the advent of deep learning, new techniques for image segmentation were developed [7], with the main distinction usually being the architecture of the neural network used. One of the first architectures used for semantic segmentation was the Fully Convolutional Network (FCN) proposed by Long et al. in 2015 [8]. That network includes only convolutional layers and is able to take an image of a certain size as input and return a segmentation map of the same dimensions. Other deep learning-based models for segmentation were proposed following the encoder-decoder architecture: Badrinarayanan et al. proposed SegNet [9]. Inspired by the FCN and the encoder-decoder architecture, V-Net [10] and U-Net [4] were proposed, initially mainly for medical and biomedical purposes. Over the years, several modifications of the U-Net architecture were made to adapt it to different kinds of images. For example, Zhou et al. [11] proposed a nested U-Net architecture. Furthermore, Çiçek et al. [12] built a U-Net architecture for 3D images. Nowadays U-Nets are also used in other fields, e.g., road segmentation [13], face detection [14], and autonomous driving [15].

3. Seabed Data

In this section we present the main characteristics of the data used for training and testing the proposed model.

3.1. Input Data

The data, provided by the Italian MaGIC project, are GIS data specifying the depth of the shallow coastal regions of Italy [16, 17, 18]. Starting from the depth data, it is possible to derive via standard GIS tools two additional features that are considered useful by domain experts: the slope and the profile curvature of the seabed, which are respectively the first and the second derivative of the depth. The corresponding data can then be interpreted as 2D fields with three features/channels (i.e., they can be directly interpreted/visualized as an image), each one describing a specific feature of the same seabed area. The data are organized in a first image of about 2800 × 2400 pixels and a second one of about 3000 × 3800 pixels. Each of them is cut into smaller square windows of 100 × 100 pixels for the training process, for a total of more than 2100 squares.

3.2. Ground Truth

All the data obtained by the MaGIC project also received a human interpretation of the seabed structure, i.e., a label map of the same dimensions as the input 2D fields, indicating the positions of all the elements present in that region. In particular, the labeling is done by drawing lines of different types over the depth maps. The different types of lines correspond to different classes, of which there are 97 in the original data. Due to the large number of classes and the fact that many of them were represented by only a small number of samples, a first pre-processing step reduced the number of classes to 15. These 15 classes correspond to groupings of the original 97 classes, with the partitioning done according to the morphological and geological similarity of the seabed features and with the help of marine geologists. Hence, the output of the model has to be a 2D field in which each coordinate (each pixel) belongs to one of 16 classes (15 related to the different morphological elements plus 1 for the background).

4. Architecture and Training

In this work we perform semantic segmentation using a U-Net architecture [19]. The network architecture is comprised of three main elements (Figure 1):

• A contracting path that reduces the spatial dimensions of the input image;
• An expansive path that increases the spatial dimensions;
• Skip connections between corresponding layers in the contracting and expansive paths.

The specific architecture used in this study is presented in Figure 1, with the parameters of the different layers given in Table 1. The contracting path has the typical structure of a Convolutional Neural Network (CNN), which in our case consists of five layers. The first has three input channels (corresponding to the three input features) and performs two convolutions, each followed by batch normalization and a ReLU activation function. As stated above, the input data have a size of 100 × 100 pixels. The remaining layers of this part have the same structure, with the addition of a max pooling operation at the beginning.

Figure 1: U-Net architecture used in this work. The yellow and blue boxes represent convolutional and deconvolutional layers, respectively. The ReLU functions are represented by the orange part of the boxes. The red squares correspond to max pooling layers.

Table 1
Convolutional and deconvolutional layers structure

Type            Kernel size   Padding   Stride
Convolution     3             1         1
Deconvolution   2             0         2

Then there is the expansive path, composed of five layers in which deconvolutions are performed in order to upscale the image. The last layer has 16 output channels, one for each class of seabed elements. The output values are logits, which, if necessary, can be transformed into a probability distribution over the different classes via softmax.

4.1. Training

As stated before, the network is trained with 100 × 100 pixel images cut from the total feature map. Hence, the network takes a 100 × 100 3-channel image as input and returns a 100 × 100 16-channel tensor. In this framework, for each pixel we have 16 numbers, each one related to a specific class; the class related to the channel with the largest number is assigned to that pixel.

The loss between the output of the U-Net and the transformed label map is computed using a traditional cross-entropy loss in addition to the Dice loss [20]. The latter is a loss commonly used in segmentation problems, related to the intersection between the prediction and the target images.

We partitioned the dataset into an 80%-20% split for training and testing, respectively, and used a batch size of 128. We employed the Adam optimizer with a learning rate of 0.001, and the training continued for 60 epochs.

Notice that the actual classification is not done directly by assigning the most probable class, but by applying a threshold to the logit values of the 15 classes corresponding to actual morphological features of the seabed. If one of them is above the threshold, then the assigned class is the corresponding one, even if the background (i.e., no feature present) has a higher logit value and would have been the most probable. This is done because of the strong imbalance of the dataset (the "no feature" class is present for more than 98% of the pixels), which would otherwise lead the model to prefer that class.
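The derivation of the two extra input channels can be sketched as follows. This is a simplified stand-in for the GIS processing described in Section 3.1, not the authors' pipeline: the slope is approximated by the gradient magnitude and the curvature by a Laplacian, whereas true profile curvature is computed along the direction of steepest slope; the array sizes are synthetic.

```python
import numpy as np

def depth_features(depth, cell=1.0):
    """Simplified stand-in for the GIS-derived features: gradient magnitude
    as 'slope' and the Laplacian as a curvature proxy. True profile
    curvature is taken along the slope direction; this sketch only
    illustrates the first/second-derivative idea."""
    dy, dx = np.gradient(depth, cell)
    slope = np.hypot(dx, dy)
    d2y, _ = np.gradient(dy, cell)
    _, d2x = np.gradient(dx, cell)
    curvature = d2x + d2y
    # Stack depth, slope, and curvature into a 3-channel field,
    # mirroring the three input channels described in Section 3.1.
    return np.stack([depth, slope, curvature])

def tile(field, size=100):
    """Cut a (channels, H, W) field into non-overlapping size x size
    windows, discarding incomplete border windows."""
    c, h, w = field.shape
    return [field[:, i:i + size, j:j + size]
            for i in range(0, h - size + 1, size)
            for j in range(0, w - size + 1, size)]

# Small synthetic grid for demonstration (the paper's maps are ~2800 x 2400).
depth = np.random.default_rng(1).normal(size=(300, 200)).cumsum(axis=0)
x = depth_features(depth)       # shape (3, 300, 200)
windows = tile(x)               # 3 x 2 = 6 windows of shape (3, 100, 100)
```

On the real ~2800 × 2400 and ~3000 × 3800 maps this tiling yields the roughly 2100 windows reported in Section 3.1.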
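A minimal PyTorch sketch of a U-Net consistent with Figure 1 and Table 1 (3×3 convolutions with padding 1 and stride 1, 2×2 deconvolutions with stride 2, five levels from 64 to 1024 channels, 3 input channels, 16 output logits) might look as follows. This is our reconstruction under those stated parameters, not the authors' code; the implementation they cite as [19] is the Pytorch-UNet repository.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def double_conv(c_in, c_out):
    # Two 3x3 convolutions (padding 1, stride 1), each followed by
    # batch normalization and ReLU, as described in Section 4 / Table 1.
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
        nn.Conv2d(c_out, c_out, kernel_size=3, padding=1),
        nn.BatchNorm2d(c_out), nn.ReLU(inplace=True),
    )

class UNet(nn.Module):
    # Channel widths follow Figure 1 (64 up to 1024 at the bottleneck).
    def __init__(self, in_ch=3, n_classes=16, widths=(64, 128, 256, 512, 1024)):
        super().__init__()
        self.down = nn.ModuleList()
        prev = in_ch
        for w in widths:
            self.down.append(double_conv(prev, w))
            prev = w
        self.up, self.dec = nn.ModuleList(), nn.ModuleList()
        for w in reversed(widths[:-1]):
            # 2x2 deconvolution with stride 2 (Table 1) doubles the resolution.
            self.up.append(nn.ConvTranspose2d(prev, w, kernel_size=2, stride=2))
            self.dec.append(double_conv(2 * w, w))
            prev = w
        # A 1x1 convolution maps to the 16 class logits; the paper states
        # 16 output channels, but the exact head is our assumption.
        self.head = nn.Conv2d(prev, n_classes, kernel_size=1)

    def forward(self, x):
        skips = []
        for i, block in enumerate(self.down):
            if i > 0:
                x = F.max_pool2d(x, 2)
            x = block(x)
            skips.append(x)
        x = skips.pop()  # bottleneck
        for up, dec in zip(self.up, self.dec):
            x = up(x)
            skip = skips.pop()
            # Odd spatial sizes (e.g. 25 -> 12 after pooling) leave the
            # upsampled map slightly smaller than the skip; pad to match.
            dh, dw = skip.shape[-2] - x.shape[-2], skip.shape[-1] - x.shape[-1]
            x = F.pad(x, (0, dw, 0, dh))
            x = dec(torch.cat([skip, x], dim=1))
        return self.head(x)  # raw logits, one channel per class

model = UNet().eval()
with torch.no_grad():
    logits = model(torch.randn(1, 3, 100, 100))  # shape (1, 16, 100, 100)
```

The padding inside the decoder is needed precisely because the 100 × 100 windows are not divisible by 2 at every level; reference implementations such as [19] handle the same mismatch.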
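The combined objective of Section 4.1 (cross-entropy plus Dice [20]) can be sketched as below. The soft-Dice formulation and the equal weighting of the two terms are our assumptions; the paper does not spell them out.

```python
import torch
import torch.nn.functional as F

def dice_loss(logits, target, eps=1.0):
    # Soft Dice over the class probabilities: one minus the Dice
    # coefficient, which measures the overlap (intersection) between
    # the predicted and target segmentations.
    num_classes = logits.shape[1]
    probs = logits.softmax(dim=1)
    onehot = F.one_hot(target, num_classes).permute(0, 3, 1, 2).float()
    inter = (probs * onehot).sum(dim=(0, 2, 3))
    total = probs.sum(dim=(0, 2, 3)) + onehot.sum(dim=(0, 2, 3))
    return 1.0 - ((2 * inter + eps) / (total + eps)).mean()

def segmentation_loss(logits, target):
    # Cross-entropy plus Dice, as in Section 4.1; the 1:1 weighting
    # of the two terms is an assumption.
    return F.cross_entropy(logits, target) + dice_loss(logits, target)

logits = torch.randn(2, 16, 100, 100, requires_grad=True)
target = torch.randint(0, 16, (2, 100, 100))
loss = segmentation_loss(logits, target)
loss.backward()  # differentiable, so usable for training
```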
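The threshold rule described above can be sketched as follows. Choosing the best-scoring feature class when several logits exceed the threshold is our reading of the paper's description.

```python
import torch

def assign_labels(logits, threshold=-1.5):
    """Threshold rule from Section 4.1: a pixel is assigned the
    best-scoring feature class (1-15) whenever that class's logit exceeds
    the threshold, even if the background (class 0) has a higher logit;
    otherwise it stays background. `logits` has shape (batch, 16, H, W)."""
    feature_logits = logits[:, 1:]                  # drop background channel
    best_val, best_idx = feature_logits.max(dim=1)  # best feature class per pixel
    return torch.where(best_val > threshold,
                       best_idx + 1,                # shift back to labels 1-15
                       torch.zeros_like(best_idx))

logits = torch.randn(1, 16, 100, 100)
labels = assign_labels(logits, threshold=-1.5)
# Lowering the threshold can only add non-background pixels.
assert (labels != 0).sum() >= (assign_labels(logits, 0.0) != 0).sum()
```

This mirrors the behavior analyzed in Section 5, where lowering the threshold increases the density of non-background pixels.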
5. Results

In this section we present an initial analysis of the results obtained.

In Figure 2 two label maps are presented. They correspond to the first region into which the total dataset was split: on the left is the human interpretation of the underwater environment provided by the MaGIC project, which maps every relevant element in the area; on the right is the map reconstructed by the network after training. The white squares denote the test set, which represents 20% of the region.

Figure 2: On the left, the human interpretation of the seabed of one of the two regions into which the dataset is split. On the right, the reconstructed map; the white squares represent the test set, which is not used during the training.

In Table 2 a comparison between the frequencies of the labels of the reconstructed map and the human interpretations (the ground truth) is given. The frequencies exclude the background pixels, i.e., where no feature is present.

Table 2
Comparison between the frequencies of the labels of the reconstructed map and the ground truth, without considering the background. Only the test areas are considered.

Label   Ground Truth freq.   Reconstructed map freq.
1       11.93%               9.46%
2       1.18%                1.19%
3       22.56%               33.93%
4       17.33%               17.30%
5       0.00%                0.18%
6       0.00%                0.00%
7       5.82%                1.10%
8       0.40%                0.00%
9       14.50%               13.64%
10      4.86%                0.35%
11      0.00%                0.00%
12      15.39%               21.74%
13      0.14%                0.00%
14      0.00%                0.00%
15      5.89%                1.10%

The model trained on the data of the region shown in Figure 2, which is only a part of the entire dataset, was then used on another region to test the model. The results are shown in Figure 3: the image on the left is the ground truth, while the image on the right is the reconstructed map with no threshold applied. We can easily notice a predominance of the background. In Figure 4, results with different values of the threshold acting on the logits of the non-zero classes are presented. Reducing the threshold value corresponds to an increase in the number of pixels belonging to non-zero labels. Clearly, the choice of a suitable threshold is essential for obtaining a good labeling of the seabed.

Figure 3: Comparison between the ground truth and the reconstructed map with no threshold. We can notice that there is a predominance of the zero class (background).

Figure 4: Comparison between the reconstructed maps with different values of the threshold. The values of the threshold used are, from left to right, 1, −1, and −3. Pixel density increases as the threshold value decreases.

Recall that, due to the predominance of the "no feature" class over the others, the model tends to favour this class, as is typical for highly imbalanced datasets. To show that the use of a threshold actually helps in improving the classification, we compared the class frequencies with and without a threshold. In Table 3 the frequencies of the classes of the ground truth are compared with those of the output map with no threshold and with a threshold of −1.5, now also considering the background (label 0). Notice that the frequency of the background pixels decreases when the threshold is added, and the other frequencies also become more similar to those of the ground truth.

Table 3
Comparison between the frequencies of the labels of the reconstructed map with no threshold, with a threshold value of −1.5, and the ground truth. Only the test areas are considered.

Label   Ground Truth   Output map   Threshold of −1.5
0       98.05%         99.75%       98.37%
1       0.23%          0.02%        0.15%
2       0.02%          0.00%        0.02%
3       0.44%          0.08%        0.50%
4       0.34%          0.04%        0.20%
5       0.00%          0.00%        0.00%
6       0.00%          0.00%        0.00%
7       0.11%          0.00%        0.07%
8       0.01%          0.00%        0.00%
9       0.28%          0.03%        0.31%
10      0.09%          0.00%        0.03%
11      0.00%          0.00%        0.00%
12      0.30%          0.05%        0.30%
13      0.00%          0.00%        0.00%
14      0.00%          0.00%        0.00%
15      0.11%          0.00%        0.05%

One important aspect to notice is that, for the proposed results, the actual loss is not a good indicator of the usefulness of the results. In fact, it represents only a proxy of the real usefulness of the proposed labeling, since labeling by experts is itself a subjective and noisy act. Thus, obtaining a feature that is shifted by a small amount might be immaterial (the original lines were themselves drawn by hand), as might a non-continuous line (the expert interpretation would be that there should be a "connection" between two features). Hence, most of the evaluation is still qualitative and based on discussion with experts, who are particularly interested in the ability of the model to produce the correct "general shape" of the features, more than in specific details.

6. Conclusions and Future Work

The aim of this work was to construct a model that can identify and recognize the relevant morphological elements of the seabed in an automatic way. The current results show that, by using a U-Net, it is possible to produce good results from a qualitative point of view, as long as thresholding is applied to the output of the network. Future goals are to improve the performance of the existing model in order to obtain more precise results. In particular, a new quantitative measure encoding the expert knowledge should be devised in order to speed up the evaluation of the model (and, possibly, to serve as a loss function for the training). Additionally, a future project will be to construct a model capable not only of identifying and mapping morphological elements of the seabed, but also of detecting potentially dangerous zones within it, i.e., geohazards.

References

[1] F. Chiocci, D. Ridente, Regional-scale seafloor mapping and geohazard assessment. The experience from the Italian Project MaGIC (Marine Geohazards along the Italian Coasts), Marine Geophysical Research, 2011. doi:10.1007/s11001-011-9120-6.
[2] F. L. Chiocci, A. Cattaneo, R. Urgeles, Seafloor mapping for geohazard assessment: state of the art, Marine Geophysical Research, 2011. doi:10.1007/s11001-011-9139-8.
[3] A. Kirillov, K. He, R. Girshick, C. Rother, P. Dollár, Panoptic segmentation, 2019. doi:10.1109/CVPR.2019.00963.
[4] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, volume 9351, 2015. doi:10.1007/978-3-319-24574-4_28.
[5] Progetto MaGIC - Marine Geohazards along the Italian Coasts, 2007-2012. URL: https://www.protezionecivile.gov.it/en/approfondimento/progetto-magic-marine-geohazards-along-italian-coasts-0/, accessed on 2024-04-15.
[6] S. Ceramicola, D. Praeg, M. Coste, E. Forlin, A. Cova, E. Colizza, S. Critelli, Submarine Mass-Movements Along the Slopes of the Active Ionian Continental Margins and Their Consequences for Marine Geohazards (Mediterranean Sea), Springer International Publishing, Cham, 2014, pp. 295–306. doi:10.1007/978-3-319-00972-8_26.
[7] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, D. Terzopoulos, Image segmentation using deep learning: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence 44 (2022) 3523–3542. doi:10.1109/TPAMI.2021.3059968.
[8] J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, 2015. doi:10.1109/CVPR.2015.7298965.
[9] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (2017) 2481–2495. doi:10.1109/TPAMI.2016.2644615.
[10] F. Milletari, N. Navab, S. A. Ahmadi, V-Net: Fully convolutional neural networks for volumetric medical image segmentation, 2016. doi:10.1109/3DV.2016.79.
[11] Z. Zhou, M. M. R. Siddiquee, N. Tajbakhsh, J. Liang, UNet++: A nested U-Net architecture for medical image segmentation, volume 11045 LNCS, 2018. doi:10.1007/978-3-030-00889-5_1.
[12] Ö. Çiçek, A. Abdulkadir, S. S. Lienkamp, T. Brox, O. Ronneberger, 3D U-Net: Learning dense volumetric segmentation from sparse annotation, volume 9901 LNCS, 2016. doi:10.1007/978-3-319-46723-8_49.
[13] Z. Zhang, Q. Liu, Y. Wang, Road extraction by deep residual U-Net, IEEE Geoscience and Remote Sensing Letters 15 (2018) 749–753. doi:10.1109/LGRS.2018.2802944.
[14] K. Luu, C. Zhu, C. Bhagavatula, T. H. N. Le, M. Savvides, A deep learning approach to joint face detection and segmentation, 2016. doi:10.1007/978-3-319-25958-1_1.
[15] Ç. Kaymak, A. Uçar, A brief survey and an application of semantic image segmentation for autonomous driving, volume 136, 2019. doi:10.1007/978-3-030-11479-4_9.
[16] S. Ceramicola, F. Fanucci, C. Corselli, E. Colizza, D. Morelli, A. Cova, A. Savini, D. Praeg, M. Zecchin, A. Caburlotto, O. Candoni, D. Civile, M. Coste, D. Cotterle, S. Critelli, A. Cuppari, M. Deponte, R. Dominici, E. Forlin, E. Gordini, C. Tessarolo, F. Marchese, F. Muto, S. Palamara, R. Riccardo, L. Facchin, R. Romeo, Calabria Ionica (Tavola 8, Fogli 35-39), in: Chiocci F.L., Budillon F., Ceramicola S., Gamberi F., Orrù P. (Eds.), Atlante dei lineamenti di pericolosità geologica dei mari italiani - Risultati del progetto MaGIC, CNR Edizioni, 2021, pp. 174–195.
[17] S. Ceramicola, M. R. Senatore, A. Cova, A. Meo, M. Zecchin, D. Praeg, C. Diego, C. Salvatore, A. Caburlotto, D. Civile, M. Coste, R. Dominici, E. Forlin, F. Muto, A. Bosman, C. Francesco Latino, E. Lai, D. Casalbore, E. Morelli, O. Candoni, E. Gordini, D. Michele, R. Riccardo, L. Facchin, R. Romeo, Golfo di Taranto (Tavola 9, Fogli 40-46), in: Chiocci F.L., Budillon F., Ceramicola S., Gamberi F., Orrù P. (Eds.), Atlante dei lineamenti di pericolosità geologica dei mari italiani - Risultati del progetto MaGIC, CNR Edizioni, 2021, pp. 196–225.
[18] F. L. Chiocci, F. Budillon, S. Ceramicola, F. Gamberi, P. Orrù (Eds.), Atlante dei lineamenti di pericolosità geologica dei mari italiani - Risultati del progetto MaGIC, CNR Edizioni, 2021.
[19] A. Milesi, Pytorch-UNet: PyTorch implementation of the U-Net for image semantic segmentation with high quality images, 2024. URL: https://github.com/milesial/Pytorch-UNet.
[20] R. Zhao, B. Qian, X. Zhang, Y. Li, R. Wei, Y. Liu, Y. Pan, Rethinking Dice loss for medical image segmentation, 2020. doi:10.1109/ICDM50108.2020.00094.