=Paper= {{Paper |id=Vol-3207/paper7 |storemode=property |title=Model-Centric vs Data-Centric Deep Learning Approaches for Landslide Detection |pdfUrl=https://ceur-ws.org/Vol-3207/paper7.pdf |volume=Vol-3207 |authors=Hejar Shahabi,Omid Ghorbanzadeh |dblpUrl=https://dblp.org/rec/conf/cdceo/ShahabiG22 }} ==Model-Centric vs Data-Centric Deep Learning Approaches for Landslide Detection== https://ceur-ws.org/Vol-3207/paper7.pdf
Model-Centric vs Data-Centric Deep Learning Approaches for Landslide Detection
Hejar Shahabi¹, Omid Ghorbanzadeh²
¹ Centre Eau Terre Environnement, Institut National de la Recherche Scientifique (INRS), Quebec City, QC G1K 9A9, Quebec, Canada
² Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria


Abstract
The implementation of deep learning (DL) models has significantly improved the accuracy and automation of remote sensing (RS) image classification tasks, such as landslide detection. The reason is that DL models have independent feature learning and strong computing capabilities, and they have attracted continuous attention through numerous model-centric modifications and enhancements. In practice, however, the impact of the quality of training samples on classification performance is usually ignored. This study uses a model-centric approach in which a U-Net network is regarded as the baseline model. A ResNet-34 model is used to optimize the baseline model, and the optimized model is further enhanced by adding an attention mechanism. In the data-centric approach, by contrast, the baseline model is trained only on enhanced training samples. Our data-centric approach increased the F1-score by over 13 percentage points, which is the same increase as that achieved by the most sophisticated and complex model-centric approach.

Keywords
Attention mechanism, deep learning, object-based image analysis (OBIA), landslide extraction



CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria
hejar.shahabi@inrs.ca (H. Shahabi); omid.ghorbanzadeh@iarai.ac.at (O. Ghorbanzadeh)
ORCID: 0000-0002-3275-8436 (H. Shahabi); 0000-0002-9664-8770 (O. Ghorbanzadeh)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

1. Introduction

As remote sensing (RS) imagery has become the basis of data collection across various fields, such as agriculture, environment, and disaster risk management, critical information can be extracted from such multi-temporal and multi-resolution images through image classification, object detection, and time series analysis [1]. However, one of the most critical aspects of such data processing is selecting the appropriate method. For many years, the remote sensing community used artificial neural networks (ANN) such as the multilayer perceptron (MLP) as a conventional method of image classification [2]. Until recently, however, conventional machine learning models such as support vector machines and ensemble classifiers such as random forests had nearly replaced ANN models for tasks like image classification and change detection, because they can handle high-dimensional data and provide acceptable performance even with limited labeled data [2]. Recent developments in computer vision and graphics processing units (GPUs) have led to a rise in the popularity of deep neural networks within the RS community for different tasks, as they have generated robust results in image classification and image segmentation [3]. Several image classification models have been developed, such as AlexNet, VGGNet, GoogLeNet, ResNet, and DenseNet [4].

In RS tasks such as image classification, however, the goal usually is to label every pixel within an image, and DL semantic segmentation techniques such as Fully Convolutional Networks (FCN) are used to achieve that goal [5, 6]. The U-Net [7] algorithm, which uses an encoder-decoder architecture to improve on FCN, is widely used by the RS community for image segmentation and object detection, although it was initially designed for medical image segmentation. Region-based models that detect and segment objects, including Faster R-CNN and Mask R-CNN, have also been successfully applied to landslide inventory mapping [8]. These segmentation models have become more sophisticated by incorporating concepts such as attention mechanisms and/or backbone models and weights such as residual networks (ResNets) [9]. Attention mechanisms focus on certain features or regions while overlooking others, much as human vision does.

Some recent studies have combined other approaches with DL models to increase transferability [10] and to achieve higher accuracy in RS classification tasks and in landslide detection. Ghorbanzadeh et al. [11] followed a model-centric strategy by integrating the heat map resulting from a ResU-Net network with knowledge-based object-based image analysis (OBIA) for landslide detection. Their experiments were based on Sentinel-2 satellite imagery. Their evaluation indicated that integrating OBIA with U-net resulted in an F1 score nearly 8% higher than that of the baseline ResU-Net model for the landslide detection task. In another study, Dong et al. [12] improved U-Net's ability to detect landslides by adding a multi-scale feature-fusion module, a residual attention network, and a data-dependent upsampling method. Their enhanced network, named L-Unet, increased the F1 score by more than 3%. Ghorbanzadeh et al. [13], in contrast, followed a data-centric strategy by preparing six training data sets based on optical data and different topographic information to evaluate the performance of a deep-learning convolutional neural network (CNN) for landslide detection in a study area in Nepal. Their most remarkable improvement increased the mIOU by more than 17 percentage points. Yang et al. [14] enhanced the training samples by developing a background-enhancement technique that helps distinguish landslides from similar background features when training the Mask R-CNN model. In their experiment, the F1 score was significantly higher (22.38%) than the one obtained using only satellite images as input data.

A review of the literature on applying DL to RS applications, including landslide detection, shows that the focus is mainly on model-centric approaches such as adapting and comparing architectures and developing advanced models, while the input data is given a less important role in model performance. In the model-centric approach, the input data stays the same, and the main effort goes into code and experimental research to improve model performance. This involves selecting the best model architecture and training process from various possibilities [15]. In the data-centric approach, on the other hand, the goal is to systematically alter, synthesize, and improve datasets to increase the model's accuracy with a fixed architecture [16]. To the best of our knowledge, no study has comprehensively compared or discussed model-centric vs. data-centric approaches for landslide detection. Since most DL studies for RS tasks are model-centric, in this experimental case study we aim to compare and evaluate the performance of model-centric and data-centric approaches in landslide detection using Sentinel-2 imagery, ALOS elevation data, and the U-Net segmentation method.

2. Study area and data set

A magnitude 6.6 earthquake struck Eastern Iburi, Hokkaido, Japan, on September 6, 2018. The event caused extensive damage, including power cuts, damage to transmission and distribution networks, and damage to the Tomato-Atsuma Power Station, which supplies electricity to Hokkaido Island. There were 41 fatalities in all, 36 of which were caused by landslides triggered by the earthquake. Typhoon Jebi brought torrential rains to the region just a day before the earthquake, which made hills unstable and prone to landslides. This caused nearly 5600 landslides in the area. The result was a significant number of shallow landslides as well as a few planar and spoon-type deep landslides. A landslide inventory map was generated and updated by the Geographical Survey Institute (GSI) of Japan using aerial orthophotos, very high-resolution aerial images, and a 10 m resolution digital elevation model (DEM) [9].

As part of this study, an inventory map (around 4950 landslides in an area of 43 km²) in shape file format provided by GIS in ESRI was acquired and used as ground truth for further analysis. Landslides were detected using Sentinel-2 multispectral imagery in this case. We applied atmospheric corrections to the images using Sen2Cor [17], a SNAP plugin. We generated slope layers using the 12-m ALOS DEM, which is an important data set for mapping landslides [3]. Sentinel-2 images range in spatial resolution from 10 to 60 meters, so we selected only bands 2-4 and 8, which have a 10-meter resolution, and excluded the other bands. The generated slope layer was resampled to 10 m and stacked with the Sentinel-2 bands. Figure 1 shows the study area and landslide inventory.

Figure 1: The location of the study area. (Map panels: Hokkaido, Japan; training site and testing site with mapped landslides.)
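The slope generation and layer stacking described in this section can be illustrated with a minimal NumPy sketch. It is not the authors' processing chain: the function names `slope_degrees` and `stack_layers` are hypothetical, the DEM is synthetic, the band arrays are zero-filled placeholders for Sentinel-2 bands 2, 3, 4 and 8, and the resampling of the slope layer from 12 m to 10 m is assumed to have been done beforehand.

```python
import numpy as np

def slope_degrees(dem, cell_size):
    """Slope in degrees from a DEM, using finite differences (hypothetical helper)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64), cell_size)
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

def stack_layers(bands, slope):
    """Stack 2-D band arrays and a slope layer into one (H, W, C) array."""
    layers = list(bands) + [slope]
    shape = layers[0].shape
    if any(layer.shape != shape for layer in layers):
        raise ValueError("all layers must share the same grid")
    return np.stack(layers, axis=-1)

# Synthetic DEM: a plane that rises 10 m per 10 m cell along x, i.e. a 45 degree slope.
dem = np.tile(np.arange(6, dtype=float) * 10.0, (6, 1))
slope = slope_degrees(dem, cell_size=10.0)

# Placeholders standing in for the four 10 m Sentinel-2 bands (B2, B3, B4, B8).
bands = [np.zeros((6, 6)) for _ in range(4)]
stack = stack_layers(bands, slope)  # shape (6, 6, 5): four bands plus slope
```

Keeping the shape check inside `stack_layers` makes a grid mismatch (for example, a slope layer that was not resampled to the 10 m grid) fail early rather than silently misaligning channels.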
3. Methodology

3.1. Model-centric approach

3.1.1. U-Net

As stated in the introduction, U-Net was initially introduced for the segmentation of biomedical images by Ronneberger et al. [7]. Because of its robust performance, it has been used for a wide range of image segmentation problems, including remote sensing image classification and object detection [9]. The U-Net architecture has two main components, an encoder and a decoder. By applying convolution, activation functions, and pooling operations, the encoder learns an abstract representation of the input image. Pooling reduces the computational cost in this part, but spatial information is lost [7]. Through operations such as transposed convolution, the decoder attempts to restore the abstracted representation to the original size. The output of each transposed convolution is concatenated with the skip-connection feature map from the encoder at the same level. While the number of feature channels is doubled at each down-sampling step in the encoder, it is halved in the decoder until, in the final layer, a 1x1 convolution (in this case with sigmoid activation) maps the channel patterns to the given number of classes. The architecture used by Ghorbanzadeh et al. [9] will be implemented for this study.

3.1.2. Residual network (ResNet)

ResNet serves as a backbone for many computer vision tasks, including remote sensing image classification and segmentation [9]. In 2015, it won the ImageNet challenge with extremely deep neural networks (more than 150 layers). ResNet's success is mainly due to its novel architecture that introduced skip connections for the first time, which add the output from the previous layer to the layer ahead. This alternative shortcut path allows the gradient to flow through and prevents the vanishing gradient problem. The baseline U-Net design is used in this research, but with ResNet-34 acting as a backbone.

3.1.3. Attention mechanism

Bahdanau et al. [18] introduced the attention mechanism to enhance the performance of encoder-decoder models for machine translation. Later, its variants were used in other applications, including RS. Through a weighted combination of the encoded input data, the decoder has access to the most valuable parts of the input sequence; thus, the most relevant parts are associated with the highest weights. A detailed description of the attention mechanism in CNN-based models for RS applications is provided by [1]. In this case, an attention mechanism is added to the U-Net model with the ResNet-34 backbone for landslide detection.

3.2. Data-centric approach

In this approach, the U-Net architecture applied by [9], which was introduced as our baseline model, is used without any model enhancement. Instead, the data is enhanced by synthesizing and augmenting training samples. Along with the 10-m Sentinel-2 images and the slope layer, the normalized difference vegetation index (NDVI) is generated and fed into the model as well. This index is helpful for discriminating landslides that removed the surface vegetation from the background objects [14, 13]. Moreover, image objects are generated using the OBIA concept and multi-resolution segmentation (MRS), a bottom-up segmentation technique based on a pairwise region-merging approach in which the size of the generated objects is controlled by the scale factor [19]. To avoid errors such as over-segmentation and under-segmentation, an index called the object fitness index (OFI), introduced by [20], is applied to guarantee the quality of the objects. The mean values of the objects for each image band, including NDVI and slope, are calculated and then exported in raster format to be stacked with the other data as input to the U-Net model.

4. Experimental results

In this study, a tile size of 128 × 128 without any overlap was used as input to all applied models. The accuracy of each model was validated on 30% of the training data, selected at random. As our task is binary classification, we applied sigmoid as the activation function in the last layer and the rectified linear unit (ReLU) in the earlier layers. All models were set to train for 100 epochs, and a function was defined to save the model at the epoch with the minimum loss. Our DL models were all implemented in Python using the TensorFlow API and the Keras library; for the segmentation part we used eCognition software. As previously noted, data augmentation was applied only in the data-centric approach.

Each model was evaluated using the standard accuracy assessment metrics of precision, recall, and F1-score. A loss of 0.14 and an F1-score of 0.88 were achieved during the training of the conventional U-Net model, while these values were 0.26 and 0.68 for the validation data. A total of almost 4 million parameters were trained in this model. For the test area, the trained model was used to detect landslides; the accuracy assessment showed precision, recall, and F1-score values of 0.76, 0.48, and 0.59, respectively. The same U-Net was then trained with ResNet-34 as a backbone model. The total number of parameters for this model was 23 million, but only 1.8 million parameters were trained; for the rest, pre-trained weights were applied. The training loss and F1-score for U-Net with the ResNet-34 backbone were 0.13 and 0.93, respectively, and for the validation data set they were 0.18 and 0.90. Although the model's accuracy on the training and validation data is quite close, its accuracy in the test area was not much higher. Precision, recall, and F1-score for U-Net with the ResNet backbone were 0.67, 0.72, and 0.70, respectively. Finally, the most complex version of U-Net, which includes both the ResNet backbone and the attention mechanism and has 10.5 million parameters, was trained. For training, the loss and F1-score were 0.05 and 0.95, and for validation they were 0.08 and 0.91, respectively. However, as with the U-Net with the ResNet backbone, adding the attention mechanism did not significantly improve performance in the test area: values of 0.88, 0.62, and 0.72 were achieved for precision, recall, and F1-score, respectively. This model provided the best precision among all models.

For the data-centric approach, only the baseline U-Net model was used, with the same architecture as in the first scenario of the model-centric approach. However, the input data was modified. The OBIA features were stacked with the other images, and then data augmentation techniques such as flipping (horizontally and vertically) and rotating (90, 180, and 270 degrees) were applied. This resulted in 10395 image patches being fed to the network instead of 2079. Data augmentation was not applied to the validation data. The U-Net model achieved a loss of 0.13 and an F1-score of 0.93 during training; with the validation data, the values were 0.18 and 0.90. The trained model was used to predict the test data, resulting in precision, recall, and F1-score of 0.71, 0.73, and 0.72. Figures 2 and 3 show, respectively, the training curves and the landslide prediction maps for the model-centric and data-centric approaches.

Figure 2: Training metrics graphs for model-centric approaches and data-centric approach. (Panels: U-Net; ResNet(34)U-Net; ResNet(34)U-Net + attention mechanism; Data-centric (U-Net). Curves: training loss, validation loss, training F1-score, validation F1-score.)

Table 1
Quantitative evaluation of models.

Model                                    Training loss   Validation loss   Precision   Recall   F1-score
U-Net                                    0.14            0.26              0.76        0.48     0.59
ResNet(34)U-Net                          0.13            0.18              0.67        0.72     0.70
ResNet(34)U-Net + attention mechanism    0.05            0.08              0.88        0.62     0.72
Data-centric (U-Net)                     0.13            0.17              0.71        0.73     0.72

Figure 3: Landslide detection results based on the model-centric approaches and data-centric approach. (Panels: U-Net; ResNet(34)U-Net; ResNet(34)U-Net + attention mechanism; Data-centric (U-Net); reference layer: inventory map.)

5. Discussions

In the model-centric approach, using the same labeled data while varying the architecture and parameters resulted in varying accuracies in the training and validation phases. The general U-Net model performed relatively poorly (underfitting) in the training process compared to the others. U-Net with the ResNet backbone, however, performed better during training, while the best performance was achieved with U-Net based on ResNet and the attention mechanism. Loss values for both the training and validation data indicate that higher accuracy during training can be achieved by increasing the number of model parameters. However, such high accuracy throughout training can be a sign of overfitting. Therefore, the predictions of each model were evaluated against the inventory data. As expected, the general U-Net provided the lowest accuracy, with a recall of 0.48 and an F1-score of 0.60, which means that U-Net was able to detect only 48% of the landslides. U-Net with the ResNet backbone performed much better than the general U-Net model, with a recall of 0.72 and an F1-score of 0.70. Finally, among the model-centric approaches, the best F1-score was achieved by the U-Net model based on the ResNet backbone and the attention mechanism: the score was 0.73, and although the recall was 0.62, the model provided the highest precision, 0.88. In the data-centric approach, the conventional U-Net with the same architecture was used as the fixed model, while the data went through augmentation. Training the model using synthesized and augmented data gave strong performance on both the training and validation data. Evaluating the predicted landslides against the inventory map gave an F1-score of 0.72, with good consistency between precision (0.71) and recall (0.73). This experiment shows that by generating or synthesizing data and augmenting the available data, higher accuracy can be achieved even with a simple model architecture. For example, the performance and accuracy of the data-centric U-Net model were quite similar to those of the advanced U-Net model with a ResNet backbone and attention mechanism, and in terms of F1-score the difference between the two models was 1%.

6. Conclusions

In this study, the goal was to compare two approaches, model-centric and data-centric, in DL for the remote sensing application of landslide detection. Based on our accuracy assessment, we conclude that the accuracy of landslide detection can be improved to a certain extent by optimizing either the network structure or the training data set. We showed that enhancing the sample sets in the data-centric approach, possibly by adding additional information, is an optimization at the data level that is applicable to any DL model, including common ones like U-Net. A direction worth pursuing is how to enhance the landslide detection accuracy of DL results by modifying the training samples before or during the feature learning step. We developed a data-centric approach that includes different measurements, and we compared the results with those obtained from complex network structures to demonstrate the potential of data optimization. The application of popular segmentation models like FCN, SegNet, DeepLab, and ASPP, as well as the impact of the data-centric approach on model transferability to new areas, is the focus of our next work.

Acknowledgments

This research was funded by the Institute of Advanced Research in Artificial Intelligence (IARAI) GmbH, Vienna, Austria.
References

[1] S. Ghaffarian, J. Valente, M. Van Der Voort, B. Tekinerdogan, Effect of attention mechanism in deep learning-based remote sensing image processing: A systematic literature review, Remote Sensing 13 (2021) 2965.
[2] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, B. A. Johnson, Deep learning in remote sensing applications: A meta-analysis and review, ISPRS journal of photogrammetry and remote sensing 152 (2019) 166–177.
[3] O. Ghorbanzadeh, Y. Xu, P. Ghamisi, M. Kopp, D. Kreil, Landslide4sense: Reference benchmark data and deep learning models for landslide detection, arXiv preprint arXiv:2206.00515 (2022).
[4] Z. Ma, G. Mei, Deep learning for geological hazards analysis: Data, models, applications, and opportunities, Earth-Science Reviews 223 (2021) 103858.
[5] E. Maggiori, Y. Tarabalka, G. Charpiat, P. Alliez, Convolutional neural networks for large-scale remote-sensing image classification, IEEE Transactions on geoscience and remote sensing 55 (2016) 645–657.
[6] Y. Xu, P. Ghamisi, Region-growing fully convolutional networks for hyperspectral image classification with point-level supervision (2021).
[7] O. Ronneberger, P. Fischer, T. Brox, U-net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical image computing and computer-assisted intervention, Springer, 2015, pp. 234–241.
[8] S. L. Ullo, A. Mohan, A. Sebastianelli, S. E. Ahamed, B. Kumar, R. Dwivedi, G. R. Sinha, A new mask r-cnn-based method for improved landslide detection, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 3799–3810.
[9] O. Ghorbanzadeh, A. Crivellari, P. Ghamisi, H. Shahabi, T. Blaschke, A comprehensive transferability evaluation of u-net and resu-net for landslide detection from sentinel-2 data (case study areas from taiwan, china, and japan), Scientific Reports 11 (2021) 1–20.
[10] M. Zhang, H. Singh, L. Chok, R. Chunara, Segmenting across places: The need for fair transfer learning with satellite imagery, arXiv preprint arXiv:2204.04358 (2022).
[11] O. Ghorbanzadeh, H. Shahabi, A. Crivellari, S. Homayouni, T. Blaschke, P. Ghamisi, Landslide detection using deep learning and object-based image analysis, Landslides (2022) 1–11.
[12] Z. Dong, S. An, J. Zhang, J. Yu, J. Li, D. Xu, L-unet: A landslide extraction model using multi-scale feature fusion and attention mechanism, Remote Sensing 14 (2022). URL: https://www.mdpi.com/2072-4292/14/11/2552. doi:10.3390/rs14112552.
[13] O. Ghorbanzadeh, S. R. Meena, H. S. S. Abadi, S. T. Piralilou, L. Zhiyong, T. Blaschke, Landslide mapping using two main deep-learning convolution neural network streams combined by the dempster–shafer model, IEEE Journal of selected topics in applied earth observations and remote sensing 14 (2020) 452–463.
[14] R. Yang, F. Zhang, J. Xia, C. Wu, Landslide extraction using mask r-cnn with background-enhancement method, Remote Sensing 14 (2022) 2206.
[15] L. J. Miranda, Towards data-centric machine learning: a short review, ljvmiranda921.github.io (2021).
[16] I. Pan, L. R. Mason, O. K. Matar, Data-centric engineering: integrating simulation, machine learning and statistics. challenges and opportunities, Chemical Engineering Science 249 (2022) 117271.
[17] M. Main-Knorn, B. Pflug, J. Louis, V. Debaecker, U. Müller-Wilm, F. Gascon, Sen2cor for sentinel-2, in: Image and Signal Processing for Remote Sensing XXIII, volume 10427, International Society for Optics and Photonics, 2017, p. 1042704.
[18] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).
[19] T. Blaschke, Object based image analysis for remote sensing, ISPRS journal of photogrammetry and remote sensing 65 (2010) 2–16.
[20] S. Tavakkoli Piralilou, H. Shahabi, B. Jarihani, O. Ghorbanzadeh, T. Blaschke, K. Gholamnia, S. R. Meena, J. Aryal, Landslide detection using multi-scale image segmentation and different machine learning models in the higher himalayas, Remote Sensing 11 (2019) 2575.