=Paper=
{{Paper
|id=Vol-3207/paper7
|storemode=property
|title=Model-Centric vs Data-Centric Deep Learning Approaches for Landslide Detection
|pdfUrl=https://ceur-ws.org/Vol-3207/paper7.pdf
|volume=Vol-3207
|authors=Hejar Shahabi,Omid Ghorbanzadeh
|dblpUrl=https://dblp.org/rec/conf/cdceo/ShahabiG22
}}
==Model-Centric vs Data-Centric Deep Learning Approaches for Landslide Detection==
Hejar Shahabi¹, Omid Ghorbanzadeh²

¹ Centre Eau Terre Environnement, Institut National de la Recherche Scientifique (INRS), Quebec City, QC G1K 9A9, Quebec, Canada

² Institute of Advanced Research in Artificial Intelligence (IARAI), Vienna, Austria

CDCEO 2022: 2nd Workshop on Complex Data Challenges in Earth Observation, July 25, 2022, Vienna, Austria

Email: hejar.shahabi@inrs.ca (H. Shahabi); omid.ghorbanzadeh@iarai.ac.at (O. Ghorbanzadeh)

ORCID: 0000-0002-3275-8436 (H. Shahabi); 0000-0002-9664-8770 (O. Ghorbanzadeh)

© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

===Abstract===
The implementation of deep learning (DL) models has significantly improved the accuracy and automation of remote sensing (RS) image classification tasks such as landslide detection. DL models learn features independently and have strong computing capabilities, and they have attracted continuous attention through numerous model-centric modifications and enhancements. In practice, however, the impact of the quality of training samples on classification performance is usually ignored. This study first follows a model-centric approach in which a U-Net network is regarded as the baseline model. A ResNet-34 model is used to optimize the baseline model, and the optimized model is further enhanced by adding an attention mechanism. In the data-centric approach, by contrast, the baseline model is left unchanged and is only trained on enhanced training samples. Our data-centric approach increased the F1-score by over 13 percentage points, which is the same increase as that of the most sophisticated and complex model-centric approach.

Keywords: Attention mechanism, deep learning, object-based image analysis (OBIA), landslide extraction

===1. Introduction===
As remote sensing (RS) imagery has become the basis of data collection across various fields, such as agriculture, environment, and disaster risk management, critical information can be extracted from such multi-temporal and multi-resolution images through image classification, object detection, and time series analysis [1]. However, one of the most critical aspects of such data processing is selecting the appropriate method to use. For many years, the remote sensing community used artificial neural networks (ANN) such as the multilayer perceptron (MLP) as a conventional method of image classification [2]. Until recently, however, conventional machine learning models like support vector machines and ensemble classifiers such as random forests had nearly replaced ANN models for tasks like image classification and change detection, because they can handle high-dimensional data and provide acceptable performance even with limited labeled data [2]. Recent developments in computer vision and graphics processing units (GPUs) have led to a rise in the popularity of deep neural networks within the RS community for different tasks, as they have generated robust results in image classification and image segmentation [3]. Several image classification models have been developed, such as AlexNet, VGG net, GoogleNet, ResNet, and DenseNet [4].

In RS tasks such as image classification, however, the goal usually is to label every pixel within an image, and DL semantic segmentation techniques such as Fully Convolutional Networks (FCN) are used to achieve that goal [5, 6]. The U-Net [7] algorithm, which uses an encoder-decoder architecture to improve on FCN, is widely used by the RS community for image segmentation and object detection, although it was initially designed for medical image segmentation. Other region-based models used to detect and segment objects, including Faster R-CNN and Mask R-CNN, have been successfully applied to landslide inventory mapping [8]. These segmentation models have also become more sophisticated by incorporating concepts like attention mechanisms or by incorporating backbone models and weights such as the residual networks (ResNets) [9]. Attention mechanisms focus on certain features or regions while overlooking others, similar to the way human vision works.

Some current studies have combined other approaches with DL models to increase transferability [10] and to achieve higher accuracy in RS classification tasks and in landslide detection. Ghorbanzadeh et al. [11] followed a model-centric strategy by integrating the heat map resulting from a ResU-Net network with knowledge-based object-based image analysis (OBIA) for landslide detection. Their experiments were based on Sentinel-2 satellite imagery. Their evaluation indicated that integrating OBIA with U-Net resulted in an F1 score nearly 8% higher than that of the baseline ResU-Net model for the landslide detection task. In another study, Dong et al. [12] improved the U-Net's ability for landslide detection by adding a multi-scale feature-fusion module, a residual attention network, and a data-dependent upsampling method. Their enhanced network, named L-Unet, increased the F1 score by more than 3%. Ghorbanzadeh et al. [13], by contrast, followed a data-centric strategy, preparing six training data sets based on optical data and different topographic information to evaluate the performance of a deep-learning convolutional neural network (CNN) for landslide detection in a study area in Nepal. Their most remarkable improvement increased the mIoU by more than 17 percentage points. Yang et al. [14] enhanced the training samples by developing a background-enhancement technique that helps distinguish landslides from similar background features when training the Mask R-CNN model. In their experiment, the F1 score was significantly higher (22.38%) than the one obtained using only satellite images as input data.

A review of the literature on applying DL to RS applications, including landslide detection, shows that the focus is mainly on model-centric approaches such as adapting and comparing architectures and developing advanced models, while the input data is given a less important role in model performance. In the model-centric approach, the input data stays the same, and the main effort goes into code and experimental research to improve model performance. This involves selecting the best model architecture and training process from various possibilities [15]. In the data-centric approach, on the other hand, the goal is to systematically alter, synthesize, and improve datasets to increase the model's accuracy with a fixed architecture [16]. To the best of our knowledge, no study has comprehensively compared or discussed model-centric and data-centric approaches for landslide detection. Since most DL studies for RS tasks are model-centric, in this experimental case study we aim to compare and evaluate the performance of model-centric and data-centric approaches in landslide detection using Sentinel-2 imagery, ALOS elevation data, and the U-Net segmentation method.

===2. Study area and data set===
A magnitude 6.6 earthquake struck Eastern Iburi, Hokkaido, Japan, on September 6, 2018. The earthquake caused extensive damage, including power cuts, damage to transmission and distribution networks, and damage to the Tomato-Atsuma Power Station, which supplies electricity to Hokkaido Island. There were 41 fatalities in all, 36 of which were caused by landslides triggered by the earthquake. Typhoon Jebi brought torrential rains to the region just a day before the earthquake, which made hills unstable and prone to landslides. This caused nearly 5600 landslides in the area. The result was a significant number of shallow landslides as well as a few planar and spoon-type deep landslides. A landslide inventory map was generated and updated by the Geographical Survey Institute (GSI) of Japan using aerial orthophotos, very high-resolution aerial images, and a 10 m resolution digital elevation model (DEM) [9].

As part of this study, an inventory map (around 4950 landslides in an area of 43 km²) in ESRI shapefile format, provided by GSI, was acquired and used as ground truth for further analysis. Landslides were detected using Sentinel-2 multispectral imagery. We applied atmospheric corrections to the images using Sen2Cor [17], a SNAP plugin. We generated a slope layer, an important data set for mapping landslides [3], from the 12-m ALOS DEM. Sentinel-2 images range in spatial resolution from 10 to 60 meters, so we selected only bands 2-4 and 8, which have a 10-meter resolution, and excluded the other bands. The generated slope layer was resampled to 10 m and stacked with the Sentinel-2 bands. Figure 1 shows the study area and landslide inventory.

Figure 1: The location of the study area. (Map panels: Hokkaido, Japan; training site; testing site; mapped landslides.)

===3. Methodology===

====3.1. Model-centric approach====

=====3.1.1. U-Net=====
As stated in the introduction, U-Net was initially introduced by Ronneberger et al. [7] for the segmentation of biomedical images. Because of its robust performance, it has been used for a wide range of image segmentation problems, including remote sensing image classification and object detection [9]. The encoder and the decoder are the two main components of the U-Net architecture. By applying convolutions, activation functions, and pooling operations, the encoder learns an abstract representation of the input image. Pooling reduces computational cost in this part, but spatial information is lost [7]. Through operations such as transposed convolution, the decoder attempts to restore the abstracted representation to its original size. The output of each transposed convolution is concatenated with the skip-connection feature map from the encoder at the same level. While the number of feature channels is doubled at each down-sampling step in the encoder, it is halved at each step in the decoder until, in the final layer, a 1 × 1 convolution (in this case with sigmoid activation) maps the channel patterns to the given number of classes. The architecture used by Ghorbanzadeh et al. [9] is implemented for this study.

=====3.1.2. Residual network (ResNet)=====
ResNet serves as a backbone for many computer vision tasks, including remote sensing image classification and segmentation [9]. In 2015, it won the ImageNet challenge with extremely deep neural networks (more than 150 layers). ResNet's success is mainly due to its novel architecture, which introduced skip connections that add the output of an earlier layer to that of a later layer. This alternative shortcut path allows the gradient to flow through the network and mitigates the vanishing-gradient problem. The baseline U-Net design is used in this research, but with ResNet-34 acting as the backbone.

=====3.1.3. Attention mechanism=====
Bahdanau et al. [18] introduced the attention mechanism to enhance the performance of encoder-decoder models for machine translation. Its variants were later used in other applications, including RS applications. Through a weighted combination of the encoded input data, the decoder has access to the most valuable parts of the input sequence; the most relevant parts are associated with the highest weights. A detailed description of the attention mechanism in CNN-based models for RS applications is provided by [1]. In this case, an attention mechanism is added to the U-Net model with the ResNet-34 backbone for landslide detection.

====3.2. Data-centric approach====
In this approach, the U-Net architecture applied by [9], which was introduced as our baseline model, is used without any model enhancement. Instead, the data is enhanced by synthesizing and augmenting training samples. Along with the 10-m Sentinel-2 images and the slope layer, the normalized difference vegetation index (NDVI) is generated and fed into the model as well. This index helps discriminate landslides that removed the surface vegetation from background objects [14, 13]. Moreover, following the OBIA concept, image objects are generated with multi-resolution segmentation (MRS), a bottom-up segmentation technique based on pairwise region merging in which the size of the generated objects is controlled by the scale factor [19]. To avoid errors such as over-segmentation and under-segmentation, the object fitness index (OFI) introduced by [20] is applied to guarantee the quality of the objects. The mean values of the objects for each image band, including NDVI and slope, are calculated and then exported in raster format to be stacked with the other data as input to the U-Net model.

===4. Experimental results===
In this study, tiles of 128 × 128 pixels without any overlap were used as input to all applied models. The accuracy of each model was validated on 30% of the training data, selected at random. As our task is binary classification, we applied the sigmoid activation function in the last layer and the rectified linear unit (ReLU) in the earlier layers. All models were set to train for 100 epochs, but a function was defined to save the model at the epoch with the minimum loss. Our DL models were all implemented in Python using the TensorFlow API and the Keras library; for the segmentation part we used eCognition software. As previously noted, data augmentation was only applied in the data-centric approach. Each model was evaluated using the standard accuracy assessment metrics of precision, recall, and F1-score.

A loss of 0.14 and an F1-score of 0.88 were achieved during the training of the conventional U-Net model, while these values were 0.26 and 0.68 for the validation data. A total of almost 4 million parameters were trained in this model. The trained model was then used to detect landslides in the test area; the accuracy assessment showed precision, recall, and F1-score values of 0.76, 0.48, and 0.59, respectively.

Next, the same U-Net was trained with ResNet-34 as a backbone model. The total number of parameters for this model was 23 million, but only 1.8 million parameters were trained; pre-trained weights were applied for the rest. The training loss and F1-score for U-Net with the ResNet-34 backbone were 0.13 and 0.93, respectively, and for the validation data set the values were 0.18 and 0.90. Although the model's accuracy on the training and validation data is quite close, its performance in the test area was not much higher. Precision, recall, and F1-score for U-Net with the ResNet backbone were 0.67, 0.72, and 0.70, respectively.

Finally, the most complex version of U-Net, which includes both the ResNet backbone and the attention mechanism and has 10.5 million parameters, was trained. The loss and F1-score were 0.05 and 0.95 for training and 0.08 and 0.91 for validation, respectively. However, as with the U-Net with ResNet backbone, adding the attention mechanism did not significantly improve performance in the test area: values of 0.88, 0.62, and 0.72 were achieved for precision, recall, and F1-score, respectively. This model provided the best precision among all models.

For the data-centric approach, only the baseline U-Net model was used, with the same architecture as in the first scenario of the model-centric approach. The input data, however, was modified. The OBIA features were stacked with the other images, and then data augmentation techniques such as flipping (horizontally and vertically) and rotating (90, 180, 270 degrees) were performed. This resulted in 10395 image patches being fed to the network instead of 2079 image patches. Data augmentation was not applied to the validation data. The U-Net model achieved a loss of 0.13 and an F1-score of 0.93 during training. With the validation data, the values were 0.18 and 0.90. The trained model was used to predict the test data, resulting in precision, recall, and F1-score of 0.71, 0.73, and 0.72. Figures 2 and 3 show, respectively, the training curves of the models and the landslide prediction maps for the model-centric and data-centric approaches.

Figure 2: Training metrics graphs for model-centric approaches and data-centric approach. (Panels: U-Net; ResNet(34)U-Net; ResNet(34)U-Net + attention mechanism; Data-centric (U-Net). Curves: training loss, validation loss, training F1-score, validation F1-score.)

===5. Discussions===
In the model-centric approach, using the same labeled data while varying the architecture and parameters resulted in varying accuracies in the training and validation phases. The general U-Net model performed relatively poorly (underfitting) during training compared with the others. U-Net based on the ResNet backbone performed better during training, while the best training performance was achieved with U-Net based on ResNet and the attention mechanism. The loss values for both training and validation data indicate that higher accuracy during training can be achieved by increasing the number of model parameters. Such high accuracy during training, however, can be a sign of overfitting. Therefore, the prediction results of each model were evaluated against the inventory data. The general U-Net provided the lowest accuracy, with a recall of 0.48 and an F1-score of 0.59, which means it was able to detect only 48% of the landslides. U-Net with the ResNet backbone performed much better than the general U-Net model, with a recall of 0.72 and an F1-score of 0.70. Finally, within the model-centric approach, the best performance according to F1-score was achieved by the U-Net model based on the ResNet backbone and attention mechanism: the score was 0.73, and although the recall was 0.62, the model provided the highest precision of 0.88.

In the data-centric approach, the conventional U-Net with the same architecture was used as the fixed model, while the data went through augmentation. Training the model on synthesized and augmented data provided strong performance on both the training and validation data. Evaluating the predicted landslides against the inventory map gave an F1-score of 0.72, with good consistency between precision (0.71) and recall (0.73). This experiment clearly shows that by generating or synthesizing data and augmenting the available data, higher accuracy can be achieved even with a simple model architecture. For example, the performance and accuracy of the data-centric U-Net model were quite similar to those of an advanced U-Net model with a ResNet backbone and attention mechanism, and in terms of F1-score, the difference in both models' performance was 1%.

{| class="wikitable"
|+ Table 1: Quantitative evaluation of models.
! Model !! Training loss !! Validation loss !! Precision !! Recall !! F1-score
|-
| U-Net || 0.14 || 0.26 || 0.76 || 0.48 || 0.59
|-
| ResNet(34)U-Net || 0.13 || 0.18 || 0.67 || 0.72 || 0.70
|-
| ResNet(34)U-Net + attention mechanism || 0.05 || 0.08 || 0.88 || 0.62 || 0.72
|-
| Data-centric (U-Net) || 0.13 || 0.17 || 0.71 || 0.73 || 0.72
|}

Figure 3: Landslide detection results based on the model-centric approaches and data-centric approach. (Panels: U-Net; ResNet(34)U-Net; ResNet(34)U-Net + attention mechanism; Data-centric (U-Net); inventory map.)

===6. Conclusions===
In this study, the goal was to compare two approaches in DL, namely model-centric and data-centric, for the remote sensing application of landslide detection. Based on our accuracy assessment, we conclude that the accuracy of landslide detection can be improved to a certain extent by optimizing either the network structure or the training data set. We showed that enhancing the sample sets in the data-centric approach, possibly by adding additional information, is an optimization at the data level that is applicable to any DL model, including common ones like the U-Net model. A direction worth pursuing is how to enhance the landslide detection accuracy of DL results by modifying the training samples before or during the feature learning step. We developed a data-centric approach that includes different measures, and we compared the results with those obtained from complex network structures to show the potential of data optimization. The application of popular segmentation models like FCN, SegNet, DeepLab, and ASPP, as well as the impact of the data-centric approach on model transferability to new areas, is the focus of our next work.

===Acknowledgments===
This research was funded by the Institute of Advanced Research in Artificial Intelligence (IARAI) GmbH, Vienna, Austria.

===References===
[1] S. Ghaffarian, J. Valente, M. Van Der Voort, B. Tekinerdogan, Effect of attention mechanism in deep learning-based remote sensing image processing: A systematic literature review, Remote Sensing 13 (2021) 2965.

[2] L. Ma, Y. Liu, X. Zhang, Y. Ye, G. Yin, B. A. Johnson, Deep learning in remote sensing applications: A meta-analysis and review, ISPRS Journal of Photogrammetry and Remote Sensing 152 (2019) 166–177.

[3] O. Ghorbanzadeh, Y. Xu, P. Ghamisi, M. Kopp, D. Kreil, Landslide4Sense: Reference benchmark data and deep learning models for landslide detection, arXiv preprint arXiv:2206.00515 (2022).

[4] Z. Ma, G. Mei, Deep learning for geological hazards analysis: Data, models, applications, and opportunities, Earth-Science Reviews 223 (2021) 103858.

[5] E. Maggiori, Y. Tarabalka, G. Charpiat, P. Alliez, Convolutional neural networks for large-scale remote-sensing image classification, IEEE Transactions on Geoscience and Remote Sensing 55 (2016) 645–657.

[6] Y. Xu, P. Ghamisi, Region-growing fully convolutional networks for hyperspectral image classification with point-level supervision (2021).

[7] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234–241.

[8] S. L. Ullo, A. Mohan, A. Sebastianelli, S. E. Ahamed, B. Kumar, R. Dwivedi, G. R. Sinha, A new Mask R-CNN-based method for improved landslide detection, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2021) 3799–3810.

[9] O. Ghorbanzadeh, A. Crivellari, P. Ghamisi, H. Shahabi, T. Blaschke, A comprehensive transferability evaluation of U-Net and ResU-Net for landslide detection from Sentinel-2 data (case study areas from Taiwan, China, and Japan), Scientific Reports 11 (2021) 1–20.

[10] M. Zhang, H. Singh, L. Chok, R. Chunara, Segmenting across places: The need for fair transfer learning with satellite imagery, arXiv preprint arXiv:2204.04358 (2022).

[11] O. Ghorbanzadeh, H. Shahabi, A. Crivellari, S. Homayouni, T. Blaschke, P. Ghamisi, Landslide detection using deep learning and object-based image analysis, Landslides (2022) 1–11.

[12] Z. Dong, S. An, J. Zhang, J. Yu, J. Li, D. Xu, L-Unet: A landslide extraction model using multi-scale feature fusion and attention mechanism, Remote Sensing 14 (2022). URL: https://www.mdpi.com/2072-4292/14/11/2552. doi:10.3390/rs14112552.

[13] O. Ghorbanzadeh, S. R. Meena, H. S. S. Abadi, S. T. Piralilou, L. Zhiyong, T. Blaschke, Landslide mapping using two main deep-learning convolution neural network streams combined by the Dempster–Shafer model, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 14 (2020) 452–463.

[14] R. Yang, F. Zhang, J. Xia, C. Wu, Landslide extraction using Mask R-CNN with background-enhancement method, Remote Sensing 14 (2022) 2206.

[15] L. J. Miranda, Towards data-centric machine learning: a short review, ljvmiranda921.github.io (2021).

[16] I. Pan, L. R. Mason, O. K. Matar, Data-centric engineering: integrating simulation, machine learning and statistics. Challenges and opportunities, Chemical Engineering Science 249 (2022) 117271.

[17] M. Main-Knorn, B. Pflug, J. Louis, V. Debaecker, U. Müller-Wilm, F. Gascon, Sen2Cor for Sentinel-2, in: Image and Signal Processing for Remote Sensing XXIII, volume 10427, International Society for Optics and Photonics, 2017, p. 1042704.

[18] D. Bahdanau, K. Cho, Y. Bengio, Neural machine translation by jointly learning to align and translate, arXiv preprint arXiv:1409.0473 (2014).

[19] T. Blaschke, Object based image analysis for remote sensing, ISPRS Journal of Photogrammetry and Remote Sensing 65 (2010) 2–16.

[20] S. Tavakkoli Piralilou, H. Shahabi, B. Jarihani, O. Ghorbanzadeh, T. Blaschke, K. Gholamnia, S. R. Meena, J. Aryal, Landslide detection using multi-scale image segmentation and different machine learning models in the Higher Himalayas, Remote Sensing 11 (2019) 2575.
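The experimental results rank the models by precision, recall, and F1-score on the test area. The paper does not spell out the F1 formula, so the sketch below assumes the standard definition (the harmonic mean of precision and recall) and recomputes F1 from the precision and recall values reported for each model. The names `f1_score` and `reported` are illustrative and do not come from the authors' code.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (standard F1 definition, assumed here)."""
    if precision + recall == 0:
        return 0.0  # avoid division by zero when nothing was detected
    return 2 * precision * recall / (precision + recall)


# (precision, recall) pairs as reported for the test area in the paper.
reported = {
    "U-Net": (0.76, 0.48),
    "ResNet(34)U-Net": (0.67, 0.72),
    "ResNet(34)U-Net + attention mechanism": (0.88, 0.62),
    "Data-centric (U-Net)": (0.71, 0.73),
}

for model, (precision, recall) in reported.items():
    print(f"{model}: F1 = {f1_score(precision, recall):.3f}")
```

Because the published precision and recall are rounded to two decimals, a recomputed F1 can differ from the reported one in the last digit.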
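The data-centric experiment augments the training patches by flipping (horizontally and vertically) and rotating (90, 180, 270 degrees), reporting 10395 patches instead of 2079. The following is a minimal NumPy sketch of those five transforms, not the authors' implementation. The function name `augment_pair`, the (height, width, channels) tile layout, and the channel count are assumptions. The source does not say whether the untransformed patches are counted as well; the sketch returns only the transformed copies, which matches the reported factor of five.

```python
import numpy as np


def augment_pair(image: np.ndarray, mask: np.ndarray):
    """Return five (image, mask) pairs: two flips and three rotations.

    `image` is assumed to be (H, W, C) and `mask` (H, W); both receive the
    same geometric transform so that labels stay aligned with pixels.
    """
    transforms = [
        np.fliplr,                                     # horizontal flip
        np.flipud,                                     # vertical flip
        lambda a: np.rot90(a, k=1, axes=(0, 1)),       # rotate by 90 degrees
        lambda a: np.rot90(a, k=2, axes=(0, 1)),       # rotate by 180 degrees
        lambda a: np.rot90(a, k=3, axes=(0, 1)),       # rotate by 270 degrees
    ]
    return [(t(image), t(mask)) for t in transforms]


# Hypothetical 128 x 128 tile with 6 channels (channel count is an assumption).
rng = np.random.default_rng(0)
tile = rng.random((128, 128, 6), dtype=np.float32)
label = (rng.random((128, 128)) > 0.9).astype(np.uint8)

pairs = augment_pair(tile, label)
print(len(pairs), "augmented copies per patch;", 2079 * len(pairs), "patches from 2079")
```

Applying the same transform to the image and its mask is the essential design point: augmenting only the image would misalign the landslide labels.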