=Paper=
{{Paper
|id=Vol-2283/MediaEval_18_paper_50
|storemode=property
|title=Deep Learning Approaches for Flood Classification and Flood Aftermath Detection
|pdfUrl=https://ceur-ws.org/Vol-2283/MediaEval_18_paper_50.pdf
|volume=Vol-2283
|authors=Naina Said,Konstantin Pogorelov,Kashif Ahmad,Michael Riegler,Nasir Ahmad,Olga Ostroukhova,Pål Halvorsen,Nicola Conci
|dblpUrl=https://dblp.org/rec/conf/mediaeval/SaidPARAOHC18
}}
==Deep Learning Approaches for Flood Classification and Flood Aftermath Detection==
Naina Said (1), Konstantin Pogorelov (2), Kashif Ahmad (4), Michael Riegler (3), Nasir Ahmad (1), Olga Ostroukhova (5), Pål Halvorsen (3), Nicola Conci (4)

(1) University of Engineering and Technology, Peshawar, Pakistan
(2) Simula Research Laboratory, University of Oslo, Norway
(3) Simula Metropolitan Center for Digital Engineering, University of Oslo, Norway
(4) University of Trento, Italy
(5) Research Institute of Multiprocessor Computation Systems n.a. A.V. Kalyaev, Russia

naina_dcse@yahoo.com, konstantin@simula.no, kashif.ahmad@unitn.it, michael@simula.no, n.ahmad@uetpeshawar.edu.pk, paalh@simula.no, nicola.conci@unitn.it

ABSTRACT

This paper presents the methods proposed by team UTAOS for the MediaEval 2018 Multimedia Satellite Task: Emergency Response for Flooding Events. In the first challenge, we mainly rely on object- and scene-level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object- and scene-level features are combined using early, late and double fusion techniques, achieving average F1-scores of 60.59%, 63.58% and 65.03%, respectively. For the second challenge, we rely on a convolutional neural network (CNN) and a transfer learning-based classification approach, achieving average F1-scores of 62.30% and 61.02% for run 1 and run 2, respectively.

1 INTRODUCTION

Natural disasters, such as floods, earthquakes and droughts, may cause significant damage to both human life and infrastructure. In such adverse events, instant access to information may help to mitigate the damage [1, 2]. In recent years, social media and remotely sensed information have been widely utilized to analyze natural disasters and their potential impact on the environment [4, 9, 11]. Similar to the 2017 edition, the MediaEval 2018 Social Media and Satellite task [5] aims to combine information from these two complementary sources to provide a better view of the underlying natural disaster.

This paper provides a detailed description of the methods proposed by team UTAOS for the MediaEval 2018 Multimedia Satellite Task: Emergency Response for Flooding Events. The challenge is composed of two parts, namely (i) flood classification for social multimedia (FCSM) and (ii) flood detection in satellite imagery (FDSI). The first task is further divided into two sub-tasks aiming to predict (a) whether there is evidence of a flood in a given social media image and (b) if evidence of a flood exists in the image, whether it is possible to pass through the flooded road (passability). The second task aims to analyze roads in satellite images and predict whether or not it is possible for a vehicle to pass.

2 PROPOSED APPROACH

2.1 Methodology for FCSM Task

To tackle the first challenge, based on our previous experience [3], we rely on features extracted through four different Convolutional Neural Network (CNN) models pre-trained on the ImageNet dataset [6] and the Places dataset [18]. These include two models pre-trained on the Places dataset (AlexNet [10] and VggNet [15]) and two models (VggNet and ResNet [8]) pre-trained on the ImageNet dataset. The models pre-trained on ImageNet capture object-level information, while the ones pre-trained on the Places dataset extract scene-level information. For feature extraction from all models, we use the Caffe toolbox (http://caffe.berkeleyvision.org/).

To fuse the complementary information (i.e., object- and scene-level features), we use three different fusion techniques, namely early, late and double fusion, sketched below. In early fusion, we simply concatenate the features extracted through the different models. In late fusion, we average the results obtained through the individual models. In our third technique, double fusion, we combine the results obtained from the first two techniques in an additional late fusion step by assigning them equal weights. For classification purposes, we rely on Support Vector Machines (SVMs) in all of the submitted fusion runs.
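The following is a minimal sketch of the three fusion schemes, assuming the per-model CNN features have already been extracted into NumPy arrays (e.g., with Caffe); all function and variable names are illustrative, not taken from the actual implementation.

```python
# Minimal sketch of early, late and double fusion over per-model CNN
# features. feats_train / feats_test are lists of 2-D arrays, one per
# pre-trained model (AlexNet-Places, VggNet-Places, VggNet-ImageNet,
# ResNet-ImageNet). Names are hypothetical.
import numpy as np
from sklearn.svm import SVC

def train_svm(features, labels):
    """Fit an SVM with probability outputs on one feature set."""
    clf = SVC(kernel="linear", probability=True)
    clf.fit(features, labels)
    return clf

def early_fusion(feats_train, feats_test, y_train):
    """Concatenate features from all models, then train a single SVM."""
    clf = train_svm(np.hstack(feats_train), y_train)
    return clf.predict_proba(np.hstack(feats_test))

def late_fusion(feats_train, feats_test, y_train):
    """Average the probability outputs of per-model SVMs (equal weights)."""
    probs = [train_svm(tr, y_train).predict_proba(te)
             for tr, te in zip(feats_train, feats_test)]
    return np.mean(probs, axis=0)

def double_fusion(feats_train, feats_test, y_train):
    """Combine early- and late-fusion scores in a second, equally
    weighted late-fusion step."""
    early = early_fusion(feats_train, feats_test, y_train)
    late = late_fusion(feats_train, feats_test, y_train)
    return 0.5 * early + 0.5 * late
```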
2.2 Methodology for FDSI Task

For the FDSI part of the task, we initially tried to apply the well-performing Generative Adversarial Network (GAN) approach introduced in our previous work on flood detection in satellite imagery [1] and on medical imagery [13, 14]. We conducted an exhaustive set of experiments but, unfortunately, could not achieve a road passability detection performance better than random label assignment. The reason is the limited size of the dataset (only 1,437 samples were provided in the development set). This, in combination with the large variety of landscapes, road types, types of obstacles, weather conditions, etc., prevents the GAN-based approach from training adequately and finding the key visual features required to reliably distinguish between flooded and non-flooded roads.

Thus, we decided to fall back to another convolutional neural network (CNN) and transfer learning-based classification approach, which was originally introduced for medical image classification in our previous work [12]. This approach is based on the Inception v3 architecture [16] pre-trained on the ImageNet dataset [6] and the retraining method described in [7]. For the work presented here, we froze all the basic convolutional layers of the network and only retrained the two top fully connected layers after random initialization of their weights. The fully connected layers were retrained using the RMSprop [17] optimizer, which allows an adaptive learning rate during the training process (see the sketch below).

As input for the CNN model, we used image patches extracted from the full images using the provided coordinates of the target road end points. Visual inspection of the road patches generated from the training dataset showed relatively good coverage of the road-related areas and sufficient coverage of the neighbouring areas, which together give enough visual information for the subsequent CNN-based analysis and classification.
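A rough Keras approximation of this transfer-learning setup is shown below: Inception v3 with its convolutional base frozen and two freshly initialized fully connected layers trained with RMSprop. The layer width and learning rate are illustrative assumptions; the paper itself follows the retraining method of [7].

```python
# Sketch of the transfer-learning configuration: ImageNet-pre-trained
# Inception v3, convolutional base frozen, two new fully connected
# layers retrained with RMSprop. Sizes and learning rate are guesses.
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.layers import Dense, GlobalAveragePooling2D
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import RMSprop

base = InceptionV3(weights="imagenet", include_top=False,
                   input_shape=(299, 299, 3))
for layer in base.layers:          # freeze all convolutional layers
    layer.trainable = False

x = GlobalAveragePooling2D()(base.output)
x = Dense(1024, activation="relu")(x)       # first retrained FC layer
out = Dense(2, activation="softmax")(x)     # passable / non-passable

model = Model(base.input, out)
model.compile(optimizer=RMSprop(learning_rate=1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```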
Moreover, in order to increase the number of training samples, we performed various augmentation operations on the images. Specifically, we applied horizontal and vertical flipping and changed the brightness within an interval of ±40%, as sketched below. After the model had been retrained, we used it as a two-class classifier that provides a probability for each of the two classes: passable and non-passable.
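One way to reproduce the stated augmentations with Keras is shown below; the original pipeline is not published, so this particular mapping is an assumption.

```python
# Horizontal and vertical flips plus brightness variation of +/-40%,
# expressed with Keras' ImageDataGenerator.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    horizontal_flip=True,
    vertical_flip=True,
    brightness_range=(0.6, 1.4),   # +/-40% brightness
)
# Usage (hypothetical names): `patches` and `labels` are the extracted
# road-patch arrays; flow() yields endlessly augmented training batches.
# batches = augmenter.flow(patches, labels, batch_size=32)
```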
3 RESULTS AND ANALYSIS

3.1 Runs Description in FCSM Task

During experimentation on the development set, we observed that classifiers trained on scene-level features, extracted through models pre-trained on the Places dataset, perform better compared to the ones pre-trained on ImageNet. However, we also observed that combining the object- and scene-level features leads to better results than the individual models. In order to better utilize the scene- and object-level features, we used a different technique in each run. Our first run is based on late fusion, where we used equal weights for each model. In the second run, we concatenate the features extracted through the individual models; an SVM classifier is then trained on the combined feature vectors. In the final run, we use double fusion by combining the results of the first two runs in a late fusion manner. Table 1 provides the evaluation results of our proposed methods in terms of F1-scores for each of the runs. Overall, the best results are obtained with double fusion, while the lowest results are obtained with early fusion of the features.

Table 1: Evaluation of our proposed approach for the FCSM task in terms of F1-scores.

Run  Method         F1-Score
1    Late Fusion    63.58%
2    Early Fusion   60.59%
3    Double Fusion  65.03%

3.2 Runs Description in FDSI Task

For the experimental setup of the FDSI task, we decided to perform only the two mandatory runs, which utilize the task-provided training data only. Due to the limited amount of training samples available, the usage of additional road network detection methods is not possible for these runs. Thus, we decided to perform two types of training for our transfer-learning detection approach.

First, we implemented a classification pipeline that differs from common procedures: it involves all the training samples in the training process as both the training and the validation set. Usually, for classification tasks, this would result in over-fitting of the model and an inability to correctly classify the test samples. However, for this specific task, the limited number of training epochs and significant training data augmentation, in conjunction with the high variety of road patch samples, resulted in a normal training process. This allowed us to correctly retrain the last layers of the network and produce reasonable classifiers even on such a limited training set.

The official F1-score metric (see Table 2) on the non-passable road class for the first "All-train" run is 62.30%. To verify our idea of using all the training data for both training and validation, we also performed a normal network training with a random 50/50 development/validation data split. This second "Half-train" run resulted in an F1-score of 61.02%, which is slightly lower compared to the All-train run. This confirms the validity of our idea of using the complete training dataset together with heavy data augmentation to improve road patch classification performance. Both training regimes are sketched after Table 2.

Table 2: Evaluation of our proposed approach for the FDSI task in terms of F1-scores.

Run  Method      F1-Score
1    All-train   62.30%
2    Half-train  61.02%
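Continuing the sketches above (`model`, `augmenter`), the two runs differ only in how the development data are used for training and validation; the epoch count and the `patches`/`labels` names are hypothetical.

```python
# Sketch of the two FDSI training regimes compared in Table 2.
from sklearn.model_selection import train_test_split

# Run 1 ("All-train"): every development sample is used for both
# training and validation; few epochs plus heavy augmentation are
# relied upon to keep the model from simply memorizing the data.
model.fit(augmenter.flow(patches, labels, batch_size=32),
          validation_data=(patches, labels), epochs=10)

# Run 2 ("Half-train"): conventional random 50/50 train/validation split.
x_tr, x_val, y_tr, y_val = train_test_split(
    patches, labels, test_size=0.5, random_state=0)
model.fit(augmenter.flow(x_tr, y_tr, batch_size=32),
          validation_data=(x_val, y_val), epochs=10)
```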
4 CONCLUSIONS AND FUTURE WORK

This year, the Social Media and Satellite task introduced a new and important challenge of detecting the passability of roads, which can be vital for people living in flooded regions. In the social media image analysis, we mainly relied on deep features extracted through different pre-trained deep models. During the experiments, we observed that the scene-level information, extracted through models pre-trained on the Places dataset, performs better compared to the ones pre-trained on ImageNet. However, the object-level information complements the scene-level features well when combined. We also observed that double fusion performs slightly better than early fusion on the provided dataset; however, this needs to be investigated in more detail. In the future, we aim to analyze the task with more advanced early and late fusion techniques.

On the other hand, in the satellite subtask, we found that a normal image segmentation approach does not help, and we implemented a task-oriented CNN and transfer-learning-based approach. This approach was able to classify image patches with roads and achieved an F1-score of 62.30% for the non-passable road class. In the future, we plan to implement advanced road network and flooding detection and segmentation using a combined CNN- and GAN-based approach pre-trained on existing annotated road network and flooded area datasets.

REFERENCES

[1] Kashif Ahmad, Konstantin Pogorelov, Michael Riegler, Nicola Conci, and Pål Halvorsen. 2018. Social media and satellites. Multimedia Tools and Applications (2018), 1–39.
[2] Kashif Ahmad, Michael Riegler, Konstantin Pogorelov, Nicola Conci, Pål Halvorsen, and Francesco De Natale. 2017. Jord: a system for collecting information and monitoring natural disasters by linking social media with satellite imagery. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. ACM, 12.
[3] Kashif Ahmad, Amir Sohail, Nicola Conci, and Francesco De Natale. 2018. A Comparative Study of Global and Deep Features for the Analysis of User-generated Natural Disaster Related Images. In 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP). IEEE, 1–5.
[4] Benjamin Bischke, Prakriti Bhardwaj, Aman Gautam, Patrick Helber, Damian Borth, and Andreas Dengel. 2017. Detection of flooding events in social multimedia and satellite imagery using deep neural networks. In Proceedings of the Working Notes of the MediaEval Workshop, Dublin, Ireland. 13–15.
[5] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and Damian Borth. 2018. The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). IEEE, 248–255.
[7] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition. In Proc. of ICML, Vol. 32. 647–655.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[9] M Jing, BW Scotney, SA Coleman, and others. 2016. Flood Event Image Recognition via Social Media Image and Text Analysis. In Signals and Systems Conference (ISSC). 4–9.
[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[11] Keiller Nogueira, Samuel G Fadel, Ícaro C Dourado, Rafael de O Werneck, Javier AV Muñoz, Otávio AB Penatti, Rodrigo T Calumby, Lin Tzy Li, Jefersson A dos Santos, and Ricardo da S Torres. 2018. Exploiting ConvNet Diversity for Flooding Identification. IEEE Geoscience and Remote Sensing Letters 15, 9 (2018), 1446–1450.
[12] Konstantin Pogorelov, Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen, Michael Riegler, and others. 2017. A holistic multimedia system for gastrointestinal tract disease detection. In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM, 112–123.
[13] Konstantin Pogorelov, Olga Ostroukhova, Mattis Jeppsson, Håvard Espeland, Carsten Griwodz, Thomas de Lange, Dag Johansen, Michael Riegler, and Pål Halvorsen. 2018. Deep Learning and Hand-crafted Feature Based Approaches for Polyp Detection in Medical Videos. In 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 381–386.
[14] Konstantin Pogorelov, Olga Ostroukhova, Andreas Petlund, Pål Halvorsen, Thomas de Lange, Håvard Nygaard Espeland, Tomas Kupka, Carsten Griwodz, and Michael Riegler. 2018. Deep learning and handcrafted feature based approaches for automatic detection of angiectasia. In 2018 IEEE EMBS International Conference on Biomedical & Health Informatics (BHI). IEEE, 365–368.
[15] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[16] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the inception architecture for computer vision. arXiv preprint arXiv:1512.00567 (2015).
[17] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012).
[18] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2014. Learning deep features for scene recognition using Places database. In Advances in Neural Information Processing Systems. 487–495.