Deep learning approaches for flood classification and flood aftermath detection

Naina Said1, Konstantin Pogorelov2, Kashif Ahmad4, Michael Riegler3, Nasir Ahmad1, Olga Ostroukhova5, Pål Halvorsen3, Nicola Conci4

1 University of Engineering and Technology, Peshawar, Pakistan
2 Simula Research Laboratory, University of Oslo, Norway
3 Simula Metropolitan Center for Digital Engineering, University of Oslo, Norway
4 University of Trento, Italy
5 Research Institute of Multiprocessor Computation Systems n.a. A.V. Kalyaev, Russia

naina_dcse@yahoo.com, konstantin@simula.no, kashif.ahmad@unitn.it,
michael@simula.no, n.ahmad@uetpeshawar.edu.pk, paalh@simula.no, nicola.conci@unitn.it

ABSTRACT
This paper presents the methods proposed by team UTAOS for the MediaEval 2018 Multimedia Satellite Task: Emergency Response for Flooding Events. For the first challenge, we mainly rely on object- and scene-level features extracted through multiple deep models pre-trained on the ImageNet and Places datasets. The object- and scene-level features are combined using early, late and double fusion techniques, achieving average F1-scores of 60.59%, 63.58% and 65.03%, respectively. For the second challenge, we rely on a convolutional neural network (CNN) and a transfer learning-based classification approach, achieving average F1-scores of 62.30% and 61.02% for run 1 and run 2, respectively.

Copyright held by the owner/author(s).
MediaEval’18, 29-31 October 2018, Sophia Antipolis, France
1 INTRODUCTION
Natural disasters, such as floods, earthquakes and droughts, may cause significant damage to both human life and infrastructure. In such adverse events, instant access to information may help to mitigate the damage [1, 2]. In recent years, social media and remotely sensed information have been widely utilized to analyze natural disasters and their potential impact on the environment [4, 9, 11]. Similar to the 2017 edition, the MediaEval 2018 Social Media and Satellite task [5] aims to combine these two complementary sources of information to provide a better view of the underlying natural disaster.

This paper provides a detailed description of the methods proposed by team UTAOS for the MediaEval 2018 Multimedia Satellite Task: Emergency Response for Flooding Events. The challenge is composed of two parts, namely (i) flood classification for social multimedia (FCSM) and (ii) flood detection in satellite imagery (FDSI). The first task is further divided into two sub-tasks aiming to predict (a) whether there is evidence of a flood in a given social media image and (b) if evidence of a flood exists in the image, whether it is possible to pass through the flooded road (passability). The second task aims to analyze roads in satellite images and predict whether or not it is possible for a vehicle to pass a road.

2 PROPOSED APPROACH
2.1 Methodology for the FCSM Task
To tackle the first challenge, based on our previous experience [3], we rely on features extracted through four different Convolutional Neural Network (CNN) models pre-trained on the ImageNet dataset [6] and the Places dataset [18]. These include two models pre-trained on the Places dataset (AlexNet [10] and VggNet [15]) and two models (VggNet and ResNet [8]) pre-trained on the ImageNet dataset. The models pre-trained on ImageNet capture object-level information, while the ones pre-trained on the Places dataset extract scene-level information. For feature extraction from all models, we use the Caffe toolbox (http://caffe.berkeleyvision.org/).

To be able to fuse the complementary information (i.e., object- and scene-level features), we use three different fusion techniques, namely early, late and double fusion. In early fusion, we simply concatenate the features extracted through the different models. In late fusion, we average the results obtained through the individual models. In our third technique, double fusion, we combine the results obtained from the first two techniques in an additional late fusion step by assigning them equal weights. For classification purposes, we rely on Support Vector Machines (SVMs) in all of the submitted fusion runs.
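To make the three schemes concrete, the following minimal sketch (Python with scikit-learn) shows early, late and double fusion over per-model CNN features. The feature matrices, variable names and the linear kernel are illustrative assumptions, not the exact configuration of our submitted runs.

import numpy as np
from sklearn.svm import SVC

def train_svm(features, labels):
    # Probabilistic SVM trained on the features of a single CNN model.
    clf = SVC(kernel='linear', probability=True)
    clf.fit(features, labels)
    return clf

# feats_train / feats_test: one (n_samples, n_dims) matrix per model,
# e.g. [alexnet_places, vggnet_places, vggnet_imagenet, resnet_imagenet].
def early_fusion(feats_train, y_train, feats_test):
    # Concatenate the per-model features, then train a single SVM.
    clf = train_svm(np.hstack(feats_train), y_train)
    return clf.predict_proba(np.hstack(feats_test))

def late_fusion(feats_train, y_train, feats_test):
    # Average the per-model SVM posteriors with equal weights.
    probs = [train_svm(tr, y_train).predict_proba(te)
             for tr, te in zip(feats_train, feats_test)]
    return np.mean(probs, axis=0)

def double_fusion(feats_train, y_train, feats_test):
    # Equal-weight late fusion of the early- and late-fusion outputs.
    return 0.5 * (early_fusion(feats_train, y_train, feats_test)
                  + late_fusion(feats_train, y_train, feats_test))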
2.2 Methodology for the FDSI Task
For the FDSI part of the task, we initially tried to apply the well-performing Generative Adversarial Network (GAN) approach introduced in our previous works on flood detection in satellite imagery [1] and on medical imagery [13, 14].

We conducted an exhaustive set of experiments but, unfortunately, could not achieve a road passability detection performance better than random label assignment would achieve. The reason for this is the limited size of the dataset (only 1,437 samples were provided in the development set). This, in combination with the large variety of landscapes, road types, types of obstacles, weather conditions, etc., prevents the GAN-based approach from training adequately and finding the key visual features required to reliably distinguish between flooded and non-flooded roads.

Thus, we decided to fall back to another convolutional neural network (CNN) and transfer learning-based classification approach, which was originally introduced for medical image classification in our previous work [12]. This approach is based on the Inception v3 architecture [16], pre-trained on the ImageNet dataset [6], and the retraining method described in [7].

For the work presented here, we froze all the basic convolutional layers of the network and only retrained the two top fully connected layers after random initialization of their weights. The fully connected layers were retrained using the RMSprop [17] optimizer, which allows an adaptive learning rate during the training process.
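Purely as an illustration of this retraining setup (our experiments followed the retraining method of [7]), a sketch in Keras might look as follows; the hidden-layer width and the learning rate are assumptions.

from tensorflow.keras import layers, models, optimizers
from tensorflow.keras.applications import InceptionV3

# Inception v3 base pre-trained on ImageNet, with all convolutional
# layers frozen; only the freshly initialised top layers are trained.
base = InceptionV3(weights='imagenet', include_top=False, pooling='avg')
base.trainable = False

model = models.Sequential([
    base,
    layers.Dense(1024, activation='relu'),  # retrained from random init (assumed width)
    layers.Dense(2, activation='softmax'),  # passable / non-passable
])
model.compile(optimizer=optimizers.RMSprop(learning_rate=1e-3),  # adaptive learning rate
              loss='categorical_crossentropy',
              metrics=['accuracy'])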
As the input for the CNN model, we used image patches extracted from the full images using the provided coordinates of the target road end points. Visual inspection of the road patches generated from the training dataset showed relatively good coverage of the road-related areas and sufficient coverage of the neighbouring areas, which together give enough visual information for the subsequent CNN-based analysis and classification.
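As a hypothetical illustration, one way to crop such a patch from the two provided end-point coordinates is sketched below; the margin and the clamping to the image borders are design choices that the task description does not fix.

from PIL import Image

def extract_road_patch(image_path, p1, p2, margin=64):
    # Crop a box around the road segment (p1, p2), padded by `margin` pixels.
    img = Image.open(image_path)
    (x1, y1), (x2, y2) = p1, p2
    left, upper = min(x1, x2) - margin, min(y1, y2) - margin
    right, lower = max(x1, x2) + margin, max(y1, y2) + margin
    # Clamp the box to the image borders before cropping.
    left, upper = max(0, left), max(0, upper)
    right, lower = min(img.width, right), min(img.height, lower)
    return img.crop((left, upper, right, lower))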
Moreover, in order to increase the number of training samples, we also performed various augmentation operations on the images. Specifically, we performed horizontal and vertical flipping and changed the brightness within an interval of ±40%. After the model had been retrained, we used it as a classifier that provides the probability for each of the two classes: passable and non-passable.
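A minimal sketch of these augmentation operations, assuming the brightness factor is sampled uniformly from the ±40% interval:

import random
from PIL import Image, ImageEnhance

def augment(img):
    # Original plus horizontally and vertically flipped variants.
    variants = [img,
                img.transpose(Image.FLIP_LEFT_RIGHT),
                img.transpose(Image.FLIP_TOP_BOTTOM)]
    out = []
    for v in variants:
        factor = 1.0 + random.uniform(-0.4, 0.4)  # brightness change within ±40%
        out.append(ImageEnhance.Brightness(v).enhance(factor))
    return out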
3 RESULTS AND ANALYSIS
3.1 Runs Description in the FCSM Task
During the experimentation on the development set, we observed that classifiers trained on scene-level features, extracted through models pre-trained on the Places dataset, perform better than those trained on features from models pre-trained on ImageNet. However, we also observed that combining the object- and scene-level features leads to better results than the individual models. In order to better utilize the scene- and object-level features, we used a different fusion technique in each of the three runs. Our first run is based on late fusion, where we used equal weights for each model. In the second run, we concatenated the features extracted through the individual models; an SVM classifier is then trained on the combined feature vectors. In the final run, we use double fusion by combining the results of the first two runs in a late fusion manner. Table 1 provides the evaluation results of our proposed methods in terms of F1-scores for each of the runs. Overall, the best results are obtained with double fusion, while the lowest results are obtained with early fusion of the features.

Table 1: Evaluation of our proposed approach for the FCSM task in terms of F1-scores.

Run   Method          F1-score
1     Late Fusion     63.58%
2     Early Fusion    60.59%
3     Double Fusion   65.03%

3.2 Runs Description in the FDSI Task
For the experimental setup of the FDSI task, we decided to perform only the two mandatory runs, which utilize the task-provided training data only. Since only a limited amount of training samples is available, the use of additional road network detection methods is not possible for these runs. Thus, we decided to perform two types of training for our transfer learning-based detection approach.

First, we implemented a classification pipeline that differs from common procedures: it involves all the training samples in the training process as both the training and the validation set. Usually, for classification tasks, this would result in overfitting of the model and an inability to correctly classify the test samples. However, for this specific task, the limited number of training epochs and the significant training data augmentation, in conjunction with the high variety of road patch samples, resulted in a normal training process. This allowed us to correctly retrain the last layers of the network and produce reasonable classifiers even on such a limited training set.

The official F1-score metric (see Table 2) on the non-passable road class for the first "All-train" run is 62.30%. To verify our idea of using all the training data for both training and validation, we also performed a normal network training with a random 50/50 training/validation data split. This second "Half-train" run resulted in an F1-score of 61.02%, which is slightly lower than that of the All-train run. This confirms the validity of our idea of using the complete training dataset together with heavy data augmentation to improve road patch classification performance.

Table 2: Evaluation of our proposed approach for the FDSI task in terms of F1-scores.

Run   Method       F1-score
1     All-train    62.30%
2     Half-train   61.02%

4 CONCLUSIONS AND FUTURE WORK
This year, the social media and satellite task introduced a new and important challenge of detecting the passability of roads, which can be vital for people living in flooded regions. For the social media image analysis, we mainly relied on deep features extracted through different pre-trained deep models. During the experiments, we observed that the scene-level information, extracted through models pre-trained on the Places dataset, performs better than the object-level information extracted through models pre-trained on ImageNet. However, the object-level information complements the scene-level features well when they are combined. We also observed that double fusion performs slightly better than early fusion on the provided dataset; this, however, needs to be investigated in more detail. In the future, we aim to analyze the task with more advanced early and late fusion techniques.

On the other hand, in the satellite subtask, we found that a plain image segmentation approach does not help, and we implemented a task-oriented CNN and transfer learning-based approach. This approach was able to classify image patches containing roads and achieved an F1-score of 62.30% for the non-passable road class. In the future, we plan to implement advanced road network and flooding detection and segmentation using a combined CNN- and GAN-based approach pre-trained on existing annotated road network and flooded area datasets.


REFERENCES
[1] Kashif Ahmad, Konstantin Pogorelov, Michael Riegler, Nicola Conci, and Pål Halvorsen. 2018. Social media and satellites. Multimedia Tools and Applications (2018), 1–39.
[2] Kashif Ahmad, Michael Riegler, Konstantin Pogorelov, Nicola Conci, Pål Halvorsen, and Francesco De Natale. 2017. Jord: a system for collecting information and monitoring natural disasters by linking social media with satellite imagery. In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. ACM, 12.
[3] Kashif Ahmad, Amir Sohail, Nicola Conci, and Francesco De Natale. 2018. A comparative study of global and deep features for the analysis of user-generated natural disaster related images. In 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP). IEEE, 1–5.
[4] Benjamin Bischke, Prakriti Bhardwaj, Aman Gautam, Patrick Helber, Damian Borth, and Andreas Dengel. 2017. Detection of flooding events in social multimedia and satellite imagery using deep neural networks. In Proceedings of the Working Notes Proceeding MediaEval Workshop, Dublin, Ireland. 13–15.
[5] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and Damian Borth. 2018. The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[6] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In Computer Vision and Pattern Recognition, 2009. CVPR 2009. IEEE Conference on. IEEE, 248–255.
[7] Jeff Donahue, Yangqing Jia, Oriol Vinyals, Judy Hoffman, Ning Zhang, Eric Tzeng, and Trevor Darrell. 2014. DeCAF: A deep convolutional activation feature for generic visual recognition. In Proc. of ICML, Vol. 32. 647–655.
[8] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[9] M Jing, BW Scotney, SA Coleman, and others. 2016. Flood event image recognition via social media image and text analysis. In Signals and Systems Conference (ISSC). 4–9.
[10] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems. 1097–1105.
[11] Keiller Nogueira, Samuel G Fadel, Ícaro C Dourado, Rafael de O Werneck, Javier AV Muñoz, Otávio AB Penatti, Rodrigo T Calumby, Lin Tzy Li, Jefersson A dos Santos, and Ricardo da S Torres. 2018. Exploiting ConvNet diversity for flooding identification. IEEE Geoscience and Remote Sensing Letters 15, 9 (2018), 1446–1450.
[12] Konstantin Pogorelov, Sigrun Losada Eskeland, Thomas de Lange, Carsten Griwodz, Kristin Ranheim Randel, Håkon Kvale Stensland, Duc-Tien Dang-Nguyen, Concetto Spampinato, Dag Johansen, Michael Riegler, and others. 2017. A holistic multimedia system for gastrointestinal tract disease detection. In Proceedings of the 8th ACM on Multimedia Systems Conference. ACM, 112–123.
[13] Konstantin Pogorelov, Olga Ostroukhova, Mattis Jeppsson, Håvard Espeland, Carsten Griwodz, Thomas de Lange, Dag Johansen, Michael Riegler, and Pål Halvorsen. 2018. Deep learning and hand-crafted feature based approaches for polyp detection in medical videos. In 2018 IEEE 31st International Symposium on Computer-Based Medical Systems (CBMS). IEEE, 381–386.
[14] Konstantin Pogorelov, Olga Ostroukhova, Andreas Petlund, Pål Halvorsen, Thomas de Lange, Håvard Nygaard Espeland, Tomas Kupka, Carsten Griwodz, and Michael Riegler. 2018. Deep learning and handcrafted feature based approaches for automatic detection of angiectasia. In Biomedical & Health Informatics (BHI), 2018 IEEE EMBS International Conference on. IEEE, 365–368.
[15] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[16] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the Inception architecture for computer vision. arXiv preprint arXiv:1512.00567 (2015).
[17] Tijmen Tieleman and Geoffrey Hinton. 2012. Lecture 6.5-rmsprop: Divide the gradient by a running average of its recent magnitude. COURSERA: Neural Networks for Machine Learning 4, 2 (2012).
[18] Bolei Zhou, Agata Lapedriza, Jianxiong Xiao, Antonio Torralba, and Aude Oliva. 2014. Learning deep features for scene recognition using Places database. In Advances in Neural Information Processing Systems. 487–495.