Flood detection from social multimedia and satellite images using ensemble and transfer learning with CNN architectures

Danielle Dias, Ulisses Dias
University of Campinas, Brazil
danielle.dias@ic.unicamp.br, ulisses@ft.unicamp.br

Copyright held by the owner/author(s). MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT
In this paper we explore deep convolutional neural networks pre-trained on ImageNet, along with a transfer learning mechanism, to detect whether an area's road access has been affected by a flood. We worked on two tasks with different datasets. The first dataset contains images from social media, and the goal is to identify direct evidence for passability of roads by conventional means. The second dataset contains high-resolution satellite imagery of partially flooded areas, and the goal is to identify sections of roads that are potentially blocked. For both tasks, we used visual information only, and our best models achieved averaged F1-Scores of 64.81% on the first task and 73.27% on the second task.

1 INTRODUCTION
Flooding events demand fast response. Rescue and medical teams should move fast to the affected points and bring victims to safety in a timely manner. Unfortunately, floods may render roads impassable. Automatic road passability recognition aids the support planning that mitigates the impact of disasters.
The "Multimedia Satellite Task 2018" studies the problem of road passability classification, namely whether or not it is possible to travel through a flooded region. Two tasks were proposed depending on the source of information. In the first task, we should take advantage of the high popularity of social media and filter the information that provides direct evidence for passability of roads. In the second task, we receive high-resolution satellite imagery of partially flooded areas and the goal is to identify whether it is possible to go from a point A to a point B. More details can be found in the task overview [1].

2 APPROACH
The dataset for the social media task consists of 7,387 images and the dataset for the remote sensing task consists of 1,664 satellite images. As the size of each dataset is limited, we decided to use transfer learning in both cases, with a similar workflow: images are received as input, pre-trained convolutional neural networks (CNNs) are used for feature extraction (Section 2.1), artificial neural networks (ANNs) predict labels (Section 2.2), and an ensemble is built from the individual classifiers (Section 2.3).
While in the social media subtask images are the only source of information, in the remote sensing subtask we receive the images along with two points A and B. The question is whether or not we can go from point A to point B. Thus, we preprocess the images to embed these points within the image (Section 2.4).

2.1 Transfer Learning Mechanism
Many advanced CNN architectures have been trained on ImageNet and are currently available. We selected 10 of them as feature extractors: DenseNet121 [5], DenseNet169 [5], DenseNet201 [5], InceptionResNetV2 [8], InceptionV3 [9], MobileNet [3], ResNet50 [4], VGG16 [7], VGG19 [7], and Xception [2]. We also studied whether global-feature-based approaches extracted with Lire [6] could provide any significant advantage over the pre-trained models, but since no improvement was achieved, we decided to use only features extracted from CNNs.
We replaced the architectures' prediction layers with new ANN models, which are responsible for returning the classification labels. For the social media subtask, the output labels account for (i) no evidence, (ii) evidence/not passable, and (iii) evidence/passable. For the remote sensing subtask, the output labels account for (i) passable and (ii) not passable. Accordingly, ANNs in the former subtask have three output units, whereas ANNs in the latter have only two. The output layers use the softmax activation function.

2.2 Prediction Layer Models
Two approaches performed best in our 5-fold cross-validation analysis of prediction layers. They are hereby called Model1 and Model2.
Model1 is an ANN with a single hidden layer of 512 nodes. Each node uses ReLU as activation function. We added a Dropout layer with a dropout ratio of 50% after the hidden layer, and l2 regularization to prevent overfitting.
Model2 is an ANN with two hidden layers. The first has 2048 nodes and the second has 128. Nodes in the hidden layers use ReLU as activation function and l2 regularization. We dropped out 80% of the connections between the input layer and the first hidden layer, and also applied a dropout ratio of 50% to each hidden layer.
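As a concrete illustration of Sections 2.1 and 2.2, the sketch below wires one of the ten backbones (DenseNet201) to the two prediction heads in Keras. The paper does not report the l2 strength, optimizer, or training schedule, so those values are assumptions.

```python
# Minimal sketch of the transfer learning pipeline: a frozen ImageNet
# backbone extracts features; Model1/Model2 predict the labels.
import numpy as np
from tensorflow.keras import Sequential, layers, regularizers
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras.applications.densenet import preprocess_input

# Frozen backbone; global average pooling yields one feature vector
# per image (1920 dimensions for DenseNet201).
backbone = DenseNet201(weights="imagenet", include_top=False, pooling="avg")
backbone.trainable = False

def extract_features(images):
    """images: float array of shape (n, 224, 224, 3) with values in [0, 255]."""
    return backbone.predict(preprocess_input(images), verbose=0)

n_features = backbone.output_shape[-1]
l2 = regularizers.l2(1e-4)  # regularization strength is an assumption

# Model1: one 512-unit ReLU hidden layer, 50% dropout, softmax output
# (3 units for the social media subtask, 2 for the satellite subtask).
model1 = Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dense(512, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),
])

# Model2: 80% dropout on the input connections, then 2048- and 128-unit
# hidden layers, each followed by 50% dropout.
model2 = Sequential([
    layers.Input(shape=(n_features,)),
    layers.Dropout(0.8),
    layers.Dense(2048, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.5),
    layers.Dense(128, activation="relu", kernel_regularizer=l2),
    layers.Dropout(0.5),
    layers.Dense(3, activation="softmax"),
])
model1.compile(optimizer="adam", loss="categorical_crossentropy")
model2.compile(optimizer="adam", loss="categorical_crossentropy")
```

The same heads are trained on top of each of the ten backbones; only the input dimension changes.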
2.3 Ensemble
We have 10 CNN architectures to extract features and 2 ANN architectures for prediction. Therefore, for each image we have 20 class predictions; each prediction is a vector of three floating-point numbers in the social media task and two floating-point numbers in the remote sensing task.
To create an ensemble, we concatenate the class predictions and use logistic regression to map the resulting vector to the output classes: a 20 × 3 = 60-dimensional vector to 3 classes in the social media task, and a 20 × 2 = 40-dimensional vector to 2 classes in the remote sensing task.
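The stacking step can be summarized in a few lines of scikit-learn. The sketch below uses random stand-ins for the 20 base classifiers' softmax outputs purely to make it runnable; in practice these arrays would come from the CNN+ANN pairs above, with the meta-learner fit on out-of-fold predictions.

```python
# Sketch of the ensemble: concatenate the 20 softmax vectors per image
# and let logistic regression choose the final label.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n_train, n_test, n_models, n_classes = 100, 20, 20, 3  # social media task

# Stand-ins for the base classifiers' softmax outputs, one array of
# shape (n_images, n_classes) per CNN+ANN pair.
train_preds = [rng.dirichlet(np.ones(n_classes), n_train) for _ in range(n_models)]
test_preds = [rng.dirichlet(np.ones(n_classes), n_test) for _ in range(n_models)]
y_train = rng.integers(0, n_classes, n_train)  # placeholder labels

# Concatenation gives the 20 x 3 = 60-dimensional stacked vector.
X_train = np.concatenate(train_preds, axis=1)
X_test = np.concatenate(test_preds, axis=1)

meta = LogisticRegression(max_iter=1000)  # solver settings assumed
meta.fit(X_train, y_train)
final_labels = meta.predict(X_test)
```

For the remote sensing task the same code applies with two classes, giving a 40-dimensional stacked vector.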
2.4 Preprocessing Satellite Images
In Figure 1 we illustrate the steps to preprocess the satellite images. Figure 1(a) shows one of the images in the development dataset. Figures 1(b-d) have marks added for illustrative purposes and to clarify the explanation; these marks were not added to the images provided to the CNNs. Blue and red marks represent the input points A and B, respectively. Our ultimate goal is to place these points in fixed locations, so the model can learn how to find a path between them. We describe each step below (a code sketch follows the figure caption).
(1) We first compute a point C which is halfway between A and B, as shown by the yellow mark in Figure 1(b).
(2) We compute a circumference centered in C that has the distance between A and B as its diameter. Observe in Figure 1(b) that only a small area of the image is inside the circumference, and that the area outside is not helpful to answer whether there is a path between A and B. This observation holds in several cases.
(3) We crop the image to keep only the circumscribed square, as shown in Figure 1(c).
(4) We rotate the image around C to place A on the left side, where the circumference touches the circumscribed square, and B on the right side counterpart, as shown in Figure 1(d).

[Figure 1: (a) Original, (b) Marks Added, (c) Cropped, (d) Rotated. Preprocessing steps using the points A and B. In (a) we show an original image as given by the organizers. In (b) we add blue, red, and yellow marks to represent the points A, B, and the midpoint between A and B, respectively. We also draw a circumference centered in the midpoint with diameter equal to the distance between A and B. In (c) we crop the image to include the entire circumference. In (d) we rotate the image to place A on the left side and B on the right side.]
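A minimal sketch of this preprocessing in OpenCV, assuming A and B are given in pixel coordinates. It performs the rotation before the crop, which is equivalent inside the circle of interest (the square's corners may differ from the paper's crop-then-rotate order); border handling is simplified.

```python
import cv2
import numpy as np

def preprocess(image, A, B):
    """Embed A and B at fixed locations (Section 2.4): rotate the image
    around the midpoint C so the segment AB becomes horizontal with A on
    the left, then crop the square of side |AB| centered at C. A and B
    are (x, y) pixel coordinates; points near the border may yield a
    clipped crop in this simplified sketch.
    """
    A, B = np.asarray(A, dtype=float), np.asarray(B, dtype=float)
    C = (A + B) / 2.0                         # step (1): midpoint
    d = float(np.linalg.norm(B - A))          # step (2): circle diameter
    # Steps (3)-(4) combined: rotating by the angle of the A->B vector
    # maps A to the left and B to the right of the horizontal through C.
    angle = np.degrees(np.arctan2(B[1] - A[1], B[0] - A[0]))
    M = cv2.getRotationMatrix2D((float(C[0]), float(C[1])), angle, 1.0)
    h, w = image.shape[:2]
    rotated = cv2.warpAffine(image, M, (w, h))
    half = int(round(d / 2.0))
    cx, cy = int(round(C[0])), int(round(C[1]))
    return rotated[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
```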
3 RESULTS AND ANALYSIS
During the training phase, we evaluated the models using 5-fold cross-validation. We selected the four best models and the ensemble to submit to the organizers, who performed their analysis on unseen data and reported the results back to us. Table 1 and Table 2 present the results using the averaged F1-Score metric for the social media and remote sensing subtasks, respectively.

Table 1: Evaluation results for the flood classification task from social multimedia images. The best result was achieved by the ensemble.

CNN Arch.      ANN Arch.   Averaged F1-Score (%)
DenseNet201    Model1      62.82
VGG19          Model1      60.92
ResNet50       Model1      62.93
DenseNet169    Model1      62.91
Ensemble       -           64.81

Table 2: Evaluation results for the flood classification task from satellite imagery. The best result was achieved by DenseNet121 with Model1.

CNN Arch.      ANN Arch.   Averaged F1-Score (%)
MobileNet      Model1      56.82
MobileNet      Model2      68.63
InceptionV3    Model2      62.69
DenseNet121    Model1      73.27
Ensemble       -           71.72

In the social media task, the ensemble produced the best results, obtaining an F1-Score of 64.81% against the 62.93% yielded by ResNet50, the best individual model. In the remote sensing task, the ensemble achieved 71.72%, while the best individual model, DenseNet121, reached 73.27%. We believe there is room for improvement if we tune the ensemble again, or if we replace the logistic regressor with other classification methods.

4 CONCLUSION
We used pre-trained CNNs as a starting point to create models that predict whether it is possible to travel through a flooded area. We combined features extracted from 10 CNNs with 2 ANN-based models for prediction, then built an ensemble by concatenating the predicted classes and using logistic regression to map them to a new output. This ensemble achieved the best results in the social media task, but not in the remote sensing task. Our results support the idea that transfer learning and ensembling are promising approaches for both tasks.

ACKNOWLEDGMENTS
We thank CAPES, CNPq (grant 400487/2016-0), and FAPESP (grant 2015/11937-9).

REFERENCES
[1] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and Damian Borth. 2018. The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[2] François Chollet. 2016. Xception: Deep Learning with Depthwise Separable Convolutions. CoRR abs/1610.02357 (2016). http://arxiv.org/abs/1610.02357
[3] Andrew G. Howard, Menglong Zhu, Bo Chen, Dmitry Kalenichenko, Weijun Wang, Tobias Weyand, Marco Andreetto, and Hartwig Adam. 2017. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. CoRR abs/1704.04861 (2017). http://arxiv.org/abs/1704.04861
[4] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2015. Deep Residual Learning for Image Recognition. CoRR abs/1512.03385 (2015). http://arxiv.org/abs/1512.03385
[5] Gao Huang, Zhuang Liu, and Kilian Q. Weinberger. 2016. Densely Connected Convolutional Networks. CoRR abs/1608.06993 (2016). http://arxiv.org/abs/1608.06993
[6] Mathias Lux and Savvas A. Chatzichristofis. 2008. Lire: Lucene Image Retrieval: An Extensible Java CBIR Library. In Proceedings of the 16th ACM International Conference on Multimedia (MM '08). ACM, New York, NY, USA, 1085-1088.
[7] Karen Simonyan and Andrew Zisserman. 2014. Very Deep Convolutional Networks for Large-Scale Image Recognition. CoRR abs/1409.1556 (2014). http://arxiv.org/abs/1409.1556
[8] Christian Szegedy, Sergey Ioffe, and Vincent Vanhoucke. 2016. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. CoRR abs/1602.07261 (2016). http://arxiv.org/abs/1602.07261
[9] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jonathon Shlens, and Zbigniew Wojna. 2015. Rethinking the Inception Architecture for Computer Vision. CoRR abs/1512.00567 (2015). http://arxiv.org/abs/1512.00567