Deep Learning models for passability detection of flooded roads
                Laura Lopez-Fuentes1, 2, 3 , Alessandro Farasin4, 5 , Harald Skinnemoen3 , Paolo Garza4
                           1 University of the Balearic Islands, Spain, 2 Autonomous University of Barcelona, Spain
                   3 AnsuR Technologies, Norway, 4 Politecnico di Torino, Italy, 5 Istituto Superiore Mario Boella, Italy

                              l.lopez@uib.es,alessandro.farasin@ismb.it,harald@ansur.no,paolo.garza@polito.it

ABSTRACT
In this paper we study and compare several approaches to detect floods and evidence for the passability of roads by conventional means on Twitter. We focus on tweets containing both visual information (a picture shared by the user) and metadata, a combination of text and related extra information intrinsic to the Twitter API. This work has been done in the context of the MediaEval 2018 Multimedia Satellite Task.

Copyright held by the owner/author(s).
MediaEval’18, 29-31 October 2018, Sophia Antipolis, France

1    INTRODUCTION
Social media are becoming, year by year, ever more popular and widely used for sharing people’s daily activities. This massive adoption has led to a large availability of content, such as texts and pictures, in the most varied sectors, making social media a great source of information. This wealth of data is precious for extracting knowledge and lays the basis for several applications. One of them falls in the context of emergency management for natural disasters, in which computer vision and machine learning techniques are investigated [2, 3, 11, 16] to extract key information and help first responders in their activities.
   In this work we focus on analyzing social media posts in order to extract valuable information about roads affected by floods. In more detail, we propose a multi-modal deep learning network which processes flood-related social media pictures and related metadata (e.g., from Twitter, Flickr, YFCC100M), both to provide (a) evidence of roads and (b) whether they are also passable. The approach presented in this paper takes its inspiration from (and is intended as an extension of) the work [10] proposed at the MediaEval 2017 challenge.

2    RELATED WORK
Recent literature leverages satellite [8, 12, 15] or ground [5] acquisitions to identify flood events. Other works focus more on the detection of urban elements such as roads [6, 9]. To the best of our knowledge, there are no existing works that determine evidence of road passability during flood events.

3    DATA
The dataset used in this work was distributed by the MediaEval 2018 Multimedia Satellite Task [1, 4]. It consists of 5820 Twitter images with their related metadata, of which ∼36% of the images present flooded regions with evidence of roads. Only the images belonging to this first class are considered for the second task evaluation: among them, ∼45% present passable roads. Furthermore, for each image, metadata is available. Metadata is a set of information concerning the tweet itself and the user who wrote it. For instance, the text message and its language, the number of retweets and likes, the number of replies, the coordinates (and whether the tweet is geo-located) and the user’s number of followers are only a subset of the properties associated with a tweet.

4    APPROACH
To properly deal with heterogeneous data, we opted for a "divide-et-impera" approach: we created a model for each kind of data. In detail, we developed: (i) a classification model using only the metadata information, (ii) three classification models using only the images, and (iii) a model which combines the metadata and the visual information. In this section we briefly describe the three different systems.

4.1    Metadata only
We processed the metadata by (i) filtering out properties not available in the whole dataset and (ii) studying the correlation among the remaining ones and selecting the most relevant. As a result, we kept the text written by the user, the language in which the tweet was originally written, the number of retweets and the number of persons who had favourited the tweet. As a pre-processing step, we translated all texts to English and removed emojis, URLs and special characters; then we applied lemmatization and tokenization. We represented the text features using a word embedding initialized with GloVe [7]. After that, we normalized the number of retweets and the number of times the tweet was favourited to the range [0, 1]. Then, we binarized the original language information by assigning 0 to English and 1 to any other language. Finally, we defined a neural network composed of a bidirectional Long Short-Term Memory (LSTM) network. The output of the LSTM is concatenated with the normalized extra fields and passed through two parallel fully-connected (FC) layers with a softmax classifier, one per task.
   Both tasks were trained in parallel. Initially, we labeled all the images which had no evidence of flooding as having no passability issues either. However, this strategy introduced a large imbalance into an already imbalanced dataset, which made training more difficult, so we finally decided to use only the images containing evidence of flooding to train the passability classifier, while still training both tasks in parallel.

4.2    Visual Information only
As a pre-processing step for the images, we applied several data augmentation techniques: image rotation, width and height shifts, horizontal flip and zoom. We designed two different systems to process the images, which we compare below.
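The metadata pipeline of Sec. 4.1 can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function names and exact cleaning rules are ours, and the real system additionally translates text, lemmatizes and tokenizes it, and feeds it through GloVe embeddings.

```python
import re
import unicodedata

def clean_text(text):
    """Remove URLs, emojis/symbols and special characters from a tweet."""
    text = re.sub(r"https?://\S+", " ", text)  # strip URLs first
    # Keep only letters (L), digits (N) and separators (Z).
    text = "".join(c for c in text if unicodedata.category(c)[0] in "LNZ")
    return re.sub(r"\s+", " ", text).strip().lower()

def min_max(values):
    """Normalize retweet / favourite counts to [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def binarize_lang(lang):
    """0 for English, 1 for any other language."""
    return 0 if lang == "en" else 1

tweet = "Road flooded near the bridge!! 🌊 https://t.co/abc123"
print(clean_text(tweet))    # "road flooded near the bridge"
print(min_max([0, 5, 10]))  # [0.0, 0.5, 1.0]
print(binarize_lang("es"))  # 1
```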
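The flip and shift augmentations of Sec. 4.2 could be sketched with plain NumPy as below (rotation and zoom would normally come from an image library); the function names are ours, not the authors'.

```python
import numpy as np

def horizontal_flip(img):
    """Mirror an H x W image left-to-right."""
    return img[:, ::-1]

def shift(img, dy, dx):
    """Shift an H x W image by (dy, dx) pixels, zero-padding vacated areas."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    out[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        img[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out

img = np.arange(9).reshape(3, 3)
print(horizontal_flip(img)[0])  # first row reversed: [2 1 0]
print(shift(img, 1, 0)[0])      # top row zero-padded: [0 0 0]
```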

Table 1: F-score (%) for the two sub-tasks. The first row is computed on a subset of 50 tweets from the training set, manually annotated by 4 persons; the remaining rows are computed on the test set for each developed approach: Metadata-only Approach (MA), Visual Approach 1 (VA1, Double-Ended Classifier with Compact Loss), Visual Approach 2 (VA2, Network Stacking with average aggregation), Visual Approach 3 (VA3, Network Stacking with average and voting aggregation).

                                              EVIDENCE [%]                                PASSABILITY [%]
     Approach \ Data                 Metadata   Images                  Meta + Imgs   Metadata   Images                  Meta + Imgs
     Human annotation                 51.48     87.32                       -          18.18     47.71                       -
     Metadata only [MA]               43.88       -                         -          19.30       -                         -
     Image only [VA1 / VA2 / VA3]       -       85.60 / 86.43 / 87.79       -            -       24.09 / 67.13 / 68.38       -
     Metadata and Image [VA1+MA]        -         -                       83.12          -         -                       28.34


    Double-ended classifier with compact loss: We used the Inception V3 [14] network pre-trained on ImageNet, with two FC layers and a softmax classifier at the end; each end of the network was trained for one of the two tasks. These tasks can be cast as two separate one-class classification problems, in which the single class is the flood event in the first case and its passability condition in the second. Taking inspiration from [13], we customized the InceptionV3 optimization function as: ĝ = max_g D(g(t)) + λC(g(t)), where: (a) g is the deep feature representation of the training data t, (b) λ is a positive constant, (c) D is the descriptive loss function (here, the cross-entropy) and (d) C is the compactness loss function, which evaluates the intra-class distance of the deep features within a batch so that features of the same class are drawn together.
    Network stacking: We used 9 state-of-the-art networks (InceptionV3, Xception, VGG16, VGG19, InceptionResNetV2, MobileNet, DenseNet121, DenseNet201, NASNetLarge), all pre-trained on ImageNet. They were trained separately for both problems in 5 different train-validation folds, which generated 90 networks (45 per task). The output of each network is a number between 0 and 1 representing the probability that the picture contains evidence of roads in flooded regions or evidence of passable roads, respectively. Let n be the number of networks and p_i the probability of the picture belonging to class 1; we define p̄ as the average of the p_i over all 1 ≤ i ≤ n, and voting(p_1, ..., p_n) = |{i : p_i > 0.5, 1 ≤ i ≤ n}|, where |·| denotes set cardinality. We used two different methods to aggregate the results of the networks: (i) average aggregation: pred(p_1, ..., p_n) = (p̄ > 0.5); (ii) average and voting aggregation: pred(p_1, ..., p_n) = 1 if (p̄ > 0.5 and voting(p_1, ..., p_n) > n/2 − 2) or (p̄ > 0.45 and voting(p_1, ..., p_n) ≥ n/2), and 0 otherwise.

4.3    Metadata and Visual Information
To combine the metadata and the images, we merged the Metadata-only and the Double-ended classifier with compact loss approaches. The two networks were taken without their respective double-ended fully-connected (FC) layers and merged through two new FC layers (one per task) with a softmax classifier.

5    RESULTS
To get a first idea of the upper bound for our task, we asked 4 persons to perform it on a subset of 50 images; the results are given in Table 1, whose subsequent rows report the results of the 5 different approaches introduced in this paper. As can be seen, our approach for flood evidence classification using metadata obtains very poor results, but close to those obtained by the human annotators, which means that the metadata is not very discriminative for this task. Since the error is cumulative, the results of both the humans and the metadata classifier drop significantly for passability detection, with the F-scores again very close to one another. All the image classification approaches achieve similar results on the first task, with network stacking achieving a small improvement over the double-ended classifier with compact loss. Furthermore, aggregating the networks by average and voting slightly improves on aggregating by average alone. On the passability task, however, there is a big gap between the performance of the double-ended classifier with compact loss and the network stacking approaches. When we decided to use the same network body for both tasks, we expected that, since the tasks are closely related, each task could benefit from the other’s knowledge. However, since one task was more difficult than the other, the double-ended network seems to have specialized on the easy task while leaving aside the difficult one. It is also remarkable that the network stacking algorithms achieve significantly better results than the human annotators, probably because (i) road passability is subjective in several cases and (ii) while the networks learnt from the whole training set, the human annotators were not given any examples of the task.
    Finally, combining metadata with images does not provide much improvement, or even worsens the results, due to the lack of discriminative features in the metadata.

6    CONCLUSIONS
In this paper we studied several approaches to perform flood and road passability detection, dealing with both textual and visual information. According to our tests, when a network tries to accomplish several tasks of different difficulty, even if they are related, it focuses on one of them (presumably the simplest), achieving good performance on that task but poor performance on the other.

ACKNOWLEDGMENTS
This work was partially supported by the Spanish Grants TIN2016-75404-P AEI/FEDER, UE, TIN2014-52072-P, TIN2016-79717-R, TIN2013-42795-P and by the European Commission H2020 I-REACT project no. 700256. Laura Lopez-Fuentes benefits from the NAERINGSPHD fellowship of the Norwegian Research Council under collaboration agreement Ref. 3114 with the UIB.
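As an illustration of the compactness term C of Sec. 4.2, the sketch below measures how tightly the deep features of a batch cluster around their centroid. This is our simplified reading of [13], not the authors' exact implementation.

```python
import numpy as np

def compactness_loss(features):
    """Mean squared distance of each feature vector from the batch centroid.

    features: (batch_size, dim) array of deep feature representations.
    A same-class batch should yield a small value; a scattered one, a large value.
    """
    centroid = features.mean(axis=0)
    return float(np.mean(np.sum((features - centroid) ** 2, axis=1)))

tight = np.array([[1.0, 1.0], [1.1, 0.9], [0.9, 1.1]])   # same-class-like batch
loose = np.array([[1.0, 1.0], [5.0, -3.0], [-2.0, 4.0]])  # scattered batch
print(compactness_loss(tight) < compactness_loss(loose))  # True
```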
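The two ensemble aggregation rules of Sec. 4.2 can be sketched in a few lines (p̄ is the mean probability over the n stacked networks; the function names are ours):

```python
def voting(probs):
    """Number of networks voting for class 1 (probability above 0.5)."""
    return sum(1 for p in probs if p > 0.5)

def average_aggregation(probs):
    """(i) Predict 1 iff the mean probability exceeds 0.5."""
    return int(sum(probs) / len(probs) > 0.5)

def average_and_voting(probs):
    """(ii) Combine the mean probability with the vote count."""
    n = len(probs)
    p_bar = sum(probs) / n
    if (p_bar > 0.5 and voting(probs) > n / 2 - 2) or \
       (p_bar > 0.45 and voting(probs) >= n / 2):
        return 1
    return 0

probs = [0.9, 0.8, 0.7, 0.2, 0.1]  # outputs of 5 stacked networks
print(average_aggregation(probs))   # 1
print(average_and_voting(probs))    # 1
```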


REFERENCES
 [1] 2018. MediaEval 2018 Multimedia Satellite Task. http://www.
     multimediaeval.org/mediaeval2018/multimediasatellite/. (2018). Data
     released: 31 May 2018.
 [2] Flavia Sofia Acerbo and Claudio Rossi. 2017. Filtering informative
     tweets during emergencies: a machine learning approach. In Proceed-
     ings of the First CoNEXT Workshop on ICT Tools for Emergency Networks
     and DisastEr Relief. ACM, 1–6.
 [3] Federico Angaramo and Claudio Rossi. 2017. Online clustering and
     classification for real-time event detection in Twitter. (2017).
 [4] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and
     Damian Borth. The Multimedia Satellite Task at MediaEval 2018:
     Emergency Response for Flooding Events. In Proc. of the MediaEval
     2018 Workshop (Oct. 29-31, 2018). Sophia-Antipolis, France.
 [5] Tom Brouwer, Dirk Eilander, Arnejan Van Loenen, Martijn J Booij,
     Kathelijne M Wijnberg, Jan S Verkade, and Jurjen Wagemaker. 2017.
     Probabilistic flood extent estimates from social media flood observa-
     tions. Natural Hazards & Earth System Sciences 17, 5 (2017).
 [6] Yinghua He, Hong Wang, and Bo Zhang. 2004. Color-based road
     detection in urban traffic scenes. IEEE Transactions on intelligent
     transportation systems 5, 4 (2004), 309–318.
 [7] Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2018.
     GloVe: Global Vectors for Word Representation. (2018). https://nlp.
     stanford.edu/projects/glove/
 [8] Victor Klemas. 2014. Remote sensing of floods and flood-prone areas:
     an overview. Journal of Coastal Research 31, 4 (2014), 1005–1013.
 [9] Hui Kong, Jean-Yves Audibert, and Jean Ponce. 2010. General road
     detection from a single image. IEEE Transactions on Image Processing
     19, 8 (2010), 2211–2220.
[10] Laura Lopez-Fuentes, Joost van de Weijer, Marc Bolanos, and Harald
     Skinnemoen. 2017. Multi-modal deep learning approach for flood
     detection. In Proc. of the MediaEval 2017 Workshop (Sept. 13–15, 2017).
     Dublin, Ireland.
[11] Laura Lopez-Fuentes, Joost van de Weijer, Manuel González-Hidalgo,
     Harald Skinnemoen, and Andrew D Bagdanov. 2018. Review on com-
     puter vision techniques in emergency situations. Multimedia Tools
     and Applications 77, 13 (2018), 17069–17107.
[12] Igor Ogashawara, Marcelo Pedroso Curtarelli, and Celso M Ferreira.
     2013. The use of optical remote sensing for mapping flooded areas.
     Int. J. Eng. Res. Appl 3, 5 (2013), 1956–1960.
[13] Pramuditha Perera and Vishal M Patel. 2018. Learning Deep Features
     for One-Class Classification. arXiv preprint arXiv:1801.05365 (2018).
[14] Christian Szegedy, Vincent Vanhoucke, Sergey Ioffe, Jon Shlens, and
     Zbigniew Wojna. 2016. Rethinking the inception architecture for
     computer vision. In Proceedings of the IEEE conference on computer
     vision and pattern recognition. 2818–2826.
[15] CJ Ticehurst, P Dyce, and JP Guerschman. 2009. Using passive mi-
     crowave and optical remote sensing to monitor flood inundation in
     support of hydrologic modelling. In Interfacing modelling and sim-
     ulation with mathematical and computational sciences, 18th World
     IMACS/MODSIM Congress. 13–17.
[16] Luca Venturini and Evelina Di Corso. 2017. Analyzing spatial data from
     twitter during a disaster. In Big Data (Big Data), 2017 IEEE International
     Conference on. IEEE, 3779–3783.