Detection of Road Passability from Social Media and Satellite Images

Armin Kirchknopf¹, Djordje Slijepcevic¹, Matthias Zeppelzauer¹, Markus Seidl¹
¹ Media Computing Research Group, St. Pölten University of Applied Sciences
armin.kirchknopf@fhstp.ac.at, djordje.slijepcevic@fhstp.ac.at, m.zeppelzauer@fhstp.ac.at, markus.seidl@fhstp.ac.at

Copyright held by the owner/author(s).
MediaEval'18, 29-31 October 2018, Sophia Antipolis, France

ABSTRACT
This paper presents the contribution of Team MC-FHSTP to the multimedia satellite task at the MediaEval 2018 benchmark. We present two methods: one for the estimation of the passability of roads from social media images due to flooding, and one that estimates passability from satellite images. We present the results obtained in the benchmark for both methods.

1 INTRODUCTION
Flood threats have prompted many researchers to develop technology-based solutions for the precise and autonomous exploration of flood areas. Such solutions should enable the assessment of the impact of hazards as well as the immediate response to disasters. This can be done by analyzing satellite images. In the work of Pradhan et al. [9], Amitrano et al. [2], and Sumalan et al. [10], satellite images are analyzed with the aim of identifying flood areas. Yamaguchi and Saji [6] propose a method that analyses satellite images and indicates road conditions after an earthquake, including a tsunami. The work of Amit et al. [1] utilizes a convolutional neural network (CNN) to extract disaster regions automatically by combining pre-disaster and post-disaster satellite images. Another, more recent data source are media from social networks, which provide very prompt and local information about the affected areas. In order to identify bush fires and their effects, Lagerstrom et al. [7] investigated methods that divide Twitter images into two classes, fire-related and non-fire-related content; see furthermore [3]. Yang et al. [11] used images and text data downloaded from Flickr to distinguish between three different disaster classes: hurricane, oil spill, and earthquake. Each of the classes was divided into two sub-categories, with floods forming a subcategory of the hurricane class. The authors utilized feature vectors based on the word frequency and some basic image features prior to a multiple correspondence analysis (MCA). Nguyen et al. [8] have used a CNN to distinguish images from social media of four catastrophic events into three classes of severity. The work of Cervone et al. [5] utilized several data streams, including satellite images, aerial images, tweets, and images downloaded from Flickr, to assess the damage to transportation infrastructure after the Colorado floods in 2013.

In this paper we aim at determining road passability from images, as proposed in the Multimedia Satellite Task of the MediaEval 2018 benchmark [4]. We present and evaluate different machine learning approaches on two datasets provided for two sub-tasks. The first sub-task comprises the retrieval of images from social media which provide evidence for the passability of roads. The second sub-task addresses the detection of the passability of roads in satellite images.

2 APPROACH
The following two sections describe our approaches for both sub-tasks. The idea behind both approaches is to provide a simple, reproducible, and straightforward baseline for the respective tasks.

2.1 Sub-task 1: Flood classification for social multimedia
The dataset provided for sub-task one consists of Twitter text messages with accompanying images and is therefore a multimodal data source. Our initial idea consisted of training two different classifiers, one for the image data and a separate one for the text data, and then merging the predictions. This should allow the information from both inputs to be used for the final decision.

For the textual data, we investigated several methods. All tweets that were not in English or Spanish were excluded from further processing. Then the Spanish tweets (a representative minority in the data) were translated into English using the Google Translate API. We generated Bag-of-Words (BOW) descriptors on the N = 100 and N = 50 most common words and used them separately for training. Since the above-mentioned representation did not deliver promising results on the training data, we chose to calculate TFIDF representations instead.

For the image data, two approaches were developed. First, we extracted a Bag-of-Words descriptor based on the responses (tags) retrieved by the clarifai¹ API for the Twitter images and fed it into a Support Vector Machine (SVM). Since preliminary results on the development data were, however, not promising, we excluded this approach from the final evaluation. We assume that the main problem with the clarifai tags was that they were not specific enough to solve our task. In a second visual classification approach we leveraged a deep neural network for image description and classification (ResNet50 pre-trained on ImageNet²). The original network was extended with two densely connected layers (of size 512 and 256, respectively), with two dropout layers (probability 0.5) in between, and a softmax layer on top. Initially, in all experiments only the added layers were trained for 20 epochs. Depending on the run (see below), in a further step the whole network was fine-tuned for a variable number of epochs. Prior to network training, the provided Twitter images were pre-processed in two ways: first, the images were rescaled (non-uniformly) to the required input size of the network (224x224), and second, a central patch was cut out from the images to put more focus on the image center.

¹ https://www.clarifai.com/
² http://www.image-net.org/
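The two pre-processing variants can be sketched as follows (a minimal NumPy sketch; the nearest-neighbour rescaling and the exact crop behaviour are illustrative assumptions, as the paper does not specify them):

```python
import numpy as np

def rescale_nn(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Non-uniform nearest-neighbour rescale of an HxWxC image to size x size."""
    h, w = img.shape[:2]
    rows = np.arange(size) * h // size  # nearest source row per target row
    cols = np.arange(size) * w // size  # nearest source column per target column
    return img[rows][:, cols]

def center_crop(img: np.ndarray, size: int = 224) -> np.ndarray:
    """Cut out a central size x size patch (offsets are floored)."""
    h, w = img.shape[:2]
    top = max((h - size) // 2, 0)
    left = max((w - size) // 2, 0)
    return img[top:top + size, left:left + size]

# Example: a 480x640 RGB image becomes a 224x224 network input either way.
img = np.zeros((480, 640, 3), dtype=np.uint8)
print(rescale_nn(img).shape)   # (224, 224, 3)
print(center_crop(img).shape)  # (224, 224, 3)
```

Both variants yield inputs of the ResNet50 input size; only the second preserves the aspect ratio of the image center.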
Furthermore, we applied rotation and mirroring as data augmentation steps.

2.2 Sub-task 2: Detection of road passability in satellite images
For the prediction of road passability between two given points in a satellite image we made a simplifying assumption: at least one of the two given points is under water in case the road segment defined by the two points is not passable. Consequently, we assumed that the patches around points under water share visual properties. We modelled our assumption as follows. Patches: From each satellite image, we extracted a patch of 50x50 px around each of the two given points. Visual features: From each patch, we extracted RGB histograms with 16 bins per channel. Training and classification: We trained SVMs. For the evaluation of our approach on the development data we used 10-fold cross-validation. For the test data we trained the SVMs on the complete development data set. We used three aggregation methods for the RGB histograms:
(1) Concatenation (concat): We concatenated the RGB histograms of the two patches of a satellite image into a feature vector with 96 dimensions and trained an SVM.
(2) Separation (sep): We trained two SVMs, one with the patches of the first point of each image and the second with the patches of the second points, respectively. We predicted that a satellite image contains a passable road if the predictions for both patches of the image are passable.
(3) Joined (join): As in aggregation (sep), we kept the RGB histograms of the two patches per satellite image separate. Instead of training two SVMs, we trained one SVM with all patches, i.e. we used a training data set with double the number of samples. We predicted as in the (sep) case.
In a first baseline experiment, we evaluated this assumption. The results were surprisingly useful; consequently, we decided to use the approach.

3 EXPERIMENTAL RESULTS
3.1 Results of sub-task 1
Since the initial data set was unbalanced, we divided the data into a balanced training set, to avoid any bias towards a class during training, and an unbalanced validation set (that contains the remaining development data). The training set contained 874 samples per class. The results for the training, validation, and test phase in terms of F1-score and classification accuracy (in percent) are summarized in Table 1. In Run 1 we directly used the predictions of the ResNet50 network, which was trained on the balanced training set and evaluated on the unbalanced validation set. We trained the model for five epochs, with a batch size of 32, the Adam optimization algorithm, and a learning rate of 0.0001. In Run 2 we employed the SVM model trained with the TFIDF representations, which are based on a high min_df value of 120. We skipped Run 3 because we could not gain better results by combining visual and textual information. In Run 4 and Run 5 the predictions were obtained from the ResNet50 network trained on the entire development data for three epochs (Run 4) and six epochs (Run 5). In the training phase, Run 2 performed best regarding the F1-score and Run 5 regarding the accuracy. However, it should be noted that generally a relatively high accuracy was achieved but a rather low F1-score, which presumably stems from the class imbalance and indicates that the models could not learn all classes equally well. On the test set Run 5 clearly performs best (F1-score = 0.35). The weakest result is obtained by Run 4, which indicates that its training was stopped too early. A prolonged training phase (more than the six epochs of Run 5) could further improve the result. Overall, it is notable that the best run (Run 5) also has the strongest generalization ability (i.e., F1 on the test set is larger than on the validation set).

Table 1: F1-scores (macro-averaged over classes 2 (passable with evidence) and 3 (non-passable with evidence)) and accuracies. The asterisk (*) is used if no data is available.

Run | Training (F1 | Acc.) | Validation (F1 | Acc.) | Test (F1)
 1  |    34%   |    83%    |     28%   |    86%     |    20%
 2  |    43%   |    85%    |     37%   |    83%     |    24%
 4  |    27%   |    85%    |      *    |     *      |    17%
 5  |    27%   |    89%    |      *    |     *      |    35%

3.2 Results of sub-task 2
Table 2 contains the results of the second sub-task. We used runs 1, 2, and 3 for each of our aggregation methods introduced in Section 2.2. Runs 4 and 5 do not contain additional data but represent classifier fusions of runs 1, 2, and 3, where run 4 is the result of majority voting and run 5 of unanimous voting. The results for runs 1, 4, and 5 on the test set are notably better than for runs 2 and 3, which indicates over-fitting for these two runs.

Table 2: Recall, precision, and F1-scores for the non-passable class in the satellite image sub-task.

Run           | Training (R | P | F1)  | Test (F1)
1 (concat)    |  73%  |  85%  |  79%   |   57%
2 (sep)       |  83%  |  80%  |  81%   |   32%
3 (join)      |  86%  |  82%  |  84%   |   39%
4 (majority)  |  84%  |  84%  |  84%   |   56%
5 (unanimous) |  73%  |  89%  |  79%   |   57%

4 CONCLUSION
We have presented two approaches for the detection of road condition (passability) from social media images and satellite images. Both approaches have low complexity, are easy to reproduce, and in most cases generalize well to the test data. They represent a first baseline for this task that shall be improved in the future. To improve the visual modality of sub-task 1, a two-step approach, i.e. first classifying on evidence and second on passability, could be more rewarding.

ACKNOWLEDGMENTS
This work was supported by the Austrian Research Promotion Agency (FFG), Project No. 856333.

REFERENCES
[1] Siti Nor Khuzaimah Binti Amit, Soma Shiraishi, Tetsuo Inoshita, and Yoshimitsu Aoki. 2016. Analysis of satellite images for disaster detection. In Geoscience and Remote Sensing Symposium (IGARSS), 2016 IEEE International. IEEE, 5189–5192.
[2] D. Amitrano, G. Di Martino, A. Iodice, D.
Riccio, and G. Ruello. 2018. Unsupervised Rapid Flood Mapping Using Sentinel-1 GRD SAR Images. IEEE Transactions on Geoscience and Remote Sensing 56, 6 (June 2018), 3290–3299. https://doi.org/10.1109/TGRS.2018.2797536
[3] Benjamin Bischke, Damian Borth, Christian Schulze, and Andreas Dengel. 2016. Contextual enrichment of remote-sensed events with social media streams. In Proceedings of the 2016 ACM on Multimedia Conference. ACM, 1077–1081.
[4] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens de Bruijn, and Damian Borth. 2018. The Multimedia Satellite Task at MediaEval 2018: Emergency Response for Flooding Events. In Proc. of the MediaEval 2018 Workshop (Oct. 29-31, 2018). Sophia Antipolis, France.
[5] Guido Cervone, Elena Sava, Qunying Huang, Emily Schnebele, Jeff Harrison, and Nigel Waters. 2016. Using Twitter for tasking remote-sensing data collection and damage assessment: 2013 Boulder flood case study. International Journal of Remote Sensing 37, 1 (2016), 100–124.
[6] Keishi Yamaguchi and Hitoshi Saji. 2012. Analysis of road damage after a large-scale earthquake using satellite images. In Proc. SPIE, Vol. 8524 (2012). https://doi.org/10.1117/12.976288
[7] Ryan Lagerstrom, Yulia Arzhaeva, Piotr Szul, Oliver Obst, Robert Power, Bella Robinson, and Tomasz Bednarz. 2016. Image Classification to Support Emergency Situation Awareness. Frontiers in Robotics and AI 3 (2016), 54. https://doi.org/10.3389/frobt.2016.00054
[8] Dat Tien Nguyen, Firoj Alam, Ferda Ofli, and Muhammad Imran. 2017. Automatic image filtering on social networks using deep learning and perceptual hashing during crises. arXiv preprint arXiv:1704.02602 (2017).
[9] B. Pradhan, M. S. Tehrany, and M. N. Jebur. 2016. A New Semiautomated Detection Mapping of Flood Extent From TerraSAR-X Satellite Image Using Rule-Based Classification and Taguchi Optimization Techniques. IEEE Transactions on Geoscience and Remote Sensing 54, 7 (July 2016), 4331–4342. https://doi.org/10.1109/TGRS.2016.2539957
[10] A. L.
Sumalan, D. Popescu, and L. Ichim. 2017. Flooded and vegetation areas detection from UAV images using multiple descriptors. In 2017 21st International Conference on System Theory, Control and Computing (ICSTCC). 447–452. https://doi.org/10.1109/ICSTCC.2017.8107075
[11] Yimin Yang, Hsin-Yu Ha, Fausto Fleites, Shu-Ching Chen, and Steven Luis. 2011. Hierarchical disaster image classification for situation report enhancement. In 2011 IEEE International Conference on Information Reuse and Integration (IRI). IEEE, 181–186.