                    Flood Level Estimation from Social Media Images
Julia Strebl¹, Djordje Slijepcevic¹, Armin Kirchknopf¹, Muntaha Sakeena¹, Markus Seidl¹, Matthias Zeppelzauer¹
¹ St. Pölten University of Applied Sciences, Austria

                                                                {firstname.lastname}@fhstp.ac.at

ABSTRACT
In this paper, we present an approach and first results for the MediaEval 2019 sub-task on “Multimodal Flood Level Estimation from News” from the “2019 Multimedia Satellite Task”. The water level is measured by detecting people standing in water and using the human body as a reference. We focus only on the visual modality and propose a combination of a ResNet-based water detector and pose estimation to solve the task. First results are promising and show that our approach clearly performs above the baseline.

Figure 1: Flood level estimation with our approach using the OpenPose tracker [3].
1    INTRODUCTION
The assessment of natural disasters by automated media analysis becomes increasingly important as, on the one hand, the amount of user-generated media has been rising with the availability of smartphones and, on the other hand, the likelihood of disasters increases, e.g., due to ongoing climate change. The availability of (social) media data represents an opportunity to automatically detect and assess disasters in order to better guide first responders and emergency forces. The disasters targeted in this work are floods, and the particular task to solve is flood level estimation [6]. The task has been formulated in the course of the “2019 Multimedia Satellite Task: Estimation of Flood Severity” conducted in the MediaEval 2019 benchmarking initiative [1]. This paper presents our contribution to the benchmark together with results on the benchmarking test set. The task of flood level estimation is defined as follows: “build a binary classifier that predicts whether or not the image contains at least one person standing in water above the knee”¹. Input to the classifier can be visual data, textual data, or both. The data stems from online news articles and comprises 6172 articles, of which 1234 belong to the test set and 4932 to the training set (6 articles, i.e., 598, 3932, 4465, 5019, 5091, and 5419, were excluded due to corrupt image files). There is one image per article. The test labels were not available during development. A major challenge of this task is the strong imbalance between the positive class (people standing in water above the knee) and the negative class. The textual data was only partly available through the links provided by the organizers. For this reason, our approach focuses only on the visual domain. Our results show that the approach performs clearly above the random baseline and has good generalization ability.
¹http://www.multimediaeval.org/mediaeval2019/multimediasatellite, 01.10.2019
Copyright 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’19, 27-29 October 2019, Sophia Antipolis, France

2    RELATED WORK
Disaster recognition based on social media images has been a rising topic recently. Flood level estimation is technically challenging, as shown by Hostache et al. [4] and Zwenzner et al. [9]. Both works propose a combined method based on SAR images and a Digital Elevation Model (DEM). Using crowd-sourced, non-authoritatively collected data, Schnebele et al. [8] proposed a method to detect flood events on road infrastructure in the US. Pandey et al. [7] used different data modalities, such as MODIS images and TRMM precipitation, to detect flooding after a dam breach and estimated a rise in flood level of 1.0 to 1.4 m. The related research mentioned above fuses several information sources, such as aerial images and DEMs, to estimate flood levels. Estimating flood levels from RGB data alone, in contrast, is challenging, as the visual appearance of water varies strongly.

3    APPROACH
Since the provided dataset is multimodal, our initial idea was to train two different classifiers, one for the image data and a separate one for the text data, and then to fuse their predictions. Due to insufficient text data, we decided to provide predictions based only on visual data. We developed two classification approaches. The first approach (see Figure 2a) relies on detecting water within the whole image and detecting at least one person with obscured lower body parts. The second approach (see Figure 2b) performs local water detection. To this end, for each detected human body, a patch that also contains the local neighbourhood of the body is taken into account for water detection. If our model detects obscured lower extremities and water in the vicinity for at least one patch in the image, the image is assigned to the positive class. Both proposed approaches build upon three main components: (i) a water detector that predicts whether a certain image or image region contains water, (ii) a pose estimator that detects people and fits skeletons to their bodies, and (iii) a rule-based fusion module that combines the information from the water detector and the pose estimator to make a final decision (sketched below).

Figure 2: Overview of two water detection approaches. (a) Approach 1: Global water detection (Run 1 and 4). (b) Approach 2: Local water detection (Run 5).
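To make the interplay of these components concrete, the following minimal Python sketch outlines the two decision pipelines from Figure 2. The component functions (water_in_image, water_in_patch, detect_people, lower_body_obscured) are hypothetical placeholders standing in for the detectors described below, not the authors’ actual code.

    from typing import Callable, List, Tuple

    Box = Tuple[int, int, int, int]  # x, y, width, height of a detected person

    def classify_global(image: object,
                        water_in_image: Callable[[object], bool],
                        detect_people: Callable[[object], List[Box]],
                        lower_body_obscured: Callable[[object, Box], bool]) -> int:
        # Approach 1: one global water decision for the whole image.
        if not water_in_image(image):
            return 0
        return int(any(lower_body_obscured(image, box) for box in detect_people(image)))

    def classify_local(image: object,
                       water_in_patch: Callable[[object, Box], bool],
                       detect_people: Callable[[object], List[Box]],
                       lower_body_obscured: Callable[[object, Box], bool]) -> int:
        # Approach 2: water is checked only in a patch enlarged around each detected person.
        for box in detect_people(image):
            if lower_body_obscured(image, box) and water_in_patch(image, box):
                return 1
        return 0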

Water detector: We build upon a ResNet50 that is pre-trained on ImageNet and fine-tuned for water/no-water detection using images showing either water or no water. Images are resized, using nearest-neighbor interpolation, to the network’s input size (227x227) while keeping the original aspect ratio. Horizontal flipping, brightness variations, and non-uniform re-scaling of the images are applied for data augmentation. The top five layers are fine-tuned first (for 6 epochs, batch size 256) before the whole network is trained (for 10 epochs, batch size 32) using the Adam optimizer (learning rate of 10⁻⁴).
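A minimal sketch of how such a two-stage fine-tuning could be set up in Keras. The dummy dataset, the classification head, and the exact set of layers unfrozen in the first stage are assumptions for illustration (data augmentation is omitted); only the hyperparameters (227x227 input, 6 epochs at batch size 256, 10 epochs at batch size 32, Adam with learning rate 10⁻⁴) follow the description above.

    import tensorflow as tf
    from tensorflow.keras import layers, models
    from tensorflow.keras.applications import ResNet50

    # Dummy stand-in for the water/no-water training set (images, binary labels).
    train_ds = tf.data.Dataset.from_tensor_slices(
        (tf.random.uniform((8, 227, 227, 3)),
         tf.random.uniform((8, 1), maxval=2, dtype=tf.int32)))

    base = ResNet50(weights="imagenet", include_top=False, pooling="avg",
                    input_shape=(227, 227, 3))
    model = models.Model(base.input, layers.Dense(1, activation="sigmoid")(base.output))

    # Stage 1: train only the top layers (the new head plus the last few ResNet layers).
    for layer in base.layers[:-5]:
        layer.trainable = False
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(train_ds.batch(256), epochs=6)

    # Stage 2: unfreeze and train the whole network with a smaller batch size.
    for layer in base.layers:
        layer.trainable = True
    model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                  loss="binary_crossentropy", metrics=["accuracy"])
    model.fit(train_ds.batch(32), epochs=10)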

Pose estimator (OpenPose): We employ OpenPose [3] to detect body key points of depicted human bodies. To filter out false positive detections and unreliable skeletons, we calculate a confidence score C_U as the mean confidence of the two most robust upper body parts, i.e., head and chest (OpenPose joint IDs 0 and 1). Only skeletons with C_U above an empirically determined threshold of 0.6 are considered further. To detect whether the lower extremities of a body are visible, we calculate a second confidence score C_L as the mean confidence over the lower body parts (OpenPose joint IDs 9, 10, 12, and 13). Note that for missing body parts the confidence is zero.
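To make the two confidence scores concrete, here is a small sketch of how C_U and C_L could be computed from an OpenPose-style keypoint array (one row per joint with columns x, y, confidence, and a confidence of 0 for missing joints). The joint IDs follow the description above; the helper functions themselves are an illustration, not the authors’ code.

    import numpy as np

    UPPER_IDS = [0, 1]           # head and chest, used for C_U
    LOWER_IDS = [9, 10, 12, 13]  # lower-body joints, used for C_L

    def confidence_scores(keypoints: np.ndarray) -> tuple:
        """keypoints: array of shape (num_joints, 3) with columns (x, y, confidence).
        Missing joints are expected to carry a confidence of 0."""
        c_u = float(keypoints[UPPER_IDS, 2].mean())
        c_l = float(keypoints[LOWER_IDS, 2].mean())
        return c_u, c_l

    def reliable_skeleton(keypoints: np.ndarray, threshold: float = 0.6) -> bool:
        """Keep only skeletons whose upper-body confidence exceeds the threshold."""
        c_u, _ = confidence_scores(keypoints)
        return c_u > threshold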

Rule-based Classifier: To determine whether the lower extremities of a detected skeleton are visible, we employ the following heuristic rule: C_U / max(C_L, 10⁻⁴) > T, with C_U and C_L being the mean detection confidence for the upper and lower body and T an empirically determined threshold of 1.5. The max operator prevents division by zero.
Final decision rule: A positive detection of a person standing in water is declared when both the rule-based classifier and the (local or global) water detector predict positively.
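A minimal sketch of the heuristic rule and of the final fusion step described above; the water flag is assumed to be the output of the (global or local) water detector, and the confidence values would come from the pose estimator.

    def lower_body_obscured(c_u: float, c_l: float, t: float = 1.5) -> bool:
        """Heuristic rule: C_U / max(C_L, 1e-4) > T. A hidden lower body yields
        C_L close to zero and therefore a large ratio."""
        return c_u / max(c_l, 1e-4) > t

    def final_decision(water_detected: bool, c_u: float, c_l: float) -> int:
        """Positive class only if water is detected AND the lower extremities
        of the detected person appear obscured."""
        return int(water_detected and lower_body_obscured(c_u, c_l))

    # Example: confident upper body, no visible legs, water present -> positive.
    print(final_decision(True, c_u=0.8, c_l=0.0))  # -> 1
    print(final_decision(True, c_u=0.8, c_l=0.7))  # -> 0 (legs clearly visible)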
4    EXPERIMENTAL RESULTS
We train our models on 80% of the training data and use 20% from each class for validation (randomly selected). For Run 1, we use only the data provided by the organizers, but we manually label the images regarding whether or not they contain water. In all other runs, we additionally use the data from the Multimedia Satellite Task 2018 [2] (task: flood classification for social multimedia; manually labeled as water/no water) to train the water detector [5]. For Runs 1 and 4, we employ the classification pipeline depicted in Figure 2a. In Run 5, we evaluate the local approach from Figure 2b. Run 3 was reserved for a multimodal run combining text and image data, which we could not submit due to large amounts of inaccessible text data. Therefore, for Run 3 we perform a majority voting over the predictions of Runs 1, 4, and 5 (sketched below). Results for the validation and test set are presented in Table 1.

Table 1: Macro-averaged Precision (P), Recall (R), and F1-scores for Multimodal Flood Level Estimation from News.

    Run    Validation (P | R | F1)    Test (F1)
     1      0.58 | 0.67 | 0.61          0.61
     3      0.59 | 0.68 | 0.61          0.61
     4      0.55 | 0.60 | 0.56          0.59
     5      0.58 | 0.77 | 0.60          0.59
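The majority vote used for Run 3 could look like the following sketch; the per-run prediction lists are hypothetical and stand for the binary outputs of Runs 1, 4, and 5 on the same images.

    def majority_vote(preds_run1, preds_run4, preds_run5):
        """Per-image majority vote over three binary predictions (Run 3)."""
        return [int(p1 + p4 + p5 >= 2)
                for p1, p4, p5 in zip(preds_run1, preds_run4, preds_run5)]

    # Example with hypothetical predictions for four images:
    print(majority_vote([1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0]))  # -> [1, 1, 1, 0]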
The performance of our approach is almost the same on our validation set and on the benchmark test set, which shows that our approach generalizes well. The overall performance is around 60% for all runs and does not show a significant difference between the local and the global approach. Similarly, the fusion of both (Run 3) does not outperform our baseline (Run 1). The random baseline for this task depends on the class cardinalities in the test set and is thus unknown to the authors. It has, however, an upper limit of 50% due to the use of the macro-averaged class-wise F1-score as performance measure (illustrated below). Our approach outperforms this baseline, which shows that it learns useful patterns related to the target task, although there is room for improvement. A closer analysis of the results reveals several directions for improvement. While the water detection is quite robust (classification accuracy of 0.88; the model is trained on data from last year’s task as well as 700 images from this year’s task and evaluated on 200 images from this year’s task), we observe numerous false and missed detections by OpenPose. Furthermore, reflections of the human body on the water surface represent a problem, in particular for the detection of lower extremities. In several cases, OpenPose added body parts for the lower extremities which were actually under water (see the right image in Figure 1).
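The 50% cap can be illustrated with a trivial classifier that always predicts the majority class: it achieves an F1-score of at most 1 for the negative class and 0 for the positive class, so the macro average over the two class-wise F1-scores cannot exceed 0.5. A small sketch with made-up labels, assuming scikit-learn is available:

    from sklearn.metrics import f1_score

    # Hypothetical imbalanced ground truth: 1 = person standing in water above the knee.
    y_true = [0] * 90 + [1] * 10
    y_all_negative = [0] * 100  # trivial majority-class baseline

    print(f1_score(y_true, y_all_negative, average="macro", zero_division=0))
    # -> about 0.47, i.e. below the 0.5 cap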
5    DISCUSSION AND OUTLOOK
In this paper, we presented our contribution to the MediaEval 2019 task on flood level estimation from news media images. Our approach combines a pose detector and a water detector to find images showing people standing in water above the knee. First results show a promising generalization ability. Concerning the overall performance, improvements are possible. A promising way to increase robustness is the use of several human (pose) detectors trained on different data (e.g., urban and rural settings). A limitation of our approach is that not only water but also other objects can obscure the lower extremities of a person, or that only the torso or head is shown in the picture. As a result, the lower extremities cannot be detected and, if water is present, our approach may fail. To compensate for these effects, pixel-wise segmentation of water and humans could be advantageous. Additionally, pixel-accurate data could help to identify false detections by OpenPose: for example, detected body parts that protrude out of the segmented area representing a human body should not be considered.

ACKNOWLEDGMENTS
The work in this article was supported by the Austrian Research Promotion Agency FFG under grant no. 856333.


REFERENCES
 [1] Benjamin Bischke, Patrick Helber, Simon Brugman, Erkan Basar,
     Zhengyu Zhao, Martha Larson, and Konstantin Pogorelov. 2019. The
     Multimedia Satellite Task at MediaEval 2019: Estimation
     of Flood Severity. In Proc. of the MediaEval 2019 Workshop. Sophia
     Antipolis, France.
 [2] Benjamin Bischke, Patrick Helber, Zhengyu Zhao, Jens De Bruijn,
     and Damian Borth. 2018. The multimedia satellite task at MediaEval
     2018: Emergency response for flooding events. In 2018 Working Notes
     Proceedings of the MediaEval Workshop, MediaEval 2018. CEUR-WS.org,
     1–3.
 [3] Zhe Cao, Tomas Simon, Shih-En Wei, and Yaser Sheikh. 2017. Re-
     altime multi-person 2d pose estimation using part affinity fields. In
     Proceedings of the IEEE Conference on Computer Vision and Pattern
     Recognition. 7291–7299.
 [4] Renaud Hostache, Patrick Matgen, Guy Schumann, Christian Puech,
     Lucien Hoffmann, and Laurent Pfister. 2009. Water level estimation and
     reduction of hydraulic model calibration uncertainties using satellite
     SAR images of floods. IEEE Transactions on Geoscience and Remote
     Sensing 47, 2 (2009), 431–441.
 [5] Armin Kirchknopf, Djordje Slijepcevic, Matthias Zeppelzauer, and
     Markus Seidl. 2018. Detection of Road Passability from Social Media
     and Satellite Images. In MediaEval.
 [6] Victor Klemas. 2014. Remote sensing of floods and flood-prone areas:
     an overview. Journal of Coastal Research 31, 4 (2014), 1005–1013.
 [7] Rajesh Kumar Pandey, Jean-François Crétaux, Muriel Bergé-Nguyen,
     Virendra Mani Tiwari, Vanessa Drolon, Fabrice Papa, and Stephane
     Calmant. 2014. Water level estimation by remote sensing for the 2008
     flooding of the Kosi River. International journal of remote sensing 35, 2
     (2014), 424–440.
 [8] E. Schnebele, G. Cervone, and N. Waters. 2014. Road assessment after
     flood events using non-authoritative data. Nat. Hazards Earth Syst. Sci.
     14, 4 (2014), 1007–1015. https://doi.org/10.5194/nhess-14-1007-2014
 [9] H. Zwenzner and S. Voigt. 2009. Improved estimation of flood param-
     eters by combining space based SAR data with very high resolution
     digital elevation data. Hydrology and Earth System Sciences 13, 5 (2009),
     567.