<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Flood Level Estimation from Social Media Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julia Strebl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Djordje Slijepcevic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Armin Kirchknopf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muntaha Sakeena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Markus Seidl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matthias Zeppelzauer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>St. Pölten University of Applied Sciences</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>In this paper, we present an approach and first results for the MediaEval 2019 sub-task on “Multimodal Flood Level Estimation from News” from the “2019 Multimedia Satellite Task” (http://www.multimediaeval.org/mediaeval2019/multimediasatellite, accessed 01.10.2019). The water level is estimated by detecting people standing in water and using the human body as a reference. We focus only on the visual modality and propose a combination of a ResNet-based water detector and pose estimation to solve the task. First results are promising and show that our approach performs clearly above the random baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The assessment of natural disasters by automated media analysis
is becoming increasingly important: on the one hand, the amount of
user-generated media has been rising with the availability of
smartphones and, on the other hand, the likelihood of disasters is increasing,
e.g. due to ongoing climate change. The availability of (social)
media data represents an opportunity to automatically detect and
assess disasters to better guide first responders and emergency
forces. The types of disasters targeted in this work are floods and
the particular task to solve is flood level estimation [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The task has
been formulated in the course of the “2019 Multimedia Satellite Task:
Estimation of Flood Severity” conducted in the MediaEval 2019
benchmarking initiative [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This paper presents our contribution
to the benchmark together with results on the benchmarking test
set. The task of flood level estimation is defined by the organizers as follows: “build a
binary classifier that predicts whether or not the image contains at
least one person standing in water above the knee”. Input to the
classifier can be visual data, textual data or both. The data stems
from online news articles and comprises 6172 articles, of which 1234
articles belong to the test set and 4932 articles to the training set (6
articles, i.e., 598, 3932, 4465, 5019, 5091, and 5419, were excluded
due to corrupt image files). There is one image per article. The test
labels were not available during development. A major challenge of
this task is the strong imbalance between the positive class (people
standing in water above the knee) and the negative class. The
textual data was only partly available through the links provided
by the organizers. For this reason, our approach focuses only on the
visual domain. Our results show that the approach performs clearly above
the random baseline and generalizes well from the validation set to the test set.
      </p>
    </sec>
    <sec id="sec-2">
      <title>RELATED WORK</title>
      <p>
        Disaster recognition based on social media images has recently
attracted growing interest. Flood level estimation is technically challenging as
shown by Hostache et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Zwenzner et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Both groups
proposed a combined method based on synthetic aperture radar (SAR) images and a Digital
Elevation Model (DEM). Using crowd-sourced, non-authoritatively
collected data, Schnebele et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed a method to detect
flood events on road infrastructure in the US. Pandey et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] used
different data modalities, such as MODIS images and TRMM
precipitation data, to detect flooding after a dam breach and estimated
a rise of the flood level by 1.0 to 1.4 m. The related work mentioned
above fuses several information sources, such as aerial images and
DEMs, to estimate flood levels. The estimation of flood levels
from RGB data is a challenging task, as the visual appearance of
water varies strongly.
      </p>
    </sec>
    <sec id="sec-3">
      <title>APPROACH</title>
      <p>Since the provided dataset is multimodal, our initial idea was
to train two different classifiers, one for the image data and a
separate one for the text data, and then to fuse their predictions. Due
to insufficient text data, we decided to provide predictions based
only on visual data. We developed two classification approaches.
The first approach (see Figure 2a) relies on detecting water within
the whole image and detecting at least one person with obscured
lower body parts. The second approach (see Figure 2b) performs
local water detection. To this end, for each human body detected,
a patch that also contains the local neighbourhood of the human
body is taken into account for water detection. If, for at least one
patch in the image, our model detects obscured lower extremities
and water in the vicinity, the image is assigned to the positive class.
Both proposed approaches build upon three main components: (i)
a water detector that predicts whether a certain image or image
region contains water, (ii) a pose estimator that detects people and
fits skeletons to their bodies and (iii) a rule-based fusion module
that combines the information from the water detector and the
pose estimator to make a final decision.</p>
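      <p>The local variant needs an image patch around each detected person. The following Python sketch shows one way such a patch could be derived from the key points of a skeleton. The function name person_patch, the (x, y, confidence) tuple layout and the 25% margin are illustrative assumptions; the size of the local neighbourhood is not specified in this paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Optional, Tuple

# Assumed layout of one key point: (x, y, confidence).
Keypoint = Tuple[float, float, float]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom), right/bottom exclusive


def person_patch(keypoints: List[Keypoint], image_width: int,
                 image_height: int, margin: float = 0.25) -> Optional[Box]:
    """Return a box around one skeleton, enlarged by a relative margin.

    The margin of 0.25 is a hypothetical value chosen for illustration.
    """
    # Missing body parts have confidence zero and carry no position information.
    visible = [(x, y) for x, y, conf in keypoints if conf > 0.0]
    if not visible:
        return None
    xs = [x for x, _ in visible]
    ys = [y for _, y in visible]
    left, right = min(xs), max(xs)
    top, bottom = min(ys), max(ys)
    # Enlarge the tight box so that the surroundings of the body are included.
    pad_x = margin * (right - left)
    pad_y = margin * (bottom - top)
    # Clip the enlarged box to the image borders.
    return (max(0, int(left - pad_x)),
            max(0, int(top - pad_y)),
            min(image_width, int(right + pad_x) + 1),
            min(image_height, int(bottom + pad_y) + 1))
```
]]></preformat>
      <p>The resulting box would then be cropped from the image and passed to the water detector instead of the full image.</p>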
      <p>Water detector: we build upon ResNet50, which is pre-trained
on ImageNet and fine-tuned for water/no water detection using
images that either show water or do not. Images are resized, using
nearest-neighbor interpolation, to the network’s input size (227x227) while
keeping the original aspect ratio. Horizontal flipping, brightness
variations and non-uniform re-scaling of the images are applied for
data augmentation. The top five layers are fine-tuned (for 6 epochs,
batch size 256) before the whole network (for 10 epochs, batch size
32) is trained using the Adam optimizer (learning rate of 10<sup>−4</sup>).</p>
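      <p>The resizing step can be illustrated with a small, dependency-free Python sketch. It scales an image with nearest-neighbor interpolation so that its longer side matches the target size and then centres it on a square canvas. The zero padding and all function names are assumptions made for illustration; how the remaining area is filled is not stated in this paper.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def fit_size(width, height, target=227):
    """Scale (width, height) so that the longer side equals target."""
    scale = target / max(width, height)
    return max(1, round(width * scale)), max(1, round(height * scale))


def resize_nearest(image, new_width, new_height):
    """Nearest-neighbor resize of an image given as a list of pixel rows."""
    old_height, old_width = len(image), len(image[0])
    rows = []
    for y in range(new_height):
        # Map each target row and column back to the closest source index.
        src_row = image[min(old_height - 1, int(y * old_height / new_height))]
        rows.append([src_row[min(old_width - 1, int(x * old_width / new_width))]
                     for x in range(new_width)])
    return rows


def letterbox(image, target=227, fill=0):
    """Resize with preserved aspect ratio and pad to target x target.

    Padding with a constant fill value is an assumption of this sketch.
    """
    old_height, old_width = len(image), len(image[0])
    new_width, new_height = fit_size(old_width, old_height, target)
    resized = resize_nearest(image, new_width, new_height)
    off_x = (target - new_width) // 2
    off_y = (target - new_height) // 2
    canvas = [[fill] * target for _ in range(target)]
    for y, row in enumerate(resized):
        canvas[off_y + y][off_x:off_x + new_width] = row
    return canvas
```
]]></preformat>
      <p>In practice an image library would perform this step; the sketch only makes the geometry explicit.</p>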
      <p>
        Pose estimator (OpenPose): we employ OpenPose [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to
detect body key points of depicted human bodies. To filter out
false positive detections and unreliable skeletons, we calculate a
confidence score (C<sub>U</sub>) from the two most robust upper body parts,
i.e. head and chest (OpenPose joint IDs 0 and 1). Only skeletons
with C<sub>U</sub> &gt; 0.6, an empirically determined threshold, are considered
further. To detect whether the lower extremities of a body are
visible, we calculate a second confidence score (C<sub>L</sub>) as the mean
confidence over the lower body parts (OpenPose joint IDs 9, 10, 12,
and 13). Note that for missing body parts the confidence is zero.
      </p>
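      <p>The two confidence scores can be computed as in the following Python sketch. It assumes that each skeleton is available as a plain list of per-joint confidences indexed by OpenPose joint ID; this data layout and the helper names are illustrative and do not reflect the OpenPose API.</p>
      <preformat preformat-type="code"><![CDATA[
```python
UPPER_JOINTS = (0, 1)            # head and chest, as named in the text
LOWER_JOINTS = (9, 10, 12, 13)   # lower body parts, as named in the text
UPPER_THRESHOLD = 0.6            # empirically determined threshold from the text


def mean_confidence(confidences, joint_ids):
    """Mean confidence over the given joints; missing joints contribute zero."""
    return sum(confidences[j] for j in joint_ids) / len(joint_ids)


def body_scores(confidences):
    """Return (C_U, C_L) for one skeleton."""
    return (mean_confidence(confidences, UPPER_JOINTS),
            mean_confidence(confidences, LOWER_JOINTS))


def reliable_skeletons(skeletons):
    """Keep only skeletons whose upper-body score exceeds the threshold."""
    return [s for s in skeletons if body_scores(s)[0] > UPPER_THRESHOLD]


# Hypothetical skeleton with 18 joints (the joint count is an assumption).
skeleton = [0.9, 0.8] + [0.5] * 7 + [0.4, 0.0, 0.5, 0.2, 0.0] + [0.5] * 4
c_upper, c_lower = body_scores(skeleton)
```
]]></preformat>
      <p>For the hypothetical skeleton above, the upper-body score is about 0.85 and the lower-body score about 0.15, so the skeleton passes the reliability filter.</p>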
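      <p>Rule-based classifier: to determine whether the lower
extremities of a detected skeleton are obscured, we employ the following
heuristic rule: C<sub>U</sub> / max(C<sub>L</sub>, 10<sup>−4</sup>) &gt; T, with C<sub>U</sub> and C<sub>L</sub> being the
mean detection confidence for the upper and lower body and T an
empirically determined threshold of 1.5. The max operator prevents
division by zero. If the rule holds, the lower body is detected with much lower
confidence than the upper body and is treated as obscured.</p>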
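      <p>A minimal Python sketch of this ratio rule is given below. The constants follow the values stated in the text; the function name is an illustrative assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
EPSILON = 1e-4          # lower bound on C_L, prevents division by zero
RATIO_THRESHOLD = 1.5   # empirically determined threshold T from the text


def lower_body_obscured(c_upper, c_lower, threshold=RATIO_THRESHOLD):
    """Return True if the upper body is far more confident than the lower body."""
    return c_upper / max(c_lower, EPSILON) > threshold


# Worked examples with hypothetical confidence values:
partly_hidden = lower_body_obscured(0.85, 0.15)   # ratio of about 5.7
fully_visible = lower_body_obscured(0.80, 0.70)   # ratio of about 1.1
legs_missing = lower_body_obscured(0.70, 0.0)     # ratio of 7000
```
]]></preformat>
      <p>Only the first and the third example satisfy the rule; a skeleton whose legs are detected almost as confidently as its head and chest does not.</p>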
      <p>Final decision rule: A positive detection of a person standing
in water is declared when both the rule-based classifier and the
(local or global) water detector predict positively.
</p>
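      <p>The difference between the global and the local variant of the decision rule can be sketched in Python as follows. The inputs are assumed to be boolean flags produced by the rule-based classifier (one per reliable skeleton) and by the water detector; the function names are illustrative.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def classify_global(obscured_flags, image_has_water):
    """Global variant: water anywhere in the image plus one obscured person."""
    return bool(image_has_water) and any(obscured_flags)


def classify_local(obscured_flags, patch_has_water):
    """Local variant: the same person must be obscured and surrounded by water."""
    return any(obscured and water
               for obscured, water in zip(obscured_flags, patch_has_water))


# Hypothetical image with two people: the first has obscured legs but no
# water nearby, the second stands next to water with visible legs.
obscured = [True, False]
water_near_person = [False, True]

global_decision = classify_global(obscured, image_has_water=True)
local_decision = classify_local(obscured, water_near_person)
```
]]></preformat>
      <p>In this constructed case the global variant predicts the positive class while the local variant does not, because no single person satisfies both conditions.</p>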
    </sec>
    <sec id="sec-4">
      <title>EXPERIMENTAL RESULTS</title>
      <p>
        We train our models on 80% of the training data and use the remaining 20%, randomly selected from
each class, for validation. For Run 1, we use
only the data provided by the organizers, but we manually label
the images regarding whether or not they contain water. In all
other runs, we further use the data from the Multimedia Satellite
Task from 2018 [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] (Task: Flood classification for social multimedia;
manually labeled as water/no water) to train the water detector [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
For Runs 1 and 4, we employ the classification pipeline depicted in
Figure 2a. In Run 5, we evaluate the local approach from Figure 2b.
Run 3 was reserved for a multimodal run combining text and image
data, which we could not submit because large amounts of the text data were
inaccessible. Therefore, for Run 3 we perform majority voting
over the predictions of Runs 1, 4 and 5. Results for the validation
and test set are presented in Table 1.
      </p>
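      <p>Majority voting over binary predictions, as used for Run 3, can be sketched in Python as follows. The prediction lists are made-up values for illustration and are not results from our runs.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def majority_vote(*runs):
    """Combine binary predictions (0 or 1) of several runs, image by image."""
    n_runs = len(runs)
    # An image is positive if more than half of the runs predict positive.
    return [1 if 2 * sum(votes) > n_runs else 0 for votes in zip(*runs)]


# Hypothetical predictions of three runs for four images.
run_1 = [1, 0, 1, 0]
run_4 = [1, 1, 0, 0]
run_5 = [0, 1, 1, 0]
fused = majority_vote(run_1, run_4, run_5)
```
]]></preformat>
      <p>With an odd number of runs, ties cannot occur, so no tie-breaking rule is needed.</p>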
      <p>The performance of our approach is almost the same on our
validation set and the benchmark test set, which shows that our approach
generalizes well. The overall performance is around 60% for all
runs and does not show a significant difference between the local
and global approaches. Similarly, the fusion of both (Run 3) does
not outperform our baseline (Run 1). The random baseline for this
task depends on the class cardinalities in the test set and is thus
unknown to the authors. It has, however, an upper limit of 50% for
the task because the macro-averaged class-wise F1-score is used
as the performance measure. Our approach outperforms this baseline,
which shows that it learns useful patterns related to the target task,
although there is room for improvement. A closer analysis of the
results reveals several directions for improvement. While the water
detection is quite robust (classification accuracy of 0.88; the model is
trained on data from last year’s task as well as 700 images from
this year’s task and evaluated on 200 images from this year’s task),
we observe numerous false and missed detections by OpenPose.
Furthermore, reflections of the human body on the water surface
pose a problem, in particular for the detection of lower extremities. In
several cases, OpenPose added body parts for the lower extremities
that were actually under water (see right image in Figure 1).
</p>
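      <p>The 50% upper limit of the random baseline can be made plausible with a short Python sketch. It computes the macro-averaged F1-score and, as a simplifying assumption, approximates the score of a random guesser from expected confusion counts rather than from sampled predictions. The class priors used below are hypothetical, since the class cardinalities of the test set are unknown.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def f1(tp, fp, fn):
    """F1-score from true positives, false positives and false negatives."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true, y_pred):
    """Unweighted mean of the class-wise F1-scores for labels 0 and 1."""
    scores = []
    for cls in (0, 1):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)
        fp = sum(1 for t, p in pairs if t != cls and p == cls)
        fn = sum(1 for t, p in pairs if t == cls and p != cls)
        scores.append(f1(tp, fp, fn))
    return sum(scores) / len(scores)


def random_macro_f1(prior, q):
    """Macro F1 of a guesser that predicts positive with probability q.

    prior is the fraction of positive samples. Expected confusion counts
    are used, which is an approximation made for this illustration.
    """
    f1_pos = f1(prior * q, (1 - prior) * q, prior * (1 - q))
    f1_neg = f1((1 - prior) * (1 - q), prior * (1 - q), (1 - prior) * q)
    return (f1_pos + f1_neg) / 2


# Scan hypothetical priors and guessing rates on a grid.
grid = [i / 20 for i in range(1, 20)]
best = max(random_macro_f1(prior, q) for prior in grid for q in grid)
```
]]></preformat>
      <p>Under this approximation, no combination on the grid exceeds 0.5. For a positive-class prior of 0.1, guessing positive with probability 0.1 reaches 0.5, whereas guessing with probability 0.5 yields roughly 0.40.</p>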
    </sec>
    <sec id="sec-5">
      <title>DISCUSSION AND OUTLOOK</title>
      <p>In this paper, we presented our contribution to the MediaEval 2019
task on flood level estimation from news media images. Our
approach combines a pose detector and a water detector to find
images showing people standing in water above the knee. First results
show a promising generalization ability. The overall
performance, however, leaves room for improvement. A promising approach
to increase robustness is the use of several human (pose) detectors
trained on different data (e.g. urban and rural settings). A limitation
of our approach is that not only water but also other objects can
obscure the lower extremities of a person, or that only the torso or head
is shown in the picture. In these cases, the lower extremities cannot
be detected and, if water is present, our approach may fail. In order
to compensate for these effects, pixel-wise segmentation of water
and humans could be advantageous. Additionally, pixel-accurate
data could help to identify false detections by OpenPose, e.g. detected
body parts protruding from the segmented area that represents
a human body should not be considered.</p>
    </sec>
    <sec id="sec-6">
      <title>ACKNOWLEDGMENTS</title>
      <p>The work in this article was supported by the Austrian Research
Promotion Agency FFG under grant no. 856333.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber, Simon Brugman, Erkan Basar,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Martha</given-names>
            <surname>Larson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          . Oct. 27-29,
          <year>2019</year>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2019: Estimation of Flood Severity</article-title>
          .
          <source>In Proc. of the MediaEval 2019 Workshop</source>
          . Sophia Antipolis, France.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber,
          <string-name>
            <given-names>Zhengyu</given-names>
            <surname>Zhao</surname>
          </string-name>
          , Jens De Bruijn, and
          <string-name>
            <given-names>Damian</given-names>
            <surname>Borth</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>The multimedia satellite task at MediaEval 2018: Emergency response for flooding events</article-title>
          .
          <source>In 2018 Working Notes Proceedings of the MediaEval Workshop</source>
          , MediaEval 2018. CEUR-WS.org,
          <fpage>1</fpage>
          -
          <lpage>3</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Zhe</given-names>
            <surname>Cao</surname>
          </string-name>
          , Tomas Simon, Shih-En Wei, and
          <string-name>
            <given-names>Yaser</given-names>
            <surname>Sheikh</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Realtime multi-person 2D pose estimation using part affinity fields</article-title>
          .
          <source>In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition</source>
          .
          <fpage>7291</fpage>
          -
          <lpage>7299</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Renaud</given-names>
            <surname>Hostache</surname>
          </string-name>
          , Patrick Matgen, Guy Schumann, Christian Puech, Lucien Hoffmann, and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Pfister</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Water level estimation and reduction of hydraulic model calibration uncertainties using satellite SAR images of floods</article-title>
          .
          <source>IEEE Transactions on Geoscience and Remote Sensing</source>
          <volume>47</volume>
          ,
          <issue>2</issue>
          (
          <year>2009</year>
          ),
          <fpage>431</fpage>
          -
          <lpage>441</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Armin</given-names>
            <surname>Kirchknopf</surname>
          </string-name>
          , Djordje Slijepcevic, Matthias Zeppelzauer, and
          <string-name>
            <given-names>Markus</given-names>
            <surname>Seidl</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Detection of Road Passability from Social Media and Satellite Images</article-title>
          .
          <source>In MediaEval</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Victor</given-names>
            <surname>Klemas</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Remote sensing of floods and flood-prone areas: an overview</article-title>
          .
          <source>Journal of Coastal Research</source>
          <volume>31</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ),
          <fpage>1005</fpage>
          -
          <lpage>1013</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Rajesh Kumar</given-names>
            <surname>Pandey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jean-François</given-names>
            <surname>Crétaux</surname>
          </string-name>
          , Muriel Bergé-Nguyen, Virendra Mani Tiwari, Vanessa Drolon, Fabrice Papa, and
          <string-name>
            <given-names>Stephane</given-names>
            <surname>Calmant</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Water level estimation by remote sensing for the 2008 flooding of the Kosi River</article-title>
          .
          <source>International Journal of Remote Sensing</source>
          <volume>35</volume>
          ,
          <issue>2</issue>
          (
          <year>2014</year>
          ),
          <fpage>424</fpage>
          -
          <lpage>440</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>Schnebele</surname>
          </string-name>
          , G. Cervone, and
          <string-name>
            <given-names>N.</given-names>
            <surname>Waters</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Road assessment after flood events using non-authoritative data</article-title>
          .
          <source>Nat. Hazards Earth Syst. Sci.</source>
          <volume>14</volume>
          ,
          <issue>4</issue>
          (
          <year>2014</year>
          ),
          <fpage>1007</fpage>
          -
          <lpage>1015</lpage>
          . https://doi.org/10.5194/nhess-14-1007-2014
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>H.</given-names>
            <surname>Zwenzner</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Voigt</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Improved estimation of flood parameters by combining space based SAR data with very high resolution digital elevation data</article-title>
          .
          <source>Hydrology and Earth System Sciences</source>
          <volume>13</volume>
          ,
          <issue>5</issue>
          (
          <year>2009</year>
          ),
          <fpage>567</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>