<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ensemble and Inference based Methods for Flood Severity Estimation Using Visual Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mir Murtaza</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Hanif</string-name>
          <email>hanif.soomro@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Atif Tahir</string-name>
          <email>atif.tahir@nu.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Rafi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>National University of Computer and Emerging Sciences, Karachi Campus</institution>
          ,
          <country country="PK">Pakistan</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <fpage>27</fpage>
      <lpage>29</lpage>
      <abstract>
        <p>This paper presents the contribution of the NUCES DSGP team to the Multimedia Satellite Task at MediaEval 2019. The tasks addressed are News Image Topic Disambiguation (NITD) and Multimodal Flood Level Estimation (MFLE) from news images. An ensemble-based deep learning method was applied to the NITD task, where data augmentation and transfer learning were used for binary classification of images. During training, class imbalance was handled by data augmentation and by selecting an equal number of samples from each class. For the MFLE task, a person's lower-body keypoints were detected along with image flood probability scores from two deep convolutional network architectures, namely ResNet50 and VGG19. The confidence scores of the detected keypoints and the convolutional networks' output probabilities were combined and passed to a Random Forest classifier for a final prediction score. On the test set, the proposed methods achieved an F1-Micro score of 0.895 on the NITD task (3rd best score) and an F1-Macro score of 0.734 on the MFLE task (2nd best score).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        An effective and efficient flood response system requires timely
information about the event. The risk of damage can be reduced
by taking appropriate action on the basis of inferred information. Data
collected through different media, including news articles,
is easily available and can be used in disaster response systems.
The NITD task of the "Multimedia Satellite Task at MediaEval 2019"
[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] posed the challenge of developing a binary image classifier
using images published in different articles. The classifier
should predict whether or not the article in which a particular
image appeared discusses a water-related disaster.
      </p>
      <p>
        In the MFLE task, we had to determine whether at least one
person in the image is standing in water, and whether or not the water
level is above the knee. To solve this problem, we considered
both the global and the local perspective of an image. Our approach was
to first detect and localize all persons present in an image. A
dataset was then prepared containing the persons’ bounding boxes
from all of the images. The ResNet50 Convolutional Neural
Network was fine-tuned to estimate the flood probability of each bounding
box. The label of the image was attached to all bounding boxes of
persons present in the image. We then stacked the ResNet50
probability with the confidence scores of four lower-body keypoints of interest,
namely Right Knee, Left Knee, Right Ankle and Left Ankle of a person,
and a score indicating the belief in the calculated confidence scores.
We also added an image-level flood evidence probability obtained
from VGG19 pre-trained on the Places365 dataset and fine-tuned
on Multimedia Satellite 2017 flood images. This flood probability is
the same for all detected persons in an image. For inference, we
trained a random forest on the keypoint scores and flood probability
scores obtained for each person bounding box. If the maximum
probability over all persons in an image exceeds 0.50, the image is classified as
a positive instance. Our inspiration for this work comes from efforts
on text and image based data from different sources to identify the type
and intensity of disasters [
        <xref ref-type="bibr" rid="ref11 ref15 ref5 ref9">5, 9, 11, 15</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
      <p>
        Two different deep learning based approaches were used for the two
tasks. The details of the approaches are given below.
News Image Topic Disambiguation (NITD): An ensemble-based
deep learning approach was adopted for the Image-based
News Topic Disambiguation task of MediaEval 2019. The dataset
for the task contains 5181 images for training and 1296 images
for testing. There were 564 images in the first class
and 4617 images in the second class.
First, the class imbalance problem was addressed using data
augmentation. The Augmentor [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] library was used
to create multiple copies of minority-class images using different
operations, including rotate, flip and zoom. After balancing both classes,
3000 images from each class were randomly selected for training.
The Visual Geometry Group (VGG16) [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] classification model pretrained
on the Hybrid dataset (Places365 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and ImageNet [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) was used
for classification. VGG16
is a neural network introduced at the ImageNet Large Scale Visual
Recognition Challenge (ILSVRC) 2014, where it secured first and
second positions in the image localization and image classification tasks,
respectively. A dropout layer with a ratio of 0.3 was added after
the first dense layer, followed by a batch normalization layer.
The last softmax layer was replaced with a
sigmoid unit for binary classification. The last 6 layers of the model were
retrained on the task dataset, and the remaining layers were frozen
during training. The parameters of the model were optimized
using the Adam optimizer with a learning rate of 10e-6. The model
was retrained for 40 epochs, and the trained model was then
applied for prediction on the 1296 test images. In the same way, five models were
trained and majority voting was applied for the final prediction, as
depicted in Figure 1.
      </p>
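      <p>The majority-voting step can be illustrated with a minimal sketch. The function name, the 0.5 vote threshold and the example probabilities below are assumptions made for illustration; the text only states that five models were trained and combined by majority voting.</p>
      <preformat preformat-type="code">
```python
from typing import List, Sequence


def majority_vote(model_probs: Sequence[Sequence[float]],
                  threshold: float = 0.5) -> List[int]:
    """Combine per-model sigmoid outputs into one binary label per image.

    model_probs[m][i] is the probability model m assigns to image i.
    The 0.5 threshold is an assumption, not a value taken from the paper.
    """
    n_models = len(model_probs)
    n_images = len(model_probs[0])
    labels = []
    for i in range(n_images):
        # Each model casts one vote: 1 if its probability exceeds the threshold.
        votes = sum(1 for m in range(n_models) if model_probs[m][i] > threshold)
        # With an odd number of models (five here) a tie cannot occur.
        labels.append(1 if 2 * votes > n_models else 0)
    return labels


# Hypothetical outputs of five models on three images.
probs = [
    [0.91, 0.40, 0.55],
    [0.85, 0.35, 0.45],
    [0.60, 0.52, 0.48],
    [0.30, 0.20, 0.70],
    [0.75, 0.10, 0.30],
]
print(majority_vote(probs))  # prints [1, 0, 0]
```
      </preformat>
      <p>Voting on hard labels, rather than averaging probabilities, keeps a single over-confident model from dominating the ensemble decision.</p>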
      <p>
        Multimodal Flood Level Estimation from news (MFLE): To
predict whether at least one person in an image is standing in water above
knee level, we hypothesized that "a detected person
in the flooded region with low knee and ankle visibility is a
positive case". For this purpose, we prepared a separate dataset
consisting of all detected persons’ bounding boxes from
their respective images. The label attached to a person bounding
box was the same as that of the parent image. For person
bounding box detection, the Faster-RCNN model [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] was used, an end-to-end
architecture that uses a Region Proposal Network to select regions
of interest along with a VGG16 deep CNN for classification. We
fine-tuned the last convolutional layer of the ImageNet-pretrained ResNet50
CNN architecture on these detected person patches from the training
images and replaced the last softmax layer with a sigmoid unit. As
the dataset was highly imbalanced, balanced class weights were used,
which is an effective method for training Convolutional Neural Network
models on imbalanced data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The purpose of using ResNet is to
get a local estimate of the flood level in the person’s bounding box.
      </p>
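      <p>One common way to obtain balanced weights is to make each class weight inversely proportional to its frequency. The sketch below assumes that heuristic (n_samples / (n_classes * class_count)); the paper does not state the exact formula, and the label counts are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def balanced_class_weights(labels: Iterable[Hashable]) -> Dict[Hashable, float]:
    """Return one weight per class, inversely proportional to class frequency.

    Assumed heuristic: weight = n_samples / (n_classes * class_count).
    """
    counts = Counter(labels)
    n_samples = sum(counts.values())
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * n) for cls, n in counts.items()}


# Hypothetical person-patch labels: 200 positive and 800 negative patches.
patch_labels = [1] * 200 + [0] * 800
weights = balanced_class_weights(patch_labels)
print(weights)  # prints {1: 2.5, 0: 0.625}
```
      </preformat>
      <p>With these weights, each class contributes the same total weight to the loss, so errors on the rare positive patches are not drowned out by the majority class.</p>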
      <p>
        Afterwards, for the person bounding boxes dataset, we obtained
the visibility confidence scores for four lower-body keypoints, i.e.,
Right Knee, Left Knee, Right Ankle and Left Ankle of a person, and an
overall score indicating belief in the keypoint calculation. These
probabilities help in estimating the visibility part of our
hypothesis. We used the approach of Fang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Regional Multi-person
Pose Estimation (RMPE), for keypoint detection. The development
dataset had images with no flood evidence. Therefore, we quantified
the global flood evidence by acquiring a flood evidence probability
from Places365 VGG19 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] fine-tuned on the MediaEval Multimedia
Satellite 2017 dataset [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. All person bounding boxes of an image
get the same global flood evidence probability.
      </p>
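      <p>A small sketch may help to show how the local, keypoint and global scores can be stacked into one feature row per detected person. The dictionary keys, the function name and the ordering of the seven values are hypothetical; only the set of features comes from the text.</p>
      <preformat preformat-type="code">
```python
from typing import Dict, List

# Lower-body keypoints of interest (key names are illustrative).
KEYPOINTS = ("right_knee", "left_knee", "right_ankle", "left_ankle")


def build_feature_rows(persons: List[Dict[str, float]],
                       global_flood_prob: float) -> List[List[float]]:
    """Build one 7-value feature row per detected person in a single image."""
    rows = []
    for person in persons:
        row = [person["local_flood_prob"]]              # ResNet50 patch score
        row.extend(person[name] for name in KEYPOINTS)  # keypoint confidences
        row.append(person["pose_score"])                # overall keypoint score
        row.append(global_flood_prob)                   # shared image-level score
        rows.append(row)
    return rows


# Hypothetical scores for two persons detected in the same image.
persons = [
    {"local_flood_prob": 0.82, "right_knee": 0.10, "left_knee": 0.15,
     "right_ankle": 0.05, "left_ankle": 0.07, "pose_score": 0.60},
    {"local_flood_prob": 0.40, "right_knee": 0.90, "left_knee": 0.88,
     "right_ankle": 0.85, "left_ankle": 0.80, "pose_score": 0.95},
]
rows = build_feature_rows(persons, global_flood_prob=0.77)
print(len(rows), len(rows[0]))  # prints 2 7
```
      </preformat>
      <p>Note that the last value is identical in every row of an image, which is how the image-level evidence reaches each per-person prediction.</p>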
      <p>
        Then a Random Forest [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] was trained with 321 trees and the Gini
splitting criterion on feature vectors comprising the ResNet50
local flood probability, Right Knee confidence, Left Knee confidence,
Right Ankle confidence, Left Ankle confidence, keypoints accuracy
score and VGG19 global flood evidence. For inference, we take
an image, detect the person bounding boxes, and pass each person
bounding box through ResNet50 to get a local flood
probability. We then extract confidence probabilities for the Right Knee,
Left Knee, Right Ankle and Left Ankle of each person using RMPE, and
the global flood evidence probability using Places365 VGG19. Each
person’s feature set is passed to the trained Random Forest, and
the final probability of that person being in water above knee level
is obtained. For classification, we select the person in the image
with the maximum probability. If this maximum probability is
greater than 0.50, the image is classified as a positive instance; otherwise it is classified as a
negative one, as shown in Figure 2.
      </p>
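      <p>The image-level decision rule described above can be sketched as follows. The function name is hypothetical, and labeling an image with no detected person as negative is an assumption, since the text does not cover that case.</p>
      <preformat preformat-type="code">
```python
from typing import Sequence


def classify_image(person_probs: Sequence[float], threshold: float = 0.50) -> int:
    """Label an image from the per-person Random Forest probabilities.

    Returns 1 (positive) if the highest per-person probability is strictly
    greater than the threshold, otherwise 0 (negative).
    """
    if not person_probs:
        # Assumption: an image without any detected person is negative.
        return 0
    return 1 if max(person_probs) > threshold else 0


# Hypothetical per-person probabilities for three images.
print(classify_image([0.20, 0.71, 0.40]))  # prints 1
print(classify_image([0.10, 0.50]))        # prints 0
print(classify_image([]))                  # prints 0
```
      </preformat>
      <p>Taking the maximum matches the task definition: a single person standing in water above the knee is enough to make the image positive.</p>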
    </sec>
    <sec id="sec-3">
      <title>TEST RESULTS FOR NITD AND MFLE</title>
      <p>
        For all runs of the NITD task, a VGG16 ensemble was used. Runs 1 and
2 use class weights to handle the class imbalance, while Runs 3, 4 and 5
use data augmentation. Run 5 uses the EasyEnsemble [
        <xref ref-type="bibr" rid="ref10 ref12">10, 12</xref>
        ] approach
for classification. Run 1 of MFLE uses the image feature set and
a Random Forest classifier.
      </p>
      <p>Table 1 shows the performance of the various runs on the test set. For
NITD, the ensemble of five different VGG16 (Hybrid) models
produced a micro-averaged F1 score of 0.895. For MFLE, the
proposed method produced a macro-averaged F1 score of 0.734.
We obtained an F1 score (positive class) of 0.694 on the MFLE validation set.
Several approaches were tried for the NITD task. One approach
without data augmentation balanced
the minority and majority classes through class weights. In another, features
of length 1365 were extracted for each image and an
EasyEnsemble classifier was applied. However, the best performing
approach used data augmentation to increase the
number of minority-class images. We also performed an ablation study of the MFLE
inference model. When the VGG19 global flood estimation
component was removed, the performance deteriorated. The same holds
true for the ResNet local flood estimation and for the
ankle keypoints. To aggregate the Random Forest probabilities of the persons in an image, we computed
different statistics such as the mean and median, but the maximum of the
probabilities resulted in better performance on the validation set.</p>
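      <p>Because the two tasks are scored with different averages, a short sketch of how micro- and macro-averaged F1 differ may be useful. The function and the toy labels are illustrative only and are not the official evaluation script.</p>
      <preformat preformat-type="code">
```python
from typing import Sequence, Tuple


def f1_micro_macro(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[float, float]:
    """Return (micro-averaged F1, macro-averaged F1) for label sequences."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    tp_sum = fp_sum = fn_sum = 0
    per_class_f1 = []
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)
        fp = sum(1 for t, p in pairs if t != c and p == c)
        fn = sum(1 for t, p in pairs if t == c and p != c)
        denom = 2 * tp + fp + fn
        per_class_f1.append(2 * tp / denom if denom else 0.0)
        tp_sum, fp_sum, fn_sum = tp_sum + tp, fp_sum + fp, fn_sum + fn
    # Micro: pool the counts over classes, then compute a single F1.
    micro_denom = 2 * tp_sum + fp_sum + fn_sum
    micro = 2 * tp_sum / micro_denom if micro_denom else 0.0
    # Macro: unweighted mean of the per-class F1 scores.
    macro = sum(per_class_f1) / len(per_class_f1)
    return micro, macro


# Toy imbalanced example: two positives and six negatives.
y_true = [1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 0, 0, 0, 0, 0, 0, 1]
micro, macro = f1_micro_macro(y_true, y_pred)
print(round(micro, 3), round(macro, 3))  # prints 0.75 0.667
```
      </preformat>
      <p>On imbalanced data the macro average is pulled down by the weaker minority class, whereas the micro average is dominated by the majority class.</p>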
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber, Simon Brugman, Erkan Basar, Martha Larson, and
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2019: Estimation of Flood Severity</article-title>
          .
          <source>In Proc. of the MediaEval 2019</source>
          Workshop (Oct.
          <fpage>27</fpage>
          -
          <lpage>29</lpage>
          ,
          <year>2019</year>
          ). Sophia Antipolis, France.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Bischke</surname>
          </string-name>
          , Patrick Helber, Christian Schulze, Venkat Srinivasan, Andreas Dengel, and
          <string-name>
            <given-names>Damian</given-names>
            <surname>Borth</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>The Multimedia Satellite Task at MediaEval 2017: Emergency Response for Flooding Events</article-title>
          .
          <source>In Proc. of the MediaEval 2017 Workshop (Sept</source>
          .
          <fpage>13</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2017</year>
          ). Dublin, Ireland.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Marcus</surname>
            <given-names>D</given-names>
          </string-name>
          <string-name>
            <surname>Bloice</surname>
          </string-name>
          ,
          <string-name>
            <surname>Peter M Roth</surname>
            ,
            <given-names>and Andreas</given-names>
          </string-name>
          <string-name>
            <surname>Holzinger</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Biomedical image augmentation using Augmentor</article-title>
          .
          <source>Bioinformatics</source>
          (04
          <year>2019</year>
          ). https://doi.org/10.1093/bioinformatics/btz259 btz259.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          <volume>45</volume>
          ,
          <issue>1</issue>
          (
          <year>2001</year>
          ),
          <fpage>5</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Tom</given-names>
            <surname>Brouwer</surname>
          </string-name>
          , Dirk Eilander, Arnejan Van Loenen,
          <string-name>
            <surname>Martijn J Booij</surname>
          </string-name>
          ,
          Kathelijne M Wijnberg, Jan S Verkade,
          and
          <string-name>
            <given-names>Jurjen</given-names>
            <surname>Wagemaker</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Probabilistic flood extent estimates from social media flood observations</article-title>
          .
          <source>Natural Hazards &amp; Earth System Sciences</source>
          <volume>17</volume>
          ,
          <issue>5</issue>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Mateusz</given-names>
            <surname>Buda</surname>
          </string-name>
          ,
          Atsuto Maki, and Maciej A Mazurowski
          .
          <year>2018</year>
          .
          <article-title>A systematic study of the class imbalance problem in convolutional neural networks</article-title>
          .
          <source>Neural Networks</source>
          <volume>106</volume>
          (
          <year>2018</year>
          ),
          <fpage>249</fpage>
          -
          <lpage>259</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Jia</given-names>
            <surname>Deng</surname>
          </string-name>
          , Wei Dong, Richard Socher,
          <string-name>
            <surname>Li-Jia</surname>
            <given-names>Li</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Kai</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <surname>Li</surname>
          </string-name>
          Fei-Fei.
          <year>2009</year>
          .
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          .
          <source>In 2009 IEEE conference on computer vision and pattern recognition</source>
          . IEEE
          ,
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Hao-Shu</surname>
            <given-names>Fang</given-names>
          </string-name>
          , Shuqin Xie,
          <string-name>
            <surname>Yu-Wing Tai</surname>
            , and
            <given-names>Cewu</given-names>
          </string-name>
          <string-name>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>RMPE: Regional multi-person pose estimation</article-title>
          .
          <source>In Proceedings of the IEEE International Conference on Computer Vision</source>
          .
          <fpage>2334</fpage>
          -
          <lpage>2343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Ryan</given-names>
            <surname>Lagerstrom</surname>
          </string-name>
          , Yulia Arzhaeva, Piotr Szul, Oliver Obst, Robert Power, Bella Robinson, and
          <string-name>
            <given-names>Tomasz</given-names>
            <surname>Bednarz</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Image classification to support emergency situation awareness</article-title>
          .
          <source>Frontiers in Robotics and AI</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ),
          <fpage>54</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Guillaume</surname>
            <given-names>Lemaître</given-names>
          </string-name>
          , Fernando Nogueira, and
          <string-name>
            <surname>Christos</surname>
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Aridas</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Imbalanced-learn: A Python Toolbox to Tackle the Curse of Imbalanced Datasets in Machine Learning</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>18</volume>
          ,
          <issue>17</issue>
          (
          <year>2017</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . http://jmlr.org/papers/v18/16-365.html
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Zhenlong</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Cuizhen</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Christopher T Emrich</surname>
            , and
            <given-names>Diansheng</given-names>
          </string-name>
          <string-name>
            <surname>Guo</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A novel approach to leveraging social media for rapid flood mapping: a case study of the 2015 South Carolina floods</article-title>
          .
          <source>Cartography and Geographic Information Science</source>
          <volume>45</volume>
          ,
          <issue>2</issue>
          (
          <year>2018</year>
          ),
          <fpage>97</fpage>
          -
          <lpage>110</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Xu-Ying</surname>
            <given-names>Liu</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jianxin Wu</surname>
          </string-name>
          , and
          <string-name>
            <surname>Zhi-Hua Zhou</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Exploratory undersampling for class-imbalance learning</article-title>
          .
          <source>IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics)</source>
          <volume>39</volume>
          ,
          <issue>2</issue>
          (
          <year>2008</year>
          ),
          <fpage>539</fpage>
          -
          <lpage>550</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Shaoqing</surname>
            <given-names>Ren</given-names>
          </string-name>
          , Kaiming He,
          <string-name>
            <surname>Ross Girshick</surname>
            , and
            <given-names>Jian</given-names>
          </string-name>
          <string-name>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Faster R-CNN: Towards real-time object detection with region proposal networks</article-title>
          .
          <source>In Advances in neural information processing systems</source>
          .
          <fpage>91</fpage>
          -
          <lpage>99</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Very Deep Convolutional Networks for Large-Scale Image Recognition</article-title>
          .
          <source>In International Conference on Learning Representations.</source>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Nataliya</surname>
            <given-names>Tkachenko</given-names>
          </string-name>
          , Stephen Jarvis, and
          <string-name>
            <given-names>Rob</given-names>
            <surname>Procter</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Predicting floods with Flickr tags</article-title>
          .
          <source>PloS one</source>
          <volume>12</volume>
          ,
          <issue>2</issue>
          (
          <year>2017</year>
          ),
          <elocation-id>e0172870</elocation-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Bolei</surname>
            <given-names>Zhou</given-names>
          </string-name>
          , Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba.
          <year>2017</year>
          .
          <article-title>Places: A 10 million image database for scene recognition</article-title>
          .
          <source>IEEE transactions on pattern analysis and machine intelligence</source>
          <volume>40</volume>
          ,
          <issue>6</issue>
          (
          <year>2017</year>
          ),
          <fpage>1452</fpage>
          -
          <lpage>1464</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>