<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Floods Detection in Twitter Text and Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Naina Said</string-name>
          <email>nainasaid@uetpeshawar.edu.pk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kashif Ahmad</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Asma Gul</string-name>
          <email>asmagul@sbbwu.edu.pk</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nasir Ahmad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ala Al-Fuqaha</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CSE, University of Engineering and Technology</institution>
          ,
          <addr-line>Peshawar</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Statistics, Shaheed Benazir Bhutto Women University</institution>
          ,
          <addr-line>Peshawar</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University</institution>
          ,
          <addr-line>Qatar Foundation, Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>In this paper, we present our methods for the MediaEval 2020 Flood-related Multimedia task, which aims to analyze and combine textual and visual content from social media for the detection of real-world flooding events. The task mainly focuses on identifying flood-related tweets relevant to a specific area. We propose several schemes to address the challenge. For text-based flood event detection, we use three different methods, relying on Bag of Words (BoW) and an Italian version of BERT individually and in combination, achieving F1-scores of 0.77, 0.68, and 0.70 on the development set, respectively. For the visual analysis, we rely on features extracted via multiple state-of-the-art deep models pre-trained on ImageNet. The extracted features are then used to train multiple individual classifiers whose scores are combined in a late fusion manner, achieving an F1-score of 0.75. For our mandatory multi-modal run, we combine the classification scores obtained with the best textual and visual schemes in a late fusion manner. Overall, better results are obtained with the multimodal scheme, which achieves an F1-score of 0.80 on the development set.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Social media outlets, such as Facebook, Twitter, and Instagram,
allow users to create, obtain, and share information instantly. As
an instant source of information, social media outlets, especially
Twitter, have been widely exploited for information gathering and
dissemination, particularly during adverse events, where instant access to
relevant information is crucial [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ]. The literature reports
several situations where social media content has been mined to
get timely access to information during natural and man-made
disasters [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Since floods are among the most frequently occurring natural disasters, flood
event detection in social media has been a focus of the research
community over the last few years. For instance, the research work
presented in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] assesses the informativeness of a tweet in the
event of an earthquake using machine learning techniques.
Similarly, in another study [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], the authors analyze domain adaptation
classifiers by utilizing labeled data from a past disaster event
and unlabeled data from a current event. Flood event detection
has also been part of the MediaEval challenge for the last four
years, with a different aspect of flood events targeted each
time. This year, the task focuses on the detection of flood
events relevant to a specific area [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the task, the participants
are provided with a collection of Twitter data containing a large
number of tweets’ text and associated images, and are asked to
develop a multi-modal system capable of automatically detecting
flood-related events that occurred in a particular area in Italy. Note
that the tweets are provided in the Italian language.
      </p>
      <p>
        This paper provides a detailed description of the methods
proposed by team UEHBKU for the task. In total, we submitted five
runs: one late-fusion-based multimodal solution, one image-based solution,
and three text-based solutions, as detailed in the
following.
      </p>
      <p>
        For image-based flood detection, based on our experience with
similar tasks [
        <xref ref-type="bibr" rid="ref1 ref15 ref4">1, 4, 15</xref>
        ], we rely on multiple pre-trained CNNs
to extract object-level features from the images, which are
then used to train multiple SVM classifiers. The SVM classifiers
output posterior probabilities, which are
then combined in a late fusion manner by aggregating the scores. The
label with the highest aggregate score is selected as the final outcome of
the framework. In total, we used three different models, namely (i)
DenseNet [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], (ii) VggNet-19 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], and (iii) ResNet [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Note
that all the models are pre-trained on ImageNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and are expected
to extract object-level features.
      </p>
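      <p>The score aggregation described above can be illustrated with a minimal sketch. It assumes each classifier outputs a per-class posterior probability for every image and that aggregation is a plain sum; the function name <monospace>late_fusion</monospace> and the probability values are hypothetical and are not taken from our experiments.</p>
      <preformat><![CDATA[
```python
from typing import List, Sequence


def late_fusion(model_probs: Sequence[Sequence[Sequence[float]]]) -> List[int]:
    """Sum per-class posteriors over all models and return the arg-max label per sample.

    model_probs[m][i][c] is the probability that model m assigns class c to sample i.
    """
    n_samples = len(model_probs[0])
    n_classes = len(model_probs[0][0])
    labels = []
    for i in range(n_samples):
        # Aggregate the score of each class across all models.
        totals = [sum(probs[i][c] for probs in model_probs) for c in range(n_classes)]
        # The label with the highest aggregate score wins.
        labels.append(max(range(n_classes), key=totals.__getitem__))
    return labels


# Hypothetical posteriors [P(non-flood), P(flood)] for three images.
densenet = [[0.70, 0.30], [0.40, 0.60], [0.55, 0.45]]
vgg19 = [[0.60, 0.40], [0.45, 0.55], [0.30, 0.70]]
resnet = [[0.80, 0.20], [0.35, 0.65], [0.50, 0.50]]

fused_labels = late_fusion([densenet, vgg19, resnet])
print(fused_labels)  # [0, 1, 1]
```
]]></preformat>
      <p>In this toy example, the third image is labeled as flood-related even though one model favors the other class and one is undecided, because the aggregate score of the flood class is higher.</p>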
      <p>
        Moreover, to deal with the class imbalance problem, we use the
Synthetic Minority Oversampling Technique (SMOTE) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] to synthesize
new examples of the rare class. The SMOTE technique is based on
under-sampling the majority class and oversampling the rare class
and is found to be more effective in improving classifier
performance than under-sampling the majority class alone.
During the oversampling process, the rare class was increased
by a factor of 3 so that both classes have an equal number of samples.
      </p>
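      <p>To make the oversampling step concrete, the following is a simplified, SMOTE-style sketch rather than the implementation used in our runs. It assumes numeric feature vectors and creates each synthetic sample by interpolating between a rare-class sample and one of its nearest rare-class neighbours; the function name, the neighbourhood size <monospace>k</monospace>, and the toy data are all hypothetical.</p>
      <preformat><![CDATA[
```python
import random
from typing import List

Point = List[float]


def smote_like_oversample(minority: List[Point], n_new: int, k: int = 2, seed: int = 0) -> List[Point]:
    """Create n_new synthetic points by interpolating between minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority-class neighbours of the base point (excluding the point itself).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment between base and neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Toy rare class with 4 samples; adding 8 synthetic samples grows it by a factor of 3.
minority = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote_like_oversample(minority, n_new=8)
balanced_minority = minority + new_points
print(len(balanced_minority))  # 12
```
]]></preformat>
      <p>Because every synthetic point lies on a segment between two existing rare-class samples, the new examples stay inside the region already occupied by that class.</p>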
    </sec>
    <sec id="sec-2">
      <title>Text-based Floods Detection</title>
      <p>
        For text-based analysis of the tweets, two different methods, namely
(i) BoW and (ii) the state-of-the-art BERT model [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], are used to obtain
feature vectors from the tweets. The BoW model represents text by
the occurrence of words within a document, where each
word count is considered a feature. BERT, on the other hand,
applies bidirectional training of the Transformer, a popular
attention model, to language modeling. Note that
different variants of the BERT model are available. Since the tweets
provided for the task are in Italian, we use the
Italian version of the model. Before training the models, the text is
cleaned by removing punctuation (such as commas and full stops),
emojis, URLs, and stop words from the tweets. As in the
image-based solution, we rely on SMOTE to tackle the class imbalance
problem in the textual data.
      </p>
      <p>The feature vector obtained with BoW is used to train a Naive
Bayes classifier, whereas a logistic regression model is trained on
the BERT features. The classification scores obtained with the two
individual models are then combined in a late fusion manner by
aggregating their probabilities for the
final decision.</p>
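      <p>As an illustration of the BoW branch, the sketch below trains a multinomial Naive Bayes classifier on word counts. It is a minimal example under stated assumptions: whitespace tokenization, add-one (Laplace) smoothing, and a handful of invented Italian-like toy tweets; none of these details are taken from our actual pipeline or dataset.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def train_nb(docs: List[Tuple[str, int]]):
    """Collect per-class document counts and bag-of-words counts."""
    class_docs: Counter = Counter()
    word_counts: Dict[int, Counter] = {}
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        class_docs[label] += 1
        word_counts.setdefault(label, Counter()).update(tokens)
        vocab.update(tokens)
    return class_docs, word_counts, vocab


def predict_nb(model, text: str) -> int:
    """Return the class with the highest log-posterior (add-one smoothing)."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    # Words never seen during training are ignored.
    tokens = [t for t in text.lower().split() if t in vocab]
    best_label, best_score = -1, -math.inf
    for label in class_docs:
        counts = word_counts[label]
        denom = sum(counts.values()) + len(vocab)
        score = math.log(class_docs[label] / total_docs)
        score += sum(math.log((counts[t] + 1) / denom) for t in tokens)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented toy tweets: 1 = flood-related, 0 = not flood-related.
train = [
    ("alluvione fiume esondato strade allagate", 1),
    ("allerta alluvione pioggia intensa", 1),
    ("partita calcio stasera", 0),
    ("bel sole oggi mare", 0),
]
model = train_nb(train)
pred_flood = predict_nb(model, "strade allagate dopo la pioggia")
pred_other = predict_nb(model, "calcio e sole")
print(pred_flood, pred_other)  # 1 0
```
]]></preformat>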
    </sec>
    <sec id="sec-3">
      <title>Multi-modal Analysis</title>
      <p>For the mandatory multi-modal run, the visual and textual
information are combined in a late fusion scheme by aggregating the
probabilities obtained with the individual models trained on visual
and textual features. Note that in the current
implementation, all the models are treated equally by assigning them equal
weights. In the future, we aim to use more sophisticated fusion
methods that assign merit-based weights to the models.</p>
    </sec>
    <sec id="sec-4">
      <title>RESULTS AND ANALYSIS</title>
      <p>We submitted five different runs for the task: the first three
runs are based on text, while Run 4 and Run 5 are based on
visual and multi-modal information, respectively. In Run 1, we used
BoW for text representation to differentiate between flood and
non-flood events on Twitter. In Run 2, we relied on a multilingual
BERT model to obtain a feature vector for each tweet, and a logistic
regression model was then trained on the generated embeddings.</p>
      <p>The variation in the performance of the models motivated us to
use them jointly in a late fusion manner for our third
run. However, the joint use of the models performed worse than the
best individual model (i.e., BoW), indicating
that BERT is not well suited to our case.</p>
      <p>
        Our Run 4, which is based on visual information only, is
motivated by our previous experience [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where we combined multiple
state-of-the-art pre-trained models in a late fusion manner by
aggregating the posterior probabilities obtained with the individual
models.
      </p>
      <p>In our final run, we enrich the textual information with the
images associated with each tweet for more accurate classification of the
tweets. Again, late fusion by aggregating the posterior
probabilities is used to combine the complementary
information obtained from the text and the associated images.</p>
      <p>As can be seen in Table 1, overall, better results have been
obtained with the multimodal approach, which indicates the
benefit of jointly using textual and visual information for the task. Among
the individual models trained on a single type of feature (i.e.,
textual or visual), the best results have been observed for BoW on the
textual features. However, comparable results have been observed
for the joint use of the different deep models on visual information.</p>
      <p>Moreover, as can be seen in Table 1, the average score of all
the teams is very low, which shows the complexity of the task.
However, the scores on the development set are reasonably good,
which suggests issues with the test set, especially because the
teams’ average scores for most of the runs are below 20%.</p>
    </sec>
    <sec id="sec-5">
      <title>CONCLUSIONS AND FUTURE WORK</title>
      <p>The MediaEval 2020 Flood-related Multimedia task was concerned
with analyzing Twitter data for flood detection. The goal of the task
was to combine the textual and visual information from Twitter
in order to develop an automatic classification system that indicates
whether a particular tweet’s text and the associated image are
relevant to an actual flooding event. We proposed five different
solutions: a multimodal solution, a visual-information-based
solution, and three text-based methods. We observed that both types
of data complement each other and indeed improve the overall
accuracy. In the present study, we performed late fusion using equal
weights for all the models. In the future, we will
investigate different optimization techniques, such as Particle Swarm
Optimization and Genetic Algorithms, for assigning merit-based
weights to the models in the fusion. In addition, we will explore
other text-based models designed specifically for the Italian language to
improve the accuracy of Italian text classification.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Kashif</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , Mohamed Lamine Mekhalfi, Nicola Conci, Giulia Boato, Farid Melgani, and
          <string-name>
            <given-names>Francesco GB</given-names>
            <surname>De Natale</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A pool of deep models for event recognition</article-title>
          .
          <source>In 2017 IEEE International Conference on Image Processing (ICIP)</source>
          . IEEE,
          <fpage>2886</fpage>
          -
          <lpage>2890</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Kashif</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , Konstantin Pogorelov, Michael Riegler, Nicola Conci, and
          <string-name>
            <given-names>Pål</given-names>
            <surname>Halvorsen</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Social media and satellites</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          (
          <year>2018</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Kashif</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , Michael Riegler, Konstantin Pogorelov, Nicola Conci, Pål Halvorsen, and Francesco De Natale.
          <year>2017</year>
          .
          <article-title>Jord: a system for collecting information and monitoring natural disasters by linking social media with satellite imagery</article-title>
          .
          <source>In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing</source>
          . ACM
          ,
          <volume>12</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Kashif</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , Amir Sohail, Nicola Conci, and Francesco De Natale.
          <year>2018</year>
          .
          <article-title>A Comparative study of Global and Deep Features for the analysis of user-generated natural disaster related images</article-title>
          .
          <source>In 2018 IEEE 13th Image, Video, and Multidimensional Signal Processing Workshop (IVMSP)</source>
          . IEEE,
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Sheharyar</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , Kashif Ahmad, Nasir Ahmad, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Conci</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Convolutional neural networks for disaster images retrieval</article-title>
          .
          <source>In Proceedings of the MediaEval 2017 Workshop</source>
          (Sept. 13-15, 2017). Dublin, Ireland.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Stelios</given-names>
            <surname>Andreadis</surname>
          </string-name>
          , Ilias Gialampoukidis, Anastasios Karakostas, Stefanos Vrochidis, Ioannis Kompatsiaris, Roberto Fiorin, Daniele Norbiato, and
          <string-name>
            <given-names>Michele</given-names>
            <surname>Ferri</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>The Flood-related Multimedia Task at MediaEval 2020</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Nitesh V</given-names>
            <surname>Chawla</surname>
          </string-name>
          , Kevin W Bowyer, Lawrence O Hall, and
          <string-name>
            <given-names>W Philip</given-names>
            <surname>Kegelmeyer</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>SMOTE: synthetic minority over-sampling technique</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>16</volume>
          (
          <year>2002</year>
          ),
          <fpage>321</fpage>
          -
          <lpage>357</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Jia</given-names>
            <surname>Deng</surname>
          </string-name>
          , Wei Dong, Richard Socher,
          <string-name>
            <given-names>Li-Jia</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kai</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          .
          <source>In 2009 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          . IEEE,
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Bert: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Kaiming</given-names>
            <surname>He</surname>
          </string-name>
          , Xiangyu Zhang, Shaoqing Ren, and
          <string-name>
            <given-names>Jian</given-names>
            <surname>Sun</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Deep residual learning for image recognition</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>770</fpage>
          -
          <lpage>778</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Gao</given-names>
            <surname>Huang</surname>
          </string-name>
          , Zhuang Liu,
          <string-name>
            <given-names>Laurens</given-names>
            <surname>Van Der Maaten</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kilian Q</given-names>
            <surname>Weinberger</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Densely connected convolutional networks</article-title>
          .
          <source>In Proceedings of the IEEE conference on computer vision and pattern recognition</source>
          .
          <fpage>4700</fpage>
          -
          <lpage>4708</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Muhammad</given-names>
            <surname>Imran</surname>
          </string-name>
          , Carlos Castillo, Ji Lucas, Patrick Meier, and
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Vieweg</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>AIDR: Artificial intelligence for disaster response</article-title>
          .
          <source>In Proceedings of the 23rd International Conference on World Wide Web</source>
          .
          <fpage>159</fpage>
          -
          <lpage>162</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Hongmin</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Nicolais</given-names>
            <surname>Guevara</surname>
          </string-name>
          , Nic Herndon, Doina Caragea, Kishore Neppalli, Cornelia Caragea, Anna Cinzia Squicciarini, and
          <string-name>
            <given-names>Andrea H</given-names>
            <surname>Tapia</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Twitter Mining for Disaster Response: A Domain Adaptation Approach</article-title>
          .
          <source>In ISCRAM</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Naina</given-names>
            <surname>Said</surname>
          </string-name>
          , Kashif Ahmad, Michael Riegler, Konstantin Pogorelov, Laiq Hassan, Nasir Ahmad, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Conci</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Natural disasters detection in social media and satellite imagery: a survey</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          (17 Jul
          <year>2019</year>
          ). https://doi.org/10.1007/s11042-019-07942-1
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Naina</given-names>
            <surname>Said</surname>
          </string-name>
          , Konstantin Pogorelov, Kashif Ahmad, Michael Riegler, Nasir Ahmad, Olga Ostroukhova, Pål Halvorsen, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Conci</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep Learning Approaches for Flood Classification and Flood Aftermath Detection</article-title>
          .
          <source>In MediaEval</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Karen</given-names>
            <surname>Simonyan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andrew</given-names>
            <surname>Zisserman</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Very deep convolutional networks for large-scale image recognition</article-title>
          .
          <source>arXiv preprint arXiv:1409.1556</source>
          (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>