<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Models for Visual Sentiment Analysis of Disaster-related Multimedia Content</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Khubaib Ahmad</string-name>
          <email>khubaibtakkar@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Muhammad Asif Ayub</string-name>
          <email>asifayub836@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kashif Ahmad</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ala Al-Fuqaha</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nasir Ahmad</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Systems Engineering, University of Engineering and Technology</institution>
          ,
          <addr-line>Peshawar</addr-line>
          ,
          <country country="PK">Pakistan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Division of Information and Computing Technology, College of Science and Engineering, Hamad Bin Khalifa University</institution>
          ,
          <addr-line>Qatar Foundation, Doha</addr-line>
          ,
          <country country="QA">Qatar</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>13</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>This paper presents solutions for the MediaEval 2021 task "Visual Sentiment Analysis: A Natural Disaster Use-case". The task aims to extract and classify sentiments perceived by viewers and the emotional message conveyed by natural disaster-related images shared on social media. The task is composed of three sub-tasks: one single-label multi-class image classification subtask and two multi-label multi-class image classification subtasks. The two multi-label classification tasks cover different sets of labels. In our proposed solutions, we mainly rely on two different state-of-the-art models, namely Inception-v3 and VggNet-19, pre-trained on ImageNet. Both pre-trained models are fine-tuned for each of the three subtasks using different strategies. Overall, encouraging results are obtained on all three subtasks. On the single-label classification subtask (i.e., subtask 1), we obtained weighted average F1-scores of 0.540 and 0.526 for the Inception-v3 and VggNet-19 based solutions, respectively. On the multi-label classification tasks, i.e., subtask 2 and subtask 3, the weighted F1-scores of our Inception-v3 based solutions are 0.572 and 0.516, respectively. Similarly, the weighted F1-scores of our VggNet-19 based solutions on subtask 2 and subtask 3 are 0.584 and 0.495, respectively.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Over the last few years, the analysis of natural disasters in social media
outlets has been one of the active areas of research. During this
time, several interesting solutions exploring different aspects of
natural disasters have been proposed [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Some key aspects of
natural disasters explored in the literature include disaster detection
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], disaster news dissemination [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], and disaster severity and
damage assessment [
        <xref ref-type="bibr" rid="ref2 ref7">2, 7</xref>
        ]. Some efforts on the sentiment analysis of
natural disaster-related social media posts have also been reported.
However, most of the efforts made in this regard are based on
textual information [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. More recently, Hassan et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] introduced
the concept of visual sentiment analysis of natural disaster-related
images by proposing a deep sentiment analyzer. However, the topic
is very challenging, and several aspects of visual sentiment
analysis of natural disaster-related visual content have yet to
be explored. As part of their efforts to further explore the topic,
the authors proposed the "Visual Sentiment Analysis: A
Natural Disaster Use-case" task at MediaEval 2021 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
      <p>This paper provides the details of the solutions proposed by team
CSE-Innoverts for the visual sentiment analysis task. The task is
composed of three sub-tasks: (i) a single-label multi-class
classification task with three labels, (ii) a multi-label multi-class
classification task with seven labels, and (iii) a multi-label multi-class
classification task with 11 labels. In the first subtask, the
participants need to classify an image into negative, positive, or neutral
sentiments. In the second subtask, the proposed solution aims to
differentiate among joy, sadness, fear, disgust, anger, surprise, and
neutral. The final subtask is composed of 11 labels including anger,
anxiety, craving, empathetic pain, fear, horror, joy, relief, sadness,
and surprise.</p>
    </sec>
    <sec id="sec-2">
      <title>2 PROPOSED APPROACH</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Methodology for Single-label Classification task (subtask 1)</title>
      <p>
        For the first task, we mainly rely on two different Convolutional
Neural Network (CNN) architectures, namely Inception-v3 and
VggNet-19, based on their proven performance in similar tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
Since the available dataset is not large enough to train the models
from scratch, we fine-tuned existing models pre-trained
on the ImageNet dataset [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the literature, generally, the models
pre-trained on the ImageNet and Places datasets [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] are fine-tuned
for image classification tasks. However, our choice for the current
implementation is based on the better performance of models
pre-trained on the ImageNet dataset in similar tasks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. In this
work, the models are fine-tuned for 50 epochs using the Adam optimizer
with a learning rate of 0.0001.
      </p>
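      <p>The fine-tuning setup above can be sketched as follows. This is a minimal illustration assuming a TensorFlow/Keras API; the helper name build_finetune_model and the layers placed on top of the backbone are our own choices for illustration, not the authors' exact code. The paper's setting corresponds to weights="imagenet"; the sketch defaults to weights=None only so it runs without downloading the pre-trained weights.</p>

```python
# Sketch of the paper's fine-tuning setup: Inception-v3 backbone,
# Adam optimizer with learning rate 1e-4, 50 epochs, 3-class softmax head.
# Assumed structure; build_finetune_model is a hypothetical helper name.
import tensorflow as tf

def build_finetune_model(num_classes=3, weights=None):
    # The paper fine-tunes ImageNet weights, i.e. weights="imagenet";
    # weights=None here avoids the download in this sketch.
    base = tf.keras.applications.InceptionV3(
        weights=weights, include_top=False, input_shape=(299, 299, 3))
    x = tf.keras.layers.GlobalAveragePooling2D()(base.output)
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(x)
    model = tf.keras.Model(base.input, out)
    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-4),
                  loss="categorical_crossentropy", metrics=["accuracy"])
    return model

model = build_finetune_model()
# Training (omitted here) would be: model.fit(train_ds, epochs=50)
print(model.output_shape)
```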
      <p>It is important to mention that the provided dataset is imbalanced,
with a large number of negative samples while fewer samples are
available in the neutral class. Before fine-tuning the models, we
applied an up-sampling technique to balance the dataset. Moreover,
data augmentation techniques, namely cropping, rotating, and flipping
the image patches, are also employed to further increase the number of
training samples.</p>
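      <p>The balancing and augmentation steps can be sketched as below. This is a minimal numpy illustration under our own assumptions (random up-sampling with replacement, and flips/rotations as the augmentation operations); the authors' exact pipeline and parameters are not specified.</p>

```python
# Sketch: up-sample minority classes to the majority-class count,
# then generate augmented variants of an image. Toy data throughout.
import numpy as np

rng = np.random.default_rng(0)

def upsample(images, labels):
    """Randomly repeat minority-class samples until all classes match
    the majority-class count."""
    classes, counts = np.unique(labels, return_counts=True)
    target = counts.max()
    keep = []
    for c in classes:
        idx = np.flatnonzero(labels == c)
        extra = rng.choice(idx, size=target - idx.size, replace=True)
        keep.append(np.concatenate([idx, extra]))
    order = np.concatenate(keep)
    return images[order], labels[order]

def augment(image):
    """Simple augmentations: original, horizontal flip, 90-degree rotation."""
    return [image, np.fliplr(image), np.rot90(image)]

# Toy data: 6 "negative" (label 0) and 2 "neutral" (label 2) 4x4 images.
images = rng.random((8, 4, 4))
labels = np.array([0, 0, 0, 0, 0, 0, 2, 2])
bal_images, bal_labels = upsample(images, labels)
print(np.bincount(bal_labels))  # class counts after balancing
```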
    </sec>
    <sec id="sec-4">
      <title>2.2 Methodology for Multi-label Classification tasks (subtask 2 and subtask 3)</title>
      <p>We used the same strategy of fine-tuning pre-trained
state-of-the-art models for subtask 2 and subtask 3. However,
to deal with multi-label classification, several changes are made.
For instance, the top layers of the models are extended to support
the multi-label classification tasks. Moreover, a sigmoid
cross-entropy loss function is used to treat each component of the CNN
output vector independently.</p>
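      <p>The per-label behaviour of the sigmoid cross-entropy loss can be illustrated numerically. The following is a small numpy sketch with made-up logits and targets, not the authors' training code; it shows that each output component contributes its own binary cross-entropy term, so an image may carry several labels at once.</p>

```python
# Sketch: sigmoid cross-entropy treats each label of a multi-label
# output independently (unlike a softmax over all classes).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_cross_entropy(logits, targets):
    """Per-label binary cross-entropy and its mean over the labels."""
    p = sigmoid(logits)
    per_label = -(targets * np.log(p) + (1 - targets) * np.log(1 - p))
    return per_label, per_label.mean()

logits = np.array([2.0, -1.0, 0.5])   # one output unit per label
targets = np.array([1.0, 0.0, 1.0])   # several labels can be active at once
per_label, loss = sigmoid_cross_entropy(logits, targets)
print(per_label.round(3), round(loss, 3))  # per-label terms, then their mean
```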
      <p>Similar to subtask 1, the distribution of samples across the
sentiment categories covered in subtask 2 and subtask 3 is not balanced.
To address this, the same strategy of up-sampling the minority classes
is used to balance the dataset. Moreover, the same data augmentation
techniques are also employed in these subtasks.</p>
    </sec>
    <sec id="sec-5">
      <title>3 RESULTS AND ANALYSIS</title>
    </sec>
    <sec id="sec-6">
      <title>3.1 Evaluation Metric</title>
      <p>We used two different metrics for the evaluation of the proposed
solutions. On the test set, the evaluations are carried out in terms of
the weighted F1-score, which is the official evaluation metric of the task.
On the development set, we used binary accuracy as an additional
metric along with the weighted F1-score. For computing the scores
in the multi-label classification tasks, we used the default threshold
(i.e., 0.5).</p>
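      <p>The multi-label scoring procedure can be sketched as follows; a minimal example assuming scikit-learn's f1_score, with made-up predictions rather than our actual model outputs.</p>

```python
# Sketch: binarize multi-label probabilities at the default 0.5 threshold,
# then compute the weighted F1-score (the task's official metric).
import numpy as np
from sklearn.metrics import f1_score

y_true = np.array([[1, 0, 1],
                   [0, 1, 0],
                   [1, 1, 0]])          # ground-truth label indicators
probs = np.array([[0.9, 0.2, 0.6],
                  [0.4, 0.8, 0.1],
                  [0.7, 0.3, 0.2]])     # hypothetical model probabilities

y_pred = (probs >= 0.5).astype(int)     # default threshold of 0.5
wf1 = f1_score(y_true, y_pred, average="weighted")
print(round(wf1, 4))  # 0.8667
```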
    </sec>
    <sec id="sec-7">
      <title>3.2 Experimental Results on the development set</title>
      <p>Table 1 provides the experimental results of our proposed solutions
on the development set in terms of F1-score. It is important to note
that our validation set in these experiments is composed of 487
samples. As can be seen, overall better results are obtained on the
single-label classification subtask 1, which is composed of three
classes only. As we go deeper into the hierarchy of sentiment
categories/classes, the performance of the algorithms decreases as the
inter-class variation decreases.</p>
      <p>As far as the performance of the models is concerned,
Inception-v3 shows significant improvements over VggNet-19 on subtask 1 and
subtask 2, while comparable results are obtained on subtask 3.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3 Experimental Results on the test set</title>
      <p>Table 2 presents the official results of our proposed solutions on
the test set. Surprisingly, overall better results are obtained on the
multi-label classification subtask 2 for both models. On the
other hand, similar to the development set, the lowest performance is
observed for both models on subtask 3. As far as the performance of
the models is concerned, the Inception-v3 based solution outperformed
the VggNet-19 based solution on subtask 1 and subtask 3, while
comparable results are obtained on subtask 2.</p>
    </sec>
    <sec id="sec-9">
      <title>4 CONCLUSIONS AND FUTURE WORK</title>
      <p>The challenge is composed of three tasks including a single-label
and two multi-label image classification tasks with different sets
of labels. The first task aims to cover the conventional three
categories/labels generally used to represent sentiments. The other two
tasks aim to cover sets of labels more specific to natural disasters.
These three sets of labels allow us to explore different aspects of the
domain, and the task's complexity increases by going deeper into the
sentiment hierarchy. For all the tasks, we rely on two
state-of-the-art deep architectures, namely Inception-v3 and VggNet-19. To this
aim, the models pre-trained on the ImageNet dataset are fine-tuned
on the development dataset.</p>
      <p>In the current implementations, we rely on object-level
information only by employing models pre-trained on the ImageNet
dataset. We believe scene-level features could also contribute to the
task. In the future, we aim to jointly utilize both object- and
scene-level information for better performance on all the tasks. Moreover,
we aim to employ merit-based fusion schemes by considering the
contribution of the individual models to the tasks.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Kashif</given-names>
            <surname>Ahmad</surname>
          </string-name>
          , Michael Riegler, Konstantin Pogorelov, Nicola Conci, Pål Halvorsen, and Francesco De Natale.
          <year>2017</year>
          .
          <article-title>Jord: a system for collecting information and monitoring natural disasters by linking social media with satellite imagery</article-title>
          .
          <source>In Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing. 1-6.</source>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Firoj</given-names>
            <surname>Alam</surname>
          </string-name>
          , Ferda Ofli, Muhammad Imran, Tanvirul Alam, and
          <string-name>
            <given-names>Umair</given-names>
            <surname>Qazi</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Deep Learning Benchmarks and Datasets for Social Media Image Classification for Disaster Response</article-title>
          .
          <source>In 2020 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)</source>
          . IEEE,
          <fpage>151</fpage>
          -
          <lpage>158</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Ghazaleh</given-names>
            <surname>Beigi</surname>
          </string-name>
          , Xia Hu,
          <string-name>
            <given-names>Ross</given-names>
            <surname>Maciejewski</surname>
          </string-name>
          , and Huan Liu.
          <year>2016</year>
          .
          <article-title>An overview of sentiment analysis in social media and its applications in disaster relief</article-title>
          .
          <source>Sentiment analysis and ontology engineering</source>
          (
          <year>2016</year>
          ),
          <fpage>313</fpage>
          -
          <lpage>340</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Jia</given-names>
            <surname>Deng</surname>
          </string-name>
          , Wei Dong, Richard Socher,
          <string-name>
            <given-names>Li-Jia</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kai</given-names>
            <surname>Li</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Li</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Imagenet: A large-scale hierarchical image database</article-title>
          .
          <source>In 2009 IEEE Conference on Computer Vision and Pattern Recognition</source>
          . IEEE,
          <fpage>248</fpage>
          -
          <lpage>255</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Syed Zohaib</given-names>
            <surname>Hassan</surname>
          </string-name>
          , Kashif Ahmad, Ala Al-Fuqaha, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Conci</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Sentiment analysis from images of natural disasters</article-title>
          .
          <source>In International Conference on Image Analysis and Processing</source>
          . Springer,
          <fpage>104</fpage>
          -
          <lpage>113</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Syed Zohaib</given-names>
            <surname>Hassan</surname>
          </string-name>
          , Kashif Ahmad, Michael Riegler, Steven Hicks, Nicola Conci, Pal Halvorsen, and Ala Al-Fuqaha.
          <year>2021</year>
          .
          <article-title>Visual Sentiment Analysis: A Natural Disaster Use-case Task at MediaEval 2021</article-title>
          .
          <source>In Proceedings of the MediaEval 2021 Workshop Online.</source>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Nayomi</given-names>
            <surname>Kankanamge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tan</given-names>
            <surname>Yigitcanlar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ashantha</given-names>
            <surname>Goonetilleke</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Md</given-names>
            <surname>Kamruzzaman</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>Determining disaster severity through social media analysis: Testing the methodology with South East Queensland Flood tweets</article-title>
          .
          <source>International Journal of Disaster Risk Reduction</source>
          <volume>42</volume>
          (
          <year>2020</year>
          ),
          <fpage>101360</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Naina</given-names>
            <surname>Said</surname>
          </string-name>
          , Kashif Ahmad, Nicola Conci, and
          Ala Al-Fuqaha.
          <year>2021</year>
          .
          <article-title>Active learning for event detection in support of disaster analysis applications</article-title>
          .
          <source>Signal, Image and Video Processing</source>
          (
          <year>2021</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Naina</given-names>
            <surname>Said</surname>
          </string-name>
          , Kashif Ahmad, Michael Riegler, Konstantin Pogorelov, Laiq Hassan, Nasir Ahmad, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Conci</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Natural disasters detection in social media and satellite imagery: a survey</article-title>
          .
          <source>Multimedia Tools and Applications</source>
          <volume>78</volume>
          ,
          <issue>22</issue>
          (
          <year>2019</year>
          ),
          <fpage>31267</fpage>
          -
          <lpage>31302</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Naina</given-names>
            <surname>Said</surname>
          </string-name>
          , Konstantin Pogorelov, Kashif Ahmad, Michael Riegler, Nasir Ahmad, Olga Ostroukhova, Pål Halvorsen, and
          <string-name>
            <given-names>Nicola</given-names>
            <surname>Conci</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Deep Learning Approaches for Flood Classification and Flood Aftermath Detection</article-title>
          .
          <source>In MediaEval.</source>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Bolei</given-names>
            <surname>Zhou</surname>
          </string-name>
          , Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba.
          <year>2017</year>
          .
          <article-title>Places: A 10 million image database for scene recognition</article-title>
          .
          <source>IEEE Transactions on Pattern Analysis and Machine Intelligence</source>
          <volume>40</volume>
          ,
          <issue>6</issue>
          (
          <year>2017</year>
          ),
          <fpage>1452</fpage>
          -
          <lpage>1464</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>