    Disaster based Visual Sentiment Analysis using Deep Learning
    Mohsin Ali, Muhammad Hanif, Muhammad Atif Tahir, Muhammad Nouman Durrani, Muhammad
                                           Rafi
                            National University of Computer and Emerging Sciences, Karachi Campus, Pakistan
                           {mohsin.ali,hanif.soomro,atif.tahir,muhammad.nouman,muhammad.rafi}@nu.edu.pk

ABSTRACT
In the case of a disaster, a large number of relevant and irrelevant images are propagated through social networks. Identifying the sentiment of such images is important to speed up relief work in the affected region. This paper describes the contribution of the FAST-NU-DS team to the Visual Sentiment Analysis: A Natural Disaster Use-case task [4] held at MediaEval 2021. Various pre-trained deep learning models were used for feature extraction and classification in the single-label and multi-label classification tasks. Data augmentation techniques that over-sample minority classes were used to deal with the inherent class imbalance of the dataset. For both single-label and multi-label classification, VGG16 proved more useful than ResNet50. We achieved a weighted F1 score of 0.65 for the first (single-label) subtask, and weighted F1 scores of 0.54 and 0.41 for the second and third (multi-label) subtasks, respectively.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’21, December 13-15 2021, Online

1 INTRODUCTION
Disasters create situations that are difficult to handle and may harm valuable resources and cost human lives. Governments, NGOs, and the public use social networks to propagate relevant and irrelevant information about a natural calamity in the form of images, videos, and posts to inform others. In the literature, researchers have focused more on text-based sentiment analysis using NLP. However, sentiment analysis of images and videos using machine learning models is an open research problem and needs attention to identify the sentiments communicated through images. These images may reveal emotional responses; hence, careful identification of disaster sentiment is important to stop and control misleading information. For example, image-based sentiment analysis may be used for rapid situational awareness during a disaster and for assistance in restoration activities. In addition, the categorization of such images may further be used to understand the adversity of the situation. In this work, we participated in the "Visual Sentiment Analysis: A Natural Disaster Use-case" task at MediaEval 2021, performing single-label and multi-label classification to identify visual sentiments that occur during disasters [4].

2 RELATED WORK
The Visual Sentiment Analysis: A Natural Disaster Use-case task of MediaEval 2021 involves multi-class and multi-label classification tasks. Several similar studies have focused on visual sentiment analysis.

One research effort focused on disaster-related images from social media and implemented various deep learning-based methods [3]. The authors used crowd-sourcing to annotate images with multiple labels, where one image may belong to one or more classes based on its visual sentiment, and implemented several deep learning models pre-trained on the ImageNet and Places datasets [8] [3].

Another research effort proposed a framework that considers both text- and image-based visual sentiment analysis [1]. The framework analyzes geo-tagged data objects from disaster-related social media images and is partitioned into sentiment analysis, geo-sentiment modelling, and spatial-temporal partitioning. The data were extracted from Twitter and Flickr and relate to the Napa earthquake and Hurricane Sandy [1].

Similarly, a class-specific residual attention module (CSRA) has been proposed; it is an extremely simple and efficient model that requires fewer resources for training and achieved state-of-the-art results on various multi-label image classification datasets [9]. Recently, ensemble-based approaches such as bagging, boosting, and stacking have also been discussed for image classification [7].

3 PROPOSED APPROACH
The dataset for the "Visual Sentiment Analysis: A Natural Disaster Use-case" task at MediaEval 2021 contains 2432 images [4]. The dataset is used for three different tasks. In the first task, single-label classification is performed among three classes: positive, negative, and neutral. The second and third subtasks involve multi-label classification with 7 and 10 classes, respectively.

3.1 Proposed approach for Subtask 1
The proposed method for the first subtask, single-label classification, was designed by performing different experiments to select an image augmentation technique and an appropriate deep learning model.

The dataset used for the first subtask is imbalanced: the negative class contains 1695 images and the positive class 648 images, while the neutral class includes only 89 images, significantly fewer than the other two. Three different methods were used to manage this class imbalance: weight assignment to classes, oversampling, and image augmentation. In the first attempt, different weights were allocated to the three classes so that they are balanced:

    • Negative: 0.47
    • Positive: 1.26
    • Neutral: 9.2
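The weights above are close to the standard "balanced" class-weight heuristic w_c = N / (K * n_c) applied to the class counts given earlier (N = 2432 images, K = 3 classes). The paper does not state how the weights were derived, so the following Python sketch illustrates that heuristic rather than the authors' actual computation:

```python
# Balanced class-weight heuristic: w_c = N / (K * n_c).
# Class counts are the subtask-1 figures reported in the paper.
counts = {"negative": 1695, "positive": 648, "neutral": 89}

N = sum(counts.values())   # 2432 images in total
K = len(counts)            # 3 classes

weights = {c: round(N / (K * n), 2) for c, n in counts.items()}
print(weights)  # {'negative': 0.48, 'positive': 1.25, 'neutral': 9.11}
```

The computed values (0.48, 1.25, 9.11) differ slightly from the reported 0.47, 1.26, and 9.2, so the authors may have used a variant of this heuristic.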


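Besides class weighting, the abstract notes that data augmentation was used to over-sample minority classes (random shift, flip, brightness, and zoom, as this section goes on to describe). As a rough, dependency-light illustration of that idea — the actual implementation and parameter ranges are not given in the paper, and the helper names and ranges below are hypothetical — a minority class can be grown by appending randomly transformed copies of its images:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(img):
    """Return one augmented variant of an image array (H, W, C) scaled to [0, 1]:
    random horizontal flip, random shift, random brightness (zoom omitted)."""
    out = img.copy()
    if rng.random() < 0.5:                                  # random flip
        out = out[:, ::-1, :]
    out = np.roll(out, int(rng.integers(-2, 3)), axis=1)    # random shift
    out = np.clip(out * rng.uniform(0.8, 1.2), 0.0, 1.0)    # random brightness
    return out

def oversample_with_augmentation(images, target_count):
    """Grow a minority-class image list to target_count with augmented copies."""
    images = list(images)
    while len(images) < target_count:
        images.append(augment(images[int(rng.integers(len(images)))]))
    return images

# Toy example: grow a 4-image "neutral" class to 10 images.
neutral = [rng.random((8, 8, 3)) for _ in range(4)]
balanced = oversample_with_augmentation(neutral, 10)
print(len(balanced))  # 10
```

In practice a framework utility such as Keras' ImageDataGenerator, which supports the shift, flip, brightness, and zoom transforms the paper lists, would typically be used instead of hand-rolled NumPy code.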
Another class balancing effort increased the number of images by simple oversampling, which duplicates instances of the minority classes until they equal the majority class in size. This method proved better than weight balancing. Finally, class imbalance was reduced using data augmentation, in which variants of a single image are created with different augmentation techniques: random shift, random flip, random brightness, and random zoom are applied to increase the number of images in the minority classes until the classes are equal in size. Data augmentation provided the best results in tackling the class imbalance problem, compared to oversampling and class weight assignment.

   Two pre-trained deep learning models were selected for the experiments: the Visual Geometry Group network (VGG) [6] and ResNet50 [5]. Both networks were pre-trained on the ImageNet [2] dataset. Experiments on the training set revealed that VGG16 produced a higher F1 score than ResNet50, so VGG16 was selected for predicting the unseen test instances. Moreover, the ImageNet weights focus on objects, while the visual sentiment analyzer requires scenario-based information. For this reason, the last six layers of the model were unfrozen so that they could be retrained on the visual sentiment analysis dataset, while the remaining layers were frozen. During training, various hyperparameters were tried and the best combination was used for the final model: the learning rate was set to 10^-4, and the softmax activation function was used for the output. The number of epochs was set automatically by early stopping on the best F1 score.

   To improve the efficiency of the experiments, all experiments were initially performed on grayscale images, which reduced processing time. After selecting optimal hyperparameter values, coloured images were used to further improve performance. The trained model was then applied to predict the 1199 test-set instances.

3.2 Proposed approach for Subtask 2 and 3
Subtasks 2 and 3 are multi-label image classification tasks, where one image can be assigned to several classes according to the emotions it depicts. In subtask 2, an image may belong to one or more of seven classes: anger, disgust, joy, fear, neutral, surprise, and sadness. Subtask 3 is likewise multi-label; the difference between the two subtasks is the number of classes, 7 and 10 respectively.

   The datasets for subtasks 2 and 3 are imbalanced, with a few classes containing many more images than the others. Oversampling was used to increase the number of images in the minority classes and reduce the imbalance. After balancing the classes, VGG16 pre-trained on ImageNet was fine-tuned on the datasets for subtasks 2 and 3, with a sigmoid activation function to predict multiple labels.

4 RESULTS AND ANALYSIS
At the initial stage, VGG16 and ResNet50 were evaluated for subtask 1 with three different data balancing techniques: weight balancing, oversampling, and image augmentation. The weighted F1 scores on the training data for the first subtask are shown in Table 1.

                Table 1: F1-Score on training data of subtask 1

    Balancing Technique      F1 Score (ResNet50)     F1 Score (VGG16)
    Weighted Class                 60.15                  61.93
    Over Sampling                  63.17                  67.33
    Image Augmentation             66.85                  67.49

   The training experiments showed VGG16 to be the better pre-trained model and image augmentation the best class balancing technique. VGG16 was therefore trained on the whole dataset and used to predict the test data, with image augmentation for class balancing in subtask 1 and oversampling of the minority classes for subtasks 2 and 3. The results on the test set are shown in Table 2.

        Table 2: Results achieved by the proposed approach on the test set

    Task      Model      Balancing Technique      F1 Score
    Task 1    VGG-16     Image Augmentation        65.37
    Task 2    VGG-16     Oversampling              54.24
    Task 3    VGG-16     Oversampling              41.74

5 CONCLUSION
This work proposed deep learning-based models for single-label and multi-label classification tasks to analyze visual sentiment during disasters. The approach compared several class balancing techniques and pre-trained models. The work can be extended by using weights pre-trained on the Places dataset [8], which captures scene-level information and may produce better performance. Moreover, for multi-label classification, the image augmentation technique may be used to over-sample the minority classes.

ACKNOWLEDGMENTS
This work was supported in part by the Smart Video Surveillance Lab, an affiliated laboratory of NCBC, FAST-National University of Computer and Emerging Sciences.


REFERENCES
 [1] Abdullah Alfarrarjeh, Sumeet Agrawal, Seon Ho Kim, and Cyrus Sha-
     habi. 2017. Geo-spatial multimedia sentiment analysis in disasters.
     In 2017 IEEE International Conference on Data Science and Advanced
     Analytics (DSAA). IEEE, 193–202.
 [2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei.
     2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE
     conference on computer vision and pattern recognition. IEEE, 248–255.
 [3] Syed Zohaib Hassan, Kashif Ahmad, Steven Hicks, Pål Halvorsen,
     Ala Al-Fuqaha, Nicola Conci, and Michael Riegler. 2020. Visual sen-
     timent analysis from disaster images in social media. arXiv preprint
     arXiv:2009.03051 (2020).
 [4] Syed Zohaib Hassan, Kashif Ahmad, Michael Riegler, Steven Hicks,
     Nicola Conci, Pål Halvorsen, and Ala Al-Fuqaha. 2021. Visual Senti-
     ment Analysis: A Natural Disaster Use-case Task at MediaEval 2021.
     In Proceedings of the MediaEval 2021 Workshop, Online.
 [5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep
     residual learning for image recognition. In Proceedings of the IEEE
     conference on computer vision and pattern recognition. 770–778.
 [6] Karen Simonyan and Andrew Zisserman. 2014. Very deep convo-
     lutional networks for large-scale image recognition. arXiv preprint
     arXiv:1409.1556 (2014).
 [7] Muhammad Waqas, Muhammad Atif Tahir, and Rizwan Qureshi. 2021.
     Ensemble-Based Instance Relevance Estimation in Multiple-Instance
     Learning. In 2021 9th European Workshop on Visual Information Pro-
     cessing (EUVIP). IEEE, 1–6.
 [8] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio
     Torralba. 2017. Places: A 10 million image database for scene recogni-
     tion. IEEE transactions on pattern analysis and machine intelligence 40,
     6 (2017), 1452–1464.
 [9] Ke Zhu and Jianxin Wu. 2021. Residual Attention: A Simple but
     Effective Method for Multi-Label Recognition. In Proceedings of the
     IEEE/CVF International Conference on Computer Vision. 184–193.