=Paper=
{{Paper
|id=Vol-3181/paper28
|storemode=property
|title=Disaster based Visual Sentiment Analysis using Deep Learning
|pdfUrl=https://ceur-ws.org/Vol-3181/paper28.pdf
|volume=Vol-3181
|authors=Mohsin Ali,Muhammad Hanif,Muhammad Atif Tahir,Muhammad Nouman Durrani,Muhammad Rafi
|dblpUrl=https://dblp.org/rec/conf/mediaeval/AliHTDR21
}}
==Disaster based Visual Sentiment Analysis using Deep Learning==
Mohsin Ali, Muhammad Hanif, Muhammad Atif Tahir, Muhammad Nouman Durrani, Muhammad Rafi
National University of Computer and Emerging Sciences, Karachi Campus, Pakistan
{mohsin.ali,hanif.soomro,atif.tahir,muhammad.nouman,muhammad.rafi}@nu.edu.pk

ABSTRACT

In the event of a disaster, a large number of relevant and irrelevant images are propagated through social networks. Identifying the sentiment conveyed by such images is important to speed up relief work in the affected region. This paper describes the contribution of the FAST-NU-DS team to the "Visual Sentiment Analysis: A Natural Disaster Use-case" task [4] held at MediaEval 2021. Various pre-trained deep learning models were used for feature extraction and classification in both the single-label and multi-label classification tasks, and data augmentation techniques that over-sample the minority classes were used to deal with the inherently imbalanced nature of the dataset. For both the single-label and multi-label classification tasks, VGG16 proved more useful than ResNet50. In this work, we achieved a 0.65 weighted F1 score for the first, single-label classification subtask, and 0.54 and 0.41 weighted F1 scores for the second and third, multi-label subtasks, respectively.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'21, December 13-15 2021, Online

1 INTRODUCTION

A disaster creates a situation that is difficult to handle and may destroy valuable resources and cost human lives. Governments, NGOs, and the public use different social networks to propagate relevant and irrelevant information about a natural calamity in the form of images, videos, and posts to make others aware. In the literature, researchers have focused mostly on text-based sentiment analysis using NLP. Sentiment analysis of images and videos using machine learning models, however, is an open research problem and needs attention, since these media may reveal emotional responses. Careful identification of disaster sentiment is therefore important to stop, raise awareness of, and control any misleading content. For example, image-based sentiment analysis may be used for rapid situational awareness during a disaster and for assistance in restoration activities. In addition, the categorization of such images may further be used to understand the adversity of the situation. In this work, we participated in the "Visual Sentiment Analysis: A Natural Disaster Use-case" task at MediaEval 2021, performing single-label and multi-label classification to identify visual sentiments occurring during disasters [4].

2 RELATED WORK

The "Visual Sentiment Analysis: A Natural Disaster Use-case" task of MediaEval 2021 involves multi-class and multi-label classification tasks. Several similar studies have focused on visual sentiment analysis.

One research effort focused on disaster-related images from social media and implemented various deep learning based methods [3]. The authors used crowd-sourcing to annotate images with multiple labels, where one image may belong to one or more classes based on visual sentiments, and evaluated several deep learning models pre-trained on the ImageNet and Places datasets [8] [3].

Another research effort proposed a framework that considers both text- and image-based visual sentiment analysis [1]. The framework analyzes geo-tagged data objects from disaster-related social media images and is partitioned into sentiment analysis, geo-sentiment modelling, and spatial-temporal partitioning. The study extracted data related to the Napa Earthquake and Hurricane Sandy from Twitter and Flickr [1].

Similarly, a class-specific residual attention module (CSRA) has been proposed: an extremely simple and efficient model that requires fewer resources for training and achieved state-of-the-art results on various multi-label image classification datasets [9]. Recently, ensemble-based approaches such as bagging, boosting, and stacking have also been discussed in the image classification literature [7].

3 PROPOSED APPROACH

The dataset for the "Visual Sentiment Analysis: A Natural Disaster Use-case" task at MediaEval 2021 contains 2432 images [4] and is used for three different tasks. In the first task, single-label classification is performed among three classes: positive, negative, and neutral. The second and third subtasks involve multi-label classification, with 7 and 10 classes, respectively.

3.1 Proposed approach for Subtask 1

The proposed method for the first subtask, single-label classification, was designed by performing different experiments to select an image augmentation technique and an appropriate deep learning model.

The dataset used for the first subtask is imbalanced: the negative class contains 1695 images and the positive class 648 images, while the neutral class includes only 89 images, significantly fewer than the other two. Three different methods were utilized to manage this class imbalance: weight assignment to classes, oversampling, and image augmentation. In the first attempt, different weights were allocated to the three classes so that they are balanced:

• Negative: 0.47
• Positive: 1.26
• Neutral: 9.2

Another class balancing effort was made by increasing the number of images through simple oversampling, which duplicates instances of the minority classes until they equal the majority class in size. This method proved better than weight balancing. Finally, class imbalance was reduced using data augmentation, in which variants of a single image are created: random shift, random flip, random brightness, and random zoom were applied to increase the number of images in the minority classes and make the classes equal in size. Data augmentation provided the best results in tackling the class imbalance problem, compared to oversampling and class weight assignment.

Two pre-trained deep learning models were selected for the experiments: Visual Geometry Group (VGG) [6] and ResNet50 [5], both pre-trained on the ImageNet dataset [2]. Experiments on the training set revealed that VGG16 produced a higher F1 score than ResNet50; hence, VGG16 was selected to predict the unseen test instances. Moreover, the ImageNet weights focus on objects, while a visual sentiment analyzer requires scene-level information. For this reason, the last six layers of the model were unfrozen so that they could be retrained on the visual sentiment analysis dataset, while the remaining layers were kept frozen. During training, various hyperparameters were tried, and the best combination was applied for the final training of the model. The learning rate was set to 10^-4, and the softmax activation function was used for the output. The number of epochs was set automatically by applying early stopping based on the best F1 score.

To improve the efficiency of the experiments, all experiments were initially performed on grayscale images, which reduced processing time. After selecting optimal hyperparameter values, coloured images were used to further improve performance. The trained model was then applied to predict the 1199 test set instances.

3.2 Proposed approach for Subtask 2 and 3

Subtask 2 and Subtask 3 are multi-label image classification tasks, where one image can be assigned to several classes according to the emotions it depicts. In subtask 2, an image may belong to one or more of seven classes: anger, disgust, joy, fear, neutral, surprise, and sadness. Subtask 3 is likewise a multi-label classification task; the difference between the two subtasks is the number of classes, 7 and 10 respectively.

The dataset is also imbalanced for subtasks 2 and 3, with some classes containing many more images than others. The oversampling technique was utilized to increase the number of images in the minority classes and reduce the imbalance. After balancing the classes, VGG16 pre-trained on ImageNet was fine-tuned on the dataset for subtasks 2 and 3, with a sigmoid activation function used to predict the multiple labels.

4 RESULTS AND ANALYSIS

Initially, VGG16 and ResNet50 were evaluated for subtask 1 with the three data balancing techniques: weight balancing, oversampling, and image augmentation. The weighted F1 scores on the training data for the first subtask are shown in Table 1.

Table 1: F1 score on training data of subtask 1

Balancing Technique    F1 Score (ResNet50)    F1 Score (VGG16)
Weighted Class         60.15                  61.93
Over Sampling          63.17                  67.33
Image Augmentation     66.85                  67.49

The training experiments showed VGG16 to be the better pre-trained model and image augmentation the best class balancing technique. Accordingly, VGG16 was trained on the whole dataset and used to predict the test data, with image augmentation for class balancing in subtask 1. For subtasks 2 and 3, the oversampling technique was used to enlarge the minority classes. The results on the test set are shown in Table 2.

Table 2: Results achieved by the proposed approach on the test set

Task      Model     Balancing Technique    F1 Score
Task 1    VGG-16    Image Augmentation     65.37
Task 2    VGG-16    Oversampling           54.24
Task 3    VGG-16    Oversampling           41.74

5 CONCLUSION

This research proposed deep learning based models for single-label and multi-label classification tasks to analyze visual sentiments during disasters, evaluating various class balancing techniques and pre-trained models. The work can be extended by using weights from the Places dataset [8], which carry scene-level information and may yield better performance. Moreover, for multi-label classification, the image augmentation technique may also be used to over-sample the minority classes.

ACKNOWLEDGMENTS

This work was supported in part by the Smart Video Surveillance Lab, an affiliated laboratory of (NCBC), FAST-National University of Computer and Emerging Sciences.
REFERENCES

[1] Abdullah Alfarrarjeh, Sumeet Agrawal, Seon Ho Kim, and Cyrus Shahabi. 2017. Geo-spatial multimedia sentiment analysis in disasters. In 2017 IEEE International Conference on Data Science and Advanced Analytics (DSAA). IEEE, 193–202.
[2] Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.
[3] Syed Zohaib Hassan, Kashif Ahmad, Steven Hicks, Pål Halvorsen, Ala Al-Fuqaha, Nicola Conci, and Michael Riegler. 2020. Visual sentiment analysis from disaster images in social media. arXiv preprint arXiv:2009.03051 (2020).
[4] Syed Zohaib Hassan, Kashif Ahmad, Michael Riegler, Steven Hicks, Nicola Conci, Pål Halvorsen, and Ala Al-Fuqaha. 2021. Visual Sentiment Analysis: A Natural Disaster Use-case Task at MediaEval 2021. In Proceedings of the MediaEval 2021 Workshop, Online.
[5] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. 2016. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 770–778.
[6] Karen Simonyan and Andrew Zisserman. 2014. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014).
[7] Muhammad Waqas, Muhammad Atif Tahir, and Rizwan Qureshi. 2021. Ensemble-based instance relevance estimation in multiple-instance learning. In 2021 9th European Workshop on Visual Information Processing (EUVIP). IEEE, 1–6.
[8] Bolei Zhou, Agata Lapedriza, Aditya Khosla, Aude Oliva, and Antonio Torralba. 2017. Places: A 10 million image database for scene recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence 40, 6 (2017), 1452–1464.
[9] Ke Zhu and Jianxin Wu. 2021. Residual attention: A simple but effective method for multi-label recognition. In Proceedings of the IEEE/CVF International Conference on Computer Vision. 184–193.
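The class weights listed in Section 3.1 (0.47, 1.26, 9.2) are consistent with the common "balanced" weighting heuristic w_c = N / (K * n_c), where N is the dataset size, K the number of classes, and n_c the class size. A minimal sketch, assuming this heuristic (the paper does not state its exact formula, and the function name is ours); the result is close to, though not identical to, the reported weights:

```python
# Balanced class weights: w_c = N / (K * n_c).
def balanced_class_weights(class_counts):
    total = sum(class_counts.values())
    k = len(class_counts)
    return {c: total / (k * n) for c, n in class_counts.items()}

# Class sizes for subtask 1 as reported in Section 3.1.
counts = {"negative": 1695, "positive": 648, "neutral": 89}
weights = balanced_class_weights(counts)
print({c: round(w, 2) for c, w in weights.items()})
# Approximately {'negative': 0.48, 'positive': 1.25, 'neutral': 9.11},
# close to the 0.47 / 1.26 / 9.2 reported in the paper.
```

In Keras, such a dictionary (with integer class indices as keys) would typically be passed to `Model.fit` via the `class_weight` argument.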
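The second balancing strategy in Section 3.1, oversampling, duplicates minority-class instances until every class matches the majority class in size. A small sketch of that idea (function and variable names are illustrative, not from the paper):

```python
import random

def oversample(samples_by_class, seed=0):
    """Duplicate minority-class samples (with replacement) until every
    class is as large as the largest class."""
    rng = random.Random(seed)
    target = max(len(s) for s in samples_by_class.values())
    balanced = {}
    for label, samples in samples_by_class.items():
        extra = [rng.choice(samples) for _ in range(target - len(samples))]
        balanced[label] = samples + extra
    return balanced

data = {"negative": ["n1", "n2", "n3", "n4"], "neutral": ["u1"]}
balanced = oversample(data)
print({k: len(v) for k, v in balanced.items()})  # every class now has 4 samples
```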
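The four augmentations named in Section 3.1 (random shift, flip, brightness, and zoom) can be sketched directly in NumPy as below. Real pipelines would normally use a library such as Keras' `ImageDataGenerator`; the parameter ranges here are our assumptions, since the paper does not report them:

```python
import numpy as np

def random_flip(img, rng):
    # Horizontal flip with probability 0.5.
    return np.fliplr(img) if rng.random() < 0.5 else img

def random_shift(img, rng, max_frac=0.1):
    # Translate by up to max_frac of each dimension, zero-padding the rest.
    h, w = img.shape[:2]
    dy = int(rng.integers(-int(h * max_frac), int(h * max_frac) + 1))
    dx = int(rng.integers(-int(w * max_frac), int(w * max_frac) + 1))
    out = np.zeros_like(img)
    src = img[max(0, -dy):h - max(0, dy), max(0, -dx):w - max(0, dx)]
    out[max(0, dy):max(0, dy) + src.shape[0],
        max(0, dx):max(0, dx) + src.shape[1]] = src
    return out

def random_brightness(img, rng, max_delta=0.2):
    # Scale pixel intensities (assumed in [0, 1]) and clip.
    return np.clip(img * (1 + rng.uniform(-max_delta, max_delta)), 0.0, 1.0)

def random_zoom(img, rng, max_frac=0.2):
    # Centre-crop up to max_frac, then nearest-neighbour resize back.
    h, w = img.shape[:2]
    crop = 1 - rng.uniform(0, max_frac)
    ch, cw = max(1, int(h * crop)), max(1, int(w * crop))
    y0, x0 = (h - ch) // 2, (w - cw) // 2
    cropped = img[y0:y0 + ch, x0:x0 + cw]
    yy = np.arange(h) * ch // h
    xx = np.arange(w) * cw // w
    return cropped[yy][:, xx]

rng = np.random.default_rng(0)
img = rng.random((64, 64, 3))
aug = random_zoom(random_brightness(random_shift(random_flip(img, rng), rng), rng), rng)
print(aug.shape)  # (64, 64, 3)
```

Applying such transforms repeatedly to minority-class images yields distinct variants, which is what lets augmentation outperform plain duplication in Table 1.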
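Section 3.1 fine-tunes VGG16 by unfreezing only its last six layers and keeping the pre-trained ImageNet weights of the earlier layers fixed. The recipe is framework-independent; in Keras it would be a loop over `model.layers` setting `layer.trainable`. A sketch using a stand-in `Layer` class (our illustrative construct, not the paper's code):

```python
# `Layer` is a stand-in for a framework layer object
# (e.g. tf.keras.layers.Layer), carrying only a trainable flag.
class Layer:
    def __init__(self, name):
        self.name = name
        self.trainable = True

def freeze_all_but_last(layers, n_trainable=6):
    for layer in layers[:-n_trainable]:
        layer.trainable = False   # keep pre-trained ImageNet weights fixed
    for layer in layers[-n_trainable:]:
        layer.trainable = True    # retrain on the sentiment dataset
    return layers

model_layers = [Layer(f"block_{i}") for i in range(22)]  # roughly VGG16's depth
freeze_all_but_last(model_layers)
print(sum(l.trainable for l in model_layers))  # 6
```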
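For subtasks 2 and 3, Section 3.2 replaces the softmax single-label head with sigmoid outputs, so each class probability is independent and an image can receive several labels at once. A NumPy sketch of the prediction step (the 0.5 threshold is our assumption; the paper does not state one):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict_multilabel(logits, classes, threshold=0.5):
    """Independent per-class probabilities; every class whose sigmoid
    score reaches the threshold is assigned to the image."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [c for c, p in zip(classes, probs) if p >= threshold]

# The seven subtask-2 classes; logits are made-up example values.
classes = ["anger", "disgust", "joy", "fear", "neutral", "surprise", "sadness"]
logits = [2.1, -1.3, -0.4, 1.7, -2.0, -0.9, 0.8]
print(predict_multilabel(logits, classes))  # ['anger', 'fear', 'sadness']
```

With softmax, the class scores would be forced to sum to one, which is appropriate for the mutually exclusive labels of subtask 1 but not for co-occurring emotions.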
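The scores in Tables 1 and 2 are weighted F1: the per-class F1 averaged with weights proportional to each class's support, as computed by scikit-learn's `f1_score(..., average="weighted")`. A small pure-Python sketch of that metric:

```python
def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to the number of
    true instances of each class (its support)."""
    classes = sorted(set(y_true))
    total = len(y_true)
    score = 0.0
    for c in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        score += f1 * (tp + fn) / total  # weight by class support
    return score

# Toy example over the subtask-1 label set.
y_true = ["neg", "neg", "neg", "pos", "pos", "neu"]
y_pred = ["neg", "neg", "pos", "pos", "neg", "neu"]
print(round(weighted_f1(y_true, y_pred), 3))  # 0.667
```

Weighting by support means the rare neutral class contributes little to the reported score, which is one reason the class imbalance matters so much in this task.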