MediaEval 2020: An Ensemble-based Multimodal Approach for Coronavirus and 5G Conspiracy Tweet Detection

Chahat Raj¹, Mihir P Mehta²
¹ Delhi Technological University, Delhi, India
² Indian Institute of Management Raipur, Chhattisgarh, India
chahatraj58@gmail.com, mihirm3795@gmail.com

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, December 14-15 2020, Online

ABSTRACT

In the wake of the ongoing COVID-19 pandemic, a parallel stream of misinformation and conspiracy theories has risen on the internet. People around the world are being flooded with texts and visuals making false claims about the coronavirus disease. This paper presents a multimodal fake news detection system that uses text and image features to detect conspiracy tweets. The research was performed in the context of the FakeNews: Coronavirus and 5G Conspiracy task of MediaEval 2020. For the NLP subtask, we use an ensemble of machine learning and deep learning algorithms to analyze textual and visual data. We report the performance of the experiments for each modality and the results obtained after their fusion.

1 INTRODUCTION

Scientists, economists, mathematicians, analysts and many other professionals have put forward theories on the origin and spread of the Coronavirus Disease 2019 (COVID-19). Research and investment are directed both at a cure and at tracing the cause of the pandemic. Along with the rising number of these theories, the spread of misinformation related to COVID-19, termed the ‘Infodemic’, has been on the rise too, often originating from internet users, public figures and potentially trusted sources. Messages and media carrying such misinformation are spread both intentionally and unintentionally. They are often linked with existing theories that make them sound true despite lacking substantial proof or logic. People are also drawn in by the superficial texts and images that carry the misinformation and tend not to verify its credibility. Moreover, they pass it on to friends and family who trust them, and ultimately the misinformation convinces a large audience connected through this network, thereby affecting the habits and lifestyle of the people who accept it. These changes can have an adverse effect, or be of no use while consuming people's time and other material resources. Hence it becomes necessary to identify, evaluate and share the authenticity of every piece of information, especially those involving conspiracy claims.

One such piece of misinformation, which has affected people's thinking and lifestyle as well as the adoption of technology and the revenue of several brands, is the 5G Corona Conspiracy. This conspiracy has significantly influenced consumers by creating ambiguity about the safety of 5G communication technology.

To fight the ongoing misinformation wave amidst the pandemic, our work on the NLP subtask at MediaEval 2020 uses an ensemble technique with multiple ML and DL models to identify 5G-related coronavirus conspiracies prevalent on Twitter. A detailed overview of the task and dataset is given by Pogorelov et al. [1], [2].

2 APPROACH

We adopt an ensembling approach incorporating several machine learning and deep learning-based text and image classifiers. We divide our approach into three routines: text-based classification, image-based classification, and fusion of text and image models. The proposed architecture uses a combination of features obtained from multiple classifiers.

We experimented with several text classifiers on the development dataset and decided to use a fixed subset of them based on the results each one obtained separately. We use Support Vector Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbour (KNN), LSTM (Long Short-Term Memory) and Bi-LSTM (Bidirectional LSTM) for the NLP classification task. Each tweet undergoes preprocessing before being passed to these classifiers: URL removal, punctuation removal, lowercasing, tokenization, stopword removal, stemming/lemmatization and padding. The LSTM and Bi-LSTM models are followed by a series of Dense layers, with the Dropout value set to 0.5. The RMSprop optimizer is used to train the LSTM and Bi-LSTM models for 15 epochs each with a batch size of 64. For the text-based approach, the classification results obtained from SVM, NB, KNN, LSTM and Bi-LSTM are combined by majority voting to obtain the final predictions.

For visual classification, we filtered the tweets containing images and obtained 171 images labelled 5G Coronavirus Conspiracy, 118 belonging to the Other Conspiracy class and 791 Non-Conspiracy tweet images. The test set consisted of 617 images. We fine-tune three deep learning models, namely VGG16 [3], Xception [4] and InceptionV3 [5], to classify the images and combine their results by majority voting to make the final predictions. These models are pre-trained, and we fine-tune them with the dropout value set to 0.5 and a batch normalization layer added after the dropout layer. We use sigmoid activation in the last layer for binary predictions and softmax for multiclass predictions. We use the Adam optimizer for all visual classification models and train each of them for 15 epochs with a batch size of 64.

For multimodal classification, we ensemble all of the text and image-based classifiers and employ max-voting for the final classification. Figure 1 shows the ensembling architecture. The class with the highest number of votes is selected as the predicted class for each tweet. Development for all runs was performed by splitting the dataset in a 7:3 ratio for training and validation. The models used and the results obtained on validation are detailed in Table 1 and Table 2.

Figure 1: Ensemble Model Architecture

Table 1: Experimental Model Details

Runs    Modality       Models
Run 1   Text           SVM, NB, KNN, LSTM, Bi-LSTM
Run 2   Text + Image   (SVM, NB, KNN, LSTM, Bi-LSTM), (VGG-16, Xception, InceptionV3)
Run 3   Text           SVM, NB, KNN, LSTM, Bi-LSTM
Run 4   Image          VGG-16, Xception, InceptionV3
Run 5   Text + Image   (SVM, NB, KNN, LSTM, Bi-LSTM), (VGG-16, Xception, InceptionV3)

Table 2: Development Phase Results

Runs    Class     Acc      P        R        F1       ROC
Run 1   Ternary   0.6965   0.5333   0.5111   0.5220   0.6978
Run 2   Ternary   0.6157   0.4043   0.2568   0.3140   0.5394
Run 3   Binary    0.8357   0.3797   0.6390   0.4764   0.8190
Run 4   Binary    0.7824   0.1892   0.2917   0.2295   0.5471
Run 5   Binary    0.7639   0.1351   0.3846   0.2000   0.5574

3 RESULTS AND ANALYSIS

The MediaEval 2020 FakeNews: Coronavirus and 5G Conspiracy NLP subtask requires distinguishing tweets related to the coronavirus and 5G conspiracy from other conspiracy and non-conspiracy tweets. Table 2 and Table 3 show the classification results on the development set and test set respectively. We perform five runs on the given task, covering three-class classification and coarse two-class classification in which non-conspiracy and other conspiracy tweets are combined into a single class. Our first run performs ternary classification using text classifiers only. The second run combines the text and image classification results and returns predictions based on both. The third, fourth and fifth runs are coarse two-class classifiers based on text, on images, and on text and image features combined, respectively.

Table 3: Testing Phase Results

Runs    Modality       Classes   MCC Score
Run 1   Text           3-class   0.3408
Run 2   Text + Image   3-class   0.0674
Run 3   Text           Binary    0.4179
Run 4   Image          Binary    0.0644
Run 5   Text + Image   Binary    0.0232

Looking at the results obtained in the development and testing phases, we observe that the binary text classifier (Run 3) performed better than the three-class text classifier (Run 1), with test MCC scores of 0.4179 and 0.3408 respectively. Our binary text classifier achieved the third highest score (0.4179) in the challenge. This suggests that our model finds it easier to distinguish 5G coronavirus conspiracies from all other conspiracies and real tweets taken together. Image-based detection quality can be improved significantly. We attribute the low scores of the models using the image modality to the small size of the visual data, and we expect the proposed method to perform better with a larger dataset. We suggest the use of data augmentation techniques for better performance.

4 DISCUSSION

In this paper, we employ a machine learning and deep learning-based ensembling technique that uses majority voting to predict whether or not a tweet is related to the 5G Coronavirus conspiracy. We perform a multimodal analysis using text-based NLP features from the tweet and visual features from the images posted along with it. We build a fusion model that incorporates both textual and visual features and generates predictions based on each modality separately and on their combination. Our classification approach uses both binary and ternary classifiers to examine the effectiveness of the ensemble models. The main limitation we encountered is the lack of sufficient training data. We propose to address it in future work by applying data augmentation techniques to both text and image data, with the aim of better performance and more reliable conspiracy detection.

REFERENCES

[1] Pogorelov, K., Schroeder, D. T., Burchard, L., Moe, J., Brenner, S., Filkukova, P., & Langguth, J. (2020). FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In Proc. of the MediaEval 2020 Workshop, Online, 14-15 December 2020.
[2] Schroeder, D. T., Pogorelov, K., & Langguth, J. (2019). FACT: a Framework for Analysis and Capture of Twitter Graphs. In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS) (pp. 134-141). IEEE.
[3] Simonyan, K., & Zisserman, A. (2015). Very deep convolutional networks for large-scale image recognition. In International Conference on Learning Representations.
[4] Chollet, F. (2017). Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1251-1258).
[5] Szegedy, C., Liu, W., Jia, Y., Sermanet, P., Reed, S., Anguelov, D., ... & Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1-9).