MediaEval 2020: An Ensemble-based Multimodal Approach for
Coronavirus and 5G Conspiracy Tweet Detection
Chahat Raj1, Mihir P Mehta2
1Delhi Technological University, Delhi, India
2Indian Institute of Management Raipur, Chhattisgarh, India
chahatraj58@gmail.com, mihirm3795@gmail.com

Copyright 2020 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).
MediaEval’20, December 14-15 2020, Online

ABSTRACT
In the wake of the ongoing COVID-19 pandemic, a parallel stream of
misinformation and conspiracy theories has risen on the internet. People
around the world are being flooded with texts and visuals making false
claims about the coronavirus disease. This paper presents a multi-modal
fake news detection system that uses text and image features to detect
conspiracy tweets. The research was performed in the context of the
FakeNews: Coronavirus and 5G Conspiracy task of MediaEval 2020. For the
NLP subtask, we use an ensemble of machine learning and deep learning
algorithms to analyze textual and visual data. We report the performance
of the experiments for each modality and the results obtained after their
fusion.

1 INTRODUCTION
Scientists, economists, mathematicians, analysts and many other
professionals have formulated theories on the origin and spread of the
Coronavirus Disease 2019 (COVID-19). Research and investment are directed
both at finding a cure and at tracing the cause of the pandemic. Along
with the rising number of these theories, the spread of misinformation
related to COVID-19, termed the 'Infodemic', has been on the rise too,
often originating from internet users, public figures and potentially
trusted sources. Messages and media carrying such misinformation are
spread both intentionally and unintentionally. They are often linked with
existing theories that make them sound true despite lacking substantial
proof or logic. People are also drawn in by the superficial texts and
images that the misinformation carries and tend not to verify its
credibility. Moreover, they pass it on to friends and family who trust
them, and ultimately the misinformation convinces a large audience
connected through this network, thereby affecting the habits and
lifestyle of the people who accept it. These changes can have an adverse
effect, or serve no purpose while consuming people's time and other
material resources. Hence it becomes necessary to identify, evaluate and
share the authenticity of every piece of information, especially those
involving conspiracy claims.
   One such piece of misinformation, which has affected people's thoughts
and lifestyle as well as the adoption of technology and the revenue of
several brands, is the 5G Corona Conspiracy. This conspiracy has
influenced consumers by creating ambiguity about the safety of using 5G
communication technology.
   To fight the ongoing misinformation wave amidst the pandemic, our
submission to the NLP subtask at MediaEval 2020 uses an ensemble
technique with multiple ML and DL models to identify 5G-related
coronavirus conspiracies prevalent on Twitter. A detailed overview of the
task and dataset is given by Pogorelov et al. [1], [2].

2 APPROACH
We adopt an ensembling approach incorporating several machine learning
and deep learning-based text and image classifiers. We divide our
approach into three routines: text-based classification, image-based
classification and fusion of text and image models.
   The proposed architecture uses a combination of features obtained from
multiple classifiers. We experimented with several text classifiers on
the development dataset and decided to use a fixed subset of them based
on the results each one obtained separately. We use Support Vector
Machine (SVM), Naïve Bayes (NB), K-Nearest Neighbour (KNN), LSTM (Long
Short-Term Memory) and Bi-LSTM (Bidirectional LSTM) for the NLP
classification task. Each tweet undergoes preprocessing steps before
being passed to these classifiers. These include URL removal, punctuation
removal, lowercasing, tokenization, stopword removal,
stemming/lemmatization and padding. The LSTM and Bi-LSTM models are
followed by a series of Dense layers, with the Dropout value set to 0.5.
The RMSprop optimizer is used to train the LSTM and Bi-LSTM models for 15
epochs each with a batch size of 64. For the text-based approach, the
classification results obtained from SVM, NB, KNN, LSTM and Bi-LSTM are
combined by majority voting to obtain the final predictions.
   For visual classification, we filtered the tweets containing images
and obtained 171 images labelled 5G Coronavirus Conspiracy, 118 belonging
to the Other Conspiracy class and 791 Non-Conspiracy tweet images. The
test set consisted of 617 images. We fine-tune three deep learning
models, namely VGG16 [3], Xception [4] and InceptionV3 [5], to classify
the images and combine their results by majority voting to make the final
predictions.
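The majority-voting step applied to the classifier outputs can be
illustrated with a minimal sketch. It is not the authors' code: the
function names, the label strings and the tie-breaking rule (first label
in model order wins) are assumptions made for illustration, since the
paper does not state how ties are resolved.

```python
from collections import Counter


def majority_vote(votes):
    """Return the label that receives the most votes.

    Assumption: ties go to the label that appears first in `votes`;
    the paper does not specify a tie-breaking rule.
    """
    counts = Counter(votes)
    best = max(counts.values())
    for label in votes:
        if counts[label] == best:
            return label


def ensemble_predict(per_model_predictions):
    """Combine predictions from several models, one list per model.

    All lists must be aligned so that position i refers to the same tweet.
    """
    return [majority_vote(votes) for votes in zip(*per_model_predictions)]


# Hypothetical outputs of five text classifiers for three tweets.
per_model = [
    ["5g", "non", "other"],    # SVM
    ["5g", "non", "non"],      # NB
    ["non", "non", "other"],   # KNN
    ["5g", "other", "other"],  # LSTM
    ["other", "non", "non"],   # Bi-LSTM
]
print(ensemble_predict(per_model))  # ['5g', 'non', 'other']
```

With an odd number of models and two classes a tie cannot occur, but with
three classes or an even number of voters the tie-breaking choice can
affect the result.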
   The three image models (VGG16, Xception and InceptionV3) are
pre-trained, and we fine-tune them with a dropout value of 0.5 and a
batch normalization layer added after the dropout layer. We use sigmoid
activation in the last layer for binary predictions and softmax for
multiclass predictions. We use the Adam optimizer for all visual
classification models and train them for 15 epochs each with a batch size
of 64.
   For multi-modal classification, we ensemble all the text and
image-based classifiers and employ max-voting for the final
classification. Figure 1 shows the ensembling architecture. The class
with the highest number of votes is selected as the predicted class for
each tweet. Development on all runs was performed by splitting the
dataset in a 7:3 ratio for training and validation. We provide the
details of the models used and the results obtained on validation in
Table 1 and Table 2.

Figure 1: Ensemble Model Architecture

Table 1: Experimental Model Details

Runs    Modality       Models
Run 1   Text           SVM, NB, KNN, LSTM, Bi-LSTM
Run 2   Text + Image   (SVM, NB, KNN, LSTM, Bi-LSTM), (VGG16, Xception, InceptionV3)
Run 3   Text           SVM, NB, KNN, LSTM, Bi-LSTM
Run 4   Image          VGG16, Xception, InceptionV3
Run 5   Text + Image   (SVM, NB, KNN, LSTM, Bi-LSTM), (VGG16, Xception, InceptionV3)

Table 2: Development Phase Results

Runs    Class     Accuracy   Precision   Recall   F1       ROC
Run 1   Ternary   0.6965     0.5333      0.5111   0.5220   0.6978
Run 2   Ternary   0.6157     0.4043      0.2568   0.3140   0.5394
Run 3   Binary    0.8357     0.3797      0.6390   0.4764   0.8190
Run 4   Binary    0.7824     0.1892      0.2917   0.2295   0.5471
Run 5   Binary    0.7639     0.1351      0.3846   0.2000   0.5574

3 RESULTS AND ANALYSIS
The MediaEval 2020 FakeNews: Coronavirus and 5G Conspiracy NLP subtask
requires distinguishing tweets that promote the coronavirus and 5G
conspiracy from tweets about other conspiracies and non-conspiracy
tweets. Table 2 and Table 3 show the classification results on the
development set and test set respectively. We perform five runs on the
given task, which include three-class classification and coarse two-class
classification in which non-conspiracy and other-conspiracy tweets are
combined into a single class. Our first run performs ternary
classification using text classifiers only. The second run combines the
classification results of the text and image modalities. The third,
fourth and fifth runs are coarse two-class classifiers based on text,
on images, and on text and image features combined, respectively.

Table 3: Testing Phase Results

Runs    Modality       Classes   MCC Score
Run 1   Text           3-class   0.3408
Run 2   Text + Image   3-class   0.0674
Run 3   Text           Binary    0.4179
Run 4   Image          Binary    0.0644
Run 5   Text + Image   Binary    0.0232

   Looking at the results obtained in the development and testing
phases, we observe that the binary classifier performed better than the
three-class classifier. Our binary text classifier achieved the third
highest score (0.4179) in the challenge. This suggests that our model
finds it easier to distinguish 5G coronavirus conspiracies from all other
conspiracies and real tweets taken together. Ternary text-based
classification achieved a score of 0.3408. Image-based detection leaves
significant room for improvement: every run that uses the image modality
scores below 0.07 MCC on the test set. We attribute these low scores to
the small size of the visual data and expect the proposed method to
perform better with a larger dataset. We suggest the use of data
augmentation techniques for better performance.

4 DISCUSSION
In this paper, we employ a machine learning and deep learning-based
ensembling technique that uses majority voting to predict whether a tweet
is related to the 5G Coronavirus conspiracy or not. We perform a
multimodal analysis using text-based NLP features from the tweets and
visual features from the images posted along with them. We build a fusion
model that incorporates both textual and visual features and generates
predictions based on each modality separately and on their combination.
Our classification approach uses both binary and ternary classifiers to
test the effectiveness of the ensemble models. The main limitation we
encounter is the lack of sufficient training data. We propose to address
it in future work by applying data augmentation techniques to both text
and image data to obtain better performance and more reliable conspiracy
detection.
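For readers unfamiliar with the metric reported for the test runs, the
following minimal sketch shows how a Matthews correlation coefficient
(MCC) can be computed for the coarse two-class setting. It is an
illustration only: the label strings and function names are hypothetical,
returning 0.0 when the denominator is zero is a common convention rather
than something stated in the paper, and the three-class runs would need
the multiclass generalization of MCC, which is not shown here.

```python
import math


def to_binary(label, positive="5g_corona_conspiracy"):
    """Collapse a three-class label into the coarse two-class setting.

    The label strings are hypothetical; everything that is not the
    positive class (other conspiracy, non-conspiracy) maps to 0.
    """
    return 1 if label == positive else 0


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (0 or 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention (assumption): report 0.0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical gold labels and predictions for four tweets.
gold = ["5g_corona_conspiracy", "5g_corona_conspiracy",
        "other_conspiracy", "non_conspiracy"]
pred = ["5g_corona_conspiracy", "non_conspiracy",
        "other_conspiracy", "non_conspiracy"]
score = mcc([to_binary(g) for g in gold], [to_binary(p) for p in pred])
print(round(score, 4))  # 0.5774
```

MCC ranges from -1 to 1, with 0 corresponding to chance-level
prediction, which makes it less sensitive to class imbalance than
accuracy.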
REFERENCES
  [1] Konstantin Pogorelov, Daniel Thilo Schroeder, Luk Burchard, Johannes
      Moe, Stefan Brenner, Petra Filkukova, and Johannes Langguth. 2020.
      FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In
      Proc. of the MediaEval 2020 Workshop, Online, 14-15 December 2020.
  [2] Daniel Thilo Schroeder, Konstantin Pogorelov, and Johannes Langguth.
      2019. FACT: a Framework for Analysis and Capture of Twitter Graphs.
      In 2019 Sixth International Conference on Social Networks Analysis,
      Management and Security (SNAMS), pp. 134-141. IEEE.
  [3] Karen Simonyan and Andrew Zisserman. 2015. Very Deep Convolutional
      Networks for Large-Scale Image Recognition. In International
      Conference on Learning Representations.
  [4] F. Chollet. 2017. Xception: Deep learning with depthwise separable
      convolutions. In Proceedings of the IEEE conference on computer
      vision and pattern recognition, pp. 1251-1258.
  [5] C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, ...
      and A. Rabinovich. 2015. Going deeper with convolutions. In
      Proceedings of the IEEE conference on computer vision and pattern
      recognition, pp. 1-9.