=Paper=
{{Paper
|id=Vol-3181/paper57
|storemode=property
|title=Classifying COVID-19 Conspiracy Tweets with Word Embedding and BERT
|pdfUrl=https://ceur-ws.org/Vol-3181/paper57.pdf
|volume=Vol-3181
|authors=Yuta Yanagi,Ryouhei Orihara,Yasuyuki Tahara,Yuichi Sei,Akihiko Ohsuga
|dblpUrl=https://dblp.org/rec/conf/mediaeval/YanagiOTSO21
}}
==Classifying COVID-19 Conspiracy Tweets with Word Embedding and BERT==
Classifying COVID-19 Conspiracy Tweets with Word Embedding and BERT

Yuta Yanagi, Ryohei Orihara, Yasuyuki Tahara, Yuichi Sei, Akihiko Ohsuga
The University of Electro-Communications, Japan
yanagi.yuta@ohsuga.lab.uec.ac.jp, orihara@acm.org, tahara@uec.ac.jp, sei@is.uec.ac.jp, ohsuga@uec.ac.jp

ABSTRACT
We, team OTS-UEC, contributed to the automatic detection of conspiracy tweets in MediaEval 2021. The dataset consists of tweets that refer to COVID-19, part of which support or discuss conspiracy theories. Following the results in the MediaEval 2020 working notes, we use a BERT-based classifier. We implement three proposed models and compare them experimentally. In this year's task as well, the BERT-based model classifies better than a word-embedding-based one. This result suggests that a pre-trained model is well suited to classifying conspiracy tweets with only a small amount of preparation.

Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). MediaEval'21, December 13-15 2021, Online.

1 INTRODUCTION
FakeNews, one of the MediaEval 2021 tasks, focuses on automatically classifying tweets with respect to conspiracies [10]. The task has three classification subtasks. The first (Text-Based Misinformation Detection, MD) classifies tweets into three stance classes: supporting, discussing, and non-conspiracy (no conspiracy mentioned). The second (Text-Based Conspiracy Theory Recognition, CTR) consists of nine binary classifications that decide, for each pre-defined conspiracy, whether a tweet refers to it. The third (Text-Based Combined Misinformation and Conspiracies Detection, CMCD) requires classifying the three stances for each of the nine conspiracies (3 × 9 output types).

The FakeNews task requires two model types. On the one hand, the "required run" must be built from the dataset alone. On the other hand, "optional run(s)" may use data from outside the dataset, which includes pre-trained language models. We measure the effect of pre-trained language models through the difference in results between these two model types. The results show solid improvements from using a pre-trained language model, and the BERT-based model gives the best results in our experiments.

2 RELATED WORK
The COVID-19 epidemic affects not only the medical domain but also social media. The diffusion of misinformation (including fake news) reduces the credibility of governments and of medical treatments such as vaccines [13]. Moreover, some people, influenced by psychological factors, argue for relationships between the epidemic and conspiracies [5, 14]. The automatic detection of conspiracy tweets is therefore crucial to lighten the burden imposed on medical workers.

The FakeNews task in 2021 extends the automatic detection of the 5G conspiracy in COVID-19 tweets from MediaEval 2020 [11, 12]. Among its participants, two teams used BERT, in a single model [8] or in an ensemble model [9]. In both cases, the BERT model improved classification performance for the 5G-conspiracy / other-conspiracy / non-conspiracy classes.

3 APPROACH
In this section, we show how we implement our proposed models for each subtask.

3.1 Preprocessing
The organizers provided raw tweet texts as the dataset, so we apply the following preprocessing rules:
• Expand contracted forms with a publicly available tool [15] and manual fixes.
• Lowercase all letters.
• Remove all characters except letters, numbers, and whitespace.
• Replace every number with zero (0), except in "covid19".
• Eliminate stopwords with a tool from NLTK [3].
The removed characters include emojis. When improving classification performance further, taking emojis into account might yield more accurate tweet features.

3.2 Language Models
We compare the effect of using pre-trained language models in every subtask. In addition, we compare two language models: one based on a pre-trained NNLM [2], the other on pre-trained BERT [4]. All implementations are done in Keras [7].

3.2.1 Required run. First, we encode tweets into integer sequences with TextVectorization. Second, we obtain word embeddings from an Embedding layer, initialized from a uniform distribution. Finally, we obtain a tweet feature by average-pooling all word embeddings with GlobalAveragePooling1D; the pooled output has 128 dimensions. We then add a fully connected layer with a 10% dropout layer; its 32-dimensional output is the tweet feature in the required run.

3.2.2 Optional runs. In these runs, we may use resources from outside the given dataset, including pre-trained models. Following the results of the FakeNews task in MediaEval 2020 [8, 9], we use a BERT-based language model: small_bert from TensorFlow Hub [6]. We again add the fully connected layer to obtain the tweet features. In the stance classification subtask, we also compare against an NNLM-based language model from TensorFlow Hub [2].

3.3 Classification Models
We prepare three classification models, one per subtask, each taking the tweet features as input.

3.3.1 Misinformation Detection. Because the ratio of the labels is nearly 2:1:1, we build two binary classifiers. Figure 1 shows the correspondence between the given labels and the labels we impose for this subtask. The first classifier decides whether a tweet refers to any conspiracy; if it does, the second decides whether the tweet supports the conspiracy. Consequently, non-conspiracy tweets are not used to train the second classifier. We expect this to reduce the bias caused by the imbalance of the given labels. In the experiments, we compare this structure with a model that classifies the three labels directly.

Figure 1: Comparison table between the given labels and the new labels in Misinformation Detection.
# | Stance label | Refers to conspiracy | Agrees with conspiracy
1 | Non-conspiracy | NO (0) | excluded from training
2 | Discusses conspiracy | YES (1) | NO (0)
3 | Promotes/supports conspiracy | YES (1) | YES (1)

3.3.2 Conspiracy Theory Recognition. We build a classifier with nine parallel outputs, one per pre-defined conspiracy, using another fully connected layer that outputs nine values.

3.3.3 Combined Misinformation and Conspiracies Detection. We prepare nine three-class classifiers that deduce the stances. We do not use two binary classifiers here because too few tweets refer to each individual conspiracy.

4 RESULTS AND ANALYSIS
4.1 Effect of Language Model
Table 1 shows the returned results of the FakeNews task. All result values are the Matthews correlation coefficient (MCC) [1].

Table 1: Results of the implemented models.
Subtask | Model | MCC
Misinformation Detection | Word emb. | 0.142
Misinformation Detection | BERT-based | 0.413
Misinformation Detection | NNLM-based | 0.388
Conspiracy Theory Recognition | Word emb. | 0.133
Conspiracy Theory Recognition | BERT-based | 0.267
Combined Misinformation & Conspiracies Detection | Word emb. | 0.000
Combined Misinformation & Conspiracies Detection | BERT-based | 0.000

Using the pre-trained language model improves the results in every subtask except CMCD. In the CMCD subtask, both models output only one label, which means non-conspiracy. We attribute this to the fact that separating the classifiers by pre-defined conspiracy increases the ratio of non-conspiracy tweets. According to the task organizers, other participants also submitted such all-one outputs.

Table 2 shows the detailed results for the CTR subtask. The BERT model is better for seven of the nine pre-defined conspiracies, and even in the remaining two cases the differences are small.

Table 2: Detailed MCC results in conspiracy detection. See the overview paper for the abbreviations of the pre-defined conspiracies [10].
Model | SC | BMC | A | FV | IP | HRI | PR | NWO | S
Emb. | 0.01 | 0.16 | 0.30 | -0.09 | 0.15 | 0.10 | 0.19 | 0.20 | 0.16
BERT | 0.04 | 0.41 | 0.44 | 0.09 | 0.05 | 0.53 | 0.30 | 0.41 | 0.13

4.2 Double Binary Classification
Table 3 shows the results of the two classification structures, both of which use the BERT-based language model. The double binary classification achieves a better score than the single three-class classification.

Table 3: Results of the double binary classification and the single three-class classification with BERT in the MD subtask.
Model | MCC
Double binary classifications | 0.413
Single three-class classification | 0.258

5 DISCUSSION AND OUTLOOK
In this paper, we participated in the FakeNews task, which requires classifying tweets with respect to conspiracies. To this end, we employed pre-trained language models, following models from the FakeNews task of MediaEval 2020 [11], and compared them with models that use only word embeddings. According to the experimental results, the pre-trained language model helps extract conspiracy information in both stance classification and conspiracy detection. However, in the CMCD subtask, all outputs collapse to a single label. We suspect the classification models fail because tweets mentioning each pre-defined conspiracy are scarce and scattered; however, judging from the models of other teams, it is also possible that we designed our CMCD models incorrectly. A closer look at the CTR results shows that the effectiveness of the pre-trained language model varies across the pre-defined conspiracies. This may stem from characteristics of the tweet content and needs further research. Moreover, we compared the two classification structures in the MD subtask: the experiments show that the double binary classification is better than the single three-class classification. We expect this is because the ratio of the three classes is nearly 2:1:1; with a different ratio, the trend might not hold.

ACKNOWLEDGMENTS
This work was supported by JSPS KAKENHI Grant Numbers JP18H03229, JP18H03340, 18K19835, JP19H04113, JP19K12107, and JP21H03496.
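The preprocessing rules of Section 3.1 can be sketched in Python. This is an illustrative reimplementation, not the authors' code: the contraction map and stopword set below are tiny hand-rolled stand-ins for the contractions tool [15] and NLTK's stopword list [3].

```python
import re

# Hand-rolled stand-ins for the external tools used in the paper
# (the `contractions` package [15] and NLTK's stopword corpus [3]).
CONTRACTIONS = {"can't": "cannot", "it's": "it is", "don't": "do not"}
STOPWORDS = {"a", "an", "the", "is", "are", "of", "to", "and", "in", "by"}

def preprocess(tweet: str) -> str:
    """Apply the Section 3.1 rules, in order."""
    # 1. Expand contracted forms.
    for short, full in CONTRACTIONS.items():
        tweet = re.sub(re.escape(short), full, tweet, flags=re.IGNORECASE)
    # 2. Lowercase everything.
    tweet = tweet.lower()
    # 3. Keep only letters, digits, and whitespace (this drops emojis too).
    tweet = re.sub(r"[^a-z0-9\s]", " ", tweet)
    # 4. Replace every number with 0, except in the token "covid19".
    tweet = " ".join(
        tok if tok == "covid19" else re.sub(r"\d+", "0", tok)
        for tok in tweet.split()
    )
    # 5. Remove stopwords.
    return " ".join(tok for tok in tweet.split() if tok not in STOPWORDS)

cleaned = preprocess("It's 100% true: covid19 isn't caused by 5G towers!")
assert "covid19" in cleaned.split() and "100" not in cleaned
```

In the actual pipeline, calls such as `contractions.fix(...)` and `nltk.corpus.stopwords.words("english")` would replace these stand-ins.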
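The required-run feature extractor of Section 3.2.1 is, in the paper, a Keras pipeline (TextVectorization, Embedding, GlobalAveragePooling1D, then a dense layer). A minimal pure-Python sketch of the same mechanics, with hypothetical helper names, shows how a tweet becomes a fixed-size vector:

```python
import random

# Pure-Python sketch of the "required run" feature path (Section 3.2.1).
# The helper names are hypothetical; the paper builds this from Keras layers.
EMBED_DIM = 128  # dimensionality of the pooled representation, per the paper

def build_vocab(corpus):
    """Assign each distinct token an integer id; 0 is reserved for unknown tokens."""
    vocab = {}
    for text in corpus:
        for tok in text.split():
            vocab.setdefault(tok, len(vocab) + 1)
    return vocab

def encode(text, vocab):
    """TextVectorization analogue: token sequence -> integer id sequence."""
    return [vocab.get(tok, 0) for tok in text.split()]

def embed_and_pool(ids, table):
    """Embedding lookup followed by global average pooling over the sequence."""
    vectors = [table[i] for i in ids]
    return [sum(dim) / len(vectors) for dim in zip(*vectors)]

rng = random.Random(0)
corpus = ["covid19 vaccine news", "0g towers cause covid19"]
vocab = build_vocab(corpus)
# Uniformly initialized embedding table, as in the paper.
table = [[rng.uniform(-0.05, 0.05) for _ in range(EMBED_DIM)]
         for _ in range(len(vocab) + 1)]

feature = embed_and_pool(encode(corpus[0], vocab), table)
assert len(feature) == EMBED_DIM
```

In the paper, a fully connected layer with a 10% dropout layer then maps this 128-dimensional pooled vector to the final 32-dimensional tweet feature.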
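The label scheme of Figure 1 and the two-stage inference of Section 3.3.1 can be sketched as follows; the function names are hypothetical stand-ins, and the binary predictions would come from the two trained classifiers.

```python
def to_binary_labels(stance):
    """Map a given three-class stance (Figure 1) to the two binary training
    targets: (refers to conspiracy, agrees with conspiracy). The second
    target is None for non-conspiracy tweets, which are excluded when
    training the second classifier."""
    mapping = {
        "non-conspiracy": (0, None),
        "discusses": (1, 0),
        "supports": (1, 1),
    }
    return mapping[stance]

def combine(refers, agrees):
    """Merge the two binary predictions back into one of the three stances.
    The second prediction is only consulted when the first says YES."""
    if refers == 0:
        return "non-conspiracy"
    return "supports" if agrees == 1 else "discusses"
```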
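All scores in Section 4 are the Matthews correlation coefficient [1]. For a binary confusion matrix it is MCC = (TP·TN − FP·FN) / √((TP+FP)(TP+FN)(TN+FP)(TN+FN)), conventionally set to 0 when the denominator vanishes. A small sketch (binary case only; the task itself uses the multiclass generalization) also makes clear why an all-one-label output, as in the CMCD runs of Table 1, scores exactly 0.000:

```python
from math import sqrt

def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient for a binary confusion matrix [1].
    Returns 0.0 when the denominator is zero, e.g. when the classifier
    emits only a single label."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

# A classifier that predicts the same label for everything has
# tn = fn = 0, so the denominator is zero and the MCC is 0.0.
assert mcc(tp=10, tn=0, fp=5, fn=0) == 0.0
```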
REFERENCES
[1] Pierre Baldi, Søren Brunak, Yves Chauvin, Claus A. F. Andersen, and Henrik Nielsen. 2000. Assessing the accuracy of prediction algorithms for classification: an overview. Bioinformatics 16, 5 (2000), 412–424. https://doi.org/10.1093/bioinformatics/16.5.412
[2] Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Janvin. 2003. A neural probabilistic language model. The Journal of Machine Learning Research 3 (2003), 1137–1155.
[3] Steven Bird and Edward Loper. 2004. NLTK: The Natural Language Toolkit. In Proceedings of the ACL Interactive Poster and Demonstration Sessions. Association for Computational Linguistics, Barcelona, Spain, 214–217. https://www.aclweb.org/anthology/P04-3031
[4] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2019. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv:cs.CL/1810.04805
[5] Karen M. Douglas. 2021. COVID-19 conspiracy theories. Group Processes & Intergroup Relations 24, 2 (2021), 270–275. https://doi.org/10.1177/1368430220982068
[6] TensorFlow Hub. 2021. small_bert/bert_en_uncased_L-4_H-512_A-8. https://tfhub.dev/tensorflow/small
[7] Nikhil Ketkar. 2017. Introduction to Keras. In Deep Learning with Python. Springer, 97–111.
[8] Andrey Malakhov, Alessandro Patruno, and Stefano Bocconi. 2020. Fake News Classification with BERT. In Working Notes Proceedings of the MediaEval 2020 Workshop, Online, 14-15 December 2020 (CEUR Workshop Proceedings, Vol. 2882), Steven Hicks, Debesh Jha, Konstantin Pogorelov, Alba García Seco de Herrera, Dmitry Bogdanov, Pierre-Etienne Martin, Stelios Andreadis, Minh-Son Dao, Zhuoran Liu, José Vargas Quiros, Benjamin Kille, and Martha A. Larson (Eds.). CEUR-WS.org. http://ceur-ws.org/Vol-2882/paper38.pdf
[9] Olga Papadopoulou, Giorgos Kordopatis-Zilos, and Symeon Papadopoulos. 2020. MeVer Team Tackling Corona Virus and 5G Conspiracy Using Ensemble Classification Based on BERT. In Working Notes Proceedings of the MediaEval 2020 Workshop, Online, 14-15 December 2020 (CEUR Workshop Proceedings, Vol. 2882), Steven Hicks, Debesh Jha, Konstantin Pogorelov, Alba García Seco de Herrera, Dmitry Bogdanov, Pierre-Etienne Martin, Stelios Andreadis, Minh-Son Dao, Zhuoran Liu, José Vargas Quiros, Benjamin Kille, and Martha A. Larson (Eds.). CEUR-WS.org. http://ceur-ws.org/Vol-2882/paper76.pdf
[10] Konstantin Pogorelov, Daniel Thilo Schroeder, Stefan Brenner, and Johannes Langguth. 2021. FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021. In the MediaEval 2021 Workshop, Online, 13-15 December 2021.
[11] Konstantin Pogorelov, Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and Johannes Langguth. 2020. FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020. In MediaEval 2020 Workshop. Online.
[12] Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkuková, Stefan Brenner, and Johannes Langguth. 2021. WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets. In 2021 Workshop on Open Challenges in Online Social Networks. Online, 21–25.
[13] Jon Roozenbeek, Claudia R. Schneider, Sarah Dryhurst, John Kerr, Alexandra L. J. Freeman, Gabriel Recchia, Anne Marthe van der Bles, and Sander van der Linden. 2020. Susceptibility to misinformation about COVID-19 around the world. Royal Society Open Science 7, 10 (Oct. 2020), 201199. https://doi.org/10.1098/rsos.201199
[14] Joseph E. Uscinski, Adam M. Enders, Casey Klofstad, Michelle Seelig, John Funchion, Caleb Everett, Stefan Wuchty, Kamal Premaratne, and Manohar Murthi. 2020. Why do people believe COVID-19 conspiracy theories? Harvard Kennedy School Misinformation Review 1, 3 (2020).
[15] Pascal van Kooten. 2020. contractions. https://github.com/kootenpv/contractions