<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>On the Performance of Different Text Classification Strategies on Conspiracy Classification in Social Media</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Manfred Moosleitner</string-name>
          <email>manfred.moosleitner@uibk.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
<string-name>Benjamin Murauer</string-name>
          <email>b.murauer@posteo.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universität Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Universität Innsbruck</institution>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2021</year>
      </pub-date>
      <fpage>6</fpage>
      <lpage>8</lpage>
      <abstract>
<p>This paper summarizes the contribution of our team UIBK-DBISFAKENEWS to the shared task “FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task” as part of MediaEval 2021, the goal of which is to classify tweets based on their textual content. The task features three sub-tasks: (i) Text-Based Misinformation Detection, (ii) Text-Based Conspiracy Theories Recognition, and (iii) Text-Based Combined Misinformation and Conspiracies Detection. We achieved our best results for all three sub-tasks using the pre-trained language model BERT Base [1], with extremely randomized trees and support vector machines as runners-up. We further show that syntactic features based on dependency grammar are ineffective, resulting in prediction scores close to a random baseline.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        The task consists of three sub-tasks in which properties of
short social media posts must be predicted. The sub-tasks’ goals
are similar but distinctly different: In sub-task 1 (Text-Based
Misinformation Detection), each document belongs to one of three
classes ("Promotes/Supports Conspiracy", "Discusses Conspiracy",
"Non-Conspiracy"), making it a multi-class classification problem.
In sub-task 2 (Text-Based Conspiracy Theories Recognition), a list of
conspiracies is provided. For each conspiracy, the goal is to predict
whether that conspiracy is mentioned in a document, and more
than one conspiracy can be mentioned in the same document. This makes
it a multi-label classification problem. Finally, in sub-task 3
(Text-Based Combined Misinformation and Conspiracies Detection), the
above sub-tasks are combined: for each evaluation document,
the model must predict for each of the provided conspiracies the
way that conspiracy is mentioned, using the classes of sub-task 1. The
development data provided for this task consists of about 1,500
tweets collected by Pogorelov et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and a detailed description of
the individual sub-tasks can be found in the task overview paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
The code of our solution is publicly available at https://git.uibk.ac.at/c7031305/mediaeval2021-fakenews.
      </p>
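      <p>To make the difference between these sub-task formulations concrete, the following minimal sketch (our own illustration with scikit-learn; the example conspiracy names beyond those quoted above are hypothetical) shows how the targets could be encoded:</p>
      <preformat>
from sklearn.preprocessing import LabelEncoder, MultiLabelBinarizer

# Sub-task 1 (multi-class): every tweet has exactly one of the three classes.
classes = ["Promotes/Supports Conspiracy", "Discusses Conspiracy", "Non-Conspiracy"]
y_task1 = LabelEncoder().fit(classes).transform(["Non-Conspiracy", "Discusses Conspiracy"])

# Sub-task 2 (multi-label): a tweet may mention several conspiracies at once,
# so the target has one binary indicator column per conspiracy.
mlb = MultiLabelBinarizer()
y_task2 = mlb.fit_transform([{"5G", "Mind Control"}, set(), {"Satanism"}])
      </preformat>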
      <p>For each sub-task, each team is allowed to submit five runs with
the following restrictions: For run 1, only features extracted from
the provided texts were allowed, without additional external data or
pre-trained models (concretely, this also disallows using any
BERT-related model). For run 2, the use of pre-trained models
was additionally allowed, and for runs 3 and 4, the use of
external data for training any model was also permitted.</p>
      <sec id="sec-1-1">
        <title>1https://git.uibk.ac.at/c7031305/mediaeval2021-fakenews</title>
      </sec>
      <sec id="sec-1-2">
        <title>Parameter</title>
        <p>-gram size
-grams max. features
lowercase text
DT-gram word repr.</p>
      </sec>
      <sec id="sec-1-3">
        <title>BERT model num. trees</title>
      </sec>
      <sec id="sec-1-4">
        <title>Tested Values</title>
        <p>1, 2, ..., 10
unlimited, 1000</p>
        <p>true, false
universal POS tag, English POS tag</p>
      </sec>
      <sec id="sec-1-5">
        <title>RoBERTa, DistilBERT, BERT base</title>
        <p>100, 250, 500, 750, 1000, 2000, ..., 5000</p>
        <p>In our approach to this task, we perform a large-scale grid search
experiment testing a variety of diferent feature extraction
methods and classification models and hyper-parameter configurations
thereof. We show that the pre-trained language model BERT
outperforms the other presented approaches for all the sub-tasks and
that the syntax-based features are not able to detect the
conspiracyrelated classes in the documents.
2</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>TEXT FEATURES</title>
      <p>We include multiple types of features in our grid experiments. As
a widely used and simple-to-calculate baseline, we include word
and character n-grams. We thereby test different configurations of
the extraction, including different sizes of n and pre-processing the
texts to lowercase. The full list of parameters is shown in Table 1.
We normalize the frequency of the resulting n-grams using tf-idf.</p>
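      <p>As an illustration of this feature extraction step, the following sketch (our own simplification; the concrete parameter values are examples taken from Table 1, not the full grid) builds tf-idf normalized word and character n-gram features with scikit-learn:</p>
      <preformat>
from sklearn.feature_extraction.text import TfidfVectorizer

tweets = ["example tweet one", "another example tweet"]  # placeholder documents

# Word uni-grams, lowercased, limited to 1000 features (cf. Table 1).
word_vec = TfidfVectorizer(analyzer="word", ngram_range=(1, 1),
                           lowercase=True, max_features=1000)
X_word = word_vec.fit_transform(tweets)

# Character 6-grams; tf-idf weighting normalizes the raw n-gram frequencies.
char_vec = TfidfVectorizer(analyzer="char", ngram_range=(6, 6), lowercase=False)
X_char = char_vec.fit_transform(tweets)
      </preformat>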
      <p>
        As the second type of text features, we calculate
Dependency Tree-grams (DT-grams) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to determine whether texts within one class
or label share similar grammatical structures. DT-grams are
sub-structures of the dependency graph of sentences and can be
interpreted as a way to enhance n-grams of part-of-speech tags by
redefining which tokens are “close” to one another and therefore
form an n-gram. Thereby, each word is represented either by its
universal or its English-specific part-of-speech tag. This
choice is a hyper-parameter and is tuned by the grid search (cf.
Table 1). Like their character- and word-based counterparts, the
resulting DT-grams are counted and tf-idf normalized to form features.
We include this feature to check whether some
of the conspiracy classes exhibit stylistic markers that are typical
for that category.
      </p>
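      <p>The exact DT-gram construction is defined in [3]; the sketch below is only a rough approximation of the underlying idea (the use of spaCy and the helper name are our own assumptions): it forms part-of-speech bigrams along dependency edges instead of along the linear word order.</p>
      <preformat>
import spacy

nlp = spacy.load("en_core_web_sm")  # assumption: an English spaCy model is installed

def pos_bigrams_along_dependencies(text, universal=True):
    """Return head-child POS-tag bigrams along dependency edges.

    universal=True uses universal POS tags (token.pos_), otherwise the
    English-specific tags (token.tag_) -- the two options listed in Table 1.
    """
    doc = nlp(text)
    tag = (lambda t: t.pos_) if universal else (lambda t: t.tag_)
    return [f"{tag(tok.head)}_{tag(tok)}" for tok in doc if tok.head is not tok]

# The resulting "n-grams" (e.g. 'VERB_NOUN') are then counted and
# tf-idf normalized exactly like the character and word n-grams.
features = pos_bigrams_along_dependencies("masks protect other people")
      </preformat>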
      <p>Lastly, we use the sequence of tokens in the unmodified text
as input for fine-tuning different pre-trained language models to
directly perform classification on the evaluation sets. Thereby, no
splitting of the training documents was required, as the documents
are short enough to be processed by any of the language models.</p>
      <fig id="fig1">
        <caption>
          <p>Top four positive and negative coefficients for the classes "Satanism" and "Mind Control" from sub-task 2.</p>
        </caption>
      </fig>
    </sec>
    <sec id="sec-3">
      <title>CLASSIFICATION MODELS</title>
      <p>We employ three different machine learning models with the tf-idf
normalized frequencies of character-, word-, and DT-gram-based
features: support vector machines (SVM), multinomial Naive Bayes
(MNB), and extremely randomized trees (ET). The hyper-parameters
for these models are also listed in Table 1.</p>
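      <p>A minimal sketch of this part of the grid search (our own illustration with scikit-learn; the grid below only mirrors a small subset of Table 1, and LinearSVC is used as a stand-in for the SVM):</p>
      <preformat>
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import MultinomialNB
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", LinearSVC())])

# Each dict swaps one classifier into the "clf" step; the number of trees
# for ET follows values listed in Table 1.
param_grid = [
    {"clf": [LinearSVC()], "tfidf__ngram_range": [(1, 1)]},
    {"clf": [MultinomialNB()], "tfidf__ngram_range": [(1, 1), (2, 2)]},
    {"clf": [ExtraTreesClassifier()], "clf__n_estimators": [100, 250, 500, 1000],
     "tfidf__analyzer": ["char"], "tfidf__ngram_range": [(6, 6)]},
]
search = GridSearchCV(pipeline, param_grid, cv=5)
# search.fit(train_texts, train_labels)  # placeholder variable names
      </preformat>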
      <p>
        We use three BERT-like models: BERT-base [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], RoBERTa [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
and DistilBERT [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Thereby, we use a maximum sequence length
of 256, three epochs of fine-tuning, and a batch size of 8. All other
parameters were left at the defaults of the implementation
provided by the huggingface library (https://huggingface.co/).
      </p>
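      <p>A hedged sketch of this fine-tuning setup with the huggingface transformers library (our own illustration; the checkpoint name and the omitted dataset preparation are assumptions, and only the sequence length, number of epochs, and batch size are taken from the text above):</p>
      <preformat>
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

model_name = "bert-base-uncased"  # assumption: the exact BERT base checkpoint is not specified
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

def tokenize(batch):
    # maximum sequence length of 256, as stated above
    return tokenizer(batch["text"], truncation=True, max_length=256, padding="max_length")

args = TrainingArguments(output_dir="out",
                         num_train_epochs=3,              # three epochs of fine-tuning
                         per_device_train_batch_size=8)   # batch size of 8
# trainer = Trainer(model=model, args=args, train_dataset=tokenized_train)  # dataset prep omitted
# trainer.train()
      </preformat>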
    </sec>
    <sec id="sec-4">
      <title>RESULTS AND DISCUSSION</title>
      <p>Generally, our results show a strong connection between certain
keywords and their corresponding conspiracy theories. Figure 1
shows the top four positive and negative coefficients for two of the
classes ("Satanism" and "Mind Control") from sub-task 2, which give
an intuitive view of the relationship between the words and the
corresponding classes.</p>
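      <p>Coefficients like those in Figure 1 can be read directly from a fitted linear model; the sketch below (our own illustration, assuming a binary linear classifier and a fitted vectorizer such as the tf-idf examples above, not necessarily the exact model behind the figure) extracts the most positive and most negative words for one label:</p>
      <preformat>
import numpy as np

def top_coefficients(vectorizer, linear_clf, k=4):
    """Return the k most positive and k most negative words of a binary linear classifier."""
    words = np.array(vectorizer.get_feature_names_out())
    coefs = linear_clf.coef_.ravel()   # one weight per tf-idf feature
    order = np.argsort(coefs)
    return words[order[-k:][::-1]], words[order[:k]]

# top_pos, top_neg = top_coefficients(word_vec, clf_for_label)  # hypothetical fitted objects
      </preformat>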
      <sec id="sec-4-1">
        <title>2https://huggingface.co/</title>
      <p>From the grid search experiments, we select the best-performing
configuration for each of the allowed runs; these are shown in
Table 2. Since the BERT-based method was not allowed for the first run,
we submitted the ET-based solution, which performed second best.</p>
      <p>The task organizers released the development dataset in two
stages: the first part consisted of 500 documents, and the second
part included an additional 1,000 documents. With only the first
part available, the ET model outperformed all others; it was only
outperformed by BERT once the full dataset was available. This
indicates that BERT-like models require a minimum amount of
data for fine-tuning to perform effectively. Concretely, the addition
of the second part of the development dataset increased the number
of documents for the smallest class "Discusses Conspiracy" from
76 to 262, giving a rough impression of how many samples are
required for BERT to perform better than the ET model.</p>
      <p>When comparing the results of extremely randomized trees
and SVMs, we can see that ET performed better than
SVM for sub-task 1, and vice versa for
sub-task 2, where SVM performed better than ET. We think one of the
reasons for this is the use of word uni-grams with SVM,
compared to character 6-grams with ET, indicating that whole
words reflect the connection between keywords and labels better
than partial words. A further indication of a connection between
certain keywords and their labels is that the performance of BERT
in sub-task 2 (multi-label) and sub-task 3 (multi-class, multi-label)
is slightly better than in sub-task 1 (multi-class).</p>
      <p>On the other hand, our grammar-based
approach performs poorly in all sub-tasks, indicating that
the grammatical structure of the texts is not a suitable feature to
differentiate between the given classes and labels. We attribute this
behavior to the short texts, which are unlikely to contain
complex grammatical structures, as well as to difficulties in parsing
due to the unstructured nature of the text.</p>
    </sec>
    <sec id="sec-5">
      <title>ACKNOWLEDGMENTS</title>
      <p>We would like to thank our research group for their feedback and
Prof. Günther Specht, head of our research group, for providing the
necessary infrastructure to perform the research reported in this
paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. arXiv preprint arXiv:1810.04805 (2018).</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, Mike Lewis, Luke Zettlemoyer, and Veselin Stoyanov. 2019. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv preprint arXiv:1907.11692 (2019).</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] Benjamin Murauer and Günther Specht. 2021. DT-grams: Structured Dependency Grammar Stylometry for Cross-Language Authorship Attribution. arXiv preprint arXiv:2106.05677 (2021).</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] Konstantin Pogorelov, Daniel Thilo Schroeder, Stefan Brenner, and Johannes Langguth. 2021. FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task at MediaEval 2021. In Proceedings of the MediaEval 2021 Workshop, Online, 13-15 December 2021.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] Konstantin Pogorelov, Daniel Thilo Schroeder, Petra Filkuková, Stefan Brenner, and Johannes Langguth. 2021. WICO Text: A Labeled Dataset of Conspiracy Theory and 5G-Corona Misinformation Tweets. In Proceedings of the 2021 Workshop on Open Challenges in Online Social Networks. 21-25.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] Victor Sanh, Lysandre Debut, Julien Chaumond, and Thomas Wolf. 2019. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019).</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>