<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Detecting Conspiracy Theories from Tweets: Textual and Structural Approaches</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Haoming Guo</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Adam Ash</string-name>
          <email>adamash@berkeley.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Chung</string-name>
          <email>dachung@berkeley.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gerald Friedland</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Berkeley</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <fpage>14</fpage>
      <lpage>15</lpage>
      <abstract>
        <p>The sharing of biased and fake news on social media has skyrocketed in the past few years. These actions have caused real-world problems and harm. The Fake News Detection Task 2020 has two subtasks: an NLP-based approach and a graph-based approach (analyzing the repost structure of social media posts). We present baseline models for these two different subtasks and their performance. For the NLP-based approach, Transformers yielded the best results with a Matthews Correlation Coefficient (MCC) score of 0.477. For the graph-based approach, the best results came from a Support Vector Machine (SVM) model with an MCC score of 0.366.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION &amp; RELATED WORK</title>
      <p>
        This paper discusses natural language processing and graph-based
processing of social media for detecting conspiracy theories. We
present our work on two subtasks: one that classifies tweets based
on their content and metadata (including images), and another that
classifies tweets solely based on their graph structure with
very little metadata (relative time posted, friends, followers). The
task overview paper [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] describes the dataset in more depth and
explains how the dataset was constructed [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        FNC-1, a similar benchmark on fake news and stance detection
from texts, has received much attention from researchers [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
Using handcrafted features with a Multi-Layer Perceptron model has
proved to perform well on the task [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. Furthermore, Slovikovskaya
showed that fine-tuning transformers achieves state-of-the-art
results on the benchmark [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>
        Methods incorporating graph structure have proved to be fairly
effective in detecting "fake news" consisting of an article shared on
Twitter or other social media [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Moreover, deep learning methods
on graphs of variable size and connectivity have been shown to be
effective tools for classification [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>This paper presents several prediction models and features for
an NLP approach as well as a graph-based approach. We present the
performance of these predictors and describe our methodologies.</p>
    </sec>
    <sec id="sec-2">
      <title>APPROACH</title>
    </sec>
    <sec id="sec-3">
      <title>Bidirectional LSTM</title>
      <p>
        We use a bidirectional LSTM (BiLSTM) as our baseline model for
the NLP track. We tokenize and lemmatize each tweet into a list
of 55 tokens (padding with zeros if shorter than 55), and for each token we get a
300-dimensional embedding from a pretrained word2vec model [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Then, we use the vectors to train a BiLSTM for classification. We
choose the Adam optimizer, categorical cross-entropy loss, 256 units
for the LSTM, and two fully connected layers for our final prediction.
      </p>
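      <p>The fixed-length input described above can be illustrated with a minimal sketch. The length of 55 and the zero padding value come from the text; truncating longer tweets and padding on the right are assumptions made here for illustration, and the helper name pad_or_truncate is hypothetical.</p>
      <preformat><![CDATA[
```python
MAX_LEN = 55   # sequence length stated in the text
PAD_ID = 0     # padding value stated in the text


def pad_or_truncate(token_ids, max_len=MAX_LEN, pad_id=PAD_ID):
    """Return exactly max_len ids.

    Assumption: long inputs are cut at max_len and short inputs are
    padded on the right; the paper does not state either choice.
    """
    clipped = list(token_ids[:max_len])
    return clipped + [pad_id] * (max_len - len(clipped))


if __name__ == "__main__":
    padded = pad_or_truncate([12, 7, 431])
    print(len(padded), padded[:5])
```
]]></preformat>
      <p>Each id in the padded list would then be mapped to its 300-dimensional embedding before being passed to the BiLSTM.</p>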
    </sec>
    <sec id="sec-4">
      <title>Transformers</title>
      <p>
        We experiment with pretrained transformers to classify the tweets.
BERT, which stands for Bidirectional Encoder Representations from
Transformers, was introduced in 2018 and achieved state-of-the-art
performance on most NLP tasks[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. BERT uses a multi-layer
bidirectional transformer encoder with a self-attention mechanism to
learn a language representation of the input texts. Following BERT,
two modifications, XLNet[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and RoBERTa[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], were proposed to
address some of BERT’s shortcomings and outperformed BERT
on a variety of tasks.
      </p>
      <p>
        We use a framework called flair[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to obtain BERT, XLNet and
RoBERTa embeddings separately as three sets of features. We use the
base 768-dimensional versions of all transformers. Then, we train a
fully connected neural network with two hidden layers on each set
of features to classify the tweets. The hyperparameters, including
hidden layer size, learning rate and number of training iterations,
are tuned separately for each of BERT, XLNet and RoBERTa.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Basic Graph Features</title>
      <p>Some features are hand-prepared for each retweet graph. Features
are categorized as being calculated either solely from graph
structure or with the help of separate node information.
Examples of features based solely on graph structure include
edge count, node count, number of connected components, and
average clustering coefficient. Features based on both node
information and graph structure include average time to retweet,
original tweeter’s follower count, and percentage of original tweeter’s
followers who retweeted.</p>
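      <p>As a rough illustration of the structure-only features listed above, the following sketch computes them from an edge list using only the Python standard library. Treating the retweet graph as undirected, ignoring self-loops, and the function name basic_graph_features are assumptions made for this example and are not taken from the original implementation.</p>
      <preformat><![CDATA[
```python
from collections import deque


def basic_graph_features(edges):
    """Compute structure-only features from an undirected edge list.

    Assumption: nodes without any edge do not appear in the edge list
    and are therefore not counted.
    """
    adj = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops in this sketch
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Count connected components with breadth-first search.
    seen, components = set(), 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nb in adj[node]:
                if nb not in seen:
                    seen.add(nb)
                    queue.append(nb)

    # Local clustering: fraction of a node's neighbour pairs that are linked.
    def clustering(node):
        nbs = list(adj[node])
        k = len(nbs)
        if k < 2:
            return 0.0
        links = sum(1 for i in range(k) for j in range(i + 1, k)
                    if nbs[j] in adj[nbs[i]])
        return 2.0 * links / (k * (k - 1))

    n = len(adj)
    return {
        "node_count": n,
        "edge_count": sum(len(s) for s in adj.values()) // 2,
        "connected_components": components,
        "avg_clustering": sum(clustering(v) for v in adj) / n if n else 0.0,
    }


if __name__ == "__main__":
    # A triangle plus one separate edge.
    print(basic_graph_features([(1, 2), (2, 3), (1, 3), (4, 5)]))
```
]]></preformat>
      <p>Features that also need node information, such as average time to retweet, would require the per-node metadata and are not covered by this sketch.</p>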
    </sec>
    <sec id="sec-6">
      <title>Computed Graph Features</title>
      <p>About 60,000 random subgraphs are sampled from graphs in the
training set. To create each random sample, ten nodes and their
corresponding edges are randomly chosen from a graph in
the training set. Each randomly sampled subgraph is given a
100-dimensional vector corresponding to the subgraph’s flattened
adjacency matrix and a label corresponding to the label of its source
graph. A logistic regression classifier is trained on all sampled
subgraphs. For each graph in the test set, ten random subgraphs are
similarly sampled. The average of the model’s predictions for
these ten random subgraphs is added to the test set features for their
corresponding graph.</p>
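      <p>The sampling step described above can be sketched as follows. The subgraph size of ten and the 100-dimensional flattened adjacency matrix come from the text; treating edges as directed, the function names, and the predict callable (a stand-in for the trained logistic regression) are assumptions made for illustration. The text does not say how graphs with fewer than ten nodes are handled, so this sketch simply requires at least ten.</p>
      <preformat><![CDATA[
```python
import random


def sample_subgraph_vector(nodes, edges, k=10, rng=random):
    """Sample k nodes and return the flattened k*k adjacency matrix."""
    chosen = rng.sample(list(nodes), k)      # needs at least k nodes
    index = {node: i for i, node in enumerate(chosen)}
    matrix = [[0] * k for _ in range(k)]
    for u, v in edges:
        if u in index and v in index:
            matrix[index[u]][index[v]] = 1   # assumed directed edge u -> v
    return [cell for row in matrix for cell in row]


def graph_score(nodes, edges, predict, n_samples=10, rng=random):
    """Average a subgraph-level prediction over n_samples random subgraphs."""
    scores = [predict(sample_subgraph_vector(nodes, edges, rng=rng))
              for _ in range(n_samples)]
    return sum(scores) / len(scores)


if __name__ == "__main__":
    nodes = range(12)
    edges = [(i, i + 1) for i in range(11)]
    # Hypothetical predictor: edge density of the sampled subgraph.
    density = lambda vec: sum(vec) / len(vec)
    print(graph_score(nodes, edges, density, rng=random.Random(0)))
```
]]></preformat>
      <p>In practice the predict argument would return the classifier's probability for a subgraph vector, and the averaged value would be appended to the graph's feature vector.</p>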
      <p>
        For each graph in the training set, the Graph2Vec package[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
is used to create a 64-dimensional representation of the graph’s
largest (highest node count) subgraph. This representation is used
as 64 features for each graph. In addition, 64-dimensional representations
are computed for each subgraph of each graph, and their average,
weighted by the number of nodes in each subgraph, is added to the
features for each graph.
      </p>
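      <p>The node-count weighting described above amounts to a weighted mean of the subgraph embeddings. A minimal sketch, assuming the embeddings are already available as equal-length lists of floats (the function name is hypothetical, and two dimensions are used instead of 64 for readability):</p>
      <preformat><![CDATA[
```python
def weighted_average_embedding(embeddings, node_counts):
    """Average equal-length vectors, weighting each by its node count."""
    total = sum(node_counts)
    dim = len(embeddings[0])
    return [
        sum(w * vec[d] for vec, w in zip(embeddings, node_counts)) / total
        for d in range(dim)
    ]


if __name__ == "__main__":
    # Two subgraphs with 3 nodes and 1 node respectively.
    print(weighted_average_embedding([[1.0, 0.0], [0.0, 1.0]], [3, 1]))
```
]]></preformat>
      <p>With these inputs the larger subgraph contributes three quarters of the result, giving [0.75, 0.25].</p>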
      <p>
        The DeepWalk [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] algorithm is also used to generate a length-64
feature vector for each node in both the training and test sets. A logistic
regression classifier is trained on the DeepWalk feature vectors for
each node in each graph in the training set. For each graph in the
test set, the average of the predictions for each node is used as a
feature.
      </p>
    </sec>
    <sec id="sec-7">
      <title>RESULTS AND ANALYSIS</title>
    </sec>
    <sec id="sec-8">
      <title>Text-based Approaches</title>
      <p>We split the data into an 80% training set and a 20% validation
set. We evaluate our results using the Matthews correlation coefficient
(MCC), which is considered a balanced measure even for unbalanced
data distributions. We present validation results and the official test
set results in Tables 1 and 2.</p>
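      <p>For reference, the binary form of the MCC can be computed directly from the confusion matrix counts. The sketch below assumes labels encoded as 0 and 1 and returns 0 when the denominator vanishes, which is a common convention rather than something stated in the text; the three-class setting needs the multiclass generalization, which is not shown here.</p>
      <preformat><![CDATA[
```python
import math


def matthews_corrcoef_binary(y_true, y_pred):
    """Binary MCC: (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Convention assumed here: undefined cases are reported as 0.
    return (tp * tn - fp * fn) / denom if denom else 0.0


if __name__ == "__main__":
    print(matthews_corrcoef_binary([1, 1, 0, 0], [1, 0, 0, 0]))
```
]]></preformat>
      <p>A perfect prediction yields 1, a fully inverted prediction yields -1, and a score near 0 indicates performance no better than chance.</p>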
      <p>All transformers outperform the baseline BiLSTM, as in many
other NLP tasks, but among the transformers RoBERTa
significantly outperforms the others. We consider two possible reasons.
First, BERT and XLNet are pretrained on 16GB of Book Corpus
and English Wikipedia, while RoBERTa is pretrained on an
additional 144GB from the CommonCrawl News dataset, a Web text corpus, and
Stories from Common Crawl. We think that this additional data
not only improves RoBERTa’s generalizability, but also makes
the model more suitable for news subjects and informal language.
Second, RoBERTa removes the Next Sentence Prediction (NSP)
training objective. The NSP objective was hypothesized to improve
performance on tasks that require reasoning on pairs of sentences,
which is not a key element in our task. Liu et al. also showed in
their paper that the NSP objective is unnecessary in many settings.
Therefore, the removal of the NSP loss is another possible reason why
RoBERTa performs the best.</p>
    </sec>
    <sec id="sec-9">
      <title>Structure-based Approaches</title>
      <p>We evaluate our results using MCC as discussed in the previous
subsection. Once again, the data is split into an 80% training set and
a 20% validation set. Different models are tested for both the
two-class and three-class problems. The models tested are an SVM, a neural net,
and a random forest. The neural net has three layers of 64 Rectified
Linear Units each, with a sigmoid output layer.</p>
      <p>For the two-class problem, an SVM model with a radial basis
function kernel outperforms the other models on validation sets. For the
three-class problem, a random forest model with 40 estimators and
a maximum depth of four outperforms the others. Average
validation results using features computed only from the graph
structure are shown in Table 3, and average validation results using
features computed from all available data are shown in Table 4.</p>
      <p>Our final classifiers are run on a test set roughly one third the
size of our training set. Our two-class SVM model receives an MCC
of 0.370. Our three-class random forest model receives an average
MCC of 0.318. It is not surprising that our two-class classifier
performs better, as using two classes instead of three leads to a dataset
with much less ambiguity.</p>
    </sec>
    <sec id="sec-10">
      <title>DISCUSSION AND OUTLOOK</title>
      <p>Above, we presented and experimented with several methods to
detect conspiracy theories from social media content based on their
text and graph structure. Overall, a transformer-based approach
exhibited the best performance for text-based classification, while
SVM/Random Forest trained on our crafted graph features proved
to be the best on structure-based classification.</p>
      <p>There are many ways to extend the methodologies described
above. Below we list some possible ways of furthering our work.</p>
      <p>(1) Our preliminary experimentation with the provided
metadata yielded worse results than our transformer-based approaches.
Further experiments could be done to determine whether training
a classifier on the metadata would yield better results.</p>
      <p>(2) In this paper, we focused on text-based approaches and
structure-based approaches separately for the specific subtasks.
Incorporating different modalities, such as tweet texts,
tweet structures, metadata, and images associated with the tweet,
could prove to be useful.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Alan</given-names>
            <surname>Akbik</surname>
          </string-name>
          , Duncan Blythe, and
          <string-name>
            <given-names>Roland</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Contextual String Embeddings for Sequence Labeling</article-title>
          .
          <source>In COLING 2018, 27th International Conference on Computational Linguistics</source>
          .
          <fpage>1638</fpage>
          -
          <lpage>1649</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Jacob</given-names>
            <surname>Devlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ming-Wei</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kenton</given-names>
            <surname>Lee</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Kristina</given-names>
            <surname>Toutanova</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>CoRR</source>
          abs/1810.04805 (2018). arXiv:1810.04805 http://arxiv.org/abs/1810.04805
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>David</given-names>
            <surname>Duvenaud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dougal</given-names>
            <surname>Maclaurin</surname>
          </string-name>
          , Jorge Aguilera-Iparraguirre, Rafael Gómez-Bombarelli, Timothy Hirzel, Alán Aspuru-Guzik, and
          <string-name>
            <given-names>Ryan P.</given-names>
            <surname>Adams</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Convolutional Networks on Graphs for Learning Molecular Fingerprints</article-title>
          . (
          <year>2015</year>
          ).
          <source>arXiv:cs.LG/1509.09292</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Hanselowski</surname>
          </string-name>
          ,
          <string-name>Avinesh P.V.S.</string-name>
          ,
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Schiller</surname>
          </string-name>
          , Felix Caspelherr, Debanjan Chaudhuri, Christian M. Meyer, and
          <string-name>
            <given-names>Iryna</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A Retrospective Analysis of the Fake News Challenge Stance-Detection Task</article-title>
          .
          <source>In Proceedings of the 27th International Conference on Computational Linguistics (COLING 2018)</source>
          . http://tubiblio.ulb.tu-darmstadt.de/105434/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Yinhan</given-names>
            <surname>Liu</surname>
          </string-name>
          , Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen,
          <string-name>
            <given-names>Omer</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Mike</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Luke</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Veselin</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          .
          <source>CoRR</source>
          abs/1907.11692 (2019). arXiv:1907.11692 http://arxiv.org/abs/1907.11692
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Mikolov</surname>
          </string-name>
          , Edouard Grave, Piotr Bojanowski,
          <string-name>
            <given-names>Christian</given-names>
            <surname>Puhrsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Armand</given-names>
            <surname>Joulin</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Advances in Pre-Training Distributed Word Representations</article-title>
          .
          <source>In Proceedings of the International Conference on Language Resources and Evaluation (LREC 2018)</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Federico</given-names>
            <surname>Monti</surname>
          </string-name>
          , Fabrizio Frasca, Davide Eynard, Damon Mannion, and
          <string-name>
            <given-names>Michael M.</given-names>
            <surname>Bronstein</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Fake News Detection on Social Media using Geometric Deep Learning</article-title>
          . (
          <year>2019</year>
          ).
          arXiv:cs.SI/1902.06673
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Annamalai</given-names>
            <surname>Narayanan</surname>
          </string-name>
          , Mahinthan Chandramohan, Rajasekar Venkatesan, Lihui Chen, Yang Liu, and
          <string-name>
            <given-names>Shantanu</given-names>
            <surname>Jaiswal</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>graph2vec: Learning Distributed Representations of Graphs</article-title>
          . (
          <year>2017</year>
          ). arXiv:cs.AI/1707.05005
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Bryan</given-names>
            <surname>Perozzi</surname>
          </string-name>
          , Rami Al-Rfou, and
          <string-name>
            <given-names>Steven</given-names>
            <surname>Skiena</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>DeepWalk: Online Learning of Social Representations</article-title>
          .
          <source>CoRR</source>
          abs/1403.6652 (2014). arXiv:1403.6652 http://arxiv.org/abs/1403.6652
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Konstantin</given-names>
            <surname>Pogorelov</surname>
          </string-name>
          , Daniel Thilo Schroeder, Luk Burchard, Johannes Moe, Stefan Brenner, Petra Filkukova, and
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Langguth</surname>
          </string-name>
          .
          <year>2020</year>
          .
          <article-title>FakeNews: Corona Virus and 5G Conspiracy Task at MediaEval 2020</article-title>
          . In MediaEval 2020 Workshop.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Daniel Thilo</given-names>
            <surname>Schroeder</surname>
          </string-name>
          , Konstantin Pogorelov, and
          <string-name>
            <given-names>Johannes</given-names>
            <surname>Langguth</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>FACT: a Framework for Analysis and Capture of Twitter Graphs</article-title>
          .
          <source>In 2019 Sixth International Conference on Social Networks Analysis, Management and Security (SNAMS)</source>
          . IEEE,
          <fpage>134</fpage>
          -
          <lpage>141</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Valeriya</given-names>
            <surname>Slovikovskaya</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>Transfer Learning from Transformers to Fake News Challenge Stance Detection (FNC-1) Task</article-title>
          .
          <source>CoRR</source>
          abs/1910.14353 (2019). arXiv:1910.14353 http://arxiv.org/abs/1910.14353
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Zhilin</given-names>
            <surname>Yang</surname>
          </string-name>
          , Zihang Dai, Yiming Yang, Jaime G. Carbonell, Ruslan Salakhutdinov, and
          <string-name>
            <given-names>Quoc V.</given-names>
            <surname>Le</surname>
          </string-name>
          .
          <year>2019</year>
          .
          <article-title>XLNet: Generalized Autoregressive Pretraining for Language Understanding</article-title>
          .
          <source>CoRR</source>
          abs/1906.08237 (2019). arXiv:1906.08237 http://arxiv.org/abs/1906.08237
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>