<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Identifying Misinformation Spreaders: A Graph-Based Semi-Supervised Learning Approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Atta Ullah</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rabeeh Ayaz Abbasi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akmal Saeed Khattak</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anwar Said</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Quaid-i-Azam University Islamabad</institution>
          ,
          <country country="PK">Pakistan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute for Software Integrated Systems, Department of Computer Science, Vanderbilt University</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we propose a graph-based conspiracy source detection method for the MediaEval 2022 FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task. The goal of this study is to apply state-of-the-art graph neural network methods to the problem of misinformation spreading in online social networks. We explore three different graph neural network models: GCN, GraphSAGE, and DGCNN. Experimental results demonstrate that DGCNN outperforms the other two models in terms of accuracy.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        The spread of misinformation or fake news exaggerates information, which makes it difficult
to distinguish fake news from real news. There are three ways to identify fake news: through
content, context, or a combination of the two. In content-based approaches, the underlying challenge is
the fluctuating nature of style, patterns, topics, and platforms. Models trained on one dataset
may not perform well on a different dataset due to differences in content,
style, or language. To address such challenges, context-based solutions have been devoted to
the detection of misinformation spreaders. These solutions build on the observation that fake
news and real news propagate differently [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the authors present GCN, a semi-supervised model for node classification,
and [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] extends it to rumor detection. GCN models are inherently transductive and fail
to generalize to unseen nodes. Therefore, the authors of [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] propose GraphSAGE,
a general inductive framework that effectively leverages node feature information to
generate embeddings for previously unseen nodes. Instead of training an embedding for each node, they
learn to produce embeddings by aggregating features from a node’s local neighborhood. For
more promising results, we consider changing node classification to graph classification. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
the authors propose the Graph Isomorphism Network (GIN), which classifies both
graphs and nodes. They note that, despite the strength of GNNs in graph representation learning, there is limited
understanding of their representational properties and limitations. GIN is as powerful as the
Weisfeiler-Lehman graph isomorphism test [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Similarly, the authors of [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] propose the
Deep Graph Convolutional Neural Network (DGCNN) to capture graph-level features; it
consists of four layers of either [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], or [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>
        We present the different methods we implemented in this study. Working on sub-task 2 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
we frame the problem as both node classification and graph classification.
      </p>
      <p>
        In node classification, the model aims to classify a node by analyzing its features and its
neighbors’ features. In the graph classification setting, we construct a subgraph for
each labeled node by taking its ego-network of up to 3-hop neighbours. The implemented
methods are GCN [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], GraphSAGE [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], and DGCNN [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <sec id="sec-3-1">
        <title>3.1. Graph Convolutional Neural Network</title>
        <p>
          The Graph Convolutional Neural Network (GCN) is a semi-supervised learning method for node
classification. GCN is based on a message passing mechanism that learns from a node’s features
and its neighboring nodes’ features. We use three GCN [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] layers followed by a linear
layer. We use Binary Cross Entropy loss and the Adam optimizer during training. The architecture
of the implemented model is illustrated in Figure 1.
        </p>
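        <p>To make the message passing step concrete, the following minimal sketch implements the propagation rule of a single GCN layer as described in [<xref ref-type="bibr" rid="ref4">4</xref>], H' = ReLU(D^(-1/2) (A + I) D^(-1/2) H W), in plain Python. The function name gcn_layer, the toy graph, and the weight values are illustrative assumptions; they are not taken from the implementation used in this study.</p>
        <preformat>
```python
import math

def gcn_layer(adj, feats, weight):
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W) on dense Python lists."""
    n = len(adj)
    # Add self-loops so that each node keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Degree of each node in the self-looped graph.
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization of the adjacency matrix.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features: norm @ feats.
    dim_in = len(feats[0])
    agg = [[sum(norm[i][k] * feats[k][d] for k in range(n)) for d in range(dim_in)]
           for i in range(n)]
    # Linear transform followed by ReLU: max(0, agg @ weight).
    dim_out = len(weight[0])
    return [[max(0.0, sum(agg[i][d] * weight[d][o] for d in range(dim_in)))
             for o in range(dim_out)] for i in range(n)]

# Toy path graph 0 - 1 - 2 with one-dimensional features (illustrative values).
adj = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
feats = [[1.0], [2.0], [3.0]]
weight = [[1.0]]
hidden = gcn_layer(adj, feats, weight)
```
        </preformat>
        <p>Stacking three such layers and a final linear layer gives the overall shape of the model described above; in practice a GNN library would replace the dense list arithmetic.</p>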
      </sec>
      <sec id="sec-3-2">
        <title>3.2. GraphSAGE</title>
        <p>
          We use three GraphSAGE layers with the ReLU activation function, followed by a linear layer.
The model is trained with Binary Cross Entropy loss and the Adam optimizer. Figure 2 illustrates the architecture of the
implemented GraphSAGE model. The rationale behind using GraphSAGE is that it is inductive:
it creates embeddings by sampling and aggregating features from a node’s local
neighborhood.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Deep Graph Convolutional Neural Network</title>
        <p>
          DGCNN introduces a readout or pooling function which aggregates learned node embeddings
into a graph-level embedding [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ].
        </p>
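        <p>As one concrete instance of such a readout, the sketch below follows the SortPooling idea of [<xref ref-type="bibr" rid="ref8">8</xref>]: nodes are sorted by their last feature channel, the top k are kept (zero-padded when the graph has fewer than k nodes), and the rows are concatenated into a fixed-size graph-level vector. The function name sort_pool, the value of k, and the toy embeddings are illustrative assumptions, and the text above does not state that this exact readout was used.</p>
        <preformat>
```python
def sort_pool(node_embeddings, k):
    """Return a fixed-size graph vector from a variable number of node embeddings."""
    dim = len(node_embeddings[0])
    # Sort nodes in descending order of their last feature channel.
    ordered = sorted(node_embeddings, key=lambda row: row[-1], reverse=True)
    # Keep the first k nodes; pad with zero rows if the graph is smaller than k.
    kept = ordered[:k]
    kept = kept + [[0.0] * dim for _ in range(k - len(kept))]
    # Flatten the k x dim matrix into one graph-level embedding.
    return [value for row in kept for value in row]

# Two toy graphs of different sizes (hypothetical embeddings).
small_graph = [[0.1, 0.9], [0.4, 0.2]]
large_graph = [[0.1, 0.9], [0.4, 0.2], [0.7, 0.5], [0.3, 0.1]]
small_vec = sort_pool(small_graph, k=3)
large_vec = sort_pool(large_graph, k=3)
```
        </preformat>
        <p>Both graphs map to vectors of the same length, which is what allows ordinary 1D convolution and fully connected layers to follow the graph convolution layers.</p>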
        <p>We generate a subgraph for each node and transform the node classification problem into
graph classification. Each subgraph is the ego-network of a node, covering its neighbourhood of up to 3
hops. This gives a label mapping between a node and its
ego-network: the model predicts the label of a node’s ego-network, which is taken
as the label of the node.</p>
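        <p>The subgraph construction can be sketched with a breadth-first search that stops at the hop limit. This is a minimal illustration, not the code used in this study: the function name ego_network, the adjacency-dictionary representation, and the toy graph and labels are all assumptions.</p>
        <preformat>
```python
from collections import deque

def ego_network(adjacency, center, max_hops=3):
    """Return the nodes and edges of the ego-network of `center` up to `max_hops` hops."""
    hops = {center: 0}
    queue = deque([center])
    # Breadth-first search that stops expanding at the hop limit.
    while queue:
        node = queue.popleft()
        if hops[node] == max_hops:
            continue
        for neighbor in adjacency.get(node, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    nodes = set(hops)
    # Keep every edge whose endpoints both lie inside the ego-network.
    edges = {frozenset((u, v)) for u in nodes for v in adjacency.get(u, ()) if v in nodes}
    return nodes, edges

# Toy undirected path graph a - b - c - d - e with hypothetical node labels.
adjacency = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
node_labels = {"a": 1, "c": 0}
# Each labeled node's ego-network inherits the label of its center node.
dataset = [(ego_network(adjacency, node), label) for node, label in node_labels.items()]
```
        </preformat>
        <p>Each entry of dataset pairs one ego-network with the label of its center node, which is the form a graph classifier expects.</p>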
        <p>The implemented model consists of four GCN layers, a 1D max-pooling layer between two
1D convolution layers, and a fully connected layer, as shown in Figure 3. The network is trained using
Binary Cross Entropy loss and the Adam optimizer with an initial learning rate of 0.1− 5 and a
dropout of 0.5.</p>
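        <p>For reference, the Binary Cross Entropy loss used to train the models averages -[y log p + (1 - y) log(1 - p)] over all labeled examples, where y is the 0/1 label and p is the predicted probability of the positive class. The sketch below is a plain-Python illustration; the function name, the clipping constant eps, and the example values are assumptions.</p>
        <preformat>
```python
import math

def binary_cross_entropy(probabilities, labels, eps=1e-12):
    """Mean binary cross entropy between predicted probabilities and 0/1 labels."""
    total = 0.0
    for p, y in zip(probabilities, labels):
        # Clip the probability to avoid log(0).
        p = min(max(p, eps), 1.0 - eps)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)

# Hypothetical predictions for three examples with labels 1, 0, 1.
loss = binary_cross_entropy([0.9, 0.2, 0.6], [1, 0, 1])
```
        </preformat>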
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>In this section, we present our experimental results obtained through the implemented models.
We use an 80:20 train-test split and report accuracy, the Matthews correlation coefficient
(MCC), and the area under the ROC curve (AUC) as evaluation metrics.</p>
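      <p>The three metrics can be computed from binary labels and predicted scores as in the following sketch. It is a plain-Python illustration under stated assumptions: the function names, the 0.5 decision threshold, and the example labels and scores are hypothetical and do not correspond to our experimental data.</p>
      <preformat>
```python
import math

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels in {0, 1}."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention, MCC is set to 0 when any marginal count is zero.
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """AUC as the probability that a positive is scored above a negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical labels and predicted scores for six test nodes.
y_true = [1, 0, 1, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
```
      </preformat>
      <p>MCC is informative alongside accuracy because it uses all four cells of the confusion matrix, while AUC is independent of the decision threshold.</p>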
      <p>Table 1 reports the models’ performance in terms of the three evaluation metrics. We conclude
from the results that DGCNN performs better than the other two models.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Outlook</title>
      <p>The results discussed in Section 4 are the best we obtained
using GCN, GraphSAGE, and DGCNN. The feature distributions of
misinformation spreaders and regular users are approximately equal, which explains why the performance is
considerably low. To improve the results, a combination of Label Propagation (LPA) and
GCN can help to classify misinformation spreaders and regular users effectively.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S. D.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Basak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dutta</surname>
          </string-name>
          ,
          <article-title>A heuristic-driven ensemble framework for covid-19 fake news detection</article-title>
          ,
          in:
          <source>International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>164</fpage>
          -
          <lpage>176</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Biradar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Saumya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Chauhan</surname>
          </string-name>
          ,
          <article-title>Combating the infodemic: Covid-19 induced fake news recognition in social media networks</article-title>
          ,
          <source>Complex &amp; Intelligent Systems</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] MediaEval,
          <article-title>Fakenews: Corona virus and conspiracies multimedia analysis task</article-title>
          ,
          <year>2022</year>
          . URL: https://github.com/konstapo/2022-Fake-News-MediaEval-Task/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>T. N.</given-names>
            <surname>Kipf</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Welling</surname>
          </string-name>
          ,
          <article-title>Semi-supervised classification with graph convolutional networks</article-title>
          ,
          <source>arXiv preprint arXiv:1609.02907</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <article-title>Identifying possible rumor spreaders on twitter: A weak supervised learning approach</article-title>
          , in: 2021
          <source>International Joint Conference on Neural Networks (IJCNN)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>W.</given-names>
            <surname>Hamilton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ying</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <article-title>Inductive representation learning on large graphs</article-title>
          ,
          <source>Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>K.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leskovec</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Jegelka</surname>
          </string-name>
          ,
          <article-title>How powerful are graph neural networks?</article-title>
          ,
          <source>arXiv preprint arXiv:1810.00826</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>An end-to-end deep learning architecture for graph classification</article-title>
          ,
          <source>Proceedings of the AAAI Conference on Artificial Intelligence</source>
          <volume>32</volume>
          (
          <year>2018</year>
          ). URL: https://ojs.aaai.org/index.php/AAAI/article/view/11782. doi:10.1609/aaai.v32i1.11782.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>