
Identifying Misinformation Spreaders: A Graph-Based Semi-Supervised Learning Approach

Atta Ullah

Rabeeh Ayaz Abbasi

Akmal Saeed Khattak

Anwar Said

Department of Computer Science, Quaid-i-Azam University, Islamabad, Pakistan

Institute for Software Integrated Systems, Department of Computer Science, Vanderbilt University, USA

In this paper, we propose a graph-based conspiracy source detection method for the MediaEval 2022 FakeNews: Corona Virus and Conspiracies Multimedia Analysis Task. The goal of this study is to apply state-of-the-art graph neural network (GNN) methods to the problem of misinformation spreading in online social networks. We explore three GNN models: GCN, GraphSAGE, and DGCNN. Experimental results show that DGCNN outperforms the other two models in terms of accuracy.

1. Introduction

The spread of misinformation, or fake news, exaggerates information and makes it difficult to distinguish fake news from real news. Fake news can be identified in three ways: through its content, through its context, or through a combination of the two. For content-based approaches, the main challenge is the fluctuating nature of style, patterns, topics, and platforms: models trained on one dataset may not perform well on a different dataset because of differences in content, style, or language. To address these challenges, context-based approaches focus on detecting misinformation spreaders, motivated by the observation that fake news and real news propagate differently [1].

2. Related Work

In [4], the authors present GCN, a semi-supervised model for node classification, and [5] extends it to identify rumor spreaders. GCN models are inherently transductive and do not generalize to unseen nodes. To address this, [6] propose GraphSAGE, a general inductive framework that leverages node feature information to generate embeddings for new nodes. Instead of training an individual embedding for each node, GraphSAGE learns to produce embeddings by aggregating features from a node's local neighborhood. To obtain more promising results, we also consider reformulating node classification as graph classification. In [7], the authors propose the Graph Isomorphism Network (GIN), which can classify both graphs and nodes. Although GNNs are powerful tools for graph representation learning, their representational properties and limitations were not well understood; [7] analyzes them and shows that GIN is as powerful as the Weisfeiler-Lehman graph isomorphism test. Similarly, [8] propose the Deep Graph Convolutional Neural Network (DGCNN), which captures graph-level features and consists of four graph convolution layers of either the GCN [4], GraphSAGE [6], or GIN [7] type.

3. Approach

This section presents the methods implemented in this study. For sub-task 2 [3], we formulate the problem as both a node classification and a graph classification task.

In node classification, the model classifies a node by analyzing its own features and those of its neighbors. In the graph classification setting, we construct a subgraph for each labeled node by taking its ego-network up to 3 hops. The implemented methods are GCN [4], GraphSAGE [6], and DGCNN [8].

3.1. Graph Convolutional Neural Network

The Graph Convolutional Network (GCN) [4] is a semi-supervised learning method for node classification. GCN is based on message passing: each node's representation is learned from its own features and the features of its neighboring nodes. Our model uses three GCN [4] layers followed by a linear layer, and it is trained with binary cross-entropy loss and the Adam optimizer. The architecture of the implemented model is illustrated in Figure 1.
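For reference, a GCN layer as defined in [4] computes H' = ReLU(D^{-1/2} (A + I) D^{-1/2} H W), where A is the adjacency matrix, I adds self-loops, D is the degree matrix of A + I, H holds the node features, and W is a learned weight matrix. The plain-Python sketch below implements one such layer on a hypothetical three-node graph with a fixed weight; it is only meant to make the message-passing step concrete. Stacking three layers, adding the linear output layer, and training with binary cross-entropy and Adam would in practice normally be done with a library such as PyTorch Geometric (for example its `GCNConv` layer).

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Dense matrix product a @ b (a is n x m, b is m x p)."""
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: ReLU(D^-1/2 (A + I) D^-1/2 H W), following Kipf & Welling [4]."""
    n = len(adj)
    # Add self-loops so that each node also keeps its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    # Node degrees in A + I (at least 1, thanks to the self-loops).
    deg = [sum(row) for row in a_hat]
    # Symmetric normalization: entry (i, j) is divided by sqrt(deg_i * deg_j).
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    # Aggregate neighbor features, apply the weight matrix, then the ReLU nonlinearity.
    out = matmul(matmul(norm, h), w)
    return [[max(0.0, x) for x in row] for row in out]


# Hypothetical toy graph: path 0 - 1 - 2 with one scalar feature per node.
adj = [[0, 1, 0],
       [1, 0, 1],
       [0, 1, 0]]
h = [[1.0], [0.0], [1.0]]
w = [[1.0]]  # identity weight, so the output shows only the normalized aggregation
print(gcn_layer(adj, h, w))  # nodes 0 and 2 get 0.5, node 1 gets 2 / sqrt(6)
```

Node 1 has no feature of its own but receives a normalized contribution from both neighbors, which is exactly the neighborhood aggregation described above.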

3.2. GraphSAGE

We use three GraphSAGE layers with ReLU activations, followed by a linear layer. As with GCN, the model is trained with binary cross-entropy loss and the Adam optimizer. Figure 2 illustrates the architecture of the implemented GraphSAGE model. We chose GraphSAGE because it is inductive: it creates embeddings by sampling and aggregating features from a node's local neighborhood.

3.3. Deep Graph Convolutional Neural Network

DGCNN introduces a readout (pooling) function that aggregates the learned node embeddings into a graph-level embedding [2].
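In the original DGCNN [8], this readout is a SortPooling layer: the nodes are sorted in decreasing order of their last embedding channel (ties are broken by the preceding channels), the first k nodes are kept, and graphs with fewer than k nodes are zero-padded, so that every graph yields a fixed-size k × c matrix regardless of its number of nodes. The plain-Python sketch below illustrates that idea; the embeddings and the value of k are hypothetical, and a real model would apply it to the output of the graph convolution layers.

```python
from typing import List


def sort_pooling(node_emb: List[List[float]], k: int) -> List[List[float]]:
    """Sketch of a DGCNN-style SortPooling readout (assumes at least one node).

    Sorts nodes by their last channel in decreasing order, breaking ties with
    the preceding channels, keeps the first k rows, and zero-pads graphs with
    fewer than k nodes so that every graph maps to a k x c matrix.
    """
    c = len(node_emb[0])
    # Reversing each row makes the last channel the primary sort key.
    ordered = sorted(node_emb, key=lambda row: tuple(reversed(row)), reverse=True)
    kept = ordered[:k]
    padding = [[0.0] * c for _ in range(k - len(kept))]
    return kept + padding


# Hypothetical 3-node graph with 2-channel embeddings, pooled to k = 4 rows.
emb = [[0.1, 0.5], [0.9, 0.2], [0.3, 0.8]]
print(sort_pooling(emb, k=4))
# [[0.3, 0.8], [0.1, 0.5], [0.9, 0.2], [0.0, 0.0]]
```

Sorting gives the nodes a consistent order across graphs, which is what allows ordinary 1D convolutions to be applied to the pooled output afterwards.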

We generate a subgraph for each node and thereby transform the node classification problem into a graph classification problem. Each subgraph is the ego-network of the node, covering its neighborhood up to 3 hops. This yields a mapping between each node and its ego-network: the model predicts the label of a node's ego-network, which is then taken as the label of the node.
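The node-to-graph transformation can be sketched with a breadth-first search that stops at 3 hops. The sketch below assumes an undirected graph stored as an adjacency list with integer node ids; the function names (`ego_network`, `induced_edges`, `build_graph_dataset`) and the toy graph are hypothetical and do not describe the dataset's actual format.

```python
from collections import deque
from typing import Dict, List, Set, Tuple

Graph = Dict[int, List[int]]  # undirected graph as an adjacency list


def ego_network(adj: Graph, center: int, radius: int = 3) -> Set[int]:
    """Nodes within `radius` hops of `center` (center included), found by BFS."""
    dist = {center: 0}
    queue = deque([center])
    while queue:
        node = queue.popleft()
        if dist[node] == radius:
            continue  # do not expand past the hop limit
        for nb in adj.get(node, []):
            if nb not in dist:
                dist[nb] = dist[node] + 1
                queue.append(nb)
    return set(dist)


def induced_edges(adj: Graph, nodes: Set[int]) -> List[Tuple[int, int]]:
    """Undirected edges whose two endpoints both lie in `nodes`, listed once each."""
    return sorted((u, v) for u in nodes for v in adj.get(u, []) if v in nodes and u < v)


def build_graph_dataset(adj: Graph, labels: Dict[int, int], radius: int = 3):
    """Map each labeled node to (ego-network nodes, ego-network edges, node label)."""
    dataset = {}
    for v, y in labels.items():
        nodes = ego_network(adj, v, radius)
        dataset[v] = (nodes, induced_edges(adj, nodes), y)
    return dataset


# Hypothetical path graph 0-1-2-3-4; labels: 1 = misinformation spreader, 0 = regular user.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
dataset = build_graph_dataset(adj, {0: 1, 4: 0})
print(dataset[0])  # ({0, 1, 2, 3}, [(0, 1), (1, 2), (2, 3)], 1)
```

Each labeled node thus becomes one small labeled graph, and a prediction for that graph is read back as the prediction for the node.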

The implemented model consists of four GCN layers, two 1D convolution layers with a 1D max-pooling layer between them, and a fully connected layer, as shown in Figure 3. The network is trained using binary cross-entropy loss and the Adam optimizer with an initial learning rate of 10^-5 and a dropout rate of 0.5.
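The convolutional part after the readout can be illustrated as follows. In the original DGCNN design [8], the readout output (k nodes × c channels) is flattened into one sequence, and the first 1D convolution uses a kernel size and stride equal to c, so each step reads exactly one node's embedding. The plain-Python sketch below follows that layout with a single filter per layer; the filter counts, kernel weights, and input values are hypothetical, and the fully connected layer with dropout is omitted.

```python
from typing import List


def conv1d(x: List[float], kernel: List[float], stride: int = 1) -> List[float]:
    """Valid (unpadded) 1-D cross-correlation of a single-channel signal with one filter."""
    size = len(kernel)
    return [
        sum(x[i + j] * kernel[j] for j in range(size))
        for i in range(0, len(x) - size + 1, stride)
    ]


def max_pool1d(x: List[float], size: int = 2, stride: int = 2) -> List[float]:
    """1-D max pooling without padding."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def relu(x: List[float]) -> List[float]:
    return [max(0.0, v) for v in x]


# Hypothetical readout output: k = 4 nodes x c = 2 channels, flattened row by row.
c = 2
pooled = [1.0, 2.0, 0.0, 1.0, 3.0, 1.0, 0.0, 0.0]

# First conv: kernel size = stride = c, so each step reads one node's embedding.
h1 = relu(conv1d(pooled, kernel=[1.0, 1.0], stride=c))  # [3.0, 1.0, 4.0, 0.0]
h2 = max_pool1d(h1)                                       # [3.0, 4.0]
h3 = relu(conv1d(h2, kernel=[0.5, 0.5]))                  # [3.5]
print(h1, h2, h3)
```

In a trained model, the output of the second convolution would be flattened and passed to the fully connected classification layer.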

4. Results and Analysis

In this section, we present the experimental results obtained with the implemented models. We use an 80:20 train/test split and use accuracy, the Matthews correlation coefficient (MCC), and the area under the ROC curve (AUC) as evaluation metrics.
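For concreteness, the split and the three metrics can be computed as in the stdlib-only sketch below. It assumes binary labels (1 = misinformation spreader) and a simple random, non-stratified split. The source does not say how the split was drawn, so the seed, helper names, and toy predictions are all hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple


def train_test_split(items: Sequence, test_ratio: float = 0.2, seed: int = 0) -> Tuple[list, list]:
    """Random (non-stratified) split; 80:20 by default."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]


def accuracy(y_true: List[int], y_pred: List[int]) -> float:
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true: List[int], y_pred: List[int]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 when undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def roc_auc(y_true: List[int], scores: List[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical predictions for five test users (1 = misinformation spreader).
y_true = [1, 0, 1, 1, 0]
y_pred = [1, 0, 0, 1, 1]
scores = [0.9, 0.2, 0.4, 0.8, 0.6]
print(accuracy(y_true, y_pred), mcc(y_true, y_pred), roc_auc(y_true, scores))  # 0.6, 1/6, 5/6
```

MCC and AUC complement accuracy here: MCC uses all four confusion-matrix cells, and AUC evaluates the ranking of the predicted scores rather than a single threshold.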

Table 1 reports the performance of the models in terms of the three evaluation metrics. The results show that DGCNN performs better than the other two models.

5. Discussion and Outlook

Section 4 reports the best results we obtained with GCN, GraphSAGE, and DGCNN. The feature distributions of misinformation spreaders and regular users are approximately equal, which makes the two classes hard to separate and keeps performance considerably low. To improve results, combining Label Propagation (LPA) with GCN may help classify misinformation spreaders and regular users more effectively.
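As an illustration of the label-propagation half of this idea only, the sketch below runs a basic iterative propagation with clamped seed labels: each unlabeled node repeatedly takes the mean score of its neighbors, while labeled nodes keep their known labels. The graph, seed labels, and iteration count are hypothetical, and how LPA would be combined with GCN is not shown.

```python
from typing import Dict, List

Graph = Dict[int, List[int]]


def label_propagation(adj: Graph, seeds: Dict[int, float], iters: int = 50) -> Dict[int, float]:
    """Iterative label propagation with clamped seed labels.

    Every unlabeled node's score is repeatedly replaced by the mean score of its
    neighbors, while seed nodes keep their known label
    (1.0 = misinformation spreader, 0.0 = regular user).
    """
    scores = {v: seeds.get(v, 0.5) for v in adj}  # unlabeled nodes start undecided
    for _ in range(iters):
        updated = {}
        for v, nbs in adj.items():
            if v in seeds:
                updated[v] = seeds[v]  # clamp known labels
            elif nbs:
                updated[v] = sum(scores[u] for u in nbs) / len(nbs)
            else:
                updated[v] = scores[v]  # isolated node keeps its prior
        scores = updated
    return scores


# Hypothetical path graph 0-1-2-3-4 with one known spreader (0) and one regular user (4).
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}
scores = label_propagation(adj, {0: 1.0, 4: 0.0})
print({v: round(s, 3) for v, s in scores.items()})  # {0: 1.0, 1: 0.75, 2: 0.5, 3: 0.25, 4: 0.0}
```

Because LPA relies on graph structure rather than node features, it could in principle add signal precisely where the feature distributions of the two classes overlap.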

References

[1] D. Das, Basak, Dutta, A heuristic-driven ensemble framework for covid-19 fake news detection, in: International Workshop on Combating Online Hostile Posts in Regional Languages during Emergency Situation, Springer, 2021, pp. 164-176.

[2] Biradar, Saumya, Chauhan, Combating the infodemic: Covid-19 induced fake news recognition in social media networks, Complex & Intelligent Systems (2022) 1-13.

[3] MediaEval, Fakenews: Corona virus and conspiracies multimedia analysis task, 2022. URL: https://github.com/konstapo/2022-Fake-News-MediaEval-Task/.

[4] T. N. Kipf, Welling, Semi-supervised classification with graph convolutional networks, arXiv preprint arXiv:1609.02907 (2016).

[5] Sharma, Sharma, Identifying possible rumor spreaders on twitter: A weak supervised learning approach, in: 2021 International Joint Conference on Neural Networks (IJCNN), IEEE, 2021, pp. 1-8.

[6] Hamilton, Ying, Leskovec, Inductive representation learning on large graphs, Advances in neural information processing systems 30 (2017).

[7] Xu, Hu, Leskovec, Jegelka, How powerful are graph neural networks?, arXiv preprint arXiv:1810.00826 (2018).

[8] Zhang, Cui, Neumann, Chen, An end-to-end deep learning architecture for graph classification, Proceedings of the AAAI Conference on Artificial Intelligence 32 (2018). URL: https://ojs.aaai.org/index.php/AAAI/article/view/11782. doi:10.1609/aaai.v32i1.11782.