
Epidemiology Inspired Framework for Fake News Mitigation in Social Networks

Bhavtosh Rath

Jaideep Srivastava

University of Minnesota, USA

Research in fake news detection and prevention has gained a lot of attention over the past decade, with most models using features generated from content and propagation paths. Complementary to these approaches, in this position paper we outline a framework inspired by the domain of epidemiology that proposes to identify people who are likely to become fake news spreaders. The proposed framework can serve as motivation to build fake news mitigation models, even for the scenario when fake news has not yet originated. Some models based on the framework have been successfully evaluated on real-world Twitter datasets and can provide motivation for new research directions.

Keywords: Fake news spreaders, Social networks, Epidemiology

1. Introduction

The wide adoption of social media platforms like Facebook, Twitter and WhatsApp has resulted in the creation of behavioral big data, thus motivating researchers to propose various computational models for combating fake news. So far the focus of most research has been on determining the veracity of the information using features extracted manually or automatically through techniques such as deep learning. We propose a novel fake news prevention and control framework that incorporates people's behavioral data along with their network structure. As in epidemiology, models proposed within the framework cover the entire life cycle of spreading: before the fake news originates, after the fake news starts spreading, and containment of its further spreading. The framework is not to be confused with popular information diffusion based models [1] because they a) usually categorize certain nodes and cannot be generalized to all nodes, b) consider only the propagation paths but not the underlying graph structure and c) can be generalized to information diffusion and need not be particular to fake news spreading.

Related Work: The literature on fake news detection and prevention strategies is vast, and can be divided broadly into three categories: Content-based, Propagation-based and User-based.

In the content-based approach the problem is formulated as identifying whether the content of a spreading piece of information is fake or not. Most proposed models rely on linguistic or visual features. While earlier work relied mostly on hand-engineering relevant features, more recently deep learning based models have gained popularity as they can automatically generate relevant features.

Propagation-based approaches consider propagation paths of fake news and are mostly inspired by information diffusion and cascade models. They are used to understand how information spreading patterns can help distinguish fake news from true news. These models are usually integrated with content-based features to improve prediction performance. The majority of computational models for fake news detection from these two categories are summarized in [2].

User-based approaches focus more on people's psychology. While user-specific features can be included as part of content-based models, there has also been some research exploring behavior patterns of individuals who spread fake news. Behavioral principles like naive realism and confirmation bias (at individual level) have been found to make fake news perceived as true, as stated in [3]. A phenomenon called the echo chamber effect (at group level) has also been found to reinforce people's pre-existing biases, making them averse to accepting opposing opinions [4]. The role of bots in fake news spreading has also been studied. More recently, work has been done to identify fake news spreaders [5]; it focuses on modelling linguistic features but does not integrate the underlying network structure. Not many computational models have been proposed that explore, from historical behavioral data, the psychological concepts that make people vulnerable to spreading fake news, which our proposed framework can be used to address.

Proceedings of the CIKM 2020 Workshops, October 19-20, Galway, Ireland. Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi.
email: rathx082@umn.edu (B. Rath); srivasta@umn.edu (J. Srivastava)

© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073.

A major limitation of existing models is that they rely on the presence of fake news to generate meaningful features, which makes it difficult to model fake news mitigation strategies. Our framework proposes models using two important components that do not rely on the presence of fake news: the underlying network structure and people's historical behavioral data.

The rest of the paper is organized as follows. We explain how epidemiological concepts can be mapped directly to the problem of fake news spreading and mitigation. We then explain proposed models for detecting fake news spreaders using the Community Health Assessment model and also summarize current and future research based on the ideas. Finally we give our concluding remarks.

2. Epidemiology Inspired Framework

Epidemiology is the field of medicine which deals with the incidence, distribution and control of infection among populations. In the proposed framework fake news is analogous to infection, the social network is analogous to the population, and the likelihood of people believing a news endorser in their immediate neighborhood is analogous to their vulnerability to getting infected when exposed. We consider fake news as a pathogen that intends to infect as many people as possible. An important assumption we make is that fake news of all kinds is generalized as a single infection, unlike in epidemiology where people have different levels of immunity against different kinds of infections (i.e. the framework is information agnostic). Also, we do not distinguish bots in the network population.

The likelihood of a person getting infected (i.e. believing and spreading the fake news) depends on two important factors: a) the likelihood of trusting a news endorser (a person is more likely to spread a news item without verifying its claim if it is endorsed by a neighbor they trust); and b) the density of the person's neighborhood: similar to how high population density increases the likelihood of infection spreading, a modular network structure is more prone to fake news spreading.

After the infection spreading is identified there is a need to decontaminate the population. A medicinal cure is used to treat the infected population and thus prevent further spreading of infection. In the context of fake news, refutation news can serve this purpose. Refutation news can be defined as true news that fact-checks a piece of fake news. Contents from popular fact-checking websites (https://www.snopes.com/, https://www.politifact.com/) are examples of refutation news. In epidemiology the medicine can have two purposes: as a control mechanism (i.e. medication), with the intention to cure infected people (i.e. explicitly inform the fake news spreaders about the refutation news), and as a prevention mechanism (i.e. immunization), with the intention to prevent the uninfected population from becoming infection carriers in future (i.e. prevent the unexposed population from becoming fake news spreaders). An infected person is said to have recovered if they either decide to retract the fake news they shared or decide to share the refutation news, or both. The mapping of epidemiological concepts to the context of fake news spreading is summarized in Table 1.

3. Contributions

In this section we show how the framework has been applied so far and how it is used to propose relevant models.

3.1. Community Health Assessment model

A social network characteristically exhibits community structures that are formed based on inter-node interactions. Communities tend to be modular groups where within-group members are highly connected and across-group members are loosely connected. Thus members within a community would tend to have a higher degree of trust among each other than members across different communities. If such a community is exposed to fake news propagating in its vicinity, the likelihood of all community members getting infected would be high. Thus it is important to identify vulnerable individuals that lie in the path of fake news spread to limit the overall spreading of fake news in the network. The idea is illustrated in Figure 1.

Figure 1: Motivating example. Red nodes denote fake news spreaders.

In the context of Twitter, a directed edge from one node to another represents that the first node follows the second. Thus information flows from the followed node to the follower when the follower decides to retweet information endorsed by the followed node. The goal would be to identify nodes that are likely to believe and spread the fake news. The subscript of each node denotes the community it belongs to. Motivated by the idea of ease of spreading within a community, we proposed the Community Health Assessment model. The model identifies three types of nodes with respect to a community: neighbor, boundary and core nodes, which are explained below:

1. Neighbor nodes: These nodes are directly connected to at least one node of the community. They are not a part of the community.
2. Boundary nodes: These are community nodes that are directly connected to at least one neighbor node. It is important to note that only community nodes that have an outgoing edge towards a neighbor node are boundary nodes.
3. Core nodes: These are community nodes that are only connected to members within the community.

The neighbor, boundary and core nodes for communities in Figure 1 are listed in Table 2.

3.2. Assessment, identification and prevention

To model a person's likelihood to endorse a fake news item based on their belief in the endorser, we applied the Trust in Social Media (TSM) algorithm. It assigns a pair of complementary trust scores, called Trustingness and Trustworthiness, to every node in a social network. While trustingness quantifies the propensity of a node to trust its neighbors, trustworthiness quantifies the willingness of the neighbors to trust the node. Implementation details for the algorithm can be found in [6]. Below we propose three phases for the framework and summarize models implemented so far with future directions.

1. Vulnerability assessment of population: As in epidemiology, it is important to identify individuals and groups that are vulnerable to fake news before the spreading begins. Borrowing ideas from the community health assessment model, we proposed metrics that quantify the vulnerability of nodes and communities in a network. Through experiments on real-world information spreading networks on Twitter, we showed that our proposed metrics are more effective in identifying fake news spreaders compared to true news spreaders, confirming our hypothesis that fake news relies strongly on inter-personal trust to propagate while true news does not. Details regarding the model implementation can be found in [7].

2. Identification of fake news spreaders: While determining the veracity of information has been widely researched, it is equally important to determine the authenticity of the people who are spreading information. A model is proposed for automatic identification of people spreading fake news by leveraging the concept of Believability (i.e. the extent to which the propagated information is likely to be perceived as truthful). With the retweet network edge-weighted by believability scores, network representation learning is used to generate node embeddings, which are then used to classify users as fake news spreaders or not using a recurrent neural network classifier. Based on experiments on a very large real-world rumor dataset collected from Twitter, we could effectively identify false information spreaders. Further details can be found in [8].

3. Prevention and control of infection spreading: Motivation for this problem can be explained through Figure 1. A node of community 1 that is a neighbor node for community 3 is a fake news spreader. A boundary node of community 3 is exposed and likely to start fake news spreading in community 3. To prevent such a scenario it is important to predict boundary nodes of all communities in a network that are likely to become fake news spreaders when the infection has reached neighbor nodes. Similarly, consider the scenario where a node of community 3 is a fake news spreader. Its three immediate followers, which are members of the community, are now exposed to the fake news, and the remaining community members are two steps away. Due to their close proximity they too are vulnerable to believing the spreader and causing infection to spread throughout the community. Thus it is important to identify core nodes that would likely become spreaders when the infection has reached boundary nodes. The scenarios are explained in Figure 2 by applying the community health assessment model.

Figure 2: (a) Fake news reaches neighbor nodes. (b) Fake news reaches boundary nodes. (c) Fake news reaches core nodes.

Nodes inside the dotted oval denote core nodes, nodes between the dotted and solid ovals denote boundary nodes, and nodes outside the solid oval denote neighbor nodes. Panel (a) shows the scenario where fake news has reached the two neighbor nodes (highlighted in red). Three boundary nodes (circled in red) are exposed to the fake news. In (b) two out of the three exposed boundary nodes become spreaders, which marks the beginning of fake news spreading within the community. In (c), one of the two exposed core nodes becomes a spreader.

Thus using the community health assessment model we can build models that predict both exposed (i.e. boundary) and unexposed (i.e. core) nodes that would likely become fake news spreaders after infection spreading has begun (i.e. fake news has reached neighbor nodes). Effective mitigation strategies could then be deployed against predicted spreaders.

4. Conclusion

In this position paper we proposed a novel epidemiology inspired framework and showed how the community health assessment model can be used to build models for fake news mitigation, a problem less explored compared to fake news detection. What makes it different from most existing research is that a) it proposes a more spreader-centric modelling approach instead of a content-centric approach, and b) it does not rely on features extracted from fake news, thus serving as motivation to build fake news mitigation strategies even for the scenario when fake news has not yet originated. Recent work applying a few of these ideas has shown encouraging results, thus serving as motivation to pursue the idea further. A limitation of our model is that it does not incorporate the dynamic nature of social network structure. As part of future work we would like to eliminate the presence of bots, as we are focusing on modeling psychological and sociological properties based on behavioral data.
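
As an illustration of the node categories defined in Section 3.1, the following minimal sketch splits the nodes around one community into neighbor, boundary and core sets, and then lists the boundary nodes that are exposed once some neighbor nodes have become spreaders. It is not the implementation from [7]. The function names, the (follower, followed) edge convention and the toy node labels are assumptions made for this example. Community members whose only outside links are incoming edges match neither the boundary nor the core definition as stated, so this sketch leaves them out of both sets.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Assumed edge convention: (follower, followed). Content endorsed by the
# followed node becomes visible to the follower.
Edge = Tuple[str, str]


def categorize_nodes(edges: Iterable[Edge], community: Set[str]) -> Dict[str, Set[str]]:
    """Split nodes into neighbor, boundary and core sets for one community."""
    neighbor: Set[str] = set()
    boundary: Set[str] = set()
    linked_outside: Set[str] = set()  # community members with any outside edge
    for follower, followed in edges:
        if follower in community and followed not in community:
            # A member follows an outsider: outgoing edge towards a neighbor node.
            boundary.add(follower)
            neighbor.add(followed)
            linked_outside.add(follower)
        elif followed in community and follower not in community:
            # An outsider follows a member: still directly connected.
            neighbor.add(follower)
            linked_outside.add(followed)
    # Core nodes are connected only to members of their own community.
    core = community - linked_outside
    return {"neighbor": neighbor, "boundary": boundary, "core": core}


def exposed_boundary_nodes(
    edges: Iterable[Edge], community: Set[str], spreaders: Set[str]
) -> Set[str]:
    """Boundary nodes that follow at least one spreading neighbor node."""
    return {
        follower
        for follower, followed in edges
        if follower in community
        and followed not in community
        and followed in spreaders
    }


# Toy network (hypothetical labels, not the nodes of Figure 1).
edges: List[Edge] = [
    ("b1", "n1"), ("b2", "n2"), ("b1", "b2"),
    ("c1", "b1"), ("c2", "b2"), ("n3", "b2"),
]
community = {"b1", "b2", "c1", "c2"}

groups = categorize_nodes(edges, community)
exposed = exposed_boundary_nodes(edges, community, spreaders={"n1"})
print(sorted(groups["neighbor"]), sorted(groups["boundary"]), sorted(groups["core"]))
print(sorted(exposed))
```

In this toy network only b1 follows the spreading neighbor node n1, so it is the single exposed boundary node. A model built on the framework would then estimate how likely such a node is to become a spreader, for example from trust scores, which this sketch does not attempt.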

[1] Jin, E. Dougherty, Saraf, Cao, Ramakrishnan, Epidemiological modeling of news and rumors on Twitter, in: Proceedings of the 7th Workshop on Social Network Mining and Analysis, 2013, pp. 1-9.

[2] Sharma, Qian, Jiang, Ruchansky, Zhang, Y. Liu, Combating fake news: A survey on identification and mitigation techniques, ACM Transactions on Intelligent Systems and Technology (TIST) 10 (2019) 1-42.

[3] Shu, Sliva, Wang, Tang, H. Liu, Fake news detection on social media: A data mining perspective, ACM SIGKDD Explorations Newsletter 19 (2017) 22-36.

[4] Del Vicario, Vivaldo, Bessi, Zollo, Scala, G. Caldarelli, W. Quattrociocchi, Echo chambers: Emotional contagion and group polarization on Facebook, Scientific Reports 6 (2016) 37825.

[5] Bevendorff, Ghanem, Giachanou, Kestemont, E. Manjavacas, Potthast, Rangel, Rosso, Specht, Stamatatos, et al., Shared tasks on authorship analysis at PAN 2020, in: European Conference on Information Retrieval, Springer, 2020, pp. 508-516.

[6] Roy, Sarkar, Srivastava, Huh, Trustingness & trustworthiness: A pair of complementary trust measures in a social network, in: 2016 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2016, pp. 549-554.

[7] Rath, Gao, Srivastava, Evaluating vulnerability to fake news in social networks: A community health assessment model, in: 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2019, pp. 432-435.

[8] Rath, Gao, J. Ma, J. Srivastava, Utilizing computational trust to identify rumor spreaders on Twitter, Social Network Analysis and Mining 8 (2018) 64.