=Paper=
{{Paper
|id=Vol-2699/paper36
|storemode=property
|title=Epidemiology Inspired Framework for Fake News Mitigation in Social Networks
|pdfUrl=https://ceur-ws.org/Vol-2699/paper36.pdf
|volume=Vol-2699
|authors=Bhavtosh Rath,Jaideep Srivastava
|dblpUrl=https://dblp.org/rec/conf/cikm/RathS20
}}
==Epidemiology Inspired Framework for Fake News Mitigation in Social Networks==
Epidemiology Inspired Framework for Fake News Mitigation in Social Networks Bhavtosh Rath, Jaideep Srivastava University of Minnesota, USA Abstract Research in fake news detection and prevention has gained a lot of attention over the past decade, with most models using features generated from content and propagation paths. Complementary to these approaches, in this position paper we outline a framework inspired from the domain of epidemiology that proposes to identify people who are likely to become fake news spreaders. The proposed framework can serve as motivation to build fake news mitigation models, even for the scenario when fake news has not yet originated. Some models based on the framework have been successfully evaluated on real world Twitter datasets and can provide motivation for new research directions. Keywords Fake news spreaders, Social networks, Epidemiology. 1. Introduction In content-based approach the problem is formulated as identifying whether content of a spreading informa- The wide adoption of social media platforms like Face- tion is fake or not. Most proposed models rely on using book, Twitter and WhatsApp has resulted in the cre- linguistic or visual based features. While earlier work ation of behavioral big data, thus motivating researchers relied mostly on hand engineering relevant features, to propose various computational models for combat- more recently deep learning based models have gained ing fake news. So far the focus of most research has popularity as they can automatically generate relevant been on determining veracity of the information using features. Propagation based approaches consider prop- features extracted manually or automatically through agation paths of fake news and are mostly inspired techniques such as deep learning. We propose a novel from information diffusion and cascade models. They fake news prevention and control framework that in- are used to understand how information spreading pat- corporates people’s behavioral data along with their terns can help distinguish fake news from true news. network structure. Like in epidemiology, models pro- These models are usually integrated with content-based posed within the framework cover the entire life cycle features to improve prediction performance. Major- of spreading: i.e. before the fake news originates, af- ity of computational models for fake news detection ter the fake news starts spreading and containment of from these two categories are summarized in [2]. User- its further spreading. The framework is not to be con- based approaches focus more on peoples’ psychology. fused with popular information diffusion based mod- While user-specific features can be included as part els [1] because they a) usually categorize certain nodes of content-based models, there has also been some re- and cannot be generalized to all nodes, b) consider only search exploring behavior patterns of individuals who the propagation paths but not the underlying graph spread fake news. Behavioral principles like naive re- structure and c) can be generalized to information dif- alism and confirmation bias (at individual level) have fusion and need not be particular to fake news spread- been found to make fake news perceived as true, as ing. stated in [3]. A phenomenon called echo chamber ef- Related Work: Literature of research in fake news fect (at group level) has also been found to reinforce detection and prevention strategies is vast, and can be people’s pre-existing biases, making them averse to ac- divided broadly into three categories: Content-based, cepting opposing opinions [4]. The role of bots in fake Propagation-based and User-based. news spreading has also been studied. More recently work has been done to identify fake news spreaders [5] Proceedings of the CIKM 2020 Workshops. October 19-20, Galway, which focus on modelling linguistic features but they Ireland. Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi. do not integrate underlying network structure. Not email: rathx082@umn.edu (B. Rath); srivasta@umn.edu (J. many computational models have been proposed explor- Srivastava) ing psychological concepts from historical behavioral data orcid: that make people vulnerable to spreading fake news, which © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop CEUR Workshop Proceedings (CEUR-WS.org) Proceedings http://ceur-ws.org ISSN 1613-0073 our proposed framework can be used to address. Table 1 Mapping Epidemiological concept to fake news spreading. Epidemiology context Fake news spreading context Infection Infection Fake news Population People and communities Nodes and modular sub-graphs Vulnerable Likely to become infection carriers Likely to become fake news spreaders Exposed Neighbors are infected Neighbor nodes are fake news spreaders Spreaders Infected people Fake news spreaders Prevention Medication Refutation news Control Immunization Refutation news Recovered Infection cured Retract fake news and/or spread refutation news A major limitation with existing models is that they news without verifying its claim if it is endorsed by a rely on the presence of fake news to generate mean- neighbor they trust); and b) the density of its neighbor- ingful features, thus making it difficult to model fake hood, similar to how high population density increases news mitigation strategies. Our framework proposes the likelihood of infection spreading, a modular net- models using two important components that do not work structure is more prone to fake news spreading. rely on the presence of fake news: underlying network After the infection spreading is identified there is a structure and people’s historical behavioral data. need to de-contaminate the population. A medicinal The rest of the paper is divided as follows: We ex- cure is used to treat the infected population and thus plain how epidemiological concepts can be mapped di- prevent further spreading of infection. In the context rectly to the problem of fake news spreading and mit- of fake news, a refutation news can serve this pur- igation. We then explain proposed models for detect- pose. Refutation news can be defined as true news ing fake news spreader using the Community Health that fact-checks a fake news. Contents from popu- Assessment model and also summarize current and fu- lar fact-checking websites1 are examples of refutation ture research based on the ideas. Finally we give our news. In epidemiology the medicine can have two pur- concluding remarks. poses: As control mechanism (i.e. medication), with the intention to cure infected people (i.e. explicitly inform the fake news spreaders about the refutation 2. Epidemiology Inspired news) and as prevention mechanism (i.e. immuniza- Framework tion), with the intention to prevent uninfected popu- lation from becoming infection carriers in future (i.e. Epidemiology is the field of medicine which deals with prevent unexposed population from becoming fake news the incidence, distribution and control of infection amongspreaders). An infected person is said to have recov- populations. In the proposed framework fake news ered if he either decides to retract from sharing the is analogous to infection, social network is analogous fake news or decides to share the refutation news, or to population and the likelihood of people believing a both. Mapping of epidemiological concepts to the con- news endorser in the immediate neighborhood is anal- text of fake news spreading is summarized in Table 1. ogous to their vulnerability to getting infected when exposed. We consider fake news as a pathogen that 3. Contributions intends to infect as many people as possible. An im- portant assumption we make is that fake news of all In this section we show how the framework has been kinds is generalized as a single infection, unlike in epi- applied so far and how it is used to propose relevant demiology where people have different levels of im- models. munity against different kinds of infections (i.e. the 3.1. Community Health Assessment framework is information agnostic). Also we do not distinguish bots in the network population. model The likelihood of a person getting infected (i.e. be- A social network has the characteristic property to ex- lieving and spreading the fake news) is dependent on hibit community structures that are formed based on two important factors: a) the likelihood of trusting a news endorser (a person is more likely to spread a 1 https://www.snopes.com/, https://www.politifact.com/ Table 2 Neighbor, boundary and core nodes for communities in Fig- ure 1. 𝑐𝑜𝑚 𝑐𝑜𝑚 𝑐𝑜𝑚 𝑐𝑜𝑚 1 𝐷2 𝐶1 𝐴1 ,𝐵1 ,𝐸1 ,𝐷1 ,𝐹1 ,𝐺1 2 𝐴6 ,𝐸6 𝐶2 ,𝐷2 𝐴2 ,𝐵2 ,𝐸2 ,𝐹2 3 𝐷1 ,𝐷5 ,𝐸6 𝐴3 ,𝐶3 𝐵3 ,𝐷3 ,𝐸3 ,𝐹3 4 𝐷3 𝐶4 𝐴4 ,𝐵4 ,𝐷4 ,𝐸4 ,𝐹4 5 𝐷4 ,𝐷8 ,𝐸8 𝐷5 ,𝐴5 ,𝐶5 𝐸5 ,𝐵5 6 𝐴5 𝐷6 𝐴6 ,𝐵6 ,𝐶6 ,𝐸6 7 𝐵6 𝐴7 𝐵7 ,𝐶7 ,𝐷7 ,𝐸7 ,𝐹7 , 𝐺7 8 𝐹7 𝐴8 𝐵8 ,𝐶8 ,𝐷8 ,𝐸8 ,𝐹8 inter-node interactions. Communities tend to be mod- ular groups where within-group members are highly connected, and across-group members are loosely con- nected. Thus members within a community would tend Figure 1: Motivating example. Red nodes denote fake news to have a higher degree of trust among each other than spreaders. between members across different communities. If such communities are exposed to fake news propagating in its vicinity, the likelihood of all community members 3.2. Assessment, identification and getting infected would be high. Thus it is important to prevention identify vulnerable individuals that lie in the path of fake news spread to limit the overall spreading of fake To model a person’s likelihood to endorse a fake news news in the network. The idea is illustrated in Figure 1. based on their belief in the endorser, we applied the In the context of Twitter, directed edge 𝐵1 → 𝐴1 rep- Trust in Social Media (TSM) algorithm. It assigns a resents 𝐵1 follows 𝐴1 . Thus information flows from 𝐴1 pair of complementary trust scores, called Trustingness to 𝐵1 when 𝐵1 decided to retweet an information en- and Trustworthiness to every node in a social network. dorsed by 𝐴1 . The goal would be to identify nodes that While trustingness quantifies the propensity of a node are likely to believe and spread the fake news. Sub- to trust its neighbors, trustworthiness quantifies the script of the nodes denote the community they belongs willingness of the neighbors to trust the node. Imple- to. Motivated by the idea of ease of spreading within a mentation details for the algorithm can be found in [6]. community we proposed the Community Health As- Below we propose three phases for the framework and sessment model. The model identifies three types of summarize models implemented so far with future di- nodes with respect to a community: neighbor, bound- rections. ary and core nodes, which are explained below: 1. Vulnerability assessment of population: In epi- 1. Neighbor nodes: These nodes are directly connected demiology, it is important to identify individuals and to at least one node of the community. The set of groups that are vulnerable to fake news before the spread- neighbor nodes is denoted by 𝑐𝑜𝑚 . They are not a ing begins. Borrowing ideas from the community health part of the community. assessment model, we proposed metrics that quantify 2. Boundary nodes: These are community nodes that the vulnerability of nodes and communities in a net- are directly connected to at least one neighbor node. work. Through experiments on real world information The set of boundary nodes is denoted by 𝑐𝑜𝑚 . It is im- spreading networks on Twitter, we showed that our portant to note that only community nodes that have proposed metrics are more effective in identifying fake an outgoing edge towards a neighbor nodes are in 𝑐𝑜𝑚 . news spreaders compared to true news spreaders, con- 3. Core nodes: These are community nodes that are firming our hypothesis that fake news relies strongly only connected to members within the community. The on inter-personal trust to propagate while true news set of core nodes is denoted by 𝑐𝑜𝑚 . does not. Details regarding the model implementation The neighbor, boundary and core nodes for commu- can be found in [7]. nities in Figure 1 are listed in Table 2. 2. Identification of fake news spreaders: While de- termining the veracity of information has been widely (a) Fake news reaches 𝑐𝑜𝑚 (b) Fake news reaches 𝑐𝑜𝑚 (c) Fake news reaches 𝑐𝑜𝑚 Figure 2: Community health assessment model perspective for fake news prevention and control. researched, it is equally important to determine the shows the scenario where fake news has reached the authenticity of the people who are spreading informa- two neighbor nodes (highlighted in red). Three bound- tion. A model for automatic identification of people ary nodes (circled in red) are exposed to the fake news. spreading fake news by leveraging the concept of Be- In (b) two out of three exposed boundary nodes be- lievability (i.e. the extent to which the propagated in- come spreaders, and marks the beginning of fake news formation is likely to be perceived as truthful) is pro- spreading within the community. And in (c), one of the posed. With the retweet network edge-weighted by two exposed core nodes become spreader. believability scores, network representation learning Thus using community health assessment model we is used to generate node embeddings, which is lever- can build models that predict both exposed (i.e. bound- aged to classify users as fake news spreaders or not ary nodes) and unexposed (i.e. core nodes) nodes that using a recurrent neural network classifier. Based on would likely become fake news spreaders after infec- experiments on a very large real-world rumor dataset tion spreading has begun (i.e. fake news has reached collected from Twitter, we could effectively identify neighbor nodes). Effective mitigation strategies could false information spreaders. Further details can be found then be deployed against predicted spreaders. in [8]. 3. Prevention and control of infection spreading: Motivation for this problem can be explained through 4. Conclusion Figure 1. 𝐷1 , a neighbor node for community 3 is a In this position paper we proposed a novel epidemi- fake news spreader. Node 𝐴3 , a boundary node is ex- ology inspired framework and showed how the com- posed and likely to start fake news spreading in com- munity health assessment model can be used to build munity 3. To prevent such a scenario it is important models for fake news mitigation, a problem less ex- to predict boundary nodes of all communities in a net- plored compared to fake news detection. What makes work that are likely to become fake news spreaders it different from most existing research is that a) it pro- when the infection has reached neighbor nodes. Sim- poses a more spreader-centric modelling approach in- ilarly, consider the scenario where 𝐴3 is a fake news stead of content-centric approach, and b) it does not spreader. Members of the community 𝐵3 , 𝐷3 and 𝐸3 rely on features extracted from fake news thus serving which are immediate followers of 𝐴3 are now exposed as motivation to build fake news mitigation strategies, to the fake news, and the remaining community mem- even for the scenario when fake news has not yet orig- bers are two steps away. Due to their close proximity inated. Recent work that apply few of the ideas have they too are vulnerable to believing 𝐴3 and causing shown encouraging results, thus serving as motivation infection to spread throughout the community. Thus to pursue the idea further. A limitation of our model it is important to identify core nodes that would be- is that it does not not incorporate the dynamic nature come likely spreaders when the infection has reached of social network structure. As part of future work we boundary nodes. The scenarios are explained in Fig- would like to incorporate eliminating the presence of ure 2 applying the community health assessment model. bots as we are focusing on modeling psychological and Nodes inside the dotted oval denote core nodes, be- sociological properties based on behavioral data. tween dotted and solid oval denote boundary nodes and outside the solid oval denote neighbor nodes. (a) References [1] F. Jin, E. Dougherty, P. Saraf, Y. Cao, N. Ramakr- ishnan, Epidemiological modeling of news and ru- mors on twitter, in: Proceedings of the 7th work- shop on social network mining and analysis, 2013, pp. 1–9. [2] K. Sharma, F. Qian, H. Jiang, N. Ruchansky, M. Zhang, Y. Liu, Combating fake news: A survey on identification and mitigation techniques, ACM Transactions on Intelligent Systems and Technol- ogy (TIST) 10 (2019) 1–42. [3] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake news detection on social media: A data mining perspective, ACM SIGKDD explorations newslet- ter 19 (2017) 22–36. [4] M. Del Vicario, G. Vivaldo, A. Bessi, F. Zollo, A. Scala, G. Caldarelli, W. Quattrociocchi, Echo chambers: Emotional contagion and group polar- ization on facebook, Scientific reports 6 (2016) 37825. [5] J. Bevendorff, B. Ghanem, A. Giachanou, M. Keste- mont, E. Manjavacas, M. Potthast, F. Rangel, P. Rosso, G. Specht, E. Stamatatos, et al., Shared tasks on authorship analysis at pan 2020, in: European Conference on Information Retrieval, Springer, 2020, pp. 508–516. [6] A. Roy, C. Sarkar, J. Srivastava, J. Huh, Trust- ingness & trustworthiness: A pair of comple- mentary trust measures in a social network, in: 2016 IEEE/ACM International Conference on Ad- vances in Social Networks Analysis and Mining (ASONAM), IEEE, 2016, pp. 549–554. [7] B. Rath, W. Gao, J. Srivastava, Evaluating vulnera- bility to fake news in social networks: A commu- nity health assessment model, in: 2019 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM), IEEE, 2019, pp. 432–435. [8] B. Rath, W. Gao, J. Ma, J. Srivastava, Utilizing computational trust to identify rumor spreaders on twitter, Social Network Analysis and Mining 8 (2018) 64.