=Paper=
{{Paper
|id=Vol-2699/paper36
|storemode=property
|title=Epidemiology Inspired Framework for Fake News Mitigation in Social Networks
|pdfUrl=https://ceur-ws.org/Vol-2699/paper36.pdf
|volume=Vol-2699
|authors=Bhavtosh Rath,Jaideep Srivastava
|dblpUrl=https://dblp.org/rec/conf/cikm/RathS20
}}
==Epidemiology Inspired Framework for Fake News Mitigation in Social Networks==
<pdf width="1500px">https://ceur-ws.org/Vol-2699/paper36.pdf</pdf>
<pre>
Epidemiology Inspired Framework for Fake News
Mitigation in Social Networks
Bhavtosh Rath, Jaideep Srivastava
University of Minnesota, USA


                                    Abstract
                                    Research in fake news detection and prevention has gained a lot of attention over the past decade, with most models using
                                    features generated from content and propagation paths. Complementary to these approaches, in this position paper we
                                    outline a framework inspired from the domain of epidemiology that proposes to identify people who are likely to become
                                    fake news spreaders. The proposed framework can serve as motivation to build fake news mitigation models, even for the
                                    scenario when fake news has not yet originated. Some models based on the framework have been successfully evaluated on
                                    real world Twitter datasets and can provide motivation for new research directions.

                                    Keywords
                                    Fake news spreaders, Social networks, Epidemiology.


1. Introduction                                                                        In content-based approach the problem is formulated
                                                                                       as identifying whether content of a spreading informa-
The wide adoption of social media platforms like Face- tion is fake or not. Most proposed models rely on using
book, Twitter and WhatsApp has resulted in the cre- linguistic or visual based features. While earlier work
ation of behavioral big data, thus motivating researchers relied mostly on hand engineering relevant features,
to propose various computational models for combat- more recently deep learning based models have gained
ing fake news. So far the focus of most research has popularity as they can automatically generate relevant
been on determining veracity of the information using features. Propagation based approaches consider prop-
features extracted manually or automatically through agation paths of fake news and are mostly inspired
techniques such as deep learning. We propose a novel from information diffusion and cascade models. They
fake news prevention and control framework that in- are used to understand how information spreading pat-
corporates people’s behavioral data along with their terns can help distinguish fake news from true news.
network structure. Like in epidemiology, models pro- These models are usually integrated with content-based
posed within the framework cover the entire life cycle features to improve prediction performance. Major-
of spreading: i.e. before the fake news originates, af- ity of computational models for fake news detection
ter the fake news starts spreading and containment of from these two categories are summarized in [2]. User-
its further spreading. The framework is not to be con- based approaches focus more on peoples’ psychology.
fused with popular information diffusion based mod- While user-specific features can be included as part
els [1] because they a) usually categorize certain nodes of content-based models, there has also been some re-
and cannot be generalized to all nodes, b) consider only search exploring behavior patterns of individuals who
the propagation paths but not the underlying graph spread fake news. Behavioral principles like naive re-
structure and c) can be generalized to information dif- alism and confirmation bias (at individual level) have
fusion and need not be particular to fake news spread- been found to make fake news perceived as true, as
ing.                                                                                   stated in [3]. A phenomenon called echo chamber ef-
Related Work: Literature of research in fake news fect (at group level) has also been found to reinforce
detection and prevention strategies is vast, and can be people’s pre-existing biases, making them averse to ac-
divided broadly into three categories: Content-based, cepting opposing opinions [4]. The role of bots in fake
Propagation-based and User-based.                                                      news spreading has also been studied. More recently
                                                                                       work has been done to identify fake news spreaders [5]
Proceedings of the CIKM 2020 Workshops. October 19-20, Galway,                         which focus on modelling linguistic features but they
Ireland.
Editors of the Proceedings: Stefan Conrad, Ilaria Tiddi.                               do not integrate underlying network structure. Not
email: rathx082@umn.edu (B. Rath); srivasta@umn.edu (J.                                many computational models have been proposed explor-
Srivastava)                                                                            ing psychological concepts from historical behavioral data
orcid:                                                                                 that make people vulnerable to spreading fake news, which
          © 2020 Copyright for this paper by its authors. Use permitted under Creative
          Commons License Attribution 4.0 International (CC BY 4.0).
 CEUR
 Workshop
          CEUR Workshop Proceedings (CEUR-WS.org)
 Proceedings
               http://ceur-ws.org
               ISSN 1613-0073
                                                                                       our proposed framework can be used to address.
Table 1
Mapping Epidemiological concept to fake news spreading.
                        Epidemiology context                         Fake news spreading context
  Infection                   Infection                                       Fake news
 Population           People and communities                        Nodes and modular sub-graphs
 Vulnerable      Likely to become infection carriers            Likely to become fake news spreaders
  Exposed              Neighbors are infected                  Neighbor nodes are fake news spreaders
 Spreaders                 Infected people                               Fake news spreaders
 Prevention                  Medication                                    Refutation news
   Control                  Immunization                                   Refutation news
 Recovered                 Infection cured                 Retract fake news and/or spread refutation news


   A major limitation with existing models is that they    news without verifying its claim if it is endorsed by a
rely on the presence of fake news to generate mean-        neighbor they trust); and b) the density of its neighbor-
ingful features, thus making it difficult to model fake    hood, similar to how high population density increases
news mitigation strategies. Our framework proposes         the likelihood of infection spreading, a modular net-
models using two important components that do not          work structure is more prone to fake news spreading.
rely on the presence of fake news: underlying network      After the infection spreading is identified there is a
structure and people’s historical behavioral data.         need to de-contaminate the population. A medicinal
   The rest of the paper is divided as follows: We ex-     cure is used to treat the infected population and thus
plain how epidemiological concepts can be mapped di-       prevent further spreading of infection. In the context
rectly to the problem of fake news spreading and mit-      of fake news, a refutation news can serve this pur-
igation. We then explain proposed models for detect-       pose. Refutation news can be defined as true news
ing fake news spreader using the Community Health          that fact-checks a fake news. Contents from popu-
Assessment model and also summarize current and fu-        lar fact-checking websites1 are examples of refutation
ture research based on the ideas. Finally we give our      news. In epidemiology the medicine can have two pur-
concluding remarks.                                        poses: As control mechanism (i.e. medication), with
                                                           the intention to cure infected people (i.e. explicitly
                                                           inform the fake news spreaders about the refutation
2. Epidemiology Inspired                                   news) and as prevention mechanism (i.e. immuniza-
     Framework                                             tion), with the intention to prevent uninfected popu-
                                                           lation from becoming infection carriers in future (i.e.
Epidemiology is the field of medicine which deals with prevent unexposed population from becoming fake news
the incidence, distribution and control of infection amongspreaders). An infected person is said to have recov-
populations. In the proposed framework fake news ered if he either decides to retract from sharing the
is analogous to infection, social network is analogous fake news or decides to share the refutation news, or
to population and the likelihood of people believing a both. Mapping of epidemiological concepts to the con-
news endorser in the immediate neighborhood is anal- text of fake news spreading is summarized in Table 1.
ogous to their vulnerability to getting infected when
exposed. We consider fake news as a pathogen that 3. Contributions
intends to infect as many people as possible. An im-
portant assumption we make is that fake news of all In this section we show how the framework has been
kinds is generalized as a single infection, unlike in epi- applied so far and how it is used to propose relevant
demiology where people have different levels of im- models.
munity against different kinds of infections (i.e. the 3.1. Community Health Assessment
framework is information agnostic). Also we do not
distinguish bots in the network population.
                                                                   model
   The likelihood of a person getting infected (i.e. be- A social network has the characteristic property to ex-
lieving and spreading the fake news) is dependent on hibit community structures that are formed based on
two important factors: a) the likelihood of trusting
a news endorser (a person is more likely to spread a           1 https://www.snopes.com/, https://www.politifact.com/
Table 2
Neighbor, boundary and core nodes for communities in Fig-
ure 1.
 𝑐𝑜𝑚       𝑐𝑜𝑚         𝑐𝑜𝑚                𝑐𝑜𝑚
  1           𝐷2           𝐶1       𝐴1 ,𝐵1 ,𝐸1 ,𝐷1 ,𝐹1 ,𝐺1
  2        𝐴6 ,𝐸6       𝐶2 ,𝐷2          𝐴2 ,𝐵2 ,𝐸2 ,𝐹2
  3       𝐷1 ,𝐷5 ,𝐸6    𝐴3 ,𝐶3          𝐵3 ,𝐷3 ,𝐸3 ,𝐹3
  4           𝐷3           𝐶4         𝐴4 ,𝐵4 ,𝐷4 ,𝐸4 ,𝐹4
  5       𝐷4 ,𝐷8 ,𝐸8   𝐷5 ,𝐴5 ,𝐶5           𝐸5 ,𝐵5
  6           𝐴5           𝐷6           𝐴6 ,𝐵6 ,𝐶6 ,𝐸6
  7           𝐵6           𝐴7       𝐵7 ,𝐶7 ,𝐷7 ,𝐸7 ,𝐹7 , 𝐺7
  8           𝐹7           𝐴8         𝐵8 ,𝐶8 ,𝐷8 ,𝐸8 ,𝐹8


inter-node interactions. Communities tend to be mod-
ular groups where within-group members are highly
connected, and across-group members are loosely con-
nected. Thus members within a community would tend Figure 1: Motivating example. Red nodes denote fake news
to have a higher degree of trust among each other than spreaders.
between members across different communities. If such
communities are exposed to fake news propagating in
its vicinity, the likelihood of all community members 3.2. Assessment, identification and
getting infected would be high. Thus it is important to
                                                                prevention
identify vulnerable individuals that lie in the path of
fake news spread to limit the overall spreading of fake To model a person’s likelihood to endorse a fake news
news in the network. The idea is illustrated in Figure 1. based on their belief in the endorser, we applied the
In the context of Twitter, directed edge 𝐵1 → 𝐴1 rep- Trust in Social Media (TSM) algorithm. It assigns a
resents 𝐵1 follows 𝐴1 . Thus information flows from 𝐴1 pair of complementary trust scores, called Trustingness
to 𝐵1 when 𝐵1 decided to retweet an information en- and Trustworthiness to every node in a social network.
dorsed by 𝐴1 . The goal would be to identify nodes that While trustingness quantifies the propensity of a node
are likely to believe and spread the fake news. Sub- to trust its neighbors, trustworthiness quantifies the
script of the nodes denote the community they belongs willingness of the neighbors to trust the node. Imple-
to. Motivated by the idea of ease of spreading within a mentation details for the algorithm can be found in [6].
community we proposed the Community Health As- Below we propose three phases for the framework and
sessment model. The model identifies three types of summarize models implemented so far with future di-
nodes with respect to a community: neighbor, bound- rections.
ary and core nodes, which are explained below:            1. Vulnerability assessment of population: In epi-
1. Neighbor nodes: These nodes are directly connected demiology, it is important to identify individuals and
to at least one node of the community. The set of groups that are vulnerable to fake news before the spread-
neighbor nodes is denoted by 𝑐𝑜𝑚 . They are not a ing begins. Borrowing ideas from the community health
part of the community.                                    assessment model, we proposed metrics that quantify
2. Boundary nodes: These are community nodes that the vulnerability of nodes and communities in a net-
are directly connected to at least one neighbor node. work. Through experiments on real world information
The set of boundary nodes is denoted by 𝑐𝑜𝑚 . It is im- spreading networks on Twitter, we showed that our
portant to note that only community nodes that have proposed metrics are more effective in identifying fake
an outgoing edge towards a neighbor nodes are in 𝑐𝑜𝑚 . news spreaders compared to true news spreaders, con-
3. Core nodes: These are community nodes that are firming our hypothesis that fake news relies strongly
only connected to members within the community. The on inter-personal trust to propagate while true news
set of core nodes is denoted by 𝑐𝑜𝑚 .                    does not. Details regarding the model implementation
   The neighbor, boundary and core nodes for commu- can be found in [7].
nities in Figure 1 are listed in Table 2.                 2. Identification of fake news spreaders: While de-
                                                          termining the veracity of information has been widely
       (a) Fake news reaches 𝑐𝑜𝑚             (b) Fake news reaches 𝑐𝑜𝑚             (c) Fake news reaches 𝑐𝑜𝑚

Figure 2: Community health assessment model perspective for fake news prevention and control.


researched, it is equally important to determine the shows the scenario where fake news has reached the
authenticity of the people who are spreading informa- two neighbor nodes (highlighted in red). Three bound-
tion. A model for automatic identification of people ary nodes (circled in red) are exposed to the fake news.
spreading fake news by leveraging the concept of Be- In (b) two out of three exposed boundary nodes be-
lievability (i.e. the extent to which the propagated in- come spreaders, and marks the beginning of fake news
formation is likely to be perceived as truthful) is pro- spreading within the community. And in (c), one of the
posed. With the retweet network edge-weighted by two exposed core nodes become spreader.
believability scores, network representation learning        Thus using community health assessment model we
is used to generate node embeddings, which is lever- can build models that predict both exposed (i.e. bound-
aged to classify users as fake news spreaders or not ary nodes) and unexposed (i.e. core nodes) nodes that
using a recurrent neural network classifier. Based on would likely become fake news spreaders after infec-
experiments on a very large real-world rumor dataset tion spreading has begun (i.e. fake news has reached
collected from Twitter, we could effectively identify neighbor nodes). Effective mitigation strategies could
false information spreaders. Further details can be found then be deployed against predicted spreaders.
in [8].
3. Prevention and control of infection spreading:
Motivation for this problem can be explained through 4. Conclusion
Figure 1. 𝐷1 , a neighbor node for community 3 is a
                                                          In this position paper we proposed a novel epidemi-
fake news spreader. Node 𝐴3 , a boundary node is ex-
                                                          ology inspired framework and showed how the com-
posed and likely to start fake news spreading in com-
                                                          munity health assessment model can be used to build
munity 3. To prevent such a scenario it is important
                                                          models for fake news mitigation, a problem less ex-
to predict boundary nodes of all communities in a net-
                                                          plored compared to fake news detection. What makes
work that are likely to become fake news spreaders
                                                          it different from most existing research is that a) it pro-
when the infection has reached neighbor nodes. Sim-
                                                          poses a more spreader-centric modelling approach in-
ilarly, consider the scenario where 𝐴3 is a fake news
                                                          stead of content-centric approach, and b) it does not
spreader. Members of the community 𝐵3 , 𝐷3 and 𝐸3
                                                          rely on features extracted from fake news thus serving
which are immediate followers of 𝐴3 are now exposed
                                                          as motivation to build fake news mitigation strategies,
to the fake news, and the remaining community mem-
                                                          even for the scenario when fake news has not yet orig-
bers are two steps away. Due to their close proximity
                                                          inated. Recent work that apply few of the ideas have
they too are vulnerable to believing 𝐴3 and causing
                                                          shown encouraging results, thus serving as motivation
infection to spread throughout the community. Thus
                                                          to pursue the idea further. A limitation of our model
it is important to identify core nodes that would be-
                                                          is that it does not not incorporate the dynamic nature
come likely spreaders when the infection has reached
                                                          of social network structure. As part of future work we
boundary nodes. The scenarios are explained in Fig-
                                                          would like to incorporate eliminating the presence of
ure 2 applying the community health assessment model.
                                                          bots as we are focusing on modeling psychological and
Nodes inside the dotted oval denote core nodes, be-
                                                          sociological properties based on behavioral data.
tween dotted and solid oval denote boundary nodes
and outside the solid oval denote neighbor nodes. (a)
References
[1] F. Jin, E. Dougherty, P. Saraf, Y. Cao, N. Ramakr-
    ishnan, Epidemiological modeling of news and ru-
    mors on twitter, in: Proceedings of the 7th work-
    shop on social network mining and analysis, 2013,
    pp. 1–9.
[2] K. Sharma, F. Qian, H. Jiang, N. Ruchansky,
    M. Zhang, Y. Liu, Combating fake news: A survey
    on identification and mitigation techniques, ACM
    Transactions on Intelligent Systems and Technol-
    ogy (TIST) 10 (2019) 1–42.
[3] K. Shu, A. Sliva, S. Wang, J. Tang, H. Liu, Fake
    news detection on social media: A data mining
    perspective, ACM SIGKDD explorations newslet-
    ter 19 (2017) 22–36.
[4] M. Del Vicario, G. Vivaldo, A. Bessi, F. Zollo,
    A. Scala, G. Caldarelli, W. Quattrociocchi, Echo
    chambers: Emotional contagion and group polar-
    ization on facebook, Scientific reports 6 (2016)
    37825.
[5] J. Bevendorff, B. Ghanem, A. Giachanou, M. Keste-
    mont, E. Manjavacas, M. Potthast, F. Rangel,
    P. Rosso, G. Specht, E. Stamatatos, et al., Shared
    tasks on authorship analysis at pan 2020, in:
    European Conference on Information Retrieval,
    Springer, 2020, pp. 508–516.
[6] A. Roy, C. Sarkar, J. Srivastava, J. Huh, Trust-
    ingness & trustworthiness: A pair of comple-
    mentary trust measures in a social network, in:
    2016 IEEE/ACM International Conference on Ad-
    vances in Social Networks Analysis and Mining
    (ASONAM), IEEE, 2016, pp. 549–554.
[7] B. Rath, W. Gao, J. Srivastava, Evaluating vulnera-
    bility to fake news in social networks: A commu-
    nity health assessment model, in: 2019 IEEE/ACM
    International Conference on Advances in Social
    Networks Analysis and Mining (ASONAM), IEEE,
    2019, pp. 432–435.
[8] B. Rath, W. Gao, J. Ma, J. Srivastava, Utilizing
    computational trust to identify rumor spreaders
    on twitter, Social Network Analysis and Mining
    8 (2018) 64.

</pre>