-

A Framework for Complex In uence Propagation based on the F 2DLT Class of Di usion Models

Antonio Calio

Andrea Tagarelli

tagarellig@dimes.unical.it 0 0 Dept. Computer Engineering , Modeling, Electronics , and System Engineering, University of Calabria , Rende (CS) , Italy

What are the key-features that enable an information diffusion model to explain the inherent complex dynamics of real-world propagation phenomena? To answer the above question, we discuss a novel class of stochastic Linear Threshold (LT) di usion models, which are designed to capture the following aspects in in uence propagation scenarios: trust/distrust in the user relationships, changes in adopting one or alternative information items, hesitation towards adopting an information item over time, latency in the propagation, time horizon for the unfolding of the di usion process, and multiple cascades of information that might occur competitively. Around all such aspects, our de ned Friend-Foe Dynamic LT (F 2DLT ) class comprises a non-competitive model as well as two competitive models, which are able to represent semi-progressivity and non-progressivity, respectively, in the propagation process. The above key-constituents embedded in our models make them unique in the literature of di usion models, including epidemic models. To validate our models through real-world networks, we also discuss di erent strategies for the selection of the initial in uencers to mimic non-competitive and competitive di usion scenarios, inspired by the widely-known problem of limitation of misinformation spread. Finally, we describe a web-based simulation environment for testing the proposed di usion models.

Since the early applications in viral marketing, the development of information di usion models and related optimization methods has provided e ective support to address a variety of in uence propagation problems. Nowadays, due to the shrinking boundary between real and online social life [ 2 ] and the presence of multiple, competitive spreaders over the Web, which could also act towards Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). This volume is published and copyrighted by its editors. SEBD 2020, June 21-24, 2020, Villasimius, Italy. misinformation, deciding whether a source of information is reliable or not has become a critical task. The di culty in assessing the reliability and trustworthiness of the source generating or sharing a piece of information increases the likelihood of people to be deceived by a spreading information. In addition to this, the tendency to access information from like-minded sources [ 5 ] leads users to be trapped in information bubbles, eventually causing forms of network polarization [ 4 ]. Polarization can be contrasted via actions devoted to fact-checking or misinformation debunking, but time plays a crucial role in this game, as cognitive phenomena of con rmation bias may easily arise.

Understanding the trustworthiness of the source of an information item is often challenging, as it requires tracking the information item of its originator, which could be unfeasible in many cases. Therefore, it beomes essential to capture the e ect of trust/distrust relationships on both the user behavior and propagation dynamics. One big question hence arises:

What are the key-features that make a di usion model able to explain the inherent dynamic, and often competitive, nature of real-world propagation phenomena?

Contributions. To answer the above question, we discuss a class of stochastic di usion models, named Friend-Foe Dynamic Linear Threshold Models (for short, F 2DLT ), which has been originally proposed in [ 3 ]. Our models are inspired by the classic Linear Threshold (LT) model, whereby an individual can decide to take an action as a result of the exposure to multiple sources of in uence. Major key-features of our models is that the information di usion graph is de ned on top of a trust network, where trust is encoded into the in uence probabilities, the response of a user to the in uencing attempts is described by a time-varying activation threshold function, and also a quiescence function is introduced to model the latency or delay in the propagation.

Our F 2DLT models are designed to deal with non-competitive and competitive propagation scenarios. For competitive campaign scenarios, one model is semi-progressive, which assumes that a user, once activated, is only allowed to switch to a di erent campaign, and another model is non-progressive, i.e., it requires a user to have always the support of her/his in-neighbors to keep the activation state with a certain campaign.

To evaluate our models, we also devised four seed selection strategies, which mimic di erent, realistic scenarios of in uence propagation. Experimental evaluation conducted on real-world networks, also including comparison with stochastic epidemic models, has provided interesting ndings on the meaningfulness and uniqueness of our proposed models. 2 2.1

Overview

The F 2DLT class of di usion models

Trust Network︎ Online social network

Users︎ exogenous and dynamic quiescence ︎ factors︎ exogenous and dynamic activationthreshold factors︎ . . . of this framework are the following: (i) a set of online social network users; (ii) a trust network, possibly inferred from a social network; (iii) user-behavior characteristics that provide information for incorporating two main aspects into the di usion process, namely activation-threshold and quiescence; (iv) information related to one or multiple competing campaigns.

Also, the information di usion process is supposed to have a certain time horizon, and before its expiration, users respond to the network's stimuli which may lead to being active in favor of one campaign or the opposing one. According to their own behavior, users may have the possibility of switching from the adoption of a campaign's item to that of another one.

Putting it all together, our F 2DLT based framework embeds all the above aspects that are essential to explain complex propagation phenomena, i.e., competitive di usion, non-progressivity, time-aware activation, delayed propagation, and trust/distrust relations.

Notably, in our setting, we tend to reject as true in general, the principle \I agree with my friends' idea and disagree with my foes' idea" (which also resembles the adage \the enemy of my enemy is my friend"), since this would imply that the behavior of a user should be completely determined by the stimuli coming from her/his neighbors. In other terms, friends should be responsible for the in uenceability and activation of a user, as opposed to foes, which should instead impact on delayed propagation. Therefore, in our models, the trusted connections and distrusted connections play di erent roles: only friends can exert a degree of (positive) in uence, whereas foes can only contribute to increase the user's hesitation to commit with the propagation process.

One assumption of our framework is the availability of trust relationships between the users involved in the information di usion context. Although this may not hold in practice, a few studies have been recently developed to infer a trust network from the time-varying interactions observed in evolving social networks, such as [ 6 ].

Activation-threshold function. Given a directed, weighted network representing an information di usion graph built on top of a trust network, every node v is associated with an activation-threshold, v 2 (0; 1], which corresponds to the e ort of activating the node which is needed in terms of cumulative in uence. The activation-threshold function g, de ned over the set of nodes V and the time interval of di usion T , enhances this concept. For each node v at time t 2 T , it is de ned as:

g(v; t) = v + #( v; t):

Thus, a node v is activated when its perceived in uences exceed the threshold v, plus the time-evolving activation term, #( ; ), which models the dynamic response of users at any activation attempt. Our proposed models can in principle embrace any de nition for #( ; ); here, we focus on two main scenarios.

Biased . Modeled as a non-decreasing monotone function, it captures the tendency of a user to consolidate her/his belief, according to the con rmation-bias principle [ 1 ]. We de ne this function as follows: g(v; t) = v + #( v; t) = v + min 1 v ; t tlvast (1) where tlvast denotes the last time v was activated and 0 denotes the increment in the value of g(v; t) for consecutive time steps. Clearly, the value increases as a node keeps staying in the same active state.

Unbiased . In applications such as customer retention or churn prediction, a user could revise her/his uncertainty to activate over time. We model this in such a way that, for each v, the value of the function is maximum (i.e., 1) just after the activation, i.e., at time t = tlvast + 1, then it starts to decrease from the following time steps: g(v; t) = v + #( v; t) = v + exp( (t tlvast 1)) vI[t tlvast = 1] (2) Quiescence function. The quiescence value quanti es the latency in the propagation through a particular node. It is de ned as a non-decreasing and monotone function q : V; T 7! T , such that for every v 2 V; t 2 T , with v activated at time t: q(v; t) = v + (N in(v); t); where v 2 T represents an exogenous term modeling the user's hesitation in being committed with the propagation process, and (N in(v); t) determines an We assume the second additive term in Eq. (1) is zero if C0 (3) additional delay proportional to the amount of v's neighbors that are distrusted and active, when the activation attempt is performed by the v's trusted neighbors. Here we focus on the following function: q(v; t) = v + (N in(v); t) = v + exp X

jwuvj u2St 1 where

0 quanti es the user sensitivity towards negative in uence. 2.2

Di usion Models

The F 2DLT class includes three di usion models [ 3 ], which are now brie y and informally described. The rst one refers to a single-item propagation scenario, while the other two models can dela with competitive scenarios, i.e., two or multiple campaigns spreading informative items over the network in a competitive fashion. Figure 2 shows the life-cycle of a node in the non-competitive scenario and in competitive ones, respectively. It should be noted that all the models share a similar two-phase activation process. In fact, when a node is activated by its neighbors, it always becomes quiescent. Its permanence in this state depends on Eq. 3. After the quiescent time is expired, a node can be regarded as active, regardless of its activation campaign, and is considered fully committed to the propagation process. While the non-competitive model, dubbed nC-F 2DLT , is a progressive model (i.e., once a node becomes active cannot be deactivated in subsequent times), the semi-progressive competitive model, spC-F 2DLT , allows a node to switch from a campaign to another, even multiple times. Finally, the fully non-progressive competitive model, npC-F 2DLT , additionally allows for node deactivation.

Theoretical properties of the models. The non-competitive, progressive model nC-F 2DLT is proven to be equivalent to LT with Quiescence Time i.e., the activation function in nC-F 2DLT is monotone and submodular. On the other hand, the competitive spC-F 2DLT and npC-F 2DLT can be reduced, via graph serialization, to a progressive Homogeneous Competitive LT, with monotone, non-submodular activation function i.e., the activation function in spC-F 2DLT and npC-F 2DLT is monotone but not submodular [ 3 ].

Evaluation methodology

Data. We used four real-world, publicly available networks, namely: Epinions, Slashdot, Wiki-Con ict, and Wiki-Vote | please refer to [ 3 ] for details about the datasets. All networks are originally directed and signed; in addition, the two Wikipedia-based networks also have timestamped edges. The in uence probability are derived so that to a network having a higher fraction of positive edges will correspond to an equivalent network where users are more willing to be involved in the propagation process.

Seed selection strategies. We de ned four seed selection strategies, each of which mimics a di erent, realistic scenario of in uence propagation.

Exogenous and malicious sources of information. This method, hereinafter referred to as M-Sources, aims at simulating the presence of multiple sources of malicious information within the network. Here, an exogenous source is meant as a node without incoming links. Formally, given a budget k, the method selects the top-k users in a ranking solution determined as r(v) = (W =(W + W +)) log(jN out(v)j) for every v such that N in(v) = ;. Here, W +, resp. W denote the sum of trust, resp. distrust outgoing weights.

Exogenous and in uential trusted sources of information. Analogously to the previous method, this one, dubbed I-Sources, is concerned with exogenous sources with emphasis on trusted links. Hence, the ranking function is: r(v) = (W +=(W + W +)) log(jN out(v)j).

Stress triads. This strategy is based on the notion of structural balance in triads. In a typical stress-triad con guration there is a node v with two incoming connections, from a node z with negative weight and from a node u with positive weight, and such that z is connected to u via a positive link. In this case, z is regarded as a stress-node since it could activate v through u, despite being negatively linked to v. The Stress-Triads strategy selects seeds based on the number of stress-triads that are involved in as stress-nodes.

Newcomers. We call a node v 2 V as a newcomer if all of its incoming edges are timestamped as less recent than its oldest outgoing edge. The start-time of v is the oldest timestamp associated with its incoming edges. Within the set of newcomers nodes, we further de ne two separate strategies: Least-New and Most-New. Each one consists in the selection of the the top-k newcomers with highest out-degree among those nodes with the oldest and the newest start-time, respectively. Both strategies require temporal information upon edges. Major Findings and Usage Recommendations. We pursued di erent goals of evaluations depending on the particular di usion model. More speci cally, we investigated the impact of the negative in uence on the nal active set in the non-competitive scenario. For the competitive setting, our main goal was to understand the e ect of the con rmation-bias e ect, in the context of misinformation spreading. Therefore, we will have a good and a malicious actor. Finally, we conducted a comprehensive comparative evaluation of our non-competitive model against the Independent Cascade model as well as stochastic individualcontact epidemic models SIR and SEIR [ 3 ].

Here we summarize the major ndings emerged from the experiments: F1 The setting of the dynamic activation-threshold function and quiescence function plays a crucial role in positive-in uence propagation and negativein uence/misinformation limitation.

F2 The average user's sensitivity in the negative in uence perceived from distrusted neighbors ( ) makes the seed identi cation more aware of the negative in uence spread.

F3 The con rmation-bias e ect ( ) may lead the \stronger" campaign to increase its spread capability.

F4 The non-progressive competitive model npC-F 2DLT appears to be less sensitive to the increase of and tends to favor deactivation events (for users previously activated by the weaker campaign) over switched events.

F5 If compared with classic di usion models as IC, SIR, and SEIR, the nC-F 2DLT model tends to favor a slower di usion, since the propagation process lasts consistently longer than IC and the epidemic models.

F6 Threshold models are more suitable when it is required to model propagation phenomena where it is important to capture the di erence in behavior of each individual. On the contrary, epidemic models does not exhibit this important capability, as they are more suitable to represent single contagions.

F7 I-Sources reveals higher spread capability, followed by Stress-Triads. On the contrary, the newcomers-based strategies show a signi cant spread capability, coupled with a negligible e ect on the negative in uence.

The above results provide evidence that our class of trust-aware models o ers an opportunity to better represent the complexity of real-world propagation phenomena. As a consequence, our models can pave the way for the development of sophisticated methods to solve misinformation spread limitation and related optimization problems. Also, our models can pro tably be used in a variety of applications whereby there is an emergence to predict the time required to debunk fake information, or to estimate how people are a ected by the spread of competitive opposite opinions through a social network. 4

Web-based Simulation Environment

As part of our contributions to the research project NextShop PON Grant No. F/050374/01-03/X32, we developed a web application of our di usion models, available at http://people.dimes.unical.it/andreatagarelli/f2dlt/ . It is a simulation environment that allows a user to execute a di usion model in the F 2DLT class, and get an interactive view of a propagation process. As input, an in uence graph can be either uploaded from a le or synthetically generated; in the latter case, the user is asked to provide the following information: (i) number of nodes, (ii) number of edges, (iii) percentage of negative edges, and (iv) graph-generator model, by choosing among the Erd}os{Renyi, Watts-Strogatz, and Barabasi-Albert models. Once the network is created, it is displayed with positive (trust) edges represented in black, and negative (distrust) edges represented in red. Also, after selecting a F 2DLT model, both the activation threshold function and the quiescence function can be con gured by setting #( ; ),and ( ; ). Finally, the early adopters, i.e., the seeds of the propagation process, can be selected by clicking upon speci c nodes of the displayed network. While the di usion process is running, the application updates the layout of the network, and nodes will dynamically change their color accordingly to their activation status. Moreover, the application will show a number of statistics about the propagation process, as well as the temporal evolution of the propagation process.

Anagnostopoulos ,

Bessi , G. Caldarelli,

Del Vicario ,

Petroni ,

Scala ,

Zollo , and

Quattrociocchi . Viral misinformation: The role of homophily and polarization . In Proc. WebConf , pp. 355 { 356 , 2015 .

Bessi , G. Caldarelli,

Del Vicario ,

Scala , and

Quattrociocchi . Social determinants of content selection in the age of (mis)information . In Proc. SocInfo , pp. 259 { 268 , 2014 .

3. Antonio Calio and

Andrea

Tagarelli . Complex in uence propagation based on trustaware dynamic linear threshold models . Applied Network Science , 4 ( 1 ): 14 :1{ 14 : 41 , 2019 .

Garimella , G. De Francisci Morales , A.

Gionis , and M.

Mathioudakis . The e ect of collective attention on controversial debates on social media . In Proc. ACM Web Science , pp. 43 { 52 , 2017 .

Koutra ,

P. N.

Bennett , and

Horvitz . Events and controversies: In uences of a shocking news event on information seeking . In Proc. WebConf , pp. 614 { 624 , 2015 .

Domenico

Mandaglio and

Andrea

Tagarelli . Generalized preference learning for trust network inference . IEEE Access , 7 : 174583 { 174596 , 2019 .