An Influence Maximization based approach to the Engagement of Silent Users in Online Social Networks E XTENDED A BSTRACT Roberto Interdonato1,2 , Chiara Pulice3 , and Andrea Tagarelli1 1 DIMES, University of Calabria, Italy 2 Department of Information Technology, Uppsala University, Sweden 3 UMIACS, University of Maryland, USA Abstract. We discuss a targeted influence maximization (TIM) approach, based on the linear threshold (LT) diffusion model, which is designed to support the engagement of silent members, a.k.a. lurkers, of an online social network. A key aspect of this approach is the objective function of the underlying optimization problem, which is defined in terms of the lurking scores associated with the nodes in the final active set, or delurking capital. The resulting TIM problem shares the computational intractability with classic influence maximization (IM) problems. However, since the proposed objective function is shown to be monotone and submodular under the LT model, a greedy algorithm (with a guaranteed approx- imation ratio) is provided to compute a k-node set that maximizes the delurking capital in the network, for a given lurking threshold. Results have demonstrated the significance of our approach. A comparative evaluation with state-of-the-art algorithms for non-targeted and targeted LT-based IM has also shown the superi- ority of our approach in terms of the estimated overall delurking capital. 1 Introduction Computer science research in online social networks (OSNs) has traditionally focused on users that are actively involved in the OSN life. However, online social environments are often characterized by the presence of a crowd which does not actively contribute, rather it takes a silent role. Such silent members of an OSN are also known as lurkers, since they gain benefit from others’ information without significantly giving back to the community [8,2,9]. The study of lurking behaviors in OSNs is nonetheless important, since lurkers acquire knowledge from the community, and as such they can be social capital holders. Within this view, a major goal is to delurk such users, i.e., to encourage them to more actively be involved in the OSN. This can in principle be accomplished by nurturing and persuading lurkers to be more actively engaged in the community, also thanks the support of other, possibly influential users [9]. In [5] we proposed the first computational framework that addresses the problem of delurking in OSNs. Our key idea was to conceptualize a delurking approach under a graph-based information diffusion model. Research on information diffusion in OSNs (e.g., [1,14,4]) is well-established due to a plethora of methods that have been developed in the last years, mostly upon the two seminal models, namely Independent Cascade (IC) and Linear Threshold (LT) [6]. These assume that an information diffusion process would unfold in a static, directed graph, where each node can be “activated” or not (under a progressive assumption) and each edge is associated with a weight expressing the diffusion probability or influence degree from one node to another. Moreover, in an influence maximization (IM) framework (e.g., [6,3,13]) the general objective is to find a set of initially activated users (also called early adopters, or seed users) which can maximize the spread of information through the network. Our approach refers to a special case of IM, namely targeted IM; in particular, a spe- cific instance of targeted IM, in which lurkers are regarded as the target of the diffusion process. Existing lurker ranking algorithms [10,11,12] can profitably be exploited to mine lurking behaviors in the network, and hence to associate users with a value (lurk- ing score) indicating her/his lurking status. Given a budget k, our goal is to find a set of k nodes that are capable of maximizing the likelihood of “activating” the target lurkers. Intuitively, the activation of a node means that a user is influenced by other users so to “become aware of” or “adopt” a piece of information. Note that this outcome of a diffusion process has a nice analogy in the lurking analysis setting under consideration: the information to spread corresponds to a delurking trigger action, activating nodes regarded as lurkers can be seen as delurking those users, while the lurking score of a user would represent the gain deriving from delurking that user. A key novelty in the formulated optimization problem is the objective function, which is defined in terms of the cumulative amount of the lurking scores associated with the nodes in the final active set, or delurking capital. Our delurking-oriented targeted IM problem shares the computational intractability with classic IM problems. However, since the proposed objective function is shown to be monotone and submodular under the LT model, we provide a greedy algorithm (with typical 1 − 1/e −  approximation ratio), named DEvOTION, which computes a k-node set that maximizes the delurking capital in the network, for a given minimum lurking threshold. We evaluated DEvOTION over OSN datasets of different characteristics and sizes, assessing its performance in terms of estimation of the delurking capital, execution time, and seed characteristics. We also compared DEvOTION with state-of-the-art IM algo- rithms, namely SimPath [3] and TIM+ [13], and a query-based targeted IM algorithm, namely KB-TIM [7]. 2 Delurking-oriented Targeted Influence Maximization 2.1 Problem statement Let G 0 = hV, Ei be the directed graph representing a social network, with set of nodes V and set of edges E. We denote with G = G 0 (b, `) = hV, E, b, `i a directed weighted graph representing the information diffusion graph associated with the social network G 0 , where b : E → R∗ is an edge weighting function, and ` : V → R∗ is a node weighting function. We assume that the node and edge weighting functions in the diffusion graph G are completely specified based upon the availability of a lurker ranking solution. This is produced by a method that is designed to identify and characterize lurkers in an OSN, i.e., a method enabled to assign every user with a score that expresses the status of lurking behavior taken by that user. The intuition is that for any edge (u, v) in G, the weight b(u, v) will indicate how much node u has contributed to the v’s lurking score calculated by the lurker ranking method, which resembles a measure of “influence” produced by u to v. Also, the node weight `(v) will indicate the status of v as lurker, such as the higher the lurker ranking score of v the higher should be `(v). We refer the reader to [11,10] for details about the lurker ranking method and to [5] for definitions of node and edge weighting functions. We denote with LS ∈ [0, 1] a threshold value that indicates the minimum lurking score that a node in the network is required to have in order to be regarded as a target node. Moreover, given any seed set S ⊆ V, we denote with µ(S) the final active set, i.e., the set of nodes that are active at the end of the diffusion process starting from S. Upon the above defined quantities, we introduce a measure that will be essential to the definition of our targeted IM problem. This measure, which we denote as delurking capital cumulated via S, is proportional to the amount of the lurking scores over all target nodes that are activated by the seed set S. Definition 1. Given a set S ⊆ V, the delurking capital DC(µ(S)) associated with the final active set µ(S) ⊆ V is defined as: X DC(µ(S)) = `(v) (1) v∈µ(S)\S ∧ `(v)≥LS Note that in Eq. (1) we do not consider nodes that belong to the seed set S. The ratio behind this choice is that the selection of the seed set should not be biased by nodes with highest lurking scores; this will be made clear next in the definition of our objective function, which in fact relies on DC(µ(S)). Definition 2. Delurking-oriented Targeted Influence Maximization Problem. Given G = hV, E, b, `i, a diffusion model on G, a budget k, and a lurking threshold LS, find a seed set S ⊆ V with |S| ≤ k of nodes (users) such that, by activating them, the final active set µ(S) ⊆ V will have the maximum delurking capital: 0 S= argmax DC(µ(S )) (2) 0 0 S ⊆V s.t. |S |≤k The problem in Def. 2 preserves the complexity of the IM problem and, as a result, it is computationally intractable, i.e., it is still NP-hard. However, as for the classic IM problem, a greedy solution can be designed provided that the natural diminishing property holds for the considered problem. It is well-known that for many diffusion models, including LT, the function σ(A) mapping any subset A ⊆ V of nodes to the size of the final active set µ(A) satisfies monotonicity and submodularity by exploiting the equivalent live-edge graph model [6]. Proposition 1. The delurking capital function DC defined in Eq. (1), mapping any active set µ(A) ⊆ V to its overall delurking capital, is monotone and submodular, for any LS ∈ [0, 1], under the LT model. The above theoretical result enables the development of an approximate algorithm with guarantee for our problem, as discussed later in Sect. 2.3. 2.2 Choosing the information diffusion model The LT model deals with situations in which exposure to multiple sources is needed for a user before taking a decision. More specifically, aPnode v can be influenced by each neighbor u according to a weight b(u, v) such that u∈N in (v) b(u, v) ≤ 1, where N in (v) is the in-neighbor set of node v. At the very beginning of the diffusion process, each node v is assigned a threshold uniformly at random from [0, 1]. This represents the weighted fraction of v’s neighbors that must become active in order for v to become ac- tive itself. Intuitively, the higher is this threshold, the harder will be the task of enrolling v in a new trend, since the total influence weight must exceed its threshold. We chose LT for modeling the process of influence propagation to pursue the goal of engaging lurkers. Unlike IC or IC-based models, the ability of LT to reflect the cu- mulative effect of exposure to multiple sources of influence can be profitably exploited to maximize the likelihood of changing the status of a user into a more active role in the OSN. Of course, this can actually lead to delurking provided that the information to be diffused towards the target lurkers is of any type that concerns a well-established delurking action [9]. In this regard, note that our approach is general as it can involve any type of delurking action in the form of piece of information to flow through the network (e.g., informative rewards, welcome messages, guidelines from experts, etc). 2.3 The DEvOTION algorithm Our proposed greedy method, named DEvOTION (stands for DElurking Oriented Tar- geted Influence maximizatiON), exploits the search for shortest paths in the diffusion graph, following the lead of the study in [3], however in a backward fashion. Along with the information diffusion graph G, the budget integer k, and the minimum lurking score LS, DEvOTION takes in input a real-valued threshold η. This parameter is used to control the size of the neighborhood within which paths are enumerated: in fact, the majority of influence can be captured by exploring the paths within a relatively small neighborhood; for higher η values, less paths are explored (i.e., paths are pruned earlier) leading to smaller runtime but with decreased accuracy in spread estimation. The key idea of DEvOTION is to perform a backward visit of the graph starting from the nodes identified as target (i.e., the nodes u with `(u) ≥ LS). In order to yield a seed set S of size at most k, DEvOTION works as follows. During each iteration, DEvOTION computes the set (say T ) of nodes that reach the target ones and keeps track of the node with the highest marginal gain (i.e., delurking capital DC). The former allows an efficient reset of the spread of each non-seed node contained in T . The latter is found at the end of each iteration upon backward visit of all nodes in T argetSet that do not belong to the current seed set S. This visiting subroutine takes a path P, its probability pp, and the lurking score of the end node in the path (i.e., a target node), and extends P as much as possible (i.e., as long as pp is not lower than η). Initially, a path is formed by one target node, with probability 1; then, the path is extended by exploring the graph backward, adding to it one, unexplored in-neighbor v at time, in a depth-first fashion. The path probability is updated according to the LT-equivalent “live-edge” model [3,6], and so the delurking capital. The process is continued until no more nodes can be added to the path. 600 600 600 η = 1e-03 η = 1e-03 540 η = 1e-04 540 η = 1e-04 540 480 η=0 480 η=0 480 420 420 420 360 360 360 DC DC DC 300 300 300 240 240 240 180 180 180 120 120 120 η = 1e-03 60 60 60 η = 1e-04 η=0 0 0 0 5 10 15 20 25 30 35 40 45 50 5 10 15 20 25 30 35 40 45 50 5 10 15 20 25 30 35 40 45 50 k k k (a) LS-perc = 5% (b) LS-perc = 10% (c) LS-perc = 25% Fig. 1. Delurking capital in function of k and η on Instagram. 3 Experimental Evaluation 3.1 Evaluation methodology We assessed the sensitivity of DEvOTION to the various parameters, i.e., the size of seed set (k), the lurking threshold (LS) for the selection of the target set, and the path pruning threshold (η). We also comparatively evaluated DEvOTION with SimPath [3] and TIM+ [7], two well-known state-of-the-art algorithms for IM problems, and with KB-TIM [13], a query-based targeted IM algorithm. We used FriendFeed (493K, 19MM edges), GooglePlus (108K nodes, 14MM edges) and Instagram (18K nodes, 618K edges) network datasets, which are publicly avail- able [11,12]. All experiments were carried out on an Intel Core i7-3960X CPU@3.30GHz, 64GB RAM machine. 3.2 Results Sensitivity to parameters. We evaluated the performance of DEvOTION in terms of delurking capital obtained by varying all three parameters involved, i.e., k, LS-perc, η. We initially focused on η, by varying it from 1.0e-03 to 0; note that η = 1.0e-03 is the default value used in other IM algorithms (e.g.,[3]), while η = 0 indicates no path pruning. We generally observed that no significant gain in DC is obtained for lower values of η. This fact achieves particular importance when it is coupled with the time performance results shown in Fig. 2: several orders of magnitude in the runtime are saved when η > 0. Within this view, η = 1.0e-03 might be be preferred to η = 1.0e-04 when trade-off between DC performance and runtime has to be ensured. Concerning the other two parameters, as expected, the delurking capital increases with both the size of the seed set (k) and the size of the target set; we controlled the latter through a parameter LS-perc which corresponds to a percentage value that deter- mines the setting of LS such that the selected target set corresponds to the top-LS-perc of the distribution of node lurking scores. An important remark is that on all datasets a significant fraction of delurking capital, for a given choice of LS-perc, can be achieved just using low k. This appears evident, especially for lower LS-perc, in Fig. 1 by ob- serving the increasing slope of the line fitting the DC curves as LS-perc increases. 800 200000 η = 1e-03 700 η = 1e-04 time (sec) η=0 time (sec) 600 80000 500 0 400 5 20 35 50 k 300 200 100 0 5 10 15 20 25 30 35 40 45 50 k Fig. 2. Execution time of DEvOTION in function of k, with LS-perc = 5% and for varying η, on Instagram. The inset shows the overall results that include η = 0, while the main plot zooms in to focus on η > 0. Concerning the other two datasets (results not shown), we observed that low values of k are enough to obtain a DC value close to or with the same order of magnitude of DC values achieved by using highest k. Comparison with SimPath. SimPath shares with DEvOTION the kind of algo- rithmic approach to LT-based influence maximization, which includes the involvement of parameter η for controlling the enumeration of the paths through the diffusion graph. Results of the comparative evaluation for η = 1.0e-04, in terms of DC performances, are summarized in Table 1. A first important remark that stands out is that the delurking capital obtained by SimPath is always lower than the one obtained by DEvOTION, for all datasets and configurations. The largest increment in DC is obtained for FriendFeed, with per- centage of increment up to 10.8% (for LS-perc = 5%). Significant increment is ob- served also w.r.t. the other datasets, with percentages of increment up to 4.27% (for LS-perc = 5%) on GooglePlus, and up to 5.69% (for LS-perc = 5%) on Instagram. Impact of parameter η in the evaluation turns out to be marginal. Nevertheless, it should be noted that even though performance corresponding to η = 1.0e-3 (results not shown) are slightly lower in terms of DC, the increment w.r.t. SimPath is generally higher, indicating that DEvOTION is more robust than SimPath to path pruning. As the size of target set increases, a general decreasing trend can be observed in the gap between DEvOTION and SimPath DC values. This is not surprising since a larger target set means a larger overlap with the entire node set, thus letting the seed nodes picked up by a non-targeted algorithm reach a larger part of the target set. Regarding comparison in terms of running times, DEvOTION turns out to be two orders of magnitude faster than SimPath, for the same choice of η. Considering the largest dataset, FriendFeed, DEvOTION shows average running time of 47, 278 sec- onds smaller than SimPath in its faster configuration (η = 1.0e-03, LS-perc = 5%), and of 182, 131 seconds in its slowest configuration (η = 1.0e-04, LS-perc = 25%). Comparison with TIM+. TIM+ follows an approach to influence maximization that was designed to improve efficiency in previously existing algorithms, including Table 1. Summary of delurking capital performances of competitors against DEvOTION. FriendFeed GooglePlus Instagram 5% 10% 25% 5% 10% 25% 5% 10% 25% incr. vs. SimPath 10.80% 9.32% 3.97% 4.27% 3.02% 1.06% 5.69% 4.10% 1.60% incr. vs. TIM+ 9.85% 8.82% 3.49% 4.22% 2.96% 1.01% 4.43% 3.11% 1.12% incr. vs. KB-TIM 58.87% 34.48% 35.23% 40.66% 36.07% 33.86% 5.44% 6.79% 10.20% SimPath. Therefore, it was not surprising to find out that TIM+ is indeed faster than DEvOTION. Taking FriendFeed as case in point, there is an average saving of 204 sec- onds for the fastest configurations (η = 1.0e-03 and LS-perc = 5% for DEvOTION,  = 1.0 for TIM+) , and of 6100 seconds for the slowest configurations (η = 1.0e-04 and LS-perc = 25% for DEvOTION,  = 0.1 for TIM+). Nevertheless, DEvOTION always outperforms TIM+ in terms of DC on all datasets and for all configurations. The performance gain achieved by DEvOTION is more sig- nificant on FriendFeed, with average percentage of increment of 9.85% (for LS-perc = 5%), 8.82% (LS-perc = 10%) and 3.49% (LS-perc = 25%). Analogously to what we observed for the evaluation with SimPath and for the same reason, the gap in per- formance between DEvOTION and TIM+ decreases for larger target sets. Smaller but significant increment is also obtained on GooglePlus (average of 4.22% for LS-perc = 5%) and Instagram (average of 4.43% for LS-perc = 5%). Comparison with KB-TIM. DEvOTION always performs significantly better than KB-TIM in terms of DC, on all datasets and with all configurations. Surprisingly, the gain in performance between DEvOTION and KB-TIM is larger than the one observed for other non-targeted competitors (i.e., SimPath, TIM+). Again, as previously seen for the comparative evaluation with IM algorithms, the highest increment corresponds to FriendFeed. As shown in Table 1, the percentage of average increment in DC ranges from 35.23% (for LS-perc = 25%) to 58.87% (for LS-perc = 5%). The increment is from 33.86% (for LS-perc = 25%) to 40.66% (for LS-perc = 5%) for GooglePlus, and from 5.44% (for LS-perc = 5%) to 10.20% (for LS-perc = 25%) for Instagram. It should be noted that the comparison between DEvOTION and KB-TIM on Insta- gram is the only one following an opposite trend, w.r.t. other datasets and competitors, with the size of the target set: larger increments are achieved for larger target sets. This is probably due to the structural characteristics of Instagram and it is confirmed by a greater overlap between the seed sets of DEvOTION and KB-TIM than the seed over- lap observed for the other datasets, as we shall discuss in the next section. As regards the execution times, KB-TIM performs comparably to TIM+, thus running always faster than DEvOTION and SimPath. 3.3 Discussion Empirical evidence has shed light on the significance and effectiveness of DEvOTION. The method has shown to be robust w.r.t. the pruning of paths to be explored in the graph. A significant fraction of delurking capital can be achieved already with a small seed set, even for large network datasets. Compared with state-of-the-art IM and tar- geted IM algorithms, DEvOTION always obtains better quality seed sets, confirming its efficacy and uniqueness to solve a delurking-oriented targeted IM task. 4 Conclusions We discussed the first computational approach to delurking, i.e., to engaging users that take on a lurking role in OSNs. We presented a novel targeted IM problem in which the objective function to be maximized is defined in terms of delurking capital of the target users. We proved that the proposed objective function is monotone and submod- ular, by using the LT-equivalent live-edge graph model, and developed an approximate algorithm, DEvOTION, to solve the problem under consideration. The presented approach is independent on the particular strategy of delurking (being it based on rewards, welcome messages, etc.). We paved the way for the development of web-based systems that, by embedding our approach, can effectively support en- gagement of users. Also, the presented approach can easily be generalized to deal with other targeted IM scenarios, therefore we envisage further developments from various perspectives, including human-computer interaction and marketing. References 1. Bakshy, E., Rosenn, I., Marlow, C., Adamic, L.A.: The role of social networks in information diffusion. In: Proc. World Wide Web Conf. (WWW), pp. 519–528 (2012) 2. Edelmann, N.: Reviewing the definitions of “lurkers” and some implications for online re- search. Cyberpsychology, Behavior, and Social Networking 16(9), 645–649 (2013) 3. Goyal, A., Lu, W., Lakshmanan, L.V.S.: SIMPATH: An Efficient Algorithm for Influence Maximization under the Linear Threshold Model. In: Proc. IEEE Int. Conf. on Data Mining (ICDM), pp. 211–220 (2011) 4. Guille, A., Hacid, H., Favre, C., Zighed, D.A.: Information diffusion in online social net- works: a survey. SIGMOD Record 42(2), 17–28 (2013) 5. Interdonato, R., Pulice, C., Tagarelli, A.: The DEvOTION algorithm for delurking in social networks. Trends in Social Network Analysis, LNSN, Springer, 28 pages (2017) 6. Kempe, D., Kleinberg, J.M., Tardos, E.: Maximizing the spread of influence through a social network. In: Proc. ACM Conf. on Knowledge Discovery and Data Mining (KDD), pp. 137– 146 (2003) 7. Li, Y., Zhang, D., Tan, K.: Real-time targeted influence maximization for online advertise- ments. PVLDB 8(10), 1070–1081 (2015) 8. Preece, J.J., Nonnecke, B., Andrews, D.: The top five reasons for lurking: improving com- munity experiences for everyone. Computers in Human Behavior 20(2), 201–223 (2004) 9. Sun, N., Rau, P.P.L., Ma, L.: Understanding lurkers in online communities: a literature re- view. Computers in Human Behavior 38, 110–117 (2014) 10. Tagarelli, A., Interdonato, R.: “Who’s out there?”: Identifying and Ranking Lurkers in So- cial Networks. In: Proc. Int. Conf. on Advances in Social Networks Analysis and Mining (ASONAM), pp. 215–222 (2013) 11. Tagarelli, A., Interdonato, R.: Lurking in social networks: topology-based analysis and rank- ing methods. Social Netw. Analys. Mining 4(230), 27 (2014) 12. Tagarelli, A., Interdonato, R.: Time-aware analysis and ranking of lurkers in social networks. Social Netw. Analys. Mining 5(1), 23 (2015) 13. Tang, Y., Xiao, X., Shi, Y.: Influence maximization: near-optimal time complexity meets practical efficiency. In: Proc. ACM Conf. on Management of Data (SIGMOD), pp. 75–86 (2014) 14. Weng, L., Ratkiewicz, J., Perra, N., Gonçalves, B., Castillo, C., Bonchi, F., Schifanella, R., Menczer, F., Flammini, A.: The role of information diffusion in the evolution of social net- works. In: Proc. ACM Conf. on Knowledge Discovery and Data Mining (KDD), pp. 356–364 (2013)