Spreader Selection by Community to Maximize Information Diffusion in Social Networks Didier A. Vega-Oliveros and Lilian Berton Department of Computer Science ICMC, University of São Paulo C.P. 668, CEP 13560-970, São Carlos, SP, Brazil davo,lberton@icmc.usp.br Abstract 2004; Barrat et al., 2008; Castellano et al., 2009). Therefore, some individuals can have a higher in- Rumors or information can spread quickly fluence than others according to the network struc- and reach many users in social networks. ture. Researchers have focused on identifying the Models for understanding, preventing or most influential vertices (Kempe et al., 2003; Kit- increasing the diffusion of information are sak et al., 2010; Lawyer, 2012; Pei and Makse, of greatest interest to companies, gov- 2013; Hébert-Dufresne et al., 2013) according to ernments, scientists, etc. In this paper, topological properties. It is expected this influ- we propose an approach for maximizing encers convince the largest number of individuals the information diffusion by selecting the in the network. However, the selection of more most important (central) users from com- than one of them not necessarily maximizes the munities. We also analyze the selection expected fraction of informed individuals, com- of the most central vertices of the net- pared to an uniformly random selection approach. work and considered artificial and real so- In this paper, we propose an approach to max- cial networks, such as email, hamsterster, imize the information diffusion considering the advogato and astrophysics. Experimental community structure of the network. The com- results confirmed the improvement of the munity symbolizes a group of individuals with a final fraction of informed individuals by greater tendency to have more internal than ex- applying the proposed approach. ternal connections to other groups. The reason is that vertices belonging to the same community are 1 Introduction likely to be more similar to each other and share The modeling of propagation or diffusion pro- similar properties and affinity. We confirmed that cesses in social networks has recently received selecting the most influential individual from each more attention, since it allows to understand how community as initial spreaders increases more the a disease can be controlled or how information information diffusion than selecting the most in- spread among individuals. These diffusion pro- fluential individuals from the whole network. cess are generally analyzed employing complex As a motivation example, let us consider a com- network theory (Barrat et al., 2008; Castellano et pany that wants to market a new product in the al., 2009). The area of complex networks seeks blogosphere. The company could select three very to study and understand the dynamics and behav- influential individuals of this social network (blog- ior of complex systems, from the structure of the gers with thousands of access) to advertise its network to the internal dynamics or interactions. product. However, these influencers may be popu- Models that describe the evolution of rumors lar in the same group of people. On the other hand, can be adapted to analyze the spread of spam on if the strategy is to identify the three main commu- the Internet, advertising and marketing, political nities on the network and select the most influen- ideologies or technological news between individ- tial individuals of each community, the company uals (Castellano et al., 2009). In such cases, the would achieve a variety group of users and maxi- representation in complex networks enables the mize the marketing diffusion. analysis of traditional models and the heteroge- The main contribution of this paper is the in- neous structure, which has a strong influence on formation diffusion approach based on selecting the information diffusion process (Moreno et al., the most influential individuals from communities. 73 We employed an artificial scale-free and four real vertex i has. In the case of directed networks is the networks: email, hamsterster, advogato and as- sum of the degrees of input (edges that reach the trophysics. We applied the SIR model for rumor vertex) and output (edges that leave the vertex). propagation selecting the initial seeds from the The average degree hki is the average of all ki of whole network and from the communities. The the network. The vertices that have a very high impact that the Truncate (TP), Contact (CP) or degree in the network are called hubs. Reactive (RP) processes have in the information The degree distribution of a network P (k) is the diffusion was analyzed. The experimental results probability of randomly select a vertex with degree showed that the selection of individuals from the k. Social networks present scale-free degree dis- communities as initial spreaders, the final fraction tribution (Barabási, 2007; Newman, 2010), with of informed individuals is improved. P (k) ⇠ k , in which most of the individuals The remainder of the paper is organized as fol- have low degree, near to the average, and only a lows: Section 2 introduces some definitions and few of them have very high degree (hubs). The measures covered in this paper, the community de- level of disorder or heterogeneity in vertices con- tection algorithm applied and the propagation pro- nections is obtained with the entropy of degree dis- cess in networks. Section 3 brings some related tribution. We employed the normalized version of work. Section 4 presents the proposed approach the Shannon entropy, i.e. for information diffusion based on communities. P1 Section 5 exhibits the experimental results on an k=0 P (k) log(P (k)) artificial scale-free and four real social networks. H̃ = , (1) log(N ) Finally, Section 6 discusses the final remarks. with 0  H̃  1. The maximum value for the 2 Theoretical background entropy occurs when P (k) shows a uniform distri- A network is a collection of items called nodes or bution and the lowest possible value happens when vertices, which are joined together by connections all vertices have the same degree. The entropy of a called links or edges. Formally we define the net- network is related to the robustness and their level work G = (V, E, W ), where V = {v1 , v2 , . . . vn } of resilience. is the set of N vertices, E = {e1 , e2 , . . . em } is the The robustness is also related to the correlation set of M edges and W = {w1 , w2 , . . . wm } are the degree of the network. A network is assortative, weights of the edges that determine the strength of or positive correlated, if vertices tend to connect the interaction between the respective vertices, in with vertices with similar degree. A Network is the case of weighted networks. In mathematical dissassortative, or negative correlated, if vertices terms, an undirected and unweighted network can with low degree tend to connect with higher con- be represented by an adjacency matrix A, in which nected vertices (hubs). When networks do not two connected vertices i and j are in the matrix present any of above patterns, they are called as aij = aji = 1, otherwise, they are equal to 0. non-assortative. For the calculation of the network A path is a consecutive sequence that starts at correlation we employed the Pearson coefficient, vertex i and ends in j, so that vertices are vis- formulated with adjacency matrix as ited more than once. The distance or length of the P path is defined as the number of edges contained in (1/M ) j>i ki kj aij h P i2 the sequence. The shortest distance between two (k +k )a (1/M ) j>i i 2j ij vertices is known as the shortest path or geodesic ⇢= (2) path gi,j . A component is the largest sub-set of P (ki 2 +kj 2 )aij (1/M ) j>i 2 vertices from the network in which exist at least h P i (ki +kj )aij 2 one path between each pair of vertices, but never (1/M ) j>i 2 connect to another component. A connected net- work has only one component. When the networks where M is the total number of edges in the net- have more than one component, we considered the work. If r > 0, then the network is assortative. largest of them. If r < 0 the network is dissassortative. If r = 0, The degree or connectivity of vertex i, called then there is no correlation between the degree of as ki , is the number of edges or connections that vertices. 74 2.1 Centrality measures properties and perform similar roles. Therefore, In complex and social networks have been pro- the division of a network into communities helps posed measures to describe the importance or cen- to understand their topological structure (struc- trality of vertices (Costa et al., 2007) according to tural and functional properties) and its dynamic topological and dynamical properties. The cen- processes, obtaining relevant information and fea- tralities adopted in this work are briefly described tures to the network domain. as follow. We can evaluate a partition based on the scores Degree centrality (DG) is related with the num- obtained from a quality measure. The goal is to ber of connections or popularity of a vertex (Costa evaluate expected features in a good community et al., 2007) and in terms of the adjacency matrix division. One of the most popular quality mea- is expressed as sures is the modularity Q (Newman, 2004). It compares the current density of intra-community X ki = aij . (3) and inter-community edges relative to a random i2N network with similar characteristics. It is based on the fact that random networks have no community Betweenness centrality (BE) quantifies the structure. number of shortest paths that pass through a vertex Given a network with c communities, the Q j between all pair of different vertices (i, k) (Free- modularity is calculated by a symmetric matrix man, 1977). It expresses how much a vertex Bj N ⇥ N , in which elements along the main diag- works as bridge or is a trusted vertex in the trans- onal eii represent connections into the same com- mission of information, i.e. munity and elements eij represent connections be- X tween different communities i and j. Equation 6 ik (j) Bj = , (4) shows the formulation of Q. ik i,k2V,i6=k 2 0 12 3 where ik is the total number of different shortest X X Q= 4eij @ eij A 5 (6) path between i and k, and ik (j) is the number of i j times j appears in those paths. PageRank centrality (PR) expresses the impor- If a specific division provides less edges be- tance of a vertex according to the probability that tween communities than would be expected by other vertices have to arrive at it, after a large num- random connections, the value of Q would be 0. ber of steps (Brin and Page, 1998). The idea is to When the network has isolated communities the simulate the surfing on the net. The user can fol- value of Q would be 1. This measure is employed low the links available at the current page or jump by several techniques to identify communities in to other by typing a new URL. In social terms, it networks systems, especially in divisive and ag- can be approached like the more cited or renowned glomerative methods (Guimera et al., 2003; New- individuals. The formalization of the PageRank man, 2004; Newman, 2006). centrality is Newman (Newman, 2004) proposes an agglom- ~⇡ t = ~⇡ t 1 G , (5) erative method that is an optimized greedy al- gorithm, called fastgreedy. The approach starts where ~⇡ t are the PageRank values for each vertex with a copy of a real network of N vertices with in the tth step of navigation and G is known as no connections, producing at first N communi- the Google matrix. When t = 0 we have by de- ties. At each iteration, two communities ci and cj , fault ~⇡ 0 = {1, . . . , 1}. The jumps are represented which have connections in real network, are cho- by a probability ↵ and we adopted the same value sen in order to obtain the greatest improvement of as defined in the original version (Brin and Page, Q (Equation 7). A pruning is performed in the 1998). search space considering only the edges that exist between communities. Therefore, execution time 2.2 Community detection is reduced when considering the new Q function Communities are sets of densely interconnected (Equation 7). vertices and sparsely connected with the rest of the P P ✓ ◆ network (Newman, 2010). Vertices that belong to j eij i eij the same community, in general, share common Qij = 2 eij (7) 2M 75 The result can also be represented as a den- all their spreader and stifler neighbors. drogram. Cuts at different levels of the dendro- • Truncated Process (TP): It consists of trun- gram produce divisions with greater or lesser num- cate or interrupt the contagion in the precise ber of communities, and the best cut yields the time. In each iteration and for each spreader, largest value of Q. The algorithm at each step has it is randomly selected one neighbor at time, O(M + N ). Since there are at most N 1 join and setting up the states of the contact as cor- operations required to build a complete dendro- responds. The selection continues until the gram, their overall complexity is O((M + N )N ), spreader visit all its neighbors or it becomes or O(N 2 ), for sparse graph. Consequently, by stifler, whichever occurs first. adopting this method is more treatable the analysis of communities in larger networks. • Contact Process (CP): In each time step and for each spreader, it is chosen at random a 2.3 Propagation process on networks single neighbor. Then, it is resolved the tran- In classical rumor diffusion models the ignorant sition states according to the rule that corre- or inactive individuals (S) are those who remain sponds. After that, continues with the next unaware of the information, the spreaders (I) are spreader of the network of the same time step. those who disseminate the information and the sti- Different theoretical models have been pro- fler (R) are those who know the information but posed for modeling the rumor dynamics on net- lose the interest in spreading it. All vertices have works (Moreno et al., 2004; Barrat et al., 2008; the same probability for transmit the information Castellano et al., 2009; Borge-Holthoefer et al., to their neighbors and µ for stopping to be active. 2012). These analytical models make assump- The Maki-Thompson (MT) (Maki and Thomp- tions about the network structure, such as the de- son, 1973) model is a spreader-centric approach gree correlation or distribution, compartments or employed for describing the propagation of ideas class of vertices with same probabilities, homoge- and rumors on networks. In the MT process when- neous/heterogeneous mixing or mean field theory. ever an active spreader i contacts a vertex j that is Notwithstanding all of them claim that their nu- inactive, the latter will become active with a fixed merical solutions agree with the MC simulations, probability . Otherwise, in the case that j knows so we adopt this approach as an exploratory re- about the rumor, it means j is an active spreader or search. a stifler, the vertex i turns into a stifler with proba- bility µ. The behavior when the spreader stops to 3 Related work propagate can be understood because the informa- tion is too much known (contacting spreaders) or Many approaches have been developed in order without novelty (contacting stifler). to understand the propagation of ideas or infor- Three possible choices related with the spreader mation through social networks (Castellano et al., behavior during the diffusion have been re- 2009). Specially, characterize the individuals that ported (Borge-Holthoefer et al., 2012; Meloni et are most influential in the propagation process has al., 2012). They are the Reactive process (RP), attracted the attention of researchers (Richardson Truncated process (TP) and Contact Process (CP). and Domingos, 2002; Kempe et al., 2003; Kitsak However, a clear analysis about the impact of et al., 2010; Pei and Makse, 2013). spreaders behavior in the propagation process has The conventional approach for describing the not been tackled yet. Moreover, there is not a con- most influential vertices is performing a micro- sensus about what to employ in rumor or informa- scopic analysis on the network. Vertices are tion diffusion and it may cause a misinterpretation classified considering their topological properties, of results. The three main characteristic behaviors sorted and ranked in order to generalize their reported to spreaders are described as follow. ability to propagate (Kitsak et al., 2010; Hébert- Dufresne et al., 2013; Pei and Makse, 2013). • Reactive Process (RP): In each iteration, the However, to find the set of initial vertices that spreaders try to pass the rumor among all maximize the propagation capacity, the selection their ignorant neighbors. After that, it evalu- of the most influential spreader may produce an ates whether it will become stifler in the next overlap of influence in the population (Kitsak et iteration or not, considering the contact with al., 2010; Pei and Makse, 2013). 76 In terms of topological properties, there not spreaders ( (t)) and stifler ('(t)) over time is cal- 1 PN exists a consensus about what is the more ac- culated as (t) = N i=1 S i (t), that is similar to curate measure that describes the most influen- the other states and always fulfill (t) + (t) + tial vertices. Some researches claim that hubs '(t) = 1. We assume that infection and recov- are more representative to influence others ver- ering do not occur during the same discrete time tices (Pastor-Satorras and Vespignani, 2001; Al- window or step. bert and Barabási, 2002). Vertices with higher degree are more efficient to maximize the prop- 4.1 Setup agation because, in general, hubs not tend to con- The initial setup for the propagation is (0) = nect with each other and thus can achieve a greater 1 ⌘/N , (0) = ⌘/N and '(0) = 0, where ⌘ number of vertices (Kitsak et al., 2010). In the represents the seeds or number of initial spread- case of communities, the degree proportion of a ers. Each simulation begins with a selection of vertex i is defined as the number of edges that i ⌘ vertices. At each time step, all spreaders uni- has in each community. This degree proportion formly select and try to infect its neighbors with was found as a good descriptor of influence for probability , or stop the diffusion with probabil- communities (Lawyer, 2012). ity µ according to the spreader behavior adopted. On the other hand, the most influential vertices Successful change of state (to be spreader or to be are described as those with the largest Between- stifler) are effective at the next iteration. The sim- ness centrality (Hébert-Dufresne et al., 2013), be- ulations run until the end of the process is reached, cause they intermediate the communication be- when (1) = 0. tween groups of vertices, which increase their in- 4.2 Community selection approach fluence. According to the authors, Betweenness centrality is a better descriptor of the most influ- We propose to select the initial spreaders from the ential spreader in communities. community division of the network. The multiple The PageRank is also considered a better seeds are the most central vertices of each commu- measure to describe the most influential ver- nity. The community division may be calculated tices (Cataldi et al., 2010). The reason is that it by some divisive or agglomerative method (Sec- employs the random walk concept over the net- tion 2.2) and here the fastgreedy algorithm was work to be calculated and vertices with higher val- employed. The method is detailed as follow: ues mean higher probability to be visited. First, given a required number of ⌘ initial Finally, Kempe et al. (2003) propose a greedy spreaders, we find the ⌘ main communities of the algorithm to obtain ⌘ initial spreaders that maxi- network by the fastgreedy algorithm. Then, each mizes the diffusion influence. The authors adopt a community is isolated, which produces ⌘ com- discrete optimization approach and prove that the ponents. The isolation process consists in main- optimization problem is NP-hard. It was imple- taining the intra-community edges and erasing the mented considering the independent and weighted inter-community connections. For each isolated cascade model that have only two states, which are community, a specific centrality measure is cal- different to the SIR model. The method evaluates culated to all vertices. Since vertices with higher one vertex at time to be added in the set of selected centrality are considered more suitable to influ- seeds. The new vertex is accepted if it is what most ence on the network, we select the most impor- increment the diffusion. However, this approach tant vertex from each community. Therefore, these has a very higher computational cost problem, al- vertices influence more in their own community though new researches try to optimize the perfor- and the overlap of influence in the population is mance (Chen et al., 2009). minimized. At the end, ⌘ seeds are selected and they have the best centrality value of its commu- 4 Information diffusion by communities nity. We take the original full network, the ⌘ seeds, the parameters and execute the corresponding sim- Let us consider a constant population of N ver- ulations. tices in all time steps. Each vertex can be For the centrality measure, the point is to only in one state, that is I i (t) = 1 iff i 2 find what centrality better identifies the influential I, otherwise I i (t) = 0, and S i (t)+I i (t)+Ri (i) = spreaders, by communities and in the whole net- 1. The macroscopic fraction of ignorant ( (t)), work, that maximizes the information diffusion. 77 (a) MT-RP (b) MT-TP (c) MT-CP Figure 1: MT (Maki and Thompson, 1973) propagation in an artificial scale-free network with N = 1000, hki = 8 and ⌘ = 1% of initial seeds selected. The color bar shows the final fraction of informed individuals. The behavioral approaches for spreader analyzed are: (a) Reactive process (RP); (b) Truncated process (TP); and (c) Contact process (CP) Here, the degree (DG), PageRank (PR) and Be- favors a viral diffusion on the network with lower tweenness (BE) centralities were considered. values of and it happens independently of which are the initial seeds. For this reason, RP is a more 5 Experimental results suitable approach to simulate broadcasting propa- gation. In this section we analyzed the information dif- On the other hand, the TP behavior is more re- fusion in an artificial scale-free and four real so- lated to the contact network scenario, where the cial networks. We evaluated the impact spread- position and topological characteristics of seeds ers behavior have in the diffusion on the networks. may have influence in the diffusion. Moreover, TP Then, the results about the selection of initial presents more balanced results, near 60% when spreaders by communities, best-ranked vertices of ⇡ µ, and contacts are not as restricted as CP the network and random seeds were explored. behavior. For this reason, we adopted hereafter 5.1 Spreader behavior analysis the MT-TP approach as the propagation process for the analysis. We analyzed the three behavioral approaches for the spreaders and present the impact they produce 5.2 Multiple initial spreader analysis on the propagation process. We considered the The experiments were performed with three pos- MT model with an artificial scale-free network of sibilities for choosing the initial spreader: (i) by size N = 1000 and hki = 8. In order to under- randomly selecting ⌘ individuals as initial seeds stand the overall spectral effect with the parame- in the network; (ii) by selecting the best- ranked ⌘ ters, the simulations were evaluating a range of individuals with highest value of a specific central- and µ in (0, 1]. Therefore, the differences between ity of the network; and finally, (iii) by detecting ⌘ the approaches are evidenced. For each tuple of communities on the network and for each isolated values ( , µ), it was selected 100 times at random community selecting the individual with highest ⌘ = 1% of initial spreader (seeds) and each time value of a specific centrality measure. The cen- was an average over 50 executions. trality measures selected were degree (DG), Be- The impact of the behavioral approach in the fi- tweenness (BE) and Pagerank (PR). nal fraction of informed individuals is shown in Figure 1. We observed that the CP approach is 5.2.1 Real social networks less redundant in the number of contacts made by We adopted the email (Guimera et al., 2003), ad- spreader, producing lower fractions of informed vogato (Kunegis, 2014a), astrophysics (Newman, individuals, in comparison to the other behaviors. 2001) and hamsterster (Kunegis, 2013; Kunegis, Still, because the single contact made by iteration, 2014b). All of them were assumed as undirected the CP behavior is more similar to a propagation and unweighted networks and also it was consid- through the “word-of-mouth” situation. ered the largest component for the simulations. The RP approach obtained more than 80% of The structural properties of the networks are sum- informed individuals with values of 0.3, no marized in Table 1, with the respective number of matter values of µ. Therefore, the RP approach vertices N , the average degree hki, shortest paths 78 (a) MT-TP Community (b) MT-TP Random (c) MT-TP Best Figure 2: MT-TP propagation in an artificial scale-free network with N = 1000 and hki = 8. The final fraction of informed individuals are shown in the color bar. The selection of ⌘ = 4% of seeds was made by: (a) ⌘ communities taking the individual with best PR centrality of each one; (b) uniform random selection of individuals; and (c) the ⌘ individuals with the best PR centrality of the network. Solid white lines to the left in the contour plots show the and µ combinations that achieved 35% of informed individuals. Dashed white lines show the combinations that achieved 60% of informed individuals. Table 1: Topological properties and results of community detection of the networks: last column, the best modularity Q and community division by fastgreedy algorithm Network N hki hgi H̃ ⇢ FastGreedy Q Nc email 1133 9.62 3.60 0.45 0.01 0.49 16 hamsterster 2000 16.1 3.58 0.48 0.02 0.46 57 advogato 5054 15.6 3.27 0.40 -0.09 0.34 49 astrophysics 14845 16.1 4.79 0.38 0.23 0.63 1172 average hgi, normalized entropy H̃, pearson cor- 5.2.2 Information diffusion results relation ⇢. Also, the best modularity Q value and The final fraction of informed individuals ('(1)) division number of communities N C of the net- was averaged over 100 executions for each combi- works produced by the FastGreedy algorithm are nation of initial seeds and parameters. This aver- reported. age represents the propagation capacity achieved email represents a social network of informa- by the selected seeds. tion exchanged by emails between members of the We evaluated the relation between the param- Rovira i Virgili University, Tarragona, with largest eters and the selection of the initial spreaders in hub degree equal to 71. an artificial network. In this experiment the PR was defined as the centrality measure employed to hamsterster is an undirected and unweighted find the seeds in the communities and the whole network based on the website data HAMSTER - network. A value of ⌘ = 4% of initial spread- STER . COM . The edges represent a relationship of ers was adopted for a scale-free network of size family or friend among users. The largest hub has N = 1000, hki = 8, hgi = 3.19, H̃ = 0.33 and degree equal to 273. ⇢ = 0.04. The propagation capacity '(1) was affected advogato is an online community platform for according to the initial seeds (Figure 2). The solid developers of free software launched in 1999. Ver- and dashed white curves represent the combina- tices are users of advogato, the directed edges rep- tion of and µ parameters that obtained 35% and resent trust relationships. The largest hub has de- 60% of informed individuals respectively. We ob- gree equal to 807. served that these curves show a well defined linear Finally, astrophysics is a collaborative net- pattern, which means any proportion of = /µ work between scientists on previous studies of will obtain equivalent '(1) results. astrophysics reported in arXiv during January 1, The selection of seeds by communities (Figure 1995 until December 31, 1999. The network is 2(a)) improved the diffusion on the network in weighted and directed and originally it has 16707 comparison with the Random seeds (Figure 2(b)) vertices. The largest hub of the main component and Best-ranked vertices (Figure 2(c)). This result has 360 connections. is corroborated by the increase of the white lines 79 0.62 BST−BE BST−DG BST−PR 0.6 0.62 COM−BE COM−DG COM−PR 0.58 RANDOM % of informed individuals % of informed individuals 0.61 0.56 0.6 0.54 0.52 0.59 BST−BE 0.5 BST−DG BST−PR 0.58 COM−BE 0.48 COM−DG COM−PR RANDOM 0.57 0.46 0 10 20 30 40 50 0 10 20 30 40 50 seeds seeds (a) email (b) hamsterster 0.43 0.56 0.42 0.55 0.41 % of informed individuals % of informed individuals 0.4 0.54 0.39 0.38 0.53 0.37 0.52 0.36 BST−BE BST−BE BST−DG BST−DG 0.35 BST−PR BST−PR COM−BE 0.51 COM−BE COM−DG COM−DG 0.34 COM−PR COM−PR RANDOM RANDOM 0.33 0.5 0 10 20 30 40 50 0 10 20 30 40 50 seeds seeds (c) advogato (d) astrophysics Figure 3: Propagation capacity of MT-TP model in the four real networks given the selection of seeds by communities (red points), best-ranked (black points), and randomly (blue points). Table 2: Average of propagation capacities for the full range of ⌘ 2 [1, 50], for each network achieved by seeds: (second big column) selecting the most important individuals from communities; (third big column) selecting the best-ranked individuals of the network; and (last column) randomly selecting the initial seeds. The adopted measures were Betweenness (BE), degree (DG) and PageRank (PR) centralities Network Community Best ranked Random BE DG PR BE DG PR selection email 0.6065 0.6105 0.6150 0.5880 0.5840 0.5884 0.6023 hamsterster 0.5485 0.5693 0.5728 0.5306 0.5226 0.5271 0.5273 advogato 0.4077 0.4102 0.4112 0.3993 0.3958 0.4007 0.3805 astrophysics 0.5417 0.5398 0.5415 0.5321 0.5278 0.5301 0.5337 slope. However, a little decrease in the lines slope seeds reached a higher propagation capacity than is evidenced in the MT-TP Best case with respect the selection of best-ranked vertices. For a larger to the MT-TP Random case. number of initial spreaders, '(1) tend to fall when the best-ranked vertices are selected. Consequently, we sought to analyze the impact On the other hand, when the community de- of ⌘ and centrality measures in the selection of tection was performed and individuals with high- seeds in the diffusion process. We varied the num- est values of DG, BE or PR in each community ber of communities and seeds from 2 to 50 and (red points, COM-**) were selected, the propa- fixed = 0.3 and µ = 0.2 for all simulations. gation capacity was improved and achieved the The real social networks described and the MT-TP best results. Therefore, more individuals were in- propagation model were considered in the anal- formed in the network by the community selec- ysis (Figure 3). The random selection of initial tion, with the same propagation constraint (num- spreader (blue points, RANDOM) or best-ranked ber of seeds). vertices (black points, BST-**) of DG, BE or PR centrality, produced a constant propagation capac- In terms of the topological measures, we ob- ity ('(1)). In some case, random selection of served that vertices with highest PageRank cen- 80 trality in the communities (COM-PR) obtained 7 Acknowledgments in average the best propagation results (Table 2). This research was partially supported by Na- Even in the selection of the best-ranked vertices, tional Council for Scientific and Technological the PageRank was notable. Another important Development (CNPq) grant: 140688/2013-7 and point is that often, the uniformly random selec- São Paulo Research Foundation (FAPESP) grant: tion of initial spreader could be a better option 2011/21880-3. than select the most central vertices (best-ranked) of the network. This is contrary what is cur- rently expected and adopted in marketing cam- References paigns, for instances. For all networks and for all Réka Albert and Albert-László Barabási. 2002. Statisti- size of seeds, we evidenced that starts the diffu- cal mechanics of complex networks. Rev. Mod. Phys., 74(1):47–97, jan. sion from the best-ranked vertices produces lower influence, or final fraction of informed individu- A.-L. Barabási. 2007. The architecture of complexity: From network structure to human dynamics. IEEE Control Sys- als, than purely select vertices at random; in some tems Magazine, 27(4):33–42. cases, the best-ranked selection achieved the worst Alain Barrat, MarseilleMarc Barthélemy, and Alessandro results. However, the selection of initial spread- Vespignani. 2008. Dynamical Processes on Complex Net- ers by communities showed, independently of the works. Cambridge University Press. centrality measure, higher results. Javier Borge-Holthoefer, Sandro Meloni, Bruno Gonçalves, and Yamir Moreno. 2012. Emergence of Influential 6 Final remarks Spreaders in Modified Rumor Models. Journal of Statis- tical Physics, 151(1-2):383–393, September. In this work, we proposed a method for maximiz- S. Brin and L. Page. 1998. The anatomy of a large-scale ing the information diffusion on networks. First, hypertextual web search engine. Computer Networks and we analyzed the impact of the spreader behavior ISDN Systems, V:107–117. in the propagation and confirmed that the Trun- Claudio Castellano, Santo Fortunato, and Vittorio Loreto. cate Process (TP) is more suitable to simulate in- 2009. Statistical Physics of Social Dynamics. Reviews of Modern Physics, 81(2):591–646, may. formation diffusion on networks. We applied com- munity detection and targeted the most influential Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. 2010. Emerging topic detection on Twitter based on tempo- vertices from these communities as initial seeds. ral and social terms evaluation. In Proceedings of the Experimental results on an artificial scale-free and Tenth International Workshop on Multimedia Data Min- four real social networks confirmed the increase in ing - MDMKDD ’10, pages 1–10, New York, New York, USA, jul. ACM Press. the final fraction of informed individuals. More- over, it was found that the PageRank centrality in Wei Chen, Yajun Wang, and Siyu Yang. 2009. Efficient influence maximization in social networks. In Proceed- communities was a better choice in terms of effi- ings of the 15th ACM SIGKDD international conference ciency and influence maximization. on Knowledge discovery and data mining - KDD ’09, page A brief overview about complex network mea- 199, New York, New York, USA, jun. ACM Press. sures, community detection and information prop- L. D. F. Costa, F. A. Rodrigues, G Travieso, and P. R. Villas agation was introduced. We present our proposal Boas. 2007. Characterization of complex networks: A survey of measurements. Advances in Physics, 56:167– to select initial spreaders by communities. There 242. is still an open problem related to an exact defini- L C Freeman. 1977. A set of measures of centrality based on tion of what is considered a community and what betweenness. Sociometry, 40:35–41. would be the ideal division. Nevertheless, we var- ied the number of communities from 2 to 50 and R Guimera, L Danon, A Diaz-Guilera, F Giralt, and A Are- nas. 2003. Self-similar community structure in a network in general (for every community division) our pro- of human interactions. Physical Review E, 68:2003. posal achieved better results versus propagation Laurent Hébert-Dufresne, Antoine Allard, Jean-Gabriel without considering the community structure. Young, and Louis J Dubé. 2013. Global efficiency of In future work, other measures for selecting local immunization on complex networks. Scientific re- ports, 3:2171, January. influential individuals on networks could be ex- plored, in addition to DG, BE and PR applied here. David Kempe, Jon M. Kleinberg, and Éva Tardos. 2003. Also, other models of propagation and network Maximizing the Spread of Influence Through a Social Network. In Lise Getoor, Ted E. Senator, Pedro Domin- topologies could be tested, as well as novel strate- gos, and Christos Faloutsos, editors, KDD, pages 137– gies taking into account community information. 146. ACM. 81 Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik M. E. J. Newman. 2001. The structure of scientific collab- Liljeros, Lev Muchnik, H. Eugene Stanley, and Hernán A. oration networks. In Natl. Acad. Sci. USA, number 98, Makse. 2010. Identification of influential spreaders in pages 404 – 409. complex networks. Nature Physics, 6(11):888–893, Au- gust. M E J Newman. 2004. Fast algorithm for detecting community structure in networks. Physical Review E, Jrme Kunegis. 2013. KONECT – The Koblenz Network Col- 69(3):66133. lection. In Proc. Int. Web Observatory Workshop, pages 1343–1350. M E J Newman. 2006. Finding community structure in net- works using the eigenvectors of matrices. Physical Review Jrme Kunegis. 2014a. Advogato network dataset – E, 74(3):36104. KONECT, October. Jrme Kunegis. 2014b. Hamsterster full network dataset – Mark Newman. 2010. Networks: An Introduction. Oxford KONECT, jan. University Press, Inc., New York, NY, USA. Glenn Lawyer. 2012. Measuring node spreading power by Romualdo Pastor-Satorras and Alessandro Vespignani. 2001. expected cluster degree. page 4, September. Epidemic Spreading in Scale-Free Networks. Physical Review Letters, 86(14):3200–3203, April. D. P. Maki and M Thompson. 1973. Mathematical Models and Applications, with Emphasis on the Social, Life, and Sen Pei and Hernán A Makse. 2013. Spreading dynamics Management Sciences. Prentice-Hall. in complex networks. Journal of Statistical Mechanics: Theory and Experiment, 2013(12):P12002, December. Sandro Meloni, Alex Arenas, Sergio Gmez, Javier Borge- Holthoefer, and Yamir Moreno. 2012. Modeling epi- Matthew Richardson and Pedro Domingos. 2002. Mining demic spreading in complex networks: Concurrency and knowledge-sharing sites for viral marketing. In Proceed- traffic. In My T. Thai and Panos M. Pardalos, edi- ings of the eighth ACM SIGKDD international confer- tors, Handbook of Optimization in Complex Networks, ence on Knowledge discovery and data mining - KDD ’02, Springer Optimization and Its Applications, pages 435– page 61, New York, New York, USA, jul. ACM Press. 462. Springer US. Yamir Moreno, Maziar Nekovee, and Amalio F. Pacheco. 2004. Dynamics of rumor spreading in complex networks. Physical Review E, 69(6):066130, jun. 82