Spreader Selection by Community to Maximize Information Diffusion in Social Networks

Spreader Selection by Community to Maximize Information Diffusion in Social Networks DidierAVega-Oliveros Department of Computer Science ICMC University of São Paulo

C.P. 668, CEP 13560-970 São Carlos SP Brazil

LilianBerton lberton@icmc.usp.br Department of Computer Science ICMC University of São Paulo

C.P. 668, CEP 13560-970 São Carlos SP Brazil

Spreader Selection by Community to Maximize Information Diffusion in Social Networks 6DB986A6FE2F721BD2D66548D133C768 GROBID - A machine learning software for extracting information from scholarly documents

Rumors or information can spread quickly and reach many users in social networks. Models for understanding, preventing or increasing the diffusion of information are of greatest interest to companies, governments, scientists, etc. In this paper, we propose an approach for maximizing the information diffusion by selecting the most important (central) users from communities. We also analyze the selection of the most central vertices of the network and considered artificial and real social networks, such as email, hamsterster, advogato and astrophysics. Experimental results confirmed the improvement of the final fraction of informed individuals by applying the proposed approach.

Introduction

The modeling of propagation or diffusion processes in social networks has recently received more attention, since it allows to understand how a disease can be controlled or how information spread among individuals. These diffusion process are generally analyzed employing complex network theory (Barrat et al., 2008;Castellano et al., 2009). The area of complex networks seeks to study and understand the dynamics and behavior of complex systems, from the structure of the network to the internal dynamics or interactions.

Models that describe the evolution of rumors can be adapted to analyze the spread of spam on the Internet, advertising and marketing, political ideologies or technological news between individuals (Castellano et al., 2009). In such cases, the representation in complex networks enables the analysis of traditional models and the heterogeneous structure, which has a strong influence on the information diffusion process (Moreno et al., 2004;Barrat et al., 2008;Castellano et al., 2009). Therefore, some individuals can have a higher influence than others according to the network structure. Researchers have focused on identifying the most influential vertices (Kempe et al., 2003;Kitsak et al., 2010;Lawyer, 2012;Pei and Makse, 2013;Hébert-Dufresne et al., 2013) according to topological properties. It is expected this influencers convince the largest number of individuals in the network. However, the selection of more than one of them not necessarily maximizes the expected fraction of informed individuals, compared to an uniformly random selection approach.

In this paper, we propose an approach to maximize the information diffusion considering the community structure of the network. The community symbolizes a group of individuals with a greater tendency to have more internal than external connections to other groups. The reason is that vertices belonging to the same community are likely to be more similar to each other and share similar properties and affinity. We confirmed that selecting the most influential individual from each community as initial spreaders increases more the information diffusion than selecting the most influential individuals from the whole network.

As a motivation example, let us consider a company that wants to market a new product in the blogosphere. The company could select three very influential individuals of this social network (bloggers with thousands of access) to advertise its product. However, these influencers may be popular in the same group of people. On the other hand, if the strategy is to identify the three main communities on the network and select the most influential individuals of each community, the company would achieve a variety group of users and maximize the marketing diffusion.

The main contribution of this paper is the information diffusion approach based on selecting the most influential individuals from communities.

We employed an artificial scale-free and four real networks: email, hamsterster, advogato and astrophysics. We applied the SIR model for rumor propagation selecting the initial seeds from the whole network and from the communities. The impact that the Truncate (TP), Contact (CP) or Reactive (RP) processes have in the information diffusion was analyzed. The experimental results showed that the selection of individuals from the communities as initial spreaders, the final fraction of informed individuals is improved.

The remainder of the paper is organized as follows: Section 2 introduces some definitions and measures covered in this paper, the community detection algorithm applied and the propagation process in networks. Section 3 brings some related work. Section 4 presents the proposed approach for information diffusion based on communities. Section 5 exhibits the experimental results on an artificial scale-free and four real social networks. Finally, Section 6 discusses the final remarks.

Theoretical background

A network is a collection of items called nodes or vertices, which are joined together by connections called links or edges. Formally we define the network G = (V, E, W ), where V = {v 1 , v 2 , . . . v n } is the set of N vertices, E = {e 1 , e 2 , . . . e m } is the set of M edges and W = {w 1 , w 2 , . . . w m } are the weights of the edges that determine the strength of the interaction between the respective vertices, in the case of weighted networks. In mathematical terms, an undirected and unweighted network can be represented by an adjacency matrix A, in which two connected vertices i and j are in the matrix a ij = a ji = 1, otherwise, they are equal to 0. A path is a consecutive sequence that starts at vertex i and ends in j, so that vertices are visited more than once. The distance or length of the path is defined as the number of edges contained in the sequence. The shortest distance between two vertices is known as the shortest path or geodesic path g i,j . A component is the largest sub-set of vertices from the network in which exist at least one path between each pair of vertices, but never connect to another component. A connected network has only one component. When the networks have more than one component, we considered the largest of them.

The degree or connectivity of vertex i, called as k i , is the number of edges or connections that vertex i has. In the case of directed networks is the sum of the degrees of input (edges that reach the vertex) and output (edges that leave the vertex).

The average degree hki is the average of all k i of the network. The vertices that have a very high degree in the network are called hubs.

The degree distribution of a network P (k) is the probability of randomly select a vertex with degree k. Social networks present scale-free degree distribution (Barabási, 2007;Newman, 2010), with P (k) ⇠ k , in which most of the individuals have low degree, near to the average, and only a few of them have very high degree (hubs). The level of disorder or heterogeneity in vertices connections is obtained with the entropy of degree distribution. We employed the normalized version of the Shannon entropy, i.e.

H = P 1 k=0 P (k) log(P (k)) log(N ) ,(1)

with 0  H  1. The maximum value for the entropy occurs when P (k) shows a uniform distribution and the lowest possible value happens when all vertices have the same degree. The entropy of a network is related to the robustness and their level of resilience.

The robustness is also related to the correlation degree of the network. A network is assortative, or positive correlated, if vertices tend to connect with vertices with similar degree. A Network is dissassortative, or negative correlated, if vertices with low degree tend to connect with higher connected vertices (hubs). When networks do not present any of above patterns, they are called as non-assortative. For the calculation of the network correlation we employed the Pearson coefficient, formulated with adjacency matrix as

⇢ = (1/M ) P j>i k i k j a ij h (1/M ) P j>i (k i +k j )a ij 2 i 2 (1/M ) P j>i (ki 2 +k j 2 )aij 2 h (1/M ) P j>i (k i +k j )a ij 2 i 2 (2)

where M is the total number of edges in the network. If r > 0, then the network is assortative. If r < 0 the network is dissassortative. If r = 0, then there is no correlation between the degree of vertices.

Centrality measures

In complex and social networks have been proposed measures to describe the importance or centrality of vertices (Costa et al., 2007) according to topological and dynamical properties. The centralities adopted in this work are briefly described as follow. Degree centrality (DG) is related with the number of connections or popularity of a vertex (Costa et al., 2007) and in terms of the adjacency matrix is expressed as

k i = X i2N a ij .(3)

Betweenness centrality (BE) quantifies the number of shortest paths that pass through a vertex j between all pair of different vertices (i, k) (Freeman, 1977). It expresses how much a vertex B j works as bridge or is a trusted vertex in the transmission of information, i.e.

B j = X i,k2V,i6 =k ik (j) ik ,(4)

where ik is the total number of different shortest path between i and k, and ik (j) is the number of times j appears in those paths.

PageRank centrality (PR) expresses the importance of a vertex according to the probability that other vertices have to arrive at it, after a large number of steps (Brin and Page, 1998). The idea is to simulate the surfing on the net. The user can follow the links available at the current page or jump to other by typing a new URL. In social terms, it can be approached like the more cited or renowned individuals. The formalization of the PageRank centrality is

⇡t = ⇡t 1 G ,(5)

where ⇡t are the PageRank values for each vertex in the t th step of navigation and G is known as the Google matrix. When t = 0 we have by default ⇡0 = {1, . . . , 1}. The jumps are represented by a probability ↵ and we adopted the same value as defined in the original version (Brin and Page, 1998).

Community detection

Communities are sets of densely interconnected vertices and sparsely connected with the rest of the network (Newman, 2010). Vertices that belong to the same community, in general, share common properties and perform similar roles. Therefore, the division of a network into communities helps to understand their topological structure (structural and functional properties) and its dynamic processes, obtaining relevant information and features to the network domain.

We can evaluate a partition based on the scores obtained from a quality measure. The goal is to evaluate expected features in a good community division. One of the most popular quality measures is the modularity Q (Newman, 2004). It compares the current density of intra-community and inter-community edges relative to a random network with similar characteristics. It is based on the fact that random networks have no community structure.

Given a network with c communities, the Q modularity is calculated by a symmetric matrix N ⇥ N , in which elements along the main diagonal e

ii represent connections into the same community and elements e ij represent connections between different communities i and j. Equation 6shows the formulation of Q.

Q = X i 2 4 e ij 0 @ X j e ij 1 A 2 3 5(6)

If a specific division provides less edges between communities than would be expected by random connections, the value of Q would be 0.

When the network has isolated communities the value of Q would be 1. This measure is employed by several techniques to identify communities in networks systems, especially in divisive and agglomerative methods (Guimera et al., 2003;Newman, 2004;Newman, 2006).

Newman (Newman, 2004) proposes an agglomerative method that is an optimized greedy algorithm, called fastgreedy. The approach starts with a copy of a real network of N vertices with no connections, producing at first N communities. At each iteration, two communities c i and c j , which have connections in real network, are chosen in order to obtain the greatest improvement of Q (Equation 7). A pruning is performed in the search space considering only the edges that exist between communities. Therefore, execution time is reduced when considering the new Q function (Equation 7).

Q ij = 2 ✓ e ij P j e ij P i e ij 2M ◆(7)

The result can also be represented as a dendrogram. Cuts at different levels of the dendrogram produce divisions with greater or lesser number of communities, and the best cut yields the largest value of Q. The algorithm at each step has O(M + N ). Since there are at most N 1 join operations required to build a complete dendrogram, their overall complexity is O((M + N )N ), or O(N 2 ), for sparse graph. Consequently, by adopting this method is more treatable the analysis of communities in larger networks.

Propagation process on networks

In classical rumor diffusion models the ignorant or inactive individuals (S) are those who remain unaware of the information, the spreaders (I) are those who disseminate the information and the stifler (R) are those who know the information but lose the interest in spreading it. All vertices have the same probability for transmit the information to their neighbors and µ for stopping to be active.

The Maki-Thompson (MT) (Maki and Thompson, 1973) model is a spreader-centric approach employed for describing the propagation of ideas and rumors on networks. In the MT process whenever an active spreader i contacts a vertex j that is inactive, the latter will become active with a fixed probability . Otherwise, in the case that j knows about the rumor, it means j is an active spreader or a stifler, the vertex i turns into a stifler with probability µ. The behavior when the spreader stops to propagate can be understood because the information is too much known (contacting spreaders) or without novelty (contacting stifler).

Three possible choices related with the spreader behavior during the diffusion have been reported (Borge-Holthoefer et al., 2012;Meloni et al., 2012). They are the Reactive process (RP), Truncated process (TP) and Contact Process (CP). However, a clear analysis about the impact of spreaders behavior in the propagation process has not been tackled yet. Moreover, there is not a consensus about what to employ in rumor or information diffusion and it may cause a misinterpretation of results. The three main characteristic behaviors reported to spreaders are described as follow.

• Reactive Process (RP): In each iteration, the spreaders try to pass the rumor among all their ignorant neighbors. After that, it evaluates whether it will become stifler in the next iteration or not, considering the contact with all their spreader and stifler neighbors.

• Truncated Process (TP): It consists of truncate or interrupt the contagion in the precise time. In each iteration and for each spreader, it is randomly selected one neighbor at time, and setting up the states of the contact as corresponds. The selection continues until the spreader visit all its neighbors or it becomes stifler, whichever occurs first.

• Contact Process (CP): In each time step and for each spreader, it is chosen at random a single neighbor. Then, it is resolved the transition states according to the rule that corresponds. After that, continues with the next spreader of the network of the same time step.

Different theoretical models have been proposed for modeling the rumor dynamics on networks (Moreno et al., 2004;Barrat et al., 2008;Castellano et al., 2009;Borge-Holthoefer et al., 2012). These analytical models make assumptions about the network structure, such as the degree correlation or distribution, compartments or class of vertices with same probabilities, homogeneous/heterogeneous mixing or mean field theory. Notwithstanding all of them claim that their numerical solutions agree with the MC simulations, so we adopt this approach as an exploratory research.

Related work

Many approaches have been developed in order to understand the propagation of ideas or information through social networks (Castellano et al., 2009). Specially, characterize the individuals that are most influential in the propagation process has attracted the attention of researchers (Richardson and Domingos, 2002;Kempe et al., 2003;Kitsak et al., 2010;Pei and Makse, 2013).

The conventional approach for describing the most influential vertices is performing a microscopic analysis on the network. Vertices are classified considering their topological properties, sorted and ranked in order to generalize their ability to propagate (Kitsak et al., 2010;Hébert-Dufresne et al., 2013;Pei and Makse, 2013). However, to find the set of initial vertices that maximize the propagation capacity, the selection of the most influential spreader may produce an overlap of influence in the population (Kitsak et al., 2010;Pei and Makse, 2013).

In terms of topological properties, there not exists a consensus about what is the more accurate measure that describes the most influential vertices. Some researches claim that hubs are more representative to influence others vertices (Pastor-Satorras and Vespignani, 2001;Albert and Barabási, 2002). Vertices with higher degree are more efficient to maximize the propagation because, in general, hubs not tend to connect with each other and thus can achieve a greater number of vertices (Kitsak et al., 2010). In the case of communities, the degree proportion of a vertex i is defined as the number of edges that i has in each community. This degree proportion was found as a good descriptor of influence for communities (Lawyer, 2012).

On the other hand, the most influential vertices are described as those with the largest Betweenness centrality (Hébert-Dufresne et al., 2013), because they intermediate the communication between groups of vertices, which increase their influence. According to the authors, Betweenness centrality is a better descriptor of the most influential spreader in communities.

The PageRank is also considered a better measure to describe the most influential vertices (Cataldi et al., 2010). The reason is that it employs the random walk concept over the network to be calculated and vertices with higher values mean higher probability to be visited.

Finally, Kempe et al. (2003) propose a greedy algorithm to obtain ⌘ initial spreaders that maximizes the diffusion influence. The authors adopt a discrete optimization approach and prove that the optimization problem is NP-hard. It was implemented considering the independent and weighted cascade model that have only two states, which are different to the SIR model. The method evaluates one vertex at time to be added in the set of selected seeds. The new vertex is accepted if it is what most increment the diffusion. However, this approach has a very higher computational cost problem, although new researches try to optimize the performance (Chen et al., 2009).

Information diffusion by communities

Let us consider a constant population of N vertices in all time steps. Each vertex can be only in one state, that is I i (t) = 1 iff i 2 I, otherwise I i (t) = 0, and S i (t)+I i (t)+R i (i) = 1. The macroscopic fraction of ignorant ( (t)), spreaders ( (t)) and stifler ('(t)) over time is calculated as (t) = 1 N P N i=1 S i (t), that is similar to the other states and always fulfill (t) + (t) + '(t) = 1. We assume that infection and recovering do not occur during the same discrete time window or step.

Setup

The initial setup for the propagation is (0) = 1 ⌘/N , (0) = ⌘/N and '(0) = 0, where ⌘ represents the seeds or number of initial spreaders. Each simulation begins with a selection of ⌘ vertices. At each time step, all spreaders uniformly select and try to infect its neighbors with probability , or stop the diffusion with probability µ according to the spreader behavior adopted. Successful change of state (to be spreader or to be stifler) are effective at the next iteration. The simulations run until the end of the process is reached, when (1) = 0.

Community selection approach

We propose to select the initial spreaders from the community division of the network. The multiple seeds are the most central vertices of each community. The community division may be calculated by some divisive or agglomerative method (Section 2.2) and here the fastgreedy algorithm was employed. The method is detailed as follow:

First, given a required number of ⌘ initial spreaders, we find the ⌘ main communities of the network by the fastgreedy algorithm. Then, each community is isolated, which produces ⌘ components. The isolation process consists in maintaining the intra-community edges and erasing the inter-community connections. For each isolated community, a specific centrality measure is calculated to all vertices. Since vertices with higher centrality are considered more suitable to influence on the network, we select the most important vertex from each community. Therefore, these vertices influence more in their own community and the overlap of influence in the population is minimized. At the end, ⌘ seeds are selected and they have the best centrality value of its community. We take the original full network, the ⌘ seeds, the parameters and execute the corresponding simulations.

For the centrality measure, the point is to find what centrality better identifies the influential spreaders, by communities and in the whole network, that maximizes the information diffusion. Here, the degree (DG), PageRank (PR) and Betweenness (BE) centralities were considered.

Experimental results

In this section we analyzed the information diffusion in an artificial scale-free and four real social networks. We evaluated the impact spreaders behavior have in the diffusion on the networks. Then, the results about the selection of initial spreaders by communities, best-ranked vertices of the network and random seeds were explored.

Spreader behavior analysis

We analyzed the three behavioral approaches for the spreaders and present the impact they produce on the propagation process. We considered the MT model with an artificial scale-free network of size N = 1000 and hki = 8. In order to understand the overall spectral effect with the parameters, the simulations were evaluating a range of and µ in (0, 1]. Therefore, the differences between the approaches are evidenced. For each tuple of values ( , µ), it was selected 100 times at random ⌘ = 1% of initial spreader (seeds) and each time was an average over 50 executions. The impact of the behavioral approach in the final fraction of informed individuals is shown in Figure 1. We observed that the CP approach is less redundant in the number of contacts made by spreader, producing lower fractions of informed individuals, in comparison to the other behaviors. Still, because the single contact made by iteration, the CP behavior is more similar to a propagation through the "word-of-mouth" situation.

The RP approach obtained more than 80% of informed individuals with values of 0.3, no matter values of µ. Therefore, the RP approach favors a viral diffusion on the network with lower values of and it happens independently of which are the initial seeds. For this reason, RP is a more suitable approach to simulate broadcasting propagation.

On the other hand, the TP behavior is more related to the contact network scenario, where the position and topological characteristics of seeds may have influence in the diffusion. Moreover, TP presents more balanced results, near 60% when ⇡ µ, and contacts are not as restricted as CP behavior. For this reason, we adopted hereafter the MT-TP approach as the propagation process for the analysis.

Multiple initial spreader analysis

The experiments were performed with three possibilities for choosing the initial spreader: (i) by randomly selecting ⌘ individuals as initial seeds in the network; (ii) by selecting the best-ranked ⌘ individuals with highest value of a specific centrality of the network; and finally, (iii) by detecting ⌘ communities on the network and for each isolated community selecting the individual with highest value of a specific centrality measure. The centrality measures selected were degree (DG), Betweenness (BE) and Pagerank (PR).

Real social networks

We adopted the email (Guimera et al., 2003), advogato (Kunegis, 2014a), astrophysics (Newman, 2001) and hamsterster (Kunegis, 2013;Kunegis, 2014b). All of them were assumed as undirected and unweighted networks and also it was considered the largest component for the simulations. The structural properties of the networks are summarized in Table 1, with the respective number of vertices N , the average degree hki, shortest paths email represents a social network of information exchanged by emails between members of the Rovira i Virgili University, Tarragona, with largest hub degree equal to 71.

hamsterster is an undirected and unweighted network based on the website data HAMSTER-STER.COM. The edges represent a relationship of family or friend among users. The largest hub has degree equal to 273.

advogato is an online community platform for developers of free software launched in 1999. Vertices are users of advogato, the directed edges represent trust relationships. The largest hub has degree equal to 807.

Finally, astrophysics is a collaborative network between scientists on previous studies of astrophysics reported in arXiv during January 1, 1995 until December 31, 1999. The network is weighted and directed and originally it has 16707 vertices. The largest hub of the main component has 360 connections.

Information diffusion results

The final fraction of informed individuals ('(1)) was averaged over 100 executions for each combination of initial seeds and parameters. This average represents the propagation capacity achieved by the selected seeds.

We evaluated the relation between the parameters and the selection of the initial spreaders in an artificial network. In this experiment the PR was defined as the centrality measure employed to find the seeds in the communities and the whole network. A value of ⌘ = 4% of initial spreaders was adopted for a scale-free network of size N = 1000, hki = 8, hgi = 3.19, H = 0.33 and ⇢ = 0.04.

The propagation capacity '(1) was affected according to the initial seeds (Figure 2). The solid and dashed white curves represent the combination of and µ parameters that obtained 35% and 60% of informed individuals respectively. We observed that these curves show a well defined linear pattern, which means any proportion of = /µ will obtain equivalent '(1) results.

The selection of seeds by communities (Figure 2 Consequently, we sought to analyze the impact of ⌘ and centrality measures in the selection of seeds in the diffusion process. We varied the number of communities and seeds from 2 to 50 and fixed = 0.3 and µ = 0.2 for all simulations. The real social networks described and the MT-TP propagation model were considered in the analysis (Figure 3). The random selection of initial spreader (blue points, RANDOM) or best-ranked vertices (black points, BST-**) of DG, BE or PR centrality, produced a constant propagation capacity ('(1)). In some case, random selection of seeds reached a higher propagation capacity than the selection of best-ranked vertices. For a larger number of initial spreaders, '(1) tend to fall when the best-ranked vertices are selected.

On the other hand, when the community detection was performed and individuals with highest values of DG, BE or PR in each community (red points, COM-**) were selected, the propagation capacity was improved and achieved the best results. Therefore, more individuals were informed in the network by the community selection, with the same propagation constraint (number of seeds).

In terms of the topological measures, we observed that vertices with highest PageRank cen-trality in the communities (COM-PR) obtained in average the best propagation results (Table 2). Even in the selection of the best-ranked vertices, the PageRank was notable. Another important point is that often, the uniformly random selection of initial spreader could be a better option than select the most central vertices (best-ranked) of the network. This is contrary what is currently expected and adopted in marketing campaigns, for instances. For all networks and for all size of seeds, we evidenced that starts the diffusion from the best-ranked vertices produces lower influence, or final fraction of informed individuals, than purely select vertices at random; in some cases, the best-ranked selection achieved the worst results. However, the selection of initial spreaders by communities showed, independently of the centrality measure, higher results.

Final remarks

In this work, we proposed a method for maximizing the information diffusion on networks. First, we analyzed the impact of the spreader behavior in the propagation and confirmed that the Truncate Process (TP) is more suitable to simulate information diffusion on networks. We applied community detection and targeted the most influential vertices from these communities as initial seeds. Experimental results on an artificial scale-free and four real social networks confirmed the increase in the final fraction of informed individuals. Moreover, it was found that the PageRank centrality in communities was a better choice in terms of efficiency and influence maximization.

A brief overview about complex network measures, community detection and information propagation was introduced. We present our proposal to select initial spreaders by communities. There is still an open problem related to an exact definition of what is considered a community and what would be the ideal division. Nevertheless, we varied the number of communities from 2 to 50 and in general (for every community division) our proposal achieved better results versus propagation without considering the community structure.

In future work, other measures for selecting influential individuals on networks could be explored, in addition to DG, BE and PR applied here. Also, other models of propagation and network topologies could be tested, as well as novel strategies taking into account community information.

Figure 1 :1Figure 1: MT (Maki and Thompson, 1973) propagation in an artificial scale-free network with N = 1000, hki = 8 and ⌘ = 1% of initial seeds selected. The color bar shows the final fraction of informed individuals. The behavioral approaches for spreader analyzed are: (a) Reactive process (RP); (b) Truncated process (TP); and (c) Contact process (CP)

Figure 2 :2Figure 2: MT-TP propagation in an artificial scale-free network with N = 1000 and hki = 8. The final fraction of informed individuals are shown in the color bar. The selection of ⌘ = 4% of seeds was made by: (a) ⌘ communities taking the individual with best PR centrality of each one; (b) uniform random selection of individuals; and (c) the ⌘ individuals with the best PR centrality of the network. Solid white lines to the left in the contour plots show the and µ combinations that achieved 35% of informed individuals. Dashed white lines show the combinations that achieved 60% of informed individuals.

(a)) improved the diffusion on the network in comparison with the Random seeds (Figure 2(b)) and Best-ranked vertices (Figure 2(c)). This result is corroborated by the increase of the white lines

Figure 3 :3Figure 3: Propagation capacity of MT-TP model in the four real networks given the selection of seeds by communities (red points), best-ranked (black points), and randomly (blue points).

Table 1 :1Topological properties and results of community detection of the networks: last column, the best modularity Q and Also, the best modularity Q value and division number of communities NC of the networks produced by the FastGreedy algorithm are reported.community division by fastgreedy algorithmNetworkNhkihgiH⇢FastGreedy Q Ncemail1133 9.62 3.60 0.45 0.01 0.4916hamsterster2000 16.1 3.58 0.48 0.02 0.4657advogato5054 15.6 3.27 0.40 -0.09 0.3449astrophysics 14845 16.1 4.79 0.38 0.23 0.63 1172average hgi, normalized entropy H, pearson cor-relation ⇢.

Table 2 :2Average of propagation capacities for the full range of ⌘ 2 [1, 50], for each network achieved by seeds: (second bigcolumn) selecting the most important individuals from communities; (third big column) selecting the best-ranked individualsof the network; and (last column) randomly selecting the initial seeds. The adopted measures were Betweenness (BE), degree(DG) and PageRank (PR) centralitiesNetworkCommunityBest rankedRandomBEDGP RBEDGP Rselectionemail0.6065 0.6105 0.6150 0.5880 0.5840 0.58840.6023hamsterster0.5485 0.5693 0.5728 0.5306 0.5226 0.52710.5273advogato0.4077 0.4102 0.4112 0.3993 0.3958 0.40070.3805astrophysics0.5417 0.5398 0.5415 0.5321 0.5278 0.53010.5337slope. However, a little decrease in the lines slopeis evidenced in the MT-TP Best case with respectto the MT-TP Random case.

Acknowledgments

This research was partially supported by National Council for Scientific and Technological Development (CNPq) grant: 140688/2013-7 and São Paulo Research Foundation (FAPESP) grant: 2011/21880-3.

Statistical mechanics of complex networks RékaAlbert Albert-LászlóBarabási Rev. Mod. Phys 74 1 2002. jan The architecture of complexity: From network structure to human dynamics A.-LBarabási IEEE Control Systems Magazine 27 4 2007 AlainBarrat MarseillemarcBarthélemy AlessandroVespignani Dynamical Processes on Complex Networks Cambridge University Press 2008 Emergence of Influential Spreaders in Modified Rumor Models JavierBorge-Holthoefer SandroMeloni BrunoGonc YamirMoreno Journal of Statistical Physics 151 1-2 2012. September The anatomy of a large-scale hypertextual web search engine SBrin LPage Computer Networks and ISDN Systems V 1998 Statistical Physics of Social Dynamics ClaudioCastellano SantoFortunato VittorioLoreto Reviews of Modern Physics 81 2 2009 may Emerging topic detection on Twitter based on temporal and social terms evaluation MarioCataldi LuigiDiCaro ClaudioSchifanella Proceedings of the Tenth International Workshop on Multimedia Data Mining -MDMKDD '10 the Tenth International Workshop on Multimedia Data Mining -MDMKDD '10

New York, New York, USA

ACM Press 2010 Efficient influence maximization in social networks WeiChen YajunWang SiyuYang Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining -KDD '09 the 15th ACM SIGKDD international conference on Knowledge discovery and data mining -KDD '09

New York, New York, USA, jun

ACM Press 2009 199 Characterization of complex networks: A survey of measurements LD FCosta FARodrigues PRTravieso Villas Boas Advances in Physics 56 2007 A set of measures of centrality based on betweenness L CFreeman Sociometry 40 1977 Self-similar community structure in a network of human interactions Guimera Danon Diaz-Guilera Giralt Arenas Physical Review E 68 2003. 2003 Global efficiency of local immunization on complex networks LaurentHébert-Dufresne AntoineAllard Jean-GabrielYoung LouisJDubé Scientific reports 3 2171 2013. January Maximizing the Spread of Influence Through a Social Network DavidKempe JonMKleinberg ÉvaTardos Lise Getoor, Ted E. Senator, Pedro Domingos, and Christos Faloutsos 2003 ACM Identification of influential spreaders in complex networks MaksimKitsak LazarosKGallos ShlomoHavlin FredrikLiljeros LevMuchnik HEugeneStanley HernánAMakse Nature Physics 6 11 2010 August KONECT -The Koblenz Network Collection JrmeKunegis Proc. Int. Web Observatory Workshop Int. Web Observatory Workshop 2013 JrmeKunegis Advogato network dataset -KONECT 2014a. October JrmeKunegis Hamsterster full network dataset -KONECT 2014b. jan Measuring node spreading power by expected cluster degree GlennLawyer 2012. September DPMaki Thompson Mathematical Models and Applications, with Emphasis on the Social, Life, and Management Sciences Prentice-Hall 1973 Modeling epidemic spreading in complex networks: Concurrency and traffic SandroMeloni AlexArenas SergioGmez JavierBorge-Holthoefer YamirMoreno editors, Handbook of Optimization in Complex Networks, Springer Optimization and Its Applications MyTThai PanosMPardalos Springer US 2012 Dynamics of rumor spreading in complex networks YamirMoreno MaziarNekovee AmalioFPacheco Physical Review E 69 6 66130 2004. jun The structure of scientific collaboration networks ME JNewman Natl. Acad. Sci

USA

2001 98 Fast algorithm for detecting community structure in networks M E JNewman Physical Review E 69 3 66133 2004 Finding community structure in networks using the eigenvectors of matrices M E JNewman Physical Review E 74 3 36104 2006 MarkNewman Networks: An Introduction

New York, NY, USA

Oxford University Press, Inc 2010 Epidemic Spreading in Scale-Free Networks RomualdoPastor -Satorras AlessandroVespignani Physical Review Letters 86 14 2001. April Spreading dynamics in complex networks SenPei Hernán Makse Journal of Statistical Mechanics: Theory and Experiment 12 P12002 2013. 2013. December Mining knowledge-sharing sites for viral marketing MatthewRichardson PedroDomingos Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining -KDD '02 the eighth ACM SIGKDD international conference on Knowledge discovery and data mining -KDD '02

New York, New York, USA, jul

ACM Press 2002