=Paper=
{{Paper
|id=Vol-1478/paper8
|storemode=property
|title=Spreader Selection by Community to Maximize Information Diffusion in Social Networks
|pdfUrl=https://ceur-ws.org/Vol-1478/paper8.pdf
|volume=Vol-1478
|dblpUrl=https://dblp.org/rec/conf/simbig/Vega-OliverosB15
}}
==Spreader Selection by Community to Maximize Information Diffusion in Social Networks==
Spreader Selection by Community to Maximize Information Diffusion in
Social Networks
Didier A. Vega-Oliveros and Lilian Berton
Department of Computer Science
ICMC, University of São Paulo
C.P. 668, CEP 13560-970, São Carlos, SP, Brazil
davo,lberton@icmc.usp.br
Abstract 2004; Barrat et al., 2008; Castellano et al., 2009).
Therefore, some individuals can have a higher in-
Rumors or information can spread quickly fluence than others according to the network struc-
and reach many users in social networks. ture. Researchers have focused on identifying the
Models for understanding, preventing or most influential vertices (Kempe et al., 2003; Kit-
increasing the diffusion of information are sak et al., 2010; Lawyer, 2012; Pei and Makse,
of greatest interest to companies, gov- 2013; Hébert-Dufresne et al., 2013) according to
ernments, scientists, etc. In this paper, topological properties. It is expected this influ-
we propose an approach for maximizing encers convince the largest number of individuals
the information diffusion by selecting the in the network. However, the selection of more
most important (central) users from com- than one of them not necessarily maximizes the
munities. We also analyze the selection expected fraction of informed individuals, com-
of the most central vertices of the net- pared to an uniformly random selection approach.
work and considered artificial and real so- In this paper, we propose an approach to max-
cial networks, such as email, hamsterster, imize the information diffusion considering the
advogato and astrophysics. Experimental community structure of the network. The com-
results confirmed the improvement of the munity symbolizes a group of individuals with a
final fraction of informed individuals by greater tendency to have more internal than ex-
applying the proposed approach. ternal connections to other groups. The reason is
that vertices belonging to the same community are
1 Introduction likely to be more similar to each other and share
The modeling of propagation or diffusion pro- similar properties and affinity. We confirmed that
cesses in social networks has recently received selecting the most influential individual from each
more attention, since it allows to understand how community as initial spreaders increases more the
a disease can be controlled or how information information diffusion than selecting the most in-
spread among individuals. These diffusion pro- fluential individuals from the whole network.
cess are generally analyzed employing complex As a motivation example, let us consider a com-
network theory (Barrat et al., 2008; Castellano et pany that wants to market a new product in the
al., 2009). The area of complex networks seeks blogosphere. The company could select three very
to study and understand the dynamics and behav- influential individuals of this social network (blog-
ior of complex systems, from the structure of the gers with thousands of access) to advertise its
network to the internal dynamics or interactions. product. However, these influencers may be popu-
Models that describe the evolution of rumors lar in the same group of people. On the other hand,
can be adapted to analyze the spread of spam on if the strategy is to identify the three main commu-
the Internet, advertising and marketing, political nities on the network and select the most influen-
ideologies or technological news between individ- tial individuals of each community, the company
uals (Castellano et al., 2009). In such cases, the would achieve a variety group of users and maxi-
representation in complex networks enables the mize the marketing diffusion.
analysis of traditional models and the heteroge- The main contribution of this paper is the in-
neous structure, which has a strong influence on formation diffusion approach based on selecting
the information diffusion process (Moreno et al., the most influential individuals from communities.
73
We employed an artificial scale-free and four real vertex i has. In the case of directed networks is the
networks: email, hamsterster, advogato and as- sum of the degrees of input (edges that reach the
trophysics. We applied the SIR model for rumor vertex) and output (edges that leave the vertex).
propagation selecting the initial seeds from the The average degree hki is the average of all ki of
whole network and from the communities. The the network. The vertices that have a very high
impact that the Truncate (TP), Contact (CP) or degree in the network are called hubs.
Reactive (RP) processes have in the information The degree distribution of a network P (k) is the
diffusion was analyzed. The experimental results probability of randomly select a vertex with degree
showed that the selection of individuals from the k. Social networks present scale-free degree dis-
communities as initial spreaders, the final fraction tribution (Barabási, 2007; Newman, 2010), with
of informed individuals is improved. P (k) ⇠ k , in which most of the individuals
The remainder of the paper is organized as fol- have low degree, near to the average, and only a
lows: Section 2 introduces some definitions and few of them have very high degree (hubs). The
measures covered in this paper, the community de- level of disorder or heterogeneity in vertices con-
tection algorithm applied and the propagation pro- nections is obtained with the entropy of degree dis-
cess in networks. Section 3 brings some related tribution. We employed the normalized version of
work. Section 4 presents the proposed approach the Shannon entropy, i.e.
for information diffusion based on communities.
P1
Section 5 exhibits the experimental results on an k=0 P (k) log(P (k))
artificial scale-free and four real social networks. H̃ = , (1)
log(N )
Finally, Section 6 discusses the final remarks.
with 0 H̃ 1. The maximum value for the
2 Theoretical background
entropy occurs when P (k) shows a uniform distri-
A network is a collection of items called nodes or bution and the lowest possible value happens when
vertices, which are joined together by connections all vertices have the same degree. The entropy of a
called links or edges. Formally we define the net- network is related to the robustness and their level
work G = (V, E, W ), where V = {v1 , v2 , . . . vn } of resilience.
is the set of N vertices, E = {e1 , e2 , . . . em } is the The robustness is also related to the correlation
set of M edges and W = {w1 , w2 , . . . wm } are the degree of the network. A network is assortative,
weights of the edges that determine the strength of or positive correlated, if vertices tend to connect
the interaction between the respective vertices, in with vertices with similar degree. A Network is
the case of weighted networks. In mathematical dissassortative, or negative correlated, if vertices
terms, an undirected and unweighted network can with low degree tend to connect with higher con-
be represented by an adjacency matrix A, in which nected vertices (hubs). When networks do not
two connected vertices i and j are in the matrix present any of above patterns, they are called as
aij = aji = 1, otherwise, they are equal to 0. non-assortative. For the calculation of the network
A path is a consecutive sequence that starts at correlation we employed the Pearson coefficient,
vertex i and ends in j, so that vertices are vis- formulated with adjacency matrix as
ited more than once. The distance or length of the
P
path is defined as the number of edges contained in (1/M ) j>i ki kj aij
h P i2
the sequence. The shortest distance between two (k +k )a
(1/M ) j>i i 2j ij
vertices is known as the shortest path or geodesic ⇢= (2)
path gi,j . A component is the largest sub-set of P (ki 2 +kj 2 )aij
(1/M ) j>i 2
vertices from the network in which exist at least h P i
(ki +kj )aij 2
one path between each pair of vertices, but never (1/M ) j>i 2
connect to another component. A connected net-
work has only one component. When the networks where M is the total number of edges in the net-
have more than one component, we considered the work. If r > 0, then the network is assortative.
largest of them. If r < 0 the network is dissassortative. If r = 0,
The degree or connectivity of vertex i, called then there is no correlation between the degree of
as ki , is the number of edges or connections that vertices.
74
2.1 Centrality measures properties and perform similar roles. Therefore,
In complex and social networks have been pro- the division of a network into communities helps
posed measures to describe the importance or cen- to understand their topological structure (struc-
trality of vertices (Costa et al., 2007) according to tural and functional properties) and its dynamic
topological and dynamical properties. The cen- processes, obtaining relevant information and fea-
tralities adopted in this work are briefly described tures to the network domain.
as follow. We can evaluate a partition based on the scores
Degree centrality (DG) is related with the num- obtained from a quality measure. The goal is to
ber of connections or popularity of a vertex (Costa evaluate expected features in a good community
et al., 2007) and in terms of the adjacency matrix division. One of the most popular quality mea-
is expressed as sures is the modularity Q (Newman, 2004). It
compares the current density of intra-community
X
ki = aij . (3) and inter-community edges relative to a random
i2N network with similar characteristics. It is based on
the fact that random networks have no community
Betweenness centrality (BE) quantifies the structure.
number of shortest paths that pass through a vertex Given a network with c communities, the Q
j between all pair of different vertices (i, k) (Free- modularity is calculated by a symmetric matrix
man, 1977). It expresses how much a vertex Bj N ⇥ N , in which elements along the main diag-
works as bridge or is a trusted vertex in the trans- onal eii represent connections into the same com-
mission of information, i.e. munity and elements eij represent connections be-
X tween different communities i and j. Equation 6
ik (j)
Bj = , (4) shows the formulation of Q.
ik
i,k2V,i6=k
2 0 12 3
where ik is the total number of different shortest X X
Q= 4eij @ eij A 5 (6)
path between i and k, and ik (j) is the number of i j
times j appears in those paths.
PageRank centrality (PR) expresses the impor- If a specific division provides less edges be-
tance of a vertex according to the probability that tween communities than would be expected by
other vertices have to arrive at it, after a large num- random connections, the value of Q would be 0.
ber of steps (Brin and Page, 1998). The idea is to When the network has isolated communities the
simulate the surfing on the net. The user can fol- value of Q would be 1. This measure is employed
low the links available at the current page or jump by several techniques to identify communities in
to other by typing a new URL. In social terms, it networks systems, especially in divisive and ag-
can be approached like the more cited or renowned glomerative methods (Guimera et al., 2003; New-
individuals. The formalization of the PageRank man, 2004; Newman, 2006).
centrality is Newman (Newman, 2004) proposes an agglom-
~⇡ t = ~⇡ t 1 G , (5) erative method that is an optimized greedy al-
gorithm, called fastgreedy. The approach starts
where ~⇡ t are the PageRank values for each vertex with a copy of a real network of N vertices with
in the tth step of navigation and G is known as no connections, producing at first N communi-
the Google matrix. When t = 0 we have by de- ties. At each iteration, two communities ci and cj ,
fault ~⇡ 0 = {1, . . . , 1}. The jumps are represented which have connections in real network, are cho-
by a probability ↵ and we adopted the same value sen in order to obtain the greatest improvement of
as defined in the original version (Brin and Page, Q (Equation 7). A pruning is performed in the
1998). search space considering only the edges that exist
between communities. Therefore, execution time
2.2 Community detection
is reduced when considering the new Q function
Communities are sets of densely interconnected (Equation 7).
vertices and sparsely connected with the rest of the P P
✓ ◆
network (Newman, 2010). Vertices that belong to j eij i eij
the same community, in general, share common Qij = 2 eij (7)
2M
75
The result can also be represented as a den- all their spreader and stifler neighbors.
drogram. Cuts at different levels of the dendro-
• Truncated Process (TP): It consists of trun-
gram produce divisions with greater or lesser num-
cate or interrupt the contagion in the precise
ber of communities, and the best cut yields the
time. In each iteration and for each spreader,
largest value of Q. The algorithm at each step has
it is randomly selected one neighbor at time,
O(M + N ). Since there are at most N 1 join
and setting up the states of the contact as cor-
operations required to build a complete dendro-
responds. The selection continues until the
gram, their overall complexity is O((M + N )N ),
spreader visit all its neighbors or it becomes
or O(N 2 ), for sparse graph. Consequently, by
stifler, whichever occurs first.
adopting this method is more treatable the analysis
of communities in larger networks. • Contact Process (CP): In each time step and
for each spreader, it is chosen at random a
2.3 Propagation process on networks
single neighbor. Then, it is resolved the tran-
In classical rumor diffusion models the ignorant sition states according to the rule that corre-
or inactive individuals (S) are those who remain sponds. After that, continues with the next
unaware of the information, the spreaders (I) are spreader of the network of the same time step.
those who disseminate the information and the sti-
Different theoretical models have been pro-
fler (R) are those who know the information but
posed for modeling the rumor dynamics on net-
lose the interest in spreading it. All vertices have
works (Moreno et al., 2004; Barrat et al., 2008;
the same probability for transmit the information
Castellano et al., 2009; Borge-Holthoefer et al.,
to their neighbors and µ for stopping to be active.
2012). These analytical models make assump-
The Maki-Thompson (MT) (Maki and Thomp-
tions about the network structure, such as the de-
son, 1973) model is a spreader-centric approach
gree correlation or distribution, compartments or
employed for describing the propagation of ideas
class of vertices with same probabilities, homoge-
and rumors on networks. In the MT process when-
neous/heterogeneous mixing or mean field theory.
ever an active spreader i contacts a vertex j that is
Notwithstanding all of them claim that their nu-
inactive, the latter will become active with a fixed
merical solutions agree with the MC simulations,
probability . Otherwise, in the case that j knows
so we adopt this approach as an exploratory re-
about the rumor, it means j is an active spreader or
search.
a stifler, the vertex i turns into a stifler with proba-
bility µ. The behavior when the spreader stops to 3 Related work
propagate can be understood because the informa-
tion is too much known (contacting spreaders) or Many approaches have been developed in order
without novelty (contacting stifler). to understand the propagation of ideas or infor-
Three possible choices related with the spreader mation through social networks (Castellano et al.,
behavior during the diffusion have been re- 2009). Specially, characterize the individuals that
ported (Borge-Holthoefer et al., 2012; Meloni et are most influential in the propagation process has
al., 2012). They are the Reactive process (RP), attracted the attention of researchers (Richardson
Truncated process (TP) and Contact Process (CP). and Domingos, 2002; Kempe et al., 2003; Kitsak
However, a clear analysis about the impact of et al., 2010; Pei and Makse, 2013).
spreaders behavior in the propagation process has The conventional approach for describing the
not been tackled yet. Moreover, there is not a con- most influential vertices is performing a micro-
sensus about what to employ in rumor or informa- scopic analysis on the network. Vertices are
tion diffusion and it may cause a misinterpretation classified considering their topological properties,
of results. The three main characteristic behaviors sorted and ranked in order to generalize their
reported to spreaders are described as follow. ability to propagate (Kitsak et al., 2010; Hébert-
Dufresne et al., 2013; Pei and Makse, 2013).
• Reactive Process (RP): In each iteration, the However, to find the set of initial vertices that
spreaders try to pass the rumor among all maximize the propagation capacity, the selection
their ignorant neighbors. After that, it evalu- of the most influential spreader may produce an
ates whether it will become stifler in the next overlap of influence in the population (Kitsak et
iteration or not, considering the contact with al., 2010; Pei and Makse, 2013).
76
In terms of topological properties, there not spreaders ( (t)) and stifler ('(t)) over time is cal-
1 PN
exists a consensus about what is the more ac- culated as (t) = N i=1 S i (t), that is similar to
curate measure that describes the most influen- the other states and always fulfill (t) + (t) +
tial vertices. Some researches claim that hubs '(t) = 1. We assume that infection and recov-
are more representative to influence others ver- ering do not occur during the same discrete time
tices (Pastor-Satorras and Vespignani, 2001; Al- window or step.
bert and Barabási, 2002). Vertices with higher
degree are more efficient to maximize the prop- 4.1 Setup
agation because, in general, hubs not tend to con- The initial setup for the propagation is (0) =
nect with each other and thus can achieve a greater 1 ⌘/N , (0) = ⌘/N and '(0) = 0, where ⌘
number of vertices (Kitsak et al., 2010). In the represents the seeds or number of initial spread-
case of communities, the degree proportion of a ers. Each simulation begins with a selection of
vertex i is defined as the number of edges that i ⌘ vertices. At each time step, all spreaders uni-
has in each community. This degree proportion formly select and try to infect its neighbors with
was found as a good descriptor of influence for probability , or stop the diffusion with probabil-
communities (Lawyer, 2012). ity µ according to the spreader behavior adopted.
On the other hand, the most influential vertices Successful change of state (to be spreader or to be
are described as those with the largest Between- stifler) are effective at the next iteration. The sim-
ness centrality (Hébert-Dufresne et al., 2013), be- ulations run until the end of the process is reached,
cause they intermediate the communication be- when (1) = 0.
tween groups of vertices, which increase their in-
4.2 Community selection approach
fluence. According to the authors, Betweenness
centrality is a better descriptor of the most influ- We propose to select the initial spreaders from the
ential spreader in communities. community division of the network. The multiple
The PageRank is also considered a better seeds are the most central vertices of each commu-
measure to describe the most influential ver- nity. The community division may be calculated
tices (Cataldi et al., 2010). The reason is that it by some divisive or agglomerative method (Sec-
employs the random walk concept over the net- tion 2.2) and here the fastgreedy algorithm was
work to be calculated and vertices with higher val- employed. The method is detailed as follow:
ues mean higher probability to be visited. First, given a required number of ⌘ initial
Finally, Kempe et al. (2003) propose a greedy spreaders, we find the ⌘ main communities of the
algorithm to obtain ⌘ initial spreaders that maxi- network by the fastgreedy algorithm. Then, each
mizes the diffusion influence. The authors adopt a community is isolated, which produces ⌘ com-
discrete optimization approach and prove that the ponents. The isolation process consists in main-
optimization problem is NP-hard. It was imple- taining the intra-community edges and erasing the
mented considering the independent and weighted inter-community connections. For each isolated
cascade model that have only two states, which are community, a specific centrality measure is cal-
different to the SIR model. The method evaluates culated to all vertices. Since vertices with higher
one vertex at time to be added in the set of selected centrality are considered more suitable to influ-
seeds. The new vertex is accepted if it is what most ence on the network, we select the most impor-
increment the diffusion. However, this approach tant vertex from each community. Therefore, these
has a very higher computational cost problem, al- vertices influence more in their own community
though new researches try to optimize the perfor- and the overlap of influence in the population is
mance (Chen et al., 2009). minimized. At the end, ⌘ seeds are selected and
they have the best centrality value of its commu-
4 Information diffusion by communities nity. We take the original full network, the ⌘ seeds,
the parameters and execute the corresponding sim-
Let us consider a constant population of N ver- ulations.
tices in all time steps. Each vertex can be For the centrality measure, the point is to
only in one state, that is I i (t) = 1 iff i 2 find what centrality better identifies the influential
I, otherwise I i (t) = 0, and S i (t)+I i (t)+Ri (i) = spreaders, by communities and in the whole net-
1. The macroscopic fraction of ignorant ( (t)), work, that maximizes the information diffusion.
77
(a) MT-RP (b) MT-TP (c) MT-CP
Figure 1: MT (Maki and Thompson, 1973) propagation in an artificial scale-free network with N = 1000, hki = 8 and
⌘ = 1% of initial seeds selected. The color bar shows the final fraction of informed individuals. The behavioral approaches for
spreader analyzed are: (a) Reactive process (RP); (b) Truncated process (TP); and (c) Contact process (CP)
Here, the degree (DG), PageRank (PR) and Be- favors a viral diffusion on the network with lower
tweenness (BE) centralities were considered. values of and it happens independently of which
are the initial seeds. For this reason, RP is a more
5 Experimental results suitable approach to simulate broadcasting propa-
gation.
In this section we analyzed the information dif-
On the other hand, the TP behavior is more re-
fusion in an artificial scale-free and four real so-
lated to the contact network scenario, where the
cial networks. We evaluated the impact spread-
position and topological characteristics of seeds
ers behavior have in the diffusion on the networks.
may have influence in the diffusion. Moreover, TP
Then, the results about the selection of initial
presents more balanced results, near 60% when
spreaders by communities, best-ranked vertices of
⇡ µ, and contacts are not as restricted as CP
the network and random seeds were explored.
behavior. For this reason, we adopted hereafter
5.1 Spreader behavior analysis the MT-TP approach as the propagation process
for the analysis.
We analyzed the three behavioral approaches for
the spreaders and present the impact they produce 5.2 Multiple initial spreader analysis
on the propagation process. We considered the The experiments were performed with three pos-
MT model with an artificial scale-free network of sibilities for choosing the initial spreader: (i) by
size N = 1000 and hki = 8. In order to under- randomly selecting ⌘ individuals as initial seeds
stand the overall spectral effect with the parame- in the network; (ii) by selecting the best- ranked ⌘
ters, the simulations were evaluating a range of individuals with highest value of a specific central-
and µ in (0, 1]. Therefore, the differences between ity of the network; and finally, (iii) by detecting ⌘
the approaches are evidenced. For each tuple of communities on the network and for each isolated
values ( , µ), it was selected 100 times at random community selecting the individual with highest
⌘ = 1% of initial spreader (seeds) and each time value of a specific centrality measure. The cen-
was an average over 50 executions. trality measures selected were degree (DG), Be-
The impact of the behavioral approach in the fi- tweenness (BE) and Pagerank (PR).
nal fraction of informed individuals is shown in
Figure 1. We observed that the CP approach is 5.2.1 Real social networks
less redundant in the number of contacts made by We adopted the email (Guimera et al., 2003), ad-
spreader, producing lower fractions of informed vogato (Kunegis, 2014a), astrophysics (Newman,
individuals, in comparison to the other behaviors. 2001) and hamsterster (Kunegis, 2013; Kunegis,
Still, because the single contact made by iteration, 2014b). All of them were assumed as undirected
the CP behavior is more similar to a propagation and unweighted networks and also it was consid-
through the “word-of-mouth” situation. ered the largest component for the simulations.
The RP approach obtained more than 80% of The structural properties of the networks are sum-
informed individuals with values of 0.3, no marized in Table 1, with the respective number of
matter values of µ. Therefore, the RP approach vertices N , the average degree hki, shortest paths
78
(a) MT-TP Community (b) MT-TP Random (c) MT-TP Best
Figure 2: MT-TP propagation in an artificial scale-free network with N = 1000 and hki = 8. The final fraction of informed
individuals are shown in the color bar. The selection of ⌘ = 4% of seeds was made by: (a) ⌘ communities taking the individual
with best PR centrality of each one; (b) uniform random selection of individuals; and (c) the ⌘ individuals with the best PR
centrality of the network. Solid white lines to the left in the contour plots show the and µ combinations that achieved 35% of
informed individuals. Dashed white lines show the combinations that achieved 60% of informed individuals.
Table 1: Topological properties and results of community detection of the networks: last column, the best modularity Q and
community division by fastgreedy algorithm
Network N hki hgi H̃ ⇢ FastGreedy
Q Nc
email 1133 9.62 3.60 0.45 0.01 0.49 16
hamsterster 2000 16.1 3.58 0.48 0.02 0.46 57
advogato 5054 15.6 3.27 0.40 -0.09 0.34 49
astrophysics 14845 16.1 4.79 0.38 0.23 0.63 1172
average hgi, normalized entropy H̃, pearson cor- 5.2.2 Information diffusion results
relation ⇢. Also, the best modularity Q value and The final fraction of informed individuals ('(1))
division number of communities N C of the net- was averaged over 100 executions for each combi-
works produced by the FastGreedy algorithm are nation of initial seeds and parameters. This aver-
reported. age represents the propagation capacity achieved
email represents a social network of informa- by the selected seeds.
tion exchanged by emails between members of the We evaluated the relation between the param-
Rovira i Virgili University, Tarragona, with largest eters and the selection of the initial spreaders in
hub degree equal to 71. an artificial network. In this experiment the PR
was defined as the centrality measure employed to
hamsterster is an undirected and unweighted find the seeds in the communities and the whole
network based on the website data HAMSTER - network. A value of ⌘ = 4% of initial spread-
STER . COM . The edges represent a relationship of ers was adopted for a scale-free network of size
family or friend among users. The largest hub has N = 1000, hki = 8, hgi = 3.19, H̃ = 0.33 and
degree equal to 273. ⇢ = 0.04.
The propagation capacity '(1) was affected
advogato is an online community platform for
according to the initial seeds (Figure 2). The solid
developers of free software launched in 1999. Ver-
and dashed white curves represent the combina-
tices are users of advogato, the directed edges rep-
tion of and µ parameters that obtained 35% and
resent trust relationships. The largest hub has de-
60% of informed individuals respectively. We ob-
gree equal to 807.
served that these curves show a well defined linear
Finally, astrophysics is a collaborative net- pattern, which means any proportion of = /µ
work between scientists on previous studies of will obtain equivalent '(1) results.
astrophysics reported in arXiv during January 1, The selection of seeds by communities (Figure
1995 until December 31, 1999. The network is 2(a)) improved the diffusion on the network in
weighted and directed and originally it has 16707 comparison with the Random seeds (Figure 2(b))
vertices. The largest hub of the main component and Best-ranked vertices (Figure 2(c)). This result
has 360 connections. is corroborated by the increase of the white lines
79
0.62
BST−BE
BST−DG
BST−PR 0.6
0.62 COM−BE
COM−DG
COM−PR 0.58
RANDOM
% of informed individuals
% of informed individuals
0.61
0.56
0.6 0.54
0.52
0.59
BST−BE
0.5 BST−DG
BST−PR
0.58 COM−BE
0.48 COM−DG
COM−PR
RANDOM
0.57 0.46
0 10 20 30 40 50 0 10 20 30 40 50
seeds seeds
(a) email (b) hamsterster
0.43 0.56
0.42
0.55
0.41
% of informed individuals
% of informed individuals
0.4
0.54
0.39
0.38 0.53
0.37
0.52
0.36 BST−BE BST−BE
BST−DG BST−DG
0.35 BST−PR BST−PR
COM−BE 0.51 COM−BE
COM−DG COM−DG
0.34 COM−PR COM−PR
RANDOM RANDOM
0.33 0.5
0 10 20 30 40 50 0 10 20 30 40 50
seeds seeds
(c) advogato (d) astrophysics
Figure 3: Propagation capacity of MT-TP model in the four real networks given the selection of seeds by communities (red
points), best-ranked (black points), and randomly (blue points).
Table 2: Average of propagation capacities for the full range of ⌘ 2 [1, 50], for each network achieved by seeds: (second big
column) selecting the most important individuals from communities; (third big column) selecting the best-ranked individuals
of the network; and (last column) randomly selecting the initial seeds. The adopted measures were Betweenness (BE), degree
(DG) and PageRank (PR) centralities
Network Community Best ranked Random
BE DG PR BE DG PR selection
email 0.6065 0.6105 0.6150 0.5880 0.5840 0.5884 0.6023
hamsterster 0.5485 0.5693 0.5728 0.5306 0.5226 0.5271 0.5273
advogato 0.4077 0.4102 0.4112 0.3993 0.3958 0.4007 0.3805
astrophysics 0.5417 0.5398 0.5415 0.5321 0.5278 0.5301 0.5337
slope. However, a little decrease in the lines slope seeds reached a higher propagation capacity than
is evidenced in the MT-TP Best case with respect the selection of best-ranked vertices. For a larger
to the MT-TP Random case. number of initial spreaders, '(1) tend to fall
when the best-ranked vertices are selected.
Consequently, we sought to analyze the impact
On the other hand, when the community de-
of ⌘ and centrality measures in the selection of
tection was performed and individuals with high-
seeds in the diffusion process. We varied the num-
est values of DG, BE or PR in each community
ber of communities and seeds from 2 to 50 and
(red points, COM-**) were selected, the propa-
fixed = 0.3 and µ = 0.2 for all simulations.
gation capacity was improved and achieved the
The real social networks described and the MT-TP
best results. Therefore, more individuals were in-
propagation model were considered in the anal-
formed in the network by the community selec-
ysis (Figure 3). The random selection of initial
tion, with the same propagation constraint (num-
spreader (blue points, RANDOM) or best-ranked
ber of seeds).
vertices (black points, BST-**) of DG, BE or PR
centrality, produced a constant propagation capac- In terms of the topological measures, we ob-
ity ('(1)). In some case, random selection of served that vertices with highest PageRank cen-
80
trality in the communities (COM-PR) obtained 7 Acknowledgments
in average the best propagation results (Table 2).
This research was partially supported by Na-
Even in the selection of the best-ranked vertices,
tional Council for Scientific and Technological
the PageRank was notable. Another important
Development (CNPq) grant: 140688/2013-7 and
point is that often, the uniformly random selec-
São Paulo Research Foundation (FAPESP) grant:
tion of initial spreader could be a better option
2011/21880-3.
than select the most central vertices (best-ranked)
of the network. This is contrary what is cur-
rently expected and adopted in marketing cam- References
paigns, for instances. For all networks and for all Réka Albert and Albert-László Barabási. 2002. Statisti-
size of seeds, we evidenced that starts the diffu- cal mechanics of complex networks. Rev. Mod. Phys.,
74(1):47–97, jan.
sion from the best-ranked vertices produces lower
influence, or final fraction of informed individu- A.-L. Barabási. 2007. The architecture of complexity: From
network structure to human dynamics. IEEE Control Sys-
als, than purely select vertices at random; in some tems Magazine, 27(4):33–42.
cases, the best-ranked selection achieved the worst
Alain Barrat, MarseilleMarc Barthélemy, and Alessandro
results. However, the selection of initial spread- Vespignani. 2008. Dynamical Processes on Complex Net-
ers by communities showed, independently of the works. Cambridge University Press.
centrality measure, higher results.
Javier Borge-Holthoefer, Sandro Meloni, Bruno Gonçalves,
and Yamir Moreno. 2012. Emergence of Influential
6 Final remarks Spreaders in Modified Rumor Models. Journal of Statis-
tical Physics, 151(1-2):383–393, September.
In this work, we proposed a method for maximiz- S. Brin and L. Page. 1998. The anatomy of a large-scale
ing the information diffusion on networks. First, hypertextual web search engine. Computer Networks and
we analyzed the impact of the spreader behavior ISDN Systems, V:107–117.
in the propagation and confirmed that the Trun- Claudio Castellano, Santo Fortunato, and Vittorio Loreto.
cate Process (TP) is more suitable to simulate in- 2009. Statistical Physics of Social Dynamics. Reviews
of Modern Physics, 81(2):591–646, may.
formation diffusion on networks. We applied com-
munity detection and targeted the most influential Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. 2010.
Emerging topic detection on Twitter based on tempo-
vertices from these communities as initial seeds. ral and social terms evaluation. In Proceedings of the
Experimental results on an artificial scale-free and Tenth International Workshop on Multimedia Data Min-
four real social networks confirmed the increase in ing - MDMKDD ’10, pages 1–10, New York, New York,
USA, jul. ACM Press.
the final fraction of informed individuals. More-
over, it was found that the PageRank centrality in Wei Chen, Yajun Wang, and Siyu Yang. 2009. Efficient
influence maximization in social networks. In Proceed-
communities was a better choice in terms of effi- ings of the 15th ACM SIGKDD international conference
ciency and influence maximization. on Knowledge discovery and data mining - KDD ’09, page
A brief overview about complex network mea- 199, New York, New York, USA, jun. ACM Press.
sures, community detection and information prop- L. D. F. Costa, F. A. Rodrigues, G Travieso, and P. R. Villas
agation was introduced. We present our proposal Boas. 2007. Characterization of complex networks: A
survey of measurements. Advances in Physics, 56:167–
to select initial spreaders by communities. There 242.
is still an open problem related to an exact defini-
L C Freeman. 1977. A set of measures of centrality based on
tion of what is considered a community and what betweenness. Sociometry, 40:35–41.
would be the ideal division. Nevertheless, we var-
ied the number of communities from 2 to 50 and R Guimera, L Danon, A Diaz-Guilera, F Giralt, and A Are-
nas. 2003. Self-similar community structure in a network
in general (for every community division) our pro- of human interactions. Physical Review E, 68:2003.
posal achieved better results versus propagation
Laurent Hébert-Dufresne, Antoine Allard, Jean-Gabriel
without considering the community structure. Young, and Louis J Dubé. 2013. Global efficiency of
In future work, other measures for selecting local immunization on complex networks. Scientific re-
ports, 3:2171, January.
influential individuals on networks could be ex-
plored, in addition to DG, BE and PR applied here. David Kempe, Jon M. Kleinberg, and Éva Tardos. 2003.
Also, other models of propagation and network Maximizing the Spread of Influence Through a Social
Network. In Lise Getoor, Ted E. Senator, Pedro Domin-
topologies could be tested, as well as novel strate- gos, and Christos Faloutsos, editors, KDD, pages 137–
gies taking into account community information. 146. ACM.
81
Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik M. E. J. Newman. 2001. The structure of scientific collab-
Liljeros, Lev Muchnik, H. Eugene Stanley, and Hernán A. oration networks. In Natl. Acad. Sci. USA, number 98,
Makse. 2010. Identification of influential spreaders in pages 404 – 409.
complex networks. Nature Physics, 6(11):888–893, Au-
gust. M E J Newman. 2004. Fast algorithm for detecting
community structure in networks. Physical Review E,
Jrme Kunegis. 2013. KONECT – The Koblenz Network Col- 69(3):66133.
lection. In Proc. Int. Web Observatory Workshop, pages
1343–1350. M E J Newman. 2006. Finding community structure in net-
works using the eigenvectors of matrices. Physical Review
Jrme Kunegis. 2014a. Advogato network dataset – E, 74(3):36104.
KONECT, October.
Jrme Kunegis. 2014b. Hamsterster full network dataset – Mark Newman. 2010. Networks: An Introduction. Oxford
KONECT, jan. University Press, Inc., New York, NY, USA.
Glenn Lawyer. 2012. Measuring node spreading power by Romualdo Pastor-Satorras and Alessandro Vespignani. 2001.
expected cluster degree. page 4, September. Epidemic Spreading in Scale-Free Networks. Physical
Review Letters, 86(14):3200–3203, April.
D. P. Maki and M Thompson. 1973. Mathematical Models
and Applications, with Emphasis on the Social, Life, and Sen Pei and Hernán A Makse. 2013. Spreading dynamics
Management Sciences. Prentice-Hall. in complex networks. Journal of Statistical Mechanics:
Theory and Experiment, 2013(12):P12002, December.
Sandro Meloni, Alex Arenas, Sergio Gmez, Javier Borge-
Holthoefer, and Yamir Moreno. 2012. Modeling epi- Matthew Richardson and Pedro Domingos. 2002. Mining
demic spreading in complex networks: Concurrency and knowledge-sharing sites for viral marketing. In Proceed-
traffic. In My T. Thai and Panos M. Pardalos, edi- ings of the eighth ACM SIGKDD international confer-
tors, Handbook of Optimization in Complex Networks, ence on Knowledge discovery and data mining - KDD ’02,
Springer Optimization and Its Applications, pages 435– page 61, New York, New York, USA, jul. ACM Press.
462. Springer US.
Yamir Moreno, Maziar Nekovee, and Amalio F. Pacheco.
2004. Dynamics of rumor spreading in complex networks.
Physical Review E, 69(6):066130, jun.
82