Spreader Selection by Community to Maximize Information Diffusion in
                          Social Networks
                          Didier A. Vega-Oliveros and Lilian Berton
                                Department of Computer Science
                                  ICMC, University of São Paulo
                         C.P. 668, CEP 13560-970, São Carlos, SP, Brazil
                               davo,lberton@icmc.usp.br


                    Abstract                              2004; Barrat et al., 2008; Castellano et al., 2009).
                                                          Therefore, some individuals can have a higher in-
    Rumors or information can spread quickly              fluence than others according to the network struc-
    and reach many users in social networks.              ture. Researchers have focused on identifying the
    Models for understanding, preventing or               most influential vertices (Kempe et al., 2003; Kit-
    increasing the diffusion of information are           sak et al., 2010; Lawyer, 2012; Pei and Makse,
    of greatest interest to companies, gov-               2013; Hébert-Dufresne et al., 2013) according to
    ernments, scientists, etc. In this paper,             topological properties. It is expected this influ-
    we propose an approach for maximizing                 encers convince the largest number of individuals
    the information diffusion by selecting the            in the network. However, the selection of more
    most important (central) users from com-              than one of them not necessarily maximizes the
    munities. We also analyze the selection               expected fraction of informed individuals, com-
    of the most central vertices of the net-              pared to an uniformly random selection approach.
    work and considered artificial and real so-               In this paper, we propose an approach to max-
    cial networks, such as email, hamsterster,            imize the information diffusion considering the
    advogato and astrophysics. Experimental               community structure of the network. The com-
    results confirmed the improvement of the              munity symbolizes a group of individuals with a
    final fraction of informed individuals by             greater tendency to have more internal than ex-
    applying the proposed approach.                       ternal connections to other groups. The reason is
                                                          that vertices belonging to the same community are
1   Introduction                                          likely to be more similar to each other and share
The modeling of propagation or diffusion pro-             similar properties and affinity. We confirmed that
cesses in social networks has recently received           selecting the most influential individual from each
more attention, since it allows to understand how         community as initial spreaders increases more the
a disease can be controlled or how information            information diffusion than selecting the most in-
spread among individuals. These diffusion pro-            fluential individuals from the whole network.
cess are generally analyzed employing complex                 As a motivation example, let us consider a com-
network theory (Barrat et al., 2008; Castellano et        pany that wants to market a new product in the
al., 2009). The area of complex networks seeks            blogosphere. The company could select three very
to study and understand the dynamics and behav-           influential individuals of this social network (blog-
ior of complex systems, from the structure of the         gers with thousands of access) to advertise its
network to the internal dynamics or interactions.         product. However, these influencers may be popu-
   Models that describe the evolution of rumors           lar in the same group of people. On the other hand,
can be adapted to analyze the spread of spam on           if the strategy is to identify the three main commu-
the Internet, advertising and marketing, political        nities on the network and select the most influen-
ideologies or technological news between individ-         tial individuals of each community, the company
uals (Castellano et al., 2009). In such cases, the        would achieve a variety group of users and maxi-
representation in complex networks enables the            mize the marketing diffusion.
analysis of traditional models and the heteroge-              The main contribution of this paper is the in-
neous structure, which has a strong influence on          formation diffusion approach based on selecting
the information diffusion process (Moreno et al.,         the most influential individuals from communities.


                                                     73
We employed an artificial scale-free and four real                vertex i has. In the case of directed networks is the
networks: email, hamsterster, advogato and as-                    sum of the degrees of input (edges that reach the
trophysics. We applied the SIR model for rumor                    vertex) and output (edges that leave the vertex).
propagation selecting the initial seeds from the                  The average degree hki is the average of all ki of
whole network and from the communities. The                       the network. The vertices that have a very high
impact that the Truncate (TP), Contact (CP) or                    degree in the network are called hubs.
Reactive (RP) processes have in the information                      The degree distribution of a network P (k) is the
diffusion was analyzed. The experimental results                  probability of randomly select a vertex with degree
showed that the selection of individuals from the                 k. Social networks present scale-free degree dis-
communities as initial spreaders, the final fraction              tribution (Barabási, 2007; Newman, 2010), with
of informed individuals is improved.                              P (k) ⇠ k , in which most of the individuals
   The remainder of the paper is organized as fol-                have low degree, near to the average, and only a
lows: Section 2 introduces some definitions and                   few of them have very high degree (hubs). The
measures covered in this paper, the community de-                 level of disorder or heterogeneity in vertices con-
tection algorithm applied and the propagation pro-                nections is obtained with the entropy of degree dis-
cess in networks. Section 3 brings some related                   tribution. We employed the normalized version of
work. Section 4 presents the proposed approach                    the Shannon entropy, i.e.
for information diffusion based on communities.
                                                                                   P1
Section 5 exhibits the experimental results on an                                     k=0 P (k) log(P (k))
artificial scale-free and four real social networks.                       H̃ =                              ,     (1)
                                                                                           log(N )
Finally, Section 6 discusses the final remarks.
                                                                  with 0  H̃  1. The maximum value for the
2   Theoretical background
                                                                  entropy occurs when P (k) shows a uniform distri-
A network is a collection of items called nodes or                bution and the lowest possible value happens when
vertices, which are joined together by connections                all vertices have the same degree. The entropy of a
called links or edges. Formally we define the net-                network is related to the robustness and their level
work G = (V, E, W ), where V = {v1 , v2 , . . . vn }              of resilience.
is the set of N vertices, E = {e1 , e2 , . . . em } is the           The robustness is also related to the correlation
set of M edges and W = {w1 , w2 , . . . wm } are the              degree of the network. A network is assortative,
weights of the edges that determine the strength of               or positive correlated, if vertices tend to connect
the interaction between the respective vertices, in               with vertices with similar degree. A Network is
the case of weighted networks. In mathematical                    dissassortative, or negative correlated, if vertices
terms, an undirected and unweighted network can                   with low degree tend to connect with higher con-
be represented by an adjacency matrix A, in which                 nected vertices (hubs). When networks do not
two connected vertices i and j are in the matrix                  present any of above patterns, they are called as
aij = aji = 1, otherwise, they are equal to 0.                    non-assortative. For the calculation of the network
   A path is a consecutive sequence that starts at                correlation we employed the Pearson coefficient,
vertex i and ends in j, so that vertices are vis-                 formulated with adjacency matrix as
ited more than once. The distance or length of the
                                                                                   P
path is defined as the number of edges contained in                         (1/M ) j>i ki kj aij
                                                                              h       P                  i2
the sequence. The shortest distance between two                                             (k +k )a
                                                                                (1/M ) j>i i 2j ij
vertices is known as the shortest path or geodesic                       ⇢=                                        (2)
path gi,j . A component is the largest sub-set of                                  P    (ki 2 +kj 2 )aij
                                                                            (1/M ) j>i         2
vertices from the network in which exist at least                             h       P                  i
                                                                                            (ki +kj )aij 2
one path between each pair of vertices, but never                               (1/M ) j>i        2
connect to another component. A connected net-
work has only one component. When the networks                    where M is the total number of edges in the net-
have more than one component, we considered the                   work. If r > 0, then the network is assortative.
largest of them.                                                  If r < 0 the network is dissassortative. If r = 0,
   The degree or connectivity of vertex i, called                 then there is no correlation between the degree of
as ki , is the number of edges or connections that                vertices.


                                                             74
2.1   Centrality measures                                       properties and perform similar roles. Therefore,
In complex and social networks have been pro-                   the division of a network into communities helps
posed measures to describe the importance or cen-               to understand their topological structure (struc-
trality of vertices (Costa et al., 2007) according to           tural and functional properties) and its dynamic
topological and dynamical properties. The cen-                  processes, obtaining relevant information and fea-
tralities adopted in this work are briefly described            tures to the network domain.
as follow.                                                         We can evaluate a partition based on the scores
   Degree centrality (DG) is related with the num-              obtained from a quality measure. The goal is to
ber of connections or popularity of a vertex (Costa             evaluate expected features in a good community
et al., 2007) and in terms of the adjacency matrix              division. One of the most popular quality mea-
is expressed as                                                 sures is the modularity Q (Newman, 2004). It
                                                                compares the current density of intra-community
                          X
                    ki =      aij .               (3)           and inter-community edges relative to a random
                          i2N                                   network with similar characteristics. It is based on
                                                                the fact that random networks have no community
   Betweenness centrality (BE) quantifies the                   structure.
number of shortest paths that pass through a vertex                Given a network with c communities, the Q
j between all pair of different vertices (i, k) (Free-          modularity is calculated by a symmetric matrix
man, 1977). It expresses how much a vertex Bj                   N ⇥ N , in which elements along the main diag-
works as bridge or is a trusted vertex in the trans-            onal eii represent connections into the same com-
mission of information, i.e.                                    munity and elements eij represent connections be-
                        X                                       tween different communities i and j. Equation 6
                                  ik (j)
              Bj =                         ,        (4)         shows the formulation of Q.
                                   ik
                     i,k2V,i6=k
                                                                                     2      0         12 3
where ik is the total number of different shortest                             X                X
                                                                         Q=          4eij   @       eij A 5     (6)
path between i and k, and ik (j) is the number of                                i              j
times j appears in those paths.
   PageRank centrality (PR) expresses the impor-                   If a specific division provides less edges be-
tance of a vertex according to the probability that             tween communities than would be expected by
other vertices have to arrive at it, after a large num-         random connections, the value of Q would be 0.
ber of steps (Brin and Page, 1998). The idea is to              When the network has isolated communities the
simulate the surfing on the net. The user can fol-              value of Q would be 1. This measure is employed
low the links available at the current page or jump             by several techniques to identify communities in
to other by typing a new URL. In social terms, it               networks systems, especially in divisive and ag-
can be approached like the more cited or renowned               glomerative methods (Guimera et al., 2003; New-
individuals. The formalization of the PageRank                  man, 2004; Newman, 2006).
centrality is                                                      Newman (Newman, 2004) proposes an agglom-
                   ~⇡ t = ~⇡ t 1 G ,                 (5)        erative method that is an optimized greedy al-
                                                                gorithm, called fastgreedy. The approach starts
where ~⇡ t are the PageRank values for each vertex              with a copy of a real network of N vertices with
in the tth step of navigation and G is known as                 no connections, producing at first N communi-
the Google matrix. When t = 0 we have by de-                    ties. At each iteration, two communities ci and cj ,
fault ~⇡ 0 = {1, . . . , 1}. The jumps are represented          which have connections in real network, are cho-
by a probability ↵ and we adopted the same value                sen in order to obtain the greatest improvement of
as defined in the original version (Brin and Page,              Q (Equation 7). A pruning is performed in the
1998).                                                          search space considering only the edges that exist
                                                                between communities. Therefore, execution time
2.2   Community detection
                                                                is reduced when considering the new Q function
Communities are sets of densely interconnected                  (Equation 7).
vertices and sparsely connected with the rest of the                                        P        P
                                                                                    ✓                        ◆
network (Newman, 2010). Vertices that belong to                                                j eij   i eij
the same community, in general, share common                               Qij = 2 eij                          (7)
                                                                                                  2M


                                                           75
   The result can also be represented as a den-                      all their spreader and stifler neighbors.
drogram. Cuts at different levels of the dendro-
                                                                  • Truncated Process (TP): It consists of trun-
gram produce divisions with greater or lesser num-
                                                                    cate or interrupt the contagion in the precise
ber of communities, and the best cut yields the
                                                                    time. In each iteration and for each spreader,
largest value of Q. The algorithm at each step has
                                                                    it is randomly selected one neighbor at time,
O(M + N ). Since there are at most N 1 join
                                                                    and setting up the states of the contact as cor-
operations required to build a complete dendro-
                                                                    responds. The selection continues until the
gram, their overall complexity is O((M + N )N ),
                                                                    spreader visit all its neighbors or it becomes
or O(N 2 ), for sparse graph. Consequently, by
                                                                    stifler, whichever occurs first.
adopting this method is more treatable the analysis
of communities in larger networks.                                • Contact Process (CP): In each time step and
                                                                    for each spreader, it is chosen at random a
2.3   Propagation process on networks
                                                                    single neighbor. Then, it is resolved the tran-
In classical rumor diffusion models the ignorant                    sition states according to the rule that corre-
or inactive individuals (S) are those who remain                    sponds. After that, continues with the next
unaware of the information, the spreaders (I) are                   spreader of the network of the same time step.
those who disseminate the information and the sti-
                                                                   Different theoretical models have been pro-
fler (R) are those who know the information but
                                                                posed for modeling the rumor dynamics on net-
lose the interest in spreading it. All vertices have
                                                                works (Moreno et al., 2004; Barrat et al., 2008;
the same probability for transmit the information
                                                                Castellano et al., 2009; Borge-Holthoefer et al.,
to their neighbors and µ for stopping to be active.
                                                                2012). These analytical models make assump-
   The Maki-Thompson (MT) (Maki and Thomp-
                                                                tions about the network structure, such as the de-
son, 1973) model is a spreader-centric approach
                                                                gree correlation or distribution, compartments or
employed for describing the propagation of ideas
                                                                class of vertices with same probabilities, homoge-
and rumors on networks. In the MT process when-
                                                                neous/heterogeneous mixing or mean field theory.
ever an active spreader i contacts a vertex j that is
                                                                Notwithstanding all of them claim that their nu-
inactive, the latter will become active with a fixed
                                                                merical solutions agree with the MC simulations,
probability . Otherwise, in the case that j knows
                                                                so we adopt this approach as an exploratory re-
about the rumor, it means j is an active spreader or
                                                                search.
a stifler, the vertex i turns into a stifler with proba-
bility µ. The behavior when the spreader stops to               3 Related work
propagate can be understood because the informa-
tion is too much known (contacting spreaders) or                Many approaches have been developed in order
without novelty (contacting stifler).                           to understand the propagation of ideas or infor-
   Three possible choices related with the spreader             mation through social networks (Castellano et al.,
behavior during the diffusion have been re-                     2009). Specially, characterize the individuals that
ported (Borge-Holthoefer et al., 2012; Meloni et                are most influential in the propagation process has
al., 2012). They are the Reactive process (RP),                 attracted the attention of researchers (Richardson
Truncated process (TP) and Contact Process (CP).                and Domingos, 2002; Kempe et al., 2003; Kitsak
However, a clear analysis about the impact of                   et al., 2010; Pei and Makse, 2013).
spreaders behavior in the propagation process has                  The conventional approach for describing the
not been tackled yet. Moreover, there is not a con-             most influential vertices is performing a micro-
sensus about what to employ in rumor or informa-                scopic analysis on the network. Vertices are
tion diffusion and it may cause a misinterpretation             classified considering their topological properties,
of results. The three main characteristic behaviors             sorted and ranked in order to generalize their
reported to spreaders are described as follow.                  ability to propagate (Kitsak et al., 2010; Hébert-
                                                                Dufresne et al., 2013; Pei and Makse, 2013).
  • Reactive Process (RP): In each iteration, the               However, to find the set of initial vertices that
    spreaders try to pass the rumor among all                   maximize the propagation capacity, the selection
    their ignorant neighbors. After that, it evalu-             of the most influential spreader may produce an
    ates whether it will become stifler in the next             overlap of influence in the population (Kitsak et
    iteration or not, considering the contact with              al., 2010; Pei and Makse, 2013).


                                                           76
   In terms of topological properties, there not              spreaders ( (t)) and stifler ('(t)) over time is cal-
                                                                                 1 PN
exists a consensus about what is the more ac-                 culated as (t) = N i=1 S i (t), that is similar to
curate measure that describes the most influen-               the other states and always fulfill (t) + (t) +
tial vertices. Some researches claim that hubs                '(t) = 1. We assume that infection and recov-
are more representative to influence others ver-              ering do not occur during the same discrete time
tices (Pastor-Satorras and Vespignani, 2001; Al-              window or step.
bert and Barabási, 2002). Vertices with higher
degree are more efficient to maximize the prop-               4.1   Setup
agation because, in general, hubs not tend to con-            The initial setup for the propagation is (0) =
nect with each other and thus can achieve a greater           1     ⌘/N , (0) = ⌘/N and '(0) = 0, where ⌘
number of vertices (Kitsak et al., 2010). In the              represents the seeds or number of initial spread-
case of communities, the degree proportion of a               ers. Each simulation begins with a selection of
vertex i is defined as the number of edges that i             ⌘ vertices. At each time step, all spreaders uni-
has in each community. This degree proportion                 formly select and try to infect its neighbors with
was found as a good descriptor of influence for               probability , or stop the diffusion with probabil-
communities (Lawyer, 2012).                                   ity µ according to the spreader behavior adopted.
   On the other hand, the most influential vertices           Successful change of state (to be spreader or to be
are described as those with the largest Between-              stifler) are effective at the next iteration. The sim-
ness centrality (Hébert-Dufresne et al., 2013), be-          ulations run until the end of the process is reached,
cause they intermediate the communication be-                 when (1) = 0.
tween groups of vertices, which increase their in-
                                                              4.2   Community selection approach
fluence. According to the authors, Betweenness
centrality is a better descriptor of the most influ-          We propose to select the initial spreaders from the
ential spreader in communities.                               community division of the network. The multiple
   The PageRank is also considered a better                   seeds are the most central vertices of each commu-
measure to describe the most influential ver-                 nity. The community division may be calculated
tices (Cataldi et al., 2010). The reason is that it           by some divisive or agglomerative method (Sec-
employs the random walk concept over the net-                 tion 2.2) and here the fastgreedy algorithm was
work to be calculated and vertices with higher val-           employed. The method is detailed as follow:
ues mean higher probability to be visited.                       First, given a required number of ⌘ initial
   Finally, Kempe et al. (2003) propose a greedy              spreaders, we find the ⌘ main communities of the
algorithm to obtain ⌘ initial spreaders that maxi-            network by the fastgreedy algorithm. Then, each
mizes the diffusion influence. The authors adopt a            community is isolated, which produces ⌘ com-
discrete optimization approach and prove that the             ponents. The isolation process consists in main-
optimization problem is NP-hard. It was imple-                taining the intra-community edges and erasing the
mented considering the independent and weighted               inter-community connections. For each isolated
cascade model that have only two states, which are            community, a specific centrality measure is cal-
different to the SIR model. The method evaluates              culated to all vertices. Since vertices with higher
one vertex at time to be added in the set of selected         centrality are considered more suitable to influ-
seeds. The new vertex is accepted if it is what most          ence on the network, we select the most impor-
increment the diffusion. However, this approach               tant vertex from each community. Therefore, these
has a very higher computational cost problem, al-             vertices influence more in their own community
though new researches try to optimize the perfor-             and the overlap of influence in the population is
mance (Chen et al., 2009).                                    minimized. At the end, ⌘ seeds are selected and
                                                              they have the best centrality value of its commu-
4   Information diffusion by communities                      nity. We take the original full network, the ⌘ seeds,
                                                              the parameters and execute the corresponding sim-
Let us consider a constant population of N ver-               ulations.
tices in all time steps. Each vertex can be                      For the centrality measure, the point is to
only in one state, that is I i (t) = 1 iff i 2                find what centrality better identifies the influential
I, otherwise I i (t) = 0, and S i (t)+I i (t)+Ri (i) =        spreaders, by communities and in the whole net-
1. The macroscopic fraction of ignorant ( (t)),               work, that maximizes the information diffusion.


                                                         77
               (a) MT-RP                                 (b) MT-TP                                 (c) MT-CP

Figure 1: MT (Maki and Thompson, 1973) propagation in an artificial scale-free network with N = 1000, hki = 8 and
⌘ = 1% of initial seeds selected. The color bar shows the final fraction of informed individuals. The behavioral approaches for
spreader analyzed are: (a) Reactive process (RP); (b) Truncated process (TP); and (c) Contact process (CP)


Here, the degree (DG), PageRank (PR) and Be-                      favors a viral diffusion on the network with lower
tweenness (BE) centralities were considered.                      values of and it happens independently of which
                                                                  are the initial seeds. For this reason, RP is a more
5     Experimental results                                        suitable approach to simulate broadcasting propa-
                                                                  gation.
In this section we analyzed the information dif-
                                                                     On the other hand, the TP behavior is more re-
fusion in an artificial scale-free and four real so-
                                                                  lated to the contact network scenario, where the
cial networks. We evaluated the impact spread-
                                                                  position and topological characteristics of seeds
ers behavior have in the diffusion on the networks.
                                                                  may have influence in the diffusion. Moreover, TP
Then, the results about the selection of initial
                                                                  presents more balanced results, near 60% when
spreaders by communities, best-ranked vertices of
                                                                     ⇡ µ, and contacts are not as restricted as CP
the network and random seeds were explored.
                                                                  behavior. For this reason, we adopted hereafter
5.1    Spreader behavior analysis                                 the MT-TP approach as the propagation process
                                                                  for the analysis.
We analyzed the three behavioral approaches for
the spreaders and present the impact they produce                 5.2   Multiple initial spreader analysis
on the propagation process. We considered the                     The experiments were performed with three pos-
MT model with an artificial scale-free network of                 sibilities for choosing the initial spreader: (i) by
size N = 1000 and hki = 8. In order to under-                     randomly selecting ⌘ individuals as initial seeds
stand the overall spectral effect with the parame-                in the network; (ii) by selecting the best- ranked ⌘
ters, the simulations were evaluating a range of                  individuals with highest value of a specific central-
and µ in (0, 1]. Therefore, the differences between               ity of the network; and finally, (iii) by detecting ⌘
the approaches are evidenced. For each tuple of                   communities on the network and for each isolated
values ( , µ), it was selected 100 times at random                community selecting the individual with highest
⌘ = 1% of initial spreader (seeds) and each time                  value of a specific centrality measure. The cen-
was an average over 50 executions.                                trality measures selected were degree (DG), Be-
   The impact of the behavioral approach in the fi-               tweenness (BE) and Pagerank (PR).
nal fraction of informed individuals is shown in
Figure 1. We observed that the CP approach is                     5.2.1 Real social networks
less redundant in the number of contacts made by                  We adopted the email (Guimera et al., 2003), ad-
spreader, producing lower fractions of informed                   vogato (Kunegis, 2014a), astrophysics (Newman,
individuals, in comparison to the other behaviors.                2001) and hamsterster (Kunegis, 2013; Kunegis,
Still, because the single contact made by iteration,              2014b). All of them were assumed as undirected
the CP behavior is more similar to a propagation                  and unweighted networks and also it was consid-
through the “word-of-mouth” situation.                            ered the largest component for the simulations.
   The RP approach obtained more than 80% of                      The structural properties of the networks are sum-
informed individuals with values of          0.3, no              marized in Table 1, with the respective number of
matter values of µ. Therefore, the RP approach                    vertices N , the average degree hki, shortest paths


                                                             78
         (a) MT-TP Community                         (b) MT-TP Random                             (c) MT-TP Best

Figure 2: MT-TP propagation in an artificial scale-free network with N = 1000 and hki = 8. The final fraction of informed
individuals are shown in the color bar. The selection of ⌘ = 4% of seeds was made by: (a) ⌘ communities taking the individual
with best PR centrality of each one; (b) uniform random selection of individuals; and (c) the ⌘ individuals with the best PR
centrality of the network. Solid white lines to the left in the contour plots show the and µ combinations that achieved 35% of
informed individuals. Dashed white lines show the combinations that achieved 60% of informed individuals.


Table 1: Topological properties and results of community detection of the networks: last column, the best modularity Q and
community division by fastgreedy algorithm
                               Network         N      hki    hgi      H̃     ⇢       FastGreedy
                                                                                     Q      Nc
                                  email      1133     9.62   3.60    0.45   0.01    0.49    16
                              hamsterster    2000     16.1   3.58    0.48   0.02    0.46    57
                               advogato      5054     15.6   3.27    0.40   -0.09   0.34    49
                              astrophysics   14845    16.1   4.79    0.38   0.23    0.63   1172


average hgi, normalized entropy H̃, pearson cor-                   5.2.2 Information diffusion results
relation ⇢. Also, the best modularity Q value and                  The final fraction of informed individuals ('(1))
division number of communities N C of the net-                     was averaged over 100 executions for each combi-
works produced by the FastGreedy algorithm are                     nation of initial seeds and parameters. This aver-
reported.                                                          age represents the propagation capacity achieved
   email represents a social network of informa-                   by the selected seeds.
tion exchanged by emails between members of the                       We evaluated the relation between the param-
Rovira i Virgili University, Tarragona, with largest               eters and the selection of the initial spreaders in
hub degree equal to 71.                                            an artificial network. In this experiment the PR
                                                                   was defined as the centrality measure employed to
  hamsterster is an undirected and unweighted                      find the seeds in the communities and the whole
network based on the website data HAMSTER -                        network. A value of ⌘ = 4% of initial spread-
STER . COM . The edges represent a relationship of                 ers was adopted for a scale-free network of size
family or friend among users. The largest hub has                  N = 1000, hki = 8, hgi = 3.19, H̃ = 0.33 and
degree equal to 273.                                               ⇢ = 0.04.
                                                                      The propagation capacity '(1) was affected
   advogato is an online community platform for
                                                                   according to the initial seeds (Figure 2). The solid
developers of free software launched in 1999. Ver-
                                                                   and dashed white curves represent the combina-
tices are users of advogato, the directed edges rep-
                                                                   tion of and µ parameters that obtained 35% and
resent trust relationships. The largest hub has de-
                                                                   60% of informed individuals respectively. We ob-
gree equal to 807.
                                                                   served that these curves show a well defined linear
   Finally, astrophysics is a collaborative net-                   pattern, which means any proportion of = /µ
work between scientists on previous studies of                     will obtain equivalent '(1) results.
astrophysics reported in arXiv during January 1,                      The selection of seeds by communities (Figure
1995 until December 31, 1999. The network is                       2(a)) improved the diffusion on the network in
weighted and directed and originally it has 16707                  comparison with the Random seeds (Figure 2(b))
vertices. The largest hub of the main component                    and Best-ranked vertices (Figure 2(c)). This result
has 360 connections.                                               is corroborated by the increase of the white lines


                                                             79
                                                                                                                                 0.62
                                                    BST−BE
                                                    BST−DG
                                                    BST−PR                                                                        0.6
                                            0.62    COM−BE
                                                    COM−DG
                                                    COM−PR                                                                       0.58
                                                    RANDOM


                % of informed individuals


                                                                                                     % of informed individuals
                                            0.61
                                                                                                                                 0.56


                                             0.6                                                                                 0.54


                                                                                                                                 0.52
                                            0.59
                                                                                                                                                                          BST−BE
                                                                                                                                  0.5                                     BST−DG
                                                                                                                                                                          BST−PR
                                            0.58                                                                                                                          COM−BE
                                                                                                                                 0.48                                     COM−DG
                                                                                                                                                                          COM−PR
                                                                                                                                                                          RANDOM
                                            0.57                                                                                 0.46
                                                0       10      20           30   40            50                                   0   10        20           30   40            50
                                                                     seeds                                                                              seeds


                                                              (a) email                                                                       (b) hamsterster
                                            0.43                                                                                 0.56

                                            0.42
                                                                                                                                 0.55
                                            0.41
                % of informed individuals


                                                                                                     % of informed individuals
                                             0.4
                                                                                                                                 0.54
                                            0.39

                                            0.38                                                                                 0.53

                                            0.37
                                                                                                                                 0.52
                                            0.36                                       BST−BE                                                                             BST−BE
                                                                                       BST−DG                                                                             BST−DG
                                            0.35                                       BST−PR                                                                             BST−PR
                                                                                       COM−BE                                    0.51                                     COM−BE
                                                                                       COM−DG                                                                             COM−DG
                                            0.34                                       COM−PR                                                                             COM−PR
                                                                                       RANDOM                                                                             RANDOM
                                            0.33                                                                                  0.5
                                                0       10      20           30   40            50                                   0   10        20           30   40            50
                                                                     seeds                                                                              seeds


                                                             (c) advogato                                                                     (d) astrophysics

Figure 3: Propagation capacity of MT-TP model in the four real networks given the selection of seeds by communities (red
points), best-ranked (black points), and randomly (blue points).


Table 2: Average of propagation capacities for the full range of ⌘ 2 [1, 50], for each network achieved by seeds: (second big
column) selecting the most important individuals from communities; (third big column) selecting the best-ranked individuals
of the network; and (last column) randomly selecting the initial seeds. The adopted measures were Betweenness (BE), degree
(DG) and PageRank (PR) centralities
                       Network             Community                      Best ranked         Random
                                      BE       DG        PR          BE       DG      PR      selection
                        email       0.6065 0.6105 0.6150 0.5880 0.5840 0.5884                  0.6023
                     hamsterster    0.5485 0.5693 0.5728 0.5306 0.5226 0.5271                  0.5273
                      advogato      0.4077 0.4102 0.4112 0.3993 0.3958 0.4007                  0.3805
                    astrophysics    0.5417 0.5398 0.5415 0.5321 0.5278 0.5301                  0.5337


slope. However, a little decrease in the lines slope                                                                             seeds reached a higher propagation capacity than
is evidenced in the MT-TP Best case with respect                                                                                 the selection of best-ranked vertices. For a larger
to the MT-TP Random case.                                                                                                        number of initial spreaders, '(1) tend to fall
                                                                                                                                 when the best-ranked vertices are selected.
   Consequently, we sought to analyze the impact
                                                                                                                                    On the other hand, when the community de-
of ⌘ and centrality measures in the selection of
                                                                                                                                 tection was performed and individuals with high-
seeds in the diffusion process. We varied the num-
                                                                                                                                 est values of DG, BE or PR in each community
ber of communities and seeds from 2 to 50 and
                                                                                                                                 (red points, COM-**) were selected, the propa-
fixed = 0.3 and µ = 0.2 for all simulations.
                                                                                                                                 gation capacity was improved and achieved the
The real social networks described and the MT-TP
                                                                                                                                 best results. Therefore, more individuals were in-
propagation model were considered in the anal-
                                                                                                                                 formed in the network by the community selec-
ysis (Figure 3). The random selection of initial
                                                                                                                                 tion, with the same propagation constraint (num-
spreader (blue points, RANDOM) or best-ranked
                                                                                                                                 ber of seeds).
vertices (black points, BST-**) of DG, BE or PR
centrality, produced a constant propagation capac-                                                                                  In terms of the topological measures, we ob-
ity ('(1)). In some case, random selection of                                                                                    served that vertices with highest PageRank cen-


                                                                                                80
trality in the communities (COM-PR) obtained                7 Acknowledgments
in average the best propagation results (Table 2).
                                                            This research was partially supported by Na-
Even in the selection of the best-ranked vertices,
                                                            tional Council for Scientific and Technological
the PageRank was notable. Another important
                                                            Development (CNPq) grant: 140688/2013-7 and
point is that often, the uniformly random selec-
                                                            São Paulo Research Foundation (FAPESP) grant:
tion of initial spreader could be a better option
                                                            2011/21880-3.
than select the most central vertices (best-ranked)
of the network. This is contrary what is cur-
rently expected and adopted in marketing cam-               References
paigns, for instances. For all networks and for all         Réka Albert and Albert-László Barabási. 2002. Statisti-
size of seeds, we evidenced that starts the diffu-             cal mechanics of complex networks. Rev. Mod. Phys.,
                                                               74(1):47–97, jan.
sion from the best-ranked vertices produces lower
influence, or final fraction of informed individu-          A.-L. Barabási. 2007. The architecture of complexity: From
                                                               network structure to human dynamics. IEEE Control Sys-
als, than purely select vertices at random; in some            tems Magazine, 27(4):33–42.
cases, the best-ranked selection achieved the worst
                                                            Alain Barrat, MarseilleMarc Barthélemy, and Alessandro
results. However, the selection of initial spread-            Vespignani. 2008. Dynamical Processes on Complex Net-
ers by communities showed, independently of the               works. Cambridge University Press.
centrality measure, higher results.
                                                            Javier Borge-Holthoefer, Sandro Meloni, Bruno Gonçalves,
                                                               and Yamir Moreno. 2012. Emergence of Influential
6   Final remarks                                              Spreaders in Modified Rumor Models. Journal of Statis-
                                                               tical Physics, 151(1-2):383–393, September.
In this work, we proposed a method for maximiz-             S. Brin and L. Page. 1998. The anatomy of a large-scale
ing the information diffusion on networks. First,              hypertextual web search engine. Computer Networks and
we analyzed the impact of the spreader behavior                ISDN Systems, V:107–117.
in the propagation and confirmed that the Trun-             Claudio Castellano, Santo Fortunato, and Vittorio Loreto.
cate Process (TP) is more suitable to simulate in-             2009. Statistical Physics of Social Dynamics. Reviews
                                                               of Modern Physics, 81(2):591–646, may.
formation diffusion on networks. We applied com-
munity detection and targeted the most influential          Mario Cataldi, Luigi Di Caro, and Claudio Schifanella. 2010.
                                                              Emerging topic detection on Twitter based on tempo-
vertices from these communities as initial seeds.             ral and social terms evaluation. In Proceedings of the
Experimental results on an artificial scale-free and          Tenth International Workshop on Multimedia Data Min-
four real social networks confirmed the increase in           ing - MDMKDD ’10, pages 1–10, New York, New York,
                                                              USA, jul. ACM Press.
the final fraction of informed individuals. More-
over, it was found that the PageRank centrality in          Wei Chen, Yajun Wang, and Siyu Yang. 2009. Efficient
                                                              influence maximization in social networks. In Proceed-
communities was a better choice in terms of effi-             ings of the 15th ACM SIGKDD international conference
ciency and influence maximization.                            on Knowledge discovery and data mining - KDD ’09, page
   A brief overview about complex network mea-                199, New York, New York, USA, jun. ACM Press.
sures, community detection and information prop-            L. D. F. Costa, F. A. Rodrigues, G Travieso, and P. R. Villas
agation was introduced. We present our proposal                Boas. 2007. Characterization of complex networks: A
                                                               survey of measurements. Advances in Physics, 56:167–
to select initial spreaders by communities. There              242.
is still an open problem related to an exact defini-
                                                            L C Freeman. 1977. A set of measures of centrality based on
tion of what is considered a community and what                betweenness. Sociometry, 40:35–41.
would be the ideal division. Nevertheless, we var-
ied the number of communities from 2 to 50 and              R Guimera, L Danon, A Diaz-Guilera, F Giralt, and A Are-
                                                              nas. 2003. Self-similar community structure in a network
in general (for every community division) our pro-            of human interactions. Physical Review E, 68:2003.
posal achieved better results versus propagation
                                                            Laurent Hébert-Dufresne, Antoine Allard, Jean-Gabriel
without considering the community structure.                  Young, and Louis J Dubé. 2013. Global efficiency of
   In future work, other measures for selecting               local immunization on complex networks. Scientific re-
                                                              ports, 3:2171, January.
influential individuals on networks could be ex-
plored, in addition to DG, BE and PR applied here.          David Kempe, Jon M. Kleinberg, and Éva Tardos. 2003.
Also, other models of propagation and network                 Maximizing the Spread of Influence Through a Social
                                                              Network. In Lise Getoor, Ted E. Senator, Pedro Domin-
topologies could be tested, as well as novel strate-          gos, and Christos Faloutsos, editors, KDD, pages 137–
gies taking into account community information.               146. ACM.


                                                       81
Maksim Kitsak, Lazaros K. Gallos, Shlomo Havlin, Fredrik           M. E. J. Newman. 2001. The structure of scientific collab-
  Liljeros, Lev Muchnik, H. Eugene Stanley, and Hernán A.           oration networks. In Natl. Acad. Sci. USA, number 98,
  Makse. 2010. Identification of influential spreaders in            pages 404 – 409.
  complex networks. Nature Physics, 6(11):888–893, Au-
  gust.                                                            M E J Newman. 2004. Fast algorithm for detecting
                                                                     community structure in networks. Physical Review E,
Jrme Kunegis. 2013. KONECT – The Koblenz Network Col-                69(3):66133.
   lection. In Proc. Int. Web Observatory Workshop, pages
   1343–1350.                                                      M E J Newman. 2006. Finding community structure in net-
                                                                     works using the eigenvectors of matrices. Physical Review
Jrme Kunegis.  2014a.        Advogato network dataset –              E, 74(3):36104.
   KONECT, October.
Jrme Kunegis. 2014b. Hamsterster full network dataset –            Mark Newman. 2010. Networks: An Introduction. Oxford
   KONECT, jan.                                                      University Press, Inc., New York, NY, USA.

Glenn Lawyer. 2012. Measuring node spreading power by              Romualdo Pastor-Satorras and Alessandro Vespignani. 2001.
  expected cluster degree. page 4, September.                        Epidemic Spreading in Scale-Free Networks. Physical
                                                                     Review Letters, 86(14):3200–3203, April.
D. P. Maki and M Thompson. 1973. Mathematical Models
   and Applications, with Emphasis on the Social, Life, and        Sen Pei and Hernán A Makse. 2013. Spreading dynamics
   Management Sciences. Prentice-Hall.                               in complex networks. Journal of Statistical Mechanics:
                                                                     Theory and Experiment, 2013(12):P12002, December.
Sandro Meloni, Alex Arenas, Sergio Gmez, Javier Borge-
  Holthoefer, and Yamir Moreno. 2012. Modeling epi-                Matthew Richardson and Pedro Domingos. 2002. Mining
  demic spreading in complex networks: Concurrency and               knowledge-sharing sites for viral marketing. In Proceed-
  traffic. In My T. Thai and Panos M. Pardalos, edi-                 ings of the eighth ACM SIGKDD international confer-
  tors, Handbook of Optimization in Complex Networks,                ence on Knowledge discovery and data mining - KDD ’02,
  Springer Optimization and Its Applications, pages 435–             page 61, New York, New York, USA, jul. ACM Press.
  462. Springer US.
Yamir Moreno, Maziar Nekovee, and Amalio F. Pacheco.
  2004. Dynamics of rumor spreading in complex networks.
  Physical Review E, 69(6):066130, jun.


                                                              82