<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Spreader Selection by Community to Maximize Information Diffusion in Social Networks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Didier A. Vega-Oliveros</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lilian Berton</string-name>
          <email>lberton@icmc.usp.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science ICMC, University of Sa ̃o Paulo C.P. 668, CEP 13560-970</institution>
          ,
          <addr-line>Sa ̃o Carlos, SP</addr-line>
          ,
          <country country="BR">Brazil</country>
        </aff>
      </contrib-group>
      <fpage>73</fpage>
      <lpage>82</lpage>
      <abstract>
        <p>Rumors or information can spread quickly and reach many users in social networks. Models for understanding, preventing or increasing the diffusion of information are of greatest interest to companies, governments, scientists, etc. In this paper, we propose an approach for maximizing the information diffusion by selecting the most important (central) users from communities. We also analyze the selection of the most central vertices of the network and considered artificial and real social networks, such as email, hamsterster, advogato and astrophysics. Experimental results confirmed the improvement of the final fraction of informed individuals by applying the proposed approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The modeling of propagation or diffusion
processes in social networks has recently received
more attention, since it allows to understand how
a disease can be controlled or how information
spread among individuals. These diffusion
process are generally analyzed employing complex
network theory
        <xref ref-type="bibr" rid="ref3 ref6">(Barrat et al., 2008; Castellano et
al., 2009)</xref>
        . The area of complex networks seeks
to study and understand the dynamics and
behavior of complex systems, from the structure of the
network to the internal dynamics or interactions.
      </p>
      <p>
        Models that describe the evolution of rumors
can be adapted to analyze the spread of spam on
the Internet, advertising and marketing, political
ideologies or technological news between
individuals
        <xref ref-type="bibr" rid="ref6">(Castellano et al., 2009)</xref>
        . In such cases, the
representation in complex networks enables the
analysis of traditional models and the
heterogeneous structure, which has a strong influence on
the information diffusion process
        <xref ref-type="bibr" rid="ref21 ref3 ref6">(Moreno et al.,
2004; Barrat et al., 2008; Castellano et al., 2009)</xref>
        .
Therefore, some individuals can have a higher
influence than others according to the network
structure. Researchers have focused on identifying the
most influential vertices
        <xref ref-type="bibr" rid="ref12 ref13 ref14 ref18 ref27">(Kempe et al., 2003;
Kitsak et al., 2010; Lawyer, 2012; Pei and Makse,
2013; He´bert-Dufresne et al., 2013)</xref>
        according to
topological properties. It is expected this
influencers convince the largest number of individuals
in the network. However, the selection of more
than one of them not necessarily maximizes the
expected fraction of informed individuals,
compared to an uniformly random selection approach.
      </p>
      <p>In this paper, we propose an approach to
maximize the information diffusion considering the
community structure of the network. The
community symbolizes a group of individuals with a
greater tendency to have more internal than
external connections to other groups. The reason is
that vertices belonging to the same community are
likely to be more similar to each other and share
similar properties and affinity. We confirmed that
selecting the most influential individual from each
community as initial spreaders increases more the
information diffusion than selecting the most
influential individuals from the whole network.</p>
      <p>As a motivation example, let us consider a
company that wants to market a new product in the
blogosphere. The company could select three very
influential individuals of this social network
(bloggers with thousands of access) to advertise its
product. However, these influencers may be
popular in the same group of people. On the other hand,
if the strategy is to identify the three main
communities on the network and select the most
influential individuals of each community, the company
would achieve a variety group of users and
maximize the marketing diffusion.</p>
      <p>The main contribution of this paper is the
information diffusion approach based on selecting
the most influential individuals from communities.
We employed an artificial scale-free and four real
networks: email, hamsterster, advogato and
astrophysics. We applied the SIR model for rumor
propagation selecting the initial seeds from the
whole network and from the communities. The
impact that the Truncate (TP), Contact (CP) or
Reactive (RP) processes have in the information
diffusion was analyzed. The experimental results
showed that the selection of individuals from the
communities as initial spreaders, the final fraction
of informed individuals is improved.</p>
      <p>The remainder of the paper is organized as
follows: Section 2 introduces some definitions and
measures covered in this paper, the community
detection algorithm applied and the propagation
process in networks. Section 3 brings some related
work. Section 4 presents the proposed approach
for information diffusion based on communities.
Section 5 exhibits the experimental results on an
artificial scale-free and four real social networks.
Finally, Section 6 discusses the final remarks.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical background</title>
      <p>A network is a collection of items called nodes or
vertices, which are joined together by connections
called links or edges. Formally we define the
network G = (V, E, W ), where V = {v1, v2, . . . vn}
is the set of N vertices, E = {e1, e2, . . . em} is the
set of M edges and W = {w1, w2, . . . wm} are the
weights of the edges that determine the strength of
the interaction between the respective vertices, in
the case of weighted networks. In mathematical
terms, an undirected and unweighted network can
be represented by an adjacency matrix A, in which
two connected vertices i and j are in the matrix
aij = aji = 1, otherwise, they are equal to 0.</p>
      <p>A path is a consecutive sequence that starts at
vertex i and ends in j, so that vertices are
visited more than once. The distance or length of the
path is defined as the number of edges contained in
the sequence. The shortest distance between two
vertices is known as the shortest path or geodesic
path gi,j . A component is the largest sub-set of
vertices from the network in which exist at least
one path between each pair of vertices, but never
connect to another component. A connected
network has only one component. When the networks
have more than one component, we considered the
largest of them.</p>
      <p>The degree or connectivity of vertex i, called
as ki, is the number of edges or connections that
vertex i has. In the case of directed networks is the
sum of the degrees of input (edges that reach the
vertex) and output (edges that leave the vertex).
The average degree hki is the average of all ki of
the network. The vertices that have a very high
degree in the network are called hubs.</p>
      <p>
        The degree distribution of a network P (k) is the
probability of randomly select a vertex with degree
k. Social networks present scale-free degree
distribution
        <xref ref-type="bibr" rid="ref2 ref25">(Baraba´si, 2007; Newman, 2010)</xref>
        , with
P (k) ⇠ k , in which most of the individuals
have low degree, near to the average, and only a
few of them have very high degree (hubs). The
level of disorder or heterogeneity in vertices
connections is obtained with the entropy of degree
distribution. We employed the normalized version of
the Shannon entropy, i.e.
      </p>
      <p>H˜ =</p>
      <p>P1k=0 P (k) log(P (k))
log(N )
,
(1)
with 0  H˜  1. The maximum value for the
entropy occurs when P (k) shows a uniform
distribution and the lowest possible value happens when
all vertices have the same degree. The entropy of a
network is related to the robustness and their level
of resilience.</p>
      <p>The robustness is also related to the correlation
degree of the network. A network is assortative,
or positive correlated, if vertices tend to connect
with vertices with similar degree. A Network is
dissassortative, or negative correlated, if vertices
with low degree tend to connect with higher
connected vertices (hubs). When networks do not
present any of above patterns, they are called as
non-assortative. For the calculation of the network
correlation we employed the Pearson coefficient,
formulated with adjacency matrix as
(1/M ) Pj&gt;i kikj aij</p>
      <p>h(1/M ) Pj&gt;i (ki+k2j)aij i2
(1/M ) Pj&gt;i (ki2+k2j2)aij</p>
      <p>
        h(1/M ) Pj&gt;i (ki+k2j)aij i2
⇢ =
(2)
where M is the total number of edges in the
network. If r &gt; 0, then the network is assortative.
If r &lt; 0 the network is dissassortative. If r = 0,
then there is no correlation between the degree of
vertices.
In complex and social networks have been
proposed measures to describe the importance or
centrality of vertices
        <xref ref-type="bibr" rid="ref9">(Costa et al., 2007)</xref>
        according to
topological and dynamical properties. The
centralities adopted in this work are briefly described
as follow.
      </p>
      <p>
        Degree centrality (DG) is related with the
number of connections or popularity of a vertex
        <xref ref-type="bibr" rid="ref9">(Costa
et al., 2007)</xref>
        and in terms of the adjacency matrix
is expressed as
ki =
      </p>
      <p>X aij .</p>
      <p>i2 N</p>
      <p>
        Betweenness centrality (BE) quantifies the
number of shortest paths that pass through a vertex
j between all pair of different vertices (i, k)
        <xref ref-type="bibr" rid="ref10">(Freeman, 1977)</xref>
        . It expresses how much a vertex Bj
works as bridge or is a trusted vertex in the
transmission of information, i.e.
      </p>
      <p>Bj =</p>
      <p>X
i,k2 V,i6=k
ik(j)
ik
,
where ik is the total number of different shortest
path between i and k, and ik(j) is the number of
times j appears in those paths.</p>
      <p>
        PageRank centrality (PR) expresses the
importance of a vertex according to the probability that
other vertices have to arrive at it, after a large
number of steps
        <xref ref-type="bibr" rid="ref5">(Brin and Page, 1998)</xref>
        . The idea is to
simulate the surfing on the net. The user can
follow the links available at the current page or jump
to other by typing a new URL. In social terms, it
can be approached like the more cited or renowned
individuals. The formalization of the PageRank
centrality is
~⇡ t = ~⇡ t 1G ,
(5)
where ~⇡ t are the PageRank values for each vertex
in the tth step of navigation and G is known as
the Google matrix. When t = 0 we have by
default ~⇡ 0 = {1, . . . , 1}. The jumps are represented
by a probability ↵ and we adopted the same value
as defined in the original version
        <xref ref-type="bibr" rid="ref5">(Brin and Page,
1998)</xref>
        .
2.2
      </p>
      <sec id="sec-2-1">
        <title>Community detection</title>
        <p>
          Communities are sets of densely interconnected
vertices and sparsely connected with the rest of the
network
          <xref ref-type="bibr" rid="ref25">(Newman, 2010)</xref>
          . Vertices that belong to
the same community, in general, share common
(3)
(4)
properties and perform similar roles. Therefore,
the division of a network into communities helps
to understand their topological structure
(structural and functional properties) and its dynamic
processes, obtaining relevant information and
features to the network domain.
        </p>
        <p>
          We can evaluate a partition based on the scores
obtained from a quality measure. The goal is to
evaluate expected features in a good community
division. One of the most popular quality
measures is the modularity Q
          <xref ref-type="bibr" rid="ref23">(Newman, 2004)</xref>
          . It
compares the current density of intra-community
and inter-community edges relative to a random
network with similar characteristics. It is based on
the fact that random networks have no community
structure.
        </p>
        <p>Given a network with c communities, the Q
modularity is calculated by a symmetric matrix
N ⇥ N , in which elements along the main
diagonal eii represent connections into the same
community and elements eij represent connections
between different communities i and j. Equation 6
shows the formulation of Q.</p>
        <p>Q = X
i
2
4 eij
0
X eij A 5
j
(6)</p>
        <p>
          If a specific division provides less edges
between communities than would be expected by
random connections, the value of Q would be 0.
When the network has isolated communities the
value of Q would be 1. This measure is employed
by several techniques to identify communities in
networks systems, especially in divisive and
agglomerative methods
          <xref ref-type="bibr" rid="ref11 ref23 ref24">(Guimera et al., 2003;
Newman, 2004; Newman, 2006)</xref>
          .
        </p>
        <p>
          Newman
          <xref ref-type="bibr" rid="ref23">(Newman, 2004)</xref>
          proposes an
agglomerative method that is an optimized greedy
algorithm, called fastgreedy. The approach starts
with a copy of a real network of N vertices with
no connections, producing at first N
communities. At each iteration, two communities ci and cj ,
which have connections in real network, are
chosen in order to obtain the greatest improvement of
Q (Equation 7). A pruning is performed in the
search space considering only the edges that exist
between communities. Therefore, execution time
is reduced when considering the new Q function
(Equation 7).
        </p>
        <p>✓
Qij = 2 eij</p>
        <p>Pj eij Pi eij ◆
2M
(7)</p>
        <p>The result can also be represented as a
dendrogram. Cuts at different levels of the
dendrogram produce divisions with greater or lesser
number of communities, and the best cut yields the
largest value of Q. The algorithm at each step has
O(M + N ). Since there are at most N 1 join
operations required to build a complete
dendrogram, their overall complexity is O((M + N )N ),
or O(N 2), for sparse graph. Consequently, by
adopting this method is more treatable the analysis
of communities in larger networks.
2.3</p>
      </sec>
      <sec id="sec-2-2">
        <title>Propagation process on networks</title>
        <p>In classical rumor diffusion models the ignorant
or inactive individuals (S) are those who remain
unaware of the information, the spreaders (I) are
those who disseminate the information and the
stifler (R) are those who know the information but
lose the interest in spreading it. All vertices have
the same probability for transmit the information
to their neighbors and µ for stopping to be active.</p>
        <p>
          The Maki-Thompson (MT)
          <xref ref-type="bibr" rid="ref19">(Maki and
Thompson, 1973)</xref>
          model is a spreader-centric approach
employed for describing the propagation of ideas
and rumors on networks. In the MT process
whenever an active spreader i contacts a vertex j that is
inactive, the latter will become active with a fixed
probability . Otherwise, in the case that j knows
about the rumor, it means j is an active spreader or
a stifler, the vertex i turns into a stifler with
probability µ. The behavior when the spreader stops to
propagate can be understood because the
information is too much known (contacting spreaders) or
without novelty (contacting stifler).
        </p>
        <p>
          Three possible choices related with the spreader
behavior during the diffusion have been
reported
          <xref ref-type="bibr" rid="ref20 ref4 ref4">(Borge-Holthoefer et al., 2012; Meloni et
al., 2012)</xref>
          . They are the Reactive process (RP),
Truncated process (TP) and Contact Process (CP).
However, a clear analysis about the impact of
spreaders behavior in the propagation process has
not been tackled yet. Moreover, there is not a
consensus about what to employ in rumor or
information diffusion and it may cause a misinterpretation
of results. The three main characteristic behaviors
reported to spreaders are described as follow.
• Reactive Process (RP): In each iteration, the
spreaders try to pass the rumor among all
their ignorant neighbors. After that, it
evaluates whether it will become stifler in the next
iteration or not, considering the contact with
        </p>
        <p>all their spreader and stifler neighbors.
• Truncated Process (TP): It consists of
truncate or interrupt the contagion in the precise
time. In each iteration and for each spreader,
it is randomly selected one neighbor at time,
and setting up the states of the contact as
corresponds. The selection continues until the
spreader visit all its neighbors or it becomes
stifler, whichever occurs first.
• Contact Process (CP): In each time step and
for each spreader, it is chosen at random a
single neighbor. Then, it is resolved the
transition states according to the rule that
corresponds. After that, continues with the next
spreader of the network of the same time step.</p>
        <p>
          Different theoretical models have been
proposed for modeling the rumor dynamics on
networks
          <xref ref-type="bibr" rid="ref21 ref3 ref4 ref6">(Moreno et al., 2004; Barrat et al., 2008;
Castellano et al., 2009; Borge-Holthoefer et al.,
2012)</xref>
          . These analytical models make
assumptions about the network structure, such as the
degree correlation or distribution, compartments or
class of vertices with same probabilities,
homogeneous/heterogeneous mixing or mean field theory.
Notwithstanding all of them claim that their
numerical solutions agree with the MC simulations,
so we adopt this approach as an exploratory
research.
3
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>
        Many approaches have been developed in order
to understand the propagation of ideas or
information through social networks
        <xref ref-type="bibr" rid="ref6">(Castellano et al.,
2009)</xref>
        . Specially, characterize the individuals that
are most influential in the propagation process has
attracted the attention of researchers
        <xref ref-type="bibr" rid="ref1 ref13 ref14 ref27 ref27 ref28">(Richardson
and Domingos, 2002; Kempe et al., 2003; Kitsak
et al., 2010; Pei and Makse, 2013)</xref>
        .
      </p>
      <p>
        The conventional approach for describing the
most influential vertices is performing a
microscopic analysis on the network. Vertices are
classified considering their topological properties,
sorted and ranked in order to generalize their
ability to propagate
        <xref ref-type="bibr" rid="ref12 ref14 ref27">(Kitsak et al., 2010;
He´bertDufresne et al., 2013; Pei and Makse, 2013)</xref>
        .
However, to find the set of initial vertices that
maximize the propagation capacity, the selection
of the most influential spreader may produce an
overlap of influence in the population
        <xref ref-type="bibr" rid="ref14 ref27">(Kitsak et
al., 2010; Pei and Makse, 2013)</xref>
        .
      </p>
      <p>
        In terms of topological properties, there not
exists a consensus about what is the more
accurate measure that describes the most
influential vertices. Some researches claim that hubs
are more representative to influence others
vertices
        <xref ref-type="bibr" rid="ref1 ref26 ref27 ref28">(Pastor-Satorras and Vespignani, 2001;
Albert and Baraba´si, 2002)</xref>
        . Vertices with higher
degree are more efficient to maximize the
propagation because, in general, hubs not tend to
connect with each other and thus can achieve a greater
number of vertices
        <xref ref-type="bibr" rid="ref14">(Kitsak et al., 2010)</xref>
        . In the
case of communities, the degree proportion of a
vertex i is defined as the number of edges that i
has in each community. This degree proportion
was found as a good descriptor of influence for
communities
        <xref ref-type="bibr" rid="ref18">(Lawyer, 2012)</xref>
        .
      </p>
      <p>
        On the other hand, the most influential vertices
are described as those with the largest
Betweenness centrality
        <xref ref-type="bibr" rid="ref12">(He´bert-Dufresne et al., 2013)</xref>
        ,
because they intermediate the communication
between groups of vertices, which increase their
influence. According to the authors, Betweenness
centrality is a better descriptor of the most
influential spreader in communities.
      </p>
      <p>
        The PageRank is also considered a better
measure to describe the most influential
vertices
        <xref ref-type="bibr" rid="ref7">(Cataldi et al., 2010)</xref>
        . The reason is that it
employs the random walk concept over the
network to be calculated and vertices with higher
values mean higher probability to be visited.
      </p>
      <p>
        Finally, Kempe et al. (2003) propose a greedy
algorithm to obtain ⌘ initial spreaders that
maximizes the diffusion influence. The authors adopt a
discrete optimization approach and prove that the
optimization problem is NP-hard. It was
implemented considering the independent and weighted
cascade model that have only two states, which are
different to the SIR model. The method evaluates
one vertex at time to be added in the set of selected
seeds. The new vertex is accepted if it is what most
increment the diffusion. However, this approach
has a very higher computational cost problem,
although new researches try to optimize the
performance
        <xref ref-type="bibr" rid="ref8">(Chen et al., 2009)</xref>
        .
4
      </p>
    </sec>
    <sec id="sec-4">
      <title>Information diffusion by communities</title>
      <p>Let us consider a constant population of N
vertices in all time steps. Each vertex can be
only in one state, that is Ii(t) = 1 iff i 2
I, otherwise Ii(t) = 0, and Si(t)+Ii(t)+Ri(i) =
1. The macroscopic fraction of ignorant ( (t)),
spreaders ( (t)) and stifler (' (t)) over time is
calculated as (t) = N1 PiN=1 Si(t), that is similar to
the other states and always fulfill (t) + (t) +
' (t) = 1. We assume that infection and
recovering do not occur during the same discrete time
window or step.
4.1</p>
      <sec id="sec-4-1">
        <title>Setup</title>
        <p>The initial setup for the propagation is (0) =
1 ⌘/N , (0) = ⌘/N and ' (0) = 0, where ⌘
represents the seeds or number of initial
spreaders. Each simulation begins with a selection of
⌘ vertices. At each time step, all spreaders
uniformly select and try to infect its neighbors with
probability , or stop the diffusion with
probability µ according to the spreader behavior adopted.
Successful change of state (to be spreader or to be
stifler) are effective at the next iteration. The
simulations run until the end of the process is reached,
when (1 ) = 0.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Community selection approach</title>
        <p>We propose to select the initial spreaders from the
community division of the network. The multiple
seeds are the most central vertices of each
community. The community division may be calculated
by some divisive or agglomerative method
(Section 2.2) and here the fastgreedy algorithm was
employed. The method is detailed as follow:</p>
        <p>First, given a required number of ⌘ initial
spreaders, we find the ⌘ main communities of the
network by the fastgreedy algorithm. Then, each
community is isolated, which produces ⌘
components. The isolation process consists in
maintaining the intra-community edges and erasing the
inter-community connections. For each isolated
community, a specific centrality measure is
calculated to all vertices. Since vertices with higher
centrality are considered more suitable to
influence on the network, we select the most
important vertex from each community. Therefore, these
vertices influence more in their own community
and the overlap of influence in the population is
minimized. At the end, ⌘ seeds are selected and
they have the best centrality value of its
community. We take the original full network, the ⌘ seeds,
the parameters and execute the corresponding
simulations.</p>
        <p>For the centrality measure, the point is to
find what centrality better identifies the influential
spreaders, by communities and in the whole
network, that maximizes the information diffusion.
(a) MT-RP
(b) MT-TP
(c) MT-CP
Here, the degree (DG), PageRank (PR) and
Betweenness (BE) centralities were considered.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental results</title>
      <p>In this section we analyzed the information
diffusion in an artificial scale-free and four real
social networks. We evaluated the impact
spreaders behavior have in the diffusion on the networks.
Then, the results about the selection of initial
spreaders by communities, best-ranked vertices of
the network and random seeds were explored.
5.1</p>
      <sec id="sec-5-1">
        <title>Spreader behavior analysis</title>
        <p>We analyzed the three behavioral approaches for
the spreaders and present the impact they produce
on the propagation process. We considered the
MT model with an artificial scale-free network of
size N = 1000 and hki = 8. In order to
understand the overall spectral effect with the
parameters, the simulations were evaluating a range of
and µ in (0, 1]. Therefore, the differences between
the approaches are evidenced. For each tuple of
values ( , µ), it was selected 100 times at random
⌘ = 1% of initial spreader (seeds) and each time
was an average over 50 executions.</p>
        <p>The impact of the behavioral approach in the
final fraction of informed individuals is shown in
Figure 1. We observed that the CP approach is
less redundant in the number of contacts made by
spreader, producing lower fractions of informed
individuals, in comparison to the other behaviors.
Still, because the single contact made by iteration,
the CP behavior is more similar to a propagation
through the “word-of-mouth” situation.</p>
        <p>The RP approach obtained more than 80% of
informed individuals with values of 0.3, no
matter values of µ. Therefore, the RP approach
favors a viral diffusion on the network with lower
values of and it happens independently of which
are the initial seeds. For this reason, RP is a more
suitable approach to simulate broadcasting
propagation.</p>
        <p>On the other hand, the TP behavior is more
related to the contact network scenario, where the
position and topological characteristics of seeds
may have influence in the diffusion. Moreover, TP
presents more balanced results, near 60% when
⇡ µ, and contacts are not as restricted as CP
behavior. For this reason, we adopted hereafter
the MT-TP approach as the propagation process
for the analysis.
5.2</p>
      </sec>
      <sec id="sec-5-2">
        <title>Multiple initial spreader analysis</title>
        <p>The experiments were performed with three
possibilities for choosing the initial spreader: (i) by
randomly selecting ⌘ individuals as initial seeds
in the network; (ii) by selecting the best- ranked ⌘
individuals with highest value of a specific
centrality of the network; and finally, (iii) by detecting ⌘
communities on the network and for each isolated
community selecting the individual with highest
value of a specific centrality measure. The
centrality measures selected were degree (DG),
Betweenness (BE) and Pagerank (PR).</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.2.1 Real social networks</title>
        <p>
          We adopted the email
          <xref ref-type="bibr" rid="ref11">(Guimera et al., 2003)</xref>
          ,
advogato
          <xref ref-type="bibr" rid="ref16 ref17">(Kunegis, 2014a)</xref>
          , astrophysics
          <xref ref-type="bibr" rid="ref22">(Newman,
2001)</xref>
          and hamsterster
          <xref ref-type="bibr" rid="ref15 ref16 ref17">(Kunegis, 2013; Kunegis,
2014b)</xref>
          . All of them were assumed as undirected
and unweighted networks and also it was
considered the largest component for the simulations.
The structural properties of the networks are
summarized in Table 1, with the respective number of
vertices N , the average degree hki, shortest paths
(a) MT-TP Community
(b) MT-TP Random
(c) MT-TP Best
average hgi, normalized entropy H˜ , pearson
correlation ⇢ . Also, the best modularity Q value and
division number of communities N C of the
networks produced by the FastGreedy algorithm are
reported.
        </p>
        <p>email represents a social network of
information exchanged by emails between members of the
Rovira i Virgili University, Tarragona, with largest
hub degree equal to 71.</p>
        <p>hamsterster is an undirected and unweighted
network based on the website data
HAMSTERSTER.COM. The edges represent a relationship of
family or friend among users. The largest hub has
degree equal to 273.</p>
        <p>advogato is an online community platform for
developers of free software launched in 1999.
Vertices are users of advogato, the directed edges
represent trust relationships. The largest hub has
degree equal to 807.</p>
        <p>Finally, astrophysics is a collaborative
network between scientists on previous studies of
astrophysics reported in arXiv during January 1,
1995 until December 31, 1999. The network is
weighted and directed and originally it has 16707
vertices. The largest hub of the main component
has 360 connections.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.2.2 Information diffusion results</title>
        <p>The final fraction of informed individuals (' (1 ))
was averaged over 100 executions for each
combination of initial seeds and parameters. This
average represents the propagation capacity achieved
by the selected seeds.</p>
        <p>We evaluated the relation between the
parameters and the selection of the initial spreaders in
an artificial network. In this experiment the PR
was defined as the centrality measure employed to
find the seeds in the communities and the whole
network. A value of ⌘ = 4% of initial
spreaders was adopted for a scale-free network of size
N = 1000, hki = 8, hgi = 3.19, H˜ = 0.33 and
⇢ = 0.04.</p>
        <p>The propagation capacity ' (1 ) was affected
according to the initial seeds (Figure 2). The solid
and dashed white curves represent the
combination of and µ parameters that obtained 35% and
60% of informed individuals respectively. We
observed that these curves show a well defined linear
pattern, which means any proportion of = /µ
will obtain equivalent ' (1 ) results.</p>
        <p>The selection of seeds by communities (Figure
2(a)) improved the diffusion on the network in
comparison with the Random seeds (Figure 2(b))
and Best-ranked vertices (Figure 2(c)). This result
is corroborated by the increase of the white lines
0.62
slope. However, a little decrease in the lines slope
is evidenced in the MT-TP Best case with respect
to the MT-TP Random case.</p>
        <p>Consequently, we sought to analyze the impact
of ⌘ and centrality measures in the selection of
seeds in the diffusion process. We varied the
number of communities and seeds from 2 to 50 and
fixed = 0.3 and µ = 0.2 for all simulations.
The real social networks described and the MT-TP
propagation model were considered in the
analysis (Figure 3). The random selection of initial
spreader (blue points, RANDOM) or best-ranked
vertices (black points, BST-**) of DG, BE or PR
centrality, produced a constant propagation
capacity (' (1 )). In some case, random selection of
seeds reached a higher propagation capacity than
the selection of best-ranked vertices. For a larger
number of initial spreaders, ' (1 ) tend to fall
when the best-ranked vertices are selected.</p>
        <p>On the other hand, when the community
detection was performed and individuals with
highest values of DG, BE or PR in each community
(red points, COM-**) were selected, the
propagation capacity was improved and achieved the
best results. Therefore, more individuals were
informed in the network by the community
selection, with the same propagation constraint
(number of seeds).</p>
        <p>In terms of the topological measures, we
observed that vertices with highest PageRank
centrality in the communities (COM-PR) obtained
in average the best propagation results (Table 2).
Even in the selection of the best-ranked vertices,
the PageRank was notable.</p>
        <p>Another important
point is that often, the uniformly random
selection of initial spreader could be a better option
than select the most central vertices (best-ranked)
of the network.</p>
        <p>This is contrary what is
currently expected and adopted in marketing
campaigns, for instances. For all networks and for all
size of seeds, we evidenced that starts the
diffusion from the best-ranked vertices produces lower
influence, or final fraction of informed
individuals, than purely select vertices at random; in some
cases, the best-ranked selection achieved the worst
results. However, the selection of initial
spreaders by communities showed, independently of the
centrality measure, higher results.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Final remarks</title>
      <p>In this work, we proposed a method for
maximizing the information diffusion on networks. First,
we analyzed the impact of the spreader behavior
in the propagation and confirmed that the
Truncate Process (TP) is more suitable to simulate
information diffusion on networks. We applied
community detection and targeted the most influential
vertices from these communities as initial seeds.
Experimental results on an artificial scale-free and
four real social networks confirmed the increase in
the final fraction of informed individuals.
Moreover, it was found that the PageRank centrality in
communities was a better choice in terms of
efficiency and influence maximization.</p>
      <p>A brief overview about complex network
measures, community detection and information
propagation was introduced. We present our proposal
to select initial spreaders by communities. There
is still an open problem related to an exact
definition of what is considered a community and what
would be the ideal division. Nevertheless, we
varied the number of communities from 2 to 50 and
in general (for every community division) our
proposal achieved better results versus propagation
without considering the community structure.</p>
      <p>In future work, other measures for selecting
influential individuals on networks could be
explored, in addition to DG, BE and PR applied here.
Also, other models of propagation and network
topologies could be tested, as well as novel
strategies taking into account community information.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research was partially supported by
National Council for Scientific and Technological
Development (CNPq) grant: 140688/2013-7 and
Sa˜o Paulo Research Foundation (FAPESP) grant:</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <article-title>Re´ka Albert and Albert-La´szlo´ Baraba´si</article-title>
          .
          <year>2002</year>
          .
          <article-title>Statistical mechanics of complex networks</article-title>
          .
          <source>Rev. Mod. Phys.</source>
          ,
          <volume>74</volume>
          (
          <issue>1</issue>
          ):
          <fpage>47</fpage>
          -
          <lpage>97</lpage>
          , jan.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>A.-L</surname>
          </string-name>
          . Baraba´si.
          <year>2007</year>
          .
          <article-title>The architecture of complexity: From network structure to human dynamics</article-title>
          .
          <source>IEEE Control Systems Magazine</source>
          ,
          <volume>27</volume>
          (
          <issue>4</issue>
          ):
          <fpage>33</fpage>
          -
          <lpage>42</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Alain</given-names>
            <surname>Barrat</surname>
          </string-name>
          , MarseilleMarc Barthe´lemy, and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Vespignani</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Dynamical Processes on Complex Networks</article-title>
          . Cambridge University Press.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Javier</given-names>
            <surname>Borge-Holthoefer</surname>
          </string-name>
          , Sandro Meloni, Bruno Gonc¸alves, and Yamir Moreno.
          <year>2012</year>
          .
          <article-title>Emergence of Influential Spreaders in Modified Rumor Models</article-title>
          .
          <source>Journal of Statistical Physics</source>
          ,
          <volume>151</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>383</fpage>
          -
          <lpage>393</lpage>
          , September.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>S.</given-names>
            <surname>Brin</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.</given-names>
            <surname>Page</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>The anatomy of a large-scale hypertextual web search engine</article-title>
          .
          <source>Computer Networks and ISDN Systems</source>
          , V:
          <fpage>107</fpage>
          -
          <lpage>117</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Castellano</surname>
          </string-name>
          , Santo Fortunato, and
          <string-name>
            <given-names>Vittorio</given-names>
            <surname>Loreto</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <source>Statistical Physics of Social Dynamics. Reviews of Modern Physics</source>
          ,
          <volume>81</volume>
          (
          <issue>2</issue>
          ):
          <fpage>591</fpage>
          -
          <lpage>646</lpage>
          , may.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Mario</given-names>
            <surname>Cataldi</surname>
          </string-name>
          , Luigi Di Caro, and
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Schifanella</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Emerging topic detection on Twitter based on temporal and social terms evaluation</article-title>
          .
          <source>In Proceedings of the Tenth International Workshop on Multimedia Data Mining - MDMKDD '10</source>
          , pages
          <fpage>1</fpage>
          -
          <lpage>10</lpage>
          , New York, New York, USA, jul. ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Wei</surname>
            <given-names>Chen</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Yajun</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Siyu</given-names>
            <surname>Yang</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Efficient influence maximization in social networks</article-title>
          .
          <source>In Proceedings of the 15th ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '09</source>
          ,
          <string-name>
            <surname>page</surname>
            <given-names>199</given-names>
          </string-name>
          , New York, New York, USA, jun. ACM Press.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>L. D. F. Costa</surname>
            ,
            <given-names>F. A.</given-names>
          </string-name>
          <string-name>
            <surname>Rodrigues</surname>
            ,
            <given-names>G</given-names>
          </string-name>
          <string-name>
            <surname>Travieso</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>P. R. Villas</given-names>
            <surname>Boas</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Characterization of complex networks: A survey of measurements</article-title>
          .
          <source>Advances in Physics</source>
          ,
          <volume>56</volume>
          :
          <fpage>167</fpage>
          -
          <lpage>242</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>L C</given-names>
            <surname>Freeman</surname>
          </string-name>
          .
          <year>1977</year>
          .
          <article-title>A set of measures of centrality based on betweenness</article-title>
          .
          <source>Sociometry</source>
          ,
          <volume>40</volume>
          :
          <fpage>35</fpage>
          -
          <lpage>41</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>R</given-names>
            <surname>Guimera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L</given-names>
            <surname>Danon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Diaz-Guilera</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F</given-names>
            <surname>Giralt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Arenas</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Self-similar community structure in a network of human interactions</article-title>
          .
          <source>Physical Review E</source>
          ,
          <volume>68</volume>
          :
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Laurent</given-names>
            <surname>He</surname>
          </string-name>
          <article-title>´bert-</article-title>
          <string-name>
            <surname>Dufresne</surname>
          </string-name>
          , Antoine Allard,
          <string-name>
            <surname>Jean-Gabriel Young</surname>
          </string-name>
          , and
          <string-name>
            <surname>Louis</surname>
          </string-name>
          J Dube´.
          <year>2013</year>
          .
          <article-title>Global efficiency of local immunization on complex networks</article-title>
          .
          <source>Scientific reports</source>
          ,
          <volume>3</volume>
          :
          <fpage>2171</fpage>
          ,
          <string-name>
            <surname>January</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>David</given-names>
            <surname>Kempe</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Jon M.</given-names>
            <surname>Kleinberg</surname>
          </string-name>
          , and E´va Tardos.
          <year>2003</year>
          .
          <article-title>Maximizing the Spread of Influence Through a Social Network</article-title>
          . In Lise Getoor, Ted E. Senator, Pedro Domingos, and Christos Faloutsos, editors,
          <source>KDD</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>146</lpage>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Maksim</given-names>
            <surname>Kitsak</surname>
          </string-name>
          ,
          <string-name>
            <surname>Lazaros K. Gallos</surname>
            , Shlomo Havlin, Fredrik Liljeros, Lev Muchnik,
            <given-names>H. Eugene</given-names>
          </string-name>
          <string-name>
            <surname>Stanley</surname>
            , and Herna´n
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Makse</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Identification of influential spreaders in complex networks</article-title>
          .
          <source>Nature Physics</source>
          ,
          <volume>6</volume>
          (
          <issue>11</issue>
          ):
          <fpage>888</fpage>
          -
          <lpage>893</lpage>
          ,
          <year>August</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Jrme</given-names>
            <surname>Kunegis</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>KONECT - The Koblenz Network Collection</article-title>
          .
          <source>In Proc. Int. Web Observatory Workshop</source>
          , pages
          <fpage>1343</fpage>
          -
          <lpage>1350</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Jrme</given-names>
            <surname>Kunegis</surname>
          </string-name>
          .
          <year>2014a</year>
          . KONECT, October.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Jrme</given-names>
            <surname>Kunegis</surname>
          </string-name>
          . 2014b.
          <article-title>Hamsterster full network dataset - KONECT, jan</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Glenn</given-names>
            <surname>Lawyer</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Measuring node spreading power by expected cluster degree</article-title>
          .
          <source>page 4</source>
          ,
          <string-name>
            <surname>September</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Maki</surname>
          </string-name>
          and
          <string-name>
            <given-names>M</given-names>
            <surname>Thompson</surname>
          </string-name>
          .
          <year>1973</year>
          .
          <article-title>Mathematical Models and Applications, with Emphasis on the Social, Life, and Management Sciences</article-title>
          . Prentice-Hall.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          <string-name>
            <given-names>Sandro</given-names>
            <surname>Meloni</surname>
          </string-name>
          , Alex Arenas, Sergio Gmez, Javier BorgeHolthoefer, and
          <string-name>
            <given-names>Yamir</given-names>
            <surname>Moreno</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>Modeling epidemic spreading in complex networks: Concurrency and traffic</article-title>
          . In My T. Thai and Panos M. Pardalos, editors,
          <source>Handbook of Optimization in Complex Networks</source>
          ,
          <source>Springer Optimization and Its Applications</source>
          , pages
          <fpage>435</fpage>
          -
          <lpage>462</lpage>
          . Springer US.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Yamir</given-names>
            <surname>Moreno</surname>
          </string-name>
          , Maziar Nekovee,
          <string-name>
            <given-names>and Amalio F.</given-names>
            <surname>Pacheco</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Dynamics of rumor spreading in complex networks</article-title>
          .
          <source>Physical Review E</source>
          ,
          <volume>69</volume>
          (
          <issue>6</issue>
          ):066130, jun.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>M. E. J.</given-names>
            <surname>Newman</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>The structure of scientific collaboration networks</article-title>
          .
          <source>In Natl. Acad. Sci. USA</source>
          , number
          <volume>98</volume>
          , pages
          <fpage>404</fpage>
          -
          <lpage>409</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <surname>M E J Newman</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Fast algorithm for detecting community structure in networks</article-title>
          .
          <source>Physical Review E</source>
          ,
          <volume>69</volume>
          (
          <issue>3</issue>
          ):
          <fpage>66133</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <surname>M E J Newman</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Finding community structure in networks using the eigenvectors of matrices</article-title>
          . Physical Review E,
          <volume>74</volume>
          (
          <issue>3</issue>
          ):
          <fpage>36104</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <given-names>Mark</given-names>
            <surname>Newman</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Networks: An Introduction</article-title>
          . Oxford University Press, Inc., New York, NY, USA.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Romualdo</given-names>
            <surname>Pastor-Satorras</surname>
          </string-name>
          and
          <string-name>
            <given-names>Alessandro</given-names>
            <surname>Vespignani</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>Epidemic Spreading in Scale-Free Networks</article-title>
          .
          <source>Physical Review Letters</source>
          ,
          <volume>86</volume>
          (
          <issue>14</issue>
          ):
          <fpage>3200</fpage>
          -
          <lpage>3203</lpage>
          , April.
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Sen</given-names>
            <surname>Pei</surname>
          </string-name>
          and Herna´n
          <string-name>
            <given-names>A</given-names>
            <surname>Makse</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Spreading dynamics in complex networks</article-title>
          .
          <source>Journal of Statistical Mechanics: Theory and Experiment</source>
          ,
          <year>2013</year>
          (
          <volume>12</volume>
          ):
          <fpage>P12002</fpage>
          , December.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Matthew</given-names>
            <surname>Richardson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pedro</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Mining knowledge-sharing sites for viral marketing</article-title>
          .
          <source>In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining - KDD '02</source>
          ,
          <string-name>
            <surname>page</surname>
            <given-names>61</given-names>
          </string-name>
          , New York, New York, USA, jul. ACM Press.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>