Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina Influence maximization based on the least influential spreaders Didier A. Vega-Oliveros, Lilian Berton, Alneu de Andrade Lopes and Francisco A. Rodrigues Instituto de Ciências Matemáticas e de Computação Universidade de São Paulo - Campus de São Carlos 13560-970 São Carlos, SP - Brazil Email: {davo, lberton, alneu, francisco}@icmc.usp.br Abstract Some individuals can have a higher social influence than others, according to their position, topological character- The emergence of social media increases the need istics in the network and dynamic [Kitsak et al., 2010; for the recognization of social influence mainly mo- González-Baillón et al., 2011; Borge-Holthoefer et al., 2012]. tivated by online advertising, political and health A lot of researches have tried to identify influence in social campaigns, recommendation systems, epidemio- networks [Burt, 1992; Sabidussi, 1996; Kitsak et al., 2010; logical study, etc. In spreading processes, it is pos- Borge-Holthoefer et al., 2012]. It was found that the final sible to define the most central or influential ver- fraction of informed individuals is not significantly affected tices according to the network topology and dy- by chosen the spreaders by topological features [Moreno et namic. On the other hand, the least influential al., 2004]. However, incorporating characteristics of real sce- spreaders have been disregarded. This paper aims narios as the activity of spreader or apathy with the new infor- to maximize the mean of information propagation mation [Borge-Holthoefer et al., 2012], or exposure to multi- on the network by recognizing the non influential ple sources and individual’s thresholds [González-Baillón et individuals by making them better spreader. Ex- al., 2011] lead to the emergence of influential spreaders in the perimental results confirm that selecting 0.5% of information spreading dynamic. least influential spreaders in three social networks (google+, hamsterster and advogato) and rewiring In the other hand, the least influential individuals of the one connection to some important vertex, increase network are ignored in the literature. Let consider the case of the propagation over the entire network. a scientific collaboration network. Some research groups are more prominent than others. If a researcher with few publi- cation or contacts visits a prominent group, by the homophily 1 Introduction phenomenon he/she can consequently improve his/her publi- Nowadays social media, such as blogs, social networks, cation capacity. Based on this, we measure the vertices’ in- sharing sites, etc., can quickly spread rumors or informa- fluence and recognize the least influential spreaders. Thereby, tion [Castellano et al., 2009; González-Baillón et al., 2011], we propose to turn those individuals into better spreaders by reaching millions of users in the Global village. Propagation connecting to the most central nodes. process evolving information or rumors can be both benefi- The main contributions of this paper are: i) confirm the ex- cial or destructive in scenarios like stock market, advertising istence of least influential individuals in the network; ii) char- and marketing, adoption of technologies, politics and national acterize the least influential spreaders by analyzing central- security, among other. ity measures, such as degree, betweenness, closeness, pager- Many of the approaches to information or rumor spread- ank, eigenvector, k-core, clustering coefficient and struc- ing [Moreno et al., 2004; Castellano et al., 2009; González- tural holes; iii) select a proportion of least influential users Baillón et al., 2011; Borge-Holthoefer et al., 2012] concen- from google+, hamsterster and advogato social networks and trate in how ideas are shared among individuals and what make them better spreaders rewiring one of its connection to are the conditions that allow a large dissemination. For this a more central individual; iv) analyze the impact of different reason, they are understood as being equivalent to dynam- proportions in the rewiring process and how it increases the ics like epidemics spreading, in the sense that individuals mean of information spreading on the network. would be psychological ‘infected’ with some idea or opin- The paper is organized as follows: Section 2 describes the ion [Daley and Kendall, 1964; Maki and Thompson, 1973; centrality measures considered for network characterization Barrat et al., 2008; Castellano et al., 2009]. The rumor and the spreading process (SIR model); Section 3 presents the spreading process assumes that the population can be divided simulations in three social networks (google+, hamsterster into groups with different states, which is generally described and advogato), recognizing the least influential users, mak- as the Susceptible (inactive), Infectious (spreader) and Re- ing them potential spreaders and increasing the information covered (stifler) (SIR) model [Daley and Kendall, 1964; capacity of the networks. Finally, Section 4 reports the final Maki and Thompson, 1973]. remarks. Copyright c 2015 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors 3 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina 2 Definitions the importance of the vertices given the quality of its connec- A social network can be represented as a static unweighted tions [Bonacich, 1972]. network G = (V, E), such as V = {v1 , v2 , ... vn } is the Pagerank centrality (PR) derives from a Markovian pro- set of N vertices or actors and E = {e1 , e2 , ... em } is the cess that follows a random walk navigation through the net- set of M edges or connections. The adjacency matrix A is work. It expresses the importance of the vertices considering the mathematical entity that represents the connections of the the probability of arriving at certain vertex after a large num- network. We consider undirected networks, that means i, j ber of steps. The pagerank was initially proposed to rank web ∈ V , Aij = Aji = 1 if i and j are connected, or 0 otherwise. pages [Brin and Page, 1998] and the idea is to simulate the be- A walk between a pair of vertices (i, j) is a consecutive havior of a user that is surfing on the net. The user navigates sequence that starts at i and ends in j, so that any vertices are following the links available at the current page or, she/he can visited more than once. The distance or length of the walk jump to other pages by typing a new URL in the browser. In is defined as the number of edges contained in the sequence. social networks it can be approached like the more cited or Two vertices are neighbors if they are connected in a walk of renowned individuals. length 1. The shortest distance between two vertices is known Closeness centrality (CL) is the average of the shortest as the shortest path or geodesic path `ij . The shortest paths paths from each vertex to the rest of the network [Sabidussi, can be computed by the Dijkstra, BellmanFord algorithms or 1996]. Formally, the closeness centrality is the inverse of the by a Breadth-first search method [Cormen et al., 2009]. A Xshortest paths from i to all the vertices, i.e., average of the component is the largest sub-set of vertices from the network CLi = N/ `ij . Thereby, vertices that are closer to the in which exist at least one walk between each pair of vertices, j6=i but never connect to another component. A connected net- others have higher closeness centrality. For the sake of calcu- work has only one component. In the case that i and j belong lating the centrality for disconnected networks, the average is to different components, it is assumed that `ij = ∞. For considered by component. this reason, here we considered the largest component of the k-core centrality (KC) describes the topology of the net- network. work in terms of sub network decomposition in cores. The 2.1 Centrality measures core of k order (Hk ) is the set of vertices where for each vertex i in Hk its degree ki ≥ k. It means, k is the maxi- Several measures have been proposed to describe the impor- mum core that i can belongs, with Kc(i) = k and Hk been tance or centrality of a vertex in the network [Costa et al., the largest set of vertices with this property [Seidman, 1983]. 2007]. These centrality measures are defined considering par- The main core is the set of vertices with the largest k-core ticular definitions of influence [Newman, 2003]. It is assumed value from the network, and these vertices are the most cen- that vertices with higher centrality measure are more suitable tral. Vertices with low values of k-core are commonly located to influence in the opinion of others. Among some of the at the periphery of the network. Not necessary all the high- possibilities, we have the popularity of a vertex (degree cen- degree vertices have higher k-core values. For instance, hubs trality) [Costa et al., 2007], the proximity or how close an located in the periphery would have small values of k-core, individual is to the others (closeness centrality) [Sabidussi, or vertices with larger k-core value could have not so large 1996], the trusted vertices in the transmission of informa- degree [Kitsak et al., 2010]. tion (betweenness centrality) [Freeman, 1977], the proximity of vertices to the network core (k-core) [Seidman, 1983] or Clustering coefficient (CT) or transitivity is a common even the renowned that an individual has (pagerank central- property in real-world networks. In social networks it means ity) [Brin and Page, 1998]. In the following, we present the that if A has one friend that is also friend of B, there is a strong centrality measures adopted in this paper. tendency that A being also a friend of B. In topology terms it Degree centrality (DG) considers the number of connec- is the presence of triangles (cycles of order three) in the net- tions or relations of a vertex. The set of vertices connected work. The clustering coefficient [Watts and Strogatz, 1998] to a certain vertex i is defined as the neighborhood and the for a vertex CTi is defined as the number of triangles centered connection degree DGi represents the size of its neighbor- on i over the maximum number of possible connections for i. hood [Costa et al., 2007]. It means, the higher the degree, the CTi has value 1 if all neighbors of i are interconnected. more popular the vertex. Structural Holes (HO) considers all the vertices as an ego Betweenness centrality (BE) is related to the capacity of network, where connections no related with each vertex have information transmission of vertices. For a vertex j this mea- not direct effect. The key factor is the redundancy that each sure quantified the number of shortest paths that pass through vertex has in its neighborhood, evaluating if its position and j between all pair of vertices (i, k) [Freeman, 1977], with connections brings some opportunities. It means that the suc- i, j and k different. The centrality measure expresses how cess of an individual i within a social network or organization much the vertex j works as bridge, meaning how confidence is related to access local bridges (trusted people) [Burt, 1992]. or trusted is j in the network. If removed i from the network, a structural hole will happen Eigenvector centrality (EV) takes into account that ver- in the local neighborhood. These individuals are important to tices with same degree have different levels of importance the connectivity of local regions. according to the importance of their neighbors. The eigen- The network can be characterized by averagePn the centrality vector centrality is the principal eigenvector associated with measures of all vertices i.e., hDGi = N1 i=1 DGi , and is the greatest eigenvalue of the adjacency matrix A. It describes similar for the other measures. 4 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina Table 1: Structural properties of the complex networks Network N hDGi hCT i hHOi hBEi hCLi hEV i hKCi hP Ri hamsterster 2000 16.1 0.5399 0.2898 2.588 × 103 1.43 × 10−4 0.011 9.287 5.00 × 10−4 advogato 5054 15.6 0.2525 0.3700 5.747 × 103 6.19 × 10−5 6.82 × 10−3 8.137 1.98 × 10−4 google+ 23613 3.32 0.1742 0.8112 3.581 × 104 1.10 × 10−5 2.30 × 10−3 1.669 4.21 × 10−5 2.2 Spreading process et al., 2012]. These analytical models make assumptions Spreading is a pervasive process in society and several models about the network structure such as the degree correlation have been developed in order to understand the propagation or distribution, compartments or class of vertices with same of ideas or opinions through social networks [Castellano et probabilities, homogeneous mixing or mean field theory. al., 2009]. In classical rumor spreading models the ignorant Notwithstanding all of them claim that their numerical so- or inactive (S) are those who remain unaware of the informa- lutions agree with the MC simulations, so we adopt this ap- tion, the spreaders (I) are those who disseminate the ideas, proach. and the stifler (R) are those who know the information but lose the interest in spreading it. All vertices have the same 3 Maximization of least influential spreaders probability β for convincing their neighbors and probability We analyze the spreading capacity in the microscopic and γ for stopping to be active as propagator. macroscopic scales of the propagation. For each vertex i ∈ The Maki-Thompson (MT) [Maki and Thompson, 1973] V , we calculate the final fraction of stifler ϕi∞ . This quan- rumor approach was employed to model the spreading pro- tity represents how long the rumor propagates in the net- cess. In the MT whenever an active spreader i contacts a work starting in i. Each ϕi∞ were average over 90 realiza- vertex j that is inactive, the latter will become active with tions. For the macroscopic propagation scale, the ϕV∞ repre- a fixed probability β. Otherwise, when j knows about the sents thePmean of spreading capacity for all vertices, it means rumor, it means j is a spreader or a stifler, the vertex i will ϕV∞ = ( i∈V ϕi∞ )/N . turn into a stifler with probability γ. The behavior when the spreader stops to propagate is understood because the infor- 3.1 Dataset mation is too much known (contacting spreaders) or without We employ the following social networks: the hamster- novelty (contacting stifler). The general rules of contact can ster [Kunegis, 2014], an undirected and unweighted network be represented as: based on the website data from HAMSTERSTER . COM. The  β vertices are the users of the system and the edges represent  I i + S j −→ I j ,  a relationship among users. It consists of friend and fam- γ I i + Rj −→ Ri , (1) ily relationship between the users of the website. The ad-   γ I i + I j −→ Ri , vogato [kon, 2014], an online community platform for devel- opers of free software launched in 1999. Vertices are users of where i and j are neighbors and the operator “+” means the advogato and the directed edges represent trust relationships. contact between them. Finally, the google+ [Gpl, 2014] contains Google Plus user- In terms of Monte Carlo (MC) implementation, let con- user links. The directed edge denotes that one user has the sider a constant population of N vertices in all time steps. other in his circles, but we assume the network as undirected. Each vertex can be only in one state, that is I i (t) = 1 if i ∈ We always consider the main component for these networks. I, otherwise I i (t) = 0, and S i (t) + I i (t) + Ri (i) = 1. The The topological characteristics of these networks are sum- macroscopic fraction of ignorant (ψ(t)), spreaders (φ(t)) and marized in Table 1, with the corresponding averages of PN stifler (ϕ(t)) over time is calculated as ψ(t) = N1 i=1 S i (t), the centralities: degree hDGi, clustering coefficient hCT i, that is similar to the other states and always fulfill ψ(t) + structural holes hHOi, betweenness hBEi, closeness hCLi, φ(t) + ϕ(t) = 1. We assume that infection and recovering do eigenvector hEV i, k-core hKCi and pagerank hP Ri. The not occur during the same discrete time window or step. Also google+ is more an egocentric network (hHOi) than others, the case when a spreader randomly contacts one neighbor per where vertices are very close (hCLi) and most of them have unit time, the contact process, was adopted. few connections. The hamsterster is a more sparse network The initial setup for the propagation is ψ(0) = 1 − 1/N , with more triangles (hCT i) and connections between users. φ(0) = 1/N and ϕ(0) = 0. Each simulation begins with a And, advogato is a more dense network and in the middle uniformly selection of vertices as defined in the initial setup. term of the previous. At each time step, all spreaders uniformly select one of its neighbors and try to infect it with probability β, or stop the 3.2 Propagation analysis propagating with probability γ. The simulations run until the For evaluating the impact of the least influential users in end of the propagation process is reached, when φ∞ = 0. the networks, we employ the z-score normalization over the Different theoretical models have been proposed for mod- spreading capacities ϕi∞ and sort the values in ascending or- eling the rumor dynamics in networks [Moreno et al., 2004; der. The z-score indicates how many standard deviations an Barrat et al., 2008; Castellano et al., 2009; Borge-Holthoefer element is according to the mean. The z-score of a raw value 5 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina 0 hamsterster Table 2: k-means clustering result for the google+ network advogato Measures Cluster 1 (10%) Cluster 2 (67%) Cluster 3 (23%) −2 google+ z−score of informed individuals DG 1.029 ± 1.001 1.128 ± 0.386 10.553 ± 72.435 −4 BE 0.00 ± 0.0004 0.00 ± 0.0005 0.0023 ± 0.023 CL 0.5 ± 0.00 0.5 ± 0.00 0.501 ± 0.019 −6 PR 0.0007 ± 0.0007 0.0007 ± 0.0002 0.0043 ± 0.0299 EV 0.00 ± 0.00 0.0021 ± 0.0037 0.0117 ± 0.0202 −8 KC 0.083 ± 0 0.094 ± 0.032 0.292 ± 0.178 CT 0.00 ± 0.00 0.00 ± 0.00 0.746 ± 0.312 −10 HO 0.991 ± 0.0291 0.942 ± 0.166 0.360 ± 0.145 Propagation 2612.274 ± 332.095 3112.858 ± 56.965 3135.407 ± 37.310 −12 −14 0,001 0,01 0,1 1 fraction of individuals Table 3: Propagation improvements for the real networks: 1st place are in bold and 2nd place are underlined Figure 1: Comparative z-score of the propagation capacity in func- Network Rewired (%) DG BE EV PR tion of the proportion of least influential spreaders 5 1.0817 1.0932 1.0810 1.0844 hamsterster 2.5 1.0804 1.0817 1.0794 1.0783 0.5 1.0493 1.0510 1.0479 1.0489 x is z = x−µσ , where: µ is the mean and σ is the standard 5 1.0600 1.0594 1.0599 1.0594 deviation of the population. The impact is shown in Fig- advogato 2.5 1.0529 1.0556 1.0549 1.0557 ure 1, where considering 10% of least influential spreaders, 0.5 1.0450 1.0439 1.0453 1.0447 the mean decreases around 3 standard deviations. This de- 5 1.0124 1.0158 1.0050 1.0112 google+ 2.5 1.0143 1.0102 1.0103 1.0171 creasing behavior continues successively. The less influential 0.5 1.0241 1.0190 1.0187 1.0266 spreaders impact the overall mean of spreading capacity. We aim to recognize these least influential spreaders in or- der to improve their influence. Following, we considered the centrality measures and analyzed the topological influence 3.4 Cluster analysis over the propagation and data clustering to find patterns on We perform a clustering analysis in order to discover the these individuals. group of least influential spreaders according to the topolog- ical characteristics of the vertices. As objects in the same 3.3 Topological analysis group are more similar to each other than those in other Z-score was calculated for each centrality measure and net- groups, we aim to identify common properties of the elements work (Figure 2 a-c). The fraction of individuals was sorted that belong to the group with lowest propagation rates. in ascending order for each centrality. Vertices with lowest We apply the popular k-means clustering [Macqueen, eigenvector centrality always led the lowest z-score values. 1967] that partitions data into k mutually exclusive clusters. Jaccard coefficient was employed in order to measure the Each cluster is defined by its members and by its centroid proportion of least influential spreaders contained in each that represents the point to which the sum of distances from group of vertices with lower centrality measure (Figure 2 d-f). all objects in the cluster is minimized. We used the Weka1 The Jaccard coefficient compares the similarity and diversity implementation with the standard configuration. of sample sets. It measures similarity between finite sample The result for the google+ network is shown in Table 2. sets, and is defined as the size of the intersection divided by As we set the parameter k = 3 the k-means generates three the size of the union of the sample sets: J(A, B) = |A∩B| |A∪B| . clusters. We are mainly interested in the cluster 1 because Here, for each centrality measure A represents the proportion has the individuals with the lowest propagation mean. By of vertices with lowest values and B the fraction of least influ- comparing the mean of the measures among clusters 1, 2 and ential vertices. We define the least influential users as the ver- 3, we observe that the values of EV ≈ 0 and HO ≈ 1 are ϕV detachable in comparison to other clusters. tices that achieved propagation values ϕi∞ ≤ 2∞ . Selecting less than 10% of vertices with lowest eigenvector centrality, When performed the cluster analysis in hamsterster and it was obtained around half of the less influential spreaders. advogato, the more significant measures were CL < 0.5 and The topological properties considering 1% of vertices with also EV ≈ 0. In all the case, individuals with eigenvector least influential spreading and lowest EV value were ana- values close to zero are in the cluster with lowest spread- lyzed. We verify if these vertices also have low values in ing capacity. If we aim to find a cluster with non influen- other centralities. For each centrality measure we normal- tial spreaders in the network, we can run the k-means algo- ize the values by its own average. Therefore, it is possible rithm until some group partition achieve an eigenvector mean to compare the networks and the maximum values achieved hEV i < 1/n. This pattern confirms the results of the topo- (Figure 2 g-i). We observe that the topological properties of logical analysis. Hence, we generalize that the lower the individuals with lowest EV values are a subset of the values eigenvector value, the lower the spreading capacity of ver- presented for the least influential spreaders group. Unlike ex- tices. pected, vertices with BE, DG or PR above the mean may be 1 also less influential spreaders. http://www.cs.waikato.ac.nz/ml/weka 6 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina 1 2 0 0 0 z−score of informed individuals z−score of informed individuals z−score of informed individuals −2 −2 −1 −4 −4 −2 −6 DG DG DG BE BE −6 BE −3 CL −8 CL CL PR PR −8 PR EV EV EV −4 KC −10 KC KC −10 CT CT CT HO −12 HO HO −5 −12 0 0.2 0.4 0.6 0.8 1 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5 fraction of individuals fraction of individuals fraction of individuals (a) hamsterster (b) advogato (c) google+ 0.45 0.4 0.7 DG DG DG 0.4 BE 0.35 BE BE 0.6 CL CL CL 0.35 PR PR PR 0.3 EV EV 0.5 EV Jaccard coefficient Jaccard coefficient Jaccard coefficient 0.3 KC KC KC 0.25 CT CT 0.4 CT 0.25 HO HO HO 0.2 0.2 0.3 0.15 0.15 0.2 0.1 0.1 0.05 0.1 0.05 0 0 0 0 0.2 0.4 0.6 0.8 1 0 0.1 0.2 0.3 0.4 0.5 0 0.1 0.2 0.3 0.4 0.5 fraction of individuals fraction of individuals fraction of individuals (d) hamsterster (e) advogato (f) google+ CL CL CL 0.64 0.97 1 PR BE PR BE PR BE 1.2 1.26 21.41 0.43 0.77 0.65 3.51 0.67 41.76 0.8 0.84 14.27 0.51 2.34 27.84 0.21 0.32 0.33 0.4 0.42 7.14 0.26 1.17 13.92 EV 0 0 0 DG EV 0.1 0.06 0.03 DG EV 0.07 0.04 0.02 DG 0.14 0.29 0.43 0.09 0.17 0.26 5.22 10.44 15.66 0.22 0.08 0.4 1.16 0.9 0.4 0.62 1.32 0 0.43 0.16 0.8 2.32 1.8 0.8 0.65 1.23 0.25 2.64 1.2 0 KC 3.47 HO KC 2.7 HO KC 1.19 HO 1.85 3.96 0 CT lower influence CT lower influence CT lower influence lower EV lower EV lower EV (g) hamsterster (h) advogato (i) google+ Figure 2: Propagation analysis for the real social networks hamsterster, advogato and google+: (a-c) the impact of vertices with low centrality in the propagation; (d-f) the similarity between vertices with lower centrality and the less influential spreaders; (g-i) the topological characteristic of the least influential spreaders and vertices with lowest EV values. For hamsterster and advogato were considered β = 0.3 and γ = 0.1, and for google+ β = 0.5 and γ = 0.1, in all simulations 0.5 0.5 1 0 0 0 z−score of informed individuals z−score of informed individuals z−score of informed individuals −1 −0.5 −0.5 −2 −1 −3 −1 −1.5 −4 DG DG DG −1.5 BE BE BE 0 −2 CL CL CL −2 PR PR −6 PR EV −2.5 EV EV −7 KC KC KC −2.5 −3 CT CT −8 CT HO HO HO −3 −3.5 −9 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 fraction of individuals fraction of individuals fraction of individuals (a) hamsterster (b) advogato (c) google+ Figure 3: Propagation analysis for the real social networks when maximizing the social influence rewiring 0.5% of least influential spreaders considering vertices with highest DG centrality: the impact of low centrality values in the propagation mean 7 Proceedings of the 1st International Workshop on Social Influence Analysis (SocInf 2015) July 27th, 2015 - Buenos Aires, Argentina 3.5 Influence maximization [Borge-Holthoefer et al., 2012] Javier Borge-Holthoefer, Sandro For improving the spreading capacity of the network we se- Meloni, Bruno Gonçalves, and Yamir Moreno. Emergence of Influential Spreaders in Modified Rumor Models. Journal of Sta- lected 5%, 2.5% and 0.5% of the vertices with lower EV. For tistical Physics, 151(1-2):383–393, September 2012. each vertex only one of its edges was randomly selected and rewired to an influential vertex of the network. We randomly [Brin and Page, 1998] S. Brin and L. Page. The anatomy of a large- consider vertices with highest DG, BE, EV and PR central- scale hypertextual web search engine. Computer Networks and ISDN Systems, V:107–117, 1998. ity and exchange only one edge of each influencer. For each measure and proportion of lowest EV vertices to be rewired, [Burt, 1992] Ronald S. Burt. Structural holes: The social structure we have the resulting spreading capacity (ϕV∞ ) normalized by of competition. Harvard University Press, Cambridge, MA, 1992. the mean of the non maximized case (Table 3). When con- [Castellano et al., 2009] Claudio Castellano, Santo Fortunato, and necting the least influential spreaders to vertices with highest Vittorio Loreto. Statistical Physics of Social Dynamics. Reviews DG centrality leads to 6 second places and 1 first place re- of Modern Physics, 81(2):591–646, may 2009. sults. Hence, the visual results selecting 0.5% of lowest EV [Cormen et al., 2009] Thomas H. Cormen, Charles E. Leiserson, connecting to vertices with highest DG centrality are shown Ronald L. Rivest, and Clifford Stein. Introduction to Algorithms. in Figure 3. The overall mean of propagation was increased The MIT Press, 3 edition, 2009. for each network. The z-score also increases for the other [Costa et al., 2007] L.D.F. Costa, F.A. Rodrigues, G Travieso, and measures in the entire network and not only for the 0.5% of P.R. Villas Boas. Characterization of complex networks: A sur- the selected vertices (Figure 3). For the best cases the im- vey of measurements. Adv. in Physics, 56:167–242, 2007. provements achieved an overall mean larger than the highest [Daley and Kendall, 1964] D. J. Daley and D. G. Kendall. Epi- spreading value of the non maximized case. demics and Rumours. Nature, 204(4963):1118–1118, 1964. [Freeman, 1977] L C Freeman. A set of measures of centrality 4 Final remarks based on betweenness. Sociometry, 40:35–41, 1977. This paper explored the least influential spreaders of a net- [González-Baillón et al., 2011] Sandra González-Baillón, Borge- work considering centrality measures and analyzing topolog- Holthoefer, Javier Rivero, and Yamir Moreno. The dynamics of ical and data clustering approaches. The results indicate that Protest Recruitment through an Online Network. Scientific Re- vertices with eigenvector value EVi < 1/N have little in- ports, 1:197:1–7, 2011. fluence in the network. We selected 5%, 2.5% and 0.5% of [Gpl, 2014] Google+ network dataset – KONECT, dec 2014. these individuals and rewired only one edge to an influential [Kitsak et al., 2010] Maksim Kitsak, Lazaros K. Gallos, Shlomo vertex. This approach improves propagation capacity of those Havlin, Fredrik Liljeros, Lev Muchnik, H. Eugene Stanley, and vertices and the mean of the propagation in three social net- Hernán A. Makse. Identification of influential spreaders in com- works considered: google+, hamsterster and advogato. More plex networks. Nature Physics, 6(11):888–893, August 2010. efficient targeted actions can be performed to improve the dif- [kon, 2014] Advogato network dataset – KONECT, October 2014. fusion on the network. Selecting a few non influential users and presenting them to one influencer, they can change their [Kunegis, 2014] J. Kunegis. Hamsterster full network dataset- action, become better spreaders and improve the overall diffu- KONECT, jan 2014. sion capacity. For example, the promotion of interchange of [Macqueen, 1967] J. Macqueen. Some methods for classification students to prominent institutions, or famous/popular people and analysis of multivariate observations. In In 5-th Berkeley giving talks in common places or to specific groups of indi- Symp. on Math. Stats. and Prob., pages 281–297, 1967. viduals, can produce a considerable impact in the diffusion of [Maki and Thompson, 1973] D. P. Maki and M Thompson. Math- ideas and benefits. The motivation or incentive for the most ematical Models and Applications, with Emphasis on the Social, important vertices may be monetary, ideological causes, in- Life, and Management Sciences. Prentice-Hall, 1973. crease publications and impact, among others. This is a work [Moreno et al., 2004] Yamir Moreno, Maziar Nekovee, and in progress that shows promising experimental results. New Amalio F. Pacheco. Dynamics of rumor spreading in complex paths of study can be developed by analyzing the least influ- networks. Physical Review E, 69(6):066130, jun 2004. ential spreaders and dynamics. [Newman, 2003] M E J Newman. The structure and function of complex networks. SIAM Review, 45:1–8, 2003. Acknowledgments [Sabidussi, 1996] G Sabidussi. The centrality index of a graph. Psy- This research was supported by National Council for Scientific and chometrika, 31:581–603, 1996. Technological Development (CNPq) grant: 140688/2013-7 and Sao [Seidman, 1983] S Seidman. Network structure and minimum de- Paulo Research Foundation (FAPESP) grant: 2011/21880-3. gree. Social networks, 5(3):269–287, 1983. [Watts and Strogatz, 1998] Duncan J Watts and Steven H Stro- References gatz. Collective dynamics of s̀mall-world’ networks. Nature, [Barrat et al., 2008] Alain Barrat, MarseilleMarc Barthélemy, and 393(6684):440–442, 1998. Alessandro Vespignani. Dynamical Processes on Complex Net- works. Cambridge University Press, 2008. [Bonacich, 1972] Phillip Bonacich. Factoring and weighting ap- proaches to clique identification. Journal of Mathematical Soci- ology, 2:113–120, 1972. 8