Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


                    Influence Maximization with an Unknown
                   Network by Exploiting Community Structure

                       Bryan Wilder1 , Nicole Immorlica2 Eric Rice1 , and Milind Tambe1
                                         1
                                           University of Southern California,
                                      Center for Artificial Intelligence in Society
                                                   Los Angeles, CA
                                        {bwilder, erice, tambe}@usc.edu,
                                        2
                                          Microsoft Research, New England,
                                                   Cambridge, MA
                                                  nicimm@gmail.com


                       Abstract. In many real world applications of influence maximization,
                       practitioners intervene in a population whose social structure is initially
                       unknown. We formalize this problem by introducing exploratory influence
                       maximization, in which an algorithm queries individual network nodes to
                       learn their links. The goal is to locate a seed set nearly as influential as
                       the global optimum using very few queries. We show that this problem
                       is intractable for general graphs. However, real world networks typically
                       have community structure, in which nodes are arranged in densely con-
                       nected subgroups. We present the ARISEN algorithm, which leverages
                       community structure to find an influential seed set by querying only a
                       fraction of the network. Experiments on real world networks of home-
                       less youth, village populations in India, and others validate ARISEN’s
                       performance.


               1     Introduction

                In contexts ranging from health, to international development, to education,
               practitioners have used the social network of their target population to rapidly
               spread information and to change behavior in socially desirable ways. The chal-
               lenge is to identify the influential members of the population. While previous
               work has delivered many computationally efficient algorithms for this influence
               maximization problem [7, 21, 12], this work assumes that the social network is
               given explicitly as input. However, in many real-world domains, the network is
               not initially known and must be gathered via laborious field observations. For
               example, collecting network data from vulnerable populations such as homeless
               youth, while crucial for health interventions, requires significant time spent gath-
               ering field observations [19]. Social media data is often unavailable when access

                   Copyright c 2017 for the individual papers by the papers’ authors. Copying per-
                   mitted for private and academic purposes. This volume is published and copyrighted
                   by its editors.


                                                           2
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               to technology is limited, for instance in developing countries or with vulnera-
               ble populations. Even when such data is available, it often includes many weak
               links which are not effective at spreading influence [2]. For instance, a person
               may have hundreds of Facebook friends who they barely know. In principle, the
               entire network could be reconstructed via surveys, and then existing influence
               maximization algorithms applied. However, exhaustively reconstructing the net-
               work is very labor-intensive and considered impractical in many situations [22].
               For influence maximization to be relevant to many real-world problems, it must
               contend with limited information about the network, not just limited computa-
               tion.
                   The major informational restriction is the number of nodes which may be
               surveyed to explore the network. Thus, a key question is: how can we find influ-
               ential nodes with a small number of queries? Existing field work uses heuristics,
               such as sampling some percentage of the nodes and asking them to nominate in-
               fluencers [22]. We formalize this problem as exploratory influence maximization
               and seek a principled algorithmic solution, i.e., an algorithm which makes a small
               number of queries and returns a set of seed nodes which are approximately as
               influential as the the globally optimal seed set. To the best of our knowledge, no
               previous work directly addresses this question from an algorithmic perspective
               (we survey the closest work in Section 3).
                   We show that for general graphs, any algorithm for exploratory influence
               maximization may perform arbitrarily badly unless it examines almost the en-
               tire network. However, real world networks have useful structure. In particular,
               social networks often have strong community structure, where nodes are ar-
               ranged into groups which are connected tightly internally, but only weakly to
               the rest of the network [10, 16]. Consequently, influence mostly propagates in a
               local fashion. Community structure has been used to develop more computation-
               ally efficient influence maximization algorithms [23, 8]. Here, we use it to design a
               highly information-efficient algorithm. We make three main contributions. First,
               we introduce exploratory influence maximization and show that it is intractable
               for general graphs. Second, we present the ARISEN algorithm, which exploits
               community structure to find an influential seed set. Third, we show experimental
               results on a variety of networks(both synthetic and real) that verify ARISEN’s
               performance. Our focus here is on introducing the algorithm and showing exper-
               imental results; theoretical analysis of ARISEN’s performance will be presented
               in future work. In this paper, we focus on the description of the problem and
               survey related work. We then briefly present the high-level idea of our algorithm
               and give an example of experimental results.


               2    Exploratory influence maximization

               As a motivating example, consider a homeless youth shelter which wishes to
               spread HIV prevention information [19]. It would try to harness the youths’
               social network and select the most influential peer leaders to spread information,
               but this network is not initially known. Constructing the network requires a


                                                         3
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               laborious survey [19]. Our motivation is to mitigate this effort by querying only
               a few youth. Such queries require much less time than the day-long training peer
               leaders receive. We now formalize this problem.
                   Influence maximization: The influence maximization problem [13], starts
               with a graph G = (V, E), where |V | = n and |E| = m. We assume through-
               out that G is undirected; social links are typically reciprocal [20]. An influencer
               selects K seed nodes with the aim of maximizing the expected size of the re-
               sulting influence cascade. We assume that influence propagates according to the
               independent cascade model (ICM), which is the most prevalent model in the
               literature. Initially, all nodes are inactive except for the seed set. When a node
               becomes active, it makes one attempt to activate each neighbor. Each attempt
               succeeds independently with probability q, where q is typically assumed to be the
               same for all edges [7, 13, 25]. Let f (S) denote the expected number of activated
               nodes with seed set S ⊆ V . The objective is to compute arg max|S|≤K f (S).
                   Local information: The edge set E is not initially known. Instead, the
               algorithm explores portions of the graph using local operations. We use the
               popular “Jump-Crawl” model [5], where the algorithm may either jump to a
               uniformly random node, or crawl along an edge from an already surveyed node
               to one of its neighbors. When visited, a node reveals all of its edges. We say that
               the query cost of an algorithm is the total number of nodes visited using either
               operation. Our goal is to find influential nodes with a query cost that is much
               less than n, the total number of nodes.
                   Stochastic Block Model: In our formal analysis, we assume that the graph
               is drawn from the SBM. The SBM originated in sociology [9] and lately has
               been intensively studied in computer science and statistics (see e.g. [1, 15, 18]).
               In the SBM, the network is partitioned into disjoint communities C1 ....CL . Each
               within-community edge is present independently with probability pw and each
               between-community edge is present independently with probability pb . Notice
               that each community is an Erdős-Rényi random graph with additional random
               edges to other communities. We assume that pw ≥ log|C|C   i|
                                                                            i|
                                                                               for all Ci , since this is
               necessary for Ci to be internally connected [11]. While the SBM is a simplified
               model, our experimental results show that ARISEN performs well on real-world
               graphs. ARISEN takes as input the parameters n, pw , and pb , but is not given
               any prior information about the realized draw of the network. It is reasonable to
               assume that the model parameters are known since they can be estimated using
               existing network data from a similar population (in our experiments, we show
               that this approach works well).
                   Objective: We compare to the globally optimal solution, i.e, the best per-
               formance if the entire network structure were known in advance. Let fE (S) give
               the expected number of nodes influenced by seed set S when the set of real-
               ized edges are E. Let A(E) be the (possibly random) seed set containing our
               algorithm’s selections given edge set E. Let OP T be the expected value of the
               globally optimal solution which seeds K nodes. We measure the algorithm’s per-
               formance by the ratio OP T / E[fE (A(E))], where the expectation is over both
               the randomness in the graph and the algorithm’s choices.


                                                          4
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               3    Related work

               First, Yadav et al. [25] and Wilder et al. [24], studied dynamic influence maxi-
               mization over a series of rounds. Some edges are “uncertain” and are only present
               with some probability; the algorithm can gain information about these edges in
               each round. However, the majority of potential edges are known in advance. By
               contrast, our work does not require any known edges. Mihara et al. [17] also
               consider influence maximization over a series of rounds, but in their work the
               network is initially unknown. In each round, the algorithm makes some queries,
               selects some seed nodes, and observes all of the nodes which are activated by its
               chosen seeds. The ability to observe activated nodes makes our problem incom-
               parable with theirs because these activations can reveal a great deal about the
               network and give the algorithm information that even the global optimizer does
               not have (in their work, the benchmark does not use the activations). Thus, we
               emphasize that our algorithm uses strictly less information. Further, in many
               applications, activations correspond to private behaviors (e.g. getting tested for
               HIV) and cannot be observed for practical or legal reasons.
                   Another line of work concerns local graph algorithms, where a local algorithm
               only uses the neighborhoods around individual nodes. Borgs et al. [3] study
               local algorithms for finding the root node in a preferential attachment graph
               and for constructing a minimum dominating set. Other work, including Bressen
               et al. [6] and Borgs et al. [4], aims to find nodes with high PageRank using
               local queries. These algorithms are not suitable for our problem since a great
               deal of previous work has observed that picking high PageRank nodes as seeds
               can prove highly suboptimal for influence maximization [14, 7, 12]. Essentially,
               PageRank identifies a set of nodes that are individually central, while influence
               maximization aims to find a set of nodes which are collectively best at diffusing
               information. We also emphasize that our technical approach is entirely distinct
               from work on PageRank.


               4    Proposed algorithm and results

               We now provide a brief overview of our algorithm for exploratory influence max-
               imization. The main idea is to sample a set of T random nodes {v1 ...vT } from G
               and explore a small subgraph Hi around each vi by taking R steps of a random
               walk. We discard the first B steps of each walk as burn-in. R, T and B are inputs
               set by the user according to the size of the network and the number of seeds
               they wish to select. Intuitively, T should be greater than K so we can be sure
               of sampling each of the largest K communities. The subgraphs Hi are used to
               construct a weight vector w where wi gives the weight associated with vi . The
               algorithm then independently samples each seed from {v1 ...vT } with probabil-
               ity proportional to w. Further details will appear in an extended version of the
               paper.
                   Figure 1 shows experimental results on a network of households in a village in
               rural india. We compare our algorithm (in blue with diagonal hatches) to a series


                                                         5
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


               of baselines. From left to right, the baselines are (1) selecting K random node
               and seeding their highest degree neighbor (2) starting at a random node and
               iteratively seeding the highest degree neighbor of the previous node (3) selecting
               K seeds at random. The y axis plots the fraction of the optimal value (assuming
               that the true network were known) attained by each algorithm. We see that the
               proposed algorithm outperforms all baselines.


                                                                                  L Q G L D  
                                ) U D F W L R Q  R I OPT
                                                                 
                                                                 
                                                                 
                                                                                       
                                                                                      K/n
                       Fig. 1. Influence spread compared to OP T as K varies with q = 0.15.


               5    Conclusion

               We introduced exploratory influence maximization to study influence maximiza-
               tion when the network is initially unknown. We presented a novel algorithm,
               which exploits the community structure present in many real world social net-
               works. Experimental results on a real world network show that our algorithm is
               competitive with the global optimum and outperforms several baselines.


               References
                1. Abbe, E., Sandon, C.: Community detection in general stochastic block models:
                   Fundamental limits and efficient algorithms for recovery. In: FOCS. pp. 670–688.
                   IEEE (2015)
                2. Bond, R.M., Fariss, C.J., Jones, J.J., Kramer, A.D., Marlow, C., Settle, J.E.,
                   Fowler, J.H.: A 61-million-person experiment in social influence and political mo-
                   bilization. Nature 489(7415), 295–298 (2012)
                3. Borgs, C., Brautbar, M., Chayes, J., Khanna, S., Lucier, B.: The power of local
                   information in social networks. In: WINE. pp. 406–419. Springer (2012)
                4. Borgs, C., Brautbar, M., Chayes, J., Teng, S.H.: Multiscale matrix sampling and
                   sublinear-time pagerank computation. Internet Mathematics 10(1-2), 20–48 (2014)


                                                                                  6
Proceedings of the 3rd International Workshop on Social Influence Analysis (SocInf 2017)
August 19th, 2017 - Melbourne, Australia


                5. Brautbar, M., Kearns, M.J.: Local algorithms for finding interesting individuals
                   in large networks. In: Innovations in Theoretical Computer Science. pp. 188–199
                   (2010)
                6. Bressan, M., Peserico, E., Pretto, L.: The power of local information in pagerank.
                   In: WWW. pp. 179–180. ACM (2013)
                7. Chen, W., Wang, C., Wang, Y.: Scalable influence maximization for prevalent viral
                   marketing in large-scale social networks. In: KDD. pp. 1029–1038. ACM (2010)
                8. Chen, Y.C., Zhu, W.Y., Peng, W.C., Lee, W.C., Lee, S.Y.: Cim: community-based
                   influence maximization in social networks. ACM Transactions on Intelligent Sys-
                   tems and Technology (TIST) 5(2), 25 (2014)
                9. Fienberg, S.E., Wasserman, S.S.: Categorical data analysis of single sociometric
                   relations. Sociological methodology 12, 156–192 (1981)
               10. Girvan, M., Newman, M.E.: Community structure in social and biological networks.
                   PNAS 99(12), 7821–7826 (2002)
               11. Janson, S., Luczak, T., Rucinski, A.: Random graphs, vol. 45. John Wiley & Sons
                   (2011)
               12. Jung, K., Heo, W., Chen, W.: Irie: Scalable and robust influence maximization in
                   social networks. In: ICDM. pp. 918–923. IEEE (2012)
               13. Kempe, D., Kleinberg, J., Tardos, É.: Maximizing the spread of influence through
                   a social network. In: KDD. pp. 137–146. ACM (2003)
               14. Kimura, M., Saito, K., Nakano, R., Motoda, H.: Finding influential nodes in a social
                   network from information diffusion data. In: Social Computing and Behavioral
                   Modeling, pp. 1–8. Springer (2009)
               15. Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborová, L., Zhang, P.:
                   Spectral redemption in clustering sparse networks. PNAS 110(52), 20935–20940
                   (2013)
               16. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in
                   large networks: Natural cluster sizes and the absence of large well-defined clusters.
                   Internet Mathematics 6(1), 29–123 (2009)
               17. Mihara, S., Tsugawa, S., Ohsaki, H.: Influence maximization problem for unknown
                   social networks. In: ASONAM. pp. 1539–1546. ACM (2015)
               18. Mossel, E., Neeman, J., Sly, A.: Reconstruction and estimation in the planted
                   partition model. Probability Theory and Related Fields 162(3-4), 431–461 (2015)
               19. Rice, E., Tulbert, E., Cederbaum, J., Adhikari, A.B., Milburn, N.G.: Mobiliz-
                   ing homeless youth for HIV prevention. Health education research 27(2), 226–236
                   (2012)
               20. Squartini, T., Picciolo, F., Ruzzenenti, F., Garlaschelli, D.: Reciprocity of weighted
                   networks. Scientific Reports (2012)
               21. Tang, Y., Xiao, X., Shi, Y.: Influence maximization: Near-optimal time complexity
                   meets practical efficiency. In: KDD. ACM (2014)
               22. Valente, T.W., Pumpuang, P.: Identifying opinion leaders to promote behavior
                   change. Health Education & Behavior (2007)
               23. Wang, Y., Cong, G., Song, G., Xie, K.: Community-based greedy algorithm for
                   mining top-k influential nodes in mobile social networks. In: KDD. pp. 1039–1048.
                   ACM (2010)
               24. Wilder, B., Yadav, A., Immorlica, N., Rice, E., Tambe, M.: Uncharted but not
                   uninfluenced: Influence maximization with an uncertain network. In: AAMAS. pp.
                   740–748 (2017)
               25. Yadav, A., Chan, H., Xin Jiang, A., Xu, H., Rice, E., Tambe, M.: Using social net-
                   works to aid homeless shelters: Dynamic influence maximization under uncertainty.
                   In: AAMAS. pp. 740–748 (2016)


                                                          7