<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>In uence Maximization with an Unknown Network by Exploiting Community Structure</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bryan Wilder</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nicole Immorlica</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eric Rice</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milind Tambe</string-name>
          <email>tambeg@usc.edu</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Microsoft Research</institution>
          ,
          <addr-line>New England, Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Southern California, Center for Arti cial Intelligence in Society Los Angeles</institution>
          ,
          <addr-line>CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2017</year>
      </pub-date>
      <fpage>2</fpage>
      <lpage>7</lpage>
      <abstract>
        <p>In many real world applications of in uence maximization, practitioners intervene in a population whose social structure is initially unknown. We formalize this problem by introducing exploratory in uence maximization, in which an algorithm queries individual network nodes to learn their links. The goal is to locate a seed set nearly as in uential as the global optimum using very few queries. We show that this problem is intractable for general graphs. However, real world networks typically have community structure, in which nodes are arranged in densely connected subgroups. We present the ARISEN algorithm, which leverages community structure to nd an in uential seed set by querying only a fraction of the network. Experiments on real world networks of homeless youth, village populations in India, and others validate ARISEN's performance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>In contexts ranging from health, to international development, to education,
practitioners have used the social network of their target population to rapidly
spread information and to change behavior in socially desirable ways. The
challenge is to identify the in uential members of the population. While previous
work has delivered many computationally e cient algorithms for this in uence
maximization problem [7, 21, 12], this work assumes that the social network is
given explicitly as input. However, in many real-world domains, the network is
not initially known and must be gathered via laborious eld observations. For
example, collecting network data from vulnerable populations such as homeless
youth, while crucial for health interventions, requires signi cant time spent
gathering eld observations [19]. Social media data is often unavailable when access
Copyright c 2017 for the individual papers by the papers' authors. Copying
permitted for private and academic purposes. This volume is published and copyrighted
by its editors.
to technology is limited, for instance in developing countries or with
vulnerable populations. Even when such data is available, it often includes many weak
links which are not e ective at spreading in uence [2]. For instance, a person
may have hundreds of Facebook friends who they barely know. In principle, the
entire network could be reconstructed via surveys, and then existing in uence
maximization algorithms applied. However, exhaustively reconstructing the
network is very labor-intensive and considered impractical in many situations [22].
For in uence maximization to be relevant to many real-world problems, it must
contend with limited information about the network, not just limited
computation.</p>
      <p>The major informational restriction is the number of nodes which may be
surveyed to explore the network. Thus, a key question is: how can we nd in
uential nodes with a small number of queries? Existing eld work uses heuristics,
such as sampling some percentage of the nodes and asking them to nominate
inuencers [22]. We formalize this problem as exploratory in uence maximization
and seek a principled algorithmic solution, i.e., an algorithm which makes a small
number of queries and returns a set of seed nodes which are approximately as
in uential as the the globally optimal seed set. To the best of our knowledge, no
previous work directly addresses this question from an algorithmic perspective
(we survey the closest work in Section 3).</p>
      <p>We show that for general graphs, any algorithm for exploratory in uence
maximization may perform arbitrarily badly unless it examines almost the
entire network. However, real world networks have useful structure. In particular,
social networks often have strong community structure, where nodes are
arranged into groups which are connected tightly internally, but only weakly to
the rest of the network [10, 16]. Consequently, in uence mostly propagates in a
local fashion. Community structure has been used to develop more
computationally e cient in uence maximization algorithms [23, 8]. Here, we use it to design a
highly information-e cient algorithm. We make three main contributions. First,
we introduce exploratory in uence maximization and show that it is intractable
for general graphs. Second, we present the ARISEN algorithm, which exploits
community structure to nd an in uential seed set. Third, we show experimental
results on a variety of networks(both synthetic and real) that verify ARISEN's
performance. Our focus here is on introducing the algorithm and showing
experimental results; theoretical analysis of ARISEN's performance will be presented
in future work. In this paper, we focus on the description of the problem and
survey related work. We then brie y present the high-level idea of our algorithm
and give an example of experimental results.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Exploratory in uence maximization</title>
      <p>As a motivating example, consider a homeless youth shelter which wishes to
spread HIV prevention information [19]. It would try to harness the youths'
social network and select the most in uential peer leaders to spread information,
but this network is not initially known. Constructing the network requires a
laborious survey [19]. Our motivation is to mitigate this e ort by querying only
a few youth. Such queries require much less time than the day-long training peer
leaders receive. We now formalize this problem.</p>
      <p>In uence maximization: The in uence maximization problem [13], starts
with a graph G = (V; E), where jV j = n and jEj = m. We assume
throughout that G is undirected; social links are typically reciprocal [20]. An in uencer
selects K seed nodes with the aim of maximizing the expected size of the
resulting in uence cascade. We assume that in uence propagates according to the
independent cascade model (ICM), which is the most prevalent model in the
literature. Initially, all nodes are inactive except for the seed set. When a node
becomes active, it makes one attempt to activate each neighbor. Each attempt
succeeds independently with probability q, where q is typically assumed to be the
same for all edges [7, 13, 25]. Let f (S) denote the expected number of activated
nodes with seed set S V . The objective is to compute arg maxjSj K f (S).</p>
      <p>Local information: The edge set E is not initially known. Instead, the
algorithm explores portions of the graph using local operations. We use the
popular \Jump-Crawl" model [5], where the algorithm may either jump to a
uniformly random node, or crawl along an edge from an already surveyed node
to one of its neighbors. When visited, a node reveals all of its edges. We say that
the query cost of an algorithm is the total number of nodes visited using either
operation. Our goal is to nd in uential nodes with a query cost that is much
less than n, the total number of nodes.</p>
      <p>Stochastic Block Model: In our formal analysis, we assume that the graph
is drawn from the SBM. The SBM originated in sociology [9] and lately has
been intensively studied in computer science and statistics (see e.g. [1, 15, 18]).
In the SBM, the network is partitioned into disjoint communities C1::::CL. Each
within-community edge is present independently with probability pw and each
between-community edge is present independently with probability pb. Notice
that each community is an Erd}os-Renyi random graph with additional random
edges to other communities. We assume that pw lojgCjiCjij for all Ci, since this is
necessary for Ci to be internally connected [11]. While the SBM is a simpli ed
model, our experimental results show that ARISEN performs well on real-world
graphs. ARISEN takes as input the parameters n; pw, and pb, but is not given
any prior information about the realized draw of the network. It is reasonable to
assume that the model parameters are known since they can be estimated using
existing network data from a similar population (in our experiments, we show
that this approach works well).</p>
      <p>Objective: We compare to the globally optimal solution, i.e, the best
performance if the entire network structure were known in advance. Let fE (S) give
the expected number of nodes in uenced by seed set S when the set of
realized edges are E. Let A(E) be the (possibly random) seed set containing our
algorithm's selections given edge set E. Let OP T be the expected value of the
globally optimal solution which seeds K nodes. We measure the algorithm's
performance by the ratio OP T = E[fE (A(E))], where the expectation is over both
the randomness in the graph and the algorithm's choices.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Related work</title>
      <p>First, Yadav et al. [25] and Wilder et al. [24], studied dynamic in uence
maximization over a series of rounds. Some edges are \uncertain" and are only present
with some probability; the algorithm can gain information about these edges in
each round. However, the majority of potential edges are known in advance. By
contrast, our work does not require any known edges. Mihara et al. [17] also
consider in uence maximization over a series of rounds, but in their work the
network is initially unknown. In each round, the algorithm makes some queries,
selects some seed nodes, and observes all of the nodes which are activated by its
chosen seeds. The ability to observe activated nodes makes our problem
incomparable with theirs because these activations can reveal a great deal about the
network and give the algorithm information that even the global optimizer does
not have (in their work, the benchmark does not use the activations). Thus, we
emphasize that our algorithm uses strictly less information. Further, in many
applications, activations correspond to private behaviors (e.g. getting tested for
HIV) and cannot be observed for practical or legal reasons.</p>
      <p>Another line of work concerns local graph algorithms, where a local algorithm
only uses the neighborhoods around individual nodes. Borgs et al. [3] study
local algorithms for nding the root node in a preferential attachment graph
and for constructing a minimum dominating set. Other work, including Bressen
et al. [6] and Borgs et al. [4], aims to nd nodes with high PageRank using
local queries. These algorithms are not suitable for our problem since a great
deal of previous work has observed that picking high PageRank nodes as seeds
can prove highly suboptimal for in uence maximization [14, 7, 12]. Essentially,
PageRank identi es a set of nodes that are individually central, while in uence
maximization aims to nd a set of nodes which are collectively best at di using
information. We also emphasize that our technical approach is entirely distinct
from work on PageRank.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Proposed algorithm and results</title>
      <p>We now provide a brief overview of our algorithm for exploratory in uence
maximization. The main idea is to sample a set of T random nodes fv1:::vT g from G
and explore a small subgraph Hi around each vi by taking R steps of a random
walk. We discard the rst B steps of each walk as burn-in. R; T and B are inputs
set by the user according to the size of the network and the number of seeds
they wish to select. Intuitively, T should be greater than K so we can be sure
of sampling each of the largest K communities. The subgraphs Hi are used to
construct a weight vector w where wi gives the weight associated with vi. The
algorithm then independently samples each seed from fv1:::vT g with
probability proportional to w. Further details will appear in an extended version of the
paper.</p>
      <p>
        Figure 1 shows experimental results on a network of households in a village in
rural india. We compare our algorithm (in blue with diagonal hatches) to a series
of baselines. From left to right, the baselines are (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) selecting K random node
and seeding their highest degree neighbor (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) starting at a random node and
iteratively seeding the highest degree neighbor of the previous node (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) selecting
K seeds at random. The y axis plots the fraction of the optimal value (assuming
that the true network were known) attained by each algorithm. We see that the
proposed algorithm outperforms all baselines.
      </p>
      <p>T1.00
P
O0.75
We introduced exploratory in uence maximization to study in uence
maximization when the network is initially unknown. We presented a novel algorithm,
which exploits the community structure present in many real world social
networks. Experimental results on a real world network show that our algorithm is
competitive with the global optimum and outperforms several baselines.
5. Brautbar, M., Kearns, M.J.: Local algorithms for nding interesting individuals
in large networks. In: Innovations in Theoretical Computer Science. pp. 188{199
(2010)
6. Bressan, M., Peserico, E., Pretto, L.: The power of local information in pagerank.</p>
      <p>
        In: WWW. pp. 179{180. ACM (2013)
7. Chen, W., Wang, C., Wang, Y.: Scalable in uence maximization for prevalent viral
marketing in large-scale social networks. In: KDD. pp. 1029{1038. ACM (2010)
8. Chen, Y.C., Zhu, W.Y., Peng, W.C., Lee, W.C., Lee, S.Y.: Cim: community-based
in uence maximization in social networks. ACM Transactions on Intelligent
Systems and Technology (TIST) 5(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), 25 (2014)
9. Fienberg, S.E., Wasserman, S.S.: Categorical data analysis of single sociometric
relations. Sociological methodology 12, 156{192 (1981)
10. Girvan, M., Newman, M.E.: Community structure in social and biological networks.
      </p>
      <p>PNAS 99(12), 7821{7826 (2002)
11. Janson, S., Luczak, T., Rucinski, A.: Random graphs, vol. 45. John Wiley &amp; Sons
(2011)
12. Jung, K., Heo, W., Chen, W.: Irie: Scalable and robust in uence maximization in
social networks. In: ICDM. pp. 918{923. IEEE (2012)
13. Kempe, D., Kleinberg, J., Tardos, E.: Maximizing the spread of in uence through
a social network. In: KDD. pp. 137{146. ACM (2003)
14. Kimura, M., Saito, K., Nakano, R., Motoda, H.: Finding in uential nodes in a social
network from information di usion data. In: Social Computing and Behavioral
Modeling, pp. 1{8. Springer (2009)
15. Krzakala, F., Moore, C., Mossel, E., Neeman, J., Sly, A., Zdeborova, L., Zhang, P.:
Spectral redemption in clustering sparse networks. PNAS 110(52), 20935{20940
(2013)
16. Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in
large networks: Natural cluster sizes and the absence of large well-de ned clusters.</p>
      <p>
        Internet Mathematics 6(
        <xref ref-type="bibr" rid="ref1">1</xref>
        ), 29{123 (2009)
17. Mihara, S., Tsugawa, S., Ohsaki, H.: In uence maximization problem for unknown
social networks. In: ASONAM. pp. 1539{1546. ACM (2015)
18. Mossel, E., Neeman, J., Sly, A.: Reconstruction and estimation in the planted
partition model. Probability Theory and Related Fields 162(
        <xref ref-type="bibr" rid="ref3 ref4">3-4</xref>
        ), 431{461 (2015)
19. Rice, E., Tulbert, E., Cederbaum, J., Adhikari, A.B., Milburn, N.G.:
Mobilizing homeless youth for HIV prevention. Health education research 27(
        <xref ref-type="bibr" rid="ref2">2</xref>
        ), 226{236
(2012)
20. Squartini, T., Picciolo, F., Ruzzenenti, F., Garlaschelli, D.: Reciprocity of weighted
networks. Scienti c Reports (2012)
21. Tang, Y., Xiao, X., Shi, Y.: In uence maximization: Near-optimal time complexity
meets practical e ciency. In: KDD. ACM (2014)
22. Valente, T.W., Pumpuang, P.: Identifying opinion leaders to promote behavior
change. Health Education &amp; Behavior (2007)
23. Wang, Y., Cong, G., Song, G., Xie, K.: Community-based greedy algorithm for
mining top-k in uential nodes in mobile social networks. In: KDD. pp. 1039{1048.
      </p>
      <p>ACM (2010)
24. Wilder, B., Yadav, A., Immorlica, N., Rice, E., Tambe, M.: Uncharted but not
unin uenced: In uence maximization with an uncertain network. In: AAMAS. pp.
740{748 (2017)
25. Yadav, A., Chan, H., Xin Jiang, A., Xu, H., Rice, E., Tambe, M.: Using social
networks to aid homeless shelters: Dynamic in uence maximization under uncertainty.
In: AAMAS. pp. 740{748 (2016)</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abbe</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sandon</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Community detection in general stochastic block models: Fundamental limits and e cient algorithms for recovery</article-title>
          .
          <source>In: FOCS</source>
          . pp.
          <volume>670</volume>
          {
          <fpage>688</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bond</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fariss</surname>
            ,
            <given-names>C.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jones</surname>
            ,
            <given-names>J.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kramer</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marlow</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Settle</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fowler</surname>
            ,
            <given-names>J.H.:</given-names>
          </string-name>
          <article-title>A 61-million-person experiment in social in uence and political mobilization</article-title>
          .
          <source>Nature</source>
          <volume>489</volume>
          (
          <issue>7415</issue>
          ),
          <volume>295</volume>
          {
          <fpage>298</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Borgs</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brautbar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chayes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Khanna</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucier</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The power of local information in social networks</article-title>
          .
          <source>In: WINE</source>
          . pp.
          <volume>406</volume>
          {
          <fpage>419</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Borgs</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brautbar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chayes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teng</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          :
          <article-title>Multiscale matrix sampling and sublinear-time pagerank computation</article-title>
          .
          <source>Internet Mathematics</source>
          <volume>10</volume>
          (
          <issue>1-2</issue>
          ),
          <volume>20</volume>
          {
          <fpage>48</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>