<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Mining Twitter for an Explanatory Model of Social Influence</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jan Hauffa</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Benjamin Koster</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florian Hartl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Valeria Kollhofer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Georg Groh</string-name>
          <email>grohgg@in.tum.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Technische Universität München, Department of Informatics</institution>
          ,
          <addr-line>Boltzmannstr. 3, 85748 Garching</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2016</year>
      </pub-date>
      <fpage>3</fpage>
      <lpage>14</lpage>
      <abstract>
        <p>The large-scale availability of online communication data offers an opportunity to learn about social influence on the individual level. Starting from an abstract cognitive definition, we iteratively build a predictive model of social influence upon the principle of locality of influence, which implies the decomposition of observed behavior into resistance to influence, and influence received via direct and indirect exposure to others' behavior. After training the model on a 30,000 user dataset of the social network service Twitter, we find that direct exposure has much less explanatory value than expected, and sources of influence exhibit strong temporal variation. We identify two modes of communication on Twitter, differing in the manifestation of influence.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Interpersonal social influence has long been a subject of research in the social
sciences. A generally accepted definition is "change in an individual's thoughts,
feelings, attitudes, or behaviors that results from interaction" [10], but the nature
of the process by which an individual receives influence remains under active
research and debate. With the rise of online social network services (SNS), social
interaction has become observable outside of constrained experimental settings
and accessible to large-scale data mining. Longitudinal interaction data makes
changes in behavior visible, enabling inference about changes in people's
attitudes and reasoning about the process that drives these changes. By analyzing
communication data in large volume, we attempt to identify fundamental
characteristics of social influence.</p>
      <p>1.1 Influence in Social Networks</p>
      <p>In a simple model of human cognition, the behavior of an individual is
determined by an internal state, which is constantly updated by perception of the
environment. Change of behavior in reaction to events in the environment is the
most general form of influence. The internal state is not observable, but
observing both the environment and the behavior of an individual enables inductive
reasoning about their relationship, and by extension about the underlying
cognitive processes. Inferences can be tested by applying them to the prediction
of future behavior. Social influence can be defined as the subset of updates to
the internal state caused by interpersonal interaction, and its effect on future
interactions.</p>
      <p>Copyright © 2016 for the individual papers by the papers' authors. Copying permitted for private
and academic purposes. This volume is published and copyrighted by its editors.</p>
      <p>From an outside perspective, the effects of social interaction and general
perception cannot be separated, so any amount of data that can be gathered in
a practical experiment will be insufficient for reasoning within this model. To
make inference tractable, we introduce an assumption called locality of influence:
The influence of behavior perceived in social context a on behavior produced
in context b is proportional to the similarity of a and b. Local influences may
override external influence, but the resulting change in behavior may also be
limited to a particular social context.</p>
      <p>Related concepts can be found in the literature: Latané's [5] dynamic
theory of social impact asserts that "[...] influence is directly proportional to the
immediacy of the source of influence." Immediacy is defined as a combination
of variables, including "richness of the communication channels" and geospatial
distance. Myers et al. [8] provide empirical support by attributing only 29% of
information in a complete record of Twitter activity over one month to "external
events and factors outside the network". The role of local graph structure for
information diffusion in social networks is discussed e.g. by Zhang et al. [15].</p>
      <p>1.2 Related Work</p>
      <p>The main difference between our work and other studies of social influence [12] is
our goal of learning about the influence process. Instead of inferring an influence
network from observed interactions, our model yields a network-wide rule for
generating individual influence networks for each user, comparable to egocentric
diffusion networks [15].</p>
    </sec>
    <sec id="sec-2">
      <title>Data Acquisition</title>
      <p>Characterizing the social influence process requires a large corpus of observed
social interaction that is not restricted to a particular social group or subject
matter. We build such a corpus by crawling Twitter, an online service focused on
the exchange of short text messages ("tweets") up to 140 characters in length,
which are public by default. The only method of interaction is posting a tweet,
and the only relation over the set of users is "a follows b", whereby a subscribes to
tweets sent by b. Following is asymmetric, and does not require confirmation by
the followee. Each user has a personal news feed that chronologically aggregates
the tweets sent by followees.</p>
      <p>2.1 Crawling Twitter</p>
      <p>The follower network was crawled using non-exhaustive breadth-first search
(BFS), ignoring the direction of edges. Accounts younger than 10 days, with
a degree greater than 25,000, or not posting in English were excluded, to avoid
spammers and mitigate the effect of "hubs", e.g. celebrities, who connect
otherwise distant parts of the network.</p>
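      <p>The crawling strategy can be illustrated with a minimal sketch. It is not the crawler used for this work: the in-memory adjacency dictionary, the profile fields age_days, degree and lang, and the decision not to expand excluded accounts are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
from collections import deque


def make_filter(profiles, min_age_days=10, max_degree=25_000, language="en"):
    """Build the exclusion predicate from the three criteria in the text.

    profiles is a hypothetical dict: account -> {"age_days", "degree", "lang"}.
    """
    def keep(node):
        p = profiles[node]
        return (p["age_days"] >= min_age_days
                and p["degree"] <= max_degree
                and p["lang"] == language)
    return keep


def bfs_sample(neighbors, start, keep, max_nodes):
    """Collect up to max_nodes accounts by breadth-first search from start.

    neighbors maps an account to its followers and followees combined,
    i.e. the direction of edges is ignored.
    """
    visited = {start}
    queue = deque([start])
    sample = []
    while queue and len(sample) < max_nodes:
        node = queue.popleft()
        if not keep(node):
            continue  # assumption: excluded accounts are not expanded either
        sample.append(node)
        for other in neighbors.get(node, ()):
            if other not in visited:
                visited.add(other)
                queue.append(other)
    return sample
```
]]></preformat>
      <p>Stopping at max_nodes makes the search non-exhaustive; nodes close to the starting point are visited first.</p>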
      <p>Crawling produced a longitudinal dataset of 358,342 users and their tweets,
which was subsampled to 30,000 users by BFS traversal from the original starting
point due to the computational complexity of subsequent processing. Table 1
compares the samples to the full Twitter follower graph of July 2009 [4]. The
metrics confirm that BFS is biased towards high-degree nodes, but preserves the
disassortative tendency of the graph, and improves data quality for our use case
by yielding subgraphs that are denser than the original graph by orders of
magnitude.</p>
      <p>The originally intended use case for Twitter was posting brief "status updates".
When holding conversations over Twitter became more popular, the community
reached consensus on social conventions, which were later adopted by Twitter
and integrated into the UI:</p>
      <p>@-mention: Prefixing a user name with the "@" sign anywhere in a tweet causes
the specified user to be notified. Honeycutt and Herring [3] identify two
main uses: addressing a message to another user, and referencing a user in
a message intended for a wider audience.</p>
      <p>Reply: Tweets starting with an @-mention are considered part of an ongoing
conversation.</p>
      <p>Retweet: Reposting a received tweet under one's own name extends its visibility.
The usual way of attribution is prefixing the quoted tweet with "RT" or
"via", followed by @-mentioning the original author.</p>
      <p>Among the 17 million tweets of the 30,000 user dataset, 46% are regular
tweets, 36% contain at least one @-mention, and 18% are retweets. 77% of tweets
containing @-mentions are explicit replies via the UI. 8% of replies are users
replying to their own posts, presumably chaining related posts.</p>
      <p>Addressivity is a property of communication in online social media. The
sender of an addressive message explicitly designates one or more recipients,
demonstrating awareness. Non-addressive messages are "broadcast" to an
undisclosed group of people. For the purposes of this work, we treat regular tweets
as non-addressive and replies as addressive, while tweets containing @-mentions
are counted both as non-addressive and as addressed to each mentioned user.
On average, 36% of a user's tweets are addressive (σ = 24%).</p>
      <p>Given the conceptual differences between the two types of communication, it
stands to reason that they are also different in terms of influence, so we analyze
them separately. As retweeting has already been studied within the information
diffusion framework, e.g. by Zhang et al. [15], we exclude retweets from the
following experiments.</p>
      <p>2.3 Data Sparsity</p>
      <p>Certain characteristics of the dataset may cause a lack of data in an
experimental setting. The first issue is the low information content of a single tweet,
caused by the size limit of 140 characters, and the presence of elements with a
primarily social function, e.g. @-mentions. The second issue is sparsity of the
spatio-temporal distribution of tweets. When discretizing time into periods of
equal length, and assigning non-addressive and addressive messages to the nodes
and directed edges of the social network graph, respectively, not all of them will
be active, i.e. have at least one associated tweet, in each period. For a period
length of 14 days, on average 69.2% of nodes and only 0.9% of edges were active,
while for a period length of 2 days, 48.5% of nodes and 0.2% of edges were active.
The third issue is missing observations. On average, only 19% of a node's
first-degree neighbors in the Twitter follower graph are present in the sample.</p>
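      <p>The activity statistic described above can be computed along the following lines. This is a sketch under assumptions: tweets are given as hypothetical (day, sender, recipient) tuples with the day counted from the start of the dataset, every tweet refers to nodes and edges of the sample, and the number of periods is passed in explicitly so that periods without any tweet count as inactive.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def active_fractions(tweets, nodes, edges, period_days, n_periods):
    """Average share of active nodes and directed edges per period.

    tweets: iterable of (day, sender, recipient); recipient is None for a
            non-addressive tweet, which is assigned to the sender's node.
    nodes, edges: all nodes and directed edges of the sampled graph.
    """
    active_nodes = defaultdict(set)
    active_edges = defaultdict(set)
    for day, sender, recipient in tweets:
        period = int(day // period_days)
        if recipient is None:
            active_nodes[period].add(sender)
        else:
            active_edges[period].add((sender, recipient))
    node_share = sum(len(s) for s in active_nodes.values()) / (n_periods * len(nodes))
    edge_share = sum(len(s) for s in active_edges.values()) / (n_periods * len(edges))
    return node_share, edge_share
```
]]></preformat>
      <p>Shorter periods spread the same tweets over more bins, which lowers both shares.</p>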
    </sec>
    <sec id="sec-3">
      <title>Data Representation via Topic Modeling</title>
      <p>The most salient component of interaction on Twitter is unstructured text, so a
suitable numeric representation has to be found. Given evidence that individual
potential to exert influence depends on the topic of conversation [6], topic models
appear to be an appropriate choice.</p>
      <p>Latent Dirichlet Allocation (LDA) [11] represents each document in a
collection as a probability distribution over T topics, which in turn are probability
distributions over the set of unique words. The Author-Recipient-Topic model
(ART) [7], designed for email messages, extends LDA by observed variables for
the sender and one or more recipients. For each sender-recipient pair, it yields
a relationship-topic distribution representing the messages sent along the
corresponding social graph edge. ART assigns each word of a message to an individual
recipient. For short messages like tweets, it is more fitting to assume that the
message as a whole is addressed to all recipients. As a compromise, we choose
a canonical sender-recipient pair for each tweet: The first @-mentioned user in
an addressive tweet is the recipient, while the author of a non-addressive tweet
is both sender and recipient, yielding separate topic distributions for each mode
of communication.</p>
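      <p>The choice of a canonical sender-recipient pair can be sketched as follows. The regular expression for user names (up to 15 word characters after the "@" sign) and the lower-casing of names are assumptions of this example, and retweets are assumed to have been removed beforehand.</p>
      <preformat><![CDATA[
```python
import re

# Assumed pattern for an @-mention; not taken from the original tokenizer.
MENTION = re.compile(r"@(\w{1,15})")


def canonical_pair(author, text):
    """Return the (sender, recipient) pair that represents a tweet.

    The first @-mentioned user is the recipient; a tweet without any
    @-mention has its author as both sender and recipient.
    """
    match = MENTION.search(text)
    if match is None:
        return author, author
    return author, match.group(1).lower()
```
]]></preformat>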
      <p>3.1 Parameter Estimation and Inference</p>
      <p>The tweet text is subjected to domain-specific tokenization and stop word
removal. The number of topics T is arbitrarily set to 150; values of the other ART
hyper-parameters are chosen according to best practices: β is set to 0.01 [11] to
obtain a symmetric Dirichlet prior for φ, while α is determined in a data-driven
way [13], allowing the prior of θ to be asymmetric. Exact estimation of the model
parameters is intractable, so we approximate them via 2000 iterations of Gibbs
sampling.</p>
      <p>For predicting behavior and evaluating the prediction, it is necessary to
subdivide the dataset along the time axis, and compute separate relationship-topic
distributions for each period. To be comparable, these distributions need to
refer to a single set of topics φ. After parameter estimation on the full dataset,
relationship-topic distributions for arbitrary subsets of the original data can be
computed by resampling, i.e. repeating the Gibbs sampling process with fixed
φ, for which 200 iterations are sufficient.</p>
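      <p>After resampling, the sampler's internal state can be used for fast
approximation of aggregate relationship-topic distributions over groups of senders and
recipients. The formula for estimation of θ [7] is adapted to sum over a set of
senders S and recipients R, resulting in Eq. 1 for approximation of the aggregate
distribution <inline-formula><tex-math>\theta_{S,R}</tex-math></inline-formula>, where t = 1..T is the topic index, and <inline-formula><tex-math>n_{i,j,t}</tex-math></inline-formula> the number of
words in messages from i to j assigned to topic t.</p>
      <disp-formula id="eq-1"><label>(1)</label><tex-math>\theta_{S,R,t} = \frac{\alpha_t + \sum_{i \in S} \sum_{j \in R} n_{i,j,t}}{\sum_{t'=1}^{T} \left( \alpha_{t'} + \sum_{i \in S} \sum_{j \in R} n_{i,j,t'} \right)}</tex-math></disp-formula>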
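      <p>A direct transcription of the aggregation formula may help to make the indices concrete. The sketch assumes that the sampler state is available as a hypothetical dictionary from (sender, recipient) pairs to per-topic word counts; the actual data structures of the sampler are not specified here.</p>
      <preformat><![CDATA[
```python
def aggregate_theta(counts, alpha, senders, recipients):
    """Aggregate relationship-topic distribution over senders S and recipients R.

    counts[(i, j)] is a list with the number of words in messages from i to j
    assigned to each topic; alpha is the list of Dirichlet parameters alpha_t.
    """
    totals = list(alpha)  # numerator starts with the prior term alpha_t
    for i in senders:
        for j in recipients:
            for t, n in enumerate(counts.get((i, j), ())):
                totals[t] += n
    norm = sum(totals)  # denominator: the same expression summed over all topics
    return [x / norm for x in totals]
```
]]></preformat>
      <p>For a pair without any observed words, the result reduces to the normalized prior.</p>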
      <p>After fitting an ART model to Twitter data covering a certain time period, we
partition that data into observation and evaluation periods of equal length, and
separate addressive from non-addressive communication. For each of these four
subsets, various relationship-topic distributions (<inline-formula><tex-math>\theta^{M}</tex-math></inline-formula> in Table 2) are computed
via resampling and aggregation.</p>
    </sec>
    <sec id="sec-4">
      <title>The Social Content Influence Model</title>
      <p>The Social Content Influence Model (SCIM) learns to express the content of
future interactions in terms of observed past interactions. Its predictive accuracy
serves as an indicator for the explanatory value of the learned parameters.</p>
      <p>Ignoring all other cognitive or social processes, future behavior can be fully
explained by the presence or absence of social influence, or equivalently as a
combination of inertia and exposure to others' behavior. If exposure is potential
influence, then inertia is individual resistance to influence, a tendency not to
deviate from past behavior. Unobserved sources of influence exist outside of the
studied social medium, but also within, due to sampling. Their effect on the
observed network appears as indirect influence, i.e. correlated behavioral changes
in non-incident nodes [2]. Analogously, we distinguish direct and indirect
exposure. If person a interacts with b, the content of the interaction can be directly
observed, but will also be partially reflected in the future interactions of b with
others. Aggregating the behavior of a group smooths over individual
preferences, but preserves information about strong influence that equally affected
every member. With the principle of locality, it follows that the aggregated
behavior of people who are socially close to b reflects the behavior b is exposed
to.</p>
      <p>From the perspective of an individual node or node pair (ego and alter)
connected by an edge, the social network can be viewed as a hierarchy of social circles
of decreasing locality. To account for missing observations within the medium,
we aggregate over a node's social neighborhood. Among different definitions of
neighborhood, we aim to identify those that capture indirect exposure equally
well across the whole graph. Influence from outside the medium is approximated
by the aggregate behavior of the whole network, which potentially reflects strong
trends from other media. This tripartite view of the egocentric social network
corresponds to the distinction between interpersonal, peer, and media influence
in sociology [14].</p>
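      <p>4.1 Prediction</p>
      <p>Given the observed topic distributions from two successive time periods, the
prediction problem can be formulated as using information from the first period
to make predictions <inline-formula><tex-math>\hat{\theta}_{i}^{M,n,s}</tex-math></inline-formula> for each node i, or <inline-formula><tex-math>\hat{\theta}_{i,j}^{M,a,s}</tex-math></inline-formula> for each edge from i to j, so
that their Jensen-Shannon divergence (JSD) from the distributions <inline-formula><tex-math>\theta_{i}^{M,n,s}</tex-math></inline-formula>, <inline-formula><tex-math>\theta_{i,j}^{M,a,s}</tex-math></inline-formula>
(see Table 2) in the second period is minimal. The JSD belongs to the family of
symmetrized Kullback-Leibler divergences, which are commonly used for
comparing topic distributions [11]. When defining the prediction <inline-formula><tex-math>\hat{\theta}</tex-math></inline-formula> as a finite mixture
of observed topic distributions <inline-formula><tex-math>\theta^{k}, k \in C</tex-math></inline-formula> (Eq. 2), finding coefficients c that minimize
the JSD is a convex optimization problem (Eq. 3).</p>
      <disp-formula id="eq-2"><label>(2)</label><tex-math>\hat{\theta}_{i,j} = \sum_{k \in C \setminus d} c_k \theta^{k} + c_d \theta^{d}</tex-math></disp-formula>
      <disp-formula id="eq-3"><label>(3)</label><tex-math>\underset{c,\, \theta^{d}}{\operatorname{argmin}} \; \sum_{i,j} D_{JS}(\hat{\theta}_{i,j}, \theta_{i,j}) + \lambda \sum_{i,j} \lVert \hat{\theta}_{i,j} \rVert_1 \quad \text{subject to} \quad 0 \le c_k, \theta^{d}_{t} \le 1 \text{ for } k \in C,\ t = 1..T; \quad \sum_{k \in C} c_k = 1, \quad \sum_{t=1}^{T} \theta^{d}_{t} = 1</tex-math></disp-formula>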
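      <p>To make the objective concrete, the following sketch evaluates the JSD of a mixture prediction and fits a single mixture weight by grid search. It is an illustration only: it uses two components instead of the full component set, omits the data-driven component and the regularization term, and the grid search stands in for whatever convex solver is actually used. All function names are chosen for this example.</p>
      <preformat><![CDATA[
```python
import math


def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_t = 0 contribute 0."""
    return sum(pt * math.log2(pt / qt) for pt, qt in zip(p, q) if pt > 0)


def jsd(p, q):
    """Jensen-Shannon divergence: mean KL divergence to the midpoint of p and q."""
    m = [(pt + qt) / 2 for pt, qt in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def mix(coefficients, components):
    """Finite mixture sum_k c_k * theta^k, computed per topic."""
    return [sum(c * comp[t] for c, comp in zip(coefficients, components))
            for t in range(len(components[0]))]


def best_weight(inertia, exposure, targets, steps=100):
    """Grid search for c in [0, 1] minimizing the summed JSD between
    c * inertia_n + (1 - c) * exposure_n and target_n over all candidates n."""
    def error(c):
        return sum(jsd(mix([c, 1 - c], [a, b]), y)
                   for a, b, y in zip(inertia, exposure, targets))
    return min((s / steps for s in range(steps + 1)), key=error)
```
]]></preformat>
      <p>A single weight shared by all candidates mirrors the use of one global set of coefficients in the model.</p>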
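      <p>The models for addressive and non-addressive communication differ only in
the number of mixture components. Table 2 lists all 15 components, names the
subset of messages they are computed from, and defines the set of senders and
recipients they are aggregated over, where applicable. Each component represents
either inertia, indirect, or direct exposure at a particular level of locality (scope).
The components at relationship scope only apply to addressive communication.</p>
      <table-wrap id="tab2">
        <label>Table 2</label>
        <table>
          <thead>
            <tr><th>Component</th><th>Messages</th><th>S</th><th>R</th><th>Role</th><th>Scope</th></tr>
          </thead>
          <tbody>
            <tr><td><inline-formula><tex-math>\theta_{i}^{M,n,s}</tex-math></inline-formula></td><td>non-addressive messages sent by i</td><td>{i}</td><td>V</td><td>inertia</td><td>personal</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{i}^{M,a,s}</tex-math></inline-formula></td><td>addressive messages sent by i</td><td>{i}</td><td>V</td><td>inertia</td><td>personal</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{i,j}^{M,a,s}</tex-math></inline-formula></td><td>addressive messages from i to j</td><td>{i}</td><td>{j}</td><td>inertia</td><td>relationship</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{i}^{N(i),a,s}</tex-math></inline-formula></td><td>addressive messages from i to neighbors</td><td></td><td></td><td>inertia</td><td>neighborhood</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{i}^{M,n,r}</tex-math></inline-formula></td><td>non-addressive messages received by i</td><td>{x ∈ V : i follows x}</td><td>V</td><td>direct exposure</td><td>personal</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{i}^{M,a,r}</tex-math></inline-formula></td><td>addressive messages received by i</td><td>V</td><td>{i}</td><td>direct exposure</td><td>personal</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{j,i}^{M,a,s}</tex-math></inline-formula></td><td>addressive messages from j to i</td><td>{j}</td><td>{i}</td><td>direct exposure</td><td>relationship</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{i}^{N(i),a,r}</tex-math></inline-formula></td><td>addressive messages from neighbors to i</td><td></td><td></td><td>direct exposure</td><td>neighborhood</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{j}^{M,n,s}</tex-math></inline-formula></td><td>non-addressive messages sent by j</td><td>{j}</td><td>V</td><td>indirect exposure</td><td>relationship</td></tr>
            <tr><td><inline-formula><tex-math>\theta_{j}^{M,a,s}</tex-math></inline-formula></td><td>addressive messages sent by j</td><td>{j}</td><td>V</td><td>indirect exposure</td><td>relationship</td></tr>
            <tr><td><inline-formula><tex-math>\theta^{N(i),n}</tex-math></inline-formula></td><td>non-addressive messages sent by neighbors</td><td></td><td></td><td>indirect exposure</td><td>neighborhood</td></tr>
            <tr><td><inline-formula><tex-math>\theta^{N(i),a}</tex-math></inline-formula></td><td>addressive messages sent by neighbors</td><td></td><td></td><td>indirect exposure</td><td>neighborhood</td></tr>
            <tr><td><inline-formula><tex-math>\theta^{M,n}</tex-math></inline-formula></td><td>all non-addressive messages</td><td>V</td><td>V</td><td>indirect exposure</td><td>medium</td></tr>
            <tr><td><inline-formula><tex-math>\theta^{M,a}</tex-math></inline-formula></td><td>all addressive messages</td><td>V</td><td>V</td><td>indirect exposure</td><td>medium</td></tr>
            <tr><td><inline-formula><tex-math>\theta^{d}</tex-math></inline-formula></td><td>estimated from data</td><td></td><td></td><td>indirect exposure</td><td>medium</td></tr>
          </tbody>
        </table>
      </table-wrap>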
      <p>Computing a single set of scalar coefficients that minimizes the error sum
implies the assumption that the influence process is dominated by global, instead
of individual or topical, characteristics. Component d is estimated from the data,
capturing all global effects of influence that are either not explicitly represented
in the SCIM or not directly observable. It allows the model to attain a training
error of 0 if the influence process does not have any individual characteristics.
The ℓ1 regularization promotes sparse predictions and thereby the sparsity of c
and d. The regularization factor λ is set to 0.001.</p>
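      <p>4.2 Construction of the Social Neighborhood</p>
      <p>The social neighborhood N(i) of node i is a node-weighted subgraph of the social
network graph (V, E), induced by an indicator function <inline-formula><tex-math>I_i : V \to \{0, 1\}</tex-math></inline-formula> and a
weight function <inline-formula><tex-math>W_i : V \to \mathbb{R}^{+}</tex-math></inline-formula>. The neighborhood mixture components <inline-formula><tex-math>\theta^{N(i)}</tex-math></inline-formula> are
weighted sums over particular relationship-topic distributions of the subgraph
nodes v: <inline-formula><tex-math>\theta_{i,v}^{M,a,s}</tex-math></inline-formula> for <inline-formula><tex-math>\theta_{i}^{N(i),a,s}</tex-math></inline-formula>, <inline-formula><tex-math>\theta_{v}^{M,n,s}</tex-math></inline-formula> for <inline-formula><tex-math>\theta^{N(i),n}</tex-math></inline-formula>, <inline-formula><tex-math>\theta_{v,i}^{M,a,s}</tex-math></inline-formula> for <inline-formula><tex-math>\theta_{i}^{N(i),a,r}</tex-math></inline-formula>, and <inline-formula><tex-math>\theta_{v}^{M,a,s}</tex-math></inline-formula> for
<inline-formula><tex-math>\theta^{N(i),a}</tex-math></inline-formula>.</p>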
      <p>We consider seven indicator and 25 weight functions. One family of indicators
defines the neighborhood of i as the set of all nodes with a maximum distance
of either one or two from i, either in the follower graph or the graph induced
by addressive communication. The second family finds dense subgraphs of the
undirected graph of reciprocal following, either by randomly selecting a maximal
clique containing i, or applying the clique percolation method (k = 5) [9] or edge
clustering [1], and taking the union of the communities i is a member of.</p>
      <p>A basic weight function assigns uniform weight to all neighborhood nodes j.
More complex functions derive the weight from structural properties of the social
network graph (both local, such as the in-degree of j, and global, e.g. PageRank),
from community structure (e.g. the number of shared communities of i and j),
or from the communication behavior of j (e.g. how often j is retweeted).</p>
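      <p>One indicator function of the first family and the resulting neighborhood component can be sketched as follows. The adjacency dictionary, the weight callback and the renormalization of the weighted sum to a probability distribution are assumptions of this example; the text only states that the components are weighted sums.</p>
      <preformat><![CDATA[
```python
from collections import deque


def within_distance(adjacency, ego, max_distance):
    """Indicator function: all nodes at distance 1..max_distance from ego."""
    distance = {ego: 0}
    queue = deque([ego])
    while queue:
        node = queue.popleft()
        if distance[node] == max_distance:
            continue  # do not expand beyond the maximum distance
        for other in adjacency.get(node, ()):
            if other not in distance:
                distance[other] = distance[node] + 1
                queue.append(other)
    return {node for node in distance if node != ego}


def neighborhood_component(thetas, members, weight):
    """Weighted sum of the members' topic distributions, renormalized
    (an assumption) so that the result sums to 1. Requires a non-empty
    neighborhood with positive total weight."""
    n_topics = len(next(iter(thetas.values())))
    total = [0.0] * n_topics
    for v in members:
        w = weight(v)
        for t, value in enumerate(thetas[v]):
            total[t] += w * value
    norm = sum(total)
    return [x / norm for x in total]
```
]]></preformat>
      <p>A uniform weight function is simply one that returns 1 for every node; structural or similarity-based weights replace the callback.</p>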
    </sec>
    <sec id="sec-5">
      <title>Experimental Evaluation</title>
      <p>The basic prediction experiment is defined as follows: First, a candidate set of
either edges or nodes is built, depending on the type of communication to be
analyzed. Candidates have to be active in both the observation and the
evaluation period. The set is split randomly into training and test sets of equal size,
then parameter estimation and evaluation are performed.</p>
      <p>This basic experiment is repeated, testing all combinations of four experiment
parameters: The observation date marks the end of the observation and the
beginning of the evaluation period. Three equidistant dates within eight weeks
were chosen, April 20, May 4, and May 18, 2012, aiming to test the temporal
stability of the model. The length of the observation and evaluation period (time
period length) needs to match the speed of conversation flow. We test periods
of 14, 5, and 2 days, falling back to an extended period of 14 days if there is
no activity. The relationship type is only relevant for addressive communication.
It controls whether or not a needs to follow b for the edge from a to b to be
considered. The last parameter is the choice of social neighborhood.</p>
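      <p>The combinations of experiment parameters can be enumerated as in the following sketch. The half-open treatment of the period boundaries, the boolean encoding of the relationship type and the placeholder neighborhood names are assumptions of the example; the fallback to an extended period is not modeled.</p>
      <preformat><![CDATA[
```python
from datetime import date, timedelta
from itertools import product

OBSERVATION_DATES = [date(2012, 4, 20), date(2012, 5, 4), date(2012, 5, 18)]
PERIOD_DAYS = [14, 5, 2]


def windows(observation_date, period_days):
    """Observation and evaluation period as (start, end) pairs of equal length."""
    length = timedelta(days=period_days)
    observation = (observation_date - length, observation_date)
    evaluation = (observation_date, observation_date + length)
    return observation, evaluation


def experiment_grid(neighborhoods, addressive):
    """All parameter combinations; the relationship type (follower edge
    required or not) only varies for addressive communication."""
    relationship = [True, False] if addressive else [None]
    return list(product(OBSERVATION_DATES, PERIOD_DAYS, relationship, neighborhoods))
```
]]></preformat>
      <p>With the 14-day period, the evaluation period of one observation date coincides with the observation period of the next.</p>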
      <p>The SCIM is compared to three baseline predictors to verify that it captures
non-trivial information about the influence process. The first predictor draws
randomly from a Dirichlet distribution Dir(α) with α taken from the ART. The
second predictor outputs the mean of Dir(α), which is the relationship-topic
distribution the ART would produce in the absence of data. The third predictor
outputs the relationship-topic distribution of the observed behavior, effectively
a model of influence fully driven by inertia.</p>
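      <p>The three baseline predictors are simple enough to state in a few lines. The sketch below assumes that α is available as a list of positive numbers; sampling via normalized Gamma variates is a standard construction and not necessarily the one used in the experiments.</p>
      <preformat><![CDATA[
```python
import random


def dirichlet_sample(alpha, rng=random):
    """Draw from Dir(alpha) by normalizing independent Gamma(alpha_t, 1) variates.

    For very small alpha_t the variates can underflow to 0; this sketch does
    not guard against that case.
    """
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [x / total for x in draws]


def dirichlet_mean(alpha):
    """Mean of Dir(alpha): the prediction in the absence of data."""
    total = sum(alpha)
    return [a / total for a in alpha]


def inertia_baseline(observed):
    """Predict that the observed topic distribution simply persists."""
    return list(observed)
```
]]></preformat>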
      <p>The experiment results are filtered to improve interpretability. Two restricted
variants of the SCIM are introduced specifically to assess the utility of the
coefficients and the neighborhood definitions. In the first variant, coefficients are
uniform (<inline-formula><tex-math>c_{1..|C|} = 1/|C|,\ c_d = 0</tex-math></inline-formula>), while in the second variant all neighborhoods
are empty. Any neighborhood definition that does not outperform these variants
or the baselines across all combinations of experiment parameters is discarded.</p>
      <p>To determine the experiment parameters' effect on prediction accuracy, we
propose an ANOVA design, where the choice of neighborhood is a repeated
measurement (including the baseline predictors for reference), and the remaining
parameters are between-subject factors. The candidate sets are constructed and
assigned to the experiments accordingly. All pairs of neighborhood definitions are
tested post-hoc for significant differences in mean prediction error with Tukey's
HSD test. The results can be expressed as homogeneous subsets of neighborhoods
with equivalent performance. After ranking them by mean error, the mixture
coefficients of the best-performing subset are analyzed via descriptive statistics.</p>
      <p>5.1 Results</p>
      <p>43.4% of experiments for non-addressive, and 91.5% for addressive
communication are filtered out. ANOVA is performed with a per-group sample size of
238. For both types of communication, there are significant interaction effects
(α = 0.01) involving neighborhood definition, observation date and time
period length. This indicates that the amount of indirect influence captured by
some or all of the neighborhood definitions varies over time, possibly related to
the temporally irregular activity of users (Section 2.3). An interaction between
neighborhood and time period length indicates that subgraphs differ in speed of
information flow.</p>
      <p>For non-addressive communication, there is a significant effect of time period
length, with longer time periods improving the accuracy, but this effect may
already be fully explained by the higher-order interactions. There is no significant
effect involving the relationship type, so the existence of a follower relationship
does not appear to affect the perception of addressive messages. For both types
of communication, the choice of neighborhood is significant. Tukey's test yields
a high number of overlapping homogeneous subsets, but isolated baseline
predictors. The lack of clustering limits the explanatory value of the best subsets.</p>
      <p>The subset for non-addressive communication contains neighborhoods built
by three indicator functions: First, communities found by edge clustering are
given uniform weight, which implies that follower communities reflect indirect
influence to a degree that is difficult to improve by weighting. Second, followers
with a path distance of up to two, weighted with the number of shared followees
or communities, also hint at the importance of cohesive social groups. Third,
followers of distance one are paired with weights based on similarity of users or
their message content, promoting homogeneous neighborhoods.</p>
      <p>The neighborhoods in the best subset for addressive communication are built
by a single indicator function, followers with a distance of up to two. Weights
are mostly similarity-based and include the number of shared followees and the
similarities of both kinds of communication.</p>
      <p>Figure 1 compares the mean prediction error of the best subset to the
baseline predictors. The SCIM outperforms all baselines, with a 10% improvement
over the best performing baseline for non-addressive, and 28% for addressive
communication. The lower error of the Dirichlet mean baseline predictor in case
of addressive communication reflects the spatio-temporal sparsity discussed in
Section 2.3.</p>
      <p>Figure 2 shows the mixture coefficients as leaves of a tree, with the parent
nodes representing either role or scope as listed in Table 2. Line width is
proportional to the coefficient mean across the best subset, while the color corresponds
to the ratio of mean and standard deviation: the darker the color, the less the
coefficient is affected by the experiment parameters. Both addressive and non-addressive
communication are strongly driven by inertia, but the predictive value of direct
exposure is unexpectedly low, contradicting the principle of locality. The value
of indirect exposure from the neighborhood is as expected, while the high value
of the data-driven component d suggests the existence of patterns of indirect
influence not covered by the SCIM. Communication is mainly influenced by other
communication of the same type. Components aggregating the relationship-topic
distributions of a large number of users are generally of low predictive value.</p>
    </sec>
    <sec id="sec-6">
      <title>Discussion</title>
      <p>We report two main results: First, a novel point of view on the question whether
Twitter is a social network, or a bipartite network of content producers and
consumers [4]. A major difference to other social media is the high volume of
non-addressive communication. Messaging behavior of individuals is highly
variable, with the proportion of addressive communication having a one-SD range of
12% to 60%. The difference between the two modes of communication is visible
in the influence process: Non-addressive communication is more resistant to
influence, so the more stable communication behavior can be exploited by longer
observation periods. Users are influenced in their non-addressive communication
by their edge communities, while their addressive communication receives
influence from a larger set of neighbors, weighted by similarity. In effect, the Twitter
social network is a product of the follower network, which governs the flow of
non-addressive communication, and the implicit network formed by addressive
messaging.</p>
      <p>Second, future behavior can be predicted to a certain extent from local
sources of information, which the SCIM learns to exploit. However, our results
do not fully confirm the decomposability of social influence into inertia, direct,
and indirect exposure, which follows from the principle of locality. The low
explanatory value of direct exposure implies that locality is not sufficient on its
own to explain why the SCIM is able to outperform the baselines: If
interactions within and from outside the medium have similar potential for influence,
observable interactions are responsible for just a fraction of the overall influence.
Therefore it is important to exploit indirect influence, which allows information
to cross the medium boundary. The best-performing neighborhood definitions
favor nodes that are similar to the ego, and likely to be exposed to similar external
influences.</p>
      <p>Proceedings of the 2nd International Workshop on Social Influence Analysis (SocInf 2016)</p>
      <p>[Figure residue, apparently from Figure 2: leaf label "sent by ego (non-addr.)", µ=0.62, σ=0.12]</p>
      <p>
        Future work involves repeating the experiments on new datasets from different
social media to test if our results apply to social interaction in general.
      </p>
      <p>
2. Christakis, N., Fowler, J.: Social contagion theory: Examining dynamic social
networks and human behavior. Statistics in Medicine 32(4), 556–577 (2013)
3. Honeycutt, C., Herring, S.: Beyond microblogging: Conversation and collaboration
via Twitter.
        <xref ref-type="bibr" rid="ref3">In: Proceedings of HICSS (Jan 2009</xref>
        )
media?
        <xref ref-type="bibr" rid="ref3">In: Proceedings of WWW (Apr 2010</xref>
        )
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Ahn</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bagrow</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Link communities reveal multiscale complexity 4</article-title>
          .
          <string-name>
            <surname>Kwak</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            , H., Moon,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>What is Twitter, a social network or a news 5</article-title>
          .
          <string-name>
            <surname>Latané</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Dynamic social impact: The creation of culture by communication</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>Journal of Communication</source>
          <volume>46</volume>
          (
          <issue>4</issue>
          ),
          <volume>13</volume>
          –
          <fpage>25</fpage>
          (
          <year>1996</year>
          )
          <article-title>6</article-title>
          .
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jiang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Mining topic-level influence in heterogeneous networks</article-title>
          .
          In:
          <source>Proceedings of CIKM</source>
          (Oct
          <year>2010</year>
          )
          7.
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrada-Emmanuel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Topic and role discovery in social networks with experiments on Enron and academic email</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>30</volume>
          ,
          <fpage>249</fpage>
          –
          <lpage>272</lpage>
          (
          <year>2007</year>
          )
          8.
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leskovec</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Information diffusion and external influence in networks</article-title>
          .
          In:
          <source>Proceedings of SIGKDD</source>
          (Aug
          <year>2012</year>
          )
          9.
          <string-name>
            <surname>Palla</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derényi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vicsek</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Uncovering the overlapping community structure of complex networks in nature and society</article-title>
          .
          <source>Nature</source>
          <volume>435</volume>
          (
          <issue>7043</issue>
          ),
          <fpage>814</fpage>
          –
          <lpage>818</lpage>
          (
          <year>2005</year>
          )
          10.
          <string-name>
            <surname>Rashotte</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Social influence</article-title>
          . In:
          <string-name>
            <surname>Ritzer</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (ed.)
          <source>The Blackwell Encyclopedia of Sociology</source>
          , vol.
          <volume>9</volume>
          , pp.
          <fpage>4426</fpage>
          –
          <lpage>4429</lpage>
          .
          <publisher-name>Blackwell</publisher-name>
          (
          <year>2007</year>
          )
          11.
          <string-name>
            <surname>Steyvers</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Griffiths</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Probabilistic topic models</article-title>
          . In:
          <string-name>
            <surname>Landauer</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNamara</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dennis</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kintsch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          (eds.)
          <source>Handbook of Latent Semantic Analysis</source>
          , chap. 21.
          <publisher-name>Lawrence Erlbaum</publisher-name>
          (
          <year>2007</year>
          )
          12.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>A survey of models and algorithms for social influence analysis</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          In:
          <source>Social Network Data Analysis</source>
          , chap. 7.
          <publisher-name>Springer</publisher-name>
          (
          <year>2011</year>
          )
          13.
          <string-name>
            <surname>Wallach</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mimno</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCallum</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Rethinking LDA: Why priors matter</article-title>
          .
          In:
          <source>Proceedings of NIPS</source>
          (Dec
          <year>2009</year>
          )
          14.
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Carr</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>DeAndrea</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tong</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Der Heide</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Interaction of interpersonal, peer, and media influence sources online</article-title>
          . In:
          <string-name>
            <surname>Papacharissi</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          (ed.)
          <source>A Networked Self</source>
          , chap. 1.
          <publisher-name>Routledge</publisher-name>
          (
          <year>2010</year>
          )
          15.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xing</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Who influenced you? Predicting retweet via social influence locality</article-title>
          .
          <source>ACM Transactions on Knowledge Discovery from Data</source>
          <volume>9</volume>
          (
          <issue>3</issue>
          ),
          <elocation-id>25</elocation-id>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>