=Paper=
{{Paper
|id=Vol-1622/SocInf2016_Paper1
|storemode=property
|title=Mining Twitter for an Explanatory Model of Social Influence
|pdfUrl=https://ceur-ws.org/Vol-1622/SocInf2016_Paper1.pdf
|volume=Vol-1622
|authors=Jan Hauffa,Benjamin Koster,Florian Hartl,Valeria Köllhofer,Georg Groh
|dblpUrl=https://dblp.org/rec/conf/ijcai/HauffaKHKG16
}}
==Mining Twitter for an Explanatory Model of Social Influence==
Proceedings of the 2nd International Workshop on Social Influence Analysis (SocInf 2016) July 9th, 2016 - New York, USA

Jan Hauffa, Benjamin Koster, Florian Hartl, Valeria Köllhofer, and Georg Groh

Technische Universität München, Department of Informatics, Boltzmannstr. 3, 85748 Garching, Germany, {hauffa,koster,hartlf,koellhof,grohg}@in.tum.de

Abstract. The large-scale availability of online communication data offers an opportunity to learn about social influence on the individual level. Starting from an abstract cognitive definition, we iteratively build a predictive model of social influence upon the principle of locality of influence, which implies the decomposition of observed behavior into resistance to influence, and influence received via direct and indirect exposure to others’ behavior. After training the model on a 30,000 user dataset of the social network service Twitter, we find that direct exposure has much less explanatory value than expected, and sources of influence exhibit strong temporal variation. We identify two modes of communication on Twitter, differing in the manifestation of influence.

===1 Introduction===
Interpersonal social influence has long been a subject of research in the social sciences. A generally accepted definition is “change in an individual’s thoughts, feelings, attitudes, or behaviors that results from interaction” [10], but the nature of the process by which an individual receives influence remains under active research and debate. With the rise of online social network services (SNS), social interaction has become observable outside of constrained experimental settings and accessible to large-scale data mining. Longitudinal interaction data makes changes in behavior visible, enabling inference about changes in people’s attitude and reasoning about the process that drives these changes.
By analyzing communication data in large volume, we attempt to identify fundamental characteristics of social influence.

Copyright © 2016 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors.

====1.1 Influence in Social Networks====
In a simple model of human cognition, the behavior of an individual is determined by an internal state, which is constantly updated by perception of the environment. Change of behavior in reaction to events in the environment is the most general form of influence. The internal state is not observable, but observing both the environment and the behavior of an individual enables inductive reasoning about their relationship, and by extension about the underlying cognitive processes. Inferences can be tested by applying them to the prediction of future behavior. Social influence can be defined as the subset of updates to the internal state caused by interpersonal interaction, and its effect on future interactions. From an outside perspective, the effects of social interaction and general perception cannot be separated, so any amount of data that can be gathered in a practical experiment will be insufficient for reasoning within this model.

To make inference tractable, we introduce an assumption called locality of influence: The influence of behavior perceived in social context a on behavior produced in context b is proportional to the similarity of a and b. Local influences may override external influence, but the resulting change in behavior may also be limited to a particular social context.

Related concepts can be found in the literature: Latané’s [5] dynamic theory of social impact asserts that “[...]
influence is directly proportional to the immediacy of the source of influence.” Immediacy is defined as a combination of variables, including “richness of the communication channels” and geospatial distance. Myers et al. [8] provide empirical support by attributing only 29% of information in a complete record of Twitter activity over one month to “external events and factors outside the network”. The role of local graph structure for information diffusion in social networks is discussed e.g. by Zhang et al. [15].

====1.2 Related Work====
The main difference between our work and other studies of social influence [12] is our goal of learning about the influence process. Instead of inferring an influence network from observed interactions, our model yields a network-wide rule for generating individual influence networks for each user, comparable to egocentric diffusion networks [15].

===2 Data Acquisition===
Characterizing the social influence process requires a large corpus of observed social interaction that is not restricted to a particular social group or subject matter. We build such a corpus by crawling Twitter, an online service focused on the exchange of short text messages (“tweets”) up to 140 characters in length, which are public by default. The only method of interaction is posting a tweet, and the only relation over the set of users is “a follows b”, whereby a subscribes to tweets sent by b. Following is asymmetric, and does not require confirmation by the followee. Each user has a personal news feed that chronologically aggregates the tweets sent by followees.

====2.1 Crawling Twitter====
The follower network was crawled using non-exhaustive breadth-first search (BFS), ignoring the direction of edges.
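The crawl strategy named above can be illustrated with a minimal sketch. It assumes the follower relation is available as an in-memory dictionary (<code>follows</code>, a hypothetical stand-in for requests to the Twitter API) and interprets "non-exhaustive" as stopping once a node budget <code>max_nodes</code> is reached; neither detail is specified in the paper.

```python
from collections import deque


def bfs_sample(follows, start, max_nodes):
    """Breadth-first traversal of the follower graph, ignoring edge direction.

    follows:   dict mapping a user to the users they follow (hypothetical input)
    start:     user at which the traversal begins
    max_nodes: node budget; the traversal stops once it is reached (assumption)
    """
    # Build an undirected adjacency view so that "a follows b" links both ways.
    undirected = {}
    for a, followees in follows.items():
        for b in followees:
            undirected.setdefault(a, set()).add(b)
            undirected.setdefault(b, set()).add(a)

    visited = [start]
    seen = {start}
    queue = deque([start])
    while queue and len(visited) < max_nodes:
        node = queue.popleft()
        # Sorting only makes the visiting order deterministic for the example.
        for neighbor in sorted(undirected.get(node, ())):
            if neighbor not in seen:
                seen.add(neighbor)
                visited.append(neighbor)
                queue.append(neighbor)
                if len(visited) == max_nodes:
                    break
    return visited


follows = {"a": ["b"], "c": ["a"], "b": ["d"], "d": []}
sample = bfs_sample(follows, "a", 3)
```

Because edges are traversed in both directions, a user is reached whether they follow or are followed by an already visited account.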
Accounts younger than 10 days, with a degree greater than 25,000, or not posting in English were excluded, to avoid spammers and mitigate the effect of “hubs”, e.g. celebrities, who connect otherwise distant parts of the network. Crawling produced a longitudinal dataset of 358,342 users and their tweets, which was subsampled to 30,000 users by BFS traversal from the original starting point due to the computational complexity of subsequent processing. Table 1 compares the samples to the full Twitter follower graph of July 2009 [4]. The metrics confirm that BFS is biased towards high-degree nodes, but preserves the disassortative tendency of the graph, and improves data quality for our use case by yielding subgraphs that are denser than the original graph by orders of magnitude.

{| class="wikitable"
|+ Table 1. Metrics of the Twitter follower graph
! Metric !! Subsample !! Full crawl !! Full graph, July 2009 [4]
|-
| <math>|V|</math> || 30,000 || 358,342 || 41,652,230
|-
| <math>|E|</math> || 3,825,022 || 151,463,754 || 1,468,365,182
|-
| avg. degree || 255.086 || 845.441 || 70.506
|-
| degree SD || 407.538 || 2083.733 || 2534.992
|-
| density || 0.009 || 0.002 || < 10<sup>−5</sup>
|-
| clust. coeff. || 0.105 || 0.080 || 0.001
|-
| assortativity || -0.154 || -0.066 || -0.076
|}

====2.2 Social Conventions of Twitter Communication====
The originally intended use case for Twitter was posting brief “status updates”. When holding conversations over Twitter became more popular, the community reached consensus on social conventions, which were later adopted by Twitter and integrated into the UI:

@-mention: Prefixing a user name with the ‘@’ sign anywhere in a tweet causes the specified user to be notified. Honeycutt and Herring [3] identify two main uses: addressing a message to another user, and referencing a user in a message intended for a wider audience.

Reply: Tweets starting with an @-mention are considered part of an ongoing conversation.

Retweet: Reposting a received tweet under one’s own name extends its visibility.
The usual way of attribution is prefixing the quoted tweet with “RT” or “via”, followed by @-mentioning the original author.

Among the 17 million tweets of the 30,000 user dataset, 46% are regular tweets, 36% contain at least one @-mention, and 18% are retweets. 77% of tweets containing @-mentions are explicit replies via the UI. 8% of replies are users replying to their own posts, presumably chaining related posts.

Addressivity is a property of communication in online social media. The sender of an addressive message explicitly designates one or more recipients, demonstrating awareness. Non-addressive messages are “broadcast” to an undisclosed group of people. For the purposes of this work, we treat regular tweets as non-addressive and replies as addressive, while tweets containing @-mentions are counted both as non-addressive and as addressed to each mentioned user. On average, 36% of a user’s tweets are addressive (σ = 24%). Given the conceptual differences between the two types of communication, it stands to reason that they are also different in terms of influence, so we analyze them separately. As retweeting has already been studied within the information diffusion framework, e.g. by Zhang et al. [15], we exclude retweets from the following experiments.

====2.3 Data Sparsity====
Certain characteristics of the dataset may cause a lack of data in an experimental setting. The first issue is the low information content of a single tweet, caused by the size limit of 140 characters, and the presence of elements with a primarily social function, e.g. @-mentions. The second issue is sparsity of the spatio-temporal distribution of tweets. When discretizing time into periods of equal length, and assigning non-addressive and addressive messages to the nodes and directed edges of the social network graph, respectively, not all of them will be active, i.e.
have at least one associated tweet, in each period. For a period length of 14 days, on average 69.2% of nodes and only 0.9% of edges were active, while for a period length of 2 days, 48.5% of nodes and 0.2% of edges were active. The third issue is missing observations. On average, only 19% of a node’s first-degree neighbors in the Twitter follower graph are present in the sample.

===3 Data Representation via Topic Modeling===
The most salient component of interaction on Twitter is unstructured text, so a suitable numeric representation has to be found. Given evidence that individual potential to exert influence depends on the topic of conversation [6], topic models appear to be an appropriate choice.

Latent Dirichlet Allocation (LDA) [11] represents each document in a collection as a probability distribution θ over T topics, which in turn are probability distributions φ over the set of unique words. The Author-Recipient-Topic model (ART) [7], designed for email messages, extends LDA by observed variables for the sender and one or more recipients. For each sender-recipient pair, it yields a relationship-topic distribution representing the messages sent along the corresponding social graph edge. ART assigns each word of a message to an individual recipient. For short messages like tweets, it is more fitting to assume that the message as a whole is addressed to all recipients. As a compromise, we choose a canonical sender-recipient pair for each tweet: The first @-mentioned user in an addressive tweet is the recipient, while the author of a non-addressive tweet is both sender and recipient, yielding separate topic distributions for each mode of communication.

====3.1 Parameter Estimation and Inference====
The tweet text is subjected to domain-specific tokenization and stop word removal.
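The canonical sender-recipient rule from Section 3 can be sketched as follows. The regular expressions are illustrative heuristics, not the authors' tokenizer, and the function name <code>canonical_pair</code> is hypothetical. As a simplifying assumption, any tweet containing an @-mention is treated as addressive here, and retweets are recognized by a leading “RT” or a “via @user” attribution.

```python
import re

# Heuristic patterns (assumptions, not taken from the paper).
MENTION = re.compile(r"@(\w{1,15})")
RETWEET = re.compile(r"^\s*RT\b|\bvia\s+@\w+", re.IGNORECASE)


def canonical_pair(author, text):
    """Return the (sender, recipient) pair for a tweet, or None for retweets."""
    if RETWEET.search(text):
        # Retweets are excluded from the experiments.
        return None
    mention = MENTION.search(text)
    if mention:
        # Addressive tweet: the first @-mentioned user is the recipient.
        return (author, mention.group(1))
    # Non-addressive tweet: the author is both sender and recipient.
    return (author, author)


pairs = [
    canonical_pair("alice", "@bob thanks, also cc @carol"),
    canonical_pair("alice", "good morning"),
    canonical_pair("alice", "RT @bob: good morning"),
]
```

Mapping non-addressive tweets to a self-loop keeps both modes of communication inside one sender-recipient representation, which is what allows a single topic model to produce separate distributions for each mode.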
The number of topics T is arbitrarily set to 150; values of the other ART hyper-parameters are chosen according to best practices: β is set to 0.01 [11] to obtain a symmetric Dirichlet prior for φ, while α is determined in a data-driven way [13], allowing the prior of θ to be asymmetric. Exact estimation of the model parameters is intractable, so we approximate them via 2000 iterations of Gibbs sampling.

For predicting behavior and evaluating the prediction, it is necessary to subdivide the dataset along the time axis, and compute separate relationship-topic distributions for each period. To be comparable, these distributions need to refer to a single set of topics φ. After parameter estimation on the full dataset, relationship-topic distributions for arbitrary subsets of the original data can be computed by resampling, i.e. repeating the Gibbs sampling process with fixed φ, for which 200 iterations are sufficient.

After resampling, the sampler’s internal state can be used for fast approximation of aggregate relationship-topic distributions over groups of senders and recipients. The formula for estimation of θ [7] is adapted to sum over a set of senders S and recipients R, resulting in Equation 1 for approximation of the aggregate distribution <math>\theta_{S,R}</math>, where t = 1..T is the topic index, and <math>n_{i,j,t}</math> the number of words in messages from i to j assigned to topic t.

:<math>\theta_{S,R,t} = \frac{\alpha_t + \sum_{i \in S} \sum_{j \in R} n_{i,j,t}}{\sum_{t'=1}^{T} \left( \alpha_{t'} + \sum_{i \in S} \sum_{j \in R} n_{i,j,t'} \right)} \qquad (1)</math>

After fitting an ART model to Twitter data covering a certain time period, we partition that data into observation and evaluation periods of equal length, and separate addressive from non-addressive communication. For each of these four subsets, various relationship-topic distributions (<math>\theta^M</math> in Table 2) are computed via resampling and aggregation.

===4 The Social Content Influence Model===
The Social Content Influence Model (SCIM) learns to express the content of future interactions in terms of observed past interactions.
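As a concrete reading of Equation 1 (Section 3.1), the aggregate distribution can be computed from per-edge topic counts in a few lines. The sketch below assumes the sampler state has been exported as a dictionary <code>counts</code> that maps a (sender, recipient) pair to a list of per-topic word counts; this data layout and the function name are hypothetical.

```python
def aggregate_theta(counts, alpha, senders, recipients):
    """Aggregate relationship-topic distribution over sender set S and recipient set R.

    counts:     dict {(i, j): [n_ijt for t in topics]} (hypothetical export format)
    alpha:      list of Dirichlet prior parameters, one per topic
    senders:    set S
    recipients: set R
    """
    # Numerator of Equation 1 for every topic t: alpha_t plus the summed counts.
    totals = list(alpha)
    for (i, j), per_topic in counts.items():
        if i in senders and j in recipients:
            for t, n in enumerate(per_topic):
                totals[t] += n
    # Denominator: the same quantity summed over all topics.
    norm = sum(totals)
    return [value / norm for value in totals]


counts = {("a", "b"): [2, 0, 0], ("a", "c"): [0, 1, 0]}
alpha = [1.0, 1.0, 1.0]
theta_ab = aggregate_theta(counts, alpha, {"a"}, {"b"})        # edge a -> b only
theta_a = aggregate_theta(counts, alpha, {"a"}, {"b", "c"})    # everything sent by a
```

With empty sender or recipient sets the result reduces to the normalized prior, which matches the behavior of the Dirichlet mean in the absence of data.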
The model’s predictive accuracy serves as an indicator for the explanatory value of the learned parameters.

Ignoring all other cognitive or social processes, future behavior can be fully explained by the presence or absence of social influence, or equivalently as a combination of inertia and exposure to others’ behavior. If exposure is potential influence, then inertia is individual resistance to influence, a tendency not to deviate from past behavior. Unobserved sources of influence exist outside of the studied social medium, but also within, due to sampling. Their effect on the observed network appears as indirect influence, i.e. correlated behavioral changes in non-incident nodes [2]. Analogously, we distinguish direct and indirect exposure. If person a interacts with b, the content of the interaction can be directly observed, but will also be partially reflected in the future interactions of b with others. Aggregating the behavior of a group smooths over individual preferences, but preserves information about strong influence that equally affected every member. With the principle of locality, it follows that the aggregated behavior of people who are socially close to b reflects the behavior b is exposed to.

From the perspective of an individual node or node pair (ego and alter) connected by an edge, the social network can be viewed as a hierarchy of social circles of decreasing locality. To account for missing observations within the medium, we aggregate over a node’s social neighborhood. Among different definitions of neighborhood, we aim to identify those that capture indirect exposure equally well across the whole graph. Influence from outside the medium is approximated by the aggregate behavior of the whole network, which potentially reflects strong trends from other media.
This tripartite view of the egocentric social network corresponds to the distinction between interpersonal, peer, and media influence in sociology [14].

====4.1 Prediction====
Given the observed topic distributions from two successive time periods, the prediction problem can be formulated as using information from the first period to make predictions <math>\hat{\theta}_i^{M,n,s}</math> for each node i, or <math>\hat{\theta}_{i,j}^{M,a,s}</math> for each edge from i to j, so that their Jensen-Shannon divergence (JSD) from the distributions <math>\theta_i^{M,n,s}</math>, <math>\theta_{i,j}^{M,a,s}</math> (see Table 2) in the second period is minimal. The JSD belongs to the family of symmetrized Kullback-Leibler divergences, which are commonly used for comparing topic distributions [11]. When defining the prediction <math>\hat{\theta}</math> as a finite mixture of observed topic distributions <math>\theta^k</math>, k ∈ C (Equation 2), finding coefficients c that minimize the JSD is a convex optimization problem (Equation 3).

:<math>\hat{\theta}_{i,j} = \sum_{k \in C \setminus d} c_k \theta^k + c_d \theta^d \qquad (2)</math>

:<math>\underset{c,\,\theta^d}{\operatorname{argmin}} \; \sum_{i,j} D_{JS}(\hat{\theta}_{i,j}, \theta_{i,j}) + \lambda \cdot \sum_{i,j} \|\hat{\theta}_{i,j}\|_1 \qquad (3)</math>

subject to <math>0 \le c_k, \theta^d_t \le 1</math> for k ∈ C, t = 1..T, <math>\sum_{k \in C} c_k = 1</math>, and <math>\sum_{t=1}^{T} \theta^d_t = 1</math>.

The models for addressive and non-addressive communication differ only in the number of mixture components. Table 2 lists all 15 components, names the subset of messages they are computed from, and defines the set of senders and recipients they are aggregated over, where applicable. Each component represents either inertia, indirect, or direct exposure at a particular level of locality (scope). The components at relationship scope only apply to addressive communication.

{| class="wikitable"
|+ Table 2. Mixture components of the SCIM
! Component !! Definition !! S !! R !! Role !! Scope
|-
| <math>\theta_i^{M,n,s}</math> || non-addr. messages sent by i || <math>\{i\}</math> || V || inertia || personal
|-
| <math>\theta_i^{M,a,s}</math> || addr. messages sent by i || <math>\{i\}</math> || V || inertia || personal
|-
| <math>\theta_{i,j}^{M,a,s}</math> || addr. messages from i to j || <math>\{i\}</math> || <math>\{j\}</math> || inertia || relationship
|-
| <math>\theta_i^{N(i),a,s}</math> || addr. messages from i to neighbors || || || inertia || neighborhood
|-
| <math>\theta_i^{M,n,r}</math> || non-addr. messages received by i || <math>\{x \in V : i \text{ follows } x\}</math> || V || direct exposure || personal
|-
| <math>\theta_i^{M,a,r}</math> || addr. messages received by i || V || <math>\{i\}</math> || direct exposure || personal
|-
| <math>\theta_{j,i}^{M,a,s}</math> || addr. messages from j to i || <math>\{j\}</math> || <math>\{i\}</math> || direct exposure || relationship
|-
| <math>\theta_i^{N(i),a,r}</math> || addr. messages from neighbors to i || || || direct exposure || neighborhood
|-
| <math>\theta_j^{M,n,s}</math> || non-addr. messages sent by j || <math>\{j\}</math> || V || indirect exposure || relationship
|-
| <math>\theta_j^{M,a,s}</math> || addr. messages sent by j || <math>\{j\}</math> || V || indirect exposure || relationship
|-
| <math>\theta^{N(i),n}</math> || non-addr. messages sent by neighbors || || || indirect exposure || neighborhood
|-
| <math>\theta^{N(i),a}</math> || addr. messages sent by neighbors || || || indirect exposure || neighborhood
|-
| <math>\theta^{M,n}</math> || all non-addr. messages || V || V || indirect exposure || medium
|-
| <math>\theta^{M,a}</math> || all addr. messages || V || V || indirect exposure || medium
|-
| <math>\theta^d</math> || estimated from data || || || indirect exposure || medium
|}

Computing a single set of scalar coefficients that minimizes the error sum implies the assumption that the influence process is dominated by global, instead of individual or topical characteristics. Component <math>\theta^d</math> is estimated from the data, capturing all global effects of influence that are either not explicitly represented in the SCIM or not directly observable. It allows the model to attain a training error of 0 if the influence process does not have any individual characteristics. The <math>\ell_1</math> regularization promotes sparse predictions and thereby the sparsity of c and <math>\theta^d</math>. Regularization factor λ is set to 0.001.

====4.2 Construction of the Social Neighborhood====
The social neighborhood N(i) of node i is a node-weighted subgraph of the social network graph (V, E), induced by an indicator function <math>I_i : V \to \{0, 1\}</math> and a weight function <math>W_i : V \to \mathbb{R}^+</math>. The neighborhood mixture components <math>\theta^{N(i)}</math> are weighted sums over particular relationship-topic distributions of the subgraph nodes: <math>\theta_{i,v}^{M,a,s}</math> for <math>\theta_i^{N(i),a,s}</math>, <math>\theta_v^{M,n,s}</math> for <math>\theta^{N(i),n}</math>, <math>\theta_{v,i}^{M,a,s}</math> for <math>\theta_i^{N(i),a,r}</math>, and <math>\theta_v^{M,a,s}</math> for <math>\theta^{N(i),a}</math>. We consider seven indicator and 25 weight functions.
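The weighted sum that defines a neighborhood component can be sketched as below. The indicator and weight functions are passed in as plain callables, and the weights are normalized so that the result remains a probability distribution; the normalization and all names are assumptions made for this illustration, since the text only states that the components are weighted sums.

```python
def neighborhood_distribution(thetas, indicator, weight):
    """Weighted sum of per-node topic distributions over a social neighborhood.

    thetas:    dict {node: topic distribution as a list} (hypothetical input)
    indicator: callable node -> bool, plays the role of I_i
    weight:    callable node -> non-negative float, plays the role of W_i
    """
    members = [v for v in thetas if indicator(v)]
    total_weight = sum(weight(v) for v in members)
    if total_weight == 0:
        return None  # empty neighborhood: the component is unavailable

    num_topics = len(thetas[members[0]])
    mixed = [0.0] * num_topics
    for v in members:
        share = weight(v) / total_weight  # normalization is an assumption
        for t, p in enumerate(thetas[v]):
            mixed[t] += share * p
    return mixed


thetas = {"u": [1.0, 0.0], "v": [0.0, 1.0], "w": [0.5, 0.5]}
in_neighborhood = lambda node: node in {"u", "v"}
uniform = neighborhood_distribution(thetas, in_neighborhood, lambda node: 1.0)
weighted = neighborhood_distribution(
    thetas, in_neighborhood, lambda node: {"u": 3.0, "v": 1.0}[node]
)
```

Separating the indicator from the weight mirrors the structure of the experiments, where each neighborhood definition is one combination of the two.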
One family of indicators defines the neighborhood of i as the set of all nodes with a maximum distance of either one or two from i, either in the follower graph or the graph induced by addressive communication. The second family finds dense subgraphs of the undirected graph of reciprocal following, either by randomly selecting a maximal clique containing i, or applying the clique percolation method (k = 5) [9] or edge clustering [1], and taking the union of the communities i is a member of.

A basic weight function assigns uniform weight to all neighborhood nodes j. More complex functions derive the weight from structural properties of the social network graph (both local, such as the in-degree of j, and global, e.g. PageRank), from community structure (e.g. the number of shared communities of i and j), or from the communication behavior of j (e.g. how often j is retweeted).

===5 Experimental Evaluation===
The basic prediction experiment is defined as follows: First, a candidate set of either edges or nodes is built, depending on the type of communication to be analyzed. Candidates have to be active in both the observation and the evaluation period. The set is split randomly into training and test sets of equal size, then parameter estimation and evaluation are performed.

This basic experiment is repeated, testing all combinations of four experiment parameters: The observation date marks the end of the observation and the beginning of the evaluation period. Three equidistant dates within eight weeks were chosen, April 20, May 4, and May 18, 2012, aiming to test the temporal stability of the model. The length of the observation and evaluation period (time period length) needs to match the speed of conversation flow. We test periods of 14, 5, and 2 days, falling back to an extended period of 14 days if there is no activity.
The relationship type is only relevant for addressive communication. It controls whether or not a needs to follow b for the edge from a to b to be considered. The last parameter is the choice of social neighborhood.

The SCIM is compared to three baseline predictors to verify that it captures non-trivial information about the influence process. The first predictor draws randomly from a Dirichlet distribution Dir(α) with α taken from the ART. The second predictor outputs the mean of Dir(α), which is the relationship-topic distribution the ART would produce in the absence of data. The third predictor outputs the relationship-topic distribution of the observed behavior, effectively a model of influence fully driven by inertia.

The experiment results are filtered to improve interpretability. Two restricted variants of the SCIM are introduced specifically to assess the utility of the coefficients and the neighborhood definitions. In the first variant, coefficients are uniform (<math>c_{1..|C|} = 1/|C|</math>, <math>c_d = 0</math>), while in the second variant all neighborhoods are empty. Any neighborhood definition that does not outperform these variants or the baselines across all combinations of experiment parameters is discarded.

To determine the experiment parameters’ effect on prediction accuracy, we propose an ANOVA design, where the choice of neighborhood is a repeated measurement (including the baseline predictors for reference), and the remaining parameters are between-subject factors. The candidate sets are constructed and assigned to the experiments accordingly. All pairs of neighborhood definitions are tested post-hoc for significant differences in mean prediction error with Tukey’s HSD test. The results can be expressed as homogeneous subsets of neighborhoods with equivalent performance.
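The three baseline predictors and the error measure used to compare them can be sketched in plain Python. The Dirichlet draw is built from independent Gamma samples, a standard construction; the function names and the choice of a base-2 logarithm for the Jensen-Shannon divergence are assumptions of this sketch rather than details given in the paper.

```python
import math
import random


def dirichlet_sample(alpha, rng=random):
    """Baseline 1: a random draw from Dir(alpha) via normalized Gamma samples."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def dirichlet_mean(alpha):
    """Baseline 2: the mean of Dir(alpha), i.e. the prior without any data."""
    total = sum(alpha)
    return [a / total for a in alpha]


def inertia(observed):
    """Baseline 3: predict that the observed distribution simply persists."""
    return list(observed)


def jsd(p, q):
    """Jensen-Shannon divergence between two discrete distributions (base 2)."""
    mid = [(a + b) / 2.0 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_t = 0 contribute nothing to the divergence.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, mid) + 0.5 * kl(q, mid)


alpha = [1.0, 1.0, 2.0]
observed = [0.7, 0.2, 0.1]   # made-up observation period distribution
future = [0.6, 0.3, 0.1]     # made-up evaluation period distribution
errors = {
    "random": jsd(dirichlet_sample(alpha, random.Random(0)), future),
    "mean": jsd(dirichlet_mean(alpha), future),
    "inertia": jsd(inertia(observed), future),
}
```

With base 2 the divergence is bounded between 0 and 1, which makes mean errors comparable across predictors.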
After ranking them by mean error, the mixture coefficients of the best-performing subset are analyzed via descriptive statistics.

====5.1 Results====
43.4% of experiments for non-addressive, and 91.5% for addressive communication are filtered out. ANOVA is performed with a per-group sample size of 238. For both types of communication, there are significant interaction effects (α = 0.01) involving neighborhood definition, observation date and time period length. This indicates that the amount of indirect influence captured by some or all of the neighborhood definitions varies over time, possibly related to the temporally irregular activity of users (Section 2.3). An interaction between neighborhood and time period length indicates that subgraphs differ in speed of information flow.

For non-addressive communication, there is a significant effect of time period length, with longer time periods improving the accuracy, but this effect may already be fully explained by the higher-order interactions. There is no significant effect involving the relationship type, so the existence of a follower relationship does not appear to affect the perception of addressive messages. For both types of communication, the choice of neighborhood is significant. Tukey’s test yields a high number of overlapping homogeneous subsets, but isolated baseline predictors. The lack of clustering limits the explanatory value of the best subsets.

The subset for non-addressive communication contains neighborhoods built by three indicator functions: First, communities found by edge clustering are given uniform weight, which implies that follower communities reflect indirect influence to a degree that is difficult to improve by weighting. Second, followers with a path distance of up to two, weighted with the number of shared followees or communities, also hint at the importance of cohesive social groups.
Third, followers of distance one are paired with weights based on similarity of users or their message content, promoting homogeneous neighborhoods. The neighborhoods in the best subset for addressive communication are built by a single indicator function, followers with a distance of up to two. Weights are mostly similarity-based and include the number of shared followees and the similarities of both kinds of communication.

Figure 1 compares the mean prediction error of the best subset to the baseline predictors. The SCIM outperforms all baselines, with a 10% improvement over the best performing baseline for non-addressive, and 28% for addressive communication. The lower error of the Dirichlet mean baseline predictor in case of addressive communication reflects the spatio-temporal sparsity discussed in Section 2.3.

Fig. 1. Mean error of the SCIM and the baseline predictors (random, mean, inertia, SCIM) for (a) non-addressive communication and (b) addressive communication.

Figure 2 shows the mixture coefficients as leaves of a tree, with the parent nodes representing either role or scope as listed in Table 2. Line width is proportional to the coefficient mean across the best subset, while the color corresponds to the ratio of mean and standard deviation: The darker, the less affected is the coefficient by the experiment parameters. Both addressive and non-addressive communication are strongly driven by inertia, but the predictive value of direct exposure is unexpectedly low, contradicting the principle of locality. The value of indirect exposure from the neighborhood is as expected, while the high value of the data-driven component <math>\theta^d</math> suggests the existence of patterns of indirect influence not covered by the SCIM.
Communication is mainly influenced by other communication of the same type. Components aggregating the relationship-topic distributions of a large number of users are generally of low predictive value.

Fig. 2. Mixture coefficients of the SCIM experiments in the best subset: (a) non-addressive communication by role, (b) non-addressive communication by scope, (c) addressive communication by role, (d) addressive communication by scope.

===6 Discussion===
We report two main results: First, a novel point of view on the question whether Twitter is a social network, or a bipartite network of content producers and consumers [4]. A major difference to other social media is the high volume of non-addressive communication. Messaging behavior of individuals is highly variable, with the proportion of addressive communication having a one-SD range of 12% to 60%. The difference between the two modes of communication is visible in the influence process: Non-addressive communication is more resistant to influence, so the more stable communication behavior can be exploited by longer observation periods. Users are influenced in their non-addressive communication by their edge communities, while their addressive communication receives influence from a larger set of neighbors, weighted by similarity. In effect, the Twitter social network is a product of the follower network, which governs the flow of non-addressive communication, and the implicit network formed by addressive messaging.

Second, future behavior can be predicted to a certain extent from local sources of information, which the SCIM learns to exploit. However, our results do not fully confirm the decomposability of social influence into inertia, direct, and indirect exposure, which follows from the principle of locality. The low explanatory value of direct exposure implies that locality is not sufficient on its own to explain why the SCIM is able to outperform the baselines: If interactions within and from outside the medium have similar potential for influence, observable interactions are responsible for just a fraction of the overall influence. Therefore it is important to exploit indirect influence, which allows information to cross the medium boundary. The best-performing neighborhood definitions favor nodes that are similar to the ego, and likely to be exposed to similar external influences.

Future work involves repeating the experiments on new datasets from different social media to test if our results apply to social interaction in general.

===References===
1. Ahn, Y., Bagrow, J., Lehmann, S.: Link communities reveal multiscale complexity in networks. Nature 466, 761–764 (2010)

2. Christakis, N., Fowler, J.: Social contagion theory: Examining dynamic social networks and human behavior. Statistics in Medicine 32(4), 556–577 (2013)

3. Honeycutt, C., Herring, S.: Beyond microblogging: Conversation and collaboration via Twitter. In: Proceedings of HICSS (Jan 2009)

4. Kwak, H., Lee, C., Park, H., Moon, S.: What is Twitter, a social network or a news media? In: Proceedings of WWW (Apr 2010)

5. Latané, B.: Dynamic social impact: The creation of culture by communication. Journal of Communication 46(4), 13–25 (1996)

6. Liu, L., Tang, J., Han, J., Jiang, M., Yang, S.: Mining topic-level influence in heterogeneous networks. In: Proceedings of CIKM (Oct 2010)

7. McCallum, A., Wang, X., Corrada-Emmanuel, A.: Topic and role discovery in social networks with experiments on Enron and academic email. Journal of Artificial Intelligence Research 30, 249–272 (2007)

8. Myers, S., Zhu, C., Leskovec, J.: Information diffusion and external influence in networks. In: Proceedings of SIGKDD (Aug 2012)

9. Palla, G., Derényi, I., Farkas, I., Vicsek, T.: Uncovering the overlapping community structure of complex networks in nature and society. Nature 435(7043), 814–818 (2005)

10. Rashotte, L.: Social influence. In: Ritzer, G. (ed.) The Blackwell Encyclopedia of Sociology, vol. 9, pp. 4426–4429. Blackwell (2007)

11. Steyvers, M., Griffiths, T.: Probabilistic topic models. In: Landauer, T., McNamara, D., Dennis, S., Kintsch, W. (eds.) Handbook of Latent Semantic Analysis, chap. 21. Lawrence Erlbaum (2007)

12. Sun, J., Tang, J.: A survey of models and algorithms for social influence analysis. In: Social Network Data Analysis, chap. 7. Springer (2011)

13. Wallach, H., Mimno, D., McCallum, A.: Rethinking LDA: Why priors matter. In: Proceedings of NIPS (Dec 2009)

14. Walther, J., Carr, C., Choi, S., DeAndrea, D., Kim, J., Tong, S., Van Der Heide, B.: Interaction of interpersonal, peer, and media influence sources online. In: Papacharissi, Z. (ed.) A Networked Self, chap. 1. Routledge (2010)

15. Zhang, J., Tang, J., Li, J., Liu, Y., Xing, C.: Who influenced you? Predicting retweet via social influence locality. ACM Transactions on Knowledge Discovery from Data 9(3), 25 (2014)