=Paper=
{{Paper
|id=Vol-2069/STREAMEVOLV1
|storemode=property
|title=Detecting Events in Evolving Social Networks through Node Centrality Analysis
|pdfUrl=https://ceur-ws.org/Vol-2069/STREAMEVOLV1.pdf
|volume=Vol-2069
|authors=Fabiola S. F. Pereira,Sandra de Amo,João Gama
|dblpUrl=https://dblp.org/rec/conf/pkdd/PereiraAG16
}}
==Detecting Events in Evolving Social Networks through Node Centrality Analysis ==
Detecting Events in Evolving Social Networks through Node Centrality Analysis Fabiola S. F. Pereira1 , Sandra de Amo1 , and João Gama2 1 Federal University of Uberlândia {fabiola.pereira,deamo}@ufu.br 2 University of Porto, LIAAD INESC TEC, Porto, Portugal jgama@fep.up.pt Abstract. Social networks have an evolving characteristic because of continuous interaction between users. Existing event detection tasks do not consider the analysis under a user-centric perspective. In this paper we propose to detect node centrality events, that is the task of finding events based on the position and roles of the nodes. We present a naive algorithm for detecting such events in network streams. Moreover, we apply our proposal in a case study, showing how node centrality events can be used for tracking user preferences changes. 1 Introduction Social networks streams are dynamic networks that have a fast rate of edge arrival [1]. The analysis of such networks is especially challenging, because it needs to be performed with an online approach, under the one-pass constraint of data streams. When considering the evolving characteristic of such networks, changes as the impact on communities or the impact on network structural parameters such as node degrees, needs to be analyzed. These changes are stated as event detection problems. Detecting events in evolving networks can be investigated under different perspectives: anomaly detection [3], burst detection [6], concept drift [14] or topic evolving in social streams [7]. Generally, these perspectives are related with the whole graph structure evol- ving behavior, as in anomaly and burst detection or related with the evolving nature of the content being discussed in the network (concept drift and topic evolving analysis). In this paper we focus on the analysis of nodes positions evo- lution – a user-centric perspective. At a high level, our goal is to identify the behavior of nodes evolution in the network. For so, we define node centrality event detection: the task of finding events based on the position of the nodes. Our proposal is to maintain nodes centrality values summarizing topological information as real numbers, which allows us to leverage the changes in nodes roles. We perform analytical evolution analysis, always considering data streams constraints [9]. In order to illustrate how our problem can be applied in evolving social networks, we meet node centrality event detection with dynamics of user prefer- ences. Preferences dynamics refer to the way a user evolves his or her preferences over time [13]. Looking for node centrality events in a Twitter interaction net- work, we were able to detect changes in the user preferences, thus just considering the topology of network. Summarizing, this work makes the following contributions: (i) proposal of a technique for detecting events in network streams; (ii) empirical observations of the proposed technique over a Twitter dataset and (iii) a case study describ- ing how the proposed technique can be applied for tracking user preferences dynamics in social networks. 2 Related Work We highlight some related work in the directions of graph stream processing and event detection in networks. Our proposal is innovative when considering event detection from a node-centric perspective in a stream processing environment. Processing graphs as streams is an incoming problem. The work [4] is one of the most complete work when considering data mining in evolving graph streams. The focus, however, is on mining closed graphs, not on event detection. In [8] a framework for processing graphs as streams is proposed for the link prediction task. This framework considers the cumulative grown of the graph, not addressing the space saving feature [9]. The most studied events in evolving networks are anomalies and bursts [7]. Anomaly detection refers to the discovery of rare occurrences in datasets. The most representative work in anomaly detection for dynamic graphs is [10]. It addresses the problem considering a time sequence of graphs (graph sequences). The focus is on faults occurring in the application layer of Web-based systems. First, they extract activity vectors from the principal eigenvector of dependency matrix. Next, via singular value decomposition, it is possible to find a typical activity pattern (in t − 1) and the current activity vector (t). In the end, the angular variable between the vectors defines the anomaly metric. The network processing is through snapshots, not in a streaming fashion. Moreover, this Eigen Behavior based Event Detection (EBED) method is orthogonal to ours as it detects events in a global perspective of the network, while ours is node-centric. Burst events are generally related to topic evolving detection and tracking [7]. These works are looking for events like hot buzz words, what are users’ sentiments about a product release or how is a specific topic evolving. In [6] the goal is to track interest profiles in real time by detecting bursts in Twitter’s social media stream in real time using linear regression. These approaches are orthogonal to ours because are focused on the content of the network (texts, topics) not in the topology evolution analysis. The work [2] incorporates network structure in event discovery over purely content-based methods. Each text message is associated with at least a pair of actors in the social network. The events detected are also related with topics evolving. Finally, in [15] the authors consider the problem of mining activity networks to identify interesting events, such as a big concert in a city, or a trending keyword in a user community in a social network. The algorithms are founded in geo-spatial event detection information. Any stream processing strategy is addressed. 3 Detecting Events in Network Streams The problem of detecting events in evolving networks can be investigate under different forms: anomaly detection [3], burst detection [6], concept drift [14]. Our focus is on event detection from nodes centralities. We formalize the problem as the task of detecting changes in nodes centralities values over time, according to some centrality function. As example of centrality function we can cite: be- tweenness, closeness, degree, eigenvector etc. The foundation of our method is the technique of change point detection [16]. Definition 1 (Network Stream). A network stream is a continuous and tem- poral sequence of edges S = e1 , ..., en , ..., such that each edge ei = (u, v, t) corre- sponds to an interaction from node u to node v on time t. We define V the set of all nodes that arrived in the stream since the beginning of the observation. Et is the set of edges that arrived on t. Definition 2 (Node Event). Let us define a node centrality function ct : V → R that assigns a nonnegative value ct (s) for each node s ∈ V on time t. This centrality function is computed considering the network G = (V, Et ) and can be calculated according to any node centrality metric (betweenness, degree, close- ness, PageRank, ...). A node event Es,c,t is defined as ( 1, |c̄t−1 (s) − ct (s)|> θ Es,c,t = (1) 0, otherwise where – s∈V, – t is the time of the analysis, – θ ∈ R is a threshold value, – c is the centrality Pt function and 1 – c̄t = |W | ∗ i=t−|W |+1 ci (s), for W being the sliding window (c̄t is the average of all centrality values of s inside W ). The intuition of above definitions is: we detect an event for node s at certain time t if the centrality value of s had a high variation in relation to its past centrality values. In Algorithm 1 we present a sketch of the node centrality event detection task. We adopt the sliding window stream processing strategy. The most important task is to compute the centrality function ct (s) (line 8). This specific function is not computed in streaming fashion, as we consider just the edges of the current instant t. So, classic batch algorithms can be applied in this task [17]. In fact, the computation of node centrality in streaming environ- ment is an open challenge for many centrality functions [11]. The computational cost and scalability of this algorithm is proportional to the window size and the time t granularity. Algorithm 1 Sketch for detecting events through node centralities Input: Sliding window W , threshold θ, centrality function c, network stream S Output: A vector E containing the events detected for all nodes in V at any time t 1: V ← ∅ 2: for all time instant t do 3: Et ← ∅ 4: for all ei = (u, v, t) arriving in the stream do 5: Et ← Et ∪ {ei } 6: V ← V ∪ {u, v} 7: Add G = (V, Et ) in W 8: Compute ct (s) for all s ∈ V , considering G = (V, Et ) 9: for all s ∈ V do 10: if |c̄t−1 (s) − ct (s)|> θ then 11: Ec,t [s] = 1 12: else 13: Ec,t [s] = 0 14: Slides W Though naive, this event detection approach is able to report changes in the network without complex text analysis, just observing the topology and nodes position. As we will show in the following, detecting such events provide meaningful evidences of network evolution. However, we are not able to gain insight about what kind of event we are detecting (drift, burst, anomaly). 4 The Evolving Social Network 4.1 Dataset Folha de São Paulo (or Folha, for short) is one of the most influential newspapers in Brazil. Taking advantage of the fact that Twitter is widespread in the country, we performed our analysis over the news domain in Twitter social network. We collected a large body of tweets from Folha over the course of 3 weeks, starting in June 24, 2016. Our data collection strategy was as follows. First, we used Twitter’s streaming API to collect all tweets related to the newspaper (user @folha). Thus, our dataset consist of tweets about the news tweeted by Folha newspaper, the retweets and all inherent information men- tioning these news. Next, we built the following interaction network: nodes are Twitter users. One edge from user u1 to u2 means that u2 retweed on t some text originally posted by u1 , i.e. edges represent the information flow. The edges are temporal and just exist in the moment of the interaction (time t), then they disappear. Fig. 1 illustrates the evolving aspect of our network. The topics rep- resented by colored edges were obtained using LDA. In Section 6 we detail this process. In all, we collected 200806 tweets, 78944 nodes (users) and 108133 distinct edges considering the 3 weeks of observation period. An important characteristic of our network is that it has a low average path length. This is consequence of the fact that in Twitter a retweet always comes from the original post, not mattering from where the user read that post – from the user who originally posted it or from an intermediate user who already retweeted it. On average, the path length is 1.033. Politics International Corruption Sports 13 Jul 15 Jul 17 Jul Fig. 1. Snapshots of samples of the evolving interaction network. Nodes are Twitter users. One tie from user u1 to u2 means that u2 retweed at t some text originally posted by u1 . The colors represent topics that users are talking about at t. The samples were built by filtering nodes with degree between 50-22000 and edges representing the 4 most popular topics. Each snapshot corresponds to 1 day time-interval. This figure highlights the edges evolving aspect. Nodes are not evolving for better visualization. 4.2 Network Semantics We analyzed the evolving behavior of each node in the network considering the closeness centrality measure [17]. Closeness is related to the visibility of a node in the network. It is the capacity of a node the reach the others in a fast way. Thus, a high closeness value means a good information spreading capacity. In our context, we identify three types of user: consumers, producers and consumers&producers. Consumers are the users who most often just retweet, not publishing any new content. Generally, they have low closeness values. Producers are always publishing popular tweets and have a medium closeness value. Finally, the consumers&producers have a high activity in the network, tweeting and retweeting all the time. These users have the highest closeness values. As example, let us consider the scenario illustrated in Fig. 2. Events can occur with any type of user, meaning that their usual role changed at that moment. User 3 has a typical consumer behavior until time t6. Just retweeting or even with no activity in the network. From time t7 user 3 presents a different behavior, which can be a persistent change or an ephemeral behavior. Thus, an event occur around t7 and t8. User 1 is clearly a producer from t1 to t3. And users 2 and 4 are consumers&producers during the whole observation period. Fig. 2. Example of evolving behavior with closeness centrality. The darker nodes, the greater the centrality. 5 Empirical Analysis: Detecting Events with Closeness Centrality We analyzed the evolving behavior of each node in the network considering closeness centrality measure. In Fig. 3 we show the centrality evolving behav- ior for three different users, one of each type (consumer, producer and con- sumer &producer ). It is possible to distinguish that, generally, the types of users are related with their closeness centrality. 5.1 Influence of Parameters Setting The balance between window size |W | and threshold θ determines what we call of nature of the event that we are detecting. θ adjusts the intensity of the events, varying from smooth to drastic events. In Fig. 4 we present an analysis for user u4, with θ assuming 0.1, 0.2 and 0.5 and |W |= 4. We chose u4 due to its high activity level in the network and |W |= 4 as an intermediate value according to our observation period. As expected, when considering smooth variations more events were detected. Drastic events indicate that the user changed drastically his role in the net- work. Around day 15 the events indicate that u4 leaves a central position as a consumer &producer to assume a consumer role. Now, analyzing the impact of the window size |W |, in Fig. 5 there are the events detected for |W | assuming 2, 4 and 10, θ = 0.2 and user u4. Varying |W | means that we are considering the recent past for low values (short-term events) or a big historic for high values (long-term events). As our dataset is relatively short, in these experiments the window size variation did not result in interesting findings. Short-term events are not interesting in our context due the high variation of centrality values in the network. Considering a mid-term period (|W |= 4) reflected better the evolving user role. u1 (consumer) u2 (producer) u3 (consumer&producer) 0.5 Closeness 0 0 5 10 15 20 Day Fig. 3. Closeness evolving for three different types of user. θ = 0.1 θ = 0.2 θ = 0.5 Closeness 0.5 0.5 0.5 0 0 0 1 5 10 15 20 1 5 10 15 20 1 5 10 15 20 Day Day Day Fig. 4. Impact of θ (intensity of the events) for a high activity user. Detected events are highlighted in red lines. |W | = 2 |W | = 4 |W | = 10 Closeness 0.5 0.5 0.5 0 0 0 1 5 10 15 20 1 5 10 15 20 1 5 10 15 20 Day Day Day Fig. 5. Impact of |W | (window size) for a high activity user. Detected events are highlighted in red lines. u5 u6 u7 0.4 0.4 0.8 Closeness 0.6 0.2 0.2 0.4 0.2 0 0 0 1 5 10 15 20 1 5 10 15 20 1 5 10 15 20 Day Day Day Fig. 6. Detected events highlighted in red lines for three different users. 5.2 Detected Events Analysis From the previous analysis, we consider θ = 0.2 and |W |= 4 as the default values. Thus, here we are interested in smooth mid-term events. In Fig. 6 we present the events detected for three users u5, u6 and u7. For users u5 and u6 it is possible to distinguish that the sequence of detected events reflects the moment they change their roles. In case of user u5 the change is permanent and for u6 the change is just ephemeral. However, in the case of u7, a lot of events were detected sequentially, reflecting a behavior of intermittent activities in the network. In fact, a weak point in our event detection method is this: just observing the events, we are not able to distinguish if the events are persistent, if they reflect a burst in the network or if they are ephemeral. 6 Case study: what detected node centrality events could mean? Besides just reflecting changes in the network topology, we can interpret the events detected by analyzing the meaning of the centrality metric used so far. In our case study, nodes with high closeness are influential users that posted about topics with high interest. If a user u starts to occupy (or leaves) a central position this could mean that (i) his activity is high (low) and (ii) his activity is (not) interesting for the other nodes. This behavior could indicate a preference change for u, i.e., he probably changed his interests and is posting about different topics. We now investigate to what extent we could establish the correlation be- tween: events with closeness centrality (network topology) and preference change behavior (network content). 6.1 Preliminaries User preference is a specific type of opinion, that establishes an order relation between two objects. For example, when a user says: “I prefer sports than edu- cation”, we clearly identify his preference to sports subjects over education ones. These preference order relations (or preferences, for short) respects the irreflex- ive and transitive properties. When analyzing user preferences over time, we can discovery interesting patterns of users’ behavior. In this case study, we are interested in detecting what are the moments that users changed their preferences. For example, let us suppose that on day 24, user A prefers to read/post/share on his social network news about sport, but between celebrity and religion topics, he is in the mood of celebrities. On the following days, A’s preferences practically do not change, just appearing a preference of celebrity over corruption. However, on day 30, A’s presented a preference change, as corruption became preferred over celebrity. So, we have detected a preference-change. These definitions were formally proposed in [13]. The detection of preference- change events is based on the existence of inconsistencies in the temporal pref- erences of a user. These inconsistencies appear if the resultant composition of user preferences does not hold the irreflexive property. 6.2 Extracting preferences In order to discover what users are talking about in the network, we performed topic modeling with LDA algorithm [5] considering all tweets of the entire ob- servation period. We got a total of 15 topics, that then were manually grouped in 7 more general topics, as detailed in Table 1. The 7 general topics are the domain of preferences. The intuition in this preference mining process is: if user u tweets (or retweets) about politics on time t, then u has more interest in politics over the remaining topics in that moment. Thus, we mined preferences of the type: politics tu celebrity, politics tu sports and so on. We also considered a weight based on the number of tweets posted in the same time (our time granularity is of 1 day. So, one user can post many tweets on t). If the user posts three times about corruption and two times about sports on t, then we can establish a preference order between corruption and sports (corruption >tu sports and sports is preferred over the remaining topics). It is worth to mention that this preference mining process did not consider the network as a stream. We extracted the preferences of all users based on the global configuration of the topics during the whole period of observation. Topic Issues Politics Impeachment, Eduardo Cunha, Temer, Coup, Dilma International Terrorism, Brexit Corruption Sergio Moro, Lava Jato, Paulo Bernardo Sports Olympic Games, Eurocup Security Rio de Janeiro Education Science without Borders Celebrity Luana Piovani Table 1. Topics about Brazilian news extracted from Twitter dataset. Issues are the keywords that guided each topic creation. 6.3 Preference-Change Events Remarking on Figure 7, the preference change events are highlighted for each user. These events were found following the strategy describe in Section 6.1. Basically, they describe the respective users changing their interests over time about the 7 topics of Brazilian news. For the three users of our case study, we found that changes in preferences oc- curs around the same time of closeness centrality events. According to [12] there are many factors that influence on preferences dynamics. In this case study, we illustrate that applying closeness centrality event detection can be a good strat- egy for mapping user preferences changes just observing the evolving topology of the network, without any content information. We did not try to understand why users are changing their preferences. We just found the events that indicate these changes. Fig. 7. Evolving behavior of three users considering the closeness centrality. Detected events/preferences changepoints are highlighted. 7 Conclusion In this paper we introduced the notion of event detection based on node cen- trality. We proposed an algorithm to detect these events in streams of social networks. Empirical observations in an interaction network, built from the Twit- ter dataset of Brazilian news, were performed to validate our proposal. We also presented a case study showing how node centrality events can be applied for tracking user preferences changes. A lot of work remains to be done. We intend to (i) explore a large range of centrality metrics; (ii) propose a more robust algorithm for node centrality event detection, considering other factors and (iii) compare our algorithms with state-of-the-art event detection streaming algorithms. Acknowledgments This work was supported by the research project “TEC4Gr- owth - Pervasive Intelligence, Enhancers and Proofs of Concept with Industrial Impact / NORTE-01-0145-FEDER-000020”, financed by the North Portugal Re- gional Operational Programme (NORTE 2020), under the PORTUGAL 2020 Partnership Agreement, and through the European Regional Development Fund (ERDF) and by European Commission through the project MAESTRA (Grant number ICT-2013-612944). We also would like to thank the Brazilian Research Agencies CAPES, FAPEMIG and CNPq for partially supporting this work. References 1. Aggarwal, C., Subbian, K.: Evolutionary network analysis: a survey. ACM Com- puting Surveys 47(1), 10–36 (2014) 2. Aggarwal, C.C., Subbian, K.: Event detection in social streams. In: 12th SIAM International Conference on Data Mining, USA. pp. 624–635 (2012) 3. Akoglu, L., Tong, H., Koutra, D.: Graph based anomaly detection and description: a survey. Data Mining and Knowledge Discovery 29(3), 626–688 (2015) 4. Bifet, A., Holmes, G., Pfahringer, B., Gavaldà, R.: Mining frequent closed graphs on evolving data streams. In: 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 591–599. KDD ’11 (2011) 5. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent dirichlet allocation. J. Mach. Learn. Res. 3, 993–1022 (Mar 2003) 6. Buntain, C., Lin, J.: Burst detection in social media streams for tracking interest profiles in real time. In: 39th International ACM SIGIR conference (2016) 7. Cordeiro, M., Gama, J.: Online Social Networks Event Detection: A Survey, pp. 1–41. Springer International Publishing, Cham (2016) 8. Fairbanks, J., Ediger, D., McColl, R., Bader, D.A., Gilbert, E.: A statistical fra- mework for streaming graph analysis. In: IEEE/ACM Int. Conf. on Advances in Social Networks Analysis and Mining. pp. 341–347. ASONAM ’13 (2013) 9. Gama, J.: Knowledge Discovery from Data Streams. Chapman & Hall/CRC (2010) 10. IDÉ, T., KASHIMA, H.: Eigenspace-based anomaly detection in computer sys- tems. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. pp. 440–449. KDD ’04 (2004) 11. Kourtellis, N., Morales, G.D.F., Bonchi, F.: Scalable online betweenness centrality in evolving graphs. In: 32nd IEEE International Conference on Data Engineering, ICDE 2016, Helsinki, Finland, May 16-20, 2016. pp. 1580–1581 (2016) 12. Liu, F.: Reasoning about Preference Dynamics. Spring (2011) 13. Pereira, F., de Amo, S., Gama, J.: On Using Temporal Networks to Analyze User Preferences Dynamics (2016) 14. Ranshous, S., Shen, S., Koutra, D., Harenberg, S., Faloutsos, C., Samatova, N.F.: Anomaly detection in dynamic networks: a survey. Wiley Interdisciplinary Reviews: Computational Statistics 7(3), 223–247 (2015) 15. Rozenshtein, P., Anagnostopoulos, A., Gionis, A., Tatti, N.: Event detection in activity networks. In: Proceedings of the 20th ACM SIGKDD International Con- ference on Knowledge Discovery and Data Mining. pp. 1176–1185. KDD ’14 (2014) 16. Wei, W., Carley, K.M.: Measuring temporal patterns in dynamic social networks. ACM Transactions on Knowledge Discovery from Data (TKDD) 10(1), 9 (2015) 17. Zafarani, R., Abbasi, M.A., Liu, H.: Social Media Mining: An Introduction. Cam- bridge University Press, New York, NY, USA (2014)