Tie Strength Persistence and Transformation

Michele A. Brandão, Pedro O. S. Vaz de Melo and Mirella M. Moro
Universidade Federal de Minas Gerais, Belo Horizonte, Brazil
{micheleabrandao, olmo, mirella}@dcc.ufmg.br

Abstract. A tie is a link between two people in a social network. Here, we analyze tie strength in temporal co-authorship social networks by measuring tie persistence and transformation over time. Surprisingly, most ties tend to perish over time. Also, weak and random ties are more common in real co-authorship networks than bridges and strong ties.

Keywords: Social Networks, Tie Strength, Temporal Graph

1 Introduction

Time is a fundamental factor when characterizing the nature and the strength of relationships, as acquaintances might become friends and vice versa. Such time-varying ties may be modeled as a temporal social network (SN), or temporal graph, where each node is a person and an edge connects two nodes at time $t$ if they share any relationship at $t$. However, most studies focus on static aggregated graphs [1,3], in which the type and the class of the edges are invariant. Whereas in static graphs such temporal aspects are aggregated, and therefore hidden, in temporal graphs they arise naturally, which makes temporal graphs an appropriate model for dynamic SNs.

Nonetheless, computing the properties of temporal SNs and their time-varying behavior is very challenging, as their values evolve. Hence, concepts and metrics used to analyze static networks must be adapted and extended to time-varying networks. Tie strength (a.k.a. strength of ties) is one of those concepts. It was originally defined as a combination of the amount of time, the emotional intensity, the intimacy, and the reciprocal services that characterize a tie between people [4].

Here, our goal is to verify whether current definitions of tie strength hold for temporal networks. To do so, we analyze the dynamism of tie strength by observing link persistence and link transformation over time.
This goal is divided into two research questions. First, how is tie strength defined for temporal networks? We consider that a strong tie characterizes interactions likely to appear again in the future, whereas a weak tie occurs sporadically. Second, how much does tie strength vary over time? Nicosia et al. [5] claim that if two nodes are strongly (or weakly) connected at a time $t_1$, they will also be strongly (or weakly) linked at $t_2$, where $t_2 > t_1$. Here, we challenge this claim in the context of temporal co-authorship SNs.

Also, studies regard edge features as good indicators of tie strength, e.g., edge persistence [5,6] and topological overlap [2,6]. Here, we analyze the dynamism of tie strength by observing the dynamics of four edge classes derived from edge persistence and topological overlap (footnote 1). These properties represent the regularity of interaction and the similarity between the people in a relationship.

Table 1: Description of the datasets used to build the co-authorship SNs.

Description           # nodes      # edges
DBLP Articles         837,583    2,935,590
DBLP Inproceedings    945,297    3,760,247
PubMed                443,784    5,550,294
APS                   180,718      821,870

Fig. 1: The performance of RECAST and fast-RECAST for PubMed.

2 Analyses of Persistence and Transformation

In order to proceed, we first need to formally define a model for temporal social networks. We associate a start time and a duration with each co-authorship. Then, a temporal co-authorship social network is modeled as a graph $G_k(V_k, E_k)$ in which time is discretized into steps of duration $\delta$ (footnote 2), and $k$ is the time step in which a co-authorship (encounter) occurs. The set of nodes $V_k$ is formed by all network nodes involved in a co-authorship during the $k$-th time step, and the set of edges $E_k$ is composed of the co-authorships during the same time step. A time-varying representation of the SN can be defined by a temporal accumulation graph $G_t(V_t, E_t)$, where $G_t = G_1 \cup G_2 \cup \dots \cup G_t$, in which $t$ is the last time step.
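The model above can be illustrated with a minimal sketch, assuming each co-authorship is available as a hypothetical record (year, author_a, author_b) and that time steps are numbered from 1. The function names and the sample records are illustrative only and are not taken from RECAST or from the datasets.

```python
from collections import defaultdict


def build_step_graphs(records, delta=1):
    """Group (year, author_a, author_b) records into per-step edge sets E_k.

    ASSUMPTION: the record layout and the 1-based step numbering are
    illustrative choices; delta is the step duration in years.
    """
    start = min(year for year, _, _ in records)
    steps = defaultdict(set)
    for year, a, b in records:
        if a == b:
            continue  # ignore self-loops
        k = (year - start) // delta + 1
        steps[k].add(frozenset((a, b)))  # undirected edge
    return dict(steps)


def accumulate(step_graphs, t):
    """Return (V_t, E_t): the union of all step graphs G_1 .. G_t."""
    edges = set()
    for k, e_k in step_graphs.items():
        if k <= t:
            edges |= e_k
    nodes = {v for e in edges for v in e}
    return nodes, edges


# Hypothetical toy records, for illustration only.
records = [
    (2010, "ana", "bo"),
    (2011, "ana", "bo"),
    (2011, "bo", "cy"),
    (2013, "cy", "di"),
]
step_graphs = build_step_graphs(records)
nodes_2, edges_2 = accumulate(step_graphs, t=2)
print(sorted(nodes_2))  # ['ana', 'bo', 'cy']
```

With $\delta$ = 1 year, the edge between "ana" and "bo" appears in steps 1 and 2 but only once in the accumulated edge set, which is why the per-step graphs must be kept when measuring how regularly a tie occurs.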
In this accumulation graph, $V_t$ and $E_t$ are the sets of all nodes and edges in the SN, respectively, from time step 0 to $t$. Since $G_t$ accumulates all co-authorships from the datasets and evolves over time, this aggregated graph contains both social and random encounters. Thus, a random version $G_t^R$ of the temporal aggregated graph $G_t$ is necessary to analyze the patterns of the SN. For this model to work, it requires a definition of tie strength in temporal SNs and an algorithm that implements it.

Definition of tie strength. Given a temporal graph $G_k(V_k, E_k)$, where $k$ is the time step in which a co-authorship occurs, a tie $(i, j)$ is likely to be strong if it is present in $G_k$ for most values of $k$. On the other hand, the tie $(i, j)$ is likely to be weak if it is present in $G_k$ for just a few values of $k$.

Footnote 1: Technical Report at http://www.dcc.ufmg.br/~mirella/projs/apoena
Footnote 2: Here, we consider a duration of $\delta$ = 1 year.

Implementation of the algorithm. One contribution of this work is to modify an existing algorithm called RECAST [6] to measure the strength of ties in large temporal SNs. We chose RECAST because it is the only algorithm that defines different classes of tie strength in temporal networks. RECAST was originally applied to relatively small mobile networks to classify users' wireless interactions, differentiating random interactions from social ones (friends, called strong; bridges; and acquaintances, called weak). It implements the model previously described by building both $G_t$ and $G_t^R$. The construction of $G_t^R$ increases the complexity of RECAST to $O(t \times (|V_t| + |E_t^R|))$. We therefore propose to apply the multiprocessing Pool module from Python (footnote 3) to this step of RECAST in order to reduce its cost. We call this novel multiprocessing algorithm fast-RECAST. The idea is that more than one random event graph $G_t^R$ is built at a time on a multi-core computer. Thus, the new computational cost is $O(\frac{t}{p} \times (|V_t| + |E_t^R|))$, where $p$ is the number of processes.
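The parallel construction of the random graphs can be sketched as follows. This is only an illustration under stated assumptions: the randomization used here (same node set and same number of edges, with endpoints drawn uniformly) is a stand-in and may differ from the procedure RECAST actually uses, and the names random_step_graph and build_random_graphs are hypothetical. The only library calls used are multiprocessing.Pool and Pool.map from the Python standard library.

```python
import random
from multiprocessing import Pool


def random_step_graph(task):
    """Build one random graph with the nodes and edge count of a step graph.

    ASSUMPTION: uniform random endpoints are a stand-in randomization;
    RECAST's own procedure for the random graph may differ.
    """
    nodes, n_edges, seed = task
    rng = random.Random(seed)  # per-task seed keeps runs reproducible
    nodes = sorted(nodes)
    max_edges = len(nodes) * (len(nodes) - 1) // 2
    target = min(n_edges, max_edges)
    edges = set()
    while len(edges) < target:
        u, v = rng.sample(nodes, 2)  # two distinct endpoints
        edges.add(frozenset((u, v)))
    return edges


def build_random_graphs(step_graphs, processes=2, seed=0):
    """Build the random counterpart of every step graph, several at a time.

    step_graphs maps a time step k to a set of frozenset edges.
    """
    order = sorted(step_graphs)
    tasks = []
    for k in order:
        edges = step_graphs[k]
        nodes = {v for e in edges for v in e}
        tasks.append((nodes, len(edges), seed + k))
    # Each worker process builds one random graph; with p processes,
    # roughly t / p graphs are built sequentially by each worker.
    with Pool(processes=processes) as pool:
        random_graphs = pool.map(random_step_graph, tasks)
    return dict(zip(order, random_graphs))
```

Because worker processes may re-import the main module, build_random_graphs should be called from code guarded by `if __name__ == "__main__":`. The worker is a top-level function taking a single picklable tuple, which is what Pool.map requires.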
We also use the multiprocessing Pool module from Python to call the functions that compute edge persistence and topological overlap from the aggregated graphs. Both features are computed in parallel and asynchronously.

Dataset. To analyze tie strength persistence and transformation, we consider three publication datasets: DBLP, PubMed and APS, collected in September 2015, April 2016 and March 2016, respectively. From these datasets, we build four co-authorship SNs whose main statistics are in Table 1.

Time Performance. In order to show that fast-RECAST performs better than RECAST, we measure the execution time of both algorithms on a laptop with 8 GB of 1600 MHz DDR3 memory and a 2.5 GHz dual-core Intel Core i5 processor. The operating system is Mac OS X El Capitan version 10.11.6. Figure 1 presents the execution time in seconds of fast-RECAST and RECAST. Note that we present the results only for the PubMed dataset, because it is the largest one (in number of edges).

Tie strength persistence results. In order to analyze persistence over time, we divide the networks into two time windows, which from now on we call past and future (footnote 4). We apply fast-RECAST to the past and then verify whether the edges of each class (strong, bridge, weak and random) remain in that same class in the future. We split the networks in two ways. First, we split them into a time window comprising the initial 80% of the timestamps (past) and a time window comprising the final 20% (future). Second, we divide them into time windows of 70% (past) and 30% (future). For both the 80%-20% and the 70%-30% splits, strong ties and bridges tend to persist over the years more than weak and random ties. Moreover, we emphasize the differences in the results of the APS network between the 80%-20% and 70%-30% partitions.
In the first partitioning, the proportion of strong and bridge ties that persist from the past to the present is very high, whereas in the second partitioning this proportion is lower. This result may indicate that the APS co-authorship social network changes more through the years than the other networks. Another possibility is that physics researchers do not change the level of co-authorship with their collaborators very much over time, and that this is a pattern of more recent researchers (note that the 80% window includes more recent co-authorships than the 70% window). We leave further analyses of such insights for future work.

Footnote 3: Multiprocessing with Python: docs.python.org/2/library/multiprocessing.html
Footnote 4: One may see the present as the timestamp between these two time windows.

Tie strength transformation results. We now evaluate the number of ties from a class in the past that continue in the same class (or change class) in the future, i.e., the tie strength transformation analysis. To avoid any kind of bias in the process of classifying the edges, here we divide the temporal co-authorship social networks into two time windows, each covering 50% of the timestamps. We apply fast-RECAST to both parts and then analyze the link transformation across the classes.

Surprisingly, we see no ties classified as weak or random in DBLP Articles and DBLP Inproceedings. This indicates that the features (edge persistence and topological overlap) of these social networks have high (or social) values. Furthermore, most ties from the past tend to disappear in the present, especially the bridges. This result may be explained by the nature of co-authorships, as researchers collaborate during a period towards a common goal and then start to collaborate with others. This also reinforces Granovetter's theory that weak ties are the ones that connect different communities [4], which is the case of the bridge edges.
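The bookkeeping behind the transformation analysis can be sketched as below, assuming the class of each tie in the past and future windows has already been computed (e.g., by fast-RECAST) and stored in two hypothetical dictionaries keyed by tie. The helper names, the "disappeared" label and the toy labels are illustrative assumptions and are not results from the datasets.

```python
from collections import Counter


def split_steps(steps, past_fraction):
    """Split the sorted time steps into a past and a future window."""
    ordered = sorted(steps)
    cut = int(len(ordered) * past_fraction)
    return ordered[:cut], ordered[cut:]


def transformation_counts(past_classes, future_classes):
    """Count ties by (class in the past, class in the future).

    ASSUMPTION: a tie that is absent from the future window is
    counted under the illustrative label "disappeared".
    """
    counts = Counter()
    for tie, past_label in past_classes.items():
        future_label = future_classes.get(tie, "disappeared")
        counts[(past_label, future_label)] += 1
    return counts


# Hypothetical toy input, for illustration only.
past_window, future_window = split_steps(range(2000, 2010), 0.5)
past_classes = {
    ("ana", "bo"): "strong",
    ("bo", "cy"): "bridge",
    ("cy", "di"): "weak",
}
future_classes = {
    ("ana", "bo"): "weak",
    ("cy", "di"): "weak",
}
counts = transformation_counts(past_classes, future_classes)
for (past_label, future_label), n in sorted(counts.items()):
    print(past_label, "->", future_label, n)
```

Dividing each count by the number of ties that had the corresponding class in the past gives the proportion of ties that keep their class, change class, or disappear.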
For PubMed and APS, we observe similar behavior: most ties tend to disappear, especially the bridges and random ties. Disregarding the links that disappeared, most strong and weak ties become weak or random. Surprisingly, the weak ties are the ones that most often remain in the same class, compared to the others, in both networks.

3 Conclusion

We analyzed tie strength dynamism in temporal SNs. We built four temporal co-authorship SNs from three real publication datasets, and proposed fast-RECAST, a parallel version of an existing tie classification method. The resulting link persistence analysis reveals that strong ties and bridges tend to persist more than weak and random ties. This supports our hypothesis that strong ties persist more than others. The results also show a different pattern for APS when the data is divided into 80% and 20%. In this experimental setting, the proportion of strong and bridge ties that persist from the past to the present is very high compared to the other SNs. Also, the link transformation analysis revealed that most ties tend to disappear over time. As future work, we plan to investigate the patterns discovered here and to modify fast-RECAST to better capture tie strength.

Acknowledgements to CAPES, CNPq and FAPEMIG, Brazil.

References

1. Brandão, M.A., Moro, M.M.: Affiliation influence on recommendation in academic social networks. In: Procs. of AMW. pp. 230–234 (2012)
2. Brandão, M.A., Moro, M.M.: Analyzing the strength of co-authorship ties with neighborhood overlap. In: Procs. of DEXA. pp. 527–542 (2015)
3. Castilho, D., Vaz de Melo, P.O., Benevenuto, F.: The strength of the work ties. Information Sciences 375, 155–170 (2017)
4. Granovetter, M.S.: The strength of weak ties. The American Journal of Sociology 78(6), 1360–1380 (1973)
5. Nicosia, V., et al.: Temporal Networks, chap. Graph Metrics for Temporal Networks, pp. 15–40. Springer Berlin Heidelberg, Berlin, Heidelberg (2013)
6. Vaz de Melo, P.O., et al.: RECAST: Telling apart social and random relationships in dynamic networks. Performance Evaluation 87, 19–36 (2015)