Temporal Analysis of Online Social Graph by Home Location

Shiori Hironaka, Mitsuo Yoshida, Kyoji Umemura
Toyohashi University of Technology, Aichi, Japan
s143369@edu.tut.ac.jp, yoshida@cs.tut.ac.jp, umemura@tut.jp

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes.
WII'18, March 11, 2018, Tokyo, Japan

ABSTRACT
An online social graph, which represents relationships between users, is used for many purposes such as home location estimation. However, the online social graph changes over time because the user's environment changes (e.g., house-moving). We tackle temporal analysis of online social graphs by answering the following question: which social graph of certain periods shows the best performance for network-based home location estimation on Twitter? We find that the estimation performance is best when using the social graph collected about half a year after the end of the year in which home locations were assigned. This result indicates that changes in social graphs due to users' environmental changes converge after about half a year.

ACM Classification Keywords
J.4. Computer Applications: Social and Behavioral Sciences

Author Keywords
Twitter; social graph; home location estimation

INTRODUCTION
People live by building relationships and interacting with each other. An online social graph captures a realistic social graph constructed from such relationships [3]. Therefore, an online social graph is used for many purposes, especially to estimate user attributes such as home locations [1, 4]. The home location estimation methods using online social graphs are called network-based home location estimation methods.

The online social graph changes over time because the user's environment changes (e.g., house-moving). We have to update online social graph data [6], and the estimation performance of network-based home location estimation may change depending on when we collect the online social graph data. McGee et al. [5] indicated that the geographic distance between users varies with the relationship between them, and these relationships constitute an online social graph. Is a newer online social graph better for home location estimation?

In this paper, we tackle temporal analysis of online social graphs by answering the following question: which social graph of certain periods shows the best performance for network-based home location estimation on Twitter? We find that the estimation performance is best when using the social graph collected about half a year after the end of the year in which home locations were assigned. This result indicates that changes in social graphs due to users' environmental changes converge after about half a year.

NETWORK-BASED HOME LOCATION ESTIMATION
A network-based home location estimation method estimates home locations using a social graph, in which a node represents a user and an edge represents a relationship between users. The underlying assumption is that a user is located geographically close to friends on the social graph. We use network-based home location estimation to determine how well the social graph reflects home locations (whether the social graph and home location data represent the state at the same time).

In this paper, we use the method of Davis Jr. et al. [2] as a popular network-based home location estimation method. This method selects the most frequent location among the locations of the user's friends as the estimated location. The method is represented as follows:

$$S_u = \operatorname*{arg\,max^*}_{l \in \{ l_n \mid n \in N_u \cap L \}} \left| \{ v \mid v \in N_u \cap L,\ l = l_v \} \right|$$

$$\mathrm{Infer}(u) = \operatorname*{arg\,max}_{l \in S_u} \left| \{ n \mid n \in L,\ l = l_n \} \right|$$

where $L$ is the set of learning data (nodes), $N_u$ is the set of nodes adjacent to node $u$, $l_u$ is the correct label (home location) of node $u$, and $\operatorname{arg\,max^*}$ returns the set of all arguments that attain the maximum. The paper [2] does not make clear how to proceed when several locations are equally frequent among a user's friends. In this paper, we therefore prepare the set $S_u$ of locations $l$ that attain the maximum, and select from it the most frequent area in the learning data set.

DATA
We need home location data and social graph data for home location estimation. In this section, we describe how we create the data.

We define a user's home location as the location from which the user most frequently posts geo-tagged tweets. Specifically, we treat a home location as an administrative area such as a city, in the same way as Davis Jr. et al. [2]. We aggregate the locations of the geo-tagged tweets for each area, and select the most frequent city as a user's home location.

We collected geo-tagged tweets posted in Japan from January 2014 to December 2016 using the Twitter Streaming API. We used only the tweets that have a “place” field whose “place_type” is “poi” or “city”. We regard the city in Japan that includes the centroid of the bounding box of the “place” as the location of the geo-tagged tweet. We assigned a home location to a user who posts geo-tagged tweets at least five times each year in a certain area in Japan. As a result, we assigned a home location to 634,789 users in 2014, 851,675 users in 2015, and 828,929 users in 2016.

In this paper, we use a social graph based on mutual following relationships on Twitter. Our social graph is a simple undirected graph. We collected following relationships every month from July 2015 to July 2017 among users who were assigned a home location in 2014. We excluded users whose following relationships could not be collected one or more times due to account deletion or becoming a private account.
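
The two data-preparation steps described in this section can be illustrated with a minimal sketch. It is not the authors' code: the function names `assign_home_location` and `mutual_follow_edges`, the input shapes, and the reading of the five-tweet threshold as "at least five tweets in the chosen city" are assumptions made for illustration only.

```python
from collections import Counter

# Threshold taken from the text; how it is applied is an assumption.
MIN_TWEETS = 5


def assign_home_location(tweet_cities, min_tweets=MIN_TWEETS):
    """Return the most frequent city among one user's geo-tagged tweets.

    tweet_cities: list of city names, one per geo-tagged tweet in a year.
    Returns None when the most frequent city has fewer than min_tweets
    tweets. Ties between cities are resolved arbitrarily (first seen).
    """
    if not tweet_cities:
        return None
    city, count = Counter(tweet_cities).most_common(1)[0]
    return city if count >= min_tweets else None


def mutual_follow_edges(follows):
    """Build the undirected mutual-follow edge set.

    follows: iterable of (follower, followee) pairs.
    An edge {a, b} exists only if a follows b and b follows a, which
    yields a simple undirected graph (no self-loops, no multi-edges).
    """
    directed = set(follows)
    return {
        frozenset((a, b))
        for (a, b) in directed
        if a != b and (b, a) in directed
    }
```

For example, a user with five tweets in one city and two in another is assigned the first city, while a one-directional follow produces no edge.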
We collected following relationships and created 25 monthly social graphs. Finally, we use 76,730 users for analysis: those who could be assigned a home location in each of the three years and whose following relationships could be collected in every monthly snapshot.

ANALYSIS
Temporal Analysis of Home Location and Social Graph
In this section, we report that social graphs and home locations change over time.

First, we report the changes in users' home locations from 2014 to 2016. Among the 76,730 target users, the home locations of 39,814 users did not change over the three years, the home locations of 22,477 users changed once, and the home locations of 14,439 users changed twice. In total, the home locations of 48% of users changed at least once.

Second, we report the changes in the social graph from July 2015 to July 2017. The changes in the social graph properties are shown in Figure 1. Figure 1(a) shows that the number of edges among the 76,730 users increases over time. Figure 1(b) shows that the average degree, which is used as an estimation clue, increases from 9.7 to 10.7 over the two years. Figure 1(c) shows that the number of isolated nodes, which have no edges, decreases over time. Together, they show that more relationships (edges) are available for estimating home locations in later months and years.

[Figure 1: Size of social graphs: the number of edges increases and the number of isolated nodes decreases over time. Panels: (a) number of edges, (b) average number of degrees, (c) number of isolated nodes. Horizontal axis: month of social graph.]

Comparison of Social Graph Collected Month
The network-based home location estimation method uses the combination of home locations and a social graph. In this paper, the home location is created from a certain period of geo-tagged tweets, and the social graph is a monthly snapshot. We investigate the estimation performance when combining old home location data with old social graph data, old home location data with new social graph data, and new home location data with new social graph data. We did not use old home location data to estimate new home locations in this analysis. That is, the home location data for learning and testing are from the same period. In this analysis, since network-based home location estimation is considered to make a good guess from the social graph and friends' locations, we look for the most adequate period of the social graph for determining home locations.

We conduct home location estimation combining three years of home locations with 25 monthly snapshots of the social graph over two years. Since we are interested only in whether a user's home location can be estimated correctly, the performance is measured by precision, recall, and F1 with leave-one-out cross-validation.

The results of the estimations are shown in Figure 2. When we use the home locations of 2014, the social graph of August 2015 achieves the highest F1. With the home locations of 2015, F1 increases until June 2016 and decreases after that. With the home locations of 2016, F1 keeps increasing until June 2017. These results show that the highest values of precision, recall, and F1 are achieved about half a year after the end of the year in which home locations were assigned. This result indicates that changes in social graphs due to users' environmental changes converge after about half a year.

The results also show poor performance when using the home locations of 2016.
We conjecture that this is caused by a change in Twitter's default location-acquisition function in April 2015. The default was changed from accurate GPS coordinates to a place chosen by the user.¹ As a result, the quality of the “place” used to assign home locations has declined.

Analysis of Users Who Have Changed Home Location
We assume that the performance changes shown in Figure 2 are caused by home location changes. We evaluate the performance of home location estimation separately for two user groups: users who changed their home location at least once within the three years, and users who never changed their home location within those three years. We did not distinguish between home location changes that occurred once and those that occurred twice. We show only the F1 results because precision and recall show the same trend as F1.

Figure 3 shows the F1 results for these two user groups. The estimation results for users who changed their location vary considerably over time. In contrast, the estimation results for users who did not change their home location vary little. If the reason for a location change is house-moving, it seems that home location changes cause social graph changes.

In addition, the average F1 of users whose location is stable is 0.241, 0.243, and 0.233 in 2014, 2015, and 2016, respectively, whereas the average F1 of users whose location changed is 0.127, 0.127, and 0.126, respectively. This result indicates that the home locations of users who have changed home location are hard to estimate with this network-based home location estimation method.
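
The majority-vote estimator of Davis Jr. et al. [2], with the tie-breaking set S_u, and a leave-one-out evaluation over a user group can be sketched as below. This is an illustrative sketch, not the authors' implementation. The names `infer` and `evaluate` are hypothetical, and two points are assumptions because the text does not state them: precision is taken as correct estimates divided by users who received an estimate, and recall as correct estimates divided by all users in the group. The sketch also assumes that all labelled users serve as learning data even when only one group is evaluated.

```python
from collections import Counter


def infer(u, neighbors, labels, global_freq):
    """Estimate the home location of u by majority vote over labelled friends.

    neighbors: dict mapping a user to the set of adjacent users.
    labels: dict mapping a user to a home location (the learning data L).
    global_freq: Counter of locations over all labelled users.
    Returns None when u has no labelled friend (e.g. an isolated node).
    """
    votes = Counter(
        labels[v] for v in neighbors.get(u, ()) if v != u and v in labels
    )
    if not votes:
        return None
    top = max(votes.values())
    s_u = sorted(loc for loc, c in votes.items() if c == top)  # arg max*

    def freq_without_u(loc):
        # Leave-one-out: u's own label must not count as learning data.
        return global_freq[loc] - (1 if labels.get(u) == loc else 0)

    # Among tied locations, pick the most frequent one in the learning data.
    return max(s_u, key=freq_without_u)


def evaluate(users, neighbors, labels):
    """Return (precision, recall, F1) for the given group of users."""
    global_freq = Counter(labels.values())
    estimated = correct = 0
    for u in users:
        guess = infer(u, neighbors, labels, global_freq)
        if guess is None:
            continue
        estimated += 1
        correct += guess == labels[u]
    precision = correct / estimated if estimated else 0.0
    recall = correct / len(users) if users else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Toy data (hypothetical): edges a-b, a-c, b-c, c-d; e and f are isolated.
labels = {"a": "Toyohashi", "b": "Toyohashi", "c": "Nagoya",
          "d": "Nagoya", "e": "Toyohashi", "f": "Toyohashi"}
neighbors = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
             "d": {"c"}, "e": set(), "f": set()}

all_users = evaluate(list(labels), neighbors, labels)
one_group = evaluate(["c", "d"], neighbors, labels)
```

Under these assumed definitions, isolated nodes lower recall but not precision, which is one way precision can exceed recall when many users have no labelled friends.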
[Figure 2: Estimation performance combining home location of each year and social graphs of each month. The highest F1 value is achieved after about half a year from the end of the year when home locations were assigned. Panels: (a) Precision, (b) Recall, (c) F1. Horizontal axis: month of social graph; one series per home location year (2014, 2015, 2016).]

¹ The number of tweets that have a “coordinates” field decreased sharply on April 28, 2015.

Analysis of Users Who Have Changed Friends
Users whose social graph changes are considered to make new friends actively, so we suppose that their social graph changes converge quickly, specifically in less than about half a year. We evaluate the performance of home location estimation separately for two user groups: users who changed friends (adjacent nodes) within the three years and users who did not.

To check for a change of friends, we compare the number of friends (degree) of each user in the social graphs of July 2015 and July 2017. This metric considers only mutual following friends; a user who is only a followee or only a follower is not counted as a friend. The number of users whose number of friends changed is 44,228, and the number of users whose number of friends did not change is 32,502.
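
The grouping rule described above, which compares each user's degree in the first and last snapshots, can be sketched as follows. The function name `split_by_friend_change` and the adjacency-dict representation of a snapshot are assumptions for illustration.

```python
def split_by_friend_change(users, graph_start, graph_end):
    """Split users by whether their number of mutual-follow friends changed.

    graph_start, graph_end: dicts mapping a user to the set of adjacent
    users in the first and last snapshot (e.g. July 2015 and July 2017).
    Returns (changed, unchanged) as two sets of users.
    """
    def degree(graph, u):
        return len(graph.get(u, ()))

    changed = {
        u for u in users if degree(graph_start, u) != degree(graph_end, u)
    }
    return changed, set(users) - changed


# Toy snapshots (hypothetical): a and c become friends between snapshots.
start = {"a": {"b"}, "b": {"a"}, "c": set()}
end = {"a": {"b", "c"}, "b": {"a"}, "c": {"a"}}
changed, unchanged = split_by_friend_change(["a", "b", "c"], start, end)
```

Because only the counts are compared, a user who loses one friend and gains another between the two snapshots would fall into the unchanged group under this rule.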
[Figure 3: Estimation performance comparison with stable location and moved location. Panels: (a) the users having moved location (n = 36916), (b) the users having stable location (n = 39814). Vertical axis: F1; horizontal axis: month of social graph; one series per home location year (2014, 2015, 2016).]

[Figure 4: Comparison of social graph changes and estimation performance. Panels: (a) the results of users who have changed friends (n = 44228), (b) the results of users who have not changed friends (n = 32502). Vertical axis: F1; horizontal axis: month of social graph; one series per home location year (2014, 2015, 2016).]

The evaluation result is shown in Figure 4. The estimation results for users who changed friends vary considerably. In contrast, the estimation results for users who did not change friends vary little. We surmised that the F1 of users who make friends actively would stop changing in less than about half a year, but we cannot observe differences in convergence speed compared with Figure 2. The average F1 of users who changed friends is 0.214, 0.217, and 0.210 in 2014, 2015, and 2016, respectively, and the average F1 of users who did not change friends is 0.136, 0.135, and 0.132 in 2014, 2015, and 2016, respectively. The users who did not change friends have a lower F1.

DISCUSSION
Our research question was “Which social graph of certain periods shows the best performance for network-based home location estimation on Twitter?” We obtained the result that it is the social graph collected about half a year after the end of the year in which home locations were assigned. Our home location assignment method uses a year's worth of geo-tagged tweets. Thus, the peak we observe at about half a year corresponds to a wide range, from about half a year to a year. Revealing more detailed timing is future work.

In the experiment, we reveal that the social graph keeps changing for about half a year to a year after a home location changes. When a user's home location changes, the user's home location data changes once the majority of tweets during that year come from the new place. The estimation result changes once the user's social graph data changes so that the majority of friends are new friends. Our results show that social graph changes are slower than home location changes. It is considered that the social graph changes significantly when the home location is changed.

CONCLUSION
We tackled temporal analysis of online social graphs by answering the following question: which social graph of certain periods shows the best performance for network-based home location estimation on Twitter? We collected monthly snapshots of a social graph for two years and users' home locations for three years. Using the data, we conducted home location estimation. We found that F1 is highest about half a year after the end of the year in which home locations were assigned. In addition, we found that this effect appears only for users who changed their home location at least once in three years.

REFERENCES
1. Lars Backstrom, Eric Sun, and Cameron Marlow. 2010. Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity. In Proceedings of the 19th International Conference on World Wide Web. 61–70.
2. Clodoveu A. Davis Jr., Gisele L. Pappa, Diogo Rennó Rocha de Oliveira, and Filipe de L. Arcanjo. 2011. Inferring the Location of Twitter Messages Based on User Relationships. Transactions in GIS 15, 6 (2011), 735–751.
3. Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2010. Structure and Evolution of Online Social Networks. In Link Mining: Models, Algorithms, and Applications. 337–357.
4. Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, and Kevin Chen-Chuan Chang. 2012. Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations. In Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining. 1023–1031.
5. Jeffrey McGee, James Caverlee, and Zhiyuan Cheng. 2013. Location Prediction in Social Media Based on Tie Strength. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. 459–468.
6. Norases Vesdapunt and Hector Garcia-Molina. 2016. Updating an Existing Social Graph Snapshot via a Limited API. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 1693–1702.