Temporal Analysis of Online Social Graph by Home Location

Shiori Hironaka, Mitsuo Yoshida, Kyoji Umemura
Toyohashi University of Technology
Aichi, Japan
s143369@edu.tut.ac.jp, yoshida@cs.tut.ac.jp, umemura@tut.jp

ABSTRACT
An online social graph, which represents relationships between users, is used for many purposes such as home location estimation. However, the online social graph changes over time because users' environments change (e.g., moving house). We tackle temporal analysis of online social graphs by answering the following question: which social graph of certain periods shows the best performance for network-based home location estimation on Twitter? We find that the estimation performance is best when using the social graph collected about half a year after the end of the period used to assign home locations. This result indicates that changes in social graphs due to users' environmental changes converge after about half a year.

ACM Classification Keywords
J.4. Computer Applications: Social and Behavioral Sciences

Author Keywords
Twitter; social graph; home location estimation

INTRODUCTION
People build relationships and interact with one another. An online social graph captures a realistic picture of the social graph formed by such relationships [3]. Therefore, online social graphs are used for many purposes, especially to estimate user attributes such as home locations [1, 4]. Home location estimation methods that use online social graphs are called network-based home location estimation methods.

The online social graph changes over time because users' environments change (e.g., moving house). Online social graph data therefore have to be updated [6], and the performance of network-based home location estimation may depend on when the social graph data are collected. McGee et al. [5] showed that the geographic distance between users varies with the relationships between them that constitute an online social graph. Is a newer online social graph better for home location estimation?

In this paper, we tackle temporal analysis of online social graphs by answering the following question: which social graph of certain periods shows the best performance for network-based home location estimation on Twitter? We find that the estimation performance is best when using the social graph collected about half a year after the end of the period used to assign home locations. This result indicates that changes in social graphs due to users' environmental changes converge after about half a year.

NETWORK-BASED HOME LOCATION ESTIMATION
A network-based home location estimation method estimates home locations using a social graph in which each node is a user and each edge is a relationship between users. The underlying assumption is that a user is located geographically close to the user's friends on the social graph. We use network-based home location estimation to determine how well the social graph reflects home locations (i.e., whether the social graph and the home location data represent the state at the same time).

In this paper, we use the method of Davis Jr. et al. [2], a popular network-based home location estimation method. This method selects the most frequent location among the locations of the user's friends as the estimated location. The method is represented as follows:

$$S_u = \mathop{\mathrm{arg\,max}^{*}}\limits_{l \in \{l_n \mid n \in N_u \cap L\}} \left|\{v \mid v \in N_u \cap L,\ l = l_v\}\right|$$

$$\mathrm{Infer}(u) = \mathop{\mathrm{arg\,max}}\limits_{l \in S_u} \left|\{n \mid n \in L,\ l = l_n\}\right|$$

where $L$ is the set of learning data (nodes), $N_u$ is the set of nodes adjacent to node $u$, $l_u$ is the correct label (home location) of node $u$, and $\arg\max^{*}$ returns the set of all arguments that attain the maximum rather than a single one. The paper [2] does not specify how to handle the case where several locations share the maximum number of friends. In this paper, we therefore build the set $S_u$ of locations $l$ that attain the maximum and, among them, select the location that is most frequent in the learning data set.
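The estimation rule can be sketched in Python as follows. This is a minimal illustration under our own assumptions, not the authors' implementation: the function name `infer_home`, the adjacency-set representation of $N_u$, and the dictionary `labels` holding $l_n$ for every $n \in L$ are hypothetical choices. Ties that remain after the second step are broken arbitrarily.

```python
from collections import Counter
from typing import Dict, Hashable, Optional, Set

Node = Hashable


def infer_home(u: Node,
               neighbors: Dict[Node, Set[Node]],
               labels: Dict[Node, str],
               global_counts: Optional[Counter] = None) -> Optional[str]:
    """Estimate the home location of node u (hypothetical helper).

    neighbors: adjacency sets of the undirected social graph (N_u).
    labels: known home locations l_n for the learning set L.
    global_counts: optional precomputed Counter of labels.values().
    Returns None when no friend of u has a known location.
    """
    # |{v | v in N_u ∩ L, l = l_v}| for every candidate location l.
    friend_counts = Counter(labels[v] for v in neighbors.get(u, set()) if v in labels)
    if not friend_counts:
        return None
    # S_u: every location that attains the maximum (arg max*).
    top = max(friend_counts.values())
    s_u = [loc for loc, c in friend_counts.items() if c == top]
    # Infer(u): among S_u, the location most frequent in the learning set L.
    if global_counts is None:
        global_counts = Counter(labels.values())
    return max(s_u, key=lambda loc: global_counts[loc])
```

Passing a precomputed `global_counts` avoids recounting $L$ for every user when many users are estimated in one run.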

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes.
WII’18, March 11, 2018, Tokyo, Japan

DATA
We need home location data and social graph data for home location estimation. In this section, we describe how we created the data.

We define a user's home location as the location from which the user most frequently posted geo-tagged tweets. Specifically, following Davis Jr. et al. [2], a home location is an administrative area such as a city. We aggregate the locations of the geo-tagged tweets by area and select the most frequent city as the user's home location.
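As a small illustration of this aggregation, the following Python sketch maps each user to the city from which the user posted most often. The function name and the `(user_id, city)` input format are assumptions; the paper does not say how ties between equally frequent cities are broken, so here `Counter.most_common` keeps the city encountered first.

```python
from collections import Counter, defaultdict
from typing import DefaultDict, Dict, Iterable, Tuple


def home_locations(records: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each user to the most frequent city among the user's geo-tagged tweets.

    records: hypothetical (user_id, city) pairs, one per geo-tagged tweet.
    """
    per_user: DefaultDict[str, Counter] = defaultdict(Counter)
    for user, city in records:
        per_user[user][city] += 1
    # most_common(1) returns [(city, count)]; equal counts keep first-seen order.
    return {user: counts.most_common(1)[0][0] for user, counts in per_user.items()}
```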


We collected geo-tagged tweets posted in Japan from January 2014 to December 2016 using the Twitter Streaming API. We used only tweets that have a “place” field whose “place_type” is “poi” or “city”. We regard the city in Japan that contains the centroid of the bounding box of the “place” as the location of the geo-tagged tweet. For each year, we assigned a home location to users who posted geo-tagged tweets at least five times in a certain area in Japan during that year. As a result, we assigned a home location to 634,789 users in 2014, 851,675 users in 2015, and 828,929 users in 2016.
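The filtering and assignment steps can be sketched as below, assuming tweets arrive as dictionaries in the Twitter API v1.1 JSON layout, where `place.bounding_box.coordinates` holds one ring of `[longitude, latitude]` pairs. `city_of` is a hypothetical lookup (for example, a point-in-polygon test against Japanese administrative boundaries) that we do not implement, and we read "at least five times ... in a certain area" as requiring the most frequent city to hold at least five tweets in that year.

```python
from collections import Counter
from typing import Callable, Iterable, Optional, Tuple

ALLOWED_PLACE_TYPES = {"poi", "city"}


def tweet_centroid(tweet: dict) -> Optional[Tuple[float, float]]:
    """(lon, lat) centroid of the tweet's place bounding box, or None if skipped."""
    place = tweet.get("place")
    if not place or place.get("place_type") not in ALLOWED_PLACE_TYPES:
        return None
    ring = place["bounding_box"]["coordinates"][0]  # [[lon, lat], ...]
    lons = [p[0] for p in ring]
    lats = [p[1] for p in ring]
    # The centroid of an axis-aligned box is the midpoint of its extents.
    return (min(lons) + max(lons)) / 2, (min(lats) + max(lats)) / 2


def yearly_home(tweets: Iterable[dict],
                city_of: Callable[[float, float], Optional[str]],
                min_count: int = 5) -> Optional[str]:
    """Home city for one user and one year, or None if the threshold is not met.

    city_of is a hypothetical point-to-city lookup returning None outside Japan.
    """
    counts: Counter = Counter()
    for tweet in tweets:
        point = tweet_centroid(tweet)
        if point is None:
            continue
        city = city_of(*point)
        if city is not None:
            counts[city] += 1
    if not counts:
        return None
    city, n = counts.most_common(1)[0]
    return city if n >= min_count else None
```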

In this paper, we use a social graph based on mutual following relationships on Twitter. Our social graph is a simple undirected graph. We collected following relationships every month from July 2015 to July 2017 among users who were assigned a home location in 2014. We excluded users whose following relationships could not be collected in one or more months due to account deletion or a switch to a private account. In total, we created 25 monthly social graphs. Finally, we use 76,730 users for the analysis: users who were assigned a home location in each of the three years and whose following relationships could be collected in every monthly snapshot.
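A minimal sketch of how one monthly snapshot could be turned into the simple undirected mutual-following graph, and how users missing from any snapshot could be dropped. The input formats (directed `(follower, followee)` pairs and a per-month set of successfully collected users) and the function names are assumptions for illustration.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

User = Hashable


def mutual_follow_graph(follows: Iterable[Tuple[User, User]],
                        users: Set[User]) -> Dict[User, Set[User]]:
    """Simple undirected graph: edge {a, b} iff a follows b and b follows a.

    follows: hypothetical directed (follower, followee) pairs for one month.
    users: the analyzed users; each gets an entry, so isolated users are kept.
    Self-follows and users outside `users` are ignored.
    """
    directed = {(a, b) for a, b in follows if a != b and a in users and b in users}
    graph: Dict[User, Set[User]] = {u: set() for u in users}
    for a, b in directed:
        if (b, a) in directed:  # mutual following only
            graph[a].add(b)
            graph[b].add(a)  # adjacency sets make duplicate insertions harmless
    return graph


def users_collected_every_month(collected: Dict[str, Set[User]]) -> Set[User]:
    """Users whose relationships were collected in every monthly snapshot."""
    months = list(collected.values())
    return set.intersection(*months) if months else set()
```

In this setting, the analyzed users would correspond to the intersection of `users_collected_every_month(...)` with the users who have a home location in all three years.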

ANALYSIS
Temporal Analysis of Home Location and Social Graph
In this section, we report how social graphs and home locations change over time.

First, we report the changes in users' home locations from 2014 to 2016. Among the 76,730 target users, the home locations of 39,814 users did not change over the three years, those of 22,477 users changed once, and those of 14,439 users changed twice. In total, the home locations of 48% of users changed at least once.
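The grouping above could be computed from per-year home locations with helpers like the following. We assume (this is not stated in the paper) that "changed once" and "changed twice" count year-to-year differences over 2014, 2015, and 2016, so a user who moves away and back (A, B, A) counts as two changes. Function names are hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def count_changes(yearly_homes: Sequence[str]) -> int:
    """Number of year-to-year home location changes, e.g. over 2014, 2015, 2016."""
    return sum(1 for prev, cur in zip(yearly_homes, yearly_homes[1:]) if prev != cur)


def change_summary(homes: Dict[Hashable, Sequence[str]]) -> Dict[int, int]:
    """Map number of changes -> number of users."""
    return dict(Counter(count_changes(seq) for seq in homes.values()))
```

As a check on the reported percentage: (22,477 + 14,439) / 76,730 = 36,916 / 76,730 ≈ 0.481, i.e., about 48%.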

Second, we report the changes in the social graph from July 2015 to July 2017. The changes in the social graph properties are shown in Figure 1. Figure 1(a) shows that the number of edges among the 76,730 users increases over time. Figure 1(b) shows that the average degree, which is used as an estimation clue, increases from 9.7 to 10.7 over the two years. Figure 1(c) shows that the number of isolated nodes, which have no edges, decreases over time. These results show that more relationships (edges) become available for estimating home locations in later months and years.

Figure 1: Size of social graphs: the number of edges increases and the number of isolated nodes decreases over time. Panels: (a) number of edges, (b) average number of degrees, (c) number of isolated nodes; x-axis: month of social graph (July 2015 to July 2017).
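The three properties in Figure 1 could be computed from one monthly snapshot as sketched below, assuming adjacency sets that list every analyzed user (so isolated users appear with an empty set) and counting each undirected edge once. `graph_stats` is a hypothetical name.

```python
from typing import Dict, Hashable, Set, Tuple


def graph_stats(graph: Dict[Hashable, Set[Hashable]]) -> Tuple[int, float, int]:
    """Return (number of edges, mean degree, number of isolated nodes).

    graph lists every node, including isolated ones, as adjacency sets of a
    simple undirected graph. Each edge appears in two adjacency sets, so
    |E| = sum(degrees) / 2 and the mean degree equals 2|E| / |V|.
    """
    degrees = [len(nbrs) for nbrs in graph.values()]
    n_edges = sum(degrees) // 2
    mean_degree = sum(degrees) / len(degrees) if degrees else 0.0
    isolated = sum(1 for d in degrees if d == 0)
    return n_edges, mean_degree, isolated
```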

Comparison by Social Graph Collection Month
The network-based home location estimation method uses a combination of home locations and a social graph. In this paper, the home locations are derived from geo-tagged tweets of a certain period, and the social graph is a monthly snapshot. We investigate the estimation performance when combining old home location data with old social graph data, old home location data with new social graph data, and new home location data with new social graph data. We did not use old home location data to estimate new home locations in this analysis; that is, the home location data for learning and testing come from the same period. Assuming that network-based home location estimation makes a good guess from the social graph and friends' locations, we look for the period of the social graph that is most adequate for determining home locations.

We conduct home location estimation by combining each of the three years of home locations with each of the 25 monthly social graph snapshots covering two years. Since we are interested only in whether a user's home location is estimated correctly, performance is measured by precision, recall, and F1 with leave-one-out cross-validation.
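Leave-one-out evaluation of the $S_u$ / $\mathrm{Infer}(u)$ rule could look like the sketch below: each labeled user is hidden in turn and estimated from the remaining learning set. The precision and recall definitions are our assumption (common in home location estimation, but not spelled out in the paper): precision is the fraction of correct estimates among users who received an estimate, and recall is the fraction of correct estimates among all labeled users. The function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable, Tuple

Node = Hashable


def loo_evaluate(neighbors: Dict[Node, Iterable[Node]],
                 labels: Dict[Node, str]) -> Tuple[float, float, float]:
    """Leave-one-out precision, recall and F1 of the S_u / Infer(u) rule.

    Each labeled user u is hidden in turn and estimated from L minus {u}.
    Assumed metric definitions: precision = correct / users with an estimate,
    recall = correct / all labeled users, F1 = their harmonic mean.
    """
    totals = Counter(labels.values())  # label frequencies over the full L
    estimated = correct = 0
    for u, truth in labels.items():
        friend_counts = Counter(labels[v] for v in neighbors.get(u, ())
                                if v in labels and v != u)
        if not friend_counts:
            continue  # no labeled friend, so no estimate for u
        top = max(friend_counts.values())
        s_u = [loc for loc, c in friend_counts.items() if c == top]
        # Frequency in L minus {u}: discount u's own hidden label.
        guess = max(s_u, key=lambda loc: totals[loc] - (1 if loc == truth else 0))
        estimated += 1
        correct += int(guess == truth)
    precision = correct / estimated if estimated else 0.0
    recall = correct / len(labels) if labels else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Discounting the hidden label inside the tie-break keeps the held-out user from influencing the "most frequent in the learning set" step without copying the counter for every user.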

The estimation results are shown in Figure 2. With the home locations of 2014, the social graph of August 2015 achieved the highest F1. With the home locations of 2015, F1 increases until June 2016 and decreases after that. With the home locations of 2016, F1 increases until June 2017. These results show that the highest values of precision, recall, and F1 are achieved about half a year after the end of the year in which home locations were assigned. This result indicates that changes in social graphs due to users' environmental changes converge after about half a year.
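The "about half a year" reading can be made explicit with a small helper that counts months from the end of the home location year to a given snapshot month, plus one that picks the snapshot month with the highest F1. Both names are hypothetical, and the F1 values used to exercise `peak_month` would come from an evaluation such as leave-one-out.

```python
from typing import Dict, Tuple

YearMonth = Tuple[int, int]


def months_after_year_end(year: int, month: int, label_year: int) -> int:
    """Months from the end of December of label_year to (year, month)."""
    return (year - label_year - 1) * 12 + month


def peak_month(f1_by_month: Dict[YearMonth, float]) -> YearMonth:
    """Snapshot month with the highest F1 (the first one wins on ties)."""
    return max(f1_by_month, key=lambda ym: f1_by_month[ym])
```

For the peaks reported for 2014 (August 2015) and 2015 (June 2016), `months_after_year_end` gives 8 and 6 months, respectively.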

The results also show poor performance when using the home locations of 2016. We conjecture that this is due to Twitter's change of the default location-acquisition function in April 2015: the default changed from accurate GPS coordinates to a place chosen by the user¹. As a result, the quality of the “place” field used to assign home locations declined.

Analysis of Users Who Have Changed Home Location
We assume that the performance changes shown in Figure 2 are caused by home location changes. We evaluate the performance of home location estimation separately for two user groups: users who changed their home location at least once within the three years and users who never changed their home location within those three years. We did not distinguish between users whose home location changed once and those whose home location changed twice. We show only the F1 results because precision and recall show the same trend as F1.

Figure 3 shows the F1 results for these two user groups. The estimation results of users who changed their home location vary widely over time. In contrast, the estimation results of users who did not change their home location vary little. If the home location changes are due to moving house, home location changes appear to cause social graph changes. In addition, the average F1 of users whose home location changed is 0.127, 0.127, and 0.126 in 2014, 2015, and 2016, respectively, in contrast to 0.241, 0.243, and 0.233 for users whose home location is stable. This result indicates that users who have changed their home location are hard to locate with this network-based home location estimation method.
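One possible way to obtain per-group scores, assuming the estimation is run once over all users and the metrics are then restricted to each group, is sketched below. `truth` holds the home locations of the evaluated year and `guesses` the estimates (None when no estimate was possible); the function names and this evaluation protocol are assumptions, and the precision and recall definitions are the same assumed ones as for the leave-one-out evaluation.

```python
from typing import Dict, Hashable, Iterable, Optional, Sequence, Set, Tuple

User = Hashable


def split_by_move(yearly_homes: Dict[User, Sequence[str]]) -> Tuple[Set[User], Set[User]]:
    """Return (moved, stable): moved users have more than one distinct yearly home."""
    moved = {u for u, seq in yearly_homes.items() if len(set(seq)) > 1}
    return moved, set(yearly_homes) - moved


def group_prf(truth: Dict[User, str],
              guesses: Dict[User, Optional[str]],
              group: Iterable[User]) -> Tuple[float, float, float]:
    """Precision, recall and F1 restricted to one user group.

    guesses[u] is None (or missing) when no estimate could be made for u.
    Precision = correct / estimated users, recall = correct / all group users.
    """
    members = list(group)
    estimated = [u for u in members if guesses.get(u) is not None]
    correct = sum(1 for u in estimated if guesses[u] == truth[u])
    p = correct / len(estimated) if estimated else 0.0
    r = correct / len(members) if members else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1
```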

Analysis of Users Who Have Changed Friends
We consider that users whose social graph changes actively make new friends, and we suppose that their social graph changes converge quickly, specifically within about half a year. We evaluate the performance of home location estimation separately for two user groups: users who changed friends (adjacent nodes) between July 2015 and July 2017 and users who did not change friends.

Figure 2: Estimation performance combining home location of each year and social graphs of each month. The highest F1 value is achieved after about half a year from the end of the year when home locations were assigned. Panels: (a) Precision, (b) Recall, (c) F1; x-axis: month of social graph (July 2015 to July 2017); one line per home location year (2014, 2015, 2016).

To check the change of friends, we compare the number of friends (degree) of each user in the social graphs of July 2015 and July 2017. This metric considers only mutual-following friends; a user who is only a followee or only a follower is not counted as a friend. The number of users whose number of friends changed is 44,228, and the number of users whose number of friends did not change is 32,502.

¹ The number of tweets that have a “coordinates” field decreased sharply on April 28, 2015.
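The degree comparison can be sketched as follows, assuming each snapshot is given as adjacency sets of the mutual-following graph; `split_by_degree_change` is a hypothetical name.

```python
from typing import Dict, Hashable, Iterable, Set, Tuple

User = Hashable


def split_by_degree_change(g_start: Dict[User, Set[User]],
                           g_end: Dict[User, Set[User]],
                           users: Iterable[User]) -> Tuple[Set[User], Set[User]]:
    """Return (changed, unchanged) by comparing each user's degree in two snapshots.

    Only the number of mutual-following friends is compared, so a user who
    replaces one friend with another keeps the same degree and is counted
    as unchanged.
    """
    pool = set(users)
    changed = {u for u in pool if len(g_start.get(u, ())) != len(g_end.get(u, ()))}
    return changed, pool - changed
```

With the reported counts, 44,228 + 32,502 = 76,730, so every analyzed user falls into exactly one of the two groups.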

Figure 3: Estimation performance comparison with stable location and moved location. Panels: (a) the users having moved location (n = 36916), (b) the users having stable location (n = 39814); y-axis: F1; x-axis: month of social graph (July 2015 to July 2017); one line per home location year (2014, 2015, 2016).

Figure 4: Comparison of social graph changes and estimation performance. Panels: (a) the results of users who have changed friends (n = 44228), (b) the results of users who have not changed friends (n = 32502); y-axis: F1; x-axis: month of social graph (July 2015 to July 2017); one line per home location year (2014, 2015, 2016).


The evaluation results are shown in Figure 4. The estimation results of users who changed friends vary widely over time. In contrast, the estimation results of users who did not change friends vary little. We surmised that the F1 of users who actively make friends stops changing within about half a year, but we cannot observe a difference in convergence speed compared with Figure 2. The average F1 of users who changed friends is 0.214, 0.217, and 0.210 in 2014, 2015, and 2016, respectively, and the average F1 of users who did not change friends is 0.136, 0.135, and 0.132 in 2014, 2015, and 2016, respectively. Users who have not changed friends thus have a lower F1.

DISCUSSION
Our research question was: “Which social graph of certain periods shows the best performance for network-based home location estimation on Twitter?” We found that it is the social graph from about half a year after the end of the year when home locations were assigned. Our home location assignment method uses a year's worth of geo-tagged tweets. Thus, a peak about half a year after the end of that year implies that the actual lag after a move varies widely, from about half a year to a year. Revealing the timing in more detail is future work.

Our experiment reveals that the social graph continues to change for about half a year to a year after a home location changes. When a user moves, the user's home location data change only once the majority of that year's geo-tagged tweets are posted from the new place. Similarly, the estimation result for a user changes only once the majority of the user's friends are new friends. Our results show that social graph changes are slower than home location changes. We consider that the social graph changes significantly when the home location changes.

CONCLUSION
We tackled temporal analysis of online social graphs by answering the following question: which social graph of certain periods shows the best performance for network-based home location estimation on Twitter? We collected monthly snapshots of a social graph for two years and users' home locations for three years. Using these data, we conducted home location estimation. We found that F1 is highest about half a year after the end of the year when home locations were assigned. In addition, we found that this effect appears only for users who changed their home location at least once in the three years.

REFERENCES
1. Lars Backstrom, Eric Sun, and Cameron Marlow. 2010. Find Me If You Can: Improving Geographical Prediction with Social and Spatial Proximity. In Proceedings of the 19th International Conference on World Wide Web. 61–70.
2. Clodoveu A. Davis Jr., Gisele L. Pappa, Diogo Rennó Rocha de Oliveira, and Filipe de L. Arcanjo. 2011. Inferring the Location of Twitter Messages Based on User Relationships. Transactions in GIS 15, 6 (2011), 735–751.
3. Ravi Kumar, Jasmine Novak, and Andrew Tomkins. 2010. Structure and Evolution of Online Social Networks. In Link Mining: Models, Algorithms, and Applications. 337–357.
4. Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, and Kevin Chen-Chuan Chang. 2012. Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations. In Proceedings of the 18th ACM International Conference on Knowledge Discovery and Data Mining. 1023–1031.
5. Jeffrey McGee, James Caverlee, and Zhiyuan Cheng. 2013. Location Prediction in Social Media Based on Tie Strength. In Proceedings of the 22nd ACM International Conference on Information and Knowledge Management. 459–468.
6. Norases Vesdapunt and Hector Garcia-Molina. 2016. Updating an Existing Social Graph Snapshot via a Limited API. In Proceedings of the 25th ACM International on Conference on Information and Knowledge Management. 1693–1702.