=Paper= {{Paper |id=Vol-1558/paper8 |storemode=property |title=The Origin of Heterogeneity in Human Mobility Ranges |pdfUrl=https://ceur-ws.org/Vol-1558/paper8.pdf |volume=Vol-1558 |authors=Luca Pappalardo |dblpUrl=https://dblp.org/rec/conf/edbt/Pappalardo16 }} ==The Origin of Heterogeneity in Human Mobility Ranges== https://ceur-ws.org/Vol-1558/paper8.pdf
        The origin of heterogeneity in human mobility ranges

                                                                         Luca Pappalardo
                                                           Department of Computer Science
                                                                  University of Pisa
                                                      Largo Bruno Pontecorvo 3, 56127 Pisa, Italy
                                                                   lpappalardo@di.unipi.it


ABSTRACT                                                                                geneity in human mobility ranges? Answering this question
In the last decade, scientists from different disciplines discov-                       is of great importance in contexts like urban planning and
ered a great heterogeneity in human mobility ranges, since a                            the design of smart cities, since it can be helpful for crucial
power law characterizes the distribution of the characteristic                          problems such as movement prediction [3, 20] and activity
distance traveled by individuals, the so-called radius of gyra-                         recognition [11, 8, 15].
tion. The origin of such heterogeneity, however, still remains                             In this paper, we address this question by performing a
unclear. In this paper, we analyze two mobility datasets and                            data-driven study of human mobility. In our analysis we
observe that an individual’s locations tend to be grouped in                            exploit the access to two mobility datasets, each storing the
dense clusters representing geographical mobility cores. We                             trajectories of about 50,000 individuals. We observe that
show that the heterogeneity in human mobility ranges is                                 the locations visited by the individuals tend to cluster in
mainly due to trips between these mobility cores, while it                              dense groups, representing meaningful geographical units or
is greatly reduced when individuals are constrained to move                             mobility cores. We then compute for every individual her
within a single mobility core.                                                          inter-core characteristic traveled distance and her intra-core
                                                                                        characteristic traveled distance, which are defined by the
                                                                                        radius of gyration computed on the trips between mobility
CCS Concepts                                                                            cores and the trips within mobility cores respectively. From
•Applied computing → Physics; Mathematics and statis-                                   the comparison of the total radius of gyration of an indi-
tics;                                                                                   vidual with her intra- and inter-core radius of gyration we
                                                                                        observe two main results. First, a strong linear correlation
Keywords                                                                                emerges between the total radius of an individual and her
                                                                                        inter-core radius, suggesting that the mobility range of an
human mobility; mobility data mining; mobile phone data;                                individual is mainly determined by trips between mobility
GPS data; data science; Big Data                                                        cores. Second, the distribution of the characteristic intra-
                                                                                        core radius of gyration has a peak suggesting that individu-
1.     INTRODUCTION                                                                     als show typical mobility ranges when constrained to move
   In the last decade the availability of big mobility data,                            within mobility cores. Our results, which emerge on differ-
such as GPS tracks from vehicles and mobile phone data,                                 ent types of mobility data and at different geographical and
offered a series of novel insights on the quantitative patterns                         temporal scales, suggest that people perform two types of
characterizing human mobility. In particular, scientists from                           trips: intra-core trips and inter-core trips, the latter being
different disciplines discovered that human movements are                               the origin of the observed heterogeneity in mobility ranges.
not completely random but follow specific statistical laws.                                The paper is organized as follows. Section 2 summarizes
The mobility of an individual can be confined within a sta-                             some works relevant to our topic. Section 3 introduces the
ble circle defined by a center of mass and a radius of gyration                         two mobility datasets we analyze and Section 4 describes
[7, 12]. Interestingly, such circles are found to be highly het-                        the measures of individual human mobility we use during
erogeneous since a power law characterizes the distribution                             the analysis. Section 5 shows the results of our work and
of the radius of gyration of individuals [7, 14]. Although                              finally Section 6 concludes the paper.
these discoveries have doubtless shed light on interesting as-
pects about human mobility, the origin of the observed pat-                             2.   RELATED WORK
terns still remains unclear: what is the origin of the hetero-
                                                                                           The availability of Big Data on human mobility allowed
                                                                                        scientists from different disciplines to discover that tradi-
                                                                                        tional mobility models adapted from the observation of an-
                                                                                        imals [5, 6] and dollar bills [2] are not suitable to describe
                                                                                        people’s movements. Indeed, at a global scale humans are
                                                                                        characterized by a huge heterogeneity, since a power law
(c) 2016, Copyright is with the authors. Published in the Workshop Proceedings of the
EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-
                                                                                        emerges in the distribution of the radius of gyration, the
WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of     characteristic distance traveled by individuals [7, 12]. De-
the Creative Commons license CC-by-nc-nd 4.0                                            spite this heterogeneity, through the observation of past mo-
                                                                                        bility history the whereabouts of most individuals can be
predicted with an accuracy higher than 80% [4, 18]. More-            thresholds (5, 10, 15, 20, 30 and 40 minutes) without finding
over, according to their recurrent and total mobility patterns       significant differences in the sample of trips and in the statis-
individuals naturally split into two distinct mobility profiles,     tical analysis we present in this paper. We assign each origin
namely returners and explorers, which show communication             and destination point of the obtained sub-trajectories to the
preferences with individuals in the same mobility profile [14].      corresponding Italian census cell, using information provided
   The patterns of individual human mobility have been ob-           by the Italian National Institute of Statistics (ISTAT). We
served in both GSM data and GPS data [7, 12], and have               describe the movements of a vehicle by the time-ordered list
been used to build generative models of individual human             of census cells where the vehicle stopped [14].
mobility [10, 18, 14], generative models to describe human              GSM vs GPS. The GSM and the GPS datasets differ
migration flows [17, 21, 9], methods to discover geographic          in several aspects [13, 12]. The GPS data refers to trips
borders according to recurrent trips of private vehicles [16],       performed during one month (May 2011) in an area corre-
methods to predict the formation of social ties [3, 20], and         sponding to a single Italian region, while the mobile phone
classification models to predict the kind of activity associated     data cover an entire European country and a period of
with individuals’ trips solely on the basis of the observed          observation of three months. The GPS data represents a 2%
displacements [11, 8, 15]. Bagrow et al. exploit network science     sample of the population of vehicles in Italy [12], while the
techniques to split the mobility of individuals into mobility        mobile phone dataset covers users of a major European
units, or mobility habitats [1]. They find a relationship            operator, about 25% of the country’s adult population [7,
between the total radius of gyration of an individual and the        14]. The trajectories described by mobile phone data
trips between the main mobility habitats. In this paper we           include all possible means of transportation. In contrast, the
investigate the existence of mobility groups at different            GPS data refers to private vehicle displacements only. The
geographical levels. We use data mining clustering techniques        fact that one dataset contains aspects missing in the other
(instead of network techniques) to aggregate an individual’s         dataset makes the two types of data suitable for an
locations into clusters.                                             independent validation of human mobility patterns.
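
The trip extraction described in Section 3 can be illustrated with a short Python sketch. It splits a time-ordered sequence of GPS observations whenever the time interval between two consecutive observations is larger than the stop duration threshold (20 minutes). The function name `split_into_trips`, the tuple layout (timestamp, latitude, longitude) and the sample points are assumptions made for this example only; this is not the code used in the study.

```python
from datetime import datetime, timedelta


def split_into_trips(observations, stop_threshold=timedelta(minutes=20)):
    """Split time-ordered (timestamp, lat, lon) points into sub-trajectories.

    A gap larger than stop_threshold between two consecutive observations
    ends the current sub-trajectory and starts a new one.
    """
    trips = []
    current = []
    for point in observations:
        if current and point[0] - current[-1][0] > stop_threshold:
            trips.append(current)
            current = []
        current.append(point)
    if current:
        trips.append(current)
    return trips


# Hypothetical observations: three points 30 seconds apart, a long stop,
# then two more points.
start = datetime(2011, 5, 2, 8, 0, 0)
observations = [
    (start, 43.7167, 10.4000),
    (start + timedelta(seconds=30), 43.7170, 10.4010),
    (start + timedelta(seconds=60), 43.7175, 10.4020),
    (start + timedelta(hours=1), 43.7175, 10.4020),
    (start + timedelta(hours=1, seconds=30), 43.7180, 10.4030),
]
trips = split_into_trips(observations)
print([len(trip) for trip in trips])  # [3, 2]
```

Passing a different `stop_threshold` (for example 5, 10 or 40 minutes) reproduces the kind of sensitivity check mentioned in the text.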

3.     MOBILITY DATA                                                 4.   MOBILITY MEASURES
   GSM data. Our first data source consists of anonymized              The radius of gyration rg is a standard measure to describe
mobile phone data collected by a European mobile carrier for         the characteristic distance traveled by an individual, defined
billing and operational purposes. The mobile phones carried          as [7, 12]:
by individuals in their daily routine offer a good proxy to          $$ r_g = \sqrt{\frac{1}{N} \sum_{i \in L} n_i (r_i - r_{cm})^2}, \qquad (1) $$
study the structure and dynamics of human mobility: each
time an individual makes a call the tower that communicates          where $L$ is the set of locations visited by the individual,
with her phone is recorded by the carrier, effectively               $r_i$ is a two-dimensional vector describing the geographical
tracking her current location. The dataset consists of Call          coordinates of location $i$; $n_i$ is the visitation frequency of
Detail Records (CDR) describing the calls of 67,000                  location $i$; $N = \sum_{i \in L} n_i$ is the total number of visits of the
individuals during three months selected from 1 million users        individual, and $r_{cm}$ is the center of mass of the individual
provided that they visited more than two locations during the        defined as the mean weighted point of the visited locations
observation period and that their average call frequency was         [7, 12]. The distribution of the radius of gyration is well
$f \geq 0.5$ hour$^{-1}$. Each call is characterized by timestamp,   fitted by a power-law with exponential cutoff, as measured
caller and callee identifiers, duration of the call and the          on mobile phone data [7, 14] and GPS data [12, 14].
geographical coordinates of the tower serving the call. We
reconstruct a user’s movements based on the time-ordered
list of phone towers from which a user made her calls [7].              Given a partition of an individual’s locations in $m$ groups,
   GPS data. Our second data source is a GPS dataset                 or mobility cores, we define a dominant location $D_i$ as the
storing information about the trips of 46,000 private                most visited location in group $i$, i.e. the preferred location of
vehicles traveling in Tuscany during one month. The GPS traces       the individual when she visits locations in group $i$ (see
are provided by Octo Telematics¹, a company that provides            Figure 1). We define the inter-core radius $r_g^{inter}$ of an
a data collection service for insurance companies. The GPS           individual as the radius of gyration computed on her $m$
device embedded into a vehicle’s engine automatically turns          dominant locations ($m \geq 2$), and the intra-core radius
on when the vehicle starts, and the sequence of GPS points           $r_g^{intra}$ as the radius of gyration computed on the locations
that the device transmits every 30 seconds to the server via         of a given mobility core. Table 1 summarizes the mobility
a GPRS connection forms the global trajectory of a vehicle.          measures we use in our analysis and Figure 1 schematizes
We exploit the stops of the vehicles to split the global             some of the concepts introduced above.
trajectory into several sub-trajectories, corresponding to the
trips performed by the vehicle. We set a stop duration               measure                          symbol
threshold of at least 20 minutes to create the sub-trajectories,     radius of gyration               $r_g$
in order to avoid short stops like traffic lights: if the time       dominant location                $D_i$
interval between two consecutive observations of a vehicle is        intra-core radius of gyration    $r_g^{intra}$
larger than 20 minutes, the first observation is considered as       inter-core radius of gyration    $r_g^{inter}$
the end of a sub-trajectory and the second one is considered
as the start of another sub-trajectory. We also performed the        Table 1: The mobility measures used in our study
extraction of the sub-trajectories by using different stop duration  and the corresponding mathematical notation.
1
    http://www.octotelematics.com/
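
The mobility measures of Section 4 can be made concrete with a minimal Python sketch. It assumes that locations are already projected to planar coordinates in kilometres and that each mobility core is given as a dictionary mapping a location to its visitation frequency. It also assumes that, in the inter-core radius, each dominant location keeps its own visitation frequency as weight, a detail the text does not specify. All function names and sample values are hypothetical.

```python
import math


def radius_of_gyration(visits):
    """Equation (1): visits maps (x, y) in km to the visitation frequency n_i."""
    total = sum(visits.values())  # N, the total number of visits
    cx = sum(n * x for (x, _), n in visits.items()) / total
    cy = sum(n * y for (_, y), n in visits.items()) / total
    squared = sum(n * ((x - cx) ** 2 + (y - cy) ** 2)
                  for (x, y), n in visits.items())
    return math.sqrt(squared / total)


def dominant_location(core):
    """Most visited location of a mobility core."""
    return max(core, key=core.get)


def inter_core_radius(cores):
    """Radius of gyration computed on the dominant locations (needs m >= 2)."""
    dominant = {}
    for core in cores:
        location = dominant_location(core)
        dominant[location] = core[location]  # assumed weight: its own frequency
    return radius_of_gyration(dominant)


def intra_core_radius(core):
    """Radius of gyration computed on the locations of one mobility core."""
    return radius_of_gyration(core)


# Two hypothetical mobility cores, 10 km apart.
core_a = {(0.0, 0.0): 3, (0.0, 1.0): 1}
core_b = {(10.0, 0.0): 3, (10.0, 1.0): 1}

total_rg = radius_of_gyration({**core_a, **core_b})
inter_rg = inter_core_radius([core_a, core_b])
intra_rg = intra_core_radius(core_a)
print(total_rg, inter_rg, intra_rg)
```

In this toy example the inter-core radius is 5 km and almost equals the total radius, while the intra-core radius stays below 1 km, which mirrors the qualitative picture discussed in the paper.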
                                                                   we do not obtain a power law anymore (Figure 4): a peak
                                                                   emerges from the distribution of $r_g^{intra}$ for low eps suggesting
                                                                   that, when restricted to move within mobility cores, individ-
                                                                   uals show typical radii of gyration. In summary, our analysis
                                                                   suggests that: (i) individuals tend to split their mobility in
                                                                   dense groups of locations (mobility cores); (ii) the distance
                                                                   between the dominant locations in mobility cores generates
                                                                   the observed heterogeneity in human mobility ranges; (iii)
                                                                   the heterogeneity is indeed greatly reduced when individuals
           Dominant location                                       are constrained to move within mobility cores.
           Mobility core                                             Interestingly, we observe that similar results emerge from
           Noise location                                          both the mobile phone dataset, which captures
                                                                   displacements by any transportation means in an entire European
                                                                   country during three months, and the GPS dataset, which
Figure 1: The image illustrates the locations visited              only captures movements by private vehicles that occurred in
by an individual. Blue circles are visited locations,              Tuscany during one month.
groups of circles within blue dashed shapes are
mobility cores, red circles are dominant locations.                [Figure 2, panels (a) and (b): plots titled "clusters per
Green circles are noise locations that are not part                user"; x axis: # clusters; y axis: # users; eps = 5km (a),
of any mobility core. The radius of gyration is                    eps = 10km (b).]
computed on all the circles, the inter-core radius
on red circles, the intra-core radius on the circles
within the same dashed shape.

5.   RESULTS
   For every individual in the two datasets, we partition her      [Figure 2, panels (c) and (d): plots titled "clusters per
locations in mobility cores by using the DBSCAN clustering         user"; x axis: # clusters; y axis: # users; eps = 50km (c),
algorithm [19], which extracts dense groups of points              eps = 100km (d).]
according to two input parameters: eps, the maximum search
radius; and minPts, the minimum number of points                   Figure 2: Distribution of the number of clusters
(locations) to form a cluster. Every location has two features,    per individual on the GSM dataset for eps =
the latitude and the longitude of the location’s position in       5, 10, 50, 100km (the GPS dataset produces similar
space. The DBSCAN algorithm uses the latitude and                  results). The plots highlight a clear tendency of
longitude of locations to group them in clusters according to      locations to cluster in dense groups. We observe
the input parameters minPts and eps. We set minPts = 2             that: (i) the majority of individuals have few
and eps = 5, 10, 50, 100km in our experiments and eliminate        mobility cores (2 or 3), (ii) as eps increases the mode
the noise clusters produced by the algorithm, i.e. locations       of the distribution approaches one.
that do not belong to any dense cluster of locations
according to the input parameters (see Figure 1).
   We compute the distribution of the number of obtained
(non-noise) clusters per individual, at different values of the
eps parameter (see Figure 2). We observe a peaked distribution
where the majority of individuals have few mobility cores,
e.g. two mobility cores when eps = 5km and one mobility
core when eps = 100km, and individuals having more
than ten mobility cores are extremely rare (Figure 2). The
fact that the algorithm produces non-noise clusters indicates      [Figure 3, panels (a) and (b): plots titled "rg vs inter-rg",
that the locations of an individual are not randomly               # mobility cores = 2; x axis: rg [km]; y axis: inter-rg [km];
distributed but tend to aggregate in dense groups of               eps = 5km (a), eps = 10km (b).]
locations, representing geographical units of individual mobility.
Our distribution of cores per person is in contrast with           Figure 3: Radius of gyration (on x axis) versus
previous works which build mobility groups using network           inter-core radius (y axis) of individuals having two
science techniques [1], where most users possess 5-20 mobility     mobility cores, for eps = 5km (a) and eps = 10km (b).
groups and only ≈7% of users have a single mobility group.         Plots refer to the GSM dataset (the GPS dataset
   We also compare an individual’s radius of gyration $r_g$ with   produces similar results).
her inter-core radius $r_g^{inter}$, observing a strong linear
correlation (see Figure 3). Since the inter-core radius is computed
on the dominant locations of the individual’s mobility cores,
this result suggests that the radius of gyration is mainly
determined by the tendency of an individual to partition her
mobility in different geographical units. If we compute the
distribution of individuals’ intra-core radius $r_g^{intra}$, indeed,
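
The partition of an individual's locations into mobility cores described in Section 5 can be sketched in Python as follows. This is a plain, quadratic-time DBSCAN written for illustration with minPts = 2; it is not the implementation used in the study. The great-circle (haversine) distance between latitude and longitude pairs is an assumption, since the text does not state the distance function, and the sample coordinates are hypothetical.

```python
import math

NOISE = -1


def haversine_km(p, q):
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(a))


def dbscan(points, eps_km, min_pts=2):
    """Return one label per point: 0, 1, ... for clusters, -1 for noise."""
    labels = [None] * len(points)  # None means not visited yet

    def neighbors(i):
        # All points within eps_km of point i, including i itself.
        return [j for j in range(len(points))
                if haversine_km(points[i], points[j]) <= eps_km]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # noise reached from a core point: border
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point, keep expanding
    return labels


# Hypothetical locations: two near Pisa, two near Florence, one in Rome.
locations = [(43.7167, 10.4000), (43.7200, 10.4100),
             (43.7696, 11.2558), (43.7800, 11.2500),
             (41.9028, 12.4964)]
print(dbscan(locations, eps_km=5))
print(dbscan(locations, eps_km=100))
```

With eps = 5km the sketch yields two mobility cores and one noise location, while with eps = 100km the two cores merge into one, which is the qualitative effect of eps reported in Figure 2.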
[Figure 4, panels (a) and (b): plots titled "PDF of intra-rg";                                                                         [5] G. M. V. et al. Lévy flight search patterns of
x axis: intra-rg [km]; y axis: p(intra-rg); panel labels as                                                                                wandering albatrosses. Nature, 381:413–415, 1996.
extracted: eps = 5km (a), eps = 10km (b).]                                                                                             [6] G. R.-F. et al. Lévy walk patterns in the foraging
                                                                                                                                           movements of spider monkeys. Behavioral Ecology and
Figure 4: Distribution of intra-core radius $r_g^{intra}$                                                                                  Sociobiology, 55(25), 2003.
across individuals in the GSM dataset (the GPS                                                                                         [7] M. C. González, C. A. Hidalgo, and A.-L. Barabási.
dataset produces similar results), for eps = 5km (a)                                                                                       Understanding individual human mobility patterns.
and eps = 50km (b). We observe that, for eps = 5km,                                                                                        Nature, 453(7196):779–782, June 2008.
the distribution is not a power law anymore but a                                                                                      [8] S. Jiang, J. F. Jr, and M. González. Clustering daily
peak emerges denoting a characteristic radius of                                                                                           patterns of human activities in the city. Data Mining
gyration (a). For eps = 50km the distribution starts                                                                                       and Knowledge Discovery, 25:478–510, 2012.
approaching a power law.                                                                                                               [9] W. S. Jung, F. Wang, and H. E. Stanley. Gravity
                                                                                                                                           model in the korean highway. EPL (Europhysics
                                                                                                                                           Letters), 81:48005, 2008.
                                                                                                                                      [10] D. Karamshuk, C. Boldrini, M. Conti, and
                                                                                                                                           A. Passarella. Human mobility models for
                                                                                                                                           opportunistic networks. IEEE Communications
                                                                                                                                           Magazine, 49(12):157–165, 2011.
                                                                                                                                      [11] L. Liao, D. J. Patterson, D. Fox, and H. Kautz.
                                                                                                                                           Learning and inferring transportation routines. Artif.
6.   CONCLUSIONS                                                                                                                           Intell., 171(5-6):311–331, Apr. 2007.
                                                                                                                                      [12] L. Pappalardo, S. Rinzivillo, Z. Qu, D. Pedreschi, and
   In this paper we showed that the locations visited by indi-
                                                                                                                                           F. Giannotti. Understanding the patterns of car
viduals tend to cluster in a small number of mobility cores.
                                                                                                                                           travel. The European Physical Journal Special Topics,
The radius of gyration computed on the dominant locations
                                                                                                                                           215(1):61–73, 2013.
of each mobility cores highly correlates with the standard
radius of gyration, meaning that the characteristic distance                                                                          [13] L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedreschi,
traveled by individuals is mainly determined by their dom-                                                                                 and F. Giannotti. Comparing general mobility and
inant locations. Moreover, individuals show homogenous                                                                                     mobility by car. In Proceedings of the 1st BRICS
radii of gyration when constrained to travel within mobility                                                                               Countries Congress (BRICS-CCI) and 11th Brazilian
cores. Our results showed that individual human mobility                                                                                   Congress (CBIC) on Computational Intelligence, 2013.
is composed by two types of trips: intra-core trips, which                                                                            [14] L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedreschi,
represent movement within a given geographical unit, and                                                                                   F. Giannotti, and A.-L. Barabasi. Returners and
inter-core trips, which define trips between locations belong-                                                                             explorers dichotomy in human mobility. Nature
ing to different mobility cores and generate the heterogene-                                                                               Communications, 6, 09 2015.
ity observed in human mobility ranges. As future work, we                                                                             [15] S. Rinzivillo, L. Gabrielli, M. Nanni, L. Pappalardo,
plan to investigate deeply the structure of intra- and inter-                                                                              D. Pedreschi, and F. Giannotti. The purpose of
trips and quantify the contribution of every single intra- or                                                                              motion: Learning activities from individual mobility
inter-trip in shaping the characteristic traveled distance of                                                                              networks. In Proceedings of International Conference
an individual.                                                                                                                             on Data Science and Advanced Analytics, DSAA’14,
                                                                                                                                           2014.
7.                        ACKNOWLEDGMENTS                                                                                             [16] S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia,
                                                                                                                                           D. Pedreschi, and F. Giannotti. Discovering the
  This work has been partially funded by the EU under                                                                                      geographical borders of human mobility. Künstliche
the FP7-ICT Program by project Petra n. 609042, under                                                                                      Intelligenz, 26(3):253–260, 2012.
H2020 Program by projects SoBigData grant n. 654024 and
                                                                                                                                      [17] F. Simini, M. C. González, A. Maritan, and A.-L.
Cimplex grant n. 641191.
                                                                                                                                           Barabási. A universal model for mobility and
                                                                                                                                           migration patterns. Nature, 484(7392):96–100, 2012.
8.                        REFERENCES                                                                                                  [18] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási.
 [1] J. Bagrow and Y.-R. Lin. Mesoscopic structure and                                                                                     Limits of predictability in human mobility. Science,
     social aspects of human mobility. PLoS ONE, 7(5),                                                                                     327:1018–1021, 2010.
     2012.                                                                                                                            [19] P.-N. Tan, M. Steinbach, and V. Kumar. Introduction
 [2] D. Brockmann, L. Hufnagel, and T. Geisel. The                                                                                         to Data Mining. Addison Wesley, 2006.
     scaling laws of human travel. Nature,                                                                                            [20] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and
     439(7075):462–465, 2006.                                                                                                              A.-L. Barabási. Human mobility, social ties, and link
 [3] E. Cho, S. A. Myers, and J. Leskovec. Friendship and                                                                                  prediction. In Proceedings of the 17th ACM SIGKDD
     mobility: user movement in location-based social                                                                                      International Conference on Knowledge Discovery and
     networks. In Proceedings of the 17th ACM SIGKDD                                                                                       Data Mining, KDD ’11, pages 1100–1108, New York,
     International Conference on Knowledge Discovery and                                                                                   NY, USA, 2011. ACM.
     Data Mining, KDD’11, pages 1082–1090. ACM, 2011.                                                                                 [21] G. K. Zipf. The p1p2/d hypothesis: On the intercity
 [4] N. Eagle and A. Pentland. Eigenbehaviors: identifying                                                                                 movement of persons. American Sociological Review,
     structure in routine. Behavioral Ecology and                                                                                          11(6):677–686, 1946.
     Sociobiology, 63:1057–1066, 2009.