=Paper=
{{Paper
|id=Vol-1558/paper8
|storemode=property
|title=The Origin of Heterogeneity in Human Mobility Ranges
|pdfUrl=https://ceur-ws.org/Vol-1558/paper8.pdf
|volume=Vol-1558
|authors=Luca Pappalardo
|dblpUrl=https://dblp.org/rec/conf/edbt/Pappalardo16
}}
==The Origin of Heterogeneity in Human Mobility Ranges==
Luca Pappalardo, Department of Computer Science, University of Pisa, Largo Bruno Pontecorvo 3, 56127 Pisa, Italy, lpappalardo@di.unipi.it

(c) 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

===ABSTRACT===
In the last decade, scientists from different disciplines discovered a great heterogeneity in human mobility ranges, since a power law characterizes the distribution of the characteristic distance traveled by individuals, the so-called radius of gyration. The origin of such heterogeneity, however, still remains unclear. In this paper, we analyze two mobility datasets and observe that an individual's locations tend to be grouped in dense clusters representing geographical mobility cores. We show that the heterogeneity in human mobility ranges is mainly due to trips between these mobility cores, while it is greatly reduced when individuals are constrained to move within a single mobility core.

===CCS Concepts===
•Applied computing → Physics; Mathematics and statistics;

===Keywords===
human mobility; mobility data mining; mobile phone data; GPS data; data science; Big Data

===1. INTRODUCTION===
In the last decade the availability of big mobility data, such as GPS tracks from vehicles and mobile phone data, offered a series of novel insights on the quantitative patterns characterizing human mobility. In particular, scientists from different disciplines discovered that human movements are not completely random but follow specific statistical laws. The mobility of an individual can be confined within a stable circle defined by a center of mass and a radius of gyration [7, 12]. Interestingly, such circles are found to be highly heterogeneous since a power law characterizes the distribution of the radius of gyration of individuals [7, 14]. Although these discoveries have doubtless shed light on interesting aspects of human mobility, the origin of the observed patterns still remains unclear: what is the origin of the heterogeneity in human mobility ranges? Answering this question is of great importance in contexts like urban planning and the design of smart cities, since it can be helpful for crucial problems such as movement prediction [3, 20] and activity recognition [11, 8, 15].

In this paper, we address this question by performing a data-driven study of human mobility. In our analysis we exploit the access to two mobility datasets, each storing the trajectories of about 50,000 individuals. We observe that the locations visited by the individuals tend to cluster in dense groups, representing meaningful geographical units or mobility cores. We then compute for every individual her inter-core characteristic traveled distance and her intra-core characteristic traveled distance, which are defined by the radius of gyration computed on the trips between mobility cores and the trips within mobility cores respectively. From the comparison of the total radius of gyration of an individual with her intra- and inter-core radius of gyration we observe two main results. First, a strong linear correlation emerges between the total radius of an individual and her inter-core radius, suggesting that the mobility range of an individual is mainly determined by trips between mobility cores. Second, the distribution of the characteristic intra-core radius of gyration has a peak, suggesting that individuals show typical mobility ranges when constrained to move within mobility cores. Our results, which emerge on different types of mobility data and at different geographical and temporal scales, suggest that people perform two types of trips: intra-core trips and inter-core trips, the latter being the origin of the observed heterogeneity in mobility ranges.

The paper is organized as follows. Section 2 summarizes some works relevant to our topic. Section 3 introduces the two mobility datasets we analyze and Section 4 describes the measures of individual human mobility we use during the analysis. Section 5 shows the results of our work and finally Section 6 concludes the paper.

===2. RELATED WORK===
The availability of Big Data on human mobility allowed scientists from different disciplines to discover that traditional mobility models adapted from the observation of animals [5, 6] and dollar bills [2] are not suitable to describe people's movements. Indeed, at a global scale humans are characterized by a huge heterogeneity, since a power law emerges in the distribution of the radius of gyration, the characteristic distance traveled by individuals [7, 12]. Despite this heterogeneity, through the observation of past mobility history the whereabouts of most individuals can be predicted with an accuracy higher than 80% [4, 18]. Moreover, according to their recurrent and total mobility patterns individuals naturally split into two distinct mobility profiles, namely returners and explorers, which show communication preferences with individuals in the same mobility profile [14].

The patterns of individual human mobility have been observed in both GSM data and GPS data [7, 12], and have been used to build generative models of individual human mobility [10, 18, 14], generative models to describe human migration flows [17, 21, 9], methods to discover geographic borders according to recurrent trips of private vehicles [16], methods to predict the formation of social ties [3, 20], and classification models to predict the kind of activity associated with individuals' trips solely on the basis of the observed displacements [11, 8, 15]. Bagrow et al. exploit network science techniques to split the mobility of individuals into mobility units, or mobility habitats [1]. They find a relationship between the total radius of gyration of an individual and the trips between the main mobility habitats. In this paper we investigate the existence of mobility groups at different geographical levels. We use data mining clustering techniques (instead of network techniques) to aggregate an individual's locations into clusters.

===3. MOBILITY DATA===
'''GSM data.''' Our first data source consists of anonymized mobile phone data collected by a European mobile carrier for billing and operational purposes. The mobile phones carried by individuals in their daily routine offer a good proxy to study the structure and dynamics of human mobility: each time an individual makes a call the tower that communicates with her phone is recorded by the carrier, effectively tracking her current location. The dataset consists of Call Detail Records (CDR) describing the calls of 67,000 individuals during three months, selected from 1 million users provided that they visited more than two locations during the observation period and that their average call frequency was <math>f \geq 0.5\ \mathrm{hour}^{-1}</math>. Each call is characterized by timestamp, caller and callee identifiers, duration of the call and the geographical coordinates of the tower serving the call. We reconstruct a user's movements based on the time-ordered list of phone towers from which a user made her calls [7].

'''GPS data.''' Our second data source is a GPS dataset storing information about the trips of 46,000 private vehicles traveling in Tuscany during one month. The GPS traces are provided by Octo Telematics (http://www.octotelematics.com/), a company that provides a data collection service for insurance companies. The GPS device embedded into a vehicle's engine automatically turns on when the vehicle starts, and the sequence of GPS points that the device transmits every 30 seconds to the server via a GPRS connection forms the global trajectory of a vehicle. We exploit the stops of the vehicles to split the global trajectory into several sub-trajectories, corresponding to the trips performed by the vehicle. We set a stop duration threshold of at least 20 minutes to create the sub-trajectories, in order to avoid short stops like traffic lights: if the time interval between two consecutive observations of a vehicle is larger than 20 minutes, the first observation is considered as the end of a sub-trajectory and the second one is considered as the start of another sub-trajectory. We also performed the extraction of the sub-trajectories by using different stop duration thresholds (5, 10, 15, 20, 30 and 40 minutes) without finding significant differences in the sample of trips and in the statistical analysis we present in this paper. We assign each origin and destination point of the obtained sub-trajectories to the corresponding Italian census cell, using information provided by the Italian National Institute of Statistics (ISTAT). We describe the movements of a vehicle by the time-ordered list of census cells where the vehicle stopped [14].

'''GSM vs GPS.''' The GSM and the GPS datasets differ in several aspects [13, 12]. The GPS data refers to trips performed during one month (May 2011) in an area corresponding to a single Italian region, while the mobile phone data cover an entire European country and a period of observation of three months. The GPS data represents a 2% sample of the population of vehicles in Italy [12], while the mobile phone dataset covers users of a major European operator, about 25% of the country's adult population [7, 14]. The trajectories described by mobile phone data include all possible means of transportation. In contrast, the GPS data refers to private vehicle displacements only. The fact that one dataset contains aspects missing in the other dataset makes the two types of data suitable for an independent validation of human mobility patterns.

===4. MOBILITY MEASURES===
The radius of gyration <math>r_g</math> is a standard measure to describe the characteristic distance traveled by an individual, defined as [7, 12]:

<math>r_g = \sqrt{\frac{1}{N} \sum_{i \in L} n_i (\mathbf{r}_i - \mathbf{r}_{cm})^2}, \qquad (1)</math>

where <math>L</math> is the set of locations visited by the individual; <math>\mathbf{r}_i</math> is a two-dimensional vector describing the geographical coordinates of location <math>i</math>; <math>n_i</math> is the visitation frequency of location <math>i</math>; <math>N = \sum_{i \in L} n_i</math> is the total number of visits of the individual; and <math>\mathbf{r}_{cm}</math> is the center of mass of the individual, defined as the mean weighted point of the visited locations [7, 12]. The distribution of the radius of gyration is well fitted by a power law with exponential cutoff, as measured on mobile phone data [7, 14] and GPS data [12, 14].

Given a partition of an individual's locations in <math>m</math> groups, or mobility cores, we define a dominant location <math>D_i</math> as the most visited location in group <math>i</math>, i.e. the preferred location of the individual when she visits locations in group <math>i</math> (see Figure 1). We define the inter-core radius <math>r_g^{inter}</math> of an individual as the radius of gyration computed on her <math>m</math> dominant locations (<math>m \geq 2</math>), and the intra-core radius <math>r_g^{intra}</math> as the radius of gyration computed on the locations of a given mobility core. Table 1 summarizes the mobility measures we use in our analysis and Figure 1 schematizes some of the concepts introduced above.

{| class="wikitable"
|+ Table 1: The mobility measures used in our study and the corresponding mathematical notation.
! measure !! symbol
|-
| radius of gyration || <math>r_g</math>
|-
| dominant location || <math>D_i</math>
|-
| intra-core radius of gyration || <math>r_g^{intra}</math>
|-
| inter-core radius of gyration || <math>r_g^{inter}</math>
|}

Figure 1 (legend: dominant location, mobility core, noise location): The image illustrates the locations visited by an individual. Blue circles are visited locations, groups of circles within blue dashed shapes are mobility cores, red circles are dominant locations. Green circles are noise locations that are not part of any mobility core. The radius of gyration is computed on all the circles, the inter-core radius on red circles, the intra-core radius on the circles within the same dashed shape.

===5. RESULTS===
For every individual in the two datasets, we partition her locations in mobility cores by using the DBSCAN clustering algorithm [19], which extracts dense groups of points according to two input parameters: eps, the maximum search radius; and minPts, the minimum number of points (locations) to form a cluster. Every location has two features, the latitude and the longitude of its position in space, and DBSCAN groups the locations into clusters on the basis of these two features. We set minPts = 2 and eps = 5, 10, 50, 100 km in our experiments and eliminate the noise clusters produced by the algorithm, i.e. locations that do not belong to any dense cluster of locations according to the input parameters (see Figure 1).

We compute the distribution of the number of obtained (non-noise) clusters per individual, for different values of the eps parameter (see Figure 2). We observe a peaked distribution where the majority of individuals have few mobility cores, e.g. two mobility cores when eps = 5 km and one mobility core when eps = 100 km, and individuals having more than ten mobility cores are extremely rare (Figure 2). The fact that the algorithm produces non-noise clusters indicates that the locations of an individual are not randomly distributed but tend to aggregate in dense groups of locations, representing geographical units of individual mobility. Our distribution of cores per person is in contrast with previous works which build mobility groups using network science techniques [1], where most users possess 5-20 mobility groups and only ≈7% of users have a single mobility group.

Figure 2 (four panels "clusters per user" for eps = 5 km (a), 10 km (b), 50 km (c) and 100 km (d); x axis: # clusters, y axis: # users): Distribution of the number of clusters per individual on the GSM dataset for eps = 5, 10, 50, 100 km (the GPS dataset produces similar results). The plots highlight a clear tendency of locations to cluster in dense groups. We observe that: (i) the majority of individuals have few mobility cores (2 or 3), (ii) as eps increases the mode of the distribution approaches one.

We also compare an individual's radius of gyration <math>r_g</math> with her inter-core radius <math>r_g^{inter}</math>, observing a strong linear correlation (see Figure 3). Since the inter-core radius is computed on the dominant locations of the individual's mobility cores, this result suggests that the radius of gyration is mainly determined by the tendency of an individual to partition her mobility in different geographical units. Indeed, if we compute the distribution of individuals' intra-core radius <math>r_g^{intra}</math>, we do not obtain a power law anymore (Figure 4): a peak emerges from the distribution of <math>r_g^{intra}</math> for low eps, suggesting that, when restricted to move within mobility cores, individuals show typical radii of gyration. In summary, our analysis suggests that: (i) individuals tend to split their mobility in dense groups of locations (mobility cores); (ii) the distance between the dominant locations in mobility cores generates the observed heterogeneity in human mobility ranges; (iii) the heterogeneity is indeed greatly reduced when individuals are constrained to move within mobility cores.

Figure 3 (two panels "rg vs inter-rg, # mobility cores = 2"; x axis: rg [km], y axis: inter-rg [km]): Radius of gyration (on x axis) versus inter-core radius (y axis) of individuals having two mobility cores, for eps = 5 km (a) and eps = 10 km (b). Plots refer to the GSM dataset (the GPS dataset produces similar results).

Figure 4 (two panels "PDF of intra-rg"; x axis: intra-rg [km], y axis: p(intra-rg)): Distribution of intra-core radius <math>r_g^{intra}</math> across individuals in the GSM dataset (the GPS dataset produces similar results), for eps = 5 km (a) and eps = 50 km (b). We observe that, for eps = 5 km, the distribution is not a power law anymore but a peak emerges denoting a characteristic radius of gyration (a). For eps = 50 km the distribution starts approaching a power law.

Interestingly, we observe that similar results emerge from both the mobile phone dataset, which captures displacements by any transportation means in an entire European country during three months, and the GPS dataset, which only captures movements by private vehicles that occurred in Tuscany during one month.

===6. CONCLUSIONS===
In this paper we showed that the locations visited by individuals tend to cluster in a small number of mobility cores. The radius of gyration computed on the dominant locations of the mobility cores highly correlates with the standard radius of gyration, meaning that the characteristic distance traveled by individuals is mainly determined by their dominant locations. Moreover, individuals show homogeneous radii of gyration when constrained to travel within mobility cores. Our results showed that individual human mobility is composed of two types of trips: intra-core trips, which represent movement within a given geographical unit, and inter-core trips, which define trips between locations belonging to different mobility cores and generate the heterogeneity observed in human mobility ranges. As future work, we plan to investigate more deeply the structure of intra- and inter-trips and quantify the contribution of every single intra- or inter-trip in shaping the characteristic traveled distance of an individual.

===7. ACKNOWLEDGMENTS===
This work has been partially funded by the EU under the FP7-ICT Program by project Petra n. 609042, under H2020 Program by projects SoBigData grant n. 654024 and Cimplex grant n. 641191.

===8. REFERENCES===
[1] J. Bagrow and Y.-R. Lin. Mesoscopic structure and social aspects of human mobility. PLoS ONE, 7(5), 2012.

[2] D. Brockmann, L. Hufnagel, and T. Geisel. The scaling laws of human travel. Nature, 439(7075):462–465, 2006.

[3] E. Cho, S. A. Myers, and J. Leskovec. Friendship and mobility: user movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 1082–1090. ACM, 2011.

[4] N. Eagle and A. Pentland. Eigenbehaviors: identifying structure in routine. Behavioral Ecology and Sociobiology, 63:1057–1066, 2009.

[5] G. M. V. et al. Lévy flight search patterns of wandering albatrosses. Nature, 381:413–415, 1996.

[6] G. R.-F. et al. Lévy walk patterns in the foraging movements of spider monkeys. Behavioral Ecology and Sociobiology, 55(25), 2003.

[7] M. C. González, C. A. Hidalgo, and A.-L. Barabási. Understanding individual human mobility patterns. Nature, 453(7196):779–782, June 2008.

[8] S. Jiang, J. F. Jr, and M. González. Clustering daily patterns of human activities in the city. Data Mining and Knowledge Discovery, 25:478–510, 2012.

[9] W. S. Jung, F. Wang, and H. E. Stanley. Gravity model in the Korean highway. EPL (Europhysics Letters), 81:48005, 2008.

[10] D. Karamshuk, C. Boldrini, M. Conti, and A. Passarella. Human mobility models for opportunistic networks. IEEE Communications Magazine, 49(12):157–165, 2011.

[11] L. Liao, D. J. Patterson, D. Fox, and H. Kautz. Learning and inferring transportation routines. Artif. Intell., 171(5-6):311–331, Apr. 2007.

[12] L. Pappalardo, S. Rinzivillo, Z. Qu, D. Pedreschi, and F. Giannotti. Understanding the patterns of car travel. The European Physical Journal Special Topics, 215(1):61–73, 2013.

[13] L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedreschi, and F. Giannotti. Comparing general mobility and mobility by car. In Proceedings of the 1st BRICS Countries Congress (BRICS-CCI) and 11th Brazilian Congress (CBIC) on Computational Intelligence, 2013.

[14] L. Pappalardo, F. Simini, S. Rinzivillo, D. Pedreschi, F. Giannotti, and A.-L. Barabási. Returners and explorers dichotomy in human mobility. Nature Communications, 6, Sept. 2015.

[15] S. Rinzivillo, L. Gabrielli, M. Nanni, L. Pappalardo, D. Pedreschi, and F. Giannotti. The purpose of motion: Learning activities from individual mobility networks. In Proceedings of International Conference on Data Science and Advanced Analytics, DSAA '14, 2014.

[16] S. Rinzivillo, S. Mainardi, F. Pezzoni, M. Coscia, D. Pedreschi, and F. Giannotti. Discovering the geographical borders of human mobility. Künstliche Intelligenz, 26(3):253–260, 2012.

[17] F. Simini, M. C. González, A. Maritan, and A.-L. Barabási. A universal model for mobility and migration patterns. Nature, 484(7392):96–100, 2012.

[18] C. Song, Z. Qu, N. Blumm, and A.-L. Barabási. Limits of predictability in human mobility. Science, 327:1018–1021, 2010.

[19] P.-N. Tan, M. Steinbach, and V. Kumar. Introduction to Data Mining. Addison Wesley, 2006.

[20] D. Wang, D. Pedreschi, C. Song, F. Giannotti, and A.-L. Barabási. Human mobility, social ties, and link prediction. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '11, pages 1100–1108, New York, NY, USA, 2011. ACM.

[21] G. K. Zipf. The p1p2/d hypothesis: On the intercity movement of persons. American Sociological Review, 11(6):677–686, 1946.
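
The user selection rule stated for the GSM data in Section 3 (more than two visited locations and an average call frequency of at least 0.5 calls per hour) can be illustrated with a short sketch. This is not the procedure used in the paper; the function name `is_selected_user`, the `(timestamp, tower_id)` record layout, and the idea of passing the observation length in hours are all assumptions made for illustration.

```python
def is_selected_user(calls, observation_hours,
                     min_locations=3, min_calls_per_hour=0.5):
    """Return True if a user passes the two selection criteria.

    calls: list of (timestamp, tower_id) tuples (hypothetical layout).
    observation_hours: length of the observation period in hours.
    "More than two locations" is read here as at least three distinct towers.
    """
    if observation_hours <= 0:
        raise ValueError("observation_hours must be positive")
    distinct_towers = {tower for _, tower in calls}
    call_frequency = len(calls) / observation_hours  # calls per hour
    return (len(distinct_towers) >= min_locations
            and call_frequency >= min_calls_per_hour)
```

Treating each distinct tower as one location follows the description of the CDR data in Section 3; a real pipeline might merge nearby towers first.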
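
The stop-based segmentation of GPS trajectories described in Section 3 (a gap larger than 20 minutes between two consecutive observations ends one sub-trajectory and starts the next) can be sketched as follows. This is a minimal illustration under assumptions: observations are `(timestamp_seconds, point)` tuples already sorted by time, and the name `split_into_trips` is hypothetical.

```python
def split_into_trips(observations, stop_threshold_s=20 * 60):
    """Split a time-ordered list of (timestamp_s, point) into sub-trajectories.

    A new sub-trajectory starts whenever the time gap between two
    consecutive observations is larger than stop_threshold_s.
    """
    trips = []
    current = []
    for obs in observations:
        if current and obs[0] - current[-1][0] > stop_threshold_s:
            trips.append(current)   # previous observation ends a trip
            current = []            # this observation starts the next one
        current.append(obs)
    if current:
        trips.append(current)
    return trips
```

Changing `stop_threshold_s` reproduces the sensitivity check mentioned in the text, where thresholds between 5 and 40 minutes were compared.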
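
Equation (1) in Section 4 can be made concrete with a small sketch. It assumes, for simplicity, that locations are given as planar coordinates in kilometres rather than latitude and longitude; the function name and the dictionary layout are illustrative choices, not part of the paper.

```python
import math


def radius_of_gyration(visits):
    """Radius of gyration as in Equation (1).

    visits: dict mapping a location (x_km, y_km) to its visit count n_i.
    Planar kilometre coordinates are an assumption of this sketch.
    """
    total = sum(visits.values())  # N, the total number of visits
    if total <= 0:
        raise ValueError("at least one visit is required")
    # Center of mass: visit-weighted mean of the locations.
    cx = sum(n * x for (x, _), n in visits.items()) / total
    cy = sum(n * y for (_, y), n in visits.items()) / total
    squared = sum(n * ((x - cx) ** 2 + (y - cy) ** 2)
                  for (x, y), n in visits.items())
    return math.sqrt(squared / total)
```

For two locations 2 km apart that are visited once each, the center of mass lies halfway between them and the function returns 1.0 km.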
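
The clustering step of Section 5 relies on DBSCAN with parameters eps and minPts. The sketch below is a compact, unoptimized version of the textbook algorithm, written only to show how the two parameters act. It assumes planar coordinates in kilometres, Euclidean distance, and that minPts counts the point itself; none of these details is stated in the paper, and the function name `dbscan` is illustrative.

```python
import math

NOISE = -1


def dbscan(points, eps, min_pts):
    """Return one label per point: a cluster id (0, 1, ...) or NOISE."""
    def neighbors(i):
        # All points within eps of point i, including i itself.
        return [j for j, q in enumerate(points)
                if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE          # may later become a border point
            continue
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster    # border point joins the cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            reachable = neighbors(j)
            if len(reachable) >= min_pts:
                queue.extend(reachable)  # j is a core point: keep expanding
        cluster += 1
    return labels
```

With minPts = 2, as in the experiments, any two locations within eps of each other already form a mobility core, and only isolated locations are labeled as noise.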
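
The dominant location and the inter- and intra-core radii defined in Section 4 can be sketched once every location carries a core label. The paper does not state how the dominant locations are weighted when computing the inter-core radius; this sketch assumes each dominant location keeps its own visit count as weight. Planar kilometre coordinates, the label value -1 for noise, and all function names are further assumptions.

```python
import math
from collections import defaultdict


def radius_of_gyration(visits):
    """Equation (1) on a dict {(x_km, y_km): visit_count}."""
    total = sum(visits.values())
    cx = sum(n * x for (x, _), n in visits.items()) / total
    cy = sum(n * y for (_, y), n in visits.items()) / total
    squared = sum(n * ((x - cx) ** 2 + (y - cy) ** 2)
                  for (x, y), n in visits.items())
    return math.sqrt(squared / total)


def core_measures(visits, labels):
    """Return (dominant, intra, inter) for one individual.

    visits: dict location -> visit count.
    labels: dict location -> core id, with -1 (or a missing key) for noise.
    """
    cores = defaultdict(dict)
    for loc, n in visits.items():
        core = labels.get(loc, -1)
        if core != -1:              # noise locations are discarded
            cores[core][loc] = n
    # Dominant location D_i: the most visited location of core i.
    dominant = {c: max(m, key=m.get) for c, m in cores.items()}
    # Intra-core radius: radius of gyration within each core.
    intra = {c: radius_of_gyration(m) for c, m in cores.items()}
    # Inter-core radius: defined only for m >= 2 dominant locations.
    inter = None
    if len(dominant) >= 2:
        inter = radius_of_gyration({loc: visits[loc]
                                    for loc in dominant.values()})
    return dominant, intra, inter
```

In an example with two compact cores about 100 km apart, the intra-core radii stay below one kilometre while the inter-core radius is about 50 km, which mirrors the contrast between the two kinds of trips discussed in Section 5.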