Toward Finding Latent Cities with Non-Negative Matrix Factorization

Eduardo Graells-Garrido
Data Science Institute, Universidad del Desarrollo
Telefónica R&D
Santiago, Chile
egraells@udd.cl

Diego Caro
Data Science Institute, Universidad del Desarrollo
Telefónica R&D
Santiago, Chile
dcaro@udd.cl

Denis Parra
Dept. of Computer Science, Faculty of Engineering
Pontificia Universidad Católica
Santiago, Chile
dparra@ing.puc.cl

ABSTRACT
In the last decade, digital footprints have been used to cluster population activity into functional areas of cities. However, a key aspect has been overlooked: we experience our cities not only by performing activities at specific destinations, but also by moving from one place to another. In this paper, we propose to analyze and cluster the city based on how people move through it. Particularly, we introduce Mobilicities, automatically generated travel patterns inferred from mobile phone network data using NMF, a matrix factorization model. We evaluate our method in a large city and we find that mobilicities reveal latent but at the same time interpretable mobility structures of the city. Our results provide evidence on how clustering and visualization of aggregated phone logs could be used in planning systems to interactively analyze city structure and population activity.

Author Keywords
Mobile Phone Networks; Urban Informatics; Urban Mobility; Non-Negative Matrix Factorization.

©2018. Copyright for the individual papers remains with the authors. Copying permitted for private and academic purposes.
UISTDA ’18, March 11, 2018, Tokyo, Japan

INTRODUCTION
The increasing availability of digital footprints, such as Web/App access logs, user-generated content, and mobile phone network data, has made it possible to characterize the city at spatio-temporal granularities never seen before. This means that the different functional areas of the city can be estimated, based not only on how planners thought that the city would be lived, but on how people actually used the different spaces available to them [12]. However, this is not enough to understand the city. As Charles Montgomery says in his book, Happy City: “When we talk about cities, we usually end up talking about how various places look and perhaps how it feels to be there in those places. But to stop there misses half the story, because the way we experience most parts of cities is at velocity: we glide past on the way to somewhere else. City life is as much about moving through landscapes as it is about being in them” [32].

Since people may spend a considerable amount of time moving through the city, and the quality of that time has a strong influence on mood, health, and productivity [39], it is important to understand city structure with respect to mobility. Given that cities grow faster than traditional methods can study them, it is important to have cost-effective ways to analyze the city at scale [46].

In this paper, inspired by Montgomery’s ideas, we estimated a characterization of the city defined by the collective experience of its several areas. Particularly, we analyzed intra-city transportation inferred from mobile phone network records, which we represented in a Waypoints Matrix. This matrix, similar to document-term matrices used in Information Retrieval, was decomposed using Non-Negative Matrix Factorization (NMF) [11]. We interpreted and labeled the obtained components, which we denoted Mobilicities. We evaluated our pipeline by performing a case study in Santiago, Chile, using mobile phone network data from the biggest telecommunications company in the country. Our pipeline delivered interpretable results, in contrast with a well-established method. We concluded that mobilicities can be used within an intelligent user interface aimed at mobility and transportation-analysis tasks.

BACKGROUND AND RELATED WORK
There has been a flurry of research on mobile phone network data known as eXtended and Call Detail Records (X/CDR), as evidenced in recent surveys on the area [4, 8]. Some examples include: understanding socio-economic factors of the population [41], understanding family and social relations [13], characterizing response to emergencies and critical events [33], crime detection [5], credit scoring [40], testing urban theories [14], and the provision of a cost-effective way of understanding population dynamics and behavior in developing countries [21].

The mobile phone network events of a given device depict a spatio-temporal trajectory that can be processed to infer trips, by using geometric approaches based on transportation rules [19], or by clustering events in the trajectory [7]. When individual trips are known, it is possible to aggregate them into Origin-Destination (OD) matrices. This analysis is common in the X/CDR literature [2, 23, 16, 7], and shows that inferring individual mobility is a relevant problem.

Other important aspects are the characterization of land use (e.g., residential areas, business areas, etc.) and functional areas (i.e., delimited areas that serve specific or multiple land uses). Since it is crucial to understand the dynamics of these aspects, functional areas have been measured, monitored and categorized using digital footprints [45, 43] and X/CDR [34, 26, 42, 1, 18]. A work similar to ours has applied NMF to understand trip purpose, and to build functional areas based on the spatial distribution of such purposes [36].

The key difference between the aforementioned work and our proposal is the focus. Other work focuses on the destinations of trips, as well as activities performed within places. As such, their definition of functional area is limited to those places that, in transportation terms, attract people [20]; yet, as mentioned by urbanists, the city is experienced in sequence by moving from one place to another [32]. Each citizen has a unique version of the city, built upon the sequence of nodes, landmarks, and paths traversed [28]. In this paper we show that, by using mobility inferred from X/CDR, and using NMF to decompose/cluster the different cities experienced by mobile phone users, we are able to identify the different Mobilicities that comprise a big urban city.

Even though we have centered the discussion around mobile phone network data, it is possible to infer transportation and urban patterns from other sources, particularly social media. Twitter has been shown to be a good predictor of commuter flows [31] at several scales [27]. However, these approaches share the limitation of previous ones: a focus on the origins and destinations of trips, mainly due to their way of modelling mobility with gravity and radiation models (see [29] for a comparison). Twitter data, while massive and longitudinal, does not make it possible to infer within-trip behavior.

METHODS
Our methods can be summarized in a pipeline of three steps:

1. Trip detection from X/CDR data, which, for each device in the dataset, identifies its corresponding daily trips, with origin, intermediate, and destination towers.
2. The construction of a Waypoints Matrix W that aggregates the intermediate towers of trips, of a given period of time, into a device-antenna matrix.
3. The decomposition of W using NMF into the product of two matrices, U and T, according to a number of k components, which we denote as mobilicities.

Trip Detection. To detect trips from X/CDR traces, we resort to an algorithm based on transportation rules and trajectory simplification [19]. The algorithm builds a space-time trajectory from daily X/CDR events, where space is the cumulative distance between consecutive connected towers, starting from zero at the first connection of the day. This trajectory is simplified using a line-simplification algorithm. Then, each segment from the simplified trajectory is categorized according to transportation rules, such as the relationship between the approximated trip distance and time, which can be inspected visually as the slope of the segment. In other words, the trip detection allows us to separate X/CDR events into the following: stationary events (the user was performing an activity), trip start events (denoting the origin), trip end events (denoting the destination), and within-trip events (denoting mobility).

Building the Waypoints Matrix. Ideally, the characterization of within-trip events would not require aggregation. For instance, GPS data makes it possible to do rich clustering over specific trajectories [47]. However, because X/CDR data is generated for billing purposes, trips may have few within-trip events because of the billing cycle. Since we focus on within-trip events, we need to aggregate such events over an extended period of time. Furthermore, some trips do not have within-trip events, such as those with a duration close to the billing cycle time, and those within zones with low tower density. Hence, by aggregating all within-trip events for a user in a period of time, the likelihood of identifying the towers that characterize a specific user’s mobility increases. We use this schema to define a Waypoints Matrix W, whose entries are:

$$ w_{i,j} = \frac{\#\ \text{of within-trip events of user } u_i \text{ at tower } t_j}{\#\ \text{of within-trip events of user } u_i} $$

This schema is equivalent to the L1-normalized document-term matrices found in Information Retrieval, but without weighting with Tf-Idf [44]. We do not apply Tf-Idf because its purpose is to identify discriminative features; conversely, we want to extract collective features. Additionally, note that this matrix is different from those used in related work with NMF decompositions [36]: there is a semantic difference between within-trip events and trip start/end events. To avoid this polysemic behavior, we focus only on within-trip events.

Applying Non-Negative Matrix Factorization. To represent how users interact with towers, we propose to decompose this matrix into the product of two matrices, U × T, where U is a |u| × k matrix that encodes k latent user features for |u| users, and T is a k × |t| matrix that encodes k latent tower features for |t| different cell phone towers. Note that, by definition, all w_{i,j} ≥ 0. NMF allows us to decompose the matrix W into two non-negative matrices, which gives a lower-rank approximation for W, such that W ≈ U × T [24]. This can be formalized as the following optimization problem:

$$ \min_{U,T} \lVert W - U \times T \rVert_F \quad \text{subject to } U \ge 0,\ T \ge 0 $$

where the number of columns in U and the number of rows in T correspond to the rank k of the desired lower-rank approximation.

Even though there is a variety of methods to decompose matrices, we choose NMF, which has been applied in similar contexts with interpretable results [36]. Then, we define a mobilicity m_c as the weighted set of towers within the c-th component of the decomposition, i.e., the c-th row of matrix T. The parameter k must be chosen manually, and its value should ideally be decided jointly by data scientists and domain experts according to the context. Previous strategies for choosing k have focused on measuring the stability of the components [6] and on the variation of the residual sum of squares curve between the original matrix and its decomposition [17, 22]. However, we prefer to manually choose the number of components, as these methods do not allow us to incorporate external information such as the socio-economic distribution of the city.

CASE STUDY: SANTIAGO, CHILE
We performed a case study on Santiago, the capital of Chile, with almost 8 million inhabitants. Its urban area covers a surface of 867.75 square kilometers, and is composed of 35 independent administrative units called municipalities (see Fig. 1). Because this city has experienced accelerated growth, and it is expected to keep growing at least until 2045 [38], understanding its structure at scale is an important and timely task.

Figure 1: Choropleth map of the urban area of Santiago, Chile. Each municipality is colored according to its average income (in CLP$).

Datasets
Mobile Phone Network Data. We studied an anonymized X/CDR dataset from Telefónica Chile, the biggest telco in Chile, with a market share of 33% in 2016. The dataset contained records between July 27th and August 10th of 2016. In total, we analyzed 124,414 users, who had enough connections to the cell towers under analysis to estimate their daily trips between 6AM and midnight, and had either a pre-paid or contract subscription. They generated an average of 5.33 million billing records per day (see Fig. 2). The average inter-event times for users range between 14.71 and 30.96 minutes, which indicates a billing cycle between fifteen and thirty minutes. Telefónica has 1,464 cell phone towers in the municipalities under consideration. We discarded towers that were installed in in-door contexts (e.g., malls, hospitals, etc.). This is possible because tower meta-data includes their geographical position and their name. For out-door towers, the name usually contains the nearest crossing, while in-door towers contain the name of the place they lie in. In total, there were |t| = 1,082 out-door towers (see Fig. 3, Towers). The only in-door towers that we kept were those installed within underground metro stations.

Figure 2: Number of events in the dataset per day. The effect of weekends in the number of events is easily identifiable.

OpenStreetMap. OSM (http://openstreetmap.org) is a crowd-sourced maps platform. We downloaded a dump of its data for Chile, and then identified the highways within Santiago. We used this information to contextualize the different mobilicities identified by the NMF. We did so by finding the out-door towers that lie within 250 meters of each highway, as shown in Fig. 3 (Labeled Towers).

Figure 3: Maps of Santiago: cell phone tower network, highway and primary streets, the metro network, and the set of labeled towers according to their distance to highways or metro lines.

Trip Detection and Waypoints Matrix
Using the trip inference algorithm we detected 4,213,400 trips for 124,415 users (see Table 1 for descriptive statistics). Fig. 4 shows the departure time distribution of all trips. One can see that business days exhibit expected peak-times related to work hours, and that weekends exhibit different patterns, such as a higher density at lunch time.

Table 1: Statistics with respect to the number of inferred trips.

Metric                                  Value
# of Trips                              4,213,400
# of Users                              124,415
# of Users with Within-Trip events      95,027
Mean trips per user                     33.87
Std. Dev.                               19.62
Min                                     1
Percentile 25%                          18
Percentile 50%                          34
Percentile 75%                          48
Max                                     140

Figure 4: Trip departure time distribution per day.

After detecting trips, we built the W matrix, of dimensions |u| × |t|. Note that |u| = 95,027, because not all users had within-trip events.

Factorization of the Waypoints Matrix
To perform the NMF decomposition, we chose k = 8, because the city is usually divided into six big areas (north, south, east, west, south-east, center) and, since we expect the results to be related to modes of transportation, we wanted to see the effect of private and public transportation. Thus, k = 8 is an arguably reasonable choice (we discuss the choice of k below, under Understanding k).

Tower-Component Matrix. Figure 5 shows the results of the factorization, with one map for each component (row) of the tower-component matrix T. One can see that there is a strong geographical clustering of towers, which may be explained by the fact that W is essentially a co-occurrence matrix.

Figure 5: The eight mobilicities of the tower-component matrix obtained by performing NMF.

Fig. 6 shows how the sets of labeled towers (those near highways, near surface metro and within underground metro stations) relate to each component. Some components tend to be more associated than others with some modes of transportation: C1, C2, C3 and C7 are more associated with metro than with highways, while C4 exhibits the opposite behavior. Taking both figures into account, the following is an interpretation of each mobilicity:

C0: the east side of the city, including part of the center, next to the yellow metro line and one important highway of the city. This area is characterized by its business districts and high income residential areas (see Fig. 1). As such, it is likely that its residents do not use public transportation, nor visit other mobilicities. Note how metro towers have lower association with this component in Fig. 6.
C1: people that live in the southern part of the city, mostly between two metro lines. Since this area is characterized by low income, its residents need to take a bus to reach the metro.
C2: the southeast area of the city, which is characterized by its dependence on two metro lines. This component contains mixed-income municipalities.
C3: in contrast with the previous components, this one is completely focused on public transportation: it fully contains two metro lines, and partially contains two others. It also contains bus corridors that tend to connect to metro lines.
C4: the northern part of the city, which is mostly residential and of low income. The component also has a main street of the city as a kind of tentacle, showing that people who live in this area and work in another (or vice versa) use this street as a way to get into the component.
C5: the south-west part of the city, which is connected to downtown primarily through a highway and a metro line that is parallel to the highway.
C6: the western area of the city. This area contains one of the most populated municipalities in the city.
C7: similar to C6, but extending its reach to center areas of the city through a metro line and bus corridors. This makes this component dependent on public transport, and thus, the routes followed by its inhabitants tend to cluster, in contrast to what happens in C6.

Figure 6: Point-plot of the average association of the labeled sets of towers into the different mobilicities.

In summary, latent cities seem to be composed of three kinds of clusters of towers: those where people live and move, enclosed by specific limits (C0, C1, C6); those where people live and work, but in different areas of the city, connected through the transportation network (C2, C4, C5, C7); and transportation infrastructure (C3).

User-Component Matrix. The user-component matrix may suggest that users pass through different mobilicities in their daily lives. Fig. 7 explores this potential behavior, by displaying how a sample of 25,000 users cluster around the corresponding components. One can see that, indeed, users tend to have a primary component, but they still belong to others. This would be the equivalent of, for instance, living in the suburbs and having to travel long distances to go to work. Note that many users from C0, the wealthiest part of the city, have negligible association with other components, which is expected given the economic segregation of the city (see Fig. 1).

Figure 7: Heatmap of a random sample of 25K users and their corresponding component associations.

Understanding k. We showed the eight mobilicities (see Fig. 5) to domain experts, who gave informal feedback: it made sense to split the city in this way, as we have interpreted earlier. Even though a formal evaluation with domain experts is left for future work, here we discuss the patterns that emerge when varying the parameter k, which is the rank of the factorized matrix. We do so by estimating mobilicities with k' = 4 (see Fig. 8) and k'' = 12 (see Fig. 9).

With four mobilicities, the clustering is mostly geographic: three components split the city. However, the fourth component is related to public transportation: it reconstructs several metro lines and bus corridors. In this respect, using k' = 4 seems to yield a result similar to k = 8. Then, if one would like to differentiate cell towers with respect to general transportation patterns, this could be a reasonable choice.

With twelve mobilicities, the geographical clustering is still present, but the routes that connect distinct parts of the city become more evident. That is, a mobilicity is composed of one or two sectors of close towers (for instance, home and work locations), plus “bridges” that connect one mobilicity to another, based on the common routes followed by people. This behavior is expected, due to the co-occurrence property of the Waypoints Matrix.

In summary, several values of k make it possible to infer soft-partitions of the city, as well as the way its inhabitants move between those partitions. A mobilicity may be a soft-partition, a soft-partition with bridges to other mobilicities, or a network of those bridges (namely, a transportation network).

Figure 8: Mobilicities obtained with k = 4.

Figure 9: Mobilicities obtained with k = 12.

Comparing Interpretability with PCA/Truncated SVD. To discuss further whether NMF is a good choice of model in terms of interpretability, we estimated a Truncated SVD decomposition (closely related to PCA, and equivalent to it when the data is mean-centered) with k = 8. Fig. 10 shows that, in contrast to NMF, there is no geographical clustering nor correspondence to any infrastructure available in the city. Thus, even though PCA is a widely used dimensionality reduction technique, it does not allow the interpretation nor clustering of the city in the same way as NMF does.

Figure 10: Tower-component results of the application of Truncated SVD/PCA to the Waypoints Matrix, with k = 8.

CONCLUSIONS
We proposed the concept of mobilicities, which denote the different cities experienced by the inhabitants of a big city, and depict its dynamics with respect to mobility and usage of modes of transportation. The suitability of NMF to this kind of spatial data could be related to the fact that NMF is equivalent to spectral clustering [15], which has performed well when grouping trip destination data [12]. However, as we have noted in our motivation, our input is not destination nor origin data; instead, it is spatial location while moving. This focus was inspired by the book Happy City [32], reflecting that our purpose was to help domain experts and policy designers to make better, happier cities. Such a purpose implies collaboration between the emerging field of data science and the corresponding disciplines: transportation and urban planning. However, evidence-based policy in those areas requires transparency and interpretability, and many state-of-the-art machine learning techniques do not offer both qualities [9]. In this paper, we have shown that NMF does offer both qualities when applied to mobility data, and thus, is a promising technique to apply in the field of Urban Computing [46].

Limitations and Future Work. Critics may rightly say that we need a well-defined criterion to choose k. Future work should tackle this limitation using intelligent user interfaces aimed at domain experts. This opens two lines of research within IUI: on the one hand, we could try other factorization methods for positive-only data such as SLIM, which has shown promising results in the past [25], and would help to understand how the choice of k influences the output and its interpretability. On the other hand, visualization and exploratory interfaces are tools valued by domain experts [10], and mobility has been a recurring topic in visual analytics [3]. Finally, our work did not consider the temporal aspects of transportation. Hence, future work should consider how to incorporate that dimension into the definition of Mobilicities.

Acknowledgements. The analysis was performed using Jupyter Notebooks [37], jointly with the scikit-learn [35], pandas [30], and geopandas libraries. The maps in this paper include data from ©OpenStreetMap contributors. We also thank Telefónica R&D in Santiago for facilitating the data for this study, in particular Pablo García Briosso. The author Denis Parra has been funded by Conicyt, Fondecyt grant 11150783, as well as Fondef grant id16i10222 and the BRT+ Centre of Excellence funded by VREF. Finally, we thank the anonymous reviewers for the insightful comments that helped to improve this paper.

REFERENCES
1. Rein Ahas, Anto Aasa, Siiri Silm, and Margus Tiru. Daily rhythms of suburban commuters’ movements in the Tallinn metropolitan area: case study with mobile positioning data. Transportation Research Part C: Emerging Technologies, 18(1):45–54, 2010.
2. Lauren Alexander, Shan Jiang, Mikel Murga, and Marta C González. Origin–destination trips by purpose and time of day inferred from mobile phone data. Transportation Research Part C: Emerging Technologies, 2015.
3. Natalia Andrienko and Gennady Andrienko. Visual analytics of movement: An overview of methods, tools and procedures. Information Visualization, 12(1):3–24, 2013.
4. Vincent D Blondel, Adeline Decuyper, and Gautier Krings. A survey of results on mobile phone datasets analysis. EPJ Data Science, 4(1):1, 2015.
5. Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Emmanuel Letouzé, Nuria Oliver, Fabio Pianesi, and Alex Pentland. Moves on the street: Classifying crime hotspots using aggregated anonymized data on people dynamics. Big Data, 3(3):148–158, 2015.
6. Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov. Metagenes and molecular pattern discovery using matrix factorization. Proceedings of the National Academy of Sciences, 101(12):4164–4169, 2004.
7. Francesco Calabrese, Giusy Di Lorenzo, Liang Liu, and Carlo Ratti. Estimating origin-destination flows using mobile phone location data. IEEE Pervasive Computing, 10(4):0036–44, 2011.
8. Francesco Calabrese, Laura Ferrari, and Vincent D Blondel. Urban sensing using mobile phone network data: a survey of research. ACM Computing Surveys (CSUR), 47(2):25, 2015.
9. Davide Castelvecchi. Can we open the black box of AI? Nature News, 538(7623):20, 2016.
10. Tao Cheng, Garavig Tanaksaranond, Chris Brunsdon, and James Haworth. Exploratory visualisation of congestion evolutions on urban transport networks. Transportation Research Part C: Emerging Technologies, 36:296–306, 2013.
11. Andrzej Cichocki and Anh-Huy Phan. Fast local algorithms for large scale nonnegative matrix and tensor factorizations. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences, 92(3):708–721, 2009.
12. Justin Cranshaw, Raz Schwartz, Jason Hong, and Norman Sadeh. The Livehoods Project: Utilizing social media to understand the dynamics of a city. In Sixth International AAAI Conference on Weblogs and Social Media, 2012.
13. Tamas David-Barrett, Janos Kertesz, Anna Rotkirch, Asim Ghosh, Kunal Bhattacharya, Daniel Monsivais, and Kimmo Kaski. Communication with family and friends across the life course. PLoS ONE, 11(11):e0165687, 2016.
14. Marco De Nadai, Jacopo Staiano, Roberto Larcher, Nicu Sebe, Daniele Quercia, and Bruno Lepri. The death and life of great Italian cities: a mobile phone data perspective. In Proceedings of the 25th International Conference on World Wide Web, pages 413–423. International World Wide Web Conferences Steering Committee, 2016.
15. Chris Ding, Xiaofeng He, and Horst D Simon. On the equivalence of nonnegative matrix factorization and spectral clustering. In Proceedings of the 2005 SIAM International Conference on Data Mining, pages 606–610. SIAM, 2005.
16. Vanessa Frias-Martinez, Cristina Soguero, and Enrique Frias-Martinez. Estimation of urban commuting patterns using cellphone network data. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, pages 9–16. ACM, 2012.
17. Attila Frigyesi and Mattias Höglund. Non-negative matrix factorization for the analysis of complex gene expression data: identification of clinically relevant tumor subtypes. Cancer Informatics, 6(2003):275–92, jan 2008.
18. Eduardo Graells-Garrido, Oscar Peredo, and José García. Sensing urban patterns with antenna mappings: the case of Santiago, Chile. Sensors, 16(7):1098, 2016.
19. Eduardo Graells-Garrido and Diego Saez-Trumper. A day of your days: estimating individual daily journeys using mobile data to understand urban flow. In Proceedings of the Second International Conference on IoT in Urban Space, pages 1–7. ACM, 2016.
20. Randolph Hall. Handbook of transportation science, volume 23. Springer Science & Business Media, 2012.
21. Martin Hilbert. Big data for development: A review of promises and challenges. Development Policy Review, 34(1):135–174, 2016.
22. Lucie N. Hutchins, Sean M. Murphy, Priyam Singh, and Joel H. Graber. Position-dependent motif characterization using non-negative matrix factorization. Bioinformatics, 24(23):2684–2690, 2008.
23. Md Shahadat Iqbal, Charisma F Choudhury, Pu Wang, and Marta C González. Development of origin–destination matrices using mobile phone call data. Transportation Research Part C: Emerging Technologies, 40:63–74, 2014.
24. Da Kuang, Chris Ding, and Haesun Park. Symmetric Nonnegative Matrix Factorization for Graph Clustering, pages 106–117. 2012.
25. Santiago Larraín, Denis Parra, and Alvaro Soto. Towards improving top-N recommendation by generalization of SLIM. In RecSys Posters, 2015.
26. Maxime Lenormand, Miguel Picornell, Oliva G Cantú-Ros, Thomas Louail, Ricardo Herranz, Marc Barthelemy, Enrique Frías-Martínez, Maxi San Miguel, and José J Ramasco. Comparing and modelling land use organization in cities. Royal Society Open Science, 2(12):150449, 2015.
27. Jiajun Liu, Kun Zhao, Saeed Khan, Mark Cameron, and Raja Jurdak. Multi-scale population and mobility estimation with geo-tagged tweets. In Data Engineering Workshops (ICDEW), 2015 31st IEEE International Conference on, pages 83–86. IEEE, 2015.
28. Kevin Lynch. The image of the city, volume 11. MIT Press, 1960.
29. A Paolo Masucci, Joan Serras, Anders Johansson, and Michael Batty. Gravity versus radiation models: On the importance of scale and heterogeneity in commuting flows. Physical Review E, 88(2):022812, 2013.
30. Wes McKinney. Data structures for statistical computing in Python. In Proceedings of the 9th Python in Science Conference, volume 445, pages 51–56, 2010.
31. Graham McNeill, Jonathan Bright, and Scott A Hale. Estimating local commuting patterns from geolocated Twitter data. EPJ Data Science, 6(1):24, 2017.
32. Charles Montgomery. Happy city: transforming our lives through urban design. Macmillan, 2013.
33. Benyounes Moumni, Vanessa Frias-Martinez, and Enrique Frias-Martinez. Characterizing social response to urban earthquakes using cell-phone network data: the 2012 Oaxaca earthquake. In Proceedings of the 2013 ACM Conference on Pervasive and Ubiquitous Computing Adjunct Publication, pages 1199–1208. ACM, 2013.
34. Anastasios Noulas, Cecilia Mascolo, and Enrique Frias-Martinez. Exploiting Foursquare and cellular data to infer user activity in urban environments. In 2013 IEEE 14th International Conference on Mobile Data Management, pages 167–176. IEEE, 2013.
35. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12:2825–2830, 2011.
36. Chengbin Peng, Xiaogang Jin, Ka-Chun Wong, Meixia Shi, and Pietro Liò. Collective human mobility pattern from taxi trips in urban area. PLoS ONE, 7(4):e34487, 2012.
37. Fernando Pérez and Brian E Granger. IPython: a system for interactive scientific computing. Computing in Science & Engineering, 9(3):21–29, 2007.
38. Olga Lucia Puertas, Cristian Henríquez, and Francisco Javier Meza. Assessing spatial dynamics of urban growth using an integrated land use model. Application in Santiago metropolitan area, 2010–2045. Land Use Policy, 38:415–425, 2014.
39. Heiko Rüger, Simon Pfaff, Heide Weishaar, and Brenton M Wiernik. Does perceived stress mediate the relationship between commuting and health-related quality of life? Transportation Research Part F: Traffic Psychology and Behaviour, 50:100–108, 2017.
40. Jose San Pedro, Davide Proserpio, and Nuria Oliver. Mobiscore: towards universal credit scoring from mobile phone data. In International Conference on User Modeling, Adaptation, and Personalization, pages 195–207. Springer, 2015.
41. Victor Soto, Vanessa Frias-Martinez, Jesus Virseda, and Enrique Frias-Martinez. Prediction of socioeconomic levels using cell phone records. In International Conference on User Modeling, Adaptation, and Personalization, pages 377–388. Springer, 2011.
42. Jameson L Toole, Michael Ulm, Marta C González, and Dietmar Bauer. Inferring land use from mobile phone activity. In Proceedings of the ACM SIGKDD International Workshop on Urban Computing, pages 1–8. ACM, 2012.
43. Carmen Karina Vaca, Daniele Quercia, Francesco Bonchi, and Piero Fraternali. Taxonomy-based discovery and annotation of functional areas in the city. In Ninth International AAAI Conference on Web and Social Media, 2015.
44. R Baeza Yates and B Ribeiro Neto. Modern information retrieval: the concepts and technology behind search. Addison-Wesley Professional, 2011.
45. Jing Yuan, Yu Zheng, and Xing Xie. Discovering regions of different functions in a city using human mobility and POIs. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 186–194. ACM, 2012.
46. Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. Urban computing: concepts, methodologies, and applications. ACM Transactions on Intelligent Systems and Technology (TIST), 5(3):38, 2014.
47. Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. Mining interesting locations and travel sequences from GPS trajectories. In Proceedings of the 18th International Conference on World Wide Web, pages 791–800. ACM, 2009.
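
As an illustration of the Trip Detection step described in the methods, the following is a minimal Python sketch of how a space-time trajectory could be built from the daily events of one device, and how each segment could be labeled from its slope. It is not the algorithm of [19]: it omits the line-simplification step, it only separates moving from stationary segments, and the function names, the event format (minute of day, latitude, longitude) and the 3 km/h threshold are assumptions made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def space_time_trajectory(events):
    """Build (minute, cumulative_km) points from time-sorted tower events.

    events: list of (minute_of_day, tower_lat, tower_lon) for one device and
    one day (hypothetical format). Distance starts at zero at the first event.
    """
    trajectory = []
    total_km = 0.0
    previous = None
    for minute, lat, lon in events:
        if previous is not None:
            total_km += haversine_km(previous[0], previous[1], lat, lon)
        trajectory.append((minute, total_km))
        previous = (lat, lon)
    return trajectory


def label_segments(trajectory, min_speed_kmh=3.0):
    """Label consecutive segments by slope (km per hour).

    min_speed_kmh is an assumed threshold, not a value from the paper.
    """
    labels = []
    for (t0, d0), (t1, d1) in zip(trajectory, trajectory[1:]):
        hours = (t1 - t0) / 60.0
        speed = (d1 - d0) / hours if hours > 0 else 0.0
        labels.append("moving" if speed >= min_speed_kmh else "stationary")
    return labels


# Toy day: two events at one tower, then two events at another tower.
events = [
    (0, -33.45, -70.66),
    (15, -33.45, -70.66),
    (30, -33.40, -70.60),
    (45, -33.40, -70.60),
]
trajectory = space_time_trajectory(events)
labels = label_segments(trajectory)
print(labels)
```

In this toy day only the middle segment has a positive slope, so it is the only one labeled as moving; events falling strictly inside such segments would be the within-trip events that feed the Waypoints Matrix.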
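
The definition of the Waypoints Matrix W can be sketched in a few lines of Python. This is a minimal illustration that assumes the within-trip events are already available as (user, tower) pairs; the function name and the toy identifiers are hypothetical. Each row is L1-normalized, so w_{i,j} is the share of the within-trip events of user u_i observed at tower t_j, and users without within-trip events do not appear in the matrix.

```python
from collections import Counter, defaultdict


def waypoints_matrix(within_trip_events):
    """Return (users, towers, W) from an iterable of (user_id, tower_id) pairs.

    W[i][j] = (# within-trip events of user i at tower j)
              / (# within-trip events of user i)
    """
    counts = defaultdict(Counter)
    for user, tower in within_trip_events:
        counts[user][tower] += 1

    users = sorted(counts)
    towers = sorted({tower for c in counts.values() for tower in c})

    W = []
    for user in users:
        total = sum(counts[user].values())
        W.append([counts[user][tower] / total for tower in towers])
    return users, towers, W


# Toy input: u1 has four within-trip events, u2 has two.
events = [
    ("u1", "t1"), ("u1", "t1"), ("u1", "t2"), ("u1", "t3"),
    ("u2", "t2"), ("u2", "t3"),
]
users, towers, W = waypoints_matrix(events)
for user, row in zip(users, W):
    print(user, row)
```

For the toy input, user u1 has two of its four events at t1, so its row is (0.5, 0.25, 0.25), while u2 never touches t1 and splits its events evenly between t2 and t3.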
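
To make the factorization W ≈ U × T concrete, the following is a small, dependency-free Python sketch of NMF under the Frobenius norm. It uses the classic multiplicative update rules of Lee and Seung, which is an assumption of this example: the analysis in the paper relied on scikit-learn, whose solver may differ. The helper names, the toy matrix, the seed and the number of iterations are all illustrative.

```python
import random


def matmul(A, B):
    """Plain-Python product of A (n x m) and B (m x p)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]


def transpose(A):
    return [list(col) for col in zip(*A)]


def frobenius_error(W, U, T):
    """Frobenius norm of W - U x T."""
    approx = matmul(U, T)
    return sum((w - a) ** 2
               for row_w, row_a in zip(W, approx)
               for w, a in zip(row_w, row_a)) ** 0.5


def nmf(W, k, iterations=200, seed=0, eps=1e-9):
    """Factorize a non-negative W (|u| x |t|) into U (|u| x k) and T (k x |t|)."""
    rng = random.Random(seed)
    n, m = len(W), len(W[0])
    U = [[rng.random() + eps for _ in range(k)] for _ in range(n)]
    T = [[rng.random() + eps for _ in range(m)] for _ in range(k)]
    for _ in range(iterations):
        # T <- T * (U^T W) / (U^T U T), element-wise
        Ut = transpose(U)
        num = matmul(Ut, W)
        den = matmul(matmul(Ut, U), T)
        T = [[T[c][j] * num[c][j] / (den[c][j] + eps) for j in range(m)]
             for c in range(k)]
        # U <- U * (W T^T) / (U T T^T), element-wise
        Tt = transpose(T)
        num = matmul(W, Tt)
        den = matmul(U, matmul(T, Tt))
        U = [[U[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(n)]
    return U, T


# Toy Waypoints Matrix: users 0-1 move through towers 0-1, users 2-3 through 2-3.
W = [
    [0.5, 0.5, 0.0, 0.0],
    [0.6, 0.4, 0.0, 0.0],
    [0.0, 0.0, 0.5, 0.5],
    [0.0, 0.0, 0.3, 0.7],
]
U0, T0 = nmf(W, k=2, iterations=0)   # random initialization only
U, T = nmf(W, k=2)                   # same seed, after the updates
initial_error = frobenius_error(W, U0, T0)
final_error = frobenius_error(W, U, T)
print(initial_error, final_error)
# Under this layout, mobilicity c is the c-th row of T (one weight per tower).
```

Because the updates only multiply non-negative quantities, U and T stay non-negative throughout, which is what allows each row of T to be read as a weighted set of towers.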
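
The observation that users tend to have a primary component while still belonging to others can be checked directly on the user-component matrix U. The sketch below is one possible way to do it, not the procedure used for Fig. 7: it reports, for each user, the strongest component and the share of the user's total association that it holds. The function name and the toy values of U are hypothetical.

```python
def primary_components(U):
    """For each row of U, return (strongest component index, its share of the row).

    Rows whose entries sum to zero have no association at all and yield None.
    """
    result = []
    for row in U:
        total = sum(row)
        if total <= 0:
            result.append(None)
            continue
        best = max(range(len(row)), key=row.__getitem__)
        result.append((best, row[best] / total))
    return result


# Toy user-component matrix with k = 3 components.
U = [
    [0.9, 0.1, 0.0],   # almost exclusively in component 0
    [0.2, 0.2, 0.6],   # mostly component 2, but present in the others
    [0.0, 0.0, 0.0],   # no association
]
primary = primary_components(U)
print(primary)
```

A share close to 1 corresponds to users who rarely leave their mobilicity, as reported for many users of C0, whereas lower shares correspond to users who traverse several mobilicities.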