=Paper= {{Paper |id=Vol-2068/uistda4 |storemode=property |title=Toward Finding Latent Cities |pdfUrl=https://ceur-ws.org/Vol-2068/uistda4.pdf |volume=Vol-2068 |authors=Eduardo Graells-Garrido,Diego Caro,Denis Parra |dblpUrl=https://dblp.org/rec/conf/iui/Graells-Garrido18 }} ==Toward Finding Latent Cities== https://ceur-ws.org/Vol-2068/uistda4.pdf
                               Toward Finding Latent Cities
                          with Non-Negative Matrix Factorization
                         Eduardo Graells-Garrido                                  Diego Caro
                           Data Science Institute                            Data Science Institute
                         Universidad del Desarrollo                        Universidad del Desarrollo
                             Telefonica R&D                                    Telefonica R&D
                              Santiago, Chile                                   Santiago, Chile
                             egraells@udd.cl                                     dcaro@udd.cl
                                                                 Denis Parra
                                                        Dept. of Computer Science
                                                          Faculty of Engineering
                                                       Pontificia Universidad Catolica
                                                               Santiago, Chile
                                                             dparra@ing.puc.cl
©2018. Copyright for the individual papers remains with the authors.
Copying permitted for private and academic purposes.
UISTDA ’18, March 11, 2018, Tokyo, Japan

ABSTRACT
In the last decade, digital footprints have been used to cluster population
activity into functional areas of cities. However, a key aspect has been
overlooked: we experience our cities not only by performing activities at
specific destinations, but also by moving from one place to another. In this
paper, we propose to analyze and cluster the city based on how people move
through it. Particularly, we introduce Mobilicities, automatically generated
travel patterns inferred from mobile phone network data using NMF, a matrix
factorization model. We evaluate our method in a large city and we find that
mobilicities reveal latent but at the same time interpretable mobility
structures of the city. Our results provide evidence on how clustering and
visualization of aggregated phone logs could be used in planning systems to
interactively analyze city structure and population activity.

Author Keywords
Mobile Phone Networks; Urban Informatics; Urban Mobility; Non-Negative Matrix
Factorization.

INTRODUCTION
The increasing availability of digital footprints, such as Web/App access
logs, user-generated content, and mobile phone network data, has made it
possible to characterize the city at spatio-temporal granularities never seen
before. This means that the different functional areas of the city can be
estimated, based not only on how planners thought that the city would be
lived in, but on how people actually used the different spaces available to
them [12]. However, this is not enough to understand the city. As Charles
Montgomery says in his book, Happy City: “When we talk about cities, we
usually end up talking about how various places look and perhaps how it feels
to be there in those places. But to stop there misses half the story, because
the way we experience most parts of cities is at velocity: we glide past on
the way to somewhere else. City life is as much about moving through
landscapes as it is about being in them” [32].

Since people may spend a considerable amount of time while moving through the
city, and the quality of that time has a strong influence on mood, health,
and productivity [39], it is important to understand city structure with
respect to mobility. Given that the growth of cities is faster than the
capability of traditional methods to understand the city, it is important to
have cost-effective ways to analyze the city at scale [46].

In this paper, inspired by Montgomery’s ideas, we estimated a
characterization of the city defined by the collective experience of its
several areas. Particularly, we analyzed intra-city transportation inferred
from mobile phone network records, which we represented in a Waypoints
Matrix. This matrix, similar to document-term matrices used in Information
Retrieval, was decomposed using Non-Negative Matrix Factorization (NMF) [11].
We interpreted and labeled the obtained components, which we denoted
Mobilicities. We evaluated our pipeline by performing a case study in
Santiago, Chile, using mobile phone network data from the biggest
telecommunications company in the country. Our pipeline delivered
interpretable results, in contrast with a well-established method. We
concluded that mobilicities can be used within an intelligent user interface
aimed at mobility and transportation-analysis tasks.

BACKGROUND AND RELATED WORK
There has been a flurry of research on mobile phone network data known as
eXtended and Call Detail Records (X/CDR), as evidenced in recent surveys on
the area [4, 8]. Some examples
include: understanding socio-economic factors on the population [41],
understanding family and social relations [13], characterizing response to
emergencies and critical events [33], crime detection [5], credit scoring
[40], test of urban theories [14]; and the provision of a cost-effective way
of understanding population dynamics and behavior in developing countries
[21].

The mobile phone network events of a given device depict a spatio-temporal
trajectory that can be processed to infer trips, by using geometric
approaches based on transportation rules [19], or by clustering events in the
trajectory [7]. When individual trips are known, it is possible to aggregate
them into Origin-Destination (OD) matrices. This analysis is common in the
X/CDR literature [2, 23, 16, 7], and shows that inferring individual mobility
is a relevant problem.

Other important aspects are the characterization of land use (e.g.,
residential areas, business areas, etc.) and functional areas (i.e.,
delimited areas that serve specific or multiple land uses). Since it is
crucial to understand the dynamics of these aspects, functional areas have
been measured, monitored and categorized using digital footprints [45, 43]
and X/CDR [34, 26, 42, 1, 18]. A work similar to ours has applied NMF to
understand trip purpose, and to build functional areas based on the spatial
distribution of such purposes [36].

The key difference between the aforementioned work and our proposal is the
focus. Other work focuses on the destinations of trips, as well as activities
performed within places. As such, their definition of functional area is
limited to those places that, in transportation terms, attract people [20];
yet, as mentioned by urbanists, the city is experienced in sequence by moving
from one place to another [32]. Each citizen has a unique version of the
city, built upon the sequence of nodes, landmarks, and paths traversed [28].
In this paper we show that, by using mobility inferred from X/CDR, and using
NMF to decompose/cluster the different cities experienced by mobile phone
users, we are able to identify the different Mobilicities that comprise a big
urban city.

Even though we have centered the discussion around mobile phone network data,
it is possible to infer transportation and urban patterns from other sources,
particularly social media. Twitter has been shown to be a good predictor of
commuter flows [31] at several scales [27]. However, these approaches share
the limitation of the previous ones: a focus on the origins and destinations
of trips, mainly due to their way of modelling mobility, which relies on
gravity and radiation models (see [29] for a comparison). Twitter data, while
massive and longitudinal, does not allow inferring within-trip behavior.

METHODS
Our methods can be summarized in a pipeline of three steps:

1. Trip detection from X/CDR data, which, for each device in the dataset,
   identifies its corresponding daily trips, with origin, intermediate, and
   destination towers.
2. The construction of a Waypoints Matrix W that aggregates the intermediate
   towers of trips, of a given period of time, into a device-antenna matrix.
3. The decomposition of W using NMF into the product of two matrices, U and
   T, according to a number of k components, which we denote as mobilicities.

Trip Detection. To detect trips from X/CDR traces, we resort to an algorithm
based on transportation rules and trajectory simplification [19]. The
algorithm builds a space-time trajectory from daily X/CDR events, where space
is the cumulative distance between consecutive connected towers, starting
from zero at the first connection of the day. This trajectory is simplified
using a line-simplification algorithm. Then, each segment from the simplified
trajectory is categorized according to transportation rules, such as the
relationship between the approximated trip distance and time, which is
visually inspected through the slope of the segment. In other words, the trip
detection allows us to separate X/CDR events into the following: stationary
events (the user was performing an activity), trip start events (denoting the
origin), trip end events (denoting the destination), and within-trip events
(denoting mobility).

Building the Waypoints Matrix. Ideally, characterization of within-trip
events does not need to be done using aggregation. For instance, GPS data
allows rich clustering over specific trajectories [47]. However, due to the
billing purpose of X/CDR data, it is possible that trips have few within-trip
events because of the billing cycle. Since we will focus on within-trip
events, we need to aggregate such events from an extended period of time.
Furthermore, some trips do not have within-trip events, such as those with a
duration close to the billing cycle time, and those within zones with low
tower density. Hence, by aggregating all within-trip events for a user in a
period of time, the likelihood of identifying the towers that characterize a
specific user’s mobility increases. We use this schema to define a Waypoints
Matrix W as:

$$w_{i,j} = \frac{\#\text{ of within-trip events of user } u_i \text{ at tower } t_j}{\#\text{ of within-trip events of user } u_i}$$

This schema is equivalent to the L1-normalized document-term matrices found
in Information Retrieval, but without weighting with Tf-Idf [44]. We do not
apply Tf-Idf because its purpose is to identify discriminative features;
conversely, we want to extract collective features. Additionally, note that
this matrix is different from those used in related work with NMF
decompositions [36]: there is a semantic difference between within-trip and
trip start/end events. To avoid this polysemic behavior, we focus only on
within-trip events.

Figure 1: Choropleth map of the urban area of Santiago, Chile. Each
municipality is colored according to its average income (in CLP$).

Figure 2: Number of events in the dataset per day. The effect of weekends in
the number of events is easily identifiable.

Table 1: Statistics with respect to the number of inferred trips.

    Metric                                       Value
    # of Trips                               4,213,400
    # of Users                                 124,415
    # of Users with Within-Trip events          95,027
    Mean trips per user                          33.87
    Std. Dev.                                    19.62
    Min                                              1
    Percentile 25%                                  18
    Percentile 50%                                  34
    Percentile 75%                                  48
    Max                                            140

Applying Non-Negative Matrix Factorization. To represent how users interact
with towers, we propose to decompose this matrix into two: W = U × T, where U
is a |u| × k matrix that encodes k latent user features for |u| users, and T
is a k × |t| matrix that encodes k latent tower features for |t| different
cell phone towers. Note that, by definition, all $w_{i,j} \geq 0$. NMF allows
us to decompose the matrix W into two non-negative matrices, which gives a
lower-rank approximation for W, such that W ≈ U × T [24]. This can be
formalized as the following optimization problem:
$\min_{U,T} \lVert W - U \times T \rVert_F$ subject to U and T being
non-negative, where the number of columns in U and the number of rows in T
correspond to the desired lower-rank approximation k.
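As an illustration of the factorization W ≈ U × T described above, the following is a minimal sketch in plain Python, written without dependencies so that it runs anywhere. It uses multiplicative update rules, which are one common way to solve the NMF problem; this choice is an assumption, since the text does not state which solver was used. The helper names (`matmul`, `transpose`, `frobenius_error`, `nmf`) and the toy matrix are made up for the example.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_columns = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in b_columns]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(w, u, t):
    """Frobenius norm of W - U x T, the quantity NMF minimizes."""
    approx = matmul(u, t)
    return sum((w_ij - a_ij) ** 2
               for w_row, a_row in zip(w, approx)
               for w_ij, a_ij in zip(w_row, a_row)) ** 0.5


def nmf(w, k, iterations=500, seed=0, eps=1e-9):
    """Factorize W (|u| x |t|) into non-negative U (|u| x k) and T (k x |t|)."""
    rng = random.Random(seed)
    n_users, n_towers = len(w), len(w[0])
    u = [[rng.random() + eps for _ in range(k)] for _ in range(n_users)]
    t = [[rng.random() + eps for _ in range(n_towers)] for _ in range(k)]
    for _ in range(iterations):
        # T <- T * (U^T W) / (U^T U T), element-wise.
        u_t = transpose(u)
        num, den = matmul(u_t, w), matmul(matmul(u_t, u), t)
        t = [[t[c][j] * num[c][j] / (den[c][j] + eps) for j in range(n_towers)]
             for c in range(k)]
        # U <- U * (W T^T) / (U T T^T), element-wise.
        t_t = transpose(t)
        num, den = matmul(w, t_t), matmul(u, matmul(t, t_t))
        u = [[u[i][c] * num[i][c] / (den[i][c] + eps) for c in range(k)]
             for i in range(n_users)]
    return u, t


# Toy Waypoints Matrix: 5 users x 4 towers, rows L1-normalized (made-up values).
W = [
    [0.50, 0.50, 0.00, 0.00],
    [0.50, 0.50, 0.00, 0.00],
    [0.00, 0.00, 0.30, 0.70],
    [0.00, 0.00, 0.30, 0.70],
    [0.25, 0.25, 0.15, 0.35],
]

U0, T0 = nmf(W, k=2, iterations=0)  # random starting point
U, T = nmf(W, k=2)                  # same starting point, after the updates
initial_error = frobenius_error(W, U0, T0)
final_error = frobenius_error(W, U, T)
print(f"error before: {initial_error:.3f}, after: {final_error:.3f}")
```

Each row of `T` holds the tower weights of one component, and each row of `U` holds one user's association with the k components. For a matrix of realistic size, a library implementation would be preferable to this pure-Python loop.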
Even though there is a variety of methods to decompose matrices, we choose
NMF, which has been applied in similar contexts with interpretable results
[36]. Then, we define a mobilicity $m_c$ as the weighted set of towers within
the c-th component of the decomposition, i.e., the c-th row of matrix T. The
parameter k must be chosen manually, and its value should ideally be decided
jointly between data scientists and domain experts according to the context.
Previous strategies for choosing k have focused on measuring the stability of
the components [6] and on the variation of the residual sum of squares curve
between the original matrix and its decomposition [17, 22]. However, we
prefer to manually choose the number of components as these methods do not
allow us to incorporate external information such as the socio-economic
distribution of the city.

CASE STUDY: SANTIAGO, CHILE
We performed a case study on Santiago, the capital of Chile, with almost 8
million inhabitants. Its urban area covers a surface of 867.75 square
kilometers, and is composed of 35 independent administrative units called
municipalities (cf. Fig. 1). Because this city has experienced accelerated
growth, and it is expected to keep growing at least until 2045 [38],
understanding its structure at scale is an important and timely task.

Datasets
Mobile Phone Network Data. We studied an anonymized X/CDR dataset from
Telefónica Chile, the biggest telco in Chile, with a market share of 33% in
2016. The dataset contained records between July 27th and August 10th, 2016.
In total, we analyzed 124,414 users, who had enough connections to the cell
towers under analysis to estimate their daily trips between 6AM and midnight,
and had either a pre-paid or contract subscription. They generated an average
of 5.33 million billing records per day (cf. Fig. 2). The average inter-event
times for users range between 14.71 and 30.96 minutes, which shows a billing
cycle between fifteen and thirty minutes. Telefónica has 1,464 cell phone
towers in the municipalities under consideration. We discarded towers that
were installed in in-door contexts (e.g., malls, hospitals, etc.). This is
possible because tower meta-data includes their geographical position and
their name. For out-door towers, the name usually contains the nearest
crossing, while the names of in-door towers contain the name of the place
they lie in. In total, there were |t| = 1,082 out-door towers (see Fig. 3,
Towers). The only in-door towers that we kept were those installed within
underground metro stations.

OpenStreetMap. OSM (http://openstreetmap.org) is a crowd-sourced maps
platform. We downloaded a dump of its data for Chile, and then identified the
highways within Santiago. We used this information to contextualize the
different mobilicities identified by the NMF. We did so by finding the
out-door towers that lie within 250 meters of each highway, as shown in
Fig. 3 (Labeled Towers).

Trip Detection and Waypoints Matrix
Using the trip inference algorithm we detected 4,213,400 trips for 124,415
users (cf. Table 1 for descriptive statistics). Fig. 4 shows the departure
time distribution of all trips. One can see that business days exhibit
expected peak-times related to work hours, and that weekends exhibit
different patterns, such as a higher density at lunch time.
Figure 3: Maps of Santiago: cell phone tower network, highway and primary
streets, the metro network, and the set of labeled towers according to their
distance to highways or metro lines.

Figure 4: Trip departure time distribution per day.

After detecting trips, we built the W matrix, of dimensions |u| × |t|. Note
that |u| = 95,027, because not all users had within-trip events.

Factorization of the Waypoints Matrix
To perform the NMF decomposition, we chose k = 8, because the city is usually
divided into six big areas (north, south, east, west, south-east, center)
and, since we expect the results to exhibit a relationship with modes of
transportation, we wanted to see the effect of private and public
transportation. Thus, k = 8 (six areas plus two modes of transportation) is
an arguably reasonable choice (note that we discuss the choice of k later).

Tower-Component Matrix. Figure 5 shows the results of the factorization, with
one map for each component of the tower-component matrix. One can see that
there is a strong geographical clustering of towers, which may be explained
by the fact that W is essentially a co-occurrence matrix.

Fig. 6 shows how the sets of labeled towers (those near highways, near
surface metro and within underground metro stations) relate to each
component. It shows that some components tend to be more associated than
others with some modes of transportation: C1, C2, C3 and C7 are more
associated with metro than with highways, while C4 exhibits the opposite
behavior. Taking both figures into account, the following is an
interpretation of each mobilicity:

C0 : the east side of the city, including part of the center, next to the
   yellow metro line and one important highway of the city. This area is
   characterized by its business districts and high income residential areas
   (cf. Fig. 3). As such, it is likely that its residents do not use public
   transportation, nor visit other mobilicities. Note how metro towers have
   lower association with this component in Fig. 6.
C1 : people that live in the southern part of the city, mostly between two
   metro lines. Since this area is characterized by low income, this means
   that they need to take a bus to reach the metro.
C2 : the southeast area of the city, which is characterized by its dependency
   on two metro lines. This component contains mixed-income municipalities.
C3 : in contrast with the previous components, this one is completely focused
   on public transportation: it contains two metro lines in full, and two
   others partially. It also contains bus corridors that tend to connect to
   metro lines.
C4 : the northern part of the city, which is mostly residential and of low
   income. The component also has a main street of the city as a kind of
   tentacle, showing that people who live or work in this area, but who work
   or live in another, use this street as a way to get into the component.
C5 : the south-west part of the city, which is connected to downtown
   primarily through a highway and a metro line that is parallel to the
   highway.
C6 : the western area of the city. This area contains one of the most
   populated municipalities in the city.
C7 : similar to C6, but extending its reach to center areas of the city
   through a metro line and bus corridors. This makes this component
   dependent on public transport, and thus, the routes followed by its
   inhabitants tend to cluster, in contrast to what happens in C6.

In summary, latent cities seem to be composed of three kinds of clusters of
towers: those where people live and move, enclosed by specific limits (C0,
C1, C6), those where people live and work, but in different areas of the
city, connected through the transportation network (C2, C4, C5, C7), and
transportation infrastructure (C3).

User-Component Matrix. The user-component matrix may suggest that users pass
through different mobilicities in their daily lives. Fig. 7 explores this
potential behavior, by displaying how a sample of 25,000 users cluster around
the corresponding components. One can see that, indeed, users tend to have a
primary component, but they still belong to others. This would be the
equivalent to, for instance, living in the suburbs, and having to travel long
distances to go to work. Note that many users from C0, the wealthiest part of
the city, have negligible association to other components – something
expected due to the economical segregation of the city (cf. Fig. 3).

Figure 5: The eight mobilicities of the tower-component matrix obtained by
performing NMF.

Figure 6: Point-plot of the average association of the labeled sets of towers
into the different mobilicities.

Figure 7: Heatmap of a random sample of 25K users and their corresponding
component associations.

Understanding k. We showed the eight mobilicities (cf. Fig. 5) to domain
experts, who gave informal feedback – it made sense to split the city in this
way, as we have interpreted earlier. Even though a formal evaluation with
domain experts is left for future work, here we discuss the patterns that
emerge when varying the parameter k, which is the rank of the factorized
matrix. We do so by estimating mobilicities with k' = 4 (cf. Fig. 8) and
k'' = 12 (cf. Fig. 9).

With four mobilicities, the clustering is mostly geographic: three components
split the city. However, the fourth component is related to public
transportation: it reconstructs several metro lines and bus corridors. In
this aspect, it seems that using k' = 4 yields a result similar to k = 8.
Then, if one would like to differentiate cell towers with respect to general
transportation patterns, this could be a reasonable choice.

With twelve mobilicities, the geographical clustering is still present, but
the routes that connect distinct parts of the city become more evident –
meaning that a mobilicity is composed of one or two sectors of close towers
(for instance, home and work locations), plus “bridges” that connect one
mobilicity to another, based on the common routes followed by people. This
behavior is expected, due to the co-occurrence property of the Waypoints
Matrix.

In summary, several values of k make it possible to infer soft-partitions of
the city, as well as the way its inhabitants move between those partitions. A
mobilicity may be a soft-partition, a soft-partition with bridges to other
mobilicities, or a network of those bridges – namely, a transportation
network.
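To make the notion of a soft-partition concrete, one possible reading of a tower-component matrix is sketched below: each tower's weights are normalized across components, the largest share gives its primary mobilicity, and a tower whose second-largest share is also substantial is flagged as a candidate "bridge". The function name `soft_partition`, the `bridge_threshold` value, and the example matrix are hypothetical; the text does not define bridges numerically.

```python
def soft_partition(t_matrix, bridge_threshold=0.25):
    """Summarize each tower (column of T) as (primary, shares, is_bridge).

    t_matrix is k x |t|: one row per component, one column per tower.
    bridge_threshold is an illustrative cut-off, not a value from the paper.
    """
    n_components = len(t_matrix)
    n_towers = len(t_matrix[0])
    summary = []
    for j in range(n_towers):
        loadings = [t_matrix[c][j] for c in range(n_components)]
        total = sum(loadings)
        # Share of the tower's weight that falls in each component.
        shares = [x / total if total > 0 else 0.0 for x in loadings]
        primary = max(range(n_components), key=shares.__getitem__)
        ranked = sorted(shares, reverse=True)
        second = ranked[1] if n_components > 1 else 0.0
        summary.append((primary, shares, second >= bridge_threshold))
    return summary


# Hypothetical tower-component matrix: 3 components x 4 towers.
T = [
    [0.9, 0.8, 0.3, 0.0],
    [0.1, 0.0, 0.3, 0.7],
    [0.0, 0.2, 0.4, 0.3],
]

summary = soft_partition(T)
primaries = [primary for primary, _, _ in summary]
bridges = [is_bridge for _, _, is_bridge in summary]
print(primaries)
print(bridges)
```

The same normalization applied to the rows of the user-component matrix would give each user's primary component and their association with the others.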
                                                                  ization methods for positive-only data such as SLIM, which
                                                                  has shown promising results in the past [25], and would allow
                                                                  to understand how the choice of k influences the output and
                                                                  its interpretability. On the other hand, visualization and ex-
                                                                  ploratory interfaces are tools valued by domain experts [10],
                                                                  and mobility has been a recurring topic in visual analytics [3].
                                                                  Finally, our work did not consider the temporal aspects of
                                                                  transportation. Hence, future work should consider how to
                                                                  incorporate that dimension into the definition of Mobilicities.
                                                                  Acknowledgements. The analysis was performed using
                                                                  Jupyter Notebooks [37], jointly with the scikit-learn [35], pan-
                                                                  das [30], and geopandas libraries. The maps on this paper
                                                                  include data from ©OpenStreetMap contributors. We also
                                                                  thank Telefónica R&D in Santiago for facilitating the data
                                                                  for this study, in particular Pablo García Briosso. The author
                                                                  Denis Parra has been funded by Conicyt, Fondecyt grant
                                                                  11150783, as well as Fondef grant id16i10222 and the BRT+
                                                                  Centre of Excellence funded by VREF. Finally, we thank the
                                                                  anonymous reviewers for the insightful comments that helped
                                                                  to improve this paper.
         Figure 8: Mobilicities obtained with k = 4.              REFERENCES
                                                                   1. Rein Ahas, Anto Aasa, Siiri Silm, and Margus Tiru. Daily rhythms of
Comparing Interpretability with PCA/Truncated SVD. To                 suburban commuters’ movements in the Tallinn metropolitan area: case
discuss further whether NMF is a good choice of model in              study with mobile positioning data. Transportation Research Part C:
terms of interpretability, we estimated a Truncated SVD de-           study with mobile positioning data. Transportation Research Part C:
composition (equivalent to PCA) with k = 8. Fig. 10 shows          2. Lauren Alexander, Shan Jiang, Mikel Murga, and Marta C González.
that, in contrast to NMF, there is no geographical clustering         Origin–destination trips by purpose and time of day inferred from
                                                                      mobile phone data. Transportation Research Part C: Emerging
nor correspondence to any infrastructure available in the city.       Technologies, 2015.
Thus, even though PCA is a widely used dimensionality re-
                                                                   3. Natalia Andrienko and Gennady Andrienko. Visual analytics of
duction technique, it does not allow the interpretation nor           movement: An overview of methods, tools and procedures. Information
clustering of the city in the same way as NMF does.                   Visualization, 12(1):3–24, 2013.
                                                                   4. Vincent D Blondel, Adeline Decuyper, and Gautier Krings. A survey of
CONCLUSIONS                                                           results on mobile phone datasets analysis. EPJ Data Science, 4(1):1,
                                                                      2015.
We proposed the concept of mobilicities, which denotes the
different cities experienced by the inhabitants of a big city,     5. Andrey Bogomolov, Bruno Lepri, Jacopo Staiano, Emmanuel Letouzé,
                                                                      Nuria Oliver, Fabio Pianesi, and Alex Pentland. Moves on the street:
and depict its dynamics with respect to mobility and usage            Classifying crime hotspots using aggregated anonymized data on people
of modes of transportation. The suitability of NMF to this            dynamics. Big Data, 3(3):148–158, 2015.
kind of spatial data could be related to the fact that NMF is      6. Jean-Philippe Brunet, Pablo Tamayo, Todd R. Golub, and Jill P. Mesirov.
equivalent to spectral clustering [15], which has performed           Metagenes and molecular pattern discovery using matrix factorization.
well when grouping trip destination data [12]. However, as we         Proceedings of the National Academy of Sciences, 101(12):4164–4169,
have noted in our motivation, our input is not destination nor        2004.
origin data; instead, it is spatial location while moving. This    7. Francesco Calabrese, Giusy Di Lorenzo, Liang Liu, and Carlo Ratti.
                                                                      Estimating origin-destination flows using mobile phone location data.
focus was inspired by the book Happy City [32], reflecting            IEEE Pervasive Computing, 10(4):0036–44, 2011.
that our purpose was to help domain experts and policy de-         8. Francesco Calabrese, Laura Ferrari, and Vincent D Blondel. Urban
signers to make better, happier cities. Such purpose implies          sensing using mobile phone network data: a survey of research. ACM
collaboration between the emerging field of data science and          Computing Surveys (CSUR), 47(2):25, 2015.
the corresponding disciplines – transportation and urban plan-     9. Davide Castelvecchi. Can we open the black box of AI? Nature News,
ning. However, evidence-based policy in those areas requires          538(7623):20, 2016.
transparency and interpretability, and many state of the art      10. Tao Cheng, Garavig Tanaksaranond, Chris Brunsdon, and James
machine learning techniques do not offer both qualities [9].          Haworth. Exploratory visualisation of congestion evolutions on urban
In this paper, we have shown that NMF does offer both qual-           transport networks. Transportation Research Part C: Emerging
                                                                      Technologies, 36:296–306, 2013.
ities when applied to mobility data, and thus, is a promising
                                                                  11. Andrzej Cichocki and Anh-Huy Phan. Fast local algorithms for large
technique to apply in the field of Urban Computing [46].              scale nonnegative matrix and tensor factorizations. IEICE transactions
Limitations and Future Work. Critics may rightly say that             on fundamentals of electronics, communications and computer sciences,
                                                                      92(3):708–721, 2009.
we need a well-defined criteria to choose k. Future work
                                                                  12. Justin Cranshaw, Raz Schwartz, Jason Hong, and Norman Sadeh. The
should tackle this limitation using intelligent user interfaces       Livehoods Project: Utilizing social media to understand the dynamics of
aimed at domain experts. This opens two lines of research             a city. In Sixth International AAAI Conference on Weblogs and Social
within the IUI: on the one hand, we could try other factor-           Media, 2012.
Figure 9: Mobilicities obtained with k = 12.

Figure 10: Tower-component results of the application of Truncated SVD/PCA to the Waypoints Matrix, with k = 8.
13. Tamas David-Barrett, Janos Kertesz, Anna Rotkirch, Asim Ghosh, Kunal
    Bhattacharya, Daniel Monsivais, and Kimmo Kaski. Communication
    with family and friends across the life course. PloS one,
    11(11):e0165687, 2016.
14. Marco De Nadai, Jacopo Staiano, Roberto Larcher, Nicu Sebe, Daniele
    Quercia, and Bruno Lepri. The death and life of great Italian cities: a
    mobile phone data perspective. In Proceedings of the 25th International
    Conference on World Wide Web, pages 413–423. International World
    Wide Web Conferences Steering Committee, 2016.
15. Chris Ding, Xiaofeng He, and Horst D Simon. On the equivalence of
    nonnegative matrix factorization and spectral clustering. In Proceedings
    of the 2005 SIAM International Conference on Data Mining, pages
    606–610. SIAM, 2005.
16. Vanessa Frias-Martinez, Cristina Soguero, and Enrique Frias-Martinez.
    Estimation of urban commuting patterns using cellphone network data.
    In Proceedings of the ACM SIGKDD International Workshop on Urban
    Computing, pages 9–16. ACM, 2012.
17. Attila Frigyesi and Mattias Höglund. Non-negative matrix factorization
    for the analysis of complex gene expression data: identification of
    clinically relevant tumor subtypes. Cancer informatics, 6:275–292,
    2008.
18. Eduardo Graells-Garrido, Oscar Peredo, and José García. Sensing urban
    patterns with antenna mappings: the case of Santiago, Chile. Sensors,
    16(7):1098, 2016.
19. Eduardo Graells-Garrido and Diego Saez-Trumper. A day of your days:
    estimating individual daily journeys using mobile data to understand
    urban flow. In Proceedings of the Second International Conference on
    IoT in Urban Space, pages 1–7. ACM, 2016.
20. Randolph Hall. Handbook of transportation science, volume 23.
    Springer Science & Business Media, 2012.
21. Martin Hilbert. Big data for development: A review of promises and
    challenges. Development Policy Review, 34(1):135–174, 2016.
22. Lucie N. Hutchins, Sean M. Murphy, Priyam Singh, and Joel H. Graber.
    Position-dependent motif characterization using non-negative matrix
    factorization. Bioinformatics, 24(23):2684–2690, 2008.
23. Md Shahadat Iqbal, Charisma F Choudhury, Pu Wang, and Marta C
    González. Development of origin–destination matrices using mobile
    phone call data. Transportation Research Part C: Emerging
    Technologies, 40:63–74, 2014.
24. Da Kuang, Chris Ding, and Haesun Park. Symmetric Nonnegative Matrix
    Factorization for Graph Clustering, pages 106–117. 2012.
25. Santiago Larraín, Denis Parra, and Alvaro Soto. Towards improving
    top-N recommendation by generalization of SLIM. In RecSys Posters,
    2015.
26. Maxime Lenormand, Miguel Picornell, Oliva G Cantú-Ros, Thomas
    Louail, Ricardo Herranz, Marc Barthelemy, Enrique Frías-Martínez,
    Maxi San Miguel, and José J Ramasco. Comparing and modelling land
    use organization in cities. Royal Society Open Science, 2(12):150449,
    2015.
27. Jiajun Liu, Kun Zhao, Saeed Khan, Mark Cameron, and Raja Jurdak.
    Multi-scale population and mobility estimation with geo-tagged tweets.
    In Data Engineering Workshops (ICDEW), 2015 31st IEEE
    International Conference on, pages 83–86. IEEE, 2015.
28. Kevin Lynch. The image of the city, volume 11. MIT press, 1960.
29. A Paolo Masucci, Joan Serras, Anders Johansson, and Michael Batty.
    Gravity versus radiation models: On the importance of scale and
    heterogeneity in commuting flows. Physical Review E, 88(2):022812,
    2013.
30. Wes McKinney. Data structures for statistical computing in Python. In
    Proceedings of the 9th Python in Science Conference, volume 445, pages
    51–56, 2010.
31. Graham McNeill, Jonathan Bright, and Scott A Hale. Estimating local
    commuting patterns from geolocated Twitter data. EPJ Data Science,
    6(1):24, 2017.
32. Charles Montgomery. Happy city: transforming our lives through urban
    design. Macmillan, 2013.
33. Benyounes Moumni, Vanessa Frias-Martinez, and Enrique
    Frias-Martinez. Characterizing social response to urban earthquakes
    using cell-phone network data: the 2012 Oaxaca earthquake. In
    Proceedings of the 2013 ACM conference on Pervasive and ubiquitous
    computing adjunct publication, pages 1199–1208. ACM, 2013.
34. Anastasios Noulas, Cecilia Mascolo, and Enrique Frias-Martinez.
    Exploiting Foursquare and cellular data to infer user activity in urban
    environments. In 2013 IEEE 14th International Conference on Mobile
    Data Management, pages 167–176. IEEE, 2013.
35. F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion,
    O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg,
    J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and
    E. Duchesnay. Scikit-learn: Machine learning in Python. Journal of
    Machine Learning Research, 12:2825–2830, 2011.
36. Chengbin Peng, Xiaogang Jin, Ka-Chun Wong, Meixia Shi, and Pietro
    Liò. Collective human mobility pattern from taxi trips in urban area.
    PloS one, 7(4):e34487, 2012.
37. Fernando Pérez and Brian E Granger. IPython: a system for interactive
    scientific computing. Computing in Science & Engineering, 9(3):21–29,
    2007.
38. Olga Lucia Puertas, Cristian Henríquez, and Francisco Javier Meza.
    Assessing spatial dynamics of urban growth using an integrated land use
    model. Application in Santiago metropolitan area, 2010–2045. Land Use
    Policy, 38:415–425, 2014.
39. Heiko Rüger, Simon Pfaff, Heide Weishaar, and Brenton M Wiernik.
    Does perceived stress mediate the relationship between commuting and
    health-related quality of life? Transportation research part F: traffic
    psychology and behaviour, 50:100–108, 2017.
40. Jose San Pedro, Davide Proserpio, and Nuria Oliver. Mobiscore: towards
    universal credit scoring from mobile phone data. In International
    Conference on User Modeling, Adaptation, and Personalization, pages
    195–207. Springer, 2015.
41. Victor Soto, Vanessa Frias-Martinez, Jesus Virseda, and Enrique
    Frias-Martinez. Prediction of socioeconomic levels using cell phone
    records. In International Conference on User Modeling, Adaptation, and
    Personalization, pages 377–388. Springer, 2011.
42. Jameson L Toole, Michael Ulm, Marta C González, and Dietmar Bauer.
    Inferring land use from mobile phone activity. In Proceedings of the
    ACM SIGKDD international workshop on urban computing, pages 1–8.
    ACM, 2012.
43. Carmen Karina Vaca, Daniele Quercia, Francesco Bonchi, and Piero
    Fraternali. Taxonomy-based discovery and annotation of functional areas
    in the city. In Ninth International AAAI Conference on Web and Social
    Media, 2015.
44. R Baeza Yates and B Ribeiro Neto. Modern information retrieval: the
    concepts and technology behind search. Addison-Wesley Professional,
    2011.
45. Jing Yuan, Yu Zheng, and Xing Xie. Discovering regions of different
    functions in a city using human mobility and POIs. In Proceedings of the
    18th ACM SIGKDD international conference on Knowledge discovery
    and data mining, pages 186–194. ACM, 2012.
46. Yu Zheng, Licia Capra, Ouri Wolfson, and Hai Yang. Urban computing:
    concepts, methodologies, and applications. ACM Transactions on
    Intelligent Systems and Technology (TIST), 5(3):38, 2014.
47. Yu Zheng, Lizhu Zhang, Xing Xie, and Wei-Ying Ma. Mining interesting
    locations and travel sequences from GPS trajectories. In Proceedings of
    the 18th international conference on World wide web, pages 791–800.
    ACM, 2009.