=Paper=
{{Paper
|id=Vol-2841/BMDA_10
|storemode=property
|title=City Indicators for Mobility Data Mining
|pdfUrl=https://ceur-ws.org/Vol-2841/BMDA_10.pdf
|volume=Vol-2841
|authors=Mirco Nanni,Agnese Bonavita,Riccardo Guidotti
|dblpUrl=https://dblp.org/rec/conf/edbt/NanniBG21
}}
==City Indicators for Mobility Data Mining==
<pdf width="1500px">https://ceur-ws.org/Vol-2841/BMDA_10.pdf</pdf>
<pre>
                              City Indicators for Mobility Data Mining
                   Mirco Nanni                                           Agnese Bonavita                             Riccardo Guidotti
                  ISTI-CNR, Pisa                                    Scuola Normale Superiore                           University of Pisa
                    Pisa, Italy                                             Pisa, Italy                                    Pisa, Italy
              mirco.nanni@isti.cnr.it                                agnese.bonavita@sns.it                      riccardo.guidotti@di.unipi.it

ABSTRACT                                                                               from networks that represent the places and movement of single
Classifying cities and other geographical units is a classical task in                 users; last, characteristics of road networks and how traffic is
urban geography, typically carried out through manual analysis                         distributed in them. The group of global city indicators, instead,
of specific characteristics of the area. The primary objective of                      looks at the mobility between cities as a graph, where each city
this paper is to contribute to this process through the definition                     is represented by a node, and extracts network features for each
of a wide set of city indicators that capture different aspects                        node. Both the complete network and the ego-network for each
of the city, mainly based on human mobility and automatically                          city are considered.
computed from a set of data sources, including mobility traces                             After describing all the city indicators we introduce a mobility
and road networks. The secondary objective is to prove that such a                     prediction problem, and we use it to test how much predictive
set of characteristics is indeed rich enough to support a simple                       models are transferable across different regions. In particular, we
task of geographical transfer learning, namely identifying which                       study the relationship between transferability between two areas,
groups of geographical areas can share with each other a basic                         i.e. the performances of a model built on one area and used to
traffic prediction model. The experiments show that similarity in                      make predictions on the other one, and their similarity in terms
terms of our city indicators also means better transferability of                      of city indicators. The results confirm our hypothesis that cities
predictive models, opening the way to the development of more                          with similar indicators are more likely to be transfer-compliant,
sophisticated solutions that leverage city indicators.                                 thus providing a first guide to understanding which predictive models
                                                                                       can be reused in other areas.
                                                                                           Finally, a key feature of this work is that all methods are
1    INTRODUCTION                                                                      implemented in a way that makes it possible to automatically
Classifying a geographical territory into semantic categories is                       calculate all characteristics for hundreds of different cities and
one of the most common tasks in research areas such as urban                           entire regions. The resulting software (a Python library) enables
geography, urban planning and mobility data analytics [7]. Char-                       the user to process an unlimited amount of data simply by passing
acterizing human mobility is a key component of this process,                          a database with trajectories and a list containing the positions of
and it is well known that mobility often does not work the same                        the geographical areas of interest as an input.
way across different regions. A movement pattern in a moun-                                The rest of this paper is organized as follows. Section 2 in-
tainous countryside may have other implications than the same                          troduces some related works; Section 3 presents the dataset and
pattern has in the suburbs of a large town. The movement trajec-                       geographical areas used as testbed in the paper; Sections 4 and
tories in a planned city with rectangular streets and strict zoning                    5 describe, respectively, the local and global city indicators; Sec-
laws might be completely different than the ones in a town that                        tion 6 presents our case study on evaluating the relations between
has grown organically without any clear structure. Therefore,                          geographical transferability of a simple predictive model between
any kind of property that was learned in a particular area, in                         any pair of areas and their similarity based on our indicators;
general cannot simply be assumed to hold in another one.                               finally, Section 7 closes the paper with conclusive remarks.
    This paper aims at making a first step towards the characteri-
zation of a geographical area. That is achieved through a range                        2   RELATED WORK
of quantitative measures that provide a multilayer description                         Characterizing urban spaces is a fundamental task of urban
of urban regions and are a means for displaying differences                            geography, which considers the spatial distribution of spaces and
between cities, municipalities, or other geographical units. Such a                    patterns of movement, focusing both on structural properties
numerical description of urban areas can have a wide spectrum of                       and how the different parts interact [7]. Historically, determining
applications. Among them, the measures presented in this work                          such characteristics was usually a domain expert-driven process,
can be used as an input for geographical transfer learning, that                       that required a huge amount of time, and also particular care to
is the transformation of knowledge gained in one geographical                          ensure that results are comparable across different places. Ge-
region in order to apply it to another region. This problem will                       ographical Information Science introduced several innovations
be considered as a case study for the extracted indicators.                            that helped also to automatize and extend the approach, includ-
    We consider two main approaches: (i) computing features that                       ing statistical methods for geography [33] and computational
describe each area isolated from the others, that we call local city                   tools for managing large databases of information.
indicators; and (ii) computing features that describe its relation                        In another direction, city indicators have an important
with the others, named global city indicators. The first group                         application in defining the sustainability characteristics of urban areas.
covers four different families of measures: spatial concentration                      Various attempts have been made to design indicators for moni-
indexes of human activities; network features of intra-city traffic                    toring sustainability at various levels, such as national [11] and
flows; mobility characteristics of the individual mobility, obtained                   city level [32]. As described in the review paper [17], the litera-
© 2021 Copyright for this paper by its author(s). Published in the Workshop Proceed-   ture covers a wide range of aspects, including mobility-related
ings of the EDBT/ICDT 2021 Joint Conference (March 23–26, 2021, Nicosia, Cyprus)       ones (e.g. mobility space usage and functional diversity). How-
on CEUR-WS.org. Use permitted under Creative Commons License Attribution 4.0
International (CC BY 4.0)                                                              ever, very few attempts were made to systematically exploit big
                                                                                       data sources to estimate them. One example was the Air Quality
                                                                       4.1    Spatial Concentration
                                                                       Spatial concentration is one of the most important aspects in the
                                                                       description of urban regions and answers the question: how does the
                                                                       density of people and activities vary across the area? This question
                                                                       was traditionally focused on people’s residency and workplace,
                                                                       since that was the only available data, mostly coming from census
                                                                       or government records. More recent research is profiting from
                                                                       the availability of more detailed data from mobile phones, vehicle
                                                                       trackers and satellite imaging [2, 22, 39]. Spatial concentration
                                                                       is used in a vast range of different fields [16, 19, 20, 23, 36]. In
                                                                       this work, the concept of spatial concentration is focused on the
                                                                       overall amount of mobility, undifferentiated by types of activity.
                                                                       The question of interest is: are the activities concentrated in cluster-
                                                                       like centers of high density or are they spread-out across the map?
Figure 1: The areas of study: 10×10km squares centered on                 In the following, we present three approaches to answer this
each municipality in Tuscany.                                          question: spatial entropy, Moran’s measure, and the average near-
                                                                       est neighbor distance. The first two approaches can only be calcu-
                                                                       lated after the geographical space has been partitioned into a set
                                                                       of disjoint areas. In this work, we do that adopting an equally-
Now EU project [9], which used vehicular and public transport
                                                                       spaced grid, and divide the 100 km² region representing each area
data to infer some measures. Yet, that is limited to direct and sim-
                                                                       using different resolutions, including a grid of 10x10 (i.e. each
ple ones, such as traffic, speeds and exposure to pollution. The
                                                                       cell is a square of side 1 km), 20x20 and 50x50 cells.
literature also considers mobility indicators and road network
properties as potential measures to adopt, which is aligned with          4.1.1 Entropy. It can be used to measure how equally activities
our approach [38].                                                     are distributed across the grid. Let 𝑋 be a discrete random
    Finally, exploiting big mobility data to understand the proper-    variable modeling the positions of an individual ending up in 𝑛
ties of geographical spaces is a very active area. It includes data    different fields [5]. The entropy is defined as [35]:
mining methods to find mobility patterns and regularities [15],
simulations to estimate various indicators, like the impact of                E(X) = -\sum_{i=1}^{n} P(x_i) \log P(x_i)
alternative transportation means as car pooling [20], the visual
exploration of patterns and contextual features [13], etc. How-
                                                                       where {𝑥 1, . . . , 𝑥𝑛 } are the possible values of 𝑋 and 𝑃 (𝑥𝑖 ) is the
ever, to the best of our knowledge, no existing work tried to
                                                                       probability of 𝑋 being in state 𝑖. For maximum entropy (𝑙𝑜𝑔(𝑛))
collect a wide set of complex indicators in a systematic and re-
                                                                       there is an equal amount of activity in all fields; for minimum
producible way, directly aimed to make cities comparable in a
                                                                       entropy (0) all the activity is amassed in a single field. In order
computational way.
                                                                       to compare entropy scores of different-sized grids, the measure
                                                                       must be normalized by dividing it by the expected entropy of a
3    DATASET                                                           uniform distribution, i.e., 𝑙𝑜𝑔(𝑛).
The testbed considered in this paper is a dataset of GPS traces of
private vehicles provided within the Track&Know project¹, moving          4.1.2 Moran’s I. It overcomes a weakness of the entropy, which ignores
in the Tuscany region, Italy. The experiments were performed           how the fields are positioned in space, by measuring spatial autocorrelation
on a sample of 18.9 million trajectories of 250,239 cars, which        [33], i.e. the degree to which the fields’ values
were collected during a period of seven weeks. The geographical        are correlated to the value of neighboring fields. For spatial
unit adopted to model a “city” is the municipality.                    autocorrelation, the nearness between all pairs of fields must be
   All measures were calculated separately for each of the 276         defined with a so-called weight matrix 𝑤, where 𝑤𝑖 𝑗 is the near-
municipalities of Tuscany. For the sake of simplicity and applica-     ness between nodes 𝑖 and 𝑗. A simple form of weight matrix is
bility to a wider range of situations, the areas to investigate were   an adjacency matrix, with the value 1 if fields are adjacent, 0
chosen to be a 10 × 10 km rectangle for each municipality, with        otherwise. An important difference to the entropy is that spatial
the sides approximately parallel to the meridians and parallels,       autocorrelation has two directions. A high autocorrelation indi-
centered around the town or village center (see Figure 1).             cates that values of the same magnitude are prone to be next to
   It should be noted that, for the purpose of this work, only a       each other, while a low autocorrelation means that similar values
partial subset of trajectories is considered, namely those starting    are less likely to be near each other than under random posi-
and ending in Tuscany, and lasting less than 24 hours (indeed,         tioning. Somewhere in between lies a value of autocorrelation in
longer trips are exceptional and not very representative).             which the population of the fields is how one would expect it to
                                                                       be under a random distribution with no spatial autocorrelation.
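
To make the notion of spatial autocorrelation on a grid concrete, the following minimal sketch computes the standard Moran's I statistic with a binary rook-adjacency weight matrix (fields sharing a side have weight 1, all others 0). The function name morans_i and the toy grid are illustrative assumptions, not part of the library described in this paper.

```python
def morans_i(grid):
    """Moran's I for a 2D grid of activity counts (rook adjacency, weights 0/1)."""
    rows, cols = len(grid), len(grid[0])
    values = [v for row in grid for v in row]
    n = len(values)
    mean = sum(values) / n
    # Denominator: sum of squared deviations from the mean field value.
    denom = sum((v - mean) ** 2 for v in values)
    if denom == 0:
        raise ValueError("Moran's I is undefined when all fields are equal")
    num = 0.0      # sum over adjacent ordered pairs of the deviation products
    w_sum = 0      # W: sum of all weights (number of adjacent ordered pairs)
    for r in range(rows):
        for c in range(cols):
            for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                rr, cc = r + dr, c + dc
                if 0 <= rr < rows and 0 <= cc < cols:
                    num += (grid[r][c] - mean) * (grid[rr][cc] - mean)
                    w_sum += 1
    return (n / w_sum) * (num / denom)


# Hypothetical toy grids: activity clustered on one side vs. a checkerboard.
clustered = [[1, 1, 0, 0] for _ in range(4)]
checkerboard = [[1, 0], [0, 1]]
i_clustered = morans_i(clustered)
i_checkerboard = morans_i(checkerboard)
```

In this toy setting the clustered grid yields a positive value and the checkerboard a negative one, matching the two directions of autocorrelation described in the text.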
                                                                       The most famous autocorrelation measure is Moran’s I [26]:
4    LOCAL CITY INDICATORS
Here we introduce the local city indicators designed individually         I(X) = \frac{N}{W} \frac{\sum_i \sum_j w_{ij} (x_i - \bar{x})(x_j - \bar{x})}{\sum_i (x_i - \bar{x})^2}
for each municipality. They are grouped in spatial concentration
measures, flow measures, individual mobility and street network.       where 𝑁 is the number of fields, 𝑥 is the amount of activity or
                                                                       population, 𝑥¯ is the average field value, and 𝑊 is the sum of all
                                                                       the weights. The minimum and maximum values of Moran’s I
1 https://trackandknowproject.eu/
                                                                       depend on the weight matrix. We highlight that the absence of
autocorrelation corresponds to a Moran’s I of −1/(𝑁 − 1), which       have the same expected weight. We highlight that the direction
tends to zero in grids with a high number of fields.                  of traffic flow is not important here. Thus, the grid networks in
                                                                      this work are transformed into non-directed networks before the
   4.1.3 Nearest Neighbor Distance. The Average Nearest Neigh-
                                                                      modularity is calculated. Modularity does not describe a network
bor Distance (ANND) is not dependent on a grid and its param-
                                                                      on its own, but a network along with its partition. In order to
eters. For every point, the distance to its nearest neighbor is
                                                                      quantify how well an urban region is separable into different
calculated. The mean of those values is the ANND:
      ANND = \frac{1}{N} \sum_i \min(d_i)                             sub-areas we adopt the Louvain Algorithm [6], which does not guarantee
                                                                      an optimal solution but performs well empirically.
where 𝑑𝑖 is a vector containing the distances of point 𝑖 to all           4.2.3 Interaction Models. The flow network allows us to test
the other points, and 𝑁 is the number of points. The lower the        how well the empirical data aligns with two established models
ANND, the higher is the average spatial concentration in the          that describe human interaction in space. The idea of the Gravitation
areas surrounding the points. We highlight that this definition       Model [1] is that the traffic flow from place 𝑖 to place 𝑗
bears a weakness similar to that of the entropy. The expected ANND    depends on the origin population 𝑚𝑖 and the destination population
under the assumption of a uniform distribution of points across the   𝑛𝑗. Highly populated places attract flow towards them. The
area is the Mean Random Nearest Neighbor Distance (MRNND):            classic model predicts the traffic flow from 𝑖 to 𝑗, which are at a
      MRNND = 0.5 \sqrt{A/N},                                         distance 𝑟, as G_{ij} = A m_i^{\alpha} n_j^{\beta} / r^{\gamma}, where 𝐴 is a normalization factor,
where 𝐴 is the surface of the area and 𝑁
the amount of points. By dividing the ANND by MRNND we                and 𝛼, 𝛽, 𝛾 are the model’s parameters. They can be optimized by
obtain the Nearest Neighbor Index (NNI ) which is comparable          multiple regression when fitting data to the model. In this work
among samples with different sizes and areas. An NNI smaller          we adopt a simpler model [25] with 𝛼 = 𝛽 = 1. The Radiation
than 1 indicates a higher spatial concentration than in a random      Model [37] updates 𝐺𝑖𝑗 by introducing 𝑠𝑖𝑗, that is the population
case, while a value above 1 shows that the points are spread out      within a circle around place 𝑖, whose radius is its distance to
across the map more than one expects in a random scenario.            place 𝑗, minus 𝑚𝑖 and 𝑛 𝑗 . The intuition is that outgoing trips are
                                                                      being attracted by nearby populations [25]. It predicts the flow
4.2     Flows in a Grid Network                                       𝑇𝑖𝑗 as \frac{T_i}{1 - m_i/M} \cdot \frac{m_i n_j}{(m_i + s_{ij})(m_i + n_j + s_{ij})}, where 𝑇𝑖 is the sum of outflows
In order to capture the information about flows in urban regions,     from 𝑖, and M = \sum_i m_i is the total sample population.
the data can be transformed into a directed weighted graph that
represents the flow of the people’s trajectories:                     4.3     Individual Mobility
      • a set of nodes 𝑉 representing places that are origins and
                                                                      Here we consider the mobility at the level of individual users. From
        destinations of trajectories,
                                                                      this perspective, urban regions can be described by aggregated
      • a set of edges 𝐸 representing the directed connections
                                                                      values of their inhabitants’ mobility, therefore a set of statistics
        between the nodes,
                                                                      are calculated for each individual from their trajectories:
      • a weight function 𝑤 : 𝐸 → R that maps each edge to a
        weight, which indicates the amount of trajectories that             • Average distance and duration per trip
        occur along the edge.                                               • Average driving distance and duration per day
The map is split into fields of a grid and all origins and destinations     • Average amount of trips per day
of the trajectories in the area are assigned to the field in             Also, following the methods described in [21, 31], individuals’
which they lie. The network is created by assigning every node to     mobility data can be transformed into an Individual Mobility
a cell, and giving each edge as weight the amount of flows occurring  Network (IMN), which is a representation of a person’s travel
along the edge. The weight function 𝑤 is equivalent to an origin      behavior in the form of a weighted directed network, where the
destination matrix. The network allows us to gain knowledge           set of nodes 𝑉 represents places that are visited once or repeatedly
about the structure of a region by looking at the properties of       by the individual, and the edges 𝐸 represent trajectories
the resulting network described in the following.                     from one of those places to another. The edges’ weights model
                                                                      the number of times a trajectory was followed from one node
   4.2.1 Node Degrees. A basic property of the network is the
                                                                      to another. From an IMN, we can describe the individual’s travel
distribution of its degrees. Degree is hereby defined as the total
                                                                      behavior with the following indicators:
traffic (sum of in- and out-flow) of a grid field. This measure is
sometimes also referred to as node-flux [34].                               • Size of the network: number of nodes and edges.
                                                                            • Temporal-uncorrelated entropy: measures how equally the
   4.2.2 Louvain Modularity. An interesting quality of networks
                                                                              different places of the IMN are visited.
is the degree to which nodes can be partitioned into groups,
                                                                            • Radius of gyration [28]: approximates the average distance
such that the connectivity is high within those groups, and low
                                                                              of an individual from its center of mass [18].
in between. In the context of urban regions, the corresponding
                                                                            • Regularity of trajectories: percentage of trips that are driven
question is: can the city be split into areas that are relatively
                                                                              more often than a certain threshold per time [19, 21].
autonomous and have only low interaction between them? In
                                                                            • Modularity: the Louvain Algorithm [6] applied to the IMN .
network science, modularity measures this property for a given
partitioning: a graph partitioning separates the graph’s nodes into
non-overlapping communities. Modularity shows the difference          4.4     Roads and Traffic
between the relative amount of inner-community links and the             4.4.1 Static Road Network. This section focuses on the road
expected relative amount under random linking in a non-directed       network modeled as a directed graph 𝐺 = (𝐸, 𝑉), where 𝑉 is
weighted graph [4]. The modularity goes from −1 to +1, where 0        a set of nodes representing road intersections, 𝐸 is the set of
marks the value expected in a network where all possible edges        directed edges which model the road segments, and 𝑙 : 𝐸 → R
maps each edge to its length in meters. Some basic statistics of
the road network can be calculated:
    (1) amount of edges and nodes/node density
    (2) amount of intersections/intersection density
    (3) average node degree/average intersection degree
    (4) total length of edges/mean edge length
In addition, since nodes in any network can be evaluated w.r.t.
their centrality, we evaluate the road network’s closeness centrality
in terms of the length of the shortest path to any given node. The
average of those path lengths is a node’s average farness from
other nodes. The reciprocal of this value is a node’s closeness
centrality 𝐶(𝑥) = 1 / \sum_y 𝑑(𝑦, 𝑥), where 𝑥 and 𝑦 are nodes and 𝑑
returns the length of the shortest path between its arguments.
As distance function we consider the length as the summed road                Figure 2: Disconnected nodes vs. flow threshold.
lengths of the edges of the shortest path [30].
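
The closeness centrality just defined, with path length measured as summed edge lengths, can be sketched as follows. This is a minimal illustration on a hypothetical toy road graph: the edge dictionary, the function names, and the choice of ignoring nodes that cannot reach the target are assumptions made here and are not taken from the paper's software.

```python
import heapq


def shortest_lengths_to(target, edges):
    """Dijkstra on the reversed graph: shortest length from every node to target."""
    reverse = {}
    for (u, v), length in edges.items():
        reverse.setdefault(v, []).append((u, length))
    dist = {target: 0.0}
    heap = [(0.0, target)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale heap entry
        for prev, length in reverse.get(node, []):
            nd = d + length
            if nd < dist.get(prev, float("inf")):
                dist[prev] = nd
                heapq.heappush(heap, (nd, prev))
    return dist


def closeness(target, edges):
    """C(x) = 1 / sum_y d(y, x); unreachable nodes are skipped (assumption)."""
    dist = shortest_lengths_to(target, edges)
    total = sum(d for node, d in dist.items() if node != target)
    return 1.0 / total if total > 0 else 0.0


# Hypothetical directed road segments with lengths in meters.
roads = {("a", "b"): 100.0, ("b", "c"): 200.0, ("a", "c"): 500.0, ("c", "a"): 50.0}
c_closeness = closeness("c", roads)  # d(a, c) = 300 via b, d(b, c) = 200
```

Running Dijkstra on the reversed graph gives all distances towards one node in a single pass, which matters when the measure is computed for every intersection of a large road network.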
   4.4.2 Traffic in the Road Network. To investigate how traffic is     through Figure 2. The plot shows the number of disconnected
distributed in a road network one must map match the sequences          nodes corresponding to a selected threshold. The fraction of
of GPS locations that represent the trajectories to nodes and           “isolated” cities grows as the threshold increases, but there is
edges in the road network. There is a variety of algorithms that        a little plateau between 110 and 130, which led to our choice.
handle this problem, such as hidden Markov models [27]. In the          With the selected threshold, the final graph consists of 276 nodes
case study of this work, a simpler algorithm was implemented            (corresponding to municipalities), 22 of which are disconnected
due to the high reliability of the data. It independently maps          from the giant component.
every point of a trajectory to a node in the road network. The             The properties related to each node of the network constitute
nodes are then connected and build a path that describes the            the first set of attributes to be considered for clustering:
individual’s trajectory.
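
The point-wise matching described above can be sketched as follows. This is only an illustration under stated assumptions: coordinates are taken to be planar (in meters), nodes is a hypothetical dictionary of road-node positions, and consecutive repetitions of the same node are collapsed; how consecutive nodes are then connected along the road graph is left out.

```python
import math


def match_to_nodes(points, nodes):
    """Map each GPS point independently to its nearest road node.

    points: list of (x, y) positions of one trajectory, in temporal order.
    nodes:  dict mapping node id -> (x, y) position (hypothetical input format).
    Returns the sequence of visited node ids without consecutive duplicates.
    """
    path = []
    for px, py in points:
        nearest = min(
            nodes,
            key=lambda n: math.hypot(nodes[n][0] - px, nodes[n][1] - py),
        )
        if not path or path[-1] != nearest:
            path.append(nearest)
    return path


# Hypothetical toy data: three intersections and four GPS fixes.
toy_nodes = {"n1": (0.0, 0.0), "n2": (100.0, 0.0), "n3": (100.0, 100.0)}
toy_points = [(2.0, 1.0), (10.0, -3.0), (95.0, 4.0), (101.0, 90.0)]
matched = match_to_nodes(toy_points, toy_nodes)
```

The linear scan over all nodes is quadratic overall; on a real road network a spatial index would normally replace it.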
   Given a map matching, it is possible to create a function that             • Self-loops: # trajectories starting and ending in that node.
reveals the fraction of total traffic that flows through a given              • In/Out degree: fraction of nodes its incoming/outgoing
percentage of the most dense roads. For this purpose, all edges are             edges are connected to.
sorted by their traffic flow in a non-ascending order. Cumulative             • Closeness: the closeness centrality of a node 𝑢 is the recip-
traffic, measured as #𝑐𝑎𝑟𝑠 × 𝑚𝑒𝑡𝑒𝑟𝑠, is calculated for the end                  rocal of the average shortest path distance (see Section 4.4).
of every edge by multiplying the edge length with the amount                  • Betweenness: the betweenness of a node 𝑣 is the sum of
of traffic flow and adding the result to the previous amount of                 the fraction of all-pairs shortest paths that pass through 𝑣.
cumulative traffic. The intermediary values within edges can                  • Clustering coefficient: the local clustering coefficient 𝐶𝑖
be calculated by linear interpolation. For any given percentage                 for a vertex 𝑣𝑖 is given by the proportion of links between
of roads, the percentage of traffic in those roads is calculated                the vertices within its neighborhood divided by the num-
by dividing the cumulative traffic until that point by the total                ber of links that could possibly exist between them.
amount of traffic.                                                            • Radius of Gyration: the radius of gyration of a city 𝑐 is
                                                                                defined as r_g(c) = \sqrt{\frac{1}{N} \sum_{i}^{N} w_i (r_i - r_{cm})^2}, where 𝑁 is the

5     GLOBAL CITY INDICATORS                                                    total number of travels from 𝑐, 𝑤𝑖 is the number of travels
In this section we introduce the global city indicators designed                from 𝑐 to 𝑖, 𝑟𝑖 is the pair of coordinates of location 𝑖 and
to compare two cities. To compare and cluster cities in groups,                 𝑟𝑐𝑚 is the center of mass (i.e., the average position) of the
we need some quantitative features. Therefore, we have to define                visited cities starting from 𝑐.
some metrics describing a city with respect to traffic. A possible            • Random Entropy: the random entropy captures the de-
approach is to exploit again a network structure where each city                gree of predictability of the destination starting from a
(in our case study, 276 municipalities in Tuscany) is a node, and               city 𝑖 if each location is visited with equal probability
edges are drawn based on the trajectories between them. Starting                𝑆𝑟𝑎𝑛 = log2 𝑀, where 𝑀 is the number of distinct cities
from the trajectories we infer descriptive attributes from two                  visited starting from city 𝑖;
perspectives: (i) graph measures from the complete network of                 • Uncorrelated Entropy: the temporal-uncorrelated entropy
cities; (ii) graph measures from the ego-network of each city.                  is the historical probability that a location 𝑗 was visited
                                                                                starting from a city 𝑖, characterizing the heterogeneity of its

5.1    Complete Network of Cities                                               visitation patterns: $S_{unc} = -\sum_{j}^{N} p_j \log p_j$, where $p_j$ is
We can derive a set of global indicators through a network of                   𝑖’s probability of visiting location 𝑗. We can also normalize
cities as described in the following. Given the trajectories on                 the uncorrelated entropy by dividing it by log2 𝑁 .
the territory, we can derive an Origin-Destination Matrix (OD),
which measures the number of trips that start from city 𝐴 and           5.2     Ego-Networks
end in city 𝐵 for each pair (𝐴, 𝐵). Since connections established       In Social Network Analysis, it is usual to refer to Ego Networks
through very few trajectories might not be significant, a threshold     as social networks made of an individual (called ego) along with
is needed to establish if an edge should be drawn. In our case          all the social links it has with other users (called alters) [3, 10].
study, after empirical evaluation, we fixed this threshold to 110       Several fundamental properties of social relationships can be char-
trajectories by analyzing the results yielded by different values       acterized by studying them. Adapting the terms to the present
context, we can obtain an ego network for each city, where the
ego is the city itself and the alters are its neighbors. The additional
set of attributes obtained consists of:

      • Number of nodes of the ego network.
      • Number of edges of the ego network.
      • Average clustering coefficient: the clustering coefficient
        is the average $C = \frac{1}{n}\sum_{v \in G} c_v$, where 𝑛 is the number of nodes
        in 𝐺 and $c_v$ is the clustering coefficient of each node.
      • Diameter: the longest shortest path in the ego network.
      • Assortativity: the Pearson correlation coefficient of
        degree between pairs of linked nodes. It measures the
        tendency of a network’s nodes to attach to others that
        are similar in some way.
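
As an illustration, the ego-network attributes listed above can be sketched in plain Python as follows. This is not the code used in the study: the adjacency-set representation, the function names and the toy graph are assumptions made for the example, edges are treated as undirected and unweighted, and the diameter is counted in hops.

```python
from collections import deque
from itertools import combinations


def ego_network(adj, ego):
    """Sub-graph induced by `ego` and its direct neighbors (the alters)."""
    nodes = {ego} | adj[ego]
    return {u: adj[u] & nodes for u in nodes}


def n_edges(g):
    # Every undirected edge appears in two adjacency sets.
    return sum(len(nbrs) for nbrs in g.values()) // 2


def local_clustering(g, v):
    k = len(g[v])
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(g[v], 2) if b in g[a])
    return 2.0 * links / (k * (k - 1))


def average_clustering(g):
    return sum(local_clustering(g, v) for v in g) / len(g)


def diameter(g):
    """Longest shortest path (in hops), using one BFS per node."""
    best = 0
    for src in g:
        dist = {src: 0}
        queue = deque([src])
        while queue:
            u = queue.popleft()
            for w in g[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        best = max(best, max(dist.values()))
    return best


def degree_assortativity(g):
    """Pearson correlation of the degrees at the two ends of each edge."""
    xs, ys = [], []
    for u in g:
        for w in g[u]:  # each edge is visited in both directions
            xs.append(len(g[u]))
            ys.append(len(g[w]))
    if not xs:
        return None  # no edges: the measure is undefined
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return None  # all degrees equal: the measure is undefined
    return cov / (vx * vy) ** 0.5


# Toy inter-city network (hypothetical): city "A" is the ego.
adj = {"A": {"B", "C", "D"}, "B": {"A", "C"}, "C": {"A", "B"},
       "D": {"A", "E"}, "E": {"D"}}
ego = ego_network(adj, "A")
features = (len(ego), n_edges(ego), average_clustering(ego),
            diameter(ego), degree_assortativity(ego))
```

In practice a graph library would normally replace these helpers; the sketch only makes the definitions concrete.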

                                                                           Figure 3: Network of correlations for the first set of at-
6     CASE STUDY: TRANSFER-COMPLIANT                                       tributes (total graph).
      GEOGRAPHICAL LOCATIONS
The huge amount of urban data generated by smartphones, vehi-
cles, and infrastructures (e.g., traffic cameras, air quality moni-
toring stations) opens up new opportunities to learn about city
dynamics from a variety of perspectives and facilitates various
smart city applications for traffic monitoring, public safety, urban
planning, etc. – all contributing to what is called urban computing.
   However, there are some questions that remain almost
unexplored: what if the administration of a city wanted to predict
the impact of an event on the urban mobility without having
historical data on it? Is it possible to infer some useful insights
exploiting the experience gained by other municipalities? Can
knowledge be transferred from any city or are there some con-
straints? How can you compare two cities, for example in terms of
urban mobility? Lately there have been different attempts to over-
come the data scarcity issue in “new” urban contexts. All these
studies have in common the application of Transfer Learning, a                     Figure 4: Dendrogram and selected clusters.
very broad family of approaches which focuses on developing
methods to transfer knowledge learned in one or more “source
tasks”, and use it to improve learning in a related “target task”.            Preprocessing and Feature Selection. Since the range of differ-
This section studies these questions in the context of Machine             ent indicators varies widely, we applied a form of normalization
Learning (ML) and big data analytics for mobility data. In partic-         to make them homogeneous. We adopted the min-max scaling,
ular, our goal is to verify the feasibility of a model transfer, i.e., a   where features are re-scaled to the interval [0, 1]. Then, we per-
ML model is trained in the source domain and then transferred              formed a study of correlation on each set of features (local and
to the target domain, in the prediction of urban traffic, exploiting       global) to eliminate unnecessary ones. To efficiently filter them,
the city indicators developed in the previous sections.                    we adopted a network-based correlations finder, where the fea-
   The basic idea is that cities that are similar can be represented       tures are interpreted as nodes of a graph, and a link is drawn
by the same model more easily than very different cities. For              between two features if they are highly correlated. As evaluation
instance, a highly populated city with heavy traffic and users             metrics, the standard Pearson’s Correlation Coefficient is used [29].
that frequently make long trips is expected to have mobility                  Considering each pair of features (𝑖, 𝑗), an edge is drawn if
dynamics very different from small, country-side cities with low           𝜌𝑖,𝑗 > 0.65. The result obtained on the global features is shown
traffic. The approach proposed in this section is developed in             as an example in Figure 3. The removal of features is an iterative
three steps: first, using a similarity measure between cities based        process that removes the node (feature) with the highest degree
on the indicators presented in Sections 4 and 5, cities are clustered      (i.e., the one correlated with the highest number of non-filtered features)
into similarity groups; next, for each city a traffic prediction task      and repeats until the average degree of the network is 0. The
is defined, which is approached through a standard machine                 remaining nodes are the features which are preserved. This pre-
learning solution (XGBoost regression [8]); finally, the prediction        processing step is applied to global and local indicators separately,
model of a city is applied to make predictions in each of the              and then on the set of survived features of both categories. Apply-
others, aiming to test whether cities in the same cluster show a           ing the procedure to our case study, the initial set of indicators,
better transferability of their models.                                    composed of a total of 178 measures, was reduced to 21 features.
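
The preprocessing and feature selection procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names and the toy data are hypothetical, ties between equally connected features are broken by insertion order, and the edge test uses 𝜌 > 0.65 literally as in the text (a variant based on the absolute value of 𝜌 would also link strongly anti-correlated features).

```python
def min_max(column):
    """Rescale a list of values to the interval [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant feature: treated as uncorrelated
    return cov / (vx * vy) ** 0.5


def filter_features(features, threshold=0.65):
    """features: dict mapping a feature name to its values (one per city)."""
    scaled = {name: min_max(col) for name, col in features.items()}
    names = list(scaled)
    # Correlation network: one node per feature, one edge per correlated pair.
    graph = {name: set() for name in names}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            if pearson(scaled[a], scaled[b]) > threshold:
                graph[a].add(b)
                graph[b].add(a)
    # Drop the most connected feature until no edge is left
    # (i.e., until the average degree of the network is 0).
    while any(graph.values()):
        worst = max(graph, key=lambda name: len(graph[name]))
        for neighbor in graph.pop(worst):
            graph[neighbor].discard(worst)
    return sorted(graph)


# Toy example: f1, f2 and f3 are nearly collinear, f4 is not.
toy = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 11],
    "f3": [1, 2, 3, 4, 6],
    "f4": [5, 1, 4, 2, 3],
}
kept = filter_features(toy)
```

On the toy data only one of the three collinear features survives, together with `f4`.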
                                                                              Hierarchical Clustering. The city clustering step has been re-
                                                                           alized through a Hierarchical agglomerative clustering schema,
6.1     City Clustering
                                                                           adopting Ward’s linkage criterion, which at each step of aggre-
In this step, the city indicators built in the previous sections are       gation aims to minimize the total within-cluster variance. In our
first preprocessed and filtered, and then used to cluster cities.          case study, a small fraction of cities turned out to be disconnected
                   cluster id   # of cities   % of cities
                       0            22           8.0
                       1            53           19.2
                       2            47           17.0
                       3           110           39.9
                       4            44           15.9
                 Table 1: Cluster Population


                                                                              Figure 6: Selected cells for some municipalities.


                                                                        from outside or going outside. Indeed, they have high values
                                                                        for entropy and low values for Moran’s I, the highest number of
                                                                        nodes regularly visited by users and a large ego network radius.
                                                                        This cluster is the most populated, comprising almost 40% of the
                                                                        dataset. Finally, cluster 4 was called Hubs, since it comprises all
                                                                        the biggest cities, encompassing most of the busiest roads in
                                                                        Tuscany. Its municipalities are quite similar to those belonging to
                                                                        cluster 3, except that they have a large Moran’s I, which reflects
                                                                        the presence of specific patterns within the city.

                                                                        6.2    Traffic Forecasting in City Grids
                                                                        Urban traffic prediction is a discipline that aims to exploit ML
                                                                        models to capture hidden traffic characteristics from substantial
                                                                        historical mobility data, and then to use the trained models to
Figure 5: Map of clustered municipalities. Colors white,                predict traffic conditions in the future [24]. However, there is
yellow, orange, red and dark red correspond resp. to clus-              a main problem to face: is it possible to extract specific traffic
ters 0, 1, 2, 3 and 4.                                                  patterns that reflect the peculiarities of a city structure?
                                                                            A Grid to Split the City. Following one of the most used ap-
from all the others in terms of flows, thus making them outliers        proaches in traffic prediction problems [24], we divide every geo-
w.r.t. the global features (e.g., the assortativity measure is null).   graphical area corresponding to a municipality into adjacent square
Therefore, we decided to put them in a separate cluster, and apply      cells with a side of 0.5 km, and our predictive objective is to fore-
the hierarchical clustering on the remaining ones. The result           cast the traffic flow that crosses a given cell. In our case study we
of applying the clustering to our dataset is shown in Figure 4 as       select a subset of representative cells and, in order to avoid the
a dendrogram of the hierarchical clusters found (notice that the        possible issues emerging when a random or top-frequency subset
dendrogram is truncated, in order to show only the last 12 aggre-       is selected, we adopt a mixed approach, randomly selecting 5 cells
gations). Based on the gaps between split/merge points in the           among those having a traffic volume above the 90th percentile
dendrogram, the aggregation is stopped at distance 4.0, yielding        over the municipality, and another 5 cells among those having a
four clusters. To these, we add another cluster (id 0) containing       traffic volume between the 80th and the 90th percentiles.
the isolated cities. A summary of the cluster sizes is in Table 1.
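
The clustering step described above (agglomerative clustering with Ward's criterion, stopped at a given merge distance) can be sketched in plain Python as follows. This is a naive illustration, not the procedure actually run in the study: the function names and the toy points are assumptions, and the merge height is scaled so that two single points merge at their Euclidean distance, which may differ from the scale on which the 4.0 cut of Figure 4 was measured.

```python
from itertools import combinations
from math import sqrt


def ward_distance(a, b):
    """Merge height of two clusters, each given as (size, centroid)."""
    (na, ca), (nb, cb) = a, b
    squared = sum((x - y) ** 2 for x, y in zip(ca, cb))
    # Proportional to the increase in within-cluster variance caused by the merge.
    return sqrt(2.0 * na * nb / (na + nb) * squared)


def ward_clusters(points, cut):
    """Merge the cheapest pair repeatedly until the next merge exceeds `cut`."""
    clusters = {i: (1, tuple(p)) for i, p in enumerate(points)}
    members = {i: [i] for i in clusters}
    next_id = len(points)
    while len(clusters) > 1:
        i, j = min(combinations(clusters, 2),
                   key=lambda ij: ward_distance(clusters[ij[0]], clusters[ij[1]]))
        if ward_distance(clusters[i], clusters[j]) > cut:
            break  # equivalent to cutting the dendrogram at height `cut`
        (ni, ci), (nj, cj) = clusters.pop(i), clusters.pop(j)
        n = ni + nj
        centroid = tuple((ni * x + nj * y) / n for x, y in zip(ci, cj))
        clusters[next_id] = (n, centroid)
        members[next_id] = members.pop(i) + members.pop(j)
        next_id += 1
    return [sorted(members[k]) for k in clusters]


# Toy data: two well separated groups of (already normalized) feature vectors.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (1.0, 1.0), (0.9, 1.0)]
groups = ward_clusters(points, cut=0.5)
```

The exhaustive pair search makes this sketch cubic in the number of cities, which is acceptable for a few hundred municipalities but not for large datasets.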
   An analysis of the properties of each cluster reveals that they         Time Series Preprocessing. Based on the trajectories that cross
may be distinguished based on the kind of traffic flows they            the representative cells identified above, we compute a time series
involve. Also, clusters are depicted on the map in Figure 5. Cluster    for each cell with a 1-hour sampling rate, by counting the number
0 was named Disconnected, since it is composed of the nodes not         of vehicles that crossed the cell within each hour of each day.
connected in the inter-city flows network. These municipalities         The first operation performed was a moving-average
also have a low entropy and a low Moran’s I score, indicating no        smoothing of the time series, since a preliminary test with the
significant traffic pattern, and most of them are located at            Augmented Dickey-Fuller test (ADF) [14] reveals that they are not
the boundary of Tuscany and in the country-side areas, where            stationary, i.e., the null hypothesis that a unit root is present in
there is a lower concentration of roads. Cluster 1, named Self          the time series sample could not be rejected (ADF=-2.38 against
Sufficient, is characterized by high entropy, high modularity and       a critical 90% threshold at -2.57). On the contrary, after smooth-
high fraction of regular trips, yet a low radius of gyration and        ing, the null hypothesis is rejected with a very large confidence
low diameter of the associated ego networks. Also, they are             (ADF=-5.57 against a 99% threshold at -3.43, p-value = $2 \cdot 10^{-5}$).
mostly far from the highways that cross the region. Cluster 2,             Predictive Features. Similarly to what is done by several time
called Visited Sites, has a very low entropy (almost as low as          series forecasting solutions [12], we base our predictions for the
those in the disconnected group), low modularity and the lowest         next value of the time series on the most recent observations of the
fraction of regular trips, and yet a relatively high betweenness.       same time series. In particular, we adopt as basic features the
Cluster 3 was named Drive Through, as these cities are crossed          24 most recent lagged values, i.e., the observations of the last
by a great flow of traffic, which is however basically coming           24 hours. We remark that in this simplified approach we do not
Figure 7: XGBoost traffic forecasting on Florence (green)
against real values (blue). The two curves differ in very
few points.


include features about other time series in the same municipal-
ity, as done in more complex solutions that exploit the spatial
autocorrelation of this kind of phenomena.                                Figure 8: Transfer scores matrix with cluster separation
   Another important property that can be encoded is related to           (red lines). Each row/column represents a municipality.
the weekday; in this regard, we introduce the Boolean feature
is_weekend that is true if the weekday is Saturday or Sunday and
false otherwise, since we expect to see different behaviors on
weekends. Finally, we can encode information about a weekday
by adding the average traffic volume on that weekday.
   Having a total of 26 new features, we can now try to forecast
the smoothed time series.
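
A possible construction of the smoothed series and of the 26 features (24 lagged values, is_weekend, and the weekday average) is sketched below. It is only an illustration under stated assumptions: the smoothing window of 3 hours, the function names and the synthetic series are hypothetical, since the text does not report these details.

```python
from datetime import datetime, timedelta


def moving_average(values, window=3):
    """Trailing moving average; the window length is an assumption."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1):i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def build_features(timestamps, values, n_lags=24):
    """Return (X, y) for one-step-ahead forecasting of an hourly series."""
    # Average traffic per weekday (0 = Monday ... 6 = Sunday).
    # In a real experiment this should be computed on the training part only.
    totals, counts = [0.0] * 7, [0] * 7
    for ts, v in zip(timestamps, values):
        totals[ts.weekday()] += v
        counts[ts.weekday()] += 1
    weekday_avg = [t / c if c else 0.0 for t, c in zip(totals, counts)]

    X, y = [], []
    for t in range(n_lags, len(values)):
        lags = values[t - n_lags:t][::-1]  # most recent observation first
        wd = timestamps[t].weekday()
        is_weekend = 1.0 if wd >= 5 else 0.0
        X.append(list(lags) + [is_weekend, weekday_avg[wd]])
        y.append(values[t])
    return X, y


# Synthetic 3-day hourly series starting on Friday 2021-01-01 (toy data).
start = datetime(2021, 1, 1, 0)
timestamps = [start + timedelta(hours=h) for h in range(72)]
values = [float(h % 24) for h in range(72)]
X, y = build_features(timestamps, values)
```

Each row of `X` can then be passed, together with the matching entry of `y`, to any regression model.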
    Predictive Model. As the regression model, we selected the popular
and effective algorithm XGBoost [8]. XGBoost has proved to be
highly reliable in regression tasks, generally providing good
prediction accuracy and remarkable execution speed, and yield-
ing robust results with its default settings,
which simplifies our task. XGBoost adopts a boosting procedure,
i.e., an ML ensemble meta-algorithm, primarily aimed at reducing bias
and variance in supervised learning, in which a set of weak learners
is turned into a single strong learner.
    In Figure 7 we can see an example of XGBoost predictions ex-
ploiting the features previously introduced over the municipality
of Florence, which shows results very close to the real values. The         Figure 9: NRMSE mean values for all train-test pairs.
model performance is evaluated through the standard Root Mean
Squared Error, defined as $\mathrm{RMSE} = \sqrt{\frac{\sum_{t=1}^{T} (\hat{y}_t - y_t)^2}{T}}$,      of prediction scores where on the rows there are the cities in
having predicted values 𝑦ˆ𝑡 for times 𝑡 of a regression’s dependent       which the model is trained and in the columns those where the
variable 𝑦𝑡 , with variables observed over 𝑇 times. RMSE is always        model is tested. The algorithm implemented iteratively trains
non-negative; the lower the value, the better the predictions.            a model on each city, tests it against all the cities and fills the
Since RMSE is scale-dependent, we adopt the Normalized RMSE               score matrix with the corresponding NRMSE score obtained. To
(NRMSE), computed as $\mathrm{NRMSE} = \mathrm{RMSE}/\sigma$, where 𝜎 is the standard       enable a more meaningful comparison, NRMSE scores are log-
deviation of the observed values.                                         transformed to reduce the skewness.
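
The two error measures defined above translate directly into code. The sketch below is an illustration only; the function names are hypothetical and the use of the population standard deviation for 𝜎 is an assumption, since the text does not say which estimator is used.

```python
from math import sqrt
from statistics import pstdev


def rmse(y_true, y_pred):
    """Root Mean Squared Error over T paired observations."""
    squared = sum((p - y) ** 2 for y, p in zip(y_true, y_pred))
    return sqrt(squared / len(y_true))


def nrmse(y_true, y_pred):
    """RMSE divided by the standard deviation of the observed values."""
    return rmse(y_true, y_pred) / pstdev(y_true)


observed = [2.0, 4.0, 6.0, 8.0]
predicted = [2.0, 4.0, 6.0, 12.0]
error = rmse(observed, predicted)
normalized = nrmse(observed, predicted)
# Baseline: always predicting the mean of the observations.
baseline = nrmse(observed, [5.0] * 4)
```

With this normalization, a model that always predicts the mean of the observed values obtains an NRMSE of 1, so values below 1 indicate an improvement over that trivial baseline.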
   Empirical evaluation shows that the most important feature is             The final result is shown in Figure 8, which displays the
the value of traffic 1 hour before, as expected, while the previous       transfer scores with the cities sorted by cluster member-
hours all have a comparable influence. Instead, it is apparently          ship. Keeping in mind that the squares around the diagonal
almost irrelevant to know if a day is a week-end day or not.              represent training and testing on cities of the same cluster, while
                                                                          the other rectangles depict training and testing on different clus-
6.3    Testing Model Transferability                                      ters, we can observe:
In this section we study the transferability of the predictive mod-          (1) the transfer is far better between cities of the same cluster
els built above, and its relation with the similarity groups found               (the NRMSE values obtained making predictions on a mu-
through clustering. The hypothesis we want to test is that the                   nicipality using a model built on a different one are lower
similarity based on our city indicators is indeed useful to identify             if the two belong to the same cluster);
groups of areas such that any model built from an area in the                (2) it is worth noting that cluster 0, which we built
cluster is usable in other areas within the same cluster.                        artificially, also behaves like the others;
   The first step is to split the traffic time series of each city into      (3) the matrix is not symmetric: training on city 𝐴 and testing
training and test sets. In this way it is possible to obtain a matrix            on 𝐵 is different from training on 𝐵 and testing on 𝐴.
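
The construction of the transfer score matrix and of its per-cluster averages can be sketched as follows. This is a schematic illustration, not the code of the experiments: `train` and `evaluate` are hypothetical callables standing in for model fitting and NRMSE computation, and the toy cities, levels and cluster labels are invented for the example.

```python
from math import log


def transfer_matrix(cities, train, evaluate):
    """scores[a][b] = log-NRMSE of the model trained on a and tested on b."""
    scores = {}
    for a in cities:
        model = train(a)  # fit on the training split of city a
        scores[a] = {b: log(evaluate(model, b)) for b in cities}
    return scores


def cluster_block_means(scores, cluster_of):
    """Average score for every (source cluster, target cluster) pair."""
    sums, counts = {}, {}
    for a, row in scores.items():
        for b, s in row.items():  # the case source = target is included
            key = (cluster_of[a], cluster_of[b])
            sums[key] = sums.get(key, 0.0) + s
            counts[key] = counts.get(key, 0) + 1
    return {key: sums[key] / counts[key] for key in sums}


# Toy stand-ins: a "model" is just the traffic level of its source city, and
# the error grows with the gap between source and target levels.
level = {"A": 1.0, "B": 1.1, "C": 3.0, "D": 3.2}
cluster_of = {"A": 0, "B": 0, "C": 1, "D": 1}
scores = transfer_matrix(
    list(level),
    train=lambda city: level[city],
    evaluate=lambda model, city: 0.1 + abs(model - level[city]),
)
means = cluster_block_means(scores, cluster_of)
```

Each entry of `means` corresponds to one rectangle of the score matrix, with the diagonal keys `(k, k)` holding the within-cluster averages.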
The trend noticed in Figure 8 can be better identified by comput-                           [7] Harold Carter. 1995. The Study of Urban Geography. E. Arnold publications.
ing the average error among the clusters, i.e., considering all the                         [8] Tianqi Chen et al. 2016. XGBoost: A Scalable Tree Boosting System.
                                                                                            [9] CITEAIR consortium. 2007. Air Quality in Europe web site. http://www.
possible source areas in each cluster (where the models are built)                              airqualitynow.eu/ [Online; accessed 21-December-2020].
and all the possible target areas in each other cluster (where the                         [10] Michele Coscia, Giulio Rossetti, et al. 2012. Demon: a local-first discovery
                                                                                                method for overlapping communities. In ACM SIGKDD. 615–623.
model is tested), including the case source = target. This is shown                        [11] H.G De Sherbinin, A.; Bittar. London, UK, 2003. The Role of Sustainability
in Figure 9, where each bar corresponds to one of the rectangles                                Indicators as a Tool for Assessing Territorial. Environmental Competitiveness;
outlined in red in Figure 8. We observe that the lowest mean                                    International Forum for Rural Development (London, UK, 2003).
                                                                                           [12] George E. P. Box et al. 2015. Time Series Analysis: Forecasting and Control.
values are always those corresponding to central squares, where                                 John Wiley and Sons.
the source and the target cities are from the same cluster. Overall,                       [13] Liu F. et al. 2020. Citywide Traffic Analysis Based on the Combination of
these results confirm our hypothesis, namely that the similarity                                Visual and Analytic Approaches. J geovis spat anal 4, 15 (2020).
                                                                                           [14] W. A. Fuller. 1976. Introduction to Statistical Time Series. John Wiley and Sons.
of cities based on our city indicators is a good proxy of model                            [15] Fosca Giannotti, Mirco Nanni, Dino Pedreschi, et al. 2011. Unveiling the
transferability, at least for the simple predictive task we adopted.                            complexity of human mobility by querying and mining massive trajectory
                                                                                                data. The VLDB Journal 20, 5 (Oct. 2011), 695–719.
                                                                                           [16] Fosca Giannotti, Mirco Nanni, Fabio Pinelli, and Dino Pedreschi. 2007. Tra-
7     CONCLUSIONS                                                                               jectory pattern mining. In Proceedings of the 13th ACM SIGKDD international
                                                                                                conference on Knowledge discovery and data mining. 330–339.
In this work we have defined a large array of local and global city                        [17] D. Gillis, I. Semanjski, and D. Lauwers. 2015. How to Monitor Sustainable Mo-
indicators, we have calculated them on a real case study, and we                                bility in Cities? Literature Review in the Frame of Creating a Set of Sustainable
have proved that they can be successfully exploited in a task of                                Mobility Indicators. Sustainability 8 (2015), 29.
                                                                                           [18] Marta C. Gonzalez, Cesar A. Hidalgo, et al. 2008. Understanding individual
mobility transfer learning. In particular, we have clustered mu-                                human mobility patterns. Nature 453, 7196 (June 2008), 779–782.
nicipalities based on the mobility behavior described by the city                          [19] Riccardo Guidotti and Mirco Nanni. 2020. Crash Prediction and Risk As-
                                                                                                sessment with Individual Mobility Networks. In 2020 21st IEEE International
indicators. Then, we have assessed the transferability of a ma-                                 Conference on Mobile Data Management (MDM). IEEE, 89–98.
chine learning model for traffic forecasting. Experimental results                         [20] Riccardo Guidotti, Mirco Nanni, Salvatore Rinzivillo, Dino Pedreschi, and
show that models trained on a municipality perform markedly                                     Fosca Giannotti. 2017. Never drive alone: Boosting carpooling with network
                                                                                                analysis. Information Systems 64 (2017), 237–257.
better when tested on other municipalities belonging to the same                           [21] Riccardo Guidotti, Roberto Trasarti, Mirco Nanni, Fosca Giannotti, and Dino
cluster, and thus more similar (according to the city indicators)                               Pedreschi. 2017. There’s a path for everyone: A data-driven personal model
to the first one.                                                                               reproducing mobility agendas. In 2017 IEEE International Conference on Data
                                                                                                Science and Advanced Analytics (DSAA). IEEE, 303–312.
   As future work, it would be interesting to extend the set of                            [22] M. K. Jat, P. K. Garg, and D. Khare. 2008. Modelling of urban growth using
features used to describe a city, for example including census                                  spatial analysis techniques: a case study of Ajmer city (India). International
                                                                                                Journal of Remote Sensing 29, 2 (2008), 543–567. https://doi.org/10.1080/
and cartographic data or some indicators related to economy,                                    01431160701280983 arXiv:https://doi.org/10.1080/01431160701280983
industry level and information about the most florid commercial                            [23] Gabriel Lang, Eric Marcon, and Florence Puech. 2016. Distance-based Measures
activities in each area. All these extra properties would also                                  of Spatial Concentration: Introducing a Relative Density Function. (Sept. 2016).
                                                                                           [24] Z. Liu, Z. Li, K. Wu, and M. Li. 2018. Urban Traffic Prediction from Mobility
help to interpret the results of clustering, to identify patterns of                            Data Using Deep Learning. IEEE Network 32, 4 (2018), 40–46. https://doi.org/
similarity and eventually to supervise with some kind of feedback                               10.1109/MNET.2018.1700411
the allocation of a city to a determinate group. More models                               [25] A Paolo Masucci, Joan Serras, Anders Johansson, and Michael Batty. 2013.
                                                                                                Gravity versus radiation models: On the importance of scale and heterogeneity
should be analyzed and compared to evaluate which is the most                                   in commuting flows. Physical Review E 88, 2 (2013), 022812.
effective. Finally, the approach presented here works on a city-to-                        [26] P. A. P. Moran. 1950. Notes on Continuous Stochastic Phenomena. Biometrika
                                                                                                37, 1/2 (1950), 17–23.
city transfer, namely the model of a single city is used to make                           [27] Paul Newson and John Krumm. 2009. Hidden markov map matching through
predictions on the destination city. This assumes that there exists                             noise and sparseness. In Proceedings of the 17th ACM SIGSPATIAL International
at least one origin city that is similar enough to perform the                                  Conference on Advances in Geographic Information Systems. ACM, 336–343.
                                                                                           [28] Luca Pappalardo, Salvatore Rinzivillo, Zehui Qu, Dino Pedreschi, and Fosca
transfer. Alternatively, all the data and known city models can                                 Giannotti. 2013. Understanding the patterns of car travel. The European
be exploited to achieve better prediction on the target city.                                   Physical Journal Special Topics 215, 1 (01 Jan 2013), 61–73.
                                                                                           [29] Karl Pearson. 1895. Notes on regression and inheritance in the case of two
                                                                                                parents. Proceedings of the Royal Society of London 58 (1895), 240–242.
ACKNOWLEDGMENTS
This work is partially supported by the European Community H2020 programme under the
funding scheme Track & Know (Big Data for Mobility Tracking Knowledge Extraction in Urban
Areas), G.A. 780754, https://trackandknowproject.eu/, and SoBigData++, G.A. 871042,
http://www.sobigdata.eu. We thank Christoph Pfaltz and Matteo Centonze for their
contribution to preliminary materials of this work.

REFERENCES
 [1] W Alonso. 1976. A Theory of Movements: Introduction. Working Paper 266 (1976).
 [2] Gennady Andrienko et al. 2020. (So) Big Data and the transformation of the city.
     International Journal of Data Science and Analytics (2020).
 [3] Valerio Arnaboldi, Marco Conti, Andrea Passarella, and Robin IM Dunbar. 2017. Online
     social networks and information diffusion: The role of ego networks. Online Social
     Networks and Media 1 (2017), 44–55.
 [4] Albert-László Barabási and Márton Pósfai. 2016. Network science. Cambridge University
     Press, Cambridge.
 [5] Michael Batty. 1974. Spatial Entropy. Geographical Analysis 6, 1 (1974), 1–31.
     https://doi.org/10.1111/j.1538-4632.1974.tb01014.x
 [6] Vincent D Blondel, Jean-Loup Guillaume, Renaud Lambiotte, and Etienne Lefebvre. 2008.
     Fast unfolding of communities in large networks. Journal of Statistical Mechanics:
     Theory and Experiment 2008, 10 (2008), P10008.
[30] S. Porta, P. Crucitti, and V. Latora. 2006. Centrality measures in spatial networks of
     urban streets. Physical Review E 73, 3, part 2 (24 March 2006), 036125.
[31] Salvatore Rinzivillo, Lorenzo Gabrielli, Mirco Nanni, Luca Pappalardo, Dino Pedreschi,
     and Fosca Giannotti. 2014. The purpose of motion: Learning activities from individual
     mobility networks. In 2014 International Conference on Data Science and Advanced
     Analytics (DSAA). IEEE, 312–318.
[32] Antônio Nélson Rodrigues da Silva et al. 2015. A comparative evaluation of mobility
     conditions in selected cities of the five Brazilian regions. Transport Policy 37
     (2015), 147–156.
[33] P.A. Rogerson. 2010. Statistical Methods for Geography: A Student’s Guide. SAGE
     Publications. https://books.google.ch/books?id=Zz69Ab8i0QsC
[34] Meead Saberi, Hani S. Mahmassani, Dirk Brockmann, and Amir Hosseini. 2017. A complex
     network perspective for characterizing urban travel demand patterns: graph theoretical
     analysis of large-scale origin–destination demand networks. Transportation 44, 6
     (November 2017), 1383–1402.
[35] Claude Elwood Shannon. 1948. A Mathematical Theory of Communication. The Bell System
     Technical Journal 27, 3 (July 1948), 379–423.
[36] Sulochana Shekhar. 2004. Urban sprawl assessment Entropy approach. GIS Development 8,
     5 (May 2004), 43–48.
[37] Filippo Simini, Marta C. Gonzalez, Amos Maritan, et al. 2012. A universal model for
     mobility and migration patterns. Nature 484, 7392 (2012), 96–100.
[38] Pavlos Tafidis et al. 2017. Sustainable urban mobility indicators: policy versus
     practice in the case of Greek cities. Transportation Research Procedia 24 (2017),
     304–312. 3rd Conference on Sustainable Urban Mobility.
[39] Roberto Trasarti, Riccardo Guidotti, Anna Monreale, and Fosca Giannotti. 2017. Myway:
     Location prediction via mobility profiling. Information Systems 64 (2017), 350–367.

</pre>