Predicting Land Use of Italian Cities using Structural Semantic Models
Gianni Barlacchi¹,², Bruno Lepri³, Alessandro Moschitti¹,⁴
¹ Department of Information Engineering and Computer Science, University of Trento
² TIM Semantics and Knowledge Innovation Lab, Trento
³ Fondazione Bruno Kessler, Trento
⁴ Qatar Computing Research Institute, HBKU
{gianni.barlacchi,amoschitti}@gmail.com
lepri@fbk.eu

Abstract

English. We propose a hierarchical semantic representation of urban areas, extracted from a social network, to classify the most predominant land use, which is a very common task in urban computing. We encode geo-social data from Location-Based Social Networks with standard feature vectors and a conceptual tree structure that we call Geo-Tree. We use the latter in kernel machines, which can thus perform accurate classification by exploiting the hierarchical substructures of concepts as features. Our comparative study on three datasets extracted from Milan, Rome and Naples shows that Tree Kernels applied to Geo-Trees are very effective, improving on the state of the art.

Italiano. In this work, we propose a new semantic model for representing urban areas using data from social media. In particular, we model this information with a tree structure that we call Geo-Tree. This structure is used, in combination with a classic feature vector, in kernel machines to classify the land use of urban areas. We evaluated our approach on three large Italian cities: Milan, Rome and Naples. The results show that Geo-Trees, used with Tree Kernels, achieve results far superior to those of other models currently considered state of the art.

1    Introduction

The growing availability of data from cities (Barlacchi et al., 2015a) (e.g., traffic flow, human mobility and geographical data) opens new opportunities for predicting and thus optimizing human activities. For example, the automatic analysis of land use makes it possible to better administer a city in terms of resources and provided services. However, such analysis requires specific information, which is often not available due to privacy concerns. In this paper, we follow the approach proposed in (Barlacchi et al., 2017) and use public textual descriptions of urban areas to design a novel machine learning representation. We represent urban areas as: (i) a bag-of-concepts (BOC), e.g., the terms Arts and Entertainment, College and University, Event, Food extracted from the Foursquare description of the area; and (ii) the same concepts organized in a tree, which reflects the hierarchical organization of Foursquare activities. We combine BOC vectors with Tree Kernels (TKs) (Collins and Duffy, 2002; Moschitti, 2006) applied to concept trees (Geo-Trees) and use them in Support Vector Machines (SVMs). The Geo-Tree allows the model to learn complex structural and semantic patterns from the hierarchical conceptualization of an area. We show that TKs not only can capture semantic information from natural language text, e.g., as shown for semantic role labeling (Moschitti et al., 2008) and question answering (Severyn and Moschitti, 2013; Barlacchi et al., 2015b), but they can also learn from the hierarchy above to perform semantic inference, such as deciding which is the major activity of an area.

We carried out a study on land use prediction for three Italian cities, Milan, Rome and Naples, as follows: (i) we divided each city into squares of 200x200 meters; (ii) then, we classified the most predominant land use class of each square (e.g., High Density Urban Fabric or Open Space and Outdoor), as assigned by the city administration. The results show that GeoTKs achieve an impressive improvement over state-of-the-art classification approaches based on BOC, i.e., 21.2%, 13.6% and 54.3% relative improvement in Macro-F1 over the Milan, Rome and
Naples datasets, respectively.

2   Related Work

Previous work has modeled land use classification by means of different sources of information. For example, Yuan et al. (2012) built a framework that, using human mobility patterns derived from taxicab trajectories and Points of Interest (POIs), classifies the functionality of an area for the city of Beijing. Assem et al. (2016) proposed a spatio-temporal approach based on three different clustering algorithms to model the change of functionality of a city's region over time. They extracted features from Foursquare's POIs and check-in activities in Manhattan. Yao et al. (2017) built sequences of POI concepts reflecting their spatial distance. Then, they applied Word2Vec (Mikolov et al., 2013) to these sequences to derive vectors representing each area, which were used to train a land use classifier. In general, most previous work applies extensive feature engineering, which is typically costly as it requires a full understanding of the target domain. Our approach alleviates this problem with automatic feature engineering applied to an abstract land representation.

3   Land Description Data

Geospatial city areas are described with the popular shape file format, where each shape is a collection of points geo-localized using their coordinates. The latter are provided in the well-known Coordinate Reference System (CRS) WGS84, adopted for the common latitude/longitude geolocation. We use (i) shape files provided by Urban Atlas¹, a website providing data for large urban areas (more than 100,000 inhabitants) and (ii) POIs from Foursquare².

¹ https://www.eea.europa.eu/data-and-maps/data/urban-atlas
² https://foursquare.com/

3.1 Land Use

Cities are divided into small areas, each associated with a main land use. In total, there are 17 different land use classes defined in the open dataset Urban Atlas³. We focused on those related to city centers, discarding those less interesting from a social viewpoint, i.e., those associated with rural areas such as forests, agricultural, semi-natural and wetland areas, and mineral extraction and dump sites. Thus, we selected the following categories: (i) High Density Urban Fabric, (ii) Medium Density Urban Fabric, (iii) Low Density Urban Fabric, (iv) Industrial, commercial, public, military and private units, (v) Open Space & Recreation, (vi) Transportation. We collapsed Medium and Low Density Urban Fabric into one single category, ML-Density Urban Fabric, as they only have few samples. Land use distribution is very fine-grained, making its classification based on POI information very difficult. A trade-off between classification accuracy and the desired area granularity consists in segmenting the regions into square cells. As each cell can contain more than one land use label, we consider the predominant label as its primary use.

³ https://www.eea.europa.eu/data-and-maps/data/urban-atlas#tab-additional-information

3.2 Point-of-Interest

A POI is usually characterized by a location (i.e., latitude and longitude), textual information (e.g., a description of the activity in that place) and a hierarchical categorization that provides different levels of detail about the activity of the place (e.g., Food, Asian Restaurant, Chinese Restaurant). We used POIs extracted from Foursquare, a geolocation-based social network supported with web search facilities for places and a recommendation system. In particular, we extracted 46,731, 43,389 and 7,219 POIs from Milan, Rome and Naples⁴, respectively. We focused on the ten macro-categories of such POIs⁵, each one refined into at most four levels of detail.

⁴ For some reason, Foursquare is less popular in Naples.
⁵ https://developer.foursquare.com/categorytree

4   Structural Models

In most machine learning algorithms, data examples are transformed into feature vectors, which in turn are used in dot products to carry out both learning and classification. Kernel Machines (KMs) allow for replacing the dot product with kernel functions, which compute it directly on the examples, i.e., they avoid the explicit transformation of examples into vectors. The main advantage of KMs is a much lower computational complexity, as it does not directly depend on the size of the feature space.

4.1 Point-of-Interest Features

The most straightforward way to represent an area by means of Foursquare data is to use its POIs. Every venue is hierarchically categorized (e.g., Professional and Other Places → Medical Center → Doctor's office) and the categories are used to produce an aggregated representation of the area.
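The aggregation of Section 4.1 can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `macro_category_counts`, the representation of a POI as a list of category names from the most general to the most specific, and the five-entry category list (a subset of the ten Foursquare macro-categories, using only names mentioned in the text) are assumptions made for illustration.

```python
from collections import Counter

# Illustrative subset of macro-categories (assumption: the paper uses ten,
# only some of which are named in the text).
MACRO_CATEGORIES = [
    "Arts and Entertainment",
    "College and University",
    "Event",
    "Food",
    "Professional and Other Places",
]


def macro_category_counts(cell_pois, categories=MACRO_CATEGORIES):
    """Count, for one grid cell, how many POIs fall under each macro-category.

    Each POI is a category path ordered from the most general to the most
    specific category, so the macro-category is the first element.
    """
    counts = Counter(path[0] for path in cell_pois if path)
    return [counts.get(category, 0) for category in categories]


# Hypothetical cell with three POIs.
cell = [
    ["Food", "Asian Restaurant", "Chinese Restaurant"],
    ["Food", "Asian Restaurant"],
    ["Professional and Other Places", "Medical Center", "Doctor's office"],
]
print(macro_category_counts(cell))  # -> [0, 0, 0, 2, 1]
```

Keeping the category order fixed ensures that vectors of different cells are comparable position by position, which is what a linear or RBF kernel over such vectors requires.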
We define a feature vector for a grid cell by counting the occurrences of each macro-level category (e.g., Food) over all the POIs found in that cell.

4.2 Geographical Tree Kernel

Foursquare has its own hierarchy of categories, which is used to characterize each location and activity (e.g., restaurants or shops) in the database. Thus, each Foursquare POI is associated with a hierarchical path, which semantically describes the type of location/activity (e.g., for Chinese Restaurant, we have the path Food → Asian Restaurant → Chinese Restaurant). The path is much more informative than just the target POI name, as it provides feature combinations that follow the structure and the node proximity information, e.g., Food & Asian Restaurant or Asian Restaurant & Chinese Restaurant are valid features, whereas Food & Chinese Restaurant is not.

Geo-Tree: we propose a new tree structure, i.e., the Geo-Tree, whose nodes and the edges among them are subsets of the Foursquare hierarchy (FH). The Geo-Tree of a grid cell consists of a new root node connecting the subtrees of FH rooted in the concepts present in the cell. In other words, we connect all the paths of FH starting from grid concepts. Figure 1 shows an example of the FH paths of a cell and the resulting Geo-Tree.

Figure 1: Example of Geo-Tree built from a collection of POIs in a cell.

This way, the nodes of the first level, i.e., the root children, correspond to the most general FH categories, e.g., Arts & Entertainment, Event, Food, etc., the second level of our tree corresponds to the second level of the hierarchical tree of Foursquare, and so on. The terminal nodes are the finest-grained category descriptions of the area, e.g., College Baseball Diamond or Southwestern French Restaurant. For example, Fig. 2 illustrates the semantic structure of a grid cell obtained by combining the category chains of all its venues.

Figure 2: Example of Geo-Tree in Milan for an area labeled as Open Space & Recreation.

GeoTK: given a Geo-Tree, we can encode all its substructures in kernel machines using TKs. In particular, we used the Syntactic Tree Kernel with Bag-Of-Words (STK_b) and the Partial Tree Kernel (PTK) (Moschitti, 2006). Our TKs by construction do not consider the frequency⁶ of the POIs present in a given grid cell.

⁶ It is possible to add the frequency in the kernel computation, but for our study we preferred to have a completely different representation from previous typical frequency-based approaches.

BOC kernel: to complement GeoTK, we also represent a cell with a BOC representation, namely we count the occurrences of each macro-level category (e.g., Food) over all the POIs found in each grid cell. This way, we generate feature vectors by counting the number of activities under each macro-category. In order to take the popularity of the area into consideration, we included (i) the total number of unique users that did at least one check-in in the cell, and (ii) the total number of check-ins done in the cell. Note that, given an area, the number of unique users provides an idea of how many people visited it, while the number of check-ins can be used to represent its popularity.

Kernel combination: finally, given two geographical areas, $x_a$ and $x_b$, we define a kernel combining Geo-Tree and BOC as: $K(x_a, x_b) = TK(t_a, t_b) + KV(v_a, v_b)$, where $TK$ is any
structural kernel function applied to the tree representations, $t_a$ and $t_b$, of the geographical areas and $KV$ is a kernel applied to the feature vectors, $v_a$ and $v_b$, extracted from $x_a$ and $x_b$ using any data source available (e.g., text, social media, mobile phone and census data).

5       Experiments and Results

We performed our experiments on the data from Milan, Rome and Naples. We used a grid of 200x200 meters, as similar previous work on land use classification indicates it as the best size (Toole et al., 2012; Zhan et al., 2014; Barlacchi et al., 2017). We applied a pre-processing step in order to filter out cells for which land use classification cannot be performed. In particular, for Milan and Rome, we selected the central point of the shape and we included those cells that have their centroid within a radius of 15 and 8 kilometers, respectively. For Naples, we kept all the cells due to the smaller size of the city. Then, for all three cities, we removed the cells that (i) cover areas without a specified land use (e.g., the cells in the sea) and (ii) do not have POIs (e.g., the countryside cells). After this step, we obtained a grid with 2,581, 5,657 and 1,314 cells for Milan, Rome and Naples, respectively. We created, separately for each city, the training and test sets by randomly sampling 80% and 20% of the cells, respectively. We labelled the dataset following the same category aggregation strategy proposed by Zhan et al. (2014), who assigned the predominant land use class to each grid cell.

To train our models, we applied SVM-Light-TK⁷, which enables the use of structural kernels (Moschitti, 2006) in SVM-Light⁸. In particular, due to the nature of the task, we used the Python wrapper around SVM-Light-TK to perform multiclass classification⁹. We experimented with linear, polynomial and radial basis function kernels applied to standard feature vectors. We measured the performance of our classifier by averaging Precision, Recall and F1 over all land use categories.

⁷ http://disi.unitn.it/moschitti/Tree-Kernel.htm
⁸ http://svmlight.joachims.org/
⁹ https://github.com/aseveryn/SVMTK-Multiclass-Classifier

5.1 Results for Land Use Classification

We trained multi-class classifiers using common learning algorithms such as XGBoost (Chen and Guestrin, 2016), SVMs using linear, polynomial and radial basis function kernels, named SVM-{Lin, Poly, Rbf}, respectively, and our structural semantic models, indicated with STK_b and PTK. We also combined kernels with a simple summation, e.g., PTK+Lin indicates an SVM using such a kernel combination.

City     Model       Prec.   Rec.    F1
Milan    baseline    0.200   0.119   0.149
         XGBoost     0.294   0.317   0.297
         STK_b+Rbf   0.368   0.364   0.360
         PTK+Rbf     0.430   0.350   0.345
         STK_b       0.448   0.307   0.320
         PTK         0.364   0.302   0.309
Rome     baseline    0.200   0.089   0.124
         XGBoost     0.291   0.306   0.279
         STK_b+Lin   0.359   0.314   0.317
         STK         0.338   0.300   0.302
         PTK         0.340   0.300   0.299
         PTK+Lin     0.359   0.297   0.291
Naples   baseline    0.200   0.100   0.133
         XGBoost     0.236   0.272   0.219
         STK_b+Rbf   0.361   0.331   0.338
         STK_b+Lin   0.338   0.302   0.300
         STK_b       0.409   0.290   0.299
         PTK         0.318   0.298   0.297

Table 1: Classification results on Milan, Rome and Naples. Prec., Rec. and F1 are averaged over all categories.

Table 1 shows the average of F1, Precision and Recall over the different categories. The baseline model is obtained by always classifying an example with the label High Density Urban Fabric, which is the most frequent. Due to space constraints, we only report six models per city, namely: the baseline, XGBoost and the top four kernel models.

We note that: (i) GeoTK always outperforms XGBoost and the baseline in terms of F1, demonstrating the effectiveness of our approach. This is an interesting finding, as XGBoost is the current state of the art for land use classification. (ii) STK_b combined with the feature vector kernel always produces the best results, improving the F1-score over XGBoost by 6.3, 3.8 and 11.9 absolute points for Milan, Rome and Naples, respectively. (iii) Kernel combinations always provide the best results.

6   Conclusions

In this paper, we have introduced Geo-Trees, a novel semantic representation based on a hierarchical classification of POIs, to better exploit geo-social data for the classification of the primary land use of an urban area. This is an important task, as it gives urban planners and policy makers the possibility to better administer and renew a city in terms of infrastructure, resources and services. In more detail, we have built our classifiers with combinations of a kernel over BOC and TKs applied to Geo-Trees, thus exploiting the hierarchical substructures of concepts as features. Our comparative study on three large Italian cities, Milan, Rome and Naples, shows that our models can improve the state of the art by up to 11.9 absolute points in F1-score.

Acknowledgments

This work has been partially supported by the EC project CogNet, 671625 (H2020-ICT-2014-2, Research and Innovation action).

References

Haytham Assem, Lei Xu, Teodora Sandra Buda, and Declan O'Sullivan. 2016. Spatio-temporal clustering approach for detecting functional regions in cities. In ICTAI, pages 370–377. IEEE.

Gianni Barlacchi, Marco De Nadai, Roberto Larcher, Antonio Casella, Cristiana Chitic, Giovanni Torrisi, Fabrizio Antonelli, Alessandro Vespignani, Alex Pentland, and Bruno Lepri. 2015a. A multi-source dataset of urban life in the city of Milan and the province of Trentino. Scientific Data, 2:150055.

Gianni Barlacchi, Massimo Nicosia, and Alessandro Moschitti. 2015b. SACRY: Syntax-based automatic crossword puzzle resolution system. ACL-IJCNLP 2015, page 79.

G Barlacchi, A Rossi, B Lepri, and A Moschitti. 2017. Structural semantic models for automatic analysis of land use.

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A scalable tree boosting system. In KDD, pages 785–794, New York, NY, USA. ACM.

Michael Collins and Nigel Duffy. 2002. New ranking algorithms for parsing and tagging: Kernels over discrete structures, and the voted perceptron. In ACL.

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. CoRR, abs/1301.3781.

Alessandro Moschitti, Daniele Pighin, and Roberto Basili. 2008. Tree kernels for semantic role labeling. Computational Linguistics, 34(2):193–224.

Alessandro Moschitti. 2006. Efficient convolution kernels for dependency and constituent syntactic trees. In ECML, pages 318–329. Springer.

Aliaksei Severyn and Alessandro Moschitti. 2013. Automatic feature engineering for answer selection and extraction. In EMNLP, volume 13, pages 458–467.

Jameson L Toole, Michael Ulm, Marta C González, and Dietmar Bauer. 2012. Inferring land use from mobile phone activity. In SIGKDD International Workshop on Urban Computing, pages 1–8. ACM.

Yao Yao, Xia Li, Xiaoping Liu, Penghua Liu, Zhaotang Liang, Jinbao Zhang, and Ke Mai. 2017. Sensing spatial distribution of urban land use by integrating points-of-interest and Google Word2Vec model. International Journal of Geographical Information Science, 31(4):825–848.

Jing Yuan, Yu Zheng, and Xing Xie. 2012. Discovering regions of different functions in a city using human mobility and POIs. In KDD, pages 186–194. ACM.

Xianyuan Zhan, Satish V Ukkusuri, and Feng Zhu. 2014. Inferring urban land use using large-scale social media check-in data. Networks and Spatial Economics, 14(3-4):647–667.