=Paper= {{Paper |id=Vol-1743/paper13 |storemode=property |title=Knowledge Tier Platform for Graph Mining in (Smart) Cities |pdfUrl=https://ceur-ws.org/Vol-1743/paper13.pdf |volume=Vol-1743 |authors=Miguel Nuñez-del-Prado,Edgardo Bravo Miguel Sierra,Isaias Hoyos Miguel Canchay |dblpUrl=https://dblp.org/rec/conf/simbig/Nunez-del-Prado16 }} ==Knowledge Tier Platform for Graph Mining in (Smart) Cities== https://ceur-ws.org/Vol-1743/paper13.pdf
         Knowledge Tier Platform for Graph Mining in (Smart) Cities
                   Miguel Nuñez-del-Prado, Edgardo Bravo, Miguel Sierra,
                               Isaias Hoyos, Miguel Canchay
                                    Universidad del Pacífico
                                      Av. Salaverry 2020
                                         Lima - Peru
      {m.nunezdelpradoc,er.bravoo,l.sierraflores,i.hoyoslopez,cacnayd}@up.edu.pe



Abstract

In the present effort, we present a knowledge tier platform to collect information from cities in the form of graphs. This platform enables people to share information about the area where they live, allowing them to report on pollution, crime levels, traffic jams, street topology, businesses, markets, etc. The main objective is to provide information about a city, stored in Elasticsearch, in order to find spatio-temporal patterns using Graph Mining techniques based on Apache Spark GraphX.

1 Introduction

In recent years, we have seen an explosion of data from online activity, user-generated content, health, scientific computing, mobile phone activity, etc. This data increases with the daily transactions of people in urban centers and continues to grow. By 2030, 60% of the world's population will live in cities, with 27 megacities of more than 10 million inhabitants (Chourabi et al., 2012). One way to address this challenge is to create new instruments for gathering and combining information continuously (Hernández-Muñoz et al., 2011). Consequently, the number of collaborative platforms for data collection has increased. For instance, CarWeb is a platform that collects GPS data from vehicles to estimate traffic in a city (Lo et al., 2008). In the field of human health, Psychlog (Gaggioli et al., 2013) is a mobile phone platform designed to collect users' psychological, physiological, and activity information for mental health research, relying on a self-report questionnaire. As a last example, an Internet site was developed to collect data for a multicenter study of ethical decision-making (Avidan et al., 2005).

In the present effort, we present a knowledge tier platform to collect information on cities in the form of graphs. This platform enables people to share knowledge of the area where they live, allowing them to report on pollution, crime levels, traffic jams, street topology, businesses, markets, etc. The primary objective is to provide information about the city in order to find spatio-temporal patterns using Graph Mining techniques.

The present paper is organized as follows. Section 2 introduces some basic concepts, while Section 3 describes the platform architecture. Sections 4 and 5 present some preliminary results and a discussion of the platform, respectively. Finally, Section 6 concludes the paper and presents future work.

2 Basic Concepts

In the current section, we introduce the basic concepts needed to describe the platform: graphs, the Haversine distance, knowledge tiers and Spark.

2.1 Graph

A graph is a mathematical structure composed of vertices (also called nodes or points) connected through edges (also called lines or arcs), as depicted in Figure 1. A graph G = (V, E) is composed of a set V of vertices and a set E of edges. In our context, this structure allows us to represent street intersections as geo-referenced nodes and roads as edges.

2.2 Haversine distance

The Haversine distance (Shumaker and Sinnott, 1984) computes the shortest distance over the Earth's surface (the great-circle distance) between two points given by their latitude and longitude:

$$
\begin{aligned}
\mathit{dlon} &= \mathit{lon}_2 - \mathit{lon}_1 \\
\mathit{dlat} &= \mathit{lat}_2 - \mathit{lat}_1 \\
a &= \sin^2\!\left(\frac{\mathit{dlat}}{2}\right) + \cos(\mathit{lat}_1) \times \cos(\mathit{lat}_2) \times \sin^2\!\left(\frac{\mathit{dlon}}{2}\right) \\
c &= 2 \times \operatorname{atan2}\!\left(\sqrt{a}, \sqrt{1-a}\right) \\
d &= R \times c
\end{aligned}
\tag{1}
$$
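Equation (1) translates almost directly into code. The Python sketch below is illustrative only: it assumes coordinates in decimal degrees and a mean Earth radius R = 6371 km, and the helper `nearest_node`, the node identifiers and the coordinates near Lima are hypothetical. The helper shows, by brute force over all nodes, the kind of nearest-node assignment applied to tweets in Section 4; the platform's own implementation is not described here.

```python
import math
from typing import Dict, Tuple

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius R, in kilometres


def haversine(lat1: float, lon1: float, lat2: float, lon2: float,
              radius: float = EARTH_RADIUS_KM) -> float:
    """Great-circle distance of Equation (1); inputs in decimal degrees."""
    # The trigonometric functions expect radians.
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlon = lon2 - lon1
    dlat = lat2 - lat1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c


def nearest_node(point: Tuple[float, float],
                 nodes: Dict[str, Tuple[float, float]]) -> str:
    """Hypothetical helper: id of the node closest to `point` (lat, lon),
    found by checking the Haversine distance to every node."""
    lat, lon = point
    return min(nodes, key=lambda nid: haversine(lat, lon, *nodes[nid]))


# Hypothetical intersections (lat, lon) and one geolocated tweet.
nodes = {"n1": (-12.0464, -77.0428), "n2": (-12.0860, -77.0500)}
tweet = (-12.0470, -77.0430)
print(nearest_node(tweet, nodes))                # n1
print(round(haversine(0.0, 0.0, 1.0, 0.0), 2))   # 111.19 (one degree of latitude)
```

A brute-force scan costs one distance evaluation per node for every tweet; with about 100,000 nodes and millions of tweets, a spatial index would likely be preferable, although that choice is ours and not taken from the platform.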



In Equation (1), lat, lon and R denote the latitude, the longitude and the radius of the Earth, respectively.

Figure 1: Example of a graph.

2.3 Knowledge Tiers

Since we are able to model the street network of a city in the form of a graph, each node and edge can carry a weight representing a different phenomenon of the city, such as (1) congestion, (2) crime, (3) pollution, (4) population density, (5) urban transportation, (6) the subway network, etc. Thus, for each phenomenon, we have a graph modeling that particular aspect. Finally, we can stack these graphs, aligning their corresponding nodes as depicted in Figure 2, to obtain a knowledge stack.

Figure 2: Knowledge tiers¹.

2.4 Apache Spark

Apache Spark is an open-source cluster computing framework originally developed at the University of California, Berkeley, and later maintained by the Apache Software Foundation. Spark provides distributed computation, taking charge of task dispatching, scheduling, and basic I/O functionalities. These functionalities are available through Java, Python, Scala and R interfaces.

Figure 3: Spark framework.

As shown in Figure 3, Apache Spark provides, on top of its framework, a tool for graph mining called GraphX². This API allows parallel graph computation and integrates tools for extraction, transformation and load (ETL). More details about the architecture and the capabilities of Spark are given in the next section.

3 System Overview

In the current section, we describe the architecture of our platform. As illustrated in Figure 4, our platform collects data from OpenStreetMap³ (OSM) to build the graph representing streets and intersections in the form of comma-separated values (CSV) files. Then, these CSV files are stored in a NoSQL database. We use Elasticsearch⁴ as the NoSQL database because it is a scalable, flexible and performant search and analytics engine (cf. Figure 5).

Figure 4: Example of a graph over streets.

¹ Fereshteh Asgari, Inferring User Multimodal Trajectories from Cellular Network Metadata in Metropolitan Areas
² GraphX: http://spark.apache.org/graphx/
³ OSM: https://www.openstreetmap.org/
⁴ Elasticsearch: https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html

Once data is saved in the NoSQL database, we are able to analyze the knowledge tiers, represented and combined in the form of graphs, through Spark GraphX, as depicted in Figure 5. For instance, with this platform, we could optimize supply chains in cities by minimizing cost, avoiding traffic jams and passing through low-crime zones. We can also discover spatial patterns to understand common features of high-crime areas in a city. All these analytics could be performed using programming languages such as Scala⁵, Java⁶, Python⁷ or R⁸.
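As an illustration of how knowledge tiers might be combined for the supply-chain scenario, the following standalone Python sketch (not the platform's GraphX code) gives each street segment one value per tier, merges the tiers into a single edge cost with a weighted sum, and runs Dijkstra's shortest-path algorithm. The tier names `length_km` and `crime`, the weights, the toy graph and the assumption that all streets are two-way are hypothetical. On Spark, a comparable computation could be written with GraphX's Pregel operator.

```python
import heapq
from typing import Dict, List, Tuple

Edge = Tuple[str, str]  # (node id, node id) of one street segment


def combined_cost(tiers: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the tier values of one segment (missing tiers count as 0)."""
    return sum(w * tiers.get(name, 0.0) for name, w in weights.items())


def shortest_path(edges: Dict[Edge, Dict[str, float]], weights: Dict[str, float],
                  source: str, target: str) -> Tuple[float, List[str]]:
    """Dijkstra over the combined costs; returns (cost, path) or (inf, [])."""
    # Undirected adjacency list: each segment is assumed to be two-way.
    adj: Dict[str, List[Tuple[str, float]]] = {}
    for (u, v), tiers in edges.items():
        cost = combined_cost(tiers, weights)
        adj.setdefault(u, []).append((v, cost))
        adj.setdefault(v, []).append((u, cost))

    dist: Dict[str, float] = {source: 0.0}
    prev: Dict[str, str] = {}
    heap = [(0.0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue  # stale heap entry
        visited.add(u)
        if u == target:
            break
        for v, w in adj.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return float("inf"), []
    # Walk the predecessor links back from the target.
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return dist[target], path[::-1]


# Hypothetical tiers: the short route A-B-D crosses a high-crime area.
segments = {
    ("A", "B"): {"length_km": 1.0, "crime": 5.0},
    ("B", "D"): {"length_km": 1.0, "crime": 5.0},
    ("A", "C"): {"length_km": 1.5, "crime": 0.0},
    ("C", "D"): {"length_km": 1.5, "crime": 0.0},
}
print(shortest_path(segments, {"length_km": 1.0}, "A", "D"))
print(shortest_path(segments, {"length_km": 1.0, "crime": 0.5}, "A", "D"))
```

With distance alone, the route is A-B-D with cost 2.0; giving the `crime` tier a weight of 0.5 steers it through C instead (cost 3.0), trading a longer path for lower exposure to crime.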




Figure 5: Overview of the Knowledge Tier Platform.

Finally, we implement a Python script to visualize the results of the pattern mining process using Google Maps⁹. In the next section, we present some preliminary visualizations of graphs stored in the platform.

4 Preliminary results

In this section, we present some preliminary results of the Knowledge Tier Platform regarding data gathering and visualization.

Concerning data collection, we carried out two campaigns to collect data from streets and tweets in Lima, Peru. The former campaign was performed in May, collecting about 100,000 nodes and 420,000 edges. The latter campaign was carried out from April to June, obtaining about 7.1 million geolocated tweets.

Regarding visualization, the platform allows plotting a graph over a map, where the nodes are placed at the street intersections and the edges model the streets connecting nodes (intersections), as shown in Figure 6.

Figure 6: Visualization of the graph over the streets.

Heatmaps are another visualization option. In our case, heatmaps are generated based on node weights. For example, Figure 7 presents a heatmap of the tweets collected in the platform. It is worth noting that each tweet is assigned to the nearest node based on the latitude and longitude of both nodes and tweets, using the Haversine function as the distance function (cf. Subsection 2.2). In the next section, we discuss the platform and present our vision of its application to research on Smart Cities.

Figure 7: Visualization of the heatmap of tweets over the cartography.

⁵ Scala: www.scala-lang.org
⁶ Java: www.java.com
⁷ Python: www.python.org
⁸ R: www.r-project.org
⁹ Google Maps: /maps.google.com

5 Discussion

We firmly believe in the potential of this project as the cornerstone for enabling new research directions. Graphs have been widely used to model different kinds of phenomena, ranging from urban street



network (Jiang and Claramunt, 2004), urban and regional models (O'Sullivan, 2001), macroscopic models of city traffic (Prasanna et al., 2009), city evacuation planning (Yamada, 1996) and routing strategies for vehicular ad hoc networks in city environments (Lochert et al., 2003) to mobility models (Mogre et al., 2007). In this project, we plan to use this graph model of streets and intersections to study:

Supply chain from a transportation point of view.
    When cities have more nanostores than large retailers, it is more complicated to transport products to many small, dispersed stores.

Multi-modal transportation is a problem in urban contexts,
    where individuals need to optimize their movements within a city by using different mass transportation modes.

Crime patterns could be extracted by combining
    different features from the graph model.

Pollution dispersion could be modeled with street
    and intersection models to represent and forecast the dynamics of particulate matter in a city.

Social network activity levels could be represented
    in the urban graph to detect social activity and extract the hot spots of a city.

Privacy perception, to understand how people
    consider privacy and what the real dangers and risks are.

The aforementioned list of possible research directions is not exhaustive; many issues related to smart cities remain open.

6 Conclusions

In the present work, we have detailed the architecture of the Knowledge Tier Platform. The novelty of this platform is to gather diverse kinds of data from different knowledge layers to extract spatio-temporal patterns for smart city applications. We have shown the potential of this platform as the cornerstone for many research questions in the near future.

References

Alexander Avidan, Charles Weissman, and Charles L Sprung. 2005. An internet web site as a data collection platform for multicenter research. Anesthesia & Analgesia, 100(2):506–511.

Hafedh Chourabi, Taewoo Nam, Shawn Walker, J Ramon Gil-Garcia, Sehl Mellouli, Karine Nahon, Theresa A Pardo, and Hans Jochen Scholl. 2012. Understanding smart cities: An integrative framework. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 2289–2297. IEEE.

Andrea Gaggioli, Giovanni Pioggia, Gennaro Tartarisco, Giovanni Baldus, Daniele Corda, Pietro Cipresso, and Giuseppe Riva. 2013. A mobile data collection platform for mental health research. Personal and Ubiquitous Computing, 17(2):241–251.

José M Hernández-Muñoz, Jesús Bernat Vercher, Luis Muñoz, José A Galache, Mirko Presser, Luis A Hernández Gómez, and Jan Pettersson. 2011. Smart cities at the forefront of the future internet. In The Future Internet Assembly, pages 447–462. Springer.

Bin Jiang and Christophe Claramunt. 2004. A structural approach to the model generalization of an urban street network. GeoInformatica, 8(2):157–171.

Chia-Hao Lo, Wen-Chih Peng, Chien-Wen Chen, Ting-Yu Lin, and Chun-Shuo Lin. 2008. Carweb: A traffic data collection platform. In The Ninth International Conference on Mobile Data Management (MDM 2008), pages 221–222. IEEE.

Christian Lochert, Hannes Hartenstein, Jing Tian, Holger Fussler, Dagmar Hermann, and Martin Mauve. 2003. A routing strategy for vehicular ad hoc networks in city environments. In Intelligent Vehicles Symposium, 2003. Proceedings. IEEE, pages 156–161. IEEE.

Parag S Mogre, Matthias Hollick, Nico d'Heureuse, Hans Werner Heckel, Tronje Krop, and Ralf Steinmetz. 2007. A graph-based simple mobility model. In Communication in Distributed Systems (KiVS), 2007 ITG-GI Conference, pages 1–12. VDE.

David O'Sullivan. 2001. Graph-cellular automata: a generalised discrete urban and regional model. Environment and Planning B: Planning and Design, 28(5):687–705.

UR Prasanna, M Srinivas, and L Umanand. 2009. Macroscopic model of city traffic using bond graph modelling. International Journal of Engineering Systems Modelling and Simulation, 1(2-3):176–183.

BP Shumaker and RW Sinnott. 1984. Astronomical computing: 1. computing under the open sky. 2. virtues of the haversine. Sky and Telescope, 68:158–159.

Takeo Yamada. 1996. A network flow approach to a city emergency evacuation planning. International Journal of Systems Science, 27(10):931–936.



