=Paper=
{{Paper
|id=Vol-1743/paper13
|storemode=property
|title=Knowledge Tier Platform for Graph Mining in (Smart) Cities
|pdfUrl=https://ceur-ws.org/Vol-1743/paper13.pdf
|volume=Vol-1743
|authors=Miguel Nuñez-del-Prado,Edgardo Bravo,Miguel Sierra,Isaias Hoyos,Miguel Canchay
|dblpUrl=https://dblp.org/rec/conf/simbig/Nunez-del-Prado16
}}
==Knowledge Tier Platform for Graph Mining in (Smart) Cities==
Miguel Nuñez-del-Prado, Edgardo Bravo, Miguel Sierra, Isaias Hoyos, Miguel Canchay

Universidad del Pacífico, Av. Salaverry 2020, Lima, Peru

{m.nunezdelpradoc,er.bravoo,l.sierraflores,i.hoyoslopez,cacnayd}@up.edu.pe

===Abstract===
We present a knowledge tier platform that collects information about cities in the form of graphs. The platform enables people to share information about the area where they live, reporting on pollution, crime levels, traffic jams, street topology, shops, markets, etc. The main objective is to provide information about a city, stored in Elasticsearch, in order to find spatio-temporal patterns using graph mining techniques based on Apache Spark GraphX.

===1 Introduction===
In recent years, we have seen an explosion of data from online activity, user-generated content, health, scientific computing, mobile phone activity, etc. This data keeps growing because of the daily transactions of people in urban centers. By 2030, 60% of the world population will live in cities, and there will be 27 megacities of more than 10 million inhabitants (Chourabi et al., 2012). One way to cope with this growth is to create new instruments for gathering and combining information continuously (Hernández-Muñoz et al., 2011). Consequently, the number of collaborative platforms for collecting data is increasing. For instance, the CarWeb platform collects GPS data from vehicles to estimate traffic in a city (Lo et al., 2008). In the field of human health, Psychlog (Gaggioli et al., 2013) is a mobile phone platform designed to collect users' psychological, physiological, and activity information for mental health research, relying on a self-report questionnaire. As a last example, Avidan et al. (2005) developed an Internet site to collect data for a multicenter study of ethical decision-making.

In the present effort, we present a knowledge tier platform that collects information on cities in the form of graphs. This platform enables people to share their knowledge of the area where they live, reporting on pollution, crime levels, traffic jams, street topology, shops, markets, etc. The primary objective is to provide information about the city in order to find spatio-temporal patterns using graph mining techniques.

The paper is organized as follows. Section 2 introduces some basic concepts, while Section 3 describes the platform architecture. Sections 4 and 5 show some preliminary results and discuss the platform. Finally, Section 6 concludes the paper and presents future work.

===2 Basic Concepts===
In this section, we introduce the basic concepts needed to describe the platform: graphs, the Haversine distance, knowledge tiers, and Apache Spark.

====2.1 Graph====
A graph is a mathematical structure composed of vertices (also called nodes or points) connected by edges (also called lines or arcs), as depicted in Figure 1. A graph G = (V, E) is composed of a set V of vertices and a set E of edges. In our context, this structure allows us to represent street intersections as geo-referenced nodes and roads as edges.

Figure 1: Example of a graph.

====2.2 Haversine distance====
The Haversine distance (Shumaker and Sinnott, 1984) computes the shortest distance over the Earth's surface between two points given by their latitude and longitude:

<math>
\begin{aligned}
d_{lon} &= lon_2 - lon_1 \\
d_{lat} &= lat_2 - lat_1 \\
a &= \sin^2\left(\frac{d_{lat}}{2}\right) + \cos(lat_1) \times \cos(lat_2) \times \sin^2\left(\frac{d_{lon}}{2}\right) \\
c &= 2 \times \operatorname{atan2}\left(\sqrt{a}, \sqrt{1 - a}\right) \\
d &= R \times c
\end{aligned}
\qquad (1)
</math>

where lat, lon and R are the latitude, the longitude and the radius of the Earth, respectively.

====2.3 Knowledge Tiers====
The street network of a city can be modeled as a graph. Each node and edge can carry a weight representing a different phenomenon of the city, such as: (1) congestion, (2) crime, (3) pollution, (4) population density, (5) urban transportation, (6) subway network, etc. Thus, for each phenomenon, we have a graph modeling this particular fact. Finally, we can stack these graphs, as depicted in Figure 2, to obtain a knowledge stack.

Figure 2: Knowledge tiers (source: Fereshteh Asgari, Inferring User Multimodal Trajectories from Cellular Network Metadata in Metropolitan Areas).

====2.4 Apache Spark====
Apache Spark is an open source cluster computing framework originally developed at the University of California, Berkeley; the code is now maintained by the Apache Software Foundation. Spark provides distributed computation, taking charge of task dispatching, scheduling, and basic I/O functionalities. These functionalities are available through Java, Python, Scala and R interfaces.

Figure 3: Spark framework.

As shown in Figure 3, Apache Spark provides at the top of its framework a tool for graph mining called GraphX (http://spark.apache.org/graphx/). This API allows parallel graph computation and integrates tools for extraction, transformation and load. More detail about the architecture and the use of Spark in the platform is given in the next section.

===3 System Overview===
In this section, we describe the architecture of our platform. As illustrated in Figure 4, our platform collects data from OpenStreetMap (OSM, https://www.openstreetmap.org/) to build the graph representing streets and intersections in the form of comma-separated values (CSV) files. Then, these CSV files are stored in a NoSQL database. We use Elasticsearch (https://www.elastic.co/guide/en/elasticsearch/reference/current/index.html) as the NoSQL database because it is a scalable, flexible and performant search and analytics engine (cf. Figure 5).

Figure 4: Example of a graph over streets.

Once the data is saved in the NoSQL database, we are able to analyze the knowledge tiers, represented and combined in the form of graphs, through Spark GraphX, as depicted in Figure 5. For instance, with this platform we could optimize supply chains in cities by minimizing cost, avoiding traffic jams and routing through zones with low crime rates. We can also discover spatial patterns to understand the common features of high-crime areas in a city. All these analytics can be performed using programming languages such as Scala (www.scala-lang.org), Java (www.java.com), Python (www.python.org) or R (www.r-project.org).

Figure 5: Overview of the Knowledge Tier Platform.

Finally, we implement a Python script to visualize the result of the pattern mining process using Google Maps (maps.google.com). In the next section, we present some preliminary visualizations of graphs stored in the platform.

===4 Preliminary results===
In this section, we present some preliminary results of the Knowledge Tier Platform concerning data gathering and visualization.

Concerning data collection, we have run two campaigns to collect data on streets and tweets in Lima, Peru. The former campaign was performed in the month of May and collected approximately 100 000 nodes and 420 000 edges. The latter campaign was carried out between April and June and obtained approximately 7.1 million geolocated tweets.

Concerning visualization, the platform can plot a graph over a map, where the nodes are placed at the intersections of the streets and the edges model the streets connecting the nodes, as shown in Figure 6.

Figure 6: Visualization of the graph over the streets.

Another visualization option is the heatmap. In our case, heatmaps are generated from node weights. For example, Figure 7 presents a heatmap of the tweets collected in the platform. It is worth noting that each tweet is assigned to its nearest node, based on the latitude and longitude of both nodes and tweets. We use the Haversine function as the distance function (cf. Subsection 2.2).

Figure 7: Visualization of the heatmap of tweets over the cartography.

In the next section, we discuss the platform and present our vision of its application to research on Smart Cities.

===5 Discussion===
We firmly believe in the potential of this project as a cornerstone for new research directions. Graphs have been widely used to model different kinds of phenomena, including urban street networks (Jiang and Claramunt, 2004), urban and regional models (O’Sullivan, 2001), macroscopic models of city traffic (Prasanna et al., 2009), city evacuation plans (Yamada, 1996), routing strategies for vehicular ad hoc networks in city environments (Lochert et al., 2003), and mobility models (Mogre et al., 2007). In this project, we plan to use the graph model representing streets and intersections to study:

* Supply chain from a transportation point of view. When cities have more nanostores than retailers, it is more complicated to transport products to small, dispersed stores.
* Multi-modal transportation, a problem in the urban context where individuals need to optimize their movements within a city by using different mass transportation modes.
* Crime patterns, which could be extracted by combining different features from the graph model.
* Pollution dispersion, which could be modeled on the street and intersection graph to represent and forecast the dynamics of particulate matter in a city.
* Social network activity levels, which could be represented in the urban graph to detect social activity and extract the hot spots of a city.
* Privacy perception, to understand how people consider privacy and what the real dangers and risks are.

The possible research directions are not limited to the topics listed above. Many issues related to smart cities remain open.

===6 Conclusions===
In the present work, we have detailed the architecture of the Knowledge Tier Platform. The novelty of this platform is that it gathers diverse kinds of data from different knowledge layers to extract spatio-temporal patterns for smart city applications. We have shown the potential of this platform as a cornerstone for many research questions in the near future.

===References===
* Alexander Avidan, Charles Weissman, and Charles L Sprung. 2005. An internet web site as a data collection platform for multicenter research. Anesthesia & Analgesia, 100(2):506–511.
* Hafedh Chourabi, Taewoo Nam, Shawn Walker, J Ramon Gil-Garcia, Sehl Mellouli, Karine Nahon, Theresa A Pardo, and Hans Jochen Scholl. 2012. Understanding smart cities: An integrative framework. In System Science (HICSS), 2012 45th Hawaii International Conference on, pages 2289–2297. IEEE.
* Andrea Gaggioli, Giovanni Pioggia, Gennaro Tartarisco, Giovanni Baldus, Daniele Corda, Pietro Cipresso, and Giuseppe Riva. 2013. A mobile data collection platform for mental health research. Personal and Ubiquitous Computing, 17(2):241–251.
* José M Hernández-Muñoz, Jesús Bernat Vercher, Luis Muñoz, José A Galache, Mirko Presser, Luis A Hernández Gómez, and Jan Pettersson. 2011. Smart cities at the forefront of the future internet. In The Future Internet Assembly, pages 447–462. Springer.
* Bin Jiang and Christophe Claramunt. 2004. A structural approach to the model generalization of an urban street network. GeoInformatica, 8(2):157–171.
* Chia-Hao Lo, Wen-Chih Peng, Chien-Wen Chen, Ting-Yu Lin, and Chun-Shuo Lin. 2008. CarWeb: A traffic data collection platform. In The Ninth International Conference on Mobile Data Management (mdm 2008), pages 221–222. IEEE.
* Christian Lochert, Hannes Hartenstein, Jing Tian, Holger Fussler, Dagmar Hermann, and Martin Mauve. 2003. A routing strategy for vehicular ad hoc networks in city environments. In Intelligent Vehicles Symposium, 2003. Proceedings. IEEE, pages 156–161. IEEE.
* Parag S Mogre, Matthias Hollick, Nico d’Heureuse, Hans Werner Heckel, Tronje Krop, and Ralf Steinmetz. 2007. A graph-based simple mobility model. In Communication in Distributed Systems (KiVS), 2007 ITG-GI Conference, pages 1–12. VDE.
* David O’Sullivan. 2001. Graph-cellular automata: a generalised discrete urban and regional model. Environment and Planning B: Planning and Design, 28(5):687–705.
* UR Prasanna, M Srinivas, and L Umanand. 2009. Macroscopic model of city traffic using bond graph modelling. International Journal of Engineering Systems Modelling and Simulation, 1(2-3):176–183.
* BP Shumaker and RW Sinnott. 1984. Astronomical computing: 1. computing under the open sky. 2. virtues of the haversine. Sky and Telescope, 68:158–159.
* Takeo Yamada. 1996. A network flow approach to a city emergency evacuation planning. International Journal of Systems Science, 27(10):931–936.
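
As an illustration of the Haversine distance of Subsection 2.2 and of the tweet-to-node assignment mentioned in Section 4, the following is a minimal Python sketch, not the platform's actual implementation. The function names (<code>haversine</code>, <code>nearest_node</code>, <code>node_weights</code>), the mean Earth radius of 6371 km, the input layout (a dictionary from node identifier to a latitude/longitude pair in degrees) and the brute-force linear scan over nodes are all assumptions made for the example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius; the paper only names it R


def haversine(lat1, lon1, lat2, lon2, radius=EARTH_RADIUS_KM):
    """Great-circle distance between two points given in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (lat1, lon1, lat2, lon2))
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    a = min(1.0, a)  # guard against rounding slightly above 1
    c = 2 * math.atan2(math.sqrt(a), math.sqrt(1 - a))
    return radius * c


def nearest_node(point, nodes):
    """Return the id of the node closest to point = (lat, lon).

    nodes is a hypothetical layout: {node_id: (lat, lon)}.
    """
    lat, lon = point
    return min(nodes, key=lambda n: haversine(lat, lon, *nodes[n]))


def node_weights(tweets, nodes):
    """Count, for every node, the tweets whose nearest node it is."""
    weights = {n: 0 for n in nodes}
    for tweet in tweets:
        weights[nearest_node(tweet, nodes)] += 1
    return weights
```

The resulting per-node counts play the role of the node weights from which a heatmap such as the one in Figure 7 could be drawn. A linear scan is quadratic in the worst case, so for data volumes like those reported in Section 4 a spatial index or a distributed computation would likely be needed.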