=Paper=
{{Paper
|id=Vol-2293/jist2018pd_paper10
|storemode=property
|title=Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods
|pdfUrl=https://ceur-ws.org/Vol-2293/jist2018pd_paper10.pdf
|volume=Vol-2293
|authors=Shusaku Egami,Takahiro Kawamura,Akihiko Ohsuga
|dblpUrl=https://dblp.org/rec/conf/jist/EgamiKO18
}}
==Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods==
Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods Shusaku Egami1 , Takahiro Kawamura1,2 , and Akihiko Ohsuga1 1 University of Electro-Communications, Tokyo, Japan 2 Japan Science and Technology Agency, Tokyo, Japan egami.shusaku@ohsuga.lab.uec.ac.jp Abstract. Local governments have social problems closely related to urban spatial features and human behaviors, such as graffiti, littering, and illegally parked bicycles. Our goal is to estimate the current situa- tion of these urban problems. In this paper, we focused on the littering problem as one of the urban problems, and we compared a graph-based method using knowledge graph embedding and an image-based method using convolutional neural networks (CNN). Specifically, in the graph- based method, we first design and construct the knowledge graph based on the results of the airflow simulation and geospatial features in urban areas. Then, we generate the vector data from the knowledge graph using RDF2vec, and estimate the litter distributions using the vector data. As the result, both of the two methods achieved high and approximately same accuracy, although the graph-based method has a smaller amount of numerical information than the image-based method. Keywords: Knowledge Graph Embedding, Georelational Data, Neural Networks, Urban Problem 1 Introduction There are various social problems closely related to urban spatial features and human behaviors (referred to as “urban problem”), such as graffiti, littering, and illegally parked bicycles. These urban problems not only spoils urban landscapes but may also cause fires and traffic accidents. Moreover, the urban areas will become more dangerous by broken window theory [1]. In this study, we aim to estimate the current situation of these urban prob- lems. Thus, we experimented and evaluated two approaches: (1) graph-based method and (2) image-based method. In the graph-based method, we first con- structed the knowledge graph (KG) based on geospatial features, such as build- ings, roads, and point of interests (POIs), and the results of the Computational Fluid Dynamics (CFD) simulation where human being is modeled as fluid. Next, we obtained vector representations from the constructed KG using graph em- bedding, then we estimated the observation values using neural networks. In the image-based method, we put all the information into the images, then we estimated the observation values using Convolutional Neural Network (CNN), which is a commonly used method for image classification tasks. 2 Shusaku Egami, Takahiro Kawamura, and Akihiko Ohsuga Fig. 1. The heatmap image of the average wind speed around of Ryogoku Station In this paper, we focused on the littering problem as one of the urban prob- lems, then we evaluated the graph-based method and the image-based method for estimating the distribution of garbage littering in urban areas. 2 Related Work RDF2vec was proposed as a graph embedding method for RDF graphs. The RDF2vec generates graph walk paths, then the vertices and edges in subgraphs are relabeled using Weisfeiler-Lehman Subtree Graph Kernel for RDF [3]. Next, sequence sets are generated by graph walks on the relabeled graph. Finally, vector representations of all entities are generated by word2vec [4]. 3 Construction of Geospatial Knowledge Graphs In our previous study [5], we found that there is a correlation between the stag- nation points of airflow and the points of illegally parked bicycles, which is one of the urban problems. Thus, we experimented the CFD simulation considering the crowd flows in cities as fluids, for obtaining pseudo crowd flows as numeri- cal data. In this study, we loaded geographical elevation data, building polygon data, and road network data on Geographic Information System (GIS). Then, we simulated the airflow around the stations using Airflow Analyst3 . Fig. 1 shows the heatmap of the average wind speed around of Ryogoku Station in Tokyo. Fig. 2 shows the schema design of the KG based on geospatial features and the CFD simulation results. Each instance of Cell is created for each cell surrounded by grid lines, and has four instances of GridPoint. An instance of GridPoint has the results of the CFD simulation as literals. In this study, we used values of average wind speed and values of average wind direction. Then, if a part of the building is included in a cell, an instance of PartsOfBuildings is created and is linked from an instance of Cell with have property. Likewise, an instance of PartsOfRoad is created and is linked from an instance of Cell with have property. If a building contains a POI, the instance of Building has the type information of the POI. 3 http://airflowanalyst.com/en/index.php Predicting Urban Problems: A Comparison of Graph-based and Image-based 3 Fig. 2. Constructing the KG based on geospatial features and CFD simulation results 4 Estimating Litter Distribution using RDF2vec 4.1 Approach To estimate the number of litters included in each cell in grid-divided urban areas, we generated vector representations of each instance of Cell from the KG using RDF2vec. The RDF2vec walks locally on the KG, when the resources are northCell linked to each other. For examples, since the Cellx −−−−−−→ Celly has a inverse southCell relationship Celly −−−−−−→ Cellx , the RDF2vec generates a sequence that re- peats walking between Cellx and Celly . Thus, in order to solve this problem, we gave the probability p if the walk algorithm goes back to the previous node. After the graph embedding using our extended RDF2vec, we estimated the number of litters in each cell using a machine learning method, based on the vector representation of the instances of Cell and corresponding observation data. In this study, we used fully connected neural networks for regression as the machine learning method. 4.2 Evaluation In this study, the littering data was provided by Pirika Inc4 . We used the lit- ter data around Ryogoku Station in Tokyo, Yurakucho Station in Tokyo, and Shinyokohama Station in Kanagawa Prefecture in this experiments. The number of triples of our KG was 169,614. We obtained vector represen- tations using the RDF2vec, then estimated the number of litters in each cell. To evaluate the error, we used root mean squared error (RMSE). As the result of the 10-fold cross validation, we achieved the RM SE = 1.50, where the skip-gram vector size is 500, the window size is 5, the depth of graph walks is 3, the limit number of walks per entity is 500, the iteration number of Weisfeiler-Lehman graph labeling is 7, the probability to go back to the previous node is 0.8, the optimization algorithm is Adam, and the neural network model is follows: Input → Dense(32) → Dense(512) → Dense(32) → Dense(1) → Output. The image-based method trained the grid-divided heatmap images of the CFD results such as Fig. 1. As the result of the 10-fold cross validation, we 4 https://en.corp.pirika.org/ 4 Shusaku Egami, Takahiro Kawamura, and Akihiko Ohsuga achieved the RM SE = 1.47, where the optimization algorithm is Adam, and the CNN model is follows: Input → Conv2D(21, 3 × 3) → Conv2D(21, 3 × 3) → Max-Pool(2×2) → Dropout(0.25) → Conv2D(64, 3×3) → Conv2D(64, 3×3) → Max-Pool(2 × 2) → Dropout(0.25) → Dense(256) → Dropout(0.5) → Dense(1) → Output. Therefore, we found that the accuracy of the graph-based method and the image-based method are approximately same. We considered that the difference of the representation of the CFD results is the reason why the RMSE of the image-based method was slightly lower than the graph-based method. In the graph-based method, since each instance of Cell has four instances of GridPoint, each instance of Cell has four CFD results. In the image-based method, since the CFD results were visualized as heatmaps and each cell has RGB values of 21 × 21 pixels, each cell has more than four approximated values of CFD results. Therefore, we considered that the image- based method could better learn the numerical features of the CFD results than the graph-based method. Moreover, we considered that the adjacency relations and the edge labels in the KG contributed to improve the accuracy of the graph-based method. Furthermore, we found that the RMSE of both methods are approximately same, although in the graph-based method we did not use some features depicted in the images, such as road widths and building shapes. 5 Conclusion In this study, we compared the graph-based method and the image-based method for estimating human and urban space related problems. In the graph-based method, we estimated the observation values using RDF2vec, based on the KG including geospatial features and CFD simulation results. In the image-based method, we estimated using CNN, based on the grid-divided heatmap images of the CFD simulation results. Subsequently, we found that the graph-based method was comparable to the image-based method. In the future, we will exper- iment with changing the KG schema, and will combine the graph-based method and image-based method. Acknowledgments. This work was supported by JSPS KAKENHI Grant Numbers JP16K00419, JP16K12411, JP17H04705, JP18H03229, JP18H03340, JP18K19835, JP18J13988. We are grateful to Pirika Inc that has provided us with data. References 1. Wilson, J. Q. and George, L. K.: Broken windows, Critical issues in policing: Contemporary readings, pp. 395–407, 1982. 2. Ristoski, P. and Paulheim, H.: RDF2Vec: RDF Graph Embeddings for Data Mining. In: Proc. of ISWC, pp.498–514, 2016. 3. de Vries, G. K.: A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data. In: Proc. of ECML-PKDD, pp.606–621, 2013. 4. Mikolov, T., Sutskever, I., Chen, K., et al.: Distributed Representations of Words and Phrases and their Compositionality. In: Proc. of NIPS, pp.3111–3119, 2013. 5. Egami, S., Kawamura, T., Ohsuga, A.: Temporal and Spatial Expansion of Urban LOD for Solving Illegally Parked Bicycles in Tokyo. IEICE Trans. on Information and Systems, Vol.E101.D, No.1, pp.116–129, 2018.