=Paper= {{Paper |id=Vol-2293/jist2018pd_paper10 |storemode=property |title=Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods |pdfUrl=https://ceur-ws.org/Vol-2293/jist2018pd_paper10.pdf |volume=Vol-2293 |authors=Shusaku Egami,Takahiro Kawamura,Akihiko Ohsuga |dblpUrl=https://dblp.org/rec/conf/jist/EgamiKO18 }} ==Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods== https://ceur-ws.org/Vol-2293/jist2018pd_paper10.pdf
    Predicting Urban Problems: A Comparison of
       Graph-based and Image-based Methods

        Shusaku Egami1 , Takahiro Kawamura1,2 , and Akihiko Ohsuga1
              1
                  University of Electro-Communications, Tokyo, Japan
              2
                  Japan Science and Technology Agency, Tokyo, Japan
                       egami.shusaku@ohsuga.lab.uec.ac.jp



      Abstract. Local governments have social problems closely related to
      urban spatial features and human behaviors, such as graffiti, littering,
      and illegally parked bicycles. Our goal is to estimate the current situa-
      tion of these urban problems. In this paper, we focused on the littering
      problem as one of the urban problems, and we compared a graph-based
      method using knowledge graph embedding and an image-based method
      using convolutional neural networks (CNN). Specifically, in the graph-
      based method, we first design and construct the knowledge graph based
      on the results of the airflow simulation and geospatial features in urban
      areas. Then, we generate the vector data from the knowledge graph using
      RDF2vec, and estimate the litter distributions using the vector data. As
      the result, both of the two methods achieved high and approximately
      same accuracy, although the graph-based method has a smaller amount
      of numerical information than the image-based method.

      Keywords: Knowledge Graph Embedding, Georelational Data, Neural
      Networks, Urban Problem


1    Introduction
There are various social problems closely related to urban spatial features and
human behaviors (referred to as “urban problem”), such as graffiti, littering, and
illegally parked bicycles. These urban problems not only spoils urban landscapes
but may also cause fires and traffic accidents. Moreover, the urban areas will
become more dangerous by broken window theory [1].
    In this study, we aim to estimate the current situation of these urban prob-
lems. Thus, we experimented and evaluated two approaches: (1) graph-based
method and (2) image-based method. In the graph-based method, we first con-
structed the knowledge graph (KG) based on geospatial features, such as build-
ings, roads, and point of interests (POIs), and the results of the Computational
Fluid Dynamics (CFD) simulation where human being is modeled as fluid. Next,
we obtained vector representations from the constructed KG using graph em-
bedding, then we estimated the observation values using neural networks. In
the image-based method, we put all the information into the images, then we
estimated the observation values using Convolutional Neural Network (CNN),
which is a commonly used method for image classification tasks.
2        Shusaku Egami, Takahiro Kawamura, and Akihiko Ohsuga




    Fig. 1. The heatmap image of the average wind speed around of Ryogoku Station

    In this paper, we focused on the littering problem as one of the urban prob-
lems, then we evaluated the graph-based method and the image-based method
for estimating the distribution of garbage littering in urban areas.


2     Related Work
RDF2vec was proposed as a graph embedding method for RDF graphs. The
RDF2vec generates graph walk paths, then the vertices and edges in subgraphs
are relabeled using Weisfeiler-Lehman Subtree Graph Kernel for RDF [3]. Next,
sequence sets are generated by graph walks on the relabeled graph. Finally,
vector representations of all entities are generated by word2vec [4].

3     Construction of Geospatial Knowledge Graphs
In our previous study [5], we found that there is a correlation between the stag-
nation points of airflow and the points of illegally parked bicycles, which is one
of the urban problems. Thus, we experimented the CFD simulation considering
the crowd flows in cities as fluids, for obtaining pseudo crowd flows as numeri-
cal data. In this study, we loaded geographical elevation data, building polygon
data, and road network data on Geographic Information System (GIS). Then, we
simulated the airflow around the stations using Airflow Analyst3 . Fig. 1 shows
the heatmap of the average wind speed around of Ryogoku Station in Tokyo.
    Fig. 2 shows the schema design of the KG based on geospatial features and the
CFD simulation results. Each instance of Cell is created for each cell surrounded
by grid lines, and has four instances of GridPoint. An instance of GridPoint
has the results of the CFD simulation as literals. In this study, we used values
of average wind speed and values of average wind direction. Then, if a part of
the building is included in a cell, an instance of PartsOfBuildings is created
and is linked from an instance of Cell with have property. Likewise, an instance
of PartsOfRoad is created and is linked from an instance of Cell with have
property. If a building contains a POI, the instance of Building has the type
information of the POI.
3
    http://airflowanalyst.com/en/index.php
Predicting Urban Problems: A Comparison of Graph-based and Image-based           3




Fig. 2. Constructing the KG based on geospatial features and CFD simulation results

4     Estimating Litter Distribution using RDF2vec
4.1    Approach
To estimate the number of litters included in each cell in grid-divided urban
areas, we generated vector representations of each instance of Cell from the KG
using RDF2vec. The RDF2vec walks locally on the KG, when the resources are
                                                     northCell
linked to each other. For examples, since the Cellx −−−−−−→ Celly has a inverse
                    southCell
relationship Celly −−−−−−→ Cellx , the RDF2vec generates a sequence that re-
peats walking between Cellx and Celly . Thus, in order to solve this problem, we
gave the probability p if the walk algorithm goes back to the previous node.
    After the graph embedding using our extended RDF2vec, we estimated the
number of litters in each cell using a machine learning method, based on the
vector representation of the instances of Cell and corresponding observation
data. In this study, we used fully connected neural networks for regression as
the machine learning method.

4.2    Evaluation
In this study, the littering data was provided by Pirika Inc4 . We used the lit-
ter data around Ryogoku Station in Tokyo, Yurakucho Station in Tokyo, and
Shinyokohama Station in Kanagawa Prefecture in this experiments.
    The number of triples of our KG was 169,614. We obtained vector represen-
tations using the RDF2vec, then estimated the number of litters in each cell. To
evaluate the error, we used root mean squared error (RMSE). As the result of the
10-fold cross validation, we achieved the RM SE = 1.50, where the skip-gram
vector size is 500, the window size is 5, the depth of graph walks is 3, the limit
number of walks per entity is 500, the iteration number of Weisfeiler-Lehman
graph labeling is 7, the probability to go back to the previous node is 0.8, the
optimization algorithm is Adam, and the neural network model is follows: Input
→ Dense(32) → Dense(512) → Dense(32) → Dense(1) → Output.
    The image-based method trained the grid-divided heatmap images of the
CFD results such as Fig. 1. As the result of the 10-fold cross validation, we
4
    https://en.corp.pirika.org/
4        Shusaku Egami, Takahiro Kawamura, and Akihiko Ohsuga

achieved the RM SE = 1.47, where the optimization algorithm is Adam, and
the CNN model is follows: Input → Conv2D(21, 3 × 3) → Conv2D(21, 3 × 3) →
Max-Pool(2×2) → Dropout(0.25) → Conv2D(64, 3×3) → Conv2D(64, 3×3) →
Max-Pool(2 × 2) → Dropout(0.25) → Dense(256) → Dropout(0.5) → Dense(1)
→ Output. Therefore, we found that the accuracy of the graph-based method
and the image-based method are approximately same.
    We considered that the difference of the representation of the CFD results is
the reason why the RMSE of the image-based method was slightly lower than
the graph-based method. In the graph-based method, since each instance of Cell
has four instances of GridPoint, each instance of Cell has four CFD results.
In the image-based method, since the CFD results were visualized as heatmaps
and each cell has RGB values of 21 × 21 pixels, each cell has more than four
approximated values of CFD results. Therefore, we considered that the image-
based method could better learn the numerical features of the CFD results than
the graph-based method.
    Moreover, we considered that the adjacency relations and the edge labels
in the KG contributed to improve the accuracy of the graph-based method.
Furthermore, we found that the RMSE of both methods are approximately same,
although in the graph-based method we did not use some features depicted in
the images, such as road widths and building shapes.

5    Conclusion
In this study, we compared the graph-based method and the image-based method
for estimating human and urban space related problems. In the graph-based
method, we estimated the observation values using RDF2vec, based on the KG
including geospatial features and CFD simulation results. In the image-based
method, we estimated using CNN, based on the grid-divided heatmap images
of the CFD simulation results. Subsequently, we found that the graph-based
method was comparable to the image-based method. In the future, we will exper-
iment with changing the KG schema, and will combine the graph-based method
and image-based method.
Acknowledgments. This work was supported by JSPS KAKENHI Grant
Numbers JP16K00419, JP16K12411, JP17H04705, JP18H03229, JP18H03340,
JP18K19835, JP18J13988. We are grateful to Pirika Inc that has provided us
with data.

References
1. Wilson, J. Q. and George, L. K.: Broken windows, Critical issues in policing: Contemporary
   readings, pp. 395–407, 1982.
2. Ristoski, P. and Paulheim, H.: RDF2Vec: RDF Graph Embeddings for Data Mining. In: Proc. of
   ISWC, pp.498–514, 2016.
3. de Vries, G. K.: A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data. In:
   Proc. of ECML-PKDD, pp.606–621, 2013.
4. Mikolov, T., Sutskever, I., Chen, K., et al.: Distributed Representations of Words and Phrases
   and their Compositionality. In: Proc. of NIPS, pp.3111–3119, 2013.
5. Egami, S., Kawamura, T., Ohsuga, A.: Temporal and Spatial Expansion of Urban LOD for Solving
   Illegally Parked Bicycles in Tokyo. IEICE Trans. on Information and Systems, Vol.E101.D, No.1,
   pp.116–129, 2018.