<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Predicting Urban Problems: A Comparison of Graph-based and Image-based Methods</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shusaku Egami</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Takahiro Kawamura</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Akihiko Ohsuga</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Japan Science and Technology Agency</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Electro-Communications</institution>
          ,
          <addr-line>Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Local governments have social problems closely related to urban spatial features and human behaviors, such as graffiti, littering, and illegally parked bicycles. Our goal is to estimate the current situation of these urban problems. In this paper, we focused on the littering problem as one of the urban problems, and we compared a graph-based method using knowledge graph embedding and an image-based method using convolutional neural networks (CNN). Speci cally, in the graphbased method, we rst design and construct the knowledge graph based on the results of the air ow simulation and geospatial features in urban areas. Then, we generate the vector data from the knowledge graph using RDF2vec, and estimate the litter distributions using the vector data. As the result, both of the two methods achieved high and approximately same accuracy, although the graph-based method has a smaller amount of numerical information than the image-based method.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph Embedding</kwd>
        <kwd>Georelational Data</kwd>
        <kwd>Neural Networks</kwd>
        <kwd>Urban Problem</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        There are various social problems closely related to urban spatial features and
human behaviors (referred to as \urban problem"), such as graffiti, littering, and
illegally parked bicycles. These urban problems not only spoils urban landscapes
but may also cause res and traffic accidents. Moreover, the urban areas will
become more dangerous by broken window theory [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>In this study, we aim to estimate the current situation of these urban
problems. Thus, we experimented and evaluated two approaches: (1) graph-based
method and (2) image-based method. In the graph-based method, we rst
constructed the knowledge graph (KG) based on geospatial features, such as
buildings, roads, and point of interests (POIs), and the results of the Computational
Fluid Dynamics (CFD) simulation where human being is modeled as uid. Next,
we obtained vector representations from the constructed KG using graph
embedding, then we estimated the observation values using neural networks. In
the image-based method, we put all the information into the images, then we
estimated the observation values using Convolutional Neural Network (CNN),
which is a commonly used method for image classi cation tasks.</p>
      <p>In this paper, we focused on the littering problem as one of the urban
problems, then we evaluated the graph-based method and the image-based method
for estimating the distribution of garbage littering in urban areas.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        RDF2vec was proposed as a graph embedding method for RDF graphs. The
RDF2vec generates graph walk paths, then the vertices and edges in subgraphs
are relabeled using Weisfeiler-Lehman Subtree Graph Kernel for RDF [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Next,
sequence sets are generated by graph walks on the relabeled graph. Finally,
vector representations of all entities are generated by word2vec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Construction of Geospatial Knowledge Graphs</title>
      <p>
        In our previous study [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we found that there is a correlation between the
stagnation points of air ow and the points of illegally parked bicycles, which is one
of the urban problems. Thus, we experimented the CFD simulation considering
the crowd ows in cities as uids, for obtaining pseudo crowd ows as
numerical data. In this study, we loaded geographical elevation data, building polygon
data, and road network data on Geographic Information System (GIS). Then, we
simulated the air ow around the stations using Air ow Analyst3. Fig. 1 shows
the heatmap of the average wind speed around of Ryogoku Station in Tokyo.
      </p>
      <p>Fig. 2 shows the schema design of the KG based on geospatial features and the
CFD simulation results. Each instance of Cell is created for each cell surrounded
by grid lines, and has four instances of GridPoint. An instance of GridPoint
has the results of the CFD simulation as literals. In this study, we used values
of average wind speed and values of average wind direction. Then, if a part of
the building is included in a cell, an instance of PartsOfBuildings is created
and is linked from an instance of Cell with have property. Likewise, an instance
of PartsOfRoad is created and is linked from an instance of Cell with have
property. If a building contains a POI, the instance of Building has the type
information of the POI.</p>
      <sec id="sec-3-1">
        <title>3 http://airflowanalyst.com/en/index.php</title>
        <p>Predicting Urban Problems: A Comparison of Graph-based and Image-based</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Estimating Litter Distribution using RDF2vec</title>
      <p>To estimate the number of litters included in each cell in grid-divided urban
areas, we generated vector representations of each instance of Cell from the KG
using RDF2vec. The RDF2vec walks locally on the KG, when the resources are
linked to each other. For examples, since the Cellx northCe!ll Celly has a inverse
relationship Celly southCe!ll Cellx, the RDF2vec generates a sequence that
repeats walking between Cellx and Celly. Thus, in order to solve this problem, we
gave the probability p if the walk algorithm goes back to the previous node.</p>
      <p>After the graph embedding using our extended RDF2vec, we estimated the
number of litters in each cell using a machine learning method, based on the
vector representation of the instances of Cell and corresponding observation
data. In this study, we used fully connected neural networks for regression as
the machine learning method.
4.2</p>
      <p>Evaluation
In this study, the littering data was provided by Pirika Inc4. We used the
litter data around Ryogoku Station in Tokyo, Yurakucho Station in Tokyo, and
Shinyokohama Station in Kanagawa Prefecture in this experiments.</p>
      <p>The number of triples of our KG was 169,614. We obtained vector
representations using the RDF2vec, then estimated the number of litters in each cell. To
evaluate the error, we used root mean squared error (RMSE). As the result of the
10-fold cross validation, we achieved the RM SE = 1:50, where the skip-gram
vector size is 500, the window size is 5, the depth of graph walks is 3, the limit
number of walks per entity is 500, the iteration number of Weisfeiler-Lehman
graph labeling is 7, the probability to go back to the previous node is 0.8, the
optimization algorithm is Adam, and the neural network model is follows: Input
! Dense(32) ! Dense(512) ! Dense(32) ! Dense(1) ! Output.</p>
      <p>The image-based method trained the grid-divided heatmap images of the
CFD results such as Fig. 1. As the result of the 10-fold cross validation, we</p>
      <sec id="sec-4-1">
        <title>4 https://en.corp.pirika.org/</title>
        <p>achieved the RM SE = 1:47, where the optimization algorithm is Adam, and
the CNN model is follows: Input ! Conv2D(21, 3 3) ! Conv2D(21, 3 3) !
Max-Pool(2 2) ! Dropout(0.25) ! Conv2D(64, 3 3) ! Conv2D(64, 3 3) !
Max-Pool(2 2) ! Dropout(0.25) ! Dense(256) ! Dropout(0.5) ! Dense(1)
! Output. Therefore, we found that the accuracy of the graph-based method
and the image-based method are approximately same.</p>
        <p>We considered that the difference of the representation of the CFD results is
the reason why the RMSE of the image-based method was slightly lower than
the graph-based method. In the graph-based method, since each instance of Cell
has four instances of GridPoint, each instance of Cell has four CFD results.
In the image-based method, since the CFD results were visualized as heatmaps
and each cell has RGB values of 21 21 pixels, each cell has more than four
approximated values of CFD results. Therefore, we considered that the
imagebased method could better learn the numerical features of the CFD results than
the graph-based method.</p>
        <p>Moreover, we considered that the adjacency relations and the edge labels
in the KG contributed to improve the accuracy of the graph-based method.
Furthermore, we found that the RMSE of both methods are approximately same,
although in the graph-based method we did not use some features depicted in
the images, such as road widths and building shapes.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this study, we compared the graph-based method and the image-based method
for estimating human and urban space related problems. In the graph-based
method, we estimated the observation values using RDF2vec, based on the KG
including geospatial features and CFD simulation results. In the image-based
method, we estimated using CNN, based on the grid-divided heatmap images
of the CFD simulation results. Subsequently, we found that the graph-based
method was comparable to the image-based method. In the future, we will
experiment with changing the KG schema, and will combine the graph-based method
and image-based method.</p>
      <p>Acknowledgments. This work was supported by JSPS KAKENHI Grant
Numbers JP16K00419, JP16K12411, JP17H04705, JP18H03229, JP18H03340,
JP18K19835, JP18J13988. We are grateful to Pirika Inc that has provided us
with data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Wilson,
          <string-name>
            <given-names>J. Q.</given-names>
            and
            <surname>George</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. K.</surname>
          </string-name>
          :
          <article-title>Broken windows, Critical issues in policing: Contemporary readings</article-title>
          , pp.
          <volume>395</volume>
          {
          <issue>407</issue>
          ,
          <year>1982</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ristoski</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>RDF2Vec: RDF Graph Embeddings for Data Mining</article-title>
          .
          <source>In: Proc. of ISWC</source>
          , pp.
          <volume>498</volume>
          {
          <issue>514</issue>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. de Vries, G. K.:
          <article-title>A fast approximation of the Weisfeiler-Lehman graph kernel for RDF data</article-title>
          .
          <source>In: Proc. of ECML-PKDD</source>
          , pp.
          <volume>606</volume>
          {
          <issue>621</issue>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , et al.:
          <article-title>Distributed Representations of Words and Phrases and their Compositionality</article-title>
          .
          <source>In: Proc. of NIPS</source>
          , pp.
          <volume>3111</volume>
          {
          <issue>3119</issue>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Egami</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kawamura</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ohsuga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Temporal and Spatial Expansion of Urban LOD for Solving Illegally Parked Bicycles in Tokyo</article-title>
          .
          <source>IEICE Trans. on Information and Systems</source>
          , Vol.E101.D, No.
          <issue>1</issue>
          , pp.
          <volume>116</volume>
          {
          <issue>129</issue>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>