=Paper= {{Paper |id=Vol-3743/paper6 |storemode=property |title=A Planet Scale Spatial-Temporal Knowledge Graph Based On OpenStreetMap And H3 Grid |pdfUrl=https://ceur-ws.org/Vol-3743/paper6.pdf |volume=Vol-3743 |authors=Martin Böckling,Heiko Paulheim,Sarah Detzler |dblpUrl=https://dblp.org/rec/conf/geold/BocklingPD24 }} ==A Planet Scale Spatial-Temporal Knowledge Graph Based On OpenStreetMap And H3 Grid== https://ceur-ws.org/Vol-3743/paper6.pdf
                                A Planet Scale Spatial-Temporal Knowledge Graph
                                Based On OpenStreetMap And H3 Grid
                                Martin Böckling1,∗ , Heiko Paulheim1 and Sarah Detzler2
                                1
                                    Data and Web Science Group University of Mannheim, B6 26, Mannheim, 68159, Germany
                                2
                                    Corporate State University of Mannheim, Colblitzallee 1-9, Mannheim, 68163, Germany


                                              Abstract
                                              Geospatial data plays a central role in modeling our world, for which OpenStreetMap (OSM) provides a
                                              rich source of such data. While often spatial data is represented in a tabular format, a graph based repre-
                                              sentation provides the possibility to interconnect entities which would have been separated in a tabular
                                              representation. We propose in our paper a framework which supports a planet scale transformation of
                                              OpenStreetMap data into a Spatial Temporal Knowledge Graph. In addition to OpenStreetMap data, we
                                              align the different OpenStreetMap geometries on individual h3 grid cells. We compare our constructed
                                              spatial knowledge graph to other spatial knowledge graphs and outline our contribution in this paper.
                                              As a basis for our computation, we use Apache Sedona as a computational framework for our Spatial
                                              Temporal Knowledge Graph construction.

                                              Keywords
                                              Spatial Temporal Knowledge Graph, OpenStreetMap, Apache Sedona




                                1. Introduction
                                Within the spatial domain OpenStreetMap (OSM) provides a large extent of open-source spatial
                                data that is modeled by different contributors. It provides data annotated by different contribu-
                                tors and official data providers, which represents data over the entire planet [1]. Often spatial
                                data is represented in a tabular format. Graphs, but in particular Knowledge Graphs (KGs),
                                provide a good foundation to interconnect related entities in the spatial domain. They can
                                model entities and events in a multi-faceted way, where distance in the graph can express both
                                geographic as well as semantic distances. When embedding such a graph using knowledge
                                graph embedding methods, it is possible to create latent spaces where both facets, geographic
                                and semantic proximity, are jointly reflected.
                                   For our research paper, we focus on the spatial data source provided by OSM. We use the h3
                                Discrete Global Grid (DGG) to provide a regularization for the different OSM geometries, so
                                that each individual cell tessellates the earth uniquely. This does not only involve data from a
                                one-time snapshot, but we aim at providing a KG that is modeled over a temporal dimension.
                                In the following sections, we outline our approach for a scalable Spatial-Temporal Knowledge


                                GeoLD2024: 6th Geospatial Linked Data Workshop, May 26, 2024, Hersonissos, Greece
                                ∗
                                    Corresponding author.
                                Envelope-Open martin.boeckling.gast@uni-mannheim.de (M. Böckling); heiko@informatik.uni-mannheim.de (H. Paulheim);
                                sarah.detzler@dhbw-mannheim.de (S. Detzler)
                                Orcid 0000-0002-1143-4686 (M. Böckling); 0000-0003-4386-8195 (H. Paulheim); 0000-0002-7504-8856 (S. Detzler)
                                            © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
Graph (STKG) over the entire planet. Furthermore, we conclude our research by comparing it
conceptually to other STKGs.


2. Related Work
Within the spatial data domain, various data representations for graph-based datasets are
feasible. One representation involves static nodes and edge pairs. Static graphs, exemplified by
road networks or electricity grids [2], are a specific instance of this data representation. The
capability to establish connections between spatial information through a graph enables the
modeling of links between individual geometries or multiple geometries, providing a means to
represent complex systems [3].
   In the realm of KGs within the spatial domain, our focus is directed towards a curated
selection of diverse KGs. In this section we will highlight a selection of KGs, but we will not
provide an extensive list of different KGs. One notable KG, WorldKG, is specifically designed
to encapsulate OSM data. The structural organization of WorldKG involves the transposition
of features extracted from OSM, establishing relationships between various categories. In this
KG, tags derived from OSM function as separate entities, framing the hierarchical structure of
WorldKG. Each individual feature extracted from OSM is represented as a distinct node within
the WorldKG. Moreover, the geometries associated with each OSM entity are also exposed as
point geometry as nodes within the WorldKG. However, no grid is used to represent geographical
areas. In total, the published KG is based on OSM data from June 06, 2021. In total, the published
KG contains over 828 million triples and over 113 million entities, categorized into 33 different
top-level classes. [4]
   In comparison to WorldKG, the KnowWhereGraph framework is constructed upon a diverse
array of datasets encompassing hazard information, climate data, soil properties, crop and land
cover types, as well as demographic and human health data. To facilitate the integration of
these heterogeneous datasets, the framework employs the S2 discrete hierarchical grid, thereby
harmonizing location data from various sources. The S2 grid uses squared shapes as grid cells,
and the hierarchy allows larger and smaller granularity. Each discrete hierarchical grid cell
functions as a unique identifier for the corresponding region. In conjunction to the grid cell
Identifier (ID), various other regional attributes, such as ZIP codes, administrative regions, or
Climate Division Boundaries, are systematically mapped to the respective areas. During the
time of the publication, the Knowledge Graph consisted of 4.9 billion triples. [5]
   The current approaches for constructing spatial Knowledge Graphs come with certain short-
comings. For instance, WorldKG currently considers only Point geometry types as an input.
While providing a semantically enriched representation for OSM metadata, geographic near
elements are not necessarily close in the graph. In comparison, KnowWhereGraph uses other
data sources limited to selected datasets from the United States. It provides in addition relations
between geometries and the DGG.
   In this paper, we propose a framework that allows us to transform OSM data into a KG. Core
considerations for our approach are the involvement of a DGG together with a modeling of
the relations between the geometry and an individual grid cell. Furthermore, the KG should
provide a temporal dimension to the dataset. In the following sections, we provide an overview
of the theoretical background of spatial data, as well as the transformation into a KG.


3. Theoretical Background of Spatial Data and Knowledge
   Graphs
For the following sections we introduce the related concepts for our KG with concepts from the
spatial domain. Furthermore, we will provide an introduction to the data structure of OSM.

3.1. OpenStreetMap
OSM provides a data foundation that allows to map geographical entities. The complete project
is open source and therefore provides a unique opportunity in the spatial domain for data
analysis. OSM provides two different types of data categories. The first data category involves
the so-called OSM map features. Map features provide additional information to OSM entities
and represent geographical attributes. Map features are modeled by using key-value pairs, in
which the key models the primary feature and the value further specifies the primary feature.
We define OSM map features as followed [6]:
Definition 1. Let 𝐾 be the set of all possible keys and 𝑉 the set of all possible values. We
therefore denote all tags as 𝑇. Therefore, OSM map features are represented as 𝑇 ∶ 𝐾 → 𝑉.
   For instance, the key highway can be specified with the value motorway. The map feature
highway=motorway describes therefore a divided highway with more than two lanes. OSM does
not restrict the variety of map features that can be defined, which also provides the possibility
to have multilingual map features defined by users. Nevertheless, OSM provides commonly
accepted map features [6].
   The second data category is described as OSM elements. They are divided into Nodes, Ways
and Relations. The Node element represents a point within a geographic space. Nodes represent
its geographic location as a pair of longitude and latitude. Nodes can represent, for instance,
coffee shops, trees, or park benches. We use for the Node element the following definition:
Definition 2. Let 𝐴𝑛𝑜𝑑𝑒 be the set of OSM Node attribute names and let 𝑊𝑛𝑜𝑑𝑒 be
the possible values for the attributes 𝐴𝑛𝑜𝑑𝑒 . We define properties 𝑃𝑛𝑜𝑑𝑒 as 𝑃𝑛𝑜𝑑𝑒 ∶
𝐴𝑛𝑜𝑑𝑒 → 𝑊𝑛𝑜𝑑𝑒 . For node objects 𝑁𝑛𝑜𝑑𝑒 we define the following attributes 𝐴𝑛𝑜𝑑𝑒 =
{"𝑖𝑑", "𝑣𝑒𝑟𝑠𝑖𝑜𝑛", "𝑐ℎ𝑎𝑛𝑔𝑒𝑠𝑒𝑡", "𝑙𝑎𝑡", "𝑙𝑜𝑛", "𝑢𝑠𝑒𝑟", "𝑢𝑖𝑑", "𝑣𝑖𝑠𝑖𝑏𝑙𝑒", "𝑡𝑖𝑚𝑒𝑠𝑡𝑎𝑚𝑝"}. A node 𝑂𝑛𝑜𝑑𝑒 consists
of the following tuple 𝑂𝑛𝑜𝑑𝑒 = (𝑃𝑛𝑜𝑑𝑒 , 𝑇 ).
   Ways represent in the context of OSM linear geometries. In general, Ways can be translated
into Line or Polygon geometries. A Way within OSM consists of a set of Node elements. The
combination of the Point location from nodes constructs the Line or Polygon geometries. Objects
that use Line geometries are highways or power lines. Polygons encode, for example, buildings
or land-use areas. The data structure of a OSM Way is defined as followed:
Definition 3. Let 𝐴𝑤𝑎𝑦 be the set of OSM way attribute names and let 𝑊𝑤𝑎𝑦 be the pos-
sible values for the attributes 𝐴𝑤𝑎𝑦 . We define 𝑃𝑤𝑎𝑦 as 𝑃𝑤𝑎𝑦 ∶ 𝐴𝑤𝑎𝑦 → 𝑊𝑤𝑎𝑦 . We de-
fine 𝑁 to be the set of of node references in a way. Each element in 𝑁 references a
node identified by an ID. For way objects 𝑂𝑤𝑎𝑦 we define the following attributes 𝐴𝑤𝑎𝑦 =
{"𝑖𝑑", "𝑣𝑒𝑟𝑠𝑖𝑜𝑛", "𝑐ℎ𝑎𝑛𝑔𝑒𝑠𝑒𝑡", "𝑢𝑠𝑒𝑟", "𝑢𝑖𝑑", "𝑡𝑖𝑚𝑒𝑠𝑡𝑎𝑚𝑝"}. A way object 𝑂𝑤𝑎𝑦 consists of the follow-
ing tuple 𝑂𝑤𝑎𝑦 = (𝑃𝑤𝑎𝑦 , 𝑁 , 𝑇 ).
  The third element of the OSM data structure is a relation. Relations consist of Nodes, Ways, or
other Relations to define logical geographical relations to the different elements. Each element
within a relation specifies a role element, which represents the function of the element within
the relation. Those generally represent administrative boundaries and routes. The definition of
an OSM relation is defined as followed:
Definition 4. Let 𝐴𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 be the set of OSM relation attribute names and let 𝑊𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 be the
possible values for the attributes 𝐴𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 . We define the relation property 𝑃𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 as 𝑃𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 ∶
𝐴𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 → 𝑊𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 . Within a relation we define members 𝑀 that either represent Nodes or
Ways. Each member is consisting of a tuple of type, reference and role. For 𝑂𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 we define
following attributes 𝐴𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 = {"𝑖𝑑", "𝑣𝑒𝑟𝑠𝑖𝑜𝑛", "𝑐ℎ𝑎𝑛𝑔𝑒𝑠𝑒𝑡", "𝑢𝑠𝑒𝑟", "𝑢𝑖𝑑", "𝑡𝑖𝑚𝑒𝑠𝑡𝑎𝑚𝑝"}. We define
a relation object 𝑂𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 as a tuple consisting of relation properties 𝑃𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 , members 𝑀 and
tags 𝑇: 𝑅𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 = (𝑃𝑟𝑒𝑙𝑎𝑡𝑖𝑜𝑛 , 𝑀, 𝑇 ).
  Based on the data structure of OSM, a geometry for an OSM entity cannot be retrieved directly.
Instead, the geometry for each of the three different data structures needs to be constructed
explicitly. In the following subsection 3.2 the theoretical foundation for the used spatial methods
are outlined.

3.2. Spatial Foundation
The geometries that are extracted from OSM can range from simple point geometries to more
complex geometry representations like Polygons or Geometry Collections. To build up the rela-
tionships between the different geographic relationships, the topological model Dimensionally
Extended 9-Intersection Model (DE-9IM) is used. Similarly to our methodology, KnowWhere-
Graph also uses the DE-9IM methodology to model the geometry-grid relationship [5]. Fur-
thermore, different scientific papers use the DE-9IM method to model the spatial relationship
between spatial geometries [7].
   The DE-9IM models the relationships between geometries by using a 3x3-dimensional matrix.
It models the relationship between the different geometries by intersecting the interior 𝐼, exterior
𝐸 and boundary 𝐵 of two geometries. Assuming two geometries 𝑎 and 𝑏, the dimension of the
intersection between geometric objects 𝑎 and 𝑏 can be calculated with the function dim. The
dimension function dim of a general set of geometry 𝑆 returns for relation determination the
highest value, and is defined as followed [8]:

                            ⎧−1      if 𝑆 is empty
                            ⎪0       if 𝑆 contains a point and no lines or areas
                   dim(𝑆) =                                                      .                  (1)
                            ⎨1       if 𝑆 contains a line and no areas
                            ⎪
                            ⎩2       if 𝑆 contains an area
  The complete intersection matrix on which the DE-9IM builds up is given in Equation 2.
The DE-9IM method intersects the two geometric objects a and b for each of the topological
properties of 𝐼, 𝐵 and 𝐸. Based on the intersection of both geometries, the dimension function
𝑑𝑖𝑚 is applied. [8]

                              𝑑𝑖𝑚(𝐼 (𝑎) ∩ 𝐼 (𝑏)) 𝑑𝑖𝑚(𝐼 (𝑎) ∩ 𝐵(𝑏)) 𝑑𝑖𝑚(𝐼 (𝑎) ∩ 𝐸(𝑏))
               DE9IM(𝑎, 𝑏) = [𝑑𝑖𝑚(𝐵(𝑎) ∩ 𝐼 (𝑏)) 𝑑𝑖𝑚(𝐵(𝑎) ∩ 𝐵(𝑏)) 𝑑𝑖𝑚(𝐵(𝑎) ∩ 𝐸(𝑏))]              (2)
                              𝑑𝑖𝑚(𝐸(𝑎) ∩ 𝐼 (𝑏)) 𝑑𝑖𝑚(𝐸(𝑎) ∩ 𝐵(𝑏)) 𝑑𝑖𝑚(𝐸(𝑎) ∩ 𝐸(𝑏))
   When applying the DE-9IM model to two geometries, the 3×3 matrix is filled with the
respective result of the 𝑑𝑖𝑚 function. By concatenating the output of the DE-9IM associated
spatial predicates can be determined. For the determination of the spatial predicates, the value
range from the function dim displayed in Equation 1 can be masked with the symbol set T, F *.
Element T masks dim values 0, 1, 2, element F masks the return value -1, and element * masks
the values -1, 0, 1, 2. When applying the DE-9IM method, the results can be merged into a
string by concatenating element-wise the results from the method. In Table 1 a selection of the
DE-9IM patterns are shown to predicates [9].

Table 1
Spatial predicate and associated DE-9IM pattern [9]
  Spatial predicate                Spatial geometry combination   DE-9IM pattern
  Equals                           All                            T*F**FFF*
  Disjoint                         All                            FF*FF****
  Touches                          All except Point ∩ Point       FT******* ∨ F**T***** ∨
                                                                  F***T****
  Crosses                          Point ∩ Line                   T*T******
  Crosses                          Line ∩ Line                    0********
  Within                           All                            T*F**F***
  Overlaps                         Point ∩ Point, Area ∩ Area     T*T***T**
  Overlaps                         Line ∩ Line                    1*T***T**
  Intersects                       All                            a.intersects(b) =¬b.disjoint(a)
  Contains                         All                            a.contains(b) = b.within(a)

   As a basis for our STKG, we use a DGG to harmonize our geometries that are extracted from
OSM similar to KnowWhereGraph [5]. With DGGs we have the possibility to extract global
unique IDs per individual grid cell which allows for extensibility of the KG. Instead of using the
S2 DGG, we will base our geometry relation on the h3 DGG. The most important difference is
that the S2 DGG uses a square-based grid cell geometry, whereas h3 uses a hexagonal-based grid
cell geometry [10]. The main reason for our decision is that in a hexagonal grid, the distance to
all neighboring cells is uniform, which is not the case for square- or triangular-based grids.


4. Outline of Knowledge Graph representation
For our KG we provide a representation where the geometries of a geographic entity are aligned
on a DGG and modeled over time. Besides the geographic relation also the tag information
provided by OSM is encoded within the KG. Overall, the representation for the KG ontology
should respect the outlined rules:
    • Represent spatial entities from OSM in quadruple structure
    • Include hierarchical relations of common OSM tags in KG
    • Provide spatial relationship between OSM geometries and DGG
    • Provide relationships between individual grid cells in DGG

4.1. Overview of Classes and Properties
Starting with the commonly used OSM tags, we expose the OSM entity in our ontology using
the OSM ID. Between the common tags, we build a subclass relation based on the respective
hierarchy in the OSM tags. For the spatial relationships, we adapt the GeoSPARQL classes and
properties to align the created KG to the respective standard. The individual object of OSM
has for the respective value the relation rdf:type. If an OSM tag is not within the commonly
addressed tags, we use the key value pair within our KG. The OSM ID is the subject, the key
acts as the predicate and the value is used as the object.




Figure 1: Ontology of our Spatial-Temporal Knowledge Graph for OpenStreetMap objects


   For the different individual grid cells, we use the related DE-9IM GeoSPARQL properties to
model the relation to OSM. This is based on the conditions outlined in table 1. This involves the
following properties: geo:sfContains, geo:sfCrosses, geo:sfEquals, geo:sfOverlaps, geo:sfTouches,
geo:sfWithin, geo:ehCovers, geo:ehCoveredBy, and geo:sfIntersects. Due to the hierarchical rela-
tionships between the DE-9IM methodology, multiple relationships can be directed from the
grid cell to the individual OSM geometry. An example of how for one specific OSM ID the
STKG is structured is outlined in subsection 4.2.
   Within our STKG, we use a DGG, on which we align the OSM geometries for the KG.
Specifically for DGG, an ontology has already been proposed. Similar to our ontology, for the
relation to other geometries not from the grid, the GeoSPARQL classes have been adopted using,
for instance, the DE-9IM methodology. However, for the selected h3 grid, the ontology can only
be partially used, as the only relations for the grid cells are hcf:isAdjacentTo and hcf:contains,
which are based on the DE-9IM contains predicate [11]. While this works for square-based
grid systems, as grid cells on different hierarchy levels always contain the respective smaller
grid cell, hexagonal grid cells or triangular grid cells on different resolutions do not fulfill this
argument. We therefore expand the hierarchical relationships properties to isParentCellOf and
isChildCellOf to capture the hierarchical relationship between grid cells.




Figure 2: Ontology of our Spatial-Temporal Knowledge Graph for grid cell ontology



4.2. Example of Knowledge Graph Entity
To display how the structure of the STKG looks, we use a sample OSM object and display the
respective part of the STKG. In addition to the OSM object, we also display the related h3 grid
cell to the OSM object. As an example entity for OSM, we use OSM Way 240974013. In total, we
transform the single element of the OSM ID and the related h3 grid cell with its neighbors into
51 quadruples. As the complete list of quadruples would be too long for the paper, we publish a
full example under the following link of our Github. In table 2, we provide an excerpt from the
complete example.


5. Knowledge Graph Data Preparation
For the data preparation to construct our STKG, we used the OSM files from Geofabrik. To
provide an overview of the implemented approach, we have divided our data preparation into
three different phases: (1) the specific OSM data preparation, (2) the h3 DGG data preparation,
and (3) the STKG construction. Throughout the data preparation phase, we use Parquet files
as the main data storage format. It provides the best trade-off between data storage costs and
possibilities to push down queries to the file itself [12]. The capability of predicate push downs
to Parquet files allows during the data preparation phase to only scan the relevant data for the
necessary transformation [13, 12].
Table 2
Example of our Spatio-Temporal Knowledge Graph for the University building of Mannheim, represented
in Way 240974013
         subject                  predicate                 object                   date
         240974013                rdf:type                  university               2024-01-01
         university               rdfs:subClassOf           amenity                  2024-01-01
         240974013                addr:city                 Mannheim                 2024-01-01
         240974013                geo:hasGeometry           geo240974013             2024-01-01
         geo240974013             geo:asWKT                 POLYGON (([...]))        2024-01-01
         881fae61b9fffff          geo:sfContains            240974013                2024-01-01
         881fae61b9fffff          geo:ehCovers              240974013                2024-01-01
         881fae61b9fffff          geo:sfIntersects          240974013                2024-01-01


5.1. Preparation of OpenStreetMap data
We use OSM as a data foundation for our STKG. Specifically, the .osm.pbf files are used, which
can be derived from Geofabrik1 . Each OSM file represents a specified time stamp for a specific
region. For the three different main OSM data structures, we need to convert the XML-based
structure into a table-based structure for the future processing of the data. For the Node data
structure, the coordinates are given, for all other data structures, the geometries need to be
explicitly constructed. In order to achieve that, we use a gdal-based method called ogr2ogr, which
allows us to natively read .osm.pbf files and convert them to other formats while constructing
the respective geometries for the different formats. In our case, we use ogr2ogr to convert the
.osm.pbf files to Parquet files to further process them for our STKG. For that, each.osm.pbf is
split into five different files, which address the different layers gdal assigns to OSM files.
   After the initial conversion of the .osm.pbf file, we use the five resulting files to optimize
them further for the future STKG construction. For that, we use Apache Sedona as the main
processing method for the STKG. In comparison to other spatial data frameworks, Apache
Sedona has shown a better speed and memory efficiency when using it in computational
extensive workloads [14, 15]. Therefore, we decided to use Apache Sedona as our processing
engine. With Apache Sedona, the possibility exists to write Parquet files that support predicate
pushdown for geometric operations. To support the spatial predicate pushdown, the files need
to contain, for each geometry, a geohash and then be written back to a Parquet file. From the file
name we further extract the specific date the OSM file represents and add the date as a column
to the written file. Based on the transformed geometry files, we build up our grid, where we
will use the h3 grid system. In subsection 5.2, we outline the approach to grid creation for our
STKG.

5.2. h3 Grid Data Preparation
Based on the requirements outlined in section 4, we generate the h3 grid, which is used to
align geometries on it. For the respective grid cells, we use a world map as an input to retrieve

1
    Geofabrik provides OSM data extracts in different formats under the following web page
the h3 grid cells using the h3 Python package [16]. Depending on the parameter, we use the
Nominatim API to create a bounding box and clip the individual geometries of the world map.
This allows to not only construct a world-wide h3 grid, but also region-specific DGG.
   For the h3 grid we allow to specify grid-specific parameters. This includes the definition of
the resolution for each individual grid cell. The values that could be used are in the range of
[0, 15] ⊆ ℕ, where 0 represents the resolution of the biggest cell area with an average size of
4,357,449.41 km2 . For the resolution of 15, an individual h3 grid cell has an average area size
of 0.895 m2 , which is the smallest grid cell area captured in the h3 grid. Additionally, the h3
grid system allows the compacting of all grid cells based on population density. This allows,
especially for sparsely populated regions, to represent the data in fewer grid cells while still
representing the same area. After receiving all the respective grid cell IDs, we retrieve for
each individual grid cell the respective geometry associated with the individual grid cell. After
retrieving for all specified regions all grid cells, we store the grid data in a Parquet file. Similar
to OSM, we use Apache Sedona to store the geohash per individual grid cell to optimize the
predicate push down to the Parquet file. After finishing the grid creation, we outline in the last
step of our data preparation step, the STKG construction.

5.3. Spatio-Temporal Knowledge Graph Construction
For the creation of our STKG we use Apache Sedona, which is an optimized data transformation
engine for spatial data analysis. Overall, we base our STKG creation on the defined ontology
we outlined in section 4. All aspects of the STKG construction are based on Apache Sedona
functionality and use it as a backbone to provide a salable transformation engine. Our STKG
creation is divided into four main parts: OSM data tags, OSM geometry triples, h3 grid cell triples,
h3 grid and OSM geometry relation.
   As outlined in section 4, we separate our OSM tags in two different categories. In the case
where it is defined as a commonly used tag, we expose the key and value of the OSM tags
to individual entities. For all tags that do not fall into the category, we expose the key as a
predicate and the value as the object. In addition, for all the respective we assign the date
from the respective OSM entity to our STKG, marking the respective date for the individual
quadruple.
   For the different parts, we follow the outlined approach from section 4 where we expose
for each geometry of an object in separate entities, storing them in the WKT format for our
STKG. The storage of the geometry in the WKT format instead of the WKB format allows users
that query the graph to be able to directly interpret the geometry of an object. Based on the
respective elements for the grid, we model the neighborhood relationships for the grid cells
with the same resolution. For grid cells that do not have the same resolution, the parent-child
relationship is modeled and determined as outlined in section 4. For the relationships between
the different OSM geometries and the grid cell geometries, we use the predicate functions from
Apache Sedona to determine the DE-9IM predicates. With regard to the constructed STKG we
outline in subsection 5.4 key statistics based on a selected set of osm.pbf files.
5.4. Characteristics of constructed Spatio-Temporal Knowledge Graph
For the creation of the STKG we provide a coding base for which we are able to scale spatial
data analysis on large scale data representations. As a data base we use the yearly geofabrik
data extracts from OSM, which involves the datasets from the year 2018 to 2024. We use the
.osm.pbf files from all continents which are provided on geofabrik. This involves the regions
Africa, Antarctica, Asia, Australia, Central America, Europe, North America and South America
[17]. For the OSM dataset in total this results in 529,065,633 distinct OSM elements. For the
h3 grid in total 3,675,984 individual grid cells are used in our STKG. In table 3 we outline the
different key statistics for our STKG.

Table 3
Overview of Spatio-Temporal Knowledge Graph statistics
   Metric name                                    Count
   Total triples                                  27,042,753,856
   Distinct entity count                          1,841,912,579
   Distinct predicate count                       98,955




6. Conclusion and Outlook
We have provided in our research a framework for the creation of STKG over the entire
planet based on OSM. Our data foundation builds up on a large representation of spatial
data, representing various types of data. Representing those aspects in a STKG allows users
to interact with changing spatial data over time. Similar to the outlined use cases of WorldKG
or KnowWhereGraph, our STKG allows the possibility to explore connected entities based on
shared metadata information or geographic relations.
   In comparison to the presented KGs in section 2, our approach provides a holistic representa-
tion of spatial data over the planet Earth. Compared to WorldKG, we provide with our STKG a
representation of all available geometries in OSM. Additionally, the usage of the DGG allows
consumers to use the constructed STKG for various downstream tasks where the spatial grid is
involved. Similar to KnowWhereGraph we allow consumers to make use of the individual DGG
cell in their downstream tasks. Compared to both presented KGs our STKG has with over 27
billion triples and over 1,8 billion entities the largest coverage of spatial entities.
   For our current implementation, we rely on third-party tooling provided by GDAL to trans-
form the OSM data format. While the module ogr2ogr shows a great efficiency in the trans-
formation process, the handling of multiple large .osm.pbf files provides a certain overhead in
the transformation phase. In a future implementation, the native support of .osm.pbf files in
frameworks like Apache Sedona might help to reduce current initial processing times that are
outside the data preparation related to the STKG construction. While OSM provides a rich set
of spatial data, it needs to be emphasized that OSM data does not reflect the exact spatial reality
and is also subject to vandalism.
   With our approach, we have showcased that for the complete planet, all relevant geometries
from OSM and metadata. However, our current approach poses some limitations when it comes
to traditional KG specific standards. We currently produce as an output file format delta files for
our STKG. It provides for our large resulting STKG a good trade-off between the compression
size of our data and the fulfillment of the ACID transaction consistency. However, KG specific
frameworks like (Geo)SPARQL or are not directly supported in our file structure out of the box.
There has been research around the mapping of SPARK SQL to SPARQL [18, 19] or GeoSPARQL
[20] to support efficient queries on large KGs. For future research, this could be evaluated
compared to traditional KG frameworks that STKG. In addition, in future research a comparison
of the different STKGs on downstream spatial benchmark datasets could be performed to
compare the different STKG.


Acknowledgments
Map data copyrighted OpenStreetMap contributors and available from https://www.
openstreetmap.org.


References
 [1] OpenStreetMap contributors, OpenStreetMap, 2024. URL: https://www.openstreetmap.
     org/about.
 [2] M. Barthélemy, Spatial networks, Physics Reports 499 (2011) 1–101. doi:10.1016/j.
     physrep.2010.11.002 .
 [3] R. G. Morris, M. Barthelemy, Transport on Coupled Spatial Networks, Physical Review
     Letters 109 (2012) 128703. doi:10.1103/PhysRevLett.109.128703 .
 [4] A. Dsouza, N. Tempelmeier, R. Yu, S. Gottschalk, E. Demidova, WorldKG: A World-Scale
     Geographic Knowledge Graph, in: Proceedings of the 30th ACM International Conference
     on Information & Knowledge Management, ACM, Virtual Event Queensland Australia,
     2021, pp. 4475–4484. doi:10.1145/3459637.3482023 .
 [5] K. Janowicz, P. Hitzler, W. Li, D. Rehberger, M. Schildhauer, R. Zhu, C. Shimizu, C. K. Fisher,
     L. Cai, G. Mai, J. Zalewski, L. Zhou, S. Stephen, S. Gonzalez, B. Mecum, A. Lopez‐Carr,
     A. Schroeder, D. Smith, D. Wright, S. Wang, Y. Tian, Z. Liu, M. Shi, A. D’Onofrio, Z. Gu,
     K. Currier, Know, Know Where, KnowWhereGraph: A densely connected, cross‐domain
     knowledge graph and geo‐enrichment service stack for applications in environmental
     intelligence, AI Magazine 43 (2022) 30–39. doi:10.1002/aaai.12043 .
 [6] OpenStreetMap contributors, Map features - OpenStreetMap Wiki, 2023. URL: https:
     //wiki.openstreetmap.org/wiki/Map_features.
 [7] E. Romanschek, C. Clemen, W. Huhnt, A Novel Robust Approach for Computing DE-9IM
     Matrices Based on Space Partition and Integer Coordinates, ISPRS International Journal of
     Geo-Information 10 (2021) 715. doi:10.3390/ijgi10110715 .
 [8] E. Clementini, J. Sharma, M. J. Egenhofer, Modelling topological spatial relations: Strate-
     gies for query processing, Computers & Graphics 18 (1994) 815–822. doi:10.1016/
     0097- 8493(94)90007- 8 .
 [9] T. Feng, Z. Zeng, X. Wu, R. Liu, L. Gao, Discovery of Multi-Level Spatial Association Rules
     Based on DE-9IM, in: 2010 International Conference on Management and Service Science,
     IEEE, Wuhan, China, 2010, pp. 1–5. doi:10.1109/ICMSS.2010.5577834 .
[10] M. Li, E. Stefanakis, Geospatial Operations of Discrete Global Grid Systems—a Comparison
     with Traditional GIS, Journal of Geovisualization and Spatial Analysis 4 (2020) 26. doi:10.
     1007/s41651- 020- 00066- 3 .
[11] C. Shimizu, R. Zhu, G. Mai, C. Fisher, L. Cai, M. Schildhauer, K. Janowicz, P. Hitzler, L. Zhou,
     S. Stephen, A Pattern for Features on a Hierarchical Spatial Grid, in: Proceedings of the
     10th International Joint Conference on Knowledge Graphs, ACM, Virtual Event Thailand,
     2021, pp. 108–114. doi:10.1145/3502223.3502236 .
[12] J. Hu, Y. Wang, F. Shi, C. Xu, Compared Analysis of Row-Based Storage and Column-Based
     Storage, in: 2018 Eighth International Conference on Instrumentation & Measurement,
     Computer, Communication and Control (IMCCC), IEEE, Harbin, China, 2018, pp. 168–173.
     doi:10.1109/IMCCC.2018.00043 .
[13] Z. Luo, L. Niu, V. Korukanti, Y. Sun, M. Basmanova, Y. He, B. Wang, D. Agrawal, H. Luo,
     C. Tang, A. Singh, Y. Li, P. Du, G. Baliga, M. Fu, From Batch Processing to Real Time
     Analytics: Running Presto® at Scale, in: 2022 IEEE 38th International Conference on Data
     Engineering (ICDE), IEEE, Kuala Lumpur, Malaysia, 2022, pp. 1598–1609. doi:10.1109/
     ICDE53745.2022.00165 .
[14] R. Y. Tahboub, T. Rompf, Architecting a Query Compiler for Spatial Workloads, in:
     Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data,
     ACM, Portland OR USA, 2020, pp. 2103–2118. doi:10.1145/3318464.3389701 .
[15] V. Pandey, A. Kipf, T. Neumann, A. Kemper, How good are modern spatial analytics
     systems?, Proceedings of the VLDB Endowment 11 (2018) 1661–1673. doi:10.14778/
     3236187.3236213 .
[16] Uber Technologies, h3-py: Uber’s H3 Hexagonal Hierarchical Geospatial Indexing System
     in Python, 2023. URL: https://uber.github.io/h3-py.
[17] M. Boeckling, Geofabrik data links, 2024. doi:10.5281/ZENODO.10798949 .
[18] D. Graux, L. Jachiet, P. Genevès, N. Layaïda, SPARQLGX: Efficient Distributed Evaluation
     of SPARQL with Apache Spark, in: P. Groth, E. Simperl, A. Gray, M. Sabou, M. Krötzsch,
     F. Lecue, F. Flöck, Y. Gil (Eds.), The Semantic Web – ISWC 2016, volume 9982, Springer
     International Publishing, Cham, 2016, pp. 80–87. doi:10.1007/978- 3- 319- 46547- 0_9 ,
     series Title: Lecture Notes in Computer Science.
[19] A. Schätzle, M. Przyjaciel-Zablocki, S. Skilevic, G. Lausen, S2RDF: RDF Querying with
     SPARQL on Spark (2015). doi:10.48550/ARXIV.1512.07021 .
[20] D. Bilidas, T. Ioannidis, N. Mamoulis, M. Koubarakis, Strabo 2: Distributed Management
     of Massive Geospatial RDF Datasets, in: U. Sattler, A. Hogan, M. Keet, V. Presutti, J. P. A.
     Almeida, H. Takeda, P. Monnin, G. Pirrò, C. d’Amato (Eds.), The Semantic Web – ISWC
     2022, volume 13489, Springer International Publishing, Cham, 2022, pp. 411–427. doi:10.
     1007/978- 3- 031- 19433- 7_24 , series Title: Lecture Notes in Computer Science.



A. Online Resources
The documentation as also the coding for our research paper can be found on GitHub