<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Leveraging Link Prediction for Geospatial Data Integration</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Albulen Pano</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mattia Fumagalli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Davide Lanti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Calvanese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano, Faculty of Engineering</institution>
          ,
          <addr-line>39100 Bolzano</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Link prediction is a technique used to predict new relationships between entities in a given graph. It is applied in several domains, ranging from friendship-link suggestions on social media to the prediction of correlated products. Nevertheless, the use of link prediction to support knowledge integration is still a subject of debate, especially in the context of geospatial data. In this paper, we discuss the role and some of the potential benefits of link prediction for geospatial data completion and integration. To this end, we position geospatial link prediction within the framework of Ontology-Based Data Access (OBDA) and highlight its potential contribution to this field. Additionally, we present a series of preliminary experiments designed to predict relationships of “competition” among business activities within a specific geographic area. Finally, we explore how injecting knowledge from information-rich schemas about concepts related to the geospatial domain can positively influence the accuracy of the prediction model.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge graphs</kwd>
        <kwd>Link prediction</kwd>
        <kwd>Geospatial data</kwd>
        <kwd>Knowledge completion</kwd>
        <kwd>Knowledge integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Semantic geospatial applications, such as geographic search engines, heavily rely on knowledge graphs
(KGs). These can be the output of the integration of several autonomous and independent data sources.
For instance, the knowledge graph about a specific geographic area may result from the integration of
data related to the buildings of that area, streets and even green areas. Geospatial knowledge graphs are
commonly designed to support map navigation, enhance the visualization of geographic regions, and
provide insights into spatial relationships. Additionally, they can be enriched with semantic information
beyond purely spatial attributes. For instance, they may capture relations between locations frequently
co-visited by tourists, highlight maximum occupancy rates of specific buildings, or represent other
contextual associations that do not strictly depend on spatial properties.</p>
      <p>A significant challenge for geospatial knowledge graphs is their extraction from diverse sources. These
are often available only as unstructured (textual) or semi-structured data (e.g., JSON, XML). Moreover,
the extracted graphs are often incomplete and have limited interoperability with other knowledge graphs
containing related information. Addressing these limitations requires techniques for KG enrichment to
enhance the density and depth of knowledge representations, thereby improving completeness and
even enabling the derivation of new knowledge.</p>
      <p>
        KG enrichment approaches can be broadly categorized into two main classes: (1) logic-based reasoning
and (2) machine learning (ML)-based KG completion. The logic-based approach employs automated
reasoning techniques over ontological axioms, as outlined by Baader et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], or rule-based systems,
such as those described by Horrocks et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], to infer new triples that are not explicitly asserted in
the knowledge graph. For geospatial data, the GeoSPARQL standard facilitates spatial reasoning by
enabling the inference of geospatial relationships [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. While logic-based approaches can yield highly
accurate results, they are contingent on the availability of high-quality input data and explicitly defined
ontological axioms or rules.
      </p>
      <p>
        In contrast, ML-based approaches offer greater flexibility in handling incomplete or noisy data,
especially when axioms and rules are missing or difficult to formulate. These methods first transform the
KG into a high-dimensional vector space using embedding techniques, after which link prediction
algorithms rank candidate missing links based on learned patterns. However, most existing KG embedding
models overlook spatial characteristics, leading to suboptimal performance in geospatial applications.
In response to this limitation, Mai et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] introduced SE-KGE, a location-aware KG embedding model
that explicitly incorporates spatial information, such as point coordinates and bounding boxes, directly
into the KG embedding space, thereby enhancing geospatial inference capabilities.
      </p>
      <p>In this paper, building upon a pipeline for geospatial data integration proposed in a previous study,
we aim to discuss the introduction of a new module that leverages the link prediction technique. We
assume that this new module can serve as a support for the existing data integration components within
the current pipeline. Furthermore, the link prediction module itself can benefit from the knowledge
graph generated through the pipeline. As this is a discussion paper, the issues addressed herein are
primarily introductory. However, to make the discussion more concrete, in addition to presenting the
extended version of the integration pipeline incorporating the link prediction module, we also describe
several tests conducted on an existing dataset, reusing established strategies for applying link prediction
in the context of geospatial data.</p>
      <p>The structure of the paper is as follows. In Section 2 we introduce related work,
focusing in particular on similar approaches that leverage link prediction in the context
of geospatial data. In Section 3 we briefly discuss the overall geospatial data integration approach
into which we want to plug a new link prediction component, and we describe the role of the link
prediction module itself. In Section 4 we discuss some preliminary experiments that explore the feasibility
and the potential utility of our proposal. Section 5 concludes the paper and outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. Geospatial Link Prediction</title>
        <p>
          Link prediction is a technique used to predict relationships between nodes in a graph [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Typically, in
its simplest setup, this technique involves transforming a graph into an adjacency matrix. This matrix
is then used to train a predictive algorithm, which can later be employed to predict relationships in
a new graph provided as test input (also encoded as an adjacency matrix). In this context, most link
prediction algorithms are designed to address ranking problems by assigning a score proportional to
the likelihood of a relationship between two nodes. A threshold is set by the algorithm or the user, and
node pairs with scores exceeding this threshold are considered positive predictions. In this sense, link
prediction can be framed as a binary classification problem.
        </p>
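        <p>The ranking-then-thresholding view described above can be illustrated with a minimal Python sketch. The node names, the scores, and the threshold of 0.5 are hypothetical values chosen for illustration only; they are not taken from our experiments, and the score function of a real model would replace the hard-coded dictionary.</p>
        <preformat><![CDATA[
```python
# Illustrative sketch: candidate pairs and scores are made up.
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def rank_pairs(scores: Dict[Pair, float]) -> List[Tuple[Pair, float]]:
    """Order candidate node pairs from most to least likely link."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def classify(scores: Dict[Pair, float], threshold: float) -> Dict[Pair, bool]:
    """Binary view: a pair is a positive prediction if its score exceeds the threshold."""
    return {pair: score > threshold for pair, score in scores.items()}


# Hypothetical scores produced by some link prediction model.
candidate_scores = {
    ("bar_a", "bar_b"): 0.91,
    ("bar_a", "hotel_c"): 0.12,
    ("bakery_d", "bakery_e"): 0.58,
}

ranking = rank_pairs(candidate_scores)
predictions = classify(candidate_scores, threshold=0.5)

print(ranking)
print(predictions)
```
]]></preformat>
        <p>Lowering the threshold trades precision for recall, which is one reason why ranking metrics are often preferred over a single classification cut-off.</p>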
        <p>
          This technique is widely used across various domains for specific tasks, such as suggesting friendship
links in social networks, identifying hyperlinks between websites, or recommending related products
based on browsing profiles in e-commerce. In recent years, the application of link prediction to geospatial
data has gained traction, yielding significant results. For instance, Liu et al. [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] employed link prediction
to suggest optimal geographic locations for opening commercial establishments within urban areas.
Another example is provided by Mann et al. [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], where various supervised and unsupervised machine
learning adaptations of this technique were tested in the context of sparsely interlinked geospatial
knowledge graphs.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Geospatial Knowledge Graphs</title>
        <p>
          Geospatial knowledge graphs are KGs containing geospatial objects, geometries, and their
relationships. GeoSPARQL is an Open Geospatial Consortium (OGC) standard for representing and querying
geospatial KGs [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. The GeoSPARQL ontology introduces classes such as features, geometries, and their
representations using the Geography Markup Language (GML)<sup>1</sup> and Well-Known Text (WKT)<sup>2</sup> literals. It
also includes vocabularies for topological relationships. Additionally, GeoSPARQL extends the standard
SPARQL query language with topological functions for quantitative reasoning.
        </p>
        <p>
          Geospatial KGs are often converted from geospatial data sources stored in spatial databases or other
popular formats such as Shapefiles. The ontology-based data access (OBDA) paradigm provides a
systematic approach to this conversion, by allowing end-users to access data sources through a domain
ontology. Typically, the domain ontology imports the GeoSPARQL ontology and is semantically linked
to the data sources via a mapping expressed in the R2RML language [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], standardized by the W3C.
        </p>
        <p>
          One of the most well-known geospatial KG projects is LinkedGeoData [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], which primarily follows
the OBDA approach to expose data from OpenStreetMap (OSM)<sup>3</sup> as geospatial KGs.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>
        Our proposal builds upon a geospatial data integration pipeline described in prior work [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
Figure 1, using the Business Process Model and Notation (BPMN),<sup>4</sup> presents a simplified version of this pipeline,
incorporating a new set of tasks that represent the novel aspect we aim to discuss. The depicted
pipeline primarily serves two purposes. First, it demonstrates how a KG can be generated to support
query-answering services on geospatial data. Second, it illustrates how the KG can be evolved by
incorporating new data and knowledge, thereby extending query-answering services. Concerning this
second point, we aim to discuss how tasks related to link prediction can play a useful role.
        <fn><label>1</label><p>https://www.ogc.org/publications/standard/gml/</p></fn>
        <fn><label>2</label><p>https://libgeos.org/specifications/wkt/</p></fn>
        <fn><label>3</label><p>https://www.openstreetmap.org/</p></fn>
        <fn><label>4</label><p>Note that BPMN is a conceptual modeling language adopted to represent tasks and procedures within a system. For more information about BPMN, we refer the reader to [<xref ref-type="bibr" rid="ref12">12</xref>].</p></fn>
      </p>
      <p>As shown in the figure, the pipeline consists of five main phases: (i) Initialization, (ii) KG Construction,
(iii) Integration, (iv) Link Prediction, and (v) Application. Each of these phases comprises tasks or steps
that can receive and/or produce different types of data. The group labeled as ‘KG’ represents the
resource that supports query-answering activities, which are described by the tasks in the application
phase.</p>
      <p>
        In this context, we do not delve into the details of what constitutes the KG. It suffices to note that it can
be either a Virtual Knowledge Graph (VKG) or a Materialized Knowledge Graph (MKG). The VKG consists
of two subcomponents: an ontology function and a mapping function, which are used to generate RDF
triples on demand from physical storage. In contrast, representing the KG as an MKG eliminates the
need for a virtualized pipeline for RDF [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] data, at the cost of increased storage requirements and the
need to rematerialize RDF triples whenever the source data change. All these components can then
evolve through the steps in the integration phase. For more information, we refer readers to [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
      <p>Returning to the phase descriptions, Initialization primarily aims to generate an integrated physical
data storage from input data (e.g., CityGML<sup>5</sup> data) while also providing an initial ontology to represent
the data (e.g., the CityGML ontology<sup>6</sup>), as a base version of a knowledge graph. To create the target data
repository, the input dataset is incorporated into a relational database (see Generate SQL in Figure 1),
mapping each data row to entities and columns to specific information fields. This ensures that the data
can be queried, retrieved, stored, and updated. Although various automated and complete solutions
are available for this phase, some custom ad hoc refinements may be necessary. This is because the
physical storage of this phase’s output must align with the technology used to generate the knowledge
graph in the subsequent phase.</p>
      <p>The second phase, KG Construction, aims to generate a KG that serves as a reference point for
query-answering activities. This phase can be iterated multiple times. The first iteration occurs after
the initialization phase, using as input the ontology selected in the initialization phase, the Integrated
Physical Storage that houses the selected data, and the mapping required to link the two. In subsequent
iterations, the input to this phase comes from the integration phase’s output. In both cases, the KG
construction phase primarily deals with defining the ontology and its corresponding mappings, with
three key objectives: (i) defining the set of concepts, relationships, and properties within the reference
knowledge domain; (ii) capturing the semantics of stored information to enable enhanced reasoning
and inference capabilities; and (iii) fostering interoperability among integrated data sources.</p>
      <p>A crucial aspect is identifying an existing ontology that best captures the semantics of the selected
data. Once the ontology is selected, a mapping phase follows, aligning the database generated in the
initialization phase with the ontology concepts. If the information cannot be straightforwardly mapped,
manual intervention is required, typically involving modifications to the selected ontology to adequately
incorporate the information stored in the physical repository.</p>
      <p>After completing the initialization and KG construction phases, a KG is ready to support
query-answering activities. However, our approach also enables the integration of additional data sources.
This is where the Integration phase comes into play, allowing for the creation of an extended KG beyond its
initial version. As described in prior work, this evolution is addressed in the integration phase, where
the key tasks are: (i) user selection of new data sources, (ii) integration of the new data into the existing
physical storage, and (iii) user selection of a new ontology or additional ontological information to
account for the integrated data.</p>
      <p>The new Link Prediction phase is designed to support the integration phase by training a predictive
model capable of inferring new links within the KG. A crucial step in this process is data preparation,
which involves key sub-steps such as database generation, area discretization, the creation of new
relations, KG construction, and ontology tuning.<sup>7</sup> The model can then be trained and deployed using a
partially extended version of the KG, incorporating newly inferred relationships (as demonstrated in
the experimental example in the next section). Alternatively, training can be conducted on new graphs.
Training can also be extended by leveraging embeddings to enhance predictive performance. The type
of input graph depends on the adopted prediction model, which may support either link prediction
alone or both link and node prediction. It should be noted that, at present, how to integrate newly
predicted information remains an open question. One possible approach is to allow users to decide
whether to incorporate the predicted information after reviewing the output.
        <fn><label>5</label><p>https://www.ogc.org/publications/standard/citygml/</p></fn>
        <fn><label>6</label><p>https://smartcity.linkeddata.es/ontologies/cui.unige.chcitygml2.0.html</p></fn>
        <fn><label>7</label><p>Note that some of these sub-steps can also be handled in the data integration phase. Due to lack of space, we do not delve into this aspect here.</p></fn>
      </p>
      <p>Finally, the last phase of the entire process consists of the tasks grouped under Application, which is
dedicated to using the output KG. In the existing solution, users are enabled, via a dedicated ad hoc
interface, to query the KG using SPARQL queries, with results presented in textual or visual formats.
With the introduction of the link prediction phase, users may also request information about newly
predicted links within the KG.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Preliminary Experiment</title>
      <p>We present below the preliminary experiment we conducted to explore the usefulness of link prediction
within the pipeline described in Section 3. Specifically, the research questions we want to
address are:
        <list list-type="bullet">
          <list-item><p>RQ1. How can link prediction be leveraged to complete a target KG?</p></list-item>
          <list-item><p>RQ2. How can the results of the link prediction model be made more reliable?</p></list-item>
        </list>
      </p>
      <p>All files and processes used in the experiment are available in a GitHub repository.<sup>8</sup></p>
      <sec>
        <title>4.1. Setup</title>
        <p>The experiment was conducted on a high-performance computing (HPC) cluster equipped with NVIDIA
A100-SXM4-80GB GPUs. A standard computer is insufficient due to the high computational cost of the
link prediction task, especially in the context of geospatial data, where graphs are rich in information.
The library used to run the models was PyKEEN,<sup>9</sup> a Python package for running easily reproducible
knowledge graph embedding (KGE) models. PyKEEN (v. 1.11.0) was selected for its extensive library of
KGE models and its simple code execution pipeline.</p>
      </sec>
      <sec id="sec-4-1">
        <title>4.2. Resources</title>
        <p>For this preliminary discussion, we limited the experiment to a single geospatial data source,
OpenStreetMap (OSM), an open and free map database built from volunteered geographic information. We defer
to future work the discussion of iterating the integration phase with additional data sources
and ontologies, as outlined in Section 3. We selected as
the area of interest the city of Bolzano-Bozen (Italy) and its immediate surroundings by defining a
bounding box with limits 11.3°E to 11.4°E longitude and 45.52°N to 46.45°N latitude. We retrieved an
OSM dataset from Geofabrik,<sup>10</sup> which provides daily dumps of OSM data, and filtered it based on
Bolzano’s coordinates. OSM relies on geographic information contributed by volunteers, resulting in a
diverse array of key-value pairs (over 13,000<sup>11</sup> in Italy alone). While numerous OSM ontologies exist,
we chose the widely recognized LinkedGeoData ontology, which encompasses more than 150 classes
and helps to structure the knowledge base more effectively.</p>
        <p>Although the LinkedGeoData ontology serves as a starting point for our analysis, it suffers from
several pitfalls due to its relatively flat structure. It contains very few object properties, making it difficult
to discriminate between specific subclasses; e.g., both lgdo:Restaurant and lgdo:Bakery are subclasses
of lgdo:Amenity and do not have any idiosyncratic properties to differentiate them. Therefore, for
our link prediction task we chose an additional well-known vocabulary, Schema.org (version 28.1),<sup>12</sup> to enrich
the graph used for training the link prediction model. It is
important to note that the number of classes modelled decreases from 115 to 52 when moving from
LinkedGeoData to Schema.org, as the latter is less expressive in its taxonomy for our use case. For
instance, while Schema.org uses a single combined class BarOrPub, LinkedGeoData distinguishes
between two separate classes Bar and Pub. However, this reduction in class granularity comes with
a trade-off, as Schema.org offers greater richness in properties. An example of this is the hasMenu
property, which Schema.org applies to various food establishments but not bakeries.
          <fn><label>8</label><p>https://github.com/D2G2Project/KGLinkPrediction</p></fn>
          <fn><label>9</label><p>https://pykeen.readthedocs.io/en/stable/</p></fn>
          <fn><label>10</label><p>https://www.geofabrik.de/geofabrik/</p></fn>
          <fn><label>11</label><p>https://taginfo.geofabrik.de/europe:italy/keys</p></fn>
        </p>
      </sec>
      <sec>
        <title>4.3. Tests</title>
        <p>We structured the experiment into two main tests, both following the same two-step process: data
preparation and model deployment. In the second test, we introduced an additional step, knowledge
injection, which can be considered a sub-step of data preparation.</p>
        <p>
          In the first test, we focused on data preparation and model deployment without incorporating knowledge
injection, primarily addressing RQ1. In contrast, the second test included knowledge injection, with a
primary focus on RQ2. For both tests, we applied the same parameters and evaluation metrics.
        </p>
      </sec>
      <sec>
        <title>4.3.1. Data preparation</title>
        <p>
          Database generation. As a first step, to load OSM data for Bolzano we relied on a PostgreSQL database,
version 17.3, and its geospatial extension PostGIS, version 3.5, which is needed to
handle geometry data. A total of 532,794 entities were retrieved. Two sample rows and the first five
columns of the entities table are provided in Table 1.
        </p>
        <p>
          Area discretization. Secondly, following the approach adopted in the UrbanKG work [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], where the cities
analyzed were divided into business areas based in part on roads, we added a geographic discretization step
that divides the area covered by our dataset into finer sub-areas of interest. For that purpose, we used the
OSM-defined primary, secondary, and tertiary highways to segment the city of Bolzano into sub-areas. We then
added to each record in the OSM relational table a link to the sub-area it belongs to. This procedure is
mainly intended to improve the connection paths in the knowledge graph between geographically close entities.
        </p>
        <p>
          Competitive relation. With this step, we introduced a new relation, the competitive relation, which we selected
as the target link for prediction (note that this step can also be addressed in the Integration phase of the pipeline).
For the purpose of this experiment, the competitive relation is initially defined to hold between
any two entities belonging to the same OpenStreetMap (OSM) category, provided that the distance
between them does not exceed 500 meters.
        </p>
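        <p>The definition of the competitive relation can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used in the experiment: the Poi record, the sample identifiers and coordinates, and the use of the haversine great-circle distance with a mean Earth radius are all choices made here for the example.</p>
        <preformat><![CDATA[
```python
import math
from itertools import combinations
from typing import List, NamedTuple, Sequence, Tuple

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (approximation)


class Poi(NamedTuple):
    """Hypothetical point-of-interest record; field names are illustrative."""
    osm_id: str
    category: str
    lat: float  # degrees
    lon: float  # degrees


def haversine_m(a: Poi, b: Poi) -> float:
    """Great-circle distance between two POIs in meters."""
    phi1, phi2 = math.radians(a.lat), math.radians(b.lat)
    dphi = phi2 - phi1
    dlmb = math.radians(b.lon - a.lon)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def competitive_pairs(pois: Sequence[Poi], max_distance_m: float = 500.0) -> List[Tuple[str, str]]:
    """Pairs of POIs in the same category that lie within max_distance_m of each other."""
    return [
        (a.osm_id, b.osm_id)
        for a, b in combinations(pois, 2)
        if a.category == b.category and haversine_m(a, b) <= max_distance_m
    ]


# Made-up sample data located roughly in the Bolzano area.
pois = [
    Poi("n1", "Restaurant", 46.4983, 11.3548),
    Poi("n2", "Restaurant", 46.4990, 11.3560),
    Poi("n3", "Bakery", 46.4985, 11.3550),
    Poi("n4", "Restaurant", 46.5200, 11.3548),
]

pairs = competitive_pairs(pois)
print(pairs)
```
]]></preformat>
        <p>The pairwise loop is quadratic in the number of entities; for a dataset of the size considered here, a spatial index or a distance predicate evaluated inside the spatial database would likely be preferable.</p>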
        <p>
          KG construction. Thirdly, we constructed the KG utilizing Ontop v.5.1.1 [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ],<sup>13</sup> an open-source platform that
provides support for querying over relational databases using Semantic Web technologies, specifically the RDF
data model, SPARQL [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] query language, OWL 2 QL [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] ontology, and R2RML mapping language. To create the
KG in Ontop, we designed mappings between the PostgreSQL database and the ontology. A mapping consists of
three components: (1) a mapping identifier, (2) a relational source, and (3) a target. The mapping identifier is any unique
identifier. The source is an SQL query expressed over a relational database to retrieve data. The target is
one or more RDF triple patterns that use the answer variables of the preceding SQL query as placeholders. A sample
mapping for rdf:type is presented below:
          <fn><label>12</label><p>https://github.com/schemaorg/schemaorg/tree/main/data/releases/28.1</p></fn>
          <fn><label>13</label><p>https://ontop-vkg.org/</p></fn>
        </p>
        <preformat>mappingId  OSM classes
target     lgd:entity/{osm_id} a lgdo:{class_name} .
source     SELECT "osm_id", "class_name"
           FROM public.entities LEFT JOIN public.classes ON entities.class = classes.class_id::TEXT</preformat>
        <p>For these experiments, we limited the mappings to rdf:type, :competitive, :locateAt, and
:borderBy. Based on the geographic discretization described previously, we generated the relation :locateAt
for OSM points of interest (POIs) located within a sub-area and :borderBy for sub-areas that border one another
(i.e., share an edge). The :competitive relation is discussed separately.</p>
        <p>Ontology tuning. The choice to add a business-oriented relation such as “competitive” to our graph made
it necessary to further filter the LinkedGeoData classes used in the ontology. Specifically, we retained only classes
for which a relation in the context of commercial operations is meaningful. We therefore preserved classes like Bar,
Restaurant, and Hotel but removed classes like Museum and Community Center, leaving 115 classes in total. The filter
was also applied to the corresponding classes in Schema.org.</p>
        <p>
          With the ontology and mappings specified, we leveraged Ontop to generate a knowledge graph
(KG) based on the relational OSM data. Finally, the virtual knowledge graph (VKG) can be transformed into a
materialized knowledge graph (MKG) in RDF format.
        </p>
      </sec>
      <sec>
        <title>4.3.2. Model Deployment</title>
        <p>
          We implemented Knowledge Graph Embedding (KGE) models to define the embeddings and learn new links. For this
experiment, we focused on two transductive KGE models, TransE [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and TransR [20], but the experiments can also
be applied to inductive models. In addition to random initialization, we also leveraged pre-trained Space2Vec [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]
geospatial embeddings as a starting point. We adopted this option to assess the role of the embeddings in
performance improvement. Spatial awareness, i.e., ensuring that entities that are geographically closer
also have closer embeddings in the n-dimensional space, can help the model achieve better convergence and
performance and, perhaps, also preserve spatial relationships.
        </p>
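        <p>To make the notion of an embedding-based plausibility score concrete, the following sketch shows the translational idea behind TransE, in which a triple (h, r, t) is considered plausible when the vector h + r lies close to t. The two-dimensional vectors are hand-picked toy values and the L2 norm is an assumption of this example; in practice the vectors are learned during training, and library implementations such as PyKEEN differ in details.</p>
        <preformat><![CDATA[
```python
import math
from typing import Sequence


def transe_score(head: Sequence[float], relation: Sequence[float], tail: Sequence[float]) -> float:
    """Negative L2 distance between (head + relation) and tail; higher means more plausible."""
    return -math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


# Toy embeddings chosen by hand for illustration; real embeddings are learned.
bar_a = [0.0, 0.0]
bar_b = [1.0, 0.0]
museum = [5.0, 5.0]
competitive = [1.0, 0.0]  # relation vector

plausible = transe_score(bar_a, competitive, bar_b)
implausible = transe_score(bar_a, competitive, museum)

print(plausible, implausible)
```
]]></preformat>
        <p>Under this scoring scheme, initializing entity vectors from spatially aware embeddings places geographically close entities near each other before training starts, which is the intuition behind using pre-trained geospatial embeddings.</p>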
        <p>Finally, no text embeddings were used in the experiment. Bolzano-Bozen is a unique case due to its
bilingual nature, and the issue of text embeddings for the names of the establishments needs to be analyzed in
detail separately. This is an issue that can be revisited in the future.</p>
      </sec>
      <sec>
        <title>4.3.3. Knowledge injection</title>
        <p>This step was addressed in the second test of the experiment, where we primarily focused on answering RQ2. In
this phase, triples containing object property information from the selected ontologies were added to the training
dataset. As previously discussed, due to the simpler structure of the LinkedGeoData ontology, which comprises only
classes and data properties, no additional training data could be generated from it. For Schema.org, however, we were
able to generate additional data to enrich the training phase of our pipeline.</p>
        <p>Specifically, we imported new information from Schema.org concerning the classes of entities already present
in the KG. This involved incorporating properties such as &lt;schema:hasMenu&gt; and associating them with
existing classes like &lt;schema:FoodEstablishment&gt;. The goal was to enhance the discriminative
information within the KG. For example, both &lt;schema:Restaurant&gt; and &lt;schema:Bakery&gt; are subclasses of
&lt;schema:FoodEstablishment&gt;, but the &lt;schema:hasMenu&gt; property is characteristic only of the former,
providing additional distinguishing information.</p>
      </sec>
      <sec>
        <title>4.3.4. Parameters and Metrics</title>
        <p>In both tests, we split the input data into training, test, and validation sets using an 8/1/1 ratio. The embedding
dimensionality was set to 64, and we employed an early stopping strategy during training, meaning that training
would halt if the loss did not decrease for 10 consecutive iterations. However, the total number of epochs was
capped at 20.</p>
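        <p>The 8/1/1 split mentioned above can be sketched as follows. This is only an illustration of the ratio: the triples, the fixed random seed, and the split_triples helper are hypothetical and written for this example, and the experiment may well have relied on the splitting utilities of the KGE library instead.</p>
        <preformat><![CDATA[
```python
import random
from typing import List, Sequence, Tuple

Triple = Tuple[str, str, str]


def split_triples(
    triples: Sequence[Triple],
    ratios: Tuple[float, float, float] = (0.8, 0.1, 0.1),
    seed: int = 42,
) -> Tuple[List[Triple], List[Triple], List[Triple]]:
    """Shuffle triples and split them into training, test, and validation sets."""
    shuffled = list(triples)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_train = round(len(shuffled) * ratios[0])
    n_test = round(len(shuffled) * ratios[1])
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    valid = shuffled[n_train + n_test:]  # remainder goes to validation
    return train, test, valid


# Hypothetical triples using the relation name from the mappings.
triples = [(f"lgd:entity/{i}", ":competitive", f"lgd:entity/{i + 1}") for i in range(100)]
train, test, valid = split_triples(triples)

print(len(train), len(test), len(valid))
```
]]></preformat>
        <p>Note that a purely random split may leave some entities without any training triple; transductive models cannot score such entities meaningfully, so library-provided splitters that guard against this are usually the safer choice.</p>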
        <p>For our analysis, we selected the evaluation metrics Hits@10 and Mean Reciprocal Rank (MRR) for the link
prediction task, both of which are commonly used in knowledge graph completion tasks. Hits@10 measures the
percentage of cases where the correct entity appears within the top 10 ranked results. MRR, on the other hand,
computes the average of the reciprocal ranks of the correct answers, assigning greater weight to higher-ranked
predictions. Since link prediction is more effectively evaluated through ranking rather than pure classification,
using these two metrics in conjunction provides a more comprehensive assessment.
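        </p>
        <table-wrap>
          <table>
            <thead>
              <tr><th>Model</th><th>Embedding initialization</th><th>Injected ontology</th></tr>
            </thead>
            <tbody>
              <tr><td>TransE</td><td>Random</td><td>None</td></tr>
              <tr><td>TransE</td><td>Random</td><td>Schema.org</td></tr>
              <tr><td>TransE</td><td>Random</td><td>LinkedGeoData</td></tr>
              <tr><td>TransE</td><td>Space2Vec</td><td>None</td></tr>
              <tr><td>TransE</td><td>Space2Vec</td><td>Schema.org</td></tr>
              <tr><td>TransE</td><td>Space2Vec</td><td>LinkedGeoData</td></tr>
              <tr><td>TransR</td><td>Random</td><td>None</td></tr>
              <tr><td>TransR</td><td>Random</td><td>Schema.org</td></tr>
              <tr><td>TransR</td><td>Random</td><td>LinkedGeoData</td></tr>
              <tr><td>TransR</td><td>Space2Vec</td><td>None</td></tr>
              <tr><td>TransR</td><td>Space2Vec</td><td>Schema.org</td></tr>
              <tr><td>TransR</td><td>Space2Vec</td><td>LinkedGeoData</td></tr>
              <tr><td>RotatE</td><td>Random</td><td>None</td></tr>
              <tr><td>RotatE</td><td>Random</td><td>Schema.org</td></tr>
              <tr><td>RotatE</td><td>Random</td><td>LinkedGeoData</td></tr>
              <tr><td>RotatE</td><td>Space2Vec</td><td>None</td></tr>
              <tr><td>RotatE</td><td>Space2Vec</td><td>Schema.org</td></tr>
              <tr><td>RotatE</td><td>Space2Vec</td><td>LinkedGeoData</td></tr>
            </tbody>
          </table>
        </table-wrap>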
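        <p>The two evaluation metrics, Hits@10 and MRR, can be computed from the rank that the model assigns to each correct entity. The sketch below is a minimal illustration; the list of ranks is made up for the example and does not correspond to any result of our experiment.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def hits_at_k(ranks: Sequence[int], k: int = 10) -> float:
    """Fraction of test cases whose correct entity is ranked within the top k."""
    return sum(1 for rank in ranks if rank <= k) / len(ranks)


def mean_reciprocal_rank(ranks: Sequence[int]) -> float:
    """Average of 1/rank over all test cases (ranks start at 1)."""
    return sum(1.0 / rank for rank in ranks) / len(ranks)


# Hypothetical ranks of the correct entity for four test triples.
ranks = [1, 2, 4, 20]

hits10 = hits_at_k(ranks, k=10)
mrr = mean_reciprocal_rank(ranks)

print(hits10, mrr)
```
]]></preformat>
        <p>Hits@10 ignores how well a correct entity is placed once it is inside the top 10, whereas MRR rewards higher positions; this is why the two metrics complement each other.</p>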
      </sec>
      <sec id="sec-4-2">
        <title>4.4. Results and Discussion</title>
        <p>The results of our experiment highlight that, given the current setup, the most effective approach for leveraging
link prediction to infer new connections is the adoption of RotatE [21] without pre-trained embeddings and without
incorporating knowledge injection or ontology-based information. This serves as evidence to start answering
RQ1. However, when focusing on RQ2, we observe that using alternative models such as TransE and TransR
demonstrates that the integration of embeddings and ontological data can influence performance outcomes.</p>
        <p>The key takeaway from our findings is that, in our setting, link prediction is a viable strategy for identifying
new relations with a reasonable level of accuracy, particularly when employing RotatE. Within the context of
our methodology, these predicted links can be queried and potentially integrated into the knowledge graph to
enhance its completeness and utility.</p>
        <p>An open research question remains: can our data integration pipeline, when used in reverse, contribute to
the construction of more robust knowledge graphs that, in turn, facilitate the development of more accurate
predictive models? Further investigation is required to explore whether refining the KG through data integration
could lead to improvements in link prediction performance.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion and Perspectives</title>
      <p>In this discussion paper, we provide some initial ideas on how to integrate link prediction over geospatial
knowledge graphs into a knowledge graph construction pipeline. We demonstrate that it is feasible to add
links to an existing knowledge graph to make it denser. Furthermore, we discuss the impact that
the choice of the ontology and its transformation into input data can have on model performance for link
prediction.</p>
      <p>Future work should test a greater variety of KGE models, explore better ways to integrate the
geospatial embeddings in model evaluation, and examine the choice of data used to construct the KG
and its respective training dataset. Since KGs lie at the heart of the architecture, integrating new data
sources, and therefore adding more data to the training phase, can be performed in a systematic fashion.</p>
    </sec>
    <sec id="sec-6">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>This research has been supported by the German Research Foundation (DFG) and the Autonomous Province
of Bolzano-Bozen through its joint project “Dense and Deep Geographic Virtual Knowledge Graphs for Visual
Analysis - D2G2” (grant number 500249124), by the HEU project CyclOps (grant agreement 101135513), by
the Province of Bolzano and FWF through the project Ontegra (DOI 10.55776/PIN8884924), by the Province of
Bolzano and EU through the project EFRE/FESR 1078 CRIMA, and by the Italian PRIN project S-PIC4CHU (grant
agreement 2022XERWK9). This work has been carried out while Albulen Pano was enrolled in the Italian National
Doctorate on Artificial Intelligence run by Sapienza University of Rome in collaboration with Free University of
Bozen-Bolzano.</p>
      <p>[20] A. Bordes, N. Usunier, A. García-Durán, J. Weston, O. Yakhnenko, Translating embeddings for
modeling multi-relational data, in: Proc. of the 27th Annual Conf. on Neural Information
Processing Systems (NIPS 2013), 2013, pp. 2787–2795. URL: https://proceedings.neurips.cc/paper/2013/hash/1cecc7a77928ca8133fa24680a88d2f9-Abstract.html.</p>
      <p>[21] Z. Sun, Z.-H. Deng, J.-Y. Nie, J. Tang, RotatE: Knowledge graph embedding by relational rotation in
complex space, in: 7th Int. Conf. on Learning Representations (ICLR 2019), OpenReview.net, 2019. URL:
https://openreview.net/forum?id=HkgEQnRqYQ.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <source>The Description Logic Handbook: Theory, Implementation and Applications</source>
          , 2 ed., Cambridge University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Boley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Grosof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dean</surname>
          </string-name>
          ,
          <article-title>SWRL: A Semantic Web Rule Language Combining OWL and RuleML</article-title>
          , W3C Member Submission,
          <source>World Wide Web Consortium</source>
          ,
          <year>2004</year>
          . URL: https://www.w3.org/Submission/SWRL/.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Perry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Herring</surname>
          </string-name>
          ,
          <article-title>GeoSPARQL - A Geographic Query Language for RDF Data</article-title>
          , OGC Implementation Standard OGC 11-052r4, Open Geospatial Consortium,
          <year>2012</year>
          . URL: http://www.opengeospatial.org/standards/geosparql.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Janowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Cai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Regalia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Yan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Lao</surname>
          </string-name>
          ,
          <article-title>SE-KGE: A location-aware Knowledge Graph embedding model for geographic question answering and spatial semantic lifting</article-title>
          ,
          <source>Trans. in GIS</source>
          <volume>24</volume>
          (
          <year>2020</year>
          )
          <fpage>623</fpage>
          -
          <lpage>655</lpage>
          . doi: 10.1111/tgis.12629.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Martínez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Berzal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Cubero</surname>
          </string-name>
          ,
          <article-title>A survey of link prediction in complex networks</article-title>
          ,
          <source>ACM Computing Surveys</source>
          <volume>49</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>UrbanKG: An urban knowledge graph system</article-title>
          ,
          <source>ACM Trans. on Intelligent Systems and Technology</source>
          <volume>14</volume>
          (
          <year>2023</year>
          )
          <fpage>60:1</fpage>
          -
          <lpage>60:25</lpage>
          . doi: 10.1145/3588577.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Mann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dsouza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Demidova</surname>
          </string-name>
          ,
          <article-title>Spatial link prediction with spatial and semantic embeddings</article-title>
          ,
          <source>in: Proc. of the 22nd Int. Semantic Web Conf. (ISWC)</source>
          , volume
          <volume>14265</volume>
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>196</lpage>
          . doi: 10.1007/978-3-031-47240-4_10
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>E.</given-names>
            <surname>van Rees</surname>
          </string-name>
          ,
          <article-title>Open Geospatial Consortium (OGC)</article-title>
          ,
          <source>Geoinformatics</source>
          <volume>16</volume>
          (
          <year>2013</year>
          )
          <fpage>28</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>Das</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sundara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <article-title>R2RML: RDB to RDF Mapping Language</article-title>
          , W3C Recommendation, World Wide Web Consortium,
          <year>2012</year>
          . URL: http://www.w3.org/TR/r2rml/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Höfner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>LinkedGeoData: A core for a web of spatial open data</article-title>
          ,
          <source>Semantic Web</source>
          <volume>3</volume>
          (
          <year>2012</year>
          )
          <fpage>333</fpage>
          -
          <lpage>354</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Fumagalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Meng</surname>
          </string-name>
          ,
          <article-title>Integrating 3D city data through knowledge graphs</article-title>
          ,
          <source>Geo-spatial Information Science</source>
          (
          <year>2024</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          . doi: 10.1080/10095020.2024.2337360
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S. A.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <article-title>Introduction to BPMN</article-title>
          ,
          <source>IBM Cooperation</source>
          <volume>2</volume>
          (
          <year>2004</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Raimond</surname>
          </string-name>
          ,
          <article-title>RDF 1.1 Primer</article-title>
          , W3C Working Group Note,
          <source>World Wide Web Consortium</source>
          ,
          <year>2014</year>
          . URL: http://www.w3.org/TR/rdf11-primer/.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <article-title>Virtual Knowledge Graphs: An overview of systems and use cases</article-title>
          ,
          <source>Data Intelligence</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>201</fpage>
          -
          <lpage>223</lpage>
          . doi: 10.1162/dint_a_00011
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lanti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kontchakov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Komla-Ebri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Güzel-Kalayci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Corman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Cogrel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Botoeva</surname>
          </string-name>
          ,
          <article-title>The virtual knowledge graph system Ontop</article-title>
          ,
          <source>in: Proc. of the 19th Int. Semantic Web Conf. (ISWC)</source>
          , volume
          <volume>12507</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>259</fpage>
          -
          <lpage>277</lpage>
          . doi: 10.1007/978-3-030-62466-8_17
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>UrbanKG: An urban knowledge graph system</article-title>
          ,
          <source>ACM Trans. Intell. Syst. Technol.</source>
          <volume>14</volume>
          (
          <year>2023</year>
          ). doi: 10.1145/3588577.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Seaborne</surname>
          </string-name>
          ,
          <article-title>SPARQL 1.1 Query Language</article-title>
          , W3C Recommendation, World Wide Web Consortium,
          <year>2013</year>
          . URL: http://www.w3.org/TR/sparql11-query.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.</given-names>
            <surname>Motik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fokoue</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          ,
          <article-title>OWL 2 Web Ontology Language Profiles (Second Edition)</article-title>
          , W3C Recommendation,
          <source>World Wide Web Consortium</source>
          ,
          <year>2012</year>
          . URL: http://www.w3.org/TR/owl2-profiles/.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <article-title>Learning entity and relation embeddings for knowledge graph completion</article-title>
          ,
          <source>in: Proc. of the 29th AAAI Conf. on Artificial Intelligence (AAAI)</source>
          , AAAI Press,
          <year>2015</year>
          , pp.
          <fpage>2181</fpage>
          -
          <lpage>2187</lpage>
          . doi: 10.1609/AAAI.V29I1.9491.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>