<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Adapting Ontology-based Data Access for Data Spaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Medina Andresel</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Veronika Siska</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Robert David</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Schlarb</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Axel Weißenfeld</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AIT Austrian Institute of Technology GmbH</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Semantic Web Company GmbH</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>The exponential growth of data across various sectors requires robust frameworks for efficient data management and exchange, particularly in the context of training and improving artificial intelligence (AI) models. Data spaces emerge as a solution, facilitating seamless data exchange among organizations while safeguarding data sovereignty. This article explores the landscape of data spaces, emphasizing the role of Semantic Web standards in achieving interoperability and facilitating data sharing. It begins with use cases in crisis management and manufacturing to provide concrete requirements for discussing data space challenges and benefits. The IDS Data Space Architecture is presented, alongside an examination of the relevance of Semantic Web standards for data sharing. Examples of searching using Ontology-based Data Access (OBDA) offer insights into the potential of Semantic Web technologies to further improve interoperability within data spaces. Finally, we explore how to set up a data space and publish data to enable OBDA-based search, and how the search itself is conducted.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web Technologies</kwd>
        <kwd>Data Spaces</kwd>
        <kwd>Ontology-based Data Access</kwd>
        <kwd>Metadata</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Data is the driving factor for numerous IT systems in practical applications, ranging from
curated enterprise data utilized for informed business decisions to training data necessary for
refining machine learning algorithms and enhancing artificial intelligence (AI) models. Since
various use cases and systems rely on extensive data repositories, the necessity arises to facilitate
data sharing, oftentimes driven by commercial interests, all while preserving data sovereignty.
Hence, there is a need for robust data management and exchange frameworks.</p>
      <p>
        This is where the concept of data spaces [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] offers a framework for enabling efficient data
exchange between organisations while preserving the data sovereignty for each participating
entity. This article outlines the landscape of data spaces with a focus on how Semantic Web
technologies can be used to achieve interoperability and enable data sharing between partner
organisations and systems. In particular, we propose to enhance data spaces with the
functionalities that the Ontology-based Data Access (OBDA) paradigm [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] offers. In OBDA, access to
disparate and heterogeneous data sources is mediated by an ontology through mappings from
raw information to the concepts and relations defined in a domain-specific ontology.
Access to all data sources is then enabled by means of standard SPARQL [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] querying.
      </p>
      <p>Our main contribution is a conceptual model that integrates and adapts OBDA to the
requirements imposed by the data protection and sharing mechanisms existing in data spaces. The
outline of this paper is as follows. We start by presenting use cases in two key areas: supporting
authorities in crisis situations and manufacturing. This will serve as a basis for discussing the
challenges and benefits associated with data spaces. We then proceed by introducing the IDS
Data Space Architecture and outline the role of Semantic Web standards within this framework.
We continue with our conceptualization of how OBDA can be realized in the data space in
relation to the use cases described before. We then delve into data space setup and explain the
principle of publishing and accessing data within this novel framework. Finally, we highlight
discussion points and challenges and provide our conclusions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Preliminaries</title>
      <p>In this section we present two motivating use cases and briefly describe the concept of data
spaces and existing Semantic Web-based approaches used for data spaces.</p>
      <sec id="sec-2-1">
        <title>2.1. Use Cases</title>
        <sec id="sec-2-1-1">
          <title>2.1.1. Supporting Authorities in Crisis Situations</title>
          <p>In the course of disruptive political, economic or social crises, it may be necessary for
authorities (government, local municipalities, etc.) to implement appropriate intervention measures.
However, experience from the recent pandemic showed that the conditions for optimal,
data-driven decisions are not yet in place. One of the main limitations is the lack of sufficiently
detailed information on critical goods and services (e.g. food, fuel, medical services) that could
be used for early detection as well as for defining the ideal response strategy. The Dagmar project
develops data-driven tools to be used by authorities in such crisis situations.</p>
          <p>There is already a legal basis for authorities to access such information for crisis prevention
and management. On the European level, the Data Act enables public sector bodies to access
and use data held by the private sector for specific public interest purposes. In Austria, a set
of economic management laws (Lebensmittelbewirtschaftungsgesetz 1997 (LMBG), BGBl. Nr.
789/1996 idF. BGBl. I Nr. 113/2016, Energielenkungsgesetz (EnLG 2012) BGBl. I Nr. 41/2013
idF. BGBl. I Nr. 68/2022 and Versorgungssicherungsgesetz (VerssG 1992) BGBl. Nr. 380/1992
idF. BGBl. I Nr. 94/2016) enable authorities to implement counter-measures for specific critical
supplies (e.g. food, energy) and access the corresponding data. However, the corresponding
technical framework and implementation are missing.</p>
          <p>To enable a data-based crisis management system, we need to make the data available and
searchable, so that various applications, such as dashboards or question-answering systems, can
be developed on top of it. However, most of the data in this scenario is not publicly accessible,
but instead only available for certain actors (e.g. authorities) under certain conditions (e.g. when
a certain alert limit has been reached or during a crisis situation) – both of which may also
change with time. Data spaces provide a basis for sharing data with well-defined policies, while
OBDA provides a way for fast and efficient queries over data sources. However, we also need to
be able to limit access to query results according to the dataset-specific usage rights as specified
by the corresponding policies.</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>2.1.2. Manufacturing</title>
          <p>The UNDERPIN project develops and deploys a data space for critical manufacturing
sectors, where we explore two application areas, refineries and wind farms, for dynamic asset
management as well as predictive and prescriptive maintenance.</p>
          <p>For refineries, the aim is to improve the maintenance process and decision making to
determine the best timing for preventive maintenance so as to minimize the downtime and the impact
on the production capabilities. The wind farms use case is about optimizing the maintenance
of wind farms, where Wind Turbine Generators (WTG) are deployed. WTG failure can have
various reasons, with gearbox failures being one of the most common ones, and maintenance
tasks as well as downtimes have high associated costs.</p>
          <p>Data sharing along the value chain in such application areas is crucial due to the fragmented
nature of data access, hindering the effective implementation of machine learning (ML) models.
Each manufacturer may use different sensors, data formats, and communication protocols,
resulting in fragmented data. With various stakeholders involved, access to all relevant data
becomes a challenge. A holistic approach requires integrating data from multiple stages of the
value chain, which may be managed by different stakeholders and systems. Integrating these
diverse data sources in a beneficial way for all stakeholders requires harmonization efforts.</p>
          <p>To address this, OBDA-based data consolidation (see Sec. 2.3.2) emerges as a viable solution.
By employing standardized ontologies, stakeholders can access relevant data efficiently and in a
unified way, streamlining the integration process and facilitating the utilization of ML models
on a larger and more diverse data basis for improved decision-making and operational efficiency
across the value chain. In the concrete use cases around predictive maintenance for wind
turbines and refineries, developers of ML models could search for relevant datasets (e.g. a
particular class of sensor data from the type of machines of interest) based on the ontologies,
describe and integrate data models of machines, and use such data for training, increasing the
quality of their predictions.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Data Spaces</title>
        <p>Data spaces provide a distributed architecture for cross-organisational data exchange while
maintaining data sovereignty, that is, where the data owner retains control over the access
and usage of their own data. In this way the data economy is supported twofold: providing
technical and non-technical standards for sovereign data exchange and trust.</p>
        <p>Different initiatives address different aspects of data space building blocks. Here, we focus
on the technical features, but there are also ongoing efforts to clarify the business, legal and
governance aspects of data spaces. The International Data Spaces Association (IDSA)<sup>3</sup> is an
initiative to define a standardized model and architecture for secure and trusted data exchange to
drive the digital economy, as described in the International Data Spaces Reference Architecture
Model (IDS-RAM)<sup>4</sup>. Gaia-X<sup>5</sup> envisions a service architecture built on three pillars: compliance,
federation (multiple actors cooperating based on shared rules) and data exchange. Gaia-X
currently focuses on compliance, established on the basis of a decentralized trust framework,
but also provides specifications for federations and data exchange.</p>
        <p>[Figure 1. Recoverable labels: Virtual Knowledge Graph; Ontology; Mappings; Application Layer; Virtualization Layer; Semantic Integration Layer; Data Layer; Catalog; Other services; Metadata-based search; Publish data and metadata.]</p>
        <p>There are also coordination initiatives integrating different frameworks. The Big Data
Value Association (BDVA)<sup>6</sup> is an industry-driven research and innovation organisation with a
mission to develop an innovation ecosystem that enables the data-driven and AI-driven digital
transformation of the economy and society in Europe. The Data Spaces Support Centre (DSSC)<sup>7</sup>,
funded by the European Commission, supports the creation of data spaces with the aim of
enabling data reuse within and across sectors to support the European economy and society. On
the technical level, Simpl<sup>8</sup> is an upcoming open source, smart and secure middleware platform
that supports data access and interoperability among European data spaces, also funded by the
European Commission.</p>
        <p>
          Data semantics play an important part in data spaces in supporting the FAIR principles [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] for
data sharing, as Semantic Web standards provide a solid formal basis to define and publish
vocabularies and ontologies. Moreover, shared and open standards for the semantic descriptions
of the metadata of data assets, and also for the data itself, support semantic interoperability [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
Such descriptions make data in the data space findable via catalogs, automatically accessible
via software components, interoperable by linking descriptions together and reusable based on
standard descriptions. Furthermore, Semantic Web standards can also be used to describe the
contracts and obligations for data sharing. In particular, the Open Digital Rights Language
(ODRL) [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] provides a common basis to formulate contract requirements (policies), which can
also be processed automatically and thereby provide a technical basis for automatic policy
enforcement.
        </p>
        <p><sup>3</sup> https://internationaldataspaces.org/
<sup>4</sup> https://docs.internationaldataspaces.org/ids-ram-4/
<sup>5</sup> https://gaia-x.eu/
<sup>6</sup> https://bdva.eu/
<sup>7</sup> https://dssc.eu/
<sup>8</sup> https://digital-strategy.ec.europa.eu/en/policies/simpl</p>
        <sec id="sec-2-2-1">
          <title>2.2.1. IDS Data Space Architecture</title>
          <p>Our data space concept is built on the IDS architecture as depicted in Figure 1b. We chose IDS
because of its widespread use in projects such as UNDERPIN, the availability of implementations such
as EDC<sup>9</sup>, and its uptake in practice, e.g. mobility-dataspace<sup>10</sup>. We briefly describe the components
of the IDS reference architecture model (IDS-RAM). The core building block of an IDS data space
is the Connector. A Connector is a software component that represents a participant of the data
space and can provide and consume data under contractual obligations. In other words,
a Connector is a gateway to secure data sharing in a data space. For security purposes, the
IDS-RAM includes a Certificate Authority (CA) and a Dynamic Attribute Provisioning Service (DAPS),
which manages dynamic access tokens for dynamic access management. While Connectors, CA
and DAPS are mandatory, the following components are optional, depending on the specific needs of a
data space:</p>
          <p>• Meta Data Broker acts as a central metadata index service, where Connectors can publish
information about data assets based on FAIR principles.
• Vocabulary Hub provides detailed (semantic) data models, which are used by the Meta
Data Broker for metadata descriptions.
• App Store provides a platform for secure data apps which run in conjunction with a
Connector.
• Clearing House provides a central logging service for clearing and billing as well as usage
control.</p>
        </sec>
        <sec id="sec-2-2-2">
          <title>2.2.2. IDS and Semantic Web Standards</title>
          <p>The International Data Spaces Association publishes an information model (IDS-IM) [7] for data
spaces, which is based on Semantic Web standards like RDF and OWL for interoperability, trust
and usage control. Especially on the interoperability side, Semantic Web technologies provide
flexibility for data modelling and integration of data and metadata alike.</p>
          <p>
            The IDS-RAM includes components to enable interoperability for shared assets on the data
and metadata level. Specifically, the Meta Data Broker acts as a metadata repository for
published assets and enables adherence to the FAIR principles based on RDF vocabularies. The
Vocabulary Hub hosts (standard) RDF-based vocabularies used in the metadata descriptions
of the assets, where the data can be made available using standards like SPARQL [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ] for easy
integration. The vocabularies can also be used to describe the data itself by applying them to
unstructured and structured data. For unstructured data using natural language, principles like
semantic annotations can be applied. Vocabularies based on the Simple Knowledge Organisation
System (SKOS) [8] provide concepts with multilingual labels which can be effectively used for
semantic annotations of textual content. For structured data, there are different ways to map
and transform towards an RDF-based representation, such as the RDF Extension for OpenRefine
[9]. For relational data, W3C provides two recommendations: a direct
mapping of relational data to RDF [10] and the mapping language R2RML [11].
          </p>
          <p><sup>9</sup> https://projects.eclipse.org/projects/technology.edc
<sup>10</sup> https://mobility-dataspace.eu/</p>
        </sec>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Semantic Web Technologies for Semantic Interoperability</title>
        <p>We describe below two existing approaches that rely upon Semantic Web technologies for
improving semantic interoperability between datasets.</p>
        <sec id="sec-2-3-1">
          <title>2.3.1. Vocabulary-based Data Access</title>
          <p>When it comes to arbitrary data sources, a lightweight approach to support interoperability
is to use published standard vocabularies to describe the metadata of each dataset. A
well-established W3C recommendation, which has seen wide adoption, is the Data Catalog Vocabulary
(DCAT) [12]. DCAT is designed to facilitate interoperability between published datasets using
a lightweight and generic approach to descriptions. DCAT is based on RDF and can therefore
be easily extended and complemented with other vocabularies, expanding the descriptions
of datasets based on FAIR principles. However, DCAT focuses on metadata only and does
not provide ways to consolidate different vocabularies for descriptions, i.e. to define relations
between different vocabularies, which can increase the level of interoperability in practice. An
approach that meets both requirements, annotating metadata as well as data and
interlinking different vocabularies, is taxonomic crossovers [13]. Taxonomic crossovers enable
us to interlink concepts within different vocabularies, or concepts between different versions
of the same vocabulary, to achieve interoperability. Having established taxonomic crossovers
for interoperability, we can automatically leverage them based on Semantic Web standards
by evaluating them using SPARQL. When looking at the interoperability requirements for
data spaces, where many participants can provide data assets and there is a high need for
interoperability, such a solution is a major improvement for collaboration.</p>
        </sec>
        <sec id="sec-2-3-2">
          <title>2.3.2. Ontology-based Data Access</title>
          <p>
            The ontology-based data access (OBDA) paradigm enables access to a variety of disparate and
heterogeneous data sources by semantically mapping information of each data source to the
concepts and relations defined in a domain-specific ontology [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. A key notion in OBDA is that
of a mapping, which consists of: (i) a source query, in the language of the data format, which
extracts the relevant data values, and (ii) a target declaration, which describes how the result of
the query should be interpreted based on the domain ontology. An example of a mapping over
a relational database is as follows:
          </p>
          <preformat>@prefix sc: &lt;http://www.dataspace-supplychain.com/&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
source  SELECT prodID, country, pname FROM product
        INNER JOIN exportban USING (prodID)
target  sc:product/{prodID} a sc:Product ;
            sc:name "{pname}" ;
            sc:hasExportBan sc:country/{country} .
        sc:country/{country} a sc:Exporter/{pname} .
        sc:Exporter/{pname} rdfs:subClassOf sc:Exporter .</preformat>
          <p>In this mapping, the source declaration is an SQL query that looks up the id and name of each
product together with the country where this product is banned from being exported. The target
declaration creates the following RDFS assertions: the class sc:Product is instantiated using
product ids; each product is linked to its name and to the country of the export ban; and an
individual for each country is created as an instance of a new product-specific exporter class,
which in turn is a subclass of sc:Exporter.</p>
          <p>A result of the source query is a map of the form (prodID, country, pname) ↦ (0142, China, Germanium),
yielding the following RDFS assertions (in Turtle syntax):</p>
          <preformat>@prefix sc: &lt;http://www.dataspace-supplychain.com/&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
sc:product/0142 a sc:Product ;
    sc:name "Germanium" ;
    sc:hasExportBan sc:country/China .
sc:country/China a sc:Exporter/Germanium .
sc:Exporter/Germanium rdfs:subClassOf sc:Exporter .</preformat>
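          <p>The way a target declaration turns query results into assertions can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: triples are plain string tuples, the helper names (TARGET, instantiate, virtual_graph) are hypothetical, and the subclass assertion is written with the standard rdfs:subClassOf property. It is not the implementation of any particular OBDA system.</p>
          <preformat><![CDATA[
```python
# Sketch: instantiate the target templates of the example mapping for each
# row returned by the source query. All names here are illustrative.

# Target templates, one (subject, predicate, object) triple per entry.
TARGET = [
    ("sc:product/{prodID}", "a", "sc:Product"),
    ("sc:product/{prodID}", "sc:name", '"{pname}"'),
    ("sc:product/{prodID}", "sc:hasExportBan", "sc:country/{country}"),
    ("sc:country/{country}", "a", "sc:Exporter/{pname}"),
    ("sc:Exporter/{pname}", "rdfs:subClassOf", "sc:Exporter"),
]


def instantiate(row, templates=TARGET):
    """Fill the {placeholders} of every template with the values of one row."""
    return [tuple(term.format(**row) for term in triple) for triple in templates]


def virtual_graph(rows):
    """Union of the instantiated triples over all rows, without duplicates."""
    seen, graph = set(), []
    for row in rows:
        for triple in instantiate(row):
            if triple not in seen:
                seen.add(triple)
                graph.append(triple)
    return graph


# One result row of the source query, as in the example above.
rows = [{"prodID": "0142", "country": "China", "pname": "Germanium"}]
for s, p, o in virtual_graph(rows):
    print(s, p, o, ".")
```
]]></preformat>
          <p>For the single example row, the sketch produces the five assertions listed above. A virtualizing OBDA system would not build this list up front but would rewrite incoming queries against the mapping instead.</p>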
          <p>Another important element in OBDA is the virtualization of data, meaning that the actual
transformation of the data into RDF(S) and storage of the knowledge graph is not materialized.
In this approach, the ontology and mappings for each data source, also denoted as OBDA
specification, expose the underlying data as a virtual set of RDF(S) assertions, making it accessible
at query time using SPARQL. This mechanism is realized by transforming each SPARQL query
using the OBDA specification into a set of format-specific queries over each data source, then
aggregating the answers.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Information Access in Data Spaces via OBDA</title>
      <p>In this section we describe how OBDA can be realized within the IDS model to enable instant
access to relevant information using Semantic Web technologies, while meeting the
requirements imposed by the data space framework. Figure 2 presents the envisioned conceptual
model of a data space that supports efficient search and retrieval of relevant information for a
data consumer while preserving the data access policies established through automatic
contractual agreements with data providers. We consider three required phases: setting up a data space
for data exchange, publishing data in the data space, and finally searching the available data.</p>
      <sec id="sec-3-1">
        <title>3.1. Setup for Building a Data Space</title>
        <p>First we need to provide the basis for data sharing, where participants are willing and able to
provide and consume data, while respecting data sovereignty. The rules and building blocks for
both business/organisational and technical aspects need to be defined and implemented, and
participants need to be on-boarded to the system in accordance with these rules.</p>
        <p>[Figure 2. Conceptual model of the data space. Recoverable labels: Query engine; Ontologies.]</p>
        <p>For participants, verifiable descriptions with a set of mandatory attributes could be provided
by the participant and verified as part of the on-boarding process. The data model for such
participant descriptions needs to be stored at a reliable location (e.g. decentralised storage or a
trusted central authority) and made publicly accessible.</p>
        <p>For assets, extensible base models may be defined and managed by a component provided
for all data space participants (semantic hub in EDC). These models may also include links to
the relevant domain-specific ontologies.</p>
        <p>Each data space is related to a specific domain, for which an ontology can be developed
or curated from existing vocabularies. The ontology should define the key concepts and
relations needed to describe the information that is relevant for the majority of the stakeholders
in the data space, thus enabling a common understanding of the data that is being exchanged. In the
context of the Dagmar project, most of the stakeholders are involved in designing the ontology,
and this task is carried out in parallel with the task of designing the data space.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Publishing Data</title>
        <p>In our approach, a data provider is advised to publish the metadata and the ontology mappings
alongside the data itself. A main benefit of this approach is the modularization of
the data: access can be moved to the level of particular information within the dataset, while
the data provider still has control over the mapping definitions and can thus restrict access to
sensitive information.</p>
        <p>The creation of the mappings can be challenging for non-expert users. Several supporting
techniques have therefore been proposed in the literature, such as automatic generation using machine
learning [14] or editing tools for manually crafting the mappings [15]. Within the
data space framework, data providers should receive technical support and services to create
the mappings, so this aspect must be taken into account when designing the data space.</p>
        <p>The access and usage policies are established and clearly defined by the data owner. If
some participants should have special data access privileges, their participant descriptions
(credentials) need to include the information required to grant these, as described by the policy
specific to the data asset. At query time, the credentials and policies are then taken
into account when retrieving answers.</p>
        <p>For example, the access levels can be defined as follows:
• Allow: enabled for all users who have valid credentials to access the information.
• Restricted: enabled for some users that have valid credentials and satisfy the
corresponding policies.
• Disallowed: disallowed for all users.</p>
        <p>On a technical level, the levels can be defined in the form of an ODRL policy, with different
rules specified for different assignees (recipients of the rule, i.e. the user consuming the data).</p>
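        <p>As a rough illustration of such a policy, the following Python sketch assembles an ODRL policy in JSON-LD form for the Restricted level. The terms used (Set, uid, permission, target, assignee, action, read) come from the ODRL vocabulary; the example.org identifiers and the helper name restricted_policy are hypothetical, and treating the absence of a permission as denial is an assumption about the enforcing component rather than something ODRL itself mandates.</p>
        <preformat><![CDATA[
```python
import json

# JSON-LD context published for the ODRL vocabulary.
ODRL_CONTEXT = "http://www.w3.org/ns/odrl.jsonld"


def restricted_policy(policy_id, asset_id, assignees):
    """Build an ODRL Set policy that permits 'read' on one asset
    for each listed assignee (hypothetical helper, for illustration)."""
    return {
        "@context": ODRL_CONTEXT,
        "@type": "Set",
        "uid": policy_id,
        "permission": [
            {"target": asset_id, "assignee": assignee, "action": "read"}
            for assignee in assignees
        ],
    }


# All identifiers below are placeholders, not real data space resources.
policy = restricted_policy(
    "https://example.org/policy/export-bans",
    "https://example.org/asset/export-bans",
    ["https://example.org/participant/authority-1"],
)
print(json.dumps(policy, indent=2))
```
]]></preformat>
        <p>An Allow level could be expressed in the same shape with a permission that names no specific assignee, and a Disallowed level with an ODRL prohibition; both variants are left out of the sketch.</p>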
        <p>The mapping language can then attach such access levels to individual target
declarations. For example, if we want to restrict all information related to export bans, we can update
the previous mapping:</p>
        <preformat>@prefix sc: &lt;http://www.dataspace-supplychain.com/&gt; .
@prefix rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt; .
source  SELECT prodID, country, pname FROM product
        INNER JOIN exportban USING (prodID)
target  sc:product/{prodID} a sc:Product ;
            sc:name "{pname}" ;
            sc:hasExportBan sc:country/{country} .           @restricted
        sc:country/{country} a sc:Exporter/{pname} .         @restricted
        sc:Exporter/{pname} rdfs:subClassOf sc:Exporter .    @restricted</preformat>
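        <p>This would generate a restricted RDF graph for users that have valid credentials
but do not satisfy the requirements described by the dataset’s policy: the three assertions marked
@restricted are withheld, so only the sc:Product typing and the sc:name assertion of each product remain.</p>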
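        <p>The effect of the access levels on the exposed graph can be sketched as a simple filter. The sketch assumes that unannotated target assertions default to the Allow level and that the subclass assertion is written with rdfs:subClassOf; the names ANNOTATED and visible_graph are hypothetical and only serve the illustration.</p>
        <preformat><![CDATA[
```python
# Sketch: each target triple carries an access level; the view handed to a
# consumer keeps only the triples that consumer may see. Illustrative only.

ALLOW, RESTRICTED, DISALLOWED = "allow", "restricted", "disallowed"

# Instantiated target triples of the example mapping with their access levels.
# Assumption: triples without an annotation default to ALLOW.
ANNOTATED = [
    (("sc:product/0142", "a", "sc:Product"), ALLOW),
    (("sc:product/0142", "sc:name", '"Germanium"'), ALLOW),
    (("sc:product/0142", "sc:hasExportBan", "sc:country/China"), RESTRICTED),
    (("sc:country/China", "a", "sc:Exporter/Germanium"), RESTRICTED),
    (("sc:Exporter/Germanium", "rdfs:subClassOf", "sc:Exporter"), RESTRICTED),
]


def visible_graph(annotated, has_credentials, satisfies_policy):
    """Return the triples visible to a consumer.

    has_credentials: the consumer holds valid data space credentials.
    satisfies_policy: the consumer also meets the dataset's policy.
    """
    if not has_credentials:
        return []
    visible = []
    for triple, level in annotated:
        # DISALLOWED triples are never returned, whatever the consumer holds.
        if level == ALLOW or (level == RESTRICTED and satisfies_policy):
            visible.append(triple)
    return visible


restricted_view = visible_graph(ANNOTATED, has_credentials=True, satisfies_policy=False)
for s, p, o in restricted_view:
    print(s, p, o, ".")
```
]]></preformat>
        <p>For a consumer with valid credentials who does not satisfy the policy, only the typing and naming assertions for product 0142 are returned; a consumer who does satisfy the policy sees all five assertions.</p>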
        <p>Note that the access level can always be updated in the mappings, without the need to change
anything else in the approach. If the ontology mappings are not provided,
the standard search and exchange mechanisms remain in place.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Searching in the Data Space</title>
        <p>Next, we need to search mappings of available data sources in the data space, but without
revealing any sensitive information (policy-protected OBDA). We first describe two options for
the general search process and then describe possible queries and their evaluation in detail.</p>
        <p>Search process. We envision two different solutions: a one-step constrained search, or a
two-step approach of search on accessible data. Both are conceptually interoperable with the mutual
contracting method defined in the IDS dataspace protocol.</p>
        <p>In the one-step constrained search, the search itself would be a special case of data access
consisting of the following steps:
1. Consumer formulates search query and sends it to the query engine.
2. Query engine requests results for consumer from each provider.
3. Providers evaluate the query, with constraints applied from the policies combined with
the consumer’s identity.
4. Query engine combines results and returns them to the consumer.</p>
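        <p>The four steps above can be sketched as follows. The sketch is deliberately simplified: a query is a single triple pattern rather than a SPARQL query, a policy is reduced to a set of privileged consumer identifiers, and the Provider class and the function names are hypothetical stand-ins, not interfaces of IDS or of any Connector implementation.</p>
        <preformat><![CDATA[
```python
# Sketch of the one-step constrained search. Providers, policies and the
# triple-pattern "query" are simplified stand-ins, not IDS or EDC APIs.

class Provider:
    def __init__(self, name, triples, privileged):
        self.name = name
        self.triples = triples          # list of ((s, p, o), restricted_flag)
        self.privileged = privileged    # consumer ids satisfying the policy

    def evaluate(self, pattern, consumer_id):
        """Step 3: answer the query under the policy for this consumer."""
        allowed = consumer_id in self.privileged
        return [
            triple
            for triple, restricted in self.triples
            if (allowed or not restricted) and matches(pattern, triple)
        ]


def matches(pattern, triple):
    """A pattern term of None acts as a variable and matches anything."""
    return all(p is None or p == t for p, t in zip(pattern, triple))


def query_engine(pattern, consumer_id, providers):
    """Steps 2 and 4: fan the query out, then merge the partial results."""
    results = []
    for provider in providers:
        for triple in provider.evaluate(pattern, consumer_id):
            if triple not in results:
                results.append(triple)
    return results


providers = [
    Provider(
        "provider-a",
        [
            (("sc:product/0142", "sc:name", '"Germanium"'), False),
            (("sc:product/0142", "sc:hasExportBan", "sc:country/China"), True),
        ],
        privileged={"authority-1"},
    )
]

# Step 1: the consumer asks for all export bans.
pattern = (None, "sc:hasExportBan", None)
print(query_engine(pattern, "authority-1", providers))
print(query_engine(pattern, "company-7", providers))
```
]]></preformat>
        <p>In this design the policy check happens at each provider on every query, so no party other than the provider needs to see restricted assertions, at the cost of repeating the check per request.</p>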
        <p>In the two-step approach of search on accessible data, we propose a preparation phase and a
query phase. In the preparation phase, searchable assets in the data space are assembled for
a given consumer by evaluating the policies of each asset to determine if they are searchable
for that consumer. Then, when the consumer formulates a query, an unconstrained search can
be performed. The query step in this method is simple and quick, since policies are enforced
in the preparation step. However, since the preparation step is costly and only performed on
demand, this is only suitable if (1) consumers are known in advance and (2) the set of available
data assets in the data space is stable. This is the case for our supply chain resilience use case,
but not for our manufacturing example.</p>
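        <p>A minimal sketch of the two-step approach, under the same simplifications as before (string triples, policies reduced to sets of consumer identifiers, hypothetical function and asset names), could look like this:</p>
        <preformat><![CDATA[
```python
# Sketch of the two-step approach: policies are evaluated once per consumer
# in a preparation phase; queries then run unconstrained on the prepared set.

def prepare(assets, consumers):
    """Preparation phase: map each consumer to the triples it may search.

    assets: dict asset_id -> {"triples": [...], "searchable_by": set of ids}
    """
    return {
        consumer: [
            triple
            for asset in assets.values()
            if consumer in asset["searchable_by"]
            for triple in asset["triples"]
        ]
        for consumer in consumers
    }


def search(index, consumer, predicate):
    """Query phase: plain lookup, no policy checks needed any more."""
    return [t for t in index.get(consumer, []) if t[1] == predicate]


assets = {
    "export-bans": {
        "triples": [("sc:product/0142", "sc:hasExportBan", "sc:country/China")],
        "searchable_by": {"authority-1"},
    },
    "product-names": {
        "triples": [("sc:product/0142", "sc:name", '"Germanium"')],
        "searchable_by": {"authority-1", "company-7"},
    },
}

index = prepare(assets, ["authority-1", "company-7"])
print(search(index, "authority-1", "sc:hasExportBan"))
print(search(index, "company-7", "sc:hasExportBan"))
```
]]></preformat>
        <p>The sketch also makes the stated limitation visible: the prepared index has to be rebuilt whenever a consumer, an asset or a policy changes, which is why the approach fits settings with known consumers and a stable set of assets.</p>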
        <p>Search queries. In either approach, the data consumer can access the data catalog and then use
the standard metadata-based search to find relevant datasets and their providers. Additionally,
the consumer can also pose SPARQL queries based on the ontology, such as:</p>
        <preformat>PREFIX sc: &lt;http://www.dataspace-supplychain.com/&gt;
SELECT *
WHERE {
  ?product a sc:Product ;
           sc:name ?name ;
           sc:hasExportBan ?country
}</preformat>
        <p>Based on a query that encodes the information need, the data provider offers two services:
(a) Verify access: a request to verify whether the query is allowed, given the participant’s credentials,
and whether there are any matches on specific datasets. (b) Get answers: a request to construct
answers to the query, with the option of selecting the datasets of interest.</p>
        <p>Considering the above query and the previous mapping example that restricts access to
export bans, the answer to request (a) would be "no" if no special privileges are in place, and
"yes" otherwise. In the case of "yes", the user can use service (b) and gets the following answer:</p>
        <preformat>?product         ?name        ?country
sc:product/0142  "Germanium"  sc:country/China</preformat>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Related Work</title>
      <p>
        Auer et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] discuss the potential of using Semantic Web technologies for achieving semantic
interoperability in data spaces. Similarly, Theissen-Lipp et al. [16] point out the potential
use of ontologies to mediate access to data within the data space. However, neither work proposes a
concrete conceptual model, and the problem of mapping the data to an ontology in a data
space is not addressed.
      </p>
      <p>Boukhers et al. [14] propose the use of machine learning techniques for
automatic metadata extraction and ontology alignment as well as for mapping generation.
Such techniques are still applicable in our framework to ease the creation of the mappings and
to improve the searchability of datasets; however, our focus is on querying the datasets and on how
the OBDA paradigm can be used within a data space. Langer et al. [17] propose the
use of ontologies to mediate access to the datasets, however without considering
mappings and access restrictions.</p>
      <p>Regarding OBDA approaches that can be applied to data spaces, approaches that
support access control have been proposed [18, 19]; however, in these approaches access rights
and control are modeled in the ontology, or the access restrictions are placed on the properties
of the ontology. Cima et al. [20] introduced the notion of policy-protected OBDA (PPOBDA), where an
OBDA specification (consisting of the domain ontology, schema, and mapping) is extended by a set
of policy constraints. The authors describe a method to reduce PPOBDA specifications to
OBDA specifications that keep the same domain ontology and schema but incorporate the policy
constraints into the mapping. They also report experimental runtimes on a set of
SPARQL queries in this setting. Our approach is related to PPOBDA, and their solutions can still
be applied in our conceptual model tailored for data spaces.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion and Conclusions</title>
      <p>In this paper, we presented a conceptual model for adapting the ontology-based data access
(OBDA) paradigm to enhance searchability and semantic interoperability in data spaces.</p>
      <p>We described two motivating examples for which OBDA functionalities in the data space are
highly beneficial. In our novel conceptualization of a data space, we propose to publish the data
assets alongside their metadata and, additionally, the mappings to a domain-specific ontology that
enable searching the data in the entire data space. Due to the shared conceptualization encoded
in a common ontology, information that comes from multiple sources can be retrieved using the
same query. This mechanism is enabled by mapping each data asset to the data space ontology.
To restrict access to particular information, we proposed adding access restriction labels as
part of the data asset policy in the mapping declaration. We also described and exemplified the
search and data access mechanism in our data space framework. In the following,
we discuss some open points of our approach and potential challenges.</p>
      <p>A first observation is that our paper focuses on an architecture based on the IDS RAM and,
in particular, on the notion of connectors handling all operations on behalf of a participant.
Our concepts are agnostic to the exact specification of the connector, but would have to be
slightly adapted for a connector-less design, such as the blockchain-based system of
Pontus-X [21]. In such a case, the queries could be initiated directly by the participant, e.g. via a
central management platform, which would send the request together with the participant’s
credentials and trigger the query. Such systems normally also include a non-connector-based
policy enforcement engine, which could be extended to handle policy enforcement for OBDA
queries.</p>
      <p>A second observation concerns the ontology creation and maintenance procedure. The
design of the ontology has to be discussed among all relevant stakeholders. However, if a
governing entity exists, it can, in principle, take responsibility for designing and
maintaining the ontology.</p>
      <p>A third observation concerns feasibility in practice: checking access control for each
query can be costly for the query engine. However, because the credentials of each consumer
for each dataset are static and the mappings are not frequently updated, the mapping-based
access credentials can be computed in advance, stored efficiently, and used at query time
(see the two-step approach in subsection 3.3).</p>
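      <p>As a minimal sketch of this idea, the readable properties for each pair of consumer and dataset can be materialized once and then looked up at query time. The dataset name, consumer identifiers, label names, and function names below are hypothetical and serve only as an illustration under the assumption that credentials and mappings are static between preparation runs.</p>
      <preformat>
```python
# Hypothetical inputs: the label each mapped property requires per dataset,
# and the labels that each consumer's credentials grant per dataset.
MAPPING_LABELS = {
    "exports": {"sc:name": "public", "sc:hasExportBan": "restricted"},
}
CREDENTIALS = {
    "consumer-a": {"exports": {"public"}},
    "consumer-b": {"exports": {"public", "restricted"}},
}


def precompute_access(mapping_labels, credentials):
    """Preparation step: readable properties per (consumer, dataset) pair."""
    table = {}
    for consumer, grants in credentials.items():
        for dataset, labels in grants.items():
            props = mapping_labels.get(dataset, {})
            table[(consumer, dataset)] = frozenset(
                prop for prop, required in props.items() if required in labels
            )
    return table


ACCESS_TABLE = precompute_access(MAPPING_LABELS, CREDENTIALS)


def is_allowed(consumer, dataset, query_properties, table=ACCESS_TABLE):
    """Query time: a set-inclusion check against the precomputed table."""
    readable = table.get((consumer, dataset), frozenset())
    return set(query_properties).issubset(readable)
```
      </preformat>
      <p>The table has to be recomputed only when a mapping or a credential changes, which matches the assumption that both are updated infrequently.</p>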
      <p>Last but not least, ontology reasoning has to be taken into account when accessing the data
and computing the answers to queries. For instance, if a property has restricted access in a mapping
and a query uses one of its sub-properties, then the query should not have access to the data.
To address this challenge, query evaluation techniques such as the one proposed in [20] can be used.</p>
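      <p>The sub-property case can be illustrated with a simple closure check over the property hierarchy. This is only a sketch and not the controlled query evaluation technique of [20]; the properties <monospace>sc:hasFullExportBan</monospace> and <monospace>sc:hasTradeRestriction</monospace> are hypothetical, and each property is assumed to have at most one direct super-property.</p>
      <preformat>
```python
# Hypothetical sub-property axioms: child property -> parent property.
SUB_PROPERTY_OF = {
    "sc:hasFullExportBan": "sc:hasExportBan",
    "sc:hasExportBan": "sc:hasTradeRestriction",
}

# Properties whose access is restricted in a mapping.
RESTRICTED = {"sc:hasExportBan"}


def ancestors(prop, sub_property_of):
    """Yield prop itself and then each of its super-properties, bottom-up."""
    seen = set()
    while prop is not None and prop not in seen:
        seen.add(prop)  # guards against cycles in the hierarchy
        yield prop
        prop = sub_property_of.get(prop)


def is_restricted(prop, restricted=RESTRICTED, sub_property_of=SUB_PROPERTY_OF):
    """A property is restricted if it or any of its super-properties is."""
    return any(p in restricted for p in ancestors(prop, sub_property_of))
```
      </preformat>
      <p>With these assumed axioms, a query over <monospace>sc:hasFullExportBan</monospace> is denied because it inherits the restriction on <monospace>sc:hasExportBan</monospace>, whereas the more general <monospace>sc:hasTradeRestriction</monospace> is not affected by this check.</p>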
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was funded by the Austrian security research program KIRAS of the Federal Ministry
of Finance (BMF) through the DAGMAR project (grant No. 52224305), by the Austrian Research
Promotion Agency (FFG) under grant No. FO999913202 (UNDERPIN), as well as by the European
Commission under contract No. 101123179 (UNDERPIN).</p>
      <p>[7] C. Lange, J. Langkau, S. Bader, The IDS information model: a semantic vocabulary for
sovereign data exchange, Designing Data Spaces (2022) 111.</p>
      <p>[8] A. Miles, S. Bechhofer, SKOS Simple Knowledge Organization System Reference, Working
Draft, W3C, 2008. URL: http://www.w3.org/TR/skos-reference.</p>
      <p>[9] R. Verborgh, M. De Wilde, Using OpenRefine, Packt Publishing Ltd, 2013.</p>
      <p>[10] M. Arenas, A. Bertails, E. Prud’hommeaux, J. Sequeda, et al., A direct mapping of relational
data to RDF, W3C recommendation 27 (2012) 1–11.</p>
      <p>[11] S. Das, R2RML: RDB to RDF mapping language, http://www.w3.org/TR/r2rml/ (2011).</p>
      <p>[12] R. Albertoni, D. Browning, S. Cox, A. N. Gonzalez-Beltran, A. Perego, P. Winstanley, The
W3C data catalog vocabulary, version 2: Rationale, design principles, and uptake, 2023.
arXiv:2303.08883.</p>
      <p>[13] A. Ahmeti, J.-K. Schakel, R. David, A. Revenko, Towards preserving biodiversity using
nature first knowledge graph with crossovers (2023).</p>
      <p>[14] Z. Boukhers, C. Lange, O. Beyan, Enhancing data space semantic interoperability through
machine learning: a visionary perspective, in: Companion Proceedings of the ACM
Web Conference 2023, WWW ’23 Companion, Association for Computing Machinery,
New York, NY, USA, 2023, pp. 1462–1467. URL: https://doi.org/10.1145/3543873.3587658.
doi:10.1145/3543873.3587658.</p>
      <p>[15] A. Paulus, A. Pomp, T. Meisen, The plasma framework: Laying the path to
domain-specific semantics in dataspaces, in: Companion Proceedings of the ACM Web Conference
2023, WWW ’23 Companion, Association for Computing Machinery, New York, NY, USA,
2023, pp. 1474–1479. URL: https://doi.org/10.1145/3543873.3587662. doi:10.1145/3543873.3587662.</p>
      <p>[16] J. Theissen-Lipp, M. Kocher, C. Lange, S. Decker, A. Paulus, A. Pomp, E. Curry, Semantics
in dataspaces: Origin and future directions, in: Companion Proceedings of the ACM
Web Conference 2023, WWW ’23 Companion, Association for Computing Machinery,
New York, NY, USA, 2023, pp. 1504–1507. URL: https://doi.org/10.1145/3543873.3587689.
doi:10.1145/3543873.3587689.</p>
      <p>[17] T. Langer, A. Pomp, T. Meisen, Towards a data space for interoperability of analytic
provenance, in: Companion Proceedings of the ACM Web Conference 2023, WWW ’23
Companion, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1502–1503.
URL: https://doi.org/10.1145/3543873.3587686. doi:10.1145/3543873.3587686.</p>
      <p>[18] C. Choi, J. Choi, P. Kim, Ontology-based access control model for security policy reasoning
in cloud computing, J. Supercomput. 67 (2014) 711–722.</p>
      <p>[19] C. Brewster, B. Nouwt, S. Raaijmakers, J. Verhoosel, Ontology-based access control for
FAIR data, Data Intell. 2 (2020) 66–77.</p>
      <p>[20] G. Cima, D. Lembo, L. Marconi, R. Rosati, D. F. Savo, Controlled query evaluation in
ontology-based data access, in: ISWC (1), volume 12506 of Lecture Notes in Computer
Science, Springer, 2020, pp. 128–146.</p>
      <p>[21] deltaDAO AG, Pontus-X Documentation, 2024. URL: https://docs.pontus-x.eu/.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>B.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hompel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wrobel</surname>
          </string-name>
          , Designing Data Spaces: The Ecosystem Approach to Competitive Advantage, Springer International Publishing,
          <year>2022</year>
          . URL: https://books.google.at/books?id=gfbWzgEACAAJ.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Poggi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. D.</given-names>
            <surname>Giacomo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <article-title>Linking data to ontologies</article-title>
          ,
          <source>J. Data Semant</source>
          .
          <volume>10</volume>
          (
          <year>2008</year>
          )
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:1325494.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          , S. Harris, A. Seaborne,
          <article-title>SPARQL 1.1 Query Language</article-title>
          ,
          <source>Technical Report, W3C</source>
          ,
          <year>2013</year>
          . URL: http://www.w3.org/TR/sparql11-query.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          , G. Appleton,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B. da Silva</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR guiding principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          )
          <fpage>1</fpage>
          -
          <lpage>9</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
          <article-title>Semantic integration and interoperability</article-title>
          ,
          <source>in: Designing Data Spaces</source>
          , Springer,
          <year>2022</year>
          , pp.
          <fpage>195</fpage>
          -
          <lpage>210</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Iannella</surname>
          </string-name>
          ,
          <article-title>Open digital rights language (ODRL)</article-title>
          ,
          <source>Open Content Licensing: Cultivating the Creative Commons</source>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>