<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Enhanced and Scalable RDF Validation Techniques for Dataspaces</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paul Moosmann</string-name>
          <email>paul.moosmann@fit.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Johannes Theissen-Lipp</string-name>
          <email>theissen-lipp@dbis.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christoph Lange</string-name>
          <email>christoph.lange-bever@fit.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Applied Information Technology FIT</institution>
          ,
          <addr-line>Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RWTH Aachen University</institution>
          ,
          <addr-line>Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>000</volume>
      <fpage>0</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>With the exponential growth of data on the Internet, the need to control data, especially after it has been shared, has become increasingly important. Dataspaces promise data sovereignty as a solution to this challenge, providing a decentralized environment where participants can share data while retaining control over its use. In such environments, where descriptive vocabularies and resource descriptions are decentralized and exchanged among participants, ensuring accurate and correct information becomes essential for integrated data use in mature software solutions. Validation principles play a crucial role in ensuring that the diverse information exchanged conforms to specified standards.</p>
      </abstract>
      <kwd-group>
        <kwd>Dataspaces</kwd>
        <kwd>Scalable Data Validation</kwd>
        <kwd>SHACL</kwd>
        <kwd>RDF Inference</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction and Motivation</title>
      <p>
        In recent years, the topic of dataspaces has become increasingly relevant, leading to multiple large-scale
cross-domain dataspace initiatives being launched in Europe [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The most prominent examples include
the International Data Spaces [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] and Gaia-X [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ]. Concrete, domain-specific implementations
of these initiatives include examples such as the Mobility Data Space [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for the International Data
Spaces or the automotive network Catena-X [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] for Gaia-X. Dataspaces offer approaches to exchange
heterogeneous data from heterogeneous sources by supporting multiple data models and providing
mechanisms to query and analyze the data, thereby discovering relationships amongst the data. However,
an integrated use of data (e.g., in mature software solutions) demands precise and correct data. This
requires validation principles to ensure that the heterogeneous information exchanged in dataspaces
conforms to certain specifications. To achieve this, dataspaces use Semantic Web technologies such
as SHACL [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The correct implementation of so-called validation shapes, which define the expected
form of data, can be a non-trivial task for dataspace users who are not Semantic Web experts. This is
especially true when validating against more complex constraints that are not natively supported by
validation languages, and require the use of more advanced features, such as SHACL-SPARQL [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Systems (SEMANTICS), Amsterdam, Sept. 17-19, 2024. ∗Corresponding author.</p>
      <p>In this paper, we address the requirements for an easy-to-use validation process and develop validation
approaches to facilitate common information modeling in the decentralized environment of dataspaces.
In this environment, two different kinds of data can be validated: (1) content data,
meaning the actual payload data that is being exchanged, and (2) metadata, which provides
additional information about the data itself, e.g., who provided the data or what standards the data
conforms to. The validation approach we suggest in this paper can be applied to both payload data and
metadata, though we focused on metadata in our evaluation. We propose sound but convenient methods
to facilitate validation in dataspaces. Sound in this context refers to precise and scalable solutions,
including state-of-the-art solutions using advanced techniques such as knowledge inference to ensure
the best possible results, while convenient refers to ease of use and inclusion of a wide range of users
of a dataspace, including non-experts. We achieve this by simplifying the use of SHACL-SPARQL
constraint shapes and by suggesting an inference step to pre-process the shapes. This paper thus
contributes:
• Shortcuts for Commonly Used Constraints: We provide shortcuts in the form of
template-based properties for commonly used constraints that are not yet natively supported by validation
languages, simplifying and unifying their application for users.
• A Scaling Inference Solution for Validation: Our modification of the current common practice
of RDF validation with SHACL applies inference only once on the shapes graph instead of on each
instance.</p>
      <p>The remainder of this paper first reviews the state of the art in validation languages and techniques,
and then presents our approach, results, and evaluation addressing the stated requirements.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <p>The state of the art in the area of our work includes validation languages (cf. section 2.1) that allow
users such as dataspace participants to design and evaluate constraints to ensure the conformance of
resources in dataspaces. These resources, ranging from data to services to participant descriptions, can
be validated against arbitrary specifications, including local agreements or global standards. The state
of the art in validation solutions and techniques (cf. section 2.2) includes software applications as well
as common practices for their application.</p>
      <sec id="sec-2-1">
        <title>2.1. Validation Languages</title>
        <p>
          The validation of RDF [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] data is an essential step of the linked data lifecycle to ensure high data quality.
Several languages are available, each aiming to support a broad range of validation tasks. To provide an
overview of the most relevant languages currently available for RDF validation, Dominik Tomaszuk
conducted a survey “perform[ing] an overview and comparison of current options for RDF validation”
in 2017 [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. In this survey, he identified five candidates, ShEx [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], SHACL [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], ReSH [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ],
DSP [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], and SPIN [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], as the most important validation languages. He compared these languages, as
well as the built-in validation capabilities of OWL [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], regarding how many of 17 commonly
used constraints each can express. Since SHACL was strongly influenced by SPIN and can be regarded as its legitimate
successor [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], we will not evaluate both of these languages, but solely SHACL. Table 1 presents the
results of our evaluation.
        </p>
        <p>
          The metrics for evaluating the available RDF validation languages for our work are as follows.
Expressiveness is ranked by how many of the 17 commonly used constraints [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] can be expressed,
resulting in a score of “not very expressive” (–), “expressive” (+), or “very expressive” (++). In addition,
we compare the validation languages in terms of their standardization, use in existing dataspace
initiatives, and existing inference support. These are simple yes/no measures.
        </p>
        <p>
          SHACL is a constraint language for RDF. It can express 16 out of 17 commonly used constraints. It is a
W3C recommendation and is applied in dataspace initiatives such as Gaia-X and the International Data
Spaces. Existing SHACL validators offer support for RDFS inference. ShEx is a language for describing
RDF graph structures that can prescribe conditions that RDF data graphs must meet to be considered
conformant. It can express 16 out of 17 commonly used constraints. Any inference must be done on the
RDF graph separately; the ShEx processor itself does not interact with any inference mechanism [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ].
ReSH is a high-level RDF vocabulary for specifying the shape of RDF resources. It is only able to express
9 out of 17 commonly used constraints. There is no additional inference support, besides the general
possibility to perform inference on the RDF graph directly. DSP is used to define structural constraints
on data. It is only able to express 10 out of 17 commonly used constraints. There is no additional
inference support, besides the general possibility to perform inference on the RDF graph directly. OWL
is a language to describe RDF graph structures used in dataspace initiatives such as Gaia-X and the
International Data Spaces. It is part of the W3C’s Semantic Web technology stack. While it has built-in
validation capabilities, it can only express 13 out of 17 commonly used constraints. It is possible to
apply inference mechanisms on graphs described using OWL.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Solutions and Techniques for Validation</title>
        <p>
          Since we have identified SHACL as the most appropriate validation language for our requirements, we
will focus in this section on solutions that implement SHACL. Apache Jena SHACL [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] implements
both SHACL Core and SHACL SPARQL constraints and provides a reader and writer for a compact
SHACL syntax. The TopBraid SHACL API [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] is an open-source implementation of SHACL based
on Apache Jena, which performs SHACL constraint checking and rule inferencing. There also exists
a data governance tool using this API, the TopBraid EDG [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. A GitHub issue opened in February
2021 [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ] shows that inference support for RDF validation still leaves room for improvement. The issue
raises the question of subclass inference by SHACL. The answers suggest that today the only way to
perform this kind of inference is to import the whole ontology (or ontologies) into the data graph, which
is not always an efficient solution. Additionally, there is no easily accessible
documentation that shows what kind of inference is performed.
        </p>
        <p>
          There are web-based validators available for simple validation tasks. For example, the SHACL Playground [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ]
is implemented by TopQuadrant, who are also responsible for the TopBraid EDG mentioned above.
There exists an updated implementation of the SHACL Playground that can be found under the name
of Zazuko SHACL Playground [
          <xref ref-type="bibr" rid="ref23">23</xref>
          ]. Another web-based validator is provided as part of the European
Commission’s DG DIGIT Interoperability Test Bed [
          <xref ref-type="bibr" rid="ref24">24</xref>
          ]. The interface offers a simple way to
upload a shapes graph and a data graph to conduct the validation. Apart from the data validator, they also offer
a shape validator [
          <xref ref-type="bibr" rid="ref25">25</xref>
          ], which checks whether the provided SHACL shape conforms to certain rules, namely the
core W3C syntax rules, the extended W3C syntax rules, and the extended W3C syntax rules and best
practices. These web-based validators have in common that they were developed for simple validation
tasks and do not support more advanced tasks. For example, it is not possible to import ontologies
into the data graph for inference, and the SHACL Playground does not support advanced features such as
SHACL-SPARQL.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Approach</title>
      <p>Our approach to enhancing validation techniques for dataspaces consists of two steps: the introduction
of shortcut templates (cf. section 3.1) and the optimization by inference (cf. section 3.2). We
also combine these two steps into one solution (cf. section 3.3).</p>
      <sec id="sec-3-1">
        <title>3.1. Introducing Shortcut Templates for Constraints</title>
        <p>
          We begin by addressing the challenge of dealing with commonly used constraints that are not natively
supported by validation languages (see Figure 1). The current state-of-the-art approach for SHACL
is to manually formulate SHACL-SPARQL constraints, which can be complex and ambiguous (since
one constraint can be modelled using varying SHACL-SPARQL constructs). To simplify and unify
this process, we propose the introduction of shortcuts in the form of new template-based properties.
Our proposed methodology is shown in Figure 2. The approach involves (1) analyzing the feasibility
of common yet unsupported constraints, which were presented by Hartmann in [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ], (2) designing
templates for them, (3) aggregating them for easy use, and (4) implementing a demonstrator for them.
        </p>
        <p>[Figure panels: (a) Common approach (baseline); (b) Our approach]</p>
        <p>These shortcut templates serve as intuitive bridges between unsupported constraints and native
SHACL validators, significantly reducing the cognitive overhead for users. By encapsulating complex
constraints into reusable templates, we enable users to express validation rules more concisely and
effectively. In addition, the adoption of standardized templates improves interoperability and promotes
best practices across different validation scenarios.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Validation Enhancement through Inference Optimization</title>
        <p>In the second part of our approach, we address the inefficiencies inherent in current RDF validation
processes (see Figure 3). Applying inference as part of the RDF validation with SHACL usually involves
appending the entire ontology file needed for inference to each data instance. This significantly increases
the size of each instance to be validated. Our proposed solution modifies this practice by applying the
inference only once to the shapes graph. This optimization results in a shapes graph of slightly higher
complexity but eliminates the need for expensive inference on each data graph during validation. With
this approach, we aim for a more scalable solution for recurrent validation against the same shapes
graph.</p>
        <p>[Figure 3: (a) Validation without prior inference (baseline): the data graph (instance) with appended ontology and the shapes graph are passed to the SHACL validator. (b) Validation with prior inference (our approach): the data graph (instance) and the extended shapes graph, derived from the ontology and the shapes graph, are passed to the SHACL validator.]</p>
        <p>Our approach not only reduces the computational overhead but also increases the agility of the
validation process. By decoupling the inference step from individual data instances, we aim to create a more
streamlined and resource-efficient workflow. This optimization could prove particularly advantageous
for large-scale dataspaces, since they require frequently performed validation tasks.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Combining the Solutions</title>
        <p>Combining these two solutions into one is the final step of our approach (see Figure 4). The input data is
pre-processed by our proposed scaling inference solution, which optimizes the shapes graph for
subsequent validation. Subsequently, our constraint template mapping solution outputs native SHACL shapes
graphs using SHACL-SPARQL. This ensures compatibility with any native SHACL validator supporting
SHACL-SPARQL, thereby facilitating seamless validation of data against predefined constraints.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Implementation</title>
      <p>
        In this section, we detail the implementation of our previously described approach. Our work enhances
the validation process of existing validation tools, such as the command-line tool pySHACL [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]
by implementing easy-to-use commands for complex constraints and performing an inference step
before validation. Since this is a two-step approach, we present our results for each step separately.
      </p>
      <p>[Figure 4: Diagram elements: Ontology Graph; Shapes Graph; Inference Tool; Shapes Graph including inferred information; Constraint Mapping Tool; Output Shapes Graph including inferred information and SHACL-SPARQL constructs.]</p>
      <sec id="sec-4-9">
        <sec id="sec-4-9-1">
          <title>4.1. Resulting Shortcut Templates for Constraints</title>
          <p>
            The results of this work include 16 constraint templates implementing commonly used constraints [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]
using SHACL-SPARQL. This way, we can utilize the built-in SPARQL support of the SHACL validation
language. Using these templates, a potential user wanting to validate a data instance does not need to
implement a complex SHACL-SPARQL solution. Instead, the user only needs to use a simple command
that is replaced by the corresponding SHACL-SPARQL template during the mapping process. Listing 1
shows one of the created constraint templates. It implements a constraint on language tags for a data
property.
          </p>
          <preformat>1  :CountryShape a sh:Shape ;
2      sh:scopeClass :Country ;
3      sh:constraint [
4          sh:message "Values of 'germanLabel' must have a German language tag." ;
5          sh:sparql """
6              SELECT $this ($this AS ?subject) (:germanLabel AS ?predicate)
7                     (?value AS ?object)
8              WHERE { $this :germanLabel ?value .
9                      FILTER(!isLiteral(?value) ||
10                            !langMatches(lang(?value), "de")) }""" ; ] .
11
12 :ValidCountry a :Country ; :germanLabel "Deutschland"@de .
13 :InvalidCountry a :Country ; :germanLabel "Germany"@en .</preformat>
          <p>Listing 1: Example SHACL-SPARQL template to validate that a certain language tag (German) is
present for a data property. A valid example instance (line 12) has a German language
tag (@de) and an invalid instance (line 13) has only an English language tag (@en).</p>
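          <p>As a rough illustration of the check that the SPARQL filter in Listing 1 is intended to perform, the following Python sketch reports a value as a violation when it is not a literal or when its language tag does not match "de". The Literal class and the lang_matches helper are hypothetical stand-ins written for this example; lang_matches only approximates SPARQL's langMatches (basic, case-insensitive range matching), and no RDF library is involved.</p>
          <preformat><![CDATA[
```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for an RDF literal with an optional language tag."""
    text: str
    lang: Optional[str] = None


def lang_matches(tag: Optional[str], lang_range: str) -> bool:
    """Approximate SPARQL langMatches: basic, case-insensitive range matching."""
    if not tag:
        return False
    tag, lang_range = tag.lower(), lang_range.lower()
    if lang_range == "*":
        return True
    return tag == lang_range or tag.startswith(lang_range + "-")


def violates_german_label(value: Union[Literal, str]) -> bool:
    """Return True if the value is not a literal or lacks a German language tag.

    Plain strings represent IRIs (non-literals) in this sketch.
    """
    return not isinstance(value, Literal) or not lang_matches(value.lang, "de")


if __name__ == "__main__":
    # The two instances from Listing 1 (lines 12 and 13).
    print(violates_german_label(Literal("Deutschland", "de")))
    print(violates_german_label(Literal("Germany", "en")))
```
]]></preformat>
          <p>In this sketch, only the second instance is reported, which corresponds to the valid and invalid instances described for Listing 1.</p>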
          <p>Figure 5a shows an example of a possible input graph for the constraint mapping tool. This SHACL
shapes graph defines a constraint on the cardinality of English language tags on the rdfs:label
property by using one of the custom commands defined by this work. The constraint mapping tool then
maps this command onto the corresponding SHACL-SPARQL construct as defined in the constraint
template designed by us. Figure 5b shows the result of this mapping process. To achieve the mapping
from Figure 5a to Figure 5b, a single command with two input parameters is sufficient.
The first is the path to the SHACL shapes graph containing our custom sst commands. The second is
the output location for the created SHACL shapes graph containing the SHACL-SPARQL templates. A
possible command line execution is:</p>
          <p>python Mapping.py pathTo/input.shacl.ttl pathTo/output.shacl.ttl</p>
          <p>This mapping is implemented using a graph-based approach. The tool searches the graph for triples that use
one of our custom commands as the edge between two nodes. Each such triple is replaced
by a new triple with sh:sparql as predicate and the corresponding SHACL-SPARQL
template as object. The input file is loaded as a graph data structure using the Python package
RDFlib.</p>
          <preformat>@prefix gax: &lt;https://registry.lab.gaia-x.eu/development/api/trusted-shape-registry/v1/shapes/jsonld/trustframework#&gt; .
@prefix sh: &lt;http://www.w3.org/ns/shacl#&gt; .

sh:ProviderShape a sh:NodeShape ;
    sh:sparql [ sh:message "Values of the constraint 'language tag cardinality min' [...]" ;
        sh:select """
            SELECT $this
            WHERE {
                SELECT (COUNT(?value) as ?count)
                WHERE {
                    { $this rdfs:label ?value } UNION { $this rdfs:comment ?value }
                    FILTER(isLiteral(?value) &amp;&amp; langMatches(lang(?value), 'en'))
                }
                GROUP BY $this
            }
            HAVING (SUM(?count) &lt; 1)
            """ ] ;
    sh:targetClass gax:Provider .</preformat>
          <p>(b) Output</p>
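          <p>The replacement step described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the actual Mapping.py: triples are plain string tuples instead of an RDFlib graph, the shortcut predicate sst:minLanguageTagCount and its query text are hypothetical, and the query string is attached directly as the object of sh:sparql, whereas a real shapes graph would use a constraint node carrying sh:select.</p>
          <preformat><![CDATA[
```python
from string import Template
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]

SH_SPARQL = "sh:sparql"

# Hypothetical template table: shortcut predicate -> query text with a $lang slot.
# "$$this" is an escaped dollar sign and becomes the SHACL variable "$this".
TEMPLATES: Dict[str, Template] = {
    "sst:minLanguageTagCount": Template(
        "SELECT $$this WHERE { FILTER NOT EXISTS { "
        "$$this rdfs:label ?v . FILTER(langMatches(lang(?v), '$lang')) } }"
    ),
}


def map_shortcuts(graph: Set[Triple]) -> Set[Triple]:
    """Replace each triple whose predicate is a known shortcut by an sh:sparql triple."""
    result: Set[Triple] = set()
    for subject, predicate, obj in graph:
        template = TEMPLATES.get(predicate)
        if template is None:
            result.add((subject, predicate, obj))  # unrelated triples are kept as they are
        else:
            result.add((subject, SH_SPARQL, template.substitute(lang=obj)))
    return result


if __name__ == "__main__":
    shapes = {
        ("ex:ProviderShape", "rdf:type", "sh:NodeShape"),
        ("ex:ProviderShape", "sh:targetClass", "gax:Provider"),
        ("ex:ProviderShape", "sst:minLanguageTagCount", "en"),
    }
    for triple in sorted(map_shortcuts(shapes)):
        print(triple)
```
]]></preformat>
          <p>Keeping the templates in a single lookup table means that supporting a further constraint only requires a new table entry, which mirrors the idea of aggregating templates for easy use.</p>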
        </sec>
        <sec id="sec-4-9-2">
          <title>4.2. Resulting Inference Optimization</title>
          <p>Our solution for performing the inference as a preprocessing step is implemented as a tool similar
to the mapping tool above. Again, we use a graph-based approach consisting of two tasks. The first
task is searching the ontology graph for all the sub-class and sub-property relations. The second task
is searching the shapes graph for occurrences of parent classes and properties and adding further
connections to cover all related sub-classes and sub-properties. For the second task, the tool identifies
specific connection types in the shapes graph. Figure 6 shows a minimal example of what the input
and output graphs of the inference tool look like when performing a simple sub-class inference of the
sh:targetClass. Analogous to the realization of the constraint templates, our transitive inference tool
is realized by a command-line tool implemented in Python 3. More specifically, we use the Python library
RDFlib, which enables an implementation of our graph-based approach. To perform the inference step,
a single command with three input parameters is sufficient. The first is the path to the
ontology graph containing additional information, the second is the path to the SHACL shapes graph,
and the third is the output location for the extended SHACL shapes graph. A possible command line
execution is:</p>
          <p>python mapping.py ontology.ttl shape.ttl newShape.ttl</p>
          <preformat>@prefix ex: &lt;http://example.org/&gt; .
@prefix sh: &lt;http://www.w3.org/ns/shacl#&gt; .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person ;
    sh:property [
        sh:path ex:hasName ;
        sh:minCount 1 ;
    ] .</preformat>
          <preformat>@prefix ex: &lt;http://example.org/&gt; .
@prefix sh: &lt;http://www.w3.org/ns/shacl#&gt; .

ex:PersonShape
    a sh:NodeShape ;
    sh:targetClass ex:Person, ex:Student ;
    sh:property [
        sh:path ex:hasName ;
        sh:minCount 1 ;
    ] .</preformat>
          <p>(a) Input
(b) Output</p>
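          <p>The two tasks of the inference tool can be illustrated with a small, dependency-free Python sketch. It is an assumption-laden simplification rather than the tool itself: triples are plain string tuples instead of an RDFlib graph, only rdfs:subClassOf and sh:targetClass are handled (sub-properties would follow the same pattern), and the function names are hypothetical.</p>
          <preformat><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]


def transitive_subclasses(ontology: Set[Triple]) -> Dict[str, Set[str]]:
    """Task 1: map each class to all of its direct and indirect subclasses."""
    direct: Dict[str, Set[str]] = defaultdict(set)
    for child, predicate, parent in ontology:
        if predicate == "rdfs:subClassOf":
            direct[parent].add(child)

    closure: Dict[str, Set[str]] = {}
    for parent in list(direct):
        seen: Set[str] = set()
        stack = list(direct[parent])
        while stack:  # depth-first walk; "seen" also guards against cycles
            cls = stack.pop()
            if cls not in seen:
                seen.add(cls)
                stack.extend(direct.get(cls, ()))
        closure[parent] = seen
    return closure


def extend_shapes(shapes: Set[Triple], ontology: Set[Triple]) -> Set[Triple]:
    """Task 2: add an sh:targetClass triple for every subclass of a target class."""
    closure = transitive_subclasses(ontology)
    extended = set(shapes)
    for shape, predicate, target in shapes:
        if predicate == "sh:targetClass":
            for subclass in closure.get(target, ()):
                extended.add((shape, "sh:targetClass", subclass))
    return extended


if __name__ == "__main__":
    ontology = {("ex:Student", "rdfs:subClassOf", "ex:Person")}
    shapes = {("ex:PersonShape", "sh:targetClass", "ex:Person")}
    for triple in sorted(extend_shapes(shapes, ontology)):
        print(triple)
```
]]></preformat>
          <p>Because the extension is computed once on the shapes graph, the extended graph can be reused for every subsequent data graph without appending the ontology to each instance.</p>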
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <p>To assess the effectiveness and scalability of our proposed enhanced validation techniques, we conducted
an evaluation using example instances from the Gaia-X framework for dataspaces. We chose these
instances to show the feasibility of our approach in real-world application scenarios. These instances, as
shown in Figure 7, represent typical data providers within the Gaia-X ecosystem, showing properties
with data attributes and links to related objects. Understanding the details of these instances is relevant
for evaluating the performance of our validation techniques. Note that the names and data of these
providers have been anonymized for this paper.</p>
      <sec id="sec-5-1">
        <title>5.1. Evaluation of the Impact of Shortcut Templates for Constraints</title>
        <p>To evaluate the impact of the constraint template enhancement on the time required for the validation
process and the error proneness of this process, we conducted a hands-on experiment that simulated a
real-world work situation. For this purpose, we asked a Semantic Web expert to implement a defined
set of constraints in SHACL once without the constraint mapping tool and once with the constraint
mapping tool. The developer’s skills included working with SHACL on a weekly basis and having basic
knowledge of SPARQL.</p>
        <p>We structured the experiment in the following way. The starting point was a data graph describing the four
different provider instances shown in Figure 7. The data graph was supposed to satisfy the following
four constraints: (C1) each provider trusts itself, (C2) the legally binding name of a provider must
be unique, (C3) the property ex:hasProjectPartner is symmetric, and (C4) at least one label with
an English language tag must be defined. The example data graph intentionally contained one error
for each of these constraints, meaning a validation with a correct shapes graph should lead to four
validation errors. Given the data graph and the constraint requirements, the developer was tasked
with creating a SHACL shapes graph without using the constraint mapping tool. After 30 minutes,
a validation of the data graph was conducted. For the second part of the experiment, the developer
used the constraint mapping tool in the form of a Python command line tool. In addition, we provided
written documentation on how to use the tool. Again, the allotted time was 30 minutes. The results of
the experiment are summarized in Table 2 and show that our approach led to improvements in both the
time needed to build the shapes graph and the correctness of the validation process.</p>
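        <p>To make the intent of the constraints concrete, the following Python sketch checks C1 to C3 directly on a set of triples, independent of SHACL. It only illustrates what a correct shapes graph would be expected to report and is not part of the experiment: the properties ex:trusts and ex:legalName, the provider names, and the function names are hypothetical, while ex:hasProjectPartner is taken from C3. C4 is omitted because it would require modeling language-tagged literals.</p>
        <preformat><![CDATA[
```python
from collections import Counter
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


def missing_self_trust(data: Set[Triple], providers: Set[str]) -> List[str]:
    """C1: providers lacking a (hypothetical) ex:trusts edge to themselves."""
    return sorted(p for p in providers if (p, "ex:trusts", p) not in data)


def duplicate_legal_names(data: Set[Triple]) -> List[str]:
    """C2: (hypothetical) ex:legalName values shared by more than one provider."""
    counts = Counter(o for _, p, o in data if p == "ex:legalName")
    return sorted(name for name, n in counts.items() if n > 1)


def asymmetric_partnerships(data: Set[Triple]) -> List[Tuple[str, str]]:
    """C3: ex:hasProjectPartner edges whose inverse edge is missing."""
    prop = "ex:hasProjectPartner"
    return sorted((s, o) for s, p, o in data if p == prop and (o, prop, s) not in data)


if __name__ == "__main__":
    # Invented toy data with one error per constraint.
    data = {
        ("ex:A", "ex:trusts", "ex:A"),
        ("ex:A", "ex:legalName", "Acme"),
        ("ex:B", "ex:legalName", "Acme"),
        ("ex:A", "ex:hasProjectPartner", "ex:B"),
    }
    print(missing_self_trust(data, {"ex:A", "ex:B"}))
    print(duplicate_legal_names(data))
    print(asymmetric_partnerships(data))
```
]]></preformat>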
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Evaluation of the Inference Optimization</title>
        <p>In the second part of our evaluation, the inference optimization, we focused on measuring the execution
times of our proposed validation techniques under different scenarios. Table 3 summarizes the results of
several runs of our experiment, comparing the baseline common approach with our proposed method.
Notably, our approach achieves a significant reduction in execution times for individual data graphs,
demonstrating its efficiency and scalability. However, the execution times for a merged data graph
remain relatively unchanged between the baseline and our approach. Figure 8 visualizes the average
execution times of all runs for a comprehensive analysis. Table 3 shows a graphic representation of
the average values of our different experiment executions. These results validate the practicality and
scalability of our proposed validation approaches for application in real-world scenarios.</p>
        <table-wrap>
          <label>Table 2</label>
          <table>
            <thead>
              <tr><th/><th>Without the constraint mapping tool</th><th>With the constraint mapping tool</th></tr>
            </thead>
            <tbody>
              <tr><td>Implementation of C1</td><td>incorrect implementation</td><td>successfully implemented</td></tr>
              <tr><td>Implementation of C2</td><td>not implemented</td><td>successfully implemented</td></tr>
              <tr><td>Implementation of C3</td><td>not implemented</td><td>successfully implemented</td></tr>
              <tr><td>Implementation of C4</td><td>incorrect implementation</td><td>successfully implemented</td></tr>
              <tr><td>Time needed</td><td>Experiment stopped after 30 minutes</td><td>Done after 15 minutes</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>In conclusion, we successfully addressed the rising data validation requirements by proposing a scaling
inference solution and a constraint template mapping solution. This facilitates common information
design in decentralized environments such as dataspaces by making validation techniques more scalable
and usable. An evaluation in a real-world scenario with anonymized participants from the Gaia-X
dataspaces demonstrated feasibility, scalability, and practical impact. The contributions of this paper
include:
• A Scaling Solution for Validation: Our modification of the current common practice of RDF
validation with SHACL applies inference only once on the shapes graph instead of on each
instance. The shapes graph becomes slightly more complex, but expensive inference on each data
graph is avoided. This contributes to the requirements by making recurrent validation against
the same shapes graph more scalable.</p>
      <p>[Figure 8: (a) Validating individual data graphs; (b) Validating merged data graph]</p>
      <p>• Shortcuts for Commonly Used Constraints: We provided shortcuts in the form of
template-based properties for commonly used constraints that are not yet natively supported by validation
languages, simplifying and unifying their application for users. We have introduced
SHACL-SPARQL templates, which are used by our proposed mapping tool to enable native support by
any SHACL validator. This contributes to the requirements by allowing a wide range of dataspace
users, including non-experts, to formulate even complex constraints for validation.
• Combined Approach: By combining inference optimization with the introduction of constraint
templates, our two-step approach provides a holistic solution for improving dataspace validation
techniques. We not only address efficiency concerns but also prioritize usability and
interoperability, thereby advancing the state of the art in RDF validation methodologies. This approach lays
the foundation for scalable and robust validation processes in the context of evolving dataspaces
and Semantic Web applications.</p>
      <p>These contributions provide reliable guarantees for the integrated use of data or services in applications
on the domain layer of dataspaces, such as APIs. The proposed scalable yet convenient solutions
for validation allow the diverse participants of dataspaces, including non-experts, to
use shortcuts for expressing common validation constraints and to validate data against them much faster with
standard validators. The enhanced validation techniques improve interoperability by providing
guarantees for all kinds of interfaces between participants or services in dataspaces, which contributes
to a better common understanding in dataspaces.</p>
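      <p>How a shortcut might be mapped to a SHACL-SPARQL constraint can be sketched as follows. The shortcut name ex:uniqueValue, the query, and the helper expand_shortcut are hypothetical and are not the vocabulary or templates defined in this work; prefix declarations via sh:prefixes are omitted for brevity.</p>
      <preformat>
```python
from string import Template

# Hypothetical shortcut and expansion, for illustration only.
# "$$" is the string.Template escape for a literal "$", so "$$this"
# becomes the SHACL-SPARQL variable "$this" in the output.
SPARQL_TEMPLATES = {
    "ex:uniqueValue": Template('''sh:sparql [
    sh:message "$message" ;
    sh:select """
        SELECT $$this ?value
        WHERE {
            $$this $path ?value .
            ?other $path ?value .
            FILTER (?other != $$this)
        }
        """ ;
] ;'''),
}


def expand_shortcut(shortcut, path, message):
    """Expand a shortcut into a Turtle snippet with a SHACL-SPARQL constraint."""
    return SPARQL_TEMPLATES[shortcut].substitute(path=path, message=message)


snippet = expand_shortcut("ex:uniqueValue", "ex:serialNumber",
                          "Value must be unique across all focus nodes")
print(snippet)
```
      </preformat>
      <p>Because the expansion yields a plain sh:sparql constraint, a validator that implements SHACL-SPARQL would not need to know about the shortcut itself.</p>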
      <p>Future work can expand the pool of constraint templates with additional
templates for property-based constraints. Regarding the inference approach introduced
in this work, our tool currently applies transitive inference only for the properties
rdfs:subClassOf and rdfs:subPropertyOf. Future work can extend the set of properties considered
by the inference process to further improve validation results. It can also include developing a user
interface for our command-line tools to make them even more accessible.</p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgments</title>
      <p>Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany’s
Excellence Strategy – EXC-2023 Internet of Production – 390621612. Funded by the FAIR Data Spaces
project of the German Federal Ministry of Education and Research (BMBF) under the grant number
FAIRDS05. We thank Jens Lehmann for supervising the master's thesis underlying this research.</p>
    </sec>
    <sec id="sec-8">
      <title>A. Data availability statement</title>
      <p>The tools developed as part of this work can be accessed via
https://github.com/moosmannp/Enhancing-SHACL-Validation-Through-Constraint-Templates-and-Inference.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Theissen-Lipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Paulus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pomp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Curry</surname>
          </string-name>
          ,
          <article-title>Semantics in dataspaces: Origin and future directions</article-title>
          ,
          <source>in: Companion Proceedings of the ACM Web Conference 2023</source>
          , WWW '23 Companion, Association for Computing Machinery, New York, NY, USA,
          <year>2023</year>
          , pp.
          <fpage>1504</fpage>
          -
          <lpage>1507</lpage>
          . URL: https://doi.org/10.1145/3543873.3587689. doi: 10.1145/3543873.3587689.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>International Data Spaces e. V.</surname>
          </string-name>
          , International Data Spaces, https://www.internationaldataspaces.org/,
          <year>2016</year>
          . [Accessed 29.04.2024].
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Pullmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Mader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tramp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Müller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Akyürek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Böckmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Imbusch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Theissen-Lipp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Geisler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lange</surname>
          </string-name>
          ,
          <article-title>The International Data Spaces Information Model - An Ontology for Sovereign Exchange of Digital Content</article-title>
          ,
          <source>in: Proceedings of the 19th International Semantic Web Conference</source>
          , Springer International Publishing, Cham,
          <year>2020</year>
          , pp.
          <fpage>176</fpage>
          -
          <lpage>192</lpage>
          . URL: https://doi.org/10.1007/978-3-030-62466-8_12. doi: 10.1007/978-3-030-62466-8_12.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Gaia-X European Association for Data and Cloud AISBL</surname>
          </string-name>
          ,
          <article-title>Gaia-X: Vision &amp; Mission</article-title>
          , https://gaia-x.eu/what-is-gaia-x/vision-and-mission/,
          <year>2020</year>
          . [Accessed 08.08.2023].
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Plattform Industrie 4.0</surname>
          </string-name>
          ,
          <article-title>Project GAIA-X: A Federated Data Infrastructure as the Cradle of a Vibrant European Ecosystem</article-title>
          ,
          <source>Technical Report</source>
          , Federal Ministry for Economic Affairs and Energy (BMWi), D-11019 Berlin, Germany,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>DRM Datenraum Mobilität GmbH</surname>
          </string-name>
          , Mobility Data Space, https://mobility-dataspace.eu/,
          <year>2021</year>
          . [Accessed 09.05.2024].
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Catena-X Automotive Network e.V.</surname>
          </string-name>
          , Catena-X Automotive Network, https://catena-x.net/en/,
          <year>2021</year>
          . [Accessed 09.05.2024].
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <article-title>Shapes Constraint Language (SHACL)</article-title>
          , W3C Recommendation, W3C Working Group,
          <year>2017</year>
          . Retrieved from https://www.w3.org/TR/shacl/, on 23.04.2024.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Raimond</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Manola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>McBride</surname>
          </string-name>
          ,
          <article-title>RDF 1.1 Primer</article-title>
          , W3C Recommendation, W3C Working Group,
          <year>2014</year>
          . Retrieved from https://www.w3.org/TR/rdf11-primer/, on 23.04.2024.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Tomaszuk</surname>
          </string-name>
          ,
          <article-title>RDF validation: a brief survey</article-title>
          ,
          <source>in: International Conference: Beyond Databases, Architectures and Structures</source>
          , Springer,
          <year>2017</year>
          , pp.
          <fpage>344</fpage>
          -
          <lpage>355</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. E. Labra</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Solbrig</surname>
          </string-name>
          ,
          <article-title>Shape expressions: an RDF validation and transformation language</article-title>
          ,
          <source>in: SEMANTICS</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>32</fpage>
          -
          <lpage>40</lpage>
          . doi: 10.1145/2660517.2660523.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ryman</surname>
          </string-name>
          ,
          <source>Resource Shape 2.0</source>
          , W3C Member Submission, W3C Working Group,
          <year>2014</year>
          . Retrieved from https://www.w3.org/submissions/shapes/, on 23.04.2024.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>M.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <article-title>Description Set Profiles: A constraint language for Dublin Core™ Application Profiles</article-title>
          ,
          <source>Technical Report</source>
          , DCMI,
          <year>2008</year>
          . Retrieved from https://www.dublincore.org/specifications/dublin-core/dc-dsp/, on 23.04.2024.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Idehen</surname>
          </string-name>
          , SPARQL Inferencing Notation, W3C Member Submission, W3C Working Group,
          <year>2011</year>
          . Retrieved from https://www.w3.org/submissions/2011/SUBM-spin-overview-20110222/, on 23.04.2024.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>W3C OWL Working Group</surname>
          </string-name>
          , OWL 2 Web Ontology Language, W3C Recommendation, W3C Working Group,
          <year>2011</year>
          . Retrieved from https://www.w3.org/TR/2012/REC-owl2-overview-20121211/, on 23.04.2024.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          , SPIN webpage, https://spinrdf.org/,
          <year>2011</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. E. L.</given-names>
            <surname>Gayo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Prud'hommeaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Boneva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Kontokostas</surname>
          </string-name>
          ,
          <source>Validating RDF Data - Chapter 7: Comparing ShEx and SHACL</source>
          , Springer International Publishing,
          <year>2018</year>
          . doi: 10.1007/978-3-031-79478-0.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <surname>The Apache Software Foundation</surname>
          </string-name>
          , Apache Jena SHACL, https://jena.apache.org/documentation/shacl/index.html,
          <year>2011</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>TopQuadrant</surname>
          </string-name>
          , TopBraid SHACL API, https://github.com/TopQuadrant/shacl/tree/d566c6b955cb8ec63ca32129bfb41b358ac07e31,
          <year>2016</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>TopBraid</surname>
          </string-name>
          ,
          <article-title>TopBraid Enterprise Data Governance</article-title>
          , https://www.topquadrant.com/products/topbraid-enterprise-data-governance/,
          <year>2017</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>R.</given-names>
            <surname>Palma</surname>
          </string-name>
          ,
          <article-title>Sub-class inference using SHACL</article-title>
          , https://github.com/TopQuadrant/shacl/issues/108,
          <year>2021</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>H.</given-names>
            <surname>Knublauch</surname>
          </string-name>
          , SHACL playground, https://shacl.org/playground/,
          <year>2017</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <surname>Zazuko GmbH</surname>
          </string-name>
          , Zazuko SHACL playground, https://shacl-playground.zazuko.com/,
          <year>2021</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>European Commission's DG DIGIT</surname>
          </string-name>
          ,
          <article-title>DG DIGIT SHACL validator</article-title>
          , https://www.itb.ec.europa.eu/shacl/any/upload,
          <year>2020</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>European Commission's DG DIGIT</surname>
          </string-name>
          ,
          <article-title>DG DIGIT SHACL shape validator</article-title>
          , https://www.itb.ec.europa.eu/shacl/shacl/upload,
          <year>2020</year>
          . Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hartmann</surname>
          </string-name>
          ,
          <article-title>Validation framework for RDF-based constraint languages</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Karlsruhe Institute of Technology, Germany,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sommer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Car</surname>
          </string-name>
          ,
          <article-title>A Python validator for SHACL</article-title>
          , https://github.com/RDFLib/pySHACL,
          <year>2018</year>
          . doi: 10.5281/zenodo.10958008. Accessed: 2024-04-23.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>