Enhanced and Scalable RDF Validation Techniques for Dataspaces

Paul Moosmann1,∗, Johannes Theissen-Lipp1,2 and Christoph Lange1,2

1 Fraunhofer Institute for Applied Information Technology FIT, Sankt Augustin, Germany
2 RWTH Aachen University, Aachen, Germany

Abstract
With the exponential growth of data on the Internet, the need to control data, especially after it has been shared, has become increasingly important. Dataspaces promise data sovereignty as a solution to this challenge, providing a decentralized environment where participants can share data while retaining control over its use. In such environments, where descriptive vocabularies and resource descriptions are decentralized and exchanged among participants, ensuring accurate and correct information becomes essential for integrated data use in mature software solutions. Validation principles play a crucial role in ensuring that the diverse information exchanged conforms to specified standards. We enhance validation approaches to facilitate a common shape of information in decentralized environments such as dataspaces. We improve validation techniques (1) through inference optimization and (2) by introducing SHACL-SPARQL shortcut templates for the easy expression of commonly used constraints. We evaluate the effectiveness of our approach through runtime measurements and application experiments with a Semantic Web expert. Our evaluation demonstrates that this approach enables accurate and scalable solutions in practice. Using state-of-the-art techniques such as knowledge inference, our method ensures data quality and usability while maintaining ease of use and inclusion for a wide range of dataspace participants, including non-experts. Our proposed two-step method improves validation techniques by optimizing RDF validation, leading to faster execution times of the validation process and reducing errors when implementing more complex validation shapes.
Keywords: Dataspaces, Scalable Data Validation, SHACL, RDF Inference

First International Workshop on Scaling Knowledge Graphs for Industry, co-located with the 20th International Conference on Semantic Systems (SEMANTICS), Amsterdam, Sept. 17–19, 2024.
∗ Corresponding author.
Email: paul.moosmann@fit.fraunhofer.de (P. Moosmann); theissen-lipp@dbis.rwth-aachen.de (J. Theissen-Lipp); christoph.lange-bever@fit.fraunhofer.de (C. Lange)
ORCID: 0009-0005-2114-8578 (P. Moosmann); 0000-0002-2639-1949 (J. Theissen-Lipp); 0000-0001-9879-3827 (C. Lange)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

1. Introduction and Motivation

In recent years, the topic of dataspaces has become increasingly relevant, leading to multiple large-scale cross-domain dataspace initiatives being launched in Europe [1]. The most prominent examples include the International Data Spaces [2, 3] and Gaia-X [4, 5]. Concrete, domain-specific implementations of these initiatives include the Mobility Data Space [6] for the International Data Spaces and the automotive network Catena-X [7] for Gaia-X. Dataspaces offer approaches to exchange heterogeneous data from heterogeneous sources by supporting multiple data models and providing mechanisms to query and analyze the data, thereby discovering relationships amongst the data. However, an integrated use of data (e.g., in mature software solutions) demands precise and correct data. This requires validation principles to ensure that the heterogeneous information exchanged in dataspaces conforms to certain specifications. To achieve this, dataspaces use Semantic Web technologies such as SHACL [8]. The correct implementation of so-called validation shapes, which define the expected form of data, can be a non-trivial task for dataspace users who are not Semantic Web experts. This is especially true when validating against more complex constraints that are not natively supported by validation languages and require the use of more advanced features, such as SHACL-SPARQL [8].

In this paper, we address the requirements for an easy-to-use validation process and develop validation approaches to facilitate common information modeling in the decentralized environment of dataspaces. In this environment, two different sorts of data exist that can be validated: (1) content data, i.e., the actual payload data that is being exchanged, and (2) metadata, i.e., data providing additional information about the data itself, e.g., who provided the data or which standards the data conforms to. The validation approach we suggest in this paper can be applied to both payload data and metadata, though we focused on metadata in our evaluation. We propose sound but convenient methods to facilitate validation in dataspaces. Sound in this context refers to precise and scalable solutions, including state-of-the-art solutions using advanced techniques such as knowledge inference to ensure the best possible results, while convenient refers to ease of use and inclusion of a wide range of users of a dataspace, including non-experts. We achieve this by simplifying the use of SHACL-SPARQL constraint shapes and by suggesting an inference step to pre-process the shapes. Therefore, this paper contributes:

• Shortcuts for Commonly Used Constraints: We provide shortcuts in the form of template-based properties for commonly used constraints that are not yet natively supported by validation languages, simplifying and unifying their application for users.

• A Scaling Inference Solution for Validation: Our modification of the current common practice of RDF validation with SHACL applies inference only once on the shapes graph instead of on each instance.
The remainder of this paper first studies the state of the art in validation languages and techniques, and then presents our approach, results, and evaluation addressing the stated requirements.

2. Background and Related Work

The state of the art in the area of our work includes validation languages (cf. section 2.1) that allow users such as dataspace participants to design and evaluate constraints to ensure the conformance of resources in dataspaces. These resources, ranging from data to services to participant descriptions, can be validated against arbitrary specifications, including local agreements or global standards. The state of the art in validation solutions and techniques (cf. section 2.2) includes software applications as well as common practices for their application.

2.1. Validation Languages

The validation of RDF [9] data is an essential step of the linked data lifecycle to ensure high data quality. Several languages are available that aim to enable extensive validation possibilities. To get an overview of the most relevant languages currently available for RDF validation, Dominik Tomaszuk conducted a survey "perform[ing] an overview and comparison of current options for RDF validation" in 2017 [10]. In this survey, he identified five validation languages, ShEx [11], SHACL [8], ReSH [12], DSP [13], and SPIN [14], as the most important ones. He compared these languages, as well as the built-in validation capabilities of OWL [15], regarding their expressiveness for 17 commonly used constraints. Since SHACL was strongly influenced by SPIN and can be regarded as its legitimate successor [16], we do not evaluate both of these languages, but solely SHACL. Table 1 presents the results of our evaluation. The metrics for evaluating the available RDF validation languages for our work are as follows.
Expressiveness is ranked by how many of the 17 commonly used constraints [10] can be expressed, resulting in a score of "not very expressive" (–), "expressive" (+), or "very expressive" (++). In addition, we compare the validation languages in terms of their standardization, use in existing dataspace initiatives, and existing inference support. These are simple yes/no measures.

SHACL is a constraint language for RDF. It can express 16 out of 17 commonly used constraints. It is a W3C recommendation and is applied in dataspace initiatives such as Gaia-X and the International Data Spaces. Existing SHACL validators offer support for RDFS inference.

ShEx is a language for describing RDF graph structures that can prescribe conditions that RDF data graphs must meet to be considered conformant. It can express 16 out of 17 commonly used constraints. Any inference must be done on the RDF graph separately; the ShEx processor itself does not interact with any inference mechanism [17].

ReSH is a high-level RDF vocabulary for specifying the shape of RDF resources. It is only able to express 9 out of 17 commonly used constraints. There is no additional inference support, besides the general possibility to perform inference on the RDF graph directly.

DSP is used to define structural constraints on data. It is only able to express 10 out of 17 commonly used constraints. There is no additional inference support, besides the general possibility to perform inference on the RDF graph directly.

OWL is a language to describe RDF graph structures used in dataspace initiatives such as Gaia-X and the International Data Spaces. It is part of the W3C's Semantic Web technology stack. While it has built-in validation capabilities, it can only express 13 out of 17 commonly used constraints. It is possible to apply inference mechanisms on graphs described using OWL.

Table 1
Requirements and metrics for validation languages.
Expressiveness in this context refers to the number of expressible constraints from the list of 17 commonly used constraints.

Validation Language   Expressiveness   Standardization   Dataspace Use   Inference Support
SHACL [8]             ++               ✓                 ✓               ✓
ShEx [11]             ++               ✗                 ✗               ✗
ReSH [12]             –                ✗                 ✗               ✗
DSP [13]              –                ✗                 ✗               ✗
OWL [15]              +                ✓                 ✓               ✓

Based on the results shown in Table 1, we use SHACL in this work because it best meets our requirements.

2.2. Solutions and Techniques for Validation

Since we have identified SHACL as the most appropriate validation language for our requirements, we focus in this section on solutions that implement SHACL. Apache Jena SHACL [18] implements both SHACL Core and SHACL-SPARQL constraints and provides a reader and writer for a compact SHACL syntax. The TopBraid SHACL API [19] is an open-source implementation of SHACL based on Apache Jena, which performs SHACL constraint checking and rule inferencing. There also exists a data governance tool using this API, the TopBraid EDG [20]. A GitHub issue opened in February 2021 [21] shows that the topic of inference for RDF validation can be further improved. The issue raises the question of subclass inference by SHACL. The answers suggest that today the only way to perform this kind of inference is to import the whole ontology (or ontologies) into the data graph, which does not always seem to be a high-performance solution. Additionally, there is no easily accessible documentation available that shows what kind of inference is done. There are web-based validators available for simple validation tasks. For example, the SHACL Playground [22] is implemented by TopQuadrant, who are also responsible for the TopBraid EDG mentioned above. An updated implementation of the SHACL Playground exists under the name Zazuko SHACL Playground [23]. Another web-based validator is provided as part of the European Commission's DG DIGIT Interoperability Test Bed [24].
The interface offers a simple possibility to upload a shapes graph and a data graph to conduct the validation. Apart from the data validator, they also offer a shape validator [25], which checks whether the provided SHACL shape conforms to certain rules, namely the core W3C syntax rules, the extended W3C syntax rules, and the extended W3C syntax rules and best practices. These web-based validators have in common that they were developed for simple validation tasks and do not support more advanced tasks. For example, the SHACL Playground does not allow importing ontologies into the data graph for inference or using advanced features such as SHACL-SPARQL.

3. Approach

Our approach to enhancing validation techniques for dataspaces consists of two steps: the introduction of shortcut templates (cf. section 3.1) and the optimization by inference (cf. section 3.2). Our approach also suggests a combination of these two steps into one solution (cf. section 3.3).

3.1. Introducing Shortcut Templates for Constraints

We begin by addressing the challenge of dealing with commonly used constraints that are not natively supported by validation languages (see figure 1). The current state-of-the-art approach for SHACL is to manually formulate SHACL-SPARQL constraints, which can be complex and ambiguous (since one constraint can be modelled using varying SHACL-SPARQL constructs). To simplify and unify this process, we propose the introduction of shortcuts in the form of new template-based properties. Our proposed methodology is shown in figure 2. The approach involves (1) analyzing the feasibility of common yet unsupported constraints, which were presented by Hartmann in [26], (2) designing templates for them, (3) aggregating them for easy use, and (4) implementing a demonstrator for them.
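As a rough illustration of how such template-based shortcuts can work, the following sketch instantiates a SHACL-SPARQL query template from a shortcut name and a comma-separated parameter string. The registry contents, template wording, and parameter format here are simplified assumptions for illustration, not the actual sst vocabulary or templates.

```python
from string import Template

# Hypothetical template registry; the real sst templates differ in
# wording and cover 16 constraint types.
TEMPLATES = {
    "languageTagCardinalityMin": Template(
        "SELECT $$this WHERE { $$this $path ?value . "
        "FILTER(isLiteral(?value) && langMatches(lang(?value), '$lang')) } "
        "GROUP BY $$this HAVING (COUNT(?value) < $min)"
    ),
}

def expand_shortcut(name: str, params: str) -> str:
    """Turn a parameter string such as 'rdfs:label,en,1' into a SPARQL query."""
    path, lang, min_count = params.split(",")
    return TEMPLATES[name].substitute(path=path, lang=lang, min=min_count)

query = expand_shortcut("languageTagCardinalityMin", "rdfs:label,en,1")
```

A user would thus write only the shortcut name and its parameters, while the tool emits the full SHACL-SPARQL construct.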
Figure 1: The first part of our two-part approach is to introduce shortcuts for commonly used constraints that are not yet natively supported by validation languages. For each non-standard constraint, the current state of the art for SHACL (a, the common approach and our baseline) is that users must formulate SHACL-SPARQL constraints, which are often complex and ambiguous. We introduce shortcuts (b, our approach) in the form of new template-based properties to simplify and unify this process.

Figure 2: The methodology of the first part of our two-part approach, which introduces SHACL-SPARQL templates as shortcuts for commonly used constraints not natively supported by SHACL. We (1) analyze the feasibility of common but unsupported constraints [26], (2) design the SHACL-SPARQL templates for the feasible constraint types, (3) create a SHACL-SPARQL templates (sst) vocabulary containing the custom commands and aggregating them as shortcuts for easy use, and (4) implement a demonstrator tool that maps custom commands to the corresponding SHACL-SPARQL templates.

These shortcut templates serve as intuitive bridges between unsupported constraints and native SHACL validators, significantly reducing the cognitive overhead for users. By encapsulating complex constraints into reusable templates, we enable users to express validation rules more concisely and effectively. In addition, the adoption of standardized templates improves interoperability and promotes best practices across different validation scenarios.

3.2. Validation Enhancement through Inference Optimization

In the second part of our approach, we address the inefficiencies inherent in current RDF validation processes (see figure 3).
Applying inference as part of RDF validation with SHACL usually involves appending the entire ontology file needed for inference to each data instance. This significantly increases the size of each instance to be validated. Our proposed solution modifies this practice by applying the inference only once to the shapes graph. This optimization results in a shapes graph of slightly higher complexity but eliminates the need for expensive inference on each data graph during validation. With this approach, we aim for a more scalable solution for recurrent validation against the same shapes graph.

Figure 3: The second part of our two-part approach is to modify a current common practice for RDF validation with SHACL. In the baseline (a, validation without prior inference), the entire ontology file needed for inference is appended to each instance and fed to a validator, increasing the size of each instance to be validated enormously. We propose an approach (b, validation with prior inference) that instead applies inference only once to the shapes graph.

Our approach not only reduces the computational overhead but also increases the agility of the validation process. By decoupling the inference step from individual data instances, we aim to create a more streamlined and resource-efficient workflow. This optimization could prove particularly advantageous for large-scale dataspaces, since they require frequently performed validation tasks.

3.3. Combining the Solutions

Combining these two solutions into one is the final step of our approach (see figure 4). The input data is pre-processed by our proposed scaling inference solution, which optimizes the shapes graph for subsequent validation.
Subsequently, our constraint template mapping solution outputs native SHACL shapes graphs using SHACL-SPARQL. This ensures compatibility with any native SHACL validator supporting SHACL-SPARQL, therefore facilitating seamless validation of data against predefined constraints.

4. Implementation

In this section, we detail the implementation of our previously described approach. Our work enhances the validation process of existing validation tools, such as the command-line tool pySHACL [27], by implementing easy-to-use commands for complex constraints and performing an inference step pre-validation. Since this is a two-step approach, we present our results for each step separately.

Figure 4: The complete solution of our two-step approach. For input data to be validated, our proposed scaling inference solution (see figure 3) first manipulates the shapes graph, before our proposed constraint template mapping solution (see figure 1) makes it compatible with any native SHACL validator, so that the actual validation can be performed seamlessly.

4.1. Resulting Shortcut Templates for Constraints

The results of this work include 16 constraint templates implementing commonly used constraints [10] using SHACL-SPARQL. This way, we can utilize the built-in SPARQL support of the SHACL validation language. Using these templates, a potential user wanting to validate a data instance does not need to implement a complex SHACL-SPARQL solution. Instead, the user only needs to use a simple command that is replaced by the corresponding SHACL-SPARQL template during the mapping process. Listing 1 shows one of the created constraint templates. It implements a constraint on language tags for a data property.
 1  :CountryShape a sh:Shape ;
 2      sh:scopeClass :Country ;
 3      sh:constraint [
 4          sh:message "Values of 'germanLabel' must have a German language tag." ;
 5          sh:sparql """
 6              SELECT $this ($this AS ?subject) (:germanLabel AS ?predicate)
 7                     (?value AS ?object)
 8              WHERE { $this :germanLabel ?value .
 9                      FILTER(!isLiteral(?value) ||
10                             !langMatches(lang(?value), "de"))}""" ; ] .
11
12  :ValidCountry a :Country ; :germanLabel "Deutschland"@de .
13  :InvalidCountry a :Country ; :germanLabel "Germany"@en .

Listing 1: Example SHACL-SPARQL template to validate that a certain language tag (German) is present for a data property. A valid example instance (line 12) has a German language tag (@de) and an invalid instance (line 13) has only an English language tag (@en).

Figure 5a shows an example of a possible input graph for the constraint mapping tool. This SHACL shapes graph defines a constraint on the cardinality of English language tags on the rdfs:label property by using one of the custom commands defined by this work. The constraint mapping tool then maps this command onto the corresponding SHACL-SPARQL construct as defined in the constraint template designed by us. Figure 5b shows the result of this mapping process. To achieve the mapping from Figure 5a to Figure 5b, a simple command-line invocation with two input parameters is sufficient. The first is the path to the SHACL shapes graph containing our custom sst commands. The second is the output location for the created SHACL shapes graph containing the SHACL-SPARQL templates. A possible command line execution is:

python Mapping.py pathTo/input.shacl.ttl pathTo/output.shacl.ttl

This mapping is implemented using a graph-based approach. The graph is searched for triples using one of our custom commands as an edge between two nodes. When such a triple is found, it is replaced by a new triple using the sh:sparql command as predicate and the corresponding SHACL-SPARQL template as object.
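This search-and-replace over the graph can be sketched with a plain set of triples standing in for the RDF graph. This is an illustrative simplification: the actual tool operates on an RDFlib Graph, and the template stub below does not instantiate the full SHACL-SPARQL construct.

```python
# A graph as a set of (subject, predicate, object) triples; the sst
# command name and the template stub below are illustrative.
SST_COMMAND = "sst:languageTagCardinalityMin"

def sparql_template_for(params: str) -> str:
    # Stand-in for looking up and instantiating the real template.
    return f"SELECT $this WHERE {{ ... parameters: {params} ... }}"

def map_shortcuts(graph):
    """Replace each triple whose predicate is a custom sst command with
    a triple carrying the corresponding SHACL-SPARQL construct."""
    mapped = set()
    for s, p, o in graph:
        if p == SST_COMMAND:
            mapped.add((s, "sh:sparql", sparql_template_for(o)))
        else:
            mapped.add((s, p, o))
    return mapped

shapes = {
    ("sh:ProviderShape", "rdf:type", "sh:NodeShape"),
    ("sh:ProviderShape", SST_COMMAND, "rdfs:label,rdfs:comment,en,1"),
}
mapped_shapes = map_shortcuts(shapes)
```

All other triples pass through unchanged, so the output remains a valid shapes graph that any SHACL-SPARQL-capable validator can consume.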
Here, the input file is imported as a graph data structure using the Python package RDFlib1.

1  @prefix gax: .
2  @prefix sh: .
3  @prefix sst: .
4
5  sh:ProviderShape a sh:NodeShape ;
6      sst:languageTagCardinalityMin "rdfs:label,rdfs:comment,en,1" ;
7      sh:targetClass gax:Provider .

(a) Input

1  @prefix gax: .
2  @prefix sh: .
3
4  sh:ProviderShape a sh:NodeShape ;
5      sh:sparql [ sh:message "Values of the constraint 'language tag cardinality min' [...]" ;
6          sh:select """
7              SELECT $this
8              WHERE {
9                  SELECT (COUNT(?value) as ?count)
10                 WHERE {
11                     { $this rdfs:label ?value } UNION { $this rdfs:comment ?value }
12                     FILTER(isLiteral(?value) && langMatches(lang(?value), 'en'))
13                 }
14                 GROUP BY $this
15             }
16             HAVING (SUM(?count) < 1)
17             """ ] ;
18     sh:targetClass gax:Provider .

(b) Output

Figure 5: Example execution for a constraint that all labels and comments have at least one English value, using our proposed constraint mapping tool. The tool transforms an introduced shortcut (see line 6 in (a)) into corresponding SHACL-SPARQL expressions (see lines 5–17 in (b)), which can be used natively in any SHACL validator.

1 https://rdflib.readthedocs.io/en/stable/

4.2. Resulting Inference Optimization

Our solution for performing the inference as a preprocessing step is implemented as a tool similar to the mapping tool above. Again, we use a graph-based approach consisting of two tasks. The first task is searching the ontology graph for all the sub-class and sub-property relations. The second task is searching the shapes graph for occurrences of parent classes and properties and adding further connections to cover all related sub-classes and sub-properties. For the second task, the tool identifies specific connection types in the shapes graph. Figure 6 shows a minimal example of what the input and output graphs of the inference tool look like when performing a simple sub-class inference of the sh:targetClass.
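The two tasks can be sketched without RDFlib using plain dictionaries as a simplified stand-in for the actual graph-based implementation; the class names are illustrative.

```python
# Task 1: derive, for each class, all of its transitive sub-classes
# from direct rdfs:subClassOf statements (here: class -> superclasses).
def transitive_subclasses(subclass_of):
    subs = {}
    def superclasses(cls, seen):
        # Walk rdfs:subClassOf upwards, collecting all ancestors of cls.
        for parent in subclass_of.get(cls, set()):
            if parent not in seen:
                seen.add(parent)
                superclasses(parent, seen)
        return seen
    for cls in subclass_of:
        for parent in superclasses(cls, set()):
            subs.setdefault(parent, set()).add(cls)
    return subs

# Task 2: extend the sh:targetClass values of a shape with all
# sub-classes of the classes it already targets.
def extend_targets(target_classes, subs):
    extended = set(target_classes)
    for cls in target_classes:
        extended |= subs.get(cls, set())
    return extended

ontology = {"ex:Student": {"ex:Person"}}  # ex:Student rdfs:subClassOf ex:Person
targets = extend_targets({"ex:Person"}, transitive_subclasses(ontology))
# targets == {"ex:Person", "ex:Student"}
```

Because this closure is computed once over the ontology, each data graph can later be validated without repeating the inference step.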
Analogous to the realization of the constraint templates, our transitive inference tool is realized as a command-line tool implemented in Python 3. In more detail, we use the Python library RDFlib, which enables an implementation of our graph-based approach. To perform the inference step, a simple command-line invocation with three input parameters is sufficient. The first is the path to the ontology graph containing additional information, the second is the path to the SHACL shapes graph, and the third is the output location for the extended SHACL shapes graph. A possible command line execution is:

python mapping.py ontology.ttl shape.ttl newShape.ttl

1  @prefix ex: .
2  @prefix sh: .
3
4  ex:PersonShape
5      a sh:NodeShape ;
6      sh:targetClass ex:Person ;
7      sh:property [
8          sh:path ex:hasName ;
9          sh:minCount 1 ;
10     ] .

(a) Input

1  @prefix ex: .
2  @prefix sh: .
3
4  ex:PersonShape
5      a sh:NodeShape ;
6      sh:targetClass ex:Person, ex:Student ;
7      sh:property [
8          sh:path ex:hasName ;
9          sh:minCount 1 ;
10     ] .

(b) Output

Figure 6: Example execution of a sub-class inference. Our tool updates the SHACL graph using the information that ex:Student is a sub-class of ex:Person. This is implemented by adding ex:Student as sh:targetClass (cf. line 6 in (b)).

5. Evaluation

To assess the effectiveness and scalability of our proposed enhanced validation techniques, we conducted an evaluation using example instances from the Gaia-X framework for dataspaces. We did this to show the feasibility of our approach in real-world application scenarios. These instances, as shown in figure 7, represent typical data providers within the Gaia-X ecosystem, showing properties with data attributes and links to related objects. Understanding the details of these instances is relevant for evaluating the performance of our validation techniques. Note that the names and data of these providers have been anonymized for this paper.
Figure 7: Overview of the four example instances of providers from the Gaia-X dataspace used in this evaluation. Each provider instance contains properties defining the type, name, sales tax ID, and label. Further properties describe the relations amongst the different providers, such as subsidiary or project partner relations, and whether a provider trusts the other providers.

5.1. Evaluation of the Impact of Shortcut Templates for Constraints

To evaluate the impact of the constraint template enhancement on the time required for the validation process and the error proneness of this process, we conducted a hands-on experiment that simulated a real-world work situation. For this purpose, we asked a Semantic Web expert to implement a defined set of constraints in SHACL once without the constraint mapping tool and once with the constraint mapping tool. The developer's skills included working with SHACL on a weekly basis and having basic knowledge of SPARQL. We structured the experiment in the following way. The input was a data graph describing the four different provider instances shown in figure 7. The data graph was supposed to satisfy the following four constraints: (C1) each provider trusts itself, (C2) the legally binding name of a provider must be unique, (C3) the property ex:hasProjectPartner is symmetric, and (C4) at least one label with an English language tag must be defined. The example data graph intentionally contained one error for each of these constraints, meaning a validation with a correct shapes graph should lead to four validation errors. Given the data graph and the constraint requirements, the developer was tasked with creating a SHACL shapes graph without using the constraint mapping tool. After 30 minutes, a validation of the data graph was conducted. For the second part of the experiment, the developer used the constraint mapping tool in the form of a Python command line tool. In addition, we provided written documentation on how to use the tool.
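For intuition, constraint C1 ("each provider trusts itself") can be phrased over a toy set of triples as follows. This is an illustrative stand-in with hypothetical provider names, not the SHACL shapes used in the experiment.

```python
# Toy triples describing providers; names are illustrative.
triples = {
    ("ex:ProviderA", "rdf:type", "gax:Provider"),
    ("ex:ProviderA", "ex:trusts", "ex:ProviderA"),
    ("ex:ProviderB", "rdf:type", "gax:Provider"),
    ("ex:ProviderB", "ex:trusts", "ex:ProviderA"),  # B does not trust itself
}

def violates_c1(triples):
    """Return all providers that do not trust themselves (constraint C1)."""
    providers = {s for s, p, o in triples
                 if p == "rdf:type" and o == "gax:Provider"}
    return {p for p in providers if (p, "ex:trusts", p) not in triples}

# violates_c1(triples) == {"ex:ProviderB"}
```

Expressing such a self-reference check in SHACL requires a SHACL-SPARQL constraint, which is exactly the kind of construct the mapping tool generates from a shortcut.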
Again, the allotted time was 30 minutes. The results of the experiment are summarized in Table 2 and show that our approach led to improvements in both the time needed to build the shapes graph and the correctness of the validation process.

5.2. Evaluation of the Inference Optimization

In the second part of our evaluation, the inference optimization, we focused on measuring the execution times of our proposed validation techniques under different scenarios. Table 3 summarizes the results of several runs of our experiment, comparing the baseline common approach with our proposed method. Notably, our approach achieves a significant reduction in execution times for individual data graphs, demonstrating its efficiency and scalability. However, the execution times for a merged data graph remain relatively unchanged between the baseline and our approach. Figure 8 visualizes the average execution times of all runs for a comprehensive analysis, giving a graphic representation of the average values of our different experiment executions. These results validate the practicality and scalability of our proposed validation approaches for application in real-world scenarios.

Table 2
Overview of the experiment results, determining the utility of our constraint mapping tool.

                         Without the constraint     With the constraint
                         mapping tool               mapping tool
Implementation of C1     incorrect implementation   successfully implemented
Implementation of C2     not implemented            successfully implemented
Implementation of C3     not implemented            successfully implemented
Implementation of C4     incorrect implementation   successfully implemented
Time needed              experiment stopped         done after 15 minutes
                         after 30 minutes

Table 3
Execution times measured in our experiments. In both scenarios, individual data graphs and a merged data graph, we compared the state-of-the-art common approach as a baseline with our approach. Scalability is measured by a growing number of 100–10000 input data graph files.
Each cell on the right contains three measurements representing repeated experiments. While our approach does not change the execution times for a merged data graph, it significantly reduces the execution times for individual data graphs to about 1/5.

                          Number of Files
                          100      500      1000     5000    10000
individual data graphs
  Baseline                171.85   882.09   1737.95  8600    17200
                          177.36   884.06   1808.80  8850    17700
                          172.00   866.37   1658.90  8600    17200
  Our Approach             31.54   142.45    289.93  1600     3200
                           30.41   144.34    298.70  1500     3000
                           30.2    145.57    258.76  1500     3000
merged data graph
  Baseline                  2.82     3.02      5.12    16.24    30.68
                            2.76     3.17      4.73    15.83    30.21
                            2.84     3.19      5.02    16.08    30.15
  Our Approach              3.34     4.47      5.99    17.26    31.48
                            3.44     4.67      6.09    18.73    32.08
                            3.55     4.82      6.48    17.41    31.96

6. Conclusion and Future Work

In conclusion, we successfully addressed the rising data validation requirements by proposing a scaling inference solution and a constraint template mapping solution. This facilitates common information design in decentralized environments such as dataspaces by making validation techniques more scalable and usable. An evaluation in a real-world scenario with anonymized participants from the Gaia-X dataspaces demonstrated feasibility, scalability, and practical impact. The contributions of this paper include:

• A Scaling Solution for Validation: Our modification of the current common practice of RDF validation with SHACL applies inference only once on the shapes graph instead of on each instance. The shapes graph becomes slightly more complex, but expensive inference on each data graph is avoided. This contributes to the requirements by making recurrent validation against the same shapes graph more scalable.

Figure 8: Visualization of the average execution times of our experiments (see table 3): (a) validating individual data graphs; (b) validating a merged data graph.
While our approach does not change the execution times for a merged data graph (b), it significantly reduces the execution times for individual data graphs to about 1/5 and scales linearly.

• Shortcuts for Commonly Used Constraints: We provided shortcuts in the form of template-based properties for commonly used constraints that are not yet natively supported by validation languages, simplifying and unifying their application for users. We have introduced SHACL-SPARQL templates, which are used by our proposed mapping tool to enable native support by any SHACL validator. This contributes to the requirements by allowing a wide range of dataspace users, including non-experts, to formulate even complex constraints for validation.

• Combined Approach: By combining inference optimization with the introduction of constraint templates, our two-step approach provides a holistic solution for improving dataspace validation techniques. We not only address efficiency concerns but also prioritize usability and interoperability, thereby advancing the state of the art in RDF validation methodologies. This approach lays the foundation for scalable and robust validation processes in the context of evolving dataspaces and Semantic Web applications.

The contributions provide reliable guarantees for an integrated use of data or services in applications on the domain layer of dataspaces, such as APIs. The proposed scaling but convenient solutions for validation in dataspaces allow the diverse participants of dataspaces, including non-experts, to use shortcuts for expressing common validation constraints and to validate these much faster with standard validators. The proposed enhanced validation techniques improve interoperability by providing guarantees for all kinds of interfaces between participants or services in dataspaces, which contributes to a better common understanding in dataspaces.
Future work can expand the pool of constraint templates with corresponding templates for property-based constraints. Concerning the inference approach introduced by this work, our tool currently only applies transitive inference for the properties rdfs:subClassOf and rdfs:subPropertyOf. Future work can extend the considered targets of the inference process to achieve even better validation results. It can also include developing a user interface for our command-line tools to make them even more accessible.

Acknowledgments

Funded by the Deutsche Forschungsgemeinschaft (DFG, German Research Foundation) under Germany's Excellence Strategy – EXC-2023 Internet of Production – 390621612. Funded by the FAIR Data Spaces project of the German Federal Ministry of Education and Research (BMBF) under the grant number FAIRDS05. We thank Jens Lehmann for supervising the master thesis underlying this research.

References

[1] J. Theissen-Lipp, M. Kocher, C. Lange, S. Decker, A. Paulus, A. Pomp, E. Curry, Semantics in dataspaces: Origin and future directions, in: Companion Proceedings of the ACM Web Conference 2023, WWW '23 Companion, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1504–1507. URL: https://doi.org/10.1145/3543873.3587689. doi:10.1145/3543873.3587689.
[2] International Data Spaces e. V., International Data Spaces, https://www.internationaldataspaces.org/, 2016. [Accessed 29.04.2024].
[3] S. Bader, J. Pullmann, C. Mader, S. Tramp, C. Quix, A. Müller, H. Akyürek, M. Böckmann, B. Imbusch, J. Theissen-Lipp, S. Geisler, C. Lange, The International Data Spaces Information Model – An Ontology for Sovereign Exchange of Digital Content, in: Proceedings of the 19th International Semantic Web Conference, Springer International Publishing, Cham, 2020, pp. 176–192. URL: https://doi.org/10.1007/978-3-030-62466-8_12. doi:10.1007/978-3-030-62466-8_12.
[4] Gaia-X European Association for Data and Cloud AISBL, Gaia-X: Vision & Mission, https://gaia-x.eu/what-is-gaia-x/vision-and-mission/, 2020. [Accessed 08.08.2023].
[5] Plattform Industrie 4.0, Project GAIA-X: A Federated Data Infrastructure as the Cradle of a Vibrant European Ecosystem, Technical Report, Federal Ministry for Economic Affairs and Energy (BMWi), D-11019 Berlin, Germany, 2019.
[6] DRM Datenraum Mobilität GmbH, Mobility Data Space, https://mobility-dataspace.eu/, 2021. [Accessed 09.05.2024].
[7] Catena-X Automotive Network e.V., Catena-X Automotive Network, https://catena-x.net/en/, 2021. [Accessed 09.05.2024].
[8] H. Knublauch, D. Kontokostas, Shapes Constraint Language (SHACL), W3C Recommendation, W3C Working Group, 2017. Retrieved from https://www.w3.org/TR/shacl/, on 23.04.2024.
[9] G. Schreiber, Y. Raimond, F. Manola, E. Miller, B. McBride, RDF 1.1 Primer, W3C Recommendation, W3C Working Group, 2014. Retrieved from https://www.w3.org/TR/rdf11-primer/, on 23.04.2024.
[10] D. Tomaszuk, RDF validation: a brief survey, in: International Conference: Beyond Databases, Architectures and Structures, Springer, 2017, pp. 344–355.
[11] E. Prud'hommeaux, J. E. Labra Gayo, H. Solbrig, Shape Expressions: an RDF validation and transformation language, in: SEMANTICS, 2014, pp. 32–40. doi:10.1145/2660517.2660523.
[12] A. Ryman, Resource Shape 2.0, W3C Member Submission, W3C Working Group, 2014. Retrieved from https://www.w3.org/submissions/shapes/, on 23.04.2024.
[13] M. Nilsson, Description Set Profiles: A constraint language for Dublin Core™ Application Profiles, Technical Report, DCMI, 2008. Retrieved from https://www.dublincore.org/specifications/dublin-core/dc-dsp/, on 23.04.2024.
[14] H. Knublauch, J. A. Hendler, K. Idehen, SPARQL Inferencing Notation, W3C Member Submission, W3C Working Group, 2011. Retrieved from https://www.w3.org/submissions/2011/SUBM-spin-overview-20110222/, on 23.04.2024.
[15] W3C OWL Working Group, OWL 2 Web Ontology Language, W3C Recommendation, W3C Working Group, 2012.
Retrieved from https://www.w3.org/TR/2012/REC-owl2-overview-20121211/, on 23.04.2024.
[16] H. Knublauch, SPIN webpage, https://spinrdf.org/, 2011. Accessed: 2024-04-23.
[17] J. E. L. Gayo, E. Prud'hommeaux, I. Boneva, D. Kontokostas, Validating RDF Data – Chapter 7: Comparing ShEx and SHACL, Springer International Publishing, 2018. doi:10.1007/978-3-031-79478-0.
[18] The Apache Software Foundation, Apache Jena SHACL, https://jena.apache.org/documentation/shacl/index.html, 2011. Accessed: 2024-04-23.
[19] TopQuadrant, TopBraid SHACL API, https://github.com/TopQuadrant/shacl/tree/d566c6b955cb8ec63ca32129bfb41b358ac07e31, 2016. Accessed: 2024-04-23.
[20] TopBraid, TopBraid Enterprise Data Governance, https://www.topquadrant.com/products/topbraid-enterprise-data-governance/, 2017. Accessed: 2024-04-23.
[21] R. Palma, Sub-class inference using SHACL, https://github.com/TopQuadrant/shacl/issues/108, 2021. Accessed: 2024-04-23.
[22] H. Knublauch, SHACL Playground, https://shacl.org/playground/, 2017. Accessed: 2024-04-23.
[23] Zazuko GmbH, Zazuko SHACL Playground, https://shacl-playground.zazuko.com/, 2021. Accessed: 2024-04-23.
[24] European Commission's DG DIGIT, DG DIGIT SHACL validator, https://www.itb.ec.europa.eu/shacl/any/upload, 2020. Accessed: 2024-04-23.
[25] European Commission's DG DIGIT, DG DIGIT SHACL shape validator, https://www.itb.ec.europa.eu/shacl/shacl/upload, 2020. Accessed: 2024-04-23.
[26] T. Hartmann, Validation framework for RDF-based constraint languages, Ph.D. thesis, Karlsruhe Institute of Technology, Germany, 2016.
[27] A. Sommer, N. Car, A Python validator for SHACL, https://github.com/RDFLib/pySHACL, 2018. doi:10.5281/zenodo.10958008, accessed: 2024-04-23.

A. Data availability statement

The tools developed as part of this work can be accessed via https://github.com/moosmannp/Enhancing-SHACL-Validation-Through-Constraint-Templates-and-Inference.