Semantic Data Link: Bridging Domain-Specific Needs with Universal and Interoperable Semantic Models

Maximilian Stäbler*1, Paul Moosmann*2, Patrick Dittmer3, DanDan Wang4, Frank Köster1 and Christoph Lange2,5

1 German Aerospace Center (DLR) Institute for AI Safety & Security, Ulm, Germany
2 Fraunhofer Institute for Applied Information Technology (FIT), Sankt Augustin, Germany
3 Behörde für Verkehr und Mobilitätswende (BVM), Hamburg, Germany
4 T-Systems International GmbH, Bonn, Germany
5 RWTH Aachen University, Aachen, Germany

* These authors contributed equally to this work.
The Second International Workshop on Semantics in Dataspaces, co-located with the Extended Semantic Web Conference, May 26–27, 2024, Hersonissos, Greece.
maximilian.staebler@dlr.de (M. Stäbler*); paul.moosmann@fit.fraunhofer.de (P. Moosmann*)
ORCID: 0000-0003-1311-3568 (M. Stäbler*); 0009-0005-2114-8578 (P. Moosmann*); 0000-0001-9879-3827 (C. Lange)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Abstract
The emergence of data-driven systems necessitates enhanced interoperability across diverse data ecosystems. Traditional approaches to semantic interoperability have been hindered by the complexity and specificity of ontologies, whose development and maintenance demand significant expertise and resources. This paper introduces the Semantic Data Link (SDL) framework, a novel approach that aims to democratize data description and enhance semantic interoperability. SDL offers a domain- and ontology-independent methodology based on a multi-layered architecture that emphasizes decentralized semantics and categorizes data into definitional, structural, and contextual aspects. Developed as part of the Gaia-X 4 Future Mobility initiative, SDL is particularly pertinent to the mobility sector, where real-time data exchange and interoperability are crucial. The framework aims to bridge the gap between varying levels of expertise in semantic technologies and to accelerate the development of semantically interoperable applications and services. We provide an in-depth discussion of the conceptual framework, design rationale, and implementation of SDL. The paper concludes with insights into the practical implications of SDL and directions for future research towards a seamless, interoperable data landscape.

Keywords: Semantic Interoperability, Data Ecosystems, Dataspaces, Domain Agnostic Framework

1. Introduction and Motivation

Efficient data exchange and interoperability are crucial in various ecosystems [1], especially in the mobility sector [2], where real-time data sharing across domains poses significant challenges [3]. Interoperability deficits limit the development of smart, connected urban mobility services and thereby exacerbate urban issues such as congestion and pollution [2]. Data heterogeneity, marked by incompatible message formats, further complicates seamless data interoperability [4]. Although traditional ontology approaches have targeted alignment [5, 4, 6], merging [7, 8], and matching [9, 10, 5] to address these issues, they are subject to the "80/20" principle: automated solutions handle most discrepancies, but complex cases still require manual refinement [11]. The advancement of semantic interoperability is constrained by the complexity and specialized knowledge required for ontology development and deployment [11, 12]. Stakeholders therefore need advanced skills in semantic technologies and ontology engineering, which limits the prevalence of semantically interoperable solutions. However, the legislative landscape is evolving to address some of these barriers.
A noteworthy advancement in this direction is the adoption of the European Data Governance Act (DGA), which represents an important step towards enhancing trust and expanding data availability across Europe. Under the European Data Act (Data Act), mobility service providers are required to make data exportable, understandable, and reusable for other stakeholders in the value chain, including end users and manufacturers. This legislative move underscores the importance of interoperability and data sharing, potentially easing some of the complexity and expertise barriers associated with semantic technologies. In a communication to the European Parliament, the European Commission has called for the development of a common European Mobility Data Space. This communication also calls for interoperability (interlinking) between different dataspaces, which are defined as collaborative digital architectures enabling secure and sovereign data exchange among diverse stakeholders. In parallel, initiatives such as the European Open Science Cloud (EOSC) and the adoption of the FAIR (Findable, Accessible, Interoperable, Reusable) Data Principles further reinforce the importance of developing robust dataspaces. These developments highlight the critical need for solutions that bridge semantic gaps between interconnected dataspaces, moving from inefficient data exchange to semantic interoperability [3].

A specific example of this trend towards greater data transparency and interoperability at the local level is the City of Hamburg, Germany. Under the Hamburg Transparency Act (Hamburger Transparenzgesetz), public sector datasets in Hamburg must be made directly accessible and published as open data, while personal data remains protected. Data is published via the Hamburger Transparenzportal using the European metadata standard DCAT-AP. However, this standard primarily governs how data is cataloged, not the format of the data itself, which each publisher chooses individually.
Consequently, the need for semantic interoperability is not only a broad European challenge but also exists at the local level, as the Hamburg example demonstrates.

This paper presents the Semantic Data Link (SDL) framework as a solution for establishing semantic interoperability between previously incompatible systems, data sources, and applications. With an emphasis on ease of use and domain agnosticism, SDL addresses a wide range of domains and applications. In particular, it is designed to be accessible to domain experts without prior knowledge of semantic technologies such as the Resource Description Framework (RDF) or the Shapes Constraint Language (SHACL), enabling them to create meaningful and universally applicable descriptions. Developed as part of the Gaia-X 4 Future Mobility (GX4FM) project family, a Gaia-X initiative, SDL embodies the vision of a more connected, efficient, and innovative mobility future in which data flows freely and meaningfully across borders and sectors. The project family involves more than 80 partners from industry, research, and the public sector, each of which requires continuous ontology updates to keep pace with evolving domain knowledge and practices. These updates often lag behind because the revisions are dynamic and resource-intensive. In particular, they normally require manual intervention, which slows down the interoperability process and introduces the risk of inconsistencies. With these challenges in mind, we developed SDL to facilitate semantic interoperability; its effectiveness will be validated in partnership with the City of Hamburg to enhance future mobility applications. The collaboration with industry partners and municipalities reflects the industry-driven approach of this project and highlights its potential for widespread adoption and impact.
The paper is structured as follows: We begin with a background section that sets the context for our study and outlines the current state of semantic interoperability. We then present SDL, detailing its conceptual underpinnings and describing its implementation. The paper concludes with a discussion of our findings, implications for practice, and directions for future research in this evolving field.

2. Background

In this section, we give an overview of the background of our work. Specifically, we cover Semantic Web technologies and the corresponding tool support, and give a short introduction to the dataspace initiative Gaia-X. This work emerged in the context of building a dataspace based on Gaia-X, which is directly connected to Semantic Web technologies and tools, since their use is central to the implementation of federated dataspaces in Gaia-X. A general overview of the role of semantics in dataspaces is given by Theissen-Lipp et al. [13].

Semantic Web Technologies and Tool Support. Several reviews in recent years have assessed the current status of Semantic Web technologies, such as the works of Hitzler [14] and Patel and Jain [15]. These works identify the W3C standards RDF, RDF Schema, OWL, and SPARQL as core technologies. In our work, we extend this list with SHACL, which builds upon RDF and is used for validating RDF graphs against a set of conditions. SHACL plays a vital role not only in the functionality of SDL but also in underpinning semantic technologies within dataspace initiatives such as Gaia-X or International Data Spaces (IDS). Based on these core technologies, further vocabularies were defined which, due to their widespread adoption, can today also be regarded as part of the core of the Semantic Web. Examples include the SKOS and DCAT vocabularies [15, 16], which are also used in the context of SDL and Gaia-X.
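To make the role of SHACL more concrete, the following is a deliberately simplified, pure-Python sketch of what a SHACL processor checks. It is an illustration only, not a SHACL implementation and not part of SDL: the focus node is a hypothetical dictionary from property CURIEs to lexical values rather than an RDF graph, and only analogues of sh:minCount, sh:maxCount, and sh:pattern are covered. These three constraint kinds roughly correspond to the required, multivalued, and pattern parameters that SDL collects through LinkML (Section 3.3).

```python
import re
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PropertyShape:
    """Toy stand-in for a SHACL property shape with three constraint types."""
    path: str                          # property CURIE, e.g. "geo:long"
    min_count: int = 0                 # modeled on sh:minCount
    max_count: Optional[int] = None    # modeled on sh:maxCount
    pattern: Optional[str] = None      # modeled on sh:pattern (regex search)


def validate(node: Dict[str, List[str]], shapes: List[PropertyShape]) -> List[str]:
    """Check one focus node's property values and return violation messages."""
    violations: List[str] = []
    for shape in shapes:
        values = node.get(shape.path, [])
        if len(values) < shape.min_count:
            violations.append(f"{shape.path}: fewer than {shape.min_count} value(s)")
        if shape.max_count is not None and len(values) > shape.max_count:
            violations.append(f"{shape.path}: more than {shape.max_count} value(s)")
        if shape.pattern is not None:
            for value in values:
                # sh:pattern matches anywhere in the string unless anchored, like re.search.
                if re.search(shape.pattern, value) is None:
                    violations.append(f"{shape.path}: {value!r} violates {shape.pattern}")
    return violations


if __name__ == "__main__":
    # Hypothetical shape: exactly one decimal longitude value per node.
    shapes = [PropertyShape("geo:long", min_count=1, max_count=1,
                            pattern=r"^-?\d+(\.\d+)?$")]
    print(validate({"geo:long": ["9.99"]}, shapes))   # conforming node
    print(validate({"geo:long": ["east"]}, shapes))   # pattern violation
```

In practice, such checks would be performed by a conforming SHACL processor against shapes generated from the schema, not by hand-written code.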
While the development of further vocabularies leads to a stronger (and more standardized) core, Hogan [16] compiled various criticisms of the Semantic Web, one being that its standards are complex and difficult to understand. To tackle this problem, various tools have been developed to aid users of Semantic Web technologies. The survey by Khamparia and Pandey [7] gives a good overview of existing Semantic Web reasoners and tools. Prominent examples include the Protégé ontology editor, the ELK reasoner, and the Linked Open Vocabularies (LOV) database [14, 17]. Even though the Semantic Web is criticized for its lack of usable systems and tools [16], a variety of isolated tools exist that can be built upon or integrated into SDL. In this way, SDL reduces the impact of its users' lack of Semantic Web expertise without duplicating existing tools. For example, we use LOV to reuse existing vocabularies and foster interoperability. We also reduce the complexity of creating OWL and SHACL schemas by using the LinkML model to generate them. LinkML is a general-purpose modeling language. While it is designed to work in harmony with semantic RDF-based frameworks, it uses the human-readable data serialization language YAML, making it more approachable for non-RDF experts [18].

Gaia-X. Gaia-X, a European initiative for secure data sharing, employs federated services with Semantic Web technologies to ensure data trustworthiness. Participants and services within Gaia-X must provide credentials as specified by the W3C Verifiable Credentials Data Model. The content of these credentials is detailed in the Gaia-X Trust Framework, and the corresponding OWL and SHACL schemas can be retrieved from the Gaia-X Registry. Implementations of these schemas in LinkML also already exist and can be found in the repository of the Gaia-X Working Group Service Characteristics.
Since SDL was developed to streamline the creation of credentials and the associated OWL and SHACL schemas for use within and beyond the Gaia-X context, we decided to build SDL on top of LinkML in order to support these existing resources.

3. Semantic Data Link

Recognizing the need for enhanced semantic interoperability within data ecosystems, we propose the SDL framework. SDL offers a domain-agnostic approach to synchronizing data models, data services, and representation formats. This methodology paves the way towards democratizing the development of meaningful data descriptions for individuals lacking comprehensive proficiency in semantic technologies. This section describes SDL's conceptual foundation and key functionalities, the rationale behind its design, and its implementation, which together aim to address data heterogeneity and interoperability.

3.1. Conceptual Framework

SDL provides a uniform description framework for individual data records without mandating particular standards. This flexibility allows the integration of existing semantic descriptions from diverse domains, including but not limited to datasets, services, and digital twins, facilitating compatibility with industry standards such as OPC UA. Its intermediate layer bridges disparate data descriptions without requiring alterations to the source systems, which benefits a broad range of stakeholders. Publishers do not need to adopt new formats for compatibility, which lowers the entry barrier and therefore enhances data availability. Consumers benefit from standardized descriptions and from improved comparability and usability of data from various sources. In addition, enhanced data discoverability allows all users to precisely locate the data needed for their applications and services.
Figure 1 illustrates the stacked approach of SDL and emphasizes decentralized semantics by categorizing data into definitional (semantic), structural (morphologic), and contextual (pragmatic) aspects for complex entities such as services or datasets. This categorization aids in aggregating digital representations with rich contextual understanding. At its core, SDL features an Entity Core that encapsulates essential dataset or service attributes (e.g., provider ID, name) and assigns a unique identifier to each entity, streamlining referencing and interactions in the data ecosystem.

Figure 1: SDL Layers: Scalable approach to semantic annotation, dividing data characteristics into three principal layers (Pragmatic, Semantic, and Morphologic), supported by additional, extendable extensions for domain- and application-specific details. This stratification facilitates modular interoperability and provides a robust framework for data description across varied applications and domains. Practitioners can decide individually how many layers they want to use, depending on the application. Each attribute of each extension represents one layer.

This approach draws upon the theoretical foundation laid out by the Overlay Capture Architecture (OCA), a framework designed to enable data harmonization and privacy-compliant sharing across different governance frameworks. Extensions are critical in SDL because they add layers of metadata to the entity core, taking semantic, morphological, contextual, and other individual and application-specific dimensions into consideration. The selection of attributes for the SDL framework was significantly influenced by both the foundational principles of OCA and extensive deliberations within the GX4FM project's Expert Group on semantics, ensuring alignment with both the theoretical and the practical requirements of data interoperability.
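The relationship between the Entity Core and its extension layers can be sketched as a small data structure. This is a hypothetical Python illustration, not SDL's actual data model: the class names EntityCore and Layer, the aspect labels, the example attribute names, and the use of a urn:uuid identifier are all assumptions. Only the general shape follows the text: an entity core with a provider ID, a name, and a unique identifier, enriched by optional layers, where each extension attribute counts as one layer.

```python
import uuid
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Layer:
    """A single extension attribute; each such attribute counts as one layer."""
    aspect: str  # e.g. "semantic", "morphologic", "pragmatic", or a domain extension
    name: str    # attribute name (illustrative)
    value: str


@dataclass
class EntityCore:
    """Essential attributes of a described dataset or service (hypothetical model)."""
    provider_id: str
    name: str
    # Unique identifier assigned to every entity; a UUID URN is an assumption here.
    entity_id: str = field(default_factory=lambda: f"urn:uuid:{uuid.uuid4()}")
    layers: List[Layer] = field(default_factory=list)

    def add_layer(self, aspect: str, name: str, value: str) -> "EntityCore":
        """Attach one optional layer; users add only the layers their use case needs."""
        self.layers.append(Layer(aspect, name, value))
        return self

    def by_aspect(self) -> Dict[str, Dict[str, str]]:
        """Group layer attributes by aspect, e.g. for display or serialization."""
        grouped: Dict[str, Dict[str, str]] = {}
        for layer in self.layers:
            grouped.setdefault(layer.aspect, {})[layer.name] = layer.value
        return grouped


if __name__ == "__main__":
    # Hypothetical dataset description with one attribute per principal layer.
    dataset = (
        EntityCore(provider_id="provider-123", name="Hourly traffic counts")
        .add_layer("semantic", "description", "Vehicle counts per road segment")
        .add_layer("morphologic", "mediaType", "text/csv")
        .add_layer("pragmatic", "license", "CC-BY-4.0")
    )
    print(dataset.entity_id)
    print(dataset.by_aspect())
```

Keeping the core small and making every layer optional mirrors the stated design goal that practitioners decide how many layers an application needs.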
The attribute selection is an evolving, community-driven effort, open to incorporating additional attributes in future versions to better meet the emerging needs and insights of the diverse stakeholder community. This architecture supports modular interoperability and semantic integrity, is domain-agnostic, and harmonizes data models and formats across boundaries. Its layered design enriches data and service descriptions, improving comparability and interoperability to efficiently meet the needs of diverse domains and applications.

3.2. Rationale Behind SDL's Design Choices

The SDL framework seeks to overcome the limitations of existing semantic interoperability frameworks. It responds to industry calls for a solution that is more flexible and responsive than traditional, rigid systems, offering a user-friendly alternative that can evolve with technological and business needs. Compatibility with arbitrary ontologies creates a flexible framework for data description. This design choice directly responds to the industry's requirement for simplified semantic technologies that can accommodate the diverse backgrounds of users, including domain experts without any prior semantic knowledge. By decentralizing semantics, SDL broadens user participation, which leads to a more inclusive data ecosystem.

Figure 2: The frontend implementation of SDL allows the user to input the namespace, prefix, class, and related attributes described by the created schema. It supports the reuse of existing vocabularies by embedding LOV and guides the user by providing metadata layers to enhance the data description. This information is then translated using the LinkML framework to generate a SHACL graph, an OWL graph, and a LinkML schema as output. In a future release, the information stored in the LinkML schema will also be translated into a knowledge graph.
The presented extensions and the entity core concept enhance the data's contextual understanding, which is crucial for interoperability. This not only improves the precision of data descriptions but also ensures the ecosystem's adaptability and scalability, meeting industry needs for evolving data landscapes.

3.3. Implementation of SDL

The implementation of SDL consists of a front-end component that collects user input and a back-end component that uses this input to generate structured output in YAML and RDF formats. To collect user input, we provide a simple UI that allows users to define the data they want to describe. Since SDL is designed to help non-experts in Semantic Web technologies create RDF data, the UI gathers all the necessary information without requiring the user to apply any Semantic Web technologies. Figure 2 shows an overview of the main components of our SDL implementation. The user input component consists of (1) two text fields that collect the namespace and prefix to be used, (2) a section where the user can add the attributes necessary to describe the data, and (3) additional layers that extend the description with certain predefined attributes to improve it. While the currently implemented layers focus on the basic metadata description of datasets (see Figure 1), additional layers are possible and intended for future releases. Once the user input is complete, the schema generation component creates from it a YAML file that follows the LinkML model. LinkML provides a framework for generating RDF schemas from such a YAML file.

Table 1
Top 3 LOV results for the attribute longitude. The score indicates how well a prefixedName matches the search attribute (longitude) [19].

Rank | prefixedName | uri | score
1 | geo:long | http://www.w3.org/2003/01/geo/wgs84_pos#long | 0.856
2 | og:longitude | http://ogp.me/ns#longitude | 0.556
3 | geo:lat | http://www.w3.org/2003/01/geo/wgs84_pos#lat | 0.507
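The schema generation step can be illustrated with a small sketch that turns UI input into a LinkML-style YAML schema. This is not SDL's code: the AttributeInput structure, the build_linkml_schema function, and the example namespace are hypothetical, and the YAML is written by hand with the standard library (a JSON-quoted string is a valid YAML double-quoted scalar) to keep the sketch self-contained. The emitted keys (prefixes, default_prefix, imports, classes, attributes, slot_uri, range, required, multivalued, description, pattern) are elements of the LinkML metamodel. An accepted LOV suggestion, such as geo:long from Table 1, becomes a slot_uri plus a prefix entry; an attribute without one is left to LinkML's default URI under the user's namespace.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class AttributeInput:
    """One attribute as entered in the UI (hypothetical structure, not SDL's model)."""
    name: str
    range: str = "string"            # LinkML type, e.g. "string", "integer", "float"
    required: bool = False           # cardinality: at least one value
    multivalued: bool = False        # cardinality: more than one value allowed
    description: Optional[str] = None
    pattern: Optional[str] = None    # regex constraining the value
    # (prefixedName, uri) of an accepted LOV suggestion, if any.
    lov_choice: Optional[Tuple[str, str]] = None


def yaml_str(value: str) -> str:
    """Quote a string as a YAML double-quoted scalar (a JSON string is valid YAML)."""
    return json.dumps(value)


def build_linkml_schema(namespace: str, prefix: str, class_name: str,
                        attributes: List[AttributeInput]) -> str:
    """Render the user's input as a LinkML-style YAML schema document."""
    prefixes = {"linkml": "https://w3id.org/linkml/", prefix: namespace}
    for attr in attributes:
        if attr.lov_choice:
            curie, uri = attr.lov_choice
            vocab_prefix, local_name = curie.split(":", 1)
            if not uri.endswith(local_name):
                raise ValueError(f"{uri} does not end with {local_name}")
            # Vocabulary namespace = full URI minus the local name.
            prefixes[vocab_prefix] = uri[: len(uri) - len(local_name)]

    lines = [f"id: {yaml_str(namespace)}", f"name: {prefix}", "prefixes:"]
    lines += [f"  {p}: {yaml_str(u)}" for p, u in prefixes.items()]
    lines += [f"default_prefix: {prefix}", "default_range: string",
              "imports:", f"  - {yaml_str('linkml:types')}",
              "classes:", f"  {class_name}:", "    attributes:"]
    for attr in attributes:
        lines.append(f"      {attr.name}:")
        if attr.lov_choice:
            # Reuse the existing vocabulary term instead of minting a new one.
            lines.append(f"        slot_uri: {yaml_str(attr.lov_choice[0])}")
        # Without slot_uri, LinkML derives the URI from default_prefix and the name.
        lines.append(f"        range: {attr.range}")
        lines.append(f"        required: {str(attr.required).lower()}")
        lines.append(f"        multivalued: {str(attr.multivalued).lower()}")
        if attr.description:
            lines.append(f"        description: {yaml_str(attr.description)}")
        if attr.pattern:
            lines.append(f"        pattern: {yaml_str(attr.pattern)}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example: a parking spot whose longitude reuses geo:long (Table 1).
    print(build_linkml_schema(
        namespace="https://example.org/parking/",
        prefix="parking",
        class_name="ParkingSpot",
        attributes=[
            AttributeInput("longitude", range="float", required=True,
                           lov_choice=("geo:long",
                                       "http://www.w3.org/2003/01/geo/wgs84_pos#long")),
            AttributeInput("spotId", required=True, pattern=r"^[A-Z]{2}-\d+$",
                           description="Operator-assigned identifier"),
        ],
    ))
```

A file produced this way could then be passed to LinkML's generators, for example the gen-owl and gen-shacl command-line tools, to obtain OWL and SHACL graphs.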
We chose LinkML as the basis for SDL to seamlessly integrate current and future Gaia-X schemas modeled in LinkML, leveraging existing work, and because its framework allows a straightforward implementation and the modeling of user-defined attributes through LinkML's predefined parameters. These include (1) required and multivalued to model cardinality constraints, (2) a description of the attribute, (3) a regex pattern that constrains the attribute value, and (4) a URI that can point to already defined attributes to facilitate the reuse of existing vocabularies. We support both vocabulary reuse and the creation of new attributes by automatically linking to the Linked Open Vocabularies (LOV) database. When users create and name an attribute, a LOV search is executed automatically and the top ten results are presented for selection; an example is shown in Table 1. Any of these attributes can be selected if it meets the user's requirements. If none of the suggestions fit, a new attribute is created under the previously defined namespace. Finally, the LinkML framework generators are used to generate an OWL graph and a SHACL graph from the LinkML YAML file created from the user input. Although LinkML also provides generators for schemas in other formats, we limit the output to OWL and SHACL graphs, as these are the relevant schemas in the context of the Gaia-X dataspace initiative; this can easily be adapted for other use cases. A future implementation of SDL will also use the LinkML YAML file to transfer the stored information into a knowledge graph, which plays an important role in promoting semantic interoperability. The current implementation of SDL, as described in this section, is available as a GitHub repository, which also contains instructions for installing SDL locally using Docker.

4. Summary and Future Work

Conclusion.
In summary, SDL represents a significant step towards seamless semantic interoperability within data ecosystems. By abstracting the complexities of domain-specific ontologies and providing a user-friendly, multi-layered architecture, SDL democratizes data description and fosters an inclusive ecosystem of data exchange. The framework's potential is evidenced by its applicability in the mobility sector, and future enhancements aim to extend its utility across various domains. The development of SDL aims to improve interoperability between existing data infrastructures while lowering the complexity hurdles of semantic technologies.

Limitation. SDL leverages the LinkML framework to generate OWL and SHACL graphs, and its limitations fall into two categories: (1) LinkML's OWL and SHACL generators do not fully translate the constraints defined in the schema into the corresponding graphs; for example, properties such as any_of or equals_string_in are not translated; and (2) the LinkML schema cannot represent certain semantic details that are expressible in OWL or SHACL. Addressing these limitations requires expanding the LinkML schema to cover broader semantic expressions and enhancing the generators to translate all properties. Continued refinement will explore extending the existing LinkML schema and evaluating alternative frameworks to ensure SDL's adaptability to future semantic interoperability requirements.

Future Work. Moving forward, we have identified several areas for future research. Firstly, we suggest enhancing SDL through the creation of a knowledge graph, composed of the descriptions generated by SDL, that enables advanced interoperability. Secondly, SDL's multi-layered architecture needs to be refined to broaden its adoption.
Lastly, further evaluation in a real-world setting is essential to validate the effectiveness and applicability of these advancements. The ultimate goal of these proposed areas of research is to achieve scalable and resilient interoperability and to democratize data usage across diverse ecosystems, without requiring specialized semantic expertise.

Acknowledgments

This work was supported by the German Federal Ministry for Economic Affairs and Climate Action and by the European Commission, whose funding has been crucial to our research efforts. We also thank the Core Working Group Semantics of the Gaia-X 4 Future Mobility project family for their significant contributions to the Semantic Data Link, enhancing our work on data interoperability and governance.

References

[1] R. Henßen, M. Schleipen, Interoperability between OPC UA and AutomationML, Procedia CIRP (2014). doi:10.1016/j.procir.2014.10.042.
[2] S. Paiva, M. A. Ahad, G. Tripathi, N. Feroz, G. Casalino, Enabling technologies for urban smart mobility: Recent trends, opportunities and challenges, Sensors (2021). doi:10.3390/s21062143.
[3] A. Kouroubali, D. G. Katehakis, The new European interoperability framework as a facilitator of digital transformation for citizen empowerment, Journal of Biomedical Informatics (2019). doi:10.1016/j.jbi.2019.103166.
[4] F. Ardjani, D. Bouchiha, M. Malki, Ontology-alignment techniques: Survey and analysis, International Journal of Modern Education and Computer Science (2015). doi:10.5815/ijmecs.2015.11.08.
[5] M. A. Khoudja, M. Fareh, H. Bouarfa, Ontology matching using neural networks: Survey and analysis, International Conference on Independent Component Analysis and Signal Separation (2018). doi:10.1109/icass.2018.8652049.
[6] A. H. Nejhadi, B. Shadgar, A. Osareh, Ontology alignment using machine learning techniques, International Journal of Computer Science and Information Technology (2011). doi:10.5121/ijcsit.2011.3210.
[7] A. Khamparia, B. Pandey, Comprehensive analysis of semantic web reasoners and tools: a survey, Education and Information Technologies (2017). doi:10.1007/s10639-017-9574-5.
[8] M. Fahad, N. Moalla, A. Bouras, Detection and resolution of semantic inconsistency and redundancy in an automatic ontology merging system, Journal of Intelligent Information Systems (2012). doi:10.1007/s10844-012-0202-y.
[9] X. Liu, Q. Tong, X. Liu, Z. Qin, Ontology matching: State of the art, future challenges, and thinking based on utilized information, IEEE Access (2021). doi:10.1109/access.2021.3057081.
[10] A. Bento, A. Zouaq, M. Gagnon, Ontology matching using convolutional neural networks, International Conference on Language Resources and Evaluation (2020).
[11] Z. Boukhers, C. Lange, O. Beyan, Enhancing data space semantic interoperability through machine learning: a visionary perspective, The Web Conference (2023). doi:10.1145/3543873.3587658.
[12] M. Stäbler, T. M. Guggenberger, W. DanDan, R. Mrasek, F. Köster, C. Langdon Schlueter, Bridging Data Domains: Towards Semantic Interoperability in Heterogeneous Data Ecosystems and Data Spaces, [Manuscript submitted for publication] (2024).
[13] J. Theissen-Lipp, M. Kocher, C. Lange, S. Decker, A. Paulus, A. Pomp, E. Curry, Semantics in dataspaces: Origin and future directions, The Web Conference (2023). doi:10.1145/3543873.3587689.
[14] P. Hitzler, A review of the semantic web field, Communications of the ACM (2021). doi:10.1145/3397512.
[15] A. Patel, S. Jain, Present and future of semantic web technologies: a research statement, International Journal of Computers and Applications (2019). doi:10.1080/1206212x.2019.1570666.
[16] A. Hogan, The semantic web: Two decades on, Semantic Web (2020). doi:10.3233/sw-190387.
[17] F. Gandon, A survey of the first 20 years of research on semantic web and linked data, Ingénierie des Systèmes d'Information (2018). doi:10.3166/isi.23.3-4.11-38.
[18] S. Moxon, H. Solbrig, D. Unni, D. Jiao, R. Bruskiewich, J. Balhoff, G. Vaidya, W. D. Duncan, H. Hegde, M. Miller, M. H. Brush, N. Harris, M. Haendel, C. Mungall, The linked data modeling language (LinkML): A general-purpose data modeling framework grounded in machine-readable semantics, International Conference on Biomedical Ontology (2021).
[19] P.-Y. Vandenbussche, G. A. Atemezing, M. Poveda-Villalón, B. Vatant, Linked open vocabularies (LOV): a gateway to reusable semantic vocabularies on the web, Semantic Web (2016). doi:10.3233/sw-160213.