FAIRness in Dataspaces: The Role of Semantics for Data Management

Marco Hauff¹, Lina Molinas Comet², Paul Moosmann², Christoph Lange²,³, Ioannis Chrysakis⁴,⁵,⁶ and Johannes Theissen-Lipp²,³

1 Fraunhofer Institute for Integrated Circuits IIS, Nuremberg, Germany
2 Fraunhofer Institute for Applied Information Technology FIT, Aachen, Germany
3 RWTH Aachen University, Aachen, Germany
4 IDLab, Department of Electronics and Information Systems, Ugent, imec, Belgium
5 DTAI, Department of Computer Science, KU Leuven, Belgium
6 Netcompany-Intrasoft, Research and Innovation Development Department, Luxembourg

Abstract
Effective data governance and management are necessary but challenging prerequisites for creating value from data assets. Findability, accessibility, interoperability, and reusability are guiding principles for data owners in managing and archiving datasets, known as the FAIR Principles. Dataspaces provide an infrastructure for heterogeneous, multi-source data integration and cross-organizational data sharing that would benefit from FAIR compliance. In this paper, we propose semantics as an approach to ensure data FAIRness, enabling machine-aided discovery and reuse of data in different formats and structures. We conduct a systematic literature review to translate the overarching principles into ten concrete methods that can be implemented using semantic technologies. In addition, we analyze three mature dataspace initiatives for their adherence to the FAIR Principles and describe their specific implementation. In summary, we argue that semantics provide a common and infrastructure-independent foundation for data management in emerging dataspaces.

Keywords
Dataspaces, Data Spaces, FAIR Data, Semantics, Data Sharing

1. Introduction
Data is a valuable strategic resource for competitiveness, driving innovation and the digital transformation of organizations. Business value is created by using and reusing data assets [1].
Consequently, organizations across all industries are adapting their strategies to incorporate data into their processes and take advantage of opportunities for optimization and automation [2]. Relevant data may originate from various sources belonging to different actors in the supply chain or from other market participants, which could also include competitors. Therefore, data-driven value is often created by combining datasets of multiple actors [3]. By facilitating cross-organizational data sharing, dataspaces are becoming increasingly important for value co-creation from distributed data [4, 5]. Their potential is also being recognized at a political level, and governmental funding for dataspace initiatives such as Gaia-X encourages the development of secure infrastructures for data sharing [6].

The Second International Workshop on Semantics in Dataspaces, co-located with the Extended Semantic Web Conference, May 26–27, 2024, Hersonissos, Greece
Email: marco.hauff@iis.fraunhofer.de (M. Hauff); lina.teresa.molinas.comet@fit.fraunhofer.de (L. M. Comet); paul.moosmann@fit.fraunhofer.de (P. Moosmann); christoph.lange-bever@fit.fraunhofer.de (C. Lange); ioannis.chrysakis@ugent.be (I. Chrysakis); theissen-lipp@dbis.rwth-aachen.de (J. Theissen-Lipp)
ORCID: 0009-0006-1619-8762 (M. Hauff); 0000-0001-5446-6947 (L. M. Comet); 0009-0005-2114-8578 (P. Moosmann); 0000-0001-9879-3827 (C. Lange); 0000-0003-2665-4056 (I. Chrysakis); 0000-0002-2639-1949 (J. Theissen-Lipp)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

Handling the increasing volume of unstructured data necessitates proper data governance and management practices. Typically, decisions on their concrete implementation are left to the data owner and remain undefined [7].
This heterogeneity poses a major challenge for dataspaces. Therefore, various dataspace initiatives agree on the four principles of findability, accessibility, interoperability, and reusability (FAIR) as common data management practices to enable smooth integration of multi-source data [8]. However, the FAIR Principles are described at a general level and allow for various implementation options, which complicates standardized data management. Semantics enrich data with context and thus enable automated search and processing. Using common, established semantic vocabularies can support FAIRness – i.e., compliance with the FAIR Data Principles – in dataspaces [8, 9]. Previous literature deals with individual aspects of FAIR, which need to be synthesized to provide a guideline for data management in dataspaces. Therefore, we aim to answer the following research question: How can semantics contribute to FAIR compliance in dataspaces?

The remainder of the paper is structured as follows. First, we detail the concepts of dataspaces, semantics, and the FAIR Principles. Second, we describe our methodological approach. Third, we present the semantic approaches we identified for each FAIR Principle. Fourth, we assess three mature dataspace initiatives for their FAIR compliance. Finally, we conclude with a discussion of our findings.

2. Background
Dataspaces. Database management systems are limited to structured data that correspond to a predefined schema. Consequently, these traditional systems are unsuitable for integrating and processing the increasing volume of heterogeneous data coming from multiple data sources [9, 10]. Franklin et al. [11] describe dataspaces as an abstraction for managing data regardless of its format or structure. Unlike database management systems, dataspaces do not force the complete integration of data.
Instead, they adopt an incremental integration approach and gradually improve data accessibility and interoperability by leveraging Semantic Web technologies for metadata management [11, 12]. Recently, dataspaces have been attributed immense potential for value co-creation from distributed data, providing a decentralized infrastructure for cross-organizational data sharing while maintaining data sovereignty [4, 13]. Large-scale, government-funded initiatives such as Gaia-X¹ aim to establish a standardized platform that facilitates data sharing and minimizes the effort required to integrate multi-source datasets. Repetitive and low-level administrative tasks such as search functionality, naming conventions, data lineage, and access management should be reduced and streamlined [9, 11]. So far, dataspace initiatives are still in their infancy and standards have yet to be defined.

1 https://gaia-x.eu

FAIR Data Principles. Data governance and management are essential for the reusability of datasets. In many cases, individual procedures are applied that remain undefined and opaque, thus complicating or preventing data reuse by third parties. Wilkinson et al. [7] define four foundational principles, known as the FAIR Data Principles, to standardize inconsistent data management and archiving practices in the scientific environment. Within the FAIRification process, digital resources become manageable. The underlying four principles are interconnected but can be implemented independently of each other. Besides the actual data, the process also includes the algorithms, tools, and workflows relevant to data collection. Overall, compliance with the FAIR Principles reduces human intervention by enabling machines to automatically discover, process, and integrate data [7, 14]. Efficient data management in line with the FAIR Principles is considered particularly important for dataspaces to facilitate data sharing [15].

Semantics.
A common language for describing data assets is crucial for managing and sharing data in dataspaces [10]. Semantics are predefined vocabularies to describe relationships between data entities in a machine-readable format, supporting data integration and enabling automated data discovery [16]. The variety of terminologies poses a critical challenge for communication and uniform understanding between different parties [10]. Seamless data integration from multiple sources requires semantic interoperability, i.e., a shared understanding of the vocabulary. Ontologies aim to reduce semantic heterogeneity through standardized and generally accepted vocabularies [17].

3. Methodology
This paper investigates and answers the research question through a systematic literature review. The process is based on the structures proposed by vom Brocke et al. [18] and Webster & Watson [19], which describe the path from general to specific results. Following vom Brocke et al. [18], we begin our research by conceptualizing the topic and delineating the search fields: ‘dataspaces’, ‘FAIR Data Principles’, and ‘semantics’. The purpose of this study is to examine the relationship between semantics and FAIR dataspaces. To achieve this, we consider each of the FAIR Data Principles separately. Thus, we use the search string: "dataspace*" OR "data Space*" AND ("FAIR" OR "find*" OR "access*" OR "interoper*" OR "reuse*") AND "semantic*". In the second step, we search the following databases for relevant literature: ACM Digital Library, AIS eLibrary, IEEE Xplore, and Scopus. For ACM and Scopus, we limit the search to the title, keywords, and abstract. The search was conducted in February 2024. ACM yielded n=30 articles, AIS n=58, IEEE Xplore n=225, and Scopus n=163 articles. We evaluate the identified articles in an initial review by scanning the title, abstract, and keywords for relevance. Articles that do not address any of our defined search fields are excluded.
Duplicates and non-peer-reviewed articles are also removed. After this first phase, we are left with n=78 articles. We then perform an in-depth screening of the abstract, methodology, results, discussion, and conclusion, removing those that do not address synergies between semantics and FAIR dataspaces. After this screening, n=26 articles remain. In addition, we perform a forward and backward search as proposed in [19], resulting in n=21 additional articles. These articles are analyzed in the same cycle as those retrieved directly from the search string, resulting in a total of n=37 articles.

4. Results
From the literature, we extract the semantic approaches that are applicable in dataspaces. We code and list the ten most frequently mentioned ones, assign them to the underlying FAIR Principle according to their intended use, and derive from this the requirements profile for dataspaces shown in Table 1. Then, based on the analysis of architecture documents and code repositories, we compare the extent to which the semantics in the three currently most popular dataspace initiatives ensure FAIRness according to these criteria.

4.1. FAIRness Requirements
Findability. To ensure findability in dataspaces, three requirements can be met by semantics: self-descriptions, metadata, and catalogs. Digital resources require globally unique and persistent identifiers (PIDs), such as Uniform Resource Identifiers, specified in RFC 3986². These identifiers are widely used in current initiatives, such as Gaia-X, allowing resources to be addressed, modified, and shared separately, even across multiple dataspaces [8]. In addition to PIDs, the self-description of resources also includes information about the entities’ attributes and relationships, often expressed as subject-predicate-object triples that serve as metadata [20, 21]. Therefore, the Resource Description Framework (RDF) data model is commonly used to model these data structures.
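To make the triple-based self-description concrete, the following is a minimal sketch in plain Python rather than an RDF library. The asset and participant identifiers under example.org are hypothetical, the `match` and `find_datasets` helpers are illustrative names introduced here, and the vocabulary IRIs are the public RDF, DCAT, and Dublin Core Terms ones. It is not taken from any of the assessed initiatives.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Public vocabulary terms (RDF, DCAT, Dublin Core Terms).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
DCAT_DATASET = "http://www.w3.org/ns/dcat#Dataset"
DCT_TITLE = "http://purl.org/dc/terms/title"
DCT_PUBLISHER = "http://purl.org/dc/terms/publisher"

# Hypothetical identifiers, used only for illustration.
ASSET = "https://example.org/assets/sensor-readings-2024"
PARTICIPANT = "https://example.org/participants/acme"

# A self-description as a list of subject-predicate-object triples.
# Simplification: real RDF distinguishes IRIs from literals; here both
# are plain strings.
SELF_DESCRIPTION: List[Triple] = [
    (ASSET, RDF_TYPE, DCAT_DATASET),
    (ASSET, DCT_TITLE, "Sensor readings 2024"),
    (ASSET, DCT_PUBLISHER, PARTICIPANT),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts as a wildcard."""
    for triple in triples:
        if (
            (s is None or triple[0] == s)
            and (p is None or triple[1] == p)
            and (o is None or triple[2] == o)
        ):
            yield triple


def find_datasets(triples: Iterable[Triple]) -> List[str]:
    """Return the identifiers of all resources typed as dcat:Dataset."""
    return [s for s, _, _ in match(triples, p=RDF_TYPE, o=DCAT_DATASET)]


if __name__ == "__main__":
    print(find_datasets(SELF_DESCRIPTION))
```

Because every statement hangs off a globally unique identifier, descriptions from several participants can in principle be concatenated into one list and queried with the same pattern, which is the property the findability requirements rely on.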
Using RDF vocabularies, such as the Data Catalog Vocabulary (DCAT)³, dataspace participants can exchange standardized information, including asset name, type, and lineage. This results in data structures that are both human and machine actionable. Catalogs and search tools, such as the RDF query language SPARQL, allow users to access shared information and browse descriptions to find valuable data for their specific use case [22].

Accessibility. Semantics are crucial for accessibility in FAIR-compliant dataspaces, particularly regarding authentication, authorization, and pipeline. Like assets, dataspace participants receive a unique and descriptive identity, managed and verified by external identity providers. Trust between identities is ensured through cryptographic verification, which allows only authenticated actors to engage within the dataspace [23, 24]. Due to the sensitivity of data, global sharing is not desired, so authorization for access and usage rights is required. Thus, participants manage their offerings, data or services, via policy frameworks to determine the use and scope of data. Standards such as the Open Digital Rights Language (ODRL) facilitate the creation, communication, and execution of contracts in the dataspace [25, 26]. Leveraging semantics enables machines to understand interfaces and processes, and thus allows for automation [27, 28].

Interoperability. Dataspaces promote the exchange of data between parties. To ensure a seamless process, dataspaces must be interoperable. This requires that participants exchanging data are able to comprehend and integrate them. Semantics can aid in standardization and integration. Usually, dataspaces offer a vocabulary hub, providing an overview of commonly established and standardized terms and vocabularies as examples and best practices for describing data, services, and contracts [37, 46].
The hub facilitates the exchange of documentation among relevant parties, promoting a shared understanding [29]. Therefore, it enables the integration of data from different sources, such as multiple datasets in a structured format (e.g., RDF), which can be aggregated and queried together. This can reveal previously unknown connections between data [50, 51].

2 https://datatracker.ietf.org/doc/html/rfc3986
3 https://www.w3.org/TR/vocab-dcat-3/

Table 1
Requirements for FAIRness in dataspaces

Requirement | Objective | References
FINDABLE
Self-Description | Use unique identifiers and metadata self-descriptions to accurately and unambiguously characterize participants, services, and assets. | 8, 20, 21, 22, 29
Metadata | Use standard vocabularies and ontologies to ensure data is described accurately and comprehensibly. | 10, 21, 29, 30, 31, 32, 33
Catalog | Use a publicly available catalog that aggregates participants, services and assets to enable discoverability. | 11, 21, 34, 35, 36, 37, 38, 39
ACCESSIBLE
Authentication | Use a self-description to participate in the dataspace that can be verified by an external identity provider. | 23, 24, 29, 40, 41
Authorization | Use policies to manage data sharing among participants and determine the constitutions and parties involved. | 25, 26, 29, 42, 43, 44
Pipeline | Use service descriptions to automate interactions and processing pipeline. | 20, 21, 27, 28, 45
INTEROPERABLE
Standardization | Use shared ontologies and community standards to ensure consistent understanding and interoperability. | 10, 29, 30, 32, 33, 37, 46, 47
Integration | Use reasoning and aggregation techniques to incorporate heterogeneous data sources. | 10, 36, 48, 49, 50, 51
REUSABLE
Reliability | Use verification and validation tools to enhance the integrity, quality, and usability of data assets. | 32, 48, 52, 53, 54
Enrichment | Use contextual metadata annotations to enhance data assets and facilitate advanced queries and links. | 26, 29, 31, 36, 45, 48, 52

Reusability.
All of the previously mentioned principles contribute to the concept of reusability. For example, if an entity is easily findable, it is more likely to be reused. Similarly, entities that are easy to comprehend because they follow a common standard are more likely to be reused. Additionally, semantics can improve the reliability and enrichment of data. Reliability assessment and assurance can be achieved by using shapes to verify the format of incoming data and automating standardization, as mentioned under interoperability. RDF validation is often performed against shapes defined in the Shapes Constraint Language (SHACL) or Shape Expressions (ShEx), which define the requirements that an RDF graph must meet [53, 54]. Enrichment can be applied throughout the data lifecycle, mapping raw data into a structured format and augmenting it with additional information from external sources [45].

4.2. State of Development
Several endeavors aim to define principles and guidelines for the creation and development of dataspaces and to transfer concepts from theory into practice. Today, several well-known initiatives in various application areas have reached distinct stages of development and maturity. Initially, such initiatives define the requirements, principles and specifications to be considered when implementing dataspaces. Subsequently, these specifications can lead to implementations or elements that facilitate the establishment of dataspaces in which different actors or participants can exchange data. These implementations can be sector-specific, such as Catena-X⁴ in the automotive sector, Industry 4.0⁵ with a focus on manufacturing, and Prometheus-X⁶ in the education and skill sector, or more general, such as FAIR Data Spaces⁷ at the interface between research and various application domains.
It is also worth mentioning that additional projects looking to facilitate the implementation of dataspaces exist, such as the Eclipse Dataspace Components (EDC)⁸, whose adoption in several projects is documented. The EDC is a framework for data sharing in a cross-organizational and sovereign manner supporting specifications from IDS and Gaia-X. One of its main components is the EDC Connector, a well-defined interface exposing the dataspace participants’ back-office infrastructure, aiming to bring together infrastructures that would otherwise be incompatible and lack access and usage control. The connector provides functionalities such as discovering, connecting, automated contract negotiation, policy enforcement, and auditing processes. The Minimum Viable Dataspace⁹ is a sample implementation of the EDC, leveraging it and showing its capabilities. Currently, these two do not support semantics, but one could extend the connector to include such functionality. Other implementations of connectors exist, such as those listed in [55]. The Sovity Connector¹⁰, based on the EDC and the Dataspace Connector, extends the functionality of the EDC. For example, it allows the EDC to communicate with the catalog called IDS Metadata Broker¹¹.

In this paper, we focus solely on assessing the leading initiatives that are most frequently mentioned in research: Gaia-X, International Data Spaces (IDS), and the European Open Science Cloud (EOSC). These initiatives define the architectural frameworks for dataspaces [56].
Our assessment of the initiatives consists of reviewing the official documentation of the specifications and the repositories with available code. In this way, we want to find out whether the written specifications reflect concrete implementations and how these align with the dataspace requirements introduced in Table 1. The results of our assessment are shown in Table 2. There, we summarize our findings as follows: the symbol ✓ indicates that the requirement is fully implemented; (✓) indicates that the requirement is only partially supported, i.e., it is present as a specification but not yet implemented; and * indicates that the requirement is not even part of the specification.

4 https://catena-x.net/en/about-us/operating-environment-1
5 https://www.plattform-i40.de/IP/Redaktion/EN/Downloads/Publikation/PositionPaper-DataSpace.html
6 https://dataspace.prometheus-x.org/
7 https://www.nfdi.de/fair-data-spaces/?lang=en
8 https://github.com/eclipse-edc
9 https://github.com/eclipse-edc/MinimumViableDataspace
10 https://github.com/sovity
11 https://github.com/International-Data-Spaces-Association/metadata-broker-open-core

Table 2
The evaluated FAIRness of current dataspace initiatives

Requirement | Gaia-X | IDS | EOSC
Self-Description | ✓ | ✓ | (✓)
Metadata | (✓) | ✓ | ✓
Catalog | ✓ | ✓ | ✓
Authentication | ✓ | ✓ | (✓)
Authorization | ✓ | ✓ | (✓)
Pipeline | * | * | *
Standardization | ✓ | ✓ | ✓
Integration | (✓) | * | *
Reliability | ✓ | ✓ | *
Enrichment | * | * | ✓

Gaia-X. Gaia-X aims to establish an ecosystem in which data is shared and made available in a trustworthy environment. To achieve this goal, Gaia-X defines several federation services, where a federation implements and federates a dataspace. These services are grouped into four sets, namely Identity & Trust, Federated Catalogue, Sovereign Data Exchange, and Compliance¹². One major reference implementation of federation services is provided by the XFSC (Cross Federation Service Components) repositories under the Eclipse Foundation¹³.
The Identity & Trust set of services contains an Authentication/Authorization service, which implements these requirements as defined in Table 1. The Federated Catalogue set consists of the Federated Catalogue as well as tool support for self-descriptions. The implementation of the XFSC Federated Catalogue enables the management of self-descriptions and also allows for validation against SHACL shapes as a measure towards reliability. Metadata and standardization are addressed in the Gaia-X Trust Framework¹⁴. This framework specifies a Gaia-X ontology, which has to be used to describe all participants and services of a Gaia-X dataspace. Additionally, it specifies constraints, which are modelled using SHACL. All ontology and SHACL graphs can be accessed using the Gaia-X Registry¹⁵. The Gaia-X specifications also prescribe the re-use of certain metadata vocabularies such as DCAT, for example, for the data exchange services¹⁶. Integration is partly specified by the notion of service compositions¹⁷, which allow the aggregation of several services. This way, a service can re-use data that emerged from the application of another service to apply further processing steps. The remaining requirements defined in Table 1, namely pipeline and enrichment, are currently not addressed by Gaia-X specifications or implementations.

12 https://www.gxfs.eu/set-of-services/
13 https://gitlab.eclipse.org/eclipse/xfsc/
14 https://docs.gaia-x.eu/policy-rules-committee/trust-framework/22.10/
15 https://registry.lab.gaia-x.eu/v1/docs
16 https://docs.gaia-x.eu/technical-committee/data-exchange/latest/dewg/

IDS. The latest developments of the IDS Dataspace Protocol¹⁸ reflect the specifications regarding the Dataspace Information Model, which covers the definitions of the main concepts to be considered in IDS-based dataspaces.
The implementation of connectors in an IDS dataspace context must respond with a JSON-LD data object compliant with JSON Schemas and SHACL shapes. Moreover, participants describe themselves and their resources and infrastructure. These self-descriptions can be registered, published, queried, and maintained by the IDS Metadata Broker. The self-descriptions are metadata, and the use of existing Semantic Web standards is favored. In this sense, the IDS Information Model [23] is modeled as an RDF/OWL ontology and reuses concepts of the DCAT, ODRL, Time, DQV, and other vocabularies¹⁹. Moreover, SHACL shapes are available for testing²⁰. There is also an implementation of the Dataspace Connector²¹, which uses the IDS Messaging Services for the functionalities and message handling, as specified in the IDS Reference Architecture Model 4.0²², and integrates the IDS Information Model. Moreover, the Catalog Protocol specifies how a data consumer requests a catalog from a catalog service. Such a catalog is DCAT and ODRL compliant. In the latest version of IDS RAM 4.0, the Identity Authority role includes specifications about the authorization functionalities and the clearing house²³, which serves as an intermediary to provide clearing and settling services for the data exchange transactions in the IDS. Such a component is similar to the Data Exchange Logging Service from Gaia-X²⁴. Additionally, the Dynamic Attribute Provisioning Service²⁵ is part of the Identity Provider to verify the attributes of the participants and connectors in the dataspace.

EOSC. The EOSC initiative addresses the FAIR Principles, with interoperability as a core concept, and aims to create a shared data space for research, science and innovation data management while ensuring the protection of data through EU laws [57].
The specified requirements regarding semantic interoperability are the following: there should be a definition of the concepts, their metadata and data schemas, and they should be publicly available; semantic artefacts should be FAIR, available preferably using open licenses, and have associated documentation, and support maintenance; a metadata model, based on existing standards, should be available to allow discovery over existing federated research data and metadata; there should be building blocks and protocols to facilitate the federation and harvesting of semantic artefacts catalogs.

17 docs.gaia-x.eu/technical-committee/architecture-document/latest/component_details/#service-composition
18 https://docs.internationaldataspaces.org/ids-knowledgebase/v/dataspace-protocol/overview/readme
19 https://international-data-spaces-association.github.io/InformationModel/docs/index.html
20 https://github.com/International-Data-Spaces-Association/InformationModel/tree/develop/testing
21 https://github.com/International-Data-Spaces-Association/DataspaceConnector
22 https://docs.internationaldataspaces.org/ids-knowledgebase/v/ids-ram-4/layers-of-the-reference-architecture-model/3-layers-of-the-reference-architecture-model/3-1-business-layer/3_1_1_roles_in_the_ids
23 https://github.com/International-Data-Spaces-Association/IDS-G/tree/main/Components/ClearingHouse
24 http://docs.gaia-x.eu/technical-committee/architecture-document/latest/enabling_services/
25 https://github.com/International-Data-Spaces-Association/IDS-G

The technical layer defines a common security and privacy framework covering authorization and authentication functionalities.
The FAIRCORE4EOSC²⁶ project focuses on the development of core components²⁷ for the EOSC, namely: the Compliance Assessment Toolkit to provide services related to policies and vocabulary services; the EOSC Data Type Registry to register the PID metadata elements including provenance information; the Metadata Schema and Crosswalk Registry to allow registered users to create, register and version schemas and crosswalks with PIDs; the EOSC PID Meta Resolver to map items into records; the Research Activity Identifier Service to provide persistent identifiers for research projects; the EOSC Research Discovery Graph Service to allow discovery of EOSC elements from the catalog (resources and communities); and the EOSC Research Software APIs and Connectors to guarantee the enduring preservation of research software across various disciplines. Although metadata registration and vocabulary services are mentioned, the concept of self-descriptions is not noted.

Summary. Table 2 shows that two of the initiatives, Gaia-X and IDS, satisfy most of the requirements for semantics, while EOSC has specifications for such requirements, but they are not all implemented. It also shows that these initiatives mostly lack specifications for the more specific functionalities, such as the pipeline, integration, and enrichment requirements for dataspaces, except EOSC, which offers more advanced features covering research discovery through its metadata. Such findings indicate that the initiatives are currently focused on defining the main elements of dataspaces. These efforts cover aspects like authentication and authorization, offering a catalog of available resources for transfer, ensuring self-descriptions and metadata to identify these resources, promoting standardization by using existing (W3C) standards, and providing SHACL shapes for validation and reliability.
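The per-initiative coverage summarized above can be recomputed directly from Table 2. The following is a small sketch of that tally in Python; the marks are transcribed from Table 2, while the names `TABLE_2`, `INITIATIVES`, and `tally` are introduced here purely for illustration and are not part of the original assessment.

```python
from collections import Counter
from typing import Dict, Tuple

# Symbols as defined for Table 2.
FULL, PARTIAL, NONE = "✓", "(✓)", "*"

INITIATIVES: Tuple[str, ...] = ("Gaia-X", "IDS", "EOSC")

# Marks transcribed from Table 2, in the column order of INITIATIVES.
TABLE_2: Dict[str, Tuple[str, str, str]] = {
    "Self-Description": (FULL, FULL, PARTIAL),
    "Metadata": (PARTIAL, FULL, FULL),
    "Catalog": (FULL, FULL, FULL),
    "Authentication": (FULL, FULL, PARTIAL),
    "Authorization": (FULL, FULL, PARTIAL),
    "Pipeline": (NONE, NONE, NONE),
    "Standardization": (FULL, FULL, FULL),
    "Integration": (PARTIAL, NONE, NONE),
    "Reliability": (FULL, FULL, NONE),
    "Enrichment": (NONE, NONE, FULL),
}


def tally(table: Dict[str, Tuple[str, str, str]]) -> Dict[str, Counter]:
    """Count full, partial, and missing requirements per initiative."""
    counts = {name: Counter() for name in INITIATIVES}
    for marks in table.values():
        for name, mark in zip(INITIATIVES, marks):
            counts[name][mark] += 1
    return counts


if __name__ == "__main__":
    for name, counter in tally(TABLE_2).items():
        print(name, counter[FULL], counter[PARTIAL], counter[NONE])
```

Under this transcription, Gaia-X has six fully implemented, two partially supported, and two unspecified requirements; IDS has seven, zero, and three; and EOSC has four, three, and three. The counts treat all ten requirements as equally weighted, which is a simplification of this sketch rather than a claim of the assessment.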
Another important remark, based on our research of the repositories of the initiatives and connectors, is that some IDS components, such as the IDS Connector, are not currently maintained in their original repositories. However, Sovity is supporting the maintenance of the IDS Connector and extending some of its functionalities. Regarding Gaia-X, some specifications and implementations have not been updated since their initial release, e.g., the XFSC components implement the specifications given by the 21.03 version of the Trust Framework, which has since been replaced by the newer 22.10 version²⁸.

26 https://faircore4eosc.eu/
27 https://faircore4eosc.eu/eosc-core-components
28 https://docs.gaia-x.eu/framework/?tab=software

5. Discussion and Conclusion
In our paper, we derive a framework to assess the semantic FAIRness of dataspaces, which we directly apply to investigate the FAIRness of three mature initiatives. Our framework provides a basis for rating further initiatives and third-party developments that also deal with the development of dataspaces. Research and innovation projects such as MobiSpaces [58], Green.Dat.AI²⁹ or Flex4Res³⁰, which aim to use dataspace technology, have not yet been assessed. The same applies to projects funded by companies or through private programs that already provide the essential elements for setting up dataspaces, e.g., Solid [59].

Theoretical Contribution. This paper synthesizes the existing literature to provide a comprehensive overview of the evolution and current state of semantics in dataspaces. By extending the FAIR Principles, we propose a framework with tangible requirements that provide semantic clarity and interoperability. Accordingly, our paper establishes a foundation for evaluating future dataspace approaches based on the extended FAIR Principles, promoting a structured and objective assessment process.

Practical Implication.
Connector implementations can leverage our extended FAIR framework to derive the requirements that a dataspace should fulfill. Vocabularies could be further developed and concretized specifically for the identified requirements to cover potential demands.

Limitations. The analysis is limited to a specific number of dataspace developments, potentially missing emerging trends and niche innovations. Insight into the development and operational intricacies of these dataspaces is limited, relying on publicly available information and academic publications. Standardization processes within dataspaces often follow a pay-as-you-go approach, which introduces variability in the implementation of and adherence to the proposed FAIR Principles, affecting the generalizability of our findings.

Conclusion. Our literature review suggests that semantic approaches in theory and practice can guide the FAIRness of dataspaces. Semantic tools focus on standardization, establishing a uniform understanding through shared and standardized vocabularies. They support all of the FAIR Data Principles. Utilizing standardized identifiers, shared ontologies, and rich contextual metadata, semantics enhance the individual aspects of the FAIR Principles and create a more interconnected and efficient data ecosystem. Current dataspace initiatives demonstrate similar approaches, with most FAIR Principles already supported by semantics. However, additional approaches remain available, especially for automation, such as the requirements we identified as pipeline, integration, and enrichment. Future directions could see dataspaces evolving to facilitate automatic data exchange and analysis, thereby improving the efficiency of data use. Capable and reliable integration systems can provide a larger data basis that can be further enhanced through enrichment.
This includes further research into the development of advanced, dynamic ontologies and the automation of ontology matching for the seamless integration of different data sources. To ensure scalability, the continuous optimization of graph databases and the use of parallel processing to efficiently manage large amounts of data are essential. In addition, developing secure, ethical semantic technologies to protect privacy and promote responsible data use is a necessity if stakeholders are to share their data.

29 https://greendatai.eu/
30 https://www.flex4res.eu/

Acknowledgments
This publication is based upon work from COST Action DKG (CA19134), supported by COST (European Cooperation in Science and Technology). This work has been partially funded by the German Federal Ministry for Economic Affairs and Climate Action (BMWK) through the Antrieb 4.0 project (Grant No. 13IK015B), by the European Union-funded projects MobiSpaces (Grant agreement no 101070279), Green.DAT.AI (Grant agreement no 101070416), and AgriDataValue (Grant agreement no 101086461), and by the FAIR Data Spaces project of the German Federal Ministry of Education and Research (BMBF) under the grant number FAIRDS05.

References
[1] F. von Scherenberg, M. Hellmeier, B. Otto, Data sovereignty in information systems, Electronic Markets 34 (2024) 1–11. doi:10.1007/s12525-024-00693-4.
[2] M. Rüßmann, M. Lorenz, P. Gerbert, M. Waldner, J. Justus, P. Engel, M. Harnisch, Industry 4.0: The future of productivity and growth in manufacturing industries, Boston consulting group 9 (2015) 54–89.
[3] J. Gelhaar, B. Otto, Challenges in the emergence of data ecosystems, in: PACIS 2020 Proceedings, 2020. URL: https://aisel.aisnet.org/pacis2020/175.
[4] C. Cappiello, A. Gal, M. Jarke, J. Rehof, Data Ecosystems: Sovereign Data Exchange among Organizations (Dagstuhl Seminar 19391), Dagstuhl Reports 9 (2020) 66–134. doi:10.4230/DagRep.9.9.66.
[5] P. Singh, M. J. Beliatis, M.
Presser, Enabling edge-driven dataspace integration through convergence of distributed technologies, Internet of Things 25 (2024) 1–33. doi:10.1016/j.iot.2024.101087.
[6] A. Seidel, K. Wenzel, A. Hänel, ..., H. Ernst, Towards a seamless data cycle for space components: considerations from the growing European future digital ecosystem Gaia-X, CEAS Space Journal (2023) 1–14. doi:10.1007/s12567-023-00500-4.
[7] M. D. Wilkinson, M. Dumontier, I. J. Aalbersberg, ..., B. Mons, The FAIR guiding principles for scientific data management and stewardship, Scientific Data 3 (2016) 1–9. doi:10.1038/sdata.2016.18.
[8] J. Theissen-Lipp, M. Kocher, C. Lange, S. Decker, A. Paulus, A. Pomp, E. Curry, Semantics in dataspaces: Origin and future directions, in: Companion Proceedings of the ACM Web Conference 2023, WWW ’23 Companion, Association for Computing Machinery, New York, NY, USA, 2023, pp. 1504–1507. doi:10.1145/3543873.3587689.
[9] E. Curry, Future research directions for dataspaces, data ecosystems, and intelligent systems, Real-time Linked Dataspaces: Enabling Data Ecosystems for Intelligent Systems (2020) 297–304. doi:10.1007/978-3-030-29665-0_18.
[10] A. Halevy, A. Rajaraman, J. Ordille, Data integration: the teenage years, in: Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB ’06, VLDB Endowment, 2006.
[11] M. Franklin, A. Halevy, D. Maier, From databases to dataspaces: a new abstraction for information management, ACM Sigmod Record 34 (2005) 27–33. doi:10.1145/1107499.1107502.
[12] A. Halevy, Z. Ives, J. Madhavan, P. Mork, D. Suciu, I. Tatarinov, The Piazza peer data management system, IEEE Transactions on Knowledge and Data Engineering 16 (2004) 787–798. doi:10.1109/TKDE.2004.1318562.
[13] N. C. Kuicheu, N. Wang, F. T. G. Narcisse, D. Xu, S. Francois, Building semantic relationships incrementally in dataspace, in: 2009 First International Conference on Information Science and Engineering, IEEE, 2009. doi:10.1109/ICISE.2009.370.
[14] M.
Scheffler, M. Aeschlimann, M. Albrecht, ..., C. Draxl, FAIR data enabling new horizons for materials research, Nature 604 (2022) 635–642. doi:10.1038/s41586-022-04501-x.
[15] A. Kotsev, M. Minghini, R. Tomas, V. Cetl, M. Lutz, From spatial data infrastructures to data spaces – a technological perspective on the evolution of European SDIs, ISPRS International Journal of Geo-Information 9 (2020) 1–19. doi:10.3390/ijgi9030176.
[16] E. Curry, S. Scerri, T. Tuikka, Data Spaces: Design, Deployment, and Future Directions, Springer International Publishing, Cham, 2022. doi:10.1007/978-3-030-98636-0_1.
[17] K.-D. Schewe, B. Thalheim, Semantics in data and knowledge bases, Springer Berlin Heidelberg, Berlin, Heidelberg, 2008. doi:10.1007/978-3-540-88594-8_1.
[18] J. v. Brocke, A. Simons, B. Niehaves, K. Reimer, R. Plattfaut, A. Cleven, Reconstructing the giant: On the importance of rigour in documenting the literature search process, in: ECIS 2009 Proceedings, European Conference on Information Systems, 2009. URL: https://aisel.aisnet.org/ecis2009/161.
[19] J. Webster, R. T. Watson, Analyzing the past to prepare for the future: Writing a literature review, MIS Quarterly 26 (2002) xiii–xxiii. URL: http://www.jstor.org/stable/4132319.
[20] S. P. Stier, X. Xu, L. Gold, M. Möckel, Ontology-based battery production dataspace and its interweaving with artificial intelligence-empowered data analytics, Energy Technology (2024) 8–13. doi:10.1002/ente.202301305.
[21] H. B. Nasrabadi, T. Hanke, M. Weber, ..., B. Skrotzki, Toward a digital materials mechanical testing lab, Computers in Industry 153 (2023) 1–15. doi:10.1016/j.compind.2023.104016.
[22] O. Hartig, C. Bizer, J.-C. Freytag, Executing SPARQL queries over the web of linked data, in: The Semantic Web - ISWC, 2009. doi:10.1007/978-3-642-04930-9_19.
[23] S. Bader, J. Pullmann, C. Mader, ..., C.
Lange, The international data spaces information model – an ontology for sovereign exchange of digital content, Lecture Notes in Computer Science 12507 LNCS (2020) 176–192. doi:10.1007/978-3-030-62466-8_12.
[24] S. Stubblebine, R. Wright, An authentication logic with formal semantics supporting synchronization, revocation, and recency, IEEE Transactions on Software Engineering 28 (2002) 256–285. doi:10.1109/32.991320.
[25] H. J. M. Bastiaansen, S. Dalmolen, M. Kollenstart, T. M. van Engers, User-centric network-model for data control with interoperable legal data sharing artefacts, in: Pacific Asia Conference on Information Systems, 2020. doi:10.1016/j.compind.2023.104016.
[26] S. Opriel, F. Möller, U. Burkhardt, B. Otto, Requirements for usage control based exchange of sensitive data in automotive supply chains, in: Proceedings of the 54th Hawaii International Conference on System Sciences, 2021. doi:10.24251/HICSS.2021.051.
[27] F. Amato, V. Casola, A. Gaglione, A. Mazzeo, A semantic enriched data model for sensor network interoperability, Simulation Modelling Practice and Theory 19 (2011) 1745–1757. doi:10.1016/j.simpat.2010.09.010.
[28] F. Burzlaff, C. Bartelt, A conceptual architecture for enabling future self-adaptive service systems, in: 52nd Hawaii International Conference on System Sciences, HICSS 2019, Atlanta, GA, 2019. URL: https://madoc.bib.uni-mannheim.de/49901/.
[29] C. Schwede, J. Cirullies, On-demand shared digital twins – an information architectural model to create transparency in collaborative supply networks, in: Proceedings of the 54th Hawaii International Conference on System Sciences, 2021, pp. 1675–1684. doi:10.24251/HICSS.2021.202.
[30] C. Meghini, Linked open data & metadata, in: Handbook of Digital Public History, De Gruyter Oldenbourg, Berlin, Boston, 2022. doi:10.1515/9783110430295-039.
[31] D. Paparova, Exploring the ontological status of data: A process-oriented approach, in: ECIS 2023 Research Papers, 2023.
URL: https://aisel.aisnet.org/ecis2023_rp/299.
[32] S. Scheider, F. Lauf, F. Möller, B. Otto, A reference system architecture with data sovereignty for human-centric data ecosystems, Business & Information Systems Engineering 65 (2023) 577–595. doi:10.1007/s12599-023-00816-9.
[33] T. Wessel, K. Heuing, M. Schlangen, B. Schnieders, M. Algermissen, Rare diseases, digitization, and the national action league for people with rare diseases (NAMSE), Bundesgesundheitsblatt 65 (2022) 1119–1125. doi:10.1007/s00103-022-03597-w.
[34] E. Curry, Fundamentals of real-time linked dataspaces, Real-time Linked Dataspaces: Enabling Data Ecosystems for Intelligent Systems (2020) 63–80.
[35] N. Jahnke, B. Otto, Data catalogs in the enterprise: Applications and integration, Datenbank-Spektrum 23 (2023) 89–96. doi:10.1007/s13222-023-00445-2.
[36] M. Franklin, A. Halevy, D. Maier, A first tutorial on dataspaces, Proc. VLDB Endow. 1 (2008) 1516–1517. doi:10.14778/1454159.1454217.
[37] J. Möller, D. Jankowski, A. Hahn, Towards an architecture to support data access in research data spaces, in: 2021 IEEE 22nd International Conference on Information Reuse and Integration for Data Science (IRI), 2021. doi:10.1109/IRI51335.2021.00049.
[38] J. Umbrich, M. Karnstedt, J. X. Parreira, A. Polleres, M. Hauswirth, Linked data and live querying for enabling support platforms for web dataspaces, in: 2012 IEEE 28th International Conference on Data Engineering Workshops, 2012. doi:10.1109/ICDEW.2012.55.
[39] S. V. Manisekaran, J. Sathishkumar, A fuzzy based semantic search engine for document retrieval in a personalized data space, in: International Conference on Recent Trends in Information Technology (ICRTIT), 2016. doi:10.1109/ICRTIT.2016.7569577.
[40] Á. Alonso, A. Pozo, J. M. Cantera, F. De la Vega, J. J. Hierro, Industrial data space architecture implementation using FIWARE, Sensors 18 (2018) 1–18. doi:10.3390/s18072226.
[41] I. Elsayed, P. Brezany, A.
Tjoa, Towards realization of dataspaces, in: 17th International Workshop on Database and Expert Systems Applications (DEXA’06), 2006. doi:10.1109/DEXA.2006.140.
[42] J. Hernandez, L. McKenna, R. Brennan, TIKD: A trusted integrated knowledge dataspace for sensitive healthcare data sharing, in: 2021 IEEE 45th Annual Computers, Software, and Applications Conference (COMPSAC), 2021. doi:10.1109/COMPSAC51774.2021.00280.
[43] L. Jin, Y. Zhang, X. Ye, An extensible data model with security support for dataspace management, in: 2008 10th IEEE International Conference on High Performance Computing and Communications, 2008. doi:10.1109/HPCC.2008.70.
[44] P. de Alencar Silva, R. Fadaie, M. van Sinderen, Towards a digital twin for simulation of organizational and semantic interoperability in IDS ecosystems, Proceedings of the Workshop of I-ESA 3214 (2022). URL: https://api.semanticscholar.org/CorpusID:252599898.
[45] L. Sánchez, J. Lanza, J. R. Santana, ..., N. Crespi, Data enrichment toolchain: A data linking and enrichment platform for heterogeneous data, IEEE Access 11 (2023) 103079–103091. doi:10.1109/ACCESS.2023.3317705.
[46] C. Roda, E. Navarro, C. E. Cuesta, A comparative analysis of linked data tools to support architectural knowledge, in: Integrated Spatial Databases, 2014. URL: https://api.semanticscholar.org/CorpusID:10317475.
[47] W. Lin, C. Hu, Y. Li, X. Cheng, Virtual dataspace – a service oriented model for scientific big data, in: 2013 Fourth International Conference on Emerging Intelligent Data and Web Technologies, 2013. doi:10.1109/EIDWT.2013.5.
[48] S. R. Jeffery, M. J. Franklin, A. Y. Halevy, Pay-as-you-go user feedback for dataspace systems, in: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, SIGMOD ’08, Association for Computing Machinery, 2008, pp. 847–860. doi:10.1145/1376616.1376701.
[49] Y. Li, X.
Meng, Research on personal dataspace management, in: Proceedings of the 2nd SIGMOD PhD Workshop on Innovative Database Research, IDAR ’08, Association for Computing Machinery, 2008. doi:10.1145/1410308.1410311.
[50] H. Belani, P. Šolić, T. Perković, An industrial IoT-based ontology development for well-being, aging and health: A scoping review, in: 2022 IEEE International Conference on E-health Networking, Application & Services (HealthCom), 2022. doi:10.1109/HealthCom54947.2022.9982769.
[51] N. Dessi, B. Pes, Towards scientific dataspaces, in: 2009 IEEE/WIC/ACM International Joint Conference on Web Intelligence and Intelligent Agent Technology, volume 3, 2009. doi:10.1109/WI-IAT.2009.353.
[52] R. A. Buchmann, D. Karagiannis, Enriching linked data with semantics from domain-specific diagrammatic models, Business & Information Systems Engineering 58 (2016) 341–353. doi:10.1007/s12599-016-0445-1.
[53] S. Staworko, I. Boneva, J. E. Labra Gayo, S. Hym, E. G. Prud’hommeaux, H. Solbrig, Complexity and Expressiveness of ShEx for RDF, in: 18th International Conference on Database Theory (ICDT), 2015. doi:10.4230/LIPIcs.ICDT.2015.195.
[54] K. Thornton, H. Solbrig, G. S. Stupp, L..., A. Waagmeester, Using Shape Expressions (ShEx) to share RDF data models and to guide curation with rigorous validation, in: The Semantic Web, Cham, 2019. doi:10.1007/978-3-030-21348-0_39.
[55] G. Giussani, S. Steinbuss, Data Connector Report, Technical Report, International Data Spaces Association, 2022. URL: https://internationaldataspaces.org/wp-content/uploads/dlm_uploads/Data-Connector-Report-1.pdf, accessed: 2024-02-05.
[56] M. Atzori, A. Ciaramella, C. Diamantini, B. Martino, S. Distefano, T. Facchinetti, F. Montecchiani, A. Nocera, G. Ruffo, R. Trasarti, et al., Dataspaces: Concepts, architectures and initiatives, in: The 2nd Italian Conference on Big Data and Data Science, volume 3606, CEUR-WS, 2024. URL: https://hdl.handle.net/11584/389724.
[57] O. Corcho, M. Eriksson, K.
Kurowski, M. Ojsteršek, C. Choirat, M. Sanden, F. Coppens, EOSC interoperability framework – Report from the EOSC Executive Board Working Groups FAIR and Architecture, Publications Office, 2021. doi:10.2777/620649.
[58] C. Doulkeridis, G. M. Santipantakis, N. Koutroumanis, ..., M. Falsetta, MobiSpaces: An architecture for energy-efficient data spaces for mobility data, in: 2023 IEEE International Conference on Big Data (BigData), IEEE, 2023. doi:10.1109/BigData59044.2023.10386539.
[59] S. Meckler, R. Dorsch, D. Henselmann, A. Harth, The web and linked data as a solid foundation for dataspaces, in: Companion Proceedings of the ACM Web Conference 2023, 2023, pp. 1440–1446. doi:10.1145/3543873.3587616.