=Paper=
{{Paper
|id=Vol-3816/paper61
|storemode=property
|title=Towards Modular Data Marketplaces
|pdfUrl=https://ceur-ws.org/Vol-3816/paper61.pdf
|volume=Vol-3816
|authors=Soulmaz Gheisari,Semih Yumusak,Jaime Osvaldo Salas,Luis-Daniel Ibáñez,George Konstantinidis,Dumitru Roman
|dblpUrl=https://dblp.org/rec/conf/rulemlrr/GheisariYSI0R24
}}
==Towards Modular Data Marketplaces==
Soulmaz Gheisari¹, Semih Yumusak¹, Jaime Osvaldo Salas¹, Luis-Daniel Ibáñez¹,
George Konstantinidis¹ and Dumitru Roman²
¹ Department of Electronics and Computer Science, University of Southampton, Southampton, UK
² SINTEF, Oslo, Norway
Abstract
Building a monolithic data marketplace is challenging due to complex interdependencies, leading to cumbersome
and error-prone development where a single failure can disrupt the entire system. To address this, we propose a
modular approach using dynamic plugins in the UPCAST project. Our flexible framework allows components to
be activated or deactivated as needed, enhancing scalability and resilience. By decoupling functionalities into
interchangeable modules, we mitigate the risk of single points of failure, simplify maintenance, and facilitate
customization for more robust marketplace solutions.
Keywords
Data consumer, Data marketplace, Data provider, Negotiation, Privacy and usage control, Resource specification,
Resource discovery
1. Introduction
The UPCAST project¹ offers a set of plugins designed to automate data sharing and processing in data
marketplaces, facilitating interactions between data consumers and data providers. We outline several
key components relevant to UPCAST’s plugins (see the workflow in Figure 1). The process begins with the
Data Provider defining the resource specification, detailing the attributes, capabilities, and constraints
of the resource. Next, the Data Consumer examines these specifications through a discovery process to
identify potential resources that meet their needs [1]. Upon finding a suitable resource, the consumer
generates a request to access it. This request undergoes a review process to ensure all privacy and
access control criteria are satisfied. In cases where conflicts are detected, they must be resolved. The
request is then sent to the provider, initiating a negotiation process. Once an agreement is reached,
an UPCAST contract is generated and signed by both the provider and the consumer, finalizing the
agreement.
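The modular idea behind this workflow can be illustrated with a minimal sketch. The paper does not publish a plugin API, so the `PluginRegistry` class, the plugin names, and the request fields below are all hypothetical. The sketch only shows two properties claimed for the approach: plugins can be switched on or off, and a failure in one plugin is recorded rather than stopping the others.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical types and names: the paper does not define a plugin interface.
Step = Callable[[dict], dict]


@dataclass
class PluginRegistry:
    """Holds named plugins that can be activated or deactivated at runtime."""
    plugins: Dict[str, Step] = field(default_factory=dict)
    enabled: Dict[str, bool] = field(default_factory=dict)

    def register(self, name: str, step: Step, enabled: bool = True) -> None:
        self.plugins[name] = step
        self.enabled[name] = enabled

    def set_enabled(self, name: str, enabled: bool) -> None:
        if name not in self.plugins:
            raise KeyError(f"unknown plugin: {name}")
        self.enabled[name] = enabled

    def run(self, order: List[str], request: dict) -> dict:
        """Run enabled plugins in order; a failing plugin is recorded, not fatal."""
        state = dict(request)
        errors: List[str] = []
        for name in order:
            if not self.enabled.get(name, False):
                continue
            try:
                # Each plugin gets a copy, so a crash cannot leave partial changes.
                state = self.plugins[name](dict(state))
            except Exception as exc:
                errors.append(f"{name}: {exc}")
        return {**state, "errors": errors}


def discovery(state: dict) -> dict:
    state["resource"] = "dataset-42"
    return state


def usage_control(state: dict) -> dict:
    raise RuntimeError("policy store unreachable")  # simulated failure


def negotiation(state: dict) -> dict:
    state["agreed"] = True
    return state


ORDER = ["discovery", "usage_control", "negotiation"]
registry = PluginRegistry()
registry.register("discovery", discovery)
registry.register("usage_control", usage_control)
registry.register("negotiation", negotiation)

result = registry.run(ORDER, {"consumer": "acme"})
```

Whether the workflow should continue after a plugin fails is a policy choice; a real marketplace would probably block negotiation when usage control cannot be evaluated. Here the run continues only to make the isolation visible.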
1.1. Resource Specification Plugin
Each data source is annotated as a dcat:Dataset, with the data model designed as a knowledge graph using
both the DCAT² and UPCAST vocabularies. Users initiate this plugin to specify the details required to
create a new resource. The creation of a new resource involves the following sub-procedures:
• Import the UPCAST vocabulary and a domain-specific vocabulary in machine-readable format;
• Define the metadata of the resource;
• Define the access and usage policies of the resource;
• Assign an energy profile to the resource, which is used to optimise environmental impact;
• Associate a price with the resource for further negotiations;
• Create a resource profile/summary.
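As a rough illustration of these sub-procedures, the sketch below assembles a JSON-LD-style description of a resource. The DCAT, Dublin Core Terms, and ODRL namespaces and terms are the published ones; the `upcast` namespace, every `upcast:*` term, the `example.org` IRIs, and the `build_resource` helper are placeholders invented for this example and should not be read as the project's actual vocabulary.

```python
import json

# Real namespaces for DCAT, DCT and ODRL. The "upcast" namespace and all
# upcast:* terms are placeholders, not the project's published vocabulary.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "upcast": "https://example.org/upcast#",
}


def build_resource(dataset_id, title, description, keywords, energy_kwh, price_eur):
    """Assemble metadata, policy, energy profile, price and summary for one resource."""
    dataset_iri = f"https://example.org/datasets/{dataset_id}"
    resource = {
        "@context": CONTEXT,
        "@id": dataset_iri,
        "@type": "dcat:Dataset",
        # Metadata of the resource
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        # Access and usage policy, expressed as a minimal ODRL offer
        "odrl:hasPolicy": {
            "@type": "odrl:Offer",
            "odrl:permission": [
                {"odrl:target": {"@id": dataset_iri}, "odrl:action": {"@id": "odrl:use"}}
            ],
        },
        # Energy profile and price (hypothetical terms)
        "upcast:energyProfile": {"upcast:estimatedKWh": energy_kwh},
        "upcast:price": {"upcast:amount": price_eur, "upcast:currency": "EUR"},
    }
    # Resource profile/summary, here just a one-line description
    resource["upcast:summary"] = (
        f"{title}: {len(resource['dcat:keyword'])} keywords, {price_eur} EUR"
    )
    return resource


resource = build_resource(
    "air-quality-2024", "Air quality 2024", "Hourly readings.",
    ["air", "cities"], energy_kwh=1.5, price_eur=100,
)
serialised = json.dumps(resource, indent=2)
```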
RuleML+RR’24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16–22, 2024,
Bucharest, Romania
Email: s.gheisari@soton.ac.uk (S. Gheisari); semih.yumusak@soton.ac.uk (S. Yumusak); j.o.salas@soton.ac.uk (J. O. Salas);
l.d.ibanez@soton.ac.uk (L. Ibáñez); g.konstantinidis@soton.ac.uk (G. Konstantinidis); dumitru.roman@sintef.no (D. Roman)
ORCID: 0000-0001-8974-2841 (S. Gheisari); 0000-0002-8878-4991 (S. Yumusak); 0000-0002-9353-8955 (J. O. Salas);
0000-0001-6993-0001 (L. Ibáñez); 0000-0002-3962-9303 (G. Konstantinidis); 0000-0001-6397-3705 (D. Roman)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
¹ https://www.upcast-project.eu
² https://www.w3.org/TR/vocab-dcat-3/
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
Figure 1: The Workflow in a Modular Data Marketplace.
1.1.1. Semantic Profiling
The data profiling service generates a profile for a dataset using a specified profiler, given dataset metadata,
sample data or the whole dataset, and other supplementary materials. A number of profilers can
be connected to provide the “plug-and-play” profiling service according to the needs and requirements
of the user, for example, profilers that give statistics on the dataset or provide semantic information
about the data. In UPCAST, the main purpose of the profiling service is to enhance the representation
of data to improve data discoverability, in particular, through semantic profiling.
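A "plug-and-play" profiling service of this kind can be sketched as a registry of interchangeable profilers. Everything here is an assumption made for illustration: the `profiler` decorator, the two profiler names, and in particular the "semantic" profiler, which is a toy lookup table standing in for whatever semantic annotation UPCAST actually performs.

```python
from statistics import mean
from typing import Callable, Dict, List

Rows = List[dict]
Profiler = Callable[[Rows], dict]

# Hypothetical registry: profilers are looked up by name at request time.
PROFILERS: Dict[str, Profiler] = {}


def profiler(name: str):
    """Decorator that plugs a profiler into the registry under a name."""
    def decorator(fn: Profiler) -> Profiler:
        PROFILERS[name] = fn
        return fn
    return decorator


def _is_number(value) -> bool:
    return isinstance(value, (int, float)) and not isinstance(value, bool)


@profiler("statistics")
def statistics_profiler(rows: Rows) -> dict:
    """Row count, column names, and the mean of every all-numeric column."""
    columns = sorted({key for row in rows for key in row})
    means = {}
    for col in columns:
        values = [row[col] for row in rows if col in row]
        if values and all(_is_number(v) for v in values):
            means[col] = mean(values)
    return {"rows": len(rows), "columns": columns, "means": means}


@profiler("semantic")
def semantic_profiler(rows: Rows) -> dict:
    """Toy stand-in: maps known column names to concepts via a fixed table."""
    lookup = {"temp_c": "Temperature", "city": "Place"}
    columns = sorted({key for row in rows for key in row})
    return {"concepts": {c: lookup[c] for c in columns if c in lookup}}


def profile(rows: Rows, selected: List[str]) -> dict:
    """Run only the profilers the user asked for."""
    return {name: PROFILERS[name](rows) for name in selected}


sample = [
    {"city": "Oslo", "temp_c": 4.0},
    {"city": "Southampton", "temp_c": 10.0},
]
report = profile(sample, ["statistics", "semantic"])
```

Adding a new profiler then only requires registering another function; the `profile` entry point does not change.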
1.2. Resource Discovery Plugin
The Resource Discovery Plugin acts as an intermediary, facilitating the retrieval of resource specifications.
Resource consumers can request and retrieve information from the available resources provided by
various providers. While searching the knowledge base, users may also find similar sources through
semantic similarity search [2]. Therefore, resource discovery provides the following functionalities for a
consumer:
• A comprehensive search for resources based on the consumer’s intentions;
• Browsing for resources, offering the user an intuitive and efficient way to navigate and explore
the available resources;
• Discovering related/recommended resources, ensuring up-to-date and dynamic results. The
relevant resources graph is continuously updated as new datasets arrive.
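The search functionality can be illustrated with a small ranking sketch. Note the substitution: the paper refers to semantic similarity search, whereas the code below uses plain bag-of-words cosine similarity over resource descriptions, which is a lexical stand-in chosen to keep the example self-contained. The catalogue contents and function names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def tokens(text: str) -> Counter:
    """Lower-cased word counts; a crude substitute for a semantic embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def search(query: str, catalogue: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return up to top_k resource ids, most similar first, dropping zero scores."""
    q = tokens(query)
    scored = [(cosine(q, tokens(desc)), rid) for rid, desc in catalogue.items()]
    scored = [(score, rid) for score, rid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored[:top_k]]


catalogue = {
    "air-quality": "hourly air quality measurements for European cities",
    "traffic": "road traffic counts for European cities",
    "recipes": "vegetarian recipes with nutrition facts",
}
hits = search("air quality in cities", catalogue)
```

Because the catalogue is an ordinary dictionary, newly arrived datasets become searchable as soon as they are added, which loosely mirrors the continuously updated resource graph described above.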
Figure 2 illustrates the data model for both resource specification and resource discovery. This model
details the structure and attributes necessary for specifying resources and discovering them within
UPCAST.
1.3. Privacy and Usage Control Plugin
After the resource specification, the resource provider defines constraints on the resources using Open
Digital Rights Language (ODRL)³ rules, leveraging both the UPCAST and domain-specific vocabularies.
The resource consumer, in turn, specifies their intentions via a Data Processing Workflow
(DPW) specification and outlines any organisation-specific access and usage control rules, as well as
rules prescribed by applicable regulations (e.g., GDPR). Conflicts are then identified
between the provider’s constraints, the consumer’s intentions, and internal rules, which makes it possible
to derive authorisation decisions. The functionalities of the plugin can be summarised as follows:
• Transform the resource provider constraints to privacy and usage control rules;
• Define rules for the resource consumer;
• Manage rules;
• Identify conflicts between the provider’s constraints and the consumer’s intentions;
• Access and usage decision making.
³ https://www.w3.org/TR/2018/REC-odrl-model-20180215/
Figure 2: Resource Specification and Discovery Data Model
Figure 3 shows the data model of this plugin.
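Conflict identification and the resulting authorisation decision can be sketched as follows. This is a deliberately simplified model under stated assumptions: provider constraints are reduced to permitted actions, permitted purposes, and a validity date; the consumer's DPW is a list of steps; and internal rules are a set of prohibited actions. None of these class or field names come from UPCAST or ODRL.

```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class ProviderPolicy:
    """Simplified provider constraints (hypothetical structure)."""
    allowed_actions: FrozenSet[str]
    allowed_purposes: FrozenSet[str]
    valid_until: date


@dataclass(frozen=True)
class WorkflowStep:
    """One step of the consumer's Data Processing Workflow (hypothetical)."""
    action: str
    purpose: str
    run_on: date


def find_conflicts(
    policy: ProviderPolicy,
    workflow: List[WorkflowStep],
    internal_prohibitions: FrozenSet[str],
) -> List[str]:
    """Compare every workflow step with provider constraints and internal rules."""
    conflicts = []
    for i, step in enumerate(workflow, start=1):
        if step.action not in policy.allowed_actions:
            conflicts.append(f"step {i}: action '{step.action}' not permitted by provider")
        if step.purpose not in policy.allowed_purposes:
            conflicts.append(f"step {i}: purpose '{step.purpose}' not permitted by provider")
        if step.run_on > policy.valid_until:
            conflicts.append(f"step {i}: runs after the access period ends")
        if step.action in internal_prohibitions:
            conflicts.append(f"step {i}: action '{step.action}' prohibited by internal rules")
    return conflicts


def decide(conflicts: List[str]) -> str:
    """Authorisation decision: permit only when no conflict was found."""
    return "permit" if not conflicts else "deny"


policy = ProviderPolicy(
    allowed_actions=frozenset({"read", "aggregate"}),
    allowed_purposes=frozenset({"research"}),
    valid_until=date(2025, 12, 31),
)
workflow = [
    WorkflowStep("read", "research", date(2025, 6, 1)),
    WorkflowStep("share", "marketing", date(2026, 1, 15)),
]
internal = frozenset({"share"})

conflicts = find_conflicts(policy, workflow, internal)
decision = decide(conflicts)
```

In this example the first step raises no conflict, while the second violates all four checks, so the overall decision is a denial. A detected conflict need not be final, since it can be handed to negotiation.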
1.4. Negotiation Plugin
Often, the processing intentions of a data consumer for a dataset of their interest differ from what the
data provider is willing to allow. These differences may include the purpose of the processing, the
time interval for which the provider is willing to allow access, or the price to pay. Nevertheless, these
differences are not necessarily irreconcilable, and both parties can often reach an agreement through
negotiation [3]. The Negotiation and Contracting plugin within UPCAST streamlines negotiation
and contract management and supports communication between data providers and consumers.
First, the plugin provides a Policy Administration Point with a graphical interface that lets users
define restrictions, privacy, and usage policies. In addition, the Negotiation Plugin serves as a
Policy Management Point (PMP) for usage restrictions: it reads machine-readable policies, checks them
against information from the privacy and usage control, environmental impact, and pricing plugins,
and automatically reaches an agreement if there are no policy conflicts. Otherwise, if conflicts are
detected, a negotiation is initiated, allowing the data provider or consumer to present counteroffers.
Figure 4 illustrates the negotiation and contracting plugin flowchart.
Figure 3: Privacy and Usage Control Data Model
Upon the initiation of a negotiation process, the plugin provides a centralised platform for discussing
terms, pricing, and specifications, allowing users to track and finalise negotiations. Moreover,
the plugin incorporates robust contract management features, allowing users to create, review, and
execute contracts with ease. By automating routine tasks and offering customisable Data Processing
Workflows (DPWs), it enhances data sharing while ensuring compliance with regulatory requirements.
The provider ultimately decides the negotiation’s outcome by agreeing, rejecting, or sending
another counteroffer. The result of a successful negotiation process is a data sharing contract [4] that
extends the usage control specification defined by the International Data Spaces Association (IDSA)⁴,
which in turn uses ODRL. Contracts also draw on other ontologies such as the Data Privacy Vocabulary
(DPV)⁵, which allows the use, processing, and purpose of processing of data to be described under
relevant legislation, notably the GDPR, enabling more descriptive and
technology-independent contracts.
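The negotiation loop described in this section can be sketched as a small state machine. The strategies are invented for illustration: the provider agrees when an offer is within its limits, counters otherwise, and rejects once the round budget is used up; the consumer accepts the countered access period but never exceeds its own maximum price. The paper does not specify either party's strategy, so `provider_decide`, `consumer_respond`, and all thresholds are assumptions.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional, Tuple


class Decision(Enum):
    AGREE = "agree"
    REJECT = "reject"
    COUNTER = "counter"


@dataclass(frozen=True)
class Offer:
    price_eur: int
    access_days: int


def provider_decide(
    offer: Offer, min_price: int, max_days: int, rounds_left: int
) -> Tuple[Decision, Offer]:
    """Hypothetical provider strategy: agree, counter, or reject on the last round."""
    if offer.price_eur >= min_price and offer.access_days <= max_days:
        return Decision.AGREE, offer
    if rounds_left == 0:
        return Decision.REJECT, offer
    counter = Offer(max(offer.price_eur, min_price), min(offer.access_days, max_days))
    return Decision.COUNTER, counter


def consumer_respond(counter: Offer, max_price: int) -> Offer:
    """Hypothetical consumer strategy: accept the days, cap the price."""
    return Offer(min(counter.price_eur, max_price), counter.access_days)


def negotiate(
    request: Offer, min_price: int, max_days: int, max_price: int, max_rounds: int = 3
) -> Tuple[Optional[Offer], List[str]]:
    """Return the agreed offer (or None) and the sequence of provider decisions."""
    offer, history = request, []
    for rounds_left in range(max_rounds - 1, -1, -1):
        decision, proposal = provider_decide(offer, min_price, max_days, rounds_left)
        history.append(decision.value)
        if decision is Decision.AGREE:
            return proposal, history
        if decision is Decision.REJECT:
            return None, history
        offer = consumer_respond(proposal, max_price)
    return None, history


# No conflict: the request is accepted automatically.
direct = negotiate(Offer(100, 10), min_price=100, max_days=30, max_price=120)
# Conflict on price and duration, resolved by one counteroffer.
settled = negotiate(Offer(80, 90), min_price=100, max_days=30, max_price=120)
# Irreconcilable prices: the provider rejects on the final round.
failed = negotiate(Offer(80, 90), min_price=100, max_days=30, max_price=90)
```

A successful run returns the agreed `Offer`, which is the input a contract generation step would consume.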
1.4.1. Contract Generation Supported by LLM
The contract generation process within the UPCAST plugin is significantly enhanced by the integration
of Large Language Models (LLMs). These advanced AI models facilitate the automatic generation of
comprehensive and precise contracts based on the negotiation outcomes. By analysing the details of
the negotiation, including usage policies, pricing structures, and specific data processing requirements,
the LLM can draft contracts that accurately reflect the agreed terms. This automation not only speeds
up the contract creation process but also reduces the risk of human error and ensures that all legal and
regulatory aspects are meticulously addressed. The LLM’s ability to understand and generate natural
language makes it an invaluable tool for creating clear and enforceable contracts, thereby streamlining
the entire negotiation and contracting workflow within the UPCAST platform.
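How negotiation outcomes might be handed to an LLM can be sketched without committing to any particular model or vendor. In the code below the model is passed in as an ordinary callable, so no real LLM API is assumed; the prompt wording, the `missing_terms` check, and the stub models are all hypothetical. The check is included because a generated draft is not guaranteed to contain every agreed term, so some verification step is prudent before a contract is signed.

```python
import json
from typing import Callable, Dict, List


def build_contract_prompt(terms: Dict[str, object]) -> str:
    """Turn the agreed terms into an instruction for a text-generation model."""
    return (
        "Draft a data sharing contract in plain English.\n"
        "Use only the agreed terms below and do not add obligations.\n"
        "Agreed terms (JSON):\n" + json.dumps(terms, indent=2, sort_keys=True)
    )


def missing_terms(draft: str, terms: Dict[str, object]) -> List[str]:
    """Names of agreed terms whose value does not appear verbatim in the draft."""
    return [key for key, value in terms.items() if str(value) not in draft]


def generate_contract(terms: Dict[str, object], llm: Callable[[str], str]) -> str:
    """Ask the model for a draft and refuse it if any agreed term is missing."""
    draft = llm(build_contract_prompt(terms))
    gaps = missing_terms(draft, terms)
    if gaps:
        raise ValueError(f"draft omits agreed terms: {sorted(gaps)}")
    return draft


terms = {
    "provider": "Provider X",
    "consumer": "Consumer Y",
    "access_days": 30,
    "purpose": "research",
    "price_eur": 100,
}


def stub_llm(prompt: str) -> str:
    """Stand-in for a real model, used only to exercise the pipeline."""
    return "Provider X grants Consumer Y access for 30 days for research at 100 EUR."


def forgetful_llm(prompt: str) -> str:
    """Stand-in that drops the price, to show the check failing."""
    return "Provider X grants Consumer Y access for 30 days for research."


contract = generate_contract(terms, stub_llm)
```

The verbatim substring check is intentionally naive; it would miss paraphrased values and says nothing about legal adequacy, which still requires human review.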
⁴ https://internationaldataspaces.org/
⁵ https://w3c.github.io/dpv/dpv/
Figure 4: Negotiation and Contracting Flow Chart
2. Conclusion
In conclusion, this paper has introduced a modular approach to building data marketplaces, addressing
the challenges posed by traditional monolithic systems. By utilising dynamic plugins within the
UPCAST project, our solution provides a flexible framework that enhances scalability, resilience, and
ease of maintenance. The decoupling of functionalities into discrete modules mitigates the risk of
single points of failure and allows for tailored customisation to meet specific marketplace needs. This
approach not only simplifies system upgrades and maintenance but also ensures robust and adaptable
data marketplace solutions, demonstrating significant advantages over conventional monolithic designs.
Acknowledgments
This work was funded by the UKRI Horizon Europe guarantee funding scheme for the Horizon Europe
projects UPCAST (101093216) and RAISE (101058479).
References
[1] G. Konstantinidis, L.-D. Ibáñez, D. Roman, Data marketplaces in the AI economy, in: Symposium
on AI, Data and Digitalization (SAIDD 2023), 2023, p. 38.
[2] R. Sharifpour, M. Wu, X. Zhang, Large-scale analysis of query logs to profile users for dataset
search, Journal of Documentation 79 (2023) 66–85.
[3] W. Fox, Y. Dautaj, International commercial agreements, Kluwer Law International BV, 2023.
[4] J. J. Chen, Multicenter observational studies: Understanding the basics of data sharing and data
user agreements, 2024.