<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Modular Data Marketplaces</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Soulmaz</forename><surname>Gheisari</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electronics and Computer Science</orgName>
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<settlement>Southampton</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Semih</forename><surname>Yumusak</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electronics and Computer Science</orgName>
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<settlement>Southampton</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Jaime</forename><forename type="middle">Osvaldo</forename><surname>Salas</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electronics and Computer Science</orgName>
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<settlement>Southampton</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luis-Daniel</forename><surname>Ibáñez</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electronics and Computer Science</orgName>
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<settlement>Southampton</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">George</forename><surname>Konstantinidis</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Department of Electronics and Computer Science</orgName>
								<orgName type="institution">University of Southampton</orgName>
								<address>
									<settlement>Southampton</settlement>
									<country key="GB">UK</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Dumitru</forename><surname>Roman</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">SINTEF</orgName>
								<address>
									<settlement>Oslo</settlement>
									<country key="NO">Norway</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Modular Data Marketplaces</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">6E24745EA06814A539E888E50A6459EC</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:27+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data consumer</term>
					<term>Data marketplace</term>
					<term>Data provider</term>
					<term>Negotiation</term>
					<term>Privacy and usage control</term>
					<term>Resource specification</term>
					<term>Resource discovery</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Building a monolithic data marketplace is challenging because of complex interdependencies, which make development cumbersome and error-prone and allow a single failure to disrupt the entire system. To address this, we propose a modular approach using dynamic plugins in the UPCAST project. Our flexible framework allows components to be activated or deactivated as needed, enhancing scalability and resilience. By decoupling functionalities into interchangeable modules, we mitigate the risk of single points of failure, simplify maintenance, and facilitate customization for more robust marketplace solutions.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>The UPCAST project offers a set of plugins designed to automate data sharing and processing in data marketplaces, facilitating interactions between data consumers and data providers. We outline the key components behind UPCAST's plugins (see the workflow in Figure <ref type="figure" target="#fig_0">1</ref>). The process begins with the Data Provider defining the resource specification, which details the attributes, capabilities, and constraints of the resource. Next, the Data Consumer examines these specifications through a discovery process to identify resources that meet their needs <ref type="bibr" target="#b0">[1]</ref>. Upon finding a suitable resource, the consumer generates a request to access it. This request is reviewed to ensure that all privacy and access control criteria are satisfied, and any conflicts detected at this stage must be resolved. The request is then sent to the provider, initiating a negotiation process. Once an agreement is reached, an UPCAST contract is generated and signed by both the provider and the consumer, finalizing the agreement.</p></div>
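<div xmlns="http://www.tei-c.org/ns/1.0"><p>The idea of activating and deactivating workflow components can be illustrated with a minimal sketch. It is not the UPCAST implementation: the class PluginRegistry, the helper make_step, and the step names are hypothetical and chosen only to mirror the workflow stages described above.</p>

```python
from typing import Callable, Dict

# A plugin is modelled as a function from request state to updated state.
# Every name in this sketch is hypothetical, not part of UPCAST.
Plugin = Callable[[dict], dict]


class PluginRegistry:
    """Keeps an ordered set of plugins that can be switched on or off."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}
        self._enabled: Dict[str, bool] = {}

    def register(self, name: str, plugin: Plugin, enabled: bool = True) -> None:
        self._plugins[name] = plugin
        self._enabled[name] = enabled

    def set_enabled(self, name: str, enabled: bool) -> None:
        if name not in self._plugins:
            raise KeyError(f"unknown plugin: {name}")
        self._enabled[name] = enabled

    def run(self, state: dict) -> dict:
        # Plugins run in registration order; disabled ones are skipped.
        for name, plugin in self._plugins.items():
            if self._enabled[name]:
                state = plugin(dict(state))
        return state


def make_step(name: str) -> Plugin:
    """Build a placeholder plugin that only records that it ran."""
    def step(state: dict) -> dict:
        state["trace"] = state.get("trace", []) + [name]
        return state
    return step


registry = PluginRegistry()
for step_name in ["specification", "discovery", "usage_control", "negotiation"]:
    registry.register(step_name, make_step(step_name))

# Deactivate one component without touching the others.
registry.set_enabled("negotiation", False)
result = registry.run({"resource": "dataset-1"})
print(result["trace"])
```

<p>In this sketch, switching a plugin off only changes which steps run; the remaining steps are unaffected, which is the property the modular design aims for.</p></div>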
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.">Resource Specification Plugin</head><p>Data sources are annotated as a dcat:Dataset, with the data model designed as a knowledge graph using both the DCAT and UPCAST vocabularies. Users initiate this plugin to specify the details required to create a new resource. The creation of a new resource involves the following sub-procedures:</p><list><item>Import the UPCAST vocabulary and a domain-specific vocabulary in machine-readable format;</item><item>Define the metadata of the resource;</item><item>Define the access and usage policies of the resource;</item><item>Assign an energy profile to the resource, which will be used to optimise the environmental impact;</item><item>Associate a price with the resource for further negotiations;</item><item>Create a resource profile/summary.</item></list><p>RuleML+RR'24: Companion Proceedings of the 8th International Joint Conference on Rules and Reasoning, September 16-22, 2024, Bucharest, Romania. Email: s.gheisari@soton.ac.uk (S. Gheisari); semih.yumusak@soton.ac.uk (S. Yumusak); j.o.salas@soton.ac.uk (J. O. Salas); l.d.ibanez@soton.ac.uk (L. Ibáñez); g.konstantinidis@soton.ac.uk (G. Konstantinidis); dumitru.roman@sintef.no (D. Roman). ORCID: 0000-0001-8974-2841 (S. Gheisari); 0000-0002-8878-4991 (S. Yumusak); 0000-0002-9353-8955 (J. O. Salas); 0000-0001-6993-0001 (L. Ibáñez); 0000-0002-3962-9303 (G. Konstantinidis); 0000-0001-6397-3705 (D. Roman).</p></div>
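<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a rough illustration of the sub-procedures above, the following sketch assembles a JSON-LD-style description of a resource typed as dcat:Dataset. The DCAT, Dublin Core, and ODRL namespace IRIs are the published ones; the upcast prefix, its IRI, and the properties upcast:price and upcast:energyProfile are placeholders assumed for this example, since the actual UPCAST vocabulary terms are not given here. The function build_resource is likewise hypothetical.</p>

```python
import json

# Namespace prefixes. The "upcast" IRI is a placeholder assumed for this
# sketch; it is not the project's real namespace.
CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
    "odrl": "http://www.w3.org/ns/odrl/2/",
    "upcast": "https://example.org/upcast#",
}


def build_resource(identifier, title, description, keywords,
                   policy_id, price_eur, energy_profile):
    """Return a JSON-LD-style dict describing one dataset resource."""
    return {
        "@context": CONTEXT,
        "@id": identifier,
        "@type": "dcat:Dataset",
        # Metadata of the resource.
        "dct:title": title,
        "dct:description": description,
        "dcat:keyword": list(keywords),
        # Link to the access and usage policy of the resource.
        "odrl:hasPolicy": {"@id": policy_id},
        # Hypothetical UPCAST properties for price and energy profile.
        "upcast:price": {"upcast:amount": price_eur, "upcast:currency": "EUR"},
        "upcast:energyProfile": energy_profile,
    }


resource = build_resource(
    identifier="https://example.org/resource/air-quality-2023",
    title="Air quality measurements 2023",
    description="Hourly sensor readings from an example city.",
    keywords=["air quality", "sensors"],
    policy_id="https://example.org/policy/1",
    price_eur=25.0,
    energy_profile="low",
)
print(json.dumps(resource, indent=2))
```

<p>A plain dictionary keeps the sketch dependency-free; a real deployment would more likely build the graph with an RDF library and validate it against the vocabularies.</p></div>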
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1.1.">Semantic Profiling</head><p>The data profiling service generates a profile for a dataset using a specified profiler given dataset metadata, sample data or the whole dataset, and other supplementary materials. A number of profilers can be connected to provide the "plug-and-play" profiling service according to the needs and requirements of the user, for example, profilers that give statistics on the dataset or provide semantic information about the data. In UPCAST, the main purpose of the profiling service is to enhance the representation of data to improve data discoverability, in particular, through semantic profiling.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Resource Discovery Plugin</head><p>The Resource Discovery Plugin acts as an intermediary, facilitating the retrieval of resource specifications. Resource consumers can request and retrieve information about the resources made available by the various providers. While searching the knowledge base, users may also find similar sources through semantic similarity search <ref type="bibr" target="#b1">[2]</ref>. Resource discovery therefore provides the following functionalities for a consumer:</p><list><item>A comprehensive search for resources based on the consumer's intentions;</item><item>Browsing for resources, offering the user an intuitive and efficient way to navigate and explore the available resources;</item><item>Discovering related/recommended resources, ensuring up-to-date and dynamic results. The relevant resources graph is continuously updated as new datasets arrive.</item></list><p>Figure <ref type="figure" target="#fig_1">2</ref> illustrates the data model for both resource specification and resource discovery. This model details the structure and attributes necessary for specifying resources and discovering them within UPCAST.</p></div>
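<div xmlns="http://www.tei-c.org/ns/1.0"><p>One common way to realise a semantic similarity search is to compare vector representations of the query and of each resource profile. The sketch below assumes this vector-based approach, which the text does not specify; the three-dimensional vectors, the dataset names, and the functions cosine and rank_resources are invented for illustration.</p>

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 for a zero vector)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_resources(query_vec, catalogue, top_k=2):
    """Return the names of the top_k resources most similar to the query."""
    scored = [(cosine(query_vec, vec), name) for name, vec in catalogue.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:top_k]]


# Toy profile vectors; in practice these would come from a profiler.
catalogue = {
    "air-quality-2023": [0.9, 0.1, 0.0],
    "traffic-counts": [0.1, 0.9, 0.2],
    "weather-hourly": [0.8, 0.2, 0.1],
}

print(rank_resources([1.0, 0.0, 0.0], catalogue))
# prints ['air-quality-2023', 'weather-hourly']
```

<p>Adding a new dataset only requires adding its vector to the catalogue, which matches the requirement that related resources stay up to date as datasets arrive.</p></div>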
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.3.">Privacy and Usage Control Plugin</head><p>After the resource specification, the resource provider defines constraints on the resources using Open Digital Rights Language (ODRL) <ref type="foot" target="#foot_0">3</ref> rules, leveraging both the UPCAST and domain-specific vocabularies.</p><p>The resource consumer, in turn, specifies their intentions via a Data Processing Workflow (DPW) specification and outlines any organisation-specific access and usage control rules, as well as rules prescribed by applicable regulations (e.g., GDPR). Conflicts are then identified between the provider's constraints, the consumer's intentions, and the internal rules, which makes it possible to derive authorisation decisions. The functionalities of the plugin can be summarised as follows:</p><list><item>Transform the resource provider constraints to privacy and usage control rules.</item></list><p>Figure <ref type="figure" target="#fig_2">3</ref> shows the data model of this plugin.</p></div>
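<div xmlns="http://www.tei-c.org/ns/1.0"><p>The conflict identification step can be sketched under strong simplifying assumptions. Here a provider policy is reduced to sets of permitted actions, prohibited actions, and permitted purposes, loosely following the permission and prohibition notions of ODRL, and a Data Processing Workflow is reduced to a list of steps. The classes ProviderPolicy and WorkflowStep and the function find_conflicts are hypothetical and do not reproduce the UPCAST data model or full ODRL semantics.</p>

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ProviderPolicy:
    """Simplified provider constraints (assumption: plain sets of labels)."""
    permitted_actions: frozenset
    prohibited_actions: frozenset
    permitted_purposes: frozenset


@dataclass(frozen=True)
class WorkflowStep:
    """One step of the consumer's intended processing."""
    action: str
    purpose: str


def find_conflicts(policy: ProviderPolicy, workflow: List[WorkflowStep]) -> List[str]:
    """List every mismatch between the workflow and the provider policy."""
    conflicts = []
    for step in workflow:
        if step.action in policy.prohibited_actions:
            conflicts.append(f"{step.action}: action is prohibited")
        elif step.action not in policy.permitted_actions:
            conflicts.append(f"{step.action}: action is not permitted")
        if step.purpose not in policy.permitted_purposes:
            conflicts.append(f"{step.action}: purpose '{step.purpose}' is not permitted")
    return conflicts


policy = ProviderPolicy(
    permitted_actions=frozenset({"read", "aggregate"}),
    prohibited_actions=frozenset({"distribute"}),
    permitted_purposes=frozenset({"research"}),
)
workflow = [
    WorkflowStep(action="read", purpose="research"),
    WorkflowStep(action="distribute", purpose="marketing"),
]

conflicts = find_conflicts(policy, workflow)
authorised = not conflicts  # authorise only when nothing conflicts
print(authorised, conflicts)
```

<p>An empty conflict list yields a positive authorisation decision; any entry in the list is a candidate for resolution or negotiation.</p></div>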
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4.">Negotiation Plugin</head><p>Often, the processing intentions of a data consumer for a dataset of interest differ from what the data provider is willing to allow. These differences may concern the purpose of the processing, the time interval for which the provider is willing to grant access, or the price to pay. Such differences are not necessarily irreconcilable, and the two parties can often reach an agreement through negotiation <ref type="bibr" target="#b2">[3]</ref>. The Negotiation and Contracting plugin within UPCAST streamlines negotiation and contract management between data providers and consumers. First, the plugin provides a Policy Administration Point with a graphical interface that lets users define restrictions, privacy policies, and usage policies. In addition, the Negotiation Plugin serves as a Policy Management Point (PMP) for usage restrictions: it reads machine-readable policies, checks them against information from the privacy and usage control, environmental impact, and pricing plugins, and automatically reaches an agreement if there are no policy conflicts. If conflicts are detected, a negotiation is initiated, allowing the data provider or consumer to present counteroffers. Figure <ref type="figure" target="#fig_3">4</ref> illustrates the negotiation and contracting plugin flowchart.</p><p>Once a negotiation has been initiated, the plugin provides a centralised platform for discussing terms, pricing, and specifications, allowing users to track and finalise negotiations. The plugin also incorporates contract management features that allow users to create, review, and execute contracts. By automating routine tasks and offering customisable Data Processing Workflows (DPWs), it enhances data sharing while ensuring compliance with regulatory requirements.</p><p>The provider ultimately decides the negotiation's outcome by agreeing, rejecting, or sending another counteroffer. The result of a successful negotiation process is a data sharing contract <ref type="bibr" target="#b3">[4]</ref> that extends the usage control specification defined by the International Data Spaces Association (IDSA) <ref type="foot" target="#foot_1">4</ref>, which in turn uses ODRL. Contracts also use other ontologies such as the Data Privacy Vocabulary (DPV) <ref type="foot" target="#foot_2">5</ref>, which allows the use, processing, and purpose of processing of data to be described under relevant legislation, notably the GDPR, enabling more descriptive and technology-independent contracts.</p></div>
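<div xmlns="http://www.tei-c.org/ns/1.0"><p>The negotiation flow described above (automatic agreement when there is no conflict, counteroffers otherwise) can be sketched as a small loop. This is a deliberately simplified model and not the UPCAST protocol: it negotiates only price and access duration, the provider always counters with its own limits, and the consumer accepts any counteroffer within its budget. The class Offer, the function negotiate, and all parameter names are hypothetical.</p>

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Offer:
    price: float
    duration_days: int


def negotiate(provider_min_price: float, provider_max_days: int,
              consumer_offer: Offer, consumer_max_price: float,
              max_rounds: int = 5) -> Optional[Offer]:
    """Return the agreed Offer, or None if no agreement is reached."""
    offer = consumer_offer
    for _ in range(max_rounds):
        price_ok = offer.price >= provider_min_price
        duration_ok = provider_max_days >= offer.duration_days
        if price_ok and duration_ok:
            return offer  # no conflict: agreement reached automatically
        # Provider counteroffer: raise the price and shorten the duration
        # just enough to satisfy the provider's limits.
        counter = Offer(max(offer.price, provider_min_price),
                        min(offer.duration_days, provider_max_days))
        if counter.price > consumer_max_price:
            return None  # consumer rejects: counteroffer exceeds the budget
        offer = counter  # consumer accepts the counteroffer as the new offer
    return None


agreed = negotiate(provider_min_price=100.0, provider_max_days=30,
                   consumer_offer=Offer(price=80.0, duration_days=60),
                   consumer_max_price=120.0)
print(agreed)
```

<p>In the example the first offer conflicts on both price and duration, the provider counters with 100.0 for 30 days, and the consumer accepts because the price is within its budget. The agreed terms would then feed into contract generation.</p></div>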
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.4.1.">Contract Generation Supported by LLM</head><p>Contract generation in UPCAST is supported by Large Language Models (LLMs), which draft contracts automatically from the negotiation outcomes. By analysing the details of the negotiation, including usage policies, pricing structures, and specific data processing requirements, the LLM drafts contracts that reflect the agreed terms. This automation speeds up contract creation, reduces the risk of human error, and helps ensure that legal and regulatory aspects are addressed. Because the LLM can interpret and generate natural language, it is well suited to producing clear and enforceable contracts, streamlining the negotiation and contracting workflow within the UPCAST platform.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Conclusion</head><p>In conclusion, this paper has introduced a modular approach to building data marketplaces, addressing the challenges posed by traditional monolithic systems. By utilising dynamic plugins within the UPCAST project, our solution provides a flexible framework that enhances scalability, resilience, and ease of maintenance. The decoupling of functionalities into discrete modules mitigates the risk of single points of failure and allows for tailored customisation to meet specific marketplace needs. This approach not only simplifies system upgrades and maintenance but also ensures robust and adaptable data marketplace solutions, demonstrating significant advantages over conventional monolithic designs.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: The Workflow in a Modular Data Marketplace.</figDesc><graphic coords="2,72.00,65.61,451.29,145.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Resource Specification and Discovery Data Model</figDesc><graphic coords="3,72.00,65.61,469.50,390.00" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Privacy and Usage Control Data Model</figDesc><graphic coords="4,72.00,65.60,451.27,222.46" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Negotiation and Contracting Flow Chart</figDesc><graphic coords="5,188.07,65.61,216.65,325.50" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_0">https://www.w3.org/TR/2018/REC-odrl-model-20180215/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_1">https://internationaldataspaces.org/</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_2">https://w3c.github.io/dpv/dpv/</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was funded by the UKRI Horizon Europe guarantee funding scheme for the Horizon Europe projects UPCAST (101093216) and RAISE (101058479).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Data marketplaces in the AI economy</title>
		<author>
			<persName><forename type="first">G</forename><surname>Konstantinidis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L.-D</forename><surname>Ibáñez</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Roman</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Symposium on AI, Data and Digitalization</title>
				<meeting><address><addrLine>SAIDD</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page">38</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Large-scale analysis of query logs to profile users for dataset search</title>
		<author>
			<persName><forename type="first">R</forename><surname>Sharifpour</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Documentation</title>
		<imprint>
			<biblScope unit="volume">79</biblScope>
			<biblScope unit="page" from="66" to="85" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Fox</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Dautaj</surname></persName>
		</author>
		<title level="m">International commercial agreements</title>
				<imprint>
			<publisher>Kluwer Law International BV</publisher>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<title level="m" type="main">Multicenter observational studies: Understanding the basics of data sharing and data user agreements</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Chen</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
