<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Soulmaz Gheisari</string-name>
          <email>s.gheisari@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Semih Yumusak</string-name>
          <email>semih.yumusak@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaime Osvaldo Salas</string-name>
          <email>j.o.salas@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luis-Daniel Ibáñez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>George Konstantinidis</string-name>
          <email>g.konstantinidis@soton.ac.uk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dumitru Roman</string-name>
          <email>dumitru.roman@sintef.no</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="editor">
          <string-name>Bucharest, Romania</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Electronics and Computer Science, University of Southampton</institution>
          ,
          <addr-line>Southampton</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>SINTEF</institution>
          ,
          <addr-line>Oslo</addr-line>
          ,
          <country country="NO">Norway</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Building a monolithic data marketplace is challenging due to complex interdependencies, leading to cumbersome and error-prone development where a single failure can disrupt the entire system. To address this, we propose a modular approach using dynamic plugins in the UPCAST project. Our flexible framework allows components to be activated or deactivated as needed, enhancing scalability and resilience. By decoupling functionalities into interchangeable modules, we mitigate the risk of single points of failure, simplify maintenance, and facilitate customization for more robust marketplace solutions.</p>
      </abstract>
      <kwd-group>
        <kwd>Data consumer</kwd>
        <kwd>Data marketplace</kwd>
        <kwd>Data provider</kwd>
        <kwd>Negotiation</kwd>
        <kwd>Privacy and usage control</kwd>
        <kwd>Resource specification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <title>1.1. Resource Specification Plugin</title>
        <p>Data sources are annotated as a dcat:Dataset, with the data model designed as a knowledge graph using
both the DCAT2 and UPCAST vocabularies. Users initiate this plugin to specify the details required to
create a new resource. The creation of a new resource involves the following sub-procedures:
• Import UPCAST vocabulary and domain-specific vocabulary in machine-readable format;
• Define metadata of the resource;
• Define access and usage policies of the resource;
• Assign energy profile to the resource that will be used to optimise the environmental impact;
• Associate price to the resource for further negotiations;
• Create resource profile/summary.</p>
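        <p>As an illustration of the sub-procedures above, the following minimal sketch assembles a resource description typed as a dcat:Dataset in a JSON-LD style. The DCAT, Dublin Core, and ODRL namespace IRIs are the published ones; the UPCAST namespace IRI, the property names upcast:energyProfileKWh and upcast:priceEUR, and the function build_resource are assumptions made for this example and are not taken from the UPCAST vocabulary.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import json

# Hypothetical namespace: the real UPCAST vocabulary IRI is not given in the text.
UPCAST_NS = "https://example.org/upcast#"


def build_resource(dataset_id, title, policy_id, energy_kwh, price_eur):
    """Return a JSON-LD style description of a resource typed as dcat:Dataset."""
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
            "odrl": "http://www.w3.org/ns/odrl/2/",
            "upcast": UPCAST_NS,
        },
        "@id": dataset_id,
        "@type": "dcat:Dataset",
        "dct:title": title,                        # metadata of the resource
        "odrl:hasPolicy": {"@id": policy_id},      # access and usage policy
        "upcast:energyProfileKWh": energy_kwh,     # assumed property name
        "upcast:priceEUR": price_eur,              # assumed property name
    }


resource = build_resource(
    dataset_id="https://example.org/datasets/air-quality",
    title="City air quality readings",
    policy_id="https://example.org/policies/42",
    energy_kwh=1.5,
    price_eur=100.0,
)
print(json.dumps(resource, indent=2))
```
]]></preformat>
        <p>Keeping the description as a plain dictionary makes it straightforward to serialise and hand over to other plugins; a production system would more likely use an RDF library and the actual UPCAST terms.</p>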
        <sec id="sec-1-1-1">
          <title>1.1.1. Semantic Profiling</title>
          <p>The data profiling service generates a profile for a dataset using a specified profiler given dataset
metadata, sample data or the whole dataset, and other supplementary materials. A number of profilers can
be connected to provide the “plug-and-play” profiling service according to the needs and requirements
of the user, for example, profilers that give statistics on the dataset or provide semantic information
about the data. In UPCAST, the main purpose of the profiling service is to enhance the representation
of data to improve data discoverability, in particular, through semantic profiling.</p>
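          <p>One possible way to realise such a “plug-and-play” profiling service is a registry of profiler functions that can be enabled per request. The sketch below is an assumption about the design, not the UPCAST implementation: the names PROFILERS, register, and profile, as well as the toy term lookup table, are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from statistics import mean
from typing import Callable, Dict, List

Profiler = Callable[[List[dict]], dict]
PROFILERS: Dict[str, Profiler] = {}  # registry of available profilers


def register(name: str):
    """Decorator that adds a profiler to the registry under the given name."""
    def decorator(func: Profiler) -> Profiler:
        PROFILERS[name] = func
        return func
    return decorator


@register("statistics")
def statistics_profiler(rows: List[dict]) -> dict:
    """Count and mean for every numeric column in the sample rows."""
    numeric: Dict[str, list] = {}
    for row in rows:
        for key, value in row.items():
            if isinstance(value, (int, float)) and not isinstance(value, bool):
                numeric.setdefault(key, []).append(value)
    return {key: {"count": len(v), "mean": mean(v)} for key, v in numeric.items()}


@register("semantic")
def semantic_profiler(rows: List[dict]) -> dict:
    """Toy semantic profile: map column names to terms of a made-up vocabulary."""
    term_map = {"temperature": "ex:Temperature", "city": "ex:City"}
    columns = sorted({key for row in rows for key in row})
    return {column: term_map.get(column, "unknown") for column in columns}


def profile(rows: List[dict], enabled: List[str]) -> dict:
    """Run only the profilers the user has enabled."""
    return {name: PROFILERS[name](rows) for name in enabled}


sample = [
    {"city": "Oslo", "temperature": 4.0},
    {"city": "Southampton", "temperature": 10.0},
]
result = profile(sample, ["statistics", "semantic"])
print(result)
```
]]></preformat>
          <p>Adding a new profiler then only requires registering another function; nothing in the calling code changes.</p>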
        </sec>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Resource Discovery Plugin</title>
        <p>
          The Resource Discovery Plugin acts as an intermediary, facilitating the retrieval of resource specifications.
Resource consumers can request and retrieve information from the available resources provided by
various providers. While searching the knowledge base, users may also find similar sources through
semantic similarity search [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Therefore, resource discovery provides the following functionalities for a
consumer:
• A comprehensive search for resources based on the consumer’s intentions;
• Browsing for resources, offering the user an intuitive and efficient way to navigate and explore
the available resources;
• Discovering related/recommended resources, ensuring up-to-date and dynamic results. The
relevant resources graph is continuously updated as new datasets arrive.
        </p>
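        <p>The text does not specify how similarity is computed. As a stand-in, the following sketch ranks resources by Jaccard similarity between the keywords of a query and the keywords of each resource; a real deployment would more plausibly compare embeddings or graph-based features. The catalogue contents and the names jaccard and search are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def jaccard(a, b):
    """Jaccard similarity of two keyword collections (0.0 when both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


# Hypothetical catalogue: resource identifier -> descriptive keywords.
CATALOGUE = {
    "ds-air": {"air", "quality", "sensor", "city"},
    "ds-traffic": {"traffic", "sensor", "city"},
    "ds-genomics": {"genome", "sequencing"},
}


def search(query_terms, catalogue, top_k=2):
    """Return up to top_k resource ids, most similar first, skipping zero scores."""
    scored = [(jaccard(query_terms, keywords), rid) for rid, keywords in catalogue.items()]
    scored = [(score, rid) for score, rid in scored if score > 0]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [rid for _, rid in scored[:top_k]]


hits = search({"city", "sensor"}, CATALOGUE)
print(hits)  # ['ds-traffic', 'ds-air']
```
]]></preformat>
        <p>Because the ranking is recomputed on every call, newly registered datasets appear in the results as soon as they are added to the catalogue.</p>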
      </sec>
      <sec id="sec-1-3">
        <title>1.3. Privacy and Usage Control Plugin</title>
        <p>After the resource specification, the resource provider defines constraints on the resources using Open
Digital Rights Language (ODRL)<sup>3</sup> rules, leveraging both the UPCAST and domain-specific vocabularies.
The resource consumer, in turn, specifies their intentions via a Data Processing Workflow
(DPW) specification and outlines any organisation-specific access and usage control rules, as well as
rules prescribed by applicable regulations (e.g., GDPR). Conflicts are then identified
between the provider’s constraints, the consumer’s intentions, and internal rules, which makes it possible
to derive authorisation decisions. The functionalities of the plugin can be summarised as follows:
• Transform the resource provider constraints to privacy and usage control rules;
• Define rules for the resource consumer;
• Manage rules;
• Identify conflicts between the provider’s constraints and the consumer’s intentions;
• Access and usage decision making.</p>
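        <p>The conflict identification and decision steps listed above can be sketched as follows. This is a deliberately simplified model under stated assumptions: constraints are reduced to a set of allowed purposes and an access deadline, whereas the plugin works on ODRL rules and DPW specifications. The classes ProviderConstraint and ConsumerIntent and the functions find_conflicts and decide are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass
from datetime import date
from typing import FrozenSet, List


@dataclass(frozen=True)
class ProviderConstraint:
    allowed_purposes: FrozenSet[str]  # purposes the provider permits
    access_until: date                # last day on which access is permitted


@dataclass(frozen=True)
class ConsumerIntent:
    purpose: str         # purpose stated in the consumer's workflow
    access_until: date   # last day on which the consumer needs access


def find_conflicts(constraint: ProviderConstraint, intent: ConsumerIntent) -> List[str]:
    """Return the names of all constraint dimensions the intent violates."""
    conflicts = []
    if intent.purpose not in constraint.allowed_purposes:
        conflicts.append("purpose")
    if intent.access_until > constraint.access_until:
        conflicts.append("access_until")
    return conflicts


def decide(constraint: ProviderConstraint, intent: ConsumerIntent) -> str:
    """Permit only when no conflict is found."""
    return "permit" if not find_conflicts(constraint, intent) else "deny"


constraint = ProviderConstraint(frozenset({"research"}), date(2025, 12, 31))
ok_intent = ConsumerIntent("research", date(2025, 6, 30))
bad_intent = ConsumerIntent("marketing", date(2026, 1, 31))

print(decide(constraint, ok_intent))           # permit
print(find_conflicts(constraint, bad_intent))  # ['purpose', 'access_until']
```
]]></preformat>
        <p>Returning the list of conflicting dimensions, rather than a bare yes/no answer, gives a later negotiation step something concrete to work on.</p>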
      </sec>
      <sec id="sec-1-4">
        <title>1.4. Negotiation Plugin</title>
        <p>
          Often, the processing intentions of a data consumer for a dataset of interest differ from what the
data provider is willing to allow. These differences may include the purpose of the processing, the
time interval for which the provider is willing to allow access, or the price to pay. Nevertheless, these
differences are not necessarily irreconcilable, and both parties can often reach an agreement through
negotiation [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The Negotiation and Contracting plugin within UPCAST serves as a pivotal component,
streamlining the complex processes of negotiation and contract management. With its multifaceted
functionality, this plugin facilitates efficient communication and collaboration between data providers
and consumers. First, the plugin provides a Policy Administration Point with a graphical
interface, enabling users to define restrictions, privacy, and usage policies in a user-friendly and intuitive
manner. In addition, the Negotiation Plugin serves as a Policy Management Point (PMP) for usage
restrictions: it reads machine-readable policies, checks them against information from the
privacy and usage control, environmental impact, and pricing plugins, and automatically reaches
an agreement if there are no policy conflicts. If conflicts are detected, a negotiation is
initiated, allowing the data provider or consumer to present counteroffers. Figure 4 illustrates the
negotiation and contracting plugin flowchart.
        </p>
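        <p>The flow described above (automatic agreement when no conflict exists, otherwise a round of counteroffers) can be sketched as a small state machine. The sketch assumes that only price and access duration are negotiated and that the provider’s limits are fixed; the classes Offer and Negotiation and their fields are hypothetical and do not reflect the plugin’s actual interface.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Offer:
    price: float       # price offered by the consumer
    access_days: int   # requested access period in days


@dataclass
class Negotiation:
    min_price: float          # lowest price the provider accepts
    max_access_days: int      # longest access period the provider allows
    status: str = "open"      # "open", "agreed", or "rejected"
    history: List[Offer] = field(default_factory=list)

    def conflicts(self, offer: Offer) -> List[str]:
        """Names of the terms on which the offer violates the provider's limits."""
        found = []
        if offer.price < self.min_price:
            found.append("price")
        if offer.access_days > self.max_access_days:
            found.append("access_days")
        return found

    def submit(self, offer: Offer) -> str:
        """Record an offer; agree automatically when it raises no conflict."""
        if self.status != "open":
            raise ValueError("negotiation is already closed")
        self.history.append(offer)
        if not self.conflicts(offer):
            self.status = "agreed"
        return self.status

    def reject(self) -> str:
        """Close the negotiation without an agreement."""
        self.status = "rejected"
        return self.status


negotiation = Negotiation(min_price=100.0, max_access_days=30)
first = negotiation.submit(Offer(price=80.0, access_days=60))    # conflicts: stays open
second = negotiation.submit(Offer(price=100.0, access_days=30))  # counteroffer: agreed
print(first, second)  # open agreed
```
]]></preformat>
        <p>The offer history is kept so that both parties can track how the terms evolved before the agreement was reached.</p>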
        <p>Upon the initiation of a negotiation process, the plugin provides a centralised platform for discussing
terms, pricing, and specifications, allowing users to track and finalise negotiations seamlessly. Moreover,
the plugin incorporates contract management features that allow users to create, review, and
execute contracts. By automating routine tasks and offering customisable Data Processing
Workflows (DPWs), it enhances data sharing while supporting compliance with regulatory requirements.</p>
        <p>
          The provider ultimately decides the negotiation’s outcome by accepting, rejecting, or sending
another counteroffer. The result of a successful negotiation process is a data sharing contract [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] that
extends the usage control specification defined by the International Data Spaces Association (IDSA)<sup>4</sup>,
which in turn uses ODRL. Contracts also utilise other ontologies such as the Data Privacy Vocabulary
(DPV)<sup>5</sup>, which allows the use, processing, and purpose
of processing of data to be described under relevant legislation, notably the GDPR, enabling more descriptive and
technology-independent contracts.
        </p>
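        <p>To make the shape of such a contract more concrete, the sketch below builds a plain ODRL Agreement in JSON-LD form with a purpose constraint and an access deadline. It uses only terms from the ODRL vocabulary; the IDSA extensions mentioned above are not modelled, the DPV purpose IRI is shown purely as an illustrative value, and the function build_agreement and all identifiers are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import json


def build_agreement(uid, target, assigner, assignee, purpose, end_date):
    """Return an ODRL Agreement (JSON-LD) granting 'use' under two constraints."""
    return {
        "@context": "http://www.w3.org/ns/odrl.jsonld",
        "@type": "Agreement",
        "uid": uid,
        "permission": [
            {
                "target": target,      # the dataset being shared
                "assigner": assigner,  # the data provider
                "assignee": assignee,  # the data consumer
                "action": "use",
                "constraint": [
                    {
                        "leftOperand": "purpose",
                        "operator": "eq",
                        "rightOperand": purpose,
                    },
                    {
                        "leftOperand": "dateTime",
                        "operator": "lt",
                        "rightOperand": {"@value": end_date, "@type": "xsd:date"},
                    },
                ],
            }
        ],
    }


agreement = build_agreement(
    uid="https://example.org/agreements/1",
    target="https://example.org/datasets/air-quality",
    assigner="https://example.org/parties/provider",
    assignee="https://example.org/parties/consumer",
    purpose="https://w3id.org/dpv#ResearchAndDevelopment",  # illustrative value
    end_date="2026-01-01",
)
print(json.dumps(agreement, indent=2))
```
]]></preformat>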
        <sec id="sec-1-4-1">
          <title>1.4.1. Contract Generation Supported by LLM</title>
          <p>The contract generation process within the UPCAST plugin is supported by the integration
of Large Language Models (LLMs). These models facilitate the automatic generation of
contracts based on the negotiation outcomes. By analysing the details of
the negotiation, including usage policies, pricing structures, and specific data processing requirements,
the LLM can draft contracts that reflect the agreed terms. This automation speeds
up the contract creation process, reduces the risk of human error, and helps ensure that legal and
regulatory aspects are addressed. The LLM’s ability to understand and generate natural
language makes it a valuable tool for creating clear contracts, thereby streamlining
the negotiation and contracting workflow within the UPCAST platform.
<sup>4</sup> https://internationaldataspaces.org/
<sup>5</sup> https://w3c.github.io/dpv/dpv/</p>
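          <p>How the negotiation outcome is passed to the LLM is not described here. One plausible approach, sketched below as an assumption, is to render the agreed terms into a deterministic prompt and instruct the model not to add anything beyond them. No model is called in this example; the function build_contract_prompt and the term names are hypothetical.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def build_contract_prompt(terms):
    """Render agreed negotiation terms into a prompt for a contract-drafting model."""
    # Sort keys so that the same terms always yield the same prompt.
    lines = [f"- {key}: {terms[key]}" for key in sorted(terms)]
    return (
        "Draft a data sharing contract that reflects exactly these agreed terms. "
        "Do not add obligations that are not listed.\n" + "\n".join(lines)
    )


agreed_terms = {
    "purpose": "research",
    "price": "100 EUR",
    "access_until": "2026-01-01",
}
prompt = build_contract_prompt(agreed_terms)
print(prompt)
```
]]></preformat>
          <p>The resulting string would then be sent to whichever model the platform uses, and the draft it returns can be checked against the machine-readable agreement before signing.</p>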
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Conclusion</title>
      <p>In conclusion, this paper has introduced a modular approach to building data marketplaces, addressing
the challenges posed by traditional monolithic systems. By utilising dynamic plugins within the
UPCAST project, our solution provides a flexible framework that enhances scalability, resilience, and
ease of maintenance. The decoupling of functionalities into discrete modules mitigates the risk of
single points of failure and allows for tailored customisation to meet specific marketplace needs. This
approach not only simplifies system upgrades and maintenance but also ensures robust and adaptable
data marketplace solutions, demonstrating significant advantages over conventional monolithic designs.</p>
    </sec>
    <sec id="sec-3">
      <title>Acknowledgments</title>
      <p>This work was funded by the UKRI Horizon Europe guarantee funding scheme for the Horizon Europe
projects UPCAST (101093216) and RAISE (101058479).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Roman</surname>
          </string-name>
          ,
          <article-title>Data marketplaces in the AI economy</article-title>
          ,
          <source>in: Symposium on AI, Data and Digitalization (SAIDD 2023)</source>
          ,
          <year>2023</year>
          , p.
          <fpage>38</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Sharifpour</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Large-scale analysis of query logs to profile users for dataset search</article-title>
          ,
          <source>Journal of Documentation</source>
          <volume>79</volume>
          (
          <year>2023</year>
          )
          <fpage>66</fpage>
          -
          <lpage>85</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fox</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Dautaj</surname>
          </string-name>
          ,
          <source>International commercial agreements</source>
          ,
          <publisher-name>Kluwer Law International BV</publisher-name>
          ,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Multicenter observational studies: Understanding the basics of data sharing and data user agreements</article-title>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>