<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Case Study in Federated Data Ecosystems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Matthias Jarke</string-name>
          <email>jarke@dbis.rwth-aachen.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Applied Information Technology FIT</institution>
          ,
          <addr-line>Schloss Birlinghoven, 53757 Sankt Augustin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>RWTH Aachen University</institution>
          ,
          <addr-line>Databases and Information Systems (Computer Science 5), Ahornstrasse 55, 52074 Aachen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In several national and even continental data strategies worldwide, the decentralized data space concept aims to address concerns of data sovereignty among organizations. A significant number of projects and a few already operational data spaces address application domains such as manufacturing, mobility and logistics, or health. However, data sovereignty has also become a key concern of artists and cultural institutions, who pursue the two-pronged and sometimes conflicting goals of creating added value from data sharing and protecting their intellectual property and personal privacy rights. Additional challenges of this sector include, among many others, a number of potential players that is orders of magnitude larger than in existing data spaces, frequently limited IT capabilities, a complex differentiated system of data types and regulations, and a novel interplay between heterogeneous data integration and analytics on the one hand and human creativity on the other. Moreover, the different evolution paths of the involved sub-communities require a sophisticated concept for federated data space evolution. This keynote talk reports experiences of the “Data Space Culture”, a lighthouse project of the German Chancellor’s Office that aims to investigate these issues and to demonstrate and evaluate a suitable data ecosystem around four use cases in the fields of theaters, museums, music training, and networking local culture communities. We also discuss the potential synergies and interoperation challenges with the many other culture digitization initiatives in Europe and beyond.</p>
      </abstract>
      <kwd-group>
        <kwd>data space</kwd>
        <kwd>GAIA-X</kwd>
        <kwd>culture informatics</kwd>
        <kwd>data sovereignty</kwd>
        <kwd>data exchange</kwd>
        <kwd>data ecosystem</kwd>
        <kwd>federated data integration</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The European cultural sector is one of the most important fields, characterized by a wide diversity in digitalization. Some large individual players (e.g. the Deutsches Museum in Munich) have invested intensely in all kinds of digital upgrading of their artefacts, or (like the movie and music industries, or the Europeana cultural heritage sector) are organized around large-scale multimedia data collection and streaming platforms, or multi-year joint efforts such as Europeana or the German Digital Library. Many other players, however, have limited IT infrastructure and limited access to their intended audiences. Moreover, the potential advantages of joint value creation through controlled B2B data sharing, without giving up sovereignty over one’s own data, are rarely exploited.<fn><p>Bases (VLDBW’23) — Data Ecosystems (DEco), August 28 - September</p></fn><fn><p>0000-0001-6169-2942 (M. Jarke)</p></fn><fn><p>© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License</p></fn></p>
      <p>The European Data Strategy therefore mentions culture as one important application and innovation domain for the concept of data spaces, but other areas such as logistics, mobility, manufacturing, and even health started specific initiatives earlier.</p>
      <p>Encouraged by early regional initiatives in eCulture, the Culture Department of the German Chancellor’s Office (BKM Bund) has therefore funded a large-scale experimental effort to set up a Cultural Data Space. The project is led by the German Academy of Science and Technology acatech, the Fraunhofer Institute for Applied Information Technology FIT, and Hamburg’s Ministry of Culture and Media. In addition, the project (2022-2025) involves representatives of dozens of cultural organizations and research institutions in the culture fields of museums, theaters, music training, and local culture communities.</p>
      <p>Intense political and technical discussions resulted in the decision to strengthen and validate the attractiveness and feasibility of a Culture Data Space by
• exploring the possibility of reusing structures and software components of a somewhat similar, already operational data space in the field of human mobility; not surprisingly, this turned out to be nontrivial, as this data space and even its underlying base software were, and still are, evolving rapidly;
• identifying the adaptation and extension needs of a broad range of cultural fields with specific high-visibility use cases, such that we could address the common core functionalities;
• coping with the complex multi-organizational setting, which on the one hand involves cultural institutions and creative industries at multiple scales, and on the other hand regional interests ranging from local communities to state, federal, and even European level.</p>
      <p>The analysis of these four use case areas revealed the need for some important extensions of the emerging data space technologies, but also some important implications for governance and business models. In this keynote paper, we give a short summary of this initiative and its linkages to other data space application domains.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background: Data Spaces for Data Sovereignty</title>
      <p>The idea of data spaces started at Microsoft Research in the early 2000s [<xref ref-type="bibr" rid="ref1">1</xref>], aiming at a personalized organization of the increasingly heterogeneous swamp of data on personal computers. A major technical challenge was semantic modeling and querying across the different technical storage formats, bringing database technologies together with information retrieval as well as semantic networks and, later, linked open data.</p>
      <p>
        In the 2010s, this original small-scale concept grew into large multi-player systems in two dimensions. First, the structured data warehouses of the late 1990s evolved into data lakes, which store raw data from many subfields or sub-organizations and allow data cleaning, detection of related data, and sophisticated data analytics and machine learning from many different perspectives, without big upfront integration investment [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The second dimension emphasized decentralization of data ownership and data usage control. Driven by the importance of niche players and hidden industry champions in Europe, an intense debate on data sovereignty emerged, e.g. in the GAIA-X initiative [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], which is reflected in European legislation such as the GDPR and the European Data and AI Acts.
      </p>
      <p>
        A minimal data space system involves the following core functionalities [<xref ref-type="bibr" rid="ref4">4</xref>, ?, <xref ref-type="bibr" rid="ref5">5</xref>]:
• standardized connectors as a kind of wrapper gateway for the import and export of data to and from the organizational information system of a participant; thus, connectors can serve the technical roles of data supplier, data requester, or both; the units of data exchange are often digital shadows of objects or activities which carry some value [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ];
• broker functionality involving one or more catalogs and associated vocabulary support to help the searching and matching between information offers and requests [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ];
• contract patterns, contracting workflows, and contract execution monitoring for ensuring sovereign data exchange processes, including the definition and partially automated monitoring of access constraints and usage policies;
• services for the authentication of participants, and the certification of all above-mentioned system components according to rules of the IS association, i.e. protection against misuse of the data space by outsiders.
      </p>
      <p>
        To summarize, data space technology enables the formation and operation of communities of data exchange and cooperative, secure value creation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A data space is thus complementary to large-scale repositories or data lakes, even though storage can be an optional functionality offered or used by certain players within a data space community.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Use Case Requirements for a Culture Data Space</title>
      <p>The four use cases were designed to analyse and showcase specific opportunities and challenges for important individual improvements, while jointly generating an understanding of the technical and organizational needs for a Culture Data Space. In the sequel, we briefly summarize the ideas, approach, and challenges of these use cases.</p>
      <p>In the last few years, a large number of city- or county-wide culture platforms have emerged independently of each other, with many creative ideas and tools, for example linking information about cultural events to local public transportation systems, thus creating synergy between more cultural interest, touristic value, and better environmental sustainability. The use case aims firstly at enabling interoperability between such cultural platforms (e.g. for creating tour proposals across regional boundaries), but also with large-scale data repositories such as the German Digital Library, the more than 1,500 regional archives, and even media companies. The overload created by these enormously extended offerings will be reduced by culture-specific personalization tools across this heterogeneous landscape, without resorting to a centralization of the data. This use case is jointly coordinated by the OWL culture platform and the IS department of Paderborn University.</p>
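      <p>The goal of interoperability across independent culture platforms without centralizing their data can be illustrated with a small Python sketch, assuming a federated broker that holds only per-platform metadata mappings. The broker translates a query term into each platform’s own vocabulary, lets each platform return only its matching records, maps those back into a shared schema, and ranks them with a naive personalization rule. The classes Platform and Event, the field and vocabulary mappings, the sample records, and the ranking rule are all hypothetical illustrations, not the project’s design or any data space standard.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """Record in a shared (hypothetical) culture vocabulary."""
    title: str
    city: str
    category: str
    date: str  # ISO 8601, so string order equals chronological order


@dataclass
class Platform:
    """A regional culture platform that keeps its data locally (hypothetical)."""
    name: str
    records: list     # native records in the platform's own schema
    fields: dict      # shared field name -> native field name
    vocabulary: dict  # shared category term -> native category term

    def search(self, native_category: str) -> list:
        # Evaluated on the platform side: only matching records are returned.
        key = self.fields["category"]
        return [r for r in self.records if r[key] == native_category]


def federated_search(platforms, category, preferred_cities):
    """Translate the query per platform, collect hits, map them back, rank them."""
    results = []
    for p in platforms:
        native_term = p.vocabulary.get(category)
        if native_term is None:
            continue  # no equivalent concept on this platform
        for r in p.search(native_term):
            f = p.fields
            results.append(Event(r[f["title"]], r[f["city"]], category, r[f["date"]]))
    # Naive personalization: events in preferred cities first, then by date.
    results.sort(key=lambda e: (e.city not in preferred_cities, e.date))
    return results


# Invented sample data for two platforms with different schemas.
platform_a = Platform(
    name="platform_a",
    records=[{"titel": "Faust", "ort": "Bielefeld", "sparte": "Theater", "datum": "2023-10-05"}],
    fields={"title": "titel", "city": "ort", "category": "sparte", "date": "datum"},
    vocabulary={"theatre": "Theater"},
)
platform_b = Platform(
    name="platform_b",
    records=[
        {"name": "Hamlet", "location": "Hamburg", "type": "stage", "day": "2023-09-30"},
        {"name": "Jazz Night", "location": "Hamburg", "type": "concert", "day": "2023-09-29"},
    ],
    fields={"title": "name", "city": "location", "category": "type", "date": "day"},
    vocabulary={"theatre": "stage", "music": "concert"},
)
hits = federated_search([platform_a, platform_b], "theatre", {"Bielefeld"})
print([e.title for e in hits])  # ['Faust', 'Hamlet']
```
      </preformat>
      <p>Translating the query rather than copying all records into a central index keeps each platform’s data where it is; a real broker would additionally need the contract, usage-control, and identity services of a data space before any record is released.</p>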
      <p>Museum exhibitions often face the challenge that only a few original works related to the theme – each often worth millions – are available locally; others must be borrowed and insured at enormous cost, or made available in digital form.</p>
      <p>The reuse of artefacts and processes for later purposes</p>
    </sec>
    <sec id="sec-6">
      <title>4. Technological and Organizational Implications</title>
      <p>
        Across the four use cases, the analysis identified, among others, the following requirements:
• a scaling of participant numbers over previously studied data space applications by at least two orders of magnitude;
• a culture-specific extension of access and usage policies, including the embedding of existing monetization organizations such as, in Germany, VG Wort, VG Media, or GEMA for musical performance rights;
• to avoid double work and unnecessary inconsistencies, the creation of interoperability and functional synergy with large public or private data collections, such as the German and European digital libraries, public archives, and media organizations, but also compatibility with the ongoing efforts towards domain-specific FAIR-compatible research data infrastructures [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], such as NFDI4Culture, and also the European Data Space Support Center;
• methods and tools for semantic interoperability among the many existing and forthcoming metadata standards in the various cultural domains, and also with related data spaces such as Mobility, Tourism, European Cultural Heritage, and the like;
• in addition to the above-mentioned control-level and metadata infrastructure, culture-adapted optimization of the actual data exchange and value-added processing, including data-kind-specific storage, query, integration, and personalization services in the highly heterogeneous setting, following a logic-based approach as in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
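      <p>The culture-specific extension of usage policies listed above can be made concrete with a minimal Python sketch of a policy check with usage monitoring, under the following assumptions: a policy restricts permitted purposes, a validity date, and a number of uses, and each permitted use accrues a flat royalty for a collecting society. The classes UsagePolicy and UsageMonitor, the purpose labels, and the fee of 150 cents per use are hypothetical; GEMA appears only as a label, and no real tariff or connector API (such as EDC) is modeled.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class UsagePolicy:
    """Usage policy attached to a data offer (hypothetical structure)."""
    allowed_purposes: frozenset
    valid_until: date
    max_uses: int
    collecting_society: Optional[str] = None  # who collects royalties, if anyone
    royalty_cents_per_use: int = 0            # illustrative flat per-use tariff


@dataclass
class UsageMonitor:
    """Checks usage requests against a policy and logs every permitted use."""
    policy: UsagePolicy
    log: list = field(default_factory=list)

    def request(self, consumer: str, purpose: str, day: date) -> bool:
        p = self.policy
        if purpose not in p.allowed_purposes:
            return False  # purpose not covered by the contract
        if day > p.valid_until:
            return False  # contract has expired
        if len(self.log) >= p.max_uses:
            return False  # usage quota exhausted
        self.log.append((consumer, purpose, day))
        return True

    def royalties_due(self) -> dict:
        """Royalties owed to the collecting society for the logged uses."""
        p = self.policy
        if p.collecting_society is None:
            return {}
        return {p.collecting_society: len(self.log) * p.royalty_cents_per_use}


policy = UsagePolicy(
    allowed_purposes=frozenset({"education", "exhibition"}),
    valid_until=date(2024, 12, 31),
    max_uses=2,
    collecting_society="GEMA",
    royalty_cents_per_use=150,
)
monitor = UsageMonitor(policy)
print(monitor.request("museum-a", "exhibition", date(2024, 3, 1)))   # True
print(monitor.request("agency-b", "advertising", date(2024, 3, 2)))  # False (purpose)
print(monitor.request("school-c", "education", date(2025, 1, 15)))   # False (expired)
print(monitor.request("school-c", "education", date(2024, 5, 10)))   # True
print(monitor.request("museum-d", "exhibition", date(2024, 6, 1)))   # False (quota)
print(monitor.royalties_due())  # {'GEMA': 300}
```
      </preformat>
      <p>Counting royalties in integer cents avoids floating-point rounding. Where such a check runs is exactly the architectural question discussed in this section: in a central operator, in a connector extension, or in a federated sub-dataspace.</p>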
      <p>These in part unique requirements imply careful thinking about a suitable data space architecture. Among the over 20 open and private infrastructures proposed on the market, we considered three open source variants for the Culture Data Space architecture.</p>
      <p>The early data space applications, such as the German Mobility Data Space, bundle all the core data space functions, except the connectors of the participants, in a single operational organization which provides the core services such as broker, contract, and identity management. It seemed natural to start with such an already operational infrastructure for the first experiments, and to define a growth path for the new requirements.</p>
      <p>For data spaces with tech-savvy large industrial participants, such as the emerging Catena-X data space for the automotive industry, this architecture was considered too limited, but also too intrusive, since e.g. usage policies were intended to be enforced within the individual connectors of participants, so that the participants felt they would lose some control over their own information systems. They preferred a solution in which they take full responsibility for linking their internal systems to connectors. The connectors themselves have only very basic functionality, and participants can add those of the formerly central services they need as so-called extensions to the connectors. The open source development program for this effort is mostly carried out in the EDC connector initiative of the Eclipse Foundation by industrial and science partners including Fraunhofer.</p>
      <p>However, this approach assumes high IT and data management competence of at least the largest part of data space participants, which is totally unrealistic for most of the cultural sector. Moreover, different user groups and sub-dataspace communities progress at very different speeds, e.g. depending on whether they can invest their own money or need to rely on (initial) public support.</p>
      <p>Even from the experiences within the current project, it seems unimaginable to think of a single governing body which could make sufficiently quick and mutually acceptable decisions. We therefore opted for a third architecture which consists of several federated sub-dataspaces, each run by a culture domain, a very large data repository organization, a city or rural community, or perhaps a creative industry organisation. Each sub-dataspace would offer services similar to those of the original IDS approach, but in addition e.g. federated brokers and contract support for data exchange across the boundaries of the subcommunity.</p>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion and Outlook</title>
      <p>
        Our case studies have shown remarkable additional attractions and requirements for a Culture Data Space. We stress again that our understanding of such a data space focuses on the sovereign sharing of different kinds of data, and on value creation and fair value appropriation [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. It is thus complementary to the large data collection efforts and, abstractly speaking, to the data lake approach.
      </p>
      <p>At present, an initial “sandbox” data space has been set up at Fraunhofer which includes first extensions to the central approach, in particular aiming at federated brokers with intelligent metadata translation facilities. By the end of 2023, two of the four use cases will be partly operational, and all of them fully operational a year later. In parallel, decisions will be taken on the generic topics of advanced rights management, the governance structure of the federated data space ecosystem, the future operating companies, and the business models.</p>
    </sec>
  </body>
  <back>
    <ack>
      <title>Acknowledgments</title>
      <p>This work was supported in part by the Beauftragte fuer Kultur und Medien der Bundesregierung as a lighthouse project Datenraum Kultur within the digitalization strategy of the German Federal Government. Special thanks go to Georgios Toubekis who coordinated the complex use case analyses.</p>
    </ack>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Franklin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Maier</surname>
          </string-name>
          ,
          <article-title>Principles of data space systems</article-title>
          ,
          <source>in: 25th ACM PODS</source>
          , ACM Press,
          <year>2006</year>
          , pp.
          <fpage>751</fpage>
          -
          <lpage>772</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Koutras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          ,
          <article-title>Data lakes: a survey of functions and systems</article-title>
          ,
          <source>IEEE Trans. Knowledge and Data Engineering</source>
          <volume>35</volume>
          (
          <year>2023</year>
          )
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <collab>W. G. GAIA-X</collab>
          ,
          <article-title>Reference Architecture Document</article-title>
          , Release 21.03,
          <publisher-name>GAIA-X, European Association for Data and Cloud AISBL</publisher-name>
          ,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Quix</surname>
          </string-name>
          ,
          <article-title>On warehouses, lakes, and spaces - the changing role of conceptual modeling for data integration</article-title>
          , in: J.
          <string-name>
            <surname>Cabot</surname>
          </string-name>
          et al. (eds.): Conceptual Modeling Perspectives, SpringerNature,
          <year>2017</year>
          , pp.
          <fpage>231</fpage>
          -
          <lpage>245</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>B.</given-names>
            <surname>Otto</surname>
          </string-name>
          , M. ten
          <string-name>
            <surname>Hompel</surname>
          </string-name>
          , S. Wrobel,
          <article-title>Designing Data Spaces: The Ecosystem Approach to Competitive Advantage</article-title>
          , SpringerNature,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          ,
          <article-title>Data sovereignty and the internet of production</article-title>
          ,
          <source>in: International Conference on Advanced Information Systems Engineering - CAiSE 20</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>549</fpage>
          -
          <lpage>558</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          ,
          <article-title>Data exchange: semantics and query answering</article-title>
          ,
          <source>Theoretical Computer Science</source>
          <volume>336</volume>
          (
          <year>2005</year>
          )
          <fpage>89</fpage>
          -
          <lpage>124</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Geisler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.-E.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cappiello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Farias Lóscio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Missier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Otto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Paja</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Pernici</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rehof</surname>
          </string-name>
          ,
          <article-title>Knowledge-driven data ecosystems towards data transparency</article-title>
          ,
          <source>ACM Journal of Data and Information Quality JDIQ</source>
          <volume>14</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Liebenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Jarke</surname>
          </string-name>
          ,
          <article-title>Information Systems Engineering with Digital Shadows: Concept and Case Studies</article-title>
          ,
          <source>in: International Conference on Advanced Information Systems Engineering - CAiSE 20</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>84</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Widom</surname>
          </string-name>
          ,
          <article-title>Lineage tracing for general data warehouse transformations</article-title>
          ,
          <source>VLDB Journal</source>
          <volume>12</volume>
          (
          <year>2003</year>
          )
          <fpage>41</fpage>
          -
          <lpage>58</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Wilkinson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dumontier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. J.</given-names>
            <surname>Aalbersberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Appleton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Axton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Blomberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-W.</given-names>
            <surname>Boiten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. B.</given-names>
            <surname>da Silva Santos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. E.</given-names>
            <surname>Bourne</surname>
          </string-name>
          , et al.,
          <article-title>The FAIR Guiding Principles for scientific data management and stewardship</article-title>
          ,
          <source>Scientific Data</source>
          <volume>3</volume>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          ,
          <article-title>Direct and Reverse Rewriting in Data Interoperability</article-title>
          ,
          <source>in: International Conference on Advanced Information Systems Engineering</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>13</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>F.</given-names>
            <surname>Piller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Nitsch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Luettgens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mertens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Puetz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>van Dyck</surname>
          </string-name>
          ,
          <source>Forecasting Next Generation Manufacturing: Digital Shadows, Human-Machine Collaboration, and Data-Driven Business Models</source>
          , Springer Contributions to Management Science,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>