<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Data Ecosystems - Fuelling the Digital Age</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Henderik A. Proper</string-name>
          <email>E.Proper@acm.org</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Luxembourg Institute of Science and Technology (LIST)</institution>
          ,
          <addr-line>Belval</addr-line>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Luxembourg</institution>
          ,
          <country country="LU">Luxembourg</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the increased digitisation of society comes an increase in the role of data. Business analytics, statistics-based AI, the development of digital twins, etc, are typical examples of “data hungry” applications. Such “data hungry” applications not only need data in different shapes and forms, they also need data from a wide variety of sources. The systems involved in gathering, storing, processing, analysing, and visualising data have evolved to be complex systems themselves, involving many actors of widely differing nature. We argue that, as such, these complex systems can be best thought of as 'data ecosystems', which we see as involving the entire complex of social / physical / digital actors which provide, own, sell, buy, exchange, manipulate, store, and use data. Within these data ecosystems, one needs to deal with technical concerns regarding reliability, performance, interoperability, semantics, etc, as well as social concerns, such as value of data, privacy, trust, ownership, ethics, risk, etc. In line with this, we argue that there is a need to define / study 'data ecosystems' more closely, where we see a potential future role for the VMBO community.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Our society is transitioning from the industrial age to the digital age. With the
increasing digitisation of society comes an increase in the role of data. Data is gathered from
sensors, subsequently stored, processed, analysed, and visualised, and is eventually
consumed by (human and / or digital) actors to enable them to gain insight and / or make
informed decisions.</p>
      <p>
        Business analytics, statistics-based AI, the development of digital twins, etc, are
typical examples of modern-day “data hungry” applications. For example, data is
essential for the training of statistics-based AI and the development of digital twins [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], while
also enabling enterprises to continuously assess their performance in real-time [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and
learn to improve their operations [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Industry uses phrases such as “thriving on data” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
to underline the potential value of data. Meanwhile, we have all grown familiar with the
possibilities, as well as the possible positive and negative consequences, of large-scale
data collection and utilisation as conducted by, e.g., Google and Facebook.
      </p>
      <p>These “data hungry” applications need to be “fuelled” with a wide variety of data
resources. These range from raw observations from different sensors / informants, via
processed and / or enriched artefacts such as predictive models, to
representations of intentions (e.g. plans, strategy documents, designs, etc), specifications
(source code, work procedures, etc), and norms (regulations, principles, policies, etc).
Next to that, such applications also need data from a wide variety of sources, which requires
a transfer of ownership of the data, or at least a transfer of the right to use the data.</p>
      <p>
        We specifically use the term data, as, in line with e.g. [
        <xref ref-type="bibr" rid="ref11 ref3">11, 3</xref>
        ], we see information as
the increment in knowledge / insights which an actor gets when “consuming” data. As
such, data are “mere” explicitly represented artefacts that could have value to (human
and / or digital) actors in the sense that they may provide these actors with relevant / timely
information.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Data ecosystems and their development</title>
      <p>
        As a result of the growing role of data as a key underlying resource, the systems
involved in gathering, storing, processing, analysing, and visualising data have evolved
to become complex systems themselves, involving different actors with their own
interests. We argue that, as such, these complex systems can be best thought of as ‘data
ecosystems’, which we see as involving the entire complex web of social / physical /
digital actors (i.e. an ActorWeb [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]), which provide, own, sell, buy, exchange,
manipulate, store, and use data.
      </p>
      <p>
        Within these data ecosystems, one needs to deal with technical concerns
regarding reliability, performance, interoperability, semantics, etc, as well as social concerns,
such as value of data, privacy, trust, ownership, ethics, risk, etc. For instance, as the data
involved may pertain to (the behaviour of) humans, privacy and ethical considerations
clearly play a role. Furthermore, as the data has some correspondence to
“something” in the social, economic, or physical world, it is important to consider the quality
of this correspondence. At the same time, some actors may have an interest in
maliciously changing the data, thus distorting this correspondence. Data also comes with
the question of ownership. Data may be of strategic value to some actors, leading them
to want to control / sell access to it. For instance, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] provides an interesting
perspective on this in terms of a personal data market.
      </p>
      <p>A data ecosystem can also be regarded as a “data-management enterprise”, i.e.
a networked enterprise with “data-management” as its primary business, where
data-management refers to all data-related activities (gathering, exchanging, manipulating,
storing, using, etc.). Such a “data-management enterprise” will typically be embedded
in a larger enterprise, where the latter focuses on “regular” products / services.</p>
      <p>The development of data ecosystems, as “data-management enterprises”, can clearly
benefit from the use of enterprise modelling approaches. As such, the above
considerations directly apply, while at the same time suggesting the need to more specifically
capture data ownership, data lineage, value of data (to specific stakeholders), access
control, data regulations, etc.</p>
    </sec>
    <sec id="sec-3">
      <title>Research challenges</title>
      <p>We conclude this discussion paper with some possible research challenges in
relation to data ecosystems. They are certainly not intended as a complete list of
challenges, but should rather provide a starting point for a broader discussion at the VMBO
workshop.</p>
      <p>
        Data as a key resource – It is clear that data is a key resource in a data ecosystem. As
such, it raises several important questions:
      </p>
      <list list-type="bullet">
        <list-item><p>What is the (potential) value of data? How to assess / express this?</p></list-item>
        <list-item><p>What does ownership of data mean, also in relation to “the original” (e.g. behaviour / properties of a human being) and the associated privacy concerns?</p></list-item>
        <list-item><p>How to model the ownership of, access to, and (potential!) value of data, as well as the associated risks?</p></list-item>
        <list-item><p>How to take these elements into due consideration when designing / developing / evolving data ecosystems?</p></list-item>
      </list>
      <p>
Trust at the core – Exchanging data requires trust between the (human) actors involved,
regarding (1) the way they handle the data and / or access to the data, (2) the way the
data is gathered (quality of data), and (3) the way the data is used (ethics and privacy). This
results in several challenges:
      </p>
      <list list-type="bullet">
        <list-item><p>What is “trust” in the context of data ecosystems, and what can threaten such trust?</p></list-item>
        <list-item><p>How to conduct a risk analysis on how data is handled?</p></list-item>
        <list-item><p>How to nurture / increase trust between different stakeholders?</p></list-item>
        <list-item><p>Does the notion of “privacy by design” work in an (open and evolving) data ecosystem?</p></list-item>
        <list-item><p>How to identify system risks for data ecosystems, and how to manage these?</p></list-item>
      </list>
      <p>
Regulation of data ecosystems – Regulators are likely to need to regulate the
risks (see above) and privacy concerns of data ecosystems, as well as possibly other
properties. This results in challenges such as:
      </p>
      <list list-type="bullet">
        <list-item><p>To what extent can data ecosystems be regulated at all, given their open and evolving nature?</p></list-item>
        <list-item><p>How to express, and enforce, regulations on data ecosystems?</p></list-item>
        <list-item><p>What are the possible risks that need regulation?</p></list-item>
      </list>
      <p>
Data needs semantics – With the large amounts of data available to us, it is important
to also capture its informational semantics, both to enable re-use and to relate (achieve
interoperability between) different data sources. Of course, this takes us back to semantic
modelling [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and information modelling [
        <xref ref-type="bibr" rid="ref12 ref2">2, 12</xref>
        ], as well as (foundational) ontology
approaches [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. This leads to the following broad challenge:
      </p>
      <list list-type="bullet">
        <list-item><p>How to re-apply old (but proven) semantic / information / ontology modelling approaches to continuously capture the semantics of (evolving) data streams flowing between the web of actors involved in a data ecosystem?</p></list-item>
      </list>
      <p>
From data to information – Data, in itself, is “just” a passive resource. This holds even
for enriched data (e.g. predictive models, digital twins, etc). Data does not become “activated”
until an actor (human or digital) becomes informed by it in the context of learning,
decision making, etc. In doing so, the actor “gleans” information from the data (as a
potential information carrier [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]). In the context of “the web”, finding the right data
carriers to satisfy one’s information need was already a major challenge. In the
context of data ecosystems, this challenge will only grow, leading to the following broad
challenges:
      </p>
      <list list-type="bullet">
        <list-item><p>How to evolve / extend existing search / discovery techniques from information retrieval / discovery towards data ecosystems?</p></list-item>
        <list-item><p>How to apply different techniques for visualisation, verbalisation, audiofication, etc, to make data more accessible to human actors and to increase the information they may glean from the data?</p></list-item>
      </list>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <collab>Capgemini</collab>
          .
          <source>TechnoVision 2012 - Bringing Business Technology to Life</source>
          . Research report, Utrecht, the Netherlands,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P. P.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>The Entity-Relationship Model: Towards a Unified View of Data</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <fpage>9</fpage>
          -
          <lpage>36</lpage>
          ,
          <year>March 1976</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Falkenberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. A.</given-names>
            <surname>Verrijn-Stuart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Voss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Hesse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lindgreen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. E.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L. H.</given-names>
            <surname>Oei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Rolland</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. K.</given-names>
            <surname>Stamper</surname>
          </string-name>
          , editors.
          <source>A Framework of Information Systems Concepts</source>
          . IFIP WG 8.1 Task Group FRISCO, IFIP, Laxenburg, Austria,
          <year>1998</year>
          . ISBN: 3-901-88201-4
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>R.</given-names>
            <surname>Farrelly</surname>
          </string-name>
          and
          <string-name>
            <given-names>E. K.</given-names>
            <surname>Chew</surname>
          </string-name>
          .
          <article-title>Designing a primary personal information market as an industry platform: a service innovation approach</article-title>
          .
          In
          <source>Hawaii International Conference on System Sciences 2017 (HICSS)</source>
          ,
          <year>01 2017</year>
          . doi:
          <pub-id pub-id-type="doi">10.24251/HICSS.2017.556</pub-id>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>M.</given-names>
            <surname>Grieves</surname>
          </string-name>
          .
          <article-title>Virtually Intelligent Product Systems: Digital and Physical Twins</article-title>
          . In
          <string-name>
            <given-names>S.</given-names>
            <surname>Flumerfelt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Schwartz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Mavris</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Briceno</surname>
          </string-name>
          , editors,
          <source>Complex Systems Engineering: Theory and Practice</source>
          , pages
          <fpage>175</fpage>
          -
          <lpage>200</lpage>
          . American Institute of Aeronautics and Astronautics,
          <year>2019</year>
          . ISBN: 978-1624105647
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          .
          <article-title>Formal Ontology and Information Systems</article-title>
          . In
          <string-name>
            <given-names>N.</given-names>
            <surname>Guarino</surname>
          </string-name>
          , editor,
          <source>Proceedings of FOIS'98, Trento, Italy</source>
          , pages
          <fpage>3</fpage>
          -
          <lpage>15</lpage>
          , Amsterdam, the Netherlands,
          <year>June 1998</year>
          . IOS Press.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>G.</given-names>
            <surname>Guizzardi</surname>
          </string-name>
          .
          <article-title>On Ontology, ontologies, Conceptualizations, Modeling Languages, and (Meta)Models</article-title>
          . In
          <string-name>
            <given-names>O.</given-names>
            <surname>Vasilecas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Eder</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Caplinskas</surname>
          </string-name>
          , editors,
          <source>Databases and Information Systems IV - Selected Papers from the Seventh International Baltic Conference, DB&amp;IS 2006, July 3-6, 2006, Vilnius, Lithuania</source>
          , volume
          <volume>155</volume>
          of
          <series>Frontiers in Artificial Intelligence and Applications</series>
          , pages
          <fpage>18</fpage>
          -
          <lpage>39</lpage>
          . IOS Press,
          <year>2006</year>
          . ISBN: 978-1-58603-715-4
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>M.</given-names>
            <surname>Hammer</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>McLeod</surname>
          </string-name>
          .
          <article-title>Database Description with SDM: A Semantic Database Model</article-title>
          .
          <source>ACM Transactions on Database Systems</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ):
          <fpage>351</fpage>
          -
          <lpage>386</lpage>
          ,
          <year>September 1981</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>E. D.</given-names>
            <surname>Hess</surname>
          </string-name>
          .
          <article-title>Learn or Die: Using Science to Build a Leading-Edge Learning Organization</article-title>
          . Columbia University Press,
          <year>2014</year>
          . ISBN: 978-0231170246
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Hugos</surname>
          </string-name>
          .
          <article-title>Building the Real-Time Enterprise: An Executive Briefing</article-title>
          . Wiley, Hoboken, New Jersey,
          <year>2004</year>
          . ISBN: 978-0471678298
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>B.</given-names>
            <surname>Langefors</surname>
          </string-name>
          . Editorial notes to:
          <source>Computer Aided Information Systems Analysis and Design</source>
          . Studentlitteratur, Lund, Sweden,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Nijssen</surname>
          </string-name>
          and
          <string-name>
            <given-names>T. A.</given-names>
            <surname>Halpin</surname>
          </string-name>
          .
          <article-title>Conceptual Schema and Relational Database Design: a fact oriented approach</article-title>
          . Prentice Hall, Englewood Cliffs, New Jersey,
          <year>1989</year>
          . ISBN: 0-13-167263-0
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Proper</surname>
          </string-name>
          .
          <article-title>Fundamentally understanding IT? - Why Web 2.0 needs architects. Part II</article-title>
          ,
          <year>2008</year>
          .
          <ext-link ext-link-type="uri" xlink:href="http://tinyurl.com/mc3ozv8">http://tinyurl.com/mc3ozv8</ext-link>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Proper</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. D.</given-names>
            <surname>Bruza</surname>
          </string-name>
          .
          <article-title>What is information discovery about?</article-title>
          <source>Journal of the American Society for Information Science</source>
          ,
          <volume>50</volume>
          (
          <issue>9</issue>
          ):
          <fpage>737</fpage>
          -
          <lpage>750</lpage>
          ,
          <year>July 1999</year>
          . doi:
          <pub-id pub-id-type="doi">10.1002/(SICI)1097-4571(1999)50:9&lt;737::AID-ASI2&gt;3.0.CO;2-C</pub-id>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>