Data Ecosystems – Fuelling the Digital Age Henderik A. Proper1,2 1 Luxembourg Institute of Science and Technology (LIST), Belval, Luxembourg 2 University of Luxembourg, Luxembourg E.Proper@acm.org Abstract. With the increased digitisation of society comes an increase in the role of data. Business analytics, statistics-based AI, the development of digital twins, etc, are typical examples of “data hungry” applications. Such, “data hungry” ap- plications not only need data in different shapes and forms, they also need data from a wide variety of sources. The systems involved in gathering, storing, processing, analysing, and visualising data, have evolved to be complex systems themselves, involving many actors of widely differing nature. We argue that, as such, these complex systems can be best thought of as ‘data ecosystems’, which we see as involving the entire complex of social / physical / digital actors which provide, own, sell, buy, exchange, manip- ulate, store, and use, data. Within these data ecosystems, one needs to deal with technical concerns regarding reliability, performance, interoperability, semantics, etc, as well as social concerns, such as value of data, privacy, trust, ownership, ethics, risk, etc. In line with this, we argue that there is a need to define / study ‘data ecosystems’ more closely, where we see a potential future role for the VMBO community. 1 Introduction Our society is transitioning from the industrial age to the digital age. With the increas- ing digitisation of society comes an increase in the role of data. Data is gathered from sensors, consequently stored, processed, analysed and visualised, and is eventually con- sumed by (human and / or digital) actors to enable them to gain insight and / or make informed decisions. Business analytics, statistics-based AI, the development of digital twins, etc, are typical examples of modern-day “data hungry” applications. For example, data is essen- tial for the training of statistics-based AI and the development of digital twins [5], while also enabling enterprises to continuously assess their performance in real-time [10] and learn to improve their operations [9]. Industry uses phrases such as thriving on data [1] to underline the potential value of data. Meanwhile, we have all grown familiar with the possibilities, as well as the possible positive and negative consequences, of large scale data collection and utilisation as conducted by e.g. Google, Facebook, etc. The, “data hungry” applications need to be “fuelled” with a wide variety of data resources. For example, ranging from: raw observations from different sensors / infor- mants, processed and / or enriched artefacts in terms of e.g. predictive models, rep- resentations of intentions (e.g. plans, strategy documents, designs, etc), specifications 1 Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). (source code, work procedures, etc), or norms (regulations, principles, policies, etc). Next to that, such applications also need data from a wide variety of sources, requiring the need to transfer ownership of data, or at least a transfer of the right to use the data. We specifically use the term data, as, in line with e.g. [11, 3], we see information as the increment in knowledge / insights which an actor gets when “consuming” data. As such, data are “mere” explicitly represented artefacts that could have value to (human and / or digital) actors in the sense that it may provide them with relevant / timely information. 2 Data ecosystems and their development As a result of the growing role of data as a key underlying resource, the systems in- volved in gathering, storing, processing, analysing, and visualising data have evolved to become complex systems themselves, involving different actors with their own in- terests. We argue that, as such, these complex systems can be best thought of as ‘data ecosystems’, which we see as involving the entire complex web of social / physical / digital actors (i.e. an ActorWeb [13]), which provide, own, sell, buy, exchange, manip- ulate, store, and use, data. Within these data ecosystems, one needs to deal with technical concerns regard- ing reliability, performance, interoperability, semantics, etc, as well as social concerns, such as value of data, privacy, trust, ownership, ethics, risk, etc. For instance, as the data involved may pertain to (the behaviour of) humans, privacy and ethical considerations may clearly play a role. Furthermore, as the data has some correspondence to “some- thing” in the social, economical, or physical world, it is important to consider quality of this correspondence. At the same time, some actors may have an interest in mali- ciously changing the data, thus distorting this correspondence. Data also comes with the question of ownership. Data may be of strategic value to some actors, leading them to want to control / sell the access for others. For instance, [4] provides an interesting perspective on this in terms of a personal data market. A data ecosystem can also be regarded as a “data-management enterprise”, i.e. a networked enterprise with “data-management” as its primary business, where data- management refers to all data related activities (gathering, exchanging, manipulating, storing, using, etc.). Such a “data-management enterprise” will typically be embedded in a larger enterprise, where the latter focuses on a “regular” products / services. The development of data ecosystems, as “data-management enterprises”, can clearly benefit from the use of enterprise modelling approaches. As such, the above consider- ations directly apply, while at the same time suggesting the need to more specifically capture data ownership, data lineage, value of data (to specific stakeholders), access control, data regulations, etc. 3 Research challenges We conclude this discussion paper with some some possible research challenges in relation to data ecosystems. They are certainly not intended as a complete list of chal- 2 lenges, but should rather provide a starting point for a broader discussion at the VMBO workshop. Data as a key resource – It is clear that data is a key resource in a data ecosystem. As such, it generates several important questions: What is the (potential) value of data? How to assess / express this? What does ownership of data mean, also in relationship to “the original” (e.g. be- haviour / properties of a human being), and associated privacy concerns. How to model the ownership, access to, the (potential!) value of data, etc, as well as associated risks? How to take these elements into due consideration when designing / developing / evolv- ing data ecosystems? Trust at the core – Exchanging data requires trust between the (human) actors involved, regarding (1) the way they handle the data and / or access to the data, (2) on how the data is gathered (quality of data), and (3) the way data is used (ethics and privacy). This results in several challenges: What is “trust” in the context of data ecosystems, and what can threaten such trust? How to conduct a risk analysis on how data is handled? How to nurture / increase trust between different stakeholders? Does the notion of “privacy by design” work in an (open and evolving) data ecosystem? How to identify system risks for data ecosystem, and how to manage these? Regulation of data ecosystems – Regulators are likely to have a need to regulate the risks (see above), privacy concerns of data ecoststems, as well as possibly other prop- erties. This results in challenges such as: To what extent can data ecosystems be regulated at all, given their open, and evolving, nature? How to express, and enforce, regulations on data ecosystems? What are the possible risks that need regulation? Data needs semantics – With the large amounts of data available to us, it is important to also capture its informational semantics. Both to enable re-use and relating (inter- operability between) different data sources. Of course this takes us back to semantic modelling [8] and information modelling [2, 12], as well as (foundational) ontology ap- proaches [6, 7]. This leads to the following broad challenge: How to re-apply old (but proven) semantic / information / ontology modelling ap- proaches to continuously capture the semantics of (evolving) data streams flowing be- tween the web of actors involved in a data ecosystem? From data to information – Data, in itself, is “just” a passive resource. Even enriched data (e.g. predictive models, digital twins, etc) is. Data does not become “activated” until an actor (human or digital) becomes informed by it in the context of learning, decision making, etc. In doing so, the actor “gleans” information from the data (as a potential information carrier [14]). In the context of “the web”, finding the right data carriers to relinquish one’s information need was already a major challenge. In the con- text of data ecosystems, this challenge will only grow, leading to the following broad challenges: How to evolve / extend existing search / discovery techniques form informa- tion retrieval / discovery towards data ecosystems? 3 How to apply different techniques for visualisation, verbalisation, audiofication, etc, to make data better accessible to human actors, to increase the information they may glean from the data? References 1. Capgemini. TechnoVision 2012 – Bringing Business Technology to Life. Research report, Utrecht, the Netherlands, 2009. 2. P. P. Chen. The Entity–Relationship Model: Towards a Unified View of Data. ACM Trans- actions on Database Systems, 1(1):9–36, March 1976. 3. E. D. Falkenberg, A. A. Verrijn–Stuart, K. Voss, W. Hesse, P. Lindgreen, B. E. Nilsson, J. L. H. Oei, C. Rolland, and R. K. Stamper, editors. A Framework of Information Systems Concepts. IFIP WG 8.1 Task Group FRISCO, IFIP, Laxenburg, Austria, 1998. ISBN: 3-901- 88201-4 4. R. Farrelly and E. K. Chew. Designing a primary personal information market as an industry platform: a service innovation approach. In Hawaii International Conference on System Sciences 2017 (HICSS), 01 2017. doi:10.24251/HICSS.2017.556 5. M. Grieves. Virtually Intelligent Product Systems: Digital and Physical Twins. In S. Flumer- felt, K. G. Schwartz, D. Mavris, and S. Briceno, editors, Complex Systems Engineering: Theory and Practice, pages 175–200. American Institute of Aeronautics and Astronautics, 2019. ISBN: 978-1624105647 6. N. Guarino. Formal Ontology and Information Systems. In N. Guarino, editor, Proceedings of FOIS’98, Trento, Italy, pages 3–15, Amsterdam, the Netherlands, June 1998. IOS Press. 7. G. Guizzardi. On Ontology, ontologies, Conceptualizations, Modeling Languages, and (Meta)Models. In O. Vasilecas, J. Eder, and A. Caplinskas, editors, Databases and Informa- tion Systems IV - Selected Papers from the Seventh International Baltic Conference, DB&IS 2006, July 3-6, 2006, Vilnius, Lithuania, volume 155 of Frontiers in Artificial Intelligence and Applications, pages 18–39. IOS Press, 2006. ISBN: 978-1-58603-715-4 8. M. Hammer and D. McLeod. Database Description with SDM: A Semantic Database Model. ACM Transactions on Database Systems, 6(3):351–386, September 1981. 9. E. D. Hess. Learn or Die: Using Science to Build a Leading-Edge Learning Organization. Columbia University Press, 2014. ISBN: 978-0231170246 10. M. H. Hugos. Building the Real-Time Enterprise: An Executive Briefing. Wiley, Hoboken, New Jersey, 2004. ISBN: 978-0471678298 11. B. Langefors. Editorial notes to: Computer Aided Information Systems Analysis and Design. Studentlitteratur, Lund, Sweden, 1971. 12. G. M. Nijssen and T. A. Halpin. Conceptual Schema and Relational Database Design: a fact oriented approach. Prentice Hall, Englewood Cliffs, New Jersey, 1989. ISBN: 0-13-167263- 0 13. H. A. Proper. Fundamentally understanding IT? - Why Web 2.0 needs architects. Part II, 2008. http://tinyurl.com/mc3ozv8 14. H. A. Proper and P. D. Bruza. What is information discovery about? Journal of the American Society for Information Science, 50(9):737–750, July 1999. doi:10.1002/(SICI)1097-4571(1999)50:9<737::AID-ASI2>3.0.CO; 2-C 4