Knowledge Graphs for Impactful Data Science
Victor de Boer1
1
    Vrije Universiteit Amsterdam, the Netherlands


                                         Abstract
                                         In this invited talk I will argue that to build scalable, transparent and explainable AI in various domains
                                         where heterogeneous data is available, we need to collaborate with domain experts to develop relevant
                                         and high-quality knowledge graphs as well as appropriate data science and Machine Learning methods
                                         to constantly enrich and analyse these graphs. I give examples in the Digital Humanities and Internet of
                                         Things.


   In many modern statistical approaches to AI, raw data is the preferred input for (Machine
Learning) models. In some areas and in some cases, however, we struggle to find this raw form
of data. One such area involves heterogeneous knowledge: entities, their attributes and internal
relations. The Semantic Web community has invested decades of work on just this problem:
how to use graphs to represent knowledge, in various domains, in as raw and as usable a form as
possible, satisfying many use cases. To build scalable, transparent and explainable AI in various
domains where such heterogeneous data and knowledge is available, we need to collaborate
with domain experts to develop a) relevant and high-quality knowledge graphs as well as b)
appropriate data science and ML methods to constantly enrich and analyse these graphs[1].
   In the domain of Digital Humanities (DH), a large amount of heterogeneity of data and
knowledge exists. Digitized datasets can derive from centuries-old sources and multiple views
on history and heritage should be represented. The capacity of the Knowledge Graph to capture
such heterogeneity makes this an ideal model to represent, share and combine data sources
to allow for new types of analyses. In the domain, Machine Learning and other Data Science
methods are more and more looked at to identify patterns in the data, establish new links or
categorize entities. Transparency and explainability are key requirements for such methods to
be used in serious scholarly analysis.
   Although the domain of Internet of Things (IoT) and Smart Homes differs in many ways
from that of DH, here too do we find datasets of varying sources, combined to allow for new
types of applications and analysis[2]. In smart home scenarios, methods that combine data into
knowledge graphs for further analysis or applications will need to be privacy-aware, transparent
and explainable. Using ontologies such as SAREF[3], we can achieve this interoperability. Using
re-usable (python) notebooks we can establish a Data Science pipeline.


SEMANTICS 2022 EU: 18th International Conference on Semantic Systems, September 13-15, 2022, Vienna, Austria
Envelope-Open v.de.boer@vu.nl (V. d. Boer)
GLOBE http://victordeboer.com/ (V. d. Boer)
Orcid 0000-0001-9079-039X (V. d. Boer)
                                       © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
    CEUR
    Workshop
    Proceedings
                  http://ceur-ws.org
                  ISSN 1613-0073
                                       CEUR Workshop Proceedings (CEUR-WS.org)
References
[1] X. Wilcke, P. Bloem, V. De Boer, The knowledge graph as the default data model for learning
    on heterogeneous knowledge, Data Science 1 (2017) 39–57.
[2] R. van der Weerdt, V. de Boer, L. Daniele, B. Nouwt, Validating saref in a smart home
    environment, in: Research Conference on Metadata and Semantics Research, Springer,
    2020, pp. 35–46.
[3] L. Daniele, F. d. Hartog, J. Roes, Created in close interaction with the industry: the smart
    appliances reference (saref) ontology, in: International Workshop Formal Ontologies Meet
    Industries, Springer, 2015, pp. 100–112.