Mapping Patterns for Virtual Knowledge Graphs
     (A Report on Ongoing Research) ?

 Diego Calvanese1,2[0000−0001−5174−9693] , Avigdor Gal3[0000−0002−7028−661X] ,
  Davide Lanti1[0000−0003−1097−2965] , Marco Montali1[0000−0002−8021−3430] ,
Alessandro Mosca1[0000−0003−2323−3344] , and Roee Shraga3[000−0001−8803−8481]
       1
           Free-University of Bozen-Bolzano, Bolzano, Italy, lastname @unibz.it
              2
                Umeå University, Umeå, Sweden, diego.calvanese@umu.se
                 3
                   Technion – Israel Institute of Technology, Haifa, Israel
               avigal@technion.ac.il, shraga89@campus.technion.ac.il


1     Introduction

In data integration and access to legacy data sources using end user-oriented
languages, the approach based on Virtual Knowledge Graphs (VKG) is gaining
importance [8]. A VKG specification consists of three main components: (i) (re-
lational) data sources, where the actual data are stored; (ii) a domain ontology,
capturing the relevant concepts, relations, and constraints of the domain of in-
terest; and (iii) a set of mappings linking the data sources to the ontology. One
of the most critical bottlenecks towards the adoption of the VKG approach,
especially in complex, enterprise scenarios, is precisely the definition and man-
agement of mappings. Indeed, on the one hand, VKG mappings map complex
queries to complex queries, similar to mappings typically used in data integration
and exchange [4]. Thus, they are inherently more sophisticated than mappings
used, e.g., in schema matching [6] and ontology matching [2]. On the other hand,
they need to overcome the abstraction mismatch between the relational schema
of the underlying data storage, and the target ontology; consequently, they are
required to explicitly handle how (tuples of) data values extracted from the DB
lead to the creation of corresponding objects in the ontology [5].
     As a consequence, management of VKG mappings throughout their entire
life-cycle is currently a labor-intensive, essentially manual effort, which requires
highly-skilled professionals [7]. Even for such professionals, writing mappings is
demanding and poses a number of challenges related to semantics, correctness,
and performance. More concretely, no comprehensive approach currently exists
to support ontology engineers in the creation of VKG mappings, exploiting all
the involved information artifacts to their full potential: the relational schema
with its constraints and the extensional data stored in the DB, the ontology
axioms, and a conceptual schema that lies, explicitly or implicitly, at the basis
of the relational schema.
?
    Copyright © 2020 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).
2       D. Calvanese et al.

2     Contributions
In our ongoing work, we build on this key observation and provide the contribu-
tions described in the following.

2.1   A Catalog of VKG Mapping Patterns
We propose a catalog of mapping patterns that emerge when linking DBs to
ontologies. To do so, we build on well-established methodologies and patterns
studied in data management (such as W3C direct mappings – W3C-DM [1] –
and their extensions), data analysis (such as algorithms for discovering depen-
dencies), and conceptual modeling (such as relational mapping techniques). In
specifying each pattern, we consider not only the three main components of a
VKG specification – namely the relevant portions of the DB schema, the ontol-
ogy, and the mapping between the two – but also the conceptual schema of the
domain of interest and the underlying data, when available. We do not fix which
of these information artifacts are given and which are produced as output, but
we simply describe how they relate to each other, on a per-pattern basis.
    We organize patterns in two major groups: schema-driven patterns, shaped
by the structure of the DB schema and its explicit constraints, and data-driven
patterns, which in addition consider constraints emerging from specific configura-
tions of the data in the DB. For each schema-driven pattern, we actually identify
a data-driven version in which the constraints over the schema are now not ex-
plicitly specified, but hold in the data. But we provide also data-driven patterns
that do not have a schema-driven counterpart. The two types of patterns can
be used in combination with additional semantic information from the ontology,
for instance on how the data values from the DB translate into RDF literals.
These considerations lead us to introduce also pattern modifiers. Moreover, some
of our patterns come with accessory views defined over the DB-schema, which
make explicit the presence of specific structures over the DB schema that are
revealed through the application of the pattern itself. Such views can be used
themselves, together with the original DB schema, to identify the applicability
of further patterns.

2.2   Design Scenarios for VKG Mapping Patterns
The proposed patterns can be employed in a variety of VKD design scenarios,
depending on which information artifacts are available, and which ones have to
be produced. Specifically, we consider the following scenarios: (i) Debugging of a
VKG Specification, which arises when a full VKG specification is already in place
and must be debugged. (ii) Conceptual Schema Reverse Engineering, which aims
at inferring a conceptual schema of the DB that represents the domain of in-
terest by reflecting the content of a given full VKG specification. (iii) Mapping
Bootstrapping, where the DB and the ontology are given, but mappings relating
them are not, and patterns can be used to (semi-)automatically bootstrap an ini-
tial set of mappings. These can then be further refined and extended manually,
                                   Title Suppressed Due to Excessive Length        3

possibly exploiting again the patterns. (iv) Ontology+Mapping Bootstrapping,
where neither the ontology nor the mappings are given as input, and have to be
synthesized. This scenario can be reduced to the previous one by first inducing
a baseline ontology mirroring the structure of the DB schema. (v) VKG Boot-
strapping, where we just have a conceptual schema of the domain, and the goal
is to set up a VKG specification. The conceptual schema can be then trans-
formed into a normalized DB schema using well-established relational mapping
techniques (e.g., [3]).


2.3   Analysis of Scenarios

In our work, we have analyzed the concrete mapping strategies arising from a
number of VKG use cases in order to understand how patterns occur in prac-
tice, and with which frequency. To this purpose, we have gathered 6 different
scenarios, coming either from the literature on VKGs, or from actual real-world
applications, covering a variety of different application domains. So far, we have
manually classified a total of 1582 mapping assertions, falling in 367 pattern
applications. We have studied the coverage of mappings appearing therein in
terms of our patterns, as well as on how many times the same pattern recurs.
Our investigation has shown that only 3% of pattern applications fall outside of
our categorization, and it also gives an interesting indication on which patterns
are more pervasively used in practice.


3     Conclusions

The work carried out so far is only a first step, with respect to both categorization
of patterns, and their actual use. Regarding the former, we are now exploring
more in depth the interaction between patterns and pattern modifiers, such as
value invention or identifier alignment. Regarding the latter, so far we have
used patterns to investigate, and highlight, the specific problems to address
when setting-up a VKG scenario. We are now investigating solutions to these
problems, by exploiting approaches from other fields, e.g., schema matching.


Acknowledgements

This research has been partially supported by: the EU H2020 project INODE; the
Italian PRIN project HOPE; the European Regional Development Fund (ERDF)
Investment for Growth and Jobs Programme 2014-2020 through the project
IDEE (FESR1133); the Free University of Bozen-Bolzano through the projects
KGID, GeoVKG, OntoGeo, and STyLoLa; the Wallenberg AI, Autonomous Sys-
tems and Software Program (WASP) funded by the Knut and Alice Wallenberg
Foundation.
4       D. Calvanese et al.

References
1. Arenas, M., Bertails, A., Prud’hommeaux, E., Sequeda, J.: A direct mapping of
   relational data to RDF. W3C Recommendation, World Wide Web Consortium (Sep
   2012), available at http://www.w3.org/TR/rdb-direct-mapping/
2. Euzenat, J., Shvaiko, P.: Ontology Matching. Springer (2007)
3. Halpin, T., Morgan, T.: Information Modeling and Relational Databases. Morgan
   Kaufmann (2010)
4. Lenzerini, M.: Data integration: A theoretical perspective. In: Proc. of the 21st
   ACM Symp. on Principles of Database Systems (PODS). pp. 233–246 (2002).
   https://doi.org/10.1145/543613.543644
5. Poggi, A., Lembo, D., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
   Linking data to ontologies. J. on Data Semantics 10, 133–173 (2008)
6. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching.
   Very Large Database J. 10(4), 334–350 (2001)
7. Spanos, D.E., Stavrou, P., Mitrou, N.: Bringing relational databases into the Se-
   mantic Web: A survey. Semantic Web J. 3(2), 169–209 (2012)
8. Xiao, G., Ding, L., Cogrel, B., Calvanese, D.: Virtual Knowledge Graphs: An
   overview of systems and use cases. Data Intelligence 1(3), 201–223 (2019).
   https://doi.org/10.1162/dint a 00011