KeywDB: A System for Keyword-Driven
            Ontology-to-RDB Mapping Construction ?

    D. Zheleznyakov1 E. Kharlamov1 V. Klungre2 M. Skjæveland2 D. Hovland2
                      M. Giese2 I. Horrocks1 A. Waaler2
                          1
                              University of Oxford 2 University of Oslo


        Abstract. In ontology-based data access (OBDA) the users access relational
        databases (RDBs) via ontologies that mediate between the users and the data.
        Ontologies are connected to data via declarative ontology-to-RDB mappings
        that relate each ontological term to an SQL query. In this demo we present our
        system KeywDB that facilitates construction of ontology-to-RDB mappings in an
        interactive fashion. In KeywDB users provide examples of entities for classes that
        require mappings and the system returnes a ranked list of such mappings. In doing
        so KeywDB relies on techniques for keyword query answering over RDBs. During
        the demo the attendees will try KeywDB with Northwind and NPD FP databases
        and collections of mappings that we prepare.


1     Introduction
Motivation. Ontology-based data access (OBDA) is a prominent approach to information
integration in which an ontology that describes the domain of interest rather than the
data is used to mediate between data consumers and relational data sources (RDBs). In
OBDA data consumers are typically assumed to be domain experts who do not have a
prior knowledge about the way the data is organised at the source [7, 10]. Thus, they
access data by expressing their information needs as ontological queries. The ontology is
connected to the data via a set of (ontology-to-RDB) mappings, declarative specification
of the form P (~x) ← sql(~x) that relate ontological terms P with SQL queries sql over
the underlying data and that are used for automatic translation of ontological queries
into data-level queries which can be executed by the underlying database management
system [5, 6, 9, 11].
    Ontologies and mappings are clearly the main OBDA assets and thus acquiring them
is of utter importance for deploying and maintaining any OBDA application. Ontologies
capture domains of interest, they are data independant and thus they can be reused in
different applications with the same domain. On contrary, mappings are hardly reusable
since they depend on particular data sources. Therefore, in order to deploy an OBDA
system over a given set of data sources, one has to develop a set of mappings specific for
these sources. Building mappings manually is, however, a costly process, especially for
large and complex databases (e.g., see [8, 12]).
    In order to address this issue and facilitate mapping construction a number of
approaches has been developed. Most of them focus on mappings of a specific form,
called direct mapping [13], and under these approaches the mappings simply mirror the
database schema by associating a table to a class and an attribute to a property. There are
?
    This research was funded by the EU project Optique (FP7-IP-318338) and the EPSRC grants
    DBonto, MaSI3 , and ED3 .
also approaches that allow to construct more complex mappings. For example, in [4],
the system is able to compute all possible queries that involve joins between tables
and equalities between column names and values (under certain restriction). The main
problem of this kind of approaches is that the number of the returned mappings is huge
and manually filtering in order to select the right mappings is an expensive procedure. So
the existing approaches either compute a few simple mappings that are insufficient in
many applications or too many complex mappings most of which are irrelevant for the
application at the hand. Therefore, there is a need for techniques to facilitate mapping
creation that are precise in the sense that they compute the mappings required in a
concrete application.
Our Contribution. We propose a novel, semi-automatic approach for mapping con-
struction that (i) allows for creation of mappings expressive enough to satisfy the users’
information needs (that is more expressive that in the case of direct mappings) and (ii)
does not overwhelm users with candidate mappings. We implemented our approach in the
KeywDB system and will now explain the approach on the following scenario. Assume
that the user during (ontological) query formulation process [1, 2, 15], notices that the
ontology misses a class they would like to exploit in the query. So the user would like to
create a class and map it to the data. Typically, such a task is performed by (end-)users
in cooperation with IT-experts and often consumes a significant amount of time [10].
KeywDB will facilitate the communication between user and IT-experts in the three
following steps:
   (i) Since the user is a domain expert, they know what objects the class should contain.
       Thus, KeywDB will ask the user to provide a description of several objects from
       the class, where a description is a set of keywords.
  (ii) KeywDB will turn the input descriptions into a ranked list of queries and return the
       user top-k candidate queries, where k is fixed in advance.
 (iii) The IT-expert will give a feedback on the list by choosing those queries from the
       list that they think are correct.
In order to support this scenario we developed a formal semantics of transformation of
descriptions into a ranked list of candidate queries, and introduced a query ranking model
tailored towards our framework.
Demonstration Scenarios. We prepared two demonstration scenarios, which are based
on the Northwind1 and NPD FP [14] databases. A demo attendee will be able to create
mappings for classes in each of the scenarios.
2      KeywDB System
Setting. Consider a scenario where a user is looking for a mapping for a class C to a
relational database D. We assume that the user is a domain specialists and they know what
kind of objects should be in C. Thus, they can describe several examples of such objects
o1 , . . . , on , each with a set Ki of keywords {k1i , . . . , kni i }. To describe our approach we
first need to define the following notions. Let S be a schema of D. A schema graph
Gs = (VS , ES ) is a graph where VS is set of relations of S and (Ri , Rj ) is in ES if
and only if there is a primary to foreign key relationship between Ri and Rj . A data
graph [3]2 GD of the database D is a graph where VD is a set of all tuples occurring in
D and (ti , tj ) is in ED if and only if ti ∈ Ri , tj ∈ Rj and (Ri , Rj ) ∈ ES .
 1
     https://northwinddatabase.codeplex.com/
 2
     Note that in [3] a data graph is called a joining network of tuples.


                                                   2
                             Fig. 1. A screenshot of the KeywDB system

Our Approach in a Nutshell. Having a set Ki of keywords describing an object oi , we
extract a ranked list of candidate objects from GD , where each candidate object is a
connected subgraph of GD such that (i) every keyword from Ki is contained in at least
one tuple of this subgraph3 , and (ii) it is minimal, that is, we cannot remove any tuple
from it and still be connected and satisfying Condition (i). Then, each of the candidate
objects o0i is turned into a SQL query qi0 such that the answer qi0 (D) over D contains o0i ,
thus a ranked list of candidate queries is obtained. Note that (i) the rank of a candidate
query qi0 is a function of the rank of the corresponding candidate object o0i , and (ii) a
candidate query may correspond to several candidate objects, in which case the rank of
each of these objects influences on the rank of the query. Performing the same procedure
for each Ki , we obtain a set of lists L1 , . . . , Ln of candidate queries. We unify them into
a final list L, where the rank of each candidate query depends on (i) its rank in a list Li it
appears in and (ii) a number of such lists.
Ranking Model. In order to rank objects and then queries we rely on their several
characteristics: on the size, diameter, and distribution of keywords over them.

3      Demonstration Scenario
We prepared two databases on which our system can be tested. The first database,
Northwind, contains the sales data for a fictitious company called Northwind Traders,
which imports and exports speciality foods from around the world. The second one
is Norwegian Petroleum Directorates FactPages (NPD FP) [14], a Norwegian public
information repository about the oil and gas sector.
    During the demo KeywDB will be available in two scenarios.
(S1) Supervised: We prepared 20 goal mappings for 20 classes for each database. For
    each class, the system will automatically generate keyword descriptions of one, two
 3
     A tuple contains a keyword if the latter one appears in an attribute of the former one.


                                                  3
    or three different objects that the class is supposed to contain. The attendee will
    be demonstrated whether the top-k mapping returned by the system contain the
    corresponding goal mapping, where k = 1, 3 and 5.
(S2) Unsupervied: The attendee will be able to explore the schema and create themselves
    a class they would like to build a mapping for. Additionally, for each database, 10
    classes, not linked to the database, and their intuitive descriptions will be provided.
    Then, the user will be able to explore the data and compose descriptions of objects
    for both their and prepared class.
     In Figure 1 there is a screenshot of KewDB where the user has been looking for a
mapping for a class ‘Drink’. The user provided examples of two objects: one is described
with two keywords ‘chai’ and ‘bevarage’ and another with one keyword ‘coffee’. KeywDB
in turn returned several mappings, e.g., the mapping with the following query is returned
first and has the rank equal to 0.670:
            SELECT DISTINCT *
            FROM categories AS categories0, products AS products1
            WHERE products1."CatagoryID"=categories0."CategoryID"
4      References
 [1]    M. Arenas, B. C. Grau, E. Kharlamov, S. Marciuska, and D. Zheleznyakov. Faceted Search
        over RDF-based Knowledge Graphs. In: JWS 37 (2016).
 [2]    M. Arenas, B. C. Grau, E. Kharlamov, Šarūnas Marciuška, and D. Zheleznyakov. Faceted
        Search over Ontology-Enhanced RDF Data. In: CIKM. 2014.
 [3]    V. Hristidis and Y. Papakonstantinou. Discover: Keyword Search in Relational Databases.
        In: VLDB. 2002.
 [4]    E. Jiménez-Ruiz, E. Kharlamov, D. Zheleznyakov, I. Horrocks, C. Pinkel, M. G. Skjæveland,
        E. Thorstensen, and J. Mora. BootOX: Practical Mapping of RDBs to OWL 2. In: ISWC.
        2015.
 [5]    E. Kharlamov, S. Brandt, M. Giese, E. Jiménez-Ruiz, Y. Kotidis, et al. Enabling Semantic
        Access to Static and Streaming Distributed Data with Optique: Demo. In: DEBS. 2016.
 [6]    E. Kharlamov, S. Brandt, E. Jiménez-Ruiz, Y. Kotidis, S. Lamparter, et al. Ontology-Based
        Integration of Streaming and Static Relational Data with Optique. In: SIGMOD. 2016.
 [7]    E. Kharlamov, B. C. Grau, E. Jimenez-Ruiz, S. Lamparter, G. Mehdi, et al. Capturing
        Industrial Information Models with Ontologies and Constraints. In: ISWC. 2016.
 [8]    E. Kharlamov, D. Hovland, E. Jiménez-Ruiz, D. Lanti, H. Lie, et al. Ontology Based Access
        to Exploration Data at Statoil. In: ISWC. 2015.
 [9]    E. Kharlamov, E. Jiménez-Ruiz, C. Pinkel, M. Rezk, M. G. Skjæveland, et al. Optique:
        Ontology-Based Data Access Platform. In: ISWC Posters & Demos. 2015.
[10]    E. Kharlamov, E. Jiménez-Ruiz, D. Zheleznyakov, D. Bilidas, M. Giese, et al. Optique:
        Towards OBDA Systems for Industry. In: ESWC, Selected Papers. 2013.
[11]    E. Kharlamov, Y. Kotidis, M. Theofilos, C. Neuenstadt, C. Nikolaou, et al. Towards Analytics
        Aware Ontology Based Access to Static and Streaming Data. In: ISWC. 2016.
[12]    E. Kharlamov, N. Solomakhina, Ö. L. Özçep, D. Zheleznyakov, T. Hubauer, et al. How
        Semantic Technologies Can Enhance Data Access at Siemens Energy. In: ISWC. 2014.
[13]    J. Sequeda, S. H. Tirmizi, O. Corcho, and D. P. Miranker. Survey of Directly Mapping SQL
        Databases to the Semantic Web. In: KER 26.4 (2011).
[14]    M. G. Skjæveland, E. H. Lian, and I. Horrocks. Publishing the Norwegian Petroleum
        Directorate’s FactPages as Semantic Web Data. In: ISWC. 2013.
[15]    A. Soylu, E. Kharlamov, D. Zheleznyakov, E. Jiménez-Ruiz, M. Giese, and I. Horrocks.
        Ontology-Based Visual Query Formulation: An Industry Experience. In: ISVC. 2015.


                                                 4