=Paper= {{Paper |id=None |storemode=property |title=Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap |pdfUrl=https://ceur-ws.org/Vol-1035/iswc2013_poster_12.pdf |volume=Vol-1035 |dblpUrl=https://dblp.org/rec/conf/semweb/PinkelBKH13 }} ==Pay as you go Matching of Relational Schemata to OWL Ontologies with IncMap== https://ceur-ws.org/Vol-1035/iswc2013_poster_12.pdf
 Pay-as-you-go Matching of Relational Schemata
       to OWL Ontologies With IncMap

Christoph Pinkel1, Carsten Binnig2, Evgeny Kharlamov3, and Peter Haase1
         1 fluid Operations AG 2 University of Mannheim 3 University of Oxford

         Abstract. Ontology Based Data Access (OBDA) enables access to re-
         lational data with a complex structure through ontologies as conceptual
         domain models. A key component of an OBDA system is the set of mappings
         between the schematic elements in the ontology and their correspondences
         in the relational schema. In existing OBDA systems, these mappings
         typically need to be compiled by hand. In this paper we present
         IncMap, a system that supports a semi-automatic approach for match-
         ing relational schemata and ontologies. Our approach is based on a novel
         matching technique that represents the schematic elements of an ontol-
         ogy and a relational schema in a unified way. Finally, IncMap can extend
         user-verified mapping suggestions in a pay-as-you-go fashion.


1   Introduction
Today, enterprise information systems of large companies typically store petabytes
of data across multiple relational databases, each with hundreds or thousands
of tables (e.g., [1]). Effective understanding of complex schemata is a crucial
task for enterprises to support decision making and retain competitiveness on
the market. Ontology-based data access (OBDA) [2] is an approach that has
recently emerged to provide semantic access to complex structured (relational)
data. However, in many existing real-world systems (e.g. [2]) that follow the
OBDA principle, the mappings have to be created manually, which constitutes
a significant entry barrier for applying OBDA in practice.
    To overcome this limitation, we propose a novel semi-automatic schema
matching approach and a system called IncMap. We focus on finding one-to-
one correspondences of ontological and relational schema elements, while we
also work on extensions for finding more complex mappings.
    The matching approach of IncMap is inspired by the Similarity Flooding (SF)
algorithm [3] that works well for schemata that follow the same modeling prin-
ciples. However, we show that applying the SF algorithm naively for matching
relational schemata to OWL ontologies results in rather poor suggestion quality
due to a conceptual mismatch between ontologies and relational schemata. The
contributions of the paper are the following: In Section 2, we propose a novel
graph structure called IncGraph to represent schema elements from ontologies
and relational schemata in a unified way. In Section 3, we present our matching
algorithm, which supports an incremental pay-as-you-go approach that can
leverage existing mappings. Finally, Section 4 presents an experimental evaluation
using different (real-world) relational schemata and ontologies. Experiments
show that the basic version of IncMap reduces the effort for creating a mapping
by up to 20% compared to applying SF in a naive way. The incremental version of
IncMap can reduce the total effort by another 50% to 70%.

The research was supported by the EU Commission’s FP7 grant Optique (n. 318338).
E-Mail (Christoph Pinkel): christoph.pinkel@fluidops.com

[Figure 1: the example ontology O (classes Director and Movie, object property
directs, data property hasTitle) and the relational schema R (tables Director and
Movie with primary-key and foreign-key columns), shown above their graphs
IncGraph(O) and IncGraph(R), whose edges are labeled ref or val.]

                                Fig. 1. IncGraph Construction Example

2   The IncGraph Model

The IncGraph model used by IncMap represents schema elements of an OWL
ontology O and a relational schema R in a unified way. An IncGraph model
is defined as a directed labeled graph (V, LblV, E, LblE), where V is a set of
vertices, E a set of directed edges, LblV a set of vertex labels, and LblE a
set of edge labels. A label lV ∈ LblV represents the name of a schema element,
whereas a label lE ∈ LblE is either “ref”, representing a so-called ref-edge,
or “value”, representing a so-called val-edge. Figure 1 shows an ontology O and
a relational schema R from the cinematography domain, as well as the result of
constructing the graphs IncGraph(O) and IncGraph(R) according to the IncGraph
model. While O and R describe the same entities (Directors and Movies) and their
relationship in different ways, the IncGraph model is designed to represent both
in a structurally similar fashion.
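
As a rough illustration of the definition above, the following Python sketch models an IncGraph as a directed labeled graph whose edges carry one of the two labels "ref" and "value". The class `IncGraph`, its methods, the vertex name `director_FK`, and in particular the way the example vertices are wired together are assumptions made for illustration; the exact construction rules are not spelled out in this paper.

```python
from dataclasses import dataclass, field

REF, VALUE = "ref", "value"  # the two edge labels named in the text


@dataclass
class IncGraph:
    """Directed labeled graph (V, LblV, E, LblE). Hypothetical helper class."""
    vertex_labels: dict = field(default_factory=dict)  # vertex id -> element name
    edges: set = field(default_factory=set)            # (source, edge label, target)

    def add_edge(self, source, label, target):
        if label not in (REF, VALUE):
            raise ValueError(f"unknown edge label: {label!r}")
        # Vertices are created on first use and labeled with their own id.
        for vertex in (source, target):
            self.vertex_labels.setdefault(vertex, vertex)
        self.edges.add((source, label, target))

    def label_profile(self):
        """Sorted edge labels, a crude summary of the graph's structure."""
        return sorted(label for _, label, _ in self.edges)


# Ontology side (assumed wiring): class -> object property -> class,
# plus class -> data property.
inc_o = IncGraph()
inc_o.add_edge("Director", REF, "directs")
inc_o.add_edge("directs", REF, "Movie")
inc_o.add_edge("Movie", VALUE, "hasTitle")

# Relational side (assumed wiring): table -> foreign key -> table,
# plus table -> data column.
inc_r = IncGraph()
inc_r.add_edge("Movie", REF, "director_FK")
inc_r.add_edge("director_FK", REF, "Director")
inc_r.add_edge("Movie", VALUE, "title")

# Different element names, same mix of ref- and val-edges.
print(inc_o.label_profile() == inc_r.label_profile())
```

Under these assumptions both graphs consist of two ref-edges and one val-edge, although the relationship between directors and movies points in opposite directions in the two graphs.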
    However, after constructing the IncGraph models, structural differences be-
tween IncGraph(O) and IncGraph(R) might still exist due to the mismatch
between the high level view of the domain in ontologies and the low level view
of data in relational databases. IncMap therefore adds annotations in IncGraph
to bridge these structural gaps. Annotations are added as inactive ref-edges
which can be activated during the schema matching process. For instance, additional
ref-edges are added to IncGraph(R) as shortcuts for join paths to better
match IncGraph(O). Moreover, inverse ref-edges can be added to
unify the structure resulting from modeling relationships in different directions
(e.g., the directs predicate in O vs. the director FK relationship in R in Figure 1).
Finally, results from reasoning over an ontology O can also be integrated into
IncGraph(O). Analyzing these annotations in detail is future work.
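
The idea of inactive annotation edges can be sketched as follows. This is a minimal Python illustration, not the IncMap implementation: the function names, the representation of annotations as a dictionary from edge to an "active" flag, and the direction of the example edges in IncGraph(R) are all assumptions.

```python
def inverse_annotations(ref_edges):
    """Propose an inactive inverse ref-edge for every given (source, target) pair."""
    existing = set(ref_edges)
    # False means "inactive": the edge exists only as an annotation so far.
    return {(t, s): False for (s, t) in ref_edges if (t, s) not in existing}


def edges_for_matching(ref_edges, annotations):
    """Regular ref-edges plus those annotation edges that have been activated."""
    return list(ref_edges) + [edge for edge, active in annotations.items() if active]


# Assumed ref-edges of IncGraph(R): the foreign key points from Movie to Director.
ref_edges_r = [("Movie", "director_FK"), ("director_FK", "Director")]
annotations = inverse_annotations(ref_edges_r)

before = edges_for_matching(ref_edges_r, annotations)

# Activating one inverse edge lets the matcher also see Director -> director_FK,
# the direction in which the ontology models the relationship.
annotations[("Director", "director_FK")] = True
after = edges_for_matching(ref_edges_r, annotations)

print(len(before), len(after))
```

Keeping annotations separate from the regular edges means the original graph is never modified; a configuration only decides which annotation edges take part in a given matching run.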
[Figure 2: two bar charts, "IMDB: Naive Similarity Flooding vs. IncGraph" and
"Music Ontology: Naive Sim. Flood. vs. IncGraph". y-axis: Effort [actions];
x-axis: Random, LS Similarity, Inverse LS Dist.; series: Naive [initial],
IncGraph [initial], Naive [final], IncGraph [final].]

                                                       Fig. 2. Naive vs. IncGraph


3   The IncMap System

IncMap takes the IncGraphs produced for a relational schema R and for an
ontology O as input. In its basic version, IncMap applies the original SF algorithm
and thus creates initial mapping suggestions between IncGraph(O) and IncGraph(R).
Additionally, IncMap can activate ref-edges (i.e., annotations) before executing
the SF algorithm to achieve better results.
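
To make the role of SF more concrete, here is a simplified Python sketch of the basic Similarity Flooding fixpoint from [3]: build a pairwise connectivity graph from equally labeled edges, propagate scores along it in both directions, and normalize after every iteration. The function name, the parameters, the example edges, and the uniform initial scores are assumptions made for illustration; for simplicity both example graphs are wired in the same direction, as if suitable inverse ref-edges had already been activated. IncMap's actual configuration may differ.

```python
from collections import defaultdict


def similarity_flooding(edges_a, edges_b, initial, iterations=50, eps=1e-6):
    """edges_*: lists of (source, label, target); initial: {(a, b): score}."""
    # Pairwise connectivity graph: (a, b) -> (a2, b2) whenever both graphs
    # contain an equally labeled edge a -> a2 and b -> b2.
    pcg = [((a, b), label_a, (a2, b2))
           for (a, label_a, a2) in edges_a
           for (b, label_b, b2) in edges_b
           if label_a == label_b]
    if not pcg:
        return {}
    # Propagation weights: each node splits a weight of 1.0 evenly among its
    # equally labeled neighbors, once forwards and once backwards.
    out_count, in_count = defaultdict(int), defaultdict(int)
    for src, label, dst in pcg:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1
    weighted = []
    for src, label, dst in pcg:
        weighted.append((src, dst, 1.0 / out_count[(src, label)]))
        weighted.append((dst, src, 1.0 / in_count[(dst, label)]))
    nodes = {n for src, dst, _ in weighted for n in (src, dst)}
    sigma = {n: initial.get(n, 0.0) for n in nodes}
    for _ in range(iterations):
        nxt = dict(sigma)                      # sigma_i ...
        for src, dst, weight in weighted:      # ... plus flooded similarity
            nxt[dst] += sigma[src] * weight
        top = max(nxt.values()) or 1.0
        nxt = {n: value / top for n, value in nxt.items()}
        delta = max(abs(nxt[n] - sigma[n]) for n in nodes)
        sigma = nxt
        if delta < eps:
            break
    return sigma


edges_o = [("Director", "ref", "directs"), ("directs", "ref", "Movie"),
           ("Movie", "value", "hasTitle")]
edges_r = [("Director", "ref", "director_FK"), ("director_FK", "ref", "Movie"),
           ("Movie", "value", "title")]
vertices_o = {v for s, _, t in edges_o for v in (s, t)}
vertices_r = {v for s, _, t in edges_r for v in (s, t)}
uniform = {(a, b): 1.0 for a in vertices_o for b in vertices_r}

scores = similarity_flooding(edges_o, edges_r, uniform)
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 3))
```

Even with uninformative uniform initial scores, structurally consistent pairs such as (Movie, Movie) end up ranked above pairs such as (directs, Movie), because they receive similarity from more neighbors. In practice the initial scores would come from a lexical matcher.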
    One important extension is the incremental version of IncMap. In this version,
IncMap re-ranks the initial suggestions by incorporating user feedback. The
idea is that the user confirms those mapping suggestions of the
previous iteration that are required to answer a given user query over O.
    We support three methods for incorporating user feedback into the matching
process. First, the naive Initializer method sets the scores of confirmed and
rejected mappings to 1.0 and 0.0, respectively, to initialize the next run. Second,
Self-Confidence Nodes work similarly, but the initialization is repeated during the
fixpoint computation of the SF algorithm, which results in a stronger influence
of the user feedback. Finally, Influence Nodes add nodes to the
graph structure to locally influence the scores of confirmed or rejected mappings.
Please refer to [4] for a more detailed description of these methods.
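
The difference between the first two methods can be illustrated with a small Python sketch. It is only one possible reading of the description above, not the implementation from [4]: the function names are hypothetical, `step` stands for a single propagation-and-normalization step of the matcher, and `dilute` is a toy stand-in that merely halves every score. Influence Nodes are not covered.

```python
def initializer(scores, confirmed, rejected):
    """Naive Initializer: seed validated pairs with 1.0 (confirmed) or 0.0 (rejected)."""
    seeded = dict(scores)
    for pair in confirmed:
        seeded[pair] = 1.0
    for pair in rejected:
        seeded[pair] = 0.0
    return seeded


def run_with_repeated_feedback(scores, step, confirmed, rejected, iterations):
    """Assumed reading of Self-Confidence Nodes: re-apply the feedback after every step."""
    current = initializer(scores, confirmed, rejected)
    for _ in range(iterations):
        current = initializer(step(current), confirmed, rejected)
    return current


def dilute(scores):
    """Toy stand-in for one fixpoint step: halves every score."""
    return {pair: value * 0.5 for pair, value in scores.items()}


start = {("Director", "Director"): 0.4, ("Director", "Movie"): 0.6}
confirmed = {("Director", "Director")}
rejected = {("Director", "Movie")}

# Initializer only: the feedback is applied once and then fades during the run.
once = initializer(start, confirmed, rejected)
for _ in range(3):
    once = dilute(once)

# Repeated initialization: the feedback is restored after every step.
repeated = run_with_repeated_feedback(start, dilute, confirmed, rejected, iterations=3)

print(once[("Director", "Director")], repeated[("Director", "Director")])
```

In this toy setting the confirmed pair drops to 0.125 when the feedback is applied only once, but stays at 1.0 when it is re-applied in every step, which mirrors the stronger influence described above.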
    IncMap is designed as a framework and provides different knobs to control
which extensions and variations to use. A major avenue of future work is to
apply optimization algorithms to find the best configurations automatically.

4   Experimental Evaluation

We evaluate IncMap using two real-world scenarios that provide hand-crafted
mappings as a gold standard. The first scenario is a mapping from the movie
database IMDB to the Movie Ontology (http://www.movieontology.org). The
second scenario is a mapping from the MusicBrainz database to the Music Ontology
(www.musicontology.com). We evaluate IncMap w.r.t. the reduction in work time
(i.e., effort) needed to correct the correspondences suggested by IncMap so that
they match the gold standard. The effort is defined as the sum of steps that users
need to validate the suggested mappings for each node in IncGraph(O). To validate
one mapping, the user needs to reject suggested correspondences in
decreasing order of their final ranking score until reaching the correct mapping,
where each rejection is counted as one step.
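
The effort measure described above can be written down in a few lines of Python. The function name and the data layout (a dictionary of ranked candidate lists and a dictionary for the gold standard) are assumptions for illustration, as is the handling of a correct mapping that is missing from the suggestions: in that case every suggested candidate is counted as a rejection.

```python
def validation_effort(suggestions, gold):
    """Sum of rejections needed to reach the correct mapping for each ontology node.

    suggestions: ontology node -> list of (candidate, score)
    gold:        ontology node -> correct candidate
    """
    total = 0
    for node, correct in gold.items():
        ranked = sorted(suggestions.get(node, []), key=lambda c: c[1], reverse=True)
        for candidate, _score in ranked:
            if candidate == correct:
                break
            total += 1  # one rejected suggestion counts as one step
    return total


suggestions = {
    "Director": [("Movie", 0.9), ("Director", 0.7), ("title", 0.1)],
    "hasTitle": [("title", 0.8), ("director_FK", 0.3)],
}
gold = {"Director": "Director", "hasTitle": "title"}

print(validation_effort(suggestions, gold))
```

Here the user rejects one wrong suggestion for Director and none for hasTitle, so the total effort is one step.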
[Figure 3: two bar charts, "IMDB: Incremental Runs" and "Music Ontology:
Incremental Runs". y-axis: Effort [actions]; x-axis: Norm. Sim. Product,
Original Weights; series: Non-Incremental, Initializer, Self-Confidence Nodes,
Influence Nodes.]

                                                      Fig. 3. Incremental Evaluation


Experiment 1 – Naive vs. IncGraph. In our first experiment we compare the work
time required to correct the mapping suggestions when the schema and ontology
are represented naively as schema graphs with the work time required when they
are represented as IncGraphs. Additionally, we vary the lexical matcher using
three alternatives: randomly assigned scores (baseline), Levenshtein similarity,
and inverse Levenshtein distance. Figure 2 shows
that IncGraph works better than the naive approach in all cases.
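
The three lexical matchers can be sketched in Python as follows. The edit distance itself is the standard dynamic-programming algorithm, but the two normalizations are assumptions: the paper does not state how "Levenshtein similarity" and "inverse Levenshtein distance" are defined, so 1 - d / max(|a|, |b|) and 1 / (1 + d) are used here as common choices. All function names are hypothetical.

```python
import random


def levenshtein(a, b):
    """Edit distance with unit costs for insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(previous[j] + 1,                       # deletion
                               current[j - 1] + 1,                    # insertion
                               previous[j - 1] + (char_a != char_b))) # substitution
        previous = current
    return previous[-1]


def levenshtein_similarity(a, b):
    """Assumed definition: distance normalized by the longer string, inverted."""
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1.0 - levenshtein(a, b) / longest


def inverse_levenshtein_distance(a, b):
    """Assumed definition: 1 / (1 + distance), so identical strings score 1.0."""
    return 1.0 / (1.0 + levenshtein(a, b))


def random_score(rng):
    """Baseline: a score that ignores the labels entirely."""
    return rng.random()


rng = random.Random(0)  # fixed seed so the baseline is repeatable
for left, right in [("director", "director"), ("hasTitle", "title")]:
    print(left, right,
          round(levenshtein_similarity(left, right), 3),
          round(inverse_levenshtein_distance(left, right), 3),
          round(random_score(rng), 3))
```

Each matcher yields an initial score in [0, 1] for a pair of element names, which can then serve as the starting similarity for the structural matching.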
Experiment 2 – Incremental Mapping Generation. In the second experiment we
evaluate the incremental schema matching in IncMap. Figure 3 shows the resulting
work time for the three incremental methods. Most significantly, incremental
evaluation reduces the overall effort (work time) by up to 50–70% compared
to the naive non-incremental version. For both scenarios, Self-Confidence Nodes
and Influence Nodes work much better than the naive Initializer approach.

5   Conclusions and Outlook

We presented IncMap, a semi-automatic approach for matching
relational schemata to ontologies. Our approach is based on a novel unified graph
model called IncGraph for ontologies and relational schemata. Based on the
IncGraph model, IncMap implements a matching algorithm
inspired by the Similarity Flooding algorithm to derive mappings using both
lexical and structural similarities of ontologies and relational schemata. Our
experiments with IncMap on real-world relational schemata and ontologies showed
that the effort for creating a mapping with IncMap is up to 30% less than using
the Similarity Flooding algorithm in a naive way. The incremental version of
IncMap reduces the total effort of mapping creation by another 50% to 70%.

References
1. SAP HANA Help: http://help.sap.com/hana/html/sql export.html (2013)
2. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Poggi, A., Rodriguez-
   Muro, M., Rosati, R., Ruzzi, M., Savo, D.F.: The mastro system for ontology-based
   data access. Semantic Web Journal 2(1) (2011) 43–53
3. Melnik, S., Garcia-Molina, H., Rahm, E.: Similarity Flooding: A Versatile Graph
   Matching Algorithm and its Application to Schema Matching. In: ICDE. (2002)
4. Pinkel, C., Binnig, C., Kharlamov, E., Haase, P.: IncMap: Pay-as-you-go Matching
   of Relational Schemata to OWL Ontologies. In: OM. (2013)