<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>IncMap : Pay as you go Matching of Relational Schemata to OWL Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christoph Pinkel</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carsten Binnig</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evgeny Kharlamov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peter Haase</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Mannheim</institution>
          ,
          <addr-line>D-68131 Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Oxford</institution>
          ,
          <addr-line>Oxford</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>uid Operations AG</institution>
          ,
          <addr-line>D-69190 Walldorf</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Ontology Based Data Access (OBDA) enables access to relational data with a complex structure through ontologies as conceptual domain models. A key component of an OBDA system are mappings between the schematic elements in the ontology and their correspondences in the relational schema. Today, in existing OBDA systems these mappings typically need to be compiled by hand, which is a complex and labor intensive task. In this paper we address the problem of creating such mappings and present IncMap, a system that supports a semi-automatic approach for matching relational schemata and ontologies. Our approach is based on a novel matching technique that represents the schematic elements of an ontology and a relational schema in a uni ed way. IncMap is designed to work in a query-driven, pay as you go fashion and leverages partial, user-veri ed mappings to improve subsequent mapping suggestions. This e ectively reduces the overall e ort compared to compiling a mappings in one step. Moreover, IncMap can incorporate knowledge from user queries to enhance suggestion quality.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        E ective understanding of complex data is a crucial task for enterprises to
support decision making and retain competitiveness on the market. This task is not
trivial especially since the data volume and complexity keep growing fast in the
light of Big Data [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. While there are many techniques and tools for scalable
data analytics today, there is little known on how to nd the right data.
      </p>
      <p>
        Today, enterprise information systems of large companies store petabytes of
data distributed across multiple { typically relational { databases, each with
hundreds or sometimes even thousands of tables (e.g., [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). For example, an
installation of an SAP ERP system comes with tens of thousands of tables [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
Due to the complexity of data a typical scenario for data analyses today involves
a domain expert who formulates an analytical request and an IT expert who has
to understand the request, nd the data relevant to it, and then translate the
request into an executable query. In large enterprises this process may iterate
several times between the domain and IT experts, the complexity of data and
other factors, and may take up to several weeks.
      </p>
      <p>
        Ontology-based data access (OBDA) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is an approach that has recently
emerged to provide semantic access to complex structured relational data. The
core elements of an OBDA system are an ontology, describing the application
domain, and a set of declarative mappings, relating the ontological schema
elements (e.g., names of classes and properties) with the relational schema elements
(e.g., names of table and attributes) of the underlying data sources. Using the
ontology and the mappings, domain experts can access the data directly by
formulating queries in terms de ned in the ontology that re ects their vocabulary
and conceptualization. Using query rewriting techniques, the end-user queries
are then translated into queries over the underlying data sources.
      </p>
      <p>
        Today, most approaches for ontology-based data access focus on the de nition
of mapping languages and the e cient translation of high-level user queries over
an ontology into executable queries over relational data [
        <xref ref-type="bibr" rid="ref4 ref5">4,5</xref>
        ]. These approaches
assume that a declarative mapping of the schema elements of the ontology to
the relational elements is already given. So far, in real-world systems [
        <xref ref-type="bibr" rid="ref6 ref7">6,7</xref>
        ] that
follow the ontology-based data access principle, the mappings have to be created
manually. The costs for the manual creation of mappings constitute a signi cant
entry barrier for applying OBDA in practice.
      </p>
      <p>To overcome this limitation we propose a novel semi-automatic schema
matching approach and a system called IncMap to support the creation of mappings
directly from relational schemata to ontologies.</p>
      <p>We focus on nding one-to-one (direct) correspondences of ontological and
relational schema elements, while we also work on extensions for nding more
complex correspondences. In order to compute mapping suggestions IncMap uses
a relational schema, an OWL ontology, a set of user conjunctive queries over the
ontology, and user feedback as basic input.</p>
      <p>
        The matching approach of IncMap is inspired by the Similarity Flooding
algorithm of Melnik et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] that works well for schemata that follow the same
modeling principles (e.g., same level of granularity). However, applying the
Similarity Flooding algorithm naively for matching schema elements of a relational
schema to an OWL ontology results in rather poor quality of the suggested
correspondences as we show in our experiments. A major reason is the impedance
mismatch between ontologies and relational schemata: While ontologies
typically model high-level semantic information, relational schemata describe the
syntactical structure on a very low level of granularity.
      </p>
      <p>The contributions of the paper are the following:
{ In Section 3, we propose a novel graph structure called IncGraph to represent
schema elements from both ontologies and relational schemata in a uni ed
way. Therefore, we devise algorithms to convert an ontology as well as a
relational schema into their uni ed IncGraph representation. We also brie y
discuss techniques to further improve IncGraph.
{ In Section 4, we present our matching algorithm that we use for matching
IncGraphs. Its most prominent feature is that IncMap can produce the
mapping incrementally, query by query. While the original Similarity Flooding
algorithm generates correspondences for all schema elements, IncMap
supports a pay as you go matching strategy. For each query we produce only
required mappings. IncMap leverages the structure of mappings from
previous queries to improve suggestion quality. This e ectively reduces the total
e ort for the user to verify mapping suggestions.
{ Section 5 presents an experimental evaluation using di erent (real-world)
relational schemata and ontologies. We see that even in the basic version of
IncMap, the e ort for creating a mapping is up to 20% less than using the
Similarity Flooding algorithm in a naive way. In addition, the incremental
version of IncMap can reduce the total e ort by another 50% 70%.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Background</title>
      <p>
        In this section we brie y introduce ontologies [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], relational schemata, and the
Similarity Flooding algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Ontologies. An ontology O speci es a conceptualization of a domain in terms
of classes and properties and consists of a set of axioms. Without explanation,
ontologies in this paper are OWL ontologies and we will use the following OWL
constructs: object and data properties P , and domains Domain(P ) and ranges
Range(P ) of properties. We denote with Class(O) and Property(O) the sets of
class and property names, respectively, occurring in the ontology O. For a given
ontology O, with C 2 Domain(P ) we denote the fact that one can derive from
O that the class name C is a domain of the property P . Also, C0 2 Range(P )
denotes the fact that C0 is a range of P and it is derivable from O.
Relational Schemata. A relational schema R de nes a set of relations (tables)
T , where each table de nes a set of columns c. We also assume that a schema
contains foreign keys k that de ne references between tables.</p>
      <p>
        Similarity Flooding Algorithm. The Similarity Flooding algorithm matches a
given schema S with a schema S0. In the rst step, directed labeled graphs
G(S) and G(S0) are constructed from S and S0, where the nodes represent the
schema elements, and the edges with labels de ne relationships between the
schema elements. There is no exact procedure to construct the graphs from
the schemata given in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Thus, the Similarity Flooding algorithm is open for
any graph construction process. The second step in the algorithm is to merge
G(S) and G(S0) into one graph, a so-called pairwise connectivity graph PCG.
Intuitively, each node of the PCG is a pair of nodes, and represents a potential
match between schema elements of S and S0. Then, the PCG is enriched with
inverse edges and edge weights (propagation coe cients), where the value of the
weights is based on the number of outgoing edges with the same label from a
given node. This graph is called the induced propagation graph IPG. The nal
step of the algorithm is a x-point computation to propagate initial similarities
by using the structural dependencies represented by the propagation coe cients.
The x-point computation termination is based either on threshold values or the
number of iterations. The result is a ranked list of suggested mappings. We refer
to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for further details.
      </p>
      <p>Algorithm 1: IncGraph for constructing graphs from ontologies
INPUT : OWL ontology O</p>
      <p>OUTPUT: Graph G = (V; LblV ; E; LblE )
1 Let G = (V; LblV ; E; LblE ), V = fn&gt;g, LblV = f(n&gt;; &gt;)g, E = ;, LblE = ;;
2 foreach C 2 Class(O) do V := V [ fnCg and LblE (nC) := C
3 foreach P 2 Property(O) do
4 V := V [ fnP g and LblV (nP ) := P ; Let C 2 Domain(P );
5 if P is an object property then
6 E := E [ f(nC; nP )g and LblV ((nC; nP )) := `ref';
7 Let C0 2 Range(P );
8 E := E [ f(nP ; nC0 )g and LblV ((nP ; nC0 )) := `ref';
9 else if P is a data property then
10 E := E [ f(nC; nP )g and LblE ((nC; nP )) := `value'
11 return G.</p>
      <p>Algorithm 2: IncGraph for constructing graphs from relational schemata
INPUT : Relational Schema R</p>
      <p>OUTPUT: Graph G = (V; LblV ; E; LblE )
1 Let V = ;, LblV = ;, E = ;, LblE = ;;
2 foreach table T in R do
3 V := V [ fnT g and LblV (nT ) := T ;
4 foreach column c in R do
5 V := V [ fncg and LblV (nc) := c;
6 E := E [ f(nT ; nc)g and LblE ((nT ; nc)) := `value'
7 if c has a foreign key k to some table T 0 then
8 V := V [ fnkg and LblV (nk) := k;
9 E := E [ f(nT ; nk)g and LblE ((nT ; nk)) := `ref'
10 E := E [ f(nk; nT 0 )g and LblE ((nk; nT 0 )) := `ref'
11 return G.
3</p>
    </sec>
    <sec id="sec-3">
      <title>The IncGraph Model</title>
      <p>In this section, we describe the IncGraph model used by IncMap to represent
schema elements of an OWL ontology O and a relational schema R in a uni ed
way.</p>
      <p>An IncGraph model is de ned as directed labeled graph G = (V; LblV ; E ; LblE ).
It can be used as input by the original Similarity Flooding algorithm (Section 2)
or IncMap. V represents a set of vertices, E a set of directed edges, LblV a set
of labels for vertices (i.e., one label for each vertex) and LblE a set of labels for
edges (i.e., one label for each edge). A label lV 2 LblV represents a name of a
schema element whereas a label lE 2 LblE is either \ref" representing a so called
ref-edge or \value" representing a so called val-edge.</p>
      <sec id="sec-3-1">
        <title>3.1 IncGraph Construction</title>
        <p>The goal of the procedures for the basic construction is to incorporate explicit
schema information from O and R into the IncGraph model. Incorporating
implicit schema information is discussed in the next section.</p>
        <p>Algorithm 1 creates an IncGraph model G for a given ontology O. The
algorithm constructs a vertex nC for each class name C 2 Class(O) and a vertex
!"#$%&amp;!!'()
*+,-$./,)
0/1&amp;+2)
+,&amp;'-'./(+!
6%&amp;!!)
'#7-$.)
8,/9-,.:)</p>
        <p>!"#$%&amp;!!'()
0+,-$.!)
,&amp;25-)
!"#$%&amp;!!'()
3/4+-)
*&amp;.&amp;)
8,/9-,.:)
!"#$%&amp;!!'()</p>
        <p>;&amp;!&lt;+.%-)
0/1&amp;+2)
*+,-$./,) ,-() 0+,-$.!) ,-() 3/4+-)</p>
        <p>4&amp;%)
;&amp;!&lt;+.%-)
6,%7#1849+:!
!"#$%&amp;'#(
0+,-$./,)
8=)
&gt;&gt;&gt;)
)'*"$(
?.%-)
0+,-$./,)
@=)
&gt;&gt;&gt;)
*+,-$./,) ,-() 0+,-@=$.)/,) ,-() 3/4+-)</p>
        <p>4&amp;%) 4&amp;%) 4&amp;%)
*+,8-=$.)/,) ;&amp;!&lt;+.%-)
0+,-$./,)</p>
        <p>@=)
6,%.#18490:!
0$-12',1-(3%4$51(0!
nP for each property name P 2 Property(O) using the names of these ontology
elements as label in LblV . Directed edges in the IncGraph model are created
for each domain and range de nition in O. The labels LblE for edges are either
\ref" in case of an object property or \value" in case of a data property. For
a domain de nition in O the direction of the edge in G is from the node nC
representing the domain of P to the node nP representing the property P . For a
range de nition the direction of the edge in G is from the node nP representing
object property to the node nC0 representing the range of P (i.e., another class).
If an object property in O has no range (respectively, domain) de nition, then a
directed labeled edge to a node n&gt; is added to explicitly model the most general
range (respectively, domain), i.e., a top-level concept &gt; like Thing.</p>
        <p>Algorithm 2 creates a IncGraph model G for a given relational schema R:
The algorithm constructs a vertex nT for each table and a vertex nc for each
column using the names of these schema elements as labels LblV . Directed edges
with the label \value" are created from a node nT representing a table to a node
nc representing a columns of that table. For columns with a foreign key k an
additional node nk is created. Moreover, two directed edges with the label \ref"
are added, which represent a path from node nT to a node nT 0 representing the
referenced table via node nk.</p>
        <p>Figure 1 shows the result of applying these two algorithms to the ontology
O and the relational schema R in this gure. Both O and R describe the same
entities Directors and Movies using di erent schema elements. The resulting
IncGraph models of O and R represent the schema structure in a uni ed way.
IncGraph is designed to represent both relational schemata and ontologies in
a structurally similar fashion because matching approaches such as ours work
best when the graph representations on both the source and target side are as
similar as possible. However, even in IncGraph structural di erences remain due
to the impedance mismatch and di erent design patterns used in ontologies and
relational schemata, respectively.</p>
        <p>We consider this issue by supporting annotations in IncGraph. Annotations
basically are additional ref-edges in either the source or target model that can
be designed to bridge structural gaps for di erent design patterns or levels of
granularity. For instance, shortcut edges in the relational IncGraph model could
represent a multi-hop join over a chain of relationship relations. Annotations can
be constructed by plug-ins during IncGraph construction.</p>
        <p>We plan to evaluate the opportunities of di erent kinds of annotations in
future work.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The IncMap System</title>
      <p>In this section, we present our matching approach and system called IncMap.
IncMap takes a source and target IncGraph as input, i.e., the IncGraphs produced
for a relational schema and for an ontology as described in Section 3.</p>
      <sec id="sec-4-1">
        <title>4.1 Overview of IncMap</title>
        <p>In its basic version, IncMap applies the original Similarity Flooding algorithm
(with minor adaptions) and thus creates initial mapping suggestions for the
IncGraph of an ontology O and a relational schema R. In its extended version,
IncMap activates inactive ref-edges before executing the Similarity Flooding
algorithm to achieve better mapping suggestions.</p>
        <p>Another extension is the incremental version of IncMap. In this version the
initial mapping suggestions are re-ranked by IncMap in a semi-automatic
approach by including user feedback. Re-ranking works iteratively in a query-driven
fashion thus increasing the quality of the suggested mappings. In each iteration,
IncMap applies a version of the Similarity Flooding algorithm (as described
before). However, in addition between each iteration user feedback is incorporated.</p>
        <p>The idea of user feedback is that the user con rms those mapping suggestions
of the previous iteration, which are required to answer a given user query over
ontology O. Con rmed suggestions are used as input for the next iteration to
produce better suggestions for follow-up queries. This is in contrast to many
other existing approaches (including the original Similarity Flooding algorithm)
that return a mapping for the complete source and target schema only once.</p>
        <p>IncMap is designed as a framework and provides di erent knobs to control
which extensions to use and within each extension which concrete variants to
choose (e.g., to select a concrete strategy for activating inactive edges). The goal
of this section is to present IncMap with all its variants and to show their bene ts
for di erent real-world data sets in our experimental evaluation in Section 5. A
major avenue of future work is to apply optimization algorithms to nd the best
con gurations of IncMap for a given ontology O and schema R automatically
by searching the con guration space based on the knobs presented before.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Basic Matching in IncMap</title>
        <p>As already mentioned, in the basic version of IncMap, we simply apply the
Similarity Flooding algorithm for the two IncGraphs produced for a relational
schema R and for an ontology O similar to the process as described in Section 2.</p>
        <p>As a rst step, IncMap generates the PCG (i.e., a combined graph which pairs
similar nodes of both input IncGraphs) using an initial lexical matching, which
supports interchangeable matchers as one knob for con guration. One di erence
is the handling of inactive ref-edges in the input IncGraphs. For inactive
refedges, which are not handled in the original Similarity Flooding, we apply the
following rule when building the PCG: if an edge in the PCG refers to at least one
inactive ref-edge in one of the IncGraph models, it also becomes inactive in the PCG.</p>
        <p>
          In addition, other than in the original Similarity Flooding approach, where
propagation coe cients for the IPG are ultimately determined during graph
construction, our propagation coe cients can be calculated several times when the
graph changes with the activation and deactivation of edges. Also, propagation
coe cients in IncMap are modular and can be changed. In particular, a new
weighting formula supported by IncMap considers the similarity scores on both
ends of an edge in the IPG. The intuition behind this is that a higher score
indicates better chances of the match being correct. Thus, an edge between two
matches with relatively high scores is more relevant for the structure than an
edge between one isolated well-scored match and another with a poor score. For
calculating the weight w(e) of a directed edge e = (n1; n2) from n1 to n2 in the
IPG where l is the label of the edge, we currently use two alternatives:
{ Original Weight as in [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] : w(e) = 1=outl where outl is the number of edges
connected to node n1 with the same label l
{ Normalized Similarity Product : w(e) = (score(n1) score(n2))=outl.
        </p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Extended IncMap: Iterative User Feedback</title>
        <p>Query-driven incremental mappings allow to leverage necessary user feedback
after each iteration to improve the quality of mapping suggestions in subsequent
iterations. One of the reasons why we have chosen Similarity Flooding as a basis
for IncMap is the fact that user feedback can be integrated by adopting the
initial match scores in an IPG before the x-point computation starts.</p>
        <p>
          Though the possibility of an incremental approach has been mentioned
already in the Similarity Flooding paper [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], it so far has not been implemented
and evaluated. Also, while it is simple to see where user feedback could be
incorporated in the IPG, it is far less trivial to decide which feedback should be
employed and how exactly it should be integrated in the graph. In this paper we
focus on leveraging only the most important kind of user feedback, i.e., the
previous con rmation and rejection of suggested mappings. We have devised three
alternative methods how to add this kind of feedback into the graph.
        </p>
        <p>First, as a con rmed match corresponds to a certain score of 1:0, while a
rejected match corresponds to a score of 0:0, we could simply re-run the x-point
computation with adjusted initial scores of con rmed and/or rejected matches.
We consequently name this rst method Initializer. However, there is a clear
risk that the in uence of such a simple initialization on the resulting mapping is
too small as scores tend to change rapidly during the rst steps of the x-point
computation.</p>
        <p>To tackle this potential problem, our second method guarantees maximum
in uence of feedback throughout the x-point computation. Instead of just
initializing a con rmed or rejected match with their nal score once, we could
repeat the initialization at the end of each step of the x-point computation after
normalization. This way, nodes with de nite user feedback in uence their
neighborhood with their full score during each step of the computation. We therefore
call this method Self-Con dence Nodes. However, as scores generally decrease in
most parts of the graph during the x-point computation and high scores become
more important for the ranking of matches in later x-point computation steps,
this method implies the risk of over-in uencing parts of the graph. For example,
one con rmed match in a partially incorrect graph neighborhood would almost
certainly move all of its neighbors to the top of their respective suggestion lists.</p>
        <p>Finally, with our third method, we attempt to balance the e ects of the
previous two methods. We therefore do not change a con rmed match directly but
include an additional node in IPG that can indirectly in uence the match score
during the x-point computation. We name this method In uence Nodes. By
keeping the scores of those additional in uence nodes invariant we ensure
permanent in uence throughout all steps of the x-point computation. Yet, the
inuence node only indirectly a ects the neighborhood of con rmed nodes through
the same propagation mechanism that generally distributes scores through the
graph.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimental Evaluation</title>
      <p>The main goal of IncMap is to reduce the human e ort for constructing
mappings between existing relational database schemata and ontologies. Mapping
suggestions are intended to be used only after they have been validated by a
user. Thus, there are two relevant evaluation measures: rst, the percentage of
the mappings in the reference mappings that can be represented by IncMap.
We specify this percentage for all reference mappings when introducing them.
Certain complex mappings (e.g., mappings performing data transformations)
cannot be represented by IncMap. These complex mappings are rare in all
realworld reference mappings we used in this paper. The second and most important
measure is the amount of work that a user needs to invest to transform a set
of mapping suggestions into the correct (intended) mappings. As the latter is
the most crucial aspect, we evaluate our approach by measuring the work time
required to transform our suggestions into the existing reference mappings.
5.1</p>
      <sec id="sec-5-1">
        <title>Relational Schemata and Ontologies</title>
        <p>
          To show the general viability of our approach, we evaluate IncMap in two
scenarios with fairly di erent schematic properties. In addition to showing the key
bene ts of the approach under di erent conditions, this also demonstrates how
the impact of modular parameters varies for di erent scenarios.
IMDB and Movie Ontology. As a rst scenario, we evaluate a mapping from the
schema of well known movie database IMDB4 to the Movie Ontology [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. With
27 foreign keys connecting 21 tables in the relational schema and 27 explicitly
modeled object properties of 21 classes in the ontology, this scenario is average
4 http://www.imdb.com
in size and structural complexity. The reference mappings we use to derive
correspondences for this scenario5 has been made available by the -ontop- team [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
A set of example queries is provided together with these reference mappings.
We use these to construct annotations for user queries as well as to structure
our incremental, query-by-query experiments. We extract a total of 73 potential
correspondences from this mapping, 65 of which can be represented by IncMap
as mapping suggestions. This corresponds to 89% of the mappings that could be
represented in IncMap.
        </p>
        <p>
          MusicBrainz and Music Ontology. The second scenario is a mapping from the
MusicBrainz database6 to the Music Ontology [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. The relational schema
contains 271 foreign keys connecting 149 tables, while the ontology contains 169
explicitly modeled object properties and 100 classes, making the scenario both
larger and more densely connected than the previous one. Here we use R2RML
reference mappings that have been developed in the project EUCLID.7 As there
were no example queries provided with the mapping in this case, we use
example queries provided by the Music Ontology for user query annotations and to
structure the incremental experiment runs.
        </p>
        <p>For these reference mappings, two out of 48 correspondences cannot be
represented as mapping suggestions by IncMap as they require data transformations.
This corresponds to 95:8% of the mappings that could be represented in IncMap.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Work Time Cost Model</title>
        <p>We evaluate our algorithms w.r.t. reducing work time (human e ort). As the
user feedback process always needs to transform mapping suggestions generated
by IncMap into the correct mappings (i.e. to achieve a precision and recall of
100%), the involved e ort is the one distinctive quality measure. To this end, we
have devised a simple and straightforward work time cost model as follows: we
assume that users validate mappings one by one, either accepting or rejecting
them. We further assume that each validation, on average, takes a user the same
amount of time tvalidate. The costs for nding the correct correspondence for any
concept in this case is identical with the rank of the correct mapping suggestion
in the ranked list of mapping suggestions for the concept times tvalidate.</p>
        <p>As IncMap is interactive by design and would propose the user one mapping
suggestion after another, this model closely corresponds to end user reality. We
are aware that this process represents a simpli cation of mapping reality where
users may compile some of the mappings by other means for various reasons.
Nevertheless, this happens in the same way for any suggestion system and therefore
does not impact the validity of our model for the purpose of comparison.</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Experimental Evaluation</title>
        <p>Experiment 1 { Naive vs. IncGraph. In our rst experiment we compare the
e ort required to correct the mapping suggestions when the schema and
ontol5 https://babbage.inf.unibz.it/trac/obdapublic/wiki/Example MovieOntology
6 http://musicbrainz.org/doc/MusicBrainz Database
7 http://euclid-project.eu</p>
        <p>IMDB: Naive Similarity Flooding vs. IncGraph</p>
        <p>IncGNraaipveh [[iinniittiiaall]]</p>
        <p>Naive [final]</p>
        <p>IncGraph [final]
1400
1300
1200
1100
1000
] 900
s
ion 800
t
[ca 700
ftE 500
fro 600
400
300
200
100
0
800
700
600
]s500
n
o
it
[ca400
tr
o
ffE300
200
100
0</p>
        <p>Random</p>
        <p>IMDB: Incremental Runs</p>
        <p>Non-Incremental</p>
        <p>Initializer
Self-Confidence Nodes</p>
        <p>Influence Nodes
LS Similarity Inverse LS Dist. Random LS Similarity Inverse LS Dist.</p>
        <p>(a) Naive vs. IncGraph
6000 Music Ontology: IncrementaNloRn-uInncsremental</p>
        <p>Initializer
5000 Self-CoInnffliudeennccee NNooddeess
ogy are represented naively, or as IncGraphs. Additionally, we vary the lexical
matcher used for the initial mapping between randomly assigned scores (minimal
base line), Levenshtein similarity and inverse Levenshtein distance. Figure 2(a)
shows that IncGraph in all cases works better than the naive approach. As
IncMap reliably improves the mapping for all con gurations, it also underlines the
ability of IncMap to operate in a stable manner with di erent initial matchers.
Experiment 2 { Incremental Mapping Generation. Finally, we evaluated the
best previous con gurations incrementally, i.e., leveraging partial mappings.
Figure 2(b) illustrates the e ects on the total e ort. We show total e ort for all
three incremental methods, for di erent propagation coe cients. Most signi
cantly, incremental evaluation reduces the overall e ort by up to 50% 70%.
More speci cally, Self-Con dence Nodes and In uence nodes work much better
than the naive Initializer approach.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Related Work</title>
      <p>
        Many existing mapping systems rely on two-step mapping procedures: They
employ lexical similarity of terms together with structural similarity of the
structures ([
        <xref ref-type="bibr" rid="ref13 ref14 ref15">13,14,15</xref>
        ] or [
        <xref ref-type="bibr" rid="ref16 ref17">16,17</xref>
        ] for surveys). A very few of them rely on variations of
Similarity Flooding to perform the latter task. However, to the best of our
knowledge, all of these approaches focus on ontology-to-ontology rather than relational
schema-to-ontology mappings. RiMOM [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] performs a multi-strategy mapping
discovery between ontologies and performs mappings using a variant of the
Similarity Flooding algorithm, while it relies on structural similarities of ontologies
derived from sub-class and sub-property relationships, rather than connectivity
of classes via properties as we do in order to get a better alignment of relational
schemata and ontologies. In Yamm++ [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] the authors used Similarity
Flooding and exploit both sub-class and sub-property relationships, and domain and
ranges of ontologies, while they did it in a naive way which, as our experimental
results showed, does not give good results for relational schemata-to-ontology
mappings. Moreover, they use Similarity Flooding to obtain new mappings on
top of the ones obtained via linguistic similarities, while we do not derive new
mappings but re ne the ranking over the linguistically derived ones. There are
works on semi-automatic discovery of relational schema-to-ontology mappings,
but they use approaches di erent from ours: For example, [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] transforms
relational schemata and ontologies into directed labeled graphs respectively and
reuse COMA [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] for essentially syntactic graph matching. Ronto [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] uses a
combination of syntactic strategies to discover mappings by distinguishing the
types of entities in relational schemata. The authors of [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ] exploit structure of
ontologies and relational schemata by calculating the con dence measures
between virtual documents corresponding to them via the TF/IDF model. All these
approaches do not incorporate implicit schema information and do not support
an incremental mapping construction in the pay as you go fashion as IncMap
does. Finally, [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] describes an approach to derive complex correspondences for
a relational schema-to-ontology mapping using simple correspondences as input.
This work is orthogonal to the approach presented in this paper.
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Conclusions and Outlook</title>
      <p>We presented IncMap, a novel semi-automatic matching approach for
generating relational schema-to-ontology mappings. Our approach is based on a novel
uni ed graph model called IncGraph for ontologies and relational schemata.
IncMap implements a semi-automatic matching approach to derive mappings from
IncGraphs using both lexical and structural similarities between ontologies and
relational schemata. In order to nd structural similarities IncMap exploits both
explicit and implicit schema information. Moreover, IncMap allows to
incorporate user queries and user feedback in an incremental way, thus, enabling a pay
as you go fashion of the mapping generation. Our experiments with IncMap on
di erent real-world relational schemata and ontologies showed that the e ort for
creating a mapping with IncMap is up to 20% less than using the Similarity
Flooding algorithm in a naive way. The incremental version of IncMap reduces
the total e ort of mapping creation by another 50% 70%. As future work we
plan to follow three lines: (1) add more implicit schema information
(annotations) to the IncGraphs, (2) support more complex mappings in IncMap, and
(3) devise a search strategy over the con guration space to auto-tune IncMap.
8</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>This work was supported by the Seventh Framework Program (FP7) of the
European Commission under Grant Agreement 318338, the Optique project.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Beyer</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lapkin</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gall</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feinberg</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sribar</surname>
          </string-name>
          , V.T.:
          <article-title>`Big Data' is Only the Beginning of Extreme Information Management</article-title>
          .
          <article-title>Gartner rep</article-title>
          .
          <source>G00211490</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Crompton</surname>
          </string-name>
          , J.: Keynote talk at the W3C Workshop on Sem. Web in Oil &amp; Gas
          <string-name>
            <surname>Industry</surname>
          </string-name>
          (
          <year>2008</year>
          ) http://www.w3.org/
          <year>2008</year>
          /12/ogws-slides/Crompton.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. SAP HANA Help: http://help.sap.com/hana/html/sql export.
          <source>html</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Linking Data to Ontologies</article-title>
          .
          <source>J. Data Semantics</source>
          <volume>10</volume>
          (
          <year>2008</year>
          )
          <volume>133</volume>
          {
          <fpage>173</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kontchakov</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toman</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zakharyaschev</surname>
            ,
            <given-names>M.:</given-names>
          </string-name>
          <article-title>The Combined Approach to Ontology-Based Data Access</article-title>
          . In: IJCAI. (
          <year>2011</year>
          )
          <volume>2656</volume>
          {
          <fpage>2661</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Hepp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wechselberger</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>OntoNaviERP: Ontology-Supported Navigation in ERP Software Documentation</article-title>
          . In: International Semantic Web Conference. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Blunschi</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jossen</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kossmann</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mori</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stockinger</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>SODA: Generating SQL for Business Users</article-title>
          .
          <source>PVLDB</source>
          <volume>5</volume>
          (
          <issue>10</issue>
          ) (
          <year>2012</year>
          )
          <volume>932</volume>
          {
          <fpage>943</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Melnik</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia-Molina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <article-title>Similarity Flooding: A Versatile Graph Matching Algorithm and its Application to Schema Matching</article-title>
          . In: ICDE, IEEE Computer Society (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>OWL 2 Web Ontology Language Structural Speci cation</article-title>
          and
          <string-name>
            <surname>Functional-Style Syntax</surname>
          </string-name>
          (
          <year>2012</year>
          )
          <article-title>W3C Rec</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Bouza</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>MO { The Movie Ontology</article-title>
          , http://www.movieontology.org (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Rodriguez-Muro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>High Performance Query Answering over DL-Lite Ontologies</article-title>
          . In: KR. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Raimond</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giasson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , (eds): Music Ontology, www.musicontology.com (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Jimenez-Ruiz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grau</surname>
            ,
            <given-names>B.C.</given-names>
          </string-name>
          :
          <article-title>LogMap: Logic-Based and Scalable Ontology Matching</article-title>
          . In: International Semantic Web Conference (
          <volume>1</volume>
          ). (
          <year>2011</year>
          )
          <volume>273</volume>
          {
          <fpage>288</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lambrix</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
          </string-name>
          , H.:
          <article-title>SAMBO { A system for aligning and merging biomedical ontologies</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>4</volume>
          (
          <issue>3</issue>
          ) (
          <year>2006</year>
          )
          <volume>196</volume>
          {
          <fpage>206</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Fagin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haas</surname>
            ,
            <given-names>L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernandez</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Popa</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velegrakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Clio: Schema Mapping Creation and Data Exchange</article-title>
          .
          <source>In: Conceptual Modeling: Foundations and Applications</source>
          . (
          <year>2009</year>
          )
          <volume>198</volume>
          {
          <fpage>236</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Shvaiko</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Euzenat</surname>
          </string-name>
          , J.: Ontology Matching:
          <article-title>State of the Art and Future Challenges</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          .
          <volume>25</volume>
          (
          <issue>1</issue>
          ) (
          <year>2013</year>
          )
          <volume>158</volume>
          {
          <fpage>176</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Rahm</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>A Survey of Approaches to Automatic Schema Matching</article-title>
          . In: VLDB J. (
          <year>2001</year>
          )
          <volume>334</volume>
          {
          <fpage>350</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>RiMOM: A Dynamic Multistrategy Ontology Alignment Framework</article-title>
          .
          <source>IEEE Trans. Knowl. Data Eng</source>
          . (
          <year>2009</year>
          )
          <volume>1218</volume>
          {
          <fpage>1232</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Ngo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bellahsene</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          : YAM++
          <article-title>: A Multi-strategy Based Approach for Ontology Matching Task</article-title>
          . In: EKAW. (
          <year>2012</year>
          )
          <volume>421</volume>
          {
          <fpage>425</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Dragut</surname>
            ,
            <given-names>E.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lawrence</surname>
          </string-name>
          , R.:
          <article-title>Composing Mappings Between Schemas Using a Reference Ontology</article-title>
          . In: CoopIS/DOA/ODBASE (1). (
          <year>2004</year>
          )
          <volume>783</volume>
          {
          <fpage>800</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Do</surname>
            ,
            <given-names>H.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahm</surname>
          </string-name>
          , E.:
          <article-title>COMA { A System for Flexible Combination of Schema Matching Approaches</article-title>
          . In: VLDB. (
          <year>2002</year>
          )
          <volume>610</volume>
          {
          <fpage>621</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Papapanagiotou</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katsiouli</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsetsos</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anagnostopoulos</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hadjiefthymiades</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : Ronto:
          <article-title>Relational to Ontology Schema Matching</article-title>
          .
          <source>In: AIS SIGSEMIS BULLETIN</source>
          . (
          <year>2006</year>
          )
          <volume>32</volume>
          {
          <fpage>34</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Discovering Simple Mappings Between Relational Database Schemas and Ontologies</article-title>
          . In: ISWC/ASWC. (
          <year>2007</year>
          )
          <volume>225</volume>
          {
          <fpage>238</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>An</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgida</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mylopoulos</surname>
          </string-name>
          , J.:
          <article-title>Inferring Complex Semantic Mappings Between Relational Tables and Ontologies from Simple Correspondences</article-title>
          .
          <source>In: OTM Conferences (2)</source>
          . (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>