Learning data-consistent mappings from a relational database to an ontology

Benjamin Habegger
Dipartimento di Informatica e Sistemistica, Università di Roma 1 "La Sapienza"
Via Salaria 113, 00198 Rome, Italy
Email: habegger@dis.uniroma1.it

Abstract. Determining how to map information between different representational models is an essential task of semantic integration. Up to now, most research in this field has concentrated on the problem of finding pairs of elements which could have similar semantics (i.e. finding matches). From the work we have seen, it is not clear that any guarantees can be given that the obtained matches respect the intended semantics of the target model. We also argue that matches are not sufficient to clearly define mappings which respect the intended interpretation of the target model. In this paper, we define mapping consistency with respect to an intended interpretation and derive a notion of data-consistency, a necessary condition for consistency. From this, we propose a relational learning algorithm to construct data-consistent mappings.

1 Introduction

Building mappings between schemas and/or ontologies is one of the central issues of semantic integration [1]. Hand-crafting mappings is a complex, tedious and error-prone task. Much research on how to (partially) automate semantic integration has been carried out in both the database [2, 3] and ontology [4] communities. Most of this work has concentrated on finding correspondences, also called matches, between schema/ontology elements using some form of heuristics.

There are two major drawbacks to existing approaches. Firstly, the approaches we have knowledge of do not rely on any formal notion of intended interpretation. Most implicitly rely on the quite arguable intuition that two separately conceived models of similar "worlds" (e.g. a computer science department) will contain similar concepts/roles with similar characteristics.
They can therefore give no guarantee on the correctness (with respect to the intended interpretation) of the obtained matchings. Secondly, having matches is far from sufficient to correctly map data between representations. Correct reasoning (and in particular querying) over integrated ontologies and/or databases requires mappings which respect the intended interpretation of the integration system. In the formal frameworks which have been proposed for both database and ontology integration [5, 1], mappings usually take the form of complex queries. The few systems which do allow building mappings [6] require extra knowledge (e.g. foreign keys) and only produce mapping suggestions with no guarantee of their correctness.

In this paper, we propose a framework defining mapping consistency with respect to an intended interpretation (i.e. the one given by an expert) and derive a notion of data-consistency which allows checking whether a mapping is compatible with a previously given set of positive and negative examples. Provided the examples are in accordance with the intended interpretation, data-consistency is a necessary condition for consistency. From this we adapt existing inductive logic programming techniques to build data-consistent mappings.

We consider here the problem of mapping a relational database into an ontology in a global-as-view setting. There are mainly two aspects which make mapping databases (rather than ontologies) a particular problem. First, databases usually come with very little (if any) intensional knowledge which can be exploited to build mappings. Second, the objects described by a relational database do not have an explicit representation as they do in ontologies. The mappings we consider consist of a set of Horn clauses where the head of each clause is a concept or role of the target ontology and the body is a conjunctive query over the relations of the source schema.
In the work presented here, we will consider the case where each concept or role of the target schema can be described by a unique rule.

The contributions of this paper are the following: (1) we introduce the notion of mapping consistency with respect to a given intended interpretation; (2) we show how a reduced form of consistency, data-consistency, can be tested in the presence of examples; (3) we propose a relational learning algorithm allowing to build data-consistent mappings.

This paper is organized as follows. Section 2 presents related work in semantic integration and positions our work. Section 3 presents how we handle instance-level mappings. Section 4 introduces our working example. Mapping consistency and data-consistency are presented in section 5. Section 6 proposes a framework which allows learning data-consistent mappings. Finally, in section 7 we conclude and discuss future work.

2 Background and related work

Building mappings from one model to another requires overcoming multiple difficulties, both at the conceptual and at the instance level. At the conceptual level, choices such as which information should be made explicit, how detailed the information should be, or how it should be organized may differ. For example, one model might explicitly define a CSProfessor concept, while in another model this information might be left implicit (as a professor who teaches a computer science course). At the instance level, different levels of detail may exist between attribute values. For example, one source may split into pieces an "address" field which another source represents as a whole. Another important aspect at the instance level is that sharing instances between sources requires a common naming. The major difficulty to overcome is that any instance (e.g. a particular person) should be represented by a unique object identifier in a target ontology, while it is represented as a particular tuple of a table in a database.
Integrating data from one source to another has two major requirements: (1) knowing how instances are represented (identified) in each of the models and how to map from one representation to the other, and (2) knowing how to translate relationships which hold between instances. In both cases, mappings may be used to represent the relationships. The problems of instance mapping and schema mapping respectively consist in resolving these two requirements.

Defining mappings between sources is a tedious and error-prone task. Researchers have therefore looked to (partially) automate semantic integration. Up to now most research, whether in the database [2] or the ontology [4] communities, has concentrated on methods which semi-automatically search for correspondences between schema elements (e.g. attributes in the relational case). A correspondence between a set of elements A of one schema and a set of elements B of another schema is called a match (noted A ≅ B). When a set of elements has more than one member, one must also define how to combine the values. A particular match might put into correspondence an attribute name of a relation PersonS of a source schema with the pair of attributes firstName and lastName of a MemberT relation of a target schema, using the concatenation function to join the first and last names.

Different approaches to finding matches have been proposed. Many systems (e.g. [7–10]) search for pairs of "similar" elements according to different heuristics. Typical heuristics are of the form "concepts with similar names are likely to be similar". The heuristics are usually formalized as similarity measures which may take into account different types of information, including for example the names and textual descriptions (e.g. comments in the schema) of the elements, and constraints on the values the elements can take (e.g. primary key, type, uniqueness). [11] gives a list of the most commonly used information.
Most approaches also consider the similarity of related elements. [7] base their similarity measures on the distance to elements known a priori to be the same. Other approaches, such as the LSD [12] and GLUE [13] systems, make use of machine learning over instance data to build matches. LSD learns classifiers based on previously hand-coded matches to predict how to match unseen schemas, and combines multiple classifiers, some of which rely on having instance data. GLUE learns to predict the similarity between two elements of two sources by estimating the joint probability that an instance belongs to both elements. Until recently, most systems were only able to suggest one-to-one matches between elements. More recent systems such as iMAP allow suggesting complex matches (e.g. price = rate * (1 + taxes)) [14]. The suggestions are obtained by multiple predefined searchers, such as text searchers looking for concatenations.

While matches can be helpful to integrate data, we argue that they are not sufficient to define mappings, since a set of matches can be interpreted in several ways and some information may be missing. For example, knowing that ProfessorT matches MemberS does not allow determining a mapping such as

ProfessorT(x) ← MemberS(x), Profession(x, "professor")

In the literature there have only been few attempts to (partially) automate mapping construction. An example is the Clio system [6], which suggests mappings given a set of matches. In particular, it searches for joins between relations by following foreign keys. Depending on the existing matches and the possible presence of constraints, the suggestions can be restrictive. This work does provide a notion of correctness with respect to the constraints of the source and target schemas, but not with respect to the intended interpretation. The approach proposed in this paper suggests learning from examples the constraints which must be satisfied by the data to correctly reflect the intended meaning the mappings should expose.
In this paper, we consider using machine learning to build the mappings of an integration system between a source database and a target ontology in a global-as-view perspective. Following [5], we are building an integration system I = ⟨G, S, M⟩ where G (resp. S) is the global ontology (resp. source schema) expressed in a language LG (resp. LS) over an alphabet ΣG (resp. ΣS), and M is the mapping between G and S. In our framework, M is a set of Horn clauses pG(x⃗) ← conjS(x⃗, y⃗), where pG ∈ ΣG and conjS(x⃗, y⃗) is a conjunctive query over ΣS. When learning, we will consider a particular interpretation domain ∆ and a given database instance DB of S over ∆, as well as a set of positive and a set of negative instantiation assertions (resp. KB+ and KB−) over ΣG and ∆.

As usual, the semantics of an ontology (and similarly of a database) O defined over an alphabet ΣO is given by an interpretation I = (∆, ·I), where ∆ is the domain of interpretation and ·I is an interpretation function which associates to each concept C (resp. each role R) in ΣO a subset of elements (resp. of couples) of ∆. The objects of the interpretation domain ∆ will have been obtained after an initial identification of unique object identifiers, as described in section 3. Our framework currently only considers the case where the source is a flat database, and therefore contains only positive assertions and no intensional knowledge. Also, we currently require learning independent mappings for each concept and role of the target ontology.

3 Mapping instances to objects

Oid                    | Fname  | Lname   | Telephone
people("Mary","Green") | "Mary" | "Green" | 0768
people("Mary","Green") | "Mary" | "Green" | 0679
people("John","White") | "John" | "White" | 0687

Fig. 1. An example relation and the object identifiers of each tuple

It is common to hypothesize that both the target and source databases are defined over a common fixed interpretation domain ∆. In the following we will proceed similarly.
However, this requires having a common object space, i.e. knowing how to map the representation of objects in the concrete source database (e.g. the tuple of the table members whose attribute id has the value 165) into a unique representation at the conceptual level of the target ontology (e.g. the person 'john'). In this section we describe how such identifiers can be obtained.

For any given database, we can consider that each row of a table contained in the database corresponds to an object. In the most favorable case, primary and foreign keys have been defined, making it possible to determine relationships between tables. This information can be used to define object instances and the relationships between them. In some cases, however, no primary keys have been given and connections between tables have not been made explicit, or the keys do not give the correct view of the data; we will therefore not rely on having such information. We will rather simply consider that the user provides, for each relation R, a set of attributes ident(R) allowing to construct for each instance an identifier which corresponds to the target view he wishes to have of the data. This identifier will be considered to refer to a unique object.

Consider a relation R of a given database DB and a set of identification attributes K = ident(R) for R. Any instance of R is a tuple x⃗ which can be uniquely identified in R by y⃗ = πK_R(x⃗), where πK_R(x⃗) represents the selection of the values in x⃗ corresponding to the attributes K in R. If we have a set of identification attributes for each relation, and associate to each relation a unique functor fR, any object described by the database can be assigned a unique identifier by applying this functor to the values of the identification attributes.
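As an illustration, the identifier construction above can be sketched in Python over the relation of figure 1 (the attribute names follow that figure; the helper names `project` and `make_identifier` are ours, not part of the paper's formalism):

```python
# The relation of figure 1: attribute names plus a set of tuples.
people_attrs = ("Fname", "Lname", "Telephone")
people = [
    ("Mary", "Green", "0768"),
    ("Mary", "Green", "0679"),
    ("John", "White", "0687"),
]

# ident(people) = {Fname, Lname}: the user-chosen identification attributes.
ident = ("Fname", "Lname")

def project(attrs, ident, row):
    """pi_K^R: select the values of the identification attributes K from a tuple."""
    return tuple(row[attrs.index(a)] for a in ident)

def make_identifier(relation_name, attrs, ident, row):
    """Apply the functor f_R to the projected values,
    yielding e.g. people('Mary', 'Green')."""
    values = project(attrs, ident, row)
    return f"{relation_name}({', '.join(repr(v) for v in values)})"

# The object domain Delta_o: one identifier per distinct projection.
delta_o = {make_identifier("people", people_attrs, ident, r) for r in people}
```

Note that the three tuples yield only two identifiers, since both "Mary Green" tuples project to the same values: the identifier is deliberately not a primary key.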
We can complete the domain of interpretation of the original database, ∆v (v for values), with a domain of interpretation for objects:

∆o = ⋃_{R ∈ ΣS} { fR(πK_R(x⃗)) | x⃗ ∈ R, K = ident(R) }

Figure 1 gives an example relation where the attributes Fname and Lname have been specified as the identification attributes. The relation also includes a specific Oid column whose values are the unique object identifiers of the tuples. As the figure shows, an identifier is not necessarily a primary key for the table. For example, in the figure both "Mary Green" tuples are considered to refer to the same person, who in this case would have two telephone numbers. In the remainder of this work we will consider that identifiers have been previously determined and added explicitly to the relations of the database, and that the domain of interpretation shared between the target ontology and the source database is the set ∆ = ∆o ∪ ∆v. For readability, we will use short names for each object. For example, we will use john as an object identifier instead of a functor notation such as people("John", "White").

4 Introductory example

Let us consider the following example situation. We have a university with different departments, one of which is the computer science department. The university wishes to make use of the data which is already available in existing department databases. However, it wants to access this data through its own predefined ontology. This requires building mappings between the department schemas and its own ontology. Our working example will consider building the mappings from the CS department schema to the university ontology.
Peopled                          Teamsd
Oid       | Ident | Name         Oid     | Id | Name               | Director
mary      | 1     | "Mary"       dbteam  | 1  | "Databases"        | 2
john      | 2     | "John"       tcteam  | 2  | "Theoretical CS"   | 4
charlotte | 3     | "Charlotte"  mlteam  | 3  | "Machine Learning" | 1
fred      | 4     | "Fred"       systeam | 4  | "Systems"          | 3

Teachesd             Coursesd
Teacher | Course     Oid     | Id | Name           | CSCourse
2       | 1          db      | 1  | "Databases"    | true
2       | 3          ilp     | 2  | "ILP"          | true
1       | 2          stats   | 3  | "Statistics"   | false
5       | 4          algo    | 4  | "Algorithmics" | true
4       | 5          algebra | 5  | "Algebra"      | false

(a) Computer science database

Professoru+ | Professoru− | CSProfessoru+ | CSProfessoru−
john        | jules       | john          | jules
mary        | charlotte   | mary          | charlotte
fred        |             |               | fred

(b) Target concepts

Professoru(X) ← Peopled(X, Y, _), Teamsd(_, _, _, Y), Teachesd(Y, _).
CSProfessoru(X) ← Peopled(X, Y, _), Teamsd(_, _, _, Y), Teachesd(Y, Z), Coursesd(_, Z, _, "true").

(c) Target mappings

Fig. 2. An example mapping learning problem

The CS department schema contains four relations: Peopled, which describes the members of the department; Teamsd, which describes the teams of the department and who directs them; Teachesd, which relates members to the courses they teach; and Coursesd, which contains the set of courses members of the department participate in. In this department, all professors both direct a team and teach a (computer science) course. This information is implicit and has not been modeled directly. Figure 2(a) gives an example partial database using the schema of the CS department. The relations carry a d index to denote that they belong to the department's schema.

We will consider two scenarios. In the first, the university ontology contains (among others) a Professoru concept and, in the second, it contains a more specific CSProfessoru concept. In the first (resp. second) scenario, the intended interpretation of Professoru (resp. CSProfessoru) is any member of the computer science department who teaches any course (resp. a computer science course) and directs a team.
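To make the example concrete, the two target mappings can be evaluated over the example database as conjunctive queries. A minimal sketch (relations encoded as Python sets of tuples, following the column order of the tables above):

```python
# Figure 2(a), encoded as sets of tuples.
people  = {("mary", 1, "Mary"), ("john", 2, "John"),
           ("charlotte", 3, "Charlotte"), ("fred", 4, "Fred")}
teams   = {("dbteam", 1, "Databases", 2), ("tcteam", 2, "Theoretical CS", 4),
           ("mlteam", 3, "Machine Learning", 1), ("systeam", 4, "Systems", 3)}
teaches = {(2, 1), (2, 3), (1, 2), (5, 4), (4, 5)}
courses = {("db", 1, "Databases", True), ("ilp", 2, "ILP", True),
           ("stats", 3, "Statistics", False), ("algo", 4, "Algorithmics", True),
           ("algebra", 5, "Algebra", False)}

# Professoru(X) <- Peopled(X, Y, _), Teamsd(_, _, _, Y), Teachesd(Y, _).
directors = {t[3] for t in teams}
teachers  = {t[0] for t in teaches}
professor = {x for (x, y, _) in people if y in directors and y in teachers}

# CSProfessoru(X) <- Peopled(X, Y, _), Teamsd(_, _, _, Y),
#                    Teachesd(Y, Z), Coursesd(_, Z, _, "true").
cs_ids = {c[1] for c in courses if c[3]}
cs_professor = {x for (x, y, _) in people
                if y in directors
                and any(z in cs_ids for (t, z) in teaches if t == y)}
```

Evaluating the first rule yields john, mary and fred (all teach and direct a team); the second excludes fred, who only teaches a non-CS course.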
Figure 2(b) gives the set of instances, among those appearing in the partial database of figure 2(a), which should (resp. should not) belong to each of the two target concepts, and figure 2(c) gives the correct target mappings for each of them. In the approach we describe below, these mappings can be learned by generalizing the descriptions of a set of example instances (not necessarily all of them) which we know to belong (or not) to the target concept or role.

5 Intended interpretation

In the context of this paper we only consider the alphabet Σ of symbols associated to the elements of an ontology or schema. In the following, the word schema may therefore often refer to its associated alphabet. During the conception of a schema or ontology, the conceiver has in mind a specific semantics for the schema elements: their "intended" semantics. In some way, this means that if the conceiver were presented with any "world" to be modeled by his schema, he would be capable of giving the "intended" semantics of his schema for this world (i.e. of constructing the (extensional) database for which this "world" is a model). Therefore the conceiver, or any person or entity having a complete understanding of the schema, can be modeled as an "expert oracle".

Definition 1 (Expert oracle). An expert oracle for a given alphabet Σ can be modeled as a function OΣ which returns, for any domain of interpretation ∆, an interpretation function ·I.

In the previous definition, the oracle is defined as a function, and thus each alphabet yields a unique interpretation for each world. This simply means that there is no ambiguity in the idea the conceiver has of the predicates appearing in Σ. Once we have an oracle OΣ for our schema, the intended interpretation of a given world is the one returned by the oracle.

Definition 2 (Intended interpretation). Given a schema Σ, an expert oracle OΣ and a world ∆, the intended interpretation of Σ over ∆ according to OΣ is I = (∆, ·I = OΣ(∆)).
As previously stated, we will consider working over a common domain of interpretation ∆. Given a schema Σ, what remains to be determined by the intended interpretation is the set of tuples belonging to each relation in Σ. In the following we will denote this set by ∆I, given the intended interpretation I.

From a general point of view, a mapping from a source schema ΣS to a target schema ΣT can be seen as a function m which transforms any database over ΣS into a database over ΣT. To simplify the discourse, we will consider a database to be a set of ground atoms, which is the case when we have no intensional knowledge. A mapping m is consistent when the intended interpretation of ΣT (over ∆) contains the database obtained by applying m to the intended interpretation of ΣS (over ∆). m is complete when its application to the intended interpretation of ΣS contains the intended interpretation of ΣT. Whenever it is both consistent and complete, we say that m is a perfect mapping.

Definition 3 (Mapping consistency and completeness). Let m be a mapping from a source schema ΣS to a target schema ΣT, and let IS and IT respectively be the intended interpretations of ΣS and ΣT over ∆.
– m is consistent iff m(∆IS) ⊆ ∆IT
– m is complete iff ∆IT ⊆ m(∆IS)
– m is perfect iff it is both complete and consistent (i.e. ∆IT = m(∆IS))

When working with real databases, we only have access to a subset of ∆. Furthermore, in many cases the effective database we work with is incomplete, in the sense that some relations we know to hold among objects of ∆ might not appear in the database. In some cases it might even be inconsistent with the intended interpretation, in that some relations which appear in the effective database do not hold in the intended interpretation. In the following work on learning mappings, we will consider that we may have incomplete data, but that it is consistent with the intended interpretation.
If Σ denotes a schema, DB denotes the effective database over Σ, ∆e ⊆ ∆ denotes the subset of objects we have a description for in DB, and I denotes the intended interpretation of Σ over ∆, then we have DB ⊆ ∆eI ⊆ ∆I.

When considering mapping a source schema S into a target ontology T, we will consider situations where we have an effective database DBS over an effective subset of objects ∆S ⊆ ∆. We will also consider two effective sets of instantiation assertions, KB+T and KB−T, over an effective subset of objects ∆T ⊆ ∆ (usually ∆T ⊆ ∆S will also hold). KB+T (resp. KB−T) is a subset of the assertions we know to hold (resp. not to hold) among the objects of ∆T in ∆I. In this context, and given a mapping m, we can define a notion of consistency with respect to the effective data we have, which we will call data-consistency. Of course, data-consistency is a necessary condition for consistency.

Definition 4 (Data-consistency). Given the effective databases DBS, KB+T and (optionally) KB−T, a mapping m is data-consistent with respect to DBS, KB+T and KB−T iff KB+T ⊆ m(DBS) and KB−T ∩ m(DBS) = ∅.

6 Learning mapping rules

We now propose an approach to learning mappings which are data-consistent. As an inductive hypothesis, we will consider that having seen sufficient data allows us to assume that the mapping holds for the remaining data. Therefore, to learn a mapping between a source schema ΣS and a target schema ΣT, we will consider having a database DBS over ΣS and two sets of, respectively, positive and negative instantiation assertions KB+T and KB−T.

In our framework, we consider mappings which are defined as a set of Horn clauses where the predicate of the head is in the target schema and the predicates of the atoms of the body are in the source schema. A mapping only exists between the source and target languages if the target concept can be described precisely using the source predicates. When building mappings, we will consider this to be true.
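The data-consistency test of definition 4 reduces to two set checks, which can be used to validate any candidate mapping during learning. A minimal sketch (a mapping is represented here simply as a function from a set of source facts to a set of derived target facts; all relation and instance names are illustrative):

```python
def is_data_consistent(m, db_s, kb_pos, kb_neg):
    """m is data-consistent iff KB+ is a subset of m(DB_S)
    and KB- is disjoint from m(DB_S)."""
    derived = m(db_s)
    return kb_pos <= derived and not (kb_neg & derived)

# Toy source database: Member and Profession facts as tuples.
db = {("member", "john"), ("member", "jules"),
      ("profession", "john", "professor")}

# A candidate mapping: Professor(x) <- Member(x), Profession(x, "professor").
def mapping(db):
    members = {t[1] for t in db if t[0] == "member"}
    return {("Professor", x) for x in members
            if ("profession", x, "professor") in db}

pos = {("Professor", "john")}
neg = {("Professor", "jules")}
consistent = is_data_consistent(mapping, db, pos, neg)
```

Here the candidate mapping derives the professor fact for john but not for jules, so it is data-consistent with the given examples.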
When learning a target concept C (resp. a role R), we will consider the restriction of KB+T (and KB−T) to the subset of atoms which belong to C (resp. R). DBS will however not be restricted in any manner.

6.1 Instance descriptions

When building a mapping for a target concept C (and similarly for a role R), we are trying to build a mapping rule which, with the given database DBS, entails at least the positive examples and none of the negative examples (i.e. which ensures data-consistency). If we consider a particular concept Professoru, we are looking for a description of the instances of the concept Professoru using the language constructs of the schema ΣS. In some sense, we hypothesize that the whole database describes the fact that each particular instance belongs to the target concept. For each instance of the target concept, we could build a clause C(o1) ← DB(o1), where DB(o1) denotes the database with a special focus on the instance o1. The most specific clause which is consistent with a set of positive instances C+ = {o1, ..., on} would therefore be the least general generalization of the clauses C(oi) ← DB(oi). However, such an approach will likely require many positive examples in order to generalize over the common content of DB, and the result may still contain sub-clauses which are specific to the database itself and not to the examples.

desc1DB(john) = Peopled(john, 2, "John").
desc1DB(2) = Peopled(john, 2, "John"), Teachesd(2, 1), Teachesd(2, 3), Teamsd(tcteam, 2, "Theoretical CS", 4), Teamsd(dbteam, 1, "Databases", 2).
desc2DB(john) = desc1DB(john), desc1DB(2), desc1DB("John").

Fig. 3. Descriptions of level 1 and 2 of the instance john

Given a particular instance of interest, some facts (atoms of the database) have a greater importance, depending on their "proximity" to the target instance. Given an instance x, we will denote by desc1DB(x) the set of atoms of DB in which x appears as one of the arguments.
For example, the first equation of figure 3 gives the description of john when the database DB is the one of figure 2(a).

In order to take into account more distant relationships which could be required to define the target concepts, the description of an instance/value can be extended with the descriptions of the instances/values of the atoms to which it is connected. For example, john is connected to the values 2 and "John" through the Peopled relation. The description of john can thus be extended as shown in figure 3. It should be noted that, in the presence of foreign keys, we may restrict description extensions to only follow foreign keys.

Definition 5 (Description of an instance). Given a database DB and an instance x, the descriptions of level i are defined recursively as follows:

desc1DB(x) = {A(..., x, ...) ∈ DB}
desciDB(x) = ⋃ { desci−1DB(y) | y ∈ instances(desci−1DB(x)) }

The complete description of an instance x is the fix point obtained by gradually incrementing the level: desc*DB(x) = descnDB(x), where n is the smallest integer such that descnDB(x) = descn+1DB(x). It should be noted that, without any intensional knowledge, a database is always a finite set of atoms. In this case, the complete description of any instance always exists and is finite (at worst it is the whole database itself). Also, given a database DB whose underlying relational structure has only one connected component, for any instance x we have desc*DB(x) = DB. Otherwise, desc*DB(x) is the set of atoms which form the connected component to which x belongs.

6.2 Learning mappings from instance descriptions

In this section we present our mapping learning approach, based on the generalization of instance descriptions. The general idea behind our approach is that the description of an instance can be seen as a query which will at least return the instance itself as a result.
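The level-wise descriptions of definition 5 can be sketched as follows, over a database encoded as a set of ground atoms (predicate name followed by its arguments). The fix point is computed here by repeated one-step expansion, which reaches the same connected component as the definition; the helper names are ours:

```python
def desc1(db, x):
    """Level-1 description: all atoms of DB in which x occurs as an argument."""
    return {atom for atom in db if x in atom[1:]}

def instances(atoms):
    """All instances/values appearing as arguments in a set of atoms."""
    return {a for atom in atoms for a in atom[1:]}

def desc_star(db, x):
    """Complete description desc*: expand the description one level at a
    time until a fix point is reached (at worst, the whole component)."""
    current = desc1(db, x)
    while True:
        nxt = set().union(*(desc1(db, y) for y in instances(current)))
        if nxt == current:
            return current
        current = nxt

# The fragment of figure 2(a) used in figure 3.
db = {
    ("Peopled", "john", 2, "John"),
    ("Teachesd", 2, 1),
    ("Teachesd", 2, 3),
    ("Teamsd", "tcteam", 2, "Theoretical CS", 4),
    ("Teamsd", "dbteam", 1, "Databases", 2),
}
```

On this fragment, `desc1(db, "john")` contains only the Peopled atom, and the fix point is the whole fragment, since every atom is connected to john through the value 2.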
Generalizing the complete descriptions of the set of instances we know to belong to the target concept will generate a query which returns the initial instances and a subset of the target concept, as long as the mapping can be described as a conjunctive query over the source database. However, generalizing the complete descriptions will likely both be infeasible and generate over-specific mappings. The approach we propose is to initially learn mappings using only a local context (a low description depth), which is gradually augmented if required. The intuition is that what is likely to characterize a concept are the features which are most directly related to the instance.

The generalization operator we will be using is the least general generalization (lgg) defined by Plotkin [15], which is based on the notion of θ-subsumption.

Definition 6 (θ-subsumption). A clause C θ-subsumes a clause C′ (noted C ⪯ C′) iff there exists a substitution θ such that Cθ ⊆ C′.

Plotkin's least general generalization, given two clauses C and C′, produces the most specific (with respect to θ-subsumption) clause lgg(C, C′) which generalizes both C and C′.

Definition 7 (Least general generalization). The least general generalization lgg(C1, C2) of two clauses C1 and C2 is a clause C such that: (1) C ⪯ C1, (2) C ⪯ C2, and (3) no other clause C′ such that C ≺ C′ verifies (1) and (2).

Plotkin has shown that the least general generalization of two clauses is unique and can be calculated in polynomial time. However, the obtained clause is not necessarily reduced. Reducing a clause requires testing θ-subsumption between clauses, which is known to be NP-complete. Recent work has shown that efficient implementations [16] remain tractable in many practical cases. Furthermore, most mappings will likely not require high description depths, and therefore the clauses to generalize will likely remain small.
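For flat, function-free atoms, Plotkin's lgg can be sketched by anti-unification: equal terms are kept, and each distinct pair of terms is consistently replaced by one fresh variable. This is a simplified sketch of the operator (it does not handle function symbols or reduce the result, matching the remark above that lgg output is not necessarily reduced):

```python
def lgg_terms(t1, t2, varmap):
    """Anti-unify two terms: equal terms stay; each distinct pair gets
    one fresh variable, reused consistently across the clause."""
    if t1 == t2:
        return t1
    if (t1, t2) not in varmap:
        varmap[(t1, t2)] = f"X{len(varmap)}"
    return varmap[(t1, t2)]

def lgg_atoms(a1, a2, varmap):
    """lgg of two atoms with the same predicate and arity, else None."""
    if a1[0] != a2[0] or len(a1) != len(a2):
        return None
    return (a1[0],) + tuple(lgg_terms(x, y, varmap)
                            for x, y in zip(a1[1:], a2[1:]))

def lgg_clauses(c1, c2):
    """lgg of two clauses (head, body): anti-unify the heads and all
    compatible pairs of body atoms, sharing one variable map."""
    varmap = {}
    head = lgg_atoms(c1[0], c2[0], varmap)
    body = {g for a in c1[1] for b in c2[1]
            if (g := lgg_atoms(a, b, varmap)) is not None}
    return head, body

# Generalizing two ground example clauses for a Professor concept.
c1 = (("Professor", "john"), {("Peopled", "john", 2)})
c2 = (("Professor", "mary"), {("Peopled", "mary", 1)})
head, body = lgg_clauses(c1, c2)
```

Here john/mary are anti-unified to one variable shared between head and body, and 2/1 to another, yielding Professor(X0) ← Peopled(X0, X1).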
Algorithm 1 learn(DB, E+, E−)
Input: a database DB, positive examples E+ and negative examples E−
Output: a mapping M which covers all of E+ and none of E−

d ← 0; M ← ∅
while M is not consistent with the examples do
  {Expand the description depth and restart learning}
  d ← d + 1; M ← ∅
  while M is consistent with E− and there is a T(v⃗) ∈ E+ not yet covered by M do
    C ← descriptiond(v⃗)
    M ← lgg(T(v⃗) ← C, M)
  end while
end while
return M

Algorithm 1 describes how a mapping is learned given a set of positive and negative examples. The lgg operator is the one given in definition 7. The description function computes the description of an instance as described previously.

The mapping generation process can easily be made interactive. The user starts by giving some positive examples, from which the system builds a mapping and applies it to the source database. This results in a set of tuples which the mapping would transform into instances of the target concept. From these results, the user can either remove instances which have wrongly been added (generating negative examples) and/or add new instances as positive examples.

7 Conclusion and future work

In this paper we have proposed an approach to learning mappings between a database containing the effective data and an ontology. An important distinction with related work is that our approach builds mappings which completely define how to transform the representation of instances from one model to another. Currently, most existing efforts to automate semantic integration have concentrated on building matches, which only relate schema elements together. Furthermore, we have proposed a notion of data-consistency which provides a minimal guarantee that the semantics of the obtained mappings are correct.

The current work is still in its early stages. We have started an implementation of the ideas developed in this paper and plan to complete it in the near future.
We also plan to test the approach on real-world mapping construction problems. If the tests are successful, we plan to extend the framework to handle multiple sources. The current approach does not yet take into account existing matches which could be made available by existing semantic integration approaches. We have started studying how these could be used to generate initial mappings which could then be corrected with respect to data-consistency.

References

1. Lenzerini, M.: Data integration: A theoretical perspective. In Popa, L., ed.: PODS, ACM (2002) 233–246
2. Doan, A., Halevy, A.Y.: Semantic integration research in the database community: A brief survey. AI Magazine (2005)
3. Rahm, E., Bernstein, P.A.: A survey of approaches to automatic schema matching. VLDB J. 10 (2001) 334–350
4. Noy, N.F.: Semantic integration: A survey of ontology-based approaches. SIGMOD Record 33 (2004) 65–70
5. Calvanese, D., De Giacomo, G., Lenzerini, M.: Ontology of integration and integration of ontologies. In Goble, C.A., McGuinness, D.L., Möller, R., Patel-Schneider, P.F., eds.: Description Logics. Volume 49 of CEUR Workshop Proceedings, CEUR-WS.org (2001)
6. Popa, L., Velegrakis, Y., Miller, R.J., Hernández, M.A., Fagin, R.: Translating web data. In: VLDB (2002) 598–609
7. Noy, N.F., Musen, M.A.: The PROMPT suite: Interactive tools for ontology merging and mapping. International Journal of Human-Computer Studies (2003)
8. Euzenat, J., Valtchev, P.: Similarity-based ontology alignment in OWL-Lite. In de Mántaras, R.L., Saitta, L., eds.: ECAI, IOS Press (2004) 333–337
9. Palopoli, L., Terracina, G., Ursino, D.: DIKE: A system supporting the semi-automatic construction of cooperative information systems from heterogeneous databases. Softw., Pract. Exper. 33 (2003) 847–884
10. Castano, S., Antonellis, V.D., Melchiori, M.: Artemis: A process modeling and analysis tool environment. In Ling, T.W., Ram, S., Lee, M.L., eds.: ER.
Volume 1507 of Lecture Notes in Computer Science, Springer (1998) 168–182
11. Ehrig, M., Sure, Y.: Ontology mapping – an integrated approach. In Bussler, C., Davies, J., Fensel, D., Studer, R., eds.: ESWS. Volume 3053 of Lecture Notes in Computer Science, Springer (2004) 76–91
12. Doan, A., Domingos, P., Halevy, A.Y.: Reconciling schemas of disparate data sources: A machine-learning approach. In: SIGMOD Conference (2001)
13. Doan, A., Madhavan, J., Domingos, P., Halevy, A.Y.: Learning to map between ontologies on the semantic web. In: WWW (2002) 662–673
14. Dhamankar, R., Lee, Y., Doan, A., Halevy, A.Y., Domingos, P.: iMAP: Discovering complex mappings between database schemas. In Weikum, G., König, A.C., Deßloch, S., eds.: SIGMOD Conference, ACM (2004) 383–394
15. Plotkin, G.D.: A note on inductive generalization. Machine Intelligence 5 (1970)
16. Maloberti, J., Sebag, M.: Fast theta-subsumption with constraint satisfaction algorithms. Machine Learning 55 (2004) 137–174