<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Knowledge Base Exchange</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marcelo Arenas</string-name>
          <email>marenas@ing.puc.cl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Botoeva</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Diego Calvanese</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Computer Science</institution>
          ,
          <addr-line>PUC</addr-line>
          <country country="CL">Chile</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>KRDB Research Centre, Free Univ. of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we study the problem of exchanging knowledge between two knowledge bases (KBs) connected through mappings, with a special interest in exchanging implicit knowledge, not only data like in the traditional database exchange setting. As representation formalism we use Description Logics (DL) that exhibit a reasonable tradeoff between expressive power and complexity of reasoning. Thus, we assume that the source and target KBs are given as a DL TBox+ABox, while the mappings have the form of DL TBox assertions. We study the problem of translating the knowledge in the source KB according to these mappings. We define a general framework of KB exchange, and specify the problems of computing and representing different kinds of solutions, i.e., target KBs with specified properties, given a source KB and a mapping. We then develop first results and techniques and study the complexity of KB exchange for the case of DL-LiteRDFS, a DL that corresponds to the FOL fragment of RDFS.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        1 Introduction
In data exchange, data structured under one schema (called source schema) must be
restructured and translated into an instance of a different schema (called target schema),
and the way in which this restructuring should occur is specified by means of a mapping
from the source schema to the target schema [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Such a problem has been studied
extensively in recent years, under various choices for the languages used to specify the
source and target schema, and the mappings [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. While incomplete information in this
setting is introduced by the mapping layer (see also [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), one fundamental assumption
in the works on data exchange is that the source is a (completely specified) database.
      </p>
      <p>
        In this paper, we go beyond this setting, and consider data exchange in the case
where implicit knowledge is present in the source, by which new data may be inferred.
We follow the line of the work in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where a general framework for data exchange
is proposed, in which the source data may be incompletely specified, and thus
(possibly infinitely) many source instances are implicitly represented. The framework in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
in based on the general notion of representation system, as a mechanism to represent
multiple instances of a data schema, and considers the problem of incomplete data
exchanges under mappings constituted by a set of tuple generating dependencies (tgds).
      </p>
      <p>We refine that framework to the case where as a representation system we use
Description Logics (DL) knowledge bases (KBs) constituted by a TBox and an ABox,
and where mappings are sets of DL inclusions. While in the traditional data exchange
setting, given a source instance and a mapping specification, (universal) solutions are
target instances derived from the source instance and the mapping, in our case solutions
are target DL KBs, derived from the source KB and the mapping. Besides a notion of
(universal) solution based on the correspondence between models of source and
target KBs, we introduce also the weaker notion of (universal) CQ-solution, based on the
correspondence between answers to conjunctive queries over source and target KBs.</p>
      <p>In our setting where we exchange DL KBs, in order to minimize the exchange (and
hence transfer and materialization) of explicit (i.e., ABox) information, we are
interested in computing universal (CQ-)solutions that contain as much implicit knowledge
as possible. This leads us to define a new problem, called representability, whose goal
is to compute from a source TBox and a mapping, a target TBox that leads to a
universal (CQ-)solution when it is combined with a suitable ABox computed from the source
ABox, independently of the actual source ABox.</p>
      <p>
        We then develop first results and techniques for KB exchange and for the
representability problem in the case of KBs expressed in DL-LiteRDFS, a DL that corresponds
to the FOL fragment of RDFS. DL-LiteRDFS is a fragment of DL-LiteR [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that does not
allow for existential quantification (i.e., concepts of the form 9R) in the right-hand side
of concept inclusions, nor for disjointness assertions.
      </p>
      <p>The paper is organized as follows. In Section 2 we give preliminary notions on DLs
and conjunctive queries (CQs). In Section 3 we define our framework of KB exchange.
In Section 4 we present the techniques for deciding the defined reasoning tasks. Finally,
in Section 5 we draw some conclusions and outline issues for future work.
2</p>
      <p>Preliminaries
We introduce here the necessary notions about the description logic (DL) that we use in
this article, and about conjunctive queries, which we adopt as our query formalism.
2.1</p>
      <p>
        DL-LiteR Knowledge Bases
The DLs of the DL-Lite family [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] of light-weight description logics are characterized
by the fact that reasoning can be done in polynomial time, and that data complexity of
reasoning and conjunctive query answering is in AC0. We adopt here DL-LiteR, and
present now its syntax and semantics.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Let NC , NR, Na be sets of concept, role, and individual names, respectively, and</title>
      <p>assume that A 2 NC and P 2 NR. In DL-LiteR, B and C are used to denote basic
and complex concept descriptions, respectively, and R and Q are used to denote basic
and complex roles, respectively. These concept and roles constructs are defined by the
following grammar:</p>
      <p>R ::= P j P
Q ::= R j :R</p>
      <p>B ::= A j 9R
C ::= B j :B</p>
    </sec>
    <sec id="sec-3">
      <title>In the following, for a basic role R, we use R when R = P . to denote P when R = P , and P</title>
      <sec id="sec-3-1">
        <title>A DL-LiteR TBox is a finite set of concept inclusions B v C and role inclusions</title>
      </sec>
      <sec id="sec-3-2">
        <title>R v Q. A DL-LiteR ABox is a finite set of membership assertions of the form A(a) and</title>
      </sec>
      <sec id="sec-3-3">
        <title>P (a; b), where a; b 2 Na. A DL-LiteR KB K is a pair hT ; Ai, where T is a DL-LiteR</title>
      </sec>
      <sec id="sec-3-4">
        <title>TBox and A is a DL-LiteR ABox. As usual, TBoxes represent implicit knowledge,</title>
        <p>while ABoxes represent the data.</p>
      </sec>
      <sec id="sec-3-5">
        <title>In the following, we will also make use of a restricted form of DL-LiteR TBoxes.</title>
      </sec>
      <sec id="sec-3-6">
        <title>A DL-LiteR TBox is said to be definite if there are only atomic concepts and atomic</title>
        <p>
          roles on the right-hand side of its inclusions. In other words, a definite TBox may not
mention in its right-hand side a concept of the form 9R, and may not contain concept
or role disjointness assertions. We call DL-LiteRDFS the fragment of DL-LiteR obtained
by considering only definite DL-LiteR TBoxes. Intuitively, DL-LiteRDFS corresponds to
the fragment of RDFS [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] that is embeddable in FOL (and hence in DLs).
        </p>
      </sec>
      <sec id="sec-3-7">
        <title>The semantics of DL-LiteR is defined in the standard way. We just remark that</title>
        <p>we use MOD(K) to denote the set of all models of KB K. From now on, we assume
that interpretations satisfy the standard name assumption, that is, we assume given a
fixed infinite set U of individual names, and we assume that for every interpretation
I = h I ; I i, it holds that I U and aI = a for every a 2 I . Notice that this
implies the Unique Name Assumption (UNA), i.e., different individuals are interpreted
as different domain elements.</p>
        <p>A signature is a set of concept and role names. An interpretation I = h I ; I i is
said to be an interpretation of if it is defined exactly on the concept and role names in
. Given a KB K, the signature (K) of K is the alphabet of concept and role names
occurring in K, and K is said to be defined over (or simply, over) a signature if
(K) (and likewise for a TBox T , an ABox A, a concept inclusions B v C, a
role inclusions R v Q, and membership assertions A(a) and R(a; b)).
2.2</p>
        <p>Conjunctive Queries and Certain Answers
A conjunctive query (CQ) over a signature is a first-order formula of the form
q(x) = 9y:conj (x; y), where x, y are tuples of variables and conj (x; y) is a
conjunction of atoms of the form: (1) A(t), with A a concept name in and t either a
constant from U or a variable from x or y, or (2) P (t1; t2), with P a role name in
and ti (i = 1; 2) either a constant from U or a variable from x or y. In a
conjunctive query q(x) = 9y:conj (x; y), x is the tuple of free variables of q(x). A union of
conjunctive queries (UCQ) is a formula of the form: q(x) = Wn
i=1 9yi:conj i(x; yi);
where each conji(x; yi) is as before. A query q (either a CQ or a UCQ) is said to be a
query over a KB K if q is a query over a signature and (K).</p>
        <p>Let q be a CQ 9y1 9y`:conj (x1; : : : ; xk; y1; : : : ; y`) over a signature and
I = h I ; I i an interpretation of . Then the answer of q over I, denoted by qI ,
is defined as the set of tuples (a1; : : : ; ak) of elements from I for which there
exist a tuple (b1; : : : ; b`) of elements from I such that I satisfies every conjunct in
conj (a1; : : : ; ak; b1; : : : ; b`). Moreover, given a UCQ q = Win=1 qi, the answer of q
over an interpretation I, denoted by qI , is defined as Sin=1 qiI . Finally, given a query q
(either a CQ or a UCQ) over a KB K, the answer to q over K, denoted by cert (q; K),
is defined as cert (q; K) = TI2MOD(K) qI . Each tuple in cert (q; K) is called a certain
answer for q over K.</p>
      </sec>
      <sec id="sec-3-8">
        <title>Certain answers in DL-LiteR can be characterized through the notion of chase. We</title>
        <p>
          call a chase a (possibly infinite) set of assertions of the form A(t), P (t; t0), where t, t0
are either individuals, or labeled nulls interpreted as not necessarily distinct domain
elements. For DL-LiteR KBs, we employ the notion of oblivious chase as defined in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-9">
        <title>For such a KB hT ; Ai, the chase of A w.r.t. T , denoted chaseT (A), is a chase obtained from A by adding facts implied by inclusions in T , and introducing labeled nulls whenever required by an inclusion with 9R in the right-hand side (see [4] for details).</title>
        <p>3</p>
        <p>
          Exchanging Knowledge Bases
In this section, we introduce the knowledge exchange framework used in the paper.
The starting point to define this framework is the notion of mapping, which has been
shown to be of fundamental importance in the context of data exchange [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. Formally,
a DL-LiteR-mapping (or just mapping) is a tuple M = ( 1; 2; T12), where 1, 2
are disjoint signatures and T12 is a DL-LiteR TBox whose inclusions are of the form:
(1) C1 v C2, where C1, C2 are complex concepts over 1 and 2, respectively, and
(2) Q1 v Q2, where Q1 and Q2 are complex roles over 1 and 2, respectively.
        </p>
        <p>Let M = ( 1; 2; T12) be a mapping. Intuitively, mapping M specifies how a
knowledge base over the vocabulary 1 should be translated into a knowledge base
over the vocabulary 2. This intuition is formalized in terms of the notion of solution,
which is defined as follows. Given an interpretation I1 of 1 and an interpretation
I2 of 2, pair (I1; I2) satisfies TBox T12, denoted by (I1; I2) j= T12, if for each
concept inclusion C1 v C2 2 T12, it holds that C1I1 C2I2 , and for each role inclusion</p>
      </sec>
      <sec id="sec-3-10">
        <title>Q1 v Q2 2 T12, it holds that Q1I1 Q2I2 . Moreover, given an interpretation I of 1,</title>
        <p>SATM(I) is defined as the set of interpretations J of 2 such that (I; J ) j= T12, and
given a set X of interpretations of 1, SATM(X ) is defined as:</p>
        <p>
          SATM(X ) = SI2X SATM(I):
Then the notion of solution under a mapping is defined as follows, by considering this
notion of satisfaction and the knowledge exchange framework proposed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Definition 1. Let M = ( 1; 2; T12) be a mapping, K1 a KB over
over 2. Then K2 is said to be a solution for K1 under M if:
1, and K2 a KB
MOD(K2)
        </p>
      </sec>
      <sec id="sec-3-11">
        <title>SATM(MOD(K1)):</title>
      </sec>
      <sec id="sec-3-12">
        <title>That is, K2 is a solution for K1 under M if for every model I2 of K2, there exists a</title>
        <p>model I1 of K1 such that (I1; I2) j= T12.</p>
        <p>
          Let M = ( 1; 2; T12). A KB K1 over 1 can have an infinite number of solutions
under M. Thus, it is natural to ask what is a good solution for this knowledge base. Next
we introduce the notion of universal solution, which is a simple extension of the concept
of solution introduced in Definition 1, and is based on the notion of universal solution
introduced in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Definition 2. Let M = ( 1; 2; T12) be a mapping, K1 a KB over
over 2. Then K2 is said to be a universal solution for K1 under M if:
1, and K2 a KB
MOD(K2) = SATM(MOD(K1)):</p>
      </sec>
      <sec id="sec-3-13">
        <title>In the preceding definition, KB K2 is considered to be a good solution for KB K1 under</title>
        <p>mapping M as the models of K2 exactly correspond to the valid translations of the
models of K1 according to M. We illustrate Definitions 1 and 2 in an example.
Example 1. Let M = ( 1; 2; T12), where 1 = fA1; B1g, 2 = fA2; B2g, and
T12 = fA1 v A2; B1 v B2g. Furthermore, assume that K1 = hT1; A1i, where
T1 = fB1 v A1g and A1 = fB1(a)g. Then the following knowledge bases over 2
are solutions for K1 under M:</p>
        <p>K2 = hT2; A2i;
K20 = hT20; A02i;
where
where</p>
        <p>T2 = ;;
T20 = fB2 v A2g;</p>
        <p>A2 = fB2(a); A2(a)g
A02 = fB2(a)g</p>
      </sec>
      <sec id="sec-3-14">
        <title>Moreover, K2 is a universal solution for K1 under M, while K20 is not. In fact, if I2 is an interpretation of 2 such that: I2 = fa; bg, A2I2 = fag, and B2I2 = fa; bg, then we have that I2 62 MOD(K20) since I2 does not satisfy inclusion B2 v A2, but I2 2 SATM(MOD(K1)) since (I1; I2) j= T12 for I1 2 MOD(K1) defined as I1 = fag,</title>
        <p>AI1 = fag and BI1 = f g
1 1 a .</p>
      </sec>
      <sec id="sec-3-15">
        <title>In the previous example, K20 is not a universal solution for K1 under M since inclusion</title>
      </sec>
      <sec id="sec-3-16">
        <title>B2 v A2 cannot be deduced from the information in K1 and M. Or, more formally,</title>
      </sec>
      <sec id="sec-3-17">
        <title>K20 is not a universal solution as B2 v A2 is not implied by hT1 [ T12; A1i. However,</title>
      </sec>
      <sec id="sec-3-18">
        <title>K20 can also be considered as a solution of K1 that is desirable to materialize, as the</title>
        <p>implicit knowledge in K2 (i.e., TBox T2) represents the implicit knowledge in K1 (i.e.,</p>
      </sec>
      <sec id="sec-3-19">
        <title>TBox T1), given the way that concepts A1 and B1 have to be translated according to</title>
        <p>mapping M. In fact, solution K20 is as good as solution K2 in terms of the information
that can be extracted from them by using some specific queries, but with the advantage
that K20 represents knowledge in a more compact way. In what follows, we introduce a
new class of good solutions that captures this intuition with respect to the widely used
fragment of CQs.</p>
        <p>Definition 3. Let M = ( 1; 2; T12) be a mapping, K1 = hT1; A1i a KB over 1,
and K2 a KB over 2. Then K2 is said to be a CQ-solution for K1 under M if for every
CQ q over 2, we have that cert (q; hT1 [ T12; A1i) cert (q; K2).</p>
        <p>Moreover, K2 is said to be a universal CQ-solution for K1 under M if for every CQ
q over 2, we have that cert (q; hT1 [ T12; A1i) = cert (q; K2).</p>
      </sec>
      <sec id="sec-3-20">
        <title>Notably, we have in Example 1 that both KB K2 and KB K20 are universal CQ-solutions for KB K1 under mapping M.</title>
        <p>A natural question at this point is what is the relationship between the notions of
solution presented in this section. The following proposition, which is straightforward
to prove, shows that the notion of (universal) CQ-solution is weaker than the notion of
(universal) solution.</p>
        <p>Proposition 1. Let M = ( 1; 2; T12) be a mapping, K1 a KB over 1, and K2 a
KB over 2. If K2 is a (universal) solution for K1 under M, then K2 is a (universal)
CQ-solution for K1 under M.</p>
        <p>In this paper, we study several fundamental problems related to the notions of solution
presented here. These problems are formally introduced below.</p>
        <p>
          Knowledge Base Exchange: Reasoning Tasks
In the data exchange scenario [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], as well as in the knowledge exchange scenario [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ],
the problem of materializing solutions has been identified as the fundamental problem
to solve. Given a class U of mappings (for example, the class of DL-LiteR-mappings)
and a DL L (for example, DL-LiteR), the problem of computing solutions is defined as
follows for U and L:
        </p>
      </sec>
      <sec id="sec-3-21">
        <title>PROBLEM : COMPSOL(U ; L)</title>
      </sec>
      <sec id="sec-3-22">
        <title>INPUT : A mapping M 2 U and an L-knowledge base K1 over 1</title>
      </sec>
      <sec id="sec-3-23">
        <title>TO DO : Compute an L-knowledge base K2 over 2 such that K2 is a solution for K1 under M</title>
      </sec>
      <sec id="sec-3-24">
        <title>Given a class U of mappings and a DL L, the computation problems</title>
      </sec>
      <sec id="sec-3-25">
        <title>COMPUNIVSOL(U ; L), COMPCQSOL(U ; L), and COMPUNIVCQSOL(U ; L) are de</title>
        <p>fined exactly as above, but considering universal solutions, CQ-solutions, and universal
CQ-solutions instead of solutions, respectively.</p>
        <p>In the rest of this paper, we study these problems, and some other fundamental
problems associated to them, for a restriction of the class of mappings introduced in
this section. More precisely, a mapping M = h 1; 2; T12i is said to be definite if T12
is a DL-LiteRDFS TBox (recall the definition of definite TBoxes in Section 2.1). We use</p>
      </sec>
      <sec id="sec-3-26">
        <title>Udef to denote the class of definite mappings. Then, as our first result, we obtained that</title>
        <p>the chase can be used to compute universal solutions for definite mappings and
DL</p>
      </sec>
      <sec id="sec-3-27">
        <title>LiteRDFS TBoxes. In what follows, let chaseT ; (A) denote the projection of chaseT (A)</title>
        <p>on the signature .</p>
        <p>Proposition 2. Let M
DL-LiteRDFS KB over
for K1 under M.</p>
        <p>= ( 1; 2; T12) be a definite mapping and K1 = hT1; A1i a
1. Then h;; chaseT12; 2 (chaseT1 (A1))i is a universal solution</p>
      </sec>
      <sec id="sec-3-28">
        <title>Thus, given that for definite mappings and DL-LiteRDFS TBoxes, the sets chaseT1 (A1)</title>
        <p>
          and chaseT12; 2 (chaseT1 (A1)) are always finite and can be computed in polynomial
time [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], we obtain as a corollary of Propositions 1 and 2 that the problems of computing
solutions defined above can be solved in polynomial time for definite mappings and
DLLiteRDFS TBoxes.
        </p>
        <p>Corollary 1. COMPSOL(Udef ; DL-LiteRDFS), COMPUNIVSOL(Udef ; DL-LiteRDFS),
COMPCQSOL(Udef ; DL-LiteRDFS), and COMPUNIVCQSOL(Udef ; DL-LiteRDFS) can all
be solved in polynomial time.</p>
        <p>Unfortunately, the solutions obtained by using Proposition 2 are of little interest to us
because the generated target ABoxes can be very large. Hence, we turn our attention to
the case of non-empty target TBoxes and, more specifically, to the problem of
computing universal CQ-solutions that include as much implicit knowledge as possible. This
gives rise to the following separation properties.</p>
        <p>Definition 4. Let M = ( 1; 2; T12) be a mapping and T1 a TBox over 1.
– T1 is representable in M if there exists a TBox T2 over 2 such that for every
ABox A1 over 1, it holds that hT2; chaseT12; 2 (A1)i is a universal CQ-solution
for hT1; A1i under M. T2 is called a representation of T1 in M.
– T1 is weakly representable in M if there exists a mapping M? = ( 1; 2; T1?2)
such that T12 T1?2, T1 [ T12 j= T1?2 and T1 is representable in M?.</p>
        <p>The separation problems are interesting to us because a positive answer would mean
that we can construct the TBox of a solution by considering only the input TBox and
the mapping, independently of the input ABox. On the other hand, the ABox of the
solution can be found simply by translating the input ABox according to the rules in the
mapping (and the input TBox). We illustrate the notions just defined in the following
example.</p>
        <p>Example 2. Let 1 = fA1; B1g, 2 = fA2; B2g, and T1 = fB1 v A1g. If M =
( 1; 2; T12) with T12 = fA1 v A2; B1 v B2g, then we have that T2 = fB2 v A2g
is a representation of T1 in M.</p>
        <p>On the other hand, if M0 = ( 1; 2; T102) with T102 = fA1 v A2g, then we have
that T1 is not representable in M0: let A01 = fB1(a)g, then chaseT102; 2 (A01) = ; and
fMor0.nHooTwBeovxerT,2i0f, whTe20c;ocnhsaisdeeTr10T2;1?22 =(AT011)02i [is faBu1nivveArs2agl,CwQe-scoolnuctliuodnetothhaTt1T;1Ais01iwuenakdleyr
representable in M0 since T102 T1?2, T1 [ T102 j= T1?2 and T1 is representable in
M? = ( 1; 2; T1?2) (in fact, ; is a representation of T1 in M?). Note, that T102 T12.</p>
        <p>Now, if M00 = ( 1; 2; T1020 ) with T1020 = fA1 v A2; B1 v B2; C1 v B2g,
then again we have that T1 is not representable in M00: let A010 = fB1(a); C1(c)g,
then chaseT1020 ; 2 (A010) = fB2(a); B2(c)g. The query q() A2(a) evaluates to true
in hT1 [ T1020 ; A010i, hence in order for a TBox T200 to be a representation of T1 in M00,
it must contain the inclusion B2 v A2. On the other hand, it implies that the query
q0() A2(c) evaluates to true in hT200; chaseT1020 ; 2 (A010)i, whereas it evaluates to false
in hT1 [T1020 ; A010i. However, again if we consider T1?2? = T1020 [fB1 v A2g, we conclude
that T1 is weakly representable in M00.
4</p>
        <p>Techniques for Deciding Representability
We address now the problem of deciding (weak) representability of a TBox in a
mapping, for the case where TBoxes are expressed in DL-LiteRDFS and mappings are definite.
We start by showing some properties that characterize solutions in terms of chases.</p>
      </sec>
      <sec id="sec-3-29">
        <title>In the following for two chases C1 and C2, a homomorphism from C1 to C2 is a</title>
        <p>mapping h from the individuals and labeled nulls in C1 to those in C2 such that (1) h is
the identity on the individuals, (2) if A(t) 2 C1 then A(h(t)) 2 C2, and (3) if P (t; t0) 2</p>
      </sec>
      <sec id="sec-3-30">
        <title>C1 then P (h(t); h(t0)) 2 C2. We write C1 ! C2 if there is a homomorphism from C1 to</title>
        <p>C2, and C1 C2 if C1 ! C2 and C2 ! C1.</p>
        <p>
          From now on, we assume 1 and 2 as given, and for a mapping M = ( 1; 2; T12),
we use simply M to denote also T12. Then we have the following characterizations of
solutions in terms of chases, which are similar to the characterizations in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
Proposition 3. Let M be a definite mapping, K1 = hT1; A1i a DL-LiteRDFS KB over
1, and K2 = hT2; A2i a DL-LiteRDFS KB over 2. If chaseM; 2 (chaseT1 (A1)) !
chaseT2 (A2); then K2 is a CQ-solution for K1 under M.
Corollary 2. Let M be a definite mapping, K1 = hT1; A1i a DL-LiteRDFS KB over 1,
and K2 = hT2; A2i a DL-LiteRDFS KB over 2. Then K2 is a universal CQ-solution for
K1 under M if chaseM; 2 (chaseT1 (A1)) chaseT2 (A2).
4.1
        </p>
        <p>Checking Representability of a TBox in a Mapping</p>
      </sec>
      <sec id="sec-3-31">
        <title>We address now the representability problem, which is to decide, given a TBox T1 over</title>
      </sec>
      <sec id="sec-3-32">
        <title>1 and a mapping M, whether T1 is representable in M (cf. Definition 4).</title>
        <p>We start by considering the decision problem associated with representability:</p>
      </sec>
      <sec id="sec-3-33">
        <title>Given a mapping M, a TBox T1 over 1, and a TBox T2 over 2, check whether T2 is a representation of T1 in M, i.e., for each ABox A1 over 1, hT2; chaseM; 2 (A1)i is a universal CQ-solution for hT1; A1i under M.</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>For definite mappings and DL-LiteRDFS TBoxes, the decision problem associated with</title>
      <p>representability can be solved in two steps:</p>
      <sec id="sec-4-1">
        <title>1. Check whether for each ABox A1 over 1, hT2; chaseM; 2 (A1)i is a CQ-solution</title>
        <p>for hT1; A1i under M.</p>
      </sec>
      <sec id="sec-4-2">
        <title>2. Check whether for each ABox A1 over 1 and for each CQ q over 2, we have</title>
        <p>that cert (q; hT2; chaseM; 2 (A1)i) cert (q; hT1 [ M; A1i).</p>
      </sec>
      <sec id="sec-4-3">
        <title>If both checks succeed, then T2 is a representation of T1 in M, otherwise it is not. We</title>
        <p>develop now techniques to perform these two tests.</p>
        <p>Step 1: Checking whether for each ABox A1 over
solution for hT1; A1i under M.
1, hT2; chaseM; 2 (A1)i is a
CQWe introduce now some notions that help us to characterize when a mapping is able to
“translate” the inclusions in T1 to the target TBox.</p>
      </sec>
      <sec id="sec-4-4">
        <title>We use pn(X) to denote A if X is A, and P if X is 9P , 9P , P , or P . For X,</title>
        <p>Y DL-LiteRDFS expressions, we say that X is compatible with Y if (i) pn(X) = pn(Y ),
and (ii) if X is 9R, then Y is 9R, R, or R . For a DL-LiteRDFS inclusion = X1 v X2,
we use lhs( ) to denote X1, and rhs( ) to denote X2. We say that is left-compatible
(resp., right-compatible) with X, if X is compatible with lhs( ) (resp., rhs( )).</p>
        <p>Let M be a definite mapping, = N1 v M1 a DL-LiteRDFS inclusion over 1,
and 2 M left-compatible with M1. Then the translation set of and in M,
denoted M( ; ; M), is the set of DL-LiteRDFS inclusions over 2 such that, if there
exists an inclusion in M left-compatible with N1, then 2 M( ; ; M), where is
uniquely defined by , , as shown in Table 1. When the mapping M is clear from
the context, we write M( ; ). We say that is redundant w.r.t. = M10 v M2 (in M)
if M2 v M2 2 M( ; ). We explain these definitions in an example.</p>
        <p>Example 3. Let
= S1 v R1 and</p>
        <p>= 9R1 v A2.
9R1
9S1</p>
        <p>R1
S1
9R1
9S1
0
00
9S2</p>
        <p>A2
0
E2
S2 9S2
Let = S1 v S2 and 0 = 9S1 v E2 be in M. Then f9S2 v A2; E2 v A2g
M( ; ; M). If 00 = 9S1 v A2 2 M, then is redundant w.r.t. .</p>
      </sec>
      <sec id="sec-4-5">
        <title>If none of , 0 and 00 is in M, then M( ; ; M) is empty.</title>
        <p>With these notions in place, we can characterize CQ-solutions in the context of the
representability problem.</p>
        <p>Proposition 4. Let M be a definite mapping, T1 a DL-LiteRDFS TBox over 1, and T2 a
DL-LiteRDFS TBox over 2. Then for each ABox A1, the KB hT2; chaseM; 2 (A1)i is a
CQ-solution for hT1; A1i under M if and only if for each inclusion , s.t. T1 j= , and
for each inclusion 2 M left-compatible with rhs( ), there exists 2 M( ; ) such
that T2 j= .</p>
        <p>Step 2: Checking whether for each ABox A1 over 1, and for each CQ q over
have that cert (q; hT2; chaseM; 2 (A1)i) cert (q; hT1 [ M; A1i).
2, we
Recall the definition of the translation set. Now, let = N2 v M2 be a DL-LiteRDFS
inclusion over 2, and 2 M right-compatible with N2. Then the reverse-translation
set of and in M, denoted M ( ; ; M), is the set of DL-LiteRDFS inclusions over</p>
      </sec>
      <sec id="sec-4-6">
        <title>1 such that, if there exists an inclusion in M right-compatible with M2, then 2</title>
      </sec>
      <sec id="sec-4-7">
        <title>M ( ; ; M), where is uniquely defined by , , and as shown in Table 1.</title>
        <p>Proposition 5. Let M be a definite mapping, T1 a DL-LiteRDFS TBox over 1, and T2
a DL-LiteRDFS TBox over 2. Then for each ABox A1 over 1 and for each CQ q over
2, cert (q; hT2; chaseM; 2 (A1)i) cert (q; hT1 [ M; A1i) if and only if for each
inclusion , s.t. T2 j= , and for each inclusion 2 M right-compatible with lhs( ),
there exists 2 M ( ; ), s.t. T1 j= .</p>
        <p>With the techniques for Steps 1 and 2 at hand, we are ready to characterize
representations.
Corollary 3. Let M be a definite mapping, T1 a DL-LiteRDFS TBox over 1, and T2 a
DL-LiteRDFS TBox over 2. Then for each ABox A1 over 1, hT2; chaseM; 2 (A1)i is a
universal CQ-solution for hT1; A1i under M iff
– for each inclusion , s.t. T1 j= , and for each inclusion 2 M left-compatible
with rhs( ), there exists 2 M( ; ) s.t. T2 j= , and
– for each inclusion , s.t. T2 j= , and for each inclusion 2 M right-compatible
with lhs( ), there exists 2 M ( ; ), s.t. T1 j= .</p>
      </sec>
      <sec id="sec-4-8">
        <title>Now, we can check whether a given T1 is representable in a given M.</title>
        <p>Theorem 1. Let M be a definite mapping and T1 a DL-LiteRDFS TBox over
we can check whether T1 is representable in M in polynomial time.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>We can easily solve the weak representability problem for DL-LiteRDFS KBs even if the</title>
      <p>mappings are arbitrary tgds, i.e., assertions of the form q1 ! q2, mapping a CQ q1 over</p>
    </sec>
    <sec id="sec-6">
      <title>1 to a CQ q2 over 2 of the same arity as q1. We call such a mapping a tgd-mapping.</title>
      <sec id="sec-6-1">
        <title>Let T1 be a DL-LiteRDFS TBox and M a tgd-mapping. We can enrich M by compiling</title>
        <p>
          knowledge from T1 into it:
– Let q1 ! q2 be a tgd in M, with q1 a CQ over 1 and q2 a CQ over 2. Let the
UCQ Q1 = fq11; : : : ; q1kg be the perfect reformulation of q1 w.r.t. T1 (see, e.g., [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]).
        </p>
        <p>Then we extend M with a tgd q1i ! q2 for each q1i 2 Q1.</p>
        <p>– We perform this for each tgd in M. The resulting mapping is denoted with M .
Proposition 6. Let M be a tgd-mapping, T1 a DL-LiteRDFS TBox over
the mapping constructed as described above. Then for each ABox A1 over
h;; chaseM ; 2 (A1)i is a universal CQ-solution for hT1; A1i under M.
1, and M
1, the KB</p>
        <p>From this we immediately get:
Theorem 2. Let M be a definite mapping and T1 a DL-LiteRDFS TBox over 1. Then T1
is weakly representable in M.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Thus, when the source KB is expressed in DL-LiteRDFS it is always possible to sepa</title>
      <p>rate data by enriching mappings, even if mappings are tgds. When M is a set of tgds,
the size of M might be exponential in the size of M. If M is a DL-LiteRDFS mapping,
the size of M is always polynomial in the size of M.
5</p>
      <p>
        Conclusions
We have specialized the framework for KB exchange proposed in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] to the case of
DLs, i.e., the source and target KBs are given as DL KBs and the mappings have the
form of DL TBox assertions. Moreover, we have defined a new reasoning problem:
representability of a TBox in a mapping, whose goal is to compute from a source TBox
and a mapping, a target TBox that leads to a universal (CQ-)solution when it is
combined with a suitable ABox computed from the source ABox, independently of the
actual source ABox. A variation of this problem, called weak representability, allows
for modification of the mapping, so that the source TBox becomes representable.
      </p>
      <p>Then, we have developed first results and techniques for KB exchange and for the
representability problem in the case of mappings and KBs expressed in DL-LiteRDFS
(such mappings are called definite). DL-LiteRDFS is a fragment of DL-LiteR that does
not allow for existential quantification (i.e., concepts of the form 9R) in the right-hand
side of concept inclusions, nor for disjointness assertions. It implies, first, that the chase
is always finite in DL-LiteRDFS, and, second, that KBs are always consistent. We have
shown the following results for definite mappings and DL-LiteRDFS KBs: (i) the
problems of computing (universal) (CQ-)solutions can be solved in polynomial time; (ii) the
problem of representability of a TBox in a mapping is decidable in polynomial time;
(iii) every DL-LiteRDFS TBox is weakly representable in a definite mapping.</p>
      <p>The main direction for future work is to extend the results to the case of full
DL</p>
      <sec id="sec-7-1">
        <title>LiteR. This brings up the problem of labelled nulls in the chase, which becomes infinite</title>
        <p>in general. Moreover, due to the possible presence of disjointness assertions in TBoxes,
consistency of the source and target KBs has to be checked.</p>
        <p>Acknowledgments
The authors are grateful to Vladislav Ryzhikov and Evgeny Sherkhonov for helpful
discussions, and to the anonymous referees for many helpful comments. The authors
were supported by Marie Curie action “International Research Staff Exchange Scheme
(IRSES)”, FP7-PEOPLE-2009-IRSES, under grant number 24761. M. Arenas was also
supported by Fondecyt grant 1090565. E. Botoeva and D. Calvanese were also
supported by the EU under the ICT Collaborative Project ACSI (Artifact-Centric Service
Interoperation), grant agreement number FP7-257593, and under the large-scale
integrating project (IP) OntoRule (ONTOlogies meet Business RULEs), grant agreement
number FP7-231875.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.</surname>
          </string-name>
          <article-title>Pe´rez, and</article-title>
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Reutter</surname>
          </string-name>
          .
          <article-title>Data exchange beyond complete data</article-title>
          .
          <source>In Proc. of the 30th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS</source>
          <year>2011</year>
          ), pages
          <fpage>83</fpage>
          -
          <lpage>94</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B.</given-names>
            <surname>Barcelo</surname>
          </string-name>
          <article-title>`. Logical foundations of relational data exchange</article-title>
          .
          <source>SIGMOD Record</source>
          ,
          <volume>38</volume>
          (
          <issue>1</issue>
          ):
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          and
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          .
          <source>RDF vocabulary description language 1</source>
          .0:
          <string-name>
            <given-names>RDF</given-names>
            <surname>Schema. W3C Recommendation</surname>
          </string-name>
          , World Wide Web Consortium, Feb.
          <year>2004</year>
          . Available at http://www. w3.org/TR/rdf-schema/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          , G. De Giacomo,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lembo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lenzerini</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Rosati</surname>
          </string-name>
          .
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>J. of Automated Reasoning</source>
          ,
          <volume>39</volume>
          (
          <issue>3</issue>
          ):
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. G.</given-names>
            <surname>Kolaitis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Miller</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          .
          <article-title>Data exchange: Semantics and query answering</article-title>
          .
          <source>Theoretical Computer Science</source>
          ,
          <volume>336</volume>
          (
          <issue>1</issue>
          ):
          <fpage>89</fpage>
          -
          <lpage>124</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>L.</given-names>
            <surname>Libkin</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Sirangelo</surname>
          </string-name>
          .
          <article-title>Data exchange and schema mappings in open and closed worlds</article-title>
          .
          <source>J. of Computer and System Sciences</source>
          ,
          <volume>77</volume>
          (
          <issue>3</issue>
          ):
          <fpage>542</fpage>
          -
          <lpage>571</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>