Knowledge Base Exchange

Knowledge Base Exchange MarceloArenas marenas@ing.puc.cl Dept. of Computer Science PUC Chile ElenaBotoeva KRDB Research Centre Free Univ. of Bozen-Bolzano

Italy

DiegoCalvanese KRDB Research Centre Free Univ. of Bozen-Bolzano

Italy

Knowledge Base Exchange 56C02962DB25CCB5D047CED9B642E71E GROBID - A machine learning software for extracting information from scholarly documents

In this paper, we study the problem of exchanging knowledge between two knowledge bases (KBs) connected through mappings, with a special interest in exchanging implicit knowledge, not only data like in the traditional database exchange setting. As representation formalism we use Description Logics (DL) that exhibit a reasonable tradeoff between expressive power and complexity of reasoning. Thus, we assume that the source and target KBs are given as a DL TBox+ABox, while the mappings have the form of DL TBox assertions. We study the problem of translating the knowledge in the source KB according to these mappings. We define a general framework of KB exchange, and specify the problems of computing and representing different kinds of solutions, i.e., target KBs with specified properties, given a source KB and a mapping. We then develop first results and techniques and study the complexity of KB exchange for the case of DL-LiteRDFS, a DL that corresponds to the FOL fragment of RDFS.

Introduction

In data exchange, data structured under one schema (called source schema) must be restructured and translated into an instance of a different schema (called target schema), and the way in which this restructuring should occur is specified by means of a mapping from the source schema to the target schema [5]. Such a problem has been studied extensively in recent years, under various choices for the languages used to specify the source and target schema, and the mappings [2]. While incomplete information in this setting is introduced by the mapping layer (see also [6]), one fundamental assumption in the works on data exchange is that the source is a (completely specified) database.

In this paper, we go beyond this setting, and consider data exchange in the case where implicit knowledge is present in the source, by which new data may be inferred. We follow the line of the work in [1], where a general framework for data exchange is proposed, in which the source data may be incompletely specified, and thus (possibly infinitely) many source instances are implicitly represented. The framework in [1] in based on the general notion of representation system, as a mechanism to represent multiple instances of a data schema, and considers the problem of incomplete data exchanges under mappings constituted by a set of tuple generating dependencies (tgds).

We refine that framework to the case where as a representation system we use Description Logics (DL) knowledge bases (KBs) constituted by a TBox and an ABox, and where mappings are sets of DL inclusions. While in the traditional data exchange setting, given a source instance and a mapping specification, (universal) solutions are target instances derived from the source instance and the mapping, in our case solutions are target DL KBs, derived from the source KB and the mapping. Besides a notion of (universal) solution based on the correspondence between models of source and target KBs, we introduce also the weaker notion of (universal) CQ-solution, based on the correspondence between answers to conjunctive queries over source and target KBs.

In our setting where we exchange DL KBs, in order to minimize the exchange (and hence transfer and materialization) of explicit (i.e., ABox) information, we are interested in computing universal (CQ-)solutions that contain as much implicit knowledge as possible. This leads us to define a new problem, called representability, whose goal is to compute from a source TBox and a mapping, a target TBox that leads to a universal (CQ-)solution when it is combined with a suitable ABox computed from the source ABox, independently of the actual source ABox.

We then develop first results and techniques for KB exchange and for the representability problem in the case of KBs expressed in DL-Lite RDFS , a DL that corresponds to the FOL fragment of RDFS. DL-Lite RDFS is a fragment of DL-Lite R [4] that does not allow for existential quantification (i.e., concepts of the form ∃R) in the right-hand side of concept inclusions, nor for disjointness assertions.

The paper is organized as follows. In Section 2 we give preliminary notions on DLs and conjunctive queries (CQs). In Section 3 we define our framework of KB exchange. In Section 4 we present the techniques for deciding the defined reasoning tasks. Finally, in Section 5 we draw some conclusions and outline issues for future work.

Preliminaries

We introduce here the necessary notions about the description logic (DL) that we use in this article, and about conjunctive queries, which we adopt as our query formalism.

DL-Lite R Knowledge Bases

The DLs of the DL-Lite family [4] of light-weight description logics are characterized by the fact that reasoning can be done in polynomial time, and that data complexity of reasoning and conjunctive query answering is in AC 0 . We adopt here DL-Lite R , and present now its syntax and semantics.

Let N C , N R , N a be sets of concept, role, and individual names, respectively, and assume that A ∈ N C and P ∈ N R . In DL-Lite R , B and C are used to denote basic and complex concept descriptions, respectively, and R and Q are used to denote basic and complex roles, respectively. These concept and roles constructs are defined by the following grammar: In the following, we will also make use of a restricted form of DL-Lite R TBoxes. A DL-Lite R TBox is said to be definite if there are only atomic concepts and atomic roles on the right-hand side of its inclusions. In other words, a definite TBox may not mention in its right-hand side a concept of the form ∃R, and may not contain concept or role disjointness assertions. We call DL-Lite RDFS the fragment of DL-Lite R obtained by considering only definite DL-Lite R TBoxes. Intuitively, DL-Lite RDFS corresponds to the fragment of RDFS [3] that is embeddable in FOL (and hence in DLs).

The semantics of DL-Lite R is defined in the standard way. We just remark that we use MOD(K) to denote the set of all models of KB K. From now on, we assume that interpretations satisfy the standard name assumption, that is, we assume given a fixed infinite set U of individual names, and we assume that for every interpretation I = ∆ I , • I , it holds that ∆ I ⊆ U and a I = a for every a ∈ ∆ I . Notice that this implies the Unique Name Assumption (UNA), i.e., different individuals are interpreted as different domain elements.

A signature Σ is a set of concept and role names. An interpretation I = ∆ I , • I is said to be an interpretation of Σ if it is defined exactly on the concept and role names in Σ. Given a KB K, the signature Σ(K) of K is the alphabet of concept and role names occurring in K, and K is said to be defined over (or simply, over) a signature Σ if Σ(K) ⊆ Σ (and likewise for a TBox T , an ABox A, a concept inclusions B C, a role inclusions R Q, and membership assertions A(a) and R(a, b)).

Conjunctive Queries and Certain Answers

A conjunctive query (CQ) over a signature Σ is a first-order formula of the form q(x) = ∃y.conj (x, y), where x, y are tuples of variables and conj (x, y) is a conjunction of atoms of the form: (1) A(t), with A a concept name in Σ and t either a constant from U or a variable from x or y, or (2) P (t 1 , t 2 ), with P a role name in Σ and t i (i = 1, 2) either a constant from U or a variable from x or y. In a conjunctive query q(x) = ∃y.conj (x, y), x is the tuple of free variables of q(x). A union of conjunctive queries (UCQ) is a formula of the form: q(x) = n i=1 ∃y i .conj i (x, y i ), where each conj i (x, y i ) is as before. A query q (either a CQ or a UCQ) is said to be a query over a KB K if q is a query over a signature Σ and Σ ⊆ Σ(K).

Let q be a CQ ∃y 1 • • • ∃y .conj (x 1 , . . . , x k , y 1 , . . . , y ) over a signature Σ and I = ∆ I , • I an interpretation of Σ. Then the answer of q over I, denoted by q I , is defined as the set of tuples (a 1 , . . . , a k ) of elements from ∆ I for which there exist a tuple (b 1 , . . . , b ) of elements from ∆ I such that I satisfies every conjunct in conj (a 1 , . . . , a k , b 1 , . . . , b ). Moreover, given a UCQ q = n i=1 q i , the answer of q over an interpretation I, denoted by q I , is defined as n i=1 q I i . Finally, given a query q (either a CQ or a UCQ) over a KB K, the answer to q over K, denoted by cert(q, K), is defined as cert(q, K) = I∈MOD(K) q I . Each tuple in cert(q, K) is called a certain answer for q over K.

Certain answers in DL-Lite R can be characterized through the notion of chase. We call a chase a (possibly infinite) set of assertions of the form A(t), P (t, t ), where t, t are either individuals, or labeled nulls interpreted as not necessarily distinct domain elements. For DL-Lite R KBs, we employ the notion of oblivious chase as defined in [4]. For such a KB T , A , the chase of A w.r.t. T , denoted chase T (A), is a chase obtained from A by adding facts implied by inclusions in T , and introducing labeled nulls whenever required by an inclusion with ∃R in the right-hand side (see [4] for details).

Exchanging Knowledge Bases

In this section, we introduce the knowledge exchange framework used in the paper. The starting point to define this framework is the notion of mapping, which has been shown to be of fundamental importance in the context of data exchange [5]. Formally, a DL-Lite R -mapping (or just mapping) is a tuple M = (Σ 1 , Σ 2 , T 12 ), where Σ 1 , Σ 2 are disjoint signatures and T 12 is a DL-Lite R TBox whose inclusions are of the form:

(1) C 1 C 2 , where C 1 , C 2 are complex concepts over Σ 1 and Σ 2 , respectively, and (2) Q 1 Q 2 , where Q 1 and Q 2 are complex roles over Σ 1 and Σ 2 , respectively.

Let M = (Σ 1 , Σ 2 , T 12 ) be a mapping. Intuitively, mapping M specifies how a knowledge base over the vocabulary Σ 1 should be translated into a knowledge base over the vocabulary Σ 2 . This intuition is formalized in terms of the notion of solution, which is defined as follows. Given an interpretation I 1 of Σ 1 and an interpretation

I 2 of Σ 2 , pair (I 1 , I 2 ) satisfies TBox T 12 , denoted by (I 1 , I 2 ) |= T 12 , if for each concept inclusion C 1 C 2 ∈ T 12 , it holds that C I1 1 ⊆ C I2 2 ,

and for each role inclusionQ 1 Q 2 ∈ T 12 , it holds that Q I1 1 ⊆ Q I2 2 .

Moreover, given an interpretation I of Σ 1 , SAT M (I) is defined as the set of interpretations J of Σ 2 such that (I, J ) |= T 12 , and given a set X of interpretations of Σ 1 , SAT M (X ) is defined as:

SAT M (X ) = I∈X SAT M (I).

Then the notion of solution under a mapping is defined as follows, by considering this notion of satisfaction and the knowledge exchange framework proposed in [1]. Definition 1. Let M = (Σ 1 , Σ 2 , T 12 ) be a mapping, K 1 a KB over Σ 1 , and K 2 a KB over Σ 2 . Then K 2 is said to be a solution for K 1 under M if: Definition 2. Let M = (Σ 1 , Σ 2 , T 12 ) be a mapping, K 1 a KB over Σ 1 , and K 2 a KB over Σ 2 . Then K 2 is said to be a universal solution for K 1 under M if:

MOD(K 2 ) ⊆ SAT M (MOD(K 1 )). That is, K 2 is a solution for K 1 under M if for every model I 2 of K 2 , there exists a model I 1 of K 1 such that (I 1 , I 2 ) |= T 12 . Let M = (Σ 1 , Σ 2 , T 12 ). A KB K 1 over Σ 1 canMOD(K 2 ) = SAT M (MOD(K 1 )).

In the preceding definition, KB K 2 is considered to be a good solution for KB K 1 under mapping M as the models of K 2 exactly correspond to the valid translations of the models of K 1 according to M. We illustrate Definitions 1 and 2 in an example.

Example 1. Let M = (Σ 1 , Σ 2 , T 12 ), where Σ 1 = {A 1 , B 1 }, Σ 2 = {A 2 , B 2 }, and T 12 = {A 1 A 2 , B 1 B 2 }. Furthermore, assume that K 1 = T 1 , A 1 , where T 1 = {B 1 A 1 } and A 1 = {B 1 (a)}.

Then the following knowledge bases over Σ 2 are solutions for K 1 under M:

K 2 = T 2 , A 2 , where T 2 = ∅, A 2 = {B 2 (a), A 2 (a)} K 2 = T 2 , A 2 , where T 2 = {B 2 A 2 }, A 2 = {B 2 (a)} Moreover, K 2 is a universal solution for K 1 under M, while K 2 is not. In fact, if I 2 is an interpretation of Σ 2 such that: ∆ I2 = {a, b}, A I2 2 = {a}, and B I2 2 = {a, b}, then we have that I 2 ∈ MOD(K 2 ) since I 2 does not satisfy inclusion B 2 A 2 , but I 2 ∈ SAT M (MOD(K 1 )) since (I 1 , I 2 ) |= T 12 for I 1 ∈ MOD(K 1 ) defined as ∆ I1 = {a}, A I1 1 = {a} and B I1 1 = {a}. In the previous example, K 2 is not a universal solution for K 1 under M since inclusion B 2

A 2 cannot be deduced from the information in K 1 and M. Or, more formally, K 2 is not a universal solution as B 2 A 2 is not implied by T 1 ∪ T 12 , A 1 . However, K 2 can also be considered as a solution of K 1 that is desirable to materialize, as the implicit knowledge in K 2 (i.e., TBox T 2 ) represents the implicit knowledge in K 1 (i.e., TBox T 1 ), given the way that concepts A 1 and B 1 have to be translated according to mapping M. In fact, solution K 2 is as good as solution K 2 in terms of the information that can be extracted from them by using some specific queries, but with the advantage that K 2 represents knowledge in a more compact way. In what follows, we introduce a new class of good solutions that captures this intuition with respect to the widely used fragment of CQs. Definition 3. Let M = (Σ 1 , Σ 2 , T 12 ) be a mapping, K 1 = T 1 , A 1 a KB over Σ 1 , and K 2 a KB over Σ 2 . Then K 2 is said to be a CQ-solution for K 1 under M if for every CQ q over Σ 2 , we have that cert(q, T 1 ∪ T 12 , A 1 ) ⊆ cert(q, K 2 ).

Moreover, K 2 is said to be a universal CQ-solution for K 1 under M if for every CQ q over Σ 2 , we have that cert(q, T 1 ∪ T 12 , A 1 ) = cert(q, K 2 ).

Notably, we have in Example 1 that both KB K 2 and KB K 2 are universal CQ-solutions for KB K 1 under mapping M.

A natural question at this point is what is the relationship between the notions of solution presented in this section. The following proposition, which is straightforward to prove, shows that the notion of (universal) CQ-solution is weaker than the notion of (universal) solution.

Proposition 1. Let M = (Σ 1 , Σ 2 , T 12 ) be a mapping, K 1 a KB over Σ 1 , and K 2 a KB over Σ 2 . If K 2 is a (universal) solution for K 1 under M, then K 2 is a (universal) CQ-solution for K 1 under M.

In this paper, we study several fundamental problems related to the notions of solution presented here. These problems are formally introduced below.

Knowledge Base Exchange: Reasoning Tasks

In the data exchange scenario [5], as well as in the knowledge exchange scenario [1], the problem of materializing solutions has been identified as the fundamental problem to solve. Given a class U of mappings (for example, the class of DL-Lite R -mappings) and a DL L (for example, DL-Lite R ), the problem of computing solutions is defined as follows for U and L:

PROBLEM : COMPSOL(U, L) INPUT : A mapping M ∈ U and an L-knowledge base K 1 over Σ 1 TO DO : Compute an L-knowledge base K 2 over Σ 2 such that K 2 is a solu- tion for K 1 under M

Given a class U of mappings and a DL L, the computation problems COMPUNIVSOL(U, L), COMPCQSOL(U, L), and COMPUNIVCQSOL(U, L) are defined exactly as above, but considering universal solutions, CQ-solutions, and universal CQ-solutions instead of solutions, respectively.

In the rest of this paper, we study these problems, and some other fundamental problems associated to them, for a restriction of the class of mappings introduced in this section. More precisely, a mapping M = Σ 1 , Σ 2 , T 12 is said to be definite if T 12 is a DL-Lite RDFS TBox (recall the definition of definite TBoxes in Section 2.1). We use U def to denote the class of definite mappings. Then, as our first result, we obtained that the chase can be used to compute universal solutions for definite mappings and DL-Lite RDFS TBoxes. In what follows, let chase T ,Σ (A) denote the projection of chase T (A) on the signature Σ. Proposition 2. Let M = (Σ 1 , Σ 2 , T 12 ) be a definite mapping and K 1 = T 1 , A 1 a DL-Lite RDFS KB over Σ 1 . Then ∅, chase T12,Σ2 (chase T1 (A 1 )) is a universal solution for K 1 under M. Thus, given that for definite mappings and DL-Lite RDFS TBoxes, the sets chase T1 (A 1 ) and chase T12,Σ2 (chase T1 (A 1 )) are always finite and can be computed in polynomial time [4], we obtain as a corollary of Propositions 1 and 2 that the problems of computing solutions defined above can be solved in polynomial time for definite mappings and DL-Lite RDFS TBoxes.

Corollary 1. COMPSOL(U def , DL-Lite RDFS ), COMPUNIVSOL(U def , DL-Lite RDFS ), COMPCQSOL(U def , DL-Lite RDFS ), and COMPUNIVCQSOL(U def , DL-Lite RDFS ) can all be solved in polynomial time.

Unfortunately, the solutions obtained by using Proposition 2 are of little interest to us because the generated target ABoxes can be very large. Hence, we turn our attention to the case of non-empty target TBoxes and, more specifically, to the problem of computing universal CQ-solutions that include as much implicit knowledge as possible. This gives rise to the following separation properties. Definition 4. Let M = (Σ 1 , Σ 2 , T 12 ) be a mapping and T 1 a TBox over Σ 1 .

-T 1 is representable in M if there exists a TBox T 2 over Σ 2 such that for every ABox

A 1 over Σ 1 , it holds that T 2 , chase T12,Σ2 (A 1 ) is a universal CQ-solution for T 1 , A 1 under M. T 2 is called a representation of T 1 in M.

-T 1 is weakly representable in M if there exists a mapping M = (Σ 1 , Σ 2 , T 12 ) such that T 12 ⊆ T 12 , T 1 ∪ T 12 |= T 12 and T 1 is representable in M .

The separation problems are interesting to us because a positive answer would mean that we can construct the TBox of a solution by considering only the input TBox and the mapping, independently of the input ABox. On the other hand, the ABox of the solution can be found simply by translating the input ABox according to the rules in the mapping (and the input TBox). We illustrate the notions just defined in the following example.

Example 2. Let Σ 1 = {A 1 , B 1 }, Σ 2 = {A 2 , B 2 },andT 1 = {B 1 A 1 }. If M = (Σ 1 , Σ 2 , T 12 ) with T 12 = {A 1 A 2 , B 1 B 2 }, then we have that T 2 = {B 2 A 2 } is a representation of T 1 in M.

On the other hand, if

M = (Σ 1 , Σ 2 , T 12 ) with T 12 = {A 1 A 2 }, then we have that T 1 is not representable in M : let A 1 = {B 1 (a)}, then chase T 12 ,Σ2 (A 1 ) = ∅ and for no TBox T 2 , T 2 , chase T 12 ,Σ2 (A 1 ) is a universal CQ-solution to T 1 , A 1 under M . However, if we consider T 12 = T 12 ∪ {B 1 A 2 }, we conclude that T 1 is weakly representable in M since T 12 ⊆ T 12 , T 1 ∪ T 12 |= T 12 and T 1 is representable in M = (Σ 1 , Σ 2 , T 12 ) (in fact, ∅ is a representation of T 1 in M ). Note, that T 12 ⊂ T 12 . Now, if M = (Σ 1 , Σ 2 , T 12 ) with T 12 = {A 1 A 2 , B 1 B 2 , C 1 B 2 }, then again we have that T 1 is not representable in M : let A 1 = {B 1 (a), C 1 (c)}, then chase T 12 ,Σ2 (A 1 ) = {B 2 (a), B 2 (c)}. The query q() ← A 2 (a) evaluates to true in T 1 ∪ T 12 , A 1 ,

hence in order for a TBox T 2 to be a representation of T 1 in M , it must contain the inclusion B 2 A 2 . On the other hand, it implies that the query q () ← A 2 (c) evaluates to true in T 2 , chase T 12 ,Σ2 (A 1 ) , whereas it evaluates to false in T 1 ∪T 12 , A 1 . However, again if we consider T 12 = T 12 ∪{B 1 A 2 }, we conclude that T 1 is weakly representable in M .

Techniques for Deciding Representability

We address now the problem of deciding (weak) representability of a TBox in a mapping, for the case where TBoxes are expressed in DL-Lite RDFS and mappings are definite. We start by showing some properties that characterize solutions in terms of chases.

In the following for two chases C 1 and C 2 , a homomorphism from C 1 to C 2 is a mapping h from the individuals and labeled nulls in

C 1 to those in C 2 such that (1) h is the identity on the individuals, (2) if A(t) ∈ C 1 then A(h(t)) ∈ C 2 , and (3) if P (t, t ) ∈ C 1 then P (h(t), h(t )) ∈ C 2 . We write C 1 → C 2 if there is a homomorphism from C 1 to C 2 , and C 1 C 2 if C 1 → C 2 and C 2 → C 1 .

From now on, we assume Σ 1 and Σ 2 as given, and for a mapping M = (Σ 1 , Σ 2 , T 12 ), we use simply M to denote also T 12 . Then we have the following characterizations of solutions in terms of chases, which are similar to the characterizations in [1].

Proposition 3. Let M be a definite mapping, K 1 = T 1 , A 1 a DL-Lite RDFS KB over Σ 1 , and K 2 = T 2 , A 2 a DL-Lite RDFS KB over Σ 2 . If chase M,Σ2 (chase T1 (A 1 )) → chase T2 (A 2 ), then K 2 is a CQ-solution for K 1 under M. Corollary 2. Let M be a definite mapping, K 1 = T 1 , A 1 a DL-Lite RDFS KB over Σ 1 , and K 2 = T 2 , A 2 a DL-Lite RDFS KB over Σ 2 . Then K 2 is a universal CQ-solution for K 1 under M if chase M,Σ2 (chase T1 (A 1 )) chase T2 (A 2 ).

Checking Representability of a TBox in a Mapping

We address now the representability problem, which is to decide, given a TBox T 1 over Σ 1 and a mapping M, whether T 1 is representable in M (cf. Definition 4). We start by considering the decision problem associated with representability:

Given a mapping M, a TBox T 1 over Σ 1 , and a TBox T 2 over Σ 2 , check whether T 2 is a representation of T 1 in M, i.e., for each ABox

A 1 over Σ 1 , T 2 , chase M,Σ2 (A 1 ) is a universal CQ-solution for T 1 , A 1 under M.

For definite mappings and DL-Lite RDFS TBoxes, the decision problem associated with representability can be solved in two steps:

1. Check whether for each ABox A 1 over Σ 1 , T 2 , chase M,Σ2 (A 1 ) is a CQ-solution for T 1 , A 1 under M. 2. Check whether for each ABox A 1 over Σ 1 and for each CQ q over Σ 2 , we have that cert(q, T 2 , chase M,Σ2 (A 1 ) ) ⊆ cert(q, T 1 ∪ M, A 1 ). If both checks succeed, then T 2 is a representation of T 1 in M, otherwise it is not. We develop now techniques to perform these two tests.

Step 1: Checking whether for each ABox

A 1 over Σ 1 , T 2 , chase M,Σ2 (A 1 ) is a CQ- solution for T 1 , A 1 under M.

We introduce now some notions that help us to characterize when a mapping is able to "translate" the inclusions in T 1 to the target TBox.

We use pn(X) to denote A if X is A, and P if X is ∃P , ∃P − , P , or P − . For X, Y DL-Lite RDFS expressions, we say that X is compatible with Y if (i) pn(X) = pn(Y ), and (ii) if X is ∃R, then Y is ∃R, R, or R − . For a DL-Lite RDFS inclusion α = X 1 X 2 , we use lhs(α) to denote X 1 , and rhs(α) to denote X 2 . We say that α is left-compatible (resp., right-compatible) with X, if X is compatible with lhs(α) (resp., rhs(α)).

Let M be a definite mapping, α = N 1 M 1 a DL-Lite RDFS inclusion over Σ 1 , and µ ∈ M left-compatible with M 1 . Then the translation set of α and µ in M, denoted M(α, µ, M), is the set of DL-Lite RDFS inclusions over Σ 2 such that, if there Table 1. Definition of M(α, µ, M). Ai, Ei are atomic concepts, Ri, Si are basic roles.

α µ ν β A1 E1 E1 E2 A1 A2 A2 E2 ∃R1 E1 E1 E2 ∃R1 A2 A2 E2 R1 R2 ∃R2 E2 R1 S1 S1 S2 R1 R2 R2 S2 ∃S1 E2 ∃R1 A2 A2 E2 R1 R2 ∃R2 E2 ∃S1 − E2 ∃R1 − A2 A2 E2 R1 R2 ∃R2 − E2

exists an inclusion ν in M left-compatible with N 1 , then β ∈ M(α, µ, M), where β is uniquely defined by α, µ, ν as shown in Table 1. When the mapping M is clear from the context, we write M(α, µ). We say that α is redundant w.r.

t. µ = M 1 M 2 (in M) if M 2 M 2 ∈ M(α, µ).

We explain these definitions in an example.

Example 3. Let α = S 1 R 1 and µ = ∃R 1 A 2 . ∃S − 1 ∃S1 S1 ∃R − 1 ∃R1 R1 α ∃S − 2 ∃S2 S2 E2 A2 ν ν µ ν β β Let ν = S 1 S 2 and ν = ∃S 1 E 2 be in M. Then {∃S 2 A 2 , E 2 A 2 } ⊆ M(α, µ, M). If ν = ∃S 1 A 2 ∈ M, then α is redundant w.r.t. µ.

If none of ν, ν and ν is in M, then M(α, µ, M) is empty.

With these notions in place, we can characterize CQ-solutions in the context of the representability problem. Proposition 4. Let M be a definite mapping, T 1 a DL-Lite RDFS TBox over Σ 1 , and T 2 a DL-Lite RDFS TBox over Σ 2 . Then for each ABox A 1 , the KB T 2 , chase M,Σ2 (A 1 ) is a CQ-solution for T 1 , A 1 under M if and only if for each inclusion α, s.t. T 1 |= α, and for each inclusion µ ∈ M left-compatible with rhs(α), there exists β ∈ M(α, µ) such that T 2 |= β.

Step 2: Checking whether for each ABox A 1 over Σ 1 , and for each CQ q over Σ 2 , we have that cert(q, T 2 , chase M,Σ2 (A 1 ) ) ⊆ cert(q, T 1 ∪ M, A 1 ).

Recall the definition of the translation set. Now, let β = N 2 M 2 be a DL-Lite RDFS inclusion over Σ 2 , and ν ∈ M right-compatible with N 2 . Then the reverse-translation set of β and ν in M, denoted M − (β, ν, M), is the set of DL-Lite RDFS inclusions over Σ 1 such that, if there exists an inclusion µ in M right-compatible with M 2 , then α ∈ M − (β, ν, M), where α is uniquely defined by β, ν, and µ as shown in Table 1.

Proposition 5. Let M be a definite mapping, T 1 a DL-Lite RDFS TBox over Σ 1 , and T 2 a DL-Lite RDFS TBox over Σ 2 . Then for each ABox A 1 over Σ 1 and for each CQ q over Σ 2 , cert(q, T 2 , chase M,Σ2 (A 1 ) ) ⊆ cert(q, T 1 ∪ M, A

Checking Weak Representability

We can easily solve the weak representability problem for DL-Lite RDFS KBs even if the mappings are arbitrary tgds, i.e., assertions of the form q 1 → q 2 , mapping a CQ q 1 over Σ 1 to a CQ q 2 over Σ 2 of the same arity as q 1 . We call such a mapping a tgd-mapping. Let T 1 be a DL-Lite RDFS TBox and M a tgd-mapping. We can enrich M by compiling knowledge from T 1 into it:

-Let q 1 → q 2 be a tgd in M, with q 1 a CQ over Σ 1 and q 2 a CQ over Σ 2 . Let the UCQ Q 1 = {q 1 1 , . . . , q k 1 } be the perfect reformulation of q 1 w.r.t. T 1 (see, e.g., [4]). Then we extend M with a tgd q i 1 → q 2 for each q i 1 ∈ Q 1 . -We perform this for each tgd in M. The resulting mapping is denoted with M * . Proposition 6. Let M be a tgd-mapping, T 1 a DL-Lite RDFS TBox over Σ 1 , and M * the mapping constructed as described above. Then for each ABox A 1 over Σ 1 , the KB ∅, chase M * ,Σ2 (A 1 ) is a universal CQ-solution for T 1 , A 1 under M.

From this we immediately get: Theorem 2. Let M be a definite mapping and T 1 a DL-Lite RDFS TBox over Σ 1 . Then T 1 is weakly representable in M.

Thus, when the source KB is expressed in DL-Lite RDFS it is always possible to separate data by enriching mappings, even if mappings are tgds. When M is a set of tgds, the size of M * might be exponential in the size of M. If M is a DL-Lite RDFS mapping, the size of M * is always polynomial in the size of M.

Conclusions

We have specialized the framework for KB exchange proposed in [1] to the case of DLs, i.e., the source and target KBs are given as DL KBs and the mappings have the form of DL TBox assertions. Moreover, we have defined a new reasoning problem: representability of a TBox in a mapping, whose goal is to compute from a source TBox and a mapping, a target TBox that leads to a universal (CQ-)solution when it is combined with a suitable ABox computed from the source ABox, independently of the actual source ABox. A variation of this problem, called weak representability, allows for modification of the mapping, so that the source TBox becomes representable.

Then, we have developed first results and techniques for KB exchange and for the representability problem in the case of mappings and KBs expressed in DL-Lite RDFS (such mappings are called definite). DL-Lite RDFS is a fragment of DL-Lite R that does not allow for existential quantification (i.e., concepts of the form ∃R) in the right-hand side of concept inclusions, nor for disjointness assertions. It implies, first, that the chase is always finite in DL-Lite RDFS , and, second, that KBs are always consistent. We have shown the following results for definite mappings and DL-Lite RDFS KBs: (i) the problems of computing (universal) (CQ-)solutions can be solved in polynomial time; (ii) the problem of representability of a TBox in a mapping is decidable in polynomial time;

(iii) every DL-Lite RDFS TBox is weakly representable in a definite mapping.

The main direction for future work is to extend the results to the case of full DL-Lite R . This brings up the problem of labelled nulls in the chase, which becomes infinite in general. Moreover, due to the possible presence of disjointness assertions in TBoxes, consistency of the source and target KBs has to be checked.

R::= P | P − Q ::= R | ¬R B ::= A | ∃R C ::= B | ¬B In the following, for a basic role R, we use R − to denote P − when R = P , and P when R = P − . A DL-Lite R TBox is a finite set of concept inclusions B C and role inclusions R Q. A DL-Lite R ABox is a finite set of membership assertions of the form A(a) and P (a, b), where a, b ∈ N a . A DL-Lite R KB K is a pair T , A , where T is a DL-Lite R TBox and A is a DL-Lite R ABox. As usual, TBoxes represent implicit knowledge, while ABoxes represent the data.

have an infinite number of solutions under M. Thus, it is natural to ask what is a good solution for this knowledge base. Next we introduce the notion of universal solution, which is a simple extension of the concept of solution introduced in Definition 1, and is based on the notion of universal solution introduced in[1].

1 ) if and only if for each inclusion β, s.t. T 2 |= β, and for each inclusion ν ∈ M right-compatible with lhs(β), there exists α ∈ M − (β, ν), s.t. T 1 |= α. Corollary 3. Let M be a definite mapping, T 1 a DL-Lite RDFS TBox over Σ 1 , and T 2 a DL-Lite RDFS TBox over Σ 2 . Then for each ABox A 1 over Σ 1 , T 2 , chase M,Σ2 (A 1 ) is a universal CQ-solution for T 1 , A 1 under M iff for each inclusion α, s.t. T 1 |= α, and for each inclusion µ ∈ M left-compatible with rhs(α), there exists β ∈ M(α, µ) s.t. T 2 |= β, and for each inclusion β, s.t. T 2 |= β, and for each inclusion ν ∈ M right-compatible with lhs(β), there exists α ∈ M − (β, ν), s.t. T 1 |= α. Now, we can check whether a given T 1 is representable in a given M. Theorem 1. Let M be a definite mapping and T 1 a DL-Lite RDFS TBox over Σ 1 . Then we can check whether T 1 is representable in M in polynomial time.With the techniques for Steps 1 and 2 at hand, we are ready to characterize repre-sentations.

Acknowledgments

The authors are grateful to Vladislav Ryzhikov and Evgeny Sherkhonov for helpful discussions, and to the anonymous referees for many helpful comments. The authors were supported by Marie Curie action "International Research Staff Exchange Scheme (IRSES)", FP7-PEOPLE-2009-IRSES, under grant number 24761. M. Arenas was also supported by Fondecyt grant 1090565. E. Botoeva and D. Calvanese were also supported by the EU under the ICT Collaborative Project ACSI (Artifact-Centric Service Interoperation), grant agreement number FP7-257593, and under the large-scale integrating project (IP) OntoRule (ONTOlogies meet Business RULEs), grant agreement number FP7-231875.

Data exchange beyond complete data MArenas JPérez JLReutter Proc. of the 30th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2011) of the 30th ACM SIGACT SIGMOD SIGART Symp. on Principles of Database Systems (PODS 2011) 2011 Logical foundations of relational data exchange BBarcelò SIGMOD Record 38 1 2009 RDF vocabulary description language 1.0: RDF Schema DBrickley RVGuha W3C Recommendation, World Wide Web Consortium Feb. 2004 Tractable reasoning and efficient query answering in description logics: The DL-Lite family DCalvanese GDe Giacomo DLembo MLenzerini RRosati J. of Automated Reasoning 39 3 2007 Data exchange: Semantics and query answering RFagin PGKolaitis RJMiller LPopa Theoretical Computer Science 336 1 2005 Data exchange and schema mappings in open and closed worlds LLibkin CSirangelo J. of Computer and System Sciences 77 3 2011