Introduction

Importing Ontologies with Hidden Content

Bernardo Cuenca Grau

Boris Motik

0 0 Computing Laboratory University of Oxford , UK

Introduction

The Web Ontology Language (OWL) and its revision OWL 2 are widely used ontology languages whose formal underpinnings are given by description logics (DLs). OWL ontologies are used, for example, in several countries to describe electronic patient records (EPR). Patients’ data typically involves descriptions of human anatomy, medical conditions, drugs, and so on. These domains have been described in well-established reference ontologies such SNOMED-CT and GALEN. In order to save resources, increase interoperability between applications, and rely on experts’ knowledge, an EPR application should preferably reuse these reference ontologies. For example, assume that a reference ontology Kh describes concepts such as the “ventricular septum defect.” An EPR application might reuse the concepts and roles from Kh to define its own ontology Kv of concepts such as “patients having a ventricular septum defect.” It is generally accepted that ontology reuse should be modular—that is, the axioms of Kv should not affect the meaning of the symbols reused from Kh [ 1, 2 ].

To enable reuse, OWL allows Kv to import Kh. OWL reasoners deal with imports by merging the axioms of the two ontologies; thus, to process Kv ∪ Kh, an EPR application would require physical access to the axioms of Kh. The vendor of Kh, however, might be reluctant to distribute the axioms of Kh, as doing this might allow competitors to plagiarize Kh. Moreover, Kh might contain information that is sensitive from a privacy point of view. Finally, the vendor of Kh might impose different costs on parts of Kh. To reflect this situation, we call the ontology Kh hidden and, by analogy, Kv visible.

To enable reuse without providing physical access to the axioms, Kh could be made accessible via an oracle (i.e., a limited query interface), thus allowing Kv to import Kh “by query.” In this paper, we study import-by-query algorithms, which can solve certain reasoning tasks on Kv ∪ Kh by accessing only Kv and the oracle. We focus on schema reasoning problems, such as concept subsumption and satisfiability; this is in contrast to the information integration [ 3 ] and peerto-peer [ 4 ] scenarios, which focus on the reuse of data.

In our recent work [ 5 ], we studied the import-by-query problem when the oracle query language is concept satisfiability (or, equivalently, concept subsumption). We showed that no import-by-query algorithm exists even if Kv and Kh are expressed in the lightweight DL E L [ 6 ]. This negative result holds if Kv reuses an atomic role from Kh, so we studied the case when Kv reuses only atomic concepts from Kh in a modular way [ 2 ]. We presented an import-by-query algorithm for the case when the aforementioned assumptions are satisfied and Kv and Kh are expressed in SROIQ [ 7 ] and SRIQ [ 8 ], respectively. We also proposed an algorithm that is better suited for practice for the case where Kh is expressed in a Horn DL. Finally, we extended these results to the case when Kv reuses atomic roles from Kh in a syntactically restricted way.

The conditions in [ 5 ] on the usage of imported roles, however, can be limiting in practice, so in this paper we investigate ways of relaxing them. In particular, we study a different oracle query language that is based on ABox query answering, rather than concept subsumption or concept satisfiability checking. We present an import-by-query algorithm that uses such a query language and that is applicable if Kv and Kh are expressed in E L and the only restriction is that Kv imports concepts and roles from Kh in a modular way. We also establish links between this result and the negative result from [ 5 ]. In particular, we show that, if the axioms of Kv involving the roles imported from Kh are weakly acyclic [ 9 ], an import-by-query algorithm based on concept subsumption exists. 2

Preliminaries

In this section, we present an overview of the DL E L [ 6 ] and a suitable reasoning algorithm. For simplicity, we restrict ourselves to the basic version of E L; however, our results can be easily extended to E L++. We also recapitulate the notion of locality [ 2 ], which ensures modular ontology reuse. Finally, we define a restricted form of E L axioms that guarantees locality. 2.1

The Description Logic EL The syntax of E L is defined w.r.t. a signature, which is the union of disjoint countable sets of atomic concepts, atomic roles, and individuals. The set of E L concepts is the smallest set containing >, ⊥, A, C1 u C2, and ∃r.C, for A an atomic concept, C, C1, and C2 E L concepts, and r an atomic role.

A concept inclusion (GCI) has the form C1 v C2 for C1 and C2 E L concepts, and a concept equivalence C1 ≡ C2 is an abbreviation for C1 v C2 and C2 v C1. A TBox T is a finite set of GCIs. An assertion has the form C(a) or r(a, b), for C an E L concept, r an atomic role, and a and b individuals. An axiom is either a GCI or an assertion. An ABox A is a finite set of assertions. An E L knowledge base is a pair K = hT , Ai where T is a TBox and A is an ABox. For α a concept, an axiom, or a set of axioms, sig(α) is the signature of α—that is, the set of atomic concepts, roles, and individuals occurring in α.

The semantics of E L is given by means of interpretations. The definitions of an E L interpretation I = (4 , ·I ), satisfaction of an axiom, TBox, ABox, or

I KB S in an interpretation I (written I |= S), entailment of an axiom α from S (written S |= α), and concept subsumption (written K |= C v D), are standard and can be found in [ 6 ]. Classification of K is the problem of determining whether K |= C v D for each C, D ∈ sig(K) ∪ {>, ⊥}. For each satisfiable E L knowledge base K = hT , Ai, it is well known that hT , Ai |= C v D iff hT , ∅i |= C v D; hence, classification is usually considered w.r.t. a TBox only. We next present two algorithms for reasoning with E L KBs. First, we present the algorithm sat(T , A) that can be used to compute ABox assertions entailed by T ∪ A. Then, we show how to use sat(T , A) to obtain the E L classification algorithm el(T ). These algorithms are similar to the one in [ 10 ] and can be seen as a minor variation of the algorithm in [ 6 ].

Let Γ be a signature. Then, BC(Γ ) is the set of basic concepts for Γ , which is the smallest set of E L concepts containing >, ⊥, all atomic concepts in Γ , and all concepts of the form ∃r.C such that r ∈ Γ and C ∈ Γ ∪ {>, ⊥}.1 An E L TBox T is in normal form if every GCI α ∈ T is of the form

C1 u . . . u Cn v D (1) with n ≥ 1, and where Ci and D are basic concepts for sig(T ). An E L ABox A is in normal form if, in each assertion C(a) ∈ A, the concept C is basic for sig(A). An E L KB K = hT , Ai is in normal form if both T and A are in normal form. Given an ABox A in normal form and Γ ⊆ sig(A), we denote with A|Γ the set of assertions C(a) and r(a, b) in A such that C ∈ BC(Γ ) and r ∈ Γ .

For each concept C ∈ sig(T ), let xC be an individual uniquely associated with C. The algorithm sat(T , A) accepts an E L TBox T and ABox A and produces a sequence A0, . . . An of ABoxes, called a run, s.t. the following conditions hold: 1. A0 := A ∪ {C(xC ) | C ∈ sig(T )}. 2. Ai is obtained from Ai−1 by applying a rule from Table 1 for each 1 ≤ i ≤ n. 3. No rule from Table 1 is applicable to An.

For each T and A, the ABox An is unique across all runs of sat(T , A), so we often write sat(T , A) = An. Furthermore, by modifying slightly the proofs from [ 6 ] and [ 10 ], it is straightforward to show that the following properties hold: – P1: For each TBox T , ABox A, basic concept C for sig(T ), and each individual x in A, we have C(x) ∈ sat(T , A) if and only if T ∪ A |= C(x). – P2: For all TBoxes T and T 0, and all ABoxes A and A0, • A ⊆ sat(T ∪ T 0, ∅) implies sat(T , A) ⊆ sat(T ∪ T 0, ∅), and • A0 ⊆ sat(T , A) implies sat(T , A ∪ A0) ⊆ sat(T , A). 1 Note that this notion of basic concept differs from the one introduced in [ 6 ].

The algorithm el(T ) accepts a TBox T and returns C v D for each pair of concepts C and D from sig(T ) ∪ {>, ⊥} such that D(xC ) ∈ sat(T , ∅). Properties P1 and P2 straightforwardly imply that C v D ∈ el(T ) iff T |= C v D. 2.3

Locality When a TBox Tv reuses a TBox Th, it is commonly accepted [ 1, 2 ] that Tv should not affect the meaning of the symbols reused from Th—that is, Tv ∪ Th |= α should imply Th |= α for each axiom α containing only the reused symbols. This is guaranteed if Tv is local w.r.t. the set of concepts and roles Γ imported from Th [ 2 ]. More precisely, for Γ a signature such that {>, ⊥} ⊆ Γ , a TBox T is local w.r.t. Γ if, for each interpretation I such that XI = ∅ for each atomic concept and each atomic role X ∈/ Γ , we have I |= T .

An E L TBox T in normal form is safe for Γ if sig(C1 u . . . u Cn) 6⊆ Γ for each axiom of the form (1) in T . It is easy to see that T is then local for Γ . 3

Importing Ontologies by Query

To illustrate the notion of import-by-query, Table 2 shows a TBox Th whose axioms are to be kept hidden, but that is reused in a visible TBox Tv. The TBox Th provides concepts describing the structure of organs such as Heart, and medical conditions associated to them such as CHD (congenital heart defect), VSD (ventricular septum defect), and AS (aortic stenosis). Furthermore, the role part relates organs with their parts, and cond relates organs with medical conditions. The former role is used to define concepts such as Heart (an organ one of whose parts is the tricuspid valve); the latter is used to define concepts such as CHD Heart (a heart with a congenital heart disorder) and VSD Heart (a heart with a ventricular septal defect). The shared symbols of Th are written in bold font. The TBox Th might also contain nonshared symbols; however, for simplicity, we do not show any axioms involving such symbols. The TBox Tv provides the concept Pat representing patients, and it defines types of patients by relating the organs from Th with the patients using the hasOrgan role. In addition, Tv extends the list of defects in Th by EA (Ebstein’s anomaly), which is a congenital heart defect in which the opening of the tricuspid valve is displaced (axioms δ7 and δ8), and defines a patient with tricuspid valve disease (TVD Pat ) as a patient with a heart that has a congenital heart defect and an abnormal tricuspid valve (δ9). The symbols private to Tv are written in italic font.

As already mentioned, we assume that Tv is local w.r.t. the imported symbols. For example, δ1 is local w.r.t. {CHD Heart} because δ1 is satisfied in any interpretation that interprets the nonshared symbols as ∅. Note also that, according to the definition in Section 2, Tv is safe for the imported signature.

We next introduce two types of oracles, which are responsible for advertising the shared signature Γ of Th and answering queries w.r.t. Th. Definition 1. Let T be an EL TBox and Γ be s.t. {>, ⊥} ⊆ Γ ⊆ sig(T ). A subsumption oracle ΩTs ,Γ is a function that, for each pair of EL concepts C and D with sig(C) ⊆ Γ and sig(D) ⊆ Γ , returns a Boolean value as follows: ΩTs ,Γ (C, D) = t if T |= C v D f otherwise An ABox query oracle ΩTa ,Γ is a function that, for each normalized EL ABox A with sig(A) ⊆ Γ , a concept C ∈ BC(Γ ), and an individual x occurring in A, returns a Boolean value as follows: ΩTa ,Γ (A, C, x) = t if hT , Ai |= C(x) f otherwise

Without any further qualification, the generic term oracle refers either to the subsumption of the ABox query oracle.

A subsumption oracle is able to answer subsumption queries between (possibly complex) EL concepts, and it is analogous to the concept satisfiability oracle we considered in [ 5 ]. Concept subsumption is available as a reasoning service in all DL reasoners known to us, so it provides us with a natural oracle query language. Our results from [ 5 ], however, suggest that the way in which the roles from Th are used in Tv must be restricted in import-by-query algorithms based on subsumption oracles; we summarize these results in Section 4.

As we show in Section 5, the use of ABox query oracles allows us to overcome these limitations. Intuitively, an ABox query accepts an ABox A containing only symbols in Γ and “completes it” with the (basic) concepts over Γ that are entailed by A ∪ Th. Roughly speaking, the ABox represents the information that has already been deduced; when “handed over” to the oracle, the oracle then completes the ABox with the relevant information entailed by Th.

An import-by-query (classification) algorithm checks the subsumption relations between the atomic concepts in sig(Tv) w.r.t. Tv ∪ Th by using an oracle. C and D such that {C, D} ⊆ sig(Tv) ∪ {>, ⊥} and Tv ∪ Th |= C v D. Definition 2. An import-by-query algorithm takes as input an EL TBox Tv and an oracle ΩTh,Γ with sig(Tv) ∩ sig(Th) ⊆ Γ , and it returns C v D for each 4

Import-by-Query and Subsumption Oracles

We next summarize our results from [ 5 ]. We first show that no general importby-query algorithm based on subsumption oracles exists.

Theorem 3. No import-by-query algorithm based on subsumption oracles exists, if Tv and Th are in EL, Γ contains only one atomic role, and Tv is local in Γ . Proof. Assume that an import-by-query algorithm exists that terminates on all inputs, and consider Tv as in (2) with Γ = {r}. Clearly, Tv is local w.r.t. Γ . Since the algorithm terminates, the number of questions posed to any subsumption oracle is bounded by some integer m, and the quantifier depth of each concept C passed to the oracle is bounded by an integer n, where both m and n depend only on Γ and Tv. Let Th1 and Th2 be as in (3) and (4), respectively.

Tv = {A v ∃r.A}

Th1 = ∅ Th2 = { ∃r. . . . ∃r .> v ⊥ } n|+ 1{ztim}es (2) (3) (4) For each C with sig(C) ⊆ Γ and quantifier depth at most n , Th1 |= C v ⊥ iff Th2 |= C v ⊥, so ΩTsh1,Γ (C, ⊥) = ΩTsh2,Γ (C, ⊥). When applied to Tv and ΩTsh1,Γ , the algorithm returns the same value as when applied to Tv and ΩTsh2,Γ . Since Tv ∪ Th1 6|= A v ⊥ but Tv ∪ Th2 |= A v ⊥, it does not satisfy Definition 2. tu

This negative result relies on the presence of an atomic role r in the reused signature Γ and the fact that r is used in Tv in a “cyclic” axiom A v ∃r.A. To circumvent this problem, we studied in [ 5 ] the case where Γ contains only atomic concepts and proposed an import-by-query algorithm that applies even when Tv and Th are given in a very expressive DL.

In [ 5 ] we also studied the case where both concepts and roles are imported, but with the condition that sig(C) ⊆ Γ for each concept C appearing in Tv in the scope of a quantifier over r from Γ . For example, if r ∈ Γ , then Tv is allowed contain the concept ∃r.A only if A ∈ Γ . The algorithm in [ 5 ] for this case applies to very expressive DLs. In our example, the conditions in [ 5 ] allow us to write (using δ1, δ2 and δ5) and AS Pat v CHD Pat (using δ1 and δ3). axioms δ1–δ5, which together with Th allow us to conclude VSD Pat v CHD Pat

In order to express axioms such as δ5–δ9, we next investigate ways of relaxing the conditions from [ 5 ]. In Section 5, we study the design of import-by-query to relax the syntactic restrictions from [ 5 ] when using subsumption oracles. 5

Import-by-Query with ABox Query Oracles

We next present the import-by-query algorithm ibqa(Tv, ΩTah,Γ ), which accepts a normalized E L TBox Tv and an ABox query oracle Ωa

Let sata(Tv, A, ΩTah,Γ ) be the same as sat(Tv, A Th,Γ s.t. Tv is safe for Γ .

) (see Section 2), with the difference that rule R3 shown Table 3 is applied alongside rules R1 and R2 from the result of transforming Th into normal form. The following properties hold: D(xC ) ∈ A el implies D(xC ) ∈ A .

a – Soundness: A a

⊆ Ael; – Completeness: For each C ∈ sig(Tv) and D ∈ BC(sig(Tv)), we have that Proof (Soundness). Let A0, . . . , An with An = A We show by induction on the rule applications that Ai ⊆ A a be a run of sata(Tv, ∅, ΩTah,Γ ).

el for each 0 ≤ i ≤ n. sat(Th0, Ai−1|Γ ) ⊆ Ael, which implies our claim.

The induction base (i = 0) follows from the initialization conditions of both algorithms. Assume now that Ai−1 ⊆ A

el and let Ai be obtained from Ai−1 by a rule application. If either R1 or R2 is applied, property P2 from Section 2 implies that Ai ⊆ Ael. Assume that R3 extends Ai−1 with D(xC ). Then, C ∈ Γ ; D ∈ BC(Γ ); and, by the induction hypothesis, Ai−1|Γ ⊆ Ael. Also, ΩTah,Γ (Ai−1|Γ , D, xC ) = t, which implies Th0 ∪ Ai−1|Γ |= D(xC ). But then, property P1 from Section 2 implies that D(xC ) ∈ sat(Th0, Ai−1|Γ ). Since Ai−1|Γ ⊆ Ael, by P2 we have that tu Proof (Completeness). Let A0, . . . , An with An = A For Aj occurring in the run and Σ a signature, let el be a run of sat(Tv ∪ Th0, ∅).

Aj kΣ = {D(xC ) ∈ Aj | {C, D} ⊆ BC(Σ)} ∪ {r(xC , xD) ∈ Aj | {r, C, D} ⊆ Σ}. Completeness follows immediately from statement 2 of the following claim. CLAIM: The following conditions hold for each Aj with 1 ≤ j ≤ n: 1. Aj ⊆ A ∪ sat(Th0, Aa|Γ ).

a 4. If r(xC , xD) ∈ Aj , then (b) C ∈ sig(Tv) \ Γ implies D ∈ sig(Tv). (a) C ∈ sig(Th0) implies D ∈ BC(sig(Th0)), and

a 2. Aj ksig(Tv) ⊆ A . 3. If D(xC ) ∈ Aj , then (a) r ∈ sig(Tv) \ Γ implies C ∈ sig(Tv) \ Γ , (b) r ∈ Γ and D ∈ sig(Tv) \ Γ imply C ∈ sig(Tv) \ Γ , and (c) r ∈ sig(Tv) and C ∈ sig(Tv) \ Γ imply D ∈ sig(Tv).

The proof is by induction on the rule applications. For the induction base (j = 0), we have A0 = {C(xC ) | C ∈ sig(Tv ∪ Th0)}. Statements 1–3 hold straightforwardly, and statement 4 is vacuously true. Assume now that statements 1–4 hold for Aj−1 and consider an application of a rule that derives Aj .

By the induction hypothesis, ∃r.D(xC ) ∈ A

Assume that R1 is applied to ∃r.D(xC ) ∈ Aa j−or1 a∃nrd.Dth(xaCt )it∈desraitv(eTsh0r, (AxC| Γ,x).DI)f. a ∃r.D(xC ) ∈ Aa, then r(xC , xD) ∈ A a because R1 is not applicable to Aa, so a a statement 1 holds. If ∃r.D(xC ) ∈ sat(Th0, A |Γ ), then r(xC , xD) ∈ sat(Th0, A |Γ ) a because R1 is not applicable to sat(Th0, A |Γ ), so statement 1 also holds. If C, D, and r are all in sig(Tv), by the induction hypothesis ∃r.D(xC ) ∈ Aa; since R1 is not applicable to A , we have r(xC , xD) ∈ Aa, so statement 2 holds. Statement a 3 holds vacuously. Note that the contrapositive of statement 3a is as follows (*): Assume now that R2 is applied to an axiom α ∈ Tv ∪ Th0 of the form implies statements 4a and 4b. Finally, statement 3b implies statement 4c. r ∈ sig(Tv) \ Γ or D ∈ sig(Tv) \ Γ imply C ∈ sig(Tv) \ Γ . Now (*) immediately A1 u . . . u Ak u ∃r1.B1 u . . . u ∃rm.Bm v D and that it derives D(xC ). Then, individuals yE1 , . . . , yEm in Aj−1 exist such that Ai(xC ) ∈ Aj−1 for each 1 ≤ i ≤ k, and {r`(xC , yE` ), B`(yE` )} ⊆ Aj−1 for each 1 ≤ ` ≤ m. Statement 4 holds vacuously, so we next prove 1–3, depending on whether α belongs to Th0 or Tv. – α ∈ Th0. Statement 3 holds trivially. Consider each assertion Ai(xC ) with 1 ≤ i ≤ n. By the induction hypothesis, we have Ai(xC ) ∈ Aa ∪sat(Th0, A |Γ ). a If Ai(xC ) ∈ Aa, since α ∈ Th0, we have Ai ∈ Γ , so Ai(xC ) ∈ A |Γ . Thus, we a a have Ai(xC ) ∈ sat(Th0, A |Γ ). By a completely analogous argument, we have a {r`(xC , yE` ), B`(yE` )} ⊆ sat(Th0, A |Γ ) for each 1 ≤ ` ≤ m as well. But then, since R2 is not applicable to sat(Th

a a 0, A |Γ ), we have D(xC ) ∈ sat(Th0, A |Γ ), so statement 1 holds. Statement 2 is vacuously true if C 6∈ Γ or D 6∈ BC(Γ ). a Otherwise, by statement 1 and property P1, we have Th0 ∪ A |Γ | since R3 is not applicable to Aa, we have D(xC ) ∈ Aa, so statement 2 holds. = D(xC ); have C ∈ sig(Tv) \ Γ . – α ∈ Tv. Note that Tv is safe for Γ , so we have these possibilities: • Ai ∈/ Γ for some 1 ≤ i ≤ k. By the contrapositive of statement 3a, we • B` ∈/ Γ and r` ∈ Γ for some 1 ≤ ` ≤ • r` 6∈ Γ for some 1 ≤ ` ≤ m. By statement 4a, we have C ∈ sig(Tv) \ Γ . Since C statement 3a, yE` ∈ sig(Tv)\Γ ; however, by statement 4b, C ∈ sig(Tv)\Γ .

∈ sig(Tv) \ Γ , statement 4c implies E` ∈ sig(Tv). But then, by statement 2, all Ai(xC ), r`(xC , yE` ), and B`(yE` ) are in Aa. Since R2 is not applicable to A , D(xC ) ∈ Aa, which implies statements 1 and 2. Statements a 3a and 3b respectively hold since C 6∈ sig(Th0) and D ∈ BC(sig(Tv)). m. By the contrapositive of and D from sig(T ) ∪ {>, ⊥} such that D(xC ) ∈ sata(T , ∅, ΩTah,Γ ).

The algorithm ibqa(Tv, ΩTah,Γ ) returns C v D for each pair of concepts C with a polynomial number of calls to ΩTah,Γ .

Theorem 5. The algorithm ibqa(Tv, ΩTah,Γ ) is an import-by-query algorithm, and it can be implemented such that it runs in time polynomial in the size of Tv

Since no rule removes assertions from A

Proof. The size of BC(sig(Tv)) is quadratic in the size of Tv. Each application of R1, R2, or R3 adds an assertion of the form C(a) or r(a, b) for C ∈ BC(sig(Tv)). , the total number of rule applications is polynomial. Thus, the algorithm can be implemented such that it runs in polynomial time with a polynomial number of calls to the oracle. Since Th0 is obtained from Th through normalization, for all concepts C and D with {C, D} ⊆ sig(Th) ∪ sig(Tv) ∪ {>, ⊥} we have Tv ∪ Th |= C v D iff Tv ∪ Th0 |= C v D. Let A a and A el be as in Lemma 4. If ibqa(Tv, Ωa a

Th,Γ ) returns C v D, then D(xC ) ∈ A ; direction holds analogously by the completeness property of Lemma 4. the soundness property of Lemma 4 implies D(xC ) ∈ Ael; thus, el(Tv ∪ Th0) returns C v D; hence, Tv ∪ Th0 |= C v D; finally, Tv ∪ Th |= C v D. The converse tu 6

Weakly Acyclic Knowledge Bases

We next present an import-by-query algorithm based on subsumption oracles where Tv needs to satisfy weaker conditions than those in [ 5 ]. Our results rely on the notion of weak acyclicity borrowed from database dependency theory [ 9 ]. This condition prohibits cyclic existential quantification over the roles imported (using δ4–δ6) and EA Pat v TVD Pat (using δ4–δ7 and δ9). from Th, which invalidates the proof of Theorem 3. In our example, weak acyclicity allow us to represent all axioms δ1–δ9 and hence to entail EA Pat v CHD Pat 6.1

Weak Acyclicity Let [A] = A for A an atomic concept, and let [∃r.D] = r. edges for each axiom C1 u . . . u Cn v D in T and each 1 ≤ i ≤ n: Definition 6. Let Γ be a signature such that {>, ⊥} ⊆ Γ , and let T be an E L TBox in normal form. The dependency graph for T w.r.t. Γ is the smallest graph that contains a node wX for each symbol X ∈ sig(T ) and the following – an unlabeled edge hw[Ci], w[D]i, and – another edge hw[Ci], w[E]i labeled with ? if D of the form ∃r.E with r ∈ Γ . w.r.t. Γ , all edges are unlabeled.

T is weakly acyclic w.r.t. Γ if, in each cycle in the dependency graph for T

It can be readily checked using Definition 6 that our example TBox Tv in

Let Γ be a signature and A an ABox occurring in a run of sata(Tv, ∅, ΩTah,Γ ). Then A contains a harmful cycle for Γ if assertions ri(xCi−1 , xCi ) ∈ A with 1 ≤ i ≤ n exist such that xCn = xC0 , Ci ∈ sig(Tv) for each 0 ≤ i ≤ n, and ri ∈ Γ for each 1 ≤ i ≤ n. As we show next, if Tv is weakly acyclic w.r.t. Γ , then no ABox occurring in a run of sat(Tv, ∅, ΩTah,Γ ) contains a cycle harmful for Γ . Lemma 7. Let Tv be weakly acyclic w.r.t. Γ , and let Aa = sata(Tv, ∅, ΩTah,Γ ). Then, Aa does not contain a cycle that is harmful for Γ .

Proof (Sketch). Let G be the dependency graph for Tv w.r.t. Γ , and A0, . . . , An a run of sata(Tv, ∅, Ωa

Th,Γ ) such that An = Aa. The following claim can be straightforwardly shown by induction on the number of rule applications: CLAIM: The following properties hold for each Ai with 1 ≤ i ≤ n: 1. if D(xC ) ∈ Ai and {C, D} ⊆ BC(sig(Tv) \ Γ ), then w[D] is reachable in G from w[C]; in addition, if D is of the form ∃r.E, then w[E] is reachable in G from w[C] via a path containing a labeled edge; 2. if r(xC , xD) ∈ Ai, {C, D} ⊆ sig(Tv) \ Γ , and r ∈ Γ , then w[D] is reachable from w[C] via a path containing a labeled edge.

The lemma follows from statement 2 and the fact that Tv is weakly acyclic. ut 6.2

An Import-by-Query Algorithm The algorithm ibqs(Tv, ΩTsh,Γ ) accepts a normalized EL TBox Tv that is safe and weakly ac)ywcliitchfothreΓd, iaffnedreannceotrhacalterΩulTseh,RΓ 3.’Lient Tsaatbsl(eTv4, Ais ,aΩpTpahli,Γed) ableonthgesidsaemruelaess sat(Tv, A R1 and R2. For Σ a signature and A an ABox not containing cycles harmful for Σ, the concept con(xC , A, Σ) used in rule R3’ is inductively defined as follows: con(xC , A, Σ) = D(xC)∈A|Σ

d  D(xC)∈A|Σ  d

D D

if C ∈ Σ u

d r(xC,yE)∈A|Σ ∃r.con(yE, A, Σ) otherwise Since A does not contain cycles harmful for Σ, induction is well-founded. Intuitively, con(xC , A, Σ) represents the “rolled-up” part of A|Σ reachable from xC . The algorithm ibqs(Tv, ΩTsh,Γ ) then returns C v D for each pair of concepts C and D such that {C, D} ⊆ sig(T ) ∪ {>, ⊥} and D(xC ) ∈ sats(T , ∅, ΩTsh,Γ ). Lemma 8. For each signature Γ , each EL TBox Tv that is safe and weakly acyclic for Γ , and each EL TBox Th such that sig(Tv) ∩ sig(Th) ⊆ Γ , we have sats(Tv, ∅, ΩTsh,Γ ) = sata(Tv, ∅, ΩTah,Γ ).

Proof (Sketch). The lemma is a consequence of the following claim together with the fact that both algorithms are identical except for the rule R3’. individual x in A, and concept D ∈ BC(Γ ), we have CLAIM: For each signature Γ , ABox A not containing cycles harmful for Γ ,

Th ∪ A|Γ |= D(x) if and only if Th |= con(x, A, Γ ) v D.

To prove this claim, let Qx be a fresh concept uniquely associated with each individual x in A|Γ . Let T 0 be a TBox containing the axiom QxC v D for each D(xC ) ∈ A|Γ , and QxD v ∃r.QyE for each r(xD, yE) ∈ A|Γ with D ∈/ Γ . It is easy to show that Th ∪ A|Γ |

= D(x) iff D(x) ∈ el(Th0 ∪ T 0), for Th0 the normalized unfolding the relevant concepts Qy with their definitions in T 0. version of Th. Since A|Γ does not contain cycles harmful for Γ , by induction on the structure of con(x, A, Γ ), we have that con(x, A, Γ ) is obtained from Qx by tu a polynomial number of calls to ΩTsh,Γ .

Theorem 9. The algorithm ibqs(Tv, Ωs

Th,Γ ) is an import-by-query algorithm and it can be implemented such that it runs in time exponential in the size of Tv with Proof. The algorithm is an import-by-query algorithm by Lemma 8 and Theorem 5. Finally, just like in the proof of Theorem 5, rules R1, R2, and R3’ can only be applied a polynomial number of times. The inductive definition of con(x, A, Γ ), however, involves constructing a tree of polynomial depth and branching factor, the size of which can be at most exponential in the size of A|Γ . tu 7

Related and Future Work

equal to Th; thus, publishing Th In a P2P setting, [ 4 ] consider the problem of answering a query q over KBs Kv and Kh and mappings M by reformulating q as queries that can be evaluated over Kv and Kh in isolation. The query reformulation algorithm accesses only Kv and M; thus, q can be answered using an oracle for Kh. In this setting, however, a satisfiable Kh cannot affect the subsumption of concepts in Kv (see [ 5 ] for an example); thus, the results in [ 4 ] are not applicable to schema reasoning. Th Γ such that sig(ThΓ ) ⊆ Γ , Th |= Th

Γ and Th | in Γ is to publish a uniform interpolant Th Γ

Another possibility for the owner of Th ΓtoofreTshtriwct.r.atc.cΓess[1o1n] —lyatno osnytmolbooglys = C v D iff Th |= C v D for all concepts C and D with sig(C) ∪ sig(D) ⊆ Γ . Publishing ThΓ , however, may reveal more information about Th than what is strictly needed. For example, for Γ = {r, B}, Tv = {A v ∃r.B} and Th = {∃r.∃r.B v B}, the interpolant Th Γ is Γ reveals entire contents of the hidden ontology Th. In contrast, an import-by-query algorithm reveals only that Th 6|= ∃r.B v ⊥ and Th 6|= ∃r.B v B; hence it does not reveal the actual syntactic structure of Th. Thus, enabling the reuse of Th through import-by-query preserves the hidden content of Th to a higher degree than if reuse were enabled by publishing ThΓ .

We are currently working on extending the results presented in this paper to DLs more expressive than EL.

[1] Lutz , C. , Walther , D. , Wolter , F. : Conservative Extensions in Expressive Description Logics . In: Proc. IJCAI . ( 2007 ) 453 - 458

[2]

Cuenca

Grau , B. , Horrocks , I. , Kazakov , Y. , Sattler , U. : Modular Reuse of Ontologies: Theory and Practice . Journal of Artificial Intelligence Research 31 ( 2008 ) 273 - 318

[3] Lenzerini , M. : Data Integration: A Theoretical Perspective . In: Proc. PODS . ( 2002 ) 233 - 246

[4] Calvanese , D. , De Giacomo , G. , Lembo , D. , Lenzerini , M. , Rosati , R.: What to Ask to a Peer: Ontolgoy-based Query Reformulation . In: Proc. KR . ( 2004 ) 469 - 478

[5]

Cuenca

Grau , B. , Motik , B. , Kazakov , Y. : Import-by- Query : Ontology Reasoning under Access Limitations . In: Proc. IJCAI , AAAI Press ( 2009 ) To Appear.

[6] Baader , F. , Brandt , S. , Lutz , C. : Pushing the EL Envelope . In: Proc. IJCAI ,. ( 2005 ) 364 - 369

[7] Kutz , O. , Horrocks , I. , Sattler , U. : The Even More Irresistible SROIQ . In: Proc. KR . ( 2006 ) 68 - 78

[8] Horrocks , I. , Kutz , O. , Sattler , U. : The irresistible SRIQ . In: Proc. of OWLED . ( 2005 )

[9] Kolaitis , P.G. , Panttaja , J. , Tan , W.C. : The complexity of data exchange . In: PODS . ( 2006 ) 30 - 39

[10] Motik , B. , Horrocks , I. : Individual Reuse in Description Logic Reasoning . In: Proc. of IJCAR 2008 . Volume 5195 of LNAI ., Springer ( 2008 ) 242 - 258

[11] Konev , B. , Walter , D. , Wolter , F. : Forgetting and uniform interpolation in largescale description logic terminologies . In: Proc. IJCAI , AAAI Press ( 2009 ) To Appear.