Introduction

Effective Query Rewriting with Ontologies over DBoxes (extended abstract)?

0 Free University of Bozen-Bolzano , Italy

We consider query answering on Description Logic (DL) ontologies with DBoxes, where a DBox is a set of assertions on individuals involving atomic concepts and roles called DBox predicates. The extension of a DBox predicate is exactly defined in every interpretation by the contents of the DBox, i.e., a DBox faithfully represents a database whose table names are the DBox predicates and the tuples are the DBox assertions. Our goals are (i) to find out whether the answers to a given query are solely determined by the DBox predicates and, if so, (ii) to find a rewriting of the query in terms of them. The resulting query can then be efficiently evaluated using standard database technology. We have that (i) can be reduced to entailment checking and (ii) can be reduced to finding an interpolant. We present a procedure for computing interpolants in the DL ALC with general TBoxes. We extend the procedure with standard tableau optimisations, and we discuss abduction as a technique for amending ontologies to gain definability of queries of interest.

Introduction

We address the problem of concept query answering on databases with ontologies. An ontology provides a conceptual view of the database and it is composed by constraints on a vocabulary extending the basic vocabulary (tables and attributes) of the data. Querying a database using the terms in such a richer ontology allows for more flexibility than using only the basic vocabulary of the relational database directly.

Description Logics [ 1 ] (DLs) are a prominent formalism for representing ontologies. However, DLs such as ALC or OWL or DL-lite were not originally developed with this use case in mind. In particular, the data in a DL knowledge base is represented using an ABox, which can be seen as an incomplete database. This means the extensions of the concepts and roles contain at least the data mentioned in the ABox, but may contain additional data, and this may vary among the models of the knowledge base. This is in contrast to relational ? This paper is an excerpt from the IJCAI-09 paper “Effective Query Rewriting with Ontologies over DBoxes” by the same authors. We wish to thank Alex Borgida, Tommaso Di Noia, Umberto Straccia and David Toman with whom we are studying a more general framework on query rewriting based on Beth definability and abduction, and the anonymous reviewers for insightful comments. The work presented in this paper has been partially funded by the European project ONTORULE. databases, which are complete: the extensions of the predicates (i.e., the tables) contain exactly the data in the database and nothing more.

We introduce in this paper the notion of a DBox, which is syntactically similar to an ABox; it is a set of ground atomic concept and role assertions. However, semantically it behaves like a database, i.e., the extensions of the concepts and roles mentioned in the DBox are exactly defined by the contents of the DBox. We call the concepts and roles appearing in the DBox the DBox predicates. Observe that the DBox predicates are closed, i.e., their extensions are the same in every interpretation, whereas the other predicates in the knowledge base are open, i.e., their extensions may vary among different interpretations. This has been called also locally closed world [ 2 ] or exact views [ 3,4,5 ].

The queries we consider in this paper are concept expressions (and the answers are their instances), and the ontologies are general ALC TBoxes. Our goals are (i) to check whether the answers to a given query under a TBox are solely determined by the extension of the DBox predicates and, if so, (ii) to find an equivalent rewriting of the query in terms of the DBox predicates to allow the use of standard database technology for answering the query. This means we benefit from the low computational complexity in the size of the data of answering first-order queries on relational databases. In addition, we would like as much as possible to use standard techniques for DL reasoning to find rewritings. As was pointed out recently also by [ 6 ], (i) corresponds to implicit definability [ 7 ], and can be reduced to checking entailment and (ii) corresponds to explicit definability [ 8 ]. Inspired by the results of [ 9 ], this problem can be reduced to finding an interpolant.

Our contributions in this paper are as follows. We introduce the notion of DBoxes as a faithful encoding of databases, and show how to find a query rewriting over DBoxes, using Beth definability and interpolation; we present a procedure for calculating interpolants of ALC concepts under general TBoxes, using an adaptation of standard DL tableau techniques; we extend the procedure to deal with common tableau optimisation techniques found in implemented systems, in particular, non-atomic closure, semantic branching, lazy unfolding and absorption, and backjumping; and we show how abduction techniques can be used for amending TBoxes to gain definability of queries of interest. 2

ALC and DBoxes

ALC TBoxes. Let NC and NR be countably infinite sets of concept and role names, respectively. The set of ALC concepts is the smallest set that includes the atomic concepts NC [ f>g, and if C, D are concepts and R 2 NR, then :C, C u D, and 8R:C are concepts. As usual, ?, C t D, and 9R:C are short for :>, :(:C u :D), and :8R::C, respectively. An ALC TBox T is a finite set of axioms of the form C v D, where C and D are ALC concepts.

An interpretation I is a pair h I ; I i where I is a non-empty set and I is a function that maps concept names A to subsets AI of I and role names R to binary relations RI on I ; I extends to concepts as follows: >I = I ; (:C)I = I n CI ; (C u D)I = CI \ DI ; and (8R:C)I = f 2 I j 8 0 (h ; 0i 2 RI ! 0 2 CI )g.

I is a model of a TBox T , written I j= T , iff CI DI for all axioms C v D 2 T . Since C v D is equivalent to > v :C t D, we assume that all axioms in T are of this form. We can thus view the TBox as a set of universal concepts fC j > v C 2 T g.

A concept C is satisfiable w.r.t. T iff there is some model I of T such that CI 6= ;; D subsumes C w.r.t. T , written C vT D, iff CI DI for all models I of T . We write C T D if CI DI and DI CI .

For ease of presentation, we assume all concepts to be in negation normal form (NNF), which means the negation signs appear only in front of atomic concepts. The negation normal form of the complement of a concept C is written :_C. With sig(C) (resp. sig(T )) we denote the set of all concept and role names occurring in C (resp. T ), i.e., the signature of C (resp. T ). With sig(C; T ) we denote the set of all concept and role names occurring in C or T .

DBoxes. Let NI be a countably infinite set of individual names. A DBox D is a set of assertions of the forms A(a) and R(a; b), where A 2 D (C), R 2 D (R), and a; b 2 D (I). The signature D of D consists of the DBox concepts D (C) NC , the DBox roles D (R) NR, and the DBox individuals

D (I) NI ; we call DBox predicates the set D (P ) = D (C)[ D (R). The active domain of D is the set of individuals in D (I) appearing in the assertions.

An interpretation I = h I ; I i is a model of D , written I j= D , iff aI = a for every DBox individual a 2 D (I), and for every concept (resp., role) name P in D (P ) and every u 2 I (resp., (u; v) 2 I I ), we have that u 2 P I iff P (u) 2 D (resp., (u; v) 2 P I iff P (u; v) 2 D ). In other words, in every model of D the extensions of the DBox predicates are given by the contents of the DBox, and are the same in every model. Please note that the domain I of a model of D is not fixed, but it includes all the DBox individuals in D (I), which in turn includes the active domain of D .

A DBox D is satisfiable with respect to a TBox T iff there is a model of D that is also a model of T . An interpretation I is a model of C(a), where C is a concept and a an individual name, written I j= C(a), iff aI 2 CI . A pair (T ; D ) entails a membership expression C(a) – this is the instance checking problem, or concept querying problem – written (T ; D ) j= C(a), iff for every model I of T and D it holds that I is a model of C(a). Finally, D entails C(a), written D j= C(a), iff (;; D ) j= C(a).

The unique names assumption (UNA) holds for a DL L if for any interpretation I considered in L , aI 6= bI whenever a 6= b, for any two a; b 2 NI . If we assume UNA, reasoning in a DL with DBoxes can be reduced to reasoning in the same DL extended with nominals – i.e., concept expressions of the form a I = faI g – and vice versa. fag, where a is an individual name, such that f g Proposition 1. Given a description logic L with UNA that includes ALC, for every DBox D and TBox T in L there exists a TBox T D in L extended with nominals with size polynomially bounded by the size of T [ D , such that D is satisfiable with respect to T iff T D is satisfiable, and vice versa. Proof (sketch). ()) T D is obtained from T by adding completion and closure axioms. For every DBox concept A add the axiom A fa1g t t fang, such that A(a1); : : : ; A(an) are the assertions involving A in D . For every DBox role R add the axioms faig v 9R:fbj g, faig v 8R:fb1g t t fbng, and 9R:> v fa1g t t famg, such that R(ai; bj ) for i = 1; ; m and j = 1; ; n are the assertions involving R in D . (() For each occurrence of fag in T D , the assertion Aa(a) is added to D , where Aa is a new concept name; each occurrence of fag in T D is replaced with Aa. Please note that this reduction encodes the entire DBox into a TBox, and that, although concept querying in ALCO has the same combined complexity as in ALC, if we consider more expressive logics such as SHIQ we do get an increase in combined complexity when nominals are added (from ExpTime to NExpTime). It can also be shown that the data complexity of query answering with DBoxes (i.e., with closed data predicates) is at least as hard as query answering with ABoxes (i.e., with open data predicates), and that it is strictly harder if we consider conjunctive query answering in simple description logics such as DL-lite (it increases from LogSpace to coNP-hard, which can be shown by a reduction from non-existence of 3-colourings). 3

Definability and Query Answering

We introduce in this Section implicit and explicit definability for queries, and discuss how explicit definitions can be used for concept query rewriting. In the following, given a TBox T and a DBox D , let Q be a query concept (i.e., a query asking for all the instances of the concept) and let sig(Q; T ) = fB1; : : : ; Bm; D1; : : : ; Dng be the combined signature where fD1; : : : ; Dng D (P ) and fB1; : : : ; Bmg \

D (P ) = ;; fD1; : : : ; Dng is the set of DBox predicates appearing in T or in Q. Definition 1 (Implicit definability). Let a concept '0 (resp. a TBox T 0) be like ' (resp. T ) but with occurrences of B1; : : : ; Bm replaced by distinct occurrences of B10; : : : ; Bm0 62 sig(Q; T ). Then, Q is implicitly definable from D1; : : : ; Dn in T iff Q T [T 0 Q0.

In other words, given a TBox, a concept Q is implicitly definable if the set of all its instances depends only on the extension of the DBox predicates. This means that it may be possible to find an expression using only predicates in the DBox whose instances are the same as in the original concept: this would be its explicit definition.

Definition 2 (Explicit definability). Q is explicitly definable from D1; : : : ; Dn in T iff there is some concept C such that Q T C and sig(C) fD1; : : : ; Dng. C vT [T 0 Q0.

Clearly, explicit definability implies implicit definability. [ 8 ] shows that the converse also holds for the case of first-order logic: if Q is implicitly definable from D1; : : : ; Dn in T , then it is explicitly definable. To prove this property for ALC with general TBoxes, we exploit interpolation.

Lemma 1 (Interpolation for ALC with TBoxes). Let Q be an ALC concept and let T be an ALC TBox such that Q vT [T 0 Q0, where Q0 and T 0 are obtained as above. Then, there exists some concept C, called the interpolant of Q and Q0 under T and T 0, such that sig(C) sig(Q; T ) \ sig(Q0; T 0), Q vT [T 0 C, and Theorem 1 (Beth Definability for ALC with TBoxes). If Q is implicitly definable from D1; : : : ; Dn in T then Q is explicitly definable from D1; : : : ; Dn in T .

The proof of Beth definability for ALC with general TBoxes is constructive, provided we have a constructive method of finding interpolants as defined in Lemma 1. In Section 4 we present such a method, and we prove its correctness and completeness.

We can now combine these results to reduce answering definable queries to query answering using only the DBox.

Theorem 2. Let D be a DBox that is satisfiable with respect to a TBox T , let Q be a concept and let a 2 D (I) be a DBox individual. If Q is implicitly definable from D (P ), then there is an ALC concept C with sig(C) D (P ) such that (T ; D) j= Q(a) iff D j= C(a).

To enable rewriting C(a) to standard database query languages we first need to prove its domain independence.

Definition 3 (Domain independence). Let C be a concept and let a be an individual name. C(a) is said to be domain independent if for every interpretation I = h I ; I i such that I j= C(a) it is the case that I0 j= C(a) for every interpretation I0 = h I0 ; I0 i with I0 I and I = I0 .

Theorem 3. Let C be an ALC concept and let a be an individual name. Then, C(a) is domain independent.

Consider a query C(a) with a 2 D (I) and sig(C) D (P ). Since C(a) is domain independent, deciding D j= C(a) can be reduced to checking aI 2 CI for an arbitrary model I of D; in particular we choose the smallest model ID = h ID ; ID i of D, where ID is equal to the set of the DBox individuals in D (I) and ID considers only the set of DBox assertions. In other words, the entailment check reduces to model checking over ID and a standard database query language over the DBox database can be used for deciding D j= C(a).

Given a query concept Q, a TBox T , and a DBox D, in order to find the answer set fa j (T ; D) j= Q(a)g by means of the rewriting fa j D j= C(a)g, one can use the equivalent database query Q(x) :- CFO(x) ^ >D (x) over the database D, where CFO is the standard first order translation of C and >D is the relation containing exactly the DBox individuals D(I). It is possible to precompute offline the rewriting of all definable atomic concepts in the ontology as materialised SQL views, so that at run time query answering on ALC ontologies with DBoxes is reduced to answering SQL queries over the DBox database with materialised views. 4

Constructive Interpolation for ALC

[ 10 ] and [ 11 ] give constructive proofs using symmetric Gentzen systems, respectively tableau calculi. [ 12 ] provides an algorithm for calculating interpolants between ALC concepts; however, he does not consider TBoxes, nor does he provide a reference with standard or optimised implemented tableau algorithms. (x : C1 u C2) 2 L(g) and f(x : C1) ; (x : C2) g 6 L(g) L(g0) = L(g) [ f(x : C1) ; (x : C2) g (x : C1 t C2) 2 L(g) and f(x : C1) ; (x : C2) g \ L(g) = ; L(g0) = L(g) [ f(x : C1) g, L(g00) = L(g) [ f(x : C2) g (x : 9R:C) 2 L(g), there is no variable y in L(g) such that y : fC g [ succ(x; R; S) L(g)

L(g0) = L(g) [ y : fC g [ succ(x; R; S) [ T , where y is fresh for S The Ru rule Condition: Effect: The Rt rule Condition: Effect: The R9 rule Condition:

Let NV be a countably infinite set of variable names and < a well-order relation on NV . A biased constraint is an expression of the form (x : D) where x 2 NV , D is a concept, and 2 fl; rg is a bias. A biased constraint system S for hC; T i is a finite, non-empty set of biased constraints.

We say a variable x 2 NV is in S if S contains a mention of x; x is fresh for S if x is not in S and y < x for all y in S. We assume that when a variable x is in S, the constraints (x : >)l, (x : >)r are also in S.

Let x be a variable in S. The set of x-constraints of S is defined as fx : C 2 Sg. For a role name R, with succ(x; R; S) we denote the set fC j (x : 8R:C) 2 Sg. If X is a set of labelled concepts, then x : X is a shorthand for f(x : C) j C 2 Xg.

S is said to contain a clash if for some variable x and some concept name A, f(x : A) ; (x : :A) g S.

Let H, J be ALC concepts and T = T l [ T r a set of biased concepts with T l being l- and T r being r-labelled concept. A biased tableau for hH u :_J; T i is a triple T = hV; E; Li, where hV; Ei is a finite tree, with g0 2 V being the root node, and L is a labelling function associating with each node g 2 V a biased constraint system for hH u :_J; T i and with each edge hg; g0i 2 E a biased completion rule R from Figure 1. In addition, we require L(g0) = x0 : fHlg [ f( :_J )rg [ T [ T 0. A biased tableau T0 for hH u :_J; T i that contains only the root node is called the initial biased tableau for hH u :_J; T i. A completion rule is applicable in a node g if its condition is satisfied in g.

A branch of a tableau is a path from the root down to a leaf. If in a tableau there is some successor g0 of g such that L(hg; g0i) = Rt, then g is called a branching point in the tableau. A branch is closed if the label of its leaf node contains a clash; otherwise it is open. A tableau is closed if all its branches are closed; otherwise it is open. A tableau is complete if no completion rule is applicable in the leaf nodes of any of its open branches.

The biased tableau algorithm takes as input the initial biased tableau T0 for hH u :_ J; T i. T0 is then expanded by repeatedly applying the completion rules given in Figure 1 with the following strategy: R9 is applied only when Rt is not applicable which in turn is applied only when Ru is not applicable. If the obtained complete tableau is closed, the algorithm returns false, otherwise true.

If the biased tableau algorithm returns false, then a second phase is initiated to extract an interpolant I.

C:(ll) (x : A)l; (x : :A)l 2 L(g)

int(g)(x) := ? (x : A)r; (x : :A)r 2 L(g) C:(rr) int(g)(x) := > C:(lr) (x : A)l; (x : :A)r 2 L(g)

int(g)(x) := A C:(rl) (x : A)r; (x : :A)l 2 L(g)

int(g)(x) := :A Definition 4. Let g be a node in a tableau and let f(x : C1)l; : : : ; (x : Cn)l; (x : D1)r; : : : ; (x : Dm)rg be the x-constraints of L(g). A concept I is called an interpolant for this set under T if I is an interpolant for C1 u : : : u Cn and :D1 t : : : t :Dm under T l and T r. (Take the empty intersection to be > and the empty union to be ?.)

Let C and D be concepts. We define the binary infix operator t as follows: C t D = C tD, C t ? = ? t C = C, C t null = null t C = C, and null t null = null. If ht1 and ht2 are hash tables, ht0 = ht1 t ht2 is defined as: ht0(x) = ht1(x) t ht2(x), for every variable x. The operator u is defined analogously.

Interpolants at nodes g are stored in hash tables int(g). Every key in int(g) is a variable in L(g) and every value is a concept or the special symbol null. With int(g)(x) we denote the value associated with the key x. The algorithm used in the second phase starts from the leaves of the tableau tree and applies the interpolant calculation rules C ( ) in Figures 2, 3, and 4, which are if-then rules. If the rules do not assign a value, it is assumed to be null. Lemma 2 (Soundness). Let T be a closed biased tableau for hH u :_J; T i, g a node in the tableau, and x a variable in L(g). If int(g)(x) is not null, it is an interpolant for the x-constraints of L(g).

Lemma 3 (Completeness). Let T be a closed biased tableau for hH u :_J; T i with root node g0. Then, int(g0)(x0) 6= null.

(x : 9R:C)l 2 L(g)

L(g0) = L(g) [ y : fClg [ X C9(l) int(g)(z) := int(g0)(z) (z 6= x & z 6= y) int(g)(y) := null I := int(g0)(y) int(g)(x) := int(g0)(x) [if I = null] int(g)(x) := int(g0)(x) t ? [if I = ?] int(g)(x) := int(g0)(x) t 9R:I [otherwise]

(x : 9R:C)r 2 L(g)

L(g0) = L(g) [ (y : fCrg [ X) C9(r) int(g)(z) := int(g0)(z) (z 6= x & z 6= y) int(g)(y) := null I := int(g0)(y) int(g)(x) := int(g0)(x) [if I = null] int(g)(x) := > [if I = >] int(g)(x) := int(g0)(x) t 8R:I [otherwise]

Finally, we remark that the interpolant calculation algorithm can be made more space-efficient by integrating it into the satisfiability checking algorithm (one-pass) and generating the tableau in a depth-first manner: interpolants are constructed for each subbranch while backtracking, so that the subbranch may be discarded from memory before exploring the next subbranch. Interpolants are combined at the branching points using the Ct( ) rule. Please note that, in this way, the final algorithm to compute the interpolant can be obtained by simply adorning a standard tableau algorithm for ALC. 5

Integrating Tableau Optimisations

In this section, we extend the tableau algorithm with commonly used optimisations, which can be applied to the one-pass algorithm sketched above, so that the process to compute the interpolant is just an adorned variant of the standard and optimised implemented tableau algorithm for ALC.

Non-atomic Closure. One can relax the definition of the closure condition for a tableau such that a constraint system S is said to contain a clash if for some variable x and some concept C, fx : C; x : :_Cg S. For example the unsatisfiability of the constraint system S fC u D; :_(C u D)g can be detected without branching for :_(C u D). Figure 5 depicts the calculation rules for non-atomic clashes, replacing the ones in Figure 2. Correctness is proved analogously to the atomic case.

The RU1 rule Condition: (x : A) (x : C)r 622LL((gg))), A

C is in TU (or TU0 ), and (x : C)l 62 L(g) (or Effect:

L(g0) = L(g) [ f(x : C)lg (or f(x : C)rg) Effect: L(g0) = L(g) [ f(x : C)lg (or f(x : C)rg) The RU2 rule Condition: (x : :A) 2 L(g), A C is in TU (or TU0 ), and (x : :_C)l 62 L(g) (or (x : :_C)r 62 L(g)) Effect: L(g0) = L(g) [ f(x : :_C)lg (or f(x : :_C)rg) The RU3 rule

(x : C)r 622LL((gg))), A v C is in TU (or TU0 ), and (x : C)l 62 L(g) (or Condition: (x : A) Semantic Branching. The standard definition of Rt (cf. Figure 1), which is based on syntactic branching, does not prevent the recurrence of unsatisfiable disjuncts in different branches of the tree. Semantic branching is a technique that allows the subtrees introduced by non-deterministic rules to be distinct in the sense that the satisfiability of a disjunct is searched only in a single subtree [ 13 ]. This requires a slight change in the Rt rule: the second effect becomes L(g00) = L(g) [ fx : :_C1; x : C2g, obtaining the new rule Rt0 . Replacing Rt with R0t in our tableau calculus requires modifying the third condition of Ct( ) in Figure 3 with L(g00) = L(g) [ f(x : :_C1) ; (x : C2) g. However, the way the interpolants are calculated does not change.

Lazy Unfolding and Absorption. Absorption is a TBox rewriting technique that has been developed to address the problem of the high level of non-determinism caused by general TBox axioms [ 13 ]. Being a rewriting technique, absorption of general axioms into simple axioms is performed before the satisfiability algorithm starts. For this reason, we are only concerned here with the structure of the output TBox. Given a general TBox T , the absorption process rewrites and partitions T into the general TBox TG and the unfoldable TBox TU [ 13 ] such that a concept is satisfiable w.r.t. T iff it is satisfiable w.r.t. TU [ TG.

The implicit definability problem takes as input two TBoxes T and T 0. Therefore, we assume w.l.o.g. that the absorption process is only applied to T , resulting in TU and TG, and TU0 and TG0 are obtained analogously to T 0. R9 is modified to use only TG and TG0 and, in addition, we need the rules in Figure 6 that handle unfoldable TBoxes. In the literature, these rules constitute what is known as the lazy unfolding optimisation [ 13 ].

To complement these rules we use the interpolant calculation rules given in Figure 7. One needs to verify the soundness of these rules. We show only CU1(l) for TU . Suppose X = f(x : C1)l; : : : ; (x : Cn)l; (x : D1)r; : : : ; (x : Dm)rg, g0 is the result of applying RU1 to g for A C 2 TU , and X [ f(x : A) ; (x : C)lg are the x-constraints of L(g0). Note that although may not be equal to l, we can assume w.l.o.g. that = l since A 2 sig(TU ). By our assumption, int(g0)(x) is an interpolant for the x-constraints of L(g0) and by Definition 4, A u C u C1 u : : : u Cn vT int(g0)(x) vT :D1 t : : : t :Dm. CU1 assigns int(g0)(x) to int(g)(x). We have that (i) A u C1 u : : : u Cn vT int(g0)(x) follows from CU1( ) CU2( ) CU3( )

(x : A) 2 L(g)

A C 2 TU (or TU0 ) L(g0) = L(g) [ f(x : C) g int(g) := int(g0) (x : :A) 2 L(g) (A C) 2 TU (or TU0 ) L(g0) = L(g) [ f(x : :_C) g int(g) := int(g0) (x : A) 2 L(g) (A v C) 2 TU (or TU0 ) L(g0) = L(g) [ f(x : C) g

int(g) := int(g0) (A u C u C1 u : : : u Cn)I = (A u C1 u : : : u Cn)I , for all models I of T , since A C 2 TU and (ii) int(g0)(x) vT :D1 t : : : t :Dm follows from the fact that L(g) has the same r-labelled x-constraints as L(g0).

Backjumping. Our tableau decision procedure for ALC, upon discovering a clash, backtracks to the last branching point to which Rt is applicable. Blindly progressing in this way may be very inefficient if the given concept causes a lot of branching but the source of unsatisfiability does not depend on this branching (cf. [ 13 ]). Backjumping addresses this problem by identifying the causes of clashes and it is one of the most effective optimisation techniques, along with absorption + lazy unfolding, to deal with non-determinism. It works by associating a constraint to a branching point in the tableau such that the introduction of the constraint depends on that branching point. Because of this, tableau rules are equipped with dependency propagating information [ 13 ]. When a clash fx : A; x : :Ag is discovered, the dependencies of both of these constraints are combined, and the algorithm backjumps to the deepest branching point in the set. As a consequence, backjumping is an optimisation that modifies the shape of the tableau generated by the satisfiability checking algorithm by eliminating redundant branches from the tree. For this reason, no modification is required in our interpolant calculation rules. 6

Abduction

In our framework, finding the rewriting of a query in terms of the DBox predicates is possible only if the query is implicitly definable. However, not always a given query concept is implicitly definable given a TBox. To gain definability of queries of interest, it is necessary to modify the ontology by adding axioms to the TBox such that they become implicitly definable. We propose to use techniques from the area of abduction to find such axioms.

[ 14,15 ] define abductive reasoning tasks in DLs. Among these, we consider TBox abduction problems.

Definition 5 (TBox Abduction Problem). Let T be a TBox, and let C and D be concepts such that both are satisfiable w.r.t. T and C 6vT D. A TBox abduction problem (TAP) is denoted by hT ; C; Di. A TBox TA is a solution to a TAP hT ; C; Di if T [ TA is satisfiable and C vT [TA D.

We look for solutions with particular syntactic shapes.

Definition 6 (Sub-concept Solution). Let P = hC; D; T i be a TAP. We say that a solution TA to P is a sub-concept solution to P iff TA is a set of axioms of the form E1 v :E2, where concepts E1; E2 are sub-formulae of C, D, or T .

Since the concepts in sub-concept solutions appear in the abduction problem, we conjecture that such amendment to the TBox are more easily understandable than amendments involving arbitrary concepts. Moreover, among sub-concept solutions we identify semantically minimal ones – i.e., the ones that minimally change the TBox in order to obtain definability of a query. With subSol(P) we denote the set of all sub-concept solutions to a TAP P and with M(T ) the set of all models of the TBox T .

Definition 7. Let P = hC; D; T i be a TAP, let be a preference order, and let TA; TB 2 subSol(P). Then TA TB iff M(T ) n M(TA) is a subset of M(T ) n M(TB). A sub-concept solution TA is minimal iff there is no other sub-concept solution TB such that TA TB.

The idea behind this minimality criteria is that it favours a solution TA over TB whenever TA has more common models with the original TBox T than TB. In a way, TA, when added to T , changes the models of T in a minimal way. Note that is a partial order since (M(T ) n M(TA)) (M(T ) n M(TB)) does not always hold for arbitrary two sub-concept solutions TA and TB 7

Conclusions

In this paper we considered the basic propositionally closed DL ALC. However, several languages currently in use, e.g., OWL, are based on more expressive DLs, with constructs such as inverse roles and qualified cardinality restrictions. For example, in the presence of inverse roles, standard DL tableau algorithms may propagate concepts back and forth between individuals. Therefore, proving interpolation constructively by induction over the tableau is problematic. But one can overcome this problem by adapting a calculus with analytic cut rules [ 16 ] for inverse roles. Techniques like algebraic reasoning seem promising in the case of qualified cardinality restrictions [ 17 ].

Another interesting research direction is to increase the expressiveness of the query language beyond concept expressions towards, e.g., conjunctive queries. Deciding implicit definability reduces to checking query containment under Description Logic constraints [ 18 ]. However, standard Description Logic tableau techniques are not easily extended in this direction. One would need to develop new interpolation calculation methods or use techniques for first-order logic (e.g., [ 9 ]).

1. Baader , F. , Calvanese , D. , McGuinness , D.L. , Nardi , D. , Patel-Schneider , P.F., eds.: The Description Logic Handbook: Theory, Implementation, and Applications . In Baader, F., Calvanese , D. , McGuinness , D.L. , Nardi , D. , Patel-Schneider , P.F., eds.: Description Logic Handbook, Cambridge University Press ( 2003 )

2. Etzioni , O. , Golden , K. , Weld , D.S. : Sound and efficient closed-world reasoning for planning . Artificial Intelligence 89 ( 1-2 ) ( 1997 ) 113 - 148

3. Abiteboul , S. , Duschka , O.M.: Complexity of answering queries using materialized views . In: Proc. PODS . ( 1998 ) 254 - 263

4. Segoufin , L. , Vianu , V. : Views and queries: determinacy and rewriting . In: Proc. PODS . ( 2005 ) 49 - 60

5. Nash , A. , Segoufin , L. , Vianu , V. : Determinacy and rewriting of conjunctive queries using views: A progress report . In: Proc. ICDT . ( 2007 ) 59 - 73

6. Marx , M. : Queries determined by views: pack your views . In: Proc. PODS . ( 2007 ) 23 - 30

7. Tarski , A. : Some methodological investigations on the definability of concepts . In: Logic, Semantics and Metamathematics: Papers from 1923 to 1938 . Clarendon Press, Oxford, UK ( 1956 ) 296 - 319

8. Beth , E.W. : On Padoa's methods in the theory of definitions . Koninklijke Nederlandse Akademie van Wetenschappen, Proceedings 56 ( 1953 ) 330 - 339 also Indagationes mathematicae, vol. 15 .

9. Craig , W.: Three uses of the herbrand-gentzen theorem in relating model theory and proof theory . The Journal of Symbolic Logic 22 ( 3 ) ( 1957 ) 269 - 285

10. Fitting , M. : Proof Methods for Modal and Intuitionistic Logics . Volume 169 of

Synthese

Library . D. Reidel, Dordrecht, The Netherlands ( 1983 )

11. Rautenberg , W.: Modal tableau calculi and interpolation . Journal of Philosophical Logic 12 ( 4 ) ( 1983 ) 403 - 423

12. Schlobach , S. : Explaining subsumption by optimal interpolation . In Alferes, J.J. , Leite , J.A ., eds. : JELIA . Volume 3229 of Lecture Notes in Computer Science., Springer ( 2004 ) 413 - 425

13. Horrocks , I. : Implementation and optimization techniques . [ 1 ] 306 - 346

14. Colucci , S. , Noia , T.D., Sciascio , E.D. , Donini , F.M. , Mongiello , M.: A uniform tableaux-based approach to concept abduction and contraction in aln . In: Proc. of the 16th International Workshop on Description Logics (DL'04) . ( 2004 ) 104

15. Elsenbroich , C. , Kutz , O. , Sattler , U. : A case for abductive reasoning over ontologies . In: OWL: Experiences and Directions . ( 2006 ) 10 - 11

16. Goré , R. , Nguyen , L.A. : Exptime tableaux with global caching for description logics with transitive roles, inverse roles and role hierarchies . In: TABLEAUX '07: Proceedings of the 16th international conference on Automated Reasoning with Analytic Tableaux and Related Methods , Berlin, Heidelberg, Springer-Verlag ( 2007 ) 133 - 148

17. Haarslev , V. , Timmann , M. , Möller , R.: Combining tableaux and algebraic methods for reasoning with qualified number restrictions . In Goble, C.A. , McGuinness , D.L. , Möller , R. , Patel-Schneider , P.F., eds.: Description Logics. Volume 49 of CEUR Workshop Proceedings., CEUR-WS.org ( 2001 )

18. Calvanese , D. , Giacomo , G.D. , Lenzerini , M. : Conjunctive query containment and answering under description logic constraints . ACM Transactions on Computational Logic 9 ( 3 ) ( 2008 )