-

Concept Pro jection in Algebras for Computing Certain Answer Descriptions

Je rey Pound

David Toman

Grant Weddell

gweddellg@uwaterloo.ca 0

Jiewen Wu

0 0 Cheriton School of Computer Science, University of Waterloo , Canada

We introduce a projection operator over concepts in a given description logic that produces least subsumers of concepts in a language fragment that satis es additional syntactic requirements speci ed as a parameter of the operator. The operator complements existing operators for concept selection and concept ordering that have been proposed in earlier work [3, 4]. We show how, altogether, the operators can form the basis of an algebra for computing more informative and user friendly varieties of certain answers over what we call a description base. This is a description logic (DL) knowledge base in which the ABox is replaced by a xed and nite collection of arbitrary concepts. In the sections that follow, we introduce the notion of a description base, of a certain answer description over a description base and then present an algebra for computing a simple variety of certain answer descriptions that incorporates the new projection operator. We elaborate on the semantics of this operator, in particular the notion of a projection description, in the remaining sections of the paper where we consider the operator's semantics as well as methods for e cient evaluation. In the nal section, we outline a motivating example and suggest useful ways in which both projection descriptions and our query algebra might be extended.

Introduction 1.1

Certain Answer Descriptions and Description Bases Assume K (= (T ; A)) denotes a knowledge base in which concepts conform to a given DL, and consider the purpose served by K's ABox A when it comes to computing certain answers over K to a conjunctive query of the form Q(x1; :::; xk)

QueryBody; where QueryBody in turn has the form 9xk+1; :::; 9xm : ^ C(xi) ^ ^ R(xi; xj ) in which the predicate names correspond to concepts and roles in the DL. Most importantly, the ABox supplies a xed and nite collection of constant symbols fa1; :::; ang which in turn de nes a space of nk possible answers to Q: k-tuples i of the form \f(x1; ai1 ); ::::; (xk; aik )g" in which query variables xj are paired with constant symbols aij . Viewed as a substitution, recall that i is a certain answer to Q exactly when K j= QueryBody i. Remember, however, that constant symbols and certain answers are still syntactic artifacts and that the former are not actual domain elements in an interpretation.

Now assume the DL supports nominals, fag. Answers i can then have the alternative form f(x1; fai1 g); :::; (xk; faik g)g ( 1 ) in which query variables xj are now paired with nominal concepts faij g. In this case, ( 1 ) will be a certain answer exactly when

K j= 8x1; :::; 8xk : (fai1 g(x1) ^ ::: ^ faik g(xk) ! QueryBody): ( 2 ) Among other things, nominals allow one to add inclusion dependencies faig v C and faig v 9R:fajg to T that correspond to ABox assertions C(ai) and R(ai; aj) in A, respectively. An ABox can then be replaced by a xed and nite collection of nominals ffa1g; :::; fangg since its only remaining purpose is to list the nominal concepts that can occur in ( 1 ) above, and therefore in the antecedent of the implication in ( 2 ) above.

From a user's perspective, this characterization of answers generalizes in an appealing way when one allows the collection of nominals that replaces an ABox to consist instead of a xed and nite collection of arbitrary concepts fC1; :::; Cng. In this way, the concepts appearing in ( 1 ) can provide more informative descriptions to the user that necessarily describe xi-components of answer k-tuples. In such circumstances, and for the remainder of the paper, we refer to this list of arbitrary concepts as a DBox or description box, to a knowledge base K that replaces an ABox with a DBox as a description base, and to certain answers that pair query variables with arbitrary concepts as certain answer descriptions.

Note that, for a description base K to qualify as consistent, it is sensible to add the condition that there exists a model for which the interpretation of each concept in K's DBox is nonempty in addition to the standard requirement that the model satis es each inclusion dependency in K's TBox. This ensures that concepts describing an xi-component of an answer k-tuple are never vacuous, indeed, that each concept in a DBox is there because it describes some entity of interest to a user. Such a condition can always be captured in a TBox by adding inclusion dependencies of the form \> v 9R:Ci", where R is a fresh role name. 1.2

Towards an Algebra for Description Bases Returning to the standard notion of K as a knowledge base, consider the subclass of queries having the form \Q(x) C(x)". In this simplest case, answers may elide any mention of the single query variable x, and correspond more simply to the constant symbols ai that must satisfy a QueryBody with conditions that can be rolled up in a single concept C. For a description base, when K = (T ; fC1; :::; Cng), such queries return unary certain answer descriptions, that is, each concept Ci for which T j= Ci v C.

The basis of an algebra for computing unary certain answer descriptions has been proposed in earlier work [3, 4]. The algebra consists of three operators given by the following grammar for a query Q.

Q ::=

K :: Od j C (Q) j SortOd(Q)

Since the focus of this earlier work was on performance and scalability, the rst priority was to introduce operators that could express such things as an index scan or a sort, operations that are fundamental to issues of e ciency in database query evaluation. In particular, this entailed the development of a theory of ordering descriptions that could be used to characterize strict partial orders over the concepts generated by a given DL (the \Od" part of the rst and third operators are ordering descriptions) together with the notion of a binary search tree index of concepts called a description index. Altogether, this enabled a formal consideration of sorting and comparison based search over a description base DBox.

The rst operator has two purposes: it de nes a description index of the concepts that occur in the DBox of a given description base, and it computes a list of concepts that corresponds to an inorder traversal of the index. The second operator lters concepts occurring in a concept list by retaining only those subsumed by a given parameter concept. The third operator sorts a concept list according to a partial order de ned by a given ordering description.

An outstanding issue with this algebra is that it is necessary to supply, a priori, all concepts that might be of interest to any user in the DBox of a description base. Since users will want concepts that re ect di erent external views and interests, the DBox will almost certainly require multiple concepts for the same underlying entity e together with a means to distinguish among the possible concepts for e. In this paper, we propose to address this problem by adding a new projection operator to the grammar for the above algebra.

Q ::=

K :: Od j C (Q) j SortOd(Q) j P d(Q) ( 3 ) The operator can be used to rewrite concepts produced by a subquery Q to subsuming concepts that satisfy an additional syntactic requirement given by the P d part of the new operator. We call the syntactic requirement a projection description. The new operator replaces each concept in the concept list produced by its subquery by a least subsuming concept that satis es the projection description.

As in earlier work on ordering descriptions, our main objective is to develop a theory of projection descriptions that will work for a variety of DLs that satisfy basic requirements. The rst is that any qualifying DL has E L as a subdialect. One important reason is that E L concepts are a close approximation to nested relations and to XML and are therefore an ideal starting point for user friendly descriptions. Another reason is that we can show that P d computes least subsumers of argument concepts with respect to an E L fragment satisfying P d. We also assume that any qualifying DL will have at least one concrete domain that supports constant terms and an underlying total order. These constant terms can then be used in descriptions that correspond to the notion of a tuple in the relational model. As a consequence, our projection operator then has relational tuple projection as a special case. In addition, the operator naturally accommodates nested relations and their projections as well.

We formally de ne projection descriptions in the next section, and then proceed to consider how projection operations can be e ciently implemented by appealing only to concept subsumption in a given DL, e.g., by using classi cations of concepts that occur in a given terminology. 2

Projection Descriptions To introduce projection descriptions, we use the DL ALC(D) in which the concrete domain D supports equality and comparison operations over an underlying domain consisting of rationals and strings.1 However, to reiterate, any DL that contains E L(D) would qualify.

De nition 1 (Description Logic ALC(D)). Let fA; A1; : : :g, fR; R1; : : :g, ff; g; f1; : : :g, and fk; k1; : : :g denote sets of primitive concepts, roles, concrete features, and constants respectively. A concept is de ned by the grammar: C; D ::= > j ? j A j :C j C u D j 9R:C j f = k (equality over 4D) j f < g (linear order over 4D) An inclusion dependency has the form C v D. A terminology or TBox T is a nite set of inclusion dependencies.

An interpretation I is a 2-tuple (4; I ) in which 4 is a domain that contains 4D, the set of rationals and nite strings, as a proper subset. We write 4A to denote the abstract domain 4 n 4D. Also, I is an interpretation function that maps each concrete feature f to a total function (f )I : 4A ! 4D, each role R to a relation (R)I (4A 4A), each primitive concept A to a set (A)I 4A, the \=" symbol to the equality relation on 4D, the \<" symbol to the binary relation for the linear order on 4D, and each constant k to a constant in 4D. The interpretation function is extended to arbitrary concepts in the standard way.

An interpretation I satis es an inclusion dependency C v D if (C)I (D)I , and T j= C v D if (C)I (D)I for any interpretation I that satis es all inclusion dependencies in T .

We use standard abbreviations such as C t D for :(:C u :D) and f k for (f = k) t ((f < g) u (g = k)). Also, given a nite set S of ALC(D) concepts, we write u S to denote > if S is empty and the concept D1 u u Dn otherwise, when S = fD1; :::; Dng.

With the presumption of ALC(D), the syntax and semantics of our new projection operation are given by the following. 1 See [1] for an excellent introduction to description logics.

De nition 2 (Projection Description). Let L be a DL that includes ALC(D), possibly without negation, and f , R and C any concrete feature, role and concept in L, respectively. A projection description P d is de ned by the grammar: P d ::= C? j C" j f j P d1 u P d2 j P d1 [ P d2 j 9R:P d ( 4 ) Now let (P d) be de ned as follows: (P d) = 8 < ;

fRg : (P d1) [ if P d = \C?"; \C"" or \f "; if P d = \9R:P d1"; and (P d2) if P d = \P d1 u P d2" or \P d1 [ P d2".

Then P d is well formed if, for any subexpression P d1 u P d2 or P d1 [ P d2, (P d1) \ (P d2) = ;.

The condition that P d is well-formed ensures that requests for information concerning a given role are never distributed in di erent parts of an answer (sub) tuple. This is a necessary condition to help combat combinatorial e ects that would otherwise make projection involving roles infeasible.

De nition 3 (Induced Concepts). Let L be a DL that includes ALC(D), possibly without negation, and P d a projection description over roles in L. We de ne the sets LjP d and LjTPUdP, the L concepts generated by P d and L tuple concepts generated by P d, respectively, as follows: LjP d = fu S j S

n LjTPUdPg, and LjTPUdP = >8 fCg if P d = \C?"; >>> A if P d = \C""; ><> ff = k j k 2 4Dg if P d = \f "; > fC1 u C2 j C1 2 LjTPUdP1 ^ C2 2 LjTPUdP2 g if P d = \P d1 u P d2"; >>>> LjTPUdP1 [ LjTPUdP2 if P d = \P d1 [ P d2"; and >: fu S j S n f9R:C j C 2 LjP d1 gg if P d = \9R:P d1", where A is the set of all primitive concepts in the alphabet of L. Note that, while in general the language LjP d is in nite, for every xed and nite terminology T and concept C, the language LjP d restricted to the symbols used in T and C is necessarily nite.

De nition 4 (Projection Semantics). The semantics of the new projection operator is given as follows: query P d(Q) replaces each concept C in the concept list computed by Q by a least subsumer of C in LjP d.

The remainder of this section begins to consider how least subsumers in LjP d can be e ectively computed when P d is well-formed. To start, we introduce the notion of a role path that helps to simplify manipulating concepts of the form 9R1: ::: :9Rn:C.

De nition 5 (Role Path). Let L be a DL that includes ALC(D), possibly without negation, and R be any role in L. A role path Rp is de ned by the grammar:

Id j Rp:R The expression 9Rp:C denotes the concept C when Rp = \ Id ", and the concept 9Rp1:9R:C otherwise, when Rp = \Rp1:R".

De nition 6 (Projection Function). Let L be a DL that includes ALC(D), possibly without negation, and T , C, Rp and P d a respective TBox, concept, role path and projection description in L. The functional projection of C for P d with respect to T and Rp, written (P d)T ;Rp(C), is a nite set S of L concepts satisfying the following conditions.

Case P d = \C1?": S = fC1g if T j= C v 9Rp:C1, and ; otherwise. Case P d = \C1"":

S = (A1? [

[ An?)T ;Rp(C); where fA1; :::; Ang are all primitive concepts A occurring in T and C such that T 6j= 9Rp:A v 9Rp:C1; S = ; for n = 0.

Case P d = \f ":

S = ((f = k1)? [

[ (f = kn)?)T ;Rp(C); where fk1; :::; kng are all constants occurring in T and C; S = ; for n = 0. Case P d = \P d1 u P d2":

S = fD1uD2 j (8i 2 f1; 2g : Di 2 (P di)T ;Rp(C)) ^ T j= C v 9Rp:(D1uD2)g: Case P d = \P d1 [ P d2": S = (P d1)T ;Rp(C) [ (P d2)T ;Rp(C): Case P d = \9R:P d1":

S = bf9R:(u S0) j S0

(P d1)T ;Rp:R(C) ^ T j= C v 9Rp:9R:(u S0)gcT ;Rp: In the nal case, we write bScT ;Rp to denote the reduction of S with respect to T and Rp, that is, the set of all D1 2 S such that there is no D2 2 S for which T j= 9Rp:D2 v 9Rp:D1 and T 6j= 9Rp:D1 v 9Rp:D2.

Our main result now follows, in which functional projection is related to least subsumers.

Theorem 1. Let L be a DL that includes ALC(D), possibly without negation, and T , C and P d a respective terminology, concept and well-formed projection description in L. Then ub(P d)T ;Id (C)cT ;Id is a least subsumer of C in LjP d. Proof (outline). By induction on the structure of P d.

Note that the restriction to LjP d is essential to guarantee that the least subsumer always exists (otherwise, least subsumers may not exist [2]). 3

Computing Projection Descriptions A top-level procedure for computing (P d)T ;Rp(C) is given by a de nition of Project in Figure 1. Clearly, a straightforward coding for the procedures invoked by Project to handle each of the six conditions follows directly from De nition 6. However, it is evident that this simple coding strategy for many of Project(P d; T ; Rp; C) switch on P d case \C1?": return ProjectConcept(C1; T ; Rp; C) case \C1"": return ProjectAtomicConcepts(C1; T ; Rp; C) case \f ": return ProjectFeature(f; T ; Rp; C) case \P d1 u P d2": return ProjectJoin(P d1; P d2; T ; Rp; C) case \P d1 [ P d2": return Project(P d1; T ; Rp; C) [ Project(P d2; T ; Rp; C) case \9R:P d1": return ProjectRole(P d1; R; T ; Rp; C) these procedures will lead to an unacceptably large number of subsumption tests. In particular, the resulting ProjectAtomicConcepts procedure for the \C"" case will require a number of such tests proportional to the number of primitive concepts in the terminology, or much worse: exponential in the number of primitive concepts, e.g., when called indirectly by a navely coded ProjectRole procedure. This happens, for example, with a projection description of the form \9R:(?")".

A navely coded ProjectJoin procedure for the \P d1 u P d2" case is also problematic since it requires a number of subsumption tests equal to the product of the number of descriptions computed by P d1 and P d2. This circumstance quickly becomes intolerable for iterated uses of this procedure such as for projection descriptions of the form \(f1 u u fn)" that compute descriptions of relational tuples, cases that are likely to occur in practice.

While these performance issues are unavoidable in the worst case, it is possible to improve the performance of projection computation for many situations. Starting with procedure ProjectJoin, we now outline algorithms for the procedures invoked by Project in Figure 1 that will have much better performance in situations that are more likely to occur in practice.

Procedure ProjectJoin

Our algorithm for this procedure follows a \nested loops" strategy in which concepts computed by an outer loop may be used to further qualify concepts computed by an inner loop. A simple way to accomplish this is to replace De nition 5 above with a more general notion of a role path that enables sideways communication of outer loop concepts.

De nition 7 (General Role Path). Let L be a DL that includes ALC(D), possibly without negation, and R and C be any role or concept in L, respectively. A general role path Rp is de ned by the grammar: The expression 9Rp:C denotes the concept C when Rp = \ Id ", the concept 9Rp1:9R:C when Rp = \Rp1:R", and the concept 9Rp1:(C1 u C) otherwise, when Rp = \Rp1:C1".

Algorithm 1: ProjectJoin(P d1; P d2; T ; Rp; C) result ; foreach C1 2 Project(P d1; T ; Rp; C) do foreach C2 2 Project(P d2; T ; Rp:C1; C) do

result result [ f(C1 u C2)g return result There are additional opportunities for improving the performance of Algorithm 1. For one, the procedure might explore alternative permutations of projection descriptions with the form \(P d1 u u P dn)" that might result in a more e cient nesting order. For another, the procedure may opt to only append C1 to Rp in the inner loop if the overall cost of subsumption checks is expected to decrease.

Procedure ProjectAtomicConcepts

The expected time for this procedure can be improved considerably by employing a classi cation of the primitive concepts in a given terminology. In particular, one conducts a depth- rst-search of the classi cation, taking advantage of pruning opportunities when parent atomic concepts in the classi cation are disquali ed. An implementation of this procedure given by Algorithm 2 below assumes the following de nition.

De nition 8 (TBox Classi cation). Let T be a TBox in ALC(D). A classi cation of T is a directed acyclic graph G (= (N; E)) with nodes N corresponding to the primitive concepts in T and with E consisting of all edges (A1; A2) such that: { T j= A1 v A2, { T 6j= A2 v A1, and { there is no distinct A3 2 N such that T j= A1 v A3 and T j= A3 v A2. We write NG (resp. EG) to denote the nodes (resp. edges) of G, where the terminology is assumed by context.

Again, there are opportunities for improving the performance of Algorithm 2. For example, it would help to ensure the performance gains obtained by pruning during depth- rst-search if one adds new internal primitive concepts to the classi cation if there are a large number of root nodes, or to split cases in which a large number of edges would otherwise lead to an existing node.

Procedure ProjectRole

The nave implementation of this procedure involves quantifying over all subsets of concepts computed by the evaluation of the nested projection description. However, for a given terminology T , role path Rp and concept C, consider the following:

Algorithm 2: ProjectAtomicConcepts(C1; T ; Rp; C) stack ; result ; // Initialize stack with the roots of the classification of T foreach node A 2 NG with no outgoing edges do

stack.Push(A) // Perform a depth first search of the classification while stack:NotEmpty() do

A stack.Pop() if T j= C v 9Rp:A and T 6j= 9Rp:A v 9Rp:C1 then result result [ fAg foreach (A0; A) 2 EG do

stack.Push(A0) foreach A occurring in C such that A 62 NG and T j= C v 9Rp:A do result result [ fAg return result 1. It is less likely that T j= C v 9Rp:(u S), for concept sets S with larger numbers of elements; and 2. For any pair of concept sets S1 and S2, if T 6j= C v 9Rp:(u S1), then

T 6j= C v 9Rp:(u (S1 [ S2)).

These observations motivate the implementation of ProjectRole given by Algorithm 3. The algorithm uses a queue to conduct a breadth- rst-search of subsets in such a way that ensures a \frontier" of disquali ed sets will prune containing candidate sets. Note that, to avoid considering more than one permutation of a candidate, the procedure assumes a total lexicographic order \ " over all concepts.

The algorithm works particularly well in cases where the argument projection description parameter P d has the form \f u P d " such as in the case of projection descriptions that compute descriptions of sets of relational tuples. In this case, the number of subsumption checks performed directly by the algorithm is quadratic in the size of result computed by the recursive call to Project in the rst line.

Procedures ProjectConcept, ProjectFeature and Reduce

An implementation of the remaining procedures is straightforward. In the case of ProjectConcept, it clearly su ces to mirror the corresponding case speci cation in De nition 2. An e cient algorithm for ProjectFeature requires only a bit more thought: one proceeds by a progressive re nement of intervals over the concrete domain, narrowing by binary search on intervals to the required set of concepts of the form \f = k". Finally, an implementation of procedure Reduce is given by Algorithm 4.

Algorithm 3: ProjectRole(P d; R; T ; Rp; C)

S1 Project(P d; T ; Rp:R; C) // Check empty set cases if S1 = ; then if T j= C v 9(Rp:R):(>) then

return f9R:(>)g return ; result ; queue ; foreach C1 2 S1 do

queue:Enqueue(fC1g) while queue:NotEmpty() do

S2 queue:Dequeue() notgrown true foreach C1 2 S1 where for all C2 2 S2 : C2 if T j= C v 9(Rp:R)(C1 u (u S2)) then queue:Enqueue(fC1g [ S2) notgrown f alse if notgrown then

result result [ f9R:(uS2)g return Reduce(result; T ; Rp) C1 do 4

Discussion An application that serves as a guiding in uence on this work relates to a hypothetical description base system to which an internet browser submits queries in our algebra. The queries originate in the web pages for an enterprise, and are replaced on the y by the browser with the resulting lists of EL-like concepts (suitably mapped to HTML) that are returned by the system in response. For example, consider the case of an online supplier of photography equipment. As part of a web presence, the supplier maintains a description base K of concepts that describe items that are available for purchase as well as web pages with embedded queries over this description base. One query Q in some web page might have the form \ P d(SortP rice( C (K :: P rice)))" in which the selection condition C is the concept \P roductCode = \digicam" u P rice < 300" and where the projection description is given by (N ame u P rice u 9Supplier:(OnlineAddress u Rating)): (5) After rst rewriting Q as a projection of an index scan \ P d( C (K :: P rice))" (see below), the system returns the result of evaluating Q on the description base. Thus, when browsing the page containing Q, a user sees an ordered list of inexpensive digital camera information ordered by price, and with each item displaying the name and price of the camera together with a sublist of supplier URL addresses and ratings for that camera.

Algorithm 4: Reduce(S; T ; Rp) result ; foreach C1 2 S do minimal true foreach C2 2 S fC1g do if T j= 9Rp:C2 v 9Rp:C1 and T 6j= 9Rp:C1 v 9Rp:C2 then minimal f alse break if minimal then

result result [ fC1g return result

This example suggests a useful extension to projection descriptions that can be easily accommodated. In particular, adding the case \P d :: Od " as an additional option in ( 4 ) would enable a useful re nement to (5) above: (N ame u P rice u 9Supplier:((OnlineAddress u Rating) :: Rating)): Consequently, the user will see the sublist of supplier URL addresses and ratings for each camera in the order of the supplier rating.

Also, an algebraic approach to querying description bases lends itself nicely to problems in query rewriting (witness the rewriting on Q above). In particular, the following identities hold in our algebra:

C (SortOd(Q)) SortOd(SortOd0 (Q))

C1 ( C2 (Q))

SortOd( C (Q));

SortOd(Q) and

C1uC2 (Q): The identities yield a normal form \SortOd2 ( C (K :: Od1))" for queries. One of the objectives of earlier work [3, 4] was to determine when queries in normal form could be evaluated e ciently, in particular when they could be rewritten as an index scan \ C (K :: Od1)" in which pruning made possible by the ltering concept C during an inorder traversal of the description index K :: Od1 was su cient to guarantee logarithmic performance (and allowing that the choice of permutation for a concept list could be more limited than required by the original query). An underlying problem addressed in the earlier work is to determine the truth of a \ T ;C " predicate over the pair (Od1; Od2) of ordering descriptions, that is, to reason when, for terminology T , the partial order induced by Od1 contains the partial order induced by Od2 over concepts that are subsumed by C. This predicate can be usefully \overloaded" to apply to projection descriptions as well. In particular, a su cient condition for the additional algebraic identity P d2 ( P d1 ( C (Q)))

P d2 ( C (Q)) might be expressed as \P d1

T ;C P d2", another topic for future work.

Franz

Baader , Diego Calvanese, Deborah L. McGuinness , Daniele Nardi , and Peter F. Patel-Schneider . The Description Logic Handbook: Theory, Implementation, and Applications . Cambridge University Press, 2003 .

Franz

Baader , Baris Sertkaya, and Anni-Yasmin Turhan . Computing the least common subsumer w .r.t. a background terminology . J. Applied Logic , 5 ( 3 ): 392 { 420 , 2007 .

3. Je rey Pound, Lubomir Stanchev, David Toman,

and Grant

Weddell . On Ordering Descriptions in a Description Logic. 20th International Workshop on Description Logics , pages 123 { 134 , 2007 .

4. Je rey Pound, Lubomir Stanchev, David Toman,

and Grant

Weddell . On Ordering and Indexing Metadata for the Semantic Web . 21st International Workshop on Description Logics , 2008 .