CF Dnc: A PTIME Description Logic with Functional Constraints and Disjointness David Toman and Grant Weddell Cheriton School of Computer Science University of Waterloo, Canada {david,gweddell}@cs.uwaterloo.ca Abstract. We consider CFDnc, an alternative to the description logic CFD that retains the latter’s ability to support PTIME reasoning in the presence of termi- nological cycles with universal restrictions over functional roles and also in the presence of functional constraints over functional role paths. In contrast, CFDnc replaces the ability to have conjunction on left-hand-sides of inclusion dependen- cies with the ability to have primitive negation on right-hand-sides. This makes it possible to say that primitive concepts must denote disjoint sets of individuals, a common requirement with many information sources. 1 Introduction Scalability issues in reasoning over the semantic web have led the W3C to adopt two description logic (DL) fragments of OWL 2 that are designed to ensure PTIME com- plexity in the size of respective knowledge bases for a number of important reasoning problems. Called profiles, the DLs are EL++ [2] and DL-Lite [1, 6, 7]. Medical ontolo- gies were an important motivation for the former, whereas the latter was heavily influ- enced by a need to access information residing in data sources conforming to relational schema, particularly in cases where the schema has been derived via ER modelling. Toman and Weddell proposed an alternative to DL-Lite called CF D that was de- signed to provide better support for data sources based on relational schema that in- clude more extensive collections of dependencies such as primary and foreign keys [22]. The paper showed that the problem of deciding concept subsumption in CF D had PTIME complexity, and therefore might qualify as a useful additional option for an OWL 2 profile. However, there are two problems with CF D that make it less attractive in this role: (1) unlike DL-Lite, it is not possible to say that two primitive concepts must denote disjoint sets of individuals or entities, a common requirement with many information sources, and (2) computing the certain answers to conjunctive queries is PSPACE-complete, even for queries of the form ∃x.A(x), for A a primitive concept. In this paper we introduce CF Dnc, an alternative to CF D that retains the latter’s key abilities: supporting terminological cycles with universal restrictions over func- tional roles, and supporting a rich variety of functional constraints over functional role paths. In particular, CF Dnc replaces the ability in CF D to have conjunction on left- hand-sides of inclusion dependencies with a new ability to have primitive negation on right-hand-sides (the same is also true for the original version of DL-Lite). This re- moves both problems with CF D. In particular, we show that the following fundamental reasoning problems are in PTIME. S YNTAX S EMANTICS : “(·)I ” C ::= A AI ⊆ 4 | ¬A 4 \ AI | C1 u C2 CI1 ∩ CI2 | ∀ Pf .C {x : Pf I (x) ∈ CI } Vk | A : Pf 1 , . . . , Pf k → Pf {x : ∀ y ∈ AI . i=1 Pf Ii (x) = Pf Ii (y) ⇒ Pf I (x) = Pf I (y)} Fig. 1. CFDnc C ONCEPTS . 1. Knowledge base consistency: determining if at least one model exists for a given knowledge base; 2. Logical implication: determining if a given inclusion dependency is logically en- tailed by the terminological component of a given knowledge base; and 3. Instance checking: determining if a given concept assertion is entailed by a given knowledge base. We also show that the problem of computing certain answers for arbitrary conjunctive queries over a CF Dnc knowledge base K is in PTIME in the size of K and is PSPACE- complete for combined complexity, that is, when the size of a query is included. Reasoning in DL-Lite, EL, and their variants often relies on the existence of poly- nomially-sized canonical models (or canonical structures that closely resemble such models) to address the above reasoning tasks [14, 16]. It is worth noting that CF Dnc does not share this property: an equivalent of a canonical model for a CF Dnc knowl- edge base is necessarily exponential in the size of the knowledge base. We begin in the next section by introducing the syntax and semantics of CF Dnc and talk about some of its key features and limitations. The problems above are the focus of Section 4 in which we appeal to an automata-based method for their resolution. This method is introduced in Section 3 where we consider the simpler problem of concept satisfiability. Computing certain answers for conjunctive queries is considered in Sec- tion 5, and a review of related work and summary comments then follow in Sections 6 and 7, respectively. 2 The Description Logic CF Dnc A formal definition of CF Dnc knowledge bases and the above reasoning problems now follows. Observe that the logic is based on attributes or features instead of the more common case of roles which can denote arbitrary binary relations. However, this is not really a issue. Indeed, CF Dnc is ideal for expressing reification for predicates of arbitrary arity [19]. Definition 1 (CF Dnc Knowledge Bases) Let F, PC and IN be disjoint sets of (names of) attributes, primitive concepts and individuals, respectively. A path function Pf is a word in F∗ with the usual convention that the empty word is denoted by id and concate- nation by “.”. Concept descriptions are defined by the grammar on the left-hand-side of Figure 1 in which occurrences of “A” denote primitive concepts. A concept produced by the “A : Pf 1 , . . . , Pf k → Pf” production of this grammar is called a path functional dependency (PFD). In addition, any occurrence of a PFD must adhere to one of the fol- lowing two forms: 1. A : Pf 1 , . . . , Pf . Pf i , . . . , Pf k → Pf or (1) 2. A : Pf 1 , . . . , Pf . Pf i , . . . , Pf k → Pf .f Metadata and data in a CF Dnc knowledge base K are respectively defined by a TBox TK consisting of a finite set of inclusion dependencies of the form A v C, and by an ABox AK consisting of a finite set of concept assertions of the form A(a) and path function assertions of the form Pf 1 (a) = Pf 2 (b), where A is a primitive concept, C an arbitrary concept, {Pf 1 , Pf 2 } ⊆ F∗ and where {a, b} ⊆ IN. Semantics is defined in the standard way with respect to a structure (4, (·)I ), where 4 is a domain of “objects” and (·)I an interpretation function that fixes the interpretation of primitive concepts A to be subsets of 4, attributes f to be total functions on 4, and individuals a to be elements of 4. The interpretation is extended to path expressions by interpreting id , the empty word, as the identity function λx.x, concatenation as function composition, and to derived concept descriptions C as defined on the right-hand-side for the remaining concept constructors. An interpretation satisfies an inclusion dependency A v C if AI ⊆ CI , a concept assertion A(a) if aI ∈ AI and a path function assertion Pf 1 (a) = Pf 2 (b) if Pf I1 (aI ) = Pf I2 (bI ). An interpretation satisfies a knowledge base K if it satisfies each inclusion dependency and assertion in K. 1 There are several reasoning problems for CF Dnc that shall concern us. Logical impli- cation asks if T |= A v C holds; that is, if A v C is satisfied by any interpretation satisfying T . Knowledge base consistency asks if there exists at least one interpretation for a give knowledge base K, and instance checking asks if K |= A(a) holds; that is, if A(a) is satisfied by any interpretation that satisfies K. 2 (aside on notation) We write FK and PCK to denote all attributes and primitive concepts occurring in K, respectively, and write: (1) ⊥ as shorthand for A u ¬A, (2) a = b as shorthand for id (a) = id (b), and (3) f (a) = b and shorthand for f (a) = id (b). Also we elide any mention of subscripts in our notation when their presence is clear from context. (end of aside) The conditions imposed on PFDs in (1) distinguish, for example, PFDs of the form C : f → id and C : f → g from PFDs of the form C : f → g.h. This is necessary to ensure PTIME complexity for reasoning problems in CF Dnc [13] and does not impact the modelling utility of CF Dnc for formatted legacy data sources. It remains possible, for example, to capture arbitrary keys or functional dependencies in a relational schema. Observe that only atomic concepts can appear on the left-hand-side or as part of a PFD. Indeed, relaxing this assumption in some cases will lead to a loss of PTIME 1 Note that CFDnc does not assume the unique name assumption, but that its ability to express disjointness enables mutual inequality between each pair of n individuals to be captured by introducing O(n) new atomic concepts, concepts assertions and inclusion dependencies in a straightforward way. complexity for at least one of the reasoning problems for CF Dnc, and remains an open issue for others. (See related work and our conclusions below.) 3 TBox and Concept Satisfiability It is easy to see that every CF Dnc TBox T is consistent (by setting all primitive con- cepts to be interpreted as the empty set). However, for other reasoning tasks such as concept satisfiability and knowledge base consistency, it is convenient to assume by default, and without loss of generality, that CF Dnc knowledge bases are given in a normal form. Lemma 2 (TBox and ABox Normal Forms) For every CF Dnc TBox T , there ex- ists an equivalent TBox T 0 that adheres to the following (more limited) grammar for CF Dnc concept descriptions. C ::= A | ¬A | ∀f.A | A : Pf 1 , . . . , Pf k → Pf Also, for every ABox A, there exists an equivalent ABox A0 containing only assertions of the form f (a) = b and a = b. 2 Obtaining T 0 and A0 from an arbitrary knowledge base K is achieved by a straightfor- ward introduction of auxiliary names for intermediate concept descriptions and individ- uals (e.g., see defn. of simple concepts in [21]). Definition 3 (A Transition Relation for T ) Let T be a CF Dnc TBox in normal form. We define a transition relation δ(T ) over the set of states S = PC ∪ {¬A | A ∈ PC} and the alphabet F as follows:  A1 → A2 ∈ δ(T ) if A1 v A2 ∈ T  A1 → ¬A2 ∈ δ(T ) if A1 v ¬A2 ∈ T f A1 → A2 ∈ δ(T ) if A1 v ∀f.A2 ∈ T where  is the empty letter transition. 2 The transition relation will allow us to construct non-deterministic finite automata (NFA) that can be used for various reasoning problems that relate to a CF Dnc TBox T . Note that we also follow common practice in automata theory and use  for the empty letter in transition relations.2 Lemma 4 Let M = (S, {A}, {B}, δ(T )) be an NFA with the set of states S, start state A, final state B, and transition relation δ(T ). Then T |= A v ∀ Pf .B whenever Pf ∈ L(M ). Proof (sketch) For Pf ∈ L(M ) there must be a run 1 l2 l k l A = A0 → A1 → A2 · · · Ak−1 → Ak = B 2 Another option would have been to use id for this purpose, but we thought, on balance, that this would hinder readability. in M where li ∈ F ∪ {} and such that Pf = l1 .l2 . · · · .lk . It follows from the definition li of δ(T ) that Ai−1 → Ai exists if Ai−1 v Ai , for li = , or Ai−1 v ∀li .Ai , for li ∈ F (and hence these dependencies are trivially implied by T ). The claim then follows by simple transitive reasoning, all necessary cases derive from the fact that {B1 v ∀ Pf .B2 , B2 v ∀ Pf 0 .B3 } |= B1 v ∀ Pf . Pf 0 .B3 , and by induction on the length of the run. 2 Note that the converse implication in this lemma may not hold, such as when A is inconsistent with respect to T . The problem of concept satisfiability asks, for a given concept C and TBox T , if there exists an interpretation I for T in which CI is non-empty. Such problems can be reduced to the case where C is a primitive concept A by simply augmenting T with {A v C}, where A is a fresh primitive concept. Given a primitive concept A and TBox T , one can test for primitive concept satisfi- ability by using the following NFA, denoted nfaaB (T , {A(a)}):  (S ∪ {a}, {a}, {B}, δ(T ) ∪ {a → A}), with states given by primitive concepts, their negations, and a distinguished node a,  with start state a, with final state B ∈ S, and with transition relation δ(T ) ∪ {a → A}. Theorem 5 (Concept Satisfiability) A is satisfiable with respect to the TBox T if and only if L(nfaaB (T , {A(a)})) ∩ L(nfaa¬B (T , {A(a)})) = ∅ for every B ∈ PC. Proof (sketch) For a primitive concept B ∈ PC, a word Pf in the intersection language of the two automata above is a witness of the fact that Pf I (aI ) ∈ BI and Pf I (aI ) ∈ ¬BI must hold in every model of T , for reasons analogous to the proof of Lemma 4, which leads to a contradiction since Pf is a (total) function. Conversely, if no such word exists then one can construct a deterministic finite automa- ton from nfaaB (T , {A(a)}), using the standard subset construction, in which no state containing both B and ¬B is reachable from the start state {a}. Unfolding the transition relation of this automaton, starting from the state {a}, labelling nodes by the concepts associated with the automaton’s states, and adding missing features to complete trees in which no primitive concept is true for any node, yields a tree interpretation that satisfies T (in particular in which all PFD constraints are satisfied vacuously) and whose root a provides a witness for consistency of A. 2 Since all the automata operations run in PTIME we immediately get the following re- sult. Corollary 6 Concept satisfiability with respect to CF Dnc TBoxes is in PTIME. Note it is not possible to precompute all inconsistent classes for an arbitrary C since that would require consideration of all possible types over PC (i.e., finite subsets of primitive concepts), a process essentially equivalent to constructing the deterministic automaton used in the proof of Theorem 5, and in turn make the procedure exponential. 4 ABox Reasoning The automata-based approach to concept satisfiability can be extended to the more gen- eral problem of knowledge base consistency. Intuitively, each ABox individual a must be linked to the TBox automaton in a fashion similar to how the “prototypical object” a was linked in Section 3. This idea leads to the following definition: Definition 7 (A Transition Relation for A) Let A be a CF Dnc ABox in normal form. We create a transition relation δ(A) for an nfa over the set of states S = PC ∪ {a | a in A} and the alphabet F as follows:  a → a ∈ δ(A) if a appears in A,  a → A ∈ δ(A) if A(a) ∈ A, f a → b ∈ δ(A) if f (a) = b ∈ A and   a → b, b → a ∈ δ(A) if a = b ∈ A. where  is the empty letter transition. 2 Observe that we have used  transitions to simulate equality assertions in A. This is justified, e.g., by considering the ABox individuals to be nominals. Pf (aside on notation) Hereon, we write “n ; m in δ” if Pf ∈ L(nfa(S, {m}, {n}, δ)), where S will be some set of states (that will be clear from the context), where m and n will be two states in S, and where δ will denote a NFA transition relation over S (that will also be clear from context). (end of aside) Unfortunately, taking δ(T ) ∪ δ(A) alone as the transition relation of an NFA and then testing for consistency of every ABox individual (as in Theorem 5) is not sufficient as the following cases illustrate. The problems raised by each case will be addressed by defining rules that impose conditions on a transition relation. To begin, we need to ensure that ABox assertions f (a) = b are functional: Example 8 (Path Function Assertions) Consider the ABox A = {f (a) = b, f (a) = c}. Clearly bI must equal cI in any model I of a knowledge base that includes A. 2 To remedy this, we define a functionality rule for the transition relation δ(T , A) as follows: f f   if a ; b and a ; c in δ(T , A) then {b → c, c → b} ⊆ δ(T , A). Next, we need to ensure that ABox assertions of the form f (a) = b are coherent with TBox assertions A v ∀f.B with respect to concept memberships of a and b: Example 9 (ABox and Value Restrictions) Consider the TBox T = {A v ∀f.B} and an ABox A = {f (a) = b, A(a)}. Clearly, in any model I of the knowledge base (T , A), bI must be an element of BI . However, B cannot be reached from b in δ(T ) ∪ δ(A), and therefore an automaton based on this transition relation alone cannot reflect the correct concept membership of b. 2 We define a coherence rule for the transition relation δ(T , A) to remedy this as follows: f  f  if a ; b, a ; A, and A ; B in δ(T , A) then b → B ∈ δ(T , A). And finally, consider that tree interpretations, such as the one we used to show con- cept consistency in Theorem 5, vacuously satisfy all PFDs in T , but that this is not necessarily the case for a given ABox A. Example 10 (ABox and PFDs) Consider A = {A(a), B(b), f (a) = c, f (b) = c}. – A TBox T = {A v B : f → id } implies that the individuals a and b must denote the same domain element. – A TBox T = {A v B : f → g} implies that there must be an additional (anony- mous) individual d such that g(a) = d and g(b) = d. Note that the PFD A v B : f.g → id is also violated by the pair of individuals a and b, this despite the fact that neither of these two individuals is the origin of an explicit f.g path in A: since features are interpreted as total functions, individual c must have an “outgoing” g feature, and therefore a and b must agree on f.g. 2 A remedy for these cases is obtained by defining a PFD closure rule for the transition relation δ(T , A) for each PFD A v B : Pf 1 , . . . , Pf k → Pf ∈ T . The rule will refer to the following auxiliary functions. match(a, b, Pf, δ(T , A)): Returns true if there is a (possibly empty) prefix Pf 0 of Pf Pf 0 Pf 0 such that a ; c and b ; c in δ(T , A) for some individual c; it returns false other- wise. expf(a, Pf, δ(T , A)): Returns the minimal set of transitions (by creating new individ- Pf uals) such that a ; c in δ(T , A) holds for some c.   mkeq(a, b, Pf, δ(T , A)): Returns {c → d, d → c} where, for some individuals c and d, Pf Pf we have a ; c and b ; d in δ(T , A). The PFD closure rule is then defined as follows:   if {a ; A, b ; B} ⊆ δ(T , A) and match(a, b, Pf i , δ(T , A)), for 0 < i ≤ k, and not match(a, b, Pf, δ(T , A)) then expf(a, Pf, δ(T , A)) ⊆ δ(T , A), expf(b, Pf, δ(T , A)) ⊆ δ(T , A), and mkeq(a, b, Pf, δ(T , A)) ⊆ δ(T , A) The rules enable one to define a transition relation for an NFA that captures reasoning in the knowledge base (T , A) as follows. Definition 11 (Transition Relation δ(T , A)) Let δ(T , A) be the smallest transition relation containing δ(T ) and δ(A) that is closed under the functionality, coherence, and the PFD closure rules. 2 Note that δ(T , A) is constructed by applying the closure rules to δ(T ) ∪ δ(A). Since this process is monotonic, it is sound to check for the preconditions of the rules in the partially completed δ(T )∪δ(A). We use δ(T , A) as the transition function for the NFA nfaaB (T , A) with the start state {a} and final state B (similarly to Section 3). Theorem 12 (Knowledge Base Consistency) A knowledge base (T , A) is consistent if and only if L(nfaaB (T , A)) ∩ L(nfaa¬B (T , A)) is empty for all primitive concepts B ∈ PC and all ABox individuals a in A. Proof (sketch) Assume Pf ∈ L(nfaaB (T , A)) ∩ L(nfaa¬B (T , A)) for some path func- tion Pf, individual a and primitive concept B, and that I |= (T , A). Composing all the assertions corresponding to the transitions in δ(T , A) along the runs corresponding to Pf in the two automata, however, implies that Pf I (aI ) ∈ BI and Pf I (aI ) ∈ ¬BI (similarly to Lemma 4); a contradiction as interpretations of path functions are func- tional. For the other direction we define an interpretation I as follows: let dae be an represen-   tative of the equivalence class {a | a ; b, b ; a in δ(T , A)} and let PF(a) denote f {f. Pf | a ; b not in δ(T , A)} for any individual b}. Then set – 4I = a in A {dae. id } ∪ {dae. Pf | Pf ∈ PF(a)}; S – aI = dae. id ; Pf – AI = {dae. Pf | a ; A in δ(T , A)}; and f – f I = {(dae. id , dbe. id ) | a ; b in δ(T , A)} ∪ {(dae. Pf, dbe. Pf .f ) | dae. Pf, dae. Pf .f ∈ 4I }. It is immediate that I |= A since δ(A) ⊆ δ(T , A) and we corrected for all violations of PFDs. By inspecting inclusion dependencies in T it is also easy to see that I |= T . 2 Note that the core of this construction is again the subset construction for NFA deter- minization (cf. Theorem 5) where the TBox-ABox interactions are facilitated by the closure rules. What remains is to show that knowledge base consistency can be checked in PTIME. Lemma 13 |δ(T , A)| is polynomial in |T | + |A|. Proof (sketch) The number of individuals in δ(T , A) is bounded by |A| + 2|T ||A|2 since the PFD closure rule can add at most two new individuals per pair of individuals in A and PFD in T . Thus, since the number of states is polynomial in |T | + |A|, the number of transitions in δ(T , A) is also at most polynomial in |T | + |A|. 2 Taken together with the argument we made for concept consistency with respect to a TBox yields PTIME algorithm for KB consistency. Since we do not assume the unique name assumption, the problem is also PTIME-hard (we have Horn-SAT embedded in reasoning with the PFDs alone). Corollary 14 Knowledge base consistency for CF Dnc is PTIME-complete. 2 4.1 Logical Implication Now we consider the questions of logical implication of the form (T , A) |= C(a), (T , A) |= Pf 1 (a) = Pf 2 (b), and ultimately T |= A v C. Since C can be a complex concept and CF Dnc is not closed under negation, logical implication must be resolved by asking several separate questions by exhaustively applying the following simplifica- tion rules: Simp(C) → {C} Simp(∀ Pf .C1 u C2 ) → Simp(∀ Pf .C1 ) ∪ Simp(∀ Pf .C2 ) Simp(∀ Pf .∀ Pf 0 .C1 ) → Simp(∀ Pf . Pf 0 .C1 ) where C is one of the irreducible concepts of the forms ∀ Pf .A, ∀ Pf .¬A, and ∀ Pf .A : Pf 1 , . . . , Pf k → Pf 0 . We call the irreducible concepts obtained by these rules the sim- plifications of the given concept. Lemma 15 (T , A) |= C(a) (T |= A v C) if and only if (T , A) |= D(a) (T |= A v D, respectively) for all D ∈ Simp(C). Proof (sketch) By observing that the each step of simplifications preserves logical implication. 2 The simplified logical implication questions can now be reduced in a natural way to CF Dnc knowledge base satisfiability as follows: Theorem 16 (Instance Checking) 1. (T , A) |= ∀ Pf .A(a) iff (T , A ∪ {∀ Pf .¬A(a)}) is not satisfiable. 2. (T , A) |= ∀ Pf .¬A(a) iff (T , A ∪ {∀ Pf .A(a)}) is not satisfiable. 3. (T , A) |= (∀ Pf .A : Pf 1 , . . . , Pf k → Pf 0 )(a) iff [ (T , A ∪ {Pf(a) = b, A(c), D(Pf 0 (b)), ¬D(Pf 0 (c)} ∪ {Pf i (b) = Pf i (c)}) 0