=Paper= {{Paper |id=None |storemode=property |title=CFDnc: A PTIME Description Logic with Functional Constraints and Disjointness |pdfUrl=https://ceur-ws.org/Vol-1014/paper_65.pdf |volume=Vol-1014 |dblpUrl=https://dblp.org/rec/conf/dlog/TomanW13 }} ==CFDnc: A PTIME Description Logic with Functional Constraints and Disjointness== https://ceur-ws.org/Vol-1014/paper_65.pdf
            CF Dnc: A PTIME Description Logic
         with Functional Constraints and Disjointness

                             David Toman and Grant Weddell

                           Cheriton School of Computer Science
                             University of Waterloo, Canada
                        {david,gweddell}@cs.uwaterloo.ca


       Abstract. We consider CFDnc, an alternative to the description logic CFD that
       retains the latter’s ability to support PTIME reasoning in the presence of termi-
       nological cycles with universal restrictions over functional roles and also in the
       presence of functional constraints over functional role paths. In contrast, CFDnc
       replaces the ability to have conjunction on left-hand-sides of inclusion dependen-
       cies with the ability to have primitive negation on right-hand-sides. This makes
       it possible to say that primitive concepts must denote disjoint sets of individuals,
       a common requirement with many information sources.


1   Introduction
Scalability issues in reasoning over the semantic web have led the W3C to adopt two
description logic (DL) fragments of OWL 2 that are designed to ensure PTIME com-
plexity in the size of respective knowledge bases for a number of important reasoning
problems. Called profiles, the DLs are EL++ [2] and DL-Lite [1, 6, 7]. Medical ontolo-
gies were an important motivation for the former, whereas the latter was heavily influ-
enced by a need to access information residing in data sources conforming to relational
schema, particularly in cases where the schema has been derived via ER modelling.
    Toman and Weddell proposed an alternative to DL-Lite called CF D that was de-
signed to provide better support for data sources based on relational schema that in-
clude more extensive collections of dependencies such as primary and foreign keys
[22]. The paper showed that the problem of deciding concept subsumption in CF D had
PTIME complexity, and therefore might qualify as a useful additional option for an
OWL 2 profile. However, there are two problems with CF D that make it less attractive
in this role: (1) unlike DL-Lite, it is not possible to say that two primitive concepts
must denote disjoint sets of individuals or entities, a common requirement with many
information sources, and (2) computing the certain answers to conjunctive queries is
PSPACE-complete, even for queries of the form ∃x.A(x), for A a primitive concept.
    In this paper we introduce CF Dnc, an alternative to CF D that retains the latter’s
key abilities: supporting terminological cycles with universal restrictions over func-
tional roles, and supporting a rich variety of functional constraints over functional role
paths. In particular, CF Dnc replaces the ability in CF D to have conjunction on left-
hand-sides of inclusion dependencies with a new ability to have primitive negation on
right-hand-sides (the same is also true for the original version of DL-Lite). This re-
moves both problems with CF D. In particular, we show that the following fundamental
reasoning problems are in PTIME.
S YNTAX                          S EMANTICS : “(·)I ”

C ::= A                          AI ⊆ 4
    | ¬A                         4 \ AI
    | C1 u C2                    CI1 ∩ CI2
    | ∀ Pf .C                    {x : Pf I (x) ∈ CI }
                                                     Vk
    | A : Pf 1 , . . . , Pf k → Pf {x : ∀ y ∈ AI .    i=1
                                                            Pf Ii (x) = Pf Ii (y) ⇒ Pf I (x) = Pf I (y)}


                                    Fig. 1. CFDnc C ONCEPTS .


 1. Knowledge base consistency: determining if at least one model exists for a given
    knowledge base;
 2. Logical implication: determining if a given inclusion dependency is logically en-
    tailed by the terminological component of a given knowledge base; and
 3. Instance checking: determining if a given concept assertion is entailed by a given
    knowledge base.
We also show that the problem of computing certain answers for arbitrary conjunctive
queries over a CF Dnc knowledge base K is in PTIME in the size of K and is PSPACE-
complete for combined complexity, that is, when the size of a query is included.
    Reasoning in DL-Lite, EL, and their variants often relies on the existence of poly-
nomially-sized canonical models (or canonical structures that closely resemble such
models) to address the above reasoning tasks [14, 16]. It is worth noting that CF Dnc
does not share this property: an equivalent of a canonical model for a CF Dnc knowl-
edge base is necessarily exponential in the size of the knowledge base.
    We begin in the next section by introducing the syntax and semantics of CF Dnc and
talk about some of its key features and limitations. The problems above are the focus
of Section 4 in which we appeal to an automata-based method for their resolution. This
method is introduced in Section 3 where we consider the simpler problem of concept
satisfiability. Computing certain answers for conjunctive queries is considered in Sec-
tion 5, and a review of related work and summary comments then follow in Sections 6
and 7, respectively.


2    The Description Logic CF Dnc
A formal definition of CF Dnc knowledge bases and the above reasoning problems
now follows. Observe that the logic is based on attributes or features instead of the
more common case of roles which can denote arbitrary binary relations. However, this
is not really a issue. Indeed, CF Dnc is ideal for expressing reification for predicates of
arbitrary arity [19].

Definition 1 (CF Dnc Knowledge Bases) Let F, PC and IN be disjoint sets of (names
of) attributes, primitive concepts and individuals, respectively. A path function Pf is a
word in F∗ with the usual convention that the empty word is denoted by id and concate-
nation by “.”. Concept descriptions are defined by the grammar on the left-hand-side of
Figure 1 in which occurrences of “A” denote primitive concepts. A concept produced
by the “A : Pf 1 , . . . , Pf k → Pf” production of this grammar is called a path functional
dependency (PFD). In addition, any occurrence of a PFD must adhere to one of the fol-
lowing two forms:

                         1. A : Pf 1 , . . . , Pf . Pf i , . . . , Pf k → Pf or
                                                                                            (1)
                         2. A : Pf 1 , . . . , Pf . Pf i , . . . , Pf k → Pf .f
Metadata and data in a CF Dnc knowledge base K are respectively defined by a TBox
TK consisting of a finite set of inclusion dependencies of the form A v C, and by an
ABox AK consisting of a finite set of concept assertions of the form A(a) and path
function assertions of the form Pf 1 (a) = Pf 2 (b), where A is a primitive concept, C an
arbitrary concept, {Pf 1 , Pf 2 } ⊆ F∗ and where {a, b} ⊆ IN.
Semantics is defined in the standard way with respect to a structure (4, (·)I ), where 4
is a domain of “objects” and (·)I an interpretation function that fixes the interpretation
of primitive concepts A to be subsets of 4, attributes f to be total functions on 4, and
individuals a to be elements of 4. The interpretation is extended to path expressions by
interpreting id , the empty word, as the identity function λx.x, concatenation as function
composition, and to derived concept descriptions C as defined on the right-hand-side
for the remaining concept constructors.
An interpretation satisfies an inclusion dependency A v C if AI ⊆ CI , a concept
assertion A(a) if aI ∈ AI and a path function assertion Pf 1 (a) = Pf 2 (b) if Pf I1 (aI ) =
Pf I2 (bI ). An interpretation satisfies a knowledge base K if it satisfies each inclusion
dependency and assertion in K. 1
There are several reasoning problems for CF Dnc that shall concern us. Logical impli-
cation asks if T |= A v C holds; that is, if A v C is satisfied by any interpretation
satisfying T . Knowledge base consistency asks if there exists at least one interpretation
for a give knowledge base K, and instance checking asks if K |= A(a) holds; that is, if
A(a) is satisfied by any interpretation that satisfies K.                                  2

(aside on notation) We write FK and PCK to denote all attributes and primitive concepts
occurring in K, respectively, and write: (1) ⊥ as shorthand for A u ¬A, (2) a = b as
shorthand for id (a) = id (b), and (3) f (a) = b and shorthand for f (a) = id (b). Also
we elide any mention of subscripts in our notation when their presence is clear from
context.                                                                 (end of aside)
The conditions imposed on PFDs in (1) distinguish, for example, PFDs of the form
C : f → id and C : f → g from PFDs of the form C : f → g.h. This is necessary to
ensure PTIME complexity for reasoning problems in CF Dnc [13] and does not impact
the modelling utility of CF Dnc for formatted legacy data sources. It remains possible,
for example, to capture arbitrary keys or functional dependencies in a relational schema.
    Observe that only atomic concepts can appear on the left-hand-side or as part of
a PFD. Indeed, relaxing this assumption in some cases will lead to a loss of PTIME
1
     Note that CFDnc does not assume the unique name assumption, but that its ability to express
    disjointness enables mutual inequality between each pair of n individuals to be captured by
    introducing O(n) new atomic concepts, concepts assertions and inclusion dependencies in a
    straightforward way.
complexity for at least one of the reasoning problems for CF Dnc, and remains an open
issue for others. (See related work and our conclusions below.)


3     TBox and Concept Satisfiability

It is easy to see that every CF Dnc TBox T is consistent (by setting all primitive con-
cepts to be interpreted as the empty set). However, for other reasoning tasks such as
concept satisfiability and knowledge base consistency, it is convenient to assume by
default, and without loss of generality, that CF Dnc knowledge bases are given in a
normal form.

Lemma 2 (TBox and ABox Normal Forms) For every CF Dnc TBox T , there ex-
ists an equivalent TBox T 0 that adheres to the following (more limited) grammar for
CF Dnc concept descriptions.
                    C ::= A | ¬A | ∀f.A | A : Pf 1 , . . . , Pf k → Pf
Also, for every ABox A, there exists an equivalent ABox A0 containing only assertions
of the form f (a) = b and a = b.                                                   2

Obtaining T 0 and A0 from an arbitrary knowledge base K is achieved by a straightfor-
ward introduction of auxiliary names for intermediate concept descriptions and individ-
uals (e.g., see defn. of simple concepts in [21]).

Definition 3 (A Transition Relation for T ) Let T be a CF Dnc TBox in normal form.
We define a transition relation δ(T ) over the set of states S = PC ∪ {¬A | A ∈ PC}
and the alphabet F as follows:
                                 
                            A1 → A2 ∈ δ(T ) if A1 v A2 ∈ T
                              
                           A1 → ¬A2 ∈ δ(T ) if A1 v ¬A2 ∈ T
                                 f
                          A1 → A2 ∈ δ(T ) if A1 v ∀f.A2 ∈ T
where  is the empty letter transition.                                                      2

The transition relation will allow us to construct non-deterministic finite automata (NFA)
that can be used for various reasoning problems that relate to a CF Dnc TBox T . Note
that we also follow common practice in automata theory and use  for the empty letter
in transition relations.2

Lemma 4 Let M = (S, {A}, {B}, δ(T )) be an NFA with the set of states S, start
state A, final state B, and transition relation δ(T ). Then T |= A v ∀ Pf .B whenever
Pf ∈ L(M ).
Proof (sketch) For Pf ∈ L(M ) there must be a run
                                 1   l2   l           k    l
                          A = A0 → A1 → A2 · · · Ak−1 → Ak = B
2
     Another option would have been to use id for this purpose, but we thought, on balance, that
    this would hinder readability.
in M where li ∈ F ∪ {} and such that Pf = l1 .l2 . · · · .lk . It follows from the definition
                    li
of δ(T ) that Ai−1 →   Ai exists if Ai−1 v Ai , for li = , or Ai−1 v ∀li .Ai , for li ∈ F
(and hence these dependencies are trivially implied by T ). The claim then follows by
simple transitive reasoning, all necessary cases derive from the fact that
               {B1 v ∀ Pf .B2 , B2 v ∀ Pf 0 .B3 } |= B1 v ∀ Pf . Pf 0 .B3 ,
and by induction on the length of the run.                                              2
Note that the converse implication in this lemma may not hold, such as when A is
inconsistent with respect to T .
    The problem of concept satisfiability asks, for a given concept C and TBox T , if
there exists an interpretation I for T in which CI is non-empty. Such problems can be
reduced to the case where C is a primitive concept A by simply augmenting T with
{A v C}, where A is a fresh primitive concept.
    Given a primitive concept A and TBox T , one can test for primitive concept satisfi-
ability by using the following NFA, denoted nfaaB (T , {A(a)}):
                                                            
                          (S ∪ {a}, {a}, {B}, δ(T ) ∪ {a → A}),
with states given by primitive concepts, their negations, and a distinguished node a,
                                                                                    
with start state a, with final state B ∈ S, and with transition relation δ(T ) ∪ {a → A}.

Theorem 5 (Concept Satisfiability) A is satisfiable with respect to the TBox T if and
only if
              L(nfaaB (T , {A(a)})) ∩ L(nfaa¬B (T , {A(a)})) = ∅
for every B ∈ PC.
Proof (sketch) For a primitive concept B ∈ PC, a word Pf in the intersection language
of the two automata above is a witness of the fact that Pf I (aI ) ∈ BI and Pf I (aI ) ∈
¬BI must hold in every model of T , for reasons analogous to the proof of Lemma 4,
which leads to a contradiction since Pf is a (total) function.
Conversely, if no such word exists then one can construct a deterministic finite automa-
ton from nfaaB (T , {A(a)}), using the standard subset construction, in which no state
containing both B and ¬B is reachable from the start state {a}. Unfolding the transition
relation of this automaton, starting from the state {a}, labelling nodes by the concepts
associated with the automaton’s states, and adding missing features to complete trees in
which no primitive concept is true for any node, yields a tree interpretation that satisfies
T (in particular in which all PFD constraints are satisfied vacuously) and whose root a
provides a witness for consistency of A.                                                  2
Since all the automata operations run in PTIME we immediately get the following re-
sult.

Corollary 6 Concept satisfiability with respect to CF Dnc TBoxes is in PTIME.
Note it is not possible to precompute all inconsistent classes for an arbitrary C since
that would require consideration of all possible types over PC (i.e., finite subsets of
primitive concepts), a process essentially equivalent to constructing the deterministic
automaton used in the proof of Theorem 5, and in turn make the procedure exponential.
4   ABox Reasoning
The automata-based approach to concept satisfiability can be extended to the more gen-
eral problem of knowledge base consistency. Intuitively, each ABox individual a must
be linked to the TBox automaton in a fashion similar to how the “prototypical object”
a was linked in Section 3. This idea leads to the following definition:

Definition 7 (A Transition Relation for A) Let A be a CF Dnc ABox in normal form.
We create a transition relation δ(A) for an nfa over the set of states S = PC ∪ {a |
a in A} and the alphabet F as follows:
                                
                             a → a ∈ δ(A) if a appears in A,
                               
                             a → A ∈ δ(A) if A(a) ∈ A,
                                f
                               a → b ∈ δ(A) if f (a) = b ∈ A and
                                
                      a → b, b → a ∈ δ(A) if a = b ∈ A.
where  is the empty letter transition.                                                 2
Observe that we have used  transitions to simulate equality assertions in A. This is
justified, e.g., by considering the ABox individuals to be nominals.
                                           Pf
(aside on notation) Hereon, we write “n ; m in δ” if Pf ∈ L(nfa(S, {m}, {n}, δ)),
where S will be some set of states (that will be clear from the context), where m and n
will be two states in S, and where δ will denote a NFA transition relation over S (that
will also be clear from context).                                          (end of aside)
Unfortunately, taking δ(T ) ∪ δ(A) alone as the transition relation of an NFA and then
testing for consistency of every ABox individual (as in Theorem 5) is not sufficient as
the following cases illustrate. The problems raised by each case will be addressed by
defining rules that impose conditions on a transition relation.
    To begin, we need to ensure that ABox assertions f (a) = b are functional:

Example 8 (Path Function Assertions) Consider the ABox A = {f (a) = b, f (a) =
c}. Clearly bI must equal cI in any model I of a knowledge base that includes A. 2
To remedy this, we define a functionality rule for the transition relation δ(T , A) as
follows:
         f          f                               
    if a ; b and a ; c in δ(T , A) then {b → c, c → b} ⊆ δ(T , A).
Next, we need to ensure that ABox assertions of the form f (a) = b are coherent with
TBox assertions A v ∀f.B with respect to concept memberships of a and b:

Example 9 (ABox and Value Restrictions) Consider the TBox T = {A v ∀f.B}
and an ABox A = {f (a) = b, A(a)}. Clearly, in any model I of the knowledge
base (T , A), bI must be an element of BI . However, B cannot be reached from b in
δ(T ) ∪ δ(A), and therefore an automaton based on this transition relation alone cannot
reflect the correct concept membership of b.                                         2
We define a coherence rule for the transition relation δ(T , A) to remedy this as follows:
         f                             f                
    if a ; b, a ; A, and A ; B in δ(T , A) then b → B ∈ δ(T , A).
And finally, consider that tree interpretations, such as the one we used to show con-
cept consistency in Theorem 5, vacuously satisfy all PFDs in T , but that this is not
necessarily the case for a given ABox A.

Example 10 (ABox and PFDs) Consider A = {A(a), B(b), f (a) = c, f (b) = c}.

 – A TBox T = {A v B : f → id } implies that the individuals a and b must denote
   the same domain element.
 – A TBox T = {A v B : f → g} implies that there must be an additional (anony-
   mous) individual d such that g(a) = d and g(b) = d.

Note that the PFD A v B : f.g → id is also violated by the pair of individuals a and
b, this despite the fact that neither of these two individuals is the origin of an explicit
f.g path in A: since features are interpreted as total functions, individual c must have
an “outgoing” g feature, and therefore a and b must agree on f.g.                        2
A remedy for these cases is obtained by defining a PFD closure rule for the transition
relation δ(T , A) for each PFD A v B : Pf 1 , . . . , Pf k → Pf ∈ T . The rule will refer to
the following auxiliary functions.
match(a, b, Pf, δ(T , A)): Returns true if there is a (possibly empty) prefix Pf 0 of Pf
                 Pf 0            Pf 0
    such that a ; c and b ; c in δ(T , A) for some individual c; it returns false other-
    wise.
expf(a, Pf, δ(T , A)): Returns the minimal set of transitions (by creating new individ-
                            Pf
   uals) such that a ; c in δ(T , A) holds for some c.
                                           
mkeq(a, b, Pf, δ(T , A)): Returns {c → d, d → c} where, for some individuals c and d,
                 Pf              Pf
    we have a ; c and b ; d in δ(T , A).
The PFD closure rule is then defined as follows:
                       
    if {a ; A, b ; B} ⊆ δ(T , A) and
       match(a, b, Pf i , δ(T , A)), for 0 < i ≤ k, and not match(a, b, Pf, δ(T , A))
    then expf(a, Pf, δ(T , A)) ⊆ δ(T , A), expf(b, Pf, δ(T , A)) ⊆ δ(T , A), and
         mkeq(a, b, Pf, δ(T , A)) ⊆ δ(T , A)
The rules enable one to define a transition relation for an NFA that captures reasoning
in the knowledge base (T , A) as follows.

Definition 11 (Transition Relation δ(T , A)) Let δ(T , A) be the smallest transition
relation containing δ(T ) and δ(A) that is closed under the functionality, coherence,
and the PFD closure rules.                                                         2
Note that δ(T , A) is constructed by applying the closure rules to δ(T ) ∪ δ(A). Since
this process is monotonic, it is sound to check for the preconditions of the rules in the
partially completed δ(T )∪δ(A). We use δ(T , A) as the transition function for the NFA
nfaaB (T , A) with the start state {a} and final state B (similarly to Section 3).
Theorem 12 (Knowledge Base Consistency) A knowledge base (T , A) is consistent
if and only if
                     L(nfaaB (T , A)) ∩ L(nfaa¬B (T , A))

is empty for all primitive concepts B ∈ PC and all ABox individuals a in A.
Proof (sketch) Assume Pf ∈ L(nfaaB (T , A)) ∩ L(nfaa¬B (T , A)) for some path func-
tion Pf, individual a and primitive concept B, and that I |= (T , A). Composing all
the assertions corresponding to the transitions in δ(T , A) along the runs corresponding
to Pf in the two automata, however, implies that Pf I (aI ) ∈ BI and Pf I (aI ) ∈ ¬BI
(similarly to Lemma 4); a contradiction as interpretations of path functions are func-
tional.
For the other direction we define an interpretation I as follows: let dae be an represen-
                                             
tative of the equivalence class {a | a ; b, b ; a in δ(T , A)} and let PF(a) denote
                             f
                {f. Pf | a ; b not in δ(T , A)} for any individual b}.
Then set

 – 4I = a in A {dae. id } ∪ {dae. Pf | Pf ∈ PF(a)};
         S

 – aI = dae. id ;
                        Pf
 – AI = {dae. Pf | a ; A in δ(T , A)}; and
                                  f
 – f I = {(dae. id , dbe. id ) | a ; b in δ(T , A)} ∪
                                      {(dae. Pf, dbe. Pf .f ) | dae. Pf, dae. Pf .f ∈ 4I }.

It is immediate that I |= A since δ(A) ⊆ δ(T , A) and we corrected for all violations
of PFDs. By inspecting inclusion dependencies in T it is also easy to see that I |= T .
2

Note that the core of this construction is again the subset construction for NFA deter-
minization (cf. Theorem 5) where the TBox-ABox interactions are facilitated by the
closure rules. What remains is to show that knowledge base consistency can be checked
in PTIME.

Lemma 13 |δ(T , A)| is polynomial in |T | + |A|.
Proof (sketch) The number of individuals in δ(T , A) is bounded by |A| + 2|T ||A|2
since the PFD closure rule can add at most two new individuals per pair of individuals
in A and PFD in T . Thus, since the number of states is polynomial in |T | + |A|, the
number of transitions in δ(T , A) is also at most polynomial in |T | + |A|.         2

Taken together with the argument we made for concept consistency with respect to a
TBox yields PTIME algorithm for KB consistency. Since we do not assume the unique
name assumption, the problem is also PTIME-hard (we have Horn-SAT embedded in
reasoning with the PFDs alone).

Corollary 14 Knowledge base consistency for CF Dnc is PTIME-complete.                    2
4.1   Logical Implication
Now we consider the questions of logical implication of the form (T , A) |= C(a),
(T , A) |= Pf 1 (a) = Pf 2 (b), and ultimately T |= A v C. Since C can be a complex
concept and CF Dnc is not closed under negation, logical implication must be resolved
by asking several separate questions by exhaustively applying the following simplifica-
tion rules:
                                 Simp(C) → {C}
                    Simp(∀ Pf .C1 u C2 ) → Simp(∀ Pf .C1 ) ∪ Simp(∀ Pf .C2 )
                    Simp(∀ Pf .∀ Pf 0 .C1 ) → Simp(∀ Pf . Pf 0 .C1 )
where C is one of the irreducible concepts of the forms ∀ Pf .A, ∀ Pf .¬A, and ∀ Pf .A :
Pf 1 , . . . , Pf k → Pf 0 . We call the irreducible concepts obtained by these rules the sim-
plifications of the given concept.

Lemma 15 (T , A) |= C(a) (T |= A v C) if and only if (T , A) |= D(a) (T |= A v
D, respectively) for all D ∈ Simp(C).
Proof (sketch)    By observing that the each step of simplifications preserves logical
implication.                                                                        2
The simplified logical implication questions can now be reduced in a natural way to
CF Dnc knowledge base satisfiability as follows:

Theorem 16 (Instance Checking)
 1. (T , A) |= ∀ Pf .A(a) iff (T , A ∪ {∀ Pf .¬A(a)}) is not satisfiable.
 2. (T , A) |= ∀ Pf .¬A(a) iff (T , A ∪ {∀ Pf .A(a)}) is not satisfiable.
 3. (T , A) |= (∀ Pf .A : Pf 1 , . . . , Pf k → Pf 0 )(a) iff
                                                              [
     (T , A ∪ {Pf(a) = b, A(c), D(Pf 0 (b)), ¬D(Pf 0 (c)} ∪         {Pf i (b) = Pf i (c)})
                                                                 0