A Matter of Principles: Towards the Largest DLP Possible ⋆

A Matter of Principles: Towards the Largest DLP Possible ⋆ MarkusKrötzsch Institut AIFB Universität Karlsruhe

SebastianRudolph Institut AIFB Universität Karlsruhe

A Matter of Principles: Towards the Largest DLP Possible ⋆ B3CE2261A01F617828FB93B9DF8A2D13 GROBID - A machine learning software for extracting information from scholarly documents

Description Logic Programs (DLP) have been described as a description logic (DL) that is in the "expressive intersection" of DL and datalog. This is a very weak guideline for defining DLP in a way that can be claimed to be optimal or maximal in any sense. Moreover, other DL fragments such as EL and Horn-SHIQ have also been "expressed" using datalog. Is DLP just one out of many equal DLs in this "expressive intersection"? This paper attempts to clarify these issues by characterising DLP with various design principles that clearly distinguish it from other approaches. A consequent application of the introduced principles leads to the definition of a significantly larger variant of DLP which we conjecture to be maximal in a concrete sense. A preliminary report on the proof of this maximality is provided. While DLP is used as a concrete (and remarkably complex) example in this paper, we argue that similar approaches can be applied to find canonical definitions for other fragments of logical languages.

Introduction

Description Logic Programs (DLP) were introduced as a family of fragments of description logic (DL) that can be expressed in first-order Horn-logic [1,2]. Since common reasoning tasks are still undecidable for first-order Horn-logic, its function-free fragment datalog is of particular interest, and the term "DLP" today is most commonly used to refer to tractable DLs that can be translated to equisatisfiable datalog. This statement is slightly more concrete than describing DLP as a subset of the "expressive intersection" of DL and datalog [1], but it is still insufficient to characterise DLP. In particular, it is well-known that other tractable DLs such as EL can also be translated to equisatisfiable datalog programs [3,4]. The union of DLP and EL is an intractable DL (for some discussion, see [3]), but one may still wonder whether DLP is merely one among several equivalent subsets of the "expressive intersection" of DL and datalog.

But tractability was not among the original design goals of DLP, and one might also weaken this principle to require merely a polytime transformation to datalog. Since reasoning in datalog is still ET complete, this would not preclude intractable logics. Could the union of DLP and EL then be considered as an extended version of DLP? Possibly yes, since it is contained in the DL Horn-SHIQ for which a satisfiabilitypreserving datalog transformation is known [5]. However, EL and DLP can be translated to datalog axiom-by-axiom, i.e. in a modular fashion, while the known datalog transformation for Horn-SHIQ needs to consider the whole knowledge base. But how can we be sure that there is no simpler transformation given that both data-complexity and combined complexity of datalog and Horn-SHIQ agree? The answer is given in Proposition 1 below.

In any case, it is obvious that the design principles for DLP -but also for EL and Horn-SHIQ -are not sufficiently well articulated to clarify the distinction between these formalisms. This paper thus gives explicit characterisation of DLP (Section 3), not in terms of concrete syntax but in terms of general design principles, which captures the specifics of the known DLP for datalog. An essential principle is structurality of the language: a formula should be in DLP based on its term structure, not based on concrete entity names that it uses. Moreover, we ask whether DLP could be defined as a larger, or even as the largest, DL language satisfying the design principles. A significantly larger variant of DLP is introduced in Section 4. This paper does not give a conclusive answer on whether or not this extended DLP is the largest possible, but we conjecture this to be true, and we sketch the proof currently under construction (Section 5). We conclude with an outlook on further application areas for the presented approach (Section 6). Proofs and other details omitted herein for lack of space can be found in [6].

Preliminaries

We use SROIQ free to denote the DL SROIQ without simplicity and regularity constraints. Knowledge bases are defined over finite sets of individual names I, concept names A, and role names R, and S = I, A, R then is a signature. For this paper, we assume that the universal role U is no special symbol (it can still be introduced by suitable axiomatisation), and we write ∃R.A and ∀R.A as 1 R.A and 0 R.¬A, respectively.

We write NNF(KB) for the negation normal form (NNF) of a knowledge base KB, defined as usual. By DNF(KB) we denote the disjunctive normal form, which is obtained by exhaustively replacing subconcepts of the form (C ⊔ D) ⊓ E with (C ⊓ E) ⊔ (D ⊓ E). Note that we do not distribute Boolean concept constructors over role restrictions, i.e. our DNF may still contain complex nested concepts.

SROIQ free knowledge bases KB (and their signatures) can be expressed as semantically equivalent theories of first-order logic with equality FOL = (with according signature). When using SROIQ in the context of FOL = , we will always assume some (arbitrary) such translation to be used. We consider datalog to be the Horn-fragment of FOL = without function symbols; see [6] for further details.

Let F be a FOL = formula, or a SROIQ free axiom or concept expression, and let S be a signature. An expression F ′ is a renaming of F in S if F ′ can be obtained from F by replacing each occurrence of a role/concept/individual name with some role/concept/individual name in S . Multiple occurrences of the same entity name in F need not be replaced by the same entity name of S in this process. A language L over a signature S is a subset of all SROIQ free concept expressions and axioms over S . Infinite signatures are not admissible when studying computational complexities, so we need notation to extend the signature of a DL language: A language scheme is a class L of languages such that (1) for every signature S , there is a language L(S ) over S in L, and, (2) for any two signatures S and S ′ , and every concept expression or axiom C from L(S ), we find that L(S ′ ) contains every concept expression or axiom C ′ obtained from C by (uniformly and one-to-one) replacing all occurrences of individual, concept and role names from S into such from S ′ . We say that L contains an axiom of concept expression C if there is a signature S such that C ∈ L(S ). The notation is extended to knowledge bases KB: we write KB ∈ L to express KB ∈ 2 L .

We need a notion of semantic correspondence between logical theories of DL and datalog. Semantic equivalence is too strong -it does not allow the use of auxiliary symbols for expressing a logical relationship -while equisatisfiability is too weakit allows complex semantic translations that are not tractable. The following is a more appropriate middle-ground: Definition 1. Let T and T ′ be two first-order logic theories and let S be the signature over which T is defined. We say that T ′ emulates T if for every FOL = formula ϕ over S , T ′ ∪ {ϕ} is satisfiable if and only if T ∪ {ϕ} is.

This notion can be generalised by constraining the set of "test formulae" ϕ. Emulation is loosely related to conservative extensions [7]: every conservative extension of T emulates T . However, for a conservative extension T ′ of T we have T ⊆ T ′ , while we want the logical (sub)languages of T and T ′ to be different.

Considerations for Defining DLP

In this section, we discuss why defining DLP is not straightforward, and we specify design principles to guide our subsequent definition. The goal is a notion of DLP that is characterised by these principles, as opposed to DLP being some ad hoc fragment of DL that just happens to be expressible in datalog. The first design principle fixes our choice of syntax and underlying DL: DLP 1 (DL Syntax) DLP knowledge bases should be SROIQ free knowledge bases.

The second principle states that the semantics of a DLP knowledge base can be expressed in datalog. Since we want to allow the datalog transformation to introduce auxiliary predicate symbols, we require emulation (Definition 1) instead of semantic equivalence:

DLP 2 (Semantic Correspondence)

There should be a transformation function datalog that maps a DLP knowledge base KB to a datalog program datalog(KB) such that datalog(KB) emulates KB. DLP 2 is a strong requirement with many useful consequences since it implies that the entailment of any FOL = formula over the signature of KB can be checked in datalog(KB). Note that this is a stronger requirement than preservation of assertional consequences.

The principles DLP 1 and DLP 2 do not yet provide sufficient details to attempt a definition of DLP. A naive approach could be to define a DL ontology to belong to DLP if it can be expressed by a semantically equivalent datalog program. Such a definition is of little practical use: every inconsistent ontology can trivially be expressed in datalog, so a DL reasoner is needed to decide whether or not a knowledge base should be considered to be in DLP. This is undesirable from a practical viewpoint. It is thus preferable that a definition can be checked without complex semantic computations: DLP 3 (Tractability) Containment of a knowledge base KB in a DLP language over some signature S should be decidable in polynomial time with respect to the size of KB and S . Syntactic language definitions are often subpolynomial, e.g. if they can be decided in logarithmic space, but polytime language definitions might still be acceptable. For example, simplicity constraints in SROIQ require polytime checks.

The downside of a syntactic approach is that semantically equivalent transformations on a knowledge base may change its status with respect to DLP. This is not a new problem -many DLs are not syntactically closed under semantic equivalence -but it imposes an additional burden on ontology engineers and implementers. To alleviate this problem, a reasonable further design principle is to require closure under at least some forms of equivalence preserving transformations, such as negation normal form and disjunctive normal form as defined earlier: DLP 4 (Closure Under NNF and DNF) A knowledge base KB should be in DLP iff its negation normal form NNF(KB) and its disjunctive normal form DNF(KB) are.

Closure under NNF will turn out to be mostly harmless, while closure under DNF imposes some real restrictions to our subsequent treatment. We still include it here since it allows us to generally present DL concepts as disjunctions, such that the relationship to datalog rules (disjunctions of literals) is more direct.

The above principles still allow DLP to be defined in such a way that some DLP knowledge base subsumes another knowledge base that is not in DLP. This behaviour is undesirable since it requires implementations and knowledge engineers to consider all axioms of a knowledge base to check if it is in DLP, which motivates the following principle: DLP 5 (Modularity) Consider two knowledge bases KB 1 and KB 2 . Then KB 1 ∪ KB 2 should be in DLP if and only if both KB 1 and KB 2 are. Moreover, in this case the datalog transformation should be datalog(KB

1 ∪ KB 2 ) = datalog(KB 1 ) ∪ datalog(KB 2 ).

Modularity changes our goal from defining DLP knowledge bases to defining DLP axioms. Note that SROIQ with global constraints (regularity, simplicity) does not satisfy DLP 5 (to see this, set KB 1 = {Tra(R)} and KB 2 = {⊤ ⊑ 1 R.⊤}) which is why we consider SROIQ free instead. The above principles already suffice to establish an interesting complexity result: Proposition 1. Consider a class K of knowledge bases that belong to a language satisfying DLP 1 to DLP 5, and such that the maximal size of axioms in K is bounded. Then deciding satisfiability of knowledge bases in K is possible in polynomial time.

Proof. By DLP 2, satisfiability of KB ∈ K can be decided by checking satisfiability of datalog(KB). Assume that the size of axioms in knowledge bases in K is at most n. Up to renaming of symbols, there is only a finite number of different axioms of size n. We can assume without loss of generality that the transformation datalog produces structurally similar datalog for structurally similar axioms, so that there are only a finite number of structurally different datalog theories datalog({α}) that can be obtained from axioms α in K. The maximal number of variables occurring within these datalog programs is bounded by some m. By DLP 5, the same holds for all programs datalog(KB) with KB ∈ K. Satisfiability of datalog with at most m variables per rule can be decided in time polynomial in 2 m [8]. Since m is a constant, this yields a polynomial time upper bound for deciding satisfiability of knowledge bases in K.

⊓ ⊔

The previous result states that reasoning in any DLP language is "almost" tractable. Many DLs allow complex axioms to be decomposed into a number of simpler normal forms of bounded size, and in any such case tractability is obtained, but we will see that not all DLP axioms can be decomposed in DLP. Yet, Proposition 1 shows why Horn-SHIQ cannot be in DLP: ET worst-case complexity of reasoning can be proven for a class K of Horn-SHIQ knowledge bases as in the above proposition, see [9].

None of the above principles actually require DLP to contain any knowledge base at all. An obvious approach thus is to define DLP to be the largest language that adheres to all of the chosen design principles. The question to ask at this point is whether this is actually possible: is there a definition of DLP that adheres to the above principles and that includes as many DL axioms as possible? The answer is a resounding no: Proposition 2. Consider a language L DLP that adheres to the principles DLP 1 to DLP 5. There is a language L ′ DLP that adheres to DLP 1 to DLP 5 while covering more knowledge bases, i.e. L DLP ⊂ L ′ DLP . This shows that any attempt to arrive at a maximal definition of DLP based on the above design principles must fail. Any definition still requires further choices that, lacking concrete guidelines, are necessarily somewhat arbitrary. While it is certainly useful to capture some general requirements in explicit principles, our approach of defining DLP would not improve over existing ad hoc approaches.

The proof of Proposition 2 exploits complexity differences between datalog and DL. Intuitively speaking, a definition of DLP cannot reach the desired maximum since the computations required in this case would no longer be polynomial. DLP 5 does ameliorate the situation slightly by restricting attention to axioms, but DLs can encode complex semantic relationships even within single axioms. The core of Proposition 2 in this sense is that there is no polytime procedure for deciding whether a SROIQ axiom can be expressed in datalog.

These considerations highlight a strategy for further constraining DLP to obtain a clearly defined canonical definition. Namely, it is necessary to avoid the complicated semantic effects that may arise when considering even single DL axioms. An intuitive reason for the high complexity of evaluating single axioms is that parts of an axiom, even if structurally separated, may semantically affect each other. An important observation is that the semantic interplay of parts of an axiom usually requires entity names to be reused. For example, ⊤ ⊑ A ⊓ ¬A is unsatisfiable since the concept A is used in both conjuncts, while the structurally similar formula ⊤ ⊑ A ⊓ ¬B is satisfiable. So, to prevent such semantic effects from affecting DLP, we can require DLP to be closed under the exchange of entities: DLP 6 (Structurality) Consider knowledge bases KB and KB ′ such that KB ′ is an arbitrary renaming KB. Then KB is in DLP iff KB ′ is.

Note that renamings need not be uniform, so A ⊓ ¬B can be obtained from A ⊓ ¬A. This is a very strong requirement since it forces DLP to be based on the syntactic structure of axioms rather than on the semantic effects that occur for one particular axiom only. Together with modularity (DLP 5), this principle captures the essential difference between a "syntactic" and a "semantic" transformation from DL to datalog. Adhering to DLP 5 and DLP 6, DLP can only include axioms for which all potential semantic effects can be faithfully captured by datalog. Semantic computations for checking satisfiability must be accomplished in datalog, and not during the translation. This intuition turns out to be quite accurate, but more is needed to establish formal results.

Structurality also interacts with normal form transformations. For example, the concept (¬A ⊔ ¬B) ⊓ C can be emulated in datalog using rules → C(x) and A(x), ∧B(x) →.

But its DNF (¬A ⊓ C) ⊔ (¬B ⊓ C) is only in DLP if its renaming (¬A ⊓ C) ⊔ (¬B ⊓ D)

is, which turns out to be not the case. Therefore, the knowledge base {¬A ⊔ ¬B, C} is in DLP but the knowledge base {(¬A ⊔ ¬B) ⊓ C} is not. We have discussed above why such effects are not avoidable in general. The more transformations are allowed for DLP, the less knowledge bases are contained in DLP. Note that such effects do not occur for negation normal forms.

Defining DLP

We now provide a direct definition of DLP and show that the resulting language can be emulated in datalog. We call a language scheme L a DLP language scheme if it adheres to the principles DLP 1-DLP 6 of Section 3. A DLP language is a language of the form L(S ) for a signature S and DLP language scheme L. Without loss of generality, it is assumed that datalog(KB) is independent of the DLP language that KB is taken from.

It turns out that the above characterisation leads to a prohibitively complex syntactic description of the language. Our first goal in this section therefore is to identify ways of simplifying its presentation. Note that it is not desirable to simply eliminate "syntactic sugar" in general, since the very goal of this work is to characterise which SROIQ knowledge bases can be considered as syntactic sugar for datalog.

Restricting to axioms in negation normal form seems to free us from the burden of explicitly considering negative occurrences of non-atomic concepts. But NNF does not allow for this simplification, since concepts of the form n R.D still contain D in negative polarity. A modified NNF is more adequate. We thus say that a SROIQ free concept expression C is in positive negation normal form (pNNF) if (1) all its subexpressions n R.D have the form n R.¬D ′ , and (2) every other occurrence of ¬ in C is part of a subconcept ¬A or ¬{a} with A ∈ A, a ∈ I. Concept expressions C can be transformed into semantically equivalent concept expressions pNNF(C) in positive negation normal form in linear time.

While pNNF effectively reduces the size of a DLP definition by half, the definition is still exceedingly complex. The construction of disjunctive normal forms is compatible with pNNF, so we can additionally require this form of normalisation. Another source of complexity is the fact that SROIQ features many concept expressions for which

Body concepts: for C in normal form, C ∈ D B iff C ⊔ A (or ¬C ⊑ A) is in DLP C B ¬A | ¬{I} | ¬∃R.Self | 0 R.¬(D B ∪ {⊥}) | C B ⊓ C B D B C B | D B ⊔ D B Head concepts: for C in normal form, C ∈ D H iff A ⊑ C is in DLP C H C B | A | {I} | ∃R.Self | n R.D n! | 0 R.¬D H | 1 R.¬(D B ∪ {⊥}) | C H ⊓ C H | D 1! D H C H | D H ⊔ D B | D a ⊔ C ≥ Assertional concepts: for C in normal form, C ∈ D a iff {a} ⊑ C is in DLP C a C H | n R.D n | C a ⊓ C a D a C a | D a ⊔ D B Disjunctions of nominal assertions of the form {I} ⊓ C a D 1! {I} ⊓ C a D m+1! D m! ⊔ D 1! Conjunction of negated nominals, i.e. complements of some nominal disjunction C ≥ ¬{I} | C ≥ ⊓ C ≥ Filler concepts for n in D a D n ⊤ | C ≥ ⊔ D + a | D B ⊔ (D ≤m ∩ D + a ) (m < n) | D a ⊔ (D ≤m ∩ D + a ) ⊔ D l! (for r ≔ n − (m + l)

we have r > 0 and r(r − 1) ≥ m) where no disjuncts are added for D ≤0 ∩ D + a and D 0! Extended concepts with restricted forms of ("local") disjunctions, used in D n only

C + H C H | n R.D + n! | 0 R.¬D + H | n R.¬(D ≥ω−m ∩ D + a ) | C + H ⊓ C + H | D + 1! D + H C + H | D + H ⊔ D B | D + a ⊔ C ≥ C + a C + H | n R.(D + a ∪ {⊤}) | C + a ⊓ C + a D + a C + a | D + a ⊔ D + a D + 1! {I} ⊓ C + a D + m+1! D + m! ⊔ D + 1!

D ≤n (D ≥ω−n ) concepts contain (exclude) at most n domain elements, see [6]. Many unexpected DLP axioms are based on the observation that some DL concepts do not allow for arbitrary interpretations. This is especially the case for concepts that use nominals, but even DLs without nominals admit such constrained concept expressions. An important special case are concepts that can only represent ⊤ or ⊥ in any renaming. We say that a SROIQ concept expression C is structurally valid (structurally unsatisfiable) if ⊤ ⊑ C ′ (C ′ ⊑ ⊥) is valid for every renaming (over arbitrary alphabets) C ′ of C. Moreover, C is structurally refutable (structurally satisfiable) if it is not structurally valid (structurally unsatisfiable). It can be shown that the sets of all SROIQ concept expressions in pNNF that are structurally valid, unsatisfiable, refutable, or satisfiable can be recognised in polynomial time (see [6]). Thus we can also eliminate such concepts from our considerations. Before providing the full definition of a large DLP language scheme, we provide some interesting examples to sketch the complexities of this endeavour (datalog emulations are provided in parentheses). DLP expressions of the form

Definition 2. A concept expression C is inA ⊓ ∃R.B ⊑ ∀S .C (A(x) ∧ R(x, y) ∧ B(y) ∧ S (x, z) → C(z)) are well-known. The same is true for A ⊑ ∃R.{c} (A(x) → R(x, c)) but hardly for A ⊑ 2 R.({c} ⊔ {d}) (A(x) → R(x, c), A(x) → R(x, d)).

Another unusual form of DLP axioms arises when Skolem constants (not functions) can be used as in the case {c} ⊑ 2 R. A (R(c, s), R(c, s ′ ), A(s), A(s ′ ), s ≈ s ′ → ⊥ with fresh s, s ′ ) and A ⊑ ∃R.({c}⊓∃S .⊤) (A(x) → R(x, c), S (c, s) with fresh s). Besides these simple cases, there are various DLP axioms for which the emulation in datalog is significantly more complicated, typically requiring an exponential number of rules. Examples are {c} ⊑ 2 R.(¬{a} ⊔ A ⊔ B) and {c} ⊑ 5 R.(A ⊔ {a} ⊔ ({b} ⊓ 1 S .({c} ⊔ {d}))). These cases are based on the complex semantic interactions between nominals and atleastrestrictions. Definition 3. We define the language scheme DLP as follows. For a signature S , the language DLP(S ) contains all axioms which are SROIQ free RBox axioms over S , or GCIs C ⊑ D over S where pNNF(¬C ⊔D) is a C DLP concept as defined in the following grammars, with C H as defined in Fig. 1, and D =n and C ⊤ as defined in Fig. 2:

C DLP ⊤ | ⊥ | C H | D =n | C ⊤

In spite of the immense simplifications that DLP normal form provides, the definition of DLP still turns out to be extremely complex. We have not succeeded in simplifying the presentation any further without loosing substantial expressive features. Some intuitive explanations help to understand the underlying ideas. It is instructive to also compare these intuitions to the above examples.

The core language elements are in Fig. 1. Since all concepts are in DNF, each sublanguage consists of a conjunctive part C and a disjunctive part D. Definitions of DLP typically distinguish between "head" and "body" concepts, and C H and C B play a similar role in our definition. C H represents concepts that carry the full power of a DLP GCI and that can serve as right hand sides ("heads") of DLP GCIs. C B concepts can be seen as negated generic left hand sides ("bodies") of GCIs. However, these basic classes are not sufficient for defining a maximal DLP. C a characterises concept expressions which can be asserted for named individuals -these are even more expressive than C H in that existential restrictions are allowed (intuitively, this is possible as in the context of known individuals the existentially asserted role neighbours can be expressed by Additional concepts based on global domain size restrictions

D =1 {I} ⊓ C p H D =m+1 D =m ⊔ ({I} ⊓ C =m+1 ⊥

) Additional head and body concept expressions for unary domains ("propositional" case)

C p B C =1 ⊥ | ¬A | ¬∃R.Self | C p B ⊓ C p B | 0 R.¬(D p B ∪ {⊥}) | n R.¬C (n ≥ 1) D p B D p B | D p B ⊔ D p B C p H C p B | A | {I} | ∃R.Self | C p H ⊓ C p H | 1 R.D p H | 0 R.¬D p H D p H C p H | D p H ⊔ D p B

Additional structurally unsatisfiable concepts for domains of restricted size This emulation uses internal symbols to resolve apparently disjunctive cases in a deterministic way. The datalog program does not represent disjunctive information: its least model simply contains two successors that are not equal to b. The nested disjunction only becomes relevant in the context of some disjunctive FOL = formula, such as ∀x.x ≈ a ∨ x ≈ b. The considered theory is no longer datalog in this case, and the program simply "re-uses" the disjunctive expressive power provided by the external theory. The fact that the actual program is far from being semantically equivalent to the original axiom illustrates the motive and utility of our definition of emulation.

C =1 ⊥ ¬{I} | C =1 ⊥ ⊓ C | n R.D =1 ⊥ (n ≥ 0) | n R.D (n ≥ 2) C =m+1 ⊥ C =m+1 ⊥ ⊓ C | n R.D =m+1 ⊥ (n ≥ 0) | n R.D (n ≥ m + 2) D =m ⊥ C =m ⊥ | D =m ⊥ ⊔ D =m

Many uses of nominals and atleast-restrictions lead to more complex interactions, some of which require completely different encodings. This is witnessed by the more complex arithmetic side condition used in D n . Concepts in D ≤m ∩ D + a correspond to disjunctions of m nominal classes, each of which is required to satisfy further disjunctive conditions, as e.g. {b} ⊓ 1 R.(A ⊔ B). Now a disjunction of an atomic class and four such "disjunctive nominals" is allowed as a filler for 7 (since 3 × 2 ≥ 4) but not for 6 (since 2 × 1 < 4). Also note that the disjunctive concepts like D + H and D + a that are allowed in fillers do not allow all types of disjunctive information but only a finite amount of "local" disjunctions. For example, {a} ⊔ B ⊔ C requires one "local" decision about a, whereas concepts like {a} ⊓ 0 R.¬(B ⊔ C) or {a} ⊓ 2 R.¬⊥ require arbitrarily many decisions for all R successors.

The remaining grammars in Fig. 2 take care of less interesting special cases. Most importantly, C p H covers all concepts that can be emulated if the interpretation domain is restricted to contain just one individual. C ⊤ contains axioms which make the knowledge base inconsistent as they deny the existence of a nominal.

Further details about the datalog transformation of DLP are provided in [6].

Maximality of DLP

We conjecture that the DL DLP of Definition 3 is the largest DLP language scheme. Proving this is not straightforward, since it requires us to ensure that no further kind of axioms could admit a datalog emulation. The definition of DLP is also a result of (sometimes surprising) failures in trying to prove this result. The earlier example emulations already hint at the complexity of the problem. The general proof technique is sketched in the following, and further updates on the status of the maximality conjecture are given in the technical report [6]. Structurality (DLP 6) is essential throughout the proof since it enables us to pick suitable renamings for each argument.

A useful observation is that any DLP language scheme can be extended to include all axioms of DLP, since it can be shown that the union of any two DLP language schemes is still a DLP language scheme. Equipped with the basic toolbox of DLP axioms, it can then be shown that certain basic types of axioms can never be in DLP since their emulation would contradict basic properties of datalog. Examples of two basic such properties are the least model property and the complexity result of Proposition 1. The former entails that, for any datalog program P, if P has a model where the extension of predicate A is empty, and another model where the extension of predicate B is empty, then P has a model where both extensions are empty. This precludes datalog from emulating statements like A ⊔ B. Complexity properties can also be exploited: Lemma 1. No DLP language scheme contains axioms of the form A ⊑ ∃R.⊤.

Proof. For a contradiction, suppose that there is a DLP language scheme that includes axioms of the form A ⊑ ∃R.⊤. We can assume that all DLP axioms are also available. The hardness proof given for Horn-FL − in [9] can be adopted to show that deciding satisfiability for this DLP language is PS hard, even if axiom sizes are bounded. Since P PS, this contradicts Proposition 1.

⊓ ⊔

Both of these simple arguments do not extend to more general cases. More potent approaches are provided by model-theoretic properties that generalise the least model property to first-order interpretations: the product model and product element construction, see [6] for details. Applying these approaches to all cases requires a careful induction for various language definitions of DLP. Individual proof steps can be intricate in some cases, e.

Conclusions and Outlook

DLP provides an interesting example for the general problem of characterising syntactic fragments of a logic that are motivated by semantic properties. We derived and motivated a number of design principles for achieving such a characterisation for DLP, most notably the principles of modularity (closure under unions of knowledge bases) and structurality (closure under non-uniform renaming of signature symbols). We conjecture that the presented DLP language scheme is the largest one possible. Experiences with the ongoing proof of maximality confirm the utility of structurality in such proofs. Formalisms like our maximal DLP are unnecessarily large for practical applications, but understanding overall options and underlying design principles is indispensable for making an informed choice of DL for a concrete task.

Our results also clarify the differences between DLP and the DLs EL and Horn-SHIQ which can be expressed in datalog as well. First of all, neither EL nor Horn-SHIQ can be emulated in datalog (DLP 2). Instead, EL and Horn-SHIQ satisfy a weaker version of DLP 2 where Definition 1 is restricted to test formulae ϕ that are conjunctions of simple ABox facts. This weakening of DLP 2 allows for a larger space of possible DL fragments, but it is not clear whether (finitely many) maximal languages exist in this case. There is clearly no largest such language, since both EL and DLP abide by the weakened principles whereas their (intractable) union does not. The weakened principles still exclude Horn-SHIQ that is not modular (DLP 5), as shown by Proposition 1. It is possible to define Horn-SHIQ as a structural language (DLP 6) by using distinct signature sets for simple and non-simple roles. Again, it is open which results can be established for Horn-SHIQ-like DLs based on the remaining weakened principles.

This work also explicitly introduces a notion of semantic emulation which appears to be novel, though loosely related to conservative extensions. In essence, it requires that a theory can take the place of another theory in all logical contexts, based on a given syntactic interface. Examples given in this paper illustrate that emulation can be very different from semantic equivalence. Yet, our criteria can be argued to define minimal requirements for preserving a theory's semantics even in combination with additional information, so emulation appears to be a natural tool for enabling information exchange in distributed knowledge systems. We expect that the explicit articulation of this notion will be useful for studying the semantic interplay of heterogeneous logical formalisms in general.

The general approach of this paper -seeking a structural logical fragment that is provably maximal under certain conditions -leads to a number of further research questions. For example, what is the maximal fragment of SWRL ("datalog ∪ SROIQ") that can be expressed in SROIQ? It should contain DL Rules [10] and some form of DL-safe rules [11]. But also the maximal FOL = fragment that can be expressed in the guarded fragment or the two-variable fragment might be of general interest. Ultimate answers to such questions may indeed be obtained by a careful articulation of basic design principles.

Fig. 1 .1Fig. 1. Grammars for defining DLP concepts

DLP normal form if C = DNF(pNNF(C)) and -if C has a structurally valid subconcept D, then D = ⊤ and either C = D or D occurs in a subconcept of the form n R.D, -if C has a structurally unsatisfiable subconcept D, then D = ⊥ and either C = D or D occurs in a subconcept of the form n R.¬D. A SROIQ concept expression can be transformed into an equivalent expression in DLP normal form in polynomial time. The following definitions of DLP thus restrict to concepts in DLP normal form. Commutativity and associativity of ⊓ and ⊔ are exploited for further simplification.

⊥Fig. 2 .2Fig. 2. Grammars for defining DLP concepts: special cases with restricted domain size

g. to see why no datalog program can emulate {a} ⊑ 2 R.(A ⊔ ({b} ⊓ 1 S .(C ⊔ D)) while it could emulate {a} ⊑ 2 R.(A ⊔ {b} ⊔ {c}) and {a} ⊑ 3 R.(A ⊔ ({b} ⊓ 1 S .(C ⊔ D)).

⋆ Research reported herein was supported by the EU in the IST project ACTIVE (IST-2007-215040), and by the German Research Foundation under the ReaSem project.

Description Logic Programs: Combining logic programs with description logic BGrosof IHorrocks RVolz SDecker Proceedings of the 12th International World Wide Web Conference (WWW-03) the 12th International World Wide Web Conference (WWW-03)

Budapest, Hungary

ACM 2003 Web Ontology Reasoning with Logic Databases RVolz 2004 Germany Universität Karlsruhe (TH) PhD thesis ELP: Tractable rules for OWL 2 MKrötzsch SRudolph PHitzler May 2008 Germany Universität Karlsruhe Technical report Saturation-Based Decision Procedures for Extensions of the Guarded Fragment YKazakov 2006 Germany Universität des Saarlandes PhD thesis Data complexity of reasoning in very expressive description logics UHustadt BMotik USattler Proc. 18th Int. Joint Conf. on Artificial Intelligence (IJCAI-05) 18th Int. Joint Conf. on Artificial Intelligence (IJCAI-05)

Edinburgh, UK

Morgan-Kaufmann Publishers 2005 The largest DLP possible MKrötzsch SRudolph APR 2009 Germany Universität Karlsruhe Technical report Conservative extensions in expressive description logics CLutz DWalther FWolter IJCAI MMVeloso 2007 Complexity and expressive power of logic programming EDantsin TEiter GGottlob AVoronkov ACM Computing Surveys 33 2001 Complexity boundaries for Horn description logics MKrötzsch SRudolph PHitzler Proceedings of the 22nd AAAI Conference on Artficial Intelligence the 22nd AAAI Conference on Artficial Intelligence

Vancouver, British Columbia, Canada

AAAI Press 2007 Description logic rules MKrötzsch SRudolph PHitzler Proc. 18th European Conf. on Artificial Intelligence (ECAI-08) 18th European Conf. on Artificial Intelligence (ECAI-08) IOS Press 2008 Query answering for OWL DL with rules BMotik USattler RStuder J. of Web Semantics 3 2005