=Paper=
{{Paper
|id=Vol-1703/paper2
|storemode=property
|title=Axiomatization of General Concept Inclusions from Streams of Interpretations with Optional Error Tolerance
|pdfUrl=https://ceur-ws.org/Vol-1703/paper2.pdf
|volume=Vol-1703
|authors=Francesco Kriegel
|dblpUrl=https://dblp.org/rec/conf/ecai/Kriegel16
}}
==Axiomatization of General Concept Inclusions from Streams of Interpretations with Optional Error Tolerance ==
Axiomatization of General Concept Inclusions from Streams of Interpretations with optional Error Tolerance Francesco Kriegel Institute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany francesco.kriegel@tu-dresden.de Abstract. We propose applications that utilize the infimum and the supre- mum of closure operators that are induced by structures occuring in the field of Description Logics. More specifically, we consider the closure operators induced by interpretations as well as closure operators induced by TBoxes, and show how we can learn GCIs from streams of interpretations, and how an error-tolerant axiomatization of GCIs from an interpretation guided by a hand-crafted TBox can be achieved. Keywords: Description Logics · Formal Concept Analysis · Most Specific Con- sequence · Error Tolerance · General Concept Inclusion · TBox · Interpretation · Model · Stream · Incremental Learning · Automatic Learning 1 Introduction Description Logics [2, 6–8] are a family of well-founded languages for knowledge rep- resentation with a strong logical foundation as well as a widely explored hierarchy of decidability and complexity of common reasoning problems. The several reasoning tasks allow for an automatic deduction of implicit knowledge from given explicitly represented facts and axioms, and many reasoning algorithms have been developed. Description Logics are utilized in many different application domains, and in particular provide the logical underpinning of Web Ontology Language (OWL) [16] and its profiles. An interesting problem is the task of (semi-)automatic generation of terminological axioms, so-called general concept inclusions (GCIs), from given data. For example, in [4, 10] Baader and Distel have generalized the construction of implicational bases from so-called formal contexts [12, 15] in the field of Formal Concept Analysis [14] to the construction of bases of EL⊥-GCIs from interpretations in Description Logics. The main difference of the underlying data structures is that interpretations additionally allow the expression of binary relations between objects, which implies a number of technical and theoretical difficulties that have been solved by them. In case of incompleteness of the input data set, a technique of Attribute Exploration [11–13] can be utilized to axiomatize implications in a sound and complete manner. This approach furthermore presupposes an expert in the domain of interest that is able to correctly answer all queries posed to her. In [5, 10] Baader and Distel have as well extended this technique from formal contexts to interpretations. A further work in the intersection of Formal Concept Analysis and Description Logics was developed by Rudolph in [20, 21]. He generalized Attribute Exploration to Relational Exploration, a technique that processes an interpretation in a multi-step approach where in each step the role-depth of the involved concept descriptions is increased. In particular, Relational Exploration is a sound and complete deduction calculus for GCIs in the Description Logic FLE. A weakness of the exploration methods is the requirement of an expert that is able to truthfully answer all questions posed to it. In order theory, closure operators (clop) denote mappings in a powerset – or more gen- erally, in a lattice – which are extensive, monotone, and idempotent. Many types of data sets give rise to an induced closure operator in such a way that an implication is valid in the data set if, and only if, it is valid in the closure operator. For example, each formal context (G, M, I) induces the clop X → 7 X II on the powerset ℘(M), and each interpre- tation (∆I , ·I ) induces the clop C →7 C II in the lattice of all EL⊥-concept descriptions ordered by subsumption v and factorized by equivalence ≡. In a recent paper [19], the author investigated how implicational bases for closure operators can be computed in a parallel manner. Furthermore, it was shown that the set of all clops in a complete lattice constitutes a complete lattice itself, and formulae for computing infima and suprema of clops were provided. In this document, we will introduce a closure operator C → 7 CT induced by a TBox T , and will furthermore provide applications for computing bases of GCIs for the infimum as well as the supremum of C → 7 C T and C → 7 C II . This document is structured as follows. Section 2 gives a brief overview on the easy description logic EL⊥. Then, Section 3 cites some related work, and recalls important notions. Section 4 introduces the notion of a most specific consequence with respect to a TBox, and shows how to define a closure operator induced by a TBox. Section 5 then discusses applications of infima and suprema of clops induced by interpretations and TBoxes, and Section 7 draws some conclusions. Note that proofs are not included, but the interested reader can find them in the corresponding technical report [17]. 2 The Description Logic EL⊥ The syntax and semantics of the light-weight description logic EL⊥ are introduced as follows. Throughout the whole document assume that (NC , NR ) is a signature, i.e., NC is a set of concept names, and NR is a set of role names. An EL⊥-concept description is a term that is constructed by means of the following inductive rule: C ::= ⊥ | > | A | C u C | ∃ r. C. A general concept inclusion (abbr. GCI ) is an expression C v D where both the premise C as well as the conclusion D are EL⊥-concept descriptions. A TBox is a set of GCIs. The role depth rd(C) of an EL⊥-concept description C is inductively defined as follows: rd(⊥) := rd(>) := rd(A) := 0, rd(C u D) := rd(C) ∨ rd(D), and rd(∃ r. C) := 1 + rd(C). An interpretation I := (∆I , ·I ) consists of a non-empty set ∆I , called the domain, and an extension function ·I that maps concept names A ∈ NC to subsets AI ⊆ ∆I , and maps role names r ∈ NR to binary relations rI ⊆ ∆I ×∆I . Then, the extension function is canonically extended to all EL⊥-concept descriptions by the following definitions: ⊥I := ∅, >I := ∆I , (C u D)I := C I ∩ DI , and (∃ r. C)I := { d ∈ ∆I | ∃ e ∈ ∆I : (d, e) ∈ rI and e ∈ C I }. A GCI C v D is valid in I if C I ⊆ DI . We then also refer to I as a model of C v D, and denote this by I |= C v D. Furthermore, I is a model of a TBox T , symbolized as I |= T , if each GCI in T is valid in I. The entailment relation is lifted to TBoxes as follows: A GCI C v D is entailed by a TBox T , denoted as T |= C v D, if each model of T is a model of C v D, too. We then also say that C is subsumed by D with respect to T . A TBox T entails a TBox U, symbolized as T |= U, if T entails each GCI in U, or equivalently if each model of T is also a model of U. Two EL⊥-concept descriptions C and D are equivalent with respect to T , and we shall write T |= C ≡ D, if T |= {C v D, D v C}. In case T = ∅ we may ommit the prefix ”∅ |=”. However, then we have to carefully interpret an expression C v D – it either just denotes a general concept inclusion, i.e., an axiom, without stating where it is valid; or it expresses that C is subsumed by D (w.r.t. ∅), i.e., C I ⊆ DI is satisfied in all interpretations I. An analogous hint applies to concept equivalences C ≡ D. It is readily verified that the subsumption v constitutes a quasi-order on the set EL⊥(NC , NR ) of all EL⊥-concept descriptions over the signature (NC , NR ). Hence, the quotient of EL⊥(NC , NR ) with respect to the induced equivalence ≡ is a partially ordered set (a poset). (In the following we will not distinguish between the equivalence classes and their representatives.) The poset is even a bounded lattice. Of course, ⊥ is the smallest element, and > is the greatest element. Furthermore, the conjunction u corresponds to the infimum operation, and the least common subsumer mapping ∨ corresponds to the supremum operation. Remark that the least common subsumer (abbr. lcs) C ∨ D of two EL⊥-concept descriptions C and D is (up to equivalence) uniquely defined by the following conditions: 1. C v C ∨ D as well as D v C ∨ D, and 2. for each EL⊥-concept description E, if C v E and D v E, then C ∨ D v E. It is easy to see that the equivalence ≡ is compatible with both u and ∨. In the sequel of this document, we shall denote this bounded lattice by EL⊥(NC , NR ) := (EL⊥ (NC , NR ), v)/ ≡. For a role-depth bound δ ∈ N, EL⊥(NC , NR )δ is the set of all EL⊥-concept descriptions with a role depth of at most δ, and accordingly EL⊥(NC , NR )δ := (EL⊥ (NC , NR )δ , v)/ ≡ symbolizes the corresponding bounded lattice of (equivalence classes of) EL⊥-concept descriptions. Note that EL⊥(NC , NR )δ is complete if the underlying signature (NC , NR ) is finite. 3 Related Work Baader and Distel introduced a technique for the axiomatization of general concept inclusions that are valid in a given interpretation. More specifically, they interconnected Formal Concept Analysis with Description Logics by defining so-called induced formal contexts such that its canonical base can directly be converted into a base of GCIs for the underlying interpretation. However, there was no possibility to include existing knowledge. For the case where a set of GCIs valid in the given interpretation is available, a solution for the computation of a relative base of GCIs has been proposed in [18]. However, it was not clear how to proceed in presence of an interpretation I and a TBox T where I |6 = T . This problem will be tackled in the following section. Beforehand, we recall the notion of a model-based most specific concept description. Let I be an interpretation, and consider a set X ⊆ ∆I as well as a role-depth bound δ ∈ N. Then an EL⊥-concept description C is called a role-depth-bounded model-based most specific concept description (abbr. mmsc) of X with respect to I and δ if the following statements hold: 1. rd(C) ≤ δ, 2. X ⊆ C I , and 3. for all concept descriptions D, if X ⊆ DI , then ∅ |= C v D. As an immediate consequence of the definition we infer that the mmsc of X w.r.t. I and δ is unique up to equivalence. Hence, we may speak of the mmsc, and denote it by X Iδ . 4 Most Specific Consequences with respect to a TBox For a given EL⊥-TBox T , and an EL⊥-concept description C, one may ask for a concept description D which is most specific with respect to the condition that C is subsumed by D w.r.t. T . Such a concept description is called most specific consequence, or most specific subsumer (abbr. mss). Note that Distel has investigated a dual notion in [10, Chapter 7], namely that of a minimal possible consequence, which he utilized to constitute an algorithm for the exploration of ontologies, called ABox Exploration. To emphasize this duality, it is reasonable to use the name of a minimal certain consequence. In this section, the notion of a most specific consequence shall be formally introduced, and necessary as well as sufficient conditions for its existence will be explored. Further- more, we investigate its relationship to entailment with respect to a TBox. The next section then provides at least one application that utilizes most specific consequences to construct a base of GCIs that are both valid in a given interpretation as well as are entailed by a given TBox. As an exemplary TBox, consider T := {A v ∃ r. A}. It can be readily verified that for each n ∈ N, the concept description (∃ r.)nA is a consequence (i.e., a subsumer) of A with respect to T . However, (∃ r.)n+1A is more specific than (∃ r.)nA, and thus a most specific consequence of A w.r.t. T does not exist in the description logic EL⊥ with descriptive semantics (as introduced in Section 2). There are two solutions to tackle this problem of existence of most specific consequences. The first one is to use the extension of EL⊥ with greatest fixpoint semantics. This extension has been extensively studied in [1, 3, 10], and in particular it has been shown that this extension can handle terminological cycles (as present in the given TBox T above). In particular, EL⊥gfp -concept descriptions are pairs C := (AC , TC ) where TC is a TBox, and AC is a defined concept name of TC . We do not introduce the full machinery of the semantics of EL⊥ gfp here, but rather refer the interested reader to [1, 3]. However, it can be shown that the most specific consequence of the example above indeed exists in EL⊥ gfp , and is given by (A, T ). It is straight-forward to claim that most specific consequences always exist in EL⊥ gfp , but nevertheless a corresponding proof is outstanding. Another solution for ensuring the existence of most specific consequences is to restrict the role-depth of the concept descriptions under consideration, as this has been done in [9] to ensure the existence of model-based most specific concept descriptions in EL⊥ with descriptive semantics. We introduce the following definition. Definition 1. Let T be an EL⊥-TBox, C an EL⊥-concept description, and δ ∈ N a role-depth bound. Then an EL⊥-concept description D is called a role-depth-bounded most specific consequence of C with respect to T and δ if it satisfies the following conditions: 1. rd(D) ≤ δ, 2. T |= C v D, and 3. for each EL⊥-concept description E, if rd(E) ≤ δ and T |= C v E, then ∅ |= D v E. Provided that such role-depth-bounded most specific consequences exist, they are unique up to equivalence, and hence we may then speak of the most specific consequence, and we shall denote it by C Tδ for a given concept description C, a TBox T , and a role-depth bound δ. By Definition 1, T |= C v C Tδ . Furthermore, from T |= C v C we conclude that ∅ |= C Tδ v C. In summary, T |= C ≡ C Tδ . Lemma 2. Role-depth-bounded most specific consequences always exist, provided that the signature is finite. Lemma 3. Let T ∪ {C v D} be an EL⊥-TBox such that D has a role depth of at most δ. Then the following statements are equivalent: 1. T |= C v D. 2. ∅ |= C Tδ v D. 3. { E v E Tδ | E ∈ EL⊥(NC , NR )δ } |= C v D. If all conclusions of GCIs in T have role depths not exceeding δ, then furthermore the following statement is equivalent to the previous ones: 4. { E v E Tδ | ∃ F : E v F ∈ T } |= C v D. Corollary 4. Let T ∪ {C v D} be an EL⊥-TBox such that T |= C v D, and both C and D have role depths not exceeding δ. Then for each EL⊥-concept description E, if ∅ |= E Tδ v C, then ∅ |= E Tδ v D. This document does not include a method for the computation of most specific consequences, and leaves this problem open for future research. However, we have shown that their existence is guaranteed in the case of descriptive semantics when the candidate concept descriptions are restricted in their role depth. In particular, the notion defined in Definition 1 always exists. As a next step, we provide a technique that allows for the computation of a TBox from a stream of interpretations and that utilizes those most specific consequences. 7 C Tδ is a closure operator in the dual of EL⊥(NC , NR )δ . Lemma 5. The mapping C → 5 Axiomatization of General Concept Inclusions from Streams of Interpretations Consider a setting where a stream of interpretations In, n ∈ N, can be observed, and furthermore for each time point n ∈ N, a TBox Tn shall be constructed such that for each GCI C v D, T |= C v D if, and only if, Ik |= C v D for all previous time points k ≤ n. Of course, for the initial moment n = 0, we may simply compute T0 as a base of GCIs for I0, utilizing the approach of Baader and Distel [4, 10]. For the following moments n ≥ 1, we may of course also construct a base of GCIs for the disjoint union of the interpretations I0, I1, . . . , In. However, since the method requires the construction of so-called induced contexts the size of which may be exponential in the size of the domain of the interpretation, this technique could possibly be infeasible for late time points. Furthermore, it requires the storing of all interpretations observed so far. We want to present another technique for solving the above mentioned task. Please note that this problem was already addressed in [18] for the case of In+1 |= Tn. Here, we propose a solution that circumvents this rather restrictive precondition. The infimum of ·T and ·II is the greatest closure operator below both ·T and ·II . The following lemma recalls an important property of infima of clops, specifically tailored to the case of implications of concept descriptions, i.e., of general concept inclusions as they are more commonly called. Lemma 6. Let I be an interpretation, T an EL⊥-TBox, and C v D a general concept inclusion such that both premise and conclusion have a role-depth not exceeding δ. Then the following statements are equivalent: 1. C v D is both valid in I as well as entailed by T . 2. ∅ |= {C IIδ v D, C Tδ v D}. 3. ∅ |= C IIδ ∨ C Tδ v D. 4. C v D is valid in the infimum of C → 7 C IIδ and C →7 C Tδ . As a conclusion, we infer the following incremental method for the computation of a sequence of TBoxes from a sequence of interpretations: 1. Upon availability of the first observed interpretation I0, compute its canonical base of GCIs, as proposed by Baader and Distel in [4, 10]. If only concept descriptions up to a certain role depth shall be considered, then the variant described by Borchmann, Distel, et al., in [9] is sufficient. 2. For each new interpretation In+1, compute the canonical base of the infimum of the clops that are induced by the current TBox Tn as well as by the newly observed interpretation In+1. It is readily verified that – by construction – for each time point n ∈ N, the TBox Tn entails a GCI C v D if, and only if, C v D is valid in the interpretations I0, . . . , In. 6 Error-Tolerant Axiomatization of General Concept Inclusions from Interpretations Assume that we were given an interpretation I as well as a TBox T such that I contains observations that may be possibly faulty due to inaccurate generation methods, and that T is certainly valid in the domain of interest, e.g., as it has been hand-crafted by experts. In particular, we assume that I is not a model of T , i.e., that at least one domain element in I exists which serves as a counterexample against at least one GCI from T . However, we are expected to axiomatize terminological knowledge from I, which is valid in the unknown domain of interest. As a solution, we will construct the implicational base of the supremum of the clops that are induced by I, and by T , respectively. It is then ensured that those implications are considered which are valid for all those domain elements of I that respect the GCIs in T , i.e., that we axiomatize general concept inclusions from I that are compatible with the axioms contained in T . In a certain sense this yields a method for an error correction in I when learning GCIs. We will describe a short motivating example. Define a TBox T and an interpretation I as follows: NC := {Person, Car, Wheel}, NR := {child} T := {∃ child. > v Person, Person u Car v ⊥} Car Wheel Person Person child child g I: d e f Consider the GCI Car v ∃ child. Wheel. Of course, it is valid in I, and furthermore is contained in the canonical base of I when applying the construction from [4, 10]. We can show that the above mentioned GCI is also valid IIδ g Tδ for δ ≥ 1: CarIIδ ≡ Car u ∃ child. Wheel (Car u ∃ child. Wheel)Tδ ≡ Car u ∃ child. Wheel u Person u ⊥ ≡ ⊥, The closure of Car with respect to the supremum is the least common subsumer of those concept descriptions that are closures of both C →7 C IIδ and C → 7 C Tδ , as well as are subsumed by Car. It is easy to see that this supremum closure can be computed by an exhaustive repeated application of both closure operators until a fixpoint is reached. As we have seen above, the fixpoint ⊥ is reached after the first iteration, and hence ⊥ is the supremum-closure. However, the GCI is a consequence of the more specific valid GCI Car v ⊥, and hence would not have been axiomatized upon construction of the canonical base. In particular, we see that d is not compatible with T – in contrast to the other domain elements e, f, and g. Eventually, Car is a pseudo-closure of the supremum, and hence the canonical base contains the axiom expressing the non-existence of cars. 7 Conclusion We have defined the notion of most specific consequences with respect to TBoxes, and considered the corresponding closure operator. We have investigated the interplay of this closure operator induced by a given TBox with the closure operator induced by an inter- pretation – more specifically, we have shown how their infimum can be utilized for learn- ing from streams of interpretations, and have motivated how their supremum can be used for an error-tolerant axiomatization of general concept inclusions from interpretations in the presence of a hand-crafted TBox that indicates errors in the observed interpretation. Acknowledgements. The author gratefully thanks the anonymous reviewers for their constructive hints and helpful remarks. References [1] Franz Baader. “Least Common Subsumers and Most Specific Concepts in a Description Logic with Existential Restrictions and Terminological Cycles”. In: IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003. Ed. by Georg Gottlob and Toby Walsh. Morgan Kaufmann, 2003, pp. 319–324. [2] Franz Baader. “Logic-Based Knowledge Representation”. In: Artificial Intelligence Today. 1999, pp. 13–41. [3] Franz Baader. “Terminological Cycles in a Description Logic with Existential Restric- tions”. In: IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Acapulco, Mexico, August 9-15, 2003. Ed. by Georg Gottlob and Toby Walsh. Morgan Kaufmann, 2003, pp. 325–330. [4] Franz Baader and Felix Distel. “A Finite Basis for the Set of EL-Implications Holding in a Finite Model”. In: Formal Concept Analysis, 6th International Conference, ICFCA 2008, Montreal, Canada, February 25-28, 2008, Proceedings. Ed. by Raoul Medina and Sergei A. Obiedkov. Vol. 4933. Lecture Notes in Computer Science. Springer, 2008, pp. 46–61. [5] Franz Baader and Felix Distel. “Exploring Finite Models in the Description Logic”. In: Formal Concept Analysis, 7th International Conference, ICFCA 2009, Darmstadt, Germany, May 21-24, 2009, Proceedings. Ed. by Sébastien Ferré and Sebastian Rudolph. Vol. 5548. Lecture Notes in Computer Science. Springer, 2009, pp. 146–161. [6] Franz Baader, Ian Horrocks, and Ulrike Sattler. “Description Logics”. In: Handbook on Ontologies. Ed. by Steffen Staab and Rudi Studer. International Handbooks on Information Systems. Springer, 2004, pp. 3–28. [7] Franz Baader, Ian Horrocks, and Ulrike Sattler. “Description Logics”. In: Handbook on Ontologies. Ed. by Steffen Staab and Rudi Studer. International Handbooks on Information Systems. Springer, 2009, pp. 21–43. [8] Franz Baader and Carsten Lutz. “Description Logic”. In: Handbook of Modal Logic. Ed. by Patrick Blackburn, Johan van Benthem, and Frank Wolter. Elsevier, 2006. Chap. 13. [9] Daniel Borchmann, Felix Distel, and Francesco Kriegel. “Axiomatisation of General Concept Inclusions from Finite Interpretations”. In: Journal of Applied Non-Classical Logics 26.1 (2016), pp. 1–46. [10] Felix Distel. “Learning description logic knowledge bases from data using methods from formal concept analysis”. PhD thesis. Dresden, Germany: Technische Universität Dresden, 2011. [11] Bernhard Ganter. “Attribute Exploration with Background Knowledge”. In: Theor. Comput. Sci. 217.2 (1999), pp. 215–233. [12] Bernhard Ganter. Two Basic Algorithms in Concept Analysis. FB4-Preprint 831. Darmstadt, Germany: Technische Hochschule Darmstadt, 1984. [13] Bernhard Ganter and Sergei A. Obiedkov. Conceptual Exploration. Springer, 2016. [14] Bernhard Ganter and Rudolf Wille. Formal Concept Analysis: Mathematical Foundations. 1st ed. Springer, Dec. 1999. [15] Jean-Luc Guigues and Vincent Duquenne. “Famille minimale d’implications informatives résultant d’un tableau de données binaires”. In: Mathématiques et Sciences Humaines 95 (1986), pp. 5–18. [16] Pascal Hitzler, Markus Krötzsch, and Sebastian Rudolph. Foundations of Semantic Web Technologies. Chapman and Hall/CRC Press, 2010. [17] Francesco Kriegel. Axiomatization of General Concept Inclusions from Streams of Interpretations with optional Error Tolerance. LTCS-Report 16-05. Dresden, Germany: Chair for Automata Theory, Institute for Theoretical Computer Science, Technische Universität Dresden, 2016. [18] Francesco Kriegel. “Incremental Learning of TBoxes from Interpretation Sequences with Methods of Formal Concept Analysis”. In: Proceedings of the 28th International Workshop on Description Logics, Athens,Greece, June 7-10, 2015. Ed. by Diego Calvanese and Boris Konev. Vol. 1350. CEUR Workshop Proceedings. CEUR-WS.org, 2015. [19] Francesco Kriegel. “NextClosures with Constraints”. In: Proceedings of the Thirteenth International Conference on Concept Lattices and Their Applications, Moscow, Russia, July 18-22, 2016. Ed. by Marianne Huchard and Sergei Kuznetsov. Vol. 1624. CEUR Workshop Proceedings. CEUR-WS.org, 2016, pp. 231–243. [20] Sebastian Rudolph. “Exploring Relational Structures Via FLE”. In: Conceptual Structures at Work: 12th International Conference on Conceptual Structures, ICCS 2004, Huntsville, AL, USA, July 19-23, 2004. Proceedings. Ed. by Karl Erich Wolff, Heather D. Pfeiffer, and Harry S. Delugach. Vol. 3127. Lecture Notes in Computer Science. Springer, 2004, pp. 196–212. [21] Sebastian Rudolph. “Relational exploration: combining description logics and formal con- cept analysis for knowledge specification”. PhD thesis. Dresden University of Technology, 2006.