Towards Situation Discovery for Clustering Instances

In this paper, we are interested in the problem of identifying sets of individuals in an ontology that can be distinguished from the rest, in the sense that, for such a set, we can find a proper formal definition, called a situation, that covers only the individuals in this set. In the form of a concept description, a situation gives a detailed characterization of a set of instances, thus serving as an explanation for the set. This is useful for problems such as clustering instances in an explainable way. We formally define the problem of finding situations in Description Logics (DLs) and discuss a first algorithm for this problem.

Concept learning in DLs aims to learn definitions of classes from existing ontologies and instance data. Most methods in this area are based on Inductive Logic Programming techniques [5,3]. In contrast to approaches [3,4] where a set of instances is given as positive examples of the target concept, the challenge of learning situations consists in discovering distinguishable sets of instances among exponentially many candidates. Moreover, our work deviates from standard clustering problems in that our aim is to cluster individuals in such a way that each cluster has a DL concept definition that describes exactly these individuals and no others.

Approach

Definitions. Given an ontology O, we formally define our problem in terms of situations in an ontology, for which we first need to define representative concepts.

Definition 1 (Representative concept). Let ∆ be the set of all individuals in an ontology O, and let X ⊆ ∆. For a concept C, we say that X is represented by C (or C represents X) w.r.t. O and ∆ if: (1) C(x) holds for all x ∈ X, i.e., O ⊨ C(x), and (2) C(y) does not hold for any y ∈ ∆ \ X, i.e., O ⊭ C(y). If there exists a concept C that represents X, we say that X is representable.

Example 1 (Representative concept). Consider the set of individuals $\Delta = \{f_1, f_2, f_3\}$, a subset of individuals $X = \{f_1, f_2\}$, and the ontology $O = \langle \mathcal{T}, \mathcal{A} \rangle$, where $\mathcal{T} = \{C \equiv \exists r.\top\}$ and $\mathcal{A} = \{A(f_1), B(f_2), E(f_3), r(f_1, f_3), r(f_2, f_3)\}$. It holds that X is represented by C. But there is no ELO concept that can represent the set $\{f_2, f_3\}$.
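The two conditions of Definition 1 can be checked mechanically on Example 1. The following is a minimal sketch, not the authors' implementation: concepts are encoded as nested tuples (a hypothetical encoding), the definition of C is read as ∃r.⊤ (the filler symbol is an assumption), and entailment is approximated by evaluating concepts directly over the ABox, which is only adequate for small, definition-only ontologies like this one.

```python
# ABox of Example 1: concept assertions and role assertions.
CONCEPT_ASSERTIONS = {("A", "f1"), ("B", "f2"), ("E", "f3")}
ROLE_ASSERTIONS = {("r", "f1", "f3"), ("r", "f2", "f3")}
# TBox of Example 1, read as the single definition C = exists r.TOP (assumed filler).
DEFINITIONS = {"C": ("exists", "r", ("top",))}
DELTA = {"f1", "f2", "f3"}


def holds(concept, x):
    """Simplified instance check: evaluate `concept` at individual `x` over the ABox."""
    kind = concept[0]
    if kind == "top":
        return True
    if kind == "bottom":
        return False
    if kind == "name":
        name = concept[1]
        if (name, x) in CONCEPT_ASSERTIONS:
            return True
        # Unfold a defined concept name, if there is a definition for it.
        return name in DEFINITIONS and holds(DEFINITIONS[name], x)
    if kind == "nominal":
        return concept[1] == x
    if kind == "and":
        return holds(concept[1], x) and holds(concept[2], x)
    if kind == "exists":
        _, role, filler = concept
        return any(
            holds(filler, z)
            for (r, y, z) in ROLE_ASSERTIONS
            if r == role and y == x
        )
    raise ValueError(f"unknown concept constructor: {kind}")


def represents(concept, X, delta=DELTA):
    """Definition 1: every x in X is an instance, no y in delta minus X is."""
    X = set(X)
    covers_all = all(holds(concept, x) for x in X)
    covers_none_outside = not any(holds(concept, y) for y in set(delta) - X)
    return covers_all and covers_none_outside


print(represents(("name", "C"), {"f1", "f2"}))  # C represents {f1, f2}
print(represents(("name", "C"), {"f2", "f3"}))  # C does not represent {f2, f3}
```

A real implementation would replace `holds` with calls to a DL reasoner, since direct evaluation over the ABox ignores most TBox reasoning.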

A set of individuals that can be distinguished must share some properties that hold only among them, and these are made explicit by the concepts that represent the set. For example, the individuals $f_1$ and $f_2$ share the property of being connected to some individual via the role r. While the intersection of two representable sets is again representable (Proposition 1), this is no longer true for union (∪) or set complement (\).

Lemma 2 states that any concept naturally represents a particular set of individuals. Note that when a concept represents the empty set of individuals, this concept is irrelevant for characterizing the properties of the individuals in this ontology.

Proposition 2. Given an ELO ontology O, a set ∆ of individuals, a concept C and a set X ⊆ ∆, we consider the following two decision problems: (1) Representability: Does C represent X w.r.t. O? (2) Representability_n: For an integer n > 0, is there a concept C with |C| < n that represents X w.r.t. O? Then Representability is in PTime and Representability_n is in ExpTime.

All concepts representing X are equivalent in the sense that they have the same instances in ∆. We call each of these equivalence classes a situation in O (Definition 2). Next, we define the problem of discovering situations for a set of instances w.r.t. O.

Definition 3 (Situation discovery problem).

Let O be an ontology and ∆ a set of individuals in O. For X ⊆ ∆, the situation discovery problem is to compute the following set:

$$\Xi(X) = \{X_1, \ldots, X_n \mid X_i \subseteq X,\ \|X_i\|^{O}_{\Delta} \neq \emptyset\}.$$

That is, to find all the subsets of X that are representable w.r.t. O.
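Definition 3 can be illustrated with a naive enumeration that relies on Lemma 2: every concept represents exactly its set of entailed instances in ∆, so a subset of X is representable whenever some concept has it as its instance set. The sketch below is an illustration under stated assumptions, not the algorithm of the paper: the finite candidate pool and the function name `discover_situations` are hypothetical, and the instance sets are written down by hand for the ontology of Example 1 rather than computed by a reasoner. With a finite pool, the result is in general only a subset of Ξ(X).

```python
def discover_situations(X, delta, instances):
    """Group candidate concepts by the subset of X they represent.

    `instances` maps a concept label to its entailed instances (assumed given).
    Returns a dict: representable subset of X -> concepts representing it.
    """
    X, delta = frozenset(X), frozenset(delta)
    found = {}
    for concept, inst in instances.items():
        # By Lemma 2, the concept represents exactly its instances within delta.
        represented = frozenset(inst) & delta
        if represented <= X:
            found.setdefault(represented, set()).add(concept)
    return found


# Hand-computed instance sets for a few concepts over Example 1 (assumed pool).
EXAMPLE_INSTANCES = {
    "TOP": {"f1", "f2", "f3"},
    "BOTTOM": set(),
    "A": {"f1"},
    "B": {"f2"},
    "E": {"f3"},
    "C": {"f1", "f2"},
    "exists r.E": {"f1", "f2"},
}

result = discover_situations({"f1", "f2"}, {"f1", "f2", "f3"}, EXAMPLE_INSTANCES)
for subset, concepts in result.items():
    print(sorted(subset), sorted(concepts))
```

In this run the subset {f1, f2} is associated with both `C` and `exists r.E`, which shows how several concepts fall into the same situation.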

By Lemma 1, it is easy to see that ∅ is representable by ⊥, therefore ∅ ∈ Ξ(X ).

The following result concerns the monotonicity of the set of situations as the domain grows. A set that is representable is not necessarily still distinguishable once more elements are added, but a concept that represents a set of instances still represents some (possibly different) set.

Proposition 4. Let O be an ontology and ∆ be a set of individuals in O. Consider $\Delta_1 \subseteq \Delta$. Suppose that $X \in \Xi(\Delta_1)$ is represented by a concept A w.r.t. O and $\Delta_1$. Then we have:

(1) $\|X\|^{O}_{\Delta} \subseteq \|X\|^{O}_{\Delta_1}$.
(2) X is not necessarily representable w.r.t. ∆.
(3) The concept A still represents some set of individuals X′, that is, X′ ∈ Ξ(∆).
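Items (1) and (3) of Proposition 4 can be observed on the ontology of Example 1 by shrinking the domain to {f1, f3} and then restoring f2. This is a small sketch under assumptions: the candidate pool is restricted to four named concepts, their instance sets are hand-computed rather than obtained from a reasoner, and the helper names are hypothetical.

```python
# Entailed instances of a few candidate concepts in Example 1 (assumed pool).
INSTANCES = {
    "A": {"f1"},
    "B": {"f2"},
    "E": {"f3"},
    "C": {"f1", "f2"},
}


def situation(X, delta):
    """Concepts from the pool that represent X w.r.t. the domain `delta`."""
    X, delta = frozenset(X), frozenset(delta)
    return {c for c, inst in INSTANCES.items() if frozenset(inst) & delta == X}


def represented_set(concept, delta):
    """The set that `concept` represents w.r.t. `delta` (Lemma 2)."""
    return frozenset(INSTANCES[concept]) & frozenset(delta)


delta_1 = {"f1", "f3"}        # smaller domain
delta = {"f1", "f2", "f3"}    # full domain
X = {"f1"}

small = situation(X, delta_1)  # concepts representing X w.r.t. delta_1
large = situation(X, delta)    # concepts representing X w.r.t. delta

print(sorted(small), sorted(large))
print(large <= small)                        # item (1), restricted to the pool
print(sorted(represented_set("C", delta)))   # item (3): C now represents a larger set
```

Here `C` represents {f1} w.r.t. the smaller domain but {f1, f2} w.r.t. the full one, so it drops out of the situation for {f1} while still representing some set.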

An algorithm to compute situations in ELO [1,2] is given in the long version of our paper. The intuition is as follows. We first construct a refinement operator α to find the most specific concept, called the MSR, that best represents all instances in a given set X. Then any refinement of this MSR obtained by applying α w.r.t. some x ∈ X produces a new situation, characterized by the refined concept. By iterating this process, the situations in O can be discovered.

A prototype to support diagnosis in the avionics sector has already been implemented, and a real application in industry of this work will be discussed in a separate paper.

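The ontology used in Example 1 is $O = \langle \mathcal{T}, \mathcal{A} \rangle$, where $\mathcal{T} = \{C \equiv \exists r.\top\}$ and $\mathcal{A}$ is the set of assertions listed in the example.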

Lemma 1. Given an ontology O and a set ∆ of individuals, we have: (1) ⊤ is a representative concept for ∆ and (2) ⊥ is a representative concept for ∅.

Proposition 1. Given an ontology O, the set ∆ of individuals in O, and sets X ⊆ ∆ and X′ ⊆ ∆: if X and X′ are representable, then X ∩ X′ is representable in ELO.
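One way to see why Proposition 1 is plausible is the following sketch, which is not taken from the source. Assume C represents X and C′ represents X′ (both names are introduced here for illustration). Since conjunction is available in ELO and $O \models (C \sqcap C')(x)$ holds exactly when both $O \models C(x)$ and $O \models C'(x)$ hold, the concept $C \sqcap C'$ represents $X \cap X'$:

```latex
\begin{align*}
x \in X \cap X'
  &\;\Rightarrow\; O \models C(x) \text{ and } O \models C'(x)
  \;\Rightarrow\; O \models (C \sqcap C')(x), \\
y \in \Delta \setminus (X \cap X')
  &\;\Rightarrow\; y \notin X \text{ or } y \notin X'
  \;\Rightarrow\; O \not\models C(y) \text{ or } O \not\models C'(y)
  \;\Rightarrow\; O \not\models (C \sqcap C')(y).
\end{align*}
```

The analogous construction for union would require disjunction, which ELO does not offer. This is consistent with the remark that the result does not carry over to union.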

Lemma 2. Let C be a concept, O be an ontology and ∆ be the set of individuals in O. Then C represents the set S = {x ∈ ∆ | O ⊨ C(x)}.

Definition 2 (Situation in O). Given an ontology O, a set ∆ of individuals in O, and a set X ⊆ ∆, a situation for X w.r.t. O is: $\|X\|^{O}_{\Delta} = \{C \mid C \text{ represents } X \text{ w.r.t. } O \text{ and } \Delta\}$. Intuitively, a situation in O explicitly characterizes, via concept descriptions, a given set of individuals in the ontology.

Proposition 3. Given an ontology O and the set ∆ of individuals in O, assume O ⊨ A ≡ B. Then $A \in \|X\|^{O}_{\Delta}$ if and only if $B \in \|X\|^{O}_{\Delta}$ for any X ⊆ ∆. But the converse does not hold.

References

[1] F. Baader, S. Brandt, C. Lutz. Pushing the EL envelope. In: Proceedings of IJCAI'05, 2005.
[2] F. Baader, I. Horrocks, C. Lutz, U. Sattler. An Introduction to Description Logic. Cambridge University Press, 2017.
[3] N. Fanizzi, C. d'Amato, F. Esposito. DL-FOIL concept learning in description logics. In: International Conference on Inductive Logic Programming. Springer, 2008.
[4] J. Lehmann, P. Hitzler. Concept learning in description logics using refinement operators. Machine Learning 78(1-2), 203, 2010.
[5] S.-H. Nienhuys-Cheng, R. de Wolf. Foundations of Inductive Logic Programming, vol. 1228. Springer Science & Business Media, 1997.