
Towards Situation Discovery for Clustering Instances

Luis Palacios (0, 1), luis.palacios@lri.fr

Yue Ma (0), ma@lri.fr

Chantal Reynaud

Gaëlle Lortal (1), gaelle.lortal@thalesgroup.com

0 Laboratoire de Recherche en Informatique, Université Paris-Sud, France
1 Thales TRT Palaiseau, France

In this paper, we are interested in the problem of identifying sets of individuals in an ontology that can be distinguished from the rest, in the sense that for such a set we can find a proper formal definition, called a situation, that covers only the individuals in this set. In the form of a concept description, a situation gives a detailed characterization of a set of instances, thus serving as an explanation for the set. This is useful for problems such as clustering instances in an explainable way. We formally define the problem of finding situations in Description Logics (DLs) and discuss a first algorithm for this problem.

Concept learning in DLs aims to learn definitions of classes from existing ontologies and instance data. Most methods in this area are based on Inductive Logic Programming techniques [5,3]. Unlike these approaches [3,4], where a set of instances is given as positive examples of the target concept, the challenge of learning situations consists in discovering distinguishable sets of instances among exponentially many candidates. Moreover, our work deviates from standard clustering problems in that our aim is to cluster individuals in such a way that a DL concept definition describes exactly these individuals and no others.

Approach

Definitions

Given an ontology $\mathcal{O}$, we formally define our problem in terms of situations in an ontology, for which we first need to define representative concepts.

Definition 1 (A representative concept). Let $\Delta$ be the set of all individuals in an ontology $\mathcal{O}$, and let $X \subseteq \Delta$. For a concept $C$, we say that $X$ is represented by $C$ (or $C$ represents $X$) w.r.t. $\mathcal{O}$ and $\Delta$ if: (1) $C(x)$ holds for all $x \in X$, i.e., $\mathcal{O} \models C(x)$; and (2) $C(y)$ does not hold for any $y \in \Delta \setminus X$, i.e., $\mathcal{O} \not\models C(y)$. If there exists a concept $C$ that represents $X$, we say that $X$ is representable.

Example 1 (Representative Concept).
Consider the set of individuals $\Delta = \{f_1, f_2, f_3\}$, a subset of individuals $X = \{f_1, f_2\}$, and the following ontology $\mathcal{O} = \langle \mathcal{T}, \mathcal{A} \rangle$, where $\mathcal{T} = \{C \equiv \exists r.\top\}$ and $\mathcal{A} = \{A(f_1), B(f_2), E(f_3), r(f_1, f_3), r(f_2, f_3)\}$. It holds that $X$ is represented by $C$. But there is no $\mathcal{ELO}$ concept that can represent the set $\{f_2, f_3\}$. A set of individuals that can be distinguished must share some common properties that hold only among them, and these properties are made explicit by the concepts that represent the set. For example, the individuals $f_1$ and $f_2$ share the property of being connected to some individual via the role $r$.

Proposition 1. Given an ontology $\mathcal{O}$, the set $\Delta$ of individuals in $\mathcal{O}$, and $X, X' \subseteq \Delta$. If $X$ and $X'$ are representable, then $X \cap X'$ is representable in $\mathcal{ELO}$. However, the conclusion no longer holds for the union ($\cup$) or the set complement ($\Delta \setminus X$).
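The intersection case can be sketched in a few lines. The derivation below is an illustration rather than the authors' proof: it writes $\Delta$ for the set of individuals, assumes that $C$ represents $X$ and $C'$ represents $X'$ (the names $C$, $C'$ are chosen here), and uses only the semantics of conjunction, for any $x \in \Delta$.

```latex
\begin{align*}
\mathcal{O} \models (C \sqcap C')(x)
  &\iff \mathcal{O} \models C(x) \ \text{ and } \ \mathcal{O} \models C'(x) \\
  &\iff x \in X \ \text{ and } \ x \in X' \\
  &\iff x \in X \cap X'.
\end{align*}
```

Under these assumptions $C \sqcap C'$ represents $X \cap X'$. For the union, Example 1 already suggests a counterexample: $\{f_2\}$ and $\{f_3\}$ are represented by $B$ and $E$ respectively, while their union $\{f_2, f_3\}$ is stated there to have no representing $\mathcal{ELO}$ concept.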

The next lemma states that any concept naturally represents a special set of individuals.

Lemma 1. Let $C$ be a concept, $\mathcal{O}$ be an ontology and $\Delta$ be the set of individuals in $\mathcal{O}$. Then $C$ represents the set $S = \{x \in \Delta \mid \mathcal{O} \models C(x)\}$.

Note that when a concept represents the empty set of individuals, this concept is irrelevant for characterizing the properties of the individuals in this ontology.

Proposition 2. Given an $\mathcal{ELO}$ ontology $\mathcal{O}$, a set $\Delta$ of individuals, a concept $C$ and a set $X \subseteq \Delta$, we consider the following two decision problems: (1) Representability: Does $C$ represent $X$ w.r.t. $\mathcal{O}$? (2) Representability$_n$: For an integer $n > 0$, is there a concept $C$ with $|C| < n$ that represents $X$ w.r.t. $\mathcal{O}$?

Then Representability is in PTime and Representability$_n$ is in ExpTime.
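One way to read the PTime bound for Representability is that it needs at most one instance check per individual. The Python sketch below illustrates this under stated assumptions: `entails` is a hypothetical black-box oracle standing in for a DL reasoner, and `toy_entails` hard-codes the entailments of Example 1 instead of computing them. It is not the authors' procedure.

```python
def is_represented(entails, concept, X, delta):
    """Decide whether `concept` represents X w.r.t. delta (Definition 1),
    using at most one oracle call per individual in delta."""
    X = set(X)
    if not X <= set(delta):
        raise ValueError("X must be a subset of delta")
    for ind in delta:
        # The concept must hold exactly on the members of X.
        if entails(concept, ind) != (ind in X):
            return False
    return True

# Toy oracle for Example 1 (assumption: these instance sets are written by
# hand; a real reasoner would derive them from the ontology).
_INSTANCES = {"A": {"f1"}, "B": {"f2"}, "E": {"f3"}, "C": {"f1", "f2"}}
calls = []

def toy_entails(concept, ind):
    calls.append((concept, ind))      # record each instance check
    return ind in _INSTANCES[concept]

DELTA = ["f1", "f2", "f3"]
ok = is_represented(toy_entails, "C", {"f1", "f2"}, DELTA)
n_calls = len(calls)
print(ok, n_calls)
```

If each oracle call runs in polynomial time, the whole check does too, since the loop makes at most as many calls as there are individuals.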

The concepts representing X are equivalent in the sense of their instances. We call each of these equivalence classes a situation in O.

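Definition 2 (Situation in $\mathcal{O}$). Given an ontology $\mathcal{O}$, a set $\Delta$ of individuals in $\mathcal{O}$, and a set $X \subseteq \Delta$, a situation for $X$ w.r.t. $\mathcal{O}$ is: $\|X\|_{\mathcal{O}} = \{C \mid C \text{ represents } X \text{ w.r.t. } \mathcal{O} \text{ and } \Delta\}$.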

Intuitively, a situation in O explicitly characterizes, via concept descriptions, a given set of individuals in the ontology.

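Proposition 3. Given an ontology $\mathcal{O}$ and the set $\Delta$ of individuals in $\mathcal{O}$, assume $\mathcal{O} \models A \equiv B$. Then $A \in \|X\|_{\mathcal{O}}$ if and only if $B \in \|X\|_{\mathcal{O}}$ for any $X \subseteq \Delta$. But the converse does not hold.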

Next we define the problem of discovering situations for a set of instances w.r.t. $\mathcal{O}$.

Definition 3 (Situation discovery problem). Let $\mathcal{O}$ be an ontology and $\Delta$ a set of individuals in $\mathcal{O}$. For $X \subseteq \Delta$, the situation discovery problem is to compute the following set: $\mathcal{S}(X) = \{X_1, \ldots, X_n \mid X_i \subseteq X,\ \|X_i\|_{\mathcal{O}} \neq \emptyset\}$. That is, to find all the subsets of $X$ that are representable w.r.t. $\mathcal{O}$.

By Lemma 1, it is easy to see that $\emptyset$ is representable by $\bot$, therefore $\emptyset \in \mathcal{S}(X)$.
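To make the discovery problem concrete, the following Python sketch enumerates the representable sets for the ABox of Example 1 by brute force. It is not the algorithm of the paper and rests on simplifying assumptions that are ours: the TBox is empty (so the defined concept C of Example 1 appears as ∃r.⊤), entailment is approximated by evaluating concepts directly over the ABox read as an interpretation, and the names `discover` and `discovery_set` are hypothetical. The idea is to close the basic extensions (⊤, ⊥, concept names, nominals) under conjunction and existential restriction until no new set appears.

```python
def discover(delta, names, roles):
    """Map each representable set (frozenset) to one witness concept string."""
    delta = frozenset(delta)
    found = {delta: "⊤", frozenset(): "⊥"}
    for name, ext in sorted(names.items()):
        found.setdefault(frozenset(ext), name)
    for a in sorted(delta):                      # nominals {a}
        found.setdefault(frozenset({a}), "{" + a + "}")
    changed = True
    while changed:                               # fixpoint over subsets of delta
        changed = False
        current = list(found.items())
        for s, cs in current:
            for r, pairs in sorted(roles.items()):
                # extension of ∃r.cs: individuals with an r-successor in s
                pre = frozenset(a for (a, b) in pairs if b in s)
                if pre not in found:
                    found[pre] = f"∃{r}.({cs})"
                    changed = True
            for t, ct in current:
                both = s & t                     # extension of cs ⊓ ct
                if both not in found:
                    found[both] = f"({cs}) ⊓ ({ct})"
                    changed = True
    return found

def discovery_set(found, X):
    """Representable subsets of X, in the spirit of Definition 3."""
    X = frozenset(X)
    return [s for s in found if s <= X]

# ABox of Example 1 (TBox omitted, see the assumptions above).
DELTA = {"f1", "f2", "f3"}
NAMES = {"A": {"f1"}, "B": {"f2"}, "E": {"f3"}}
ROLES = {"r": {("f1", "f3"), ("f2", "f3")}}

situations = discover(DELTA, NAMES, ROLES)
for ext, witness in sorted(situations.items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(ext), "<-", witness)
```

The closure terminates because there are only finitely many subsets of the individuals, but it can visit exponentially many of them, which is why this is only a toy illustration.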

The following result shows the monotonicity of the set of situations as the set of domain elements grows. However, a set that is representable is not necessarily distinguishable any more once further elements are added. Still, a concept that represents a set of instances continues to represent some (possibly different) set.

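Proposition 4. Let $\mathcal{O}$ be an ontology and $\Delta$ be a set of individuals in $\mathcal{O}$. Consider $\Delta_1 \subseteq \Delta$. Suppose that $X \in \mathcal{S}(\Delta_1)$ is represented by a concept $A$ w.r.t. $\mathcal{O}$ and $\Delta_1$. Then we have: (1) $\|X\|_{\mathcal{O}}^{\Delta} \subseteq \|X\|_{\mathcal{O}}^{\Delta_1}$. (2) $X$ is not necessarily representable w.r.t. $\Delta$. (3) The concept $A$ still represents some set of individuals $X'$, that is, $X' \in \mathcal{S}(\Delta)$.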
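Item (1) can be checked directly against Definition 1. The sketch below is an illustration, not the authors' proof. It writes $\Delta$ for the full set of individuals, $\Delta_1 \subseteq \Delta$ for the smaller set, and $\|X\|_{\mathcal{O}}^{\Delta}$ for the situation of $X$ taken relative to $\Delta$; the superscript is notation assumed here for readability.

```latex
\begin{align*}
C \in \|X\|_{\mathcal{O}}^{\Delta}
  &\implies \mathcal{O} \models C(x) \ \ \forall x \in X
     \ \text{ and } \ \mathcal{O} \not\models C(y) \ \ \forall y \in \Delta \setminus X \\
  &\implies \mathcal{O} \models C(x) \ \ \forall x \in X
     \ \text{ and } \ \mathcal{O} \not\models C(y) \ \ \forall y \in \Delta_1 \setminus X
     \qquad (\text{since } \Delta_1 \setminus X \subseteq \Delta \setminus X) \\
  &\implies C \in \|X\|_{\mathcal{O}}^{\Delta_1}.
\end{align*}
```

For item (3), Example 1 gives a small illustration: with $\Delta_1 = \{f_1, f_3\}$, the set $X = \Delta_1$ is represented by $\top$ w.r.t. $\Delta_1$, whereas w.r.t. $\Delta = \{f_1, f_2, f_3\}$ the same concept $\top$ represents $X' = \Delta$.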

An algorithm to compute situations in $\mathcal{ELO}$ [1,2] is given in the long version of our paper. The intuition is as follows: we first construct a refinement operator to find the most specific concept, called the MSR, that best represents all instances in a given set $X$. Then any refinement of this MSR obtained by the operator w.r.t. some $x \in X$ produces a new situation, characterized by a concept refined from the MSR for $X$. By iterating this process, the situations in $\mathcal{O}$ can be discovered.

A prototype to support diagnosis in the avionics sector has already been implemented, and a real application in industry of this work will be discussed in a separate paper.

1. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proceedings of IJCAI'05, 2005.

2. F. Baader, I. Horrocks, C. Lutz, and U. Sattler. An Introduction to Description Logic. Cambridge University Press, 2017.

3. N. Fanizzi, C. d'Amato, and F. Esposito. DL-FOIL concept learning in description logics. In International Conference on Inductive Logic Programming, pages 107-121. Springer, 2008.

4. J. Lehmann and P. Hitzler. Concept learning in description logics using refinement operators. Machine Learning, 78(1-2):203, 2010.

5. S.-H. Nienhuys-Cheng and R. De Wolf. Foundations of inductive logic programming, volume 1228. Springer Science & Business Media, 1997.