<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Situation Discovery for Clustering Instances</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Luis Palacios</string-name>
          <email>luis.palacios@lri.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yue Ma</string-name>
          <email>ma@lri.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chantal Reynaud</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gaëlle Lortal</string-name>
          <email>gaelle.lortal@thalesgroup.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Laboratoire de Recherche en Informatique, Université Paris-Sud</institution>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Thales TRT Palaiseau</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we are interested in the problem of identifying a set of individuals in an ontology can be distinguished from the rest in the sense that, for such a set, we can find a proper formal definition, called a situation, that covers merely the individuals in this set. In the form of a concept description, a situation gives a detailed characterization of a set of instances, thus serving as an explanation for the set. This is useful for problems such as clustering instances in an explainable way. We formally define the problem of finding situations in Description Logics (DLs) and discuss a first algorithm for this problem. Concept Learning in DLs is to learn definitions of classes from existing ontologies and instance data. Most methods in this area are based on Inductive Logic Programming techniques [5,3]. Distinct from these approaches [3,4] where a set of instances is given as positive examples of the target concept, the challenge of learning situations consists in discovering distinguishable sets of instances among exponentially many. Moreover, our work deviates from standard clustering problems in that our aim is to cluster individuals in such a way that we can have a DL concept definition that describes exactly these individuals but no others. Approach Definitions Given an ontology O, we formally define our problem in terms of situations in an ontology, for which we need to first define representative concepts. Definition 1 (A representative concept). Let be a set of all individuals in an ontology O, and let X . For a concept C, we say that X is represented by C (or C represents X ) w.r.t. O and , if: (1) C(x) holds for all x 2 X , i.e. O j= C(x); and (2) C(y) does not hold for any y 2 n X ; i.e., O 6j= C(y): If there exists a concept C that represents X , we say that X is representable. Example 1 (Representative Concept). Consider the set of individuals = ff1; f2; f3g, a subset of individuals X = ff1; f2g, and the following ontology O = hT ; Ai, where T = fC 9r:&gt;g and A = fA(f1); B(f2); E(f3); r(f1; f3); r(f2; f3)g: It holds that X is represented by C. But there is no E LO concept that can represent the set ff2; f3g. A set of individuals that can be distinguished have to share some common properties merely among them, which are made explicit by the concepts that represent them. For example, the individuals f1 and f2 share the property of being connected to some individual via r role.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Proposition 1. Given an ontology O, the set of individuals in O, and X
. If X and X 0 are representable, then X \ X 0 is representable in E LO.
; X 0
However, the conclusion is no longer true for the union ([) or set complement ( n ).</p>
      <p>The next lemma tells that any concept naturally represents a special set of individuals.
Lemma 2. Let C be a concept, O be an ontology and
Then C represents the set S = fx 2 j O j= C(x)g.
be the set of individuals in O.</p>
      <p>
        Note that when a concept represents an empty set of individuals, it means that this
concept is irrelevant to characterize the properties of the individuals from this ontology.
Proposition 2. Given an E LO ontology O, a set of individuals, a concept C and a
set X , we consider the following two decision problems: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) Representability:
Does C represent X w.r.t. O? (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) Representabilityn: For an integer n &gt; 0, is there a
concept C with jCj &lt; n that represents X w.r.t. O?
      </p>
      <p>We have Representability is in PTime and Representabilityn is in ExpTime.</p>
      <p>The concepts representing X are equivalent in the sense of their instances. We call
each of these equivalence classes a situation in O.</p>
      <p>Definition 2 (Situation in O). Given an ontology O, a set of individuals in O, and a
set X , a situation for X w.r.t. O is: jjX jjO = fC j C represents X w.r.t.O and g:</p>
      <p>Intuitively, a situation in O explicitly characterizes, via concept descriptions, a given
set of individuals in the ontology.</p>
      <p>Proposition 3. Given an ontology O and the set of individuals in O, we assume
O j= A B. Then A 2 jjX jjO if and only if B 2 jjX jjO for any X . But the
inverse does not hold.</p>
      <p>Next we define the problem of discovering situations for a set of instances w.r.t. O.
Definition 3 (Situation discovery problem). Let O be an ontology and a set of
individuals in O. For X , the situation discovery problem is to compute the following
set: (X ) = fX1; : : : ; Xn j Xi X ; jjXijjO 6= ;g: That is, to find all the subsets of X
that are representable w.r.t. O.</p>
      <p>By Lemma 1, it is easy to see that ; is representable by ?, therefore ; 2 (X ).</p>
      <p>The following result shows the monotonicity of the set of situations with an increase
in domain elements. However, a set that is representable is not necessarily distinguishable
any more if more elements are added. But a concept that can represent a set of instances
still represents some (probably a different) set.</p>
      <p>
        Proposition 4. Let O be an ontology and be a set of individuals in O. Consider
1 . Suppose that X 2 (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) is represented by a concept A w.r.t. O and 1. Then
we have: (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) jjX jjO jjX jjO1 (
        <xref ref-type="bibr" rid="ref2">2</xref>
        ) X is not necessarily representable w.r.t. . (
        <xref ref-type="bibr" rid="ref3">3</xref>
        ) The
concept A still represents some set of individuals X 0, that is, X 0 2 ( ).
      </p>
      <p>An algorithm to compute situations in E LO [1,2] is given in the long version of our
paper. The intuition is that: we first construct a refinement operator to find the most
specific concept, called MSR, that best represents all instances in a given set X . Then
any refinement of such MSR obtained by the operator w.r.t. some x 2 X will produce
a new situation characterized by a concept refined from the MSR for X by the operator
. By iterating this process, the situations in O can be discovered.</p>
      <p>A prototype to support diagnosis in the avionics sector has already been implemented,
and a real application in industry of this work will be discussed in a separate paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brandt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          .
          <article-title>Pushing the E L envelope</article-title>
          .
          <source>In Proceedings of IJCAI'05</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          , I. Horrocks,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lutz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>U.</given-names>
            <surname>Sattler</surname>
          </string-name>
          . An Introduction to Description Logic. Cambridge University Press,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>N.</given-names>
            <surname>Fanizzi</surname>
          </string-name>
          , C. d'Amato, and
          <string-name>
            <given-names>F.</given-names>
            <surname>Esposito</surname>
          </string-name>
          .
          <article-title>Dl-foil concept learning in description logics</article-title>
          .
          <source>In International Conference on Inductive Logic Programming</source>
          , pages
          <fpage>107</fpage>
          -
          <lpage>121</lpage>
          . Springer,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Hitzler</surname>
          </string-name>
          .
          <article-title>Concept learning in description logics using refinement operators</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>78</volume>
          (
          <issue>1-2</issue>
          ):
          <fpage>203</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. S.
          <string-name>
            <surname>-H.</surname>
          </string-name>
          Nienhuys-Cheng and R. De Wolf.
          <article-title>Foundations of inductive logic programming</article-title>
          , volume
          <volume>1228</volume>
          . Springer Science &amp; Business
          <string-name>
            <surname>Media</surname>
          </string-name>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>