<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards Situation Discovery for Clustering Instances</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Luis</forename><surname>Palacios</surname></persName>
							<email>luis.palacios@lri.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire de Recherche en Informatique</orgName>
								<orgName type="institution">Université Paris-Sud</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<orgName type="institution">Thales TRT Palaiseau</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yue</forename><surname>Ma</surname></persName>
							<email>ma@lri.fr</email>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire de Recherche en Informatique</orgName>
								<orgName type="institution">Université Paris-Sud</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Chantal</forename><surname>Reynaud</surname></persName>
							<affiliation key="aff0">
								<orgName type="laboratory">Laboratoire de Recherche en Informatique</orgName>
								<orgName type="institution">Université Paris-Sud</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Gaëlle</forename><surname>Lortal</surname></persName>
							<email>gaelle.lortal@thalesgroup.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Thales TRT Palaiseau</orgName>
								<address>
									<country key="FR">France</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards Situation Discovery for Clustering Instances</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">AD91318663CA18E2DC209983F769BC95</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-23T21:13+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract/>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In this paper, we are interested in the problem of identifying a set of individuals in an ontology can be distinguished from the rest in the sense that, for such a set, we can find a proper formal definition, called a situation, that covers merely the individuals in this set. In the form of a concept description, a situation gives a detailed characterization of a set of instances, thus serving as an explanation for the set. This is useful for problems such as clustering instances in an explainable way. We formally define the problem of finding situations in Description Logics (DLs) and discuss a first algorithm for this problem.</p><p>Concept Learning in DLs is to learn definitions of classes from existing ontologies and instance data. Most methods in this area are based on Inductive Logic Programming techniques <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b2">3]</ref>. Distinct from these approaches <ref type="bibr" target="#b2">[3,</ref><ref type="bibr" target="#b3">4]</ref> where a set of instances is given as positive examples of the target concept, the challenge of learning situations consists in discovering distinguishable sets of instances among exponentially many. Moreover, our work deviates from standard clustering problems in that our aim is to cluster individuals in such a way that we can have a DL concept definition that describes exactly these individuals but no others.</p><p>Approach Definitions Given an ontology O, we formally define our problem in terms of situations in an ontology, for which we need to first define representative concepts. Definition 1 (A representative concept). Let ∆ be a set of all individuals in an ontology O, and let X ⊆ ∆. For a concept C, we say that X is represented by C (or C represents X ) w.r.t. O and ∆, if: (1) C(x) holds for all x ∈ X , i.e. O |= C(x), and (2) C(y) does not hold for any y ∈ ∆ \ X , i.e., O |= C(y). If there exists a concept C that represents X , we say that X is representable. </p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 1 (Representative Concept). Consider the set of individuals</head><formula xml:id="formula_0">∆ = {f 1 , f 2 , f 3 }, a subset of individuals X = {f 1 , f 2 },</formula><formula xml:id="formula_1">= {A(f 1 ), B(f 2 ), E(f 3 ), r(f 1 , f 3 ), r(f 2 , f 3 )}. It holds that X is represented by C. But there is no ELO concept that can represent the set {f 2 , f 3 }.</formula><p>A set of individuals that can be distinguished have to share some common properties merely among them, which are made explicit by the concepts that represent them. For example, the individuals f 1 and f 2 share the property of being connected to some individual via r role. However, the conclusion is no longer true for the union (∪) or set complement ( \ ).</p><p>The next lemma tells that any concept naturally represents a special set of individuals. Note that when a concept represents an empty set of individuals, it means that this concept is irrelevant to characterize the properties of the individuals from this ontology. Proposition 2. Given an ELO ontology O, a set ∆ of individuals, a concept C and a set X ⊆ ∆, we consider the following two decision problems: (1) Representability: Does C represent X w.r.t. O? (2) Representability n : For an integer n &gt; 0, is there a concept C with |C| &lt; n that represents X w.r.t. O?</p><p>We have Representability is in PTime and Representability n is in ExpTime.</p><p>The concepts representing X are equivalent in the sense of their instances. We call each of these equivalence classes a situation in O. Next we define the problem of discovering situations for a set of instances w.r.t. O.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Definition 3 (Situation discovery problem).</head><p>Let O be an ontology and ∆ a set of individuals in O. For X ⊆ ∆, the situation discovery problem is to compute the following set:</p><formula xml:id="formula_2">Ξ(X ) = {X 1 , . . . , X n | X i ⊆ X , ||X i || O ∆ = ∅}.</formula><p>That is, to find all the subsets of X that are representable w.r.t. O.</p><p>By Lemma 1, it is easy to see that ∅ is representable by ⊥, therefore ∅ ∈ Ξ(X ).</p><p>The following result shows the monotonicity of the set of situations with an increase in domain elements. However, a set that is representable is not necessarily distinguishable any more if more elements are added. But a concept that can represent a set of instances still represents some (probably a different) set. Proposition 4. Let O be an ontology and ∆ be a set of individuals in O. Consider ∆ 1 ⊆ ∆. Suppose that X ∈ Ξ(∆ 1 ) is represented by a concept A w.r.t. O and ∆ 1 . Then we have:</p><formula xml:id="formula_3">(1) ||X || O ∆ ⊆ ||X || O ∆1 (2) X is not necessarily representable w.r.t. ∆. (3) The concept A still represents some set of individuals X , that is, X ∈ Ξ(∆).</formula><p>An algorithm to compute situations in ELO <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> is given in the long version of our paper. The intuition is that: we first construct a refinement operator α to find the most specific concept, called MSR, that best represents all instances in a given set X . Then any refinement of such MSR obtained by the operator α w.r.t. some x ∈ X will produce a new situation characterized by a concept refined from the MSR for X by the operator α. By iterating this process, the situations in O can be discovered.</p><p>A prototype to support diagnosis in the avionics sector has already been implemented, and a real application in industry of this work will be discussed in a separate paper.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>and the following ontology O = T , A , where T = {C ≡ ∃r. } and A</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Lemma 1 .</head><label>1</label><figDesc>Given an ontology O and a set ∆ of individuals, we have: (1) is a representative concept for ∆ and (2) ⊥ is a representative concept for ∅. Proposition 1. Given an ontology O, the set ∆ of individuals in O, and X ⊆ ∆, X ⊆ ∆. If X and X are representable, then X ∩ X is representable in ELO.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Lemma 2 .</head><label>2</label><figDesc>Let C be a concept, O be an ontology and ∆ be the set of individuals in O. Then C represents the set S = {x ∈ ∆ | O |= C(x)}.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>Definition 2 (Proposition 3 .</head><label>23</label><figDesc>Situation in O). Given an ontology O, a set ∆ of individuals in O, and a set X ⊆ ∆, a situation for X w.r.t. O is: ||X || O ∆ = {C | C represents X w.r.t.O and ∆}. Intuitively, a situation in O explicitly characterizes, via concept descriptions, a given set of individuals in the ontology. Given an ontology O and the set ∆ of individuals in O, we assume O |= A ≡ B. Then A ∈ ||X || O ∆ if and only if B ∈ ||X || O ∆ for any X ⊆ ∆. But the inverse does not hold.</figDesc></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Pushing the EL envelope</title>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Brandt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of IJCAI&apos;05</title>
				<meeting>IJCAI&apos;05</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<title level="m" type="main">An Introduction to Description Logic</title>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Lutz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Sattler</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2017">2017</date>
			<publisher>Cambridge University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Dl-foil concept learning in description logics</title>
		<author>
			<persName><forename type="first">N</forename><surname>Fanizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Inductive Logic Programming</title>
				<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="107" to="121" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Concept learning in description logics using refinement operators</title>
		<author>
			<persName><forename type="first">J</forename><surname>Lehmann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Hitzler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Machine Learning</title>
		<imprint>
			<biblScope unit="volume">78</biblScope>
			<biblScope unit="issue">1-2</biblScope>
			<biblScope unit="page">203</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<title level="m" type="main">Foundations of inductive logic programming</title>
		<author>
			<persName><forename type="first">S.-H</forename><surname>Nienhuys-Cheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">De</forename><surname>Wolf</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1997">1997</date>
			<publisher>Springer Science &amp; Business Media</publisher>
			<biblScope unit="volume">1228</biblScope>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
