<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Approximate Measures of Semantic Dissimilarity under Uncertainty</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Nicola</forename><surname>Fanizzi</surname></persName>
							<email>fanizzi@di.uniba.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica, Università degli Studi di Bari</orgName>
								<address>
									<addrLine>Campus Universitario, Via Orabona 4</addrLine>
									<postCode>70125</postCode>
									<settlement>Bari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Claudia</forename><surname>d'Amato</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica, Università degli Studi di Bari</orgName>
								<address>
									<addrLine>Campus Universitario, Via Orabona 4</addrLine>
									<postCode>70125</postCode>
									<settlement>Bari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Floriana</forename><surname>Esposito</surname></persName>
							<email>esposito@di.uniba.it</email>
							<affiliation key="aff0">
								<orgName type="department">Dipartimento di Informatica, Università degli Studi di Bari</orgName>
								<address>
									<addrLine>Campus Universitario, Via Orabona 4</addrLine>
									<postCode>70125</postCode>
									<settlement>Bari</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Approximate Measures of Semantic Dissimilarity under Uncertainty</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">62F06CF158676E9D84E29540407AA07B</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T03:04+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>We propose semantic distance measures based on the criterion of approximate discernibility and on evidence combination. In the presence of incomplete knowledge, the distance measures the degree of belief in the discernibility of two individuals by combining estimates of basic probability masses related to a set of discriminating features. We also suggest ways to extend this distance for comparing individuals to concepts and concepts to other concepts.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>In the context of reasoning in the Semantic Web, growing interest is being devoted to alternative inductive procedures that extend the scope of the methods applicable to concept representations. Many of them are based on a notion of similarity, such as case-based reasoning <ref type="bibr" target="#b3">[4]</ref>, retrieval <ref type="bibr" target="#b2">[3]</ref>, conceptual clustering <ref type="bibr" target="#b6">[7]</ref> or ontology matching <ref type="bibr" target="#b5">[6]</ref>. However, this notion is not easily captured in a definition, especially in the presence of uncertainty.</p><p>As pointed out in the seminal paper <ref type="bibr" target="#b1">[2]</ref> concerning similarity in Description Logics (DL), most of the existing measures focus on the similarity of atomic concepts within simple hierarchies. Other approaches are based on related notions of feature similarity or information content. All these approaches have been specifically aimed at assessing concept similarity.</p><p>In the perspective of crafting inductive methods for the aforementioned tasks, the need arises for a semantic similarity measure for individuals, a problem that has so far received less attention in the literature. Some dissimilarity measures for individuals in specific DL representations have recently been proposed and turned out to be practically effective for the targeted inductive tasks <ref type="bibr" target="#b2">[3]</ref>; however, they are still partly based on structural criteria, which also determine their main weakness: they hardly scale to complex languages.</p><p>We devised a new family of dissimilarity measures for semantically annotated resources, which can overcome the aforementioned limitations <ref type="bibr" target="#b7">[8]</ref>. 
Our measures are mainly based on Minkowski's measures for Euclidean spaces <ref type="bibr" target="#b17">[18]</ref> induced by means of a method developed in the context of relational machine learning <ref type="bibr" target="#b13">[14]</ref>.</p><p>We extend this idea with a notion of discernibility borrowed from rough set theory <ref type="bibr" target="#b12">[13]</ref>, which aims at the formal definition of vague sets (concepts) by means of their approximations. In this paper, we propose (semi-)distance measures based on semantic discernibility and on evidence combination <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b14">15]</ref>.</p><p>Namely, the measures are based on the degree of discernibility of the input individuals with respect to a committee of features, represented by concept descriptions expressed in the concept language of choice. One of the advantages of these measures is that they do not rely on a particular language for semantic annotations. However, these new measures are not to be regarded as absolute, since they depend both on the choice (and cardinality) of the feature committee and on the knowledge base they are applied to. These measures can easily be computed from statistics on individuals that are likely to be maintained by knowledge base management systems designed for storing instances (e.g. <ref type="bibr" target="#b9">[10]</ref>), which can yield a potential speed-up in the measure computation during knowledge-intensive tasks.</p><p>Furthermore, we also propose a way to extend the presented measures to the assessment of concept similarity by means of the notion of medoid <ref type="bibr" target="#b10">[11]</ref>, i.e., in a categorical context, the most centrally located individual in a concept extension w.r.t. a given metric.</p><p>The remainder of the paper is organized as follows. 
In the next section, we recall the basics of approximate semantic distance measures for individuals in a DL knowledge base. Then we extend the measures with a more principled treatment of uncertainty based on evidence combination. The conclusions discuss the applicability of these measures in future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Semantic Distance Measures</head><p>Since our method is not intended for a particular representation, in the following we assume that resources, concepts and their relationships may be defined in terms of a generic representation that may be mapped to some DL language with the standard model-theoretic semantics (see the handbook <ref type="bibr" target="#b0">[1]</ref> for a thorough reference).</p><p>In this context, a knowledge base K = T , A contains a TBox T and an ABox A. T is a set of concept definitions. A contains assertions concerning the world state. The set of the individuals (resources) occurring in A will be denoted with Ind(A). Each individual can be assumed to be identified by its own URI. Sometimes, it could be useful to make the unique names assumption on such individuals.</p><p>As regards the inference services, our procedure requires performing instance-checking and the related service of retrieval, which will be used for the approximations.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1">A Simple Semantic Metric for Individuals</head><p>Aiming at distance-based tasks, such as clustering or similarity search, we have developed a new measure whose definition totally depends on semantic aspects of the individuals in the knowledge base <ref type="bibr" target="#b7">[8]</ref>, following ideas borrowed from metric learning in clausal spaces <ref type="bibr" target="#b13">[14]</ref>.</p><p>Indeed, for our purposes, we needed functions to measure the (dis)similarity of individuals. However, individuals do not have a syntactic (or algebraic) structure that can be compared. The underlying idea is that, on a semantic level, similar individuals should behave similarly with respect to the same concepts. A way for assessing the similarity of individuals in a knowledge base can be based on the comparison of their semantics along a number of dimensions represented by a set of concept descriptions (henceforth referred to as the committee). Particularly, the rationale of the measure is to compare them on the grounds of their behavior w.r.t. a given set of concept descriptions, say F = {F 1 , F 2 , . . . , F k }, which stands as a group of discriminating features expressed in the language taken into account.</p><p>We begin with defining the behavior of an individual w.r.t. a certain concept in terms of projecting it onto this dimension:</p><formula xml:id="formula_0">Definition 2.1 (projection function). Given a concept F i ∈ F, the related projection function π i : Ind(A) → {0, 1/2, 1} is defined: ∀a ∈ Ind(A) π i (a) :=    1 K |= F i (a) 0 K |= ¬F i (a) 1/2 otherwise</formula><p>The case of π i (a) = 1/2 corresponds to the case when a reasoner cannot give the truth value for a certain membership query. This is due to the Open World Assumption normally made in this context. 
Hence, as in classic probabilistic models, uncertainty is coped with by considering a uniform distribution over the possible cases. Next, we define the discernibility functions related to the committee concepts, which compare two input individuals w.r.t. these concepts through their projections:</p><formula xml:id="formula_1">Definition 2.2 (discernibility function). Given a feature concept F i ∈ F, the related discernibility function δ i : Ind(A) × Ind(A) → [0, 1] is defined as follows: ∀(a, b) ∈ Ind(A) × Ind(A) δ i (a, b) = |π i (a) − π i (b)|</formula><p>Finally, a whole family of distance functions for individuals inspired by Minkowski's distances L p <ref type="bibr" target="#b17">[18]</ref> can be defined as follows <ref type="bibr" target="#b7">[8]</ref>:</p><formula xml:id="formula_2">Definition 2.3 (dissimilarity measures). Let K = T , A be a knowledge base. Given a set of concept descriptions F = {F 1 , F 2 , . . . , F k }, a family of dissimilarity measures {d F p } p∈IN contains functions d F p : Ind(A) × Ind(A) → [0, 1] defined ∀(a, b) ∈ Ind(A) × Ind(A): d F p (a, b) := L p (π i (a), π i (b)) |F| = 1 k p k i=1 δ i (a, b) p</formula><p>Note that k depends on F and the effect of the factor 1/k is just to normalize the norms w.r.t. the number of features involved. It is worthwhile to recall that these measures are not absolute; they should always be considered w.r.t. the committee of choice, hence comparisons across different committees may not be meaningful. Larger committees are likely to decrease the measures because of the normalizing factor, yet the values are also affected by the degree of redundancy of the features employed.</p></div>
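Definitions 2.1 through 2.3 can be sketched in Python. This is an illustrative sketch, not the authors' implementation: `instance_check` stands in for a hypothetical reasoner call (returning True for K |= F_i(a), False for K |= ¬F_i(a), None when the query is undecided), and all function names are ours.

```python
from fractions import Fraction

def projection(instance_check, kb, feature, a):
    """pi_i(a): 1 if K |= F_i(a), 0 if K |= not F_i(a), 1/2 otherwise.

    instance_check is a hypothetical reasoner interface returning True,
    False, or None (membership undecided under the open-world assumption).
    """
    result = instance_check(kb, feature, a)
    if result is True:
        return Fraction(1)
    if result is False:
        return Fraction(0)
    return Fraction(1, 2)

def dissimilarity(proj_a, proj_b, p=1):
    """d_p^F(a, b) = ((1/k) * sum_i |pi_i(a) - pi_i(b)|^p)^(1/p),
    given the two projection vectors over a committee of k features."""
    k = len(proj_a)
    total = sum(abs(Fraction(x) - Fraction(y)) ** p
                for x, y in zip(proj_a, proj_b))
    mean = total / k
    return mean if p == 1 else float(mean) ** (1.0 / p)
```

Exact rational arithmetic (`Fraction`) is used so that the p = 1 case returns the same values as the worked example in the next subsection.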
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2">Example</head><p>Let us consider a knowledge base in a DL language made up of a TBox: T = { Female ≡ ¬Male, Parent ≡ ∀child.Being ⊓ ∃child.Being, Father ≡ Male ⊓ Parent, FatherWithoutSons ≡ Father ⊓ ∀child.Female } and of an ABox: A = { Being(ZEUS), Being(APOLLO), Being(HERCULES), Being(HERA), Male(ZEUS), Male(APOLLO), Male(HERCULES), Parent(ZEUS), Parent(APOLLO), ¬Father(HERA), God(ZEUS), God(APOLLO), God(HERA), ¬God(HERCULES), hasChild(ZEUS, APOLLO), hasChild(HERA, APOLLO), hasChild(ZEUS, HERCULES) }</p><formula xml:id="formula_3">Suppose F = {F 1 , F 2 , F 3 , F 4 } = {Male, God, Parent, FatherWithoutSons}. Let us compute the distances (with p = 1): d F 1 (ZEUS, HERA) = (|1 − 0| + |1 − 1| + |1 − 1| + |0 − 0|) /4 = 1/4 d F 1 (HERA, APOLLO) = (|0 − 1| + |1 − 1| + |1 − 1| + |0 − 1/2|) /4 = 3/8 d F 1 (APOLLO, HERCULES) = (|1 − 1| + |1 − 0| + |1 − 1/2| + |1/2 − 1/2|) /4 = 3/8 d F 1 (HERCULES, ZEUS) = (|1 − 1| + |0 − 1| + |1/2 − 1| + |1/2 − 0|) /4 = 1/2 d F 1 (HERA, HERCULES) = (|0 − 1| + |1 − 0| + |1 − 1/2| + |0 − 1/2|) /4 = 3/4 d F 1 (APOLLO, ZEUS) = (|1 − 1| + |1 − 1| + |1 − 1| + |1/2 − 0|) /4 = 1/8</formula></div>
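The example can be checked mechanically. In the sketch below the projection vectors are transcribed directly from the example (with 1/2 marking memberships left open under the Open World Assumption), not computed by a reasoner:

```python
from fractions import Fraction

H = Fraction(1, 2)  # membership undecided under the open-world assumption

# Projections over F = {Male, God, Parent, FatherWithoutSons},
# transcribed from the example (not derived by a reasoner here).
pi = {
    "ZEUS":     [1, 1, 1, 0],
    "HERA":     [0, 1, 1, 0],
    "APOLLO":   [1, 1, 1, H],
    "HERCULES": [1, 0, H, H],
}

def d1(a, b):
    """d_1^F: average per-feature discernibility of two individuals."""
    va, vb = pi[a], pi[b]
    return sum(abs(Fraction(x) - Fraction(y)) for x, y in zip(va, vb)) / len(va)
```

Evaluating `d1` on the six pairs reproduces the values 1/4, 3/8, 3/8, 1/2, 3/4 and 1/8 computed above.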
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3">Discussion</head><p>It is easy to prove that these dissimilarity functions have the standard properties for semi-distances <ref type="bibr" target="#b7">[8]</ref>:</p><p>Proposition 2.1 (semi-distance). For a fixed feature set F and p &gt; 0, given any three instances a, b, c ∈ Ind(A), it holds that:</p><formula xml:id="formula_4">1. d F p (a, b) ≥ 0 and d F p (a, b) = 0 if a = b 2. d F p (a, b) = d F p (b, a) 3. d F p (a, c) ≤ d F p (a, b) + d F p (b, c)</formula><p>This measure is not a distance since it does not hold that a = b if d F p (a, b) = 0. This is the case of indiscernible individuals with respect to the given committee of features F. However, if the unique names assumption were made, one may define a supplementary dimension for the committee (a sort of meta-feature F 0 ) based on equality, such that:</p><formula xml:id="formula_5">∀(a, b) ∈ Ind(A) × Ind(A) δ 0 (a, b) := 0 a = b 1 a ≠ b</formula><p>and then</p><formula xml:id="formula_6">d F p (a, b) := 1 k + 1 p k i=0 δ i (a, b) p</formula><p>The resulting measures are distance measures.</p><p>Compared to other proposed dissimilarity measures <ref type="bibr" target="#b1">[2,</ref><ref type="bibr" target="#b2">3]</ref>, the presented functions do not depend on the constructors of a specific language; rather, they require only (retrieval or) instance-checking for computing the projections through class-membership queries to the knowledge base.</p><p>The complexity of measuring the dissimilarity of two individuals depends on the complexity of such inferences (see <ref type="bibr" target="#b0">[1]</ref>, Ch. 3). Note also that the projections that determine the measure can be computed (or derived from statistics maintained on the knowledge base) before the actual distance application, thus determining a speed-up in the computation of the measure. 
This is very important for algorithms that massively use this distance, such as all instance-based methods.</p><p>So far we made the assumption that F may represent a sufficient number of (possibly redundant) features that are able to discriminate really different individuals. The choice of the concepts to be included -feature selection -may be crucial. Therefore, we have devised specific optimization algorithms founded on randomized search, which are able to find optimal choices of discriminating concept committees <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b6">7]</ref>.</p><p>The fitness function to be optimized is based on the discernibility factor <ref type="bibr" target="#b12">[13]</ref> of the committee. Given the whole set of individuals Ind(A) (or just a holdout sample HS ⊆ Ind(A) to be used to induce an optimal measure), the fitness function to be maximized is:</p><formula xml:id="formula_7">discernibility(F, HS ) := (a,b)∈HS 2 k i=1 δ i (a, b)</formula><p>However, the results obtained so far with knowledge bases drawn from ontology libraries <ref type="bibr" target="#b6">[7,</ref><ref type="bibr" target="#b8">9]</ref> show that a selection of the (primitive and defined) concepts is often sufficient to induce satisfactory dissimilarity measures.</p></div>
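As an illustrative sketch, the meta-feature δ_0 of Section 2.3 and the discernibility fitness can be written as follows. Here `project(f, a)` is a hypothetical stand-in for the projection π_i, and the sum runs over unordered pairs of the sample, a constant factor of 2 away from the HS × HS formulation, which does not affect the maximization:

```python
from fractions import Fraction
from itertools import combinations

def delta0(a, b):
    """Meta-feature F_0 under the unique names assumption: it discerns any
    two distinct individuals, turning the semi-distance into a distance."""
    return 0 if a == b else 1

def discernibility_fitness(project, features, sample):
    """Discernibility of a committee over a holdout sample HS: the sum of
    delta_i(a, b) over unordered pairs of sample individuals (half of the
    HS x HS total). project(f, a) plays the role of pi_i(a)."""
    return sum(
        abs(Fraction(project(f, a)) - Fraction(project(f, b)))
        for a, b in combinations(sample, 2)
        for f in features
    )
```

A randomized search over candidate committees would evaluate `discernibility_fitness` on each candidate and keep the maximizer.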
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Dissimilarity Measures Based on Uncertainty</head><p>The measure defined in the previous section deals with uncertainty in a uniform way: in particular, the degree of discernibility of two individuals is null when they have the same behavior w.r.t. the same feature, even in the presence of total uncertainty of class-membership for both. When uncertainty regards only one projection, they are considered partially (possibly) similar.</p><p>We would like to make this uncertainty more explicit<ref type="foot" target="#foot_0">1</ref>. One way to deal with uncertainty would be to consider intervals rather than numbers in [0,1] as a measure of dissimilarity. This is similar to the case of imprecise probabilities <ref type="bibr" target="#b16">[17]</ref>.</p><p>In order to extend the measure, we propose an epistemic definition based on rules for combining evidence <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b14">15]</ref>. The ultimate aim is to assess the distance between two individuals as a combination of the evidence that they differ based on some selected features (as in the previous section).</p><p>The distance measure to be defined is again based on the degree of belief in the discernibility of individuals w.r.t. such features. To this purpose, the probability masses of the basic events (class-membership) have to be assessed. However, in this case we will not treat uncertainty in the classic probabilistic way (uniform probability). Rather, we intend to take uncertainty into account in the computation.</p><p>The new dissimilarity measure will be derived as a combination of the degrees of belief in the discernibility of the individuals w.r.t. each single feature. 
Before introducing the combination rule (that will have the measure as a specialization), the basic probability assignments have to be considered, especially for the cases when instance-checking is not able to provide a certain answer.</p><p>As in previous works <ref type="bibr" target="#b2">[3]</ref>, we may estimate the concept extensions by recurring to their retrieval <ref type="bibr" target="#b0">[1]</ref>, i.e. the individuals of the ABox that can be proved to belong to a concept. Thus, in case of uncertainty, the basic probability masses for each feature concept can be approximated<ref type="foot" target="#foot_1">2</ref> in the following way: ∀i ∈ {1, . . . , k}</p><formula xml:id="formula_8">m i (K |= F i (a)) ≈ |retrieval(F i , K)|/|Ind(A)| m i (K |= ¬F i (a)) ≈ |retrieval(¬F i , K)|/|Ind(A)| m i (K |= F i (a) ∨ K |= ¬F i (a)) ≈ 1 − m i (K |= F i (a)) − m i (K |= ¬F i (a))</formula><p>where the retrieval(•, •) operator returns the individuals which can be proven to be members of the argument concept in the context of the current knowledge base <ref type="bibr" target="#b0">[1]</ref>. The rationale is that the larger the (estimated) extension, the more likely it is for individuals to belong to the concept. These approximated probability masses become more precise as more information is acquired. Alternatively, these masses could come with the ontology as a supplementary form of prior knowledge.</p><p>As in the previous section, we define a discernibility function related to a fixed concept which measures the amount of evidence that two input individuals may be separated by that concept: Definition 3.1 (discernibility function). 
Given a feature concept F i ∈ F, the related discernibility function</p><formula xml:id="formula_9">δ i : Ind(A) × Ind(A) → [0, 1] is defined as follows: ∀(a, b) ∈ Ind(A) × Ind(A) δ i (a, b) :=        m i (K |= ¬F i (b)) if K |= F i (a) m i (K |= F i (b)) else if K |= ¬F i (a) δ i (b, a) else if K |= F i (b) ∨ K |= ¬F i (b) 2 • m i (K |= F i (a)) • m i (K |= ¬F i (b)) otherwise</formula><p>The extreme values {0, 1} are returned when the answers from the instance-checking service are certain for both individuals. If the first individual is an instance of the i-th feature (resp., its complement), then the discernibility depends on the belief of class-membership in the complement concept for the other individual. Otherwise, if there is uncertainty for the former individual but not for the latter, the function changes its perspective, swapping the roles of the two individuals. Finally, in case there is uncertainty for both individuals, the discernibility is computed as the chance that they may belong one to the feature concept and one to its complement.</p><p>The combined degree of belief in the case of discernible individuals, assessed using the mixing combination rule <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b14">15]</ref>, can give a measure of the semantic distance between them. Definition 3.2 (weighted average measure). Given an ABox A, a dissimilarity measure for the individuals in A</p><formula xml:id="formula_10">d F avg : Ind(A) × Ind(A) → [0, 1]</formula><p>is defined as follows:</p><formula xml:id="formula_11">∀(a, b) ∈ Ind(A) × Ind(A) d F avg (a, b) := k i=1 w i δ i (a, b)</formula><p>The choices for the weights are various. The most straightforward one is, of course, considering uniform weights: w i = 1/k. 
Another one is</p><formula xml:id="formula_12">w i = u i u</formula><p>where</p><formula xml:id="formula_13">u i = 1 |Ind(A) \ retrieval(F i , K)| and u = k i=1 u i</formula><p>It is easy to see that this can be considered as a generalization of the measure defined in the previous section (for p = 1).</p></div>
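The mass estimates, the discernibility function of Definition 3.1, and the weighted average of Definition 3.2 can be sketched as follows. All names are illustrative; retrieval counts are assumed to be available (e.g. from statistics maintained by the KBMS), and `sa`/`sb` encode certain/undecided instance-checking answers:

```python
from fractions import Fraction

def estimate_masses(n_pos, n_neg, n_total):
    """Basic probability masses for one feature F_i, approximated from the
    sizes of retrieval(F_i, K), retrieval(not F_i, K) and Ind(A)."""
    m_pos = Fraction(n_pos, n_total)
    m_neg = Fraction(n_neg, n_total)
    return m_pos, m_neg, 1 - m_pos - m_neg  # remainder: the uncertain case

def _mass(member, status, estimate):
    """Belief that an individual lies in F_i (member=True) or in not F_i;
    a certain reasoner answer collapses the mass to 0 or 1 (footnote 2)."""
    if status is None:
        return estimate
    return Fraction(1) if status is member else Fraction(0)

def evidential_delta(sa, sb, m_pos, m_neg):
    """Definition 3.1. sa/sb encode instance-checking for a and b:
    True (K |= F_i), False (K |= not F_i), None (undecided)."""
    if sa is True:                                  # a certainly in F_i
        return _mass(False, sb, m_neg)
    if sa is False:                                 # a certainly in not F_i
        return _mass(True, sb, m_pos)
    if sb is not None:                              # only b certain: swap roles
        return evidential_delta(sb, sa, m_pos, m_neg)
    return 2 * m_pos * m_neg                        # both undecided

def d_avg(deltas, weights=None):
    """Definition 3.2: weighted average of the per-feature discernibility
    degrees; defaults to uniform weights w_i = 1/k."""
    k = len(deltas)
    weights = weights or [Fraction(1, k)] * k
    return sum(w * d for w, d in zip(weights, deltas))

def utility_weights(retrieval_sizes, n_individuals):
    """Alternative weights: u_i = 1 / |Ind(A) \\ retrieval(F_i, K)|,
    normalized so that they sum to one."""
    us = [Fraction(1, n_individuals - r) for r in retrieval_sizes]
    return [u / sum(us) for u in us]
```

Note the extreme values: when both answers are certain, `evidential_delta` collapses to 0 or 1, recovering the certain cases of the measure in Section 2.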
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1">Discussion</head><p>It can be proved that this function has the standard properties for semi-distances: Proposition 3.1 (semi-distance). For a fixed choice of weights {w i } k i=1 , function d F avg is a semi-distance.</p><p>The underlying idea for the measure is to combine the beliefs in the dissimilarity of the two input individuals provided by several sources, related to the feature concepts. In the original framework for evidence composition, the various sources are supposed to be independent, which is generally unlikely to hold. Yet, from a practical viewpoint, overlooking this property for the sake of simplicity may still lead to effective methods, as the Naïve Bayes approach in Machine Learning demonstrates.</p><p>It could also be objected that the subsumption hierarchy has not been explicitly involved. However, this may actually be yielded as a side-effect of the possible partial redundancy of the various concepts, which has an impact on their extensions and thus on the related projection functions. A tradeoff is to be made between the number of features employed and the computational effort required for computing the related projection functions.</p><p>The discriminating power of each feature concept can be weighted in terms of information and entropy measures. Namely, the degree of information yielded by each of these features can be estimated as follows:</p><formula xml:id="formula_14">H i (a, b) = − A⊆Θ m i (A) log(m i (A))</formula><p>where A ranges over 2 Θ , the powerset of the frame of discernment<ref type="foot" target="#foot_2">3</ref> <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b14">15]</ref> Θ = {D, D}. Then, the sum</p><formula xml:id="formula_15">(a,b)∈HS H i (a, b)</formula><p>provides a measure of the utility of the discernibility function related to each feature, which can be used in randomized optimization algorithms.</p></div>
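The entropy H_i can be computed per feature from its basic probability assignment; a minimal sketch, assuming a base-2 logarithm (the definition above leaves the base unspecified) and the usual convention that zero-mass terms contribute nothing:

```python
from math import log2

def feature_entropy(masses):
    """H_i: entropy of a basic probability assignment over the subsets of
    the frame of discernment; zero-mass subsets are skipped (0 log 0 = 0)."""
    return -sum(m * log2(m) for m in masses if m > 0)
```

Features whose masses concentrate on the uncertain case {D, D̄} yield low entropy contributions and can thus be penalized during committee optimization.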
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2">Extensions</head><p>Following the rationale of the average link criterion used in agglomerative clustering <ref type="bibr" target="#b10">[11]</ref>, the measures can be extended to the case of concepts, by recurring to the notion of medoids.</p><p>The medoid of a group of individuals is the individual that has the highest similarity w.r.t. the others. Formally, given a group G = {a 1 , a 2 , . . . , a n }, the medoid is defined:</p><formula xml:id="formula_16">medoid(G) := argmin a∈G n j=1 d F p (a, a j )</formula><p>Now, given two concepts C 1 , C 2 , we can consider the two corresponding groups of individuals obtained by retrieval R i = {a ∈ Ind(A) | K |= C i (a)}, and their resp. medoids m i = medoid(R i ), for i = 1, 2, w.r.t. a given measure d F p (for some p &gt; 0 and committee F). Then the function for concepts can be defined as follows:</p><formula xml:id="formula_17">d F p (C 1 , C 2 ) := d F p (m 1 , m 2 )</formula><p>Similarly, if the distance of an individual a to a concept C has to be assessed, one could consider the nearest (resp. farthest) individual in the concept extension or its medoid. Let m = medoid(retrieval(C)) w.r.t. a given measure d F p . Then the measure for this case can be defined as follows:</p><formula xml:id="formula_18">d F p (a, C) := d F p (a, m)</formula><p>Of course these approximate measures become more and more precise as the knowledge base is populated with an increasing number of individuals.</p></div>
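A minimal sketch of the medoid-based extension, assuming the concept extensions have already been retrieved as lists of individuals and that `dist` is any of the (semi-)distances defined above:

```python
def medoid(group, dist):
    """The medoid of a group: the member minimizing the sum of its
    distances to all other members, w.r.t. the given (semi-)distance."""
    return min(group, key=lambda a: sum(dist(a, b) for b in group))

def concept_distance(ext1, ext2, dist):
    """d_p^F(C_1, C_2): distance between the medoids of the two
    (retrieved) concept extensions R_1 and R_2."""
    return dist(medoid(ext1, dist), medoid(ext2, dist))
```

The individual-to-concept case is analogous: `dist(a, medoid(ext, dist))` for a retrieved extension `ext`.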
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Concluding Remarks</head><p>We have proposed the definition of dissimilarity measures over spaces of individuals in a knowledge base. The measures are not language-dependent, unlike other previous proposals <ref type="bibr" target="#b2">[3]</ref>, yet they are parameterized on a committee of concepts. Optimal committees can be found via randomized search methods <ref type="bibr" target="#b7">[8]</ref>.</p><p>Besides, we have extended the measures to cope with cases of uncertainty by means of a simple evidence combination method.</p><p>One of the advantages of the measures is that their computation can be very efficient in cases when statistics (on class-membership) are maintained by the KBMS <ref type="bibr" target="#b9">[10]</ref>. As previously mentioned, the subsumption relationships among the concepts in the committee are not explicitly exploited in the measure for making the relative distances more accurate. The extension to the case of concept distance may also be improved. Scalability should be guaranteed as long as a good committee has been found and does not change, also because of the locality properties observed for instances in several domains (e.g. social or biological networks).</p><p>A refinement of the committee may become necessary only when a degradation of the discernibility factor is detected due to the availability of substantially new individuals. This may involve further tasks such as novelty or concept drift detection.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1">Applications</head><p>The measures have been integrated in an instance-based learning system implementing a nearest-neighbor learning algorithm: an experimentation on semantic-based retrieval proved the effectiveness of the new measures, compared to the outcomes obtained adopting other measures <ref type="bibr" target="#b2">[3]</ref>. It is worthwhile to mention that the results were not particularly affected by feature selection: often the very concepts defined in the knowledge base provide good committees which are able to discern among the different individuals <ref type="bibr" target="#b8">[9]</ref>.</p><p>We are also exploiting the implementation of these measures for performing conceptual clustering <ref type="bibr" target="#b10">[11]</ref>, where (a hierarchy of) clusters is created by grouping instances on the grounds of their similarity, possibly triggering the induction of new emerging concepts <ref type="bibr" target="#b6">[7]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2">Extensions</head><p>The measures may enable a wide range of applications of distance-based methods to knowledge bases. For example, logic approaches to ontology matching <ref type="bibr" target="#b5">[6]</ref> may be backed up by the usage of our measures, especially when concepts to be matched across different terminologies are known to share a common set of individuals. Ontology matching could be a phase in a larger process aimed at data integration. Moreover, metrics could also support a process of (semi-)automated classification of new data, also as a first step towards ontology evolution.</p><p>Another problem that could be tackled by means of dissimilarity measures is the ranking of the answers provided by a matchmaking algorithm, based on the similarity between the concept representing the query and the retrieved individuals.</p></div>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">We are referring to a notion of epistemic (rather than aleatory) probability<ref type="bibr" target="#b14">[15]</ref>, which seems more suitable for our purposes. See Shafer's introductory chapter in<ref type="bibr" target="#b15">[16]</ref> on this distinction.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_1">In case of a certain answer received from the reasoner, the probability mass amounts to 0 or 1.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">Here D stands for the case of discernible individuals w.r.t. Fi, D for the opposite case, and some probability mass may be assigned also to the uncertain case represented by {D, D}.</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>The authors would like to thank the anonymous reviewers for their suggestions.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">F</forename><surname>Baader</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Calvanese</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Mcguinness</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nardi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Patel-Schneider</surname></persName>
		</author>
		<title level="m">editors. The Description Logic Handbook</title>
				<imprint>
			<publisher>Cambridge University Press</publisher>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Towards measuring similarity in description logics</title>
		<author>
			<persName><forename type="first">A</forename><surname>Borgida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">J</forename><surname>Walsh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Hirsh</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of the International Description Logics Workshop</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">I</forename><surname>Horrocks</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">U</forename><surname>Sattler</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">F</forename><surname>Wolter</surname></persName>
		</editor>
		<meeting><address><addrLine>Edinburgh, UK</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="volume">147</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Reasoning by analogy in description logics through instance-based learning</title>
		<author>
			<persName><forename type="first">C</forename><surname>D'Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Fanizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of Semantic Web Applications and Perspectives, 3rd Italian Semantic Web Workshop</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">G</forename><surname>Tummarello</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">P</forename><surname>Bouquet</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">O</forename><surname>Signore</surname></persName>
		</editor>
		<meeting>Semantic Web Applications and Perspectives, 3rd Italian Semantic Web Workshop<address><addrLine>Pisa, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2006">2006</date>
			<biblScope unit="volume">201</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Decentralized case-based reasoning for the Semantic Web</title>
		<author>
			<persName><forename type="first">M</forename><surname>D'Aquin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Lieber</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Napoli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 4th International Semantic Web Conference, ISWC2005, number 3729 in LNCS</title>
				<editor>
			<persName><forename type="first">Y</forename><surname>Gil</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">E</forename><surname>Motta</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">V</forename><surname>Benjamins</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Musen</surname></persName>
		</editor>
		<meeting>the 4th International Semantic Web Conference, ISWC2005, number 3729 in LNCS</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="142" to="155" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">On the combination of evidence in various mathematical frameworks</title>
		<author>
			<persName><forename type="first">D</forename><surname>Dubois</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Prade</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Reliability Data Collection and Analysis</title>
				<editor>
			<persName><forename type="first">J</forename><surname>Flamm</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Luisi</surname></persName>
		</editor>
		<imprint>
			<date type="published" when="1992">1992</date>
			<biblScope unit="page" from="213" to="241" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<title level="m" type="main">Ontology matching</title>
		<author>
			<persName><forename type="first">J</forename><surname>Euzenat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shvaiko</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2007">2007</date>
			<publisher>Springer</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Evolutionary conceptual clustering of semantically annotated resources</title>
		<author>
			<persName><forename type="first">N</forename><surname>Fanizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>D'Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 1st International Conference on Semantic Computing, IEEE-ICSC2007</title>
				<meeting>the 1st International Conference on Semantic Computing, IEEE-ICSC2007<address><addrLine>Irvine, CA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society Press</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="page" from="783" to="790" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Induction of optimal semi-distances for individuals based on feature sets</title>
		<author>
			<persName><forename type="first">N</forename><surname>Fanizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>D'Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Working Notes of the International Description Logics Workshop</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<meeting><address><addrLine>Bressanone, Italy</addrLine></address></meeting>
		<imprint>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">250</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Instance-based query answering with semantic knowledge bases</title>
		<author>
			<persName><forename type="first">N</forename><surname>Fanizzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>D'Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Esposito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 10th Congress of the Italian Association for Artificial Intelligence, AI*IA2007</title>
		<title level="s">LNAI</title>
		<editor>
			<persName><forename type="first">R</forename><surname>Basili</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><forename type="middle">T</forename><surname>Pazienza</surname></persName>
		</editor>
		<meeting>the 10th Congress of the Italian Association for Artificial Intelligence, AI*IA2007<address><addrLine>Rome, Italy</addrLine></address></meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
			<biblScope unit="volume">4733</biblScope>
			<biblScope unit="page" from="254" to="265" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">The Instance Store: DL reasoning with large numbers of individuals</title>
		<author>
			<persName><forename type="first">I</forename><forename type="middle">R</forename><surname>Horrocks</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Turi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">K</forename><surname>Bechhofer</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2004 Description Logic Workshop</title>
		<title level="s">CEUR Workshop Proceedings</title>
		<editor>
			<persName><forename type="first">V</forename><surname>Haarslev</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Möller</surname></persName>
		</editor>
		<meeting>the 2004 Description Logic Workshop</meeting>
		<imprint>
			<publisher>CEUR</publisher>
			<date type="published" when="2004">2004</date>
			<biblScope unit="volume">104</biblScope>
			<biblScope unit="page" from="31" to="40" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Data clustering: A review</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">K</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">N</forename><surname>Murty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">J</forename><surname>Flynn</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">31</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="264" to="323" />
			<date type="published" when="1999">1999</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Combining belief functions when evidence conflicts</title>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">K</forename><surname>Murphy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Decision Support Systems</title>
		<imprint>
			<biblScope unit="volume">29</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="9" />
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">Rough Sets: Theoretical Aspects of Reasoning About Data</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Pawlak</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1991">1991</date>
			<publisher>Kluwer Academic Publishers</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Distance induction in first order logic</title>
		<author>
			<persName><forename type="first">M</forename><surname>Sebag</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 7th International Workshop on Inductive Logic Programming, ILP97</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Džeroski</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Lavrač</surname></persName>
		</editor>
		<meeting>the 7th International Workshop on Inductive Logic Programming, ILP97</meeting>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="1997">1997</date>
			<biblScope unit="volume">1297</biblScope>
			<biblScope unit="page" from="264" to="272" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">Combination of evidence in Dempster-Shafer theory</title>
		<author>
			<persName><forename type="first">K</forename><surname>Sentz</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ferson</surname></persName>
		</author>
		<idno>SAND2002-0835</idno>
		<imprint>
			<date type="published" when="2002-04">April 2002</date>
			<publisher>Sandia National Laboratories</publisher>
		</imprint>
	</monogr>
	<note type="report_type">Technical Report</note>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">A Mathematical Theory of Evidence</title>
		<author>
			<persName><forename type="first">G</forename><surname>Shafer</surname></persName>
		</author>
		<imprint>
			<date type="published" when="1976">1976</date>
			<publisher>Princeton University Press</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">Statistical Reasoning with Imprecise Probabilities</title>
		<author>
			<persName><forename type="first">P</forename><surname>Walley</surname></persName>
		</author>
		<imprint>
			<publisher>Chapman and Hall</publisher>
			<pubPlace>London</pubPlace>
			<date type="published" when="1991">1991</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Similarity Search: The Metric Space Approach</title>
		<title level="s">Advances in Database Systems</title>
		<author>
			<persName><forename type="first">P</forename><surname>Zezula</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Amato</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Dohnal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Batko</surname></persName>
		</author>
		<imprint>
			<publisher>Springer</publisher>
			<date type="published" when="2007">2007</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
