<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Modeling Degrees of Conceptual Overlap in Semantic Web Ontologies</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Markus</forename><surname>Holi</surname></persName>
						</author>
						<author>
							<persName><forename type="first">Eero</forename><surname>Hyvönen</surname></persName>
						</author>
						<author>
							<affiliation key="aff0">
								<orgName type="department">Media Technology</orgName>
								<orgName type="institution">Helsinki University of Technology</orgName>
							</affiliation>
						</author>
						<author>
							<affiliation key="aff1">
								<orgName type="department">Helsinki Institute for Information Technology (HIIT)</orgName>
								<orgName type="institution">University of Helsinki</orgName>
								<address>
									<postBox>P.O. Box 5500</postBox>
									<postCode>FI-02015</postCode>
									<settlement>TKK</settlement>
									<country key="FI">FINLAND</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Modeling Degrees of Conceptual Overlap in Semantic Web Ontologies</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">8E2B07866DE49A8142DF605C0EC0C03A</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T00:41+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Semantic Web ontologies are based on crisp logic and do not provide well-defined means for expressing uncertainty. We present a new probabilistic method for approaching this problem. In our method, degrees of subsumption, i.e., overlap between concepts, can be modeled and computed efficiently using Bayesian networks based on RDF(S) ontologies.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Ontologies are based on crisp logic. In the real world, however, relations between entities often include subtleties that are difficult to express in crisp ontologies. RDFS <ref type="bibr">[rdf, 2004]</ref> and OWL <ref type="bibr">[owl, 2003]</ref> do not provide standard ways to express partial overlap or, more generally, degrees of overlap.</p><p>This paper presents a method for modeling degrees of overlap between concepts. We first introduce the principles of the method. We then present a notation for representing degrees of overlap between concepts in an ontology, and finally describe a method for making inferences based on this notation. For a more detailed presentation of the method, see <ref type="bibr">[Holi, 2004]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Modeling Uncertainty in Ontologies</head><p>Figure <ref type="figure" target="#fig_0">1</ref> illustrates various countries and areas in the world. The figure shows important properties that are not modeled in a crisp partonomy. For example, the EU is a bigger part of Europe than Lapland is, and Russia partly overlaps both Europe and Asia.</p><p>Our method enables the representation of overlap in concept hierarchies, including class hierarchies and partonomies, and the computation of the overlap between a selected concept and every other, i.e., referred, concept in the hierarchy. The overlap value is defined as follows:</p><formula xml:id="formula_0">\mathit{Overlap} = \frac{|\mathit{Selected} \cap \mathit{Referred}|}{|\mathit{Referred}|} \in [0, 1].</formula><p>Intuitively, the overlap value has the following meaning: it is 0 for disjoint concepts (e.g., Lapland and Asia) and 1 if the referred concept is subsumed by the selected one. High values less than one imply that the meaning of the selected concept approaches the meaning of the referred one.</p></div>
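<div xmlns="http://www.tei-c.org/ns/1.0"><p>The overlap value can be illustrated on finite sets. The following minimal sketch assumes that each concept is given extensionally as a Python set of imaginary area units. The function name overlap and the toy extensions are illustrative assumptions and not part of the method's notation, and the numbers are not real geographic data.</p><p>
```python
# A minimal sketch of the overlap value on finite sets. The extensions below
# are hypothetical toy data (one element per imaginary unit of area).


def overlap(selected: set, referred: set) -> float:
    """Return |selected ∩ referred| / |referred|, a value in [0, 1]."""
    if not referred:
        raise ValueError("the referred concept must have a non-empty extension")
    return len(selected.intersection(referred)) / len(referred)


europe = {"e1", "e2", "e3", "e4", "r1"}
asia = {"a1", "a2", "r2", "r3"}
russia = {"r1", "r2", "r3"}
lapland = {"e4"}

print(overlap(lapland, asia))    # disjoint concepts: 0.0
print(overlap(europe, lapland))  # Lapland is subsumed by Europe: 1.0
print(overlap(russia, europe))   # Russia covers 1 of Europe's 5 units: 0.2
```
</p><p>Because the value is normalised by the size of the referred concept, it is not symmetric. With the same toy sets, overlap(lapland, europe) is 0.2, while overlap(europe, lapland) is 1.0.</p></div>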
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Representing Overlap</head><p>A concept hierarchy can be viewed as a set of sets and can be represented by a Venn diagram.</p><p>If A and B are sets, then A must be in one of the following relationships to B:</p><formula xml:id="formula_1">
1. $A$ is a subset of $B$, i.e., $A \subseteq B$.
2. $A$ partially overlaps $B$, i.e., $\exists x, y : (x \in A \wedge x \in B) \wedge (y \in A \wedge y \notin B)$.
3. $A$ is disjoint from $B$, i.e., $A \cap B = \emptyset$.
</formula><p>Based on these relations, we have developed a simple graph notation for representing overlap in a concept hierarchy as an acyclic overlap graph. Besides the quantities attached to the dotted arrows, the other arrow types also have implicit overlap values: the value of a solid arc is 1 (crisp subsumption) and the value of a dashed arc is 0 (disjointness). The quantities of the arcs emerging from a concept must sum to 1. This means that either only one solid arc or several dotted arcs (partial overlap) can emerge from a node. In both cases, additional dashed arcs (disjointness) can be used. Intuitively, the outgoing arcs constitute a quantified partition of the concept. Thus, the dotted arrows emerging from a concept must always point to mutually disjoint concepts.</p><p>Notice that if two concepts overlap, there must be a directed (solid or dotted) path between them. If the path includes dotted arrows, then any disjointness between the concepts must be expressed explicitly using the disjointness relation. If the directed path is solid, then the concepts necessarily overlap.</p></div>
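<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three set relations and the local constraints on the arcs leaving a node can be checked mechanically. The sketch below is one possible encoding and is not the authors' implementation. The functions relation, valid_outgoing and mutually_disjoint, the (kind, target, value) tuple format, and the example quantities for Russia are all hypothetical. The sketch also assumes that a node without solid or dotted outgoing arcs, such as a top-level concept, is exempt from the sum-to-1 rule, which the notation does not state explicitly.</p><p>
```python
from itertools import combinations
from math import isclose


def relation(a: set, b: set) -> str:
    """Classify A with respect to B as 'subset', 'partial' or 'disjoint'."""
    if a.issubset(b):
        return "subset"
    if a.isdisjoint(b):
        return "disjoint"
    # Some x lies in both A and B, and some y lies in A but not in B.
    return "partial"


# Implicit overlap values of the unquantified arc kinds.
IMPLICIT = {"solid": 1.0, "dashed": 0.0}


def valid_outgoing(arcs: list) -> bool:
    """Check the arcs leaving one node; each arc is (kind, target, value)."""
    kinds = [kind for kind, _, _ in arcs]
    solid, dotted = kinds.count("solid"), kinds.count("dotted")
    if solid == 0 and dotted == 0:
        return True  # assumption: e.g. a top-level concept with no quantified arcs
    # Either exactly one solid arc or only dotted arcs (dashed arcs may be added).
    if solid > 1 or (solid == 1 and dotted > 0):
        return False
    total = sum(IMPLICIT.get(kind, value) for kind, _, value in arcs)
    return isclose(total, 1.0)


def mutually_disjoint(extensions: list) -> bool:
    """Dotted arcs from one concept must point to mutually disjoint concepts."""
    return all(x.isdisjoint(y) for x, y in combinations(extensions, 2))


# Hypothetical toy extensions: 1 of Russia's 4 units lies in Europe.
russia = {"r1", "r2", "r3", "r4"}
europe = {"r1", "e1", "e2"}
asia = {"r2", "r3", "r4", "a1"}
print(relation(russia, europe))           # partial
print(mutually_disjoint([europe, asia]))  # True
arcs = [("dotted", "Europe", 0.25), ("dotted", "Asia", 0.75), ("dashed", "Africa", None)]
print(valid_outgoing(arcs))               # True
```
</p><p>The dotted quantities 0.25 and 0.75 match the toy extensions, because one of Russia's four units lies in Europe and the other three lie in Asia.</p></div>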
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Computing the Overlaps</head><p>Computing the overlap is easiest when there are only solid arcs, i.e., complete subsumption relations between concepts. If there is a directed solid path from A (selected) to B (referred), then the overlap is</p><formula xml:id="formula_2">o = \frac{|s(A) \cap s(B)|}{|s(B)|} = \frac{m(A)}{m(B)}.</formula><p>If the path is mixed, i.e., it also contains dotted arcs, the computation is not as simple. To exploit the simple case, we transform the graph into a solid path structure according to the following principle:</p><p>Transformation Principle 1. Let A be a direct partial subconcept of B with partial overlap value o. In the solid path structure, the partial subsumption is replaced by an additional middle concept that represents s(A) ∩ s(B). The middle concept is marked as a complete subconcept of both A and B, and its mass is o · m(A).</p><p>If A is the selected concept and B is the referred one, then their overlap value o can be interpreted as the conditional probability</p><formula xml:id="formula_3">P(B = \mathrm{true} \mid A = \mathrm{true}) = \frac{|s(A) \cap s(B)|}{|s(B)|} = o,<label>(1)</label></formula><p>where s(A) and s(B) are the sets corresponding to the concepts A and B. Here A and B are Boolean random variables whose value true means that the corresponding concept matches the query, i.e., the concept in question is of interest to the user.</p><p>Based on the above, we chose to use the solid path structure as the topology of a Bayesian network. In the Bayesian network, the Boolean random variable X replaces the concept X of the solid path structure. The efficient evidence propagation algorithms developed for Bayesian networks <ref type="bibr" target="#b0">[Finin and Finin, 2001]</ref> can then take care of the overlap computations.</p><p>The joint probability distribution of the Bayesian network is defined by conditional probability tables (CPTs) P(A | B₁, B₂, …, Bₙ) for nodes with parents Bᵢ, i = 1, …, n, and by prior marginal probabilities for nodes without parents. The CPT P(A | B₁, B₂, …, Bₙ) for a node A can be constructed by enumerating the value combinations (true/false) of the parents Bᵢ, i = 1, …, n, and by assigning</p><formula xml:id="formula_4">P(A = \mathrm{true} \mid B_1 = b_1, \ldots, B_n = b_n) = \frac{\sum_{i \in \{i : b_i = \mathrm{true}\}} m(B_i)}{m(A)}.<label>(2)</label></formula><p>The value for the complementary case, P(A = false | B₁ = b₁, …, Bₙ = bₙ), is obtained simply by subtracting this value from 1.</p><p>By instantiating the nodes corresponding to the selected concept and the concepts subsumed by it as evidence (setting their values to true), the propagation algorithm returns the overlap values as the posterior probabilities of the nodes. The query results can then be ranked according to these posterior probabilities.</p></div>
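<div xmlns="http://www.tei-c.org/ns/1.0"><p>The transformation, the CPT assignment of Equation (2), and the evidence-based query can be prototyped on a small network. The following sketch rests on several stated assumptions and is not the authors' implementation:</p><p>1. The toy graph, its masses, and the middle-concept names such as Russia_Europe are hypothetical.</p><p>2. Exact inference is done by brute-force enumeration rather than by an efficient propagation algorithm. This is feasible only for very small networks.</p><p>3. Parentless nodes that are not instantiated get a prior of 0. The text does not specify this choice.</p><p>Instantiated parentless nodes are conditioned on directly, so their priors cancel in the normalisation.</p><p>
```python
from itertools import product

# Hypothetical toy overlap graph; the masses are illustrative, not real data.
MASS = {"Finland": 1, "EU": 4, "Europe": 10, "Russia": 4, "Asia": 12}
SOLID = [("Finland", "EU"), ("EU", "Europe")]  # (subconcept, superconcept)
DOTTED = [("Russia", "Europe", 0.25), ("Russia", "Asia", 0.75)]  # (A, B, o)


def solid_path_structure(mass, solid, dotted):
    """Transformation Principle 1: replace each dotted arc A -> B with value o
    by a middle concept of mass o * m(A), a complete subconcept of A and B."""
    mass, arcs = dict(mass), list(solid)
    for a, b, o in dotted:
        middle = f"{a}_{b}"
        mass[middle] = o * mass[a]
        arcs += [(middle, a), (middle, b)]
    return mass, arcs


def p_true(node, state, mass, parents, prior):
    """Equation (2) for nodes with parents; a fixed prior for the others."""
    if node not in parents:
        return prior
    return sum(mass[b] for b in parents[node] if state[b]) / mass[node]


def posteriors(selected, mass, arcs, prior=0.0):
    """Set the selected concept and every concept it subsumes to true, then
    compute P(X = true | evidence) for all other nodes by enumeration.
    Assumption: uninstantiated parentless nodes get the given prior (default 0)."""
    parents = {}
    for sub, sup in arcs:  # arcs point from subconcepts (parents) to superconcepts
        parents.setdefault(sup, []).append(sub)
    evidence, stack = {selected}, [selected]
    while stack:  # the subsumed concepts are the ancestors in the network
        for b in parents.get(stack.pop(), []):
            if b not in evidence:
                evidence.add(b)
                stack.append(b)
    hidden = [n for n in mass if n not in evidence]
    totals, z = dict.fromkeys(hidden, 0.0), 0.0
    for values in product([True, False], repeat=len(hidden)):
        state = dict.fromkeys(evidence, True)
        state.update(zip(hidden, values))
        weight = 1.0
        for n in mass:
            if n in evidence and n not in parents:
                continue  # conditioned on directly; its prior would cancel
            p = p_true(n, state, mass, parents, prior)
            weight *= p if state[n] else 1.0 - p
        z += weight
        for n in hidden:
            if state[n]:
                totals[n] += weight
    return {n: totals[n] / z for n in hidden}


net_mass, net_arcs = solid_path_structure(MASS, SOLID, DOTTED)
for selected in ("Russia", "Europe"):
    result = posteriors(selected, net_mass, net_arcs)
    ranked = sorted(result, key=result.get, reverse=True)
    print(selected, [(n, round(result[n], 3)) for n in ranked if n in MASS])
```
</p><p>With these toy masses, selecting Russia yields 0.25 for Asia (3 of its 12 units) and 0.1 for Europe (1 of its 10 units), while selecting Europe yields 0.25 for Russia. These values agree with the overlap definition of Section 2. A positive prior for uninstantiated parentless nodes would add mass from concepts unrelated to the selection, so under that choice the posteriors would no longer equal the overlap values exactly.</p></div>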
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Discussion</head><p>Overlap graphs are simple and can easily be represented in RDF(S). Using the notation does not require knowledge of probability theory. The concepts can be quantified automatically, for example based on data records annotated according to the ontology.</p><p>The problem of representing uncertainty in ontologies has been tackled previously by using methods of fuzzy logic, rough sets <ref type="bibr" target="#b3">[Stuckenschmidt and Visser, 2000]</ref>, and Bayesian networks <ref type="bibr" target="#b0">[Ding and Peng, 2004;</ref> <ref type="bibr" target="#b0">Gu and H.K. Pung, 2004]</ref>.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A Venn diagram illustrating countries, areas, their overlap, and size in the world.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head></head><label></label><figDesc>Here concepts are nodes, and a number called mass is attached to each node. The mass of a concept A measures the size of the set corresponding to A, i.e., m(A) = |s(A)|, where s(A) is the set corresponding to A. A solid directed arc from concept A to B denotes crisp subsumption s(A) ⊆ s(B), a dashed arrow denotes disjointness s(A) ∩ s(B) = ∅, and a dotted arrow represents quantified partial subsumption between the concepts, which means that they partially overlap in the Venn diagram. The amount of overlap is represented by the partial overlap value p = |s(A) ∩ s(B)| / |s(A)|.</figDesc></figure>
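<div xmlns="http://www.tei-c.org/ns/1.0"><p>When the extensions of the concepts are known, the node and arc quantities of the notation can be computed directly. The sketch below assumes that each extension s(X) is available as a finite Python set. The helper names mass and partial_overlap and the toy Russia and Europe extensions are hypothetical. The sketch also checks that the mass o · m(A) assigned to a middle concept by Transformation Principle 1 equals |s(A) ∩ s(B)| when o is the partial overlap value p.</p><p>
```python
def mass(extension: set) -> int:
    """m(A) = |s(A)|: the number attached to a concept node."""
    return len(extension)


def partial_overlap(s_a: set, s_b: set) -> float:
    """p = |s(A) ∩ s(B)| / |s(A)| for a dotted arrow from A to B."""
    return len(s_a.intersection(s_b)) / mass(s_a)


# Hypothetical extensions: one of Russia's four units lies in Europe.
s_russia = {"r1", "r2", "r3", "r4"}
s_europe = {"r1", "e1", "e2", "e3"}

p = partial_overlap(s_russia, s_europe)
print(p)  # 0.25
# Mass of the middle concept from Transformation Principle 1, o · m(A),
# compared with the size of the actual intersection.
print(p * mass(s_russia), mass(s_russia.intersection(s_europe)))  # 1.0 1
```
</p></div>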
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>Our research was funded mainly by the National Technology Agency Tekes.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">A bayesian approach for dealing with uncertain contexts</title>
		<author>
			<persName><forename type="first">Peng</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Peng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">V</forename><surname>Finin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">B</forename><surname>Finin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">K</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Pung</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">Q</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><forename type="middle">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><surname>Pung</surname></persName>
		</author>
		<ptr target="http://ethesis.helsinki.fi/julkaisut/mat/tieto/pg/holi/" />
	</analytic>
	<monogr>
		<title level="m">Proceedings of the Hawai&apos;i International Conference on System Sciences</title>
		<editor>
			<persName><forename type="first">M</forename><surname>Holi</surname></persName>
		</editor>
		<meeting>the Hawai&apos;i International Conference on System Sciences</meeting>
		<imprint>
			<publisher>Springer-Verlag</publisher>
			<date type="published" when="2001">2004. 2004. 2001. 2001. 2004. 2004. 2004. 2004</date>
		</imprint>
		<respStmt>
			<orgName>Department of Computer Science, University of Helsinki</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Master of Science Thesis</note>
	<note>Modeling uncertainty in semantic web taxonomies</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<ptr target="http://www.w3.org/TR/2003/CR-owl-guide-20030818/" />
		<title level="m">OWL Web Ontology Language Guide</title>
				<imprint>
			<date type="published" when="2003">2003</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<ptr target="http://www.w3.org/TR/rdf-schema/" />
		<title level="m">RDF Vocabulary Description Language 1.0: RDF Schema</title>
				<imprint>
			<date type="published" when="2004">2004</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Semantic translation based on approximate reclassification</title>
		<author>
			<persName><forename type="first">H</forename><surname>Stuckenschmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Visser</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the &apos;Semantic Approximation, Granularity and Vagueness&apos; Workshop</title>
				<meeting>the &apos;Semantic Approximation, Granularity and Vagueness&apos; Workshop</meeting>
		<imprint>
			<date type="published" when="2000">2000</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
