<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Modeling Degrees of Conceptual Overlap in Semantic Web Ontologies</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Markus Holi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Eero Hyvönen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Europe Lapland Sweden</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Helsinki University of Technology, Media Technology, Helsinki Institute for Information Technology (HIIT), and University of Helsinki P.</institution>
          <addr-line>O. Box 5500, FI-02015 TKK</addr-line>
          ,
          <country country="FI">FINLAND</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Modeling Uncertainty in Ontologies</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Semantic Web ontologies are based on crisp logic and do not provide well-defined means for expressing uncertainty. We present a new probabilistic method to approach the problem. In our method, degrees of subsumption, i.e., overlap between concepts can be modeled and computed efficiently using Bayesian networks based on RDF(S) ontologies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Figure 1 illustrates various countries and areas in the world.
There are important properties in the figure, that are not
modeled in a crisp partonomy. For example, EU is a bigger part
of Europe than Lapland, and Russia partly overlaps Europe
and Asia.</p>
      <p>Our method enables the representation of overlap in
concept hierarchies, including class hierarchies and partonomies,
and the computation of overlap between a selected concept
and every other, i.e. referred concept in the hierarchy. The
overlap value is defined as follows:</p>
      <p>Overlap = jSelected\Ref erredj 2 [0; 1].</p>
      <p>jRef erredj</p>
      <p>Intuitively, the overlap value has the following meaning:
The value is 0 for disjoint concepts (e.g., Lapland and Asia)
and 1, if the referred concept is subsumed by the selected
one. High values lesser than one imply, that the meaning of
the selected concept approaches the meaning of the referred
one.
Norway</p>
      <p>Finland
EU</p>
      <p>Russia</p>
      <p>Asia</p>
      <p>World
World 37*23 = 851
Europe 15*23 = 345
Asia 18*23 = 414
EU 8*21 = 168
Sweden 4*9 = 36
Finland 4*9 = 36
Norway 4*9 = 36
Lapland 13*2 = 26 Lapland&amp;(Finland | Sweden | Norway) = 8
Lapland&amp;EU = 16 Lapland&amp;Russia = 2</p>
      <p>Russia 18*19 = 342 Russia&amp;Europe = 57 Russia&amp;Asia = 285
A concept hierarchy can be viewed as a set of sets and can be
represented by a Venn diagram.</p>
      <p>If A and B are sets, then A must be in one of the following
relationships to B.</p>
      <p>1. A is a subset of B, i.e. A
B.
2. A partially overlaps B, i.e. 9x; y : (x 2 A ^ x 2 B) ^
(y 2 A ^ y 62 B).
3. A is disjoint from B, i.e. A \ B = ;.</p>
      <p>Based on these relations, we have developed a simple
graph notation for representing overlap in a concept
hierarchy as an acyclic overlap graph. Here concepts are nodes,
and a number called mass is attached to each node. The mass
of concept A is a measure of the size of the set
corresponding to A, i.e. m(A) = js(A)j, where s(A) is the set
corresponding to A. A solid directed arc from concept A to B
denotes crisp subsumption s(A) s(B), a dashed arrow
denotes disjointness s(A) \ s(B) = ;, and a dotted arrow
represents quantified partial subsumption between concepts,
which means that the concepts partially overlap in the Venn
diagram. The amount of overlap is represented by the partial
overlap value p = js(A)\s(B)j .</p>
      <p>js(A)j</p>
      <p>In addition to the quantities attached to the dotted arrows,
also the other arrow types have implicit overlap values. The
overlap value of a solid arc is 1 (crisp subsumption) and the
value of a dashed arc is 0 (disjointness). The quantities of the
arcs emerging from a concept must sum up to 1. This means
that either only one solid arc can emerge from a node or
several dotted arcs (partial overlap). In both cases, additional
dashed arcs can be used (disjointness). Intuitively, the
outgoing arcs constitute a quantified partition of the concept. Thus,
the dotted arrows emerging from a concept must always point
to concepts that are mutually disjoint with each other.</p>
      <p>Notice that if two concepts overlap, there must be a
directed (solid or dotted) path between them. If the path
includes dotted arrows, then (possible) disjointness between the
concepts must be expressed explicitly using the disjointness
relation. If the directed path is solid, then the concepts
necessarily overlap.
4</p>
    </sec>
    <sec id="sec-2">
      <title>Computing the Overlaps</title>
      <p>Computing the overlap is easiest when there are only solid
arcs, i.e. complete subsumption relation between concepts. If
there is a directed solid path from A (selected) to B (referred),
then overlap o = js( Ajs)(\Bs)(jB)j = mm((BA)) . If there is a mixed
path then the computation is not as simple. To exploit the
simple case we transform the graph into a solid path structure
according to the following principle:
Transformation Principle 1 Let A be the direct partial
subconcept of B with overlap value o. In the solid path structure
the partial subsumption is replaced by an additional middle
concept, that represents s(A) \ s(B). It is marked to be
the complete subconcept of both A and B, and its mass is
o m(A).</p>
      <p>If A is the selected concept and B is the referred one, then
the overlap value o can be interpreted as the conditional
probability</p>
      <p>P (B0 = truejA0 = true) =
js(A) \ s(B)j
js(B)j
= o;
(1)
where s(A) and s(B) are the sets corresponding to the
concepts A and B. A0 and B0 are boolean random variables such
that the value true means that the corresponding concept is a
match to the query, i.e, the concept in question is of interest
to the user.</p>
      <p>Based on the above, we chose to use the solid path structure
as a Bayesian network topology. In the Bayesian network the
boolean random variable X 0 replaces the concept X of the
solid path structure. The efficient evidence propagation
algorithms developed for Bayesian networks [Finin and Finin,
2001] to take care of the overlap computations.</p>
      <p>The joint probability distribution of the Bayesian
network is defined by conditional probability tables (CPT)
P (A0jB0 ; B20; : : : Bn0) for nodes with parents Bi0; i = 1 : : : n,
1
and by prior marginal probabilities set for nodes without
parents. The CPT P (A0jB10; B20; : : : Bn0) for a node A0
can be constructed by enumerating the value combinations
(true/false) of the parents Bi0; i = 1 : : : n, and by assigning:
X
m(A)
m(Bi)
P (A0 = truejB10 = b1; : : : Bn0 = bn) =
(2)</p>
      <p>The value for the complementary case P (A0 =
f alsejB10 = b1; : : : Bn0 = bn) is obtained simply by
subtracting from 1.</p>
      <p>By instantiating the nodes corresponding to the selected
concept and the concepts subsumed by it as evidence (their
values are set “true”), the propagation algorithm returns the
overlap values as posterior probabilities of nodes. The query
results can then be ranked according to these posterior
probabilities.
5</p>
    </sec>
    <sec id="sec-3">
      <title>Discussion</title>
      <p>Overlap graphs are simple and can be represented in RDF(S)
easily. Using the notation does not require knowledge of
probability theory. The concepts can be quantified
automatically, based on data records annotated according to the
ontology, for example.</p>
      <p>The problem of representing uncertainty in ontologies has
been tackled previously by using methods of fuzzy logic,
rough sets [Stuckenschmidt and Visser, 2000] and Bayesian
networks [Ding and Peng, 2004; Gu and H.K. Pung, 2004].</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>Our research was funded mainly by the National Technology
Agency Tekes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <source>[Ding and Peng</source>
          , 2004]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Ding</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Peng</surname>
          </string-name>
          .
          <article-title>A probabilistic extension to ontology language owl</article-title>
          .
          <source>In Proceedings of the Hawai'i Internationa Conference on System Sciences</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <source>[Finin and Finin</source>
          , 2001]
          <string-name>
            <given-names>F. V.</given-names>
            <surname>Finin</surname>
          </string-name>
          and
          <string-name>
            <given-names>F. B.</given-names>
            <surname>Finin</surname>
          </string-name>
          .
          <source>Bayesian Networks and Decision Graphs</source>
          . Springer-Verlag,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Gu and
          <string-name>
            <given-names>H.K.</given-names>
            <surname>Pung</surname>
          </string-name>
          , 2004]
          <string-name>
            <given-names>T.</given-names>
            <surname>Gu</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.Q. Zhang H.K.</given-names>
            <surname>Pung</surname>
          </string-name>
          .
          <article-title>A bayesian approach for dealing with uncertain contexts</article-title>
          .
          <source>In Advances in Pervasive Computing</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <source>[Holi</source>
          , 2004]
          <string-name>
            <given-names>M.</given-names>
            <surname>Holi</surname>
          </string-name>
          .
          <article-title>Modeling uncertainty in semantic web taxonomies</article-title>
          ,
          <source>2004. Master of Science Thesis</source>
          . Department of Computer Science, University of Helsinki, http://ethesis.helsinki.fi/julkaisut/mat/tieto/pg/holi/.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [owl, 2003]
          <string-name>
            <given-names>OWL</given-names>
            <surname>Web Ontology Language Guide</surname>
          </string-name>
          ,
          <year>2003</year>
          . http://www.w3.org/TR/2003/CR-owl-guide-
          <volume>20030818</volume>
          /.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <source>[rdf, 2004] RDF Vocabulary Description Language 1</source>
          .0:
          <string-name>
            <given-names>RDF</given-names>
            <surname>Schema</surname>
          </string-name>
          ,
          <year>2004</year>
          . http://www.w3.org/TR/rdf-schema/.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <source>[Stuckenschmidt and Visser</source>
          , 2000]
          <string-name>
            <given-names>H.</given-names>
            <surname>Stuckenschmidt</surname>
          </string-name>
          and
          <string-name>
            <given-names>U.</given-names>
            <surname>Visser</surname>
          </string-name>
          .
          <article-title>Semantic translation based on approximate reclassification</article-title>
          .
          <source>In Proceedings of the 'Semantic Approximation, Granularity and Vagueness' Workshop</source>
          ,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>