=Paper= {{Paper |id=Vol-173/paper-10 |storemode=property |title=Modeling Degrees of Conceptual Overlap in Semantic Web Ontologies |pdfUrl=https://ceur-ws.org/Vol-173/pos_paper1.pdf |volume=Vol-173 }} ==Modeling Degrees of Conceptual Overlap in Semantic Web Ontologies== https://ceur-ws.org/Vol-173/pos_paper1.pdf
         Modeling Degrees of Conceptual Overlap in Semantic Web Ontologies
                                        Markus Holi and Eero Hyvönen
                            Helsinki University of Technology, Media Technology,
               Helsinki Institute for Information Technology (HIIT), and University of Helsinki
                                   P.O. Box 5500, FI-02015 TKK, FINLAND
                                       http://www.cs.helsinki.fi/group/seco/
                                         email: firstname.lastname@tkk.fi

                         Abstract                                               Europe                      Asia
                                                                          Lapland
     Semantic Web ontologies are based on crisp logic                           Sweden                             World
                                                                       Norway
     and do not provide well-defined means for express-                                  Finland

     ing uncertainty. We present a new probabilistic                                               Russia
     method to approach the problem. In our method,
     degrees of subsumption, i.e., overlap between con-                             EU
     cepts can be modeled and computed efficiently us-
     ing Bayesian networks based on RDF(S) ontolo-
     gies.
                                                                       World 37*23 = 851
                                                                       Europe 15*23 = 345
1 Introduction                                                         Asia 18*23 = 414
                                                                       EU 8*21 = 168
Ontologies are based on crisp logic. In the real world, how-           Sweden 4*9 = 36
                                                                       Finland 4*9 = 36
ever, relations between entities often include subtleties that         Norway 4*9 = 36
are difficult to express in crisp ontologies. RDFS [rdf, 2004]         Lapland 13*2 = 26 Lapland&(Finland | Sweden | Norway) = 8
                                                                       Lapland&EU = 16 Lapland&Russia = 2
and OWL [owl, 2003] do not provide standard ways to ex-                Russia 18*19 = 342 Russia&Europe = 57 Russia&Asia = 285

press partial overlap and degrees of overlap in general.
   This paper presents a method for modeling degrees of over-
                                                                  Figure 1: A Venn diagram illustrating countries, areas, their
lap between concepts. In the following we first introduce the
                                                                  overlap, and size in the world.
principles of our method. Then a notation that enables the
representation of degrees of overlap between concepts in an
ontology is presented after which a method for doing infer-       3 Representing Overlap
ences based on the notation will be described. For a more
detailed presentation of the method see [Holi, 2004].             A concept hierarchy can be viewed as a set of sets and can be
                                                                  represented by a Venn diagram.
2 Modeling Uncertainty in Ontologies                                 If A and B are sets, then A must be in one of the following
                                                                  relationships to B.
Figure 1 illustrates various countries and areas in the world.
There are important properties in the figure, that are not mod-    1. A is a subset of B, i.e. A ⊆ B.
eled in a crisp partonomy. For example, EU is a bigger part        2. A partially overlaps B, i.e. ∃x, y : (x ∈ A ∧ x ∈ B) ∧
of Europe than Lapland, and Russia partly overlaps Europe             (y ∈ A ∧ y 6∈ B).
and Asia.
   Our method enables the representation of overlap in con-        3. A is disjoint from B, i.e. A ∩ B = ∅.
cept hierarchies, including class hierarchies and partonomies,       Based on these relations, we have developed a simple
and the computation of overlap between a selected concept         graph notation for representing overlap in a concept hierar-
and every other, i.e. referred concept in the hierarchy. The      chy as an acyclic overlap graph. Here concepts are nodes,
overlap value is defined as follows:                              and a number called mass is attached to each node. The mass
   Overlap = |Selected∩Ref     erred|
                     |Ref erred|      ∈ [0, 1].                   of concept A is a measure of the size of the set correspond-
   Intuitively, the overlap value has the following meaning:      ing to A, i.e. m(A) = |s(A)|, where s(A) is the set corre-
The value is 0 for disjoint concepts (e.g., Lapland and Asia)     sponding to A. A solid directed arc from concept A to B
and 1, if the referred concept is subsumed by the selected        denotes crisp subsumption s(A) ⊆ s(B), a dashed arrow
one. High values lesser than one imply, that the meaning of       denotes disjointness s(A) ∩ s(B) = ∅, and a dotted arrow
the selected concept approaches the meaning of the referred       represents quantified partial subsumption between concepts,
one.                                                              which means that the concepts partially overlap in the Venn
diagram. The amount of overlap is represented by the partial                 can be constructed by enumerating the value combinations
overlap value p = |s(A)∩s(B)|    .                                           (true/false) of the parents Bi0 , i = 1 . . . n, and by assigning:
                        |s(A)|
   In addition to the quantities attached to the dotted arrows,                                                               X
also the other arrow types have implicit overlap values. The                                                                               m(Bi )
overlap value of a solid arc is 1 (crisp subsumption) and the                                                             i∈{i:bi =true}
value of a dashed arc is 0 (disjointness). The quantities of the             P (A0 = true|B10 = b1 , . . . Bn0 = bn ) =
                                                                                                                                 m(A)
arcs emerging from a concept must sum up to 1. This means                                                                              (2)
that either only one solid arc can emerge from a node or sev-                   The value for the complementary case P (A0 =
eral dotted arcs (partial overlap). In both cases, additional                f alse|B10 = b1 , . . . Bn0 = bn ) is obtained simply by sub-
dashed arcs can be used (disjointness). Intuitively, the outgo-              tracting from 1.
ing arcs constitute a quantified partition of the concept. Thus,                By instantiating the nodes corresponding to the selected
the dotted arrows emerging from a concept must always point                  concept and the concepts subsumed by it as evidence (their
to concepts that are mutually disjoint with each other.                      values are set “true”), the propagation algorithm returns the
   Notice that if two concepts overlap, there must be a di-                  overlap values as posterior probabilities of nodes. The query
rected (solid or dotted) path between them. If the path in-                  results can then be ranked according to these posterior prob-
cludes dotted arrows, then (possible) disjointness between the               abilities.
concepts must be expressed explicitly using the disjointness
relation. If the directed path is solid, then the concepts neces-            5 Discussion
sarily overlap.
                                                                             Overlap graphs are simple and can be represented in RDF(S)
4 Computing the Overlaps                                                     easily. Using the notation does not require knowledge of
                                                                             probability theory. The concepts can be quantified automati-
Computing the overlap is easiest when there are only solid                   cally, based on data records annotated according to the ontol-
arcs, i.e. complete subsumption relation between concepts. If                ogy, for example.
there is a directed solid path from A (selected) to B (referred),              The problem of representing uncertainty in ontologies has
then overlap o = |s(A)∩s(B)|
                        |s(B)|
                                     m(A)
                                  = m(B)   . If there is a mixed             been tackled previously by using methods of fuzzy logic,
path then the computation is not as simple. To exploit the                   rough sets [Stuckenschmidt and Visser, 2000] and Bayesian
simple case we transform the graph into a solid path structure               networks [Ding and Peng, 2004; Gu and H.K. Pung, 2004].
according to the following principle:
Transformation Principle 1 Let A be the direct partial sub-
                                                                             Acknowledgments
concept of B with overlap value o. In the solid path structure               Our research was funded mainly by the National Technology
the partial subsumption is replaced by an additional middle                  Agency Tekes.
concept, that represents s(A) ∩ s(B). It is marked to be
the complete subconcept of both A and B, and its mass is                     References
o · m(A).                                                                    [Ding and Peng, 2004] Z. Ding and Y. Peng. A probabilistic
  If A is the selected concept and B is the referred one, then                  extension to ontology language owl. In Proceedings of
the overlap value o can be interpreted as the conditional prob-                 the Hawai’i Internationa Conference on System Sciences,
ability                                                                         2004.
                                                                             [Finin and Finin, 2001] F. V. Finin and F. B. Finin. Bayesian
                                        |s(A) ∩ s(B)|                           Networks and Decision Graphs. Springer-Verlag, 2001.
     P (B 0 = true|A0 = true) =                       = o,           (1)     [Gu and H.K. Pung, 2004] T. Gu and D.Q. Zhang
                                            |s(B)|
                                                                                H.K. Pung.       A bayesian approach for dealing with
   where s(A) and s(B) are the sets corresponding to the con-                   uncertain contexts. In Advances in Pervasive Computing,
cepts A and B. A0 and B 0 are boolean random variables such                     2004.
that the value true means that the corresponding concept is a
                                                                             [Holi, 2004] M. Holi. Modeling uncertainty in semantic
match to the query, i.e, the concept in question is of interest
to the user.                                                                    web taxonomies, 2004. Master of Science Thesis. De-
   Based on the above, we chose to use the solid path structure                 partment of Computer Science, University of Helsinki,
as a Bayesian network topology. In the Bayesian network the                     http://ethesis.helsinki.fi/julkaisut/mat/tieto/pg/holi/.
boolean random variable X 0 replaces the concept X of the                    [owl, 2003] OWL Web Ontology Language Guide, 2003.
solid path structure. The efficient evidence propagation al-                    http://www.w3.org/TR/2003/CR-owl-guide-20030818/.
gorithms developed for Bayesian networks [Finin and Finin,                   [rdf, 2004] RDF Vocabulary Description Language 1.0:
2001] to take care of the overlap computations.                                 RDF Schema, 2004. http://www.w3.org/TR/rdf-schema/.
   The joint probability distribution of the Bayesian net-                   [Stuckenschmidt and Visser, 2000] H. Stuckenschmidt and
work is defined by conditional probability tables (CPT)
                                                                                U. Visser. Semantic translation based on approximate re-
P (A0 |B10 , B20 , . . . Bn0 ) for nodes with parents Bi0 , i = 1 . . . n,
                                                                                classification. In Proceedings of the ’Semantic Approxi-
and by prior marginal probabilities set for nodes without
                                                                                mation, Granularity and Vagueness’ Workshop, 2000.
parents. The CPT P (A0 |B10 , B20 , . . . Bn0 ) for a node A0