=Paper= {{Paper |id=None |storemode=property |title=On ambiguity and query-specific ontology mapping |pdfUrl=https://ceur-ws.org/Vol-946/om2012_poster3.pdf |volume=Vol-946 |dblpUrl=https://dblp.org/rec/conf/semweb/TianSM12a }} ==On ambiguity and query-specific ontology mapping== https://ceur-ws.org/Vol-946/om2012_poster3.pdf
  On Ambiguity and Query-Specific Ontology Mapping

                  Aibo Tian, Juan F. Sequeda, and Daniel P. Miranker

           Department of Computer Science, The University of Texas at Austin
                                Austin, Texas, USA
        atian@utexas.edu, {jsequeda, miranker}@cs.utexas.edu



       Abstract. In the course of developing an ontology-based data integration sys-
       tem (OBDI) that includes automatic integration of data sources, and thus, in-
       cludes algorithmic ontology mapping, we have made the following observations.
       A mapping method may determine that an entity in one ontology maps with equal
       likelihood to two or more entities in the other ontology. The mapping and refor-
       mulation of certain queries is correct only if one pairing is chosen. The correct
       choice may be different for different queries. Finally, the query itself may lend
       additional semantics that correctly resolve the ambiguity.
       These observations suggest a targeted ontology mapping problem, query-specific
       ontology mapping. In addition to the two ontologies, a query serves as a third
       argument to the mapping algorithm. Further, the mapping algorithm need not
       produce a complete mapping, but only a partial mapping sufficient to correctly
       reformulate the query. We detail a number of open issues on how this problem
       statement might be refined, and consider features of its evaluation.


    Ambiguity in Ontology Mapping: Consider the idealized representation (Fig. 1)
of a critical issue in the automatic integration of new data sources in an OBDI sys-
tem. T and S respectively represent target and data source ontologies. Looking at the
ontologies alone, there is insufficient information to determine if the class T:People
should be mapped to S:Teacher or to S:Student. A third possibility is a one-to-many
mapping entailing both. Given the SPARQL query (Fig. 1c), it becomes clear that the
query should be reformulated using only the mapping {T:People = S:Teacher}. A com-
plementary query about students should be reformulated using only the complementary
mapping. Thus, any static chose of one mapping will yield reformulated queries that
return incorrect results.
    Formulations of Query-Specific Ontology Mapping: In our system we compute
a similarity matrix between all entities in the two ontologies [3]. The details may be
borrowed from any ontology mapping algorithm that includes this step [2]. Given a
query on the target ontology, our system uses a joint probability model to identify a
maximal scoring, partial mapping that covers the target ontology entities mentioned
in the query or that are needed to reformulate the query. Thus, our solution can be
characterized as one that takes three arguments, and produces a partial mapping specific
to the query.
    There are at least two other approaches that may be considered and that produce a
complete mapping and thus retain more of the standard definition of ontology matching.
First is to consider complex mappings. For example, instead of choosing {T:People =
                               Course                                          Course                                                P ref ix          course         :       < T /Course >
                                             time                                                                                    P ref ix          people         :   < T /P eople >
                                                                                                  hasSchedule
                           teacher student                            teachBy         takeBy
                                                    date                                                                             Select        ?t
                title          People                      name    Teacher            Student         Schedule                       W here            {
                               name                                   name       name                                                   ?c    course : time                   ?t    .
                                                                                              place       date
                                                                                                                                        ?c    course : teacher                     ?p        .
                               string                                        string                        date
                                                                                                                                        ?p    people : name                   “Einstein00              .}


                        (a) Ontology T                            (b) Ontology S                                                        (c) SPARQL query
                                                                                           Note that since the predicate of a triple pattern is not allowed to be a variable
                                                                                       in our definition, there exists only one query graph for each query q. The query
                                        Fig. 1. Example ontologies and SPARQL query.   graph of the SPARQL query in Figure ?? is shown in Figure ??.


                                                                                       2.3        Problem Definition

S:Teacher} or {T:People = S:Student}, the mapping system can detect “Teacher is the    A ss-path correspondence records the mapping confidence between two ss-paths.

                                             Definition 9 (SS-PATH CORRESPONDENCE). Given two graphs G and
People who teaches” (similar for Student). However,              to the best
                                             G , a ss-path correspondence    betweenoftwo our
                                                                                          0      knowledge,
                                                                                          ss-paths  p and p (denoted bythere
                                                                                                                          π ) is                                                                             0
                                                                                                                                                                                                                                  p,p0
                                                                                                                 0                                                                                                    0
                                             a tuple < p, p , c >, such that p ∈ GRAPH-SS-PATH-SET , p ∈ GRAPH-SS-
is no automatic system that can detect this kind       of   complex           mapping.
                                             PATH-SET , and c is a confidence measure.                      G0
                                                                                                                     p
                                                                                                                                 p
                                                                                                                                                                                                                 G



    Another approach may consider an entire workload
                                                 We say p ∈ π of     , andqueries,
                                                                            p ∈ π . We   asalsoa batch
                                                                                                 use α       or   as athecon-
                                                                                                             to denote       confi-
                                                                                                                           p,p0
                                                                                                                                               0
                                                                                                                                                               p,p0                                  πp,p0
                                             dence measure, which is α           = c . In the above definition, we assume the
tinual pay-as-you-go refinement. In other words,         a complete
                                             correspondence                   mapping
                                                               measures equivalence.            is determined, but                           πp,p0               p



the information in a set of queries is used to     bias 10the
                                             Definition             choices
                                                               (MATCH              made. As
                                                                            CANDIDATE).         Given many         applica-
                                                                                                       a query graph   T , a graph                                                                                         q

tions comprise a set of dynamic web pages,ΩGtheir
                                                is called a match candidate in terms of a set of correspondences Ω
                                                     = {π query       set   is  easily      identified.
                                                              : p ∈ GRAPH-SS-PATH-SET , p ∈ GRAPH-SS-PATH-SET },
                                                                                          Tq ,G            p,p0
                                                                                                                 Consider  , where
                                                                                                                                                                                   Tq
                                                                                                                                                                                             0
                                                                                                                                                                                                                          Tq ,G
                                                                                                                                                                                                                                         G
                                             if the following conditions are satisfied:
the example and a course selection application. Since students are often interested in
                                               – G is a subgraph of S;
who is teaching a class, (and their grading policy),         and privacy laws disallow revealing
                                               – SINK ⊆ SINK ;                                        G                      S

their fellow student’s enrollment, the mapping         {T:People
                                                  correspondence π       ∈Ω
                                                                             = S:Teacher}              would always
                                               – for all ss-path p ∈ GRAPH-SS-PATH-SET , there exists exact one ss-path
                                                                                 , where p ∈ GRAPH-SS-PATH-SET ;                 p,p0          Tq ,G
                                                                                                                                                                          0
                                                                                                                                                                                        Tq
                                                                                                                                                                                                                           G

be correct. Incremental, pay-as-you go, solutions          could
                                               – for all ss-path
                                                  correspondence π
                                                                  p ∈ integrate
                                                                       GRAPH-SS-PATH-SET
                                                                         ∈Ω
                                                                                        crowd-sourcing.                      0
                                                                                                  , there exists exact one ss-path
                                                                                 , where p ∈ GRAPH-SS-PATH-SET ;                 p,p0          Tq ,G
                                                                                                                                                                                        G
                                                                                                                                                                                                                          Tq

    The pedagogical example’s brevity shouldn’t– for all be     used
                                                          pair of         top diminish
                                                                   ss-paths                     the problem’s
                                                                              , p ∈ GRAPH-SS-PATH-SET
                                                  = SOURCE , the two corresponded ss-paths p , p ∈ GRAPH-SS-PATH-
                                                                                                                           im-
                                                                                                                  , if SOURCE                      1       2
                                                                                                                                                                                                 0    0
                                                                                                                                                                                                                     Tq                  p1
                                                                                                                     p2                                                                          1    2
portance. Comparing to Clio’s1 algorithms our     SET system
                                                         ,π
                                                  = SOURCE ;
                                                                 ∈ Ω demonstrates
                                                                         ,π      ∈Ω               favorable
                                                                                         , also share the same source,
                                                                                                      G      p1 ,p01  results
                                                                                                                        SOURCE    Tq ,G       p2 ,p02             Tq ,G                                                                      p01
                                                                                                                     p02
[1, 3]. Inspection of individual results suggests that resolving ambiguity is the primary
source of improvement, and can be significant. However measuring the quality of the
solutions, as a whole, and quantifying the frequency of ambiguity poses its own set of
problems. Gold standard baselines must include queries and correct mappings. OAEI
benchmarks cannot be used directly. Correct query reformulation may not require a
unique mapping. Entity level ambiguity may not manifest wrt query reformulation,
making it hard to identify through manual curation. To date, we have created three
such test cases2 . The test suite accommodates the unique mapping problem by includ-
ing additional partial mappings and including test data corresponding query results.
Not all ambiguity may be revealed. Our inspection of individual results looked at the
discrepancies between the two systems. False negatives are not quantifiable.


References
1. R. Fagin, L. Haas, M. Hernández, R. Miller, L. Popa, and Y. Velegrakis. Clio: Schema map-
   ping creation and data exchange. Conceptual Modeling: Foundations and Applications, 2009.
2. P. Shvaiko and J. Euzenat. Ontology matching: state of the art and future challenges. IEEE
   Transactions on Knowledge and Data Engineering, 2012.
3. A. Tian, J. F. Sequeda, and D. P. Miranker. Query specific ontology matching. Technical
   report, Department of Computer Science, University of Texas, 2012.

 1
   Clio is an automatic relational schema mapping system. However, the algorithms are applica-
   ble to ontologies.
 2
   The test cases are available, see http://www.cs.utexas.edu/~atian/page/dataset.html