<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On Ambiguity and Query-Specific Ontology Mapping</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aibo Tian</string-name>
          <email>atian@utexas.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juan F. Sequeda</string-name>
          <email>jsequeda@cs.utexas.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniel P. Miranker</string-name>
          <email>miranker@cs.utexas.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, The University of Texas at Austin</institution>
          ,
          <addr-line>Austin, Texas</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In the course of developing an ontology-based data integration system (OBDI) that includes automatic integration of data sources, and thus, includes algorithmic ontology mapping, we have made the following observations. A mapping method may determine that an entity in one ontology maps with equal likelihood to two or more entities in the other ontology. The mapping and reformulation of certain queries is correct only if one pairing is chosen. The correct choice may be different for different queries. Finally, the query itself may lend additional semantics that correctly resolve the ambiguity. These observations suggest a targeted ontology mapping problem, query-specific ontology mapping. In addition to the two ontologies, a query serves as a third argument to the mapping algorithm. Further, the mapping algorithm need not produce a complete mapping, but only a partial mapping sufficient to correctly reformulate the query. We detail a number of open issues on how this problem statement might be refined, and consider features of its evaluation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Ambiguity in Ontology Mapping: Consider the idealized representation (Fig. 1)
of a critical issue in the automatic integration of new data sources in an OBDI
system. T and S respectively represent target and data source ontologies. Looking at the
ontologies alone, there is insufficient information to determine if the class T:People
should be mapped to S:Teacher or to S:Student. A third possibility is a one-to-many
mapping entailing both. Given the SPARQL query (Fig. 1c), it becomes clear that the
query should be reformulated using only the mapping {T:People = S:Teacher}. A
complementary query about students should be reformulated using only the complementary
mapping. Thus, any static choice of a single mapping will yield reformulated queries that
return incorrect results for at least one of these two queries.</p>
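      <p>The effect can be reproduced with a minimal Python sketch. Everything in it is an illustrative assumption rather than part of the system described here: the toy triples for S, the identifiers, the student name, the date, the naive pattern matcher, and the property correspondences (T teacher to S teachBy, T student to S takeBy, T time to the date of an S Schedule). Only the choice for T:People is varied. The sketch reformulates a query in the style of Fig. 1c, and its complementary student query, under each static choice.</p>
      <preformat><![CDATA[
```python
# Toy instance data for the source ontology S (hypothetical values).
TRIPLES = [
    ("c1", "S:Course/hasSchedule", "sch1"),
    ("sch1", "S:Schedule/date", "2012-09-03"),
    ("c1", "S:Course/teachBy", "t1"),
    ("c1", "S:Course/takeBy", "s1"),
    ("t1", "S:Teacher/name", "Einstein"),
    ("s1", "S:Student/name", "Bohr"),
]


def match(patterns, triples, binding=None):
    """Return every variable binding that satisfies all triple patterns."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # Bind the variable, or check it against its earlier binding.
                if new.setdefault(term, value) != value:
                    ok = False
                    break
            elif term != value:
                ok = False
                break
        if ok:
            results.extend(match(rest, triples, new))
    return results


def answer(role_property, people_class, person):
    """Reformulate the query over S under one static choice for T:People."""
    patterns = [
        ("?c", "S:Course/hasSchedule", "?s"),
        ("?s", "S:Schedule/date", "?t"),
        ("?c", role_property, "?p"),
        ("?p", f"S:{people_class}/name", person),
    ]
    return sorted({b["?t"] for b in match(patterns, TRIPLES)})


# Teacher query (as in Fig. 1c) under both static choices for T:People.
teacher_ok = answer("S:Course/teachBy", "Teacher", "Einstein")
teacher_bad = answer("S:Course/teachBy", "Student", "Einstein")
# Complementary student query under both static choices.
student_ok = answer("S:Course/takeBy", "Student", "Bohr")
student_bad = answer("S:Course/takeBy", "Teacher", "Bohr")
```
]]></preformat>
      <p>In this toy setting each static choice answers one query and silently returns an empty result for the other, which is the failure mode described above.</p>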
      <p>
        Formulations of Query-Specific Ontology Mapping: In our system we compute
a similarity matrix between all entities in the two ontologies [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The details may be
borrowed from any ontology mapping algorithm that includes this step [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Given a
query on the target ontology, our system uses a joint probability model to identify a
maximum-scoring partial mapping that covers the target ontology entities that are mentioned
in the query or needed to reformulate it. Thus, our solution can be
characterized as one that takes three arguments (the two ontologies and the query) and
produces a partial mapping specific to the query.
      </p>
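      <p>The three-argument formulation can be sketched in Python as follows. This is a simplified stand-in under stated assumptions, not the joint probability model of our system: the similarity values, the range tables, the penalty factor and the function names are all hypothetical. A candidate partial mapping is scored by the product of its entity similarities, and is penalized when the class chosen for T:People disagrees with the range of the source property chosen for the query's property.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Hypothetical similarity matrix: target entity -> {source entity: similarity}.
SIM = {
    "T:Course": {"S:Course": 0.9},
    "T:teacher": {"S:teachBy": 0.8, "S:takeBy": 0.2},
    "T:student": {"S:takeBy": 0.8, "S:teachBy": 0.2},
    "T:People": {"S:Teacher": 0.5, "S:Student": 0.5},  # the ambiguous entity
}
# Range of each object property in T and in S (assumed from Fig. 1).
T_RANGE = {"T:teacher": "T:People", "T:student": "T:People"}
S_RANGE = {"S:teachBy": "S:Teacher", "S:takeBy": "S:Student"}
PENALTY = 0.1  # assumed weight for a structurally inconsistent pairing


def score(mapping):
    """Product of similarities, penalized for range mismatches."""
    total = 1.0
    for t_entity, s_entity in mapping.items():
        total *= SIM[t_entity][s_entity]
    for t_prop, t_class in T_RANGE.items():
        if t_prop in mapping and t_class in mapping:
            if S_RANGE.get(mapping[t_prop]) != mapping[t_class]:
                total *= PENALTY
    return total


def query_specific_mapping(query_entities):
    """Return the best-scoring partial mapping covering only the query's entities."""
    entities = sorted(query_entities)
    candidates = [sorted(SIM[e]) for e in entities]
    best = max(
        product(*candidates),
        key=lambda choice: score(dict(zip(entities, choice))),
    )
    return dict(zip(entities, best))


teacher_mapping = query_specific_mapping({"T:Course", "T:teacher", "T:People"})
student_mapping = query_specific_mapping({"T:Course", "T:student", "T:People"})
```
]]></preformat>
      <p>With these assumed values, the same similarity matrix yields {T:People = S:Teacher} for the teacher query and {T:People = S:Student} for the student query, because the query determines which entities are scored together.</p>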
      <p>There are at least two other approaches that may be considered. Both produce a
complete mapping and thus retain more of the standard definition of ontology matching.
The first is to consider complex mappings. For example, instead of choosing {T:People =
S:Teacher} or {T:People = S:Student}, the mapping system can detect that “Teacher is the
People who teaches” (and similarly for Student). However, to the best of our knowledge,
there is no automatic system that can detect this kind of complex mapping.</p>
      <p>Fig. 1. Example ontologies and SPARQL query. (a) Ontology T: classes Course (title, time, teacher, student) and People (name), with datatypes string and date. (b) Ontology S: classes Course (name, teachBy, takeBy, hasSchedule), Teacher (name), Student (name) and Schedule (place, date), with datatypes string and date. (c) SPARQL query:</p>
      <preformat>PREFIX course: &lt;T/Course&gt;
PREFIX people: &lt;T/People&gt;
SELECT ?t
WHERE {
  ?c course:time ?t .
  ?c course:teacher ?p .
  ?p people:name "Einstein" .
}</preformat>
      <p>Another approach may consider an entire workload of queries, either as a batch
or as a continual pay-as-you-go refinement. In other words, a complete mapping is determined, but
the information in a set of queries is used to bias the choices made. As many
applications comprise a set of dynamic web pages, the query set is easily identified. Consider
the example and a course selection application. Since students are often interested in
who is teaching a class (and their grading policy), and privacy laws disallow revealing
their fellow students' enrollment, the mapping {T:People = S:Teacher} would always
be correct. Incremental, pay-as-you-go solutions could integrate crowd-sourcing.</p>
      <p>
        The pedagogical example's brevity shouldn't be used to diminish the problem's
importance. Compared with the algorithms of Clio<fn id="fn1"><label>1</label><p>Clio is an automatic relational schema mapping system. However, the algorithms are applicable to ontologies.</p></fn>, our system demonstrates favorable results
[
        <xref ref-type="bibr" rid="ref1 ref3">1, 3</xref>
        ]. Inspection of individual results suggests that resolving ambiguity is the primary
source of improvement, and can be significant. However, measuring the quality of the
solutions as a whole, and quantifying the frequency of ambiguity, pose their own set of
problems. Gold standard baselines must include queries as well as correct mappings, so OAEI
benchmarks cannot be used directly. Correct query reformulation may not require a
unique mapping. Entity-level ambiguity may not manifest with respect to query reformulation,
making it hard to identify through manual curation. To date, we have created three
such test cases<fn id="fn2"><label>2</label><p>The test cases are available, see http://www.cs.utexas.edu/~atian/page/dataset.html</p></fn>. The test suite accommodates the unique mapping problem by
including additional partial mappings, and by including test data and the corresponding query results.
Not all ambiguity may be revealed. Our inspection of individual results looked at the
discrepancies between the two systems. False negatives are not quantifiable.
      </p>
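      <p>One way such a gold standard could be organized is sketched below in Python. The data layout, the class and function names, and the sample values are assumptions made for illustration; they do not describe the format of the published test cases. Each case pairs a query with every partial mapping accepted as correct and with the expected query results, so a system can be judged either on the mapping it produces or on the results its reformulated query returns.</p>
      <preformat><![CDATA[
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GoldCase:
    """One hypothetical gold-standard entry for query-specific mapping."""
    query: str
    acceptable_mappings: frozenset  # frozenset of frozensets of (target, source) pairs
    expected_results: frozenset     # results of the correctly reformulated query


def as_key(mapping):
    """Convert a {target: source} dict into a hashable, order-free key."""
    return frozenset(mapping.items())


def mapping_is_correct(case, produced):
    """A produced partial mapping is correct if it equals any accepted one."""
    return as_key(produced) in case.acceptable_mappings


def results_are_correct(case, returned):
    """Alternative check: compare the returned results with the expected ones."""
    return frozenset(returned) == case.expected_results


def accuracy(cases, produced_mappings):
    """Fraction of cases whose produced mapping is accepted."""
    correct = sum(
        mapping_is_correct(c, m) for c, m in zip(cases, produced_mappings)
    )
    return correct / len(cases)


# Two partial mappings that both reformulate the teacher query correctly.
CORE = {"T:teacher": "S:teachBy", "T:People": "S:Teacher"}
WITH_COURSE = {**CORE, "T:Course": "S:Course"}
WRONG = {"T:teacher": "S:teachBy", "T:People": "S:Student"}

CASE = GoldCase(
    query="schedule of the course taught by Einstein",
    acceptable_mappings=frozenset({as_key(CORE), as_key(WITH_COURSE)}),
    expected_results=frozenset({"2012-09-03"}),  # hypothetical test data
)

score_on_two_outputs = accuracy([CASE, CASE], [CORE, WRONG])
```
]]></preformat>
      <p>Listing several accepted mappings per query reflects the observation that correct reformulation may not require a unique mapping. The sketch does not address ambiguity that never surfaces in any query result.</p>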
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>R.</given-names>
            <surname>Fagin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Miller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Popa</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Velegrakis</surname>
          </string-name>
          .
          <article-title>Clio: Schema mapping creation and data exchange</article-title>
          .
          <source>Conceptual Modeling: Foundations and Applications</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>P.</given-names>
            <surname>Shvaiko</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          .
          <article-title>Ontology matching: state of the art and future challenges</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Sequeda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Miranker</surname>
          </string-name>
          .
          <article-title>Query specific ontology matching</article-title>
          .
          <source>Technical report</source>
          , Department of Computer Science, University of Texas,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>