<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Characterizing and Detecting Integrity Issues in OWL Instance Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jiao Tao</string-name>
          <email>taoj2@cs.rpi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Li Ding</string-name>
          <email>dingl@cs.rpi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jie Bao</string-name>
          <email>baojie@cs.rpi.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deborah L. McGuinness</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Tetherless World Constellation, Computer Science Department, Rensselaer Polytechnic Institute</institution>
          ,
          <addr-line>110 8th St., Troy, NY 12180</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We view OWL instance data evaluation as a process in which instance data is checked for conformance with application requirements. We previously identified some integrity issues raised by applications demanding closed world reasoning. In this paper, we present a formal characterization of those integrity issues using autoepistemic operators, and a practical SPARQL-based issue checking approach that is a sound approximation for detecting integrity issues.</p>
      </abstract>
      <kwd-group>
        <kwd>Instance Data</kwd>
        <kwd>Evaluation</kwd>
        <kwd>Autoepistemic Description Logics</kwd>
        <kwd>OWL</kwd>
        <kwd>SPARQL</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Missing Property Value Issues (MPV)</p>
      <p>Missing property value (MPV) issues may arise when a property value that
is expected to be specified is not explicitly given in the data set. We identify
three MPV issues, $MPV_{\exists}$, $MPV_{=}$, and $MPV_{\geq}$, corresponding to the OWL
constructs owl:someValuesFrom, owl:cardinality, and owl:minCardinality.</p>
      <p>Definition 1 ($MPV_{\exists}$ Issue) Let $KB = \{\mathcal{T}, \mathcal{A}\}$ be a satisfiable
knowledge base, and let $IC_{MPV_{\exists}} = \{\mathbf{K}C \sqsubseteq \exists \mathbf{A}P.\top\}$ for some $C$, $P$. If $\{\mathcal{T}, \mathcal{A}, IC_{MPV_{\exists}}\}$
is not satisfiable, then the ABox $\mathcal{A}$ has an $MPV_{\exists}$ issue.</p>
      <p>Example. Assume that the instance data contains Individual(W type(Wine)) and that the wine ontology contains Class(Wine
partial restriction(locatedIn someValuesFrom(Region))). The application requires each wine
instance to have a location, so the integrity constraint $\{\mathbf{K}\mathit{Wine} \sqsubseteq \exists \mathbf{A}\mathit{locatedIn}.\top\}$
is added. If the constraint is not satisfied, for instance because no locatedIn value is asserted for W, an $MPV_{\exists}$ issue occurs.</p>
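      <p>The wine example can be checked mechanically if the ABox is read under a closed-world view. The following Python sketch is illustrative only and is not the implementation described in this paper: it approximates the K and A operators by explicit assertion in the ABox and ignores TBox entailments. The triple encoding, the function name mpv_exists_violations, and the second individual W2 with its location Napa are assumptions introduced for the example.</p>
      <preformat>
```python
# A tiny ABox as (subject, predicate, object) triples.
# W comes from the example above; W2 and Napa are hypothetical additions.
ABOX = {
    ("W", "rdf:type", "Wine"),
    ("W2", "rdf:type", "Wine"),
    ("W2", "locatedIn", "Napa"),
}


def mpv_exists_violations(abox, cls, prop):
    """Return individuals asserted to be in cls that have no explicit prop value."""
    members = {s for (s, p, o) in abox if p == "rdf:type" and o == cls}
    with_value = {s for (s, p, o) in abox if p == prop}
    return members - with_value


violations = mpv_exists_violations(ABOX, "Wine", "locatedIn")
print(sorted(violations))  # ['W']
```
</preformat>
      <p>Only W is reported, because W2 has an explicitly asserted locatedIn value.</p>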
      <p>Similarly, the integrity constraints corresponding to owl:cardinality and
owl:minCardinality can be formalized as $IC_{MPV_{=}} = \{\mathbf{K}C \sqsubseteq (= n)\, \mathbf{A}P.\top\}$
and $IC_{MPV_{\geq}} = \{\mathbf{K}C \sqsubseteq (\geq n)\, \mathbf{A}P.\top\}$, respectively. The $MPV_{=}$ and $MPV_{\geq}$ issues
are defined accordingly. Here, we extend $\mathcal{ALCK}_{\mathcal{NF}}$ with the quantification
constructor $\mathcal{Q}$.</p>
      <p>Unexpected Individual Type Issues (UIT)</p>
      <p>Unexpected individual type (UIT) issues may occur when an individual in the
instance data is declared to have types that are not expected by the referenced
ontologies, or lacks a type declaration that is expected. We list three
UIT issues, $UIT_{d}$, $UIT_{r}$, and $UIT_{\forall}$, corresponding to the RDFS/OWL constructs
rdfs:domain, rdfs:range, and owl:allValuesFrom.</p>
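      <p>Definition 2 ($UIT_{d}$ Issue) Let $KB = \{\mathcal{T}, \mathcal{A}\}$ be a satisfiable
knowledge base, and let $IC_{UIT_{d}} = \{\exists \mathbf{K}P.\top \sqsubseteq \mathbf{A}C\}$ for some $C$, $P$. If $\{\mathcal{T}, \mathcal{A}, IC_{UIT_{d}}\}$ is
not satisfiable, then the ABox $\mathcal{A}$ has a $UIT_{d}$ issue.</p>
      <p>Definition 3 ($UIT_{r}$ Issue) Let $KB = \{\mathcal{T}, \mathcal{A}\}$ be a satisfiable
knowledge base, and let $IC_{UIT_{r}} = \{\top \sqsubseteq \forall P.\mathbf{A}C\}$ for some $C$, $P$. If $\{\mathcal{T}, \mathcal{A}, IC_{UIT_{r}}\}$ is not
satisfiable, then the ABox $\mathcal{A}$ has a $UIT_{r}$ issue.</p>
      <p>Definition 4 ($UIT_{\forall}$ Issue) Let $KB = \{\mathcal{T}, \mathcal{A}\}$ be a satisfiable
knowledge base, and let $IC_{UIT_{\forall}} = \{\mathbf{K}C \sqsubseteq \forall P.\mathbf{A}D\}$ for some $C$, $D$, $P$. If $\{\mathcal{T}, \mathcal{A}, IC_{UIT_{\forall}}\}$ is
not satisfiable, then the ABox $\mathcal{A}$ has a $UIT_{\forall}$ issue.</p>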
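      <p>As a rough illustration of the domain and range constraints, the Python sketch below flags individuals that take part in a property assertion without the expected explicit type. It is a simplification under stated assumptions, not the authors' method: the K and A operators are approximated by explicit ABox assertions, TBox entailments are ignored, and the function names, the individuals X, Napa, and Sonoma, and the choice of Wine and Region as domain and range of locatedIn are all hypothetical.</p>
      <preformat>
```python
# Hypothetical ABox: every name except Wine, Region and locatedIn is made up.
ABOX = {
    ("W", "rdf:type", "Wine"),
    ("W", "locatedIn", "Napa"),
    ("X", "locatedIn", "Sonoma"),
    ("Sonoma", "rdf:type", "Region"),
}


def asserted_members(abox, cls):
    """Individuals with an explicit rdf:type assertion for cls."""
    return {s for (s, p, o) in abox if p == "rdf:type" and o == cls}


def uit_domain_violations(abox, prop, cls):
    """Subjects of prop that are not explicitly typed as cls (domain check)."""
    subjects = {s for (s, p, o) in abox if p == prop}
    return subjects - asserted_members(abox, cls)


def uit_range_violations(abox, prop, cls):
    """Objects of prop that are not explicitly typed as cls (range check)."""
    objects = {o for (s, p, o) in abox if p == prop}
    return objects - asserted_members(abox, cls)


print(sorted(uit_domain_violations(ABOX, "locatedIn", "Wine")))    # ['X']
print(sorted(uit_range_violations(ABOX, "locatedIn", "Region")))   # ['Napa']
```
</preformat>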
      <p>Non-specific Individual Type Issues (NSIT)</p>
      <p>Non-specific individual type (NSIT) issues may arise if an individual in the
instance data is declared to have a general type rather than a more specific type
that is expected by the application.</p>
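      <p>Definition 5 (NSIT Issue) Let $KB = \{\mathcal{T}, \mathcal{A}\}$ be a satisfiable
knowledge base, and let $IC_{NSIT} = \{\mathbf{K}C \sqsubseteq \mathbf{A}C_1 \sqcup \cdots \sqcup \mathbf{A}C_n\}$ for some $C, C_1, \ldots, C_n$.
If $\{\mathcal{T}, \mathcal{A}, IC_{NSIT}\}$ is not satisfiable, then the ABox $\mathcal{A}$ has an NSIT issue.</p>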
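      <p>The NSIT constraint can be read as: every individual known to be a C must be explicitly assigned at least one of the more specific types. A minimal Python sketch of that reading follows. It again approximates K and A by explicit assertion and ignores TBox reasoning; the function name nsit_violations, the individuals W1 and W2, and the subclasses RedWine and WhiteWine are assumptions made for illustration.</p>
      <preformat>
```python
# Hypothetical ABox: W1 has only the general type, W2 also has a specific one.
ABOX = {
    ("W1", "rdf:type", "Wine"),
    ("W2", "rdf:type", "Wine"),
    ("W2", "rdf:type", "RedWine"),
}


def nsit_violations(abox, general, specifics):
    """Individuals typed as general but with none of the specific types asserted."""
    types = {}
    for s, p, o in abox:
        if p == "rdf:type":
            types.setdefault(s, set()).add(o)
    return {
        s for s, ts in types.items()
        if general in ts and ts.isdisjoint(specifics)
    }


print(sorted(nsit_violations(ABOX, "Wine", ["RedWine", "WhiteWine"])))  # ['W1']
```
</preformat>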
      <p>
        We propose extending $\mathcal{ALCK}_{\mathcal{NF}}$ to formalize further integrity issues,
such as the excessive property value issue (EPV), the redundant individual issue (RIT),
and the uniqueness issue (UT) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. With this approach, users can selectively enforce
integrity constraints on individual classes and properties, which allows not only
global but also local closed-world reasoning with OWL instance data.
      </p>
      <p>SPARQL-based Approximation to Integrity Issues
Detection</p>
      <p>
        In general, reasoning in autoepistemic description logics (ADLs) is a hard problem. To the best of our
knowledge, there is still no reasoner that can efficiently handle reasoning in the
autoepistemic extension of OWL. Our work therefore adopts a practical SPARQL-based
approach. In this section, we first define integrity violations, and then
use the $MPV_{\exists}$ issue as an example to show that the SPARQL-based issue
detection mechanism is a sound solution to the integrity issue detection problem.
We follow the SPARQL syntax and semantics in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
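      <p>Definition 6 (Integrity Violation) Let $KB = \{\mathcal{T}, \mathcal{A}\}$ be a knowledge base,
where $\mathcal{T}$ and $\mathcal{A}$ denote the TBox and ABox, respectively. Let $\mathcal{A}|_{i} = \{\alpha \mid \alpha \in \mathcal{A} \text{ and } i \text{ occurs in } \alpha\}$,
where $i$ is an individual in $\mathcal{A}$. If $\{\mathcal{T}, \mathcal{A}|_{i}, IC\}$ is not satisfiable,
then $i$ is said to violate $IC$, denoted by $i \not\sqsupseteq IC$.</p>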
      <p>Definition 7 ($MPV_{\exists}$ SPARQL Pattern) The SPARQL pattern $P_{MPV_{\exists}}$ is:
?i rdf:type C
OPT (?i P ?o)
FILTER (!BOUND(?o))</p>
      <p>Theorem 1 If an individual $i$ is contained in the evaluation of the graph pattern
$P_{MPV_{\exists}}$ over $\mathcal{A}$, i.e., $i \in [\![P_{MPV_{\exists}}]\!]_{\mathcal{A}}$, then $i \not\sqsupseteq IC_{MPV_{\exists}}$.</p>
      <p>Proof: The proof has two steps:
(1) If an individual $i$ appears in the evaluation of the graph pattern $P_{MPV_{\exists}}$
over $\mathcal{A}$, then $C(i) \in \mathcal{A}$ and $\nexists j.\, P(i, j) \in \mathcal{A}$.</p>
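      <p>Proof: Let $P_1$ = (?i rdf:type C), $P_2$ = (?i P ?o), and $R$ = (!BOUND(?o)). Then
$[\![P_1]\!]_{\mathcal{A}} = \{x \mid C(x) \in \mathcal{A}\}$,
$[\![P_2]\!]_{\mathcal{A}} = \{(x, y) \mid P(x, y) \in \mathcal{A}\}$,
$[\![P_1]\!]_{\mathcal{A}} \bowtie [\![P_2]\!]_{\mathcal{A}} = \{(x, y) \mid C(x) \in \mathcal{A} \text{ and } P(x, y) \in \mathcal{A}\}$,
$[\![P_1]\!]_{\mathcal{A}} \setminus [\![P_2]\!]_{\mathcal{A}} = \{x \mid C(x) \in \mathcal{A} \text{ and } \nexists y.\, P(x, y) \in \mathcal{A}\}$,
$[\![P_1 \text{ OPT } P_2]\!]_{\mathcal{A}} = ([\![P_1]\!]_{\mathcal{A}} \bowtie [\![P_2]\!]_{\mathcal{A}}) \cup ([\![P_1]\!]_{\mathcal{A}} \setminus [\![P_2]\!]_{\mathcal{A}})$
$= \{(x, y) \mid C(x) \in \mathcal{A} \text{ and } P(x, y) \in \mathcal{A}\} \cup \{x \mid C(x) \in \mathcal{A} \text{ and } \nexists y.\, P(x, y) \in \mathcal{A}\}$.</p>
      <p>Since the filter $R$ keeps only the mappings in which ?o is unbound,
$[\![P_{MPV_{\exists}}]\!]_{\mathcal{A}} = \{x \mid C(x) \in \mathcal{A} \text{ and } \nexists y.\, P(x, y) \in \mathcal{A}\}$.
Thus, if $i$ appears in $[\![P_{MPV_{\exists}}]\!]_{\mathcal{A}}$, then $C(i) \in \mathcal{A}$ and $\nexists j.\, P(i, j) \in \mathcal{A}$.</p>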
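      <p>The set-theoretic evaluation used in step (1) can be mirrored directly in code. The Python sketch below represents solution mappings as dictionaries and builds OPT as the union of a join and a difference, then applies the !BOUND filter. It is a toy evaluator over an in-memory triple set, written for illustration under the mapping-based semantics the proof relies on; it is not a SPARQL engine, and the function names and the sample ABox (W, W2, Napa) are assumptions.</p>
      <preformat>
```python
# Hypothetical ABox used only to exercise the evaluator.
ABOX = {
    ("W", "rdf:type", "Wine"),
    ("W2", "rdf:type", "Wine"),
    ("W2", "locatedIn", "Napa"),
}


def compatible(m1, m2):
    """Two mappings are compatible if they agree on every shared variable."""
    return all(m2[v] == val for v, val in m1.items() if v in m2)


def join(omega1, omega2):
    """Merge every pair of compatible mappings."""
    return [{**m1, **m2} for m1 in omega1 for m2 in omega2 if compatible(m1, m2)]


def diff(omega1, omega2):
    """Keep mappings of omega1 that are compatible with no mapping of omega2."""
    return [m1 for m1 in omega1 if not any(compatible(m1, m2) for m2 in omega2)]


def eval_mpv_pattern(abox, cls, prop):
    """Evaluate ((?i rdf:type cls) OPT (?i prop ?o)) FILTER (!BOUND(?o))."""
    p1 = [{"?i": s} for (s, p, o) in sorted(abox) if p == "rdf:type" and o == cls]
    p2 = [{"?i": s, "?o": o} for (s, p, o) in sorted(abox) if p == prop]
    opt = join(p1, p2) + diff(p1, p2)          # left outer join
    return [m for m in opt if "?o" not in m]   # !BOUND(?o)


print(eval_mpv_pattern(ABOX, "Wine", "locatedIn"))  # [{'?i': 'W'}]
```
</preformat>
      <p>The mapping for W2 survives the join but is removed by the filter, while W survives only through the difference, which matches the two sets in the union above.</p>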
      <p>(2) Let $i$ be an instance. If $C(i) \in \mathcal{A}$ and $\nexists j.\, P(i, j) \in \mathcal{A}$, then $i \not\sqsupseteq IC_{MPV_{\exists}}$,
and thus $\mathcal{A}$ has an $MPV_{\exists}$ issue.</p>
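      <p>Proof: If $C(i) \in \mathcal{A}$ and $\nexists j.\, P(i, j) \in \mathcal{A}$, then $C(i) \in \mathcal{A}|_{i}$ and there is no $j$
such that $P(i, j) \in \mathcal{A}|_{i}$. The knowledge base $\mathcal{K}' = \{\mathcal{T}, \mathcal{A}|_{i}, IC_{MPV_{\exists}}\}$ is not
satisfiable because it has no epistemic model. We prove this by
contradiction. Suppose $\mathcal{K}'$ has an epistemic model $\mathcal{M}$. Then for all $\mathcal{I} \in \mathcal{M}$, we
have $(\mathcal{I}, \mathcal{M}, \mathcal{M}) \models \mathbf{K}C(i)$, and hence, by $IC_{MPV_{\exists}}$, $(\mathcal{I}, \mathcal{M}, \mathcal{M}) \models \exists \mathbf{A}P.\top(i)$.
Therefore $\exists j \in \mathcal{I}$ such that $(i, j) \in P^{\mathcal{J}, \mathcal{M}, \mathcal{M}}$ for all $\mathcal{J} \in \mathcal{M}$. However, for every
$\mathcal{M}' \supset \mathcal{M}$, $(\mathcal{I}, \mathcal{M}', \mathcal{M})$ also satisfies $\mathcal{K}'$, so $\mathcal{M}$ is not maximal, and therefore $\mathcal{M}$ is
not an epistemic model of $\mathcal{K}'$.</p>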
      <p>
        The SPARQL solutions to other typical integrity issues are provided in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
An advantage of SPARQL-based solutions is that they are easy to implement
using existing tools. We have implemented instance data evaluation as an online
service (http://onto.rpi.edu/demo/oie/) which can detect typical integrity issues
in instance data, in addition to syntax errors and logical inconsistencies.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Conclusions and Future Work</title>
      <p>This work investigates the characterization and detection of integrity issues in
OWL instance data. We provide a logical foundation for OWL instance data
evaluation using autoepistemic operators, and implement a practical SPARQL-based
approach to detect the issues. In future work, we will identify and characterize
more integrity issues with autoepistemic operators, and extend our
SPARQL-based solutions to handle more expressive ontologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Donini</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Autoepistemic Description Logics</article-title>
          ,
          <source>IJCAI</source>
          , pp.
          <volume>136</volume>
          –
          <issue>141</issue>
          (
          <year>1997</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Grimm</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Closed-World Reasoning in the Semantic Web through Epistemic Operators</article-title>
          ,
          <source>OWLED</source>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Semantics and Complexity of SPARQL</article-title>
          ,
          <source>ISWC</source>
          , pp.
          <volume>30</volume>
          –
          <issue>43</issue>
          (
          <year>2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Instance Data Evaluation for Semantic Web-Based Knowledge Management Systems</article-title>
          , to appear in HICSS, (
          <year>2009</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Tao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          :
          <article-title>Characterizing and Detecting Integrity Issues in OWL Instance Data</article-title>
          ,
          <source>TW technical report</source>
          , (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>