<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Ontology Consistency and Instance Checking for Real World Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gavin E. Mendel-Gleason</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rob Brennan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kevin Feeney</string-name>
          <email>kevin.feeney@scss.tcd.ie</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Trinity College Dublin</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Many large ontologies have been created which make use of OWL's expressiveness for specification. However, tools to ensure that instance data complies with the schema are often not well integrated with triple-stores and, owing to the assumptions underlying the OWL axioms, cannot detect certain classes of schema-instance inconsistency. This can lead to lower quality, inconsistent data. We have developed a simple ontology consistency and instance checking service, SimpleConsist[8]. We also define a number of ontology design best practice constraints on OWL or RDFS schemas. Our implementation allows the user to specify which constraints should be applied to schema and instance data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Many Linked Data stores have large amounts of quite variable[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] data
(e.g. DBpedia[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). Triples can exist in a triple-store which have no
associated schema or conform to no constraints on the shape or type of data.
Typically such data is considered low quality and is hard to consume.
      </p>
      <p>
        Earlier work showed that OWL semantics make it ill suited as a
language of constraints[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. However, maintaining ontology consistency and
conformance is central to high quality data storage. Programmatic
consumption of data is simplified if the data is well formed and well typed.
Data management is simplified if inserts, deletes and updates that might
violate well-formedness constraints are signalled.
      </p>
      <p>
        To solve these problems, we use a persistent triple-store in
ClioPatria[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and a plugin constraint checker called SimpleConsist[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], both
implemented in SWI-Prolog[
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. SimpleConsist is used to maintain
ontological consistency and constraints on instance data such that it conforms
to an ontology described in an OWL fragment using a narrower reading
of the OWL semantics. In particular, we make use of a closed world
assumption, and a unique names assumption. It is implemented as a REST
service within the Dacura[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] data curation system.
The philosophy for our ontology consistency and instance checking is to
view the ontology as static assertions, which must be self-consistent, and
to which a given instance state must conform. Given some triple-store
state S, we check that our set of constraints C is satisfied.
When updating or inserting into the triple-store, somewhat arbitrary
program logic can take place, after which the triple-store is in a state S'. If C
does not hold for S' then we roll back to the previous state S. We provide
counter-example witnesses L to the failure of the constraint C. These
witnesses are useful for debugging schema and instance updates as they give
information about precisely what went wrong in the constraint
checking. The constraint rules are a combination of consistency constraints,
instance type checking and best practices.
      </p>
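      <p>The check-then-roll-back cycle described above can be sketched as follows. This is a minimal illustration in Python rather than the SWI-Prolog used by SimpleConsist; the names <monospace>try_update</monospace> and <monospace>orphan_instances</monospace>, the tuple encoding of triples and the example data are assumptions made for the sketch, not part of the actual implementation.</p>
      <preformat>
```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
State = FrozenSet[Triple]
# A constraint maps a state to a list of witnesses; an empty list means success.
Constraint = Callable[[State], List[object]]


def orphan_instances(state: State) -> List[object]:
    """Witness every typed instance whose class is not declared as owl:Class."""
    classes = {s for (s, p, o) in state if p == "rdf:type" and o == "owl:Class"}
    return sorted(
        s for (s, p, o) in state
        if p == "rdf:type" and o != "owl:Class" and o not in classes
    )


def try_update(state: State, update, constraints: Dict[str, Constraint]):
    """Apply `update` to S giving S'; keep S' only if no constraint yields a witness."""
    candidate = frozenset(update(set(state)))  # S', built from a copy of S
    failures = {}
    for name, check in constraints.items():
        witnesses = check(candidate)
        if witnesses:
            failures[name] = witnesses
    if failures:
        return state, failures  # roll back to S and report the witnesses
    return candidate, {}


# Hypothetical example data.
S = frozenset({
    ("ex:Person", "rdf:type", "owl:Class"),
    ("ex:alice", "rdf:type", "ex:Person"),
})
CONSTRAINTS = {"orphanInstances": orphan_instances}


def bad_update(triples):
    # ex:Robot is never declared as a class, so ex:bob becomes an orphan.
    triples.add(("ex:bob", "rdf:type", "ex:Robot"))
    return triples


new_state, failures = try_update(S, bad_update, CONSTRAINTS)
print(new_state == S, failures)
```
      </preformat>
      <p>In this sketch the rejected update leaves the store in state S, and the returned dictionary names the failing constraint together with its witnesses.</p>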
      <p><italic>Constraints.</italic>
SimpleConsist implements our constraints on ontology consistency and
implements instance checking. Because we want witnessing information
about any failure to satisfy the constraints, we write each constraint as a
predicate which yields the witnesses of its failure. These witnesses are realised
as the resources that do not conform to the constraint. Failure to find a witness
of the negation of a constraint is viewed as success. The failure witnessing predicates
are briefly described in Table 1.</p>
      <p>For example, classCycles(L) returns all witnesses of class cycles in the list L. Each element of
the list names both the offending class and the path through the classes.
The other constraints return information about the reasons for failure.</p>
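      <p>A witness list of this shape might be computed as in the following sketch. It is written in Python for illustration and is not the SimpleConsist code; the function name <monospace>class_cycles</monospace>, the dictionary encoding of the subclass relation and the example hierarchy are assumptions.</p>
      <preformat>
```python
from typing import Dict, List, Tuple


def class_cycles(sub_class_of: Dict[str, List[str]]) -> List[Tuple[str, List[str]]]:
    """Return (offending class, path) witnesses; an empty list means no cycle.

    `sub_class_of` maps each class to its direct superclasses.
    """
    witnesses = []
    for start in sorted(sub_class_of):
        stack = [(start, [start])]
        seen = set()
        while stack:
            cls, path = stack.pop()
            for parent in sub_class_of.get(cls, []):
                if parent == start:
                    # We climbed back to where we started: record the path.
                    witnesses.append((start, path + [start]))
                elif parent not in seen:
                    seen.add(parent)
                    stack.append((parent, path + [parent]))
    return witnesses


# Hypothetical hierarchy: A, B and C form a cycle; D merely points into it.
hierarchy = {"A": ["B"], "B": ["C"], "C": ["A"], "D": ["A"]}
cycles = class_cycles(hierarchy)
for offending, path in cycles:
    print(offending, " -> ".join(path))
```
      </preformat>
      <p>Each class on the cycle is reported with its own path, while a class that only points into the cycle (D in the example) produces no witness.</p>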
      <p>
        invalidInstanceRange(L) requires some explanation as it is an
implementation of a type checker for literals and class instances and so
requires knowing what a literal can be. The constraint implements type
checking to ensure that all literals are of the appropriate type according
to the ranges specified in properties. These literals can be of any of the
XML Schema datatypes which are valid for OWL[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Where a range
specifies a class, the targets must be instances of an appropriate class
(either the class itself or a subclass). Using artificially populated triple
stores we timed the reasoner for various numbers of triples generated from
the instance generator. These timings can be seen in Figure 1.
      </p>
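      <p>The range check described above, covering both literals and class instances, can be illustrated with the following sketch. It is a simplified Python model, not the SimpleConsist implementation: the function names, the pair encoding of literals and the three-datatype table <monospace>LEXICAL_FORMS</monospace> are assumptions, and only a small subset of the XML Schema datatypes is covered.</p>
      <preformat>
```python
import re

# A literal is modelled as (lexical form, datatype); a resource as a plain string.
# Only three XSD datatypes are covered; this is an illustrative subset.
LEXICAL_FORMS = {
    "xsd:integer": re.compile(r"[+-]?[0-9]+"),
    "xsd:boolean": re.compile(r"true|false|1|0"),
    "xsd:string": re.compile(r".*", re.DOTALL),
}


def superclasses(cls, sub_class_of):
    """Return cls together with all of its ancestors."""
    result, todo = set(), [cls]
    while todo:
        current = todo.pop()
        if current not in result:
            result.add(current)
            todo.extend(sub_class_of.get(current, []))
    return result


def invalid_instance_range(triples, ranges, types, sub_class_of):
    """Return the triples whose object does not fit the range of its property."""
    witnesses = []
    for (s, p, o) in triples:
        expected = ranges.get(p)
        if expected is None:
            continue  # undefined properties are left to a different constraint
        if expected in LEXICAL_FORMS:
            # Datatype range: the object must be a literal of that datatype
            # whose lexical form is valid.
            ok = (isinstance(o, tuple) and o[1] == expected
                  and LEXICAL_FORMS[expected].fullmatch(o[0]) is not None)
        else:
            # Class range: the object must be an instance of the class
            # itself or of one of its subclasses.
            ok = isinstance(o, str) and any(
                expected in superclasses(t, sub_class_of)
                for t in types.get(o, []))
        if not ok:
            witnesses.append((s, p, o))
    return witnesses


# Hypothetical schema and instance data.
ranges = {"ex:age": "xsd:integer", "ex:knows": "ex:Person"}
types = {"ex:alice": ["ex:Student"], "ex:rex": ["ex:Dog"]}
sub_class_of = {"ex:Student": ["ex:Person"]}
triples = [
    ("ex:alice", "ex:age", ("42", "xsd:integer")),
    ("ex:bob", "ex:age", ("forty", "xsd:integer")),
    ("ex:bob", "ex:knows", "ex:alice"),
    ("ex:bob", "ex:knows", "ex:rex"),
]
range_witnesses = invalid_instance_range(triples, ranges, types, sub_class_of)
print(range_witnesses)
```
      </preformat>
      <p>Here the literal "forty" is rejected because it is not a valid integer, and ex:rex is rejected because ex:Dog is neither ex:Person nor one of its subclasses, whereas ex:alice is accepted through the subclass ex:Student.</p>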
    </sec>
    <sec id="sec-2">
      <title>Prior Work</title>
      <p>
        There are many reasoners for fragments of OWL; for example, 17 are mentioned
in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Many are sophisticated; however, the lack of a unique name
assumption can lead to problems for users developing schemata, making
it virtually impossible to use OWL to impose constraints.
      </p>
      <p>
        CWM (Closed World Machine)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is a reasoner which takes the same
pragmatic approach to closed worlds and unique names as we do. It is capable of
expressing the types of constraints we are interested in, in a parsimonious
fashion. However, it functions at the level of transformations of RDF
files rather than being a fully functioning database system. Running the
reasoner would require an export of the triple-store, which is not practical for
large datasets which are changing in real time.
      </p>
      <p>
        There are several tools provided with the Apache Jena[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] system which
facilitate consistency and instance checking. In particular the Eyeball
system is modular and allows the user to introduce new constraints by
adding Java code to perform inspection. However, it does not implement
full type checking of instance data as our constraints do.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Future Work and Conclusion</title>
      <p>Triple stores may be applied to complex ontologies which have
incremental schema changes. However, it is a challenge to provide tools which
make publishing OWL-based high quality (consistent) large scale data
easy. This requires constraints on schemata and admin tools to ensure
that updates to datastores maintain integrity. Our SimpleConsist service
has provided practical solutions to these problems. We found it useful to
reduce the expressive complexity which can be found in OWL when
constructing our interpretation of ontologies, limiting it to unique names and
closed worlds and preferring to allow higher level data curation processes
to deal with the greater ambiguity often inherent in large scale data.</p>
      <p>In future work, constraint checks on more OWL features will be
explored. Our priority is OWL features that do not come into conflict
with manageability of the schema and tractability of constraint checking.
We would also like to have a method of checking instance updates which
limits checking to entities which could cause constraint failures. Instance
updates are generally more frequent than schema changes and so checker
execution time will be more important.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgement</title>
      <p>This research is partially supported by the European Union's Horizon
2020 research and innovation programme under grant agreement No 644055
(ALIGNED, aligned-project.eu).
      </p>
      <table-wrap id="tab-1">
        <label>Table 1</label>
        <caption>
          <p>Failure witnessing predicates.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Predicate</th><th>Constraint</th></tr>
          </thead>
          <tbody>
            <tr><td>duplicateClasses(L)</td><td>No two classes may have the same name.</td></tr>
            <tr><td>orphanSubClasses(L)</td><td>No subclass can be a child of an unspecified class.</td></tr>
            <tr><td>classCycles(L)</td><td>No cycles exist in the class hierarchy.</td></tr>
            <tr><td>duplicateProperties(L)</td><td>No two properties have the same name.</td></tr>
            <tr><td>orphanSubProperties(L)</td><td>No subproperty is the child of an unspecified property.</td></tr>
            <tr><td>propertyCycles(L)</td><td>No cycles exist in the property hierarchy.</td></tr>
            <tr><td>invalidRange(L)</td><td>Ranges must refer to classes or types, and must be unique.</td></tr>
            <tr><td>invalidDomain(L)</td><td>Domains must refer to classes or types, and must be unique.</td></tr>
            <tr><td>orphanInstances(L)</td><td>Instances must be members of a class.</td></tr>
            <tr><td>orphanProperties(L)</td><td>Instances must not use properties which are not defined.</td></tr>
            <tr><td>invalidInstanceRange(L)</td><td>An element of the range of a property must be well typed.</td></tr>
            <tr><td>invalidInstanceDomain(L)</td><td>An element of the domain of a property must be well typed.</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <fig id="fig-1">
        <label>Figure 1</label>
        <caption>
          <p>Execution time of constraints against number of triples (horizontal axis from 0 to 3e+07 triples).</p>
        </caption>
      </fig>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1. Whitepaper:
          <article-title>The ClioPatria semantic web server</article-title>
          . http://cliopatria.swiprolog.org/help/whitepaper.html.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Apache</surname>
          </string-name>
          .
          <source>Apache Jena</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. Soren Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, and Zachary Ives.
          <article-title>DBpedia: A nucleus for a web of open data</article-title>
          .
          <source>In In 6th International Semantic Web Conference</source>
          , Busan, Korea, pages
          <volume>11</volume>
          {
          <fpage>15</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. T. Berners-Lee.
          <source>CWM - closed world machine</source>
          . http://www.w3.org/2000/10/swap/doc/cwm.html,
          <year>2000</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Jeremy J. Carroll and Jeff Z. Pan.
          <article-title>XML Schema datatypes in RDF and OWL</article-title>
          . W3C Working Group Note, W3C,
          <year>March 2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6. Kevin C. Feeney, Declan O'Sullivan, Wei Tai, and Rob Brennan.
          <article-title>Improving curated Web-Data quality with structured harvesting and assessment</article-title>
          .
          <source>Int. J. Semant. Web Inf. Syst.</source>
          ,
          <volume>10</volume>
          (
          <issue>2</issue>
          ):
          <fpage>35</fpage>
          –
          <lpage>62</lpage>
          ,
          <year>April 2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. Johannes Lorey, Ziawasch Abedjan, Felix Naumann, and Christoph Böhm.
          <article-title>RDF ontology (Re-)Engineering through large-scale data mining</article-title>
          .
          <source>In International Semantic Web Conference (ISWC)</source>
          ,
          <year>November 2011</year>
          . Finalist of the Billion Triple Challenge.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8. Gavin Mendel-Gleason.
          <article-title>SimpleConsist plugin for ClioPatria</article-title>
          . https://github.com/GavinMendelGleason/dacura.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9. Boris Motik, Ian Horrocks, and Ulrike Sattler.
          <article-title>Adding integrity constraints to OWL</article-title>
          .
          <source>In OWLED</source>
          , volume
          <volume>258</volume>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10. Wei Tai, John Keeney, and Declan O'Sullivan.
          <article-title>Resource-constrained reasoning using a reasoner composition approach</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>6</volume>
          (
          <issue>1</issue>
          ):
          <fpage>35</fpage>
          –
          <lpage>59</lpage>
          ,
          <year>January 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. Jan Wielemaker, Tom Schrijvers, Markus Triska, and Torbjörn Lager.
          <article-title>SWI-Prolog</article-title>
          .
          <year>November 2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>