<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Pellint - A Performance Lint Tool for Pellet</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Harris T. Lin</string-name>
          <email>hlin@clarkparsia.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evren Sirin</string-name>
          <email>evren@clarkparsia.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Clark &amp; Parsia LLC</institution>
          ,
          <addr-line>Washington, DC</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Predicting the performance of a tableau reasoner for an OWL ontology is generally hard. It is even harder for users who are not familiar with the details of tableau algorithms. In this paper we present Pellint, a lint tool that reports and potentially repairs modeling constructs that are known to have bad performance characteristics for Pellet. We describe the collection of modeling constructs that are currently detected by Pellint and explain why these patterns typically cause performance problems. Finally we note that our implementation allows easy extension of patterns to be detected, and we encourage feedback and contribution of more problematic modeling constructs from the community.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        OWL reasoners are most useful when they have predictable performance, and
predicting performance for an ontology is not an easy task [3, Chap. 3, 9],
especially because the data are not usually correlated with their reasoning
performance. There are many sources of complexities that are known to have an
impact on the reasoning performance, however most of them are hardly visible to
OWL ontology engineers due to complex interactions between various modeling
constructs. Inspired by traditional lint tools [
        <xref ref-type="bibr" rid="ref4 ref6">6, 4</xref>
        ], we have developed Pellint, a
lint tool that reports and potentially repairs modeling constructs that are known
to have bad performance characteristics for Pellet, whose reasoning is based on
tableau algorithms. The results Pellint produces would mostly be applicable to
other tableau reasoners though not completely because the optimizations
implemented in reasoners differ.
      </p>
      <p>The goal of this tool is to help ontology engineers pinpoint, and potentially
remove, the source of reasoning complexities in their ontologies. Unlike traditional
lint tools, however, the problems found by Pellint do not necessarily indicate a
modeling error in the ontology. It just means that Pellet is probably going to be
relatively slow when reasoning with that ontology. Therefore, there might not be
an appropriate solution for some of the reported problems, i.e. the only options
available might be removing some of the modeling constructs.</p>
      <p>Section 2 explains the reasoning complexities in more detail. Section 3 lists
some of the nontrivial modeling constructs (patterns) detected by Pellint. Section
4 gives a functional description of Pellint.</p>
    </sec>
    <sec id="sec-2">
      <title>Reasoning Complexities</title>
      <p>
        In this section, we provide a very high-level description of the tableau algorithm
for DLs and explain the main sources of its reasoning complexities. We refer
the reader to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for a more detailed and accurate description of the tableau
algorithm.
      </p>
      <p>In the tableau algorithm all the reasoning services can be reduced to
consistency checking, which is done by building a completion graph. The nodes in the
completion graph intuitively stand for individuals and literals. Each node is
associated with its corresponding types. Property-value assertions are represented
as directed edges between nodes. If we are checking the consistency of an
ontology the initial completion graph is built from the asserted facts in the ontology.
If we are checking the satisfiability of a class the initial graph contains a single
node whose type is that concept. The reasoner repeatedly applies the tableau
expansion rules until a clash (i.e. a contradiction) is detected in the label of a
node, or until a clash-free graph is found to which no more rules are applicable.</p>
      <p>In the process of tableau completion, there are two main sources of
complexities: (1) nondeterminism in “completing” the graph; and (2) the size of the
graph built.
2.1</p>
      <sec id="sec-2-1">
        <title>Nondeterminism</title>
        <p>Building the completion graph is nondeterministic due to disjunctions which are
expressed with the UnionOf construct in OWL. The instances of the union class
must be an instance of at least one of the union elements. The reasoner will do a
case-by-case analysis to figure out if that is possible. A nondeterministic choice
made because of a union class will have an impact on the completion graph, e.g.
new edges may be added because of that choice. In the event that we made a
wrong choice (which will be indicated by an inconsistency in the later stages of
completion process) we have to backtrack and undo all the changes made to the
graph. This effect multiplies if we have many choices to make.</p>
        <p>There are many optimization algorithms implemented in Pellet to reduce the
effects of nondeterminism, but it is not hard to see how a very large number of
union classes will adversely affect reasoning time.
2.2</p>
      </sec>
      <sec id="sec-2-2">
        <title>Completion graph size</title>
        <p>The size of the completion graph depends on the size of the initial graph (i.e., the
asserted instances), but also on the use of existential restrictions. Constructs like
SomeValuesFrom, MinCardinality, and ExactCardinality will cause the tableau
algorithm to create new nodes in the completion graph. Applying the tableau
completion rules to new nodes will require more processing time and possibly
increase the nondeterminism involved because there might be new
nondeterministic choices made for these new nodes.</p>
        <p>Predicting the exact size of the completion graph (without actually building
the graph) is not possible, but Pellint uses some heuristics and graph analysis
techniques to compute an approximation of this size.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Patterns Detected</title>
      <p>
        In this section we describe some nontrivial patterns Pellint currently detects, and
we explain why they may impact on the reasoning performance. For a complete
list please see the Pellint distribution [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>– Explicit GCI A subclass axiom with a complex concept expression on
the left hand side is called a General Concept Inclusion (GCI) axiom. A
tableau-based reasoner deals with GCI axioms by converting them into a
standard form. For example, SubClassOf(C D) is converted into the axiom
SubClassOf(owl:Thing UnionOf(ComplementOf(C) D)) where C and D can be
arbitrary concepts. Since every individual is an instance of owl:Thing, the
reasoner then applies the converted axiom to every individual. We observe
that every convertion produces a nondeterministic choice due to the UnionOf,
which is then applied to every individual. Hence GCI axioms are extremely
expensive and reported by Pellint.
– Implicit GCI Most of the time ontologies do not have explicit GCI axioms.</p>
      <p>However, there are cases where the interaction between different axioms
create implicit GCIs. For example, if a concept is equivalent to a class
expression and is also a subclass of another expression, then it implicitly defines
a subclass axiom whose left hand side is complex. There are optimization
techniques, e.g. absorption [3, Chap. 9], that minimizes the effect of such
GCIs. However, these techniques still introduce additional nondeterminism
that slows down the overall reasoning process.
– Existential Explosion An existential restriction, e.g. SomeValuesFrom or
MinCardinality construct, causes the tableau reasoner to create new nodes
in the completion graph. Many existential restrictions may interact in a
complex manner and may generate an intractable number of nodes in the
completion graph. Pellint estimates the total number of individuals generated
by such restrictions and reports as a lint if it exceeds a configurable number.</p>
      <p>Pellint currently does not offer any repairs for this pattern.
– Equivalence to AllValuesFrom An AllValuesFrom restriction does not
require the instance to have a value for the referred property but only
restricts the values for existing property values. This means any concept not
Pattern Name</p>
      <p>Example</p>
      <p>Repair
Explicit GCI SubClassOf(IntersectionOf(A B) C) None
Implicit GCI EquivalentClasses(C IntersectionOf(A B)), Change all to</p>
      <p>SubClassOf(C SomeValuesFrom(R D)) subclass axioms
Existential Interconnected usages of SomeValuesFrom, None
Explosion MinCardinality, and/or ExactCardinality
Equivalence to EquivalentClasses(C AllValuesFrom(R D)) Change to
AllValuesFrom a subclass axiom
having the property value, e.g. a concept that is disjoint with the domain of
the property, will satisfy the AllValuesFrom restriction and turn out to be a
subclass of the concept defined to be equivalent to the restriction (class C in
the example from Table 1). This typically leads to unintended inferences and
additional inferences may eventually slow down the reasoning performance.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Tool Description</title>
      <p>
        The patterns described above along with some additional patterns have been
implemented in Pellint. Pellint is available as an open source command line tool
and can be downloaded at [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Pellint receives an ontology as its input, and
produces a lint report summarizing all the detected lint patterns in plain text.
It may also save the repaired ontology to a file, where it applies the available
recommended repairs to the detected lints.
      </p>
      <p>In addition, Pellint also performs syntactic checks to detect cases such as
a resource being used without a type declaration. Such untyped resources are
legal in OWL Full but not allowed in OWL DL. Some tools automatically detect
and “patch” such syntax issues using a set of heuristics. However, these kinds
of issues might indicate an error in the ontology, e.g. the most common case is
a spelling error in the URI of the resource.</p>
      <p>
        Pellint is designed for easy extensions of new patterns based on the OWL
API library [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. The package includes Javadoc documentation and a step-by-step
guide on how to write new patterns and integrate them into Pellint. The goal is
to encourage users to experiment with new patterns with their own ontologies
and contribute them to the community.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>We have developed Pellint, a lint tool for ontologies that reports and potentially
repairs modeling constructs that are known to have bad performance
characteristics for Pellet. Pellint is designed for easy extensions of new patterns and we
encourage users to contribute in the future.</p>
      <p>
        As part of our future work we are considering a more interactive lint detection
tool, e.g. integration with the Lint Roll Prot´eg´e plug-in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Lint</given-names>
            <surname>Roll</surname>
          </string-name>
          <article-title>Prot´eg´e plug-in</article-title>
          . http://www.cs.man.ac.uk/∼iannonel/lintRoll.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Pellint</surname>
          </string-name>
          . http://pellet.owldl.com/pellint.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>F.</given-names>
            <surname>Baader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Calvanese</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. L.</given-names>
            <surname>McGuinness</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nardi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. F.</given-names>
            <surname>Patel-Schneider</surname>
          </string-name>
          .
          <article-title>The Description Logic Handbook</article-title>
          . Cambridge University Press,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>I. F.</given-names>
            <surname>Darwin. Checking C Programs with Lint. O'Reilly Media</surname>
          </string-name>
          ,
          <year>1988</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Horridge</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Bechhofer</surname>
          </string-name>
          .
          <source>Igniting the OWL 1</source>
          .
          <article-title>1 touch paper: The OWL API</article-title>
          . In
          <source>In Proc. OWL-ED</source>
          <year>2007</year>
          , volume
          <volume>258</volume>
          <source>of CEUR</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Johnson</surname>
          </string-name>
          . Lint,
          <article-title>a C program checker</article-title>
          .
          <source>Technical report, Bell Laboratories</source>
          ,
          <year>1977</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>