<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Repairing Incomplete Reasoners</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgos Stoilos</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bernardo Cuenca Grau</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oxford University Computing Laboratory</institution>
          <addr-line>Wolfson Building, Parks Road, OX1 3QD, Oxford</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The pressing need for scalable query answering has motivated the development of many incomplete ontology-based reasoners. Improving the completeness of such systems without sacrificing their favourable performance would be beneficial to many applications. In this paper, we address the following problem: given a query q, a TBox T and an incomplete reasoner ans (i.e., one that accepts q and T as valid inputs but fails to return all query answers to q w.r.t. T and some dataset), we aim at computing a (hopefully small) TBox R (a repair ) which logically follows from T and such that ans is provably complete w.r.t. q and T ∪ R, regardless of the input data. We identify conditions on q, T and ans that are sufficient to ensure the existence of such repair and present a practical repair generation algorithm. Our experiments suggest that repairs of small size can be computed for well-known ontologies and reasoners.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In Semantic Web applications, a description logic TBox is frequently used for
describing the meaning of data stored in various sources. A query language
(based on conjunctive queries) is then used for data access [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Unfortunately,
when using an expressive ontology language, computing query answers can be
costly, and in a Semantic Web setting, datasets may be extremely large.
      </p>
      <p>
        Motivated by the need for scalable query answering, there has been a growing
interest in the development of incomplete ontology-based reasoning systems,
which are not guaranteed to compute all query answers for some combinations
of queries, TBoxes and datasets accepted as valid inputs. Many such systems
(e.g., Jena, OWLim, and Oracle’s Semantic Store) are based on rule technologies;
others are based on approximate reasoning techniques [
        <xref ref-type="bibr" rid="ref1 ref11 ref7">7, 11, 1</xref>
        ].
      </p>
      <p>
        Incomplete query answers, however, may not be acceptable in some cases, and
even if some incompleteness is acceptable, computing as many answers as
possible without affecting performance would be beneficial to many applications. Not
surprisingly, many techniques for improving the completeness of (incomplete)
reasoning systems have been proposed in recent years. A widely use approach
relies on the materialisation of certain kinds of TBox consequences, prior to
accessing the data. TBox materialisation is indeed a convenient and low-cost
solution, as it does not rely on modifying the internals of the reasoning
system at hand: it can either be included by reasoning systems’ implementors as
a pre-processing step (e.g., DLEJena [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], PelletDB, TrOWL [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Minerva [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]
and DLDB [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), or it can be performed by application designers, who have little
knowledge of and/or control over the reasoner. This approach, however, exhibits
several limitations:
1. TBox materialisation is done blindfold, without taking into account neither
the capabilities of the incomplete reasoner under consideration, nor the
precise sources of its incompleteness for the application at hand.
2. In order to avoid a blowup in the size of the TBox, materialisation involves
only simple atomic subsumption relations, and relevant entailments
describing more complex dependencies are typically ignored.
3. It is often difficult, or even infeasible, to estimate the extent to which
materialisation has improved the system’s completeness for the given application
at hand, especially if the data is very large, unknown, or frequently changing.
In this paper, we present a novel approach to TBox materialisation that attempts
to address many of these limitations. Given a TBox T expressed in a language
L, a conjunctive query (CQ) q, and a reasoning system ans satisfying certain
properties known in advance (e.g., soundness), we aim at computing a (hopefully
small) set of TBox axioms R, called a (q, T )-repair for ans, such that R logically
follows from T and the system is guaranteed to compute all answers to q w.r.t.
T ∪ R (and hence also w.r.t. T ), regardless of the input data.
      </p>
      <p>Our TBox materialisation techniques are “guided” by both the input query
and TBox and the capabilities of the reasoner at hand, thus limiting the number
of materialised consequences. Furthermore, our approach provides a provable
completeness guarantee: after extending T with a (q, T )-repair, the incomplete
reasoner at hand is indistinguishable from a complete one w.r.t. q and T , and
regardless of the data (which is often unknown or frequently changing). Finally,
if a (q, T )-repair does not exist for the given reasoner, we also provide means
for quantifying the extent to which the reasoner’s completeness improved upon
materialisation of a given set of TBox consequences.</p>
      <p>
        Our main contributions can be summarised as follows:
– We formalise the notion of a repair for a given CQ, TBox and reasoner.
– We identify sufficient conditions for the existence of a repair. Our conditions
establish a bridge between the framework of completeness evaluation [
        <xref ref-type="bibr" rid="ref15 ref16">16, 15</xref>
        ]
and the notion of interpolation in the context of DLs [
        <xref ref-type="bibr" rid="ref14 ref8 ref9">8, 14, 9</xref>
        ].
– We then present a practical algorithm that is guaranteed to compute a repair
under certain assumptions. Our algorithm extends well-known query
rewriting techniques for DLs [
        <xref ref-type="bibr" rid="ref12 ref2">12, 2</xref>
        ] with means for computing suitable interpolants
for each query in the rewriting.
– Although the assumptions required by our algorithm may seem somewhat
strict, we provide empirical evidence that repairs can be computed for
wellknown ontologies and incomplete reasoners. Furthermore, the size of such
repairs is surprisingly small and preliminary evaluation has shown that their
impact on performance is negligible. Finally, for reasoner which are complete
only for a very weak description logic, we show that our techniques can
provide “partial repairs”, whose impact can be quantified.
Although our main focus is on repairing rule-based reasoners, we feel that our
techniques are also highly relevant to recent DL approximation frameworks [
        <xref ref-type="bibr" rid="ref1 ref11">11,
1</xref>
        ], and provide new insights into the process of approximation.
      </p>
      <p>Proofs for all lemmas and theorems can be found online.1
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>We use DLs that do not allow for nominals and we also assume that only atomic
assertions are allowed in ABoxes.</p>
      <p>Let C, R, and I be countable, pairwise disjoint sets of atomic concepts,
atomic roles, and individuals. An E LHI-role is either an atomic role P or its
inverse P . The set of E LHI-concepts is defined inductively by the following
grammar, where A ∈ C, R is an E LHI-role, and C(i) are E LHI-concepts:</p>
      <p>C := ⊤ | ⊥ | A | C1 ⊓ C2 | ∃R.C</p>
      <p>An E LHI-TBox T is a finite set of GCIs C1 ⊑ C2 with Ci E LHI-concepts
and RIAs R1 ⊑ R2 with Ri E LHI-roles. An ABox A is a finite set of assertions
A(a) or P (a, b), for A ∈ C, P ∈ R, and a, b ∈ I. An E LHI-ontology O = T ∪ A
consists of an E LHI-TBox T and an ABox A.</p>
      <p>
        Moreover, the DLP fragment of E LHI [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] (DLP-E LHI) is obtained from
E LHI by disallowing concepts of the form ∃R.C on the right-hand side of GCIs.
      </p>
      <p>A conjunctive query (CQ) q is an expression of the form</p>
      <p>q(x1, . . . , xn) ← α1, . . . , αm
where each αi is a concept or role atom of the form A(t) or R(t, t0) (for t, t0
function-free terms and A, R atomic) and each xj is a distinguished variable
occurring in some αi. The remaining variables are called undistinguished. We
use the standard notion of a certain answer to q w.r.t. O = T ∪ A, and we
denote with cert(q, O) the set of all certain answers to q w.r.t. O. Given q1, q2
with the same distinguished variables ~x we say that q1 subsumes q2 w.r.t. T ,
written q2 ⊑T q1, if the FO-entailment T |= ∀~x.(Bq2 → Bq1 ) holds, with Bqi the
formula obtained from the conjunction of all body atoms in qi by existentially
quantifying over all undistinguished variables.</p>
      <p>
        Finally, a UCQ rewriting u for a CQ q and TBox T is a union of conjunctive
queries (a set of CQs with the same distinguished variables) that satisfies the
following properties for each ABox A such that O = T ∪ A is consistent [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]:
cert(q0, A) ⊆ cert(q, O) for each q0 ∈ u, and cert(q, O) ⊆ ∪q02u cert(q0, A).
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Completeness Repairs</title>
      <p>
        We start by recalling from [
        <xref ref-type="bibr" rid="ref15 ref16">16, 15</xref>
        ] the notions of a CQ answering algorithm, and
of completeness for a query q and TBox T .
      </p>
      <sec id="sec-3-1">
        <title>1 https://rapidshare.com/files/460900188/main.pdf</title>
        <p>Definition 1. A CQ answering algorithm ans for a DL L is a procedure that,
for each L-ontology O and CQ q computes in a finite number of steps a set
ans(q, O) of tuples of constants. It is sound if ans(q, O) ⊆ cert(q, O) for each O
and q. It is complete if cert(q, O) ⊆ ans(q, O) for each O and q. It is monotonic
if ans(q, O) ⊆ ans(q, O0) for each O, O0 and q with O ⊆ O0. It is invariant under
renamings if, for each q, O = T ∪ A, O0 = T ∪ A0 and isomorphism ν between A
and A0 we have ~a ∈ ans(q, O) if and only if ν(~a) ∈ ans(q, O0). It is well-behaved
if it is sound, monotonic and invariant under renamings. We denote with CL
the class of all well-behaved CQ answering algorithms for L.</p>
        <p>Finally, we say that ans is (q, T )-complete for a query q and TBox T if for
each ABox A s.t. O = T ∪ A is consistent, we have that cert(q, O) ⊆ ans(q, O).
This general definition allow us to abstract from the specifics of implemented
systems. All incomplete reasoning algorithms known to us are well-behaved: they
are sound, query answers can only grow if we add axioms to the ontology, and
they do not depend on trivial renamings of ABox individuals. Furthermore, a
(q, T )-complete algorithm, even if incomplete in general, behaves exactly like
a complete one w.r.t. the given query q and TBox T , regardless of the data.
Consider, as a running example, the following E LHI-TBox T0 and query q0:
T0 = {∃takes.Course ⊑ Student, GradCo ⊑ Course,</p>
        <p>GradSt ⊑ ∃takes.GradCo, PhDSt ⊑ GradSt, Student ⊓ Course ⊑ ⊥}
q0(x) ← Student(x)
Consider an algorithm ans0 that first translates DLP-E LHI GCIs in T0 into a
datalog program using standard transformations (and discarding the remaining
GCIs), then saturates the input ABox w.r.t. the program, and finally answers
q0 w.r.t. the saturated ABox. Clearly, ans0 is not (q0, T0)-complete: it returns
the empty set for A = {GradSt(a)}, whereas cert(q0, T0 ∪ A) = {a}. Computing
the missing answer requires axiom GradSt ⊑ ∃takes.GradCo, which is discarded.
Note, however, that one could dispense with the discarded axiom and still recover
the missing answer by extending T0 with the following DLP-E LHI TBox R0:
R0 = {GradSt ⊑ Student}
(1)
Since T0 |= R0, extending T0 with R0 does not change the meaning of T0. The
behaviour of ans0 w.r.t. T0 ∪R0, however, is now different because ans0 translates
R0 into a datalog clause and hence ans0(q0, T0 ∪ R0 ∪ A) = {a}.</p>
        <p>Note also that adding R0 allows us to recover missing answers w.r.t. many
other ABoxes different from A. For example, given A0 = {PhDSt(b)} and T0 ∪A0,
we have that b is a certain answer to q0 that is computed by ans0 only after
extending T0 with R0.</p>
        <p>Our example suggests that given q, T and an algorithm ans that is incomplete
for q and T , it may be possible to recover all missing answers to q (w.r.t. all
possible ABoxes) by “repairing” T for ans—that is, by materialising a (hopefully
small) number of T -consequences that ans can process.
Definition 2. Let L be a DL, q a CQ, T an L-TBox, A an ABox, and ans a
CQ answering algorithm for L. A (q, T , A)-repair R for ans is an L-TBox such
that
1. T |= R; and
2. cert(q, T ∪ A) ⊆ ans(q, T ∪ R ∪ A).</p>
        <p>Given a set A of ABoxes, we say that R is a (q, T , A)-repair for ans if it is a
(q, T , A)-repair for ans for each A ∈ A. Finally, we say that R is a (q, T )-repair
for ans if it is a (q, T , A)-repair for ans for every ABox A s.t. T ∪A is consistent.</p>
        <p>
          Unfortunately, deciding, on the one hand, whether ans is (q, T )-complete and,
on the other hand, whether R is a (q, T )-repair for ans may involve computing
query answers w.r.t. an unlimited number of ABoxes. In [
          <xref ref-type="bibr" rid="ref15 ref16">16, 15</xref>
          ], however, it was
shown that under certain conditions on q and T it is possible to decide (q, T
)completeness by computing query answers only w.r.t. a finite (and hopefully
small) number of ABoxes. Such finite set of ABoxes is called a testing base.
Definition 3. Let L be a description logic, let C be a class of CQ answering
algorithms for L, let T be an L-TBox, and let q be a CQ. A (q, T )-testing base
B for C is a finite set of ABoxes A such that the following property holds for
each algorithm ans in C: if cert(q, T ∪ A) ⊆ ans(q, T ∪ A) for each A ∈ B, then
cert(q, T ∪ A0) ⊆ ans(q, T ∪ A0) for each ABox A0 consistent with T .
In our previous work [
          <xref ref-type="bibr" rid="ref15 ref16">16, 15</xref>
          ] we have shown how to compute a (q, T )-testing
base (if one exists) for the class CL of well-behaved algorithms.
        </p>
        <p>Consider our running example. The set of ABoxes B0 = {A1, . . . , A5} defined
as follows is a (q0, T0)-testing base for the class of all well-behaved algorithms:
A1 = {Student(a)}, A2 = {takes(a, b), Course(b)},</p>
        <p>A3 = {GradSt(a)}, A4 = {takes(a, b), GradCo(b)}, A5 = {PhDSt(a)}
Intuitively, in order to compute a (q, T )-repair, we only need to consider the
(finitely many) ABoxes in a (q, T )-testing base instead of all (infinitely many)
possible ABoxes.</p>
        <p>Theorem 1. Let L be a DL, q a CQ, T an L-TBox, and B a (q, T )-testing base
for CL. The following property holds for each ans ∈ CL: If R is a (q, T , B)-repair
for ans, then R is also a (q, T )-repair for ans.</p>
        <p>In our running example, clearly, ans0 only fails to compute certain answers w.r.t.
A3 and A5. Furthermore, R0 is a (q0, T0, B0)-repair for ans0. But then, Theorem
1 ensures that R0 is also a (q0, T0)-repair. Thus, by simply adding R0 to T0, we
can guarantee that ans0 will compute all certain answers, regardless of the data.
4</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Computing Repairs</title>
      <p>Theorem 1 provides a sufficient condition for the existence of a (q, T )-repair for
a well-behaved algorithm ans. To apply the theorem, however, we first need to
compute a (q, T )-testing base B and then a (q, T , B)-repair R for ans.</p>
      <p>
        As shown in [
        <xref ref-type="bibr" rid="ref15 ref16">15, 16</xref>
        ], a (q, T )-testing base B for CL can always be computed
from a UCQ rewriting u for q and T . UCQ rewritings are guaranteed to exist
if T is expressed in the DL underpinning the OWL 2 QL profile and they may
also exist even if T is expressed in the DLs underpinning the OWL 2 EL profile;
this is often the case in many real-world ontologies. Hence, in this paper we will
restrict ourselves to TBoxes and queries for which a UCQ rewriting exists.2
      </p>
      <p>Given a (q, T )-testing base B, however, the existence of a (q, T , B)-repair
may also depend on the capabilities of the algorithm under consideration. For
instance, in the case of our running example, no (q0, T0, B0)-repair exists for an
algorithm that ignores the TBox and simply matches the query to the data (even
though such algorithm is well-behaved).</p>
      <p>
        Our techniques for computing repairs rely on a particular form of
interpolation [
        <xref ref-type="bibr" rid="ref14 ref8">8, 14</xref>
        ]. We next make the connection between repairs and interpolation
precise, and describe an algorithm for computing interpolants.
4.1
      </p>
      <sec id="sec-4-1">
        <title>Completeness Repairs and Interpolation</title>
        <p>As already discussed, algorithm ans0 from our running example misses answers
w.r.t. A3, A5 ∈ B0, and R0 from (1) is a (q0, T0, B0)-repair for ans0. Furthermore,
B0 can be obtained by “instantiating” queries from the following UCQ rewriting
u = {q1, . . . , q5} for q0 and T0:
q1(x) ← Student(x), q2(x) ← takes(x, y), Course(y),
q3(x) ← GradSt(x), q4(x) ← takes(x, y), GradCo(y), q5(x) ← PhDSt(x)
In particular, A3 = {GradSt(a)} is obtained by instantiating q3. We then say
that q3 is “relevant” to ans0. Formal definitions of instantiation and relevance
are given next.</p>
        <p>Definition 4. Let q be a CQ, Bq the body atoms in q, and π a mapping from
all variables of q to individuals. The following ABox is an instantiation of q:</p>
        <p>Aqπ := {A(π(x)) | A(x) ∈ Bq} ∪ {R(π(x), π(y)) | R(x, y) ∈ Bq}
Definition 5. Let u be a UCQ rewriting for q and an L-TBox T , let ans be
wellbehaved and let B be a (q, T )-testing base. We say that qi ∈ u is relevant to ans if
an instantiation Ai of qi exists s.t. Ai ∈ B and cert(q, T ∪ Ai) 6⊆ ans(q, T ∪ Ai).
Note also that q3 is subsumed by q0 w.r.t. T0. We can then identify R0 as an
interpolant for the subsumption q3 ⊑T0 q0 since T0 |= R0, q3 ⊑R0 q0 and R0
only mentions symbols occurring in either q0 or q3.</p>
        <p>Definition 6. Let T be an L-TBox and let q, q0 be CQs such that q0 ⊑T q. A
TBox = expressed in fragment L0 of L and using only symbols occurring in q or
q0 is an L0-interpolant for q0 ⊑T q if T |= = and q0 ⊑= q.
2 The notion of a testing base can be extended, and such (extended) testing bases can
be computed from Datalog rewritings. The study of such extensions is ongoing work.
The following theorem establishes which are the relevant interpolants for
computing a repair. Furthermore, if each such interpolants can be expressed in a
weak enough language, for which ans is provably complete, we can easily obtain
a repair from the union of such interpolants.</p>
        <p>Theorem 2. Let u be a UCQ rewriting for q and an L-TBox T . Let ans ∈ CL
be an algorithm that is complete for a fragment L0 of L. Let q1, . . . , qn be those
queries in u relevant to ans. Finally, for each 1 ≤ i ≤ n let =i be an
L0interpolant for qi ⊑T q. Then, = = ∪1 i n =i is a (q, T )-repair for ans.
4.2</p>
      </sec>
      <sec id="sec-4-2">
        <title>Computing Interpolants</title>
        <p>
          Assumptions First, we restrict ourselves to TBoxes expressed in E LHI.
Systems such as REQUIEM can compute a UCQ rewriting w.r.t. an E LHI-TBox,
provided that one exists [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ].
        </p>
        <p>Second, we observe that most state-of-the-art incomplete systems are based
on deductive database and rule technologies. Given O = T ∪ A and q as input,
the ABox A is deterministically saturated with new assertions using axioms
in T , and then q is answered directly w.r.t. the saturated ABox. Furthermore
existing systems rarely extend the input ABox with “fresh” individuals and hence
cannot handle existential quantification on the right-hand-side of TBox axioms.
In fact, many such systems are complete for DLs of the DLP family. Thus, we
will consider DLP-E LHI as our target language for computing interpolants.</p>
        <p>In this setting, we also need to restrict the kinds of queries we are considering
to guarantee the existence of the relevant interpolants. It is rather
straightforward to find E LHI-TBoxes and queries q for which a UCQ rewriting does exist,
but the DLP-E LHI interpolants required by Theorem 2 do not. For example,
let T = {A ⊑ ∃R.C} and consider the following queries</p>
        <p>q(x) ← R(x, y), C(y) q0(x) ← A(x)
Clearly, {q, q0} is a UCQ rewriting for q and T . There is, however, no DLP-E LHI
interpolant for q0 ⊑T q since such interpolant would need to contain existential
quantification on the right-hand-side of GCIs.</p>
        <p>
          Consequently, from now onwards we focus on queries with only distinguished
variables. This restriction is not unreasonable in practice since the standard
language for querying ontologies on the Web (SPARQL) treats undistinguished
variables as if they were distinguished [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
        <p>The Algorithm Algorithm 1 computes a UCQ rewriting u (if one exists) for an
E LHI-TBox T and a query q with no undistinguished variables. Furthermore,
for each q0 ∈ u, it computes a DLP-E LHI interpolant for q0 ⊑T q.</p>
        <p>
          Our algorithm extends the rewriting algorithm implemented in the
REQUIEM system [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], which first transforms T and q into a set of clauses (Lines
1, 2 in Algorithm 1) and then uses a resolution calculus to compute u.
Algorithm 1 Extended Resolution-based algorithm
Algorithm: extendedUCQRewriting(q, T )
Input: T ∈ ELHI and q with no undistinguished vars.
        </p>
        <p>
          Interpolants are tracked down during resolution by means of a mapping σ
from clauses to sets of clauses. Initially, σ maps q to the empty set (since q ⊑; q),
and each clause from T to itself (Lines 3, 4). The rules in the resolution-based
calculus from [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] are exhaustively applied in Lines 6–16, and new clauses are
generated in Lines 7–9 exactly as in [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Then, for each new clause C produced as
a resolvent of C1 and C2 our algorithm computes its corresponding interpolant
σ(C). This is done by first saturating σ(C1) ∪ σ(C2) (Lines 11–14) and then
removing all clauses mentioning a symbol that is neither in C nor in q (Line
15). After u is saturated, the algorithm removes all clauses that contain function
symbols (Line 16). Then, at this point, both rewriting and interpolants are given
as sets of function-free clauses, so the procedure unfold transforms each C ∈ u
into a datalog rule and “rolls-up” σ(C) into a DLP-E LHI GCI. Finally, the
algorithm rejects the input if u is not a UCQ (e.g., a datalog program) and
returns both the UCQ rewriting and the interpolant for each query otherwise.
Lemma 1. If u computed by Algorithm 1 is a UCQ, then u is a UCQ rewriting
for q and T . Furthermore, for each q0 ∈ u, σ(q0) is a DLP-E LHI interpolant
for q0 ⊑T q.
        </p>
        <p>Algorithm 2 computes a repair by considering only those queries and ABoxes
for which ans fails. Correctness follows from Theorem 2 and Lemma 1.
Theorem 3. Algorithm 2 computes a (q, T )-repair for an algorithm ans that is
complete for DLP-E LHI.
Algorithm 2 Computing a Repair
Algorithm: computeRepair(ans, q, T )
Input: T , q as in Algorithm 1, ans well-behaved.
If ans is not complete for DLP-E LHI (e.g., if it is an RDFS-reasoner), then the
TBox R computed by Algorithm 2 may not be a repair; however, adding R may
still have the desired effect of “mitigating” the reasoner’s incompleteness.
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Evaluation</title>
      <p>
        We have developed a prototype tool based on Algorithms 1 and 2. Our
implementation uses REQUIEM [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] and the system described in [
        <xref ref-type="bibr" rid="ref15 ref16">16, 15</xref>
        ] for
computing (q, T )-testing bases and relevant queries. Additionally, our tool implements
optimisations to eliminate redundancy in the computed repairs.
      </p>
      <p>
        We have evaluated our techniques first using LUBM’s TBox and its 14 test
queries (all of which contain only distinguished variables) and then a version
of the GALEN TBox together with 4 sample queries for which a UCQ
rewriting can be computed using REQUIEM. In each case, we have evaluated the
following systems: Sesame 2.3-prl,3 OWLim,4 Jena v2.6.35 and DLEJena.6 Our
experiments consisted of the following steps:
1. For each TBox T and test query q we computed a (q, T )-testing base and
measured the proportion of certain answers that each system is able to
compute w.r.t. the ABoxes in the (q, T )-testing base (δini) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
2. For each system and each q for which it was found incomplete (i.e., δini &lt; 1),
we computed a repair R (which might be only “partial” for some systems).
3. For each case in Step 2 we have extended T with the corresponding R,
computed a (q, T ∪ R)-testing base and obtained a new completeness degree
value δfin.
4. To evaluate the effects that adding TBox axioms had on systems’
performance in the LUBM scenario, we used the LUBM performance evaluation
      </p>
      <sec id="sec-5-1">
        <title>3 http://www.openrdf.org/ 4 http://www.ontotext.com/owlim/ 5 http://jena.sourceforge.net/ 6 http://lpis.csd.auth.gr/systems/DLEJena/</title>
        <p>LUBM</p>
        <p>Sesame OWLim/Jena</p>
        <p>
          Q2 Q4 Q5 Q6 Q7 Q8 Q9 Q10 Q12 Q13 Q6 Q8 Q10
♯urel 1 4 4 166 34 27 1 166 2 4 1 4 1
♯R 0 0 4 1 1 1 0 1 0 4 1 1 1
δini .75 .68 0 .003 .04 .04 0 .001 .25 .2 .99 .98 .99
δfin .75 .68 .75 .004 .05 .05 0 .002 .25 .2 1 1 1
tool [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and measured the loading and query answering times using the
original and the repaired TBox.
        </p>
        <p>Our results for LUBM are summarised in the upper part of Table 1, where ♯urel
represents the number of relevant queries in the UCQ rewriting (see Definition
5), and ♯R the number of axioms in the computed repair. The only system not
included in the table is DLEJena, which was already complete for all queries
(w.r.t. the original TBox). OWLim and Jena were found incomplete for three
queries and Sesame for 10 of them. As shown in the table, we were able to repair
both OWLim and Jena for all queries. This was a reasonable result, since these
systems are considered to be complete for DLP-E LHI. Furthermore, all repairs
consisted of the same, unique axiom. In contrast, no query could be fully repaired
for Sesame. A slight increase in the completeness degree can be observed in 4
queries and a more significant one only in Q5. Again, this is unsurprising, since
Sesame is essentially an RDFS reasoner. No measurable performance changes
were observed for any system after repair, which suggests that completeness
can be improved (at least in this scenario) without affecting scalability. Finally,
repairs were computed in times ranging between a second and 3 minutes.</p>
        <p>Results for GALEN are given in the lower part of the table. As shown in
the table, we were able to fully repair OWLim, Jena and DLEJena, and repairs
contained at most two axioms in each case (a relatively small number compared
to the number of queries in urel). All repairs could be computed in less than a
minute. Finally, Sesame hasn’t been included in the table because no
improvement could be measured for any query.
6</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusions</title>
      <p>In this paper, we have studied the problem of repairing an incomplete reasoning
systems given a query and a TBox. An important feature of our repairs is that
they provide data independent completeness guarantees: once an incomplete
system has been repaired, we can ensure that its output answers will be the same
as those of a complete one for the given q and T , regardless of how large and
complex the dataset is. Furthermore, our preliminary experiments suggest that
repairs can indeed be very small and their effect on systems’ performance may
even be negligible in some applications. Our results will allow application
designers to use highly scalable reasoning systems in their application with the
guarantee that they will behave as if they were complete, thus bringing together
“the best of both worlds”. The extension of our techniques to more expressive
DLs and a more extensive evaluation are ongoing work.</p>
      <p>Acknowledgments Research supported by project SEALS (FP7-ICT-238975).
B. Cuenca Grau is supported by a Royal Society University Research Fellowship.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Botoeva</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez-Muro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Expressive approximations in DL-Lite ontologies</article-title>
          .
          <source>In: AIMSA</source>
          . pp.
          <fpage>21</fpage>
          -
          <lpage>31</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Giacomo</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Tractable reasoning and efficient query answering in description logics: The DL-Lite family</article-title>
          .
          <source>Journal of Automated Reasoning</source>
          <volume>39</volume>
          (
          <issue>3</issue>
          ),
          <fpage>385</fpage>
          -
          <lpage>429</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Meditskos</surname>
          </string-name>
          and
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Bassiliades: Combining a DL reasoner and a rule engine for improving entailment-based OWL reasoning</article-title>
          .
          <source>In: Proc. ISWC</source>
          . pp.
          <fpage>277</fpage>
          -
          <lpage>292</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Glimm</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , Kr¨otzsch, M.:
          <article-title>SPARQL beyond subgraph matching</article-title>
          .
          <source>In: Proc. of ISWC 2010</source>
          . pp.
          <fpage>241</fpage>
          -
          <lpage>256</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Grosof</surname>
            ,
            <given-names>B.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Volz</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>Description logic programs: Combining logic programs with description logic</article-title>
          .
          <source>In: Proc. of WWW</source>
          . pp.
          <fpage>48</fpage>
          -
          <lpage>57</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>LUBM: A Benchmark for OWL Knowledge Base Systems</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          ),
          <fpage>158</fpage>
          -
          <lpage>182</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vrandecic</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Resolution-based approximate reasoning for OWL DL</article-title>
          .
          <source>In: Proc. of ISWC</source>
          <year>2005</year>
          ). pp.
          <fpage>383</fpage>
          -
          <lpage>397</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Konev</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Walther</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Forgetting and uniform interpolation in largescale description logic terminologies</article-title>
          .
          <source>In: Proc. of IJCAI 09</source>
          . pp.
          <fpage>830</fpage>
          -
          <lpage>835</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolter</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Deciding inseparability and conservative extensions in the description logic EL</article-title>
          .
          <source>Journal of Symbolic Computation</source>
          <volume>45</volume>
          ,
          <fpage>194</fpage>
          -
          <lpage>228</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>G.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards a complete OWL ontology benchmark</article-title>
          .
          <source>In: Proc. of ESWC 2006</source>
          . pp.
          <fpage>125</fpage>
          -
          <lpage>139</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thomas</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <string-name>
            <surname>Approximating</surname>
            <given-names>OWL-DL</given-names>
          </string-name>
          <string-name>
            <surname>Ontologies</surname>
          </string-name>
          .
          <source>In: Proc. of AAAI-07</source>
          . pp.
          <fpage>1434</fpage>
          -
          <lpage>1439</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. P´
          <string-name>
            <surname>erez-Urbina</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horrocks</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Tractable query answering and rewriting under description logic constraints</article-title>
          .
          <source>Journal of Applied Logic</source>
          <volume>8</volume>
          (
          <issue>2</issue>
          ),
          <fpage>186</fpage>
          -
          <lpage>209</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Poggi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lembo</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacomo</surname>
            ,
            <given-names>G.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lenzerini</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rosati</surname>
          </string-name>
          , R.:
          <article-title>Linking data to ontologies</article-title>
          .
          <source>Journal of Data Semantics X</source>
          ,
          <fpage>133</fpage>
          -
          <lpage>173</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Seylan</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Franconi</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>de Bruijn</surname>
          </string-name>
          , J.:
          <article-title>Effective query rewriting with ontologies over dboxes</article-title>
          .
          <source>In: Proc. of IJCAI 09</source>
          . pp.
          <fpage>923</fpage>
          -
          <lpage>925</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>Completeness guarantees for incomplete reasoners</article-title>
          .
          <source>In: Proc. of ISWC 2010. LNCS</source>
          , Springer (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Stoilos</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Cuenca</given-names>
            <surname>Grau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            ,
            <surname>Horrocks</surname>
          </string-name>
          ,
          <string-name>
            <surname>I.</surname>
          </string-name>
          :
          <article-title>How incomplete is your semantic web reasoner?</article-title>
          <source>In: Proc. of AAAI-10</source>
          . pp.
          <fpage>1431</fpage>
          -
          <lpage>1436</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>