<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>E cient Axiom Pinpointing in EL using SAT Technology?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Norbert Manthey</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rafael Pen~aloza</string-name>
          <email>rafael.penaloza@unibz.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sebastian Rudolph</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>TU Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>We propose a novel approach to axiom pinpointing based on a reduction to the SAT task of minimal unsatis able subformula enumeration, allowing highly optimized algorithms and data structures developed for SAT solving in the last two decades to be used in this problem. Exploiting the speci c properties of axiom pinpointing, we apply further optimizations resulting in considerable runtime improvements of several orders of magnitude. Our ideas are implemented in SATPin, a system capable of performing axiom pinpointing in large biomedical ontologies faster than other existing tools. While our paper focuses on a slight extension of EL, the presented approach directly generalizes to all ontology languages for which consequence-based reasoning methods are available.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Axiom pinpointing is the task of identifying the axioms in an ontology that are
responsible for a consequence to follow. This task has been successfully used to correct
modeling errors and understand unexpected consequences from very large ontologies.
For example, the 2007 version of the very large bio-medical ontology Snomed CT3
incorrectly implied that every amputation of nger was also amputation of arm; i.e.,
whenever a patient had a nger amputated, the ontology would imply that they had
also lost their arm. Using automated axiom pinpointing tools it was possible to identify
the 6 axioms (from over 300,000) that caused this error [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Eventually, this led to a
change in the modelling strategy followed by the developers of Snomed, to avoid the
error that caused this fault [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Beyond understanding and correcting consequences,
axiom pinpointing has applications in many di erent reasoning scenarios, like ontology
revision [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], context-based reasoning [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], error-tolerant reasoning [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and reasoning
with probabilities [
        <xref ref-type="bibr" rid="ref9">9, 20</xref>
        ], provenance, and trust [21], to name a few. It is thus crucial to
develop axiom pinpointing tools capable of handling huge ontologies.
      </p>
      <p>
        For EL+ ontologies [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], it is possible to reduce axiom pinpointing to the enumeration
of the minimal unsatis able subformulas (MUS) of a propositional formula [22]. Thus,
one can take advantage of the numerous developments made by the SAT community
over the last decades (e.g., clause learning, the two-watched-literal data structure) to
build an e cient system for axiom pinpointing. We build on top of previous work and
? Partially supported by DFG throughd project HO 1294/11-1 and `cfAED.'
3 http://www.ihtsdo.org/snomed-ct
identify several optimizations for enumeration problem. Indeed, the propositional
formula obtained by the reduction to SAT has a very speci c shape, which can be
exploited by specialized methods. We show that incremental SAT solving, partial
restarts, and an improved search space pruning strategy can improve the e ciency of
the axiom pinpointing.
      </p>
      <p>We implemented these optimizations in the new SATPin system. We compared
the e ciency of SATPin against other axiom-pinpointing tools over real-life
bio-medical ontologies that have been used as benchmarks before. Our experiments
show that SATPin is an e cient tool for axiom pinpointing that can be used in
practice over large inputs. Interestingly, our approach depends only on the structure of
the propositional formula, and can be directly generalized to any other ontology
language allowing consequence-based reasoning beyond EL+.
2</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <p>
        We assume that the reader is familiar with the DL EL+, and brie y describe the
completion algorithm for deciding subsumption in this logic [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The algorithm has two
phases: normalization and completion. Normalization transforms the ontology into an
equivalent one (w.r.t. subsumption of concept names), where all GCIs are in normal
form A1 u u An v B, or 9r:A v B, or A v 9r:B. In general, the normalization
step maps every axiom into a set of axioms NF( ) obtained by the application of
some simpli cation rules. For example, the GCI A v B1 u B2 is not in normal form,
but it can be simpli ed to obtain NF(A v B1 u B2) = fA v B1; A v B2g. For
more details on the normalization procedure, we refer the reader to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. For the
purpose of this paper, it su ces to recall that the function NF maps every axiom to
a set of axioms in normal form, preserving the semantics of the original axiom
w.r.t. subsumption of concept names. The normalization of the ontology T is
NF(T ) := S 2T NF( ).
      </p>
      <p>In the completion phase, the normalized ontology NF(T ) is saturated through an
exhaustive application of completion rules of the form A ) , where A [ f g is a
nite set of axioms in normal form. This rule is applicable to the ontology T if A T ;
its application adds to T . Completion rules generate the logical closure of the
axioms in normal form w.r.t. a limited signature. For example, the EL+ completion
rule fX v Y; Y v Zg ) X v Z expresses that if the two subsumption relations
X v Y , Y v Z hold, then X is also subsumed by Z. In general, the rule A )
states that if all the axioms in A hold, must hold too. To ensure termination, the
rule A ) is only applicable if 2= T 0. R denotes the set of all completion rules. Let
c(T ) be the ontology obtained from T after normalization and completion. For every
two concept names A; B appearing in T , T j= A v B i A v B 2 c(T ); i.e., the
completion algorithm makes all the implicit subsumptions between concept names that
can be derived from T explicit.</p>
      <p>Example 1. Consider the ontology Texa := fA v 9r:A u Y; Y v B; 9r:Y v Bg: The
last two axioms are already in normal form, but the axiom A v 9r:A u Y is not. It
can be transformed to normal form by separating the conjunction on the right-hand
side of the axiom, as described before. Hence,</p>
      <p>NF(Texa) = fA v 9r:A; A v Y; Y v B; 9r:Y v Bg:
Applying the completion rules to this normalized ontology yields
c(Texa):=fA v 9r:A; A v Y; Y v B; 9r:Y v Bg[fA v 9r:Y; A v 9r:B; A v Bg:
We conclude Texa j= A v B, as this subsumption relation appears in c(Texa).
In axiom pinpointing we are interested in nding the axiomatic causes for a
subsumption to hold, in order to understand this consequence. Thus, we want all the
minimal sub-ontologies that entail this subsumption.</p>
      <p>De nition 2 (MinA). A MinA for A v B w.r.t. the ontology T is a subset
M T s.t. (i) M j= A v B and (ii) for all S M, S 6j= A v B. Axiom-pinpointing
is the task of nding all MinAs for a subsumption w.r.t. an ontology.
For instance, there are two MinAs for A v B w.r.t. the ontology Texa of Example 1:
M1 := fA v 9r:A u Y; Y v Bg; and M2 := fA v 9r:A u Y; 9r:Y v Bg.</p>
      <p>One approach for solving this problem is to create a Horn formula whose satisfying
interpretations can be mapped to sub-ontologies entailing the subsumption relation.
Before describing this reduction in detail, we recall some basic notions of SAT.
Consider a xed in nite set V of Boolean variables. A literal is a variable v (positive
literal ) or a negated variable v (negative literal ). The variable of a literal x is denoted
as var(x). The complement x of a positive (negative) literal x is the negative (resp.,
positive) literal with the same variable as x. The complement of a set of literals S is
S := fx j x 2 Sg. A clause C is a nite set of literals, representing the disjunction of
its elements. A clause containing single literal is called a unit clause. Formulas
are nite multisets of clauses, which intuitively represent the conjunction of its
elements; i.e., we consider only formulas in conjunctive normal form. For two clauses
C1; C2 with x 2 C1, and x 2 C2, the resolvent of C1 and C2 upon x is the clause
C1 x C2 := (C1 n fxg) [ (C2 n fxg).</p>
      <p>A sequence of literals M is consistent, if x 2 M implies x 2= M . For simplicity, we
view consistent sequences M as sets throughout this paper. An interpretation is a
consistent set of literals I. The reduct F jI of a formula F with respect to the
interpretation I is the multiset F jI := fC n I j C 2 F; C \ I = ;g. I satis es a
formula F if F jI = ;. F is satis able if there is an interpretation that satis es it. The
standard reasoning problem in SAT is to decide satis ability of a formula.</p>
      <p>A major operation in modern SAT solvers is unit propagation, based on the fact
that a unit clause C = fxg is only satis ed by interpretations using the literal x.
Given a formula F and a consistent sequence M of literals, unit propagation returns
the set of all literals (including M ) that must occur in an interpretation to satisfy
F jM . The interpretation J is initialized with M . If there are unit clauses in the
current reduct, J is extended with the corresponding literal. Additionally, the clause C
is stored as the reason for this extension. If no further unit clauses can be found, the
algorithm returns the nal interpretation J .</p>
      <p>
        Most modern SAT solvers follow essentially the same approach, known as
CDCL [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Unit propagation is applied as long as possible to obtain all the literals
that must appear in every model of the formula. When no units can be propagated, it
checks for a con ict : a clause falsi ed by the current interpretation. If there is no
con ict, the interpretation of a variable is guessed (search decision), and unit
propagation is applied to deduce the consequences of this guess. If a con ict is found,
some of the guesses made so far are wrong. Con ict analysis nds the choices that led
to the con ict, and a new learned clause C is added to the formula to avoid repeating
this choice. The clause C is used to undo parts of the current partial interpretation in a
way that unit propagation can be applied again, and the process continues. If a con ict
is found independent of any search decision, the formula is unsatis able. Otherwise,
if all variables of the formula can be assigned a truth value without a con ict,
the formula is satis able. Notice that the utility of SAT solving goes far beyond
propositional logic: with specialized data structures, heuristics and simpli cation
techniques, modern SAT solvers are the back-end for many industrial tasks [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>One approach for axiom pinpointing originally proposed in [22] builds a propositional
formula that simulates the process of the completion algorithm. The computation of
the MinAs for an atomic subsumption is thus reduced to an enumeration problem over
this formula. We brie y describe this translation next, but refer the reader to [22] for
all the details. For every axiom 2 T [ c(T ) we introduce a unique Boolean variable
x , which represents the axiom throughout the completion process. Then, we build
the formula FT := Fn [ Fc, where</p>
      <p>Fn := ffx ; x g j
2 T ;</p>
      <p>2 NF( )g;
Fc := ffx g [ fx j</p>
      <p>2 Ag j A )
These formulas describe all the possible causes for an axiom to appear in c(S) for
some subontology S of T . The clauses in Fn (equivalent to implications x ! x )
state that, for every axiom occurring in S, all the axioms in normal form generated
by must be in c(S). The formula thus simulates the normalization step. Similarly,
the completion rule A ) expresses that if all axioms in A are contained in c(S),
then so must be ; this logical dependency is expressed in the clauses of the formula
Fc which can equivalently be written as V 2A x ! x .
2 R; A [ f g
c(T )g:
Example 3. Consider again the ontology Texa from Example 1. For brevity, we use the
following shorthand for the axioms in Texa [ c(Texa):
1 : A v 9r:A u Y , 2 : Y v B, 3 : 9r:Y v B,
1 : A v 9r:A, 2 : A v Y , 3 : A v 9r:Y , 4 : A v 9r:B, 5 : A v B.
Then, Fn = ffx 1 ; x 1 g; fx 1 ; x 2 gg since the axioms 1 and 2 are generated by
the normalization of the axiom 1, and all other axioms in Texa are already in normal
form. The formula Fc is composed by the clauses</p>
      <p>fx 1 ; x 2 ; x 3 g; fx 3 ; x 2 ; x 4 g; fx 2 ; x 2 ; x 5 g; fx 3 ; x 3 ; x 5 g; fx 1 ; x 5 ; x 4 g:
There are two clauses in Fc that entail 5. Although the completion algorithm uses
only one of them, the translation to SAT must preserve both to know all the possible
ways in which axioms can be derived from the knowledge in Texa.
Given S T , let XS := ffx g j 2 Sg; i.e., the conjunction of all the variables
from the axioms in S. A model I of XS ^ FT satis es S[c(S) f jx 2Ig. On the
other hand, the set fx j 2 S [ c(S)g is a model of XS ^ FT . It follows that
S j= A v B i xAvB 2 I for all interpretations I satisfying XS ^ FT . This means
that, in order to nd all MinAs for A v B w.r.t. T , it su ces to compute all
minimal subsets M of XT such that M ^ FT ^ fxAvBg is unsatis able. Notice that
XT ^ FT is always satis able. On the other hand, since T j= A v B, the formula
XT ^ FT ^ fxAvBg is unsatis able. Hence, this problem is well-de ned.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Enumerating MinAs</title>
      <p>We have constructed, from an ontology T , the formula FT that encodes the
derivation steps made by the completion algorithm, and the set of choice variables
XT . By construction, XT ^ FT is satis able. Given a consequence , we want to
enumerate all the minimal subsets M XT such that M ^ FT ^ x is unsatis able
(M ^ FT ^ x ?). Our approach does not depend on the precise shape of the
formula FT , but rather on these properties. Hence, we consider an arbitrary satis able
formula F , a set X of propositional variables such that X ^ F is satis able, and a
propositional variable q such that X ^ F ^ q ?.</p>
      <p>
        Enumerating all minimal subsets M of X such that a formula is unsatis able
is closely related to nding all minimal unsatis able subformulas (MUS) of a
propositional formula [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. We consider the group-MUS problem, in which some clauses
have to be handled together. In our case, only the clauses in X can be selected
separately from the rest of the formula. A single group-MUS is a minimal subset
M X, s.t. M ^ F ^ q ? still holds. We solve the all-group-MUS problem [
        <xref ref-type="bibr" rid="ref13 ref19">13, 19</xref>
        ],
and enumerate all such minimal sets M . However, we encounter only a special case of
all-group-MUS: each group contains exactly one unit clause fx g, corresponding to an
axiom from the ontology. Thus, instead of using a general-purpose all-group-MUS
tool [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], we exploit the speci c properties of axiom pinpointing and improve the
performance of a solver.
      </p>
      <p>In essence, we make several calls to a SAT solver to nd all MinAs. For the rst
MinA, we try to prove satis ability of X ^ F ^ q choosing, at search decisions, a
variable from X to be satis ed. This process eventually leads to a con ict. Con ict
analysis yields an M X that led to the con ict (a MinA). Learning the negation of
this MinA guarantees that the con ict cannot be found again. The same approach
leads to the next MinA, repeating this process until no more con icts exist. To handle
large ontologies, many optimizations are needed.</p>
      <p>
        Incremental SAT Solving. To nd one MinA, we use incremental SAT [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] solving. The
execution of a SAT solver is initialized with a set of assumption literals, used as search
decisions before any other variables. We use the activation variables X as assumption
literals. Whenever a decision is needed during the satis ability decision procedure, the
algorithm rst activates one of the variables in X as true (adding a new axiom to the
current MinA candidate). Since M ^ F ^ q ?, an inconsistency is eventually found,
which is characterized by the implication (M ^ F ) ! q, where M X is the set of
assigned assumption literals. Hence, the solver is interrupted as soon as q is implied by
the set M (see Algorithm 1).
      </p>
      <p>
        Minimizing the Candidate Set. Algorithm 1 nds a (possibly not minimal) set M s.t.
(M ^ F ) ! q. To reduce M to a MinA, we use con ict analysis [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Based on the
reason clause C for q, produce a subset of literals of M by resolving all literals from C
away with their reason clauses (Algorithm 2). Starting with the reason clause C for q
Algorithm 1 Return a set of literals R
implies (formula F , literal q, literal sequence X, reason)
Output: ?, or set of literals R with (F ^ R) ! q
IMP1 M := ; // initialize as empty
IMP2 while X 6= ; // while literals left
IMP3 if q 2 UP(F; M; reason) // check value of q
IMP4 return analyze(q; reason) // reduce candidate
IMP5 M := M [ fxg for some x 2 X // add x 2 X to M
IMP6 X := X n fxg // remove x from X
IMP7 return ? // return the result
we resolve on all literals of the intermediate resolvents, until no literal has a reason
clause. C then contains only variables that have been assigned a truth value as search
decision; i.e., an assumption. The set R obtained from this con ict analysis is still a
MinA candidate that needs to be minimized, but typically (much) smaller than M .
For each literal r 2 R we check the implication (R ^ F n f g
r ) ! q, and remove r
from R, if the check succeeds. If no more removals are possible, the set R is returned.
Enumerating All MinAs. We iterate the MinA computation procedure to detect
the remaining MinAs, guaranteeing that answers are not repeated, as shown in
Algorithm 3. First we check whether there is at least one MinA (ENU2) and abort if
this is not the case; i.e., the consequence does not follow from the ontology. Then, we
create an object responsible for enumerating all candidate subsets M of literals. This is
the major part of the algorithm. If R represents a potential MinA, it is minimized and
added to the set of MinAs S (ENU6{ENU8). It is also added to the enumeration object,
to avoid producing it again in future iterations (ENU9). If this addition makes no
further candidate sets possible (ENU9), or if there are no other candidate sets (ENU11),
the algorithm stops. Otherwise, the next candidate set M X is tested (ENU14).
When the enumeration nishes, the resulting set S is returned (ENU15). Notice that
the call in ENU14 may not return any new potential MinA; e.g., if there is only one
MinA. To ensure completeness this check has to be performed for all candidates M .
Example 4. We want to nd all the minimal subsets M of fx 1 ; x 2 ; x 3 g s.t.
M ^ FTexa ^ x 5 ?. We activate rst the literal x 1 , which unit propagates to
Algorithm 2 Return a set of literals R that lead to the implication (F ^ R) ! q.
analyze (map reason, literal q)
Output: Set of literals R that imply q wrt to F , (F ^ R) ! q
ANA1 C := reason(var(q)) // nd clause that implied q
ANA2 while C 6= ; // while literals left
ANA3 c 2 C; C := C n fcg // select a literal c 2 C
ANA4 if reason(var(c)) 6= ? // if there is a reason for c
ANA5 C := (C [ fcg) c reason(var(c)) // resolve with this reason
ANA6 else R := R [ fcg // else add c to the result
ANA7 return R // return the result
Algorithm 3 Return all minimal sets of literals R
enumerate (formula F , set of literals X, literal q)
Output: Set S of set of literals R with R
      </p>
      <p>X and (F ^ R) ! q
ENU1 S := ;
ENU2 R := implies(F; q; X; reason) // Is there a MinA?
ENU3 if R = ? then return ; // there are no MinA
ENU4 setup enumerator(X) // setup enumeration
ENU5 while &gt; // check all candidates
ENU6 if R 6= ? // if there was a MinA
ENU7 R := minimize(F; R; q) // minimize candidate
ENU8 S := S [ fRg // add R to set of MinAs
ENU9 if enumerator.avoid(R) = ? // disallow this MinA
ENU10 break // no more MinAs
ENU11 if enumerator.hasNext() = ? // Do other MinAs exist?
ENU12 break // no more MinAs
ENU13 M := enumerator.next() // next MinA candidate
ENU14 R := implies(F; q; M; reason) // Is there a MinA?
ENU15 return S // return set of MinAs
x 1 ; x 2 , and x 3 . At this point, a new choice is needed. If we activate x 2 , then unit
propagation enforces x 5 to be true, nding a contradiction; i.e., f 1; 2g is a MinA.
We learn the clause fx 1 ; x 2 g and restart the incremental SAT solver, nding the
second MinA f 1; 3g. After disallowing this MinA, the only remaining candidate set
is fx 2 ; x 3 g, which does not entail the wanted implication.</p>
      <p>
        Candidate Enumeration. Candidate enumeration is initialized with the set of literals
X. While there are candidates left, a new M X is chosen. A nave approach is to
enumerate all subsets of X as candidates. This is unfeasible as it would need to verify
2jXj candidates, with jXj very large. We partition X into the set of relevant literals
V = lits(S) and the remaining literals T . The relevant literals refer to axioms that
belong to some MinA. At the beginning, we do not know which literals are relevant
and which not. Hence, T is initialized to be X. Whenever a MinA R is found, V and
T are updated accordingly: V := V [ R, and T := T n R; all the variables in this
MinA are now known to be relevant. The new candidates are sets of the form V 0 [ T
for V 0 V . In this way, the number of candidates is bounded by 2jlits(S)j. As jlits(S)j
is typically much smaller than jXj the search space is reduced considerably. Since the
sets T and V change during the execution of the algorithm, it is important to verify
that candidates are never tested twice. We also apply the Hitting Set Tree (HST)
approach developed originally for axiom pinpointing in expressive DLs [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. After one
MinA M has been found, we try to nd a new MinA over the set of candidate
variables X n fmg for every m 2 M . Iteratively repeating this approach yields a
search tree, where each solution is di erent from all its predecessors in this tree.
As in the relevant enumeration, the current set V 0 is extended with T to form a
candidate. Finally, we exploit an idea developed for reducing the search space in
group-MUS-enumeration [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Given a candidate M X, if (M ^ F ) 6! q, but
(X ^ F ) ! q holds, then we conclude that in any future set of literals M 0 at least one
literal m0 2 (X n M ) has to be present to result in (M 0 ^ F ) ! q: if (M ^ F ) 6! q,
then the same still holds for any subset of M . Hence, once we found a candidate M
that failed the imply check, any future candidate has to pick one of these literals in
X n M . In combination with the relevant enumeration, this set is reduced to V n V 0.
Example 5. We give an example demonstrating the bene ts of candidate enumeration.
Suppose that we want to compute the MinAs for A v B w.r.t. the ontology
fHvB, AvC, AvD, CvE, DvE, EvB, AvF , F vB, AvGg. Initially, all axioms
belong to the set T of (potentially) non-relevant axioms. The rst MinA we nd is
fA v C; C v E; E v Bg. These three axioms are marked as relevant:
HvB; AvD; DvE; AvF; F vB; AvG; AvC; CvE; EvB
| {Tz } | {Vz }
We start to look for MinAs in T [V 0 for all proper subsets V 0 V . Let V 0 =
fAvC; CvEg. We nd the MinA fAvF; F vBg, and add these axioms to V .
      </p>
      <p>HvB; AvD; DvE; AvG; AvC; CvE; EvB; AvF; F vB
| {Tz } | {Vz }
We continue searching for new MinAs in all sets T [ V 0 for all proper subsets V 0 of V
which do not contain a known MinA. We can pick, e.g., V 0 = fAvC; CvE; AvF g.
Then T [ V 0 does not imply the subsumption AvB. Any future MinA must contain
at least one of the axioms from V n V 0 = fEvB; F vBg. We continue by trying a
new subset of V , e.g. V 0 = fAvC; EvB; AvF g and obtain fAvD; DvE; EvBg
as MinA. This again requires to update T and V :</p>
      <p>HvB; AvG; AvC; CvE; EvB; AvF; F vB; AvD; DvE
| {Tz } | {Vz }
The algorithm will then tell us that no other MinAs exist. Recall that typically, the set
T will be much bigger than V and using our strategy, we will never have to go
through subsets of it.</p>
      <p>We realized the enumerator object as a SAT solver. Model enumeration is based on
CDCL: once a model I is found, I is added to the formula, and the next model is
generated. We add only the decision literals of I: all other literals of I are implied. To
avoid repeating MinAs, the literals of every MinA R are added to the formula as
clause R, too. For the enumeration, we incrementally add the variables of the last
MinA R to the solver. Previous clauses remain valid: previous candidates are not
enumerated twice and known MinAs cannot be repeated. For the HST enumeration,
we modify the decision heuristic of the solver. A stack of found solutions is kept and
the decision heuristic follows the hitting set scheme to enumerate candidates. Finally,
inverse enumeration is realized adding the necessary clause X n M to include one of
the missing literals. The algorithm spends most time on the imply check, as all literals
x 2 X have to be applied. Depending on the ontology, X can be very large: the
encoding of Snomed contains 378579 literals. As discussed, the set of relevant literals
V = lits(S) might be much smaller. In our experiments, the largest set V found for
Snomed contains 88 literals. For each imply check, 378491 (378579 88) literals could
be kept in line IMP1. Instead of initializing M = ;, we can initialize M = T to the set
of (currently) irrelevant literals. The initialization is sound, since (F ^ T ) 6! q. In the
implementation we do not undo and recreate the set M , but keep the last state and
only perform the needed updates. Theoretically, this optimization improves the
algorithm by two orders of magnitude over Snomed: 99:98 % of the work is saved in
the implies routine; i.e., maintaining relevant variables may be the di erence
between solvability and unfeasibility.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Experimental Evaluation</title>
      <p>
        We implemented a new tool called SATPin, based on Minisat 2.2.4 To test our
ideas, we ran SATPin on four well-known EL+ biomedical ontologies, which have
been widely used as benchmarks for DL reasoners, specially in the context of
axiom-pinpointing: the Gene Ontology, NCI, the EL+ version of FullGalen, and the
2010 version of Snomed. All computations ran with a 5h timeout (18000s) on an Intel
Xeon CPU at 2.6GHz and a memory limit of 6.5GB. We compare the performance of
SATPin with the state-of-the-art MUS enumeration tool Marco [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and the EL+
axiom pinpointing tool EL2MUS [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which is also based on a translation to SAT.5
      </p>
      <p>
        Each ontology was transformed into a propositional formula by el2sat all [22].
Table 1 summarizes the properties of these ontologies and their translations; the
number of axioms in the original ontology is also the number of selection variables
used by SATPin. Snomed is an order of magnitude larger than the other three test
ontologies; in fact, one of the main problems when dealing with Snomed is to handle
the memory consumption issues. For each of the three smaller ontologies, we computed
all the MinAs for 100 di erent consequences: 50 randomly chosen, and 50 selected as
those where the variable xAvB appears the most often in FT , indicating that they
have the most MinAs, as originally designed in [23], and later used also in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. For
Snomed, we selected 34 consequences that are known to be problematic for axiom
pinpointing, due to the large number and size of their MinAs [23]. We ran SATPin,
EL2MUS, and Marco on all 334 problems, where SATPin uses the combination of
all enumeration mechanisms and the relevant variable selection optimization. All
systems terminated successfully on the 300 instances corresponding to GO, NCI, and
FullGalen, but Marco ran out of memory on all Snomed instances. Thus we
consider only the rst 300 tests for comparison. The results are summarized in Table 2.
      </p>
      <p>SATPin clearly outperforms the other two tools in GO. In FullGalen, Marco
behaves much worse than the other two tools. At rst sight, it seems that SATPin
has the worst behavior in NCI by far. However, EL2MUS was faster than SATPin in
only 23 of the samples tested for this ontology, and was much slower (up to an order of
magnitude) in all others. The median and 90th percentile for SATPin on these tests
is lower than those for Marco and EL2MUS. The average performance of SATPin is
4 http://minisat.se/
5 All experimental data is available at: http://goo.gl/kJ0sE4.</p>
      <p>GO</p>
      <p>Marco
EL2MUS
SATPin
a ected by three instances that took over 3000s, which explains the huge standard
deviation. If only these three instances are removed, the average time taken by
SATPin decreases drastically to only 40.55s. A similar, although less extreme
situation was observed for FullGalen.</p>
      <p>
        In theory, SATPin is a ected by the branching factor produced by the HST
method, and the number of relevant variables used. This is con rmed in Figure 1,
which shows the proportional speedup of SATPin w.r.t. EL2MUS against the number
of MinAs in each experiment. The improvement shown by SATPin grows inversely to
the number of MinAs. The size and shade of each dot is proportional to the average
MinA size in that instance. Clearly, the relative performance of SATPin decreases as
this average increases: all instances containing large MinAs appear below the dashed
line, which signals where both SATPin and EL2MUS perform equally. Notice that
real-world ontologies are typically well-structured, and their consequences have only a
few MinAs of small size [
        <xref ref-type="bibr" rid="ref7">7, 26</xref>
        ]. Our experiments also con rm that this is the case, as
shown in Table 3.
      </p>
      <p>We did another experiment to understand the in uence of (i) the order of
the selection variables, and (ii) the variable separation optimization. We ran the
20 instances of GO where SATPin behaved the worst again with (i) the order
of the selection variables reversed, and (ii) the variable separation optimization
deactivated. In the latter, the average CPU time increased from 13 to 279s; increasing
in the worst instance from 47 to 1239s; i.e., the optimization is really e ective. The
theoretical speedup is 681: at most 30 out of 20466 axioms appear in MinAs; however,
this speedup is not reached in practice. When the order of the selection variables
was reversed, the CPU time variated to up to 2x in both directions. The relative
performance of SATPin against EL2MUS and Marco was not a ected by the
ordering.</p>
      <p>We compared the performance of SATPin and EL2MUS on the 34 very hard
instances from Snomed CT. These instances are so hard that their full set of MinAs
was previously unknown. Only 9 of these instances were solved by both tools, and
EL2MUS solved three additional ones. The solved cases had in average 16.4 MinAs
(maximum 33) with an average size of 14 axioms each. In the other cases, before
timing-out SATPin found in average 32 MinAs containing 16 axioms each. In an
extreme case, SATPin found 96 MinAs, with up to 30 axioms. EL2MUS succeeded in
this case, proving that there existed no more MinAs: SATPin computed the full
answer to this instance, but could not verify it within the time limit. Recall that
these 34 test cases were specially selected for their hardness; most Snomed CT
consequences have less than ten MinAs [26].</p>
      <p>
        Discussion. Other axiom-pinpointing systems for EL+ are Cel [25], Just [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and
EL+SAT [22]. Despite several e orts, we were unable to execute our tests on either of
the two last systems. Cel limits its execution to the computation of 10 MinAs, and at
most 1000s. Thus, we do not include them in our evaluation.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        We exploited highly optimized tools and data structures from the SAT community to
produce an e cient tool for axiom pinpointing in EL+. The core of our approach is
based on the construction of a propositional formula encoding the derivation steps of
the completion-based procedure for deciding atomic subsumption. While we focused on
the standard completion algorithm for EL+ [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the methods easily generalize to any
ontology language with consequence-based reasoning methods (e.g. [
        <xref ref-type="bibr" rid="ref12 ref17 ref4">4, 12, 17, 24</xref>
        ]). We
evaluated our approach over large bio-medical ontologies. The experiments show that
SATPin behaves better than the general group-MUS solver Marco, and the
pinpointing tool EL2MUS. However, its performance degrades as the number and size
of the MinAs found increases. As future work, we will identify the causes and develop
methods for avoiding this reduction in performance. We will extend the approach to
provide a better support for supplemental reasoning tasks in DLs and other logics.
Finally, we intend to perform a more thorough experimental evaluation including a
larger class of ontologies and subsumption relations, as well as the missing reasoners.
20. Riguzzi, F., Bellodi, E., Lamma, E., Zese, R.: Probabilistic description logics under the
distribution semantics. Semantic Web - Interoperability, Usability, Applicability (To
appear) (2015)
21. Schenk, S., Dividino, R., Staab, S.: Reasoning with provenance, trust and all that other
meta knowlege in owl. In: SWPM. CEUR, vol. 526. CEUR-WS.org (2009)
22. Sebastiani, R., Vescovi, M.: Axiom pinpointing in lightweight description logics via
horn-sat encoding and con ict analysis. In: Proc. of 22nd Int. Conf. on Automated
Deduction. LNCS, vol. 5663, pp. 84{99. Springer (2009)
23. Sebastiani, R., Vescovi, M.: Axiom pinpointing in large EL+ ontologies via sat
and smt techniques. Tech. Rep. DISI-15-010, University of Trento, Italy (2015),
http://disi.unitn.it/%7Erseba/elsat/elsat_techrep.pdf, under submission
24. Simancik, F., Kazakov, Y., Horrocks, I.: Consequence-based reasoning beyond horn
ontologies. In: Proc. IJCAI-11. pp. 1093{1098. IJCAI/AAAI (2011)
25. Suntisrivaraporn, B.: Empirical evaluation of reasoning in lightweight DLs on life science
ontologies. In: Proc. of MIWAI'08 (2008)
26. Suntisrivaraporn, B.: Polynomial time reasoning support for design and maintenance of
large-scale biomedical ontologies. Ph.D. thesis, Dresden University of Technology (2009)
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Arif</surname>
            ,
            <given-names>M.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Menc</surname>
            <given-names>a</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Marques-Silva</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.:</surname>
          </string-name>
          <article-title>E cient MUS enumeration of Horn formulae with applications to axiom pinpointing</article-title>
          .
          <source>CoRR abs/1505</source>
          .04365 (
          <year>2015</year>
          ), http://arxiv.org/abs/1505.04365
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brandt</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Pushing the EL envelope</article-title>
          .
          <source>In: Proc. IJCAI-05</source>
          . Morgan-Kaufmann (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knechtel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Pen~aloza, R.:
          <article-title>Context-dependent views to axioms and consequences of semantic web ontologies</article-title>
          .
          <source>J. of Web Semantics</source>
          <volume>12</volume>
          {
          <fpage>13</fpage>
          ,
          <issue>22</issue>
          {
          <fpage>40</fpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          , Pen~aloza, R.:
          <article-title>Axiom pinpointing in general tableaux</article-title>
          .
          <source>Journal of Logic and Computation</source>
          <volume>20</volume>
          (
          <issue>1</issue>
          ),
          <volume>5</volume>
          {
          <fpage>34</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schulz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spackmann</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suntisrivaraporn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>How should parthood relations be expressed in SNOMED CT?</article-title>
          <source>In: Proc. of OBML</source>
          <year>2009</year>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suntisrivaraporn</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Debugging SNOMED CT using axiom pinpointing in the description logic EL+</article-title>
          .
          <source>In: Proc. of KR-MED'08. CEUR-WS</source>
          , vol.
          <volume>410</volume>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Bail</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sattler</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          :
          <article-title>The justi catory structure of the NCBO bioportal ontologies</article-title>
          .
          <source>In: Proc. of ISWC</source>
          <year>2011</year>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>I. LNCS</given-names>
          </string-name>
          , vol.
          <volume>7031</volume>
          , pp.
          <volume>67</volume>
          {
          <fpage>82</fpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Biere</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heule</surname>
            ,
            <given-names>M.J.H.</given-names>
          </string-name>
          , van Maaren,
          <string-name>
            <given-names>H.</given-names>
            ,
            <surname>Walsh</surname>
          </string-name>
          , T. (eds.):
          <article-title>Handbook of Satis ability</article-title>
          ,
          <source>Frontiers in Arti cial Intelligence and Applications</source>
          , vol.
          <volume>185</volume>
          . IOS Press (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Ceylan</surname>
            ,
            <given-names>I_.I_.</given-names>
          </string-name>
          , Pen~aloza, R.:
          <article-title>The Bayesian Description Logic BEL</article-title>
          . In: Demri,
          <string-name>
            <given-names>S.</given-names>
            ,
            <surname>Kapur</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Weidenbach</surname>
          </string-name>
          , C. (eds.)
          <source>Proceedings of the 7th International Joint Conference on Automated Reasoning (IJCAR'14)</source>
          . LNCS, vol.
          <volume>8562</volume>
          , pp.
          <volume>480</volume>
          {
          <fpage>494</fpage>
          . Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Een</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , Sorensson, N.:
          <article-title>Temporal induction by incremental SAT solving</article-title>
          .
          <source>Electronic Notes in Theoretical Computer Science</source>
          <volume>89</volume>
          (
          <issue>4</issue>
          ),
          <volume>543</volume>
          {
          <fpage>560</fpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kalyanpur</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parsia</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horridge</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sirin</surname>
          </string-name>
          , E.:
          <article-title>Finding all justi cations of OWL DL entailments</article-title>
          .
          <source>In: Proc. of 6th Int. Semantic Web Conf. LNCS</source>
          , vol.
          <volume>4825</volume>
          , pp.
          <volume>267</volume>
          {
          <fpage>280</fpage>
          . Springer (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Simancik</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          :
          <article-title>The incredible ELK - from polynomial procedures to e cient reasoning with ontologies</article-title>
          .
          <source>J. Autom. Reas</source>
          .
          <volume>53</volume>
          (
          <issue>1</issue>
          ),
          <volume>1</volume>
          {
          <fpage>61</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Li</surname>
            <given-names>ton</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>M.H.</given-names>
            ,
            <surname>Malik</surname>
          </string-name>
          ,
          <string-name>
            <surname>A.</surname>
          </string-name>
          :
          <article-title>Enumerating infeasibility: Finding multiple MUSes quickly</article-title>
          . In: Gomes,
          <string-name>
            <given-names>C.P.</given-names>
            ,
            <surname>Sellmann</surname>
          </string-name>
          , M. (eds.)
          <article-title>Integration of AI and OR Techniques in Constraint Programming for Combinatorial Optimization Problems</article-title>
          . LNCS, vol.
          <volume>7874</volume>
          , pp.
          <volume>160</volume>
          {
          <fpage>175</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Just: a tool for computing justi cations w</article-title>
          .r.t. el ontologies.
          <source>In: Proc. of ORE 2014</source>
          . vol.
          <volume>1207</volume>
          , pp.
          <volume>1</volume>
          {
          <issue>7</issue>
          .
          <string-name>
            <given-names>CEUR</given-names>
            <surname>Workshop Proceedings</surname>
          </string-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ludwig</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , Pen~aloza, R.:
          <article-title>Error-tolerant reasoning in the description logic el</article-title>
          . In: Ferme,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Leite</surname>
          </string-name>
          ,
          <string-name>
            <surname>J</surname>
          </string-name>
          . (eds.)
          <source>Proc. of the 14th European Conf. on Logics in Arti cial Intelligence (JELIA'14)</source>
          . LNAI, vol.
          <volume>8761</volume>
          , pp.
          <volume>107</volume>
          {
          <fpage>121</fpage>
          . Springer-Verlag (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Marques-Silva</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sakallah</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          :
          <article-title>GRASP { a new search algorithm for satis ability</article-title>
          .
          <source>In: Proceedings of the 1996 IEEE/ACM international conference on computer-aided design</source>
          . pp.
          <volume>220</volume>
          {
          <fpage>227</fpage>
          . ICCAD '96, IEEE Computer Society (
          <year>1996</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Ortiz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rudolph</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simkus</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Worst-case optimal reasoning for the horn-dl fragments of OWL 1 and 2</article-title>
          .
          <source>In: Proc. of KR</source>
          <year>2010</year>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. Pen~aloza, R.,
          <string-name>
            <surname>Thuluva</surname>
            ,
            <given-names>A.S.:</given-names>
          </string-name>
          <article-title>Iterative ontology update using context labels</article-title>
          .
          <source>In: Proc. of OntoChange'15</source>
          (
          <year>2015</year>
          ), to appear
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Previti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marques-Silva</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Partial MUS enumeration</article-title>
          . In: des Jardins,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Littman</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.L</surname>
          </string-name>
          . (eds.)
          <source>Proc. of the 27th AAAI Conference on Arti cial Intelligence</source>
          . AAAI Press (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>