=Paper= {{Paper |id=None |storemode=property |title=Balancing Brave and Cautious Query Strategies in Ontology Debugging |pdfUrl=https://ceur-ws.org/Vol-784/evodyn5.pdf |volume=Vol-784 }} ==Balancing Brave and Cautious Query Strategies in Ontology Debugging== https://ceur-ws.org/Vol-784/evodyn5.pdf
      Balancing Brave and Cautious Query Strategies in
                    Ontology Debugging

    Patrick Rodler, Kostyantyn Shchekotykhin, Philipp Fleiss and Gerhard Friedrich *

                              Alpen-Adria-Universität Klagenfurt
                                  Universitätsstrasse 65-67
                                  9020 Klagenfurt, Austria
                             firstname.lastname@aau.at



        Abstract. Sequential ontology debugging is aimed at the efficient discrimination
        between diagnoses, i.e. sets of axioms which must be altered or deleted from the
        ontology to restore consistency. By querying additional information the number
        of possible diagnoses can be gradually reduced. The selection of the best queries
        is crucial for minimizing diagnosis costs. If prior fault probabilities (FPs) are
        available, the best results are achieved by entropy based query selection. Given
        that FPs are only weakly justified, however, this strategy bravely suggests sub-
        optimal queries although more cautious strategies should be followed. In such a
        case, it is more efficient to follow a no-risk strategy which prefers queries that
        eliminate 50% of diagnoses independently of any FPs. However, choosing the
        appropriate strategy in advance is impossible because the quality of given pri-
        ors cannot be assessed before additional information is queried. We propose a
        method which combines advantages of both approaches. On the one hand, the
        method takes into account available meta information in terms of FPs and the
        user’s confidence in these. On the other hand, the method can cope with weakly
        justified FPs by limiting the risk of suboptimal query selections based on the
        user’s confidence in the FPs. The readiness to take risk is adapted depending on
        the outcome of previous queries. Our comprehensive evaluation shows that the
        proposed debugging method significantly reduces the number of queries com-
        pared to both the entropy based and the no-risk strategy for any choice of FPs.


1     Introduction
Support of ontology development and maintenance is an important requirement for the
extensive use of Semantic Web technologies. However, the correct formulation of log-
ical descriptions is an error-prone task even for experienced knowledge engineers. On-
tology debuggers [8, 3, 1] assist ontology development by identifying sets of axioms
(called diagnoses) that have to be modified s.t. inconsistencies or unwanted entailments
are avoided. In many cases the main problem of debugging is the large number of
alternative diagnoses.
    To solve this problem, a set of heuristics for ranking diagnoses was proposed in [4].
However, this solution is not satisfactory as it cannot be guaranteed that the top-ranked
*
    The research project is funded by grants of the Austrian Science Fund (Project V-Know, con-
    tract 19996)

diagnosis is the target diagnosis and no further assistance for diagnosis discrimination is
provided. Therefore, a debugging method based on active learning was proposed by [9].
The approach exploits the fact that an ontology without the axioms of one diagnosis may
have different entailments than the ontology without the axioms of another diagnosis.
Using these entailments the algorithm queries the user or another oracle whether a set
of axioms is entailed by the target ontology or not. The answers are used to eliminate
disagreeing diagnoses. The algorithm continues to query the oracle until a diagnosis
(the target diagnosis) is significantly more probable than any other. To minimize the
number of queries the algorithm uses some meta information, namely the fact that some
errors are more common than others [6], i.e. that different language constructs have
different fault probabilities. These probabilities can either be extracted from the logs
of previous sessions or specified by the user to express their own beliefs. This advantage
might be lost if unsuitable probabilities are provided where the target diagnosis is rather
improbable. In this case the entropy-based query selection as well as active learning
methods might require significantly more queries than strategies like split-in-half which
behave independently of any meta information.
     We present a hybrid method that outperforms both types of strategies by combining
their benefits while overcoming their drawbacks. On the one hand, our method takes ad-
vantage of the given meta information as long as good performance is achieved. On the
other hand, it gradually gets more independent of meta information if suboptimal be-
havior is measured. Moreover, our strategy takes into account a user’s subjective quality
estimation of the meta information. In this way a user may decide to influence
the algorithm’s behavior or let the algorithm act freely and find the best strategy on
its own. To find the best strategy, our method constantly improves the quality of meta
information and learns to act profitably in all kinds of situations by exploiting
its adaptiveness. In our comprehensive evaluation we demonstrate for various types and
quality of meta information that our approach is very robust and that query costs are
substantially reduced in all situations compared to existing solutions.
     The motivation of our work as well as basic concepts are provided in Section 2. Sec-
tion 3 gives the details of the suggested approach. Implementation details are discussed
in Section 4 and evaluation results are described in Section 5.


2      Basic Concepts and Motivation

Ontology Debugging deals with the following problem:

Problem Definition 1 (Ontology Debugging) Given a diagnosis problem instance
⟨O, B, TC+, TC−⟩ where O = T ∪ A denotes an ontology, i.e. a set of terminological
axioms T and a set of assertional axioms A, B denotes a set of background knowledge
axioms (which are assumed to be true), TC+ is a set of positive test cases, and TC−
is a set of negative test cases, such that

 – O is inconsistent/incoherent¹ and/or
 ¹ Note that throughout the rest of this paper we consider debugging of inconsistent ontologies.
   However, the same approach can also be used to restore coherency.
              Balancing Brave and Cautious Query Strategies in Ontology Debugging       3

 – ∃ t+ ∈ TC+ : O ∪ B ⊭ t+ and/or
 – ∃ t− ∈ TC− : O ∪ B |= t−.

The reasonable assumption is made that the background theory is consistent with the
positive and negative test cases, i.e. B ∪ (⋀_{t+ ∈ TC+} t+) ∪ {¬t−} is consistent for all
t− ∈ TC−. The task is then to find a diagnosis D ⊆ O s.t. there is a set of axioms EX
s.t. 1 and 2 and 3 hold for O∗ := (O \ D) ∪ EX:

 1. O∗ ∪ B is consistent/coherent
 2. O∗ ∪ B |= t+   ∀t+ ∈ TC+
 3. O∗ ∪ B ⊭ t−    ∀t− ∈ TC−

O∗ denotes the target ontology and EX is called extension. A diagnosis D is minimal
iff there is no D0 ⊂ D s.t. D0 is a diagnosis. A diagnosis D gives complete information
about the correctness of each axiom axk ∈ O, i.e. all axi ∈ D are assumed to be faulty
and all axj ∈ O \ D are assumed to be correct.

    In ontology debugging two types of operations can be distinguished: diagnosis and
repair. At first, one needs to find a set of axioms, i.e. a diagnosis D, which must be
deleted from the ontology O in order to restore consistency (requirements 1 and 3).
This is the task of ontology diagnosis. Then it might be necessary to add appropriate
axioms EX to O \ D in order to give the ontology the intended semantics/entailments
(requirement 2) because by deleting axioms from O some intended entailments might
be eliminated as well. This is the topic of ontology repair. In this work we focus on
ontology diagnosis. For EX we use the following approximation: EX ≈ ⋀_{t+ ∈ TC+} t+.
Note that EX must at least contain all axioms in TC+.
    Ontology diagnosis can be exemplified by the following simple OWL DL ontology
Oex = T ∪ A with terminology T :
          ax 1 : A v ∀s.B        ax 2 : B v ¬∃v.¬C       ax 3 : C v D u ¬D1
          ax 4 : D v E u E1      ax 5 : E v F            ax 6 : F v R

and the assertions A : {A(w), s(w, w), v(w, w), ¬R(w)}. Let us assume that all asser-
tions are added to the background theory, i.e. B = A, and both sets TC + and TC − are
empty. Then the ontology Oex = T is inconsistent and the set of minimal diagnoses
for this diagnosis problem instance ⟨T , A, ∅, ∅⟩ is D = {D1 : [ax1], D2 : [ax2], D3 :
[ax3], D4 : [ax4], D5 : [ax5], D6 : [ax6]}.
    An algorithm for generating all possible diagnoses for a given diagnosis problem
instance is presented in [1]. However, one major issue in ontology diagnosis is that there
may be a huge number n of possible diagnoses D = {D1 , . . . , Dn }, depending on the
size and the properties of the ontology O. This problem is also manifested in the above
example where six possible diagnoses are found for an ontology consisting of merely
six axioms. To tackle this problem, the authors in [9] exploit the fact that ontologies
O \ Di and O \ Dj have different entailments for Di , Dj ∈ D (Di 6= Dj ) in order to
discriminate between potential diagnoses D. They discuss methods which gather new
information by posing queries about intended entailments of the target ontology to an
oracle and utilize this information to reduce the set of potential diagnoses as quickly as
possible. So, the following subproblem of ontology diagnosis is addressed:


    Algorithm 1: Query Generation
        Input: diagnosis problem instance ⟨O, B, TC+, TC−⟩, set of diagnoses D, a reasoner RE instantiated to
               provide all entailments of types E ⊆ ET
        Output: a set of queries X
    1   foreach DX ⊂ D do
    2        Xq ← getEntailments(RE , B ∪ TC+ ∪ ⋂_{D ∈ DX} (O \ D));
    3        if Xq ≠ ∅ then
    4             foreach Dr ∈ D \ DX do
    5                  if O \ Dr ∪ B ∪ TC+ |= Xq then DX ← DX ∪ {Dr};
    6                  else if O \ Dr ∪ B ∪ TC+ |= ¬Xq then D¬X ← D¬X ∪ {Dr};
    7                  else D∅ ← D∅ ∪ {Dr};
    8             X ← X ∪ {(Xq, ⟨DX, D¬X, D∅⟩)}
    9   return X;
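For intuition, the partitions that Algorithm 1 produces for the example ontology can be reproduced with a small sketch. It assumes a propositional stand-in for the DL reasoner: removing ax_i breaks the implication chain A(w) → B(w) → ... → R(w) at position i, so O \ D_i entails atom_k(w) iff i > k and, via the assertion ¬R(w), entails ¬atom_k(w) iff i ≤ k. All names and the chain encoding are illustrative assumptions, not the authors' implementation.

```python
# Toy sketch of the query partitions Algorithm 1 computes for O_ex,
# using a propositional stand-in for the DL reasoner (see lead-in).

ATOMS = ["B", "C", "D", "E", "F"]     # candidate query entailments atom_k(w)
DIAGNOSES = [1, 2, 3, 4, 5, 6]        # D_i = [ax_i]

def entailed(i, k):
    """Does O \\ D_i entail atom_k(w)?  (chain intact up to ax_k)"""
    return i > k

def refuted(i, k):
    """Does O \\ D_i entail the negation of atom_k(w)?  (the negated
    assertion on R(w) propagates backwards through the broken chain)"""
    return i <= k

def generate_queries(diagnoses):
    """Mirror Algorithm 1's output: (query, (D^X, D^notX, D^empty)) tuples."""
    queries = []
    for k, atom in enumerate(ATOMS, start=1):
        d_pos = {i for i in diagnoses if entailed(i, k)}
        d_neg = {i for i in diagnoses if refuted(i, k)}
        d_emp = set(diagnoses) - d_pos - d_neg
        if d_pos:                     # only keep non-empty entailment sets
            queries.append((f"{atom}(w)", (d_pos, d_neg, d_emp)))
    return queries

for query, (d_pos, d_neg, d_emp) in generate_queries(DIAGNOSES):
    print(query, sorted(d_pos), sorted(d_neg), sorted(d_emp))
```

The output reproduces the five query partitions of Table 1 below.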




Problem Definition 2 (Diagnosis Discrimination) Given the set of possible diagnoses
D = {D1, . . . , Dn} w.r.t. ⟨O, B, TC+, TC−⟩, find a sequence (X1, . . . , Xq) with min-
imal q where Xi ∈ X for i = 1, . . . , q is a query to an oracle and X is the set of all
possible queries, such that the set of possible diagnoses can be reduced to D = {D∗}
where D∗ ∈ {D1 , . . . , Dn } is called the target diagnosis. The order of queries is crucial
and affects q.
The function of the mentioned oracle to answer queries is usually realized by the user
who created the ontology or by an expert in the domain modeled by the ontology. A
set of axioms Xj is called a query if it is a subset of common entailments of a set of
ontologies (O \ Di) where Di ∈ D^X_j ⊂ D. In description logics there is a number of
different entailment types ET . However, in general not all of these types are used to
construct queries as this could produce high numbers of entailments and would make
the process of query answering very time-consuming and inefficient. In [9], for exam-
ple, queries are generated by considering only entailments obtained by classification
and realization. A query Xj means asking the oracle (O∗ |= Xj ?) and is answered
either by true, i.e. O∗ |= Xj, or by false, i.e. O∗ ⊭ Xj. A query Xj partitions the set
of diagnoses D into ⟨D^X_j, D^¬X_j, D^∅_j⟩ such that:

 – ∀ Di ∈ D^X_j : (O \ Di) ∪ B ∪ (⋀_{t+ ∈ TC+} t+) |= Xj,
 – ∀ Di ∈ D^¬X_j : (O \ Di) ∪ B ∪ (⋀_{t+ ∈ TC+} t+) |= ¬Xj, and
 – D^∅_j = D \ (D^X_j ∪ D^¬X_j).

A query Xj is stored together with its partition as (Xj, ⟨D^X_j, D^¬X_j, D^∅_j⟩). Algorithm 1
shows a procedure for computing all queries X for given D. The function
getEntailments(RE , axioms) calls the reasoner RE to get all entailments X of type
E ∈ ET of axioms and returns ∅ if axioms is inconsistent. Applied to our example
ontology Oex, this algorithm returns the set of all possible queries shown in Table 1.
    If a query Xj is answered positively, then Xj is added to the positive test cases, i.e.
TC+ ∪ {Xj}, and a diagnosis Di ∈ D is a diagnosis for ⟨O, B, TC+ ∪ {Xj}, TC−⟩ iff
(O \ Di) ∪ B ∪ TC+ ∪ Xj ⊭ t− for all t− ∈ TC−. If a query Xj is answered negatively,
then Xj is added to the negative test cases, i.e. TC− ∪ {Xj}, and a diagnosis Di ∈ D

                      Query        D^X_i                  D^¬X_i                 D^∅_i
                      X1 : {B(w)}  D2, D3, D4, D5, D6     D1                     ∅
                      X2 : {C(w)}  D3, D4, D5, D6         D1, D2                 ∅
                      X3 : {D(w)}  D4, D5, D6             D1, D2, D3             ∅
                      X4 : {E(w)}  D5, D6                 D1, D2, D3, D4         ∅
                      X5 : {F(w)}  D6                     D1, D2, D3, D4, D5     ∅
                           Table 1. Possible queries for diagnoses Di ∈ D


 Algorithm 2: Generic Diagnosis Discrimination
     Input: diagnosis problem instance ⟨O, B, TC+, TC−⟩, set of diagnoses D, meta information Info
     Output: target diagnosis D∗
 1   repeat
 2        X ← getBestQuery(D, Info);
 3        if getQueryAnswer(X) == true then D ← D \ D^¬X ;
 4        else D ← D \ D^X ;
 5   until |D| == 1;
 6   return D;




is a diagnosis for ⟨O, B, TC+, TC− ∪ {Xj}⟩ iff (O \ Di) ∪ B ∪ TC+ ⊭ Xj. So,
given that Xj = true, the set of eliminated diagnoses is D^¬X_j and the set of remaining
diagnoses is D^X_j ∪ D^∅_j. Otherwise, the set of rejected diagnoses is D^X_j and the set of
remaining diagnoses is D^¬X_j ∪ D^∅_j.
    A generic diagnosis discrimination algorithm is described in Algorithm 2. The
essential part of this algorithm is the getBestQuery function whose implementation
determines its performance decisively. In [9], two different implementations of
getBestQuery are studied, namely split-in-half and entropy-based query selection.
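The generic loop of Algorithm 2 can be sketched as follows, assuming the hardcoded partitions of Table 1 in place of full query generation and a simulated oracle that knows the target diagnosis; the function names and the tie-breaking rule are illustrative assumptions.

```python
# Sketch of Algorithm 2 with a pluggable getBestQuery, shown here with a
# split-in-half implementation recomputed w.r.t. the still-possible diagnoses.

# Table 1: query X_k splits D into D^X = {k+1,...,6} and D^notX = {1,...,k}.
PARTITIONS = {k: (set(range(k + 1, 7)), set(range(1, k + 1))) for k in range(1, 6)}

def split_in_half(remaining):
    """getBestQuery via sc_split = ||D^X| - |D^notX|| (D^empty is empty here)."""
    def score(k):
        d_pos, d_neg = PARTITIONS[k]
        return abs(len(d_pos & remaining) - len(d_neg & remaining))
    return min(PARTITIONS, key=lambda k: (score(k), k))   # break ties by index

def discriminate(target, get_best_query):
    """Query until one diagnosis remains; returns (diagnosis, asked queries)."""
    remaining, asked = set(range(1, 7)), []
    while len(remaining) > 1:
        k = get_best_query(remaining)
        asked.append(k)
        d_pos, d_neg = PARTITIONS[k]
        answer = target in d_pos            # oracle answers: O* |= X_k ?
        remaining -= d_neg if answer else d_pos
    return remaining.pop(), asked

print(discriminate(5, split_in_half))
```

For target D5 this reproduces the three-query behavior of split-in-half described in Section 2.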
    The entropy-based approach uses meta information about probabilities pt that the
user makes a fault when using a syntactical construct of type t ∈ CT where CT is the
set of construct types available in the used ontology expression language. For example,
∀, ∃, v, ¬, t, u are some OWL DL construct types. These fault probabilities pt are
assumed to be independent and used to calculate fault probabilities of axioms as
                             p(ax_k) = 1 − ∏_{t ∈ CT} (1 − p_t)^{n(t)}                   (1)

where n(t) is the number of occurrences of construct type t in ax_k. These probabilities
are in turn used to determine fault probabilities of diagnoses Di ∈ D as

                  p(Di) = ∏_{ax_r ∈ Di} p(ax_r) · ∏_{ax_s ∈ O\Di} (1 − p(ax_s)).         (2)

The strategy is then to select the query which minimizes the expected entropy in the
diagnosis probabilities after the query. This means that uncertainty should be mini-
mized and information gain should be maximized. According to [5] this is equivalent
to choosing the query Xj which minimizes the following scoring function:

        sc_ent(Xj) = Σ_{aj ∈ {true,false}} p(Xj = aj) log2 p(Xj = aj) + p(D^∅_j) + 1     (3)
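As a numeric sanity check of formula (3), a minimal sketch, assuming D^∅_j = ∅ so that p(Xj = false) = 1 − p(Xj = true):

```python
# Entropy-based score (3): a 50/50 query scores 0 (the minimum); more
# predictable outcomes score higher.  D^empty is assumed empty for brevity.

import math

def sc_ent(p_true, p_empty=0.0):
    """sc_ent(X_j) = sum over answers of p(a) log2 p(a) + p(D^empty_j) + 1."""
    score = p_empty + 1.0
    for p in (p_true, 1.0 - p_true):
        if p > 0:
            score += p * math.log2(p)
    return score

print(sc_ent(0.5))    # 0.0: most uncertain outcome, best query
print(sc_ent(0.9))    # > 0: nearly predictable outcome, poor query
```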

This function is minimized by queries Xj with p(D^X_j) = p(D^¬X_j) = 0.5. So, entropy-
based query selection favors queries whose outcome is most uncertain. After each query
Xj , the new information obtained by the feedback of the oracle is incorporated in the
meta information by updating the diagnosis probabilities according to the Bayesian
formula:
                   p(Di | Xj = aj) = p(Xj = aj | Di) · p(Di) / p(Xj = aj)                (4)
where aj ∈ {true, false} and

                   p(Xj = true) = Σ_{Dr ∈ D^X_j} p(Dr) + 1/2 Σ_{Dk ∈ D^∅_j} p(Dk)

and p(Xj = aj | Dk) := 1/2 for Dk ∈ D^∅_j and in {0, 1} for Dr ∈ D^X_j ∪ D^¬X_j according
to the definition of query partitions above.
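The update (4) can be sketched as follows; the dictionary-based representation and function name are illustrative assumptions:

```python
# Bayesian update (4): diagnoses contradicted by the answer get likelihood 0,
# diagnoses in D^empty_j keep likelihood 1/2, and the result is renormalized
# by the evidence p(X_j = a_j).

def update(priors, d_pos, d_emp, answer):
    """priors: dict diagnosis -> p(D_i); d_pos = D^X_j; d_emp = D^empty_j."""
    def likelihood(d):                    # p(X_j = answer | D_i)
        if d in d_emp:
            return 0.5
        return 1.0 if (d in d_pos) == answer else 0.0
    evidence = sum(likelihood(d) * p for d, p in priors.items())
    return {d: likelihood(d) * p / evidence for d, p in priors.items()}

priors = {1: 0.634, 2: 0.2, 3: 0.084, 4: 0.04, 5: 0.02, 6: 0.02}
posterior = update(priors, d_pos={2, 3, 4, 5, 6}, d_emp=set(), answer=True)
print(posterior)    # D_1 drops to 0; the rest are rescaled to sum to 1
```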
    The split-in-half approach, on the other hand, acts completely independently of
any meta information and selects the query Xj which minimizes the following scoring
function:
                        sc_split(Xj) = | |D^X_j| − |D^¬X_j| | + |D^∅_j|

So, the split-in-half strategy prefers queries which eliminate half of the diagnoses, in-
dependently of the query outcome.
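For the partitions of Table 1 this score is easy to tabulate; a minimal sketch:

```python
# Split-in-half score for Table 1, where query X_k splits the six diagnoses
# into 6 - k vs. k and D^empty is always empty.

def sc_split(n_pos, n_neg, n_emp=0):
    """sc_split(X_j) = ||D^X_j| - |D^notX_j|| + |D^empty_j|"""
    return abs(n_pos - n_neg) + n_emp

scores = {k: sc_split(6 - k, k) for k in range(1, 6)}
print(scores)   # {1: 4, 2: 2, 3: 0, 4: 2, 5: 4} -> X_3 is optimal
```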
    The result of the evaluation in [9] is that entropy-based query selection reveals bet-
ter performance than split-in-half if the given meta information is suitable. However,
the question is how to choose the fault probabilities profitably, i.e. such that they favor the
unknown target diagnosis. By setting probabilities equally for each user, e.g. according
to the results of the study conducted by [6], it cannot be taken for granted that this meta
information will be correct for all users. On the contrary, since every user is prone to
individual and thus generally different fault patterns, we would rather need empirical
data for each individual user in order to set probabilities appropriately. But unfortu-
nately there is neither a log file for each user which could be used as a documentation
of common individual faults nor any extensive user-study which would identify user
profiles and allow us to set probabilities more individually. Hence, an objective ap-
proach to assigning fault probabilities for arbitrary users will generally not bear fruit.
As a consequence, one may often need to rely on a vague subjective estimation of the
fault probabilities by the user which might not always conform to the de facto fault
probabilities. In fact, there are situations where the split-in-half approach outperforms
the entropy-based strategy, namely when the fault probabilities disfavor the target diag-
nosis D∗ , i.e. assign low probability to D∗ . In this case, entropy-based query selection
can be misled by this poor meta information and might prefer queries that do not favor
the target diagnosis.
    To illustrate this, let us assume that the user who wants to debug our example ontol-
ogy Oex specifies the following fault probabilities: p∀ = 0.48, p∃ = 0.11, p¬ = 0.06,
pu = 0.03, pv = 0.03. Application of formulas (1) and (2) leads to diagnoses fault
probabilities p(D1) = 0.634, p(D2) = 0.2, p(D3) = 0.084, p(D4) = 0.04, p(D5) =
0.02, p(D6) = 0.02. Using entropy-based selection, X1, i.e. (O∗ex |= {B(w)}?), will
be identified as the optimal query since the difference between probabilities of positive

(p(X1 = true) = 0.365) and negative (p(X1 = false) = 0.634) outcome is less than
for all other queries {X2 , X3 , X4 , X5 } (see Table 1). However, note that this query
could eliminate only a single diagnosis, i.e. D1 , if the unfavorable answer is given.
Hence, we call X1 a high-risk query. If, e.g. D5 is assumed to be the target diagnosis,
i.e. D∗ = D5, then X1 will be answered positively and all but one diagnosis are still
possible. The elimination rate of this query is therefore 1/6. The probability update
given by formula (4) then yields p(D2 ) = 0.316, p(D3 ) = 0.133, p(D4 ) = 0.064,
p(D5 ) = 0.031, p(D6 ) = 0.031. As a next query, X2 will be preferred, but will be
answered unfavorably as well resulting again in the elimination of only one single di-
agnosis. This procedure can be further executed, i.e. by querying X3 and X4 , until the
target diagnosis D5 is identified by the answer false to query X5. So, by applying
sc_ent all five queries are required in order to find the target diagnosis. If queries
are selected by scsplit , on the contrary, three queries will suffice. At first, query X3 will
be chosen because it eliminates half of all diagnoses in any case. We call such a query
a no-risk query. The answer will be true and querying X5 after X4 = true will finally
reveal the target diagnosis D∗ = D5 .
     This example demonstrates that the no-risk strategy sc_split (three queries) is more
suitable than sc_ent (five queries) for fault probabilities which disfavor the target diag-
nosis. Let us suppose, on the other hand, that probabilities are assigned reasonably in
our example, e.g. that D∗ = D2 . Then it will take the entropy-based strategy only
two queries (X1 , X2 ) to find D∗ while split-in-half will still require three queries
(X3, X2, X1). The complexity of sc_ent in terms of required queries varies between
O(1) in the best and O(|D|) in the worst case depending on the appropriateness of the
fault probabilities. On the other hand, sc_split always requires O(log2 |D|) queries.
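For concreteness, the diagnosis probabilities used in this example can be reproduced from formulas (1) and (2); in the sketch below the construct counts are read off ax1-ax6, and normalizing over the six single-axiom diagnoses is an assumption made to match the reported rounding.

```python
# Formulas (1) and (2) applied to the running example O_ex.

P = {"forall": 0.48, "exists": 0.11, "neg": 0.06, "and": 0.03, "subs": 0.03}

AXIOMS = [                                   # construct counts n(t) per axiom
    {"subs": 1, "forall": 1},                # ax_1: A subsumed-by forall s.B
    {"subs": 1, "neg": 2, "exists": 1},      # ax_2: B subsumed-by neg exists v.neg C
    {"subs": 1, "and": 1, "neg": 1},         # ax_3: C subsumed-by D and neg D1
    {"subs": 1, "and": 1},                   # ax_4: D subsumed-by E and E1
    {"subs": 1},                             # ax_5: E subsumed-by F
    {"subs": 1},                             # ax_6: F subsumed-by R
]

def p_axiom(counts):                         # formula (1)
    prod = 1.0
    for t, n in counts.items():
        prod *= (1.0 - P[t]) ** n
    return 1.0 - prod

def p_diagnoses():                           # formula (2), normalized
    pax = [p_axiom(c) for c in AXIOMS]
    raw = []
    for i in range(len(AXIOMS)):             # D_i = [ax_i]
        w = pax[i]
        for j, q in enumerate(pax):
            if j != i:
                w *= 1.0 - q
        raw.append(w)
    total = sum(raw)
    return [w / total for w in raw]

print([round(p, 3) for p in p_diagnoses()])
# close to the reported [0.634, 0.2, 0.084, 0.04, 0.02, 0.02], up to rounding
```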
     We learn from this example that the best choice of discrimination strategy depends
on the quality of the meta information. Unfortunately, however, it is impossible to assess
the quality of fault probabilities before the debugging session starts since this would re-
quire us to know the target diagnosis in advance. Rather, we can exploit the additional
information gathered by querying the oracle in order to estimate the quality of proba-
bilities. Let us consider e.g. the first query X1 selected by sc_ent in our example. If this
query is answered unfavorably, i.e. by true, then many more diagnoses remain than are
eliminated. Since the observed outcome was less likely, the partition D^X_1 with lower
average diagnosis probability survives. This could be a hint that the probabilities are
only weakly justified. The new strategy we present in this work incorporates exactly
this information, i.e. the elimination rate achieved by a query, when choosing the suc-
cessive query by adapting the maximum allowed ‘query-risk’. Our method combines
the advantages of both the entropy-based approach and the split-in-half approach. On
the one hand, it exploits the given meta information Info if Info is of high quality, but,
on the other hand, it quickly loses trust in Info and becomes more cautious if some
indication is given that Info is of poor quality.


3   Risk Optimization Strategy for Query Selection

Our risk optimization algorithm is a dynamic learning procedure which on the one hand
learns by reinforcement how to select optimal queries and on the other hand continu-

ally improves the a-priori given meta information based on new knowledge obtained
through queries to an oracle. The behavior of our algorithm can be co-determined by
the user. Therefore, the algorithm takes into account the user’s doubt about the meta
information, i.e. the fault probabilities pt for t ∈ CT , by allowing them to define the
initial cautiousness c of the algorithm as well as a cautiousness interval [c̲, c̄] where
c, c̲, c̄ ∈ [0, ⌊|D|/2⌋/|D|] and c̲ ≤ c ≤ c̄. The interval [c̲, c̄] constitutes the set of all
admissible cautiousness values the algorithm may take during the debugging session.
High trust in the meta information is reflected by specifying a low minimum required
cautiousness c̲. If the user is unsure about the rationality of the fault probabilities this
can be expressed by setting c̲ to a higher value. The value assigned to c̄ indicates the
maximum admissible cautiousness of the algorithm and can be interpreted as the mini-
mum chance a user wants to preserve of performing better than a no-risk strategy.
     This additional cautiousness information guides the behavior of the algorithm when
selecting queries. The relationship between cautiousness c and queries is formalized by
the following definitions:
Definition 1 (Cautiousness of a Query). We define the cautiousness caut(Xi ) of a
query Xi as follows:
                     caut(Xi) := min(|D^X_i|, |D^¬X_i|) / |D|  ∈  [0, ⌊|D|/2⌋ / |D|]

A query Xi is called braver than query Xj iff caut(Xi ) < caut(Xj ). Otherwise Xi
is called more cautious than Xj . A query with highest possible cautiousness is called
no-risk query.
Definition 2 (Elimination Rate). Given a query Xi and the corresponding answer
ai ∈ {true, false}, the elimination rate e(Xi , ai ) is defined as follows:
                     e(Xi, ai) = |D^¬X_i| / |D|   if ai = true
                     e(Xi, ai) = |D^X_i| / |D|    if ai = false

The answer ai to a query Xi is called favorable iff it maximizes the elimination rate
e(Xi , ai ). Otherwise ai is called unfavorable.
So, the cautiousness caut(Xi ) of a query Xi is exactly the minimal elimination rate,
i.e. caut(Xi ) = e(Xi , ai ) given that ai is the unfavorable query result. Intuitively, the
user-defined cautiousness c is the minimum percentage of diagnoses in D which should
be eliminated by the successive query. For brave queries the interval between minimum
and maximum elimination rate is larger than for cautious queries. For no-risk queries it
is minimal.
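Definitions 1 and 2 can be checked numerically on the running example; a small sketch with partition sizes taken from Table 1 (function names are illustrative):

```python
# Cautiousness (Definition 1) and elimination rate (Definition 2).

def caut(n_pos, n_neg, n_total):
    """Definition 1: caut(X_i) = min(|D^X_i|, |D^notX_i|) / |D|."""
    return min(n_pos, n_neg) / n_total

def elimination_rate(n_pos, n_neg, n_total, answer):
    """Definition 2: the fraction of diagnoses removed by the answer."""
    return (n_neg if answer else n_pos) / n_total

print(caut(5, 1, 6))                     # X_1: 1/6, a high-risk query for c = 0.25
print(caut(3, 3, 6))                     # X_3: 3/6, a no-risk query
print(elimination_rate(5, 1, 6, True))   # unfavorable answer: only D_1 eliminated
```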
Definition 3 (High-Risk Query). Given a query Xi and cautiousness c, then Xi is
called a high-risk query iff caut(Xi ) < c, i.e. the cautiousness of the query is lower
than the cautiousness required by the user. Otherwise Xi is called non-high-risk query.
By HR(X) ⊆ X we denote the set of all high-risk queries. The set of all queries X can
be partitioned in high-risk queries and non-high-risk queries.

Example 1. Reconsider the example ontology Oex of Section 2 and let c = 0.25. Then
query X2 has cautiousness caut(X2 ) = 2/6 = 0.33 ≥ 0.25 = c and is thus a non-high-
risk query. Query X1 , on the contrary, is a high-risk query because caut(X1 ) = 1/6 =
0.17 < 0.25 = c. However, if X1 is not asked first but, for example, as the third query
after D3 , D4 , D5 , D6 have been eliminated by X3 and X2 (compare with the behavior
of split-in-half for Oex in Section 2), its cautiousness is caut(X1) = 0.5, which means
that X1 is a non-high-risk query in this case.

Algorithm 3 describes our Risk Optimization Algorithm. Its core is the implementation
of the GET B EST Q UERY function in Algorithm 2 which is explained in the following
(with the denotation of the respective method in brackets). Further details concerning
Algorithm 3 are provided in Section 4.
 1. (getMinScoreQuery) Determine the best query Xi ∈ X according to sc_ent.
    That is:
                              Xi = arg min_{Xk ∈ X} sc_ent(Xk).

 2. (getQueryCautiousness) If Xi is a non-high-risk query, i.e. caut(Xi) ≥ c,
    select Xi . In this case Xi is the query with maximum information gain among all
    queries X and additionally guarantees the required elimination rate specified by c.
 3. (getAlternativeQuery) Otherwise, select the query Xj ∈ X (Xj ≠ Xi)
    which has minimal score sc_ent among all least cautious non-high-risk queries L.
    That means:
                                 Xj = arg min_{Xk ∈ L} sc_ent(Xk).

    where L = {Xr ∈ X \ HR(X) | ∀Xt ∈ X \ HR(X) : caut(Xr ) ≤ caut(Xt )}.
    If there is no such query Xj ∈ X, then select Xi from step 1.
 4. Given the query outcome Xs = as , update the meta information as per steps 4a
    and 4b. If step 2 was successful, then Xs = Xi , otherwise Xs = Xj .
     (a) (updateCautiousness) Update the cautiousness c depending on the elimination
          rate e(Xs, as) as follows:

                                         c ← c + cadj                                  (5)

         where cadj denotes the cautiousness adjustment factor which is defined as fol-
         lows:

               cadj := adj · 2 (c̄ − c̲)   where   adj := ⌊|D|/2 − ε⌋ / |D| − e(Xs, as)   (6)

         and ε ∈ (0, 1/2) an arbitrary real number. The factor 2 (c̄ − c̲) in formula (6)
         simply regulates the extent of the cautiousness adjustment depending on the
         interval length c̄ − c̲. The more crucial factor in the formula is adj which indicates
        whether cadj is positive or negative, i.e. whether the cautiousness is increased
        or decreased. Note that adj < 0 holds for any favorable query result, indepen-
        dently of the cautiousness caut(Xs ) of the asked query Xs . This has the fol-
        lowing reasons: For even |D|, the elimination rate resulting from the favorable

         query outcome is ≥ ⌊|D|/2⌋/|D|, which is the maximal value the minimal
         elimination rate of a query can take (see Definition 1). But also
         ⌊|D|/2 − ε⌋/|D| < ⌊|D|/2⌋/|D| holds since |D| is even. Given that |D| is odd,
         the elimination rate for any favorable query result is > ⌊|D|/2⌋/|D| =
         ⌊|D|/2 − ε⌋/|D|. If the value c computed in (5) is outside the user-defined
         cautiousness interval [c̲, c̄], it is set to c̲ if c < c̲ and to c̄ if c > c̄. Positive
         cadj is a penalty telling the algorithm
         to get more cautious, whereas negative cadj is a bonus resulting in a braver
         behavior of the algorithm.
    (b) (UPDATE PROBABILITIES) Update the diagnosis probabilities p(Dr) for
        Dr ∈ D according to formula (4) on page 6. Section 4 gives more details
        on this method.
    To illustrate the behavior of our algorithm, we revisit the example ontology Oex
with six diagnoses discussed in Section 2. Again, we first assume that D∗ = D2.
Further, let the user be quite unsure whether or not to trust the fault probabilities
(see page 6) and thus define the cautiousness c := 0.3 and the cautiousness interval
[c̲, c̄] := [0, ⌊6/2⌋/6] = [0, 0.5]. Then, our algorithm will first test whether the query
X1, which has minimal score scent, is admissible w.r.t. c. Therefore, caut(X1) = 1/6 ≈ 0.17
is calculated and compared with c = 0.3. Since caut(X1) < c, X1 is rejected. As a
consequence, the algorithm seeks the query with minimal scent among all least cau-
tious non-high-risk queries, i.e. those which guarantee the elimination of at least
⌈0.3 · 6⌉ = 2 diagnoses. The only least cautious non-high-risk query is X2, so it is
chosen as the first query and answered by false. Therefore, the set of possible di-
agnoses is reduced to D = {D1, D2}. Then, the only query that allows discriminating
between D1 and D2, namely X1, is selected and D∗ = D2 is identified. For D∗ = D5,
the algorithm acts as follows: The first query is again X2, which is answered by true
in this case. Hence, the two diagnoses D1 and D2 are dismissed, yielding an elimination
rate e(X2, true) = 2/6. Applying formula (6) then yields c_adj = 0, i.e. c remains un-
changed. Thus the next query should eliminate at least ⌈0.3 · (6 − 2)⌉ = 2 diagnoses.
The only candidate, X4 (answered true), is selected, leaving open only one possible last
query, X5. So, in both situations, D∗ = D5 (three queries) and D∗ = D2 (two queries),
our method performed as well as the better of scent and scsplit, respectively.
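For concreteness, the cautiousness update of formulas (5) and (6) can be sketched in a few lines of Python. This is an illustration only: the parameter names and the choice ε = 0.4 are ours. It reproduces the example above, where |D| = 6 and the favorable answer to X2 eliminates two diagnoses, so c stays at 0.3.

```python
import math

def update_cautiousness(c, c_min, c_max, num_diags, elim_rate, eps=0.4):
    """Sketch of formulas (5) and (6).

    c               -- current cautiousness
    [c_min, c_max]  -- user-defined cautiousness interval
    num_diags       -- |D|, number of leading diagnoses before the query
    elim_rate       -- e(Xs, as), fraction of diagnoses eliminated
    eps             -- arbitrary real number in (0, 1/2); 0.4 is an assumption
    """
    adj = math.floor(num_diags / 2 - eps) / num_diags - elim_rate  # formula (6)
    c_adj = adj * 2 * (c_max - c_min)
    c = c + c_adj                                                  # formula (5)
    return min(max(c, c_min), c_max)   # clamp to the interval [c_min, c_max]

# Example from the text: |D| = 6, e(X2, true) = 2/6, so c_adj = 0 and c stays 0.3.
print(update_cautiousness(0.3, 0.0, 0.5, 6, 2/6))  # -> 0.3
```

Note that for any ε ∈ (0, 1/2), ⌊6/2 − ε⌋ = 2, so the result above does not depend on the assumed ε.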


4    Implementation Details
Our Risk Optimization Algorithm (Algorithm 3) takes the following inputs: (1) a di-
agnosis problem instance ⟨O, B, TC+, TC−⟩, (2) meta information Info = (P, C), com-
prising on the one hand the fault probabilities P = {pt | t ∈ CT}, where CT is the set of
syntactical construct types available in the used ontology expression language, and on
the other hand the user-defined cautiousness parameters C = (c, c̲, c̄), (3) the number
n of leading diagnoses considered per iteration, and (4) a stopping criterion stop_cond.
Input 3 is needed because, in general, it is impossible to consider all minimal diagnoses
for a given diagnosis problem instance at once. A simple way of alleviating this prob-
lem is to consider only minimal diagnoses since the set of all diagnoses is sufficiently
described by the set of all minimal diagnoses. This holds because any superset of a di-
agnosis is also a diagnosis due to the fact that eliminating an axiom from a consistent

  Algorithm 3: Risk Optimization Algorithm
      Input: diagnosis problem instance ⟨O, B, TC+, TC−⟩, meta information Info = (P, C) where
              P = {pt | t ∈ CT} and C = (c, c̲, c̄), size of leading diagnoses window n, stop condition stop_cond
      Output: a diagnosis D
  1   TC+ ← ∅; TC− ← ∅; D ← ∅;
  2   repeat
  3         D ← getDiagnoses(O, B, TC+, TC−, n, D);
  4         P ← getProbabilities(P, D, TC+, TC−);
  5         X ← generateQueries(O, B, TC+, D);
  6         Xi ← getMinScoreQuery(P, X, scent);
  7         if getQueryCautiousness(Xi, D) < c then Xi ← getAlternativeQuery(c, X, P, D);
  8         if getAnswer(Xi) == true then TC+ ← TC+ ∪ {Xi};
  9         else TC− ← TC− ∪ {Xi};
  10        c ← updateCautiousness(D, TC+, TC−, Xi, c, c̲, c̄);
  11  until (stop_cond ∨ getScore(Xi, scent) == 1);
  12  return mostProbableDiag(D, P);




ontology cannot make the ontology inconsistent. However, this does not solve the prob-
lem as there might also be a huge number of minimal diagnoses. The problem with a
large number of diagnoses is caused by the time complexity needed to find diagnoses.
In order to explain this problem in more detail, we first provide a short review of the di-
agnosis computation method described in [1] which we employ to calculate diagnoses.
This method exploits the notion of a conflict set which is defined as follows:
Definition 4 (Conflict Set). Given a diagnosis problem ⟨O, B, TC+, TC−⟩, a conflict
set CS ⊆ O is a set of axioms s.t. there is a t− ∈ TC− for which CS ∪ B ∪ TC+ ∪ t−
is inconsistent. A conflict set CS is called minimal iff there is no CS′ ⊂ CS such that
CS′ is a conflict set.
Simply put, a conflict set CS is an inconsistent part of the ontology, i.e. CS does not
meet requirements 1 or 3 in Problem Definition 1. If a diagnosis problem instance does
not include any conflict set, then this problem instance is either solved, i.e. meets re-
quirements 1, 2 and 3 in Problem Definition 1, or can be easily solved by adding the
desired positive test cases TC + if requirement 2 is not met. So, the idea is to resolve
every conflict set in order to solve a given diagnosis problem instance. This is achieved
by deleting or changing at least one axiom of a minimal conflict set. The following
proposition [7] summarizes these thoughts:
Proposition 1. D ⊆ O is a (minimal) diagnosis iff D is a (minimal) hitting set of all
minimal conflict sets.
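Proposition 1 can be checked on a toy instance: with two minimal conflict sets {1, 2} and {2, 3} (axioms denoted by numbers), the minimal hitting sets, and hence the minimal diagnoses, are {2} and {1, 3}. The following brute-force sketch is ours and purely illustrative; the actual computation uses the HS-Tree described below.

```python
from itertools import combinations

def minimal_hitting_sets(conflict_sets):
    """Enumerate axiom subsets by ascending size; keep every set that hits
    each conflict set and contains no previously found (smaller) hitting set.
    Exponential brute force -- for illustrating Proposition 1 only."""
    axioms = sorted(set().union(*conflict_sets))
    hitting = []
    for size in range(1, len(axioms) + 1):
        for cand in combinations(axioms, size):
            s = set(cand)
            # s must intersect every conflict set and be subset-minimal
            if all(s & cs for cs in conflict_sets) and not any(h <= s for h in hitting):
                hitting.append(s)
    return hitting

print(minimal_hitting_sets([{1, 2}, {2, 3}]))  # -> [{2}, {1, 3}]
```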
In [1], the QUICKXPLAIN algorithm (in short QX ) introduced by [2] is used to com-
pute minimal conflict sets. QX (B, A) takes as inputs a set of trusted background knowl-
edge axioms B and a set of axioms A and returns a minimal conflict set CS ⊆ A w.r.t.
B. If A ∪ B is consistent, QX outputs consistent. If B is inconsistent, the output is ∅.
In order to compute minimal diagnoses, a tree called HS-Tree is built with minimal
conflict sets as nodes. The path from the root node nd_0 to a node nd_i^k on level k is
denoted by HS(nd_i^k). The root of this tree is a minimal conflict set obtained by calling
QX(B ∪ TC+ ∪ t−, O) for t− ∈ TC− until it first outputs CS(nd_0) ≠ consistent.
For each axiom ax_i ∈ CS(nd_j^(k−1)) there is a path HS(nd_i^k) = HS(nd_j^(k−1)) ∪ {ax_i} to
a node nd_i^k. In order to compute a minimal conflict set for node nd_i^k, QX(B ∪ TC+ ∪
t−, O \ HS(nd_i^k)) is run for t− ∈ TC− until an output CS(nd_i^k) ≠ consistent is
returned. If all outputs are consistent, i.e. O \ HS(nd_i^k) ∪ B ∪ TC+ ∪ t− is consistent
for all t− ∈ TC−, then a minimal hitting set D = HS(nd_i^k) of all minimal conflict
sets, i.e. a minimal diagnosis, is found and node nd_i^k is not further expanded.
      The calculation of all minimal diagnoses at once would thus involve the complete
construction of the HS-Tree. However, in the worst case this requires a number of calls
to QX(B, A) that is exponential in n_CS, the total number of conflict sets, and each
call itself requires O(log₂ |A|) calls to an underlying reasoner. Also, generating all
queries (w.r.t. certain entailment types) for a number k of diagnoses quickly becomes
impracticable as k grows, since all 2^k − 2 non-trivial partitions are explored (see Algo-
rithm 1). Therefore it is generally not feasible to examine all minimal diagnoses at the
same time.
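The growth of the 2^k − 2 non-trivial subsets that query generation must consider is easy to verify:

```python
# Number of candidate sets D_i^X, i.e. non-empty proper subsets
# of the k leading diagnoses, for growing window sizes k.
for k in (4, 9, 20):
    print(k, 2**k - 2)
# -> 4 14
#    9 510
#    20 1048574
```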
      In our approach, we tackle this issue by introducing a leading diagnoses window
of fixed size n, i.e. |D| = n. That is, when the debugging session starts, the first n
minimal diagnoses are computed as explained before. However, instead of calculating
the n minimal diagnoses by ascending cardinality as in [9], our method implements a
uniform-cost search on the HS-Tree to find the n most probable minimal diagnoses.
This uniform-cost search is guided by the meta information, i.e. the fault probabilities
P = {pt |t ∈ CT }, provided as an input to our algorithm. From these fault probabil-
ities, the probabilities p(axj ) of axioms axj ∈ O being erroneous can be determined
according to formula (1). These probabilities are then used to decide which node to
expand next in the search. Namely, always that node nd_i ∈ V is expanded whose path
HS(nd_i) from the root node maximizes p(HS(nd_i)) = ∏_{ax_j ∈ HS(nd_i)} p(ax_j)
among all nodes nd_k ∈ V, where V is the set of all generated nodes in the HS-Tree. In
this manner, the most probable diagnoses are generated first by the GETDIAGNOSES func-
tion. If g ≤ n leading diagnoses are eliminated by a query, then, in the next iteration,
the function supplies the next most probable n − g diagnoses, and so on. Figuratively
speaking, the leading diagnoses window moves on by e(Xi, ai)|D| steps after each
query Xi to include the next most probable e(Xi, ai)|D| diagnoses.
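The expansion order of this uniform-cost search can be sketched with a priority queue. The axiom fault probabilities below are made-up values; the point is only that the open node with maximal path probability p(HS(nd)) is expanded first.

```python
import heapq

# Fault probabilities of individual axioms, e.g. derived via formula (1)
# from the construct-type probabilities p_t (values invented for illustration).
p_ax = {"ax1": 0.4, "ax2": 0.25, "ax3": 0.1}

def path_probability(path):
    """p(HS(nd)) = product of p(ax_j) over the axioms on the path."""
    prob = 1.0
    for ax in path:
        prob *= p_ax[ax]
    return prob

# A max-heap (via negated keys) always yields the currently most probable
# open node, mirroring the uniform-cost expansion of the HS-Tree.
open_nodes = [("ax2",), ("ax1",), ("ax1", "ax3")]
heap = [(-path_probability(p), p) for p in open_nodes]
heapq.heapify(heap)
print(heapq.heappop(heap)[1])  # -> ('ax1',)   (path probability 0.4 is maximal)
```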
      The GETPROBABILITIES function in each iteration calculates the diagnosis proba-
bilities for the current set of leading diagnoses D, taking into account all information
gathered by querying the oracle so far. To that end it calculates the adjusted diagnosis
probabilities p_adj(Di) for Di ∈ D as p_adj(Di) = (1/2)^z p(Di), where z is the number
of preceding queries Xk with Di ∈ D_k^∅ and p(Di) is the probability of Di calculated
directly from the preliminarily given fault probabilities P by formulas (1) and (2). Af-
terwards the probabilities p_adj(Di) are normalized. Note that z can be computed from
TC+ and TC−, which comprise all query answers. This way of updating probabilities
is exactly in compliance with Bayes' theorem as given by formula (4). The rest of
the methods used in Algorithm 3 is explained in Section 3.
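The adjustment performed by GETPROBABILITIES amounts to a few lines; the numbers below are invented for illustration only.

```python
def updated_diag_probabilities(priors, z):
    """p_adj(D_i) = (1/2)**z_i * p(D_i), where z_i is the number of answered
    queries for which D_i made no prediction (D_i in D_k^0); afterwards the
    adjusted values are normalized."""
    adjusted = [p * 0.5 ** zi for p, zi in zip(priors, z)]
    total = sum(adjusted)
    return [a / total for a in adjusted]

# Three leading diagnoses; only the first made no prediction for one query,
# so its probability mass is halved before normalization.
print(updated_diag_probabilities([0.5, 0.3, 0.2], [1, 0, 0]))
# -> approximately [0.333, 0.4, 0.267]
```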

5    Evaluation
We tested our Risk Optimization Algorithm (R) against the entropy-based approach (E)
and the split-in-half approach (S), which were explained in Section 2. As adaptiveness
is one of the central achievements of our approach, the main goal of this evaluation is
to show its robustness in a variety of possible application scenarios. A major obstacle,
however, is that only a small number of inconsistent/incoherent ontologies are avail-
able and they capture only a few situations, like ontology learning or tutorial cases. So,
we decided to simulate diagnosis sessions covering a broader variety of situations us-
ing a model M which takes as inputs a set of total diagnoses, fault probabilities, and
the parameters explained throughout the rest of this section. M is realistic because the
values of parameters used in M (e.g. for query generation and diagnosis elimination)
were determined by taking averages (*) after analyzing the structure and dependencies
between diagnoses as well as between diagnoses and queries using real-world inconsis-
tent/incoherent ontologies as in [9]. M is fair since each approach A ∈ {E , S , R} was
tested under exactly the same conditions (i.e. diagnoses, queries, probabilities, etc.) by
means of random seeds. And M enables exhaustive automated tests.
     We randomly instantiated M in each test iteration, featuring different types of fault
probability distributions PT ∈ {extreme, moderate, uniform} as well as different
cases in terms of the quality of probabilities QU ∈ {good, average, bad}. Each approach
A was tested for the nine different settings in PT × QU. To construct the different types
of fault probability distributions PT, it is sufficient to change the magnitude of the random
ratio rij = p(Di)/p(Dj) between diagnosis probabilities p(Di) > p(Dj), under the as-
sumption that Di is a neighbor of Dj when the diagnoses Di are ordered descending by p(.).
This is because (i) our approach uses the fault probabilities pt only at the very beginning
to calculate the diagnosis probabilities p(Di) and subsequently only works with the proba-
bilities p(Di), and because (ii) our implementation of the diagnosis generation supplies
diagnoses in order w.r.t. their probability (see Section 4). So, extreme, moderate and
uniform refer to a 'high', 'medium' and equal-to-one rij, respectively. Concretely, an
extreme distribution was instantiated by generating uniformly distributed random val-
ues ri ∈ [0, 1), one for each diagnosis Di, and weighting each ri by wi = (1/1.5)^i. That
is, p(Di) = ri wi. As a second step, the probabilities p(Di) were normalized and the diag-
noses ordered descending by p(.). A moderate distribution was generated analogously,
but with different weights wi = (1/1.1)^i. To obtain a uniform distribution we simply
assigned equal probabilities to all diagnoses. The quality QU of the fault probabilities is
measured in terms of the position of the target diagnosis D∗ in the list of all diagnoses
ordered by descending probability. By good, average and bad quality we mean that D∗
is located randomly within the first, fourth and ninth tenth of this ordered list, respectively.
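The three prior types can be sketched as follows; the function name and the fixed seed are ours, and only the weighting schemes (1/1.5)^i, (1/1.1)^i and the uniform case come from the setup above.

```python
import random

def diagnosis_prior(num_diags, kind, seed=0):
    """'extreme' weights uniform random values r_i by w_i = (1/1.5)**i,
    'moderate' by w_i = (1/1.1)**i, and 'uniform' assigns equal probability
    to every diagnosis. The result is normalized and ordered descending."""
    rng = random.Random(seed)
    if kind == "uniform":
        raw = [1.0] * num_diags
    else:
        base = 1 / 1.5 if kind == "extreme" else 1 / 1.1
        raw = [rng.random() * base ** i for i in range(1, num_diags + 1)]
    total = sum(raw)
    return sorted((x / total for x in raw), reverse=True)

p_ext = diagnosis_prior(200, "extreme")
print(round(sum(p_ext), 6), p_ext[0] > p_ext[-1])  # sums to 1, heavy head
```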
     Our test setting, implemented in Java², was the following: For each A and each
precondition in PT × QU we performed 35 iterations, in each of which we instantiated
M with 200 diagnoses in total. As explained before, we then assigned probabilities to
these diagnoses according to PT and marked a diagnosis as the target D∗ depending
on QU. Further on, we set the number of leading diagnoses to 9, which constitutes
a trade-off between computational complexity (e.g. query generation) and the amount
of information taken into account for query selection. The stop condition stop_cond
was reasonably defined as follows: If the currently most probable diagnosis D1 has a
probability at least three times as high as the second most probable diagnosis D2, i.e.
p(D1) ≥ 3 p(D2), the user is asked to check whether D1 corresponds to the wanted target
diagnosis D∗. If so, D∗ is found and the debugging session ends. Otherwise, the user
 ² See http://rmbd.googlecode.com
continues the debugging session. In our implementation this 'user check' was simply
done by testing if D1 == D∗. Queries were then generated in conformance with Al-
gorithm 1, where D is the current set of leading diagnoses. In order to simulate the
non-existence of certain queries according to (*), we ran through all subsets D_i^X ⊆ D
and selected a D_i^X with probability 0.4; for a selected D_i^X we then ran through all
partitions ⟨D_i^¬X, D_i^∅⟩ of D \ D_i^X and selected ⟨D_i^¬X, D_i^∅⟩ with probability 0.1 to constitute
a query ⟨D_i^X, D_i^¬X, D_i^∅⟩, which was then added to X. The answer to a query Xi is
well-defined if D∗ ∈ D_i^X (true) or D∗ ∈ D_i^¬X (false). Otherwise, we simulated the
feedback of the oracle in that we rationally generated another diagnosis probability
distribution p_real, which can be seen as the (unknown) 'real' fault distribution which
applies for a user. For those queries Xi where D∗ ∉ (D_i^X ∪ D_i^¬X), the oracle was
implemented to give answers according to p_real. The type of the distribution p_real was
determined by QU. Hence, for the good case, p_real was generated in a way that it cor-
relates positively with p, i.e. p_real(Di) = ri p(Di), where ri ∈ [0, 1) is a uniformly
distributed random number. For the average case, p_real was generated independently of
p, i.e. p_real(Di) = ri. For the bad case, p_real was generated s.t. it correlates negatively
with p, i.e. p_real(Di) = ri/p(Di). Then these probabilities were normalized.
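The construction of p_real can be sketched analogously (function name and seed are ours; the three correlation schemes are the ones just described).

```python
import random

def real_fault_distribution(priors, quality, seed=0):
    """'good':   p_real correlates positively with the prior p (r_i * p(D_i)),
    'average':  independent of p (r_i),
    'bad':      negatively correlated (r_i / p(D_i));
    the raw values are normalized afterwards."""
    rng = random.Random(seed)
    r = [rng.random() for _ in priors]
    if quality == "good":
        raw = [ri * pi for ri, pi in zip(r, priors)]
    elif quality == "average":
        raw = r
    else:  # "bad"
        raw = [ri / pi for ri, pi in zip(r, priors)]
    total = sum(raw)
    return [x / total for x in raw]

priors = [0.5, 0.3, 0.2]
for q in ("good", "average", "bad"):
    print(q, [round(x, 3) for x in real_fault_distribution(priors, q)])
```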
    The relation between axioms in terms of common entailments was realized as fol-
lows: For each query X ∈ X, we specified according to (*) that 95% of the diagnoses
currently not in the set of leading diagnoses D make no prediction about the query
outcome, i.e. are in no case eliminated by this query. All other Di ∉ D were allocated
randomly to predict either true or false.
    After each iteration we measured the number of queries Q# needed to find the
target diagnosis D∗, the average elimination rate ER(%) observed for these Q# queries,
and the number of queries LD# still needed to find D∗ after it had become a leading
diagnosis, i.e. D∗ ∈ D. Figure 1 illustrates the overall test results. In the bar
charts, the figure shows ER(%), Q# and LD# averaged over all 35 iterations for
each approach A ∈ {E, S, R} and each precondition in PT × QU. The box plot, on
the other hand, indicates the variance of the observed number of queries Q# during
all 35 iterations by providing the minimal and maximal observed values, the first and
third quartile and the median. The area between the 0.25 and 0.75 quantiles is marked as
a box which includes a horizontal line giving the position of the median. We tested
our method R with many different settings for the cautiousness parameters. Performance
was similar for all these settings thanks to the learning algorithm. Evaluation results are
given for (c, c̲, c̄) = (0.4, 0.3, 0.5), which yielded the most stable results. In Figure 1
the superior average performance of R in comparison to both other approaches S and
E is clearly manifested. It can be observed that R needs fewest queries in each situa-
tion. Responsible for this is the high ER(%) and the low LD# achieved. Indeed, our
method maximizes ER(%) while minimizing LD# in all cases compared to S and E .
As a consequence of the high ER(%), the leading diagnoses window in our method
moves on quickly to consider less probable diagnoses for discrimination. In this way,
the target diagnosis gets ‘covered’ earlier by the leading diagnoses window and is thus
identified with fewer queries. This high ER(%) can be explained by the property of our
method to constantly monitor the elimination rate of selected queries and the ability to
react quickly to poor performance by switching to a more cautious strategy. Accountable
for the minimization of LD# is the learning of probabilities adopted from the
entropy-based approach, coupled with the high ER(%) achieved by our method. The
improvement of R w.r.t. Q# is most drastically visible in the box plots, which show that
the interval of the most frequent 50% of observed Q# values (denoted by the box) is
always located lower than, or as low as, for the better approach of S and E. In case the lower
bound of the interval is located as low as for another method, the length of the interval,
i.e. the variance, is lower for our method R.

6    Conclusion
In this paper we presented a risk optimization approach to sequential debugging of on-
tologies. The proposed method exploits meta information in terms of fault probabilities
and user-defined willingness to take risk. A substantial cost reduction compared to ex-
isting methods was shown for any quality of the given meta information. The proposed
method is of particular significance for domains where only vague meta information is
available, such as ontology development.

References
1. Friedrich, G., Shchekotykhin, K.: A General Diagnosis Method for Ontologies. In: Gil, Y.,
   Motta, E., Benjamins, R., Musen, M. (eds.) Proceedings of the 4th International Semantic
   Web Conference (ISWC-05). pp. 232–246. Springer (2005)
2. Junker, U.: QUICKXPLAIN: Preferred Explanations and Relaxations for Over-Constrained
   Problems. In: Association for the Advancement of Artificial Intelligence. pp. 167–172. San
   Jose, CA, USA (2004)
3. Kalyanpur, A., Parsia, B., Horridge, M., Sirin, E.: Finding all Justifications of
   OWL DL Entailments. In: Proc. of ISWC/ASWC2007, Busan, South Korea.
   LNCS, vol. 4825, pp. 267–280. Springer Verlag, Berlin, Heidelberg (Nov 2007),
   http://iswc2007.semanticweb.org/papers/267.pdf
4. Kalyanpur, A., Parsia, B., Sirin, E., Cuenca-Grau, B.: Repairing Unsatisfiable Concepts
   in OWL Ontologies. In: 3rd European Semantic Web Conference, ESWC 2006. Lecture
   Notes in Computer Science, vol. 4011, pp. 170–184. Springer, Berlin, Heidelberg (2006),
   http://www.springerlink.com/index/10.1007/11762256
5. de Kleer, J., Williams, B.C.: Diagnosing multiple faults. Artificial Intelligence 32(1), 97–130
   (Apr 1987), http://linkinghub.elsevier.com/retrieve/pii/0004370287900634
6. Rector, A., Drummond, N., Horridge, M., Rogers, J., Knublauch, H., Stevens, R.,
   Wang, H., Wroe, C.: OWL Pizzas: Practical Experience of Teaching OWL-DL: Com-
   mon Errors & Common Patterns. In: Motta, E., Shadbolt, N.R., Stutt, A., Gib-
   bins, N. (eds.) Engineering Knowledge in the Age of the SemanticWeb 14th Interna-
   tional Conference, EKAW 2004. pp. 63–81. Springer, Whittenbury Hall, UK (2004),
   http://www.springerlink.com/index/PNRG2T506C2JV3MD.pdf
7. Reiter, R.: A Theory of Diagnosis from First Principles. Artificial Intelligence 32(1), 57–95
   (1987)
8. Schlobach, S., Huang, Z., Cornet, R., Harmelen, F.: Debugging Incoher-
   ent Terminologies. Journal of Automated Reasoning 39(3), 317–349 (2007),
   http://www.springerlink.com/index/10.1007/s10817-007-9076-z
9. Shchekotykhin, K., Friedrich, G.: Query strategy for sequential ontology debugging. In: Patel-
   Schneider, P.F., Yue, P., Hitzler, P., Mika, P., Lei, Z., Pan, J., Horrocks, I., Glimm, B. (eds.) Pro-
   ceedings of International Semantic Web Conference (ISWC 2010). Shanghai, China (2010)
[Figure 1 appears here: nine panels, one per combination of distribution type (Extreme, Moderate, Uniform) and quality (Good, Average, Bad); each panel comprises a bar chart of LD#, ER(%) and Q# and a box plot of Q# for the approaches E, S and R.]
Fig. 1. For each combination of probability distribution type in {Extreme, Moderate, Uniform}
and quality in {Good, Average, Bad}, one bar chart and one box plot are shown. Every plot depicts
values for the approaches S (split-in-half), E (entropy) and R (new method). Bar charts show the
average number of required queries (Q#), the elimination rate (ER(%)) and the queries needed after
D∗ ∈ D (LD#). Box plots show the variance of Q#.