Exact Learning of EL Ontologies

                   Ricardo Duarte, Boris Konev, Ana Ozaki

     Center for Advancing Electronics Dresden (cfaed), TU Dresden, Germany
                    University of Liverpool, United Kingdom
                     Free University of Bozen-Bolzano, Italy


      Abstract. We present the learning algorithm underpinning ExactLearner,
      a tool for exactly learning and teaching EL terminologies. The algorithm
      is superpolynomial in the size of the concept expressions and vocabulary
      of the terminology, but not in the size of the whole terminology (which
      is optimal). We evaluate ExactLearner’s performance on EL ontologies
      from the Oxford ontology repository and demonstrate that despite the
      algorithm being exponential, it successfully terminates for small and
      medium size ontologies. We investigate the impact of various learner and
      teacher features and identify those most useful for learning.


1   Introduction
Authoring ontologies is a laborious task that requires a combined expertise of
domain experts, who know the vocabulary of terms used in a particular subject
area and have an understanding of the conceptual relationships between them,
and of knowledge engineers, who can formalise these relations in an appropriate
ontology definition language. In [6] the dialogue between an expert and a knowl-
edge engineer is formalised as an instance of Angluin’s exact learning framework
in which a learner tries to exactly identify an ontology by communicating with a
teacher, seen as an oracle. It is assumed that the vocabulary of terms is known
and is communicated directly to the learner; in contrast, the exact ontology
composition has to be found through a ‘trial and error’ learning process.
     The learner poses queries of two kinds: membership queries, which ask the
oracle to determine whether a given inclusion is entailed by the ontology, and
equivalence queries, whether the ontology constructed is complete. If the answer
to an equivalence query is negative, the oracle returns a statement which follows
from the expert’s knowledge but not from the ontology constructed, or the other
way around. We are interested in learning algorithms that can identify any target
ontology independently of the behaviour of the teacher. For complexity bounds,
we consider the worst possible, adversarial teacher, which chooses to reveal as
little information as possible.
     In this paper we build on results of [6] and give an algorithm for learning
EL terminologies, which is exponential in the size of concept expressions and
its vocabulary but not in the size of the whole terminology, thus providing a
formal underpinning to the ExactLearner system [5]. This result complements
previous results showing that there is no polynomial time algorithm which can
exactly learn (even acyclic) EL terminologies [6]. We then introduce ExactLearner,
a tool for exactly learning and teaching EL terminologies, which contains an
implementation of our learning algorithm as well as a teacher. We evaluate Ex-
actLearner’s performance on EL ontologies from the Oxford ontology repository [8]
and demonstrate that despite the algorithm being exponential, it successfully
terminates for small and medium size ontologies. We investigate the impact of
various learner and teacher features and identify those most useful for learning.
Related work. Most relevant to our work are: the DL-Learner [7], which learns
concept expressions (but not ontologies) in various fragments of description logic,
using refinement operators; and systems based on the exact learning model,
such as: Logan-H [2] for learning function-free first order Horn sentences from
interpretations; and EIRENE [1], for learning schema mappings. For a more
detailed discussion of related work, see [6].


2     Preliminaries

Description logic. Let NC and NR be countably infinite sets of concept and role
names. An EL concept expression C is formed according to the rule: C, D := A |
> | C u D | ∃r.C, where A ranges over NC and r ranges over NR . A (general) EL
concept inclusion has the form C v D, where C and D are EL concept expressions.
An EL ontology is a finite set of EL concept inclusions [4]. We call an EL ontology
O a terminology if for all C v D ∈ O either C or D is a concept name and O
has at most one1 inclusion of the form A v C for every A ∈ NC . ELlhs is the
class of EL terminologies consisting only of inclusions of the form C v A, while
ELrhs only of inclusions of the form A v C.
    The size |C| of a concept expression C is the length of the string that represents
it, where concept names and role names are considered to be of length one. An
ontology vocabulary is the set of concept and role names occurring in the ontology.
The size of a concept inclusion C v D,  P denoted |C v D|, is |C| + |D| and the
size of an ontology O, denoted |O|, is CvD∈O |C v D|. The semantics of EL is
defined as usual [3]. We write I |= α to say that a concept inclusion α is true
in I. An interpretation I is a model of an ontology O if I |= α for all α ∈ O.
O |= α means that I |= α for all models I of O; and O ≡ O0 means that O |= α
if and only if O0 |= α for all concept inclusions α.
    We identify each EL concept expression C with a finite tree TC where nodes
are labelled with sets of concept names and edges are labelled with roles: if A is
a concept name, then TA has a single node d with label l(d) = {A}; if C = ∃r.D,
then TC is obtained from TD by adding a new root d0 and an edge from d0
to the root d of TD with label l(d0 , d) = r (we then call d an r-successor of
d0 ); if C = D1 u D2 , then TC is obtained by identifying the roots of TD1 and
1
    In the literature, the term terminology commonly refers to sets of concept inclusions
    A v C and concept definitions A ≡ C, with no concept name occurring more than
    once on the left. As A ≡ C can be equivalently rewritten as A v C and C v A, our
    definition is a natural extension of this one.
TD2 . A mapping h from a tree TC to an interpretation I is a homomorphism if
A ∈ l(d) implies h(d) ∈ AI for every concept name A and r = l(d, d0 ) implies
(h(d), h(d0 )) ∈ rI for all role names r. Then d ∈ C I if, and only if, there is a
homomorphism from TC to I mapping the root of TC to d.
    In our proofs, we use notion of a canonical model, defined for every EL ontology
O and EL concept C. IC,∅ , also denoted IC , is defined as TC viewed as a tree-
shaped interpretation. For O 6= ∅, IC,O is the limit of the sequence I0 , I1 , . . . of
interpretations, where I0 = IC and In+1 is obtained from In by applying, in
a fair way, the following rule: if C v D ∈ O and d ∈ C In but d 6∈ DIn , then
take the interpretation ID and add it to In by identifying its root ρD with d.
IC,O |= O and, for any EL concept D, we have ρC ∈ DIC,O iff O |= C v D,
where ρC is the root of IC,O .
Subsumption learning framework. Given a class of ontologies L (for example
all ontologies in a particular DL, EL terminologies etc), we are interested in the
exact identification of a target ontology O ∈ L by posing queries to an oracle. We
assume that the vocabulary of the target terminology ΣO is known to the learner.
A membership query is a call to the oracle to test for an inclusion C v D, where
C, D are ΣO -concept expressions of the DL under consideration, if O |= C v D.
An inclusion C v D is a positive example w.r.t. a target O if O |= C v D and
a negative example else. An equivalence query is a call to the oracle to check
if a hypothesis ontology H is equivalent to the target O. If it is the case, the
oracle responds ‘yes’, otherwise the oracle returns a positive example C v D with
H 6|= C v D or a negative example E v F with H |= E v F . Such a positive
example C v D (negative example E v F ) is called a positive counterexample
(a negative counterexample, resp.) to H being equivalent to O. For a formal
definition of the subsumption learning framework and a discussion of how this
definition relates to Angluin’s exact learning model see [6].
    We say that a class of ontologies L is exactly learnable if there is an algorithm,
which halts for any target O ∈ L and computes, using membership and equivalence
queries, H ∈ L with H ≡ O as long as the oracle replies to the queries truthfully.
An ontology class is exactly learnable in polynomial time if it is exactly learnable
by an algorithm A such that at every step2 of computation the time used by
A up to that step is bounded by a polynomial p(|O|, |C v D|), where O is the
target and C v D is the largest counterexample seen so far. ELlhs and ELrhs
are known to be exactly learnable in polynomial time, while the class of all EL
ontologies is not learnable in polynomial time [6].


3     Learning EL ontologies

In this section we present Algorithm 1, which can exactly learn EL terminologies
in time exponential in |CO |, the size of the largest concept expression in O, and
|ΣO |, the size of the ontology vocabulary, but not in the size of the whole ontology.

2
    We count each call to an oracle as one step.
Algorithm 1 The learning algorithm for EL
Input: An EL terminology O given to the oracle; ΣO given to the learner
Output: An EL terminology H computed by the learner such that O ≡ H
 1: Set H = {A v B | O |= A v B, A, B ∈ ΣO }
 2: while H 6≡ O do
 3:    Let C v D be the returned positive counterexample for O relative to H
 4:    Compute C 0 v D0 with C 0 or D0 in ΣO ∩ NC
 5:    if C 0 ∈ ΣO ∩ NC then
           Compute a right O-essential α from C 0 v D0 u             F0
                                                              d
 6:
                                                         C 0 vF 0 ∈H
 7:    else
 8:        Compute a left O-essential α from C 0 v D0
 9:    end if
10:     Add α to H
11: end while
12: return H


   In the main loop of the algorithm the learner poses an equivalence query to
the oracle. If the oracle answers “yes” then the algorithm returns H equivalent
to O. Otherwise, it receives a counterexample C v D. It is easy to see that at all
times O |= H so the counterexample is always positive. As O is a terminology,
complex C and D in the counterexample can only “connect” via a concept name,
which can be identified by asking membership queries.
Lemma 1. Given a counterexample C v D, one can construct, by posing mem-
bership queries, a counterexample C 0 v D0 such that |C 0 v D0 | ≤ |C v D| and
either C 0 or D0 is a concept name in time polynomial in |H|, |C| and |ΣO |.
Proof. (Sketch) This lemma relies on the properties of the rooted canonical model
IC,O of an EL concept C and an EL terminology O. It can be proved by induction
on the structure of the right-hand side of the counterexample C v D using the
following property proved using the construction of IC,O :
    For any d ∈ ∆IC and EL concept expression of the form ∃r.F : if there exists a
homomorphism h from the labeled tree corresponding to ∃r.F into IC,O such that
h(a∃r.F ) = d and all nodes in the labeled tree corresponding to F are mapped
to ∆IC,O \ ∆IC , then there exists a concept name E with d ∈ E IC such that
O |= E v ∃r.F .
   Having transformed the counterexample to the case of a concept name on
the left or on the right, the algorithm minimises the size of the counterexample
and makes it more informative by applying rules. If C 0 is a concept name then
Algorithm 1 merges D0 with the right-hand sides of all inclusions in H with C 0 on
the left (if they exist) and computes a so called right O-essential counterexample.
Otherwise D0 is a concept name, and the algorithm computes a left O-essential
counterexample. It then adds the resulting O-essential concept inclusion α to H.

Right O-essential concept inclusions We say that an inclusion A v C is
right O-essential if none of the following three rules applies to A v C.
 Concept saturation for O: If O |= A v C 0 and C 0 results from C by adding a
    concept name A0 to the label of some node, then replace A v C by A v C 0 :
 Sibling merging for O: If O |= A v C 0 and C 0 is the result of identifying in C
    two r-successors of the same node then replace A v C by A v C 0 .
 Decomposition on the right for O: If d0 is an r-successor of d in C, A0 is in the
    node label of d, and O |= A0 v ∃r.Cd0 plus A0 6≡O A if d is the root of C,
    then replace A v C by
    (a) A0 v ∃r.Cd0 if H 6|= A0 v ∃r.Cd0 ; or
   (b) A v C|− d0 ↓ , otherwise,
    where Cd is the concept corresponding to the subtree rooted in d and C|−  d↓ is
    the concept corresponding to the result of removing the subtree rooted in d
    from C.
It follows from [6], Lemma 29 and its proof, that given a counterexample A v
C, one can construct a right O-essential counterexample A0 v C 0 using only
polynomially many polynomial size membership queries in |H| + |C| + |ΣO |.
    Concept saturation, sibling merging and decomposition on the right are
all essential—hence the name—steps of the polynomial learning algorithm for
DL-Lite∃R , which extends ELrhs with inverse roles and role hierarchies [6].
Example 1. (a) For H = ∅ and O = {Human v ∃hasParent.Human} the oracle
    can return arbitrary many arbitrary long hasParent chains starting at Human
    as a counterexamples leading to non-termination. For instance, Human v
    ∃hasParent.∃hasParent.> is a chain of length two. With concept saturation,
    this counterexample can be strengthened to Human v ∃hasParent.(Human u
    ∃hasParent.Human), which is equivalent to O.
(b) For O = {Human v ∃hasParent.(Human u Male)} and H = {Human v
    ∃hasParent.Human}, upon receiving a counterexample Human v ∃hasParent.
    Male, the learner merges its right hand side with the right hand side of the
    inclusion in H to form Human v ∃hasParent.Male u ∃hasParent.Human and
    then strengthens it by sibling merging to form the inclusion in O.
(c) For H = ∅ and O = {Woman v Human, Human v ∃hasParent.Human}, even
    with concept saturation, there exist infinitely many chain counterexamples;
    Woman v Human u ∃hasParent.(Human u ∃hasParent.Human) is one of them.
    This inclusion can be decomposed at the root into (a) Human v Woman
    and (b) Human v ∃hasParent.(Human u ∃hasParent.Human). Picking either
    of them allows the learner to make progress.
   In fact, one can prove that Algorithm 1 polynomially learns ELrhs .
Theorem 1. The class of ELrhs ontologies is exactly learnable in polynomial
time by Algorithm 1.
   We show that even in the presence of inclusions of the form D v B, right
O-essential concept inclusions are bounded in size by the size of the largest
concept expression in O.
Lemma 2. Suppose that A v C is right O-essential. Then |C| ≤ |CO | · |ΣO |,
where CO is the largest concept expression in O.
Proof. (Sketch) We follow the lines of the proof of Lemma 32 in [6]. First we
define
                  AO = {A} ∪ {D | O |= A v B, B v D ∈ O}
                                                  d
and construct the canonical model ID0 ,O of D0 = D∈AO D and O. The inter-
pretation ID0 ,O has the following properties:

 1. for every EL concept expression F : ρD0 ∈ F ID0 ,O iff O |= A v F ;
 2. for any d ∈ ∆ID0 and EL concept expression of the form F = ∃r.F 0 : if there
    exists a homomorphism h from the labeled tree corresponding to F into
    ID0 ,O such that h(aF ) = d and all nodes in the labeled tree corresponding to
    F 0 are mapped to ∆ID0 ,O \ ∆ID0 , then there exists a concept name E with
    d ∈ E ID0 such that O |= E v F .

Then, one can show that there is an injective homomorphism h from the labeled
tree TC of C into the restriction J of ID0 ,O to ∆ID0 which maps aC to ρD0 . By
definition of AO , there is B v D ∈ O such that |∆IC | ≤ |∆ID |, which means
that |∆IC | ≤ |∆ICO |, where CO is the largest concept expression in O. The main
observation here is that the argument holds for EL terminologies, that is, in
the presence of concept inclusions of the form D v B. Such inclusions do not
interfere in the argument since concept saturation ensures that nodes have all
concept names implied by O in their labels.


Left O-essential concept inclusions We say that an inclusion C v A is left
O-essential if neither of the following rules applies to C v A.

 Concept saturation for H: If H |= C v C 0 and C 0 results from C by adding a
   concept name A0 to the label of some node, then replace C v A by C 0 v A.
 Decomposition on the left for O: If d is a non-root node such that O |= C|−
                                                                           d↓ v A
                                                                                  0
               −      0              0                                   −     0
   and H 6|= C|d↓ v A , for some A ∈ ΣO , then replace C v A by C|d↓ v A ; if
   O |= Cd v A0 and H 6|= Cd v A0 , then replace C v A by Cd v A0 .

The applicability of the second rule may depend on the application of the first. For
example, for H = {∃hasParent.> v Human} and O = H ∪ {∃hasChild.Human v
Human} a counterexample could be ∃hasChild.∃hasParent.> v Human, which can
only be decomposed on the left for O if we apply concept saturation for H first.
    Similarly to right O-essential counterexamples, given a counterexample C v
A, one can construct a left O-essential counterexample C 0 v A0 using only
polynomially many polynomial size membership queries in |H| + |C| + |ΣO |. We
show in Lemma 3 that the size of left O-essential counterexamples is polynomially
bounded by |ΣO | and |CO |. The intuition behind this lemma is that if neither
concept saturation for H nor decomposition on the left for O apply to an
inclusion C v A then for some D v E ∈ O, with E ∈ NC , not only there is a
homomorphism h from ID to IC,O mapping their roots but also all elements of
IC are in the image of this homomorphism. However, in contrast to Lemma 58 by
Konev et. al [6], due to inclusions of the form B v F with F a complex expression,
some nodes of D may be mapped by h outside of IC . For example, for H = ∅,
O = {∃r.∃s.B v E, E v A, B v ∃s.B} and a left O-essential counterexample
∃r.B v A, the homomorphism from I∃r.∃s.B to I∃r.B,O covers I∃r.B but there is
only a partial homomorphism (that is, a partial function satisfying properties of
a homomorphism) from I∃r.∃s.B to I∃r.B .
Lemma 3. Suppose that C v A is left O-essential. Then |C| ≤ |CO | · |ΣO |,
where CO is the largest concept expression in O.

Proof. To show this lemma, we use the canonical model IC,O of C and O. If
there is E ∈ ΣO such that O |= C v E (and C v E is not a tautology) then
there is D v E ∈ O such that ρC ∈ DIC,O . Let

        (C, O)IC = {D | ∃E ∈ ΣO : ρC ∈ DIC,O , ρC 6∈ E IC , D v E ∈ O}.

For all D ∈ (C, O)IC there is a homomorphism h : ID → IC,O mapping the root
of ID to the root of IC,O . Also, by construction of IC,O , there is D0 ∈ (C, O)IC
such that:

 1. there is a partial homomorphism h : ID0 → IC mapping the root of ID0 to
    the root of IC , for e ∈ ∆ID0 with h(e) ∈ ∆IC ;
 2. if D0 has a conjunct ∃r.F , with d ∈ F ID0 as the corresponding r-successor
    of ρD , and h(d) 6∈ ∆IC then there is A ∈ NC such that ρC ∈ AIC and
    O |= A v ∃r.F .

The fact that |∆IC | ≤ |∆ID0 | for such D0 v E ∈ O is a result of the Claim.
Claim. For all d ∈ ∆IC , there is e ∈ ∆ID0 such that d = h(e).
Suppose to the contrary that there is d0 ∈ ∆IC such that d0 is outside the image
of h : ID0 → IC,O . Let G = C|− d0 ↓ be the concept corresponding to the result
of removing the subtree rooted in d0 . Since C is concept saturated for H, if
ρG ∈ D0IG,O and ρG 6∈ E IG then this contradicts the fact that decomposition
on the left for O was exhaustively applied. To show the contradiction, we argue
that h is a homomorphism from ID0 into the restriction IG,O of IC,O . Our
construction has the following properties:

 3. for e ∈ ∆ID0 with h(e) ∈ ∆IC , we have that h(e) ∈ ∆IG ;
 4. for all d ∈ ∆IC \ {ρC }, d ∈ AIC iff d ∈ AIC,O , for all A ∈ NC .

We obtain Point (3) by the fact that since d0 is outside the image of h : ID0 → IC,O ,
all descendants of d0 are outside the image of h : ID0 → IC,O as well. Point (4)
follows from decomposition of the left for O and concept saturation for H.
    For e ∈ ∆ID0 with h(e) ∈ ∆IC , by Points (1) and (3), there is a partial
homomorphism from ID0 to IG . For e ∈ ∆ID0 with h(e) 6∈ ∆IC , there is e1 ∈ ∆ID0
such that: (i) h(e1 ) 6∈ ∆IC ; (ii) e is in the subtree rooted in e1 ; and (iii) e1 has
minimal distance from ρD0 , where ‘distance’ here is the length of a chain of
elements connected via roles from ρD0 to e1 . This means that the parent e2 of
e1 is such that h(e2 ) ∈ ∆IC . Let F be the concept corresponding to the subtree
rooted in e1 and let s be the role between e2 and e1 . That is, (e2 , e1 ) ∈ sID0 . Since
h(e1 ) 6∈ ∆IC we have that all descendants of e1 are not mapped to ∆IC . Then, by
construction of the canonical model IC,O of C and the EL terminology O, one can
show that there is a B ∈ NC such that h(e2 ) ∈ B IC,O and O |= B v ∃s.F . So it
remains to show that h(e2 ) ∈ B IG,O , for some B ∈ NC such that O |= B v ∃s.F .
We make a case distinction:

 – for h(e2 ) 6= ρC : by Point (4), h(e2 ) ∈ B IC,O implies h(e2 ) ∈ B IC . As
   h(e2 ) ∈ ∆IC , by Point (3), we have that h(e2 ) ∈ ∆IG . So h(e2 ) ∈ B IG , and
   thus, h(e2 ) ∈ B IG,O .
 – for h(e2 ) = ρC : by Point (2), if e1 ∈ F ID0 and h(e1 ) 6∈ ∆IC then there is
   a concept name B such that ρC ∈ B IC and O |= B v ∃s.F . So ρG ∈ B IG ,
   which means that ρG = h(e2 ) ∈ B IG,O .

Since we showed for arbitrary e ∈ ∆ID0 with h(e) ∈ ∆IC and h(e) 6∈ ∆IC
that there is d ∈ ∆IG,O such that d = h(e), we have that h : ID0 → IG,O is
a homomorphism. Thus, there is D0 ∈ (C, O)IG such that ρG ∈ D0IG,O and
ρG 6∈ E IG for some D0 v E ∈ O, which contradicts the fact that decomposition
on the left for O was exhaustively applied, so our claim follows.
    This bounds the size of the domain of IC to the size of the domain of ID0 .
Since |∆ID0 | ≤ |∆ICO |, we have |∆IC | ≤ |∆ICO |.

    By combining lemmas 2 and 3 we obtain the following result.
Lemma 4. Given a counterexample C v D for O relative to H, one can con-
struct a counterexample C 0 v D0 such that |C 0 v D0 | ≤ |CO | · |ΣO | + 1 in
polynomial time in |C v D|, |ΣO | and |H|.
   Since there are at most |ΣO ||CO |·|ΣO |+1 many inclusions over ΣO of size
|CO | · |ΣO | + 1, at most |ΣO ||CO |·|ΣO |+1 counterexamples get added to H over
the run of the algorithm. Thus we obtain the following theorem.
Theorem 2. The class of EL terminologies is exactly learnable by Algorithm 1
in O(|ΣO |2|CO |·|ΣO |+2 · |C v D|2 ) steps, where ΣO is the vocabulary of the target
terminology O, CO is the largest concept expression in O and C v D is the
largest counterexample seen by Algorithm 1.


4    Evaluation

The ExactLearner system [5] has two main components: a learner and a teacher.
The learner supports (1) “Concept Saturation”, (2) “Sibling Merging”, (3) “De-
composition”, applied on the right side of inclusions, and (4) “Concept Desatu-
ration”, (5) “Sibling Branching” and (6) “Decomposition”, applied on the left.
Operations (1), (2), (3) and (6) have already been described. In addition, we
have also implemented (4) and (5), which act as heuristics to construct smaller,
more informative counterexamples. Concept desaturation tries to remove concept
names from nodes in the left of counterexamples to make them logically stronger.
Sibling branching tries to strengthen a counterexample by splitting paths on
                        Test 2                          Test 3
    p     # timeouts    avg CE    avg max C       # timeouts avg CE        avg max C
 0.01               3      17.2         27.7               2         5.6        31.7
  0.5              25     107.8         26.6               3         6.1        31.6
  1.0              26     190.4         19.5               3         6.3        31.9

                 Table 1. Learner against the adversarial teacher.


the left. For example, for O = {∃hasDegree.BSc u ∃hasDegree.MSc v PG} and
H = ∅, the inclusion ∃hasDegree.(BSc u MSc u PhD) v PG is a counterexample,
from which desaturation removes the irrelevant PhD and then sibling branching
strengthens it to the one in O. Concept saturation for H was implemented as
part of Rule (6), “Decomposition” applied to the left, as the applicability of Rule
(6) may depend on the exhaustive application of this rule.
    We have evaluated ExactLearner’s performance on EL ontologies from the
Oxford ontology repository [8]. As a first experiment we ran the learner against a
naïve teacher, which presents the target ontology inclusions one by one without
modification. This experiment aims at estimating the overheads of the learning
process under the best possible conditions. In this first experiment, for 50 out
of 174 EL ontologies computations concluded within 1 hour. We selected these
ontologies for further experiments.
    The selected ontologies range in size from 9 to 11 177 inclusions with signature
sizes ranging from 23 to 9334 concept names and from 2 to 25 role names. The
average size of counterexamples produced by the teacher was 5.48 while the
average size of the largest concept in O was 2.7. The average size of the largest
concept in H was 31.3, an increase caused by concept saturation on the right
side of inclusions. The performance bottlenecks in our system are checking if the
presented inclusion is a counterexample w.r.t. the current hypothesis ontology at
the server side and entailment checks in the learner.
    We have also introduced an adversarial teacher, which forces the learner to
apply particular operations from (1)–(6) above by manipulating the counterexam-
ples. For instance, to force the learner to perform concept saturation on the right
of A v C, the teacher exhaustively tries to remove concept names from every node
in the tree representation of C, while ensuring that the modified inclusion is still
a counterexample. The adversarial teacher can apply: (7) “Concept Desaturation”
on the right, which we have just described; (8) “Sibling Branching” on the right,
which weakens counterexamples of the form A v ∃r.(C u D) into A v ∃r.C u ∃r.D
(provided the latter is still a counterexample); (9) “Concept Saturation” on the
left; and (10) “Sibling Merging” on the left, which are the opposite of learner’s
concept desaturation and sibling branching. We also substitute concept defini-
tions into counterexamples, for instance, if A v ∃r.B is a counterexample and
B v C ∈ O we test A v ∃r.C for being a counterexample as well. We call this
operation (11) “Composition on the right”. (12) “Composition on the left” is its
                   Fig. 1. Oracle and learner skills usage in Test 2.


counterpart. Operations (7)–(12) are applied at random with set probabilities so
that the level of difficulty could be controlled (whenever the oracle applies an
operation, the result is a ‘less informative’ counterexample, so learning from those
counterexamples may require more queries, and thus, increase the difficulty).
    Table 1 presents statistics of running the learner against the adversarial teacher.
In Test 2 the teacher was applying transformations (7)–(12) with probability p
of 0.01, 0.5 and 1.0. The learner can cope with a small distortion of examples
(p = 0.01) but a significant distortion leads to a big increase in the number of
time-outs. Figure 1 shows the percentage of the rules applied by the learner and
the oracle in Test 2. As the chart indicates, the most frequently applied rule
(42%) for the learner with probability p = 0.01 is desaturation in the left. This
number grows to 94% when p = 0.5 and to 96% when p = 1.0. For the oracle,
the most frequently applied rule with p = 0.01 is saturation on the left. Then,
we see an increase on the application of composition on the left, which grows to
90% when p = 0.5 and to 92% when p = 1.0.
    As the oracle applies saturation on the left, there are more concept names on
the left which can be used for composition on the left. The discrepancy between
the number of compositions on the left by the oracle and decompositions on the
left by the learner is due to the fact that the oracle applies the rule repeatedly
while the learner finds a minimal subtree in one rule application.
    In Test 3 we have disabled rule (9) at the oracle side as well as rules (8)
and (10) as they almost never applied, see Figure 1, yet took up a significant
time in our tests. This led to a significant drop in the failure rate even though
other adversarial teacher operations were applied with a high probability. This
suggests that the main cause of time-outs is the exponential explosion in the size
of the signature rather than the size of concepts in the target ontology. We also


                    Fig. 2. Membership and equivalence queries.


measured the increase on the number of queries in Test 2 when the probability for
the oracle to apply a certain rule increases. In Test 2, 22 ontology computations
concluded within 1 hour with the probability of the oracle to apply a certain
rule set to 1.0 (the oracle always apply all rules exhaustively). Figure 2 shows
the delta between averages of the number of queries when the probability for the
oracle to apply its rules are set to 0.0 and the other values: 0.01, 0.5 and 1.0. The
number of membership queries visibly increases, while the number of equivalence
queries remains nearly the same. This is expected, since computing O-essential
inclusions from less informative counterexamples need more membership queries.
Though, since the learner indeed computes O-essential counterexamples, there is
only a small impact on the number of equivalence queries.

5    Conclusion
We presented the learning algorithm underpinning ExactLearner. As future work,
we plan to extend our algorithm in an ontology-based data access setting and adopt
the Probably Approximately Correct learning model extended with membership
queries, so that our algorithm can also run without the teacher.

Acknowledgements. Konev was supported by the EPSRC project EP/H043594/1.
We would like to thank Frank Wolter for fruitful discussions and Liyi Zhao for
her contribution to an earlier version of ExactLearner.
References
1. Bogdan Alexe, Balder ten Cate, Phokion G. Kolaitis, and Wang Chiew Tan. EIRENE:
   interactive design and refinement of schema mappings via data examples. PVLDB,
   4(12):1414–1417, 2011.
2. Marta Arias, Roni Khardon, and Jérôme Maloberti. Learning horn expressions with
   LOGAN-H. Journal of Machine Learning Research, 8:549–587, 2007.
3. F. Baader, D. Calvanese, D. McGuiness, D. Nardi, and P. Patel-Schneider. The
   Description Logic Handbook: Theory, implementation and applications. Cambridge
   University Press, 2003.
4. Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In
   IJCAI, pages 364–369. Professional Book Center, 2005.
5. Ricardo Duarte, Boris Konev, and Ana Ozaki. Exact learning of EL ontologies.
   In Principles of Knowledge Representation and Reasoning: Proceedings of the 16th
   International Conference, KR 2018, Tempe, USA, October 30-November 2, 2018,
   2018.
6. Boris Konev, Carsten Lutz, Ana Ozaki, and Frank Wolter. Exact learning of
   lightweight description logic ontologies. Journal of Machine Learning Research,
   18(201):1–63, 2018.
7. Jens Lehmann. DL-Learner: Learning concepts in description logics. Journal of
   Machine Learning Research, 10:2639–2642, 2009.
8. Oxford. Information systems group ontologies. Retrieved from https://www.cs.ox.
   ac.uk/isg/ontologies/. Accessed: 18 May 2018.