Actively Learning ELI Queries
                   under DL-Lite Ontologies

              Maurice Funk1 , Jean Christoph Jung2 , and Carsten Lutz1
          1
           Department of Computer Science, University of Bremen, Germany
      2
          Department of Computer Science, University of Hildesheim, Germany


      Abstract. We show that ELI queries (ELIQs) are learnable in polyno-
      mial time in the presence of a DL-Lite ontology O, in Angluin’s framework
      of active learning. When initially provided with a conjunctive query (CQ)
      that implies the target ELIQ under O (in the sense of query contain-
      ment), it suffices for the learner to only pose membership queries to the
      oracle, but no equivalence queries. The initial CQ can be obtained by a
      single equivalence query and is available ‘for free’ in case that O does not
      pose any disjointness constraints on concepts. Our main technical result
      is that every ELI concept has only polynomially many most specific
      subsumers w.r.t. a DL-Lite ontology, generalizing a recent result about
      homomorphism frontiers by ten Cate and Dalmau.


1   Introduction
Constructing description logic (DL) concepts, ontologies, and queries can be
challenging and costly, especially when logic expertise and domain knowledge
are not in the same hands. This has prompted many approaches to learning such
objects, including PAC learning [12,13,14], the construction of the least common
subsumer (LCS) and the most specific concept (MSC) [4,6,7,19,28], and learning
from data examples [16,18,22,23,27]. In recent years, there has been significant
interest in applying Angluin’s framework of exact learning in a DL context where
a learner interacts in a game-like fashion with an oracle [1,2]. In particular, the
learner may be a DL expert and the oracle a collaborating domain expert. The
main aim is then to find an algorithm that enables the learner to construct the
target object in polynomial time based on queries that it poses to the oracle,
even when the oracle is not able to answer the queries in the most informative
way.
    The interest in exact learning in DLs started with an investigation of ontology
learning in (the conference version of) [21], see also [20,26] and the survey [25].
This was recently complemented by studies of exactly learning DL concepts and
queries: active learning of ELI concept queries (ELIQs) without ontologies is
considered in [11] while [15] studies active learning of EL concept queries (ELQs),
ELIQs, and restricted forms of conjunctive queries (CQs) in the presence of EL
 Copyright © 2021 for this paper by its authors. Use permitted under Creative
  Commons License Attribution 4.0 International (CC BY 4.0).
and ELI ontologies. The purpose of the current paper is to initiate the study of
actively learning concepts and queries under ontologies formulated in DL-Lite,
a prominent family of DLs that is featured in the OWL 2 family of ontology
languages [3]. Our main result is that ELIQs, which can be viewed both as
queries and as concepts to be used in an ontology, can be learned under DL-Lite
ontologies in polynomial time even when the oracle can pose only a very basic
kind of query to the oracle. To make this precise, we introduce the exact learning
framework in more detail.
     Learner and oracle both know and agree on the ontology O and the concept
 and role names that are available for constructing the target ELIQ qT which must
 be satisfiable w.r.t. O; we assume that this includes all concept and role names
 in O. In a membership query, the learner provides an ABox A and a candidate
 answer ā and asks whether A, O |= qT (ā); the oracle faithfully answers “yes” or
“no”. In an equivalence query, the learner provides a hypothesis ELIQ qH and asks
whether qH is equivalent to qT under O; the oracle answers “yes” or provides a
 counterexample, that is, an ABox A and tuple ā such that A, O |= qT (ā) and
 A, O 6|= qH (ā) (positive counterexample) or vice versa (negative counterexample).
 One is then interested in polynomial time learnability, that is, whether there is a
 learning algorithm that constructs qT (x̄), up to equivalence w.r.t. O, such that
 at any given time, the running time of the algorithm is bounded by a polynomial
 in the sizes of qT , of O, and of the largest counterexample given by the oracle so
 far. A weaker requirement is polynomial query learnability where only the sum of
 the sizes of the queries posed to the oracle up to the current time point has to
 be bounded by such a polynomial.
    We can now state our main result more precisely. With DL-Lite, we generally
refer to the basic member of the DL-Lite family [10] that admits inclusions between
basic concepts, concept disjointness constraints, and role disjointness constraints;
DL-Lite− then means the fragment without concept disjointness. Our main result
is that ELIQs are polynomial time learnable using only membership queries under
DL-Lite − ontologies, and that the same is true for DL-Lite ontologies provided
                                      0               0
that we have available an initial CQ qH  such that qH   ⊆O qT , that is, the answers
     0
to qH   w.r.t. O are a subset of those to qT w.r.t. O on every ABox A. Such a
  0
qH  can be obtained by a single initial equivalence query. We also observe that
polynomial learnability using only membership queries fails in the presence of
concept disjointness.
    Let us mention two interesting perspectives on our results. First, they gen-
eralize the results in [11] about polynomial time learnability of ELIQs to the
case with DL-Lite ontologies, in fact borrowing and extending crucial techniques
from [11]. And second, the results in [15] demonstrate that inverse roles pose a
significant challenge to polynomial time learnability. More precisely, [15] brings
forward a polynomial time learning algorithm for symmetry-free ELIQs under EL
ontologies where symmetry-free means that there is no subconcept of the form
∃r.(C u ∃r− .D) with r a role name. It is not clear at all how to generalize that
algorithm to unrestricted ELIQs. Moreover, it is proved in [15] that ELQs are
not polynomial query learnable under ELI ontologies. Thus, inverse roles tend
to be challenging both in the query and in the ontology. In contrast, the result
in this paper need not impose any restriction on the use of inverse roles. It seems
relevant to recall here that DL-Lite is a fragment of ELI.
    A core technical result underlying our approach is that the frontier of an
ELIQ q w.r.t. a DL-Lite − ontology is only of polynomial size and can be computed
in polynomial time, generalizing a similar result from [11] that does not encompass
ontologies. More precisely, a frontier of an ELIQ q w.r.t. a DL-Lite ontology
O is a set of ELIQs F such that q ⊆O qF and qF 6⊆O q for all qF ∈ F and for
all ELIQs q 0 with q ⊆O q 0 and q 0 6⊆O q, there is a qF ∈ F such that qF ⊆O q 0 .
Note that if one thinks of q as an ELI concept, then F is the set of most specific
subsumers of q w.r.t. O.1 Apart from being essential for our learning algorithm,
there is another reason for why one may be interested in the frontier. In fact, it
is observed in [11] that if an ELIQ q has a frontier of polynomial size, then q can
be characterized up to equivalence by polynomially many data examples. Such
an example takes the form (A, a) and is a positive example if A |= q(a) and a
negative example otherwise. The same is true in the presence of ontologies.
    Proof details are given in the appendix of the long version of this paper,
available at http://www.informatik.uni-bremen.de/tdki/research/papers.
html.


2    Preliminaries

Ontologies and ABoxes. Let NC , NR , and NI be countably infinite sets of
concept, role and individual names. A role R is a role name r or the inverse r− of
a role name. A basic concept B is >, a concept name A, or of the form ∃R, R a
role. A DL-Lite ontology O is a finite set of (basic) concept inclusions B1 v B2 ,
concept disjointness constraints B1 u B2 v ⊥, and role disjointness constraints
R1 u R2 v ⊥. A DL-Lite− ontology is a DL-Lite ontology that contains no
concept disjointness constraints. A DL-Lite ontology is in normal form if all
concept inclusions in it are of the form A v B or B v A with A a concept name
or > and B a basic concept. An ABox A is a finite set of concept assertions A(a)
and role assertions r(a, b) with A a concept name or >, r a role name, and a, b
individual names. We use ind(A) to denote the set of individual names used in A.
     The semantics is defined as usual in terms of interpretations I, which we
define to be a (possibly infinite and) non-empty set of concept and role assertions.
We use ∆I to denote the set of individual names in I, define AI = {a | A(a) ∈ I}
for all A ∈ NC , and rI = {(a, b) | r(a, b) ∈ I} and (r− )I = {(b, a) | r(a, b) ∈ I}
for all r ∈ NR . We further set >I = ∆I and (∃R)I = {a ∈ ∆I | ∃(a, b) ∈ RI } for
all roles R. This definition of interpretation is slightly different from the usual one,
but equivalent;2 its virtue is uniformity as every ABox is a finite interpretation.
An interpretation I satisfies a concept inclusion B1 v B2 if B1I ⊆ B2I , a concept
1
  One could equivalently say that F is the set of LCSs of a single concept which strictly
  generalize that concept.
2
  This depends on admitting assertions >(a) in ABoxes.
disjointness constraint B1 u B2 v ⊥ if B1I ∩ B2I = ∅, and a role disjointness
constraint R1 u R2 v ⊥ if R1I ∩ R2I = ∅.
    An interpretation is a model of a DL-Lite ontology or an ABox if it satisfies
all concept inclusions, disjointness constraints and assertions in it. We write
O |= B1 v B2 if every model of O satisfies the basic concept inclusion B1 v B2
and A, O |= B(a) if every model of A and O satisfies the concept assertion B(a).
An ABox A is satisfiable w.r.t. a DL-Lite ontology O if A and O have a common
model.
    A signature is a set of concept and role names, uniformly referred to as
symbols. For any syntactic object O such as an ontology or an ABox, we use
sig(O) to denote the symbols used in O and ||O|| to denote the size of O, that is,
the length of a representation of O as a word in a suitable alphabet.

Conjunctive Queries, ELIQs, Homomorphisms. A conjunctive query (CQ)
takes the form q(x̄) ← φ(x̄, ȳ) where φ is a conjunction of concept atoms A(x),
with A ∈ NC , and role atoms r(x, y), with r ∈ NR over variables x, y ∈ x̄ ∪ ȳ. We
may write r− (x, y) in place of r(y, x). We refer to the variables in x̄ as answer
variables and to the variables in ȳ as quantified variables. We use var(q) to denote
the set of all variables in x̄ and ȳ. We may view a CQ q(x̄) as a set of atoms
when convenient and write r(x, y) ∈ q(x̄) to mean that r(x, y) occurs in the
conjunction φ.
    A conjunctive query q(x̄) is unary if q(x̄) only has a single answer variable. A
cycle in a CQ q is a sequence R1 (x1 , x2 ), . . . , Rn (xn , x1 ) of distinct role atoms in q
such that x1 , . . . xn are distinct. An ELIQ is a unary CQ q that does not contain
a cycle and such that the undirected graph Gq = (var(q), {{y, z} | r(y, z) ∈
q}) is connected. Note that every ELIQ can be seen as an ELI concept in a
straightforward way and vice versa; see [5] for a definition of ELI concepts. We
use Aq to denote the ABox obtained from q by viewing variables as individuals
and atoms as assertions. A CQ q is satisfiable w.r.t. a DL-Lite ontology O if Aq
is satisfiable w.r.t. O.
    A homomorphism h from interpretation I1 to interpretation I2 is a mapping
from ∆I1 to ∆I2 such that d ∈ AI1 implies h(d) ∈ AI2 and (d, e) ∈ rI1 implies
(h(d), h(e)) ∈ rI2 . We use img(h) to denote the set {e ∈ ∆I2 | ∃d ∈ ∆I1 : h(d) =
e}. For di ∈ ∆Ii , i ∈ {1, 2}, we write I1 , d1 → I2 , d2 if there is a homomorphism
h from I1 to I2 with h(d1 ) = d2 . With a homomorphism from a CQ q to an
interpretation I, we mean a homomorphism from Aq to I. For a unary CQ q(x),
we write q(x) → (I, d) if there is a homomorphism h from q to I with h(x) = d.
Let q(x) be a unary CQ and I an interpretation. An element d ∈ ∆I is an answer
to q in I, written I |= q(d), if q(x) → (I, d). Now let O be a DL-Lite ontology
and A an ABox. An individual a ∈ ind(A) is an answer to q on A w.r.t. O,
written A, O |= q(a), if a is an answer to q in every model of O and A.
    For q1 and q2 unary CQs and O a DL-Lite ontology, we say that q1 is contained
in q2 w.r.t. O, written q1 ⊆O q2 if for all ABoxes A and a ∈ ind(A), A, O |= q1 (a)
implies A, O |= q2 (a). We call q1 and q2 equivalent w.r.t. O, written q1 ≡O q2 , if
q1 ⊆O q2 and q2 ⊆O q1 .
O-saturatedness and O-minimality. The following two technical notions are
used throughout this paper. A CQ q is O-saturated, with O a DL-Lite ontology,
if Aq , O |= A(y) implies A(y) ∈ q for all y ∈ var(q) and A ∈ NC . It is O-minimal
if there is no S ( var(q) such that q ≡O q|S with q|S the restriction of q to the
atoms that only contain variables in S.
    The following is a consequence of the fact that acyclic CQs over DL-Lite
ontologies can be answered in polynomial time [8].
Lemma 1. Given an ELIQ q and a DL-Lite ontology O, we can find in polyno-
mial time an O-saturated and O-minimal ELIQ q 0 with q ≡O q 0 .

Universal Model. Let O be a DL-Lite ontology and A an ABox that is
satisfiable w.r.t. O. A trace for A and O is a sequence t = aR1 . . . Rn , n ≥ 0, such
that a ∈ ind(A), the basic concepts ∃R1 , . . . , ∃Rn occur in O, A, O |= ∃R1 (a),
and O |= ∃Ri− v ∃Ri+1 for 1 ≤ i < n. Let T denote the set of all traces for A
and O. Then the universal model of A and O is

      UA,O = A ∪ {A(a) | A, O |= A(a)} ∪
              {A(tR) | tR ∈ T and O |= ∃R− v A} ∪ {R(t, tR) | tR ∈ T}.

For brevity, we write Uq,O instead of UAq ,O for any conjunctive query q.


3   Computing Frontiers in Polynomial Time
We show that for every ELIQ q and DL-Lite ontology O such that q is satisfiable
w.r.t. O, there is a frontier of polynomial size that can be computed in polynomial
time. This generalizes a result from [11] for the case without ontologies. We also
observe that the same is not true when DL-Lite is extended with conjunction,
and that ELIQs can be characterized up to equivalence by polynomially many
data examples in the presence of DL-Lite ontologies.
Definition 1. A frontier of an ELIQ q w.r.t. a DL-Lite ontology O is a finite
set of ELIQs F such that
 1. q ⊆O qF for all qF ∈ F;
 2. qF 6⊆O q for all qF ∈ F;
 3. for all ELIQs q 0 with q ⊆O q 0 6⊆O q, there is a qF ∈ F with qF ⊆O q 0 .
It is not hard to see that frontiers that are minimal w.r.t. set inclusion are unique
up to equivalence of the ELIQs in them, that is, if F1 and F2 are minimal frontiers
of q w.r.t. O, then for every qF ∈ F1 , there is a qF0 ∈ F2 such that qF ≡O qF0 ,
and vice versa. The following is the main result of this section.
Theorem 1. Let O be a DL-Lite ontology and q an ELIQ that is satisfiable
w.r.t. O. Then a frontier of q w.r.t. O can be computed in polynomial time.
For proving Theorem 1, we first observe that we can concentrate on ontologies
that are in normal form.
Lemma 2. For every DL-Lite ontology O, we can construct in polynomial time
a DL-Lite ontology O0 in normal form such that for every ELIQ q, a frontier of
q w.r.t. O can be constructed in polynomial time given a frontier of q w.r.t. O0 .
     The normalization of O introduces fresh concept names X∃R that represent
the basic concept ∃R. In the proof of Lemma 2, we construct the frontier of q
w.r.t. O0 by replacing atoms X∃R (x) with atoms R(x, y), y a fresh variable.
     Now we start with the proof of Theorem 1. Let O and q(x) be as in the
formulation of the theorem, O in normal form. By Lemma 1 and the fact that
equivalent queries have the same frontiers, we can assume that q is O-saturated
and O-minimal. To construct a frontier of q w.r.t. O, we consider all ways to
weaken q in a minimal way where weakening means to construct from q an ELIQ
q 0 such that q ⊆O q 0 and q 0 6⊆O q.
     We start with some notation. We view the answer variable x of q as the root
of the undirected tree Gq , thus imposing a direction on this tree which allows
us to use notions for directed trees, such as successor, predecessor, and leaf, for
the variables in q. Note that the imposed direction is unrelated to the direction
of (inverse) roles in atoms in q. For every z ∈ var(q), we use qz to denote the
ELIQ obtained from q by taking the subtree of Gq rooted at z and making z the
answer variable. For each variable y ∈ var(q), we define a set Γy of atoms in q
that mention y and represent options that we have for weakening q. Formally, Γy
contains
 – all role atoms R(y, z) ∈ qy and
 – all concept atoms A(y) ∈ q such that
    (i) there is no B(y) ∈ q with O |= B v A and O 6|= A v B and
   (ii) there is no R(y, z) ∈ q with O |= ∃R v A.
Informally, we can weaken q by choosing a y ∈ var(q) and then removing a concept
atom A(y) ∈ Γy or a role atom R(y, z) ∈ Γy as well as the subtree of q rooted at
variable z. However, such removals alone are not enough to obtain a minimal
weakening of q and must be accompanied by certain additions, as detailed below.
Note that Conditions (i) and (ii) are needed to ensure that removing A(y) indeed
weakens q, that is, the resulting query is not equivalent to q w.r.t. O.
    We define a set of ELIQs Fq (y) for every variable y ∈ var(q), by induction
on the codepth of y in q. The set Fq (x) ultimately obtained is a frontier of q
w.r.t. O. For every y ∈ var(q), set

                             Fq (y) = {q α (y) | α ∈ Γy }

where q α (y) is constructed by starting with qy (y) and then doing the following:
 1. if α = A(y), remove all atoms B(y) with O |= A ≡ B (including α);
 2. if α = R(y, z), remove α and all atoms of qz . For each q β (z) ∈ Fq (z), add
    a disjoint copy q̃ β of q β and the role atom R(y, z̃) where z̃ is the copy of z
    in q̃ β ;
 3. for each S(y, z1 ) ∈ qy with S(y, z1 ) 6= α, add a disjoint copy q 0 of q and the
    role atom S(y 0 , z1 ) where y 0 is the copy of y in q 0 ;
                                                                            q


                ŷ                                     q                ŷ 0                   q

                     R3                                                         R3

                y                                    y0                     y                 y0
           R1             R2                                  R1                         R2
                     R                            R1                    R            R             R2

     z1         z                                                       ...
                                 yR2                 z1         z0                       z0   yR2


                                                                        ...
     qz1        qz                                  qz1            β1
                                                               q                     q βn


                               Fig. 1. Construction of q R(y,z) (y) from q


 4. for each S(y, yS) ∈ Uqy ,O with yS a trace such that O 6|= ∃S v A if α = A(y),
    add a disjoint copy q 0 of q and the role atoms S(y, z1 ), S(y 0 , z1 ), where y 0 is
    the copy of y in q 0 and z1 is a fresh variable name;
 5. if there is a S(ŷ, y) ∈ q with ŷ the predecessor of y, then add a disjoint copy
    q 0 of q and the role atom S(ŷ 0 , y) where ŷ 0 is the copy of ŷ in q 0 .

Note that Step 2 is the inductive step, where every subtree rooted at a successor
z of y is replaced with all ELIQs from Fq (z). The construction is illustrated
in Figure 1 which on the left side shows a variable y in q with predecessor ŷ,
two successors z1 and z in q, and one additional successor yR2 (a trace) in the
universal model Uq,O . On the right side, it displays the ELIQ q R(y,z) (y) ∈ Fq (y),
assuming that Fq (z) = {q β1 , . . . , q βn }. We remark that for y =
                                                                    6 x, the set Fq (y) is
not necessarily a frontier of the ELIQ qy (y) because the part of q that is outside
of subtree qy is taken into account in the construction of Fq (y), in Steps 3-5. Our
construction generalizes the construction of frontiers of ELIQs without ontologies
given in [11].3 It indeed yields a frontier.

Lemma 3. Fq (x) is a frontier of q(x) w.r.t. O.

   We next observe that the obtained frontier Fq (x) is of polynomial size. It is
then clear that it can be computed in polynomial time as described above since
subsumption between basic concepts in DL-Lite can be decided in polynomial
time [10].
3
    There is actually a small omission in [11] as the counterpart of our Step 5 is missing.
                 X
Lemma 4.                      |var(q α )| ≤ |sig(q)| · |var(q)|3 · (|var(q)| + 1) · (||O|| + 1).
             q α (x)∈Fq (x)

    We next observe that adding conjunction to DL-Lite destroys polynomial
frontiers and thus Theorem 1 does not extend to DL-Lite horn ontologies [3]. In
fact, this already holds for very simple queries and ontologies, implying that
also for other DLs that support conjunction such as EL, polynomial frontiers
are elusive. A conjunction of atomic queries (AQ∧ ) is a unary CQ of the form
q(x) ← A1 (x) ∧ · · · ∧ An (x) and a conjunctive ontology is a set of CIs of the form
A1 u · · · u An v A where A1 , . . . , An and A are concept names.
Theorem 2. There are families of AQ∧ s q1 , q2 , . . . and conjunctive ontologies
O1 , O2 , . . . such that for all n ≥ 1, any frontier of qn w.r.t. On has size at
least 2n .
    The proof is a variation of a proof given in [15] showing that AQ∧ s are
not polynomial time learnable under conjunctive ontologies. It is based on the
following AQ∧ s and ontologies:

           qn (x) ← A1 (x) ∧ A01 (x) ∧ · · · ∧ An (x) ∧ A0n (x)
              On = {Ai u A0i v A1 u A01 u · · · u An u A0n | 1 ≤ i ≤ n}.

In fact, the minimal frontier contains all AQ∧ s that contain exactly one of the
conjuncts Ai and A0i , for 1 ≤ i ≤ n. Observe that in the proof of Theorem 1,
there are only polynomially many choices for weakening an ELIQ, represented
by the sets Γy , y ∈ var(q). In contrast, weakening the AQ∧ qn w.r.t. ontology On
in a minimal way requires to choose for each i ∈ {1, . . . , n} whether Ai or A0i
should be removed, and there are exponentially many such choices.
    To close this section, we briefly consider the unique characterization of ELIQs
in terms of polynomially many data examples. A data example takes the form
(A, a) where A is an ABox and a ∈ ind(A). Let E + , E − be finite sets of data
examples. An ELIQ q fits (E + , E − ) w.r.t. a DL-Lite ontology O if (A, a) ∈ E +
implies A, O |= q(a) and (A, a) ∈ E − implies A, O 6|= q(a). We say that (E + , E − )
uniquely characterizes q w.r.t. O if q fits (E + , E − ) and every ELIQ q 0 that also
fits (E + , E − ) satisfies q ≡O q 0 . The following is a consequence of Theorem 1.
Theorem 3. For every DL-Lite ontology O and every ELIQ q that is satisfiable
w.r.t. O, we can compute in polynomial time data examples (E + , E − ) that
uniquely characterize q w.r.t. O.
    Note that unique characterizability is closely related to the reverse engineering
of CQs, also called query-by-example and studied in a DL context in [17,24].


4    Learning ELIQs
We use the results from the previous section to show that ELIQs are polynomial
time learnable using only membership queries under DL-Lite ontologies if the
                                         0             0
learner is provided with an initial CQ qH  such that qH  is satisfiable w.r.t. the
                     0
ontology O and qH        ⊆O qT where qT is the target ELIQ. Such a q can be
constructed in polynomial time if O is formulated in DL-Lite − . Otherwise, it
can be produced by a single initial equivalence query with an ELIQ that is not
satisfiable w.r.t. O, forcing the learner to provide a positive counterexample (A, a)
                                             0
from which we can extract the desired qH       . Before proving these positive results,
however, we first observe that polynomial time learning using only membership
queries (but no initial equivalence query) is not possible when O contains concept
disjointness constraints.
    A disjointness ontology is a DL-Lite ontology that only consists of concept
disjointness constraints.
Theorem 4. AQ∧ s are not polynomial query learnable under disjointness on-
tologies using only membership queries.
   The proof of Theorem 4 is a variation of that of Theorem 2. We next present
the main results of this section.
Theorem 5.
 1. ELIQs are polynomial time learnable under DL-Lite ontologies using only
    membership queries and a single initial equivalence query.
 2. ELIQs are polynomial time learnable under DL-Lite− ontologies using only
    membership queries.
Throughout this section, we may assume the ontology to be in normal form.
Lemma 5. If ELIQs are polynomial time learnable under DL-Lite ontologies
in normal form using membership queries and a single initial equivalence query,
then this is also true for unrestricted DL-Lite ontologies. The same holds for
DL-Lite− ontologies without the initial equivalence query.
    The idea to prove Lemma 5 is to convert the given ontology into normal form
and then run the learning algorithm for ontologies in normal form. Since that
algorithm may pose membership queries and equivalence queries that involve
fresh concept names introduced during normalization, we need to replace those
concept names as described after Lemma 2 before forwarding the query to the
oracle (which uses the original non-normalized ontology).
    We prove Points 1 and 2 of Theorem 5 simultaneously. Let O be a DL-Lite
ontology and Σ a finite signature that contains all symbols in O, both known to
the learner and the oracle. Further let qT (y) be the target ELIQ known to the
oracle, formulated in signature Σ and satisfiable w.r.t. O. The algorithm that
enables the learner to learn qT in polynomial time is displayed as Algorithm 1. It
                       0                                              0
takes as input a CQ qH    that is satisfiable w.r.t. O and satisfies qH ⊆O qT . Note
       0
that qH need not be an ELIQ. The algorithm then constructs and repeatedly
updates a hypothesis ELIQ qH while maintaining that qH ⊆O qT . The initial call
                                                  0
to subroutine treeify yields an ELIQ qH with qH      ⊆O qH ⊆O qT to be used as the
first hypothesis. The algorithm then iteratively generalizes qH by constructing the
frontier FqH of qH w.r.t. O in polynomial time and choosing from it a new ELIQ
qH with qH ⊆O qT . Additionally, the algorithm applies the minimize subroutine
Algorithm 1 Algorithm for learning ELIQs under DL-Lite ontologies
                                       0                                 0
  Input A DL-Lite ontology O and a CQ qH satisfiable w.r.t. O such that qH ⊆O qT
  Output An ELIQ qH such that qH ≡O qT
                 0
  qH := treeify(qH )
  while there is a qF ∈ FqH with qF ⊆O qT do
     qH := minimize(qF )
  end while
  return qH


to ensure that the new qH is O-minimal and to avoid an excessive blowup while
iterating in the while loop.
    Before we detail the subroutines treeify and minimize, we explain how to
                        0
obtain the argument qH    to the algorithm. Suppose first that O contains neither
concept disjointness constraints nor role disjointness constraints. Then
              0
             qH (x) = {A(x) | A ∈ Σ ∩ NC } ∪ {r(x, x) | r ∈ Σ ∩ NR }.                (1)

If O contains concept or role disjointness constraints, then we cannot use the
             0
above qH        because it is not satisfiable w.r.t. O. If, however, O contains only role
                                                                     0
disjointness constraints, then we can still find a suitable qH         . Let R be the set
of all r ∈ Σ ∩ NR such that ∃r is satisfiable w.r.t. O, introduce four variables
x0r , x1r , x0r− , x1r− for all r ∈ R, and fix a linear order  on R0 = R ∪ {r− | r ∈ R}.
Fix any variable x := xiR . Then, qH       0
                                              is given by
               0
              qH (x) = {A(xiR ) | A ∈ Σ ∩ NC , R ∈ R0 , i ∈ {0, 1}} ∪
                        {R(xiS , xiR ) | R, S ∈ R0 , S  R, i ∈ {0, 1}} ∪
                                                0
                        {R(xiS , x1−i
                                  R ) | R, S ∈ R , S 6 R, i ∈ {0, 1}}.

Observe that every variable has an R-successor for every (satisfiable) role R.
Therefore, there is a homomorphism from every satisfiable target ELIQ qT to
 0                              0
qH , which shows that indeed qH    ⊆O q T .
    In the remaining case when O contains a concept disjointness constraint
A u B v ⊥, we pose the ELIQ A(x) ∧ B(x) as an equivalence query to the
oracle. Since the target query is satisfiable w.r.t. O, the oracle returns a positive
                                             0
counterexample (A, a). The desired query qH    is (A, a) viewed as a CQ with answer
                                                               0
variable a. In the algorithm, we may w.l.o.g. assume that qH      is O-saturated due
to Lemma 1.

The minimize subroutine. The subroutine takes as input a unary CQ q(x)
that is O-saturated, satisfiable w.r.t. O, and satisfies q ⊆O qT . It computes an
O-minimal unary CQ q 0 with q ⊆O q 0 ⊆O qT using membership queries. This is
done by exhaustively applying the following operation:
Remove successor. Choose a role atom r(x1 , x2 ) ∈ q and let q − be the restriction
of q \ {r(x1 , x2 )} to the atoms that only contain variables which are reachable
from x in Gq\{r(x1 ,x2 )} . Pose the membership query Aq− , O |= qT (x). If the
response is positive, continue with q − in place of q.
   This operation preserves O-saturation and satisfiability w.r.t. O. Since the
operation is applied at most once to each atom in q, the running time and
number of membership queries is polynomial in |var(q)| + |Σ|. The following
lemma summarizes important properties of q 0 = minimize(q) that we need to
show correctness and polynomial running time of Algorithm 1.

Lemma 6. Let q be a unary CQ that is O-saturated and satisfiable w.r.t. O such
that q ⊆ qT for the target query qT (y), and let q 0 = minimize(q). Then

 1. q ⊆O q 0 and q 0 ⊆O qT ;
 2. var(q 0 ) ⊆ img(h) for every homomorphism h from qT to Uq0 ,O with h(y) = x
    (and consequently |var(q 0 )| ≤ |var(qT )|);
 3. q 0 is O-minimal.

   When q is an ELIQ, minimize(q) is also an ELIQ. We define minimize on
unrestricted CQs since it is applied to non-ELIQs as part of the treeify subroutine,
described next.

The treeify subroutine. The subroutine takes as input a unary CQ q(x) that
is O-saturated, satisfiable w.r.t. O, and satisfies q ⊆O qT . It computes an ELIQ
q 0 = treeify(q) with q ⊆O q 0 ⊆O qT by repeatedly increasing the length of
cycles in q and posing membership queries; similar constructions are used in
in [21,15]. Formally, treeify constructs a sequence of CQs p1 , p2 , . . . starting with
p1 = minimize(q) and then taking pi+1 = minimize(p0i ) where p0i is obtained from
pi by doubling the length of every cycle. More precisely, p0i is the result of the
following operation.
Double cycle. Choose a cycle R1 (x1 , x2 ), . . . , Rn (xn , x1 ) in pi and let p0i be the
CQ obtained as follows: start with pi , introduce copies x01 , . . . , x0n of x1 , . . . , xn ,
and

 – remove all atoms R(xn , x1 );
 – add A(x0j ) if A(xj ) ∈ pi with 1 ≤ j ≤ n;
 – add R(x0j , z) if R(xj , z) ∈ pi with 1 ≤ j ≤ n and z ∈   / {x1 , . . . , xn };
 – add R(x0j , x0k ) if R(xj , xk ) ∈ pi with 1 ≤ j, k ≤ n and {j, k} =   6 {1, n};
 – add R(xn , x01 ) and R(x0n , x1 ) if R(xn , x1 ) ∈ pi .

    Once pi contains no more cycles, treeify stops and returns pi . The next lemma
states the central properties of this construction.

Lemma 7. For all i ≥ 1,

 1. pi is O-saturated and satisfiable w.r.t. O;
 2. pi ⊆O qT ;
 3. pi ⊆O pi+1 and |var(pi+1 )| > |var(pi )|.
   Point 3 of Lemma 7 and the ‘consequently’ part of Point 2 of Lemma 6
imply that treeify terminates and thus eliminates all cycles in q while maintaining
q ⊆O qT . The next lemma makes this precise.

Lemma 8. Let q be a CQ that is O-saturated, satisfiable w.r.t. O, and satisfies
q ⊆O qT . Then q 0 = treeify(q) is an ELIQ that can be computed in time polynomial
in |var(qT )| + ||q|| using membership queries.

    Returning to Algorithm 1, let q1 , q2 , . . . be the sequence of ELIQs that are
assigned to qH during a run of the learning algorithm. Using the properties of
frontiers, minimize, and treeify we can now show that the hypotheses approximate
the target query in an increasingly closer way.

Lemma 9. For all i ≥ 1,
 1. qi ⊆O qT ;
 2. qi ⊆O qi+1 and qi+1 6⊆O qi ;
 3. var(qi ) ⊆ img(h) for every homomorphism h from qi+1 to Uqi ,O with h(x) = x.

    Point 3 of Lemma 9 implies that |var(qi+1 )| ≥ |var(qi )|, and this can be used
to show that the while loop in Algorithm 1 terminates after a polynomial number
of iterations, arriving at a hypothesis qH ⊆O qT such that there is no qF ∈ FqH
with qF ⊆O qT , that is, qH ≡O qT .
Lemma 10. qn ≡ qT for some n ≤ p(|var(qT )| + |Σ|), p a polynomial.
   It follows from Lemma 10, the ‘consequently’ part of Point 2 of Lemma 6,
and Lemma 8 that Algorithm 1 is a polynomial time learning algorithm, thus
completing the proof of Theorem 5.


5   Outlook
As future work, we are going to consider extensions of the setup studied in this
paper. We are optimistic that the approach presented here can be extended
to DL-Lite ontologies with role inclusions, yielding polynomial size frontiers
and polynomial query learnability also for that logic (but not polynomial time
learnability since subsumption becomes NP-complete). Natural next steps could
then be to replace DL-Lite with linear tuple-generating dependencies (TGDs) [9]
or with DL-Lite krom ontologies [3]. In contrast, it is clear from Theorem 2 and
the results in [15] that our results do not extend to DL-Lite horn . It is not ruled
out, however, that ELIQs can be learned in polynomial time w.r.t. DL-Lite horn
ontologies when both membership and equivalence queries can be used. Another
natural question is whether CQs can be learned in polynomial time in the
presence of DL-Lite ontologies. It is known that this is not possible with only
membership queries even without ontologies [11], but it might be possible with
both membership and equivalence queries.

Acknowledgement. Carsten Lutz was supported by DFG CRC 1320 EASE.
References
 1. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput.
    75(2), 87–106 (1987)
 2. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1987)
 3. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family
    and relations. J. of Artifical Intelligence Research 36, 1–69 (2009)
 4. Baader, F.: Least common subsumers and most specific concepts in a description
    logic with existential restrictions and terminological cycles. In: Proc. of IJCAI. pp.
    319–324. Morgan Kaufmann (2003)
 5. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description
    Logics. Cambride University Press (2017)
 6. Baader, F., Küsters, R., Molitor, R.: Computing least common subsumers in
    description logics with existential restrictions. In: Proc. of IJCAI. pp. 96–103.
    Morgan Kaufmann (1999)
 7. Baader, F., Sertkaya, B., Turhan, A.: Computing the least common subsumer w.r.t.
    a background terminology. J. Appl. Log. 5(3), 392–420 (2007)
 8. Bienvenu, M., Ortiz, M., Simkus, M., Xiao, G.: Tractable queries for lightweight
    description logics. In: Proc. of IJCAI. pp. 768–774 (2013)
 9. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for
    tractable query answering over ontologies. J. Web Semant. 14, 57–83 (2012)
10. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
    reasoning and efficient query answering in description logics: The DL-Lite family. J.
    of Automated Reasoning 39(3), 385–429 (2007)
11. ten Cate, B., Dalmau, V.: Conjunctive queries: Unique characterizations and exact
    learnability. In: Proc. of ICDT. LIPIcs, vol. 186, pp. 9:1–9:24. Schloss Dagstuhl -
    Leibniz-Zentrum für Informatik (2021)
12. Cohen, W.W., Hirsh, H.: The learnability of description logics with equality con-
    straints. Mach. Learn. 17(2-3), 169–199 (1994)
13. Cohen, W.W., Hirsh, H.: Learning the classic description logic: Theoretical and
    experimental results. In: Proc. of KR. pp. 121–133. Morgan Kaufmann (1994)
14. Frazier, M., Pitt, L.: Classic learning. Mach. Learn. 25(2-3), 151–193 (1996)
15. Funk, M., Jung, J.C., Lutz, C.: Actively learning concept and conjunctive queries
    under ELr -ontologies. In: Proc. of IJCAI (2021)
16. Funk, M., Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Learning description logic
    concepts: When can positive and negative examples be separated? In: Proc. of
    IJCAI. pp. 1682–1688 (2019)
17. Gutiérrez-Basulto, V., Jung, J.C., Sabellek, L.: Reverse engineering queries in
    ontology-enriched systems: The case of expressive Horn description logic ontologies.
    In: Proc. of IJCAI-ECAI. ijcai.org (2018)
18. Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Logical separability of incomplete data
    under ontologies. In: Proc. of KR. pp. 517–528 (2020)
19. Jung, J.C., Lutz, C., Wolter, F.: Least general generalizations in description logic:
    Verification and existence. In: Proc. of AAAI (2020)
20. Konev, B., Lutz, C., Ozaki, A., Wolter, F.: Exact learning of lightweight description
    logic ontologies. J. Mach. Learn. Res. 18(201), 1–63 (2018)
21. Konev, B., Ozaki, A., Wolter, F.: A model for learning description logic ontologies
    based on exact learning. In: Proc. of AAAI. pp. 1008–1015. AAAI Press (2016)
22. Lehmann, J., Hitzler, P.: Concept learning in description logics using refinement
    operators. Mach. Learn. 78, 203–250 (2010)
23. Lehmann, J., Völker, J.: Perspectives on Ontology Learning, Studies on the Semantic
    Web, vol. 18. IOS Press (2014)
24. Ortiz, M.: Ontology-mediated queries from examples: a glimpse at the DL-Lite case.
    In: Proc. of GCAI. pp. 1–14 (2019)
25. Ozaki, A.: Learning description logic ontologies: Five approaches. where do they
    stand? KI - Künstliche Intelligenz (2020)
26. Ozaki, A., Persia, C., Mazzullo, A.: Learning query inseparable ELH ontologies. In:
    Proc. of AAAI. pp. 2959–2966 (2020)
27. Sarker, M.K., Hitzler, P.: Efficient concept induction for description logics. In: Proc.
    of AAAI. pp. 3036–3043 (2019)
28. Zarrieß, B., Turhan, A.: Most specific generalizations w.r.t. general EL-TBoxes. In:
    Proc. of IJCAI. pp. 1191–1197 (2013)