Actively Learning ELI Queries under DL-Lite Ontologies Maurice Funk1 , Jean Christoph Jung2 , and Carsten Lutz1 1 Department of Computer Science, University of Bremen, Germany 2 Department of Computer Science, University of Hildesheim, Germany Abstract. We show that ELI queries (ELIQs) are learnable in polyno- mial time in the presence of a DL-Lite ontology O, in Angluin’s framework of active learning. When initially provided with a conjunctive query (CQ) that implies the target ELIQ under O (in the sense of query contain- ment), it suffices for the learner to only pose membership queries to the oracle, but no equivalence queries. The initial CQ can be obtained by a single equivalence query and is available ‘for free’ in case that O does not pose any disjointness constraints on concepts. Our main technical result is that every ELI concept has only polynomially many most specific subsumers w.r.t. a DL-Lite ontology, generalizing a recent result about homomorphism frontiers by ten Cate and Dalmau. 1 Introduction Constructing description logic (DL) concepts, ontologies, and queries can be challenging and costly, especially when logic expertise and domain knowledge are not in the same hands. This has prompted many approaches to learning such objects, including PAC learning [12,13,14], the construction of the least common subsumer (LCS) and the most specific concept (MSC) [4,6,7,19,28], and learning from data examples [16,18,22,23,27]. In recent years, there has been significant interest in applying Angluin’s framework of exact learning in a DL context where a learner interacts in a game-like fashion with an oracle [1,2]. In particular, the learner may be a DL expert and the oracle a collaborating domain expert. The main aim is then to find an algorithm that enables the learner to construct the target object in polynomial time based on queries that it poses to the oracle, even when the oracle is not able to answer the queries in the most informative way. The interest in exact learning in DLs started with an investigation of ontology learning in (the conference version of) [21], see also [20,26] and the survey [25]. This was recently complemented by studies of exactly learning DL concepts and queries: active learning of ELI concept queries (ELIQs) without ontologies is considered in [11] while [15] studies active learning of EL concept queries (ELQs), ELIQs, and restricted forms of conjunctive queries (CQs) in the presence of EL Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). and ELI ontologies. The purpose of the current paper is to initiate the study of actively learning concepts and queries under ontologies formulated in DL-Lite, a prominent family of DLs that is featured in the OWL 2 family of ontology languages [3]. Our main result is that ELIQs, which can be viewed both as queries and as concepts to be used in an ontology, can be learned under DL-Lite ontologies in polynomial time even when the oracle can pose only a very basic kind of query to the oracle. To make this precise, we introduce the exact learning framework in more detail. Learner and oracle both know and agree on the ontology O and the concept and role names that are available for constructing the target ELIQ qT which must be satisfiable w.r.t. O; we assume that this includes all concept and role names in O. In a membership query, the learner provides an ABox A and a candidate answer ā and asks whether A, O |= qT (ā); the oracle faithfully answers “yes” or “no”. In an equivalence query, the learner provides a hypothesis ELIQ qH and asks whether qH is equivalent to qT under O; the oracle answers “yes” or provides a counterexample, that is, an ABox A and tuple ā such that A, O |= qT (ā) and A, O 6|= qH (ā) (positive counterexample) or vice versa (negative counterexample). One is then interested in polynomial time learnability, that is, whether there is a learning algorithm that constructs qT (x̄), up to equivalence w.r.t. O, such that at any given time, the running time of the algorithm is bounded by a polynomial in the sizes of qT , of O, and of the largest counterexample given by the oracle so far. A weaker requirement is polynomial query learnability where only the sum of the sizes of the queries posed to the oracle up to the current time point has to be bounded by such a polynomial. We can now state our main result more precisely. With DL-Lite, we generally refer to the basic member of the DL-Lite family [10] that admits inclusions between basic concepts, concept disjointness constraints, and role disjointness constraints; DL-Lite− then means the fragment without concept disjointness. Our main result is that ELIQs are polynomial time learnable using only membership queries under DL-Lite − ontologies, and that the same is true for DL-Lite ontologies provided 0 0 that we have available an initial CQ qH such that qH ⊆O qT , that is, the answers 0 to qH w.r.t. O are a subset of those to qT w.r.t. O on every ABox A. Such a 0 qH can be obtained by a single initial equivalence query. We also observe that polynomial learnability using only membership queries fails in the presence of concept disjointness. Let us mention two interesting perspectives on our results. First, they gen- eralize the results in [11] about polynomial time learnability of ELIQs to the case with DL-Lite ontologies, in fact borrowing and extending crucial techniques from [11]. And second, the results in [15] demonstrate that inverse roles pose a significant challenge to polynomial time learnability. More precisely, [15] brings forward a polynomial time learning algorithm for symmetry-free ELIQs under EL ontologies where symmetry-free means that there is no subconcept of the form ∃r.(C u ∃r− .D) with r a role name. It is not clear at all how to generalize that algorithm to unrestricted ELIQs. Moreover, it is proved in [15] that ELQs are not polynomial query learnable under ELI ontologies. Thus, inverse roles tend to be challenging both in the query and in the ontology. In contrast, the result in this paper need not impose any restriction on the use of inverse roles. It seems relevant to recall here that DL-Lite is a fragment of ELI. A core technical result underlying our approach is that the frontier of an ELIQ q w.r.t. a DL-Lite − ontology is only of polynomial size and can be computed in polynomial time, generalizing a similar result from [11] that does not encompass ontologies. More precisely, a frontier of an ELIQ q w.r.t. a DL-Lite ontology O is a set of ELIQs F such that q ⊆O qF and qF 6⊆O q for all qF ∈ F and for all ELIQs q 0 with q ⊆O q 0 and q 0 6⊆O q, there is a qF ∈ F such that qF ⊆O q 0 . Note that if one thinks of q as an ELI concept, then F is the set of most specific subsumers of q w.r.t. O.1 Apart from being essential for our learning algorithm, there is another reason for why one may be interested in the frontier. In fact, it is observed in [11] that if an ELIQ q has a frontier of polynomial size, then q can be characterized up to equivalence by polynomially many data examples. Such an example takes the form (A, a) and is a positive example if A |= q(a) and a negative example otherwise. The same is true in the presence of ontologies. Proof details are given in the appendix of the long version of this paper, available at http://www.informatik.uni-bremen.de/tdki/research/papers. html. 2 Preliminaries Ontologies and ABoxes. Let NC , NR , and NI be countably infinite sets of concept, role and individual names. A role R is a role name r or the inverse r− of a role name. A basic concept B is >, a concept name A, or of the form ∃R, R a role. A DL-Lite ontology O is a finite set of (basic) concept inclusions B1 v B2 , concept disjointness constraints B1 u B2 v ⊥, and role disjointness constraints R1 u R2 v ⊥. A DL-Lite− ontology is a DL-Lite ontology that contains no concept disjointness constraints. A DL-Lite ontology is in normal form if all concept inclusions in it are of the form A v B or B v A with A a concept name or > and B a basic concept. An ABox A is a finite set of concept assertions A(a) and role assertions r(a, b) with A a concept name or >, r a role name, and a, b individual names. We use ind(A) to denote the set of individual names used in A. The semantics is defined as usual in terms of interpretations I, which we define to be a (possibly infinite and) non-empty set of concept and role assertions. We use ∆I to denote the set of individual names in I, define AI = {a | A(a) ∈ I} for all A ∈ NC , and rI = {(a, b) | r(a, b) ∈ I} and (r− )I = {(b, a) | r(a, b) ∈ I} for all r ∈ NR . We further set >I = ∆I and (∃R)I = {a ∈ ∆I | ∃(a, b) ∈ RI } for all roles R. This definition of interpretation is slightly different from the usual one, but equivalent;2 its virtue is uniformity as every ABox is a finite interpretation. An interpretation I satisfies a concept inclusion B1 v B2 if B1I ⊆ B2I , a concept 1 One could equivalently say that F is the set of LCSs of a single concept which strictly generalize that concept. 2 This depends on admitting assertions >(a) in ABoxes. disjointness constraint B1 u B2 v ⊥ if B1I ∩ B2I = ∅, and a role disjointness constraint R1 u R2 v ⊥ if R1I ∩ R2I = ∅. An interpretation is a model of a DL-Lite ontology or an ABox if it satisfies all concept inclusions, disjointness constraints and assertions in it. We write O |= B1 v B2 if every model of O satisfies the basic concept inclusion B1 v B2 and A, O |= B(a) if every model of A and O satisfies the concept assertion B(a). An ABox A is satisfiable w.r.t. a DL-Lite ontology O if A and O have a common model. A signature is a set of concept and role names, uniformly referred to as symbols. For any syntactic object O such as an ontology or an ABox, we use sig(O) to denote the symbols used in O and ||O|| to denote the size of O, that is, the length of a representation of O as a word in a suitable alphabet. Conjunctive Queries, ELIQs, Homomorphisms. A conjunctive query (CQ) takes the form q(x̄) ← φ(x̄, ȳ) where φ is a conjunction of concept atoms A(x), with A ∈ NC , and role atoms r(x, y), with r ∈ NR over variables x, y ∈ x̄ ∪ ȳ. We may write r− (x, y) in place of r(y, x). We refer to the variables in x̄ as answer variables and to the variables in ȳ as quantified variables. We use var(q) to denote the set of all variables in x̄ and ȳ. We may view a CQ q(x̄) as a set of atoms when convenient and write r(x, y) ∈ q(x̄) to mean that r(x, y) occurs in the conjunction φ. A conjunctive query q(x̄) is unary if q(x̄) only has a single answer variable. A cycle in a CQ q is a sequence R1 (x1 , x2 ), . . . , Rn (xn , x1 ) of distinct role atoms in q such that x1 , . . . xn are distinct. An ELIQ is a unary CQ q that does not contain a cycle and such that the undirected graph Gq = (var(q), {{y, z} | r(y, z) ∈ q}) is connected. Note that every ELIQ can be seen as an ELI concept in a straightforward way and vice versa; see [5] for a definition of ELI concepts. We use Aq to denote the ABox obtained from q by viewing variables as individuals and atoms as assertions. A CQ q is satisfiable w.r.t. a DL-Lite ontology O if Aq is satisfiable w.r.t. O. A homomorphism h from interpretation I1 to interpretation I2 is a mapping from ∆I1 to ∆I2 such that d ∈ AI1 implies h(d) ∈ AI2 and (d, e) ∈ rI1 implies (h(d), h(e)) ∈ rI2 . We use img(h) to denote the set {e ∈ ∆I2 | ∃d ∈ ∆I1 : h(d) = e}. For di ∈ ∆Ii , i ∈ {1, 2}, we write I1 , d1 → I2 , d2 if there is a homomorphism h from I1 to I2 with h(d1 ) = d2 . With a homomorphism from a CQ q to an interpretation I, we mean a homomorphism from Aq to I. For a unary CQ q(x), we write q(x) → (I, d) if there is a homomorphism h from q to I with h(x) = d. Let q(x) be a unary CQ and I an interpretation. An element d ∈ ∆I is an answer to q in I, written I |= q(d), if q(x) → (I, d). Now let O be a DL-Lite ontology and A an ABox. An individual a ∈ ind(A) is an answer to q on A w.r.t. O, written A, O |= q(a), if a is an answer to q in every model of O and A. For q1 and q2 unary CQs and O a DL-Lite ontology, we say that q1 is contained in q2 w.r.t. O, written q1 ⊆O q2 if for all ABoxes A and a ∈ ind(A), A, O |= q1 (a) implies A, O |= q2 (a). We call q1 and q2 equivalent w.r.t. O, written q1 ≡O q2 , if q1 ⊆O q2 and q2 ⊆O q1 . O-saturatedness and O-minimality. The following two technical notions are used throughout this paper. A CQ q is O-saturated, with O a DL-Lite ontology, if Aq , O |= A(y) implies A(y) ∈ q for all y ∈ var(q) and A ∈ NC . It is O-minimal if there is no S ( var(q) such that q ≡O q|S with q|S the restriction of q to the atoms that only contain variables in S. The following is a consequence of the fact that acyclic CQs over DL-Lite ontologies can be answered in polynomial time [8]. Lemma 1. Given an ELIQ q and a DL-Lite ontology O, we can find in polyno- mial time an O-saturated and O-minimal ELIQ q 0 with q ≡O q 0 . Universal Model. Let O be a DL-Lite ontology and A an ABox that is satisfiable w.r.t. O. A trace for A and O is a sequence t = aR1 . . . Rn , n ≥ 0, such that a ∈ ind(A), the basic concepts ∃R1 , . . . , ∃Rn occur in O, A, O |= ∃R1 (a), and O |= ∃Ri− v ∃Ri+1 for 1 ≤ i < n. Let T denote the set of all traces for A and O. Then the universal model of A and O is UA,O = A ∪ {A(a) | A, O |= A(a)} ∪ {A(tR) | tR ∈ T and O |= ∃R− v A} ∪ {R(t, tR) | tR ∈ T}. For brevity, we write Uq,O instead of UAq ,O for any conjunctive query q. 3 Computing Frontiers in Polynomial Time We show that for every ELIQ q and DL-Lite ontology O such that q is satisfiable w.r.t. O, there is a frontier of polynomial size that can be computed in polynomial time. This generalizes a result from [11] for the case without ontologies. We also observe that the same is not true when DL-Lite is extended with conjunction, and that ELIQs can be characterized up to equivalence by polynomially many data examples in the presence of DL-Lite ontologies. Definition 1. A frontier of an ELIQ q w.r.t. a DL-Lite ontology O is a finite set of ELIQs F such that 1. q ⊆O qF for all qF ∈ F; 2. qF 6⊆O q for all qF ∈ F; 3. for all ELIQs q 0 with q ⊆O q 0 6⊆O q, there is a qF ∈ F with qF ⊆O q 0 . It is not hard to see that frontiers that are minimal w.r.t. set inclusion are unique up to equivalence of the ELIQs in them, that is, if F1 and F2 are minimal frontiers of q w.r.t. O, then for every qF ∈ F1 , there is a qF0 ∈ F2 such that qF ≡O qF0 , and vice versa. The following is the main result of this section. Theorem 1. Let O be a DL-Lite ontology and q an ELIQ that is satisfiable w.r.t. O. Then a frontier of q w.r.t. O can be computed in polynomial time. For proving Theorem 1, we first observe that we can concentrate on ontologies that are in normal form. Lemma 2. For every DL-Lite ontology O, we can construct in polynomial time a DL-Lite ontology O0 in normal form such that for every ELIQ q, a frontier of q w.r.t. O can be constructed in polynomial time given a frontier of q w.r.t. O0 . The normalization of O introduces fresh concept names X∃R that represent the basic concept ∃R. In the proof of Lemma 2, we construct the frontier of q w.r.t. O0 by replacing atoms X∃R (x) with atoms R(x, y), y a fresh variable. Now we start with the proof of Theorem 1. Let O and q(x) be as in the formulation of the theorem, O in normal form. By Lemma 1 and the fact that equivalent queries have the same frontiers, we can assume that q is O-saturated and O-minimal. To construct a frontier of q w.r.t. O, we consider all ways to weaken q in a minimal way where weakening means to construct from q an ELIQ q 0 such that q ⊆O q 0 and q 0 6⊆O q. We start with some notation. We view the answer variable x of q as the root of the undirected tree Gq , thus imposing a direction on this tree which allows us to use notions for directed trees, such as successor, predecessor, and leaf, for the variables in q. Note that the imposed direction is unrelated to the direction of (inverse) roles in atoms in q. For every z ∈ var(q), we use qz to denote the ELIQ obtained from q by taking the subtree of Gq rooted at z and making z the answer variable. For each variable y ∈ var(q), we define a set Γy of atoms in q that mention y and represent options that we have for weakening q. Formally, Γy contains – all role atoms R(y, z) ∈ qy and – all concept atoms A(y) ∈ q such that (i) there is no B(y) ∈ q with O |= B v A and O 6|= A v B and (ii) there is no R(y, z) ∈ q with O |= ∃R v A. Informally, we can weaken q by choosing a y ∈ var(q) and then removing a concept atom A(y) ∈ Γy or a role atom R(y, z) ∈ Γy as well as the subtree of q rooted at variable z. However, such removals alone are not enough to obtain a minimal weakening of q and must be accompanied by certain additions, as detailed below. Note that Conditions (i) and (ii) are needed to ensure that removing A(y) indeed weakens q, that is, the resulting query is not equivalent to q w.r.t. O. We define a set of ELIQs Fq (y) for every variable y ∈ var(q), by induction on the codepth of y in q. The set Fq (x) ultimately obtained is a frontier of q w.r.t. O. For every y ∈ var(q), set Fq (y) = {q α (y) | α ∈ Γy } where q α (y) is constructed by starting with qy (y) and then doing the following: 1. if α = A(y), remove all atoms B(y) with O |= A ≡ B (including α); 2. if α = R(y, z), remove α and all atoms of qz . For each q β (z) ∈ Fq (z), add a disjoint copy q̃ β of q β and the role atom R(y, z̃) where z̃ is the copy of z in q̃ β ; 3. for each S(y, z1 ) ∈ qy with S(y, z1 ) 6= α, add a disjoint copy q 0 of q and the role atom S(y 0 , z1 ) where y 0 is the copy of y in q 0 ; q ŷ q ŷ 0 q R3 R3 y y0 y y0 R1 R2 R1 R2 R R1 R R R2 z1 z ... yR2 z1 z0 z0 yR2 ... qz1 qz qz1 β1 q q βn Fig. 1. Construction of q R(y,z) (y) from q 4. for each S(y, yS) ∈ Uqy ,O with yS a trace such that O 6|= ∃S v A if α = A(y), add a disjoint copy q 0 of q and the role atoms S(y, z1 ), S(y 0 , z1 ), where y 0 is the copy of y in q 0 and z1 is a fresh variable name; 5. if there is a S(ŷ, y) ∈ q with ŷ the predecessor of y, then add a disjoint copy q 0 of q and the role atom S(ŷ 0 , y) where ŷ 0 is the copy of ŷ in q 0 . Note that Step 2 is the inductive step, where every subtree rooted at a successor z of y is replaced with all ELIQs from Fq (z). The construction is illustrated in Figure 1 which on the left side shows a variable y in q with predecessor ŷ, two successors z1 and z in q, and one additional successor yR2 (a trace) in the universal model Uq,O . On the right side, it displays the ELIQ q R(y,z) (y) ∈ Fq (y), assuming that Fq (z) = {q β1 , . . . , q βn }. We remark that for y = 6 x, the set Fq (y) is not necessarily a frontier of the ELIQ qy (y) because the part of q that is outside of subtree qy is taken into account in the construction of Fq (y), in Steps 3-5. Our construction generalizes the construction of frontiers of ELIQs without ontologies given in [11].3 It indeed yields a frontier. Lemma 3. Fq (x) is a frontier of q(x) w.r.t. O. We next observe that the obtained frontier Fq (x) is of polynomial size. It is then clear that it can be computed in polynomial time as described above since subsumption between basic concepts in DL-Lite can be decided in polynomial time [10]. 3 There is actually a small omission in [11] as the counterpart of our Step 5 is missing. X Lemma 4. |var(q α )| ≤ |sig(q)| · |var(q)|3 · (|var(q)| + 1) · (||O|| + 1). q α (x)∈Fq (x) We next observe that adding conjunction to DL-Lite destroys polynomial frontiers and thus Theorem 1 does not extend to DL-Lite horn ontologies [3]. In fact, this already holds for very simple queries and ontologies, implying that also for other DLs that support conjunction such as EL, polynomial frontiers are elusive. A conjunction of atomic queries (AQ∧ ) is a unary CQ of the form q(x) ← A1 (x) ∧ · · · ∧ An (x) and a conjunctive ontology is a set of CIs of the form A1 u · · · u An v A where A1 , . . . , An and A are concept names. Theorem 2. There are families of AQ∧ s q1 , q2 , . . . and conjunctive ontologies O1 , O2 , . . . such that for all n ≥ 1, any frontier of qn w.r.t. On has size at least 2n . The proof is a variation of a proof given in [15] showing that AQ∧ s are not polynomial time learnable under conjunctive ontologies. It is based on the following AQ∧ s and ontologies: qn (x) ← A1 (x) ∧ A01 (x) ∧ · · · ∧ An (x) ∧ A0n (x) On = {Ai u A0i v A1 u A01 u · · · u An u A0n | 1 ≤ i ≤ n}. In fact, the minimal frontier contains all AQ∧ s that contain exactly one of the conjuncts Ai and A0i , for 1 ≤ i ≤ n. Observe that in the proof of Theorem 1, there are only polynomially many choices for weakening an ELIQ, represented by the sets Γy , y ∈ var(q). In contrast, weakening the AQ∧ qn w.r.t. ontology On in a minimal way requires to choose for each i ∈ {1, . . . , n} whether Ai or A0i should be removed, and there are exponentially many such choices. To close this section, we briefly consider the unique characterization of ELIQs in terms of polynomially many data examples. A data example takes the form (A, a) where A is an ABox and a ∈ ind(A). Let E + , E − be finite sets of data examples. An ELIQ q fits (E + , E − ) w.r.t. a DL-Lite ontology O if (A, a) ∈ E + implies A, O |= q(a) and (A, a) ∈ E − implies A, O 6|= q(a). We say that (E + , E − ) uniquely characterizes q w.r.t. O if q fits (E + , E − ) and every ELIQ q 0 that also fits (E + , E − ) satisfies q ≡O q 0 . The following is a consequence of Theorem 1. Theorem 3. For every DL-Lite ontology O and every ELIQ q that is satisfiable w.r.t. O, we can compute in polynomial time data examples (E + , E − ) that uniquely characterize q w.r.t. O. Note that unique characterizability is closely related to the reverse engineering of CQs, also called query-by-example and studied in a DL context in [17,24]. 4 Learning ELIQs We use the results from the previous section to show that ELIQs are polynomial time learnable using only membership queries under DL-Lite ontologies if the 0 0 learner is provided with an initial CQ qH such that qH is satisfiable w.r.t. the 0 ontology O and qH ⊆O qT where qT is the target ELIQ. Such a q can be constructed in polynomial time if O is formulated in DL-Lite − . Otherwise, it can be produced by a single initial equivalence query with an ELIQ that is not satisfiable w.r.t. O, forcing the learner to provide a positive counterexample (A, a) 0 from which we can extract the desired qH . Before proving these positive results, however, we first observe that polynomial time learning using only membership queries (but no initial equivalence query) is not possible when O contains concept disjointness constraints. A disjointness ontology is a DL-Lite ontology that only consists of concept disjointness constraints. Theorem 4. AQ∧ s are not polynomial query learnable under disjointness on- tologies using only membership queries. The proof of Theorem 4 is a variation of that of Theorem 2. We next present the main results of this section. Theorem 5. 1. ELIQs are polynomial time learnable under DL-Lite ontologies using only membership queries and a single initial equivalence query. 2. ELIQs are polynomial time learnable under DL-Lite− ontologies using only membership queries. Throughout this section, we may assume the ontology to be in normal form. Lemma 5. If ELIQs are polynomial time learnable under DL-Lite ontologies in normal form using membership queries and a single initial equivalence query, then this is also true for unrestricted DL-Lite ontologies. The same holds for DL-Lite− ontologies without the initial equivalence query. The idea to prove Lemma 5 is to convert the given ontology into normal form and then run the learning algorithm for ontologies in normal form. Since that algorithm may pose membership queries and equivalence queries that involve fresh concept names introduced during normalization, we need to replace those concept names as described after Lemma 2 before forwarding the query to the oracle (which uses the original non-normalized ontology). We prove Points 1 and 2 of Theorem 5 simultaneously. Let O be a DL-Lite ontology and Σ a finite signature that contains all symbols in O, both known to the learner and the oracle. Further let qT (y) be the target ELIQ known to the oracle, formulated in signature Σ and satisfiable w.r.t. O. The algorithm that enables the learner to learn qT in polynomial time is displayed as Algorithm 1. It 0 0 takes as input a CQ qH that is satisfiable w.r.t. O and satisfies qH ⊆O qT . Note 0 that qH need not be an ELIQ. The algorithm then constructs and repeatedly updates a hypothesis ELIQ qH while maintaining that qH ⊆O qT . The initial call 0 to subroutine treeify yields an ELIQ qH with qH ⊆O qH ⊆O qT to be used as the first hypothesis. The algorithm then iteratively generalizes qH by constructing the frontier FqH of qH w.r.t. O in polynomial time and choosing from it a new ELIQ qH with qH ⊆O qT . Additionally, the algorithm applies the minimize subroutine Algorithm 1 Algorithm for learning ELIQs under DL-Lite ontologies 0 0 Input A DL-Lite ontology O and a CQ qH satisfiable w.r.t. O such that qH ⊆O qT Output An ELIQ qH such that qH ≡O qT 0 qH := treeify(qH ) while there is a qF ∈ FqH with qF ⊆O qT do qH := minimize(qF ) end while return qH to ensure that the new qH is O-minimal and to avoid an excessive blowup while iterating in the while loop. Before we detail the subroutines treeify and minimize, we explain how to 0 obtain the argument qH to the algorithm. Suppose first that O contains neither concept disjointness constraints nor role disjointness constraints. Then 0 qH (x) = {A(x) | A ∈ Σ ∩ NC } ∪ {r(x, x) | r ∈ Σ ∩ NR }. (1) If O contains concept or role disjointness constraints, then we cannot use the 0 above qH because it is not satisfiable w.r.t. O. If, however, O contains only role 0 disjointness constraints, then we can still find a suitable qH . Let R be the set of all r ∈ Σ ∩ NR such that ∃r is satisfiable w.r.t. O, introduce four variables x0r , x1r , x0r− , x1r− for all r ∈ R, and fix a linear order  on R0 = R ∪ {r− | r ∈ R}. Fix any variable x := xiR . Then, qH 0 is given by 0 qH (x) = {A(xiR ) | A ∈ Σ ∩ NC , R ∈ R0 , i ∈ {0, 1}} ∪ {R(xiS , xiR ) | R, S ∈ R0 , S  R, i ∈ {0, 1}} ∪ 0 {R(xiS , x1−i R ) | R, S ∈ R , S 6 R, i ∈ {0, 1}}. Observe that every variable has an R-successor for every (satisfiable) role R. Therefore, there is a homomorphism from every satisfiable target ELIQ qT to 0 0 qH , which shows that indeed qH ⊆O q T . In the remaining case when O contains a concept disjointness constraint A u B v ⊥, we pose the ELIQ A(x) ∧ B(x) as an equivalence query to the oracle. Since the target query is satisfiable w.r.t. O, the oracle returns a positive 0 counterexample (A, a). The desired query qH is (A, a) viewed as a CQ with answer 0 variable a. In the algorithm, we may w.l.o.g. assume that qH is O-saturated due to Lemma 1. The minimize subroutine. The subroutine takes as input a unary CQ q(x) that is O-saturated, satisfiable w.r.t. O, and satisfies q ⊆O qT . It computes an O-minimal unary CQ q 0 with q ⊆O q 0 ⊆O qT using membership queries. This is done by exhaustively applying the following operation: Remove successor. Choose a role atom r(x1 , x2 ) ∈ q and let q − be the restriction of q \ {r(x1 , x2 )} to the atoms that only contain variables which are reachable from x in Gq\{r(x1 ,x2 )} . Pose the membership query Aq− , O |= qT (x). If the response is positive, continue with q − in place of q. This operation preserves O-saturation and satisfiability w.r.t. O. Since the operation is applied at most once to each atom in q, the running time and number of membership queries is polynomial in |var(q)| + |Σ|. The following lemma summarizes important properties of q 0 = minimize(q) that we need to show correctness and polynomial running time of Algorithm 1. Lemma 6. Let q be a unary CQ that is O-saturated and satisfiable w.r.t. O such that q ⊆ qT for the target query qT (y), and let q 0 = minimize(q). Then 1. q ⊆O q 0 and q 0 ⊆O qT ; 2. var(q 0 ) ⊆ img(h) for every homomorphism h from qT to Uq0 ,O with h(y) = x (and consequently |var(q 0 )| ≤ |var(qT )|); 3. q 0 is O-minimal. When q is an ELIQ, minimize(q) is also an ELIQ. We define minimize on unrestricted CQs since it is applied to non-ELIQs as part of the treeify subroutine, described next. The treeify subroutine. The subroutine takes as input a unary CQ q(x) that is O-saturated, satisfiable w.r.t. O, and satisfies q ⊆O qT . It computes an ELIQ q 0 = treeify(q) with q ⊆O q 0 ⊆O qT by repeatedly increasing the length of cycles in q and posing membership queries; similar constructions are used in in [21,15]. Formally, treeify constructs a sequence of CQs p1 , p2 , . . . starting with p1 = minimize(q) and then taking pi+1 = minimize(p0i ) where p0i is obtained from pi by doubling the length of every cycle. More precisely, p0i is the result of the following operation. Double cycle. Choose a cycle R1 (x1 , x2 ), . . . , Rn (xn , x1 ) in pi and let p0i be the CQ obtained as follows: start with pi , introduce copies x01 , . . . , x0n of x1 , . . . , xn , and – remove all atoms R(xn , x1 ); – add A(x0j ) if A(xj ) ∈ pi with 1 ≤ j ≤ n; – add R(x0j , z) if R(xj , z) ∈ pi with 1 ≤ j ≤ n and z ∈ / {x1 , . . . , xn }; – add R(x0j , x0k ) if R(xj , xk ) ∈ pi with 1 ≤ j, k ≤ n and {j, k} = 6 {1, n}; – add R(xn , x01 ) and R(x0n , x1 ) if R(xn , x1 ) ∈ pi . Once pi contains no more cycles, treeify stops and returns pi . The next lemma states the central properties of this construction. Lemma 7. For all i ≥ 1, 1. pi is O-saturated and satisfiable w.r.t. O; 2. pi ⊆O qT ; 3. pi ⊆O pi+1 and |var(pi+1 )| > |var(pi )|. Point 3 of Lemma 7 and the ‘consequently’ part of Point 2 of Lemma 6 imply that treeify terminates and thus eliminates all cycles in q while maintaining q ⊆O qT . The next lemma makes this precise. Lemma 8. Let q be a CQ that is O-saturated, satisfiable w.r.t. O, and satisfies q ⊆O qT . Then q 0 = treeify(q) is an ELIQ that can be computed in time polynomial in |var(qT )| + ||q|| using membership queries. Returning to Algorithm 1, let q1 , q2 , . . . be the sequence of ELIQs that are assigned to qH during a run of the learning algorithm. Using the properties of frontiers, minimize, and treeify we can now show that the hypotheses approximate the target query in an increasingly closer way. Lemma 9. For all i ≥ 1, 1. qi ⊆O qT ; 2. qi ⊆O qi+1 and qi+1 6⊆O qi ; 3. var(qi ) ⊆ img(h) for every homomorphism h from qi+1 to Uqi ,O with h(x) = x. Point 3 of Lemma 9 implies that |var(qi+1 )| ≥ |var(qi )|, and this can be used to show that the while loop in Algorithm 1 terminates after a polynomial number of iterations, arriving at a hypothesis qH ⊆O qT such that there is no qF ∈ FqH with qF ⊆O qT , that is, qH ≡O qT . Lemma 10. qn ≡ qT for some n ≤ p(|var(qT )| + |Σ|), p a polynomial. It follows from Lemma 10, the ‘consequently’ part of Point 2 of Lemma 6, and Lemma 8 that Algorithm 1 is a polynomial time learning algorithm, thus completing the proof of Theorem 5. 5 Outlook As future work, we are going to consider extensions of the setup studied in this paper. We are optimistic that the approach presented here can be extended to DL-Lite ontologies with role inclusions, yielding polynomial size frontiers and polynomial query learnability also for that logic (but not polynomial time learnability since subsumption becomes NP-complete). Natural next steps could then be to replace DL-Lite with linear tuple-generating dependencies (TGDs) [9] or with DL-Lite krom ontologies [3]. In contrast, it is clear from Theorem 2 and the results in [15] that our results do not extend to DL-Lite horn . It is not ruled out, however, that ELIQs can be learned in polynomial time w.r.t. DL-Lite horn ontologies when both membership and equivalence queries can be used. Another natural question is whether CQs can be learned in polynomial time in the presence of DL-Lite ontologies. It is known that this is not possible with only membership queries even without ontologies [11], but it might be possible with both membership and equivalence queries. Acknowledgement. Carsten Lutz was supported by DFG CRC 1320 EASE. References 1. Angluin, D.: Learning regular sets from queries and counterexamples. Inf. Comput. 75(2), 87–106 (1987) 2. Angluin, D.: Queries and concept learning. Mach. Learn. 2(4), 319–342 (1987) 3. Artale, A., Calvanese, D., Kontchakov, R., Zakharyaschev, M.: The DL-Lite family and relations. J. of Artifical Intelligence Research 36, 1–69 (2009) 4. Baader, F.: Least common subsumers and most specific concepts in a description logic with existential restrictions and terminological cycles. In: Proc. of IJCAI. pp. 319–324. Morgan Kaufmann (2003) 5. Baader, F., Horrocks, I., Lutz, C., Sattler, U.: An Introduction to Description Logics. Cambride University Press (2017) 6. Baader, F., Küsters, R., Molitor, R.: Computing least common subsumers in description logics with existential restrictions. In: Proc. of IJCAI. pp. 96–103. Morgan Kaufmann (1999) 7. Baader, F., Sertkaya, B., Turhan, A.: Computing the least common subsumer w.r.t. a background terminology. J. Appl. Log. 5(3), 392–420 (2007) 8. Bienvenu, M., Ortiz, M., Simkus, M., Xiao, G.: Tractable queries for lightweight description logics. In: Proc. of IJCAI. pp. 768–774 (2013) 9. Calı̀, A., Gottlob, G., Lukasiewicz, T.: A general datalog-based framework for tractable query answering over ontologies. J. Web Semant. 14, 57–83 (2012) 10. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning 39(3), 385–429 (2007) 11. ten Cate, B., Dalmau, V.: Conjunctive queries: Unique characterizations and exact learnability. In: Proc. of ICDT. LIPIcs, vol. 186, pp. 9:1–9:24. Schloss Dagstuhl - Leibniz-Zentrum für Informatik (2021) 12. Cohen, W.W., Hirsh, H.: The learnability of description logics with equality con- straints. Mach. Learn. 17(2-3), 169–199 (1994) 13. Cohen, W.W., Hirsh, H.: Learning the classic description logic: Theoretical and experimental results. In: Proc. of KR. pp. 121–133. Morgan Kaufmann (1994) 14. Frazier, M., Pitt, L.: Classic learning. Mach. Learn. 25(2-3), 151–193 (1996) 15. Funk, M., Jung, J.C., Lutz, C.: Actively learning concept and conjunctive queries under ELr -ontologies. In: Proc. of IJCAI (2021) 16. Funk, M., Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Learning description logic concepts: When can positive and negative examples be separated? In: Proc. of IJCAI. pp. 1682–1688 (2019) 17. Gutiérrez-Basulto, V., Jung, J.C., Sabellek, L.: Reverse engineering queries in ontology-enriched systems: The case of expressive Horn description logic ontologies. In: Proc. of IJCAI-ECAI. ijcai.org (2018) 18. Jung, J.C., Lutz, C., Pulcini, H., Wolter, F.: Logical separability of incomplete data under ontologies. In: Proc. of KR. pp. 517–528 (2020) 19. Jung, J.C., Lutz, C., Wolter, F.: Least general generalizations in description logic: Verification and existence. In: Proc. of AAAI (2020) 20. Konev, B., Lutz, C., Ozaki, A., Wolter, F.: Exact learning of lightweight description logic ontologies. J. Mach. Learn. Res. 18(201), 1–63 (2018) 21. Konev, B., Ozaki, A., Wolter, F.: A model for learning description logic ontologies based on exact learning. In: Proc. of AAAI. pp. 1008–1015. AAAI Press (2016) 22. Lehmann, J., Hitzler, P.: Concept learning in description logics using refinement operators. Mach. Learn. 78, 203–250 (2010) 23. Lehmann, J., Völker, J.: Perspectives on Ontology Learning, Studies on the Semantic Web, vol. 18. IOS Press (2014) 24. Ortiz, M.: Ontology-mediated queries from examples: a glimpse at the DL-Lite case. In: Proc. of GCAI. pp. 1–14 (2019) 25. Ozaki, A.: Learning description logic ontologies: Five approaches. where do they stand? KI - Künstliche Intelligenz (2020) 26. Ozaki, A., Persia, C., Mazzullo, A.: Learning query inseparable ELH ontologies. In: Proc. of AAAI. pp. 2959–2966 (2020) 27. Sarker, M.K., Hitzler, P.: Efficient concept induction for description logics. In: Proc. of AAAI. pp. 3036–3043 (2019) 28. Zarrieß, B., Turhan, A.: Most specific generalizations w.r.t. general EL-TBoxes. In: Proc. of IJCAI. pp. 1191–1197 (2013)