Learning Query Inseparable ELH Ontologies (Extended Abstract) Ana Ozaki1 , Cosimo Persia1 , and Andrea Mazzullo2 1 University of Bergen, Norway {ana.ozaki,cosimo.persia}@uib.no 2 Free University of Bozen-Bolzano, Italy mazzullo@inf.unibz.it In this extended abstract we give an overview of our work [6], published at AAAI 2020, on learning description logic (DL) ontologies. We focus on learning ontologies formulated as ELH terminologies [2]. The problem of building ontologies is treated as a learning problem [4, 5, 3] in which a learner (the ontology engineer) attempts to acquire the necessary information to build an ontology by posing queries to a teacher (the domain expert), who knows about the domain of the ontology. This scenario can be modelled within Angluin’s exact learning model [1], where a learner interacts with a teacher in order to learn a hypothesis that reflects (by some notion of similarity, approximation or equivalence) an abstract target which, in this case, is a DL ontology. The learner can formulate questions of two kinds. In the first one, the learner chooses a set of assertions A and a query q expressible in a query language Q, and asks the teacher if the target ontology T , together with A, entails q. Example 1. Assume T = {PalmTree v TropicalTree}. Then, the learner can ask whether (T , {PalmTree(a)}) |= TropicalTree(a), receiving the answer ‘yes’. This is an instance of what is called a membership query in Angluin’s model (note that the term ‘query’ here refers to the kind of question posed by the learner, not to a query in a query language). In the second, called inseparability query, the learner checks if the hypothesis and the target entail the same queries in a query language Q on a fixed set of assertions A∗ . If so then the hypothesis and the target are query inseparable w.r.t. A∗ and Q. In case they are not (query) inseparable, the learner receives from the teacher a query in Q showing that the hypothesis and the target can still be distinguished w.r.t. A∗ and Q. We consider the cases in which Q is the language of atomic queries AQs, instance queries IQs, conjunctive queries CQs and rooted conjunctive queries CQr s (i.e., CQs where each variable is reachable from an individual name in its graph representation). Example 2. Consider T = {PalmTree v TropicalTree, TropicalTree v Tree}, and H = {PalmTree v TropicalTree, PalmTree v Tree}. Even though H is not equiva- lent to T , it is query inseparable w.r.t. A∗ = {PalmTree(a)} and AQs. However, if A∗ = {TropicalTree(a)} then T and H are not inseparable w.r.t A∗ and AQs. As illustrated in Example 2, query inseparability is a less strict notion than logical equivalence, investigated by Konev et al. [4, 5, 3]. We assume that the Copyright c 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). 2 A. Ozaki et al. Equivalent Inseparable PAC Framework AQ IQ CQr CQ AQ IQ CQr CQ AQ IQ CQr CQ EL(H)lhs 3 3 3 3 3 3 3 3 3 3 3 3 EL(H)rhs – 3 3 7 3 3 3 7 3 3 3 3 EL(H) – 7 ? 7 3 3 3 7 3 3 3 3 Table 1. Polynomial query learnability of learning frameworks. learner knows the vocabulary of the ontology about the domain of interest (but not how terms relate to each other and how they can be combined to form the ontology), and that the teacher answers accordingly. We study, in the model just described, the complexity of learning query inseparable ELH ontologies. The complexity of learning in this case depends on the interactions between the learner and the teacher in order to learn a query inseparable ontology, where posing a (membership or inseparability) query counts as one step of computation. We consider two notions of complexity. In the case of query inseparability, they depend also on the size of the fixed ABox, which is considered to be part of input. One is based on the number and size of queries posed by the learner. This is the query complexity. The other is based on the overall time spent by the learner—time complexity (for learning). Polynomial time learnability implies polynomial query learnability but not the converse. However, answering inseparability queries is more challenging for an expert than answering membership queries. For this reason, we also consider an approach for learning ontologies, based on Valiant’s probably approximately correct (PAC) model [7] that avoids inseparability queries. Table 1 summarises, in the first two columns, polynomial query learnability results for finding a logically equivalent hypothesis, and our query inseparability results. We also consider the cases where ontologies contain CIs with complex concepts only on the left hand side (EL(H)lhs ), or only on the right hand side (EL(H)rhs ). The symbol 3 means ‘polynomial query learnable’, while 7 means the negation of 3. The last column of the table shows that query inseparable ontologies are polynomial query learnable in the PAC model. Our main contributions are shaded in grey. In addition, we show that there is a case in which one can achieve PAC learnability in polynomial time but not polynomial query learnability with membership and inseparability queries (and thus also not polynomial time learnability). Query inseparability may not be preserved after that the predefined set of assertions is updated (Example 3). Example 3. If the fixed set of assertions in Example 2 is updated to A∗ = {TropicalTree(a)}, then H and T of Example 2 are not anymore query inseparable w.r.t. A∗ and AQ, because (T , A∗ ) |= Tree(a), while H does not. For this reason, we also study conditions under which query inseparability is guaranteed to be preserved. We first observe that, if every individual in the updated data instance A is bisimilar to an individual in the fixed data instance A∗ , then IQ-inseparability is preserved. This result does not hold if we just Learning Query Inseparable ELH Ontologies (Extended Abstract) 3 require the existence of a homomorphism from the old ABox to the updated one and vice-versa. We then show that there is an algorithm that can compute a hypothesis that would preserve IQ inseparability under an extended class of possible updates while still preserving IQ inseparability. The update in Example 3 is covered by this extended class of updates even though there is no bisimulation between the old and the new set of assertions. In case query inseparability is not preserved, the learning algorithm can pose further queries starting from the previously built hypothesis. References 1. Dana Angluin. Queries and concept learning. Machine Learning, 2(4):319–342, 1988. 2. Franz Baader, Diego Calvanese, Deborah McGuinness, Daniele Nardi, and Peter Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, second edition, 2007. 3. Mario Ricardo Cruz Duarte, Boris Konev, and Ana Ozaki. ExactLearner: A tool for exact learning of EL ontologies. In KR, pages 409–414, 2018. 4. Boris Konev, Carsten Lutz, Ana Ozaki, and Frank Wolter. Exact Learning of Lightweight Description Logic Ontologies. Journal of Machine Learning Research, 18(201):1–63, 2018. 5. Boris Konev, Ana Ozaki, and Frank Wolter. A model for learning description logic ontologies based on exact learning. In AAAI, pages 1008–1015, 2016. 6. Ana Ozaki, Cosimo Persia, and Andrea Mazzullo. Learning query inseparable ELH ontologies. In AAAI, pages 2959–2966. AAAI Press, 2020. 7. L. G. Valiant. A theory of the learnable. Commun. ACM, 27(11):1134–1142, 1984.