Learning Description Logic Concepts: When can Positive and Negative Examples be Separated?

An important challenge for adopting ontologies in practical applications is the knowledge acquisition bottleneck, that is, the significant time and effort it takes to build the required ontologies. As a promising approach to help overcome this difficulty, the varied field of ontology learning has received a lot of attention in the last two decades; see [15] for a recent overview. A prominent line of research within ontology learning is concerned with learning description logic (DL) concepts from positive and negative examples, given an already available DL ontology that contains background knowledge [12,14,16,20,7]. Applications include the support of ontology development and the construction of concept-based classifiers [4,18]. The precise formulation is as follows. Given a knowledge base (KB) K = (T, A) and designated sets of individuals P and N from A that serve as positive and negative examples, find a concept C formulated in a DL L_S that separates the positive from the negative examples, that is, K ⊨ C(a) for all a ∈ P and K ⊭ C(a) for all a ∈ N. In addition to separation, one also wants the learned concept C to generalize the positive examples in a meaningful way, classifying new examples accordingly. As a prominent system for DL concept learning, we mention DL Learner. It encompasses several learning algorithms that support a range of DLs, including expressive ones such as ALC and ALCQ, Horn DLs such as EL, and even full OWL 2 [5,4]. Like competing systems such as DL-Foil, YinYang, and pFOIL-DL [7,10,19], DL Learner uses carefully crafted refinement operators [1,13,14] along with various heuristics to learn concepts that generalize the examples as well as possible while avoiding overfitting. If possible, refinement operators are designed so that the resulting algorithm terminates on any input and is complete in the sense that whenever there is a concept that separates the positive and negative examples in the input, then such a concept is indeed learned.
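To make the separation condition concrete, the following is a minimal sketch under strong simplifying assumptions: the separation language is EL and the TBox is empty. In that special case the ABox, read as an interpretation, is a universal model, so K ⊨ C(a) can be checked by evaluating C directly on the ABox. The tuple encoding of concepts and the names `holds` and `separates` are hypothetical and chosen only for illustration; none of them come from [8] or from any of the systems mentioned above.

```python
# Illustrative sketch only: EL concepts over an ABox with an EMPTY TBox.
# Assumption: with no TBox, K |= C(a) holds exactly when a satisfies C
# in the ABox read as an interpretation.

# Hypothetical concept encoding (not taken from any library):
#   "T"                  the top concept
#   ("atom", "A")        concept name A
#   ("and", C, D)        conjunction of C and D
#   ("exists", "r", C)   existential restriction over role r


def holds(concepts, roles, concept, a):
    """Check whether individual a satisfies the EL concept in the ABox.

    concepts: set of pairs (A, a) for assertions A(a)
    roles:    set of triples (r, a, b) for assertions r(a, b)
    """
    if concept == "T":
        return True
    tag = concept[0]
    if tag == "atom":
        return (concept[1], a) in concepts
    if tag == "and":
        return (holds(concepts, roles, concept[1], a)
                and holds(concepts, roles, concept[2], a))
    if tag == "exists":
        _, role, filler = concept
        return any(r == role and x == a and holds(concepts, roles, filler, y)
                   for (r, x, y) in roles)
    raise ValueError(f"unknown concept constructor: {tag!r}")


def separates(concepts, roles, concept, positives, negatives):
    """True iff the concept holds at every positive and at no negative example."""
    return (all(holds(concepts, roles, concept, a) for a in positives)
            and not any(holds(concepts, roles, concept, b) for b in negatives))


# Toy data (invented for this example).
toy_concepts = {("Course", "c1"), ("Course", "c2")}
toy_roles = {("teaches", "ann", "c1"),
             ("teaches", "bob", "c2"),
             ("teaches", "eve", "x")}
teaches_course = ("exists", "teaches", ("atom", "Course"))
teaches_anything = ("exists", "teaches", "T")
```

On this toy data, `teaches_course` holds at `ann` and `bob` but not at `eve`, whereas `teaches_anything` holds at all three and therefore does not separate them. With a non-empty TBox the check would instead require entailment reasoning, which this sketch does not attempt.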

In the paper reported on in this abstract [8], we investigate the fundamental question of when a separating concept exists for a learning instance (K, P, N), considering the most popular choices of DLs for the TBox language L_T and the separation language L_S. Our main contributions are model-theoretic characterizations that give important insight into when this is the case and a precise analysis of the computational complexity of separability viewed as a decision problem, which we refer to as (L_T, L_S) concept separability and as L concept separability when L_T = L_S = L. We also consider concept definability, the special case of concept separability in which P ∪ N comprises all individuals from A. All our complexity results actually hold for both separability and definability.

We believe that these problems are relevant both from a practical and from a theoretical perspective. In fact, complexity lower bounds for concept separability point to an inherent complexity that no practical system that aims for completeness can avoid. Undecidability results even mean that there can be no practical learning system that is both terminating and complete. From the viewpoint of machine learning theory, concept separability corresponds to the existence of a consistent hypothesis, the most fundamental problem for exploring the version space [9]. The associated decision problem is often called the consistency problem, and it is known to be closely related to PAC learnability [17,11].

We cover the expressive DLs ALC, ALCI, ALCQ, and ALCQI as well as the Horn DLs EL and ELI. For the former, overfitting is a risk because the disjunction operator available in such DLs enables the construction of separating concepts that do not provide the desired generalization of the positive examples. Nevertheless, most practical systems such as DL Learner work with expressive DLs and avoid overfitting by using appropriate refinement operators. Horn DLs do not admit disjunction and therefore are not prone to overfitting. On the other hand, they provide less separating power and, as we show, tend to incur higher computational (worst-case) cost for learning.

For expressive DLs, we start with initial characterizations in terms of (a form of) bisimulations and then proceed to more refined characterizations based on homomorphisms. Interestingly and unexpectedly to us, these establish a tight link between concept separability and the evaluation of ontology-mediated queries (OMQs) based on unions of 'rooted' conjunctive queries [6,2]. We use this link to obtain complexity upper and lower bounds. In fact, L concept separability is NExpTime-complete for L ∈ {ALC, ALCI, ALCQ} while ALCQI concept separability is only ExpTime-complete. This refers to combined complexity where all components of the learning instance are part of the input. We also study data complexity where the ABox is the only input while the TBox is fixed. In all expressive DLs above, concept separability is Σ^p_2-complete in data complexity. For Horn DLs, we establish characterizations based on products of universal models and simulations. Based on these, we show that (L_T, EL) concept separability is ExpTime-complete for L_T ∈ {EL, ELI}, both in combined complexity and in data complexity. We find the high data complexity of this problem rather remarkable. We also prove that ELI concept separability is undecidable, a result that came as a surprise to us.
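The role of products and simulations in the Horn case can be illustrated in a much simpler setting than the one treated in [8]. The sketch below assumes an empty TBox and the separation language EL, so that the finite ABox itself serves as a universal model. Under this assumption, a separating EL concept exists exactly when the product of the positive examples is not simulated by any negative example. This is a simplified illustration, not the construction from [8], which has to deal with TBoxes and their universal models. The function name `el_separable` and the data encoding are hypothetical.

```python
from itertools import product


def el_separable(concepts, roles, positives, negatives):
    """Decide EL separability for an ABox with an EMPTY TBox (assumption).

    concepts:  set of pairs (A, a) for assertions A(a)
    roles:     set of triples (r, a, b) for assertions r(a, b)
    positives: non-empty list of individuals
    negatives: list of individuals
    """
    role_names = {r for (r, _, _) in roles}
    individuals = ({a for (_, a) in concepts}
                   | {x for (_, x, _) in roles}
                   | {y for (_, _, y) in roles}
                   | set(positives) | set(negatives))

    def names(a):
        # Concept names asserted for individual a.
        return {A for (A, x) in concepts if x == a}

    def succ(a, r):
        # r-successors of individual a.
        return {y for (s, x, y) in roles if s == r and x == a}

    def tuple_names(t):
        # In the product, a tuple satisfies A iff every component does.
        return set.intersection(*(names(a) for a in t))

    def tuple_succ(t, r):
        # r-successors in the product are component-wise r-successors.
        return set(product(*(succ(a, r) for a in t)))

    # Explore only the part of the product reachable from the root tuple.
    root = tuple(positives)
    reachable, frontier = {root}, [root]
    while frontier:
        t = frontier.pop()
        for r in role_names:
            for u in tuple_succ(t, r):
                if u not in reachable:
                    reachable.add(u)
                    frontier.append(u)

    # Greatest simulation from the reachable product into the ABox,
    # computed by deleting violating pairs until nothing changes.
    sim = {(t, d) for t in reachable for d in individuals
           if tuple_names(t) <= names(d)}
    changed = True
    while changed:
        changed = False
        for (t, d) in list(sim):
            ok = all(any((u, e) in sim for e in succ(d, r))
                     for r in role_names for u in tuple_succ(t, r))
            if not ok:
                sim.discard((t, d))
                changed = True

    # Separable iff no negative example simulates the product root.
    return all((root, b) not in sim for b in negatives)
```

The reachable part of the product can be exponential in the number of positive examples, so the sketch is meant to convey the idea rather than to be efficient.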

We finally consider a mix of expressive DLs and Horn DLs, that is, (L_T, L_S) concept separability where L_T is any of the expressive DLs mentioned above and L_S is EL or ELI. These problems again turn out to be undecidable, thus ruling out terminating and complete learning systems. The proof exploits a connection to a certain version of query-based conservative extensions between ALC KBs [3].

We also consider a stronger version of concept separability from the literature, which requires that K ⊨ ¬C(a) for all a ∈ N, rather than only K ⊭ C(a).