On the Eve of True Explainability for OWL Ontologies: Description Logic Proofs with Evee and Evonne
Christian Alrabbaa1, Stefan Borgwardt1, Tom Friese1, Patrick Koopmann1, Julián Méndez2 and Alexej Popovič1
1 Institute of Theoretical Computer Science, Technische Universität Dresden, 01062 Dresden, Germany
2 Interactive Media Lab Dresden, Technische Universität Dresden, 01062 Dresden, Germany

Abstract
When working with description logic ontologies, understanding entailments derived by a description logic reasoner is not always straightforward. So far, the standard ontology editor Protégé offers two services to help: (black-box) justifications for OWL 2 DL ontologies, and (glass-box) proofs for lightweight OWL EL ontologies, where the latter exploits the proof facilities of the reasoner Elk. Justifications are often insufficient for explaining inferences, and thus there is only little tool support for explaining inferences in more expressive DLs. In this paper, we introduce Evee-libs, a Java library for computing proofs for DLs up to 𝒜ℒ𝒞ℋ, and Evee-protege, a collection of Protégé plug-ins for displaying those proofs in Protégé. We also give a short glimpse of the latest version of Evonne, a more advanced standalone application for displaying and interacting with proofs computed with Evee-libs.

Keywords
Proofs, Explanation, Protégé Plug-in, Forgetting

1. Introduction
Description logics (DLs) [1] have gained popularity through the standardization of the Web Ontology Language OWL1 and the development of an OWL Java API [2], editing tools, and reasoners. This popularity comes with the need to explain description logic reasoning to domain experts who are not familiar with DLs. We consider here the problem of explaining the entailment of logical consequences from DL ontologies (as opposed to explaining non-consequences, which requires other techniques [3, 4]).
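As a concrete illustration of the black-box justification service mentioned in the abstract, the classical deletion-based algorithm can be sketched as follows. This is an illustrative sketch, not Evee's actual code: the axiom strings and the entailment test are placeholders, and a real implementation would delegate the entailment check to a DL reasoner such as HermiT.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.function.Predicate;

// Illustrative sketch of the classical deletion-based ("black-box")
// justification algorithm. The entailment test is a placeholder; a real
// implementation would call a DL reasoner.
public class Justification {

    // Returns a subset-minimal subset of 'ontology' that still satisfies 'entails'.
    public static <A> Set<A> compute(Set<A> ontology, Predicate<Set<A>> entails) {
        Set<A> j = new LinkedHashSet<>(ontology);
        for (A axiom : new ArrayList<>(j)) {
            j.remove(axiom);            // tentatively delete the axiom
            if (!entails.test(j)) {
                j.add(axiom);           // it was needed: put it back
            }
        }
        return j;                       // subset-minimal by construction
    }
}
```

Run on a toy ontology where the entailment needs exactly the axioms "A⊑B" and "B⊑C", the superfluous axiom "D⊑E" is pruned. Note that this yields one justification; enumerating all of them (of which there can be exponentially many) requires a hitting-set-style search on top.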
DL 2022: 35th International Workshop on Description Logics, August 7–10, 2022, Haifa, Israel
christian.alrabbaa@tu-dresden.de (C. Alrabbaa); stefan.borgwardt@tu-dresden.de (S. Borgwardt); tom.friese@tu-dresden.de (T. Friese); patrick.koopmann@tu-dresden.de (P. Koopmann); julian.mendez2@tu-dresden.de (J. Méndez); alexej.popovic@tu-dresden.de (A. Popovič)
ORCID: 0000-0002-2925-1765 (C. Alrabbaa); 0000-0003-0924-8478 (S. Borgwardt); 0000-0001-5999-2583 (P. Koopmann); 0000-0003-1029-7656 (J. Méndez)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
1 https://www.w3.org/TR/owl2-overview/

Early research on explaining description logic inferences considered proofs that provide detailed inference steps through which the consequence can be obtained [5, 6]. In many cases, especially in lightweight DLs, it can be sufficient to compute a justification, a minimal set of ontology axioms that entails the consequence [7, 8, 9]. However, if justifications become very large or the ontology is formulated in a more expressive DL, providing intermediate inference steps between a justification and its consequence may be required for understanding. Generating proofs is achievable via heuristic search for possible intermediate inferences [10], concept interpolation [11], forgetting [12], or directly using the inference rules underlying a consequence-based reasoner like Elk [13], which, however, only supports ℰℒ+. Recently, a plug-in for the ontology editor Protégé [14] was developed that shows proofs generated by Elk in a human-readable form [15]. Related work investigated the complexity of finding proofs that are optimal w.r.t. various measures, e.g. proof size or depth [12, 16, 17]. For example, from a set of instantiated inference rules, e.g.
computed by Elk, one can extract a proof of minimal depth in polynomial time [17]. In this paper, we present a collection of tools for proof generation, including for expressive DLs beyond ℰℒ+. Our Java library Evee-libs (EVincing Expressive Entailments) implements various proof generation methods that form the basis for Evee-protege, a collection of Protégé plug-ins for interactively inspecting proofs, and Evonne, a more advanced proof visualization tool. This library extends our previously reported implementations of proof generation algorithms [12, 18] in various ways. One can extract proofs that are minimized w.r.t. size or depth from the output of Elk, generate elimination proofs [12, 18] that can additionally be optimized w.r.t. several parameters, or generate more detailed proofs directly via the resolution-based calculus used in the uniform interpolation (UI) tool Lethe [19]. Elimination proofs can support different DLs, depending on the UI tool that is used internally in a black-box fashion: our library currently uses a version of Fame [20] that supports 𝒜ℒ𝒞, and Lethe, which supports 𝒜ℒ𝒞ℋ. While we discussed and evaluated the different methods for elimination proofs before [18], the method for detailed proofs is new. We provide a quantitative comparison of this method with the elimination proofs based on ontologies from BioPortal. To support ontology engineers with explanation services in practice, we offer two tools that show proofs generated by Evee-libs to users. Evee-protege utilizes PULi (the Proof Utility Library) and the Protégé proofs plug-in [15] to show the proofs directly in the ontology editor Protégé. This is probably the easiest way to use our services, and it allowed us to perform a qualitative user study of the different proof methods presented in this paper. Evonne is a stand-alone web application, early prototypes of which were presented in [21, 22].
Since then, it has seen many visual improvements and new features, and now offers many different ways of displaying and interacting with proofs, together with a visualization of the modular structure of ontologies and guidance for repairing undesired entailments. A larger publication discussing and evaluating the design decisions made in Evonne is currently in preparation; here, we only give a brief overview of its proof visualization facilities. The source code and installation instructions for Evee-libs and Evee-protege can be found at https://github.com/de-tu-dresden-inf-lat/evee (version 0.1), where the plug-ins can also be downloaded. Evonne can be tested online and downloaded at https://imld.de/evonne/.

2. Preliminaries
We consider a consequence 𝒪 |= 𝛼 that is to be explained, where 𝒪 is a DL ontology [1] and 𝛼 an axiom. A justification for 𝛼 in 𝒪 is a subset-minimal 𝒥 ⊆ 𝒪 such that 𝒥 |= 𝛼 [7]. In the worst case, there can be exponentially many justifications for an entailment. Following [12], we define proofs of 𝒪 |= 𝛼 as finite, acyclic, directed hypergraphs, where vertices 𝑣 are labeled with axioms ℓ(𝑣) and hyperedges are of the form (𝑆, 𝑑), with 𝑆 a tuple of vertices and 𝑑 a vertex such that {ℓ(𝑣) | 𝑣 ∈ 𝑆} |= ℓ(𝑑); the leaves of a proof must be labeled by elements of 𝒪 and the root by 𝛼. We depict hyperedges either using horizontal bars (e.g. deriving 𝑞 from the premises 𝑝 and 𝑝 → 𝑞) or using graphs with two kinds of nodes (see e.g. Figure 1). In this paper, all proofs are trees, i.e., no vertex can appear in the first component of multiple hyperedges. The size of such a proof is the number of its vertices (this is called tree size in [12, 17]), and its depth is the length of the longest path from a leaf to the root. Given a finite collection of valid inference steps (hyperedges), one can extract a (tree) proof that is composed of a subset of these steps and has minimal size or minimal depth in polynomial time [12, 17].

3.
Proof Generation with Evee-libs
Evee-libs implements several (families of) proof generation methods with different advantages. They all use the Java interface IProofGenerator with a method getProof that takes as input an OWLAxiom and returns an IProof, which can either be serialized into JSON format using a JsonProofWriter, explored using the method getInferences, or measured using an IProofEvaluator. For entailments that only depend on axioms in ℰℒ+, we implemented a method that optimizes proofs generated by Elk using various proof measures (in the evee-elk-proof-extractor library). For 𝒜ℒ𝒞 and 𝒜ℒ𝒞ℋ, we implemented two other proof generation methods, elimination proofs (evee-elimination-proofs) and Lethe-based proofs (evee-lethe-proof-extractor).

3.1. Optimized Elk Proofs
We implemented the Dijkstra-like proof search algorithm from [12, 16, 17] that takes as input a set of instantiated inference rules that prove a target consequence and outputs an optimized proof. This works for any recursive measure of proof quality, i.e., intuitively, any measure that can be computed recursively over the tree structure of the proof. Examples include the (tree) size, weighted size, and depth of a proof. We apply this algorithm to the output of Elk, providing a different view on Elk proofs than the existing plug-in [15], which may show multiple ways to prove a DL axiom at the same time. Due to the fast reasoning with Elk, and because our algorithm runs only on the relatively small input extracted from Elk, this is the fastest proof generation method in Evee-libs. An experimental comparison of size-minimal Elk proofs and elimination proofs can be found in [12].

3.2. Elimination Proofs
For entailments that require more expressive DLs, such as 𝒜ℒ𝒞, we implemented the forgetting-based approach from [12]. We now call those proofs elimination proofs, because they perform inferences by eliminating names. Inferences in an elimination proof look as follows:

  𝛼1  . . .  𝛼𝑛
  ──────────────  eliminate 𝑋
        𝛽

where 𝑋 is a concept or role name, {𝛼1, . . ., 𝛼𝑛} |= 𝛽, no proper subset of {𝛼1, . . ., 𝛼𝑛} entails 𝛽, and 𝑋 does not occur in 𝛽. The general idea is that 𝛽 is derived from the premises 𝛼1, . . ., 𝛼𝑛 by performing inferences on 𝑋. Since the axiom we want to prove usually contains fewer names than the axioms it is derived from, this allows for a step-wise sequence of inferences from the ontology to the entailment. The result is a more high-level explanation than the other proof types. Elimination proofs do not depend on a specific logic, provided we have a method to compute elimination steps. We compute elimination proofs by making use of existing tools for forgetting: given an ontology 𝒪 and a name 𝑋, the result of forgetting 𝑋 from 𝒪 is an ontology 𝒪−𝑋 entailed by 𝒪 that does not contain 𝑋 and preserves all entailments of 𝒪 that can be expressed without using 𝑋. Forgetting has bad worst-case complexity [23, 24] and may not always be successful, for instance if 𝒪 contains cycles. Nonetheless, forgetting tools like Fame and Lethe often perform well in practice when applied to realistic ontologies [19, 20]. Since forgetting is not always possible, these tools may introduce fresh names to simulate cyclic dependencies in the result. Fame may even fail to return any result at all, because it is not complete. We implemented three methods for computing elimination proofs using forgetting. All three start from a justification 𝒥2 for the entailment 𝒪 |= 𝛼 to be explained, and then generate a sequence of ontologies 𝒥 = 𝒪1, . . ., 𝒪𝑛 by eliminating the names that do not occur in 𝛼 one after the other. For robustness, we internally use a wrapper for the chosen forgetting method: if the forgetting method "fails" to eliminate 𝑋 (it exceeds a fixed timeout, set to 10 seconds by default, the result contains a definer, or it fails for any other reason), we keep 𝑋 and continue with the next name. The wrapper also makes sure that forgetting results are cached.
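The robustness wrapper just described can be sketched as follows. Class names, the string representation of axioms, and the failure test are illustrative assumptions, not Evee's actual API; in particular, the real wrapper additionally enforces the 10-second timeout and checks for definers.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.BiFunction;

// Sketch of the robustness wrapper around a forgetting tool. Axioms are
// plain strings, and 'forgetter' stands in for a call to Lethe or Fame.
public class ForgettingWrapper {
    private final BiFunction<Set<String>, String, Set<String>> forgetter;
    private final Map<List<Object>, Set<String>> cache = new HashMap<>();

    public ForgettingWrapper(BiFunction<Set<String>, String, Set<String>> forgetter) {
        this.forgetter = forgetter;
    }

    // Try to eliminate the name x; on any failure, keep x by returning the input unchanged.
    public Set<String> forget(Set<String> ontology, String x) {
        List<Object> key = Arrays.asList(ontology, x);
        if (cache.containsKey(key)) {
            return cache.get(key);                  // reuse cached forgetting results
        }
        Set<String> result;
        try {
            result = forgetter.apply(ontology, x);
            if (result == null || result.stream().anyMatch(ax -> ax.contains(x))) {
                result = ontology;                  // "failed": x still occurs in the result
            }
        } catch (RuntimeException e) {
            result = ontology;                      // keep x and continue with the next name
        }
        cache.put(key, result);
        return result;
    }

    // Eliminate the given names one after the other, yielding the sequence of ontologies.
    public List<Set<String>> eliminationSequence(Set<String> ontology, List<String> names) {
        List<Set<String>> sequence = new ArrayList<>();
        sequence.add(ontology);
        Set<String> current = ontology;
        for (String x : names) {
            current = forget(current, x);
            sequence.add(current);
        }
        return sequence;
    }
}
```

Because a failed elimination simply returns the input ontology unchanged, the sequence of ontologies is always well defined, even when the underlying forgetting tool gives up on some names.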
From this sequence of ontologies, the elimination proof is then constructed using justifications: specifically, for 𝑖 > 1, we construct an inference step for 𝛼 ∈ 𝒪𝑖 ∖ 𝒪𝑖−1 by taking an arbitrary3 justification for 𝛼 from 𝒪𝑖−1. To further optimize the presentation, we added a technique to merge several inference steps into one: in an inference with premises 𝛼1, . . ., 𝛼𝑛 and conclusion 𝛽, we may replace a premise 𝛼𝑖 with the premises used in the inference producing 𝛼𝑖, provided that the number of premises in the resulting inference step does not increase. This often leads to simpler proofs in which several names are eliminated at the same time, e.g. using the following inference:

  𝐴 ⊑ ∀𝑟.(𝐵1 ⊓ 𝐵2 ⊓ 𝐵3)    ∃𝑟.(𝐵1 ⊓ 𝐵2 ⊓ 𝐵3) ⊑ 𝐵
  ───────────────────────────────────────  eliminate 𝐵1, 𝐵2, 𝐵3
  𝐴 ⊓ ∃𝑟.⊤ ⊑ 𝐵

For generating the sequence of ontologies, we employed three strategies: 1) the heuristic method selects names using the heuristic described in [12]; 2) the name-optimized method uses best-first search to optimize the order in which names are eliminated, with the aim of finding a shortest sequence of ontologies (thus essentially minimizing the number of names being eliminated); 3) the size-optimized method also uses best-first search, but optimizes the final proof based on measures other than the length of the sequence, namely size and weighted size. Forgetting role names early often leads to very short sequences and small proofs, but also hides inferences. For this reason, we added the constraint that role names can only be eliminated if they only occur in role restrictions without other names (e.g. ∃𝑟.⊤, ∀𝑟.⊥). An example of an elimination proof generated using the heuristic approach is shown in Figure 1.

2 This is computed using the black-box justification algorithm [25] with HermiT [26].
3 We also implemented the possibility of looking for the best justification in each case. However, this adversely affected the running time and did not lead to significantly smaller proofs.

𝐶1 ⊑ 𝐶3 ⊔ 𝐶2,  𝐶2 ⊑ 𝐶3  ⊢  𝐶1 ⊑ 𝐶3  (elim. 𝐶2)
𝐴 ⊑ ∀𝑟.𝐶1,  𝐶1 ⊑ 𝐶3  ⊢  𝐴 ⊑ ∀𝑟.𝐶3  (elim. 𝐶1)
𝐴 ⊑ ∀𝑟.𝐶3,  ∀𝑟.𝐶3 ⊑ 𝐵  ⊢  𝐴 ⊑ 𝐵  (elim. 𝑟, 𝐶3)
Figure 1: Elimination proof. Premises and conclusion highlighted, eliminated symbols in bold face.

Lethe and Fame may reformulate axioms that do not involve the forgotten name due to normalization, which results in elimination inferences that do not really eliminate a symbol. For example, such an inference could produce 𝐶 ⊑ 𝐷 from 𝐶 ≡ 𝐷. Note that the inference pattern above allows this. Because they improve the readability of the proof, we keep those inferences, but call them "normalization" rather than elimination inferences. We note that the version of Fame we use only supports 𝒜ℒ𝒞 TBoxes, while the version of Lethe supports 𝒜ℒ𝒞ℋ TBoxes. The generated proofs often look quite different because Lethe puts more emphasis on "beautifying" the resulting axioms, that is, reformulating them to look more human-readable. Thus, elimination proofs generated using Lethe may contain axioms that are easier to read than those generated using Fame. On the other hand, reformulating axioms can make the relation between the premises and the conclusion of an inference step less obvious. Details and an experimental evaluation of the optimized algorithms can be found in [18].

3.3. Detailed Proofs Extracted from Lethe
Elimination proofs provide high-level explanations, since the detailed inferences for eliminating a symbol are hidden. For situations where this representation is too coarse, we provide a method for generating more detailed proofs, which uses the forgetting tool Lethe in a different way. When forgetting names in 𝒜ℒ𝒞ℋ TBoxes, Lethe internally uses a consequence-based reasoning procedure based on a normalized representation of axioms and the calculus shown in Figure 2 [27]. To explain 𝒪 |= 𝐴 ⊑ 𝐵, we forget all names other than 𝐴 and 𝐵 from 𝒪 and track the inferences performed by Lethe. The final set of clauses will contain a clause from which 𝐴 ⊑ 𝐵 can be straightforwardly deduced.
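The premise-merging optimization from Section 3.2 can be sketched on a plain tree representation of proofs; the ProofNode class below is a hypothetical stand-in for Evee's proof data structures, not its actual code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of the premise-merging optimization: a tree proof where each
// node is an axiom derived from its premises by one elimination step.
public class ProofNode {
    public final String axiom;
    public final List<ProofNode> premises;   // empty for ontology axioms (leaves)

    public ProofNode(String axiom, ProofNode... premises) {
        this.axiom = axiom;
        this.premises = new ArrayList<>(Arrays.asList(premises));
    }

    // Merge the inference producing a premise into this step whenever the
    // number of premises of this step does not increase.
    public void mergeSteps() {
        for (ProofNode p : premises) {
            p.mergeSteps();                                     // optimize bottom-up
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (int i = 0; i < premises.size(); i++) {
                ProofNode p = premises.get(i);
                int newCount = premises.size() - 1 + p.premises.size();
                if (!p.premises.isEmpty() && newCount <= premises.size()) {
                    premises.remove(i);
                    premises.addAll(i, p.premises);             // splice in p's premises
                    changed = true;
                    break;
                }
            }
        }
    }

    public int size() {                                         // number of axiom vertices
        int s = 1;
        for (ProofNode p : premises) {
            s += p.size();
        }
        return s;
    }
}
```

In this formulation, a merge only fires when the child inference has at most one premise, which is exactly the condition under which the merged step has no more premises than before.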
We modified the code of Lethe so that it logs all performed inferences, including information on the side conditions. In addition to applying the inferences from Figure 2, Lethe normally repeatedly normalizes and denormalizes relevant parts of the ontology and applies further simplifications to keep the current representation of the forgetting result small. For the logging to operate properly, we added a logging mode in which all optimizations are turned off and the entire ontology is normalized once at the beginning. This leads to slower reasoning performance, which is acceptable since we apply the method only to justifications rather than large ontologies. To avoid complicated inferences using the last rule in Figure 2, we make sure that inferences on role names are performed at the very end. At this stage, all relevant information for the unsatisfiability is usually already propagated into a single existential role restriction, which means that the inference has only one premise and the side conditions are not needed anymore. Indeed, we found that the resulting inference steps were usually very straightforward.

⊤ ⊑ 𝐶1 ⊔ 𝐴,  ⊤ ⊑ 𝐶2 ⊔ ¬𝐴  ⊢  ⊤ ⊑ 𝐶1 ⊔ 𝐶2
⊤ ⊑ 𝐶 ⊔ ∃𝑟.𝐷,  𝑟 ⊑ 𝑠  ⊢  ⊤ ⊑ 𝐶 ⊔ ∃𝑠.𝐷
⊤ ⊑ 𝐶 ⊔ ∀𝑟.𝐷,  𝑠 ⊑ 𝑟  ⊢  ⊤ ⊑ 𝐶 ⊔ ∀𝑠.𝐷
⊤ ⊑ 𝐶1 ⊔ ∃𝑟.𝐷1,  ⊤ ⊑ 𝐶2 ⊔ ∀𝑠.𝐷2  ⊢  ⊤ ⊑ 𝐶1 ⊔ 𝐶2 ⊔ ∃𝑟.(𝐷1 ⊓ 𝐷2)   (where 𝒪 |= 𝑟 ⊑ 𝑠)
⊤ ⊑ 𝐶1 ⊔ ∀𝑟1.𝐷1,  ⊤ ⊑ 𝐶2 ⊔ ∀𝑟2.𝐷2  ⊢  ⊤ ⊑ 𝐶1 ⊔ 𝐶2 ⊔ ∀𝑠.(𝐷1 ⊓ 𝐷2)   (where 𝒪 |= 𝑠 ⊑ 𝑟1, 𝑠 ⊑ 𝑟2)
⊤ ⊑ 𝐶1 ⊔ ∃𝑟1.𝐷1,  ⊤ ⊑ 𝐶2 ⊔ ∀𝑟2.𝐷2,  . . .,  ⊤ ⊑ 𝐶𝑛 ⊔ ∀𝑟𝑛.𝐷𝑛  ⊢  ⊤ ⊑ 𝐶1 ⊔ . . . ⊔ 𝐶𝑛   (where 𝒪 |= 𝐷1 ⊓ . . . ⊓ 𝐷𝑛 ⊑ ⊥ and 𝒪 |= 𝑟1 ⊑ 𝑟2, . . ., 𝑟1 ⊑ 𝑟𝑛)
Figure 2: Inference rules of Lethe for 𝒜ℒ𝒞ℋ TBoxes. The last rule uses HermiT for satisfiability testing.

∀𝑟.𝐶3 ⊑ 𝐵    𝐶1 ⊑ 𝐶3 ⊔ 𝐶2    𝐶2 ⊑ 𝐶3    𝐴 ⊑ ∀𝑟.𝐶1
⊤ ⊑ 𝐵 ⊔ ∃𝑟.¬𝐶3    𝐶1 ⊑ 𝐶3    𝐶3 ⊓ ¬𝐶3 ⊑ ⊥
𝐴 ⊑ 𝐵 ⊔ ∃𝑟.(𝐶1 ⊓ ¬𝐶3)    𝐶1 ⊓ ¬𝐶3 ⊑ ⊥
𝐴 ⊑ 𝐵
Figure 3: Detailed proof generated based on inference logging in Lethe (rules omitted).
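For illustration, the first rule of Figure 2 (resolution on a concept name 𝐴) can be mimicked on clauses represented as sets of literal strings. This is a toy rendering for exposition only; Lethe's actual clause data structures are much richer.

```java
import java.util.Set;
import java.util.TreeSet;

// Toy rendering of the resolution rule of Figure 2: from ⊤ ⊑ C1 ⊔ A and
// ⊤ ⊑ C2 ⊔ ¬A, derive ⊤ ⊑ C1 ⊔ C2. Clauses are sets of literal strings.
public class Resolution {

    public static Set<String> resolve(Set<String> clause1, Set<String> clause2, String a) {
        if (!clause1.contains(a) || !clause2.contains("¬" + a)) {
            throw new IllegalArgumentException("rule not applicable");
        }
        Set<String> conclusion = new TreeSet<>(clause1);
        conclusion.remove(a);                     // drop the positive literal A
        Set<String> rest = new TreeSet<>(clause2);
        rest.remove("¬" + a);                     // drop the negative literal ¬A
        conclusion.addAll(rest);                  // remaining disjuncts C1 ⊔ C2
        return conclusion;
    }
}
```

For example, resolving ⊤ ⊑ 𝐵 ⊔ 𝐴 with ⊤ ⊑ ∃𝑟.𝐷 ⊔ ¬𝐴 on 𝐴 yields ⊤ ⊑ 𝐵 ⊔ ∃𝑟.𝐷.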
To produce a proof from the logged information, some additional steps are necessary: 1) the normalization as well as the reasoning procedure introduce fresh names, which have to be replaced again by the concepts they represent; 2) in the normal form, axioms are always represented as concept disjunctions, which is not very user-friendly; and 3) the normalized axioms still have to be linked to the (non-normalized) input axioms, as well as to the conclusion. For 3), we again use justifications computed using HermiT. For 2), we use the "beautification" functionality of Lethe to produce more human-friendly forgetting results. This involves applying some obvious simplifications of concepts (𝐴 ⊔ ¬𝐴 ≡ 𝐴 ⊔ ⊤ ≡ ∀𝑟.⊤ ≡ ⊤, 𝐴 ⊓ ¬𝐴 ≡ 𝐴 ⊓ ⊥ ≡ ∃𝑟.⊥ ≡ ⊥), pushing negations inside complex concepts, and moving disjuncts to the left-hand side of a GCI if this allows eliminating further negation symbols (e.g. 𝐴 ⊑ 𝐵 ⊔ ∃𝑟.¬𝐶 ⇒ 𝐴 ⊓ ∀𝑟.𝐶 ⊑ 𝐵). Applying those simplifications too rigorously, however, may again hide inferences performed by Lethe. We thus "beautify" the axioms before we replace the introduced names for 1), so that inferences performed on introduced names remain easily visible. The effect of this can be seen in the proof in Figure 3, where negations that would otherwise be "beautified away" are still shown. After transforming the logged inferences in this way, we use our Dijkstra-style proof search algorithm to extract a single proof.

Figure 4: Run time and proof size for the heuristic elimination approach (using Lethe) vs. detailed proof extraction from Lethe. Marker size indicates how often each pattern occurred in the BioPortal snapshot. (Scatter plots of proof size and run time in ms, EXTR vs. HEUR; plots omitted.)

3.3.1.
Comparing Elimination Proofs to Lethe-based Proofs We compared the elimination proofs constructed using Lethe and the heuristic method (HEUR) with the detailed proofs extracted directly from Lethe (EXTR) on a collection of proof tasks from the 2017 snapshot of BioPortal [28] (see also [18]). Up to isomorphism, we extracted 138 distinct 𝒜ℒ𝒞ℋ proof tasks (consisting of an entailed axiom and the union of all its justifications) that are not fully expressible in ℰℒℋ, representing 327 different entailments within the BioPortal ontologies. We then computed proofs using the two methods, imposing a timeout of 5 minutes. This was successful for 325/327 entailments using HEUR. Due to the deactivated optimizations, EXTR timed out in 24/327 cases, which mostly happened when the signature of the task was large (more than ~18 symbols). Users should thus use EXTR only if the signature of the justification is sufficiently small. In Figure 4, we compare the proof size (left) and the run time (right) of both methods. On average, the proof size of EXTR was 90% higher than for HEUR, but the average inference step complexity4 was 10% lower, supporting our expectation that EXTR computes more detailed proofs. Interestingly, the maximum step complexity of EXTR was 29% higher, indicating that the elimination proofs can hide complex inferences. 4. Evee Protégé Plugins The easiest way to use our proof generators is via plug-ins for Protégé, a popular editor for OWL ontologies [14]. In Protégé, users can navigate the concept and role hierarchies of the ontology. When selecting a concept or role name, further information is shown, such as annotations and asserted or inferred equivalent concepts and superconcepts. To explain reasoning results in Protégé, one can click on the “?”-button next to an entailment. The standard explanation 4 This is measured using the justification complexity from [29]. 
consists of a list of justifications for the axiom, but this has been extended to support proofs via the protege-proof-explanation5 plug-in, which relies on the Proof Utility Library (PULi)6 [15]. Our proof generators are available in several Protégé plug-ins under the umbrella name Evee-protege, utilizing this existing functionality. The plug-ins can be installed by copying their JAR files into Protégé's plugin folder; the elimination proof methods are available in two versions, for Lethe and for Fame. Each proof generation method is registered as a ProofService for the protege-proof-explanation plug-in and can be accessed from a drop-down list in the explanation dialog of Protégé (see Figure 5).

Figure 5: Loading screen during proof generation (left) and a finished proof (right).

Once selected, a ProofService starts generating the actual proof, and a progress bar shows updates to the user. After proof generation is completed, the result can be explored interactively in a tree view. By clicking on the grey triangle next to an axiom, the proof steps that lead to this conclusion can be recursively unraveled. Since proof generation can take some time to complete, users can abort the process by clicking the "Cancel" button or by closing the progress window. The proof generators that are based on search will then assemble the best proof found so far, provided they have already found one. In this case, a warning message informs the user that the proof may be sub-optimal. To facilitate modularity, the common functionality of the proof services is implemented in evee-protege-core, which is not a plug-in itself (in the sense that it does not provide any extension to Protégé). Rather, evee-protege-core exports a collection of Java class files to Protégé that are required by the actual plug-ins. Each of our plug-ins that provides a ProofService uses these classes by inheriting from AbstractEveeProofService and instantiating one of our proof generators.
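The dynamic-proof pattern used by our plug-ins (a placeholder inference is shown first, and registered listeners are notified once the real proof is ready) can be sketched as follows. The class and method names mimic, but are not, the actual PULi and Protégé interfaces; this is a simplified stand-in for exposition.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Simplified stand-in for the dynamic-proof pattern: a placeholder
// inference is displayed until the background computation finishes,
// then listeners trigger the UI update.
public class PlaceholderProof {
    public interface ChangeListener { void inferencesChanged(); }

    private final List<ChangeListener> listeners = new ArrayList<>();
    private List<String> inferences;             // what the proof window displays

    public PlaceholderProof(String goalAxiom) {
        // initially a single "placeholder" inference for the goal axiom
        this.inferences = Collections.singletonList(
            "Computing proof for " + goalAxiom + " ...");
    }

    public List<String> getInferences() { return inferences; }

    public void addListener(ChangeListener listener) { listeners.add(listener); }

    // called once background proof generation finishes (or fails)
    public void proofReady(List<String> realInferences) {
        this.inferences = new ArrayList<>(realInferences);
        for (ChangeListener l : listeners) {
            l.inferencesChanged();               // triggers the UI refresh
        }
    }
}
```

The same notification path can deliver an error or cancellation message instead of a finished proof, which is how the placeholder inference is updated in those cases.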
In order to add a ProofService for Protégé, the method getProof needs to be implemented. This method returns a DynamicProof, which in turn needs to implement a method getInferences that provides the actual inferences to Protégé. Our plug-ins define this method in the class EveeDynamicProofAdapter of evee-protege-core. The dynamic nature of the proof comes in very handy, as it allows the proof generation to be executed in a background thread and the proof explanation window to be updated later with the constructed proof. According to [15], dynamic proofs were originally intended to be automatically updated after an asserted axiom is deleted. For this, a DynamicProof utilizes ChangeListeners, which can be informed about any changes made to a proof that is currently shown to the user. Our plug-ins use this feature by first showing a proof consisting only of a single "placeholder" inference that is displayed when the proof generation is started. This inference has as its conclusion the axiom for which the proof is created, and as its name a short message indicating that the proof is currently being computed (see Figure 5). Once the proof generation is complete, the method inferencesChanged is called on all registered ChangeListeners, which causes Protégé to update the UI with the actual proof. If no proof can be shown due to a cancellation, or if an error occurred during proof generation, the "placeholder" inference is updated accordingly.

5 https://github.com/liveontologies/protege-proof-explanation
6 https://github.com/liveontologies/puli

4.1. User Study
We conducted a small qualitative user study to evaluate the usability of the plug-ins and to validate the variety of proof generation options that Evee-protege provides. We held individual interviews online, in which participants installed the plug-ins, performed five tasks,7 and answered questions related to these tasks.
Each task specified an entailment, for which the participants were asked to generate two proofs using different methods and compare them in terms of ease of understanding (see the extended version [30] for all proofs used in the study). Lastly, participants could provide feedback about additional features they would like to have. In total, we interviewed 10 participants, all of whom frequently work with Description Logics. In Tasks 1 and 2, participants were asked to compare proofs generated for an atomic concept inclusion in the Skin Physiology Ontology (SPO).8 For Task 1, they compared the detailed Lethe-based proof with the elimination proof generated using Lethe with the heuristic method. Asked which proof was easier to understand, 60% chose the elimination proof, 10% the detailed proof, and 30% found no difference between the two. Most of the participants who found no difference stated that it took them the same amount of effort to understand both proofs. For the other participants, the structure of proofs and axioms and the type of expressions used were the reasons to prefer one proof over the other. For Task 2, participants compared elimination proofs generated by Lethe and Fame using the heuristic method. Here, 50% preferred the Fame proof and 30% the Lethe proof in terms of understandability, while 20% of the participants found no differences between them. Participants who preferred the Fame proof pointed out a contradiction caused by the interplay between a universal and an existential restriction; for these participants, such inferences are easier to follow. At the same time, the lack of such interplay in the Lethe proof, which used only existential restrictions, is the reason why most of the other participants preferred that proof.
In Task 3, participants were asked to examine a concept unsatisfiability in a modified version of the pizza ontology,9 using the proofs provided by the original Elk proof plugin [15] and the size-optimized elimination proof using Fame. Asked which inference steps were easier to understand, 70% chose the Elk proof, 20% preferred the elimination proof, and 10% found no difference. However, when asked whether they agree that a large proof with simple inferences is easier to understand than a smaller proof with intricate inferences, the answers varied. Some agreed with the statement; some agreed only as long as a certain ratio between the size of a proof and the difficulty of its inferences is not exceeded. Some pointed out that, despite their agreement, they think that people will eventually get used to the more complex inferences and how they work. On the other hand, some participants disagreed with the statement entirely, the main reason being that a short proof provides a better overview. For the last two tasks, participants looked into proofs for an atomic concept inclusion in the BioTop ontology.10 In Task 4, the comparison was between the original Elk plugin and the Elk proofs optimized for size. In Task 5, it was between the optimized Elk proof and the detailed proof generated by Lethe. For both tasks, the answers were unanimous: participants found the optimized Elk proof and the detailed proof, respectively, easier to understand, because the proofs get shorter with simpler axioms and inferences. The diversity of the answers confirms our assumption that the best method for generating proofs is often subjective, which justifies having all the different proof generation methods in Evee-protege.

7 Tasks and ontologies used in the study: https://cloud.perspicuous-computing.science/s/cy5Y3kj94AGYSDB
8 https://bioportal.bioontology.org/ontologies/SPO
9 https://protege.stanford.edu/ontologies/pizza/pizza.owl
Some participants commented on the presentation of and interaction with the proofs in Protégé (e.g. the possibility to expand the entire proof directly and not one inference at a time, to abbreviate names, or to choose between alternative forms of the same expression). Some of these comments have already been addressed in our tool Evonne.

5. Evonne
The proof visualization and interaction techniques in Evee-protege are restricted by the capabilities of Protégé's explanation service. More advanced proof exploration is offered by Evee's big sister Evonne [18, 21, 22], a web application that can be tried online and installed locally using Docker.11 Evonne visualizes proofs with various interaction components and layouts to support the exploration of large proofs. The proof visualization is linked to a second view showing the context of the proof in the ontology, by visualizing a subset (module) of the ontology that is relevant to the proof. We only give an overview of the proof visualization here. When starting Evonne, users need to create a new project and associate an OWL ontology file. They can then select an entailed atomic concept inclusion to be explained, and choose between the different proof generation methods described in Section 3. In addition, users can choose to provide a set of terms that they are familiar with (a signature) – axioms using only those terms are then assumed to be known to the user, and consequently are not explained in the proof view (see [18] for more details). Proofs are shown as graphs with two kinds of vertices: colored vertices for axioms, and gray ones for inference steps. By default, proofs are illustrated using a tree layout. However, it is also possible to present proofs in a vertical layout, placing axioms linearly below each other, with inferences represented through edges on the side (without the inference vertices).
It is possible to automatically re-order vertices to minimize the distance between conclusion and premises in each step. Another layout option is a bidirectional layout, which is a tree layout where, initially, the entire proof is collapsed into a magic vertex that links the conclusion directly to its justification, and from which individual inference steps can be pulled out and pushed back from both directions.

10 https://bioportal.bioontology.org/ontologies/BT
11 https://www.docker.com/

Figure 6: Evonne: overview of the proof visualization component.

In all views, each vertex is equipped with multiple functionalities for exploring a proof. For proofs generated using Elk, clicking on an inference vertex shows the inference rule used, and the particular inference with relevant sub-elements highlighted in different colors (see Figure 6). From each axiom vertex, users can alter the visual structure of the proof by hiding branches, revealing the previous inference step from the current node, or showing the entire proof. Each of the different proof layouts has specific controls in this regard – for example, the vertical layout does not show inference rules as nodes and instead allows inspection on demand. Additionally, trays of buttons above and below nodes can be used to manipulate the representation of axioms (by shortening names or using natural language), and to highlight justifications and diagnoses in the ontology view [21].

6. Conclusion
We have presented a collection of methods for generating DL proofs that can be explored in various ways. Future work includes adding support for specifying a "known" signature in Evee-protege, the development of proof generation methods for even more expressive logics (e.g. using consequence-based reasoners such as Sequoia [31]), and visualizing approaches for explaining missing entailments, such as counter-interpretations [3] and abduction [4].
Acknowledgments

This work was supported by DFG grant 389792660 as part of TRR 248 – CPEC, and the DFG Research Training Group QuantLA, GRK 1763.

References

[1] F. Baader, I. Horrocks, C. Lutz, U. Sattler, An Introduction to Description Logic, Cambridge University Press, 2017. URL: http://dltextbook.org/. doi:10.1017/9781139025355.
[2] M. Horridge, S. Bechhofer, The OWL API: A Java API for OWL ontologies, Semantic Web 2 (2011) 11–21. doi:10.3233/SW-2011-0025.
[3] C. Alrabbaa, W. Hieke, A. Turhan, Counter model transformation for explaining non-subsumption in EL, in: C. Beierle, M. Ragni, F. Stolzenburg, M. Thimm (Eds.), Proceedings of the 7th Workshop on Formal and Cognitive Reasoning co-located with the 44th German Conference on Artificial Intelligence (KI 2021), September 28, 2021, volume 2961 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 9–22. URL: http://ceur-ws.org/Vol-2961/paper_2.pdf.
[4] F. Haifani, P. Koopmann, S. Tourret, Abduction in ℰℒ via translation to FOL, in: R. A. Schmidt, C. Wernhard, Y. Zhao (Eds.), Proceedings of the Second Workshop on Second-Order Quantifier Elimination and Related Topics (SOQE 2021) associated with the 18th International Conference on Principles of Knowledge Representation and Reasoning (KR 2021), Online Event, November 4, 2021, volume 3009 of CEUR Workshop Proceedings, CEUR-WS.org, 2021, pp. 46–58. URL: http://ceur-ws.org/Vol-3009/paper2.pdf.
[5] D. L. McGuinness, Explaining Reasoning in Description Logics, Ph.D. thesis, Rutgers University, NJ, USA, 1996. doi:10.7282/t3-q0c6-5305.
[6] A. Borgida, E. Franconi, I. Horrocks, Explaining ALC subsumption, in: W. Horn (Ed.), ECAI 2000, Proceedings of the 14th European Conference on Artificial Intelligence, IOS Press, 2000, pp. 209–213. URL: http://www.frontiersinai.com/ecai/ecai2000/pdf/p0209.pdf.
[7] S. Schlobach, R. Cornet, Non-standard reasoning services for the debugging of description logic terminologies, in: G. Gottlob, T. Walsh (Eds.), IJCAI-03, Proceedings of the Eighteenth International Joint Conference on Artificial Intelligence, Morgan Kaufmann, 2003, pp. 355–362. URL: http://ijcai.org/Proceedings/03/Papers/053.pdf.
[8] F. Baader, R. Peñaloza, B. Suntisrivaraporn, Pinpointing in the description logic EL+, in: J. Hertzberg, M. Beetz, R. Englert (Eds.), KI 2007: Advances in Artificial Intelligence, 30th Annual German Conference on AI, volume 4667 of Lecture Notes in Computer Science, Springer, 2007, pp. 52–67. doi:10.1007/978-3-540-74565-5_7.
[9] M. Horridge, Justification Based Explanation in Ontologies, Ph.D. thesis, University of Manchester, UK, 2011. URL: https://www.research.manchester.ac.uk/portal/files/54511395/FULL_TEXT.PDF.
[10] M. Horridge, B. Parsia, U. Sattler, Justification oriented proofs in OWL, in: P. F. Patel-Schneider, Y. Pan, P. Hitzler, P. Mika, L. Zhang, J. Z. Pan, I. Horrocks, B. Glimm (Eds.), The Semantic Web - ISWC 2010 - 9th International Semantic Web Conference, ISWC 2010, volume 6496 of Lecture Notes in Computer Science, Springer, 2010, pp. 354–369. doi:10.1007/978-3-642-17746-0_23.
[11] S. Schlobach, Explaining subsumption by optimal interpolation, in: J. J. Alferes, J. A. Leite (Eds.), Logics in Artificial Intelligence, 9th European Conference, JELIA 2004, volume 3229 of Lecture Notes in Computer Science, Springer, 2004, pp. 413–425. doi:10.1007/978-3-540-30227-8_35.
[12] C. Alrabbaa, F. Baader, S. Borgwardt, P. Koopmann, A. Kovtunova, Finding small proofs for description logic entailments: Theory and practice, in: E. Albert, L. Kovács (Eds.), LPAR 2020: 23rd International Conference on Logic for Programming, Artificial Intelligence and Reasoning, volume 73 of EPiC Series in Computing, EasyChair, 2020, pp. 32–67. doi:10.29007/nhpp.
[13] Y. Kazakov, M. Krötzsch, F. Simancik, The incredible ELK - from polynomial procedures to efficient reasoning with ℰℒ ontologies, J. Autom. Reason. 53 (2014) 1–61. doi:10.1007/s10817-013-9296-3.
[14] M. A. Musen, The Protégé project: A look back and a look forward, AI Matters 1 (2015) 4–12. doi:10.1145/2757001.2757003.
[15] Y. Kazakov, P. Klinov, A. Stupnikov, Towards reusable explanation services in Protege, in: A. Artale, B. Glimm, R. Kontchakov (Eds.), Proceedings of the 30th International Workshop on Description Logics, volume 1879 of CEUR Workshop Proceedings, CEUR-WS.org, 2017. URL: http://ceur-ws.org/Vol-1879/paper31.pdf.
[16] C. Alrabbaa, F. Baader, S. Borgwardt, P. Koopmann, A. Kovtunova, On the complexity of finding good proofs for description logic entailments, in: S. Borgwardt, T. Meyer (Eds.), Proceedings of the 33rd International Workshop on Description Logics (DL 2020), volume 2663 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2663/paper-1.pdf.
[17] C. Alrabbaa, F. Baader, S. Borgwardt, P. Koopmann, A. Kovtunova, Finding good proofs for description logic entailments using recursive quality measures, in: A. Platzer, G. Sutcliffe (Eds.), Automated Deduction - CADE 28 - 28th International Conference on Automated Deduction, volume 12699 of Lecture Notes in Computer Science, Springer, 2021, pp. 291–308. doi:10.1007/978-3-030-79876-5_17.
[18] C. Alrabbaa, F. Baader, S. Borgwardt, R. Dachselt, P. Koopmann, J. Méndez, Evonne: Interactive proof visualization for description logics (system description), in: Automated Reasoning - 11th International Joint Conference, IJCAR 2022, Lecture Notes in Computer Science, Springer, 2022. URL: https://lat.inf.tu-dresden.de/research/papers/2022/ALBABODAKOME-IJCAR22.pdf, to appear.
[19] P. Koopmann, LETHE: Forgetting and uniform interpolation for expressive description logics, Künstliche Intell. 34 (2020) 381–387. doi:10.1007/s13218-020-00655-w.
[20] Y. Zhao, R. A. Schmidt, FAME: An automated tool for semantic forgetting in expressive description logics, in: D. Galmiche, S. Schulz, R. Sebastiani (Eds.), Automated Reasoning - 9th International Joint Conference, IJCAR 2018, volume 10900 of Lecture Notes in Computer Science, Springer, 2018, pp. 19–27. doi:10.1007/978-3-319-94205-6_2.
[21] C. Alrabbaa, F. Baader, R. Dachselt, T. Flemisch, P. Koopmann, Visualising proofs and the modular structure of ontologies to support ontology repair, in: S. Borgwardt, T. Meyer (Eds.), Proceedings of the 33rd International Workshop on Description Logics (DL 2020), volume 2663 of CEUR Workshop Proceedings, CEUR-WS.org, 2020. URL: http://ceur-ws.org/Vol-2663/paper-2.pdf.
[22] T. Flemisch, R. Langner, C. Alrabbaa, R. Dachselt, Towards designing a tool for understanding proofs in ontologies through combined node-link diagrams, in: V. Ivanova, P. Lambrix, C. Pesquita, V. Wiens (Eds.), Proceedings of the Fifth International Workshop on Visualization and Interaction for Ontologies and Linked Data (VOILA 2020), volume 2778 of CEUR Workshop Proceedings, CEUR-WS.org, 2020, pp. 28–40. URL: http://ceur-ws.org/Vol-2778/paper3.pdf.
[23] C. Lutz, F. Wolter, Foundations for uniform interpolation and forgetting in expressive description logics, in: T. Walsh (Ed.), IJCAI 2011, Proceedings of the 22nd International Joint Conference on Artificial Intelligence, IJCAI/AAAI, 2011, pp. 989–995. doi:10.5591/978-1-57735-516-8/IJCAI11-170.
[24] N. Nikitina, S. Rudolph, (Non-)succinctness of uniform interpolants of general terminologies in the description logic ℰℒ, Artif. Intell. 215 (2014) 120–140. doi:10.1016/j.artint.2014.06.005.
[25] M. Horridge, B. Parsia, U. Sattler, Explaining inconsistencies in OWL ontologies, in: L. Godo, A. Pugliese (Eds.), Scalable Uncertainty Management, Third International Conference, SUM 2009, volume 5785 of Lecture Notes in Computer Science, Springer, 2009, pp. 124–137. doi:10.1007/978-3-642-04388-8_11.
[26] B. Glimm, I. Horrocks, B. Motik, G. Stoilos, Z. Wang, HermiT: An OWL 2 reasoner, J. Autom. Reason. 53 (2014) 245–269. doi:10.1007/s10817-014-9305-1.
[27] P. Koopmann, R. A. Schmidt, Forgetting concept and role symbols in 𝒜ℒ𝒞ℋ-ontologies, in: K. L. McMillan, A. Middeldorp, A. Voronkov (Eds.), Logic for Programming, Artificial Intelligence, and Reasoning - 19th International Conference, LPAR-19, volume 8312 of Lecture Notes in Computer Science, Springer, 2013, pp. 552–567. doi:10.1007/978-3-642-45221-5_37.
[28] N. Matentzoglu, B. Parsia, Bioportal snapshot 30.03.2017, 2017. doi:10.5281/zenodo.439510.
[29] M. Horridge, S. Bail, B. Parsia, U. Sattler, Toward cognitive support for OWL justifications, Knowl. Based Syst. 53 (2013) 66–79. doi:10.1016/j.knosys.2013.08.021.
[30] C. Alrabbaa, S. Borgwardt, T. Friese, P. Koopmann, J. Méndez, A. Popovič, On the Eve of True Explainability for OWL Ontologies: Description Logic Proofs with Evee and Evonne (Extended Version), 2022. doi:10.48550/ARXIV.2206.07711.
[31] A. Bate, B. Motik, B. C. Grau, F. Simancik, I. Horrocks, Extending consequence-based reasoning to 𝒮ℛℐ𝒬, in: C. Baral, J. P. Delgrande, F. Wolter (Eds.), Principles of Knowledge Representation and Reasoning: Proceedings of the Fifteenth International Conference, KR 2016, AAAI Press, 2016, pp. 187–196. URL: http://www.aaai.org/ocs/index.php/KR/KR16/paper/view/12882.