Mary, What’s Like All Cats? Andreas Ecke1? , Rafael Peñaloza1,2?? , and Anni-Yasmin Turhan1? ? ? 1 Institute for Theoretical Computer Science, Technische Universität Dresden 2 Center for Advancing Electronics Dresden {ecke,penaloza,turhan}@tcs.inf.tu-dresden.de In this extended abstract we report on results recently achieved for answer- ing instance queries relaxed by concept similarity measures [4]. Traditionally, Description Logic (DL) reasoning systems only support crisp inference services, like subsumption and instances queries. The latter can be effectively used to perform different types of search tasks: Given an ABox describing a set of in- dividuals, an instance query returns all those that are instance of the query concept Q, rejecting all others. However, often it is also interesting to consider those individuals that are not instances: Are they completely different to Q or how similar are they to Q? In cases where the original query does not retrieve any resulting individuals, those individuals that are ‘very close’ to being an instance can still be a good alternative. The instance queries that do not only return the instances but also those that nearly match the query concept are called relaxed instance queries [3]. A natural way to relax instance queries is by using concept similarity measures (CSMs). Such a measure ∼ is a function that assigns to each pair of concepts a similarity value between 0 and 1. Together with a fixed threshold t, the instance query can be relaxed by returning all individuals that are instance of a concept with a similarity value of at least t to the query concept w.r.t. ∼. One advantage of using CSMs as a parameter for this inference is that they can implement different notions of similarity, and regard certain features more important than others. This allows to relax queries with respect to certain features, but leave others fixed (compare Figure 1). Example 1. Mary likes cats [7] and has a few of them as pets. As such, her view on the similarity between other animals and cats is highly influenced by how these animals behave as pets. A dog which lives inside the house, which likes getting stroked and begs for food is more similar to a cat than a wild lion in Africa. Or more formally put: Mary’s view on similarity can be expressed by a CSM ∼Mary , which weights features related to keeping animals as a pet higher than other features and therefore yields: (Cat ∼Mary Dog) > (Cat ∼Mary Lion). Jane, Mary’s best friend, is a biologist and her view on the similarity of animals is characterized by the anatomy and evolution of animals and thus resembles the biological taxonomy of animals. As such, Jane finds lions are more similar to cats than dogs, since both cats and lions belong to the felidae family, ? Supported by DFG Graduiertenkolleg 1763 (QuantLA). ?? Partially supported by DFG within the Cluster of Excellence ‘cfAED’ ??? Partially supported by the German Research Foundation (DFG) in the Collaborative Research Center 912 “Highly Adaptive Energy-Efficient Computing”. M(Q, b)I ∆I b msc(b) QI a msc(a) QI, M(Q, a)I Fig. 1: Relaxed instances w.r.t. two dif- Fig. 2: Two individuals, their MSCs (dot- ferent CSMs (solid and dashed). Darker ted), and the mimics of a concept Q w.r.t. colors represent larger thresholds t. the individuals (dashed). whereas dogs belong to the canidae family. Again, if we express her view on similarity as a CSM ∼Jane , it incorporates mainly features related to the anatomy and evolution of animals and thus yields (Cat ∼Jane Lion) > (Cat ∼Jane Dog). The CSMs ∼Mary and ∼Jane can be used to query a knowledge base describing different animals. If Mary is looking for a new pet and specifies its properties as an instance query, then using ∼Mary yields more useful results than ∼Jane , as Mary would rather keep a dog than a lion. Whereas, if Jane finds an animal unknown to her and wants to identify its species, a query using Jane’s CSM ∼Jane yields better results. We consider the Description Logic EL and distinguish between two kinds of TBoxes: unfoldable TBoxes, which are acyclic and only contain concept defini- tions A ≡ C and general TBoxes which contain (possibly cyclic) GCIs C v D. ABoxes, knowledge bases, the semantics via interpretations I, and common in- ferences, are defined as usual [1]. Let C(EL) be the set of all EL-concept descriptions. A concept similarity measure ∼T w.r.t. a TBox T is a function ∼T : C(EL) × C(EL) → [0, 1], where C ∼T C = 1 for all concepts C ∈ C(EL). Several properties of CSMs have been formalized in [6], the most important ones here are symmetry and equivalence invariance; the latter expresses that the similarity value between two concepts remains the same when replacing one concept for an equivalent one w.r.t. T . Based on this notion we can formalize the central inference as follows: Definition 1 (relaxed instance). The individual a is a relaxed instance of the query concept Q w.r.t. the KB K, the CSM ∼T and the threshold t ∈ [0, 1) iff there exists a concept description X such that Q ∼T X > t and K |= X(a). To compute the relaxed instances of an EL-concept (w.r.t. an EL-KB) it is not feasible to compute all sufficiently similar concepts and then perform instance checking for those, since (1) the number of those concepts can be infinite leading to an infinite number of queries and (2) a similarity measure does not necessarily provide a method how to obtain a ‘sufficiently similar’ concept. The case of unfoldable TBoxes. To perform relaxed instance querying compute for each individual a in the ABox a concept that has a as an instance and resembles C most w.r.t. ∼T . We call this the mimic of C w.r.t. a and ∼T , and denote it by M(C, a); see Figure 2. If M(Q, a) ∼T Q ≥ t holds, then a is a relaxed instance of Q; otherwise, it cannot be a relaxed instance, as no concept can have a greater similarity value with Q while still containing a. In [3] we give a computation algorithm for mimics in EL. The idea is to compute the role-depth bounded most specific concept k-MSC of a [8], with the role-depth of Q as the role-depth bound k, and then remove sub-concepts from the resulting concept to make it more similar to Q. This approach requires that the CSM ∼T is symmetric, equivalence invariant, structural, i.e., it computes the similarity by induction on the structure of concepts, and is monotone in the sense that: l  l  X ∼ A ≥ X u ∃r.B ∼ A . A∈NC A∈NC The case of general TBoxes. For general EL-TBoxes, this role depth-based ap- proach does not work, as the query concept may have a cyclic definition. To solve this, we introduce a CSM ∼c that uses the canonical models of a concept C w.r.t. a TBox T , denoted with IC,T , and a similarity measure ∼i between interpretations as follows: C ∼c D = (IC,T , dC ) ∼i (ID,T , dD ). The family of CSMs ∼c for EL inherits several formal properties from ∼i , in particular symmetry and equivalence invariance. The interpretation similarity measure (ISM) ∼i can be parametrized by a weighting function that assigns different weights to each concept and role name, by a primitive measure between concept and role names and by a discounting factor. The ISM ∼i is defined as a fixed point, and can be computed using an it- erative algorithm, which converges towards the similarity value. By modifying this algorithm to generalize the pointed interpretation corresponding to the in- dividual a to take those subsets of the concept names and role-successors that yield the highest similarity value, it actually computes the similarity between the query concept Q and the mimic of Q w.r.t. a. This is sufficient to check if a is a relaxed instance of Q w.r.t. the threshold t. This way, we get an iterative algorithm that computes all relaxed instances of Q w.r.t. ∼c and t, and that is sound and complete, i.e., it only returns individuals that are definitely relaxed instances, and it will find all relaxed instances in finitely many iterations. To conclude, we have proposed a new reasoning service that allows relaxed instance query answering for application-specific notions of similarity by the appropriate choice of a CSM ∼T and threshold t. We investigated necessary requirements for the CSMs to be employed. We devised computation algorithms for relaxed instances in the setting with unfoldable and with general EL-TBoxes. For the latter setting we needed to introduce a new family of CSMs that take the whole information from general TBoxes into account. The ∼c CSMs are, to the best of our knowledge, the first CSMs of this kind for general TBoxes. Based on these we gave a computation algorithm for relaxed instances w.r.t. general TBoxes. For more details see [4, 5] and for its extension to EL++ see [2]. References 1. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. Patel-Schneider, edi- tors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003. 2. A. Ecke. Similarity-based relaxed instance queries in EL++ . In T. Lukasiewicz, R. Peñaloza, and A.-Y. Turhan, editors, Proceedings of the First Workshop on Logics for Reasoning about Preferences, Uncertainty, and Vagueness, CEUR-WS. CEUR, 2014. To appear. 3. A. Ecke, R. Peñaloza, and A.-Y. Turhan. Towards instance query answering for concepts relaxed by similarity measures. In L. Godo, H. Prade, and G. Qi, edi- tors, Workshop on Weighted Logics for AI (in conjunction with IJCAI’13), Beijing, China, 2013. 4. A. Ecke, R. Peñaloza, and A.-Y. Turhan. Answering instance queries relaxed by concept similarity. In Proceedings of the Fourteenth International Conference on Principles of Knowledge Representation and Reasoning (KR’14), Vienna, Austria, 2014. AAAI Press. To appear. 5. A. Ecke and A.-Y. Turhan. Similarity measures for computing relaxed instances w.r.t. general EL-TBoxes. LTCS-Report 13-12, Chair of Automata Theory, Insti- tute of Theoretical Computer Science, Technische Universität Dresden, Dresden, Germany, 2013. See http://lat.inf.tu-dresden.de/research/reports.html. 6. K. Lehmann and A.-Y. Turhan. A framework for semantic-based similarity measures for ELH-concepts. In L. F. del Cerro, A. Herzig, and J. Mengin, editors, Proc. of the 13th European Conf. on Logics in A.I. (JELIA 2012), Lecture Notes In Artificial Intelligence, pages 307–319. Springer, 2012. 7. C. Lutz and U. Sattler. Mary likes all cats. In F. Baader and U. Sattler, editors, Proceedings of the 2000 International Workshop in Description Logics (DL2000), number 33 in CEUR-WS, pages 213–226, Aachen, Germany, August 2000. RWTH Aachen. 8. R. Peñaloza and A.-Y. Turhan. A practical approach for computing generaliza- tion inferences in EL. In M. Grobelnik and E. Simperl, editors, Proceedings of the 8th European Semantic Web Conference (ESWC’11), Lecture Notes in Computer Science, pages 410–423. Springer, 2011.