Querying Inconsistent Description Logic Knowledge Bases under Preferred Repair Semantics (Extended Abstract) Meghyn Bienvenu1 , Camille Bourgaux1 , and François Goasdoué2 1 LRI, CNRS & Université Paris-Sud, Orsay, France 2 IRISA, Université de Rennes 1, Lannion, France Abstract. Many inconsistency-tolerant semantics for DLs rely on the notion of a repair, i.e., a ⊆-maximal subset of the ABox that is consistent with the TBox. This extended abstract presents our investigation [4] of variants of two popular inconsistency-tolerant semantics obtained by replacing classical repairs by various types of preferred repair. 1 Introduction Recent years have seen an increasing interest in performing database-style query answering over description logic (DL) knowledge bases (KBs). Since scalability is crucial in data-rich applications, there has been a trend to using lightweight DLs for which query answering is tractable w.r.t. the size of the ABox. Partic- ular attention has been paid to DLs of the DL-Lite family [6] for which query answering can be reduced to evaluation of standard database queries. An important issue that arises in the context of DL query answering is how to handle the case in which the ABox is inconsistent with the TBox. Indeed, while it may be reasonable to assume that the TBox has been properly debugged, the ABox will typically be very large and subject to frequent modifications, both of which make errors likely. Since it may be too difficult or costly to identify and fix these errors, it is essential to be able to provide meaningful answers to queries in the presence of such data inconsistencies. To address this issue, sev- eral different inconsistency-tolerant semantics have been proposed for querying inconsistent DL knowledge bases. Among them, the AR and IAR semantics [10] are the most well-known and well-studied. Both semantics are based upon the notion of a repair, defined as an inclusion-maximal subset of the ABox which is consistent with the TBox. The AR semantics, which was inspired by consistent query answering in relational databases (cf. [2] for a survey), amounts to com- puting those answers that hold no matter which repair is chosen, whereas the more cautious IAR semantics queries the intersection of the repairs. When additional information on the reliability of ABox assertions is available, it is natural to use this information to identify preferred repairs, and to use the latter as the basis of inconsistency-tolerant query answering. This motivated us to study variants of the AR and IAR semantics obtained by replacing the classical notion of repair by one of four different types of preferred repairs. Cardinality- maximal repairs, denoted ≤-repairs, are intended for settings in which all ABox assertions are believed to have the same likelihood of being correct. The other three types of preferred repairs target the scenario in which some assertions are considered to be more reliable than others, which can be captured qualitatively by partitioning the ABox into priority levels (and then applying either the set inclusion (⊆P -repairs) or cardinality (≤P -repairs) criterion to each level), or quantitatively by assigning weights to the ABox assertions and selecting those repairs having the greatest weight (≤w -repairs). 2 Complexity Results There have been several studies of the complexity of querying DL knowledge bases under the AR and IAR semantics for the standard notion of repairs [10, 14, 3, 5]. The complexity of AR query answering with ≤- or ≤w -repairs was first studied in the database setting for denial constraints [11], and more recently in the DL setting for the highly expressive SHIQ [9]. However, there had been no systematic study of the computational properties of preferred repairs for important lightweight DLs. We addressed this gap in the literature by studying the complexity of answering conjunctive and atomic queries under the eight preferred repair semantics (cf. Figure 1). We focused on the lightweight logic DL-LiteR (which underlies the OWL 2 QL profile [13]), though many of our results hold also for other data-tractable ontology languages. Unsurprisingly, the results are mainly negative, showing that adding prefer- ences increases complexity. For the IAR semantics, we move from polynomial data complexity in the case of (plain) IAR semantics to coNP-hard data com- plexity (or worse) for IAR semantics based on preferred repairs. For the AR semantics, query answering is known to be coNP-complete in data complexity already for the standard notion of repairs, but adding preferences often results in even higher complexities. The sole exception is ⊆P -repairs (which combine pri- ority levels and the set inclusion criterion), for which both AR and IAR query answering are “only” coNP-complete in data complexity. The lower combined complexity of the IAR semantics shows that they still retain some computational advantage over the AR semantics. Intuitively, this is because one can precompute the intersection of repairs in an offline phase and then utilize standard querying algorithms at query time, whereas no such pre-computation is possible for the AR semantics. 3 System and Experiments We are aware of two systems for inconsistency-tolerant query answering over DL KBs: the system of [9] for querying SHIQ KBs under ≤w -AR semantics, and the QuID system [15] that handles IAR semantics (without preferences) and DL- LiteA KBs. The latter work left open whether, for the standard notion of repairs, AR IAR AR IAR ⊆ coNP in P ⊆ Π2p NP ≤ ∆p p 2 [O(log n)] ∆2 [O(log n)] ≤ Πp 2 ∆p 2 [O(log n)] ⊆P coNP coNP ⊆P Πp 2 ∆p 2 [O(log n)] † † † ≤P & ≤w ∆p 2 ∆p2 p ≤P & ≤w Π2 ∆p 2 Data complexity Combined complexity Fig. 1: Complexity of conjunctive query (CQ) entailment over DL-Lite KBs under AR and IAR semantics for different types of preferred repairs. For atomic queries, the data and combined complexity coincide with the data complexity for CQs. All results are completeness results unless otherwise noted. New results in bold. † ∆p2 [O(log n)]- complete if there is a bound on the number of priority classes (resp. maximal weight). the IAR semantics constitutes a good approximation of the AR semantics, and whether the AR semantics could be feasibly computed in practice. Encouraged by the performance of modern-day SAT solvers, we proposed a practical SAT-based approach for query answering in DL-LiteR under the (plain) AR, ⊆P -IAR, and ⊆P -AR semantics. We showed how to encode query answering under these three semantics in terms of propositional unsatisfiability, using a reachability analysis on the conflict graph (inspired by [7]) to reduce the size of the encodings. We implemented the approach in our CQAPri system. The ABox is stored in a PostgreSQL server, while the TBox is kept in-memory together with the pre-computed set of conflicts for the KB. When a query arrives, CQAPri evaluates it over the ABox using its SQLized rewriting (obtained with the Rapid query rewriting engine [8]), to get its possible answers and their causes. Among the possible answers, CQAPri identifies the IAR ones, or an approximation of the ⊆P -IAR ones. For those possible answers that are not found to be IAR answers (resp. in the approximation of the ⊆P -IAR answers), deciding whether they are entailed under the AR semantics (resp. the ⊆P -IAR and ⊆P -AR semantics), is done using our encodings and the SAT4J SAT solver [1]. An experimental evaluation demonstrates the scalability of the approach in settings we presume realistic. We considered the DL-LiteR LUBM∃20 TBox from [12] and the associated Extended University Data Generator (EUDG). We extended LUBM∃20 with negative inclusions and introduced inconsistencies in the ABox. Prioritizations of ABox were made to capture a variety of scenarios. The exper- iments show that our system scales up to large ABoxes (up to about 2 million assertions) for the IAR/AR and ⊆P -IAR/⊆P -AR semantics, when the number of conflicting assertions varies from a few percent (for all of these semantics) to a few tens of percent (only for IAR/AR). This positive empirical result is due in large part to the efficacy of the incomplete methods, which leave only few cases to be handled by the SAT solver. In a similar vein, our simple approximation of the ⊆P -IAR semantics often produced a large share of the ⊆P -IAR answers, which themselves constituted a large portion of the ⊆P -AR answers. Acknowledgements This work was supported by contract ANR-12-JS02-007-01. We acknowledge Despoina Trivela for her work on an early version of the system. References 1. Berre, D.L., Parrain, A.: The sat4j library, release 2.2. JSAT 7(2-3), 59–64 (2010) 2. Bertossi, L.E.: Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management, Morgan & Claypool Publishers (2011) 3. Bienvenu, M.: On the complexity of consistent query answering in the presence of simple ontologies. In: Proc. of AAAI (2012) 4. Bienvenu, M., Bourgaux, C., Goasdoué, F.: Querying inconsistent description logic knowledge bases under preferred repair semantics. In: Proc. of AAAI (2014) 5. Bienvenu, M., Rosati, R.: Tractable approximations of consistent query answering for robust ontology-based data access. In: Proc. of IJCAI (2013) 6. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable reasoning and efficient query answering in description logics: The DL-Lite family. Journal of Automated Reasoning 39(3), 385–429 (2007) 7. Chomicki, J., Marcinkowski, J., Staworko, S.: Computing consistent query answers using conflict hypergraphs. In: Proc. of CIKM. pp. 417–426 (2004) 8. Chortaras, A., Trivela, D., Stamou, G.: Optimized query rewriting for OWL 2 QL. In: Proc. of CADE (2011) 9. Du, J., Qi, G., Shen, Y.D.: Weight-based consistent query answering over inconsis- tent SHIQ knowledge bases. Knowledge and Information Systems 34(2), 335–371 (2013) 10. Lembo, D., Lenzerini, M., Rosati, R., Ruzzi, M., Savo, D.F.: Inconsistency-tolerant semantics for description logics. In: Proc. of RR. pp. 103–117 (2010) 11. Lopatenko, A., Bertossi, L.E.: Complexity of consistent query answering in databases under cardinality-based and incremental repair semantics. In: Proc. of ICDT. pp. 179–193 (2007) 12. Lutz, C., Seylan, I., Toman, D., Wolter, F.: The combined approach to OBDA: Taming role hierarchies using filters. In: Proc. of ISWC. pp. 314–330 (2013) 13. Motik, B., Cuenca Grau, B., Horrocks, I., Wu, Z., Fokoue, A., Lutz, C.: OWL 2 Web Ontology Language profiles. W3C Recommendation (11 December 2012), http://www.w3.org/TR/owl2-profiles/, available at http://www.w3.org/ TR/owl2-profiles/ 14. Rosati, R.: On the complexity of dealing with inconsistency in description logic ontologies. In: Proc. of IJCAI. pp. 1057–1062 (2011) 15. Rosati, R., Ruzzi, M., Graziosi, M., Masotti, G.: Evaluation of techniques for in- consistency handling in OWL 2 QL ontologies. In: Proc. of ISWC. pp. 337–349 (2012)