A Novel Approach to Controlled Query Evaluation in DL-Lite (DISCUSSION PAPER) Domenico Lembo1 , Riccardo Rosati1 , Domenico Fabio Savo2 1 2 Sapienza Università di Roma Università degli Studi di Bergamo lastname@dis.uniroma1.it domenicofabio.savo@unibg.it Abstract. In Controlled Query Evaluation (CQE) confidential data are protected through a declarative policy and a (optimal) censor, which guarantees that answers to queries are maximized without disclosing secrets. In this paper we consider CQE over Description Logic ontologies and study query answering over all optimal censors. We establish data complexity of the problem for ontologies specified in DL-LiteR and for variants of the censor language, which is the language used by the censor to enforce the policy. In our investigation we also analyze the relationship between CQE and the problem of Consistent Query Answering. 1 Introduction In Controlled Query Evaluation (CQE), a policy, i.e., a set of logical assertions, regulates the access to a database or knowledge base by specifying the information that must be kept secret, and a censor alters answers to queries so that confidential data cannot be inferred by the users on the basis of the queries they ask. The notion of censor traces back to [16], and since then it has been investigated for propositional closed databases [6], incomplete databases [7], and, more recently, Description Logic (DL) ontologies [10,11]. In this latter context, optimal censors are defined as those censors that modify query answers in a “minimal” way. Intuitively, such censors hide data to preserve confidentiality according to the policy, without restricting unnecessarily the ability of the system to return answers to users’ queries. Previous work on CQE in DLs has mainly focused on the tasks of establishing the existence of an optimal censor and characterizing the complexity of computing it. However, considering only one such censor means making an arbitrary selection among several optimal censors. To avoid such a discretionary choice, in this paper we adopt a different approach and study query answering over all optimal censors. Intuitively, given a query q, this amounts to compute the answers to q that are returned by all optimal censors. This form of reasoning can also be considered as the application of a single Copyright c 2019 for the individual papers by the papers’ authors. Copying permitted for private and academic purposes. This volume is published and copyrighted by its editors. SEBD 2019, June 16-19, 2019, Castiglione della Pescaia, Italy. censor corresponding to an “intersection” of all the optimal censors, which is thus a semantically well-founded (i.e., sound) approximation of any optimal censor. This idea has been also previously discussed in [10]. Our approach has similarities with the work on Consistent Query Answering (CQA), a framework for inconsistency management based on the notion of repair [3,5]. Roughly speaking, in DL, a repair of a possibly inconsistent ontology O is any ABox for O (i.e., the extensional component of Q) that is consistent with the TBox (i.e., the intensional component of O), and that differs “minimally” from the original ABox. Then, comput- ing query answers in CQA amounts to reasoning over all repairs and the TBox. The connection between CQE and CQA in DL is based on the intuition that the assertions in the policy in CQE seems to act as the class of assertions in T that may be violated by the data of the ABox. Some connections between CQA and a declarative approach for privacy preservation had already been discussed in [4]. The framework in that paper is similar to ours, with so-called secrecy views playing essentially the role of the policy. However, the setting considered there is relational and without intensional knowledge (TBox), and secrecy views are enforced through suitable virtual modifications of database values with SQL NULLs, so that this approach is incomparable with ours. Nonetheless, in this paper we elaborate on the intuition of [4] and investigate in depth the relationship between our CQE framework and CQA in DLs. We provide some general conditions ensuring that the two problems are mutually reducible, and we show cases of practical interest for which such conditions are satisfied and cases for which they are not. This also allows us to highlight the differences between the two frameworks. The ultimate goal of this paper is to investigate data complexity of answering conjunctive queries (CQs) in CQE. In our analysis we consider ontologies specified in DL-LiteR [9]. We also consider some variants of the censor language LC . Intuitively, LC is the language in which the censor expresses the sentences implied by the ontology that can be disclosed to the users without violating the policy. We provide data complexity results for the cases when: (i) LC is the ABox of the ontology; (ii) LC coincides with the set of ground atoms expressed over the signature of the ontology; and (iii) LC is the language of CQs expressed over the signature of the ontology. Some of the complexity results follow from the correspondence between CQA and CQE; we provide novel techniques for the cases in which the CQE problem does not have a CQA counterpart. The complexity results proved in this paper are shown in Figure 1. This paper is an extended abstract of [15], where also complexity results for ontolo- gies specified in EL⊥ [1] are provided. 2 Preliminaries We consider a signature Σ of predicates and constants, and a countably infinite alphabet of variables V. To simplify the presentation, we consider only languages containing FO sentences, i.e., formulas without free variables (our results applies to open formulas as well, modulo standard encoding of open formulas into closed ones). We use FO to indicate the language of all function-free FO sentences over Σ and V. Every language considered in this paper is a subset of FO. Given a set K ⊆ FO, Mod (K) indicates the set of models of K, i.e., the FO interpretations I such that φI (i.e., the interpretation of φ in I) evaluates to true, for each sentence φ ∈ K. If I is a model of K, we say that I satisfies K and write I |= K. K is consistent if it has at least one model, i.e., if Mod (K) 6= ∅, inconsistent otherwise. K entails a FO sentence φ, denoted K |= φ, if φI is true in every I ∈ Mod (K). A Boolean conjunctive query (BCQ) q is a FO sentence of the form ∃x.conj(x), where conj(x) = α1 (x) ∧ . . . ∧ αn (x), x is a sequence of variables, and each αi (x) is an atom (possibly with constants) with predicate αi and variables in x. The length of a BCQ q is the number of its atoms, denoted by length(q). In the following, CQ denotes the language of BCQs over Σ and V, CQk the language of BCQs from CQ whose maximum length is k, and GA the language of ground atoms. Obviously, for every integer k, GA ⊂ CQk ⊂ CQ. Verifying whether K |= α for K ⊆ FO and α ∈ GA is also called instance checking. A DL ontology O is specified as T ∪ A, where T is the TBox and A is the ABox [2]. Throughout the paper an ABox is always a set of ground atoms. We are interested in DL-LiteR DL ontologies [9]. A DL-LiteR TBox is a finite set of assertions of the form: B1 v B2 , B1 v ¬B2 , R1 v R2 , R1 v ¬R2 , where: each Ri , with i ∈ {1, 2}, is an atomic role Q ∈ Σ, or its inverse Q− ; each Bi , with i ∈ {1, 2}, is an atomic concept A ∈ Σ, or a concept of the form ∃Q (resp. ∃Q− ), i.e., unqualified existential restriction, which denotes the set of objects occurring as first (resp. second) argument of Q. We also consider denial assertions (or simply denials) over concepts and roles, i.e., sentences of the form: ∀x.conj(x) → ⊥ where conj(x) is such that ∃x.conj(x) is a BCQ whose atoms use only unary and binary predicates. The length of the denial is the length of such query. A denial is satisfied by an ontology O if O 6|= ∃x.conj(x). In the following, given an ontology O and a language L ⊆ FO, we denote with L(O) the subset of formulas of L over the predicates and constants occurring in O. All the complexity results we provide refer to data complexity, that in our framework is the complexity computed only with respect to the size of the ontology ABox. 3 CQE Framework Our CQE framework is adapted from [11]. An L CQE instance E is a quadruple hT , A, P, LC i, where T is a TBox in the DL L, A is an ABox such that T ∪ A is consistent, P is the policy, i.e., a set of denial assertions over the signature of T ∪ A, such that T ∪ P is consistent, and LC ⊆ FO(T ∪ A) is the censor language. Intuitively: T is the schema a user interacts with to pose her queries; A is the dataset underlying the schema; P specifies the knowledge that cannot be disclosed for confidentiality reasons, in the sense that the user will never get, through query answers, sufficient knowledge to violate the denials in P; and LC is the language with respect to which the censor is specified, that is, the censor establishes which are the sentences in LC implied by T ∪ A that can be divulged to the user while preserving the policy (cf. Definition 1). To simplify the notation, we will sometimes omit to specify that a certain censor language is limited to the signature of T ∪ A (e.g., we will use GA instead of GA(T ∪ A)). We then define a censor in terms of its underlying theory. Definition 1. The theory Thcens of a censor cens for a CQE instance hT , A, P, LC i is a subset of LC such that: (i) T ∪ A |= φ, for each φ ∈ Thcens , and (ii) T ∪ P ∪ Thcens is consistent. A censor c is optimal if there is no censor c0 such that Thc ⊂ Thc0 ⊆ LC . The set of theories of all the optimal censors of a CQE instance E is denoted by Thoc -Set(E). Example 1. A CQE instance E = (T , A, P, CQ) is used by an international humanitar- ian organization for regulating the access to the information about its volunteer staff. In E, T is an empty TBox, A = {workedIn(v1 , cA ), workedIn(v1 , cB ), workedIn(v2 , cA )}, and P = {∀x.workedIn(x, cA ) ∧ workedIn(x, cB ) → ⊥}. In words, the policy specifies as confidential the fact that a volunteer worked in both country A and country B, given that these countries are currently at war with each other. Below we provide the definition of entailment in CQE1 . Definition 2. (CQE-entailment) Given a CQE instance E and a FO sentence φ, decide whether Th |= φ for every Th ∈ Thoc -Set(E). If this is the case, we write E |=CQE φ. As usual, when the language of φ is restricted to ground atoms (i.e., ABox assertions), CQE-entailment is called (CQE-)instance checking. Example 2. For the CQE instance E of Example 1, we have, for instance, that E |=CQE ∃x.workedIn(v1 , x) and E |=CQE ∃x.workedIn(x, cB ), but E 6|=CQE workedIn(v1 , cB ). 4 Relationship between CQE and CQA In this section we discuss the relationship between the CQE framework we have just defined and CQA. To this aim, we first provide a general definition for CQA. A L CQA instance J is a triple hT , A, LR i where T is a consistent TBox in the DL L, A is an ABox, and LR ⊆ FO(T ∪ A) is the repair language. The consistent entailment set in a language L of a (possibly inconsistent) theory T ∪ A, denoted by CES (T , A, L), is the set {φ | φ ∈ L and there exists a A0 ⊆ A such that T ∪ A0 is consistent and T ∪ A0 |= φ}. A repair for a CQA instance is defined as follows. Definition 3. A repair R for a CQA instance J = hT , A, LR i is a subset of LR such that: (i) R ⊆ CES (T , A, LR ) and; (ii) T ∪ R is consistent and; (iii) there does not exist any R0 such that R ⊂ R0 ⊆ CES (T , A, LR ) and T ∪ R0 is consistent. We denote by RepSet(J ) the set of repairs of J . Definition 3 captures some notions of repair proposed in the literature, such as the repair at the basis of the prototypical AR-semantics, or the repair adopted by the CAR- semantics [13,14]. Indeed, given an ontology O = T ∪ A, repairs in the AR-semantics aim to preserve as many facts as possible of those in A. This means that, in a CQA instance adopting the AR-semantics, the language LR has to be set to A. Differently, the CAR-semantics aims to preserve as many facts as possible of those implied by T and 1 Due to space limits, we give a simplified definition and refer to [15] for other formal details. each subset of A consistent with T . Thus, to encode such semantics LR has to coincide with the set GA(O) of ground atoms over the predicates and constants in O. We now provide some conditions on CQE and CQA instances that allow to establish correspondences between (theories of) censors and repairs. Definition 4. A CQE instance hT , A, P, LC i is CQA-reducible if: (i) for every φ ∈ LC such that T ∪ A |= φ and {φ} ∪ T ∪ P is consistent, there exists A0 ⊆ A such that T ∪ A0 ∪ P is consistent and T ∪ A0 |= φ; (ii) for every φ ∈ LC and every A0 ⊆ A such that T ∪ A0 ∪ P is consistent, if T ∪ A0 ∪ P |= φ then T ∪ A0 |= φ. In words, condition (i) imposes that every logical consequence of T ∪ A that is consistent with the policy and the TBox belongs to CES (T ∪ P, A, LC ). Condition (ii) instead says that in a CQA-reducible instance the sentences in the policy act as constraints on top of T ∪ A, since they never contribute to infer new formulas from LC if added to T ∪ A. Example 3. The instance E = hT , A, P, LC i with T = {A v B}, A = {A(d)}, P = {∀x.A(x) → ⊥}, and LC = GA is not CQA-reducible, since it does not respect condition (i), even though it satisfies condition (ii) (in a trivial way). Instead, E = hT , A0 , P 0 , LC i with A0 = {A(d), B(d)}, P = {∀x.A(x) ∧ B(x) → ⊥}, and T and LC as before is CQA-reducible. Theorem 1. Let E = hT , A, P, LC i be a CQE instance, such that E is a CQA-reducible. Then Thoc -Set(hT , A, P, LC i) = RepSet(hT ∪ P, A, LC i). Below we consider reducibility of CQA instances into CQE ones. Definition 5. A CQA instance hT , A, LR i is CQE-reducible if there exists a partition TP ∪ TN of T such that TP ∪ A is consistent, TN is equivalent to a set of denials, and: (i) for every φ ∈ LR , such that TP ∪ A |= φ and {φ} ∪ T is consistent, there exists A0 ⊆ A such that T ∪ A0 is consistent and T ∪ A0 |= φ; (ii) for every φ ∈ LR and every A0 ⊆ A such that T ∪ A0 is consistent, if T ∪ A0 |= φ then TP ∪ A0 |= φ. Intuitively, the above definition says that in a CQE-reducible instance we can identify a portion TN of T such that its assertions act as constraints on the ontology TP ∪ A (cond. (ii)), thus TN behaves as a policy in a CQE instance. At the same time, each logical consequence of TP ∪ A consistent with T must belong to CES (T , A, LR ) (cond. (i)). CQE-reducible instances have the following property. Theorem 2. Let J = hT , A, LR i be a CQA instance, such that J is CQE-reducible with T = TP ∪ TN . Then RepSet(hTP ∪ TN , A, LR i) = Thoc -Set(hTP , A, TN , LR i). We now rephrase entailment in CQA [14]: given a CQA instance J = hT , A, LR i and a FO sentence φ, decide whether T ∪ R |= φ for every R ∈ RepSet(J ). This is denoted by J |=CQA φ. The following results follow from Theorem 1 and Theorem 2. Corollary 1. Let E = hT , A, P, LC i be a CQA-reducible CQE instance and φ a FO sentence. Then, E |=CQE φ iff J |=CQA φ, where J = hT ∪ P, A, LC i. Furthermore, Let J = hT , A, LR i be a CQE-reducible CQA instance with T = TP ∪ TN , and φ a FO sentence. Then, J |=CQA φ iff E |=CQE φ, where E = hTP , A, TN , LR i . 5 CQE under Restricted Censor Languages In this section we establish data complexity of instance checking and CQ entailment for DL-LiteR CQE instances, when the censor language LC coincides with the ABox A, and with the set of ground atoms GA. For the former case, we establish our complexity results by exploiting a mutual reduction between entailment in CQE and CQA. For the latter case, the two frameworks behave in a slightly different way, and thus we also need to use techniques tailored to the CQE setting. We start with LC = A, and show that CQE instances are CQA-reducible, but also that CQA-instances are CQE-reducible when the repair language coincides with A. Theorem 3. Each DL-LiteR CQE instance hT , A, P, Ai is CQA-reducible, and each DL-LiteR CQA instance hT , A, Ai is CQE-reducible. The following result follows from Theorem 3 and the fact that CQ entailment in DL-LiteR,den CQA instances under AR-semantics is coNP-complete, already for instance checking [14]. Theorem 4. Both instance checking and CQ entailment are coNP-complete in data complexity for DL-LiteR CQE instances hT , A, P, Ai. We now consider LC = GA. In this case, DL-LiteR CQE instances are not always CQA-reducible, as shown in Example 3. Reducibility in the other way round is also not always possible. However, we can show some weaker, but useful, properties. Proposition 1. Each DL-LiteR CQE instance hT , A, P, GAi, such that T ∪ P ∪ {α} is satisfiable for each α ∈ A, is CQA-reducible. Also, each DL-LiteR CQA instance hT , A, GAi, such that T ∪ {α} is satisfiable for each α ∈ A, is CQE-reducible. For DL-LiteR CQE instances satisfying the conditions mentioned in Proposition 1 we can establish computational complexity of query answering by mutual reduction between CQE and CQA under CAR-semantics, similarly to what we have done to prove Theorem 4. In fact, we are able to exactly characterize the complexity for general DL-LiteR CQE instances by using tailored proofs, which are inspired to those used in CQA to establish both upper and lower complexity bounds. Theorem 5. Instance checking and CQ entailment are respectively in AC0 and coNP- complete in data complexity, for DL-LiteR CQE instances hT , A, P, GAi. 6 CQE under Full Censor Language In this section we study CQE-entailment when the censor language LC is CQ. We start with the following property that is central for our analysis. Theorem 6. Let T be a DL-LiteR TBox, let A be an ABox such that T ∪ A is con- sistent, let P be a policy, let q be a BCQ, and let k = max(h, length(q)), where h is the maximum length of a denial assertion in P. Then, hT , A, P, CQi |=CQE q iff hT , A, P, CQk i |=CQE q. LC = A LC = GA LC = CQ Instance Checking coNP-complete in AC0 in PTIME CQ Entailment coNP-complete coNP-complete in PTIME Fig. 1: Data complexity of entailment over DL-LiteR CQE instances hT , A, P, LC i In the rest of the section, without loss of generality, we assume that, in every CQE instance of the form hT , A, P, CQk i, all formulas of the language CQk (as well as the query q of the CQE-entailment problem) use the set of 2k variables {x1 , . . . , x2k }. We now define the CQE-Ent-DL-LiteR algorithm for deciding CQE-entailment of BCQs for DL-LiteR KBs. Algorithm CQE-Ent-DL-LiteR (T , A, P, q) Input: DL-LiteR TBox T , ABox A s. t. T ∪ A is consistent, policy P, BCQ q Output: true if hT , A, P, CQi |=CQE q, false otherwise let h be the maximum length of a denial in P; let k = max(h, length(q)); Φ = CQEntailedSubset(T , A, k); for i = 1 to k do remove from Φ every subset Φ0 such that |Φ0 | = i and T ∪ P ∪ Φ0 is inconsistent; if q ∈ Φ then return true else return false In the algorithm, CQEntailedSubset(T , A, k) is the function returning the set of BCQs from CQk that are entailed by T ∪ A. Informally, the algorithm first computes an integer k, based on the length of q and of the denials in P; then, it computes the set Φ that represents the intersection of the theories of the optimal censors for the CQE instance hT , A, P, CQk i: this is done by eliminating from Φ all formulas that belong to minimal subsets of Φ that are inconsistent with T ∪ P; finally, it checks the presence of q among the queries in the above set Φ. Theorem 7. Let E = hT , A, P, CQi be a DL-LiteR CQE instance, and q a BCQ. Then, E |=CQE q iff CQE-Ent-DL-LiteR (T , A, P, q) returns true. Algorithm CQE-Ent-DL-LiteR (T , A, P, q) runs in PTIME in the size of the ABox, since CQEntailedSubset(T , A, k) can be computed in polynomial time in the size of A and checking the consistency of T ∪ P ∪ Φ0 can be reduced to checking the consistency of a DL-LiteR,den ontology, which is polynomial in data complexity [14]. Theorem 8. Entailment of CQs is in PTIME in data complexity for DL-LiteR CQE instances hT , A, P, CQi. 7 Conclusions Our results (cf. Figure 1) show a surprising aspect: the complexity of CQ entailment for restricted censor languages is harder than when LC = CQ. Indeed, in this latter case, CQ entailment can be established through the computation of the intersection of (a finite and polynomial representation of) all the theories of optimal censors, which can be done in polynomial time. This does not hold for LC equal to A or GA. Our research work can be extended in many directions. First, the PTIME upper bound for CQE over DL-LiteR TBoxes and CQ censor language should be refined. We believe that an AC0 bound can be shown in this case. Then, the complexity analysis of CQE could be extended to other DLs, as well as to other policy and censor languages. Also, based on the complexity analysis presented in this paper, it would be important to look for practical techniques allowing for the implementation of CQE extensions of current DL reasoners and Ontology-based Data Access systems [8,12]. References 1. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proc. of IJCAI, pages 364–369, 2005. 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The De- scription Logic Handbook: Theory, Implementation and Applications. Cambridge University Press, 2nd edition, 2007. 3. L. E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on Data Management. Morgan & Claypool Publishers, 2011. 4. L. E. Bertossi and L. Li. Achieving data privacy through secrecy views and null-based virtual updates. IEEE Trans. Knowl. Data Eng., 25(5):987–1000, 2013. 5. M. Bienvenu and C. Bourgaux. Inconsistency-tolerant querying of description logic knowledge bases. In RW Tutorial Lectures, pages 156–202, 2016. 6. J. Biskup and P. A. Bonatti. Controlled query evaluation for enforcing confidentiality in complete information systems. Int. J. Inf. Sec., 3(1):14–27, 2004. 7. J. Biskup and T. Weibert. Keeping secrets in incomplete databases. Int. J. Inf. Sec., 7(3):199– 217, 2008. 8. D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez- Muro, and G. Xiao. Ontop: Answering SPARQL queries over relational databases. Semantic Web J., 8(3):471–487, 2017. 9. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning and efficient query answering in description logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429, 2007. 10. B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, and D. Zheleznyakov. Controlled query evaluation over OWL 2 RL ontologies. In Proc. of ISWC, pages 49–65, 2013. 11. B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, and D. Zheleznyakov. Controlled query evaluation for Datalog and OWL 2 profile ontologies. In Proc. of IJCAI, pages 2883–2889, 2015. 12. G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo. MASTRO: A reasoner for effective Ontology-Based Data Access. In Proc. of ORE, 2012. 13. D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, and D. F. Savo. Inconsistency-tolerant semantics for description logics. In Proc. of RR, pages 103–117, 2010. 14. D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, and D. F. Savo. Inconsistency-tolerant query answering in ontology-based data access. J. of Web Semantics, 33:3–29, 2015. 15. D. Lembo, R. Rosati, and D. F. Savo. Revisiting controlled query evaluation in Description Logics. In Proc. of IJCAI, 2019. To appear. 16. G. L. Sicherman, W. de Jonge, and R. P. van de Riet. Answering queries without revealing secrets. ACM Trans. Database Syst., 8(1):41–59, 1983.