Controlled Query Evaluation in DL-Lite through Epistemic Protection Policies (Extended Abstract)

Gianluca Cima1, Domenico Lembo1, Lorenzo Marconi1, Riccardo Rosati1 and Domenico Fabio Savo2
1 Sapienza Università di Roma
2 Università degli Studi di Bergamo

Abstract
This extended abstract summarizes our work recently accepted to the 33rd International Joint Conference on Artificial Intelligence, in which we present a new language for data protection policies in the Controlled Query Evaluation framework, based on the use of an epistemic operator.

Keywords
Description Logics, Data Protection, First-Order Rewritability, Epistemic Modal Logic

Controlled Query Evaluation (CQE) is an approach to safeguard sensitive information by filtering query responses to prevent users from inferring data declared confidential by a data protection policy [1, 2, 3, 4]. A critical aspect of CQE is the expressiveness of the policy language, which determines the information that must remain undisclosed. Previous literature has mainly considered policies consisting of sentences, i.e., closed logical formulas, as in [5, 2, 3, 6]. Through this approach, it is only possible to impose that the truth value of a sentence in the policy cannot be inferred by asking queries to the system. For example, in [6] policy statements take the form ∀𝑥⃗ (𝑐𝑞(𝑥⃗) → ⊥), referred to as a denial. Enforcing such a denial over the data means refraining from disclosing that the Boolean conjunctive query (BCQ) ∃𝑥⃗ 𝑐𝑞(𝑥⃗) is inferred by the system, even if the inference has occurred. For instance, the rule 𝛿1 = ∀𝑥, 𝑦 (Patient(𝑥) ∧ admitted(𝑥, 𝑦) → ⊥) aims to keep undisclosed the fact that a patient is admitted to a hospital department.

One may argue that the above denial imposes an excessively stringent level of protection, since it denies the presence of any patient in the hospital in order to protect their identities, which is implausible in practice. A more effective approach could be to require the system not to answer the open query 𝑞(𝑥) : ∃𝑦(Patient(𝑥) ∧ admitted(𝑥, 𝑦)). Intuitively, a rule of this kind expresses that the set of patients that the system knows to be admitted to the hospital must not be disclosed. To properly capture this behaviour, we propose to use an epistemic operator 𝐾 in policy formulas, in the spirit of [7, 8]. This allows us to formalize the epistemic state of the user, i.e., which information can be safely disclosed to her. In our example, by expressing a rule like

𝛿2 = ∀𝑥 (𝐾 ∃𝑦(Patient(𝑥) ∧ admitted(𝑥, 𝑦)) → 𝐾⊥)

one could still allow the end user to infer that some patient has been admitted, while protecting the identities of all such patients. In fact, our proposal enables us to accomplish more than this. In a more advanced scenario, concealing the relationship between a patient and a hospital department should apply only if the patient has not signed a consent form. This can be expressed by the following formula:

𝛿3 = ∀𝑥 (𝐾 ∃𝑦(Patient(𝑥) ∧ admitted(𝑥, 𝑦)) → 𝐾 Consent(𝑥))
In essence, this formula stipulates that if a user knows that a patient has been admitted, she must also know that the patient has signed a consent form. Consequently, if a patient has not signed a consent form, the system cannot disclose her admission status.

Hereafter, we employ standard notions of function-free first-order (FO) logic with unary and binary predicates and, besides the alphabets of predicate names, constants, and variables, we assume the existence of an infinite set of symbols called labeled nulls. As customary when dealing with epistemic operators, we adopt the standard name assumption [9]. Moreover, we write eval(𝜑, ℐ) to indicate the evaluation of an FO sentence 𝜑 over an FO interpretation ℐ. A Description Logic ontology is an FO theory 𝒯 ∪ 𝒜, where 𝒯 is a TBox and 𝒜 is a quantified ABox [10] (i.e., a set of atomic FO formulas predicating over constants and labeled nulls), and all the predicates occurring in 𝒜 also occur in 𝒯. As for user queries, we focus on BCQs and on Boolean unions of conjunctive queries (BUCQs).

Formulas such as 𝛿2 and 𝛿3 belong to the language of epistemic dependencies (EDs) which, in general, take the following form:

∀𝑥⃗1, 𝑥⃗2 (𝐾𝑞𝑏(𝑥⃗1, 𝑥⃗2) → 𝐾𝑞ℎ(𝑥⃗2))   (1)

where 𝑞𝑏(𝑥⃗1, 𝑥⃗2) is a CQ with free variables 𝑥⃗1 ∪ 𝑥⃗2, 𝑞ℎ(𝑥⃗2) is a CQ with free variables 𝑥⃗2, and 𝐾 is a modal operator. EDs were originally introduced in [8] to express integrity constraints in ontology-based data management and are indeed a special case of domain-independent EQL-Lite(CQ) sentences [9]. A policy consisting of EDs is called an epistemic policy. In the present paper, we use EDs as policy rules that must be satisfied to preserve data confidentiality over Description Logic ontologies, similarly to how integrity constraints must be satisfied to guarantee data consistency. However, our aim is totally different from that of [8], whose focus is on consistency checking.

Intuitively, an ED of form (1) should be read as follows: if the sentence 𝑞𝑏(𝑐⃗1, 𝑐⃗2) is known to hold, then the sentence 𝑞ℎ(𝑐⃗2) is known to hold, for all sequences of constants 𝑐⃗1 and 𝑐⃗2 of the same length as 𝑥⃗1 and 𝑥⃗2, respectively. More formally, we say that an FO theory Φ satisfies an ED 𝛿 (denoted Φ |=EQL 𝛿) if, for all sequences of constants 𝑐⃗1 and 𝑐⃗2 of the same length as 𝑥⃗1 and 𝑥⃗2, respectively, the fact that eval(𝑞𝑏(𝑐⃗1, 𝑐⃗2), ℐ) is true for every ℐ ∈ ℳ implies that eval(𝑞ℎ(𝑐⃗2), ℐ) is true for every ℐ ∈ ℳ, where ℳ is the set of all FO models of Φ. We say that Φ satisfies a policy 𝒫 (denoted Φ |=EQL 𝒫) if Φ satisfies every 𝛿 ∈ 𝒫.
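Operationally, this satisfaction check amounts to an inclusion between certain answers: every tuple of constants for which the body CQ is known to hold must, once projected on 𝑥⃗2, also be a tuple for which the head CQ is known to hold. The following minimal sketch (in Python) makes this explicit; it assumes the certain answers of 𝑞𝑏 and 𝑞ℎ have already been computed as sets of tuples of constants, and all function and parameter names (satisfies_ed, certain_qb, certain_qh, n1, n2) are illustrative, not part of the paper.

# Minimal sketch of ED satisfaction (Phi |=EQL delta), assuming the certain
# answers of the body CQ q_b(x1, x2) and of the head CQ q_h(x2) over Phi have
# already been computed as sets of tuples of constants. For a Boolean head
# (n2 = 0), certain_qh is {()} if the head sentence is known and set() otherwise.

def satisfies_ed(certain_qb, certain_qh, n1, n2):
    """The ED  forall x1,x2 (K q_b(x1,x2) -> K q_h(x2))  is satisfied iff every
    known body answer, projected on the shared variables x2, is also a known
    head answer. n1 and n2 are the lengths of x1 and x2."""
    for tup in certain_qb:
        x2_part = tup[n1:n1 + n2]     # projection of the body answer on x2
        if x2_part not in certain_qh:
            return False
    return True

# Toy run mirroring delta_2: body = exists y (Patient(x) ^ admitted(x, y)),
# head = bottom (never known). The system knows u1 is admitted, so the ED fails.
print(satisfies_ed(certain_qb={("u1",)}, certain_qh=set(), n1=1, n2=0))   # False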
Now, to completely characterize our CQE framework, we need to specify its semantics. This issue is addressed in CQE through censors. Here, we employ CQ-censors, introduced in [6]. To this aim, we first define the notion of a CQE instance as a triple ℰ = ⟨𝒯, 𝒜, 𝒫⟩, where 𝒯 is a TBox, 𝒜 is a quantified ABox such that 𝒯 ∪ 𝒜 has at least one model, and 𝒫 is an epistemic policy such that 𝒯 |=EQL 𝒫. The notion of (optimal) CQ-censor is then as follows.

Definition 1 (CQ-censor). A CQ-censor of a CQE instance ℰ = ⟨𝒯, 𝒜, 𝒫⟩ is a set 𝒞 of BCQs such that (𝑖) 𝒯 ∪ 𝒞 |=EQL 𝒫 and (𝑖𝑖) 𝒯 ∪ 𝒜 |= 𝑞 for every 𝑞 ∈ 𝒞. Moreover, 𝒞 is an optimal CQ-censor of ℰ if there does not exist any CQ-censor 𝒞′ of ℰ such that 𝒞′ ⊃ 𝒞.

As in [6], we define CQE as the problem of checking whether an FO sentence is entailed by all optimal censors. We call this problem SC-entailment (Skeptical entailment under CQ-censors). This form of CQE does not suffer from the problem of having to arbitrarily select an optimal censor among several incomparable ones [3, 11]. We also consider a sound approximation of SC-entailment, called IC-entailment, which consists in checking whether an FO sentence is entailed by the intersection of all optimal censors. We use the notation ℰ |=SC 𝑞 (resp., ℰ |=IC 𝑞) to indicate that an FO sentence 𝑞 is SC-entailed (resp., IC-entailed) by a CQE instance ℰ.

Example 1. Suppose that company C𝐴 wants to share certain user-profiling data with a company C𝐵 for targeted advertising. This is not allowed in general, but only in some countries with a special regulation that enables sharing based on the users' consent. C𝐴 may use the following ED in the policy to enable C𝐵 to access only data compliant with the above requirements:

𝛿4 = ∀𝑥, 𝑦 (𝐾 profileActivity(𝑥, 𝑦) → 𝐾 ∃𝑧(citOf(𝑥, 𝑧) ∧ SR(𝑧) ∧ Consent(𝑥))).

In the rule, profileActivity links a user with her profiling data, citOf relates a user to her country of citizenship, SR denotes countries with special regulation, and Consent denotes users who have given their consent. Suppose that C𝐴 also wants C𝐵 not to be able to associate a user with her real identity, and that this is possible by collecting the person's name and her date of birth at the same time. To this aim, C𝐴 also specifies the ED:

𝛿5 = ∀𝑥, 𝑦, 𝑧 (𝐾(name(𝑥, 𝑦) ∧ dateB(𝑥, 𝑧)) → 𝐾⊥).

Now, let ℰ = ⟨𝒯, 𝒜, 𝒫⟩ be a CQE instance, where 𝒯 = ∅, 𝒫 = {𝛿4, 𝛿5}, and 𝒜 is as follows:

𝒜 = { profileActivity(u1, a1), Consent(u1), citOf(u1, 𝑛1), SR(𝑛1), name(u1, ann), dateB(u1, date1), profileActivity(u2, a2), citOf(u2, 𝑛1) },

where 𝑛1 is a labeled null and all the other terms used in 𝒜 are constants. Let us also consider the following four BCQs:

𝑞1 = ∃𝑦 (profileActivity(u1, a1) ∧ citOf(u1, 𝑦) ∧ SR(𝑦)),
𝑞2 = profileActivity(u2, a2),
𝑞3 = ∃𝑦 profileActivity(𝑦, a2),
𝑞4 = profileActivity(u1, a1) ∧ name(u1, ann).

For 𝑋 ∈ {SC, IC}, one can verify that ℰ |=𝑋 𝑞1, since user u1 gave her consent and is a citizen of some country (𝑛1) with special regulation. Conversely, ℰ ⊭𝑋 𝑞2, since u2 did not give consent. Nevertheless, one can verify that 𝑞3 belongs to each optimal CQ-censor of ℰ, and so ℰ |=𝑋 𝑞3. Finally, we have that ℰ ⊭𝑋 𝑞4, since there is an optimal CQ-censor 𝒞 of ℰ such that dateB(u1, date1) ∈ 𝒞, implying that name(u1, ann) ∉ 𝒞 (otherwise 𝛿5 would be violated).
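To make Definition 1 and Example 1 more concrete, the sketch below brute-forces the optimal CQ-censors over a hand-picked, finite pool of BCQs, approximating entailment from a censor by plain set membership; the pool, the identifiers, and the two policy checks are illustrative simplifications of 𝛿4 and 𝛿5, not the rewriting-based procedures developed in the paper.

# Toy brute-force illustration of Definition 1 on Example 1 (empty TBox).
# "T u C |= q" is approximated by membership in a hand-picked pool of BCQs:
# profileActivity(u1,a1), the head of delta4 grounded on u1, q3, name(u1,ann),
# dateB(u1,date1). profileActivity(u2,a2) is left out of the pool: no censor
# containing it can satisfy delta4, since Consent(u2) is not entailed by A.
from itertools import combinations

POOL = ["profile_u1", "head_u1", "q3", "name_u1", "dateB_u1"]

def policy_ok(censor):
    if "profile_u1" in censor and "head_u1" not in censor:   # delta4 grounded on u1
        return False
    if {"name_u1", "dateB_u1"} <= censor:                    # delta5
        return False
    return True

censors = [frozenset(c) for r in range(len(POOL) + 1)
           for c in combinations(POOL, r) if policy_ok(frozenset(c))]
optimal = [c for c in censors if not any(c < d for d in censors)]

# profile_u1, head_u1 (hence q1) and q3 are in every optimal censor, while
# name_u1 is not, so q4 is not SC-entailed.
for q in ("profile_u1", "head_u1", "q3", "name_u1"):
    print(q, all(q in c for c in optimal))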
We focus on TBoxes specified in DL-Liteℛ [7], for which SC-entailment of BCQs is known to be tractable in data complexity when policies are expressed as denials [6]. Unfortunately, when the policy consists of epistemic dependencies, this problem turns out to be intractable in general, and the same holds for IC-entailment.

Theorem 1. In the case of CQE instances with DL-Liteℛ TBoxes, both SC-entailment and IC-entailment of BUCQs are coNP-complete in data complexity.

We remark that, in the above result, coNP-hardness holds already when the TBox is empty and the query is a ground atom. To regain tractability, we impose an acyclicity condition on the policy 𝒫 w.r.t. a DL-Liteℛ TBox 𝒯, defined as follows.

Definition 2 (Acyclicity). We call dependency graph of (𝒯, 𝒫) the directed graph whose nodes are the predicates occurring in 𝒯 ∪ 𝒫, and whose edges (𝑝1, 𝑝2) correspond to the presence of 𝑝1 in the left-hand side and 𝑝2 in the right-hand side of either an ED in 𝒫 (we call such edges P-edges) or a positive inclusion axiom in 𝒯. We say that 𝒫 is acyclic for 𝒯 if there exists no cycle involving a P-edge in the dependency graph of (𝒯, 𝒫).
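The acyclicity condition of Definition 2 amounts to a simple reachability test on the dependency graph: a cycle through a P-edge (𝑝1, 𝑝2) exists precisely when 𝑝1 is reachable from 𝑝2. A minimal sketch follows; the edge-list input format and all names are illustrative assumptions, not taken from the paper.

# Minimal sketch of the acyclicity check of Definition 2. Edges are given as
# (source predicate, target predicate, kind) triples, where kind is "P" for an
# edge coming from an ED in the policy and "T" for an edge coming from a
# positive inclusion axiom in the TBox.
from collections import defaultdict

def acyclic_for_tbox(edges):
    graph = defaultdict(set)
    for src, dst, _kind in edges:
        graph[src].add(dst)

    def reachable(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return False

    # A cycle through a P-edge (p1 -> p2) exists iff p1 is reachable from p2.
    return not any(kind == "P" and reachable(dst, src) for src, dst, kind in edges)

# delta3 yields the P-edges Patient -> Consent and admitted -> Consent; adding a
# TBox inclusion from Consent to Patient would close a cycle through a P-edge.
edges = [("Patient", "Consent", "P"), ("admitted", "Consent", "P")]
print(acyclic_for_tbox(edges))                                    # True
print(acyclic_for_tbox(edges + [("Consent", "Patient", "T")]))    # False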
For this class of epistemic policies, we are able to provide first-order rewriting algorithms for deciding whether a BUCQ is SC- or IC-entailed by a CQE instance, thus proving the following result.

Theorem 2. In the case of CQE instances with DL-Liteℛ TBoxes 𝒯 and epistemic policies 𝒫 acyclic for 𝒯, both SC-entailment and IC-entailment of BUCQs are in AC0 in data complexity.

Besides the computational complexity study, we also carry out an analysis of the robustness of the above-defined entailment semantics w.r.t. confidentiality preservation. In [12, 3, 13], it is shown that censoring mechanisms based on an indistinguishability criterion are indeed more secure than others. In this regard, we adopt an approach similar to the one described in [12] for relational databases. Intuitively, under such an approach an entailment semantics preserves confidentiality if, for every CQE instance ⟨𝒯, 𝒜, 𝒫⟩ and every finite set 𝒬 of queries, the answers to such queries are the same as if they were obtained from a (possibly different) CQE instance ⟨𝒯, 𝒜′, 𝒫⟩ such that 𝒯 ∪ 𝒜′ |=EQL 𝒫. We now describe this property formally. First, for a TBox 𝒯, a policy 𝒫, two ABoxes 𝒜 and 𝒜′, a set 𝒬 of BUCQs, and 𝑋 ∈ {SC, IC}, we say that 𝒜 and 𝒜′ are 𝒬-indistinguishable for 𝑋-entailment w.r.t. (𝒯, 𝒫) if, for every 𝑞 ∈ 𝒬, ⟨𝒯, 𝒜, 𝒫⟩ |=𝑋 𝑞 iff ⟨𝒯, 𝒜′, 𝒫⟩ |=𝑋 𝑞.

Definition 3 (Confidentiality preservation). For 𝑋 ∈ {SC, IC}, we say that 𝑋-entailment preserves confidentiality for BCQs (resp., BUCQs) if, for every CQE instance ⟨𝒯, 𝒜, 𝒫⟩ and for every finite set 𝒬 of BCQs (resp., BUCQs), there exists an ABox 𝒜′ such that the following holds: (𝑖) 𝒯 ∪ 𝒜′ |=EQL 𝒫; (𝑖𝑖) 𝒜 and 𝒜′ are 𝒬-indistinguishable for 𝑋-entailment w.r.t. (𝒯, 𝒫).

In the case of DL-Liteℛ ontologies, we can show that SC-entailment preserves confidentiality for BCQs, but this result does not carry over to BUCQs. We prove, however, that IC-entailment enjoys the property even for BUCQs.

Acknowledgments
This work was partially supported by: projects FAIR (PE0000013) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the EU - NextGenerationEU; the GLACIATION project funded by the EU (N. 101070141); the ANTHEM (AdvaNced Technologies for Human-centrEd Medicine) project (CUP B53C22006700001) funded by the National Plan for NRRP Complementary Investments; the MUR PRIN 2022LA8XBH project Polar (POLicy specificAtion and enfoRcement for privacy-enhanced data management); and the EU under the H2020-EU.2.1.1 project TAILOR (grant id. 952215).

References
[1] J. Biskup, For unknown secrecies refusal is better than lying, Data and Knowledge Engineering 33 (2000) 1–23.
[2] J. Biskup, P. A. Bonatti, Controlled query evaluation for enforcing confidentiality in complete information systems, Int. J. Inf. Sec. 3 (2004) 14–27.
[3] P. A. Bonatti, L. Sauro, A confidentiality model for ontologies, in: Proc. of the 12th Int. Semantic Web Conf. (ISWC), 2013, pp. 17–32.
[4] B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, D. Zheleznyakov, Controlled query evaluation over OWL 2 RL ontologies, in: Proc. of the 12th Int. Semantic Web Conf. (ISWC), 2013, pp. 49–65.
[5] G. L. Sicherman, W. de Jonge, R. P. van de Riet, Answering queries without revealing secrets, ACM Trans. on Database Systems 8 (1983) 41–59.
[6] D. Lembo, R. Rosati, D. F. Savo, Revisiting controlled query evaluation in description logics, in: Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 1786–1792.
[7] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, J. of Automated Reasoning 39 (2007) 385–429.
[8] M. Console, M. Lenzerini, Epistemic integrity constraints for ontology-based data management, in: Proc. of the 34th AAAI Conf. on Artificial Intelligence (AAAI), AAAI Press, 2020, pp. 2790–2797. doi:10.1609/aaai.v34i03.5667.
[9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, EQL-Lite: Effective first-order query processing in description logics, in: Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (IJCAI), 2007, pp. 274–279.
[10] F. Baader, F. Kriegel, A. Nuradiansyah, R. Peñaloza, Computing compliant anonymisations of quantified ABoxes w.r.t. ℰℒ policies, in: Proc. of the 19th Int. Semantic Web Conf. (ISWC), volume 12506 of Lecture Notes in Computer Science, Springer, 2020, pp. 3–20.
[11] B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, D. Zheleznyakov, Controlled query evaluation for datalog and OWL 2 profile ontologies, in: Proc. of the 24th Int. Joint Conf. on Artificial Intelligence (IJCAI), 2015, pp. 2883–2889.
[12] J. Biskup, T. Weibert, Keeping secrets in incomplete databases, Int. J. Inf. Sec. 7 (2008) 199–217.
[13] P. A. Bonatti, A false sense of security, Artificial Intelligence 310 (2022).