Controlled Query Evaluation in DL-Lite through Epistemic Protection Policies (Extended Abstract)

Gianluca Cima1, Domenico Lembo1, Lorenzo Marconi1, Riccardo Rosati1 and Domenico Fabio Savo2
1 Sapienza Università di Roma
2 Università degli Studi di Bergamo

Abstract
This extended abstract summarizes our work recently accepted to the 33rd International Joint Conference on Artificial Intelligence, in which we present a new language for data protection policies in the Controlled Query Evaluation framework, based on the use of an epistemic operator.

Keywords
Description Logics, Data Protection, First-Order Rewritability, Epistemic Modal Logic

Controlled Query Evaluation (CQE) is an approach to safeguard sensitive information by filtering query responses to prevent users from inferring data declared confidential by a data protection policy [1, 2, 3, 4]. A critical aspect of CQE is the expressiveness of the policy language, which determines the information that must remain undisclosed. Previous literature has mainly considered policies consisting of sentences, i.e., closed logical formulas, as in [5, 2, 3, 6]. Through this approach, it is only possible to impose that the truth value of a sentence in the policy cannot be inferred by asking queries to the system. For example, in [6] policy statements take the form ∀𝑥⃗ (𝑐𝑞(𝑥⃗) → ⊥), referred to as a denial. Enforcing such a denial over the data means refraining from disclosing that the Boolean conjunctive query (BCQ) ∃𝑥⃗ 𝑐𝑞(𝑥⃗) is inferred by the system, even if the inference has occurred. For instance, the rule 𝛿1 = ∀𝑥, 𝑦 (Patient(𝑥) ∧ admitted(𝑥, 𝑦) → ⊥) aims to keep undisclosed the fact that a patient is admitted to a hospital department.

One may argue that the above denial imposes an excessively stringent level of protection, since it denies the presence of any patient in the hospital in order to protect their identities, which is implausible in practice. A more effective approach could be to require the system not to answer the open query 𝑞(𝑥) : ∃𝑦(Patient(𝑥) ∧ admitted(𝑥, 𝑦)). Intuitively, a rule of this kind expresses that the set of patients that the system knows to be admitted to the hospital must not be disclosed. To properly capture this behaviour, we propose to use an epistemic operator 𝐾 in policy formulas, in the spirit of [7, 8]. This allows us to formalize the epistemic state of the user, i.e., which information can be safely disclosed to her. In our example, by expressing a rule like

𝛿2 = ∀𝑥 (𝐾 ∃𝑦(Patient(𝑥) ∧ admitted(𝑥, 𝑦)) → 𝐾⊥)

one could still allow the end user to infer that some patient has been admitted, while protecting the identities of all such patients. In fact, our proposal enables us to accomplish more than this. In a more advanced scenario, concealing the relationship between a patient and a hospital department should apply only if the patient has not signed a consent form. This can be expressed by the following formula:

𝛿3 = ∀𝑥 (𝐾 ∃𝑦(Patient(𝑥) ∧ admitted(𝑥, 𝑦)) → 𝐾 Consent(𝑥))
In essence, this formula stipulates that if a user knows that a patient has been admitted, she must also know that the patient has signed a consent form. Consequently, if a patient has not signed a consent form, the system cannot disclose her admission status.

Hereafter, we employ standard notions of function-free first-order (FO) logic with unary and binary predicates and, besides the alphabets of predicate names, constants, and variables, we assume the existence of an infinite set of symbols called labeled nulls. As customary when dealing with epistemic operators, we adopt the standard name assumption [9]. Moreover, we write eval(𝜑, ℐ) to indicate the evaluation of an FO sentence 𝜑 over an FO interpretation ℐ. A Description Logic ontology is an FO theory 𝒯 ∪ 𝒜, where 𝒯 is a TBox and 𝒜 is a quantified ABox [10] (i.e., a set of atomic FO formulas predicating over constants and labeled nulls), and all the predicates occurring in 𝒜 also occur in 𝒯. As for user queries, we focus on BCQs and on Boolean unions of conjunctive queries (BUCQs).

Formulas such as 𝛿2 and 𝛿3 belong to the language of epistemic dependencies (EDs) which, in general, take the following form:

∀𝑥⃗1, 𝑥⃗2 (𝐾𝑞𝑏(𝑥⃗1, 𝑥⃗2) → 𝐾𝑞ℎ(𝑥⃗2))   (1)

where 𝑞𝑏(𝑥⃗1, 𝑥⃗2) is a CQ with free variables 𝑥⃗1 ∪ 𝑥⃗2, 𝑞ℎ(𝑥⃗2) is a CQ with free variables 𝑥⃗2, and 𝐾 is a modal operator. EDs were originally introduced in [8] to express integrity constraints in ontology-based data management and are indeed a special case of domain-independent EQL-Lite(CQ) sentences [9]. A policy consisting of EDs is called an epistemic policy. In the present paper, we use EDs as policy rules that must be satisfied to preserve data confidentiality over Description Logic ontologies, similarly to how integrity constraints must be satisfied to guarantee data consistency. However, our aim is totally different from that of [8], whose focus is on consistency checking.

Intuitively, an ED of form (1) should be read as follows: if the sentence 𝑞𝑏(𝑐⃗1, 𝑐⃗2) is known to hold, then the sentence 𝑞ℎ(𝑐⃗2) is known to hold, for all sequences of constants 𝑐⃗1 and 𝑐⃗2 of the same length as 𝑥⃗1 and 𝑥⃗2, respectively. More formally, we say that an FO theory Φ satisfies an ED 𝛿 (denoted Φ |=EQL 𝛿) if, for all sequences of constants 𝑐⃗1 and 𝑐⃗2 of the same length as 𝑥⃗1 and 𝑥⃗2, respectively, the fact that eval(𝑞𝑏(𝑐⃗1, 𝑐⃗2), ℐ) is true for every ℐ ∈ ℳ implies that eval(𝑞ℎ(𝑐⃗2), ℐ) is true for every ℐ ∈ ℳ, where ℳ is the set of all FO models of Φ. We say that Φ satisfies a policy 𝒫 (denoted Φ |=EQL 𝒫) if Φ satisfies every 𝛿 ∈ 𝒫.
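Operationally, this satisfaction check amounts to an inclusion between certain answers: every tuple of constants for which the body CQ is known to hold must, once projected on 𝑥⃗2, also be a tuple for which the head CQ is known to hold. The following minimal sketch (in Python) makes this explicit; it assumes the certain answers of 𝑞𝑏 and 𝑞ℎ have already been computed as sets of tuples of constants, and all function and parameter names (satisfies_ed, certain_qb, certain_qh, n1, n2) are illustrative, not part of the paper.

# Minimal sketch of ED satisfaction (Phi |=EQL delta), assuming the certain
# answers of the body CQ q_b(x1, x2) and of the head CQ q_h(x2) over Phi have
# already been computed as sets of tuples of constants. For a Boolean head
# (n2 = 0), certain_qh is {()} if the head sentence is known and set() otherwise.

def satisfies_ed(certain_qb, certain_qh, n1, n2):
    """The ED  forall x1,x2 (K q_b(x1,x2) -> K q_h(x2))  is satisfied iff every
    known body answer, projected on the shared variables x2, is also a known
    head answer. n1 and n2 are the lengths of x1 and x2."""
    for tup in certain_qb:
        x2_part = tup[n1:n1 + n2]     # projection of the body answer on x2
        if x2_part not in certain_qh:
            return False
    return True

# Toy run mirroring delta_2: body = exists y (Patient(x) ^ admitted(x, y)),
# head = bottom (never known). The system knows u1 is admitted, so the ED fails.
print(satisfies_ed(certain_qb={("u1",)}, certain_qh=set(), n1=1, n2=0))   # False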
Now, to completely characterize our CQE framework, we need to specify its semantics. This issue is addressed in CQE through censors. Here, we employ CQ-censors, introduced in [6]. To this aim, we first define the notion of a CQE instance as a triple ℰ = ⟨𝒯, 𝒜, 𝒫⟩, where 𝒯 is a TBox, 𝒜 is a quantified ABox such that 𝒯 ∪ 𝒜 has at least one model, and 𝒫 is an epistemic policy such that 𝒯 |=EQL 𝒫. The notion of (optimal) CQ-censor is then as follows.

Definition 1 (CQ-censor). A CQ-censor of a CQE instance ℰ = ⟨𝒯, 𝒜, 𝒫⟩ is a set 𝒞 of BCQs such that (𝑖) 𝒯 ∪ 𝒞 |=EQL 𝒫 and (𝑖𝑖) 𝒯 ∪ 𝒜 |= 𝑞 for every 𝑞 ∈ 𝒞. Moreover, 𝒞 is an optimal CQ-censor of ℰ if there does not exist any CQ-censor 𝒞′ of ℰ such that 𝒞′ ⊃ 𝒞.

As in [6], we define CQE as the problem of checking whether an FO sentence is entailed by all optimal censors. We call this problem SC-entailment (Skeptical entailment under CQ-censors). This form of CQE does not suffer from the problem of having to arbitrarily select an optimal censor among several incomparable ones [3, 11]. We also consider a sound approximation of SC-entailment, called IC-entailment, which consists in checking whether an FO sentence is entailed by the intersection of all optimal censors. We use the notation ℰ |=SC 𝑞 (resp., ℰ |=IC 𝑞) to indicate that an FO sentence 𝑞 is SC-entailed (resp., IC-entailed) by a CQE instance ℰ.

Example 1. Suppose that company C𝐴 wants to share certain user-profiling data with a company C𝐵 for targeted advertising. This is not allowed in general, but only in some countries with a special regulation that enables sharing based on the users' consent. C𝐴 may use the following ED in the policy to enable C𝐵 to access only data compliant with the above requirements:

𝛿4 = ∀𝑥, 𝑦 (𝐾 profileActivity(𝑥, 𝑦) → 𝐾 ∃𝑧(citOf(𝑥, 𝑧) ∧ SR(𝑧) ∧ Consent(𝑥))).

In the rule, profileActivity links a user with her profiling data, citOf relates a user to her country of citizenship, SR denotes countries with special regulation, and Consent denotes users who have given their consent. Suppose that C𝐴 also wants C𝐵 not to be able to associate a user with her real identity, and that this is possible by collecting the person's name and her date of birth at the same time. To this aim, C𝐴 also specifies the ED:

𝛿5 = ∀𝑥, 𝑦, 𝑧 (𝐾(name(𝑥, 𝑦) ∧ dateB(𝑥, 𝑧)) → 𝐾⊥).

Now, let ℰ = ⟨𝒯, 𝒜, 𝒫⟩ be a CQE instance, where 𝒯 = ∅, 𝒫 = {𝛿4, 𝛿5}, and 𝒜 is as follows:

𝒜 = { profileActivity(u1, a1), Consent(u1), citOf(u1, 𝑛1), SR(𝑛1), name(u1, ann), dateB(u1, date1), profileActivity(u2, a2), citOf(u2, 𝑛1) },

where 𝑛1 is a labeled null and all the other terms used in 𝒜 are constants. Let us also consider the following four BCQs:

𝑞1 = ∃𝑦 (profileActivity(u1, a1) ∧ citOf(u1, 𝑦) ∧ SR(𝑦)),
𝑞2 = profileActivity(u2, a2),
𝑞3 = ∃𝑦 profileActivity(𝑦, a2),
𝑞4 = profileActivity(u1, a1) ∧ name(u1, ann).

For 𝑋 ∈ {SC, IC}, one can verify that ℰ |=𝑋 𝑞1, since user u1 gave her consent and is a citizen of some country (𝑛1) with special regulation. Conversely, ℰ ⊭𝑋 𝑞2, since u2 did not give consent. Nevertheless, one can verify that 𝑞3 belongs to each optimal CQ-censor of ℰ, and so ℰ |=𝑋 𝑞3. Finally, we have that ℰ ⊭𝑋 𝑞4, since there is an optimal CQ-censor 𝒞 of ℰ such that dateB(u1, date1) ∈ 𝒞, implying that name(u1, ann) ∉ 𝒞 (otherwise 𝛿5 would be violated).
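To make Definition 1 and Example 1 more concrete, the sketch below brute-forces the optimal CQ-censors over a hand-picked, finite pool of BCQs, approximating entailment from a censor by plain set membership; the pool, the identifiers, and the two policy checks are illustrative simplifications of 𝛿4 and 𝛿5, not the rewriting-based procedures developed in the paper.

# Toy brute-force illustration of Definition 1 on Example 1 (empty TBox).
# "T u C |= q" is approximated by membership in a hand-picked pool of BCQs:
# profileActivity(u1,a1), the head of delta4 grounded on u1, q3, name(u1,ann),
# dateB(u1,date1). profileActivity(u2,a2) is left out of the pool: no censor
# containing it can satisfy delta4, since Consent(u2) is not entailed by A.
from itertools import combinations

POOL = ["profile_u1", "head_u1", "q3", "name_u1", "dateB_u1"]

def policy_ok(censor):
    if "profile_u1" in censor and "head_u1" not in censor:   # delta4 grounded on u1
        return False
    if {"name_u1", "dateB_u1"} <= censor:                    # delta5
        return False
    return True

censors = [frozenset(c) for r in range(len(POOL) + 1)
           for c in combinations(POOL, r) if policy_ok(frozenset(c))]
optimal = [c for c in censors if not any(c < d for d in censors)]

# profile_u1, head_u1 (hence q1) and q3 are in every optimal censor, while
# name_u1 is not, so q4 is not SC-entailed.
for q in ("profile_u1", "head_u1", "q3", "name_u1"):
    print(q, all(q in c for c in optimal))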
We focus on TBoxes specified in DL-Liteℛ [7], for which SC-entailment of BCQs is known to be tractable in data complexity when policies are expressed as denials [6]. Unfortunately, when the policy consists of epistemic dependencies, this problem turns out to be intractable in general, and the same holds for IC-entailment.

Theorem 1. In the case of CQE instances with DL-Liteℛ TBoxes, both SC-entailment and IC-entailment of BUCQs are coNP-complete in data complexity.

We remark that, in the above result, coNP-hardness holds already when the TBox is empty and the query is a ground atom. To regain tractability, we impose an acyclicity condition on the policy 𝒫 w.r.t. a DL-Liteℛ TBox 𝒯, defined as follows.

Definition 2 (Acyclicity). We call dependency graph of (𝒯, 𝒫) the directed graph whose nodes are the predicates occurring in 𝒯 ∪ 𝒫, and whose edges (𝑝1, 𝑝2) correspond to the presence of 𝑝1 in the left-hand side and 𝑝2 in the right-hand side of either an ED in 𝒫 (we call such edges P-edges) or a positive inclusion axiom in 𝒯. We say that 𝒫 is acyclic for 𝒯 if there exists no cycle involving a P-edge in the dependency graph of (𝒯, 𝒫).
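The acyclicity condition of Definition 2 amounts to a simple reachability test on the dependency graph: a cycle through a P-edge (𝑝1, 𝑝2) exists precisely when 𝑝1 is reachable from 𝑝2. A minimal sketch follows; the edge-list input format and all names are illustrative assumptions, not taken from the paper.

# Minimal sketch of the acyclicity check of Definition 2. Edges are given as
# (source predicate, target predicate, kind) triples, where kind is "P" for an
# edge coming from an ED in the policy and "T" for an edge coming from a
# positive inclusion axiom in the TBox.
from collections import defaultdict

def acyclic_for_tbox(edges):
    graph = defaultdict(set)
    for src, dst, _kind in edges:
        graph[src].add(dst)

    def reachable(start, goal):
        seen, stack = set(), [start]
        while stack:
            node = stack.pop()
            if node == goal:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(graph[node])
        return False

    # A cycle through a P-edge (p1 -> p2) exists iff p1 is reachable from p2.
    return not any(kind == "P" and reachable(dst, src) for src, dst, kind in edges)

# delta3 yields the P-edges Patient -> Consent and admitted -> Consent; adding a
# TBox inclusion from Consent to Patient would close a cycle through a P-edge.
edges = [("Patient", "Consent", "P"), ("admitted", "Consent", "P")]
print(acyclic_for_tbox(edges))                                    # True
print(acyclic_for_tbox(edges + [("Consent", "Patient", "T")]))    # False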
For this class of epistemic policies, we are able to provide first-order rewriting algorithms for deciding whether a BUCQ is SC- or IC-entailed by a CQE instance, thus proving the following result.

Theorem 2. In the case of CQE instances with DL-Liteℛ TBoxes 𝒯 and epistemic policies 𝒫 acyclic for 𝒯, both SC-entailment and IC-entailment of BUCQs are in AC0 in data complexity.

Besides the computational complexity study, we also carry out an analysis of the robustness of the above-defined entailment semantics w.r.t. confidentiality preservation. In [12, 3, 13], it is shown that censoring mechanisms based on an indistinguishability criterion are indeed more secure than others. In this regard, we adopt an approach similar to the one described in [12] for relational databases. Intuitively, under such an approach an entailment semantics preserves confidentiality if, for every CQE instance ⟨𝒯, 𝒜, 𝒫⟩ and every finite set 𝒬 of queries, the answers to such queries are the same as if they were obtained from a (possibly different) CQE instance ⟨𝒯, 𝒜′, 𝒫⟩ such that 𝒯 ∪ 𝒜′ |=EQL 𝒫. We now describe this property formally. First, for a TBox 𝒯, a policy 𝒫, two ABoxes 𝒜 and 𝒜′, a set 𝒬 of BUCQs, and 𝑋 ∈ {SC, IC}, we say that 𝒜 and 𝒜′ are 𝒬-indistinguishable for 𝑋-entailment w.r.t. (𝒯, 𝒫) if, for every 𝑞 ∈ 𝒬, ⟨𝒯, 𝒜, 𝒫⟩ |=𝑋 𝑞 iff ⟨𝒯, 𝒜′, 𝒫⟩ |=𝑋 𝑞.

Definition 3 (Confidentiality preservation). For 𝑋 ∈ {SC, IC}, we say that 𝑋-entailment preserves confidentiality for BCQs (resp., BUCQs) if, for every CQE instance ⟨𝒯, 𝒜, 𝒫⟩ and for every finite set 𝒬 of BCQs (resp., BUCQs), there exists an ABox 𝒜′ such that the following holds: (𝑖) 𝒯 ∪ 𝒜′ |=EQL 𝒫; (𝑖𝑖) 𝒜 and 𝒜′ are 𝒬-indistinguishable for 𝑋-entailment w.r.t. (𝒯, 𝒫).

In the case of DL-Liteℛ ontologies, we can show that SC-entailment preserves confidentiality for BCQs, but this result does not carry over to BUCQs. We prove, however, that IC-entailment enjoys the property even for BUCQs.

Acknowledgments
This work was partially supported by: projects FAIR (PE0000013) and SERICS (PE00000014) under the MUR National Recovery and Resilience Plan funded by the EU - NextGenerationEU; the GLACIATION project funded by the EU (N. 101070141); the ANTHEM (AdvaNced Technologies for Human-centrEd Medicine) project (CUP B53C22006700001) funded by the National Plan for NRRP Complementary Investments; the MUR PRIN 2022LA8XBH project Polar (POLicy specificAtion and enfoRcement for privacy-enhanced data management); and the EU under the H2020-EU.2.1.1 project TAILOR (grant id. 952215).

References
[1] J. Biskup, For unknown secrecies refusal is better than lying, Data and Knowledge Engineering 33 (2000) 1–23.
[2] J. Biskup, P. A. Bonatti, Controlled query evaluation for enforcing confidentiality in complete information systems, Int. J. Inf. Sec. 3 (2004) 14–27.
[3] P. A. Bonatti, L. Sauro, A confidentiality model for ontologies, in: Proc. of the 12th Int. Semantic Web Conf. (ISWC), 2013, pp. 17–32.
[4] B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, D. Zheleznyakov, Controlled query evaluation over OWL 2 RL ontologies, in: Proc. of the 12th Int. Semantic Web Conf. (ISWC), 2013, pp. 49–65.
[5] G. L. Sicherman, W. de Jonge, R. P. van de Riet, Answering queries without revealing secrets, ACM Trans. on Database Systems 8 (1983) 41–59.
[6] D. Lembo, R. Rosati, D. F. Savo, Revisiting controlled query evaluation in description logics, in: Proc. of the 28th Int. Joint Conf. on Artificial Intelligence (IJCAI), 2019, pp. 1786–1792.
[7] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, Tractable reasoning and efficient query answering in description logics: The DL-Lite family, J. of Automated Reasoning 39 (2007) 385–429.
[8] M. Console, M. Lenzerini, Epistemic integrity constraints for ontology-based data management, in: Proc. of the 34th AAAI Conf. on Artificial Intelligence (AAAI), AAAI Press, 2020, pp. 2790–2797. doi:10.1609/aaai.v34i03.5667.
[9] D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, R. Rosati, EQL-Lite: Effective first-order query processing in description logics, in: Proc. of the 20th Int. Joint Conf. on Artificial Intelligence (IJCAI), 2007, pp. 274–279.
[10] F. Baader, F. Kriegel, A. Nuradiansyah, R. Peñaloza, Computing compliant anonymisations of quantified ABoxes w.r.t. ℰℒ policies, in: Proc. of the 19th Int. Semantic Web Conf. (ISWC), volume 12506 of Lecture Notes in Computer Science, Springer, 2020, pp. 3–20.
[11] B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, D. Zheleznyakov, Controlled query evaluation for datalog and OWL 2 profile ontologies, in: Proc. of the 24th Int. Joint Conf. on Artificial Intelligence (IJCAI), 2015, pp. 2883–2889.
[12] J. Biskup, T. Weibert, Keeping secrets in incomplete databases, Int. J. Inf. Sec. 7 (2008) 199–217.
[13] P. A. Bonatti, A false sense of security, Artificial Intelligence 310 (2022).