=Paper= {{Paper |id=Vol-2400/paper-49 |storemode=property |title=A novel approach to Controlled Query Evaluation in DL-Lite (DISCUSSION PAPER) |pdfUrl=https://ceur-ws.org/Vol-2400/paper-49.pdf |volume=Vol-2400 |authors=Domenico Lembo,Riccardo Rosati,Domenico Fabio Savo |dblpUrl=https://dblp.org/rec/conf/sebd/LemboRS19 }} ==A novel approach to Controlled Query Evaluation in DL-Lite (DISCUSSION PAPER)== https://ceur-ws.org/Vol-2400/paper-49.pdf
                     A Novel Approach to Controlled
                      Query Evaluation in DL-Lite
                             (DISCUSSION PAPER)

              Domenico Lembo1 , Riccardo Rosati1 , Domenico Fabio Savo2
         1                                             2
           Sapienza Università di Roma                  Università degli Studi di Bergamo
       lastname@dis.uniroma1.it                       domenicofabio.savo@unibg.it



         Abstract. In Controlled Query Evaluation (CQE) confidential data are protected
         through a declarative policy and a (optimal) censor, which guarantees that answers
         to queries are maximized without disclosing secrets. In this paper we consider CQE
         over Description Logic ontologies and study query answering over all optimal
         censors. We establish data complexity of the problem for ontologies specified
         in DL-LiteR and for variants of the censor language, which is the language used
         by the censor to enforce the policy. In our investigation we also analyze the
         relationship between CQE and the problem of Consistent Query Answering.




1     Introduction

In Controlled Query Evaluation (CQE), a policy, i.e., a set of logical assertions, regulates
the access to a database or knowledge base by specifying the information that must be
kept secret, and a censor alters answers to queries so that confidential data cannot be
inferred by the users on the basis of the queries they ask. The notion of censor traces back
to [16], and since then it has been investigated for propositional closed databases [6],
incomplete databases [7], and, more recently, Description Logic (DL) ontologies [10,11].
In this latter context, optimal censors are defined as those censors that modify query
answers in a “minimal” way. Intuitively, such censors hide data to preserve confidentiality
according to the policy, without restricting unnecessarily the ability of the system to
return answers to users’ queries.
    Previous work on CQE in DLs has mainly focused on the tasks of establishing
the existence of an optimal censor and characterizing the complexity of computing it.
However, considering only one such censor means making an arbitrary selection among
several optimal censors. To avoid such a discretionary choice, in this paper we adopt a
different approach and study query answering over all optimal censors. Intuitively, given
a query q, this amounts to compute the answers to q that are returned by all optimal
censors. This form of reasoning can also be considered as the application of a single
    Copyright c 2019 for the individual papers by the papers’ authors. Copying permitted for
    private and academic purposes. This volume is published and copyrighted by its editors. SEBD
    2019, June 16-19, 2019, Castiglione della Pescaia, Italy.
censor corresponding to an “intersection” of all the optimal censors, which is thus a
semantically well-founded (i.e., sound) approximation of any optimal censor. This idea
has been also previously discussed in [10].
     Our approach has similarities with the work on Consistent Query Answering (CQA),
a framework for inconsistency management based on the notion of repair [3,5]. Roughly
speaking, in DL, a repair of a possibly inconsistent ontology O is any ABox for O (i.e.,
the extensional component of Q) that is consistent with the TBox (i.e., the intensional
component of O), and that differs “minimally” from the original ABox. Then, comput-
ing query answers in CQA amounts to reasoning over all repairs and the TBox. The
connection between CQE and CQA in DL is based on the intuition that the assertions in
the policy in CQE seems to act as the class of assertions in T that may be violated by
the data of the ABox.
     Some connections between CQA and a declarative approach for privacy preservation
had already been discussed in [4]. The framework in that paper is similar to ours, with
so-called secrecy views playing essentially the role of the policy. However, the setting
considered there is relational and without intensional knowledge (TBox), and secrecy
views are enforced through suitable virtual modifications of database values with SQL
NULLs, so that this approach is incomparable with ours. Nonetheless, in this paper we
elaborate on the intuition of [4] and investigate in depth the relationship between our
CQE framework and CQA in DLs. We provide some general conditions ensuring that
the two problems are mutually reducible, and we show cases of practical interest for
which such conditions are satisfied and cases for which they are not. This also allows us
to highlight the differences between the two frameworks.
     The ultimate goal of this paper is to investigate data complexity of answering
conjunctive queries (CQs) in CQE. In our analysis we consider ontologies specified in
DL-LiteR [9]. We also consider some variants of the censor language LC . Intuitively, LC
is the language in which the censor expresses the sentences implied by the ontology that
can be disclosed to the users without violating the policy. We provide data complexity
results for the cases when: (i) LC is the ABox of the ontology; (ii) LC coincides with
the set of ground atoms expressed over the signature of the ontology; and (iii) LC is the
language of CQs expressed over the signature of the ontology. Some of the complexity
results follow from the correspondence between CQA and CQE; we provide novel
techniques for the cases in which the CQE problem does not have a CQA counterpart.
The complexity results proved in this paper are shown in Figure 1.
     This paper is an extended abstract of [15], where also complexity results for ontolo-
gies specified in EL⊥ [1] are provided.


2   Preliminaries

We consider a signature Σ of predicates and constants, and a countably infinite alphabet
of variables V. To simplify the presentation, we consider only languages containing FO
sentences, i.e., formulas without free variables (our results applies to open formulas
as well, modulo standard encoding of open formulas into closed ones). We use FO to
indicate the language of all function-free FO sentences over Σ and V. Every language
considered in this paper is a subset of FO.
     Given a set K ⊆ FO, Mod (K) indicates the set of models of K, i.e., the FO
interpretations I such that φI (i.e., the interpretation of φ in I) evaluates to true, for
each sentence φ ∈ K. If I is a model of K, we say that I satisfies K and write I |= K.
K is consistent if it has at least one model, i.e., if Mod (K) 6= ∅, inconsistent otherwise.
K entails a FO sentence φ, denoted K |= φ, if φI is true in every I ∈ Mod (K).
     A Boolean conjunctive query (BCQ) q is a FO sentence of the form ∃x.conj(x),
where conj(x) = α1 (x) ∧ . . . ∧ αn (x), x is a sequence of variables, and each αi (x) is
an atom (possibly with constants) with predicate αi and variables in x. The length of a
BCQ q is the number of its atoms, denoted by length(q).
     In the following, CQ denotes the language of BCQs over Σ and V, CQk the
language of BCQs from CQ whose maximum length is k, and GA the language of
ground atoms. Obviously, for every integer k, GA ⊂ CQk ⊂ CQ. Verifying whether
K |= α for K ⊆ FO and α ∈ GA is also called instance checking.
     A DL ontology O is specified as T ∪ A, where T is the TBox and A is the ABox [2].
Throughout the paper an ABox is always a set of ground atoms. We are interested in
DL-LiteR DL ontologies [9]. A DL-LiteR TBox is a finite set of assertions of the form:
B1 v B2 , B1 v ¬B2 , R1 v R2 , R1 v ¬R2 , where: each Ri , with i ∈ {1, 2}, is an
atomic role Q ∈ Σ, or its inverse Q− ; each Bi , with i ∈ {1, 2}, is an atomic concept
A ∈ Σ, or a concept of the form ∃Q (resp. ∃Q− ), i.e., unqualified existential restriction,
which denotes the set of objects occurring as first (resp. second) argument of Q.
     We also consider denial assertions (or simply denials) over concepts and roles, i.e.,
sentences of the form: ∀x.conj(x) → ⊥ where conj(x) is such that ∃x.conj(x) is a
BCQ whose atoms use only unary and binary predicates. The length of the denial is the
length of such query. A denial is satisfied by an ontology O if O 6|= ∃x.conj(x).
     In the following, given an ontology O and a language L ⊆ FO, we denote with
L(O) the subset of formulas of L over the predicates and constants occurring in O.
     All the complexity results we provide refer to data complexity, that in our framework
is the complexity computed only with respect to the size of the ontology ABox.


3   CQE Framework

Our CQE framework is adapted from [11]. An L CQE instance E is a quadruple
hT , A, P, LC i, where T is a TBox in the DL L, A is an ABox such that T ∪ A is
consistent, P is the policy, i.e., a set of denial assertions over the signature of T ∪ A,
such that T ∪ P is consistent, and LC ⊆ FO(T ∪ A) is the censor language. Intuitively:
T is the schema a user interacts with to pose her queries; A is the dataset underlying the
schema; P specifies the knowledge that cannot be disclosed for confidentiality reasons,
in the sense that the user will never get, through query answers, sufficient knowledge
to violate the denials in P; and LC is the language with respect to which the censor is
specified, that is, the censor establishes which are the sentences in LC implied by T ∪ A
that can be divulged to the user while preserving the policy (cf. Definition 1). To simplify
the notation, we will sometimes omit to specify that a certain censor language is limited
to the signature of T ∪ A (e.g., we will use GA instead of GA(T ∪ A)).
    We then define a censor in terms of its underlying theory.
Definition 1. The theory Thcens of a censor cens for a CQE instance hT , A, P, LC i is a
subset of LC such that: (i) T ∪ A |= φ, for each φ ∈ Thcens , and (ii) T ∪ P ∪ Thcens is
consistent.

    A censor c is optimal if there is no censor c0 such that Thc ⊂ Thc0 ⊆ LC . The set of
theories of all the optimal censors of a CQE instance E is denoted by Thoc -Set(E).

Example 1. A CQE instance E = (T , A, P, CQ) is used by an international humanitar-
ian organization for regulating the access to the information about its volunteer staff. In
E, T is an empty TBox, A = {workedIn(v1 , cA ), workedIn(v1 , cB ), workedIn(v2 , cA )},
and P = {∀x.workedIn(x, cA ) ∧ workedIn(x, cB ) → ⊥}. In words, the policy specifies
as confidential the fact that a volunteer worked in both country A and country B, given
that these countries are currently at war with each other.

      Below we provide the definition of entailment in CQE1 .

Definition 2. (CQE-entailment) Given a CQE instance E and a FO sentence φ, decide
whether Th |= φ for every Th ∈ Thoc -Set(E). If this is the case, we write E |=CQE φ.

  As usual, when the language of φ is restricted to ground atoms (i.e., ABox assertions),
CQE-entailment is called (CQE-)instance checking.

Example 2. For the CQE instance E of Example 1, we have, for instance, that E |=CQE
∃x.workedIn(v1 , x) and E |=CQE ∃x.workedIn(x, cB ), but E 6|=CQE workedIn(v1 , cB ).


4      Relationship between CQE and CQA
In this section we discuss the relationship between the CQE framework we have just
defined and CQA. To this aim, we first provide a general definition for CQA.
    A L CQA instance J is a triple hT , A, LR i where T is a consistent TBox in the
DL L, A is an ABox, and LR ⊆ FO(T ∪ A) is the repair language. The consistent
entailment set in a language L of a (possibly inconsistent) theory T ∪ A, denoted
by CES (T , A, L), is the set {φ | φ ∈ L and there exists a A0 ⊆ A such that T ∪
A0 is consistent and T ∪ A0 |= φ}. A repair for a CQA instance is defined as follows.

Definition 3. A repair R for a CQA instance J = hT , A, LR i is a subset of LR such
that: (i) R ⊆ CES (T , A, LR ) and; (ii) T ∪ R is consistent and; (iii) there does not
exist any R0 such that R ⊂ R0 ⊆ CES (T , A, LR ) and T ∪ R0 is consistent. We denote
by RepSet(J ) the set of repairs of J .

    Definition 3 captures some notions of repair proposed in the literature, such as the
repair at the basis of the prototypical AR-semantics, or the repair adopted by the CAR-
semantics [13,14]. Indeed, given an ontology O = T ∪ A, repairs in the AR-semantics
aim to preserve as many facts as possible of those in A. This means that, in a CQA
instance adopting the AR-semantics, the language LR has to be set to A. Differently, the
CAR-semantics aims to preserve as many facts as possible of those implied by T and
 1
     Due to space limits, we give a simplified definition and refer to [15] for other formal details.
each subset of A consistent with T . Thus, to encode such semantics LR has to coincide
with the set GA(O) of ground atoms over the predicates and constants in O.
    We now provide some conditions on CQE and CQA instances that allow to establish
correspondences between (theories of) censors and repairs.
 Definition 4. A CQE instance hT , A, P, LC i is CQA-reducible if:
 (i) for every φ ∈ LC such that T ∪ A |= φ and {φ} ∪ T ∪ P is consistent, there exists
     A0 ⊆ A such that T ∪ A0 ∪ P is consistent and T ∪ A0 |= φ;
(ii) for every φ ∈ LC and every A0 ⊆ A such that T ∪ A0 ∪ P is consistent, if
     T ∪ A0 ∪ P |= φ then T ∪ A0 |= φ.
    In words, condition (i) imposes that every logical consequence of T ∪ A that is
consistent with the policy and the TBox belongs to CES (T ∪ P, A, LC ). Condition
(ii) instead says that in a CQA-reducible instance the sentences in the policy act as
constraints on top of T ∪ A, since they never contribute to infer new formulas from LC
if added to T ∪ A.
Example 3. The instance E = hT , A, P, LC i with T = {A v B}, A = {A(d)},
P = {∀x.A(x) → ⊥}, and LC = GA is not CQA-reducible, since it does not respect
condition (i), even though it satisfies condition (ii) (in a trivial way). Instead, E =
hT , A0 , P 0 , LC i with A0 = {A(d), B(d)}, P = {∀x.A(x) ∧ B(x) → ⊥}, and T and
LC as before is CQA-reducible.
Theorem 1. Let E = hT , A, P, LC i be a CQE instance, such that E is a CQA-reducible.
Then Thoc -Set(hT , A, P, LC i) = RepSet(hT ∪ P, A, LC i).
    Below we consider reducibility of CQA instances into CQE ones.
 Definition 5. A CQA instance hT , A, LR i is CQE-reducible if there exists a partition
 TP ∪ TN of T such that TP ∪ A is consistent, TN is equivalent to a set of denials, and:
 (i) for every φ ∈ LR , such that TP ∪ A |= φ and {φ} ∪ T is consistent, there exists
     A0 ⊆ A such that T ∪ A0 is consistent and T ∪ A0 |= φ;
(ii) for every φ ∈ LR and every A0 ⊆ A such that T ∪ A0 is consistent, if T ∪ A0 |= φ
     then TP ∪ A0 |= φ.
     Intuitively, the above definition says that in a CQE-reducible instance we can identify
a portion TN of T such that its assertions act as constraints on the ontology TP ∪ A
(cond. (ii)), thus TN behaves as a policy in a CQE instance. At the same time, each
logical consequence of TP ∪ A consistent with T must belong to CES (T , A, LR ) (cond.
(i)). CQE-reducible instances have the following property.
Theorem 2. Let J = hT , A, LR i be a CQA instance, such that J is CQE-reducible
with T = TP ∪ TN . Then RepSet(hTP ∪ TN , A, LR i) = Thoc -Set(hTP , A, TN , LR i).
   We now rephrase entailment in CQA [14]: given a CQA instance J = hT , A, LR i
and a FO sentence φ, decide whether T ∪ R |= φ for every R ∈ RepSet(J ). This is
denoted by J |=CQA φ. The following results follow from Theorem 1 and Theorem 2.
Corollary 1. Let E = hT , A, P, LC i be a CQA-reducible CQE instance and φ a FO
sentence. Then, E |=CQE φ iff J |=CQA φ, where J = hT ∪ P, A, LC i. Furthermore,
Let J = hT , A, LR i be a CQE-reducible CQA instance with T = TP ∪ TN , and φ a
FO sentence. Then, J |=CQA φ iff E |=CQE φ, where E = hTP , A, TN , LR i .
5   CQE under Restricted Censor Languages
In this section we establish data complexity of instance checking and CQ entailment
for DL-LiteR CQE instances, when the censor language LC coincides with the ABox A,
and with the set of ground atoms GA. For the former case, we establish our complexity
results by exploiting a mutual reduction between entailment in CQE and CQA. For the
latter case, the two frameworks behave in a slightly different way, and thus we also need
to use techniques tailored to the CQE setting.
     We start with LC = A, and show that CQE instances are CQA-reducible, but also
that CQA-instances are CQE-reducible when the repair language coincides with A.

Theorem 3. Each DL-LiteR CQE instance hT , A, P, Ai is CQA-reducible, and each
DL-LiteR CQA instance hT , A, Ai is CQE-reducible.

   The following result follows from Theorem 3 and the fact that CQ entailment in
DL-LiteR,den CQA instances under AR-semantics is coNP-complete, already for instance
checking [14].
Theorem 4. Both instance checking and CQ entailment are coNP-complete in data
complexity for DL-LiteR CQE instances hT , A, P, Ai.
   We now consider LC = GA. In this case, DL-LiteR CQE instances are not always
CQA-reducible, as shown in Example 3. Reducibility in the other way round is also not
always possible. However, we can show some weaker, but useful, properties.

Proposition 1. Each DL-LiteR CQE instance hT , A, P, GAi, such that T ∪ P ∪ {α}
is satisfiable for each α ∈ A, is CQA-reducible. Also, each DL-LiteR CQA instance
hT , A, GAi, such that T ∪ {α} is satisfiable for each α ∈ A, is CQE-reducible.

   For DL-LiteR CQE instances satisfying the conditions mentioned in Proposition 1
we can establish computational complexity of query answering by mutual reduction
between CQE and CQA under CAR-semantics, similarly to what we have done to prove
Theorem 4. In fact, we are able to exactly characterize the complexity for general
DL-LiteR CQE instances by using tailored proofs, which are inspired to those used in
CQA to establish both upper and lower complexity bounds.

Theorem 5. Instance checking and CQ entailment are respectively in AC0 and coNP-
complete in data complexity, for DL-LiteR CQE instances hT , A, P, GAi.


6   CQE under Full Censor Language
In this section we study CQE-entailment when the censor language LC is CQ. We start
with the following property that is central for our analysis.

Theorem 6. Let T be a DL-LiteR TBox, let A be an ABox such that T ∪ A is con-
sistent, let P be a policy, let q be a BCQ, and let k = max(h, length(q)), where h
is the maximum length of a denial assertion in P. Then, hT , A, P, CQi |=CQE q iff
hT , A, P, CQk i |=CQE q.
                                   LC = A       LC = GA LC = CQ
              Instance Checking coNP-complete    in AC0     in PTIME
                CQ Entailment   coNP-complete coNP-complete in PTIME
      Fig. 1: Data complexity of entailment over DL-LiteR CQE instances hT , A, P, LC i



    In the rest of the section, without loss of generality, we assume that, in every CQE
instance of the form hT , A, P, CQk i, all formulas of the language CQk (as well as the
query q of the CQE-entailment problem) use the set of 2k variables {x1 , . . . , x2k }.
    We now define the CQE-Ent-DL-LiteR algorithm for deciding CQE-entailment of
BCQs for DL-LiteR KBs.

Algorithm CQE-Ent-DL-LiteR (T , A, P, q)
Input: DL-LiteR TBox T , ABox A s. t. T ∪ A is consistent, policy P, BCQ q
Output: true if hT , A, P, CQi |=CQE q, false otherwise
   let h be the maximum length of a denial in P;
   let k = max(h, length(q));
   Φ = CQEntailedSubset(T , A, k);
   for i = 1 to k do
      remove from Φ every subset Φ0 such that |Φ0 | = i
      and T ∪ P ∪ Φ0 is inconsistent;
   if q ∈ Φ then return true else return false

    In the algorithm, CQEntailedSubset(T , A, k) is the function returning the set of
BCQs from CQk that are entailed by T ∪ A.
    Informally, the algorithm first computes an integer k, based on the length of q and
of the denials in P; then, it computes the set Φ that represents the intersection of the
theories of the optimal censors for the CQE instance hT , A, P, CQk i: this is done by
eliminating from Φ all formulas that belong to minimal subsets of Φ that are inconsistent
with T ∪ P; finally, it checks the presence of q among the queries in the above set Φ.

Theorem 7. Let E = hT , A, P, CQi be a DL-LiteR CQE instance, and q a BCQ. Then,
E |=CQE q iff CQE-Ent-DL-LiteR (T , A, P, q) returns true.

    Algorithm CQE-Ent-DL-LiteR (T , A, P, q) runs in PTIME in the size of the ABox,
since CQEntailedSubset(T , A, k) can be computed in polynomial time in the size of A
and checking the consistency of T ∪ P ∪ Φ0 can be reduced to checking the consistency
of a DL-LiteR,den ontology, which is polynomial in data complexity [14].

Theorem 8. Entailment of CQs is in PTIME in data complexity for DL-LiteR CQE
instances hT , A, P, CQi.


7   Conclusions

Our results (cf. Figure 1) show a surprising aspect: the complexity of CQ entailment for
restricted censor languages is harder than when LC = CQ. Indeed, in this latter case,
CQ entailment can be established through the computation of the intersection of (a finite
and polynomial representation of) all the theories of optimal censors, which can be done
in polynomial time. This does not hold for LC equal to A or GA.
    Our research work can be extended in many directions. First, the PTIME upper
bound for CQE over DL-LiteR TBoxes and CQ censor language should be refined. We
believe that an AC0 bound can be shown in this case. Then, the complexity analysis of
CQE could be extended to other DLs, as well as to other policy and censor languages.
Also, based on the complexity analysis presented in this paper, it would be important
to look for practical techniques allowing for the implementation of CQE extensions of
current DL reasoners and Ontology-based Data Access systems [8,12].

References
 1. F. Baader, S. Brandt, and C. Lutz. Pushing the EL envelope. In Proc. of IJCAI, pages 364–369,
    2005.
 2. F. Baader, D. Calvanese, D. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The De-
    scription Logic Handbook: Theory, Implementation and Applications. Cambridge University
    Press, 2nd edition, 2007.
 3. L. E. Bertossi. Database Repairing and Consistent Query Answering. Synthesis Lectures on
    Data Management. Morgan & Claypool Publishers, 2011.
 4. L. E. Bertossi and L. Li. Achieving data privacy through secrecy views and null-based virtual
    updates. IEEE Trans. Knowl. Data Eng., 25(5):987–1000, 2013.
 5. M. Bienvenu and C. Bourgaux. Inconsistency-tolerant querying of description logic knowledge
    bases. In RW Tutorial Lectures, pages 156–202, 2016.
 6. J. Biskup and P. A. Bonatti. Controlled query evaluation for enforcing confidentiality in
    complete information systems. Int. J. Inf. Sec., 3(1):14–27, 2004.
 7. J. Biskup and T. Weibert. Keeping secrets in incomplete databases. Int. J. Inf. Sec., 7(3):199–
    217, 2008.
 8. D. Calvanese, B. Cogrel, S. Komla-Ebri, R. Kontchakov, D. Lanti, M. Rezk, M. Rodriguez-
    Muro, and G. Xiao. Ontop: Answering SPARQL queries over relational databases. Semantic
    Web J., 8(3):471–487, 2017.
 9. D. Calvanese, G. De Giacomo, D. Lembo, M. Lenzerini, and R. Rosati. Tractable reasoning
    and efficient query answering in description logics: The DL-Lite family. J. of Automated
    Reasoning, 39(3):385–429, 2007.
10. B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, and D. Zheleznyakov. Controlled query
    evaluation over OWL 2 RL ontologies. In Proc. of ISWC, pages 49–65, 2013.
11. B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, and D. Zheleznyakov. Controlled query
    evaluation for Datalog and OWL 2 profile ontologies. In Proc. of IJCAI, pages 2883–2889,
    2015.
12. G. De Giacomo, D. Lembo, M. Lenzerini, A. Poggi, R. Rosati, M. Ruzzi, and D. F. Savo.
    MASTRO: A reasoner for effective Ontology-Based Data Access. In Proc. of ORE, 2012.
13. D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, and D. F. Savo. Inconsistency-tolerant semantics
    for description logics. In Proc. of RR, pages 103–117, 2010.
14. D. Lembo, M. Lenzerini, R. Rosati, M. Ruzzi, and D. F. Savo. Inconsistency-tolerant query
    answering in ontology-based data access. J. of Web Semantics, 33:3–29, 2015.
15. D. Lembo, R. Rosati, and D. F. Savo. Revisiting controlled query evaluation in Description
    Logics. In Proc. of IJCAI, 2019. To appear.
16. G. L. Sicherman, W. de Jonge, and R. P. van de Riet. Answering queries without revealing
    secrets. ACM Trans. Database Syst., 8(1):41–59, 1983.