Controlled Query Evaluation over Lightweight Ontologies? B. Cuenca Grau, E. Kharlamov, E. V. Kostylev, and D. Zheleznyakov Department of Computer Science, University of Oxford Abstract. We study confidentiality enforcement in ontologies under the Controlled Query Evaluation (CQE) framework. In a CQE system, a pol- icy specifies the sensitive information, and a censor ensures that answers to user’s queries that could violate the policy are not returned. Our goal is the design of optimal CQE algorithms, which ensure confidentiality while maximising access to information. We study two natural classes of censors that can be realised using existing infrastructure for query answering and propose optimal CQE algorithms for the standardised profiles of the ontology language OWL 2. 1 Introduction As ontology-based information systems are becoming increasingly mature, there is a pressing need to devise mechanisms for ensuring that data is only made accessible to authorised users [1, 7–12, 20, 21]. Controlled Query Evaluation (CQE) is a prominent formal framework for confidentiality enforcement. In CQE, sensitive information is declaratively spec- ified by means of a policy and confidentiality is enforced by a censor : when given a user query, a censor checks whether returning the answer might lead to a pol- icy violation, in which case it returns a distorted answer. CQE was introduced in [18], and studied in [3, 4, 6] for propositional databases with complete infor- mation. It was extended to (propositional) incomplete databases in [5]. Beyond propositional logic, CQE remains largely unexplored [2]. In this paper we study CQE in the context of ontologies. Our basic framework is described in Section 3. We assume data to be hidden and that users interact with the system by means of a query interface. An ontology, which we assume to be known to all users, provides the vocabulary and background knowledge needed for users to formulate accurate queries, as well as to enrich query answers with implicit information. Policies are given as a set of ground atoms that follow from the ontology and data. When given a (conjunctive) query, the system returns the subset of certain answers determined by the censor; in this way, the role of the censor is to preserve confidentiality by filtering out those answers that could lead to a violation of the policy. Formally, we model the information that users could gather by querying the system as an (infinite) first-order theory; ? This paper recapitulates and extends our previous work [11]. It comes with a tech- nical report available at http://tinyurl.com/DL14paper55. confidentiality preservation then amounts to ensuring that such theory together with the ontology does not entail any atom in the policy. In this setting, there is a danger that confidentiality enforcement may over-restrict users’ access. Thus, we focus on optimal censors, which maximise answers to queries while ensuring confidentiality of the policy. Furthermore, we are interested in censors that can be implemented by reusing off-the-shelf query answering infrastructure. To fulfil this requirement, we introduce in Section 4 two classes of censors, which we call view and obstruction censors, respectively. View censors return only those answers that follow from the ontology and a materialised dataset (a view ). In this way, a view encodes the information that users are authorised to access: the censor answers faithfully all queries against the view, and any information not captured by the view is inaccessible by default. View censors require the ability to materialise implicit data, and hence are especially well-suited for RDF-based applications in which reasoning is performed by a triple store. Obstruction censors are dual to view censors in the sense that they explicitly specify information which users are denied access to (with all other information being accessible by default). Obstruction censors are specified by a finite set of “forbidden query patterns” (obstructions), and all query answers that instantiate such patterns are filtered out. In contrast to view censors, obstruction censors do not require modification of the data and hence are well-suited for OBDA applications, where data is typically managed by an RDBMS. We finally characterise the duality of views and obstructions and argue that it is not always possible to “simulate” one with another. In Section 5 we focus on view censors. First, we investigate their intrinsic lim- itations, and then show how these limitations can be circumvented. We propose algorithms for computing optimal view censors for knowledge bases with OWL 2 RL, EL and QL ontologies under relatively minor restrictions. Our algorithms, however, rely on views that can be of exponential size in the worst case. So, we identify natural conditions on ontologies that guarantee polynomial size of optimal views. In particular, all OWL 2 QL ontologies satisfy these conditions. In Section 6 we turn our attention to obstruction censors and provide suf- ficient and necessary conditions for an optimal such censor to exist. Then, we propose algorithms for computing optimal obstruction censors for knowledge bases with OWL 2 QL as well as restricted OWL 2 RL ontologies, which are based on obstructions of polynomial size. 2 Preliminaries We adopt standard notions in first-order logic over finite function-free signatures. We treat equality ≈ as an ordinary predicate, but assume that any set of formulae Σ contains all the axioms of equality for Σ. A fact is a ground, equality-free atom, and a dataset is a finite set of facts. A structure I is a pair (∆I , ·I ) of a domain and interpretation function for the symbols in the signature. We define homomorphisms between structures I and J in the standard way and I ,→ J denotes the fact that such a homomorphism from I to J exists. (1 ) A(x) ∧ R(x, y1 ) ∧ B(y1 ) ∧ R(x, y2 ) ∧ B(y2 ) → y1 ≈ y2 , (2 ) R(x, y) → S(x, y), (3 ) A(x) → ∃y.[R(x, y) ∧ B(y)], (4 ) A(x) → x ≈ a, (5 ) R(x, y) ∧ S(y, z) → T (x, z), (6 ) A(x) → R(x, x), (7 ) A(x) ∧ B(x) → C(x), (8 ) R(x, x) → A(x), (9 ) A(x) ∧ R(x, y) → B(y), (10 ) R(x, y) → S(y, x), (11 ) R(x, a) → B(x), (12 ) R(x, y) → A(y), (13 ) A(x) → R(x, a), (14 ) A(x) → B(x), (15 ) R(x, y) ∧ B(y) → A(x). Table 1. Horn-SROIF rules; unary predicates can be >. A rule is a sentence of the form ∀x.∀z.[ϕ(x, z) → ∃y.ψ(x, y)], where x, z, and y are pairwise disjoint vectors of variables, the body ϕ(x, z) is an equality- free conjunction of atoms with variables x ∪ z, and the head ψ(x, y) is a con- junction of atoms with variables x ∪ y. For simplicity universal quantifiers in rules are usually omitted. A rule is (i) Datalog if its head consists of a single atom and y is empty; (ii) guarded if it has a body atom (a guard ) mentioning all universally quantified variables; (iii) linear if it has a single body atom; and (iv) multi-linear if the body contains only guards. An ontology is a finite set of rules. We assume that both rule heads and bodies are non-empty and they do not contain the nullary atoms > and ⊥. Thus, O ∪ D is satisfiable for each on- tology O and dataset D, and O 6|= α for each fact α, which ensures a separation between schema and data. To capture all OWL 2 profiles, we focus on Horn-SROIF. Table 1 provides the normalised axioms of this DL in the form of rules. To capture the semantics of >, usually allowed in DLs, we treat it as a unary predicate and assume that each ontology O contains the rule P (x1 , . . . , xn ) → >(xi ) for each predicate P and 1 ≤ i ≤ n. A Horn-SROIF ontology is in (i) RL if it has no rules of Type (3); (ii) QL if it contains only rules of Types (2), (3), (10), (12), and (14); (iii) EL if it does not contain rules of Types (1), (9), and (10). A conjunctive query (CQ) is a formula Q(x) of the form ∃y.ϕ(x,Wy), with ϕ(x, y) a conjunction of atoms. A union of CQs (UCQ) is a formula i Qi (x), with each Qi (x) a CQ. A CQ is Boolean (BCQ) if x is empty. A tuple of constants t is a (certain) answer to Q(x) over ontology O and dataset D if O ∪ D |= Q(t). Then, cert(Q, O, D) is the set of answers to Q(x) over O and D. Given a BCQ Q, A[Q] denotes the structure interpreting each relation R with (f (u1 ), . . . f (un )) for every atom R(u1 , . . . , un ) in Q, where f maps each constant in Q to itself and each variable y to a fresh element dy in the structure. 3 Basic Framework Given an ontology O and dataset D, we assume that D is hidden while O is known to users, who can formulate arbitrary CQs via a query interface. A policy, which is unknown to users, is given as a set of facts entailed by O∪D. It is assumed that system administrators are in charge of specifying policies, and that each policy is assigned to a specific (group of) users by means of standard mechanisms such as role-based access control techniques [17]. Definition 1. A policy P for O and D is a dataset such that O ∪ D |= P. A CQE-instance I is a triple (O, D, P), with P a policy for O and D. Example 1. Consider the following ontology and dataset that describe an excerpt of a social network including information about movies: Oex = {FOf (x, y) →FOf (y, x), Susp(x)∧Cr (x) →Thr (x), Likes(x, y) ∧ Thr (y) → ThrFan(x)}, Dex = {FOf (John, Bob), FOf (Bob, Mary), Likes(John, Seven), Likes(Bob, Seven), Susp(Seven), Cr (Seven)}. Here, Oex states that friendship is symmetric; movies that are both suspense and crime are thrillers; and everyone who likes a thriller is a thriller fan. Assume that John wants to hide his friend list. Then, Pex = {αex } with αex = FOf (John, Bob), and Iex = (Oex , Dex , Pex ). A key component of a CQE system is the censor, whose goal is to decide accord- ing to the policy which query answers can be safely returned to users. Definition 2. A censor for a CQE-instance I = (O, D, P) is a function cens that maps each CQ Q to a subset of cert(Q, O, D). The characteristic theory Thcens of cens is the (possibly infinite) set of sentences {Q(t) | t ∈ cens(Q) and Q(x) is a CQ}. Censor cens is confidentiality preserving if O ∪ Thcens 6|= α for each α ∈ P. It is optimal if (i) it is confidentiality preserving and (ii) no confidentiality preserving censor cens0 6= cens exists such that cens(Q) ⊆ cens0 (Q) for every Q. Intuitively, Thcens represents the information that a user can potentially gather by asking an unbounded number of queries to the system. If the censor is confi- dentiality preserving, then no information can be obtained about P, regardless of the CQs asked. Finally, optimal censors maximise information accessibility without compromisig the policy. 4 View and Obstruction Censors The idea behind view censors is to associate to a CQE instance I a new dataset, called a view. Intuitively, a view encodes the information that a user is allowed to see. The user gets only those query answers that follow from O and this view. In this way, the main workload of the censor boils down to the computation of certain answers, which can be fully delegated to the query answering engine.1 1 We assume that all the definitions in this section are parameterised by a (fixed) instance I = (O, D, P). Definition 3. The view censor vcensV I for I based on a dataset V (a view), is the function mapping each CQ Q(x) to the set cert(Q, O, D) ∩ cert(Q, O, V). Obviously, if we want the censor to enjoy the properties we are after, the view V must be constructed with care. For the censor to be confidentiality preserving, O ∪ V must not entail any atom from the policy P, and to be optimal V must encode as much information from the hidden dataset as possible. Example 2. Consider a view Vex obtained from the dataset Dex by replacing Bob with a fresh constant an b . Intuitively, Vex is the result of “anonymising” the constant Bob, while keeping the structure of the data intact. Since Vex contains no information about Bob, we have Oex ∪ Vex 6|= αex , that is the censor based on Vex is confidentiality preserving. View Vex , however, is not optimal: Oex ∪ Vex does not entail the fact Likes(Bob, Seven), which is harmless for confidentiality. 0 0 Indeed, Oex ∪Vex 6|= αex holds for the extension Vex of Vex with Likes(Bob, Seven). View censors require the ability to materialise implicit data, and hence are es- pecially well-suited for RDF-based applications, in which reasoning is performed by a triple store. In OBDA scenarios, however, data is typically managed by an RDBMS and materialisation is not possible. To fulfill the requirement of OBDA applications, we need a different kind of censors. The idea behind obstruction censors is to associate to I an obstruction in the form of a Boolean UCQ U , such that given a query Q(x) and an answer t over O and D, the censor returns t only if no CQ in U follows from Q(t). Thus, the obstruction can be seen as a collection of “forbidden query patterns”, which should not be disclosed. Definition 4. The obstruction censor ocensU I for I based on a Boolean UCQ U (called an obstruction) is the function mapping each CQ Q(x) to the set {t | t ∈ cert(Q, O, D) and A[Q(t)] 6|= U }. Similarly to view censors, obstruction censors do not require dedicated algo- rithms: checking A[Q(t)] |= U can be delegated to an RDBMS. Also, obstruc- tions can be maintained virtually without the need of data materialisation. Example 3. The censor based on Vex from Example 2 can also be realised with following obstruction Uex : ∃x.FOf (x, Bob) ∨ ∃x.FOf (Bob, x) ∨ ∃x.Likes(Bob, x) ∨ ThrFan(Bob). Intuitively, Uex “blocks” query answers involving Bob; and all other answers are the same as over Oex ∪ Dex . As seen in Examples 2 and 3, the same censor may be based on both a view and an obstruction. View and obstruction censors, however, behave rather dif- ferently: a view explicitly encodes the information accessible to users, whereas obstructions specify information which users are denied access to. Thus, obstruc- tions are dual to views. Unsurprisingly, even in simple cases it is not obvious whether (and how) a view can be realised by an obstruction, or vice-versa. We next focus on Datalog ontologies and characterise when a given view V and obstruction U yield the same censor. Each Datalog ontology O and dataset D have a unique least Herbrand model HO,D , that is a finite structure that sat- isfies t ∈ cert(Q, O, D) iff A[Q(t)] ,→ HO,D and hence captures the information relevant to query answering. We can then formalise the duality between views and obstructions in a natural way: U and V implement the same censor iff U cap- tures the structures not homomorphically embeddable into HO,V . To formalise this statement, we recall the notion of (non-uniform) constraint satisfaction [13]. Definition 5. Let J be a finite structure and C a class of finite structures. The CSP of J relative to C (denoted CSP[C] (J )) is the set {I ∈ C | I ,→ J }. A central problem is to determine whether a class of finite structures can be captured by a single formula. Definition 6. Let C be a class of finite structures and let C 0 ⊆ C. First-order sentence ψ defines C 0 if I ∈ C 0 is equivalent to I |= ψ for every structure I ∈ C. The correspondence between view and obstruction censors is then as follows. Theorem 1. Let I = (O, D, P) be a CQE-instance with O Datalog ontology, and C the class of finite structures I with I ,→ HO,D . Then, vcensV U I = ocensI iff U defines ¬CSP[C] (HO,V ), for any view V and obstruction U . Using Theorem 1 together with definability results in Finite Model Theory, we can show that views and obstructions cannot simulate one another in general. Theorem 2. There is a CQE-instance for which there exists a view censor, but no obstruction censor. There is a CQE-instance for which there exists an obstruction censor, but no view censor. 5 Optimal View Censors Our discussion in Section 4 shows that view and obstruction censors should be studied independently. In this section, we focus on view censors. Before investigating the design of view-based CQE algorithms, we first es- tablish the theoretical limitations of our approach. We show that an optimal view-based censor for an instance I is not guaranteed to exist since the optimal- ity requirement may lead to infinite “views”, even for EL and RL ontologies. Theorem 3. There are CQE-instances I1 and I2 such that - the ontology of I1 uses rules of Types (1) and (9), and - the ontology of I2 uses rules of Types (5), (8) and (15), for which no optimal view censors exist. The construction of I1 shows that equality rules lead to non-existence of optimal views; in turn, the construction of I2 shows that equality is not needed to preclude optimality in the presence of recursion, transitivity axioms, and Self restrictions. MovFan anj MovFan ThrFan anb2 ThrFan anb1 MovFan John Bob Fig. 1. Essential part of an exhaustive view (solid arrows represent FOf relation). 5.1 View Censors for Guarded RL We next describe how to compute an optimal view for guarded RL ontologies. The idea is to create anonymised copies of constants in the data to encode the information required for optimality. Such a view may use exponentially many anonymised copies of constants. Definition 7. Let I = (O, D, P) be a CQE-instance with O a guarded RL on- tology. A view V consisting of unary atoms Vc1 on constants from I, unary atoms V∃1 on new constants (anonymised copies), and binary atoms V 2 , is exhaustive on I if it satisfies all of the following. 1. The part Vc1 is a maximal set of unary atoms from HO,D such that O∪Vc1 6|= α for each α ∈ P. 2. For each constant a from I, and each set A of unary predicates, such that - A(a) ∈ HO,D for every A ∈ A, - if A, B ∈ A, and O |= A(x) ∧ B(x) → C(x), then C ∈ A, - if A ∈ A, then O 6|= A(x) → x ≈ a for each a, - O ∪ Vc1 ∪ {A(a0 )|A ∈ A} 6|= α for each α ∈ P and a fresh constant a0 , the view V uses a fresh constant aA , and the part V∃1 contains all unary atoms A(aA ) such that A ∈ A. 3. For each constant a from I, let σa be the set of constants consisting of a itself and all the constants aA . The binary part V 2 of the view V contains the atom R(a∗1 , a∗2 ) for constants a∗1 , a∗2 iff - R(a1 , a2 ) ∈ HO,D , where a∗1 ∈ σa1 and a∗2 ∈ σa2 , - O ∪ {R(a∗1 , a∗2 )} 6|= α for any α ∈ P, and - O ∪ Vc1 ∪ V∃1 ∪ {R(a∗1 , a∗2 )} |= A(a∗ ) implies that A(a∗ ) ∈ Vc1 ∪ V∃1 . This definition is constructive and it is routine to devise an algorithm, which for any instance (non-deterministically) constructs an exhaustive view. Example 4. Consider the following CQE-instance (O, D, P): O = {ThrFan(x) → MovieFan(x), ThrFan(y) ∧ FOf (x, y) → MovieFan(x)}, D = {FOf (John, Bob), ThrFan(John), ThrFan(Bob)}, P = {MovieFan(Bob), MovieFan(John)}. The essential part of the exhaustive view on this CQE-instance is given in Fig- ure 1, where Vc1 = ∅, V∃1 contains unary atoms over the anonymised copies an1b , an2b of Bob, and anj of John, and V 2 contains the depicted binary atoms. Two anonymised copies of Bob are necessary in any optimal view for I to answer correctly “harmless” queries like ∃x, y, z.ThrFan(z) ∧ MovieFan(z) ∧ FOf (y, z) ∧ ThrFan(y) ∧ MovieFan(y) ∧ FOf (y, x) ∧ MovieFan(x) ∧ FOf (John, x). The following theorem formulates the desired properties of exhaustive views. Theorem 4. Let I = (O, D, P) be a CQE-instance with O a guarded RL ontol- ogy, and V an exhaustive view on I. Then V is optimal. Furthermore, if O is linear, then vcensV I is the only optimal censor for I. The proof relies on the following facts. First, the construction ensures that HO,V = V for any exhaustive view V, that is, no rules are applicable to V. Also, properties of Vc1 , V∃1 , and V 2 guarantee that V does not entail any policy atom. Optimality follows from the fact that for any a in I, each combination of its unary atoms that satisfies the relevant axioms in O is “witnessed” by a new constant from σa , and all possible binary atoms which are compatible with those combinations are added to the view. Then, no essentially new atom can be “added” to the view V without disclosing a policy. The uniqueness of the optimal censor for linear ontologies follows from the lack of choices in the construction of V. An exhaustive view may use exponentially many constants. However, for multi-linear ontologies, optimal views are of polynomial size. Proposition 1. Let I = (O, D, P) be a CQE-instance with O a multi-linear RL ontology. There is an optimal censor for I based on a view of polynomial size. 5.2 View Censors for EL and QL In contrast to OWL 2 RL, the QL and EL profiles can capture existentially quantified knowledge. To bridge this gap, we show that, under some mild con- ditions, we can transform an ontology O into a Datalog ontology O0 such that an optimal view for (O, D, P) can be directly obtained from such a view for (O0 , D, P). Thus, devising an optimal view censor for an instance is reduced to devising one for an instance with a Datalog ontology. Definition 8. Let σ be a set of constants. A Datalog ontology O0 is a ( Datalog) σ-rewriting of an ontology O if for each fact β and dataset D over constants from σ we have that O ∪ D |= β iff O0 ∪ D |= β. Proposition 2. Let I = (O, D, P) be a CQE-instance with D using set of con- stants σ, O0 a σ-rewriting of O such that O0 |= O, and V 0 an optimal view for (O0 , D, P). Then HO0 ,V 0 is an optimal view for I. Now, we just need to transform a QL (or guarded EL) ontology into a stronger guarded RL ontology, which, however, entails the same facts for any dataset. We exploit techniques developed for the combined approach to query answering [14– 16, 19]. The idea is to transform rules of Type (3) into Datalog by Skolemising existentially quantified variables into globally fresh constants. Such transfor- mation strengthens the ontology; however, if applied to a QL or guarded EL ontology, it preserves entailment of facts for any dataset over σ [19]. Definition 9. Let O be an ontology and σ a set of constants. The ontology Ξσ (O) is obtained from O by replacing each rule of the form A(x) → ∃y.[R(x, y)∧ B(y)] with A(x) → P (x, a), P (x, y) → R(x, y), P (x, y) → B(y), where P is a fresh predicate and a is a globally fresh constant not from σ, unique to A and R.2 Proposition 3. If O is a Horn-SROIF ontology, then Ξσ (O) |= O. If also O is either a QL or guarded EL ontology, then Ξσ (O) is a σ-rewriting of O. Propositions 2 and 3 ensure that HΞσ (O),V is optimal for I = (O, D, P) with O a QL or guarded EL ontology, whenever V is such a view for (Ξσ (O), D, P). The transformation of O to Ξσ (O) preserves linearity and guardedness, so Ξσ (O) is a guarded RL ontology, and the results of Section 5.1 are applicable. Theorem 5. A CQE-instance with a QL or guarded EL ontology has an optimal view censor. For QL, it is unique and can be based on a polynomial size view. 6 Obstruction Censors We start our study of obstruction censors by focusing on Datalog ontologies and characterising optimality in terms of resolution proofs of the policy. To this end, we first recapitulate the standard notions on (clause) SLD resolution. Definition 10. A goal is a conjunction ofVatoms. The SLD resolution step takes k a goal β1 ∧ . . . ∧ βm and a Datalog rule i=0 γi → δ and produces a new goal Vk ( i=0 γi θ) ∧ β2 θ ∧ . . . ∧ βm θ, where θ is a most general unifier (MGU) of β1 and δ. A proof of a goal G0 in a Datalog ontology O and dataset D is a sequence r1 ,θ1 r2 ,θ2 rn ,θn G0 −− −→ G1 −−−→ . . . −−−−→ Gn , where Gn = > and the goal Gi is obtained from the goal Gi−1 and sentence ri ∈ O ∪D by an SLD resolution step with MGU θi . SLD resolution is sound and complete: for each satisfiable O ∪ D and goal G, a proof of G exists in O ∪ D iff O ∪ D |= ∃∗ G, with ∃∗ G the existential closure of G. We next provide a characterisation of optimality based on proofs. Consider a policy atom α ∈ P and some proof π of α in O ∪D. If a censor answers positively sufficiently many BCQs ∃∗ G for goals G in π, then a user could “reconstruct” (a part of) π and compromise the policy. Also, there can be many proofs of α, and a user can compromise the policy by reconstructing any of them. Thus, to ensure that a censor is confidentiality preserving, we must guarantee that the obstruction contains enough CQs to prevent reconstruction of any π. If we want the censor to be optimal, the obstruction should not “block” too many queries. As we will see later on, these requirements may be in conflict and lead to an infinite “obstruction”. To formalise this intuition we need an auxiliary notion. A core of a set of Boolean CQs Q is a minimal subset C of Q such that for each Q ∈ Q there exists Q0 ∈ C with Q |= Q0 . 2 To correctly deal with Self restrictions (rules (6) and (8)) a slightly more complex transformation is required. These changes are straightforward but require introduc- ing further notation, so we present here only the basic transformation for simplicity. Definition 11. Given a CQE-instance I = (O, D, P) with O a Datalog ontology, let Q(I) be the set of all Boolean CQs ∃∗ G with G 6= > a goal in a proof of a fact α ∈ P in O ∪ D, and let S be a maximal subset of Q(I) such that O ∪ S 6|= α for any α ∈ P. Then, a pseudo-obstruction Υ of I is a core of Q(I) \ S. We now relate pseudo-obstructions and optimality. Theorem 6. Let I = (O, D, P) be a CQE-instance with O a Datalog ontology. W 1. If Υ is a finite pseudo-obstruction for I, then U = Q∈Υ Q is an optimal obstruction for I. 2. If each pseudo-obstruction for I is infinite, then no optimal obstruction censor for I exists. This theorem has consequences on the expressive power of obstructions. Us- ing the results from Section 5.1 we can see that optimal view and obstruction censors are incomparable. This complements Theorem 2, which talks about not necessarily optimal censors. Theorem 7. There is a CQE-instance with ontology in both RL and EL (respec- tively, RL) for which an optimal view (respectively, obstruction) censor exists, but no optimal obstruction (respectively, view) censor exists. Next, we show how to apply resolution-based techniques to compute opti- mal obstructions for instances with linear RL ontologies. These results are then adapted to the case of QL. The algorithm for linear RL is based on the compu- tation of the set Q(I). To do this computation efficient, we need the following auxiliary structure. Definition 12. Let O be a linear RL ontology, D a dataset, x and y fresh vari- ables, and A the set of all equality-free atoms over the signature of O∪D extended with x and y. The proof graph of O ∪ D is the directed graph with the set of nodes A ∪ {>}, and edges (β, γ) such that γ can be derived from β by means of a single SLD resolution step with a rule from O ∪ D. The following example illustrates proof graphs. 1 Example 5. Consider a CQE-instance I1ex with ontology Oex = {Likes(x, y) → 1 Movie(y), Likes(x, y) → MovieFan(x)}, dataset Dex = {Likes(John, Seven)}, 1 and the policy of single atom αex = MovieFan(John). A fragment of the proof graph is given in Figure 2. Using proof graphs we can compute optimal censors. Theorem 8. Let I = (O, D, P) be a CQE-instance with O a linear RL ontology. For each α ∈ P, let Sα be the set of nodes in the proof graph of O ∪ D in a path from α to >. Finally, let U be the Boolean UCQ _ _ ∃∗ G. α∈P G∈Sα \{>} Then, ocensU I is the unique optimal censor for I, and U can be computed in polynomial time in the size of I. 1 Movie(Seven) MovieFan(x) ↵ex = MovieFan(John) Movie(y) Likes(John, y) Likes(x, Seven) Likes(John, Seven) > Likes(x, y) 1 1 Fig. 2. Fragment of proof graph for Oex ∪ Dex Example 6. For I1ex from Example 5, there is only one path in the proof graph 1 from αex to > and Sα1ex\ {>} = {MovieFan(John), likes(John, y)}. Thus, U = MovieFan(John) ∨ ∃y.Likes(John, y) is optimal. Finally, note that the transformation of a QL ontology O to an RL ontology Ξσ (O) given in Definition 9, preserves linearity of rules. Hence, Proposition 3 and Theorem 8 yield the following result. Theorem 9. For every CQE-instance with a QL ontology there exists a unique optimal obstruction censor. 7 Discussion and Conclusions We have studied CQE in the context of ontologies. Our results yield a flexible way for system designers to ensure selective access to data and provide insights on the fundamental tradeoff between accessibility and confidentiality of information. We have proposed algorithms applicable to the profiles of OWL 2, which can be implemented using off-the-shelf query answering infrastructure. Thus, our algorithms provide a starting point to the development of CQE systems. The problems studied here remain rather unexplored and we see many open questions. From a theoretic point of view, we plan to consider policies beyond sets of facts (e.g., given as CQs). We also plan to study weaker notions of optimality that can ensure polynomiality of views and obstructions for more expressive languages. From a practical perspective, we will implement our algorithms and test their scalability using state-of-the art Datalog engines such as RDFox.3 The approach closest to ours is the view-based access authorisation frame- work in [9]. In this setting, policies are represented as authorisation views: CQs that define the only information accessible to the user; since queries are answered faithfully against the views, there is no explicit notion of policy violation. In con- trast, in our setting policies express inaccessible information, and our goal is to maximally answer queries without violating the policy. Acknowledgements. Work supported by the Royal Society, the EPSRC projects Score!, Exoda, and MaSI3 , and the FP7 project OPTIQUE. 3 http://www.cs.ox.ac.uk/isg/tools/RDFox/ References 1. Bao, J., Slutzki, G., Honavar, V.: Privacy-Preserving Reasoning on the Semantic Web. In: WI. pp. 791–797. IEEE Computer Society (2007) 2. Biskup, J., Bonatti, P.: Controlled Query Evaluation with Open Queries for a Decidable Relational Submodel. Ann. Math. and Artif. Intell. 50(1-2), 39–77 (2007) 3. Biskup, J., Bonatti, P.A.: Lying Versus Refusal for Known Potential Secrets. Data Knowl. Eng. 38(2), 199–222 (2001) 4. Biskup, J., Bonatti, P.A.: Controlled Query Evaluation for Enforcing Confidential- ity in Complete Information Systems. Int. J. Inf. Sec. 3(1), 14–27 (2004) 5. Biskup, J., Weibert, T.: Keeping Secrets in Incomplete Databases. Int. J. Inf. Sec. 7(3), 199–217 (2008) 6. Bonatti, P.A., Kraus, S., Subrahmanian, V.S.: Foundations of Secure Deductive Databases. IEEE Trans. Knowl. Data Eng. 7(3), 406–422 (1995) 7. Bonatti, P.A., Sauro, L.: A Confidentiality Model for Ontologies. In: ISWC. pp. 17–32 (2013) 8. Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: View-based Query An- swering over Description Logic Ontologies. In: KR. AAAI Press (2008) 9. Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.: View-based Query An- swering in Description Logics: Semantics and Complexity. J. Comput. Syst. Sci. 78(1), 26–46 (2012) 10. Cuenca Grau, B.: Privacy in ontology-based information systems: A pending mat- ter. Semantic Web 1(1-2), 137–141 (2010) 11. Cuenca Grau, B., Kharlamov, E., Kostylev, E.V., Zheleznyakov, D.: Controlled Query Evaluation over OWL 2 RL Ontologies. In: ISWC. pp. 49–65 (2013) 12. Cuenca Grau, B., Motik, B.: Reasoning over Ontologies with Hidden Content: The Import-by-Query Approach. J. Artif. Intell. Res. 45, 197–255 (2012) 13. Kolaitis, P.G., Vardi, M.Y.: A Logical Approach to Constraint Satisfaction. In: Complexity of Constraints. pp. 125–155 (2008) 14. Kontchakov, R., Lutz, C., Toman, D., Wolter, F., Zakharyaschev, M.: The Com- bined Approach to Ontology-Based Data Access. In: IJCAI. pp. 2656–2661 (2011) 15. Lutz, C., Seylan, I., Toman, D., Wolter, F.: The Combined Approach to OBDA: Taming Role Hierarchies Using Filters. In: ISWC. pp. 314–330 (2013) 16. Lutz, C., Toman, D., Wolter, F.: Conjunctive Query Answering in the Description Logic EL Using a Relational Database System. In: IJCAI. pp. 2070–2075 (2009) 17. Sandhu, R.S., Coyne, E.J., Feinstein, H.L., Youman, C.E.: Role-Based Access Con- trol Models. IEEE Computer 29(2), 38–47 (1996) 18. Sicherman, G.L., de Jonge, W., van de Riet, R.P.: Answering Queries Without Revealing Secrets. ACM Trans. Database Syst. 8(1), 41–59 (1983) 19. Stefanoni, G., Motik, B., Horrocks, I.: Introducing Nominals to the Combined Query Answering Approaches for EL. In: AAAI (2013) 20. Stouppa, P., Studer, T.: A Formal Model of Data Privacy. In: PSI (2007) 21. Tao, J., Slutzki, G., Honavar, V.: Secrecy-Preserving Query Answering for Instance Checking in EL. In: RR. pp. 195–203 (2010)