A Mechanism for Ontology Confidentiality

P. A. Bonatti, I. M. Petrova and L. Sauro
Dept. of Electrical Engineering and Information Technologies, Università di Napoli "Federico II"

Abstract. We illustrate several novel attacks on the confidentiality of knowledge bases (KBs). Then we introduce a new confidentiality model, sensitive enough to detect those attacks, and a method for constructing secure KB views. We identify safe approximations of the background knowledge exploited in the attacks; they can be used to reduce the complexity of constructing secure KB views. Finally, we describe a prototype implementation of the new approach that suggests its applicability in practice.

1 Introduction

Ontology languages and Linked Open Data are increasingly being used to encode the private knowledge of companies and public organizations. Semantic Web techniques make it possible to merge different sources of knowledge and extract implicit information, putting the security and privacy of individuals at risk. Even the authors of public ontologies may want to hide some axioms in order to capitalize on their formalization efforts. Several approaches have been proposed to tackle the confidentiality requirements that arise from these scenarios. The most popular security criterion is that the published view of the knowledge base should not entail a secret sentence. However, there exist attacks that cannot be prevented this way: the user may exploit various sources of background knowledge and metaknowledge to reconstruct the hidden part of the knowledge base.

This paper contributes to the area of knowledge base confidentiality in several ways: (i) it highlights some vulnerabilities of the approaches that can be found in the literature (Sec. 3); (ii) it introduces a stronger confidentiality model that takes both object-level and meta-level background knowledge into account (Sec. 4), and it defines a method for computing secure knowledge views (Sec. 5) that generalizes some previous approaches; (iii) it proposes a safe approximation of background metaknowledge (Sec. 6 and 7); (iv) it investigates the computational complexity of constructing secure knowledge base views with our methodology (Sec. 7); (v) it describes a prototypical implementation of the new framework (Sec. 9). The paper closes with a discussion of related work (Sec. 10) and conclusions. Proofs are omitted due to space limitations.

2 Preliminaries on Description Logics

We assume the reader to be familiar with description logics, and refer to [1] for all definitions and results. We assume a fixed, denumerable signature Σ specifying the names of concepts, roles, and individuals. Our framework is compatible with any description logic DL that enjoys compactness (needed by Theorem 6) and has decidable reasoning problems (e.g., ALC, EL, SHIQ, etc.). We simply assume that our reference logical language L is generated from Σ by the grammar of the selected logic DL. By axioms, we mean members of L, unless stated otherwise. A knowledge base is any subset of L (real knowledge bases are finite, but this restriction is not technically needed until Sec. 7).

Recall that axioms are expressions of the form C ⊑ D, R ⊑ S, C(a), and R(a, b), where C, D are concept expressions, R, S are role expressions, and a, b are individual constants. In some DLs, an individual constant a may also occur in a nominal, that is, a concept expression {a} denoting the singleton containing a. The axioms involving ⊑ are called inclusions (or subsumptions), while C(a) and R(a, b) are called assertions. In the simplest case, C and R are first-order predicates and assertions are actually standard first-order atomic formulae. Inclusions are syntactic variants of logical implications.
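As a reminder (textbook material, cf. [1], not part of this paper's contributions), the standard first-order reading of these axiom types is:

```latex
% Standard first-order translation of DL axioms:
C \sqsubseteq D \;\equiv\; \forall x\,\bigl(C(x) \rightarrow D(x)\bigr)
R \sqsubseteq S \;\equiv\; \forall x\,\forall y\,\bigl(R(x,y) \rightarrow S(x,y)\bigr)
% Assertions C(a) and R(a,b) are atomic formulae over individual constants.
```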
The notion of logical consequence is the classical one; for all K ⊆ L, the logical consequences of K will be denoted by Cn(K) (K ⊆ Cn(K) ⊆ L).

3 A simple confidentiality model

The most natural way of preserving confidentiality in a knowledge base KB is checking that its answers to user queries do not entail any secret. Conceptually, the queries of a user u are answered using u's view KB_u of the knowledge base, where KB_u is a maximal subset of KB that entails no secret. In order to illustrate some possible attacks on this mechanism, let us formalize the above simple confidentiality model (SCM); this usage of the term "model" is common in Security & Privacy. The SCM consists of: the knowledge base KB (KB ⊆ L); a set of users U; a view KB_u ⊆ KB for each u ∈ U; a set of secrecies S_u ⊆ L for each u ∈ U. Secrecies are axioms that may or may not be entailed by KB; if they are, then they are called secrets and must not be disclosed to u. Revealing that a secrecy is not entailed by KB is harmless, cf. [4]. A view KB_u is secure iff Cn(KB_u) ∩ S_u = ∅. A view KB_u is maximal secure if it is secure and there exists no K such that KB_u ⊊ K ⊆ KB and Cn(K) ∩ S_u = ∅.

Attacks using object-level background knowledge. Frequently, part of the domain knowledge is not axiomatized in KB; therefore checking that Cn(KB_u) ∩ S_u = ∅ does not suffice in practice to protect confidentiality. For example, suppose that there is one secret, S_u = {OncologyPatient(John)}, and KB_u = {SSN(John, 12345), SSN(user123, 12345), OncologyPatient(user123)}. KB_u does not entail OncologyPatient(John), so according to the SCM, KB_u is secure. However, it is common knowledge that a SSN uniquely identifies a person, so the user can infer that John = user123, and hence the secret. In other examples, the additional knowledge used to infer secrets may be stored in a public ontology or RDF repository, and confidentiality violations may be automated.

Attacks to complete knowledge. Suppose the attacker knows that KB has complete knowledge about a certain set of axioms. Then the attacker may be able to reconstruct some secrets from the "I don't know" answers of a maximal secure view KB_u.

Example 1. Consider a company's knowledge base that defines a concept Employee and a role works_for that describes which employees belong to which of the n departments of the company, d_1, ..., d_n. The KB consists of assertions like:

    Employee(e)            (1)
    works_for(e, d_i)      (2)

where we assume that each employee e belongs to exactly one department d_i. A user u is authorized to see all assertions but the instances of (2) with i = n, because d_n is a special department, devoted to controlling the other ones. So S_u (the set of secrecies for u) is the set of all assertions works_for(e, d_n). Note that there is exactly one maximal secure view KB_u: it consists of all instances of (1), plus all instances of (2) such that i ≠ n. Clearly, KB_u is secure according to the SCM (because Cn(KB_u) ∩ S_u = ∅). However, observe that works_for(e, d_n) ∈ Cn(KB) iff Employee(e) ∈ Cn(KB_u) and, for all i = 1, ..., n, works_for(e, d_i) ∉ Cn(KB_u) (that is, the members of d_n are all the employees that apparently work for no department). Using this property (based on the knowledge that for each employee e, KB contains exactly one assertion works_for(e, d_i)) and the knowledge of the protection mechanism (i.e., maximal secure views), which we assume to be known to attackers by Kerckhoffs' principle, a smart user can easily identify all the members of d_n. □
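To make this attack concrete, the following sketch (ours, with invented names; the KB is purely assertional, so assertions can be modeled as tuples and no DL reasoner is needed) reconstructs the secret department memberships purely from the published maximal secure view:

```python
# Our illustration of the completeness attack of Example 1.

def maximal_secure_view(kb, secrecies):
    """SCM filtering for an assertional KB: drop exactly the secret axioms."""
    return {a for a in kb if a not in secrecies}

employees = ["alice", "bob", "carol"]
departments = ["d1", "d2", "d3"]              # d3 plays the role of d_n
kb = {("Employee", e) for e in employees} | {("works_for", "alice", "d1"),
                                             ("works_for", "bob", "d3"),
                                             ("works_for", "carol", "d2")}
secrecies = {("works_for", e, "d3") for e in employees}

view = maximal_secure_view(kb, secrecies)

# The attack: an employee listed in the view but assigned to no visible
# department must work for the hidden department d3.
visible_emps = {a[1] for a in view if a[0] == "Employee"}
leaked = {e for e in visible_emps
          if not any(("works_for", e, d) in view for d in departments)}
print(leaked)  # {'bob'}: the secret membership is reconstructed
```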
In practice, it is not hard to identify complete knowledge. A hospital's KB is expected to have complete knowledge about which patients are in which ward; a company's KB is likely to encode complete information about its employees; and so on.

Some approaches filter query answers rather than publishing a subset of KB [8, 13, 15]. We call our abstraction of this method the simple answer confidentiality model (SACM). It is obtained from the SCM by replacing the views KB_u ⊆ KB with answer views KB^a_u ⊆ Cn(KB). The difference is that KB^a_u is not required to be a subset of KB and that, conceptually, KB^a_u may be infinite. KB^a_u is secure iff Cn(KB^a_u) ∩ S_u = ∅. The reader may easily verify that the SACM is vulnerable to the two kinds of attacks illustrated for the SCM. It is also vulnerable to a third kind of attack, illustrated below.

Attacks to the signature. Suppose the user knows the signature of KB well enough to identify a symbol σ that does not occur in KB. First assume that σ is a concept name. It can be proved that:

Proposition 1. If KB^a_u is a maximal secure answer view and σ is a concept name not occurring in KB, then for all secrecies C ⊑ D ∈ S_u, KB^a_u ⊨ C ⊓ σ ⊑ D iff KB ⊨ C ⊑ D.

The problem is that although C ⊓ σ ⊑ D does not entail the secret inclusion C ⊑ D, a smart user knows that the former inclusion cannot be proved unless KB also entails the latter (so maximal secure answer views generally fail to protect secrets). This attack can be easily adapted to the case where σ is a role name. In practice, it is not even necessary to be sure that σ does not occur in KB. The attacker may make a sequence of educated guesses (say, by trying meaningless long strings, or any word that is clearly unrelated to the domain of the KB); after a sufficient number of trials, the majority of answers should agree with the "real" answer with high probability. Rejecting queries whose signature is not contained in KB's signature mitigates this kind of attack, but it leaks KB's signature and does not provide a complete solution: the attacker may still guess a σ which is logically unrelated to C and D and carry out a similar attack.
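The probing strategy can be simulated abstractly. In the sketch below (ours; the answer view is abstracted as an oracle, and per Proposition 1 it answers the probe C ⊓ σ ⊑ D truthfully whenever σ is fresh with respect to KB), the attacker recovers the secret with high probability by majority vote:

```python
# Our simulation of the signature attack of Proposition 1.
import random

KB_SIGNATURE = {"Employee", "Patient", "works_for"}
SECRET_HOLDS = True   # whether KB |= C ⊑ D; unknown to the attacker

def answer_probe(sigma: str) -> bool:
    """Oracle for 'KB_a_u |= C ⊓ sigma ⊑ D' over a maximal secure answer view."""
    if sigma not in KB_SIGNATURE:         # fresh symbol: Proposition 1 applies
        return SECRET_HOLDS
    return random.choice([True, False])   # otherwise: no guarantee

# The attacker probes with strings that are almost surely fresh and takes a
# majority vote, learning whether KB |= C ⊑ D with high probability.
guesses = [f"Xq{i}zz" for i in range(31)]
votes = sum(answer_probe(g) for g in guesses)
print("attacker concludes the secret holds:", votes > len(guesses) / 2)
```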
4 A meta-safe confidentiality model

In this section we introduce a confidentiality model that makes the vulnerabilities illustrated above visible, by taking into account object- and meta-level background knowledge. A bk-model M = ⟨KB, U, f, ⟨S_u, PKB_u, BK_u⟩_{u∈U}⟩ consists of a knowledge base KB ⊆ L, a set of users U, plus:
– a filtering function f : ℘(L) × U → ℘(L), mapping each knowledge base K and each user u to a view f(K, u) ⊆ Cn(K);
– for all u ∈ U:
  • a finite set of secrecies S_u ⊆ L;
  • a set of axioms BK_u ⊆ L, encoding the user's object-level knowledge;
  • a set of possible knowledge bases PKB_u ⊆ ℘(L), the user's metaknowledge (in practice, bk-models are finite and filterings computable, but no such assumption will be technically needed until Sec. 7).

The view of KB released to a user u is f(KB, u). We adopt PKB because at this stage we do not want to tie our framework to any specific metalanguage: PKB represents the knowledge bases that are compatible with the user's metaknowledge.

Definition 1. A filtering function f is secure (w.r.t. M) iff for all u ∈ U and all s ∈ S_u, there exists K ∈ PKB_u such that:
1. f(K, u) = f(KB, u);
2. s ∉ Cn(K ∪ BK_u).

Intuitively, if f is secure according to Def. 1, then no user u can conclude that any secret s is entailed by the KB she is interacting with (enhanced with the object-level background knowledge BK_u), for the following reasons. By point 1, KB and K have the same observable behavior, and K is a possible knowledge base for u since K ∈ PKB_u; therefore, as far as u knows, the knowledge base might be K. Moreover, by point 2, K and the object-level background knowledge BK_u do not suffice to entail the secret s.

In the rest of the paper we tacitly assume that no secret is violated a priori, that is, for all secrets s ∈ S_u there exists K ∈ PKB_u such that s ∉ Cn(K ∪ BK_u) (conversely, no filtering function can conceal a secret that is already known to the user). Moreover, in order to improve readability, we shall omit the user u from subscripts and argument lists whenever u is irrelevant to the context.

The attacks discussed in Section 3 can be easily formalized in this setting; so, in general, the maximal secure views of the SCM are not secure according to Def. 1.

Example 2. Example 1 can be formalized in our model as follows. The set of secrecies S is the set of all assertions works_for(e, d_n); BK = ∅; and PKB is the set of all knowledge bases K that consist of assertions like (1) and (2), and such that for each axiom Employee(e), K contains exactly one corresponding axiom works_for(e, d_i), and vice versa. The filtering function f maps each K ∈ PKB to the maximal subset of K that entails none of S's members, that is, f(K) = K \ S (by definition of PKB). Note that f is injective over PKB, so condition 1 of Def. 1 is satisfied only if K = KB. So, if KB contains at least one secret, then the conditions of Def. 1 cannot be satisfied; that is, maximal secure SCM views are not secure in our model. Indeed, KB can be reconstructed from the secure view by observing that KB = f(KB) ∪ {works_for(e, d_n) | Employee(e) ∈ f(KB) ∧ ∀i = 1, ..., n, works_for(e, d_i) ∉ f(KB)}. □

Similarly, the formalizations of the other attacks yield injective filtering functions (the details are left to the reader).
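For finite bk-models, Definition 1 can be checked directly by enumeration. The following sketch (ours; `cn` abstracts the consequence operator Cn, restricted to the finitely many axioms of interest, as supplied by a DL reasoner) is a naive security check, useful only to fix intuitions:

```python
# Naive check of Definition 1 for a finite bk-model (our sketch).

def is_secure(f, kb, users, secrecies, pkb, bk, cn):
    for u in users:
        view = f(kb, u)
        for s in secrecies[u]:
            # Search for a witness K in PKB_u with the same observable
            # behavior as KB (point 1) that does not yield s (point 2).
            if not any(f(K, u) == view and s not in cn(K | bk[u])
                       for K in pkb[u]):
                return False       # every indistinguishable K entails s
    return True
```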
5 A meta-secure query answering mechanism

In this section we introduce a secure filtering function. It is formulated as an iterative process based on a censor, that is, a boolean function that decides for each axiom whether it should be obfuscated in order to protect confidentiality. The iterative construction manipulates pairs ⟨X⁺, X⁻⟩ ∈ ℘(L) × ℘(L) that represent a meta-constraint on possible knowledge bases: we say that a knowledge base K satisfies ⟨X⁺, X⁻⟩ iff K entails all the sentences in X⁺ and none of those in X⁻ (formally, Cn(K) ⊇ X⁺ and Cn(K) ∩ X⁻ = ∅).

Let PAX (the set of possible axioms) be the set of axioms that may occur in the knowledge base according to the user's knowledge, i.e., PAX = ⋃_{K′∈PKB} K′. Let ν = |PAX| + 1 if PAX is finite, and ν = ω otherwise; let α₁, α₂, ..., αᵢ, ... (i < ν) be any enumeration of PAX (we will show later how to restrict the construction to finite sequences, by approximating PAX). The secure view construction for a knowledge base K in a bk-model M consists of the following, inductively defined sequence of pairs ⟨Kᵢ⁺, Kᵢ⁻⟩, i ≥ 0:

– ⟨K₀⁺, K₀⁻⟩ = ⟨∅, ∅⟩, and for all i + 1 < ν, ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ is defined as follows:
  • if censor_M(Kᵢ⁺, Kᵢ⁻, αᵢ₊₁) = true, then ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ = ⟨Kᵢ⁺, Kᵢ⁻⟩;
  • if censor_M(Kᵢ⁺, Kᵢ⁻, αᵢ₊₁) = false and K ⊨ αᵢ₊₁, then ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ = ⟨Kᵢ⁺ ∪ {αᵢ₊₁}, Kᵢ⁻⟩;
  • otherwise, ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ = ⟨Kᵢ⁺, Kᵢ⁻ ∪ {αᵢ₊₁}⟩.

Finally, let K⁺ = ⋃_{i<ν} Kᵢ⁺ and K⁻ = ⋃_{i<ν} Kᵢ⁻, and define f_M(K, u) = K⁺. Note that the inductive construction aims at finding maximal sets K⁺ and K⁻ that (i) partly describe what does / does not follow from K (as K satisfies ⟨K⁺, K⁻⟩ by construction), and (ii) do not trigger the censor (the sentences αᵢ₊₁ that trigger the censor are included neither in K⁺ nor in K⁻, cf. the induction step).

In order to define the censor we need an auxiliary definition that captures all the sentences that can be entailed with the background knowledge BK and the metaknowledge PKB enriched by a constraint ⟨X⁺, X⁻⟩ analogous to those adopted in the iterative construction. Let Cn_M(X⁺, X⁻) be the set of all axioms α ∈ L such that

    for all K′ ∈ PKB such that K′ satisfies ⟨X⁺, X⁻⟩:  α ∈ Cn(K′ ∪ BK).    (3)

Now the censor is defined as follows: for all X⁺, X⁻ ⊆ L and α ∈ L,

    censor_M(X⁺, X⁻, α) = true,  if there exists s ∈ S s.t. either s ∈ Cn_M(X⁺ ∪ {α}, X⁻)
                                 or s ∈ Cn_M(X⁺, X⁻ ∪ {α});
                          false, otherwise.                                  (4)

In other words, the censor checks whether telling either that α is derivable, or that α is not derivable, to a user aware that the knowledge base satisfies ⟨X⁺, X⁻⟩ restricts the set of possible knowledge bases enough to conclude that a secret s is entailed by the knowledge base and the background knowledge BK. Note that the censor obfuscates αᵢ₊₁ if any of its possible answers entails a secret, independently of the actual contents of K (the two possible answers "yes" and "no" correspond to the conditions s ∈ Cn_M(X⁺ ∪ {α}, X⁻) and s ∈ Cn_M(X⁺, X⁻ ∪ {α}), respectively). In this way, roughly speaking, the knowledge bases that entail s are given the same observable behavior as those that do not. Under a suitable continuity assumption on Cn_M, this enforces confidentiality:

Theorem 1. If Cn_M(KB⁺, KB⁻) ⊆ ⋃_{i<ν} Cn_M(KBᵢ⁺, KBᵢ⁻), then f_M is secure w.r.t. M.

Examples of the behavior of f_M are deferred until Sec. 7.
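A finite-case transcription of the construction (ours; `entails(K, a)` abstracts K ⊨ a as decided by a DL reasoner, `pkb` is the finite list PKB, and `bk` is the user's object-level background knowledge) may clarify the interplay of K⁺, K⁻, and Cn_M:

```python
# Our sketch of the secure view construction f_M (finite case).

def satisfies(K, X_plus, X_minus, entails):
    """K satisfies <X+, X-> iff it entails all of X+ and none of X-."""
    return (all(entails(K, a) for a in X_plus) and
            not any(entails(K, a) for a in X_minus))

def in_cn_M(s, X_plus, X_minus, pkb, bk, entails):
    """s ∈ Cn_M(X+, X-): s follows (with BK) in every compatible KB;
    vacuously true when no K in PKB satisfies <X+, X->."""
    return all(entails(K | bk, s)
               for K in pkb if satisfies(K, X_plus, X_minus, entails))

def censor(X_plus, X_minus, a, secrets, pkb, bk, entails):
    """True iff either possible answer about `a` would expose a secret."""
    return any(in_cn_M(s, X_plus | {a}, X_minus, pkb, bk, entails) or
               in_cn_M(s, X_plus, X_minus | {a}, pkb, bk, entails)
               for s in secrets)

def f_M(K, pax_enum, secrets, pkb, bk, entails):
    """Iterative construction; returns K+ = f_M(K)."""
    K_plus, K_minus = frozenset(), frozenset()
    for a in pax_enum:                      # a fixed enumeration of PAX
        if censor(K_plus, K_minus, a, secrets, pkb, bk, entails):
            continue                        # obfuscate: record nothing
        if entails(K, a):
            K_plus = K_plus | {a}
        else:
            K_minus = K_minus | {a}
    return K_plus
```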
M0 and M k M0 , then f is secure w.r.t. M. Consequently, a generic advice for estimating BK consists in including as many pieces of relevant knowledge as possible, for example: (i) modelling as completely as possible the integrity constraints satisfied by the data, as well as role domain and range restrictions and disjointness constraints; (ii) including in BK all the relevant public sources of formalized relevant knowledge (such as ontologies and triple stores). While object-level background knowledge is dealt with in the literature, the general metaknowledge encoded by PKB is novel. Therefore, the next section is focussed on some concrete approximations of PKB and their properties. 7 Approximating and reasoning about possible knowledge bases In this section, we investigate the real world situations where the knowledge base KB is finite and so are all the components of bk-models (U, S u , BK u , PKBu ); then we focus on PKBu that contain only finite knowledge bases. Consequently, fM will turn out to be decidable and we will study its complexity under di↵erent assumptions. 152 P. A. Bonatti, I. Petrova, L. Sauro. A mechanism for ontology confidentiality A language for defining PKB is a necessary prerequisite for the practical imple- mentation of our framework and a detailed complexity analysis of fM . Here we express PKB as the set of all theories that are contained in a given set of possible axioms PAX 6 and satisfy a given, finite set MR of metarules like: ↵1 , . . . , ↵ n ) 1 | ... | m (n 0, m 0) , (5) where all ↵i and j are in L (1  i  n, 1  j  m). Informally, (5) means that if KB entails ↵1 , . . . , ↵n then KB entails also some of 1 , . . . , m . Sets of similar metarules can be succintly specified using metavariables; they can be placed wherever individual constants may occur, that is, as arguments of assertions, and in nominals. A metarule with such variables abbreviates the set of its ground instantiations: Given a K ✓ L, let ground K (MR) be the ground (variable-free) instantiation of MR where metavariables are uniformly replaced by the individual constants occurring in K in all possible ways. n o Example 3. Let MR = 9R.{X} ) A(X) , where X is a metavariable, and let K = n o n o R(a, b) . Then ground K (MR) = (9R.{a} ) A(a)), (9R.{b} ) A(b)) . t u If r denotes rule (5), then let body(r) = {↵1 , . . . , ↵n } and head(r) = { 1 , . . . , m }. We say r is Horn if |head(r)|  1. A set of axioms K ✓ L satisfies a ground metarule r if either body(r) * Cn(K) or head(r) \ Cn(K) , ;. In this case we write K |=m r. Example 4. Let A, B, C be concept names and R be a role name. The axiom set K = {A v 9R.B, A v C} satisfies A v 9R ) A v B | A v C but not A v 9R ) A v B. t u Moreover, if K satisfies all the metarules in ground K (MR) then we write K |=m MR. Therefore the formal definition of PKB now becomes: PKB = {K | K ✓ PAX ^ K |=m MR} . (6) In accordance with Prop. 2, we approximate PAX in a conservative way. We will analyze two possible definitions: 1. PAX 0 = KB (i.e., as a minimalistic choice we only assume that the axioms of KB are possible axioms; of course, by Prop. 2, this choice is safe also w.r.t. any larger PAX where at least the axioms of KB are regarded as possible axioms); S 2. PAX 1 = KB [ r2groundKB (MR) head(r). Remark 1. The latter definition is most natural when metarules are automatically ex- tracted from KB with rule mining techniques, that typically construct rules using mate- rial from the given KB (then rule heads occur in KB). Example 5. 
Example 5. Consider again Example 1. The user's metaknowledge about KB's completeness can be encoded with:

    Employee(X) ⇒ works_for(X, d₁) | ... | works_for(X, dₙ),    (7)

where X is a metavariable. First let PAX = PAX₁. The secure view f_M(KB) depends on the enumeration order of PAX. If the role assertions works_for(e, d_i) precede the concept assertions Employee(e), then, in a first stage, the sets KBⱼ⁺ are progressively filled with the role assertions with d_i ≠ d_n that belong to KB, while the sets KBⱼ⁻ accumulate all the role assertions that do not belong to KB. In a second stage, the sets KBⱼ⁺ are further extended with the concept assertions Employee(e) such that e does not work for d_n. The role assertions works_for(e, d_n) of KB and the corresponding concept assertions Employee(e) are neither in KB⁺ nor in KB⁻. Note that the final effect is equivalent to removing from KB all the axioms referring to the individuals that work for d_n; analogously, in [8] the individuals belonging to a specified set are removed from all answers.

Next suppose that the role assertions works_for(e, d_i) follow the concept assertions Employee(e), and that each works_for(e, d_i) follows all works_for(e, d_k) such that k < i. Now all the assertions Employee(e) of KB enter KB⁺, and all axioms works_for(e, d_i) with i < n−1 enter either KB⁺ or KB⁻, depending on whether they are members of KB or not. Finally, the assertions works_for(e, d_i) ∈ Cn(KB) with i ∈ {n−1, n} are inserted neither in KB⁺ nor in KB⁻, because the corresponding instance of (7) with X = e has its body in KB⁺ and the first n−2 alternatives of its head in KB⁻, so a negative answer to works_for(e, d_{n−1}) would entail the secret works_for(e, d_n) by (7). This triggers the censor for all assertions works_for(e, d_{n−1}). Summarizing, with this enumeration ordering it is possible to return the complete list of employees; the members of d_n are protected by hiding also which employees belong to d_{n−1}.

Finally, let PAX = PAX₀. Note that in this case all possible knowledge bases are subsets of KB, which contains exactly one assertion works_for(e, d_{i(e)}) for each employee e. To satisfy (7), every K ∈ PKB containing Employee(e) must also contain works_for(e, d_{i(e)}). It follows that f_M must remove all references to the individuals that work for d_n, as happens with the first enumeration of PAX₁. □
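The two enumeration orders of Example 5 can be replayed end to end. The following demo (ours, self-contained and propositional: entailment reduces to membership because the KB is assertional with no TBox; n = 3 and all names are invented) exhibits exactly the behaviors described above:

```python
# Our propositional demo of Example 5 (d3 plays the secret department d_n).
from itertools import combinations

emps, deps = ["alice", "bob"], ["d1", "d2", "d3"]
KB = frozenset({("E", "alice"), ("w", "alice", "d1"),
                ("E", "bob"), ("w", "bob", "d3")})
secrets = {("w", e, "d3") for e in emps}

# PAX1 = KB plus all heads of the ground instances of metarule (7).
PAX1 = KB | {("w", e, d) for e in emps for d in deps}

def sat7(K):  # K |=_m (7): every listed employee works for some department
    return all(any(("w", a[1], d) in K for d in deps)
               for a in K if a[0] == "E")

PKB = [frozenset(s) for r in range(len(PAX1) + 1)
       for s in combinations(PAX1, r) if sat7(frozenset(s))]

def cn_M(s, Xp, Xm):  # s ∈ Cn_M(X+, X-), with BK = ∅
    return all(s in K for K in PKB if Xp <= K and not (Xm & K))

def censor(Xp, Xm, a):
    return any(cn_M(s, Xp | {a}, Xm) or cn_M(s, Xp, Xm | {a}) for s in secrets)

def secure_view(order):
    Xp, Xm = frozenset(), frozenset()
    for a in order:
        if censor(Xp, Xm, a):
            continue
        Xp, Xm = (Xp | {a}, Xm) if a in KB else (Xp, Xm | {a})
    return Xp

roles_first = sorted(PAX1, key=lambda a: a[0] == "E")
emps_first = sorted(PAX1, key=lambda a: (a[0] != "E",
                                         a[2] if a[0] == "w" else ""))
print(secure_view(roles_first))  # bob (a d3 member) disappears entirely
print(secure_view(emps_first))   # all employees listed; bob's dept stays hidden
```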
Definition 3. A bk-model M is canonical if, for all users u ∈ U, PAX_u is either PAX₀ or PAX₁, and PKB_u is defined by (6) for a given MR_u. Moreover, M is in a description logic DL if, for all u ∈ U, all the axioms in KB, PKB_u, BK_u, and S_u belong to DL.

The size of PAX₀ and PAX₁ is polynomial in the size of KB ∪ MR; therefore PKB is finite and exponential in the size of KB ∪ MR. Finiteness implies the continuity hypothesis on Cn_M of Theorem 1, and hence (using Theorem 1 and Prop. 2):

Theorem 2. If M is canonical, then f_M is secure with respect to all M′ ⪯ M.

First we analyze the complexity of constructing the secure view f_M(KB) when the underlying description logic is tractable, like EL and DL-lite, for example.

Lemma 1. If the axioms occurring in MR and K are in a DL with tractable subsumption and instance checking, then checking K ⊨ₘ MR is:
1. in P if either MR is ground or there exists a fixed bound on the number of distinct variables in MR;
2. coNP-complete otherwise.

With Lemma 1, one can prove the following two lemmas.

Lemma 2. Let M range over canonical bk-models. If M, s, X⁺, and X⁻ are in a DL with tractable subsumption/instance checking, and the number of distinct variables in MR is bounded by a constant, then checking whether s ∈ Cn_M(X⁺, X⁻) is:
1. in P if MR is Horn and PAX = PAX₁;
2. coNP-complete if either MR is not Horn or PAX = PAX₀.

Lemma 3. Let M be a canonical bk-model. If M, s, X⁺, and X⁻ are in a DL with tractable entailment problems, and there is no bound on the number of variables in the metarules of MR, then checking s ∈ Cn_M(X⁺, X⁻) is:
1. in P^NP if MR is Horn and PAX = PAX₁;
2. in Π₂ᵖ if either MR is not Horn or PAX = PAX₀.

The value of censor(X⁺, X⁻, α) can be computed straightforwardly by iterating the tests s ∈ Cn_M(X⁺ ∪ {α}, X⁻) and s ∈ Cn_M(X⁺, X⁻ ∪ {α}) for all secrets s ∈ S. Since the set of secrets is part of the parameter M of the filtering function, the number of iterations is polynomial in the input, and the complexity of the censor is dominated by the complexity of Cn_M(·). The latter is determined by Lemmas 2 and 3, so we immediately get:

Corollary 1. Let M be a canonical bk-model and suppose that M, X⁺, X⁻, and α are in a DL with tractable entailment problems. If the number of distinct variables in MR is bounded by a constant, then computing censor(X⁺, X⁻, α) is:
– in P if MR is Horn and PAX = PAX₁;
– coNP-complete if either MR is not Horn or PAX = PAX₀.
If there is no bound on the number of variables in the metarules of MR, then computing censor(X⁺, X⁻, α) is:
– in P^NP if MR is Horn and PAX = PAX₁;
– in Π₂ᵖ if either MR is not Horn or PAX = PAX₀.

We are now ready to analyze the complexity of filtering functions.

Theorem 3. If M is a canonical bk-model in a DL with tractable entailment problems, then computing f_M(KB) is:
1. P-complete if the number of distinct variables in the rules of MR is bounded, MR is Horn, and PAX = PAX₁;
2. P^NP-complete if the number of distinct variables in MR is bounded, and either MR is not Horn or PAX = PAX₀;
3. in P^NP if the variables in MR are unbounded, MR is Horn, and PAX = PAX₁;
4. in Δ₃ᵖ if MR is not restricted and PAX ∈ {PAX₀, PAX₁}.

Theorem 4. Computing f_M(KB) over canonical M in a DL with ExpTime entailment (e.g., ALCQO, ALCIO, ALCQI, SHOQ, SHIO, SHIQ) is still in ExpTime.

Theorem 5. Computing f_M(KB) over canonical M in SROIQ(D) is in coNP^N2ExpTime.

8 Relationships with the SCM

Here we show that the meta-secure framework is a natural generalization of the SCM. The main result demonstrates, roughly speaking, that the SCM can essentially be regarded as a special case of our framework where PKB ⊇ ℘(KB) and BK = ∅. In this case f_M is secure even if M is not assumed to be canonical.

Theorem 6. Let M = ⟨KB, U, f_M, ⟨S_u, PKB_u, BK_u⟩_{u∈U}⟩. If PKB = ℘(KB), BK = ∅, and KB is finite, then:
1. Cn_M(KB⁺, KB⁻) = ⋃_{i<ν} Cn_M(KBᵢ⁺, KBᵢ⁻).
2. For all enumerations of PAX, the corresponding f_M(KB, u) is logically equivalent to a maximal secure view KB_u of KB according to the SCM; conversely, for every maximal secure view KB_u of KB (according to the SCM) there exists an enumeration of PAX such that the resulting f_M(KB, u) is logically equivalent to KB_u.
3. f_M is secure w.r.t. M and w.r.t. any M′ = ⟨KB, U, f_M, ⟨S_u, PKB′_u, BK′_u⟩_{u∈U}⟩ such that PKB′ ⊇ ℘(KB) and BK′ = ∅.
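Item 2 can be read operationally from the SCM side: maximal secure views are exactly the results of a greedy pass over some enumeration of KB. A sketch of that greedy pass (ours; `entails_secret(view)` abstracts the reasoner check Cn(view) ∩ S ≠ ∅):

```python
# Our reading of the SCM side of Theorem 6(2): greedily keep every axiom
# that does not, together with what was already kept, entail a secret.

def greedy_maximal_secure_view(kb_enumeration, entails_secret):
    view = []
    for a in kb_enumeration:
        if not entails_secret(view + [a]):
            view.append(a)
    return view

# Different enumerations yield the different maximal secure views that, by
# Theorem 6(2), coincide (up to logical equivalence) with f_M(KB, u) when
# PKB = powerset(KB) and BK = ∅.
```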
Theorem 6 applies to every canonical M such that MR = BK = ∅, because MR = ∅ implies that PAX₀ = PAX₁ = KB and hence PKB = ℘(KB). This shows that the SCM can be regarded as a special case of our framework where the user has no background knowledge. Moreover, by this correspondence, one immediately obtains complexity bounds for the SCM from those for PAX₁ and Horn, bounded-variable MR.

9 Framework implementation

In this section we introduce a prototypical implementation of the framework based on PAX₁ and Horn metarules. Nowadays ontologies are managed with the help of the OWL API (http://owlapi.sourceforge.net/) and DL reasoners, which allow us to take full advantage of their rich underlying semantics. Unfortunately, the publicly available OWL reasoners do not offer native support for the conjunctive query answering required to process users' metaknowledge; a partial exception to this rule is the Pellet reasoner, discussed later. A straightforward evaluation of metarules in the presence of metavariables with an OWL reasoner would need to consider all possible ways of uniformly replacing metavariables by individual constants occurring in the ontology. On the other hand, the evaluation of a ground rule r with an OWL reasoner would, in the worst case, require checking that all the axioms α₁, ..., αₙ ∈ body(r) and β₁, ..., βₘ ∈ head(r) are entailed by KB. Summing up, with this method, as the ontology ABox grows, metarule evaluation can easily become unmanageable in terms of execution time. Consequently, technologies that permit native conjunctive query evaluation are fundamental to an efficient implementation of the framework. SPARQL (http://www.w3.org/TR/sparql11-overview/), the W3C standard that provides languages and protocols to query and manipulate RDF content (and so ontologies encoded in the RDF/XML syntax), constitutes a de facto standard when it comes to conjunctive query answering. It has recently been extended with the so-called entailment regimes, which define how queries are evaluated under semantics more expressive than the SPARQL standard's simple entailment, which is based essentially on pattern matching on graphs. Unfortunately, most of the available engines do not provide support for OWL reasoning. To the best of our knowledge, only the Apache Jena Semantic Web Toolkit and Pellet support OWL inference over the queried ontologies; moreover, the Pellet query engine seems not to have been reengineered for the last few years. It was therefore an obvious option to prefer the Jena query engine for our system. The Jena inference subsystem is designed to allow usage of a number of predefined Jena OWL reasoners, as well as external reasoners (to be plugged into Jena, a reasoner must expose the Jena API); however, the usage of the internal reasoners is recommended for efficiency reasons. Note that the OWL, OWL Mini, and OWL Micro Jena reasoners are useful but not full-fledged rule-based implementations of the OWL/Lite subset of the OWL/Full language: critical unsupported constructs which go beyond OWL/Lite are complementOf and oneOf, while the support for unionOf is partial. We consider our choice of a Jena OWL reasoner a good compromise between expressivity and efficiency.

According to the theoretical framework, the system consists of two modules. The first one parses the user's metaknowledge, represented by means of a set of metarules. The second module is in charge of the secure ontology view construction.
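To illustrate the role of SPARQL here, the following sketch (ours; the URI scheme and helper names are invented, and rdflib is used only as a stand-in for the Jena-based engine of the actual prototype, which additionally needs an OWL entailment regime) translates a metarule body into a SELECT query retrieving all candidate bindings of the metavariables:

```python
# Our sketch of compiling a metarule body into a SPARQL SELECT query.
# rdflib applies only simple entailment, so this is purely illustrative.
from rdflib import Graph

PREFIX = "PREFIX : <http://example.org/company#>\n"

def body_to_sparql(body_atoms):
    """body_atoms: ('class', C, var) or ('role', R, var, const) tuples."""
    patterns = []
    for atom in body_atoms:
        if atom[0] == "class":
            patterns.append(f"?{atom[2]} a :{atom[1]} .")
        else:
            patterns.append(f"?{atom[2]} :{atom[1]} :{atom[3]} .")
    variables = sorted({atom[2] for atom in body_atoms})
    return (PREFIX + "SELECT " + " ".join(f"?{v}" for v in variables)
            + " WHERE { " + " ".join(patterns) + " }")

# Body of metarule (7), Employee(X), becomes one triple pattern:
query = body_to_sparql([("class", "Employee", "X")])

g = Graph()
g.parse("company.owl", format="xml")   # hypothetical RDF/XML ontology file
for row in g.query(query):             # each row: a candidate binding for X
    print(row[0])
```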
Algorithm 1 provides an abstract view of the implementation of our framework. It takes as input an ontology KB, a set of secrets S, a set of metarules MR, and the user's background knowledge BK. The output is the set of axioms that constitute a secure ontology view for the user. The sets M_M and M_G form a partition of MR according to rule type (containing metavariables, or ground, respectively).

Remark 2. By standard logic programming techniques, a minimal PKB ⊆ PAX₁ satisfying the set of metarules and the constraints K⁺ can be obtained with the following PTIME construction:

    PKB₀ = K⁺,
    PKBᵢ₊₁ = PKBᵢ ∪ ⋃ { head(r) | r ∈ ground_{PKBᵢ}(MR) ∧ body(r) ⊆ Cn(PKBᵢ) }.

The sequence's limit PKB_{|PAX₁|} also satisfies ⟨K⁺, K⁻⟩, provided that Cn(PKB_{|PAX₁|}) ∩ K⁻ = ∅. Then, for all s ∈ S, s ∈ Cn_M(K⁺, K⁻) holds iff s ∈ Cn(PKB_{|PAX₁|} ∪ BK). For more details refer to [7].

Algorithm 1:
    Data: KB, S, MR, BK.
 1  Kᵢ⁺, Kᵢ⁻ ← ∅;
 2  M_M ← {rᵢ | rᵢ ∈ MR and rᵢ is a metarule containing metavariables};
 3  M_G ← {rᵢ | rᵢ ∈ MR and rᵢ is a ground metarule};
 4  PAX₁ ← {α ∈ KB ∪ ⋃_{r ∈ ground_KB(MR)} head(r)};
 5  PKB ← ∅;
 6  forall α ∈ PAX₁ do
 7      PKB′ ← PKB ∪ {α};
 8      M_G′ ← M_G;
 9      repeat
10          forall m ∈ M_G′ do
11              if PKB′ ⊨ body(m) then
12                  PKB′ ← PKB′ ∪ {head(m)};
13                  M_G′ ← M_G′ \ {m};
14          forall m ∈ M_M do
15              forall (a₀, ..., aₙ) such that PKB′ ⊨ body(m, [X₀/a₀, ..., Xₙ/aₙ]) do
16                  PKB′ ← PKB′ ∪ {head(m, [X₀/a₀, ..., Xₙ/aₙ])};
17      until no element is added to PKB′;
18      if {β ∈ Kᵢ⁻ | PKB′ ⊨ β} = ∅ then
19          if {s ∈ S | PKB′ ∪ BK ⊨ s} = ∅ then
20              if KB ⊨ α then
21                  Kᵢ⁺ ← Kᵢ⁺ ∪ {α};
22                  PKB ← PKB′;
23                  M_G ← M_G′;
24          else
25              Kᵢ⁻ ← Kᵢ⁻ ∪ {α};
26  return Kᵢ⁺

By iterating over the axioms α ∈ PAX₁ (lines 6–25), PKB collects at each step all parts of PAX₁ that can be revealed to the user. The repeat-until loop (lines 9–17) computes the deductive closure PKB′ of PKB under MR (the result of Proposition 2 guarantees that considering only the minimal PKB is sound). In particular, for every ground metarule (lines 10–13) we execute a SPARQL ASK query (hidden in line 11) to verify whether its body is entailed by the current PKB′. For every metarule containing metavariables (lines 14–16) we execute a SPARQL SELECT query (encoded in line 15) in order to obtain all possible bindings of the metavariables that satisfy the metarule's body. The pair of steps described above is iterated until a fixpoint is reached, i.e., no elements are added to PKB′ (line 17). At this point the condition Cn(PKB′) ∩ Kᵢ⁻ = ∅ is checked (line 18). We are now ready to determine the value of the censor function for α: we verify that no secret is entailed by the minimal PKB (line 19), taking into consideration the background knowledge (this corresponds to checking whether s ∈ Cn_M(K⁺ ∪ {α}, K⁻) only; the condition s ∈ Cn_M(K⁺, K⁻ ∪ {α}) is in fact embedded in line 18). In case α is entailed by KB, it is safe to include it in the view (line 21); otherwise, the set Kᵢ⁻ is updated (line 25). Note that we need an OWL reasoner in order to perform the entailment checks in lines 18–20. We make use of the incremental reasoner Pellet, which, for each αᵢ in the enumeration of PAX₁, is expected to restrict reasoning to the new inferences triggered by αᵢ, without repeating the inferences that involve only Kᵢ₋₁⁺.
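Remark 2's construction is essentially the immediate-consequence fixpoint of logic programming. A compact sketch (ours, propositional for readability; `entails(K, a)` abstracts the reasoner calls of lines 11 and 15):

```python
# Our sketch of Remark 2: the minimal PKB containing K+ and closed under
# the (Horn, ground) metarules, as a least fixpoint reached in at most
# |PAX1| rounds.

def minimal_pkb(k_plus, ground_horn_rules, entails):
    """ground_horn_rules: iterable of (body, head_axiom) pairs."""
    pkb = set(k_plus)
    changed = True
    while changed:
        changed = False
        for body, head in ground_horn_rules:
            if head not in pkb and all(entails(pkb, a) for a in body):
                pkb.add(head)            # fire the rule, monotonically
                changed = True
    return pkb
```

Lines 18–19 of Algorithm 1 then amount to checking that Cn(pkb) misses K⁻ and that pkb ∪ BK entails no secret.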
A first optimization regards the evaluation of the set of ground rules M_G: during the construction of PKB, due to the monotonicity of reasoning, at each iteration we can safely remove from M_G all the ground rules already satisfied at the previous iterations (lines 13 and 23). Another optimization concerns the evaluation order of PAX₁. Checking whether Cn(PKB_{|PAX₁|}) ∩ K⁻ = ∅ (line 18) is time consuming, so we adopt an approach that keeps the set Kᵢ⁻ as small as possible. This is achieved by processing all {α ∈ PAX₁ | α ∈ KB} first in line 6: for such α the test in line 20 always succeeds, so Kᵢ⁻ remains empty throughout this phase and the condition in line 18 is vacuously satisfied.

Experimental analysis shows that the generation of secure views for medium-sized ontologies may take several minutes. We plan to investigate module extraction techniques, which are expected to improve the execution time drastically by restricting the part of the knowledge base on which metarules apply.

10 Related work

Baader et al. [2], Eldora et al. [11], and Knechtel and Stuckenschmidt [13] attach security labels to axioms and users to determine which subset of the KB can be used by each subject. These works are instances of the SCM, so they are potentially vulnerable to the attacks based on background knowledge; this holds in particular for [13], which pursues the construction of maximal secure views. Moreover, in [2, 11] axiom labels are not derived from the set of secrets: knowledge engineers are responsible for checking ex post that no confidential knowledge is entailed, and in case of leakage the view can be modified with a revision tool based on pinpointing. On the contrary, our mechanism automatically selects which axioms shall be hidden in order to produce a secure view.

Chen and Stuckenschmidt [8] adopt an instance of the SACM based on removing some individuals entirely. In general, this may be secure against metaknowledge attacks (cf. Ex. 5); however, no methodology is provided for selecting the individuals to be removed given a target set of secrets. In [3], KB is partitioned into a visible part KB_v and a hidden part KB_h; conceptually, this is analogous to axiom labelling, cf. the above approaches. Their confidentiality methodology seems to work only under the assumption that the signatures of KB_v and KB_h are disjoint, because in strong safety they do not consider the formulae that are implied by a combination of KB_v and KB_h. Surely the axioms of KB_h whose signature is included in the signature of KB_v cannot be protected, in general. A partition-based approach is taken in [10], too; it is not discussed how to select the hidden part KB_h given a set of target secrets (which includes the issue of deciding secondary protection). Similarly, in [14] only ex-post confidentiality verification methods are provided. In their model, the equivalent of PKB is the set of all knowledge bases that include a given set of publicly known axioms S ⊆ KB; consequently, in some cases their verification method is vulnerable to the attacks to complete knowledge, which are based on more complex (conditional) metaknowledge (cf. Example 2 and Example 5) that cannot be encoded in their framework.
Cuenca Grau and Horrocks [9] investigate knowledge confidentiality from a probabilistic perspective: enlarging the public view should not change the probability distribution over the possible answers to a "sensitive query" Q that represents the set of secrets. In [9] users can query the knowledge base only through a pre-defined set of views (we place no such restriction). A probability distribution P over the set of knowledge bases plays a role similar to metaknowledge. However, their confidentiality condition allows P to be replaced with a different P′ after enlarging the public view, so at a closer look P does not really model the user's a priori knowledge about the knowledge base (which should remain constant), differently from our PKB.

Our method is inspired by the literature on controlled query evaluation (CQE) based on lies and/or refusals ([4, 5, 6], etc.). Technically we use lies, because rejected queries are not explicitly marked (the cited papers use the special answer "mum"). However, our censor resembles the classical refusal censor, so the properties of f_M are not subsumed by any of the classical CQE methods. For example (unlike the CQE approaches that use lies), f_M(KB, u) encodes only correct knowledge (i.e., knowledge entailed by KB), and it is secure whenever users do not initially know any secret (while lies-based CQE further requires that no disjunction of secrets be known a priori). Unlike the refusal method, f_M can handle cover stories, because users are not told that some queries are obfuscated; as an additional advantage, our method need not adapt existing engines to handle nonstandard answers like mum. Finally, the CQE approaches do not deal specifically with DL knowledge bases, metaknowledge, and the related complexity analysis.

11 Conclusions

The confidentiality preservation methods that do not consider background knowledge are vulnerable to several attacks. We identified two vulnerabilities (attacks to complete knowledge and to the signature) and introduced a knowledge base confidentiality model that can detect these vulnerabilities, based on a fully generic formalization of object- and meta-level background knowledge. Confidentiality is enforced through a generic mechanism for constructing secure views (the filtering f_M) that is provably secure w.r.t. the meta-confidentiality model under a continuity assumption, and that generalizes a few previous approaches (cf. Thm. 6 and Ex. 5). In order to compute secure views in practice, we introduced a safe, generic method for approximating background knowledge, together with a specific rule-based language for expressing metaknowledge. Based on this instantiation of the general framework, where f_M is always secure, we analyzed the computational complexity of computing secure views. If the underlying DL is tractable, then in the simplest case f_M can be computed in polynomial time. The number of variables in metarules and the adoption of a more secure approximation (PAX₀) may increase complexity up to P^NP = Δ₂ᵖ and perhaps Δ₃ᵖ. The complexity of non-Horn metarules, however, can be avoided by replacing each non-Horn rule r with one of its Horn strengthenings body(r) ⇒ α, where α ∈ head(r). This approximation is safe (because it restricts PKB), and it opens the way to a systematic use of the low-complexity bk-models based on PAX₁ and Horn metarules.
For the many ExpTime-complete DLs, secure view computation does not increase asymptotic complexity. So far, the best upper complexity bound for computing secure views in the description logic underlying OWL DL (i.e., SROIQ(D)) is coNP^N2ExpTime.

Finally, we have provided a prototype implementation of the low-complexity frameworks based on PAX₁ and Horn metarules, using the incremental engine versions available for Pellet and ELK to avoid repeated classifications in the iterative construction of f_M. Metarule bodies are evaluated with SPARQL. Secure views are constructed off-line, so no overhead is placed on user queries, and the approach is expected to be applicable in practice.

Bibliography

[1] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.
[2] F. Baader, M. Knechtel, and R. Peñaloza. A generic approach for large-scale ontological reasoning in the presence of access restrictions to the ontology's axioms. In International Semantic Web Conference, pages 49–64, 2009.
[3] J. Bao, G. Slutzki, and V. Honavar. Privacy-preserving reasoning on the semantic web. In Web Intelligence, pages 791–797. IEEE Computer Society, 2007.
[4] J. Biskup and P. A. Bonatti. Lying versus refusal for known potential secrets. Data Knowl. Eng., 38(2):199–222, 2001.
[5] J. Biskup and P. A. Bonatti. Controlled query evaluation for enforcing confidentiality in complete information systems. Int. J. Inf. Sec., 3(1):14–27, 2004.
[6] J. Biskup and P. A. Bonatti. Controlled query evaluation for known policies by combining lying and refusal. Ann. Math. Artif. Intell., 40(1-2):37–62, 2004.
[7] P. A. Bonatti and L. Sauro. A confidentiality model for ontologies. In International Semantic Web Conference (1), pages 17–32, 2013.
[8] W. Chen and H. Stuckenschmidt. A model-driven approach to enable access control for ontologies. In H. R. Hansen et al., editors, Wirtschaftsinformatik, volume 246 of books@ocg.at, pages 663–672. Österreichische Computer Gesellschaft, 2009.
[9] B. Cuenca Grau and I. Horrocks. Privacy-preserving query answering in logic-based information systems. In M. Ghallab, C. D. Spyropoulos, N. Fakotakis, and N. M. Avouris, editors, ECAI, volume 178 of Frontiers in Artificial Intelligence and Applications, pages 40–44. IOS Press, 2008.
[10] B. Cuenca Grau and B. Motik. Importing ontologies with hidden content. In B. Cuenca Grau et al., editors, Description Logics, volume 477 of CEUR Workshop Proceedings. CEUR-WS.org, 2009.
[11] Eldora, M. Knechtel, and R. Peñaloza. Correcting access restrictions to a consequence more flexibly. In R. Rosati, S. Rudolph, and M. Zakharyaschev, editors, Description Logics, volume 745 of CEUR Workshop Proceedings. CEUR-WS.org, 2011.
[12] P. Hitzler and T. Lukasiewicz, editors. Web Reasoning and Rule Systems - 4th Int. Conference, RR 2010, volume 6333 of Lecture Notes in Computer Science. Springer, 2010.
[13] M. Knechtel and H. Stuckenschmidt. Query-based access control for ontologies. In Hitzler and Lukasiewicz [12], pages 73–87.
[14] P. Stouppa and T. Studer. Data privacy for knowledge bases. In S. N. Artëmov and A. Nerode, editors, LFCS, volume 5407 of LNCS, pages 409–421. Springer, 2009.
[15] J. Tao, G. Slutzki, and V. Honavar. Secrecy-preserving query answering for instance checking in EL. In Hitzler and Lukasiewicz [12], pages 195–203.