A Mechanism for Ontology Confidentiality

P. A. Bonatti, I. M. Petrova and L. Sauro
Dept. of Electrical Engineering and Information Technologies, Università di Napoli "Federico II"

Abstract. We illustrate several novel attacks on the confidentiality of knowledge bases (KBs). Then we introduce a new confidentiality model, sensitive enough to detect those attacks, and a method for constructing secure KB views. We identify safe approximations of the background knowledge exploited in the attacks; they can be used to reduce the complexity of constructing secure KB views. Finally, we describe a prototype implementation of the new approach that suggests its applicability in practice.

1 Introduction

Ontology languages and Linked Open Data are increasingly being used to encode the private knowledge of companies and public organizations. Semantic Web techniques make it possible to merge different sources of knowledge and extract implicit information, putting the security and privacy of individuals at risk. Even the authors of public ontologies may want to hide some axioms in order to capitalize on their formalization efforts. Several approaches have been proposed to tackle the confidentiality requirements that arise from these scenarios. The most popular security criterion is that the published view of the knowledge base should not entail a secret sentence. However, there exist attacks that cannot be prevented this way: the user may exploit various sources of background knowledge and metaknowledge to reconstruct the hidden part of the knowledge base.

This paper contributes to the area of knowledge base confidentiality in several ways: (i) it highlights some vulnerabilities of the approaches that can be found in the literature (Sec. 3); (ii) it introduces a stronger confidentiality model that takes both object-level and meta-level background knowledge into account (Sec. 4), and it defines a method for computing secure knowledge views (Sec. 5) that generalizes some previous approaches; (iii) it proposes a safe approximation of background metaknowledge (Sec. 6 and 7); (iv) it investigates the computational complexity of constructing secure knowledge base views with our methodology (Sec. 7); (v) it describes a prototypical implementation of the new framework (Sec. 9). The paper closes with a discussion of related work (Sec. 10) and conclusions. Proofs are omitted due to space limitations.

2 Preliminaries on Description Logics

We assume the reader to be familiar with description logics, and refer to [1] for all definitions and results. We assume a fixed, denumerable signature Σ specifying the names of concepts, roles, and individuals. Our framework is compatible with any description logic DL that enjoys compactness (needed by Theorem 6) and has decidable reasoning problems (e.g., ALC, EL, SHIQ, etc.). We simply assume that our reference logical language L is generated from Σ by the grammar of the selected logic DL. By axioms, we mean members of L, unless stated otherwise. A knowledge base is any subset of L (real knowledge bases are finite, but this restriction is not technically needed until Sec. 7).

Recall that axioms are expressions of the form C ⊑ D, R ⊑ S, C(a), and R(a, b), where C, D are concept expressions, R, S are role expressions, and a, b are individual constants. In some DLs, an individual constant a may also occur in a nominal, that is, a concept expression {a} denoting the singleton containing a. The axioms involving ⊑ are called inclusions (or subsumptions), while C(a) and R(a, b) are called assertions. In the simplest case, C and R are first-order predicates and assertions are actually standard first-order atomic formulae. Inclusions are syntactic variants of logical implications.
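As a reminder (textbook material, cf. [1], not part of this paper's contributions), the standard first-order reading of these axiom types is:

```latex
% Standard first-order translation of DL axioms:
C \sqsubseteq D \;\equiv\; \forall x\,\bigl(C(x) \rightarrow D(x)\bigr)
R \sqsubseteq S \;\equiv\; \forall x\,\forall y\,\bigl(R(x,y) \rightarrow S(x,y)\bigr)
% Assertions C(a) and R(a,b) are atomic formulae over individual constants.
```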
The notion of logical consequence is the classical one; for all K ⊆ L, the logical consequences of K will be denoted by Cn(K) (K ⊆ Cn(K) ⊆ L).

3 A simple confidentiality model

The most natural way of preserving confidentiality in a knowledge base KB is checking that its answers to user queries do not entail any secret. Conceptually, the queries of a user u are answered using u's view KB_u of the knowledge base, where KB_u is a maximal subset of KB that entails no secret. In order to illustrate some possible attacks on this mechanism, let us formalize the above simple confidentiality model (SCM); this usage of the term "model" is common in Security & Privacy. The SCM consists of: the knowledge base KB (KB ⊆ L); a set of users U; a view KB_u ⊆ KB for each u ∈ U; a set of secrecies S_u ⊆ L for each u ∈ U. Secrecies are axioms that may or may not be entailed by KB; if they are, then they are called secrets and must not be disclosed to u. Revealing that a secrecy is not entailed by KB is harmless, cf. [4]. A view KB_u is secure iff Cn(KB_u) ∩ S_u = ∅. A view KB_u is maximal secure if it is secure and there exists no K such that KB_u ⊊ K ⊆ KB and Cn(K) ∩ S_u = ∅.

Attacks using object-level background knowledge. Frequently, part of the domain knowledge is not axiomatized in KB; therefore checking that Cn(KB_u) ∩ S_u = ∅ does not suffice in practice to protect confidentiality. For example, suppose that there is one secret, S_u = {OncologyPatient(John)}, and KB_u = {SSN(John, 12345), SSN(user123, 12345), OncologyPatient(user123)}. KB_u does not entail OncologyPatient(John), so according to the SCM, KB_u is secure. However, it is common knowledge that a SSN uniquely identifies a person, so the user can infer that John = user123, and hence the secret. In other examples, the additional knowledge used to infer secrets may be stored in a public ontology or RDF repository, and confidentiality violations may be automated.

Attacks to complete knowledge. Suppose the attacker knows that KB has complete knowledge about a certain set of axioms. Then the attacker may be able to reconstruct some secrets from the "I don't know" answers of a maximal secure view KB_u.

Example 1. Consider a company's knowledge base that defines a concept Employee and a role works_for that describes which employees belong to which of the n departments of the company, d_1, ..., d_n. The KB consists of assertions like:

    Employee(e)            (1)
    works_for(e, d_i)      (2)

where we assume that each employee e belongs to exactly one department d_i. A user u is authorized to see all assertions but the instances of (2) with i = n, because d_n is a special department, devoted to controlling the other ones. So S_u (the set of secrecies for u) is the set of all assertions works_for(e, d_n). Note that there is exactly one maximal secure view KB_u: it consists of all instances of (1), plus all instances of (2) such that i ≠ n. Clearly, KB_u is secure according to the SCM (because Cn(KB_u) ∩ S_u = ∅). However, observe that works_for(e, d_n) ∈ Cn(KB) iff Employee(e) ∈ Cn(KB_u) and, for all i = 1, ..., n, works_for(e, d_i) ∉ Cn(KB_u) (that is, the members of d_n are all the employees that apparently work for no department). Using this property (based on the knowledge that for each employee e, KB contains exactly one assertion works_for(e, d_i)) and the knowledge of the protection mechanism (i.e., maximal secure views), which we assume to be known to attackers by Kerckhoffs' principle, a smart user can easily identify all the members of d_n. □
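To make this attack concrete, the following sketch (ours, with invented names; the KB is purely assertional, so assertions can be modeled as tuples and no DL reasoner is needed) reconstructs the secret department memberships purely from the published maximal secure view:

```python
# Our illustration of the completeness attack of Example 1.

def maximal_secure_view(kb, secrecies):
    """SCM filtering for an assertional KB: drop exactly the secret axioms."""
    return {a for a in kb if a not in secrecies}

employees = ["alice", "bob", "carol"]
departments = ["d1", "d2", "d3"]              # d3 plays the role of d_n
kb = {("Employee", e) for e in employees} | {("works_for", "alice", "d1"),
                                             ("works_for", "bob", "d3"),
                                             ("works_for", "carol", "d2")}
secrecies = {("works_for", e, "d3") for e in employees}

view = maximal_secure_view(kb, secrecies)

# The attack: an employee listed in the view but assigned to no visible
# department must work for the hidden department d3.
visible_emps = {a[1] for a in view if a[0] == "Employee"}
leaked = {e for e in visible_emps
          if not any(("works_for", e, d) in view for d in departments)}
print(leaked)  # {'bob'}: the secret membership is reconstructed
```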
In practice, it is not hard to identify complete knowledge. A hospital's KB is expected to have complete knowledge about which patients are in which ward; a company's KB is likely to encode complete information about its employees; and so on.

Some approaches filter query answers rather than publishing a subset of KB [8, 13, 15]. We call our abstraction of this method the simple answer confidentiality model (SACM). It is obtained from the SCM by replacing the views KB_u ⊆ KB with answer views KB^a_u ⊆ Cn(KB). The difference is that KB^a_u is not required to be a subset of KB and that, conceptually, KB^a_u may be infinite. KB^a_u is secure iff Cn(KB^a_u) ∩ S_u = ∅. The reader may easily verify that the SACM is vulnerable to the two kinds of attacks illustrated for the SCM. It is also vulnerable to a third kind of attack, illustrated below.

Attacks to the signature. Suppose the user knows the signature of KB well enough to identify a symbol σ that does not occur in KB. First assume that σ is a concept name. It can be proved that:

Proposition 1. If KB^a_u is a maximal secure answer view and σ is a concept name not occurring in KB, then for all secrecies C ⊑ D ∈ S_u, KB^a_u ⊨ C ⊓ σ ⊑ D iff KB ⊨ C ⊑ D.

The problem is that although C ⊓ σ ⊑ D does not entail the secret inclusion C ⊑ D, a smart user knows that the former inclusion cannot be proved unless KB also entails the latter (so maximal secure answer views generally fail to protect secrets). This attack can be easily adapted to the case where σ is a role name. In practice, it is not even necessary to be sure that σ does not occur in KB. The attacker may make a sequence of educated guesses (say, by trying meaningless long strings, or any word that is clearly unrelated to the domain of the KB); after a sufficient number of trials, the majority of answers should agree with the "real" answer with high probability. Rejecting queries whose signature is not contained in KB's signature mitigates this kind of attack, but it leaks KB's signature and does not provide a complete solution: the attacker may still guess a σ which is logically unrelated to C and D and carry out a similar attack.
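The probing strategy can be simulated abstractly. In the sketch below (ours; the answer view is abstracted as an oracle, and per Proposition 1 it answers the probe C ⊓ σ ⊑ D truthfully whenever σ is fresh with respect to KB), the attacker recovers the secret with high probability by majority vote:

```python
# Our simulation of the signature attack of Proposition 1.
import random

KB_SIGNATURE = {"Employee", "Patient", "works_for"}
SECRET_HOLDS = True   # whether KB |= C ⊑ D; unknown to the attacker

def answer_probe(sigma: str) -> bool:
    """Oracle for 'KB_a_u |= C ⊓ sigma ⊑ D' over a maximal secure answer view."""
    if sigma not in KB_SIGNATURE:         # fresh symbol: Proposition 1 applies
        return SECRET_HOLDS
    return random.choice([True, False])   # otherwise: no guarantee

# The attacker probes with strings that are almost surely fresh and takes a
# majority vote, learning whether KB |= C ⊑ D with high probability.
guesses = [f"Xq{i}zz" for i in range(31)]
votes = sum(answer_probe(g) for g in guesses)
print("attacker concludes the secret holds:", votes > len(guesses) / 2)
```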
4 A meta-safe confidentiality model

In this section we introduce a confidentiality model that makes the vulnerabilities illustrated above visible, by taking into account object- and meta-level background knowledge. A bk-model M = ⟨KB, U, f, ⟨S_u, PKB_u, BK_u⟩_{u∈U}⟩ consists of a knowledge base KB ⊆ L, a set of users U, plus:
– a filtering function f : ℘(L) × U → ℘(L), mapping each knowledge base K and each user u to a view f(K, u) ⊆ Cn(K);
– for all u ∈ U:
  • a finite set of secrecies S_u ⊆ L;
  • a set of axioms BK_u ⊆ L, encoding the user's object-level knowledge;
  • a set of possible knowledge bases PKB_u ⊆ ℘(L), the user's metaknowledge (in practice, bk-models are finite and filterings computable, but no such assumption will be technically needed until Sec. 7).

The view of KB released to a user u is f(KB, u). We adopt PKB because at this stage we do not want to tie our framework to any specific metalanguage: PKB represents the knowledge bases that are compatible with the user's metaknowledge.

Definition 1. A filtering function f is secure (w.r.t. M) iff for all u ∈ U and all s ∈ S_u, there exists K ∈ PKB_u such that:
1. f(K, u) = f(KB, u);
2. s ∉ Cn(K ∪ BK_u).

Intuitively, if f is secure according to Def. 1, then no user u can conclude that any secret s is entailed by the KB she is interacting with (enhanced with the object-level background knowledge BK_u), for the following reasons. By point 1, KB and K have the same observable behavior, and K is a possible knowledge base for u since K ∈ PKB_u; therefore, as far as u knows, the knowledge base might be K. Moreover, by point 2, K and the object-level background knowledge BK_u do not suffice to entail the secret s.

In the rest of the paper we tacitly assume that no secret is violated a priori, that is, for all secrets s ∈ S_u there exists K ∈ PKB_u such that s ∉ Cn(K ∪ BK_u) (conversely, no filtering function can conceal a secret that is already known to the user). Moreover, in order to improve readability, we shall omit the user u from subscripts and argument lists whenever u is irrelevant to the context.

The attacks discussed in Section 3 can be easily formalized in this setting; so, in general, the maximal secure views of the SCM are not secure according to Def. 1.

Example 2. Example 1 can be formalized in our model as follows. The set of secrecies S is the set of all assertions works_for(e, d_n); BK = ∅; and PKB is the set of all knowledge bases K that consist of assertions like (1) and (2), and such that for each axiom Employee(e), K contains exactly one corresponding axiom works_for(e, d_i), and vice versa. The filtering function f maps each K ∈ PKB to the maximal subset of K that entails none of S's members, that is, f(K) = K \ S (by definition of PKB). Note that f is injective over PKB, so condition 1 of Def. 1 is satisfied only if K = KB. So, if KB contains at least one secret, then the conditions of Def. 1 cannot be satisfied; that is, maximal secure SCM views are not secure in our model. Indeed, KB can be reconstructed from the secure view by observing that KB = f(KB) ∪ {works_for(e, d_n) | Employee(e) ∈ f(KB) ∧ ∀i = 1, ..., n, works_for(e, d_i) ∉ f(KB)}. □

Similarly, the formalizations of the other attacks yield injective filtering functions (the details are left to the reader).
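For finite bk-models, Definition 1 can be checked directly by enumeration. The following sketch (ours; `cn` abstracts the consequence operator Cn, restricted to the finitely many axioms of interest, as supplied by a DL reasoner) is a naive security check, useful only to fix intuitions:

```python
# Naive check of Definition 1 for a finite bk-model (our sketch).

def is_secure(f, kb, users, secrecies, pkb, bk, cn):
    for u in users:
        view = f(kb, u)
        for s in secrecies[u]:
            # Search for a witness K in PKB_u with the same observable
            # behavior as KB (point 1) that does not yield s (point 2).
            if not any(f(K, u) == view and s not in cn(K | bk[u])
                       for K in pkb[u]):
                return False       # every indistinguishable K entails s
    return True
```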
5 A meta-secure query answering mechanism

In this section we introduce a secure filtering function. It is formulated as an iterative process based on a censor, that is, a boolean function that decides for each axiom whether it should be obfuscated in order to protect confidentiality. The iterative construction manipulates pairs ⟨X⁺, X⁻⟩ ∈ ℘(L) × ℘(L) that represent a meta-constraint on possible knowledge bases: we say that a knowledge base K satisfies ⟨X⁺, X⁻⟩ iff K entails all the sentences in X⁺ and none of those in X⁻ (formally, Cn(K) ⊇ X⁺ and Cn(K) ∩ X⁻ = ∅).

Let PAX (the set of possible axioms) be the set of axioms that may occur in the knowledge base according to the user's knowledge, i.e., PAX = ⋃_{K′∈PKB} K′. Let ν = |PAX| + 1 if PAX is finite, and ν = ω otherwise; let α₁, α₂, ..., αᵢ, ... (i < ν) be any enumeration of PAX (we will show later how to restrict the construction to finite sequences, by approximating PAX). The secure view construction for a knowledge base K in a bk-model M consists of the following, inductively defined sequence of pairs ⟨Kᵢ⁺, Kᵢ⁻⟩, i ≥ 0:

– ⟨K₀⁺, K₀⁻⟩ = ⟨∅, ∅⟩, and for all i + 1 < ν, ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ is defined as follows:
  • if censor_M(Kᵢ⁺, Kᵢ⁻, αᵢ₊₁) = true, then ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ = ⟨Kᵢ⁺, Kᵢ⁻⟩;
  • if censor_M(Kᵢ⁺, Kᵢ⁻, αᵢ₊₁) = false and K ⊨ αᵢ₊₁, then ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ = ⟨Kᵢ⁺ ∪ {αᵢ₊₁}, Kᵢ⁻⟩;
  • otherwise, ⟨Kᵢ₊₁⁺, Kᵢ₊₁⁻⟩ = ⟨Kᵢ⁺, Kᵢ⁻ ∪ {αᵢ₊₁}⟩.

Finally, let K⁺ = ⋃_{i<ν} Kᵢ⁺ and K⁻ = ⋃_{i<ν} Kᵢ⁻, and define f_M(K, u) = K⁺. Note that the inductive construction aims at finding maximal sets K⁺ and K⁻ that (i) partly describe what does / does not follow from K (as K satisfies ⟨K⁺, K⁻⟩ by construction), and (ii) do not trigger the censor (the sentences αᵢ₊₁ that trigger the censor are included neither in K⁺ nor in K⁻, cf. the induction step).

In order to define the censor we need an auxiliary definition that captures all the sentences that can be entailed with the background knowledge BK and the metaknowledge PKB enriched by a constraint ⟨X⁺, X⁻⟩ analogous to those adopted in the iterative construction. Let Cn_M(X⁺, X⁻) be the set of all axioms α ∈ L such that

    for all K′ ∈ PKB such that K′ satisfies ⟨X⁺, X⁻⟩:  α ∈ Cn(K′ ∪ BK).    (3)

Now the censor is defined as follows: for all X⁺, X⁻ ⊆ L and α ∈ L,

    censor_M(X⁺, X⁻, α) = true,  if there exists s ∈ S s.t. either s ∈ Cn_M(X⁺ ∪ {α}, X⁻)
                                 or s ∈ Cn_M(X⁺, X⁻ ∪ {α});
                          false, otherwise.                                  (4)

In other words, the censor checks whether telling either that α is derivable, or that α is not derivable, to a user aware that the knowledge base satisfies ⟨X⁺, X⁻⟩ restricts the set of possible knowledge bases enough to conclude that a secret s is entailed by the knowledge base and the background knowledge BK. Note that the censor obfuscates αᵢ₊₁ if any of its possible answers entails a secret, independently of the actual contents of K (the two possible answers "yes" and "no" correspond to the conditions s ∈ Cn_M(X⁺ ∪ {α}, X⁻) and s ∈ Cn_M(X⁺, X⁻ ∪ {α}), respectively). In this way, roughly speaking, the knowledge bases that entail s are given the same observable behavior as those that do not. Under a suitable continuity assumption on Cn_M, this enforces confidentiality:

Theorem 1. If Cn_M(KB⁺, KB⁻) ⊆ ⋃_{i<ν} Cn_M(KBᵢ⁺, KBᵢ⁻), then f_M is secure w.r.t. M.

Examples of the behavior of f_M are deferred until Sec. 7.
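A finite-case transcription of the construction (ours; `entails(K, a)` abstracts K ⊨ a as decided by a DL reasoner, `pkb` is the finite list PKB, and `bk` is the user's object-level background knowledge) may clarify the interplay of K⁺, K⁻, and Cn_M:

```python
# Our sketch of the secure view construction f_M (finite case).

def satisfies(K, X_plus, X_minus, entails):
    """K satisfies <X+, X-> iff it entails all of X+ and none of X-."""
    return (all(entails(K, a) for a in X_plus) and
            not any(entails(K, a) for a in X_minus))

def in_cn_M(s, X_plus, X_minus, pkb, bk, entails):
    """s ∈ Cn_M(X+, X-): s follows (with BK) in every compatible KB;
    vacuously true when no K in PKB satisfies <X+, X->."""
    return all(entails(K | bk, s)
               for K in pkb if satisfies(K, X_plus, X_minus, entails))

def censor(X_plus, X_minus, a, secrets, pkb, bk, entails):
    """True iff either possible answer about `a` would expose a secret."""
    return any(in_cn_M(s, X_plus | {a}, X_minus, pkb, bk, entails) or
               in_cn_M(s, X_plus, X_minus | {a}, pkb, bk, entails)
               for s in secrets)

def f_M(K, pax_enum, secrets, pkb, bk, entails):
    """Iterative construction; returns K+ = f_M(K)."""
    K_plus, K_minus = frozenset(), frozenset()
    for a in pax_enum:                      # a fixed enumeration of PAX
        if censor(K_plus, K_minus, a, secrets, pkb, bk, entails):
            continue                        # obfuscate: record nothing
        if entails(K, a):
            K_plus = K_plus | {a}
        else:
            K_minus = K_minus | {a}
    return K_plus
```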
M0 and M k M0 , then f is secure w.r.t. M. Consequently, a generic advice for estimating BK consists in including as many pieces of relevant knowledge as possible, for example: (i) modelling as completely as possible the integrity constraints satisfied by the data, as well as role domain and range restrictions and disjointness constraints; (ii) including in BK all the relevant public sources of formalized relevant knowledge (such as ontologies and triple stores). While object-level background knowledge is dealt with in the literature, the general metaknowledge encoded by PKB is novel. Therefore, the next section is focussed on some concrete approximations of PKB and their properties. 7 Approximating and reasoning about possible knowledge bases In this section, we investigate the real world situations where the knowledge base KB is finite and so are all the components of bk-models (U, S u , BK u , PKBu ); then we focus on PKBu that contain only finite knowledge bases. Consequently, fM will turn out to be decidable and we will study its complexity under di↵erent assumptions. 152 P. A. Bonatti, I. Petrova, L. Sauro. A mechanism for ontology confidentiality A language for defining PKB is a necessary prerequisite for the practical imple- mentation of our framework and a detailed complexity analysis of fM . Here we express PKB as the set of all theories that are contained in a given set of possible axioms PAX 6 and satisfy a given, finite set MR of metarules like: ↵1 , . . . , ↵ n ) 1 | ... | m (n 0, m 0) , (5) where all ↵i and j are in L (1  i  n, 1  j  m). Informally, (5) means that if KB entails ↵1 , . . . , ↵n then KB entails also some of 1 , . . . , m . Sets of similar metarules can be succintly specified using metavariables; they can be placed wherever individual constants may occur, that is, as arguments of assertions, and in nominals. A metarule with such variables abbreviates the set of its ground instantiations: Given a K ✓ L, let ground K (MR) be the ground (variable-free) instantiation of MR where metavariables are uniformly replaced by the individual constants occurring in K in all possible ways. n o Example 3. Let MR = 9R.{X} ) A(X) , where X is a metavariable, and let K = n o n o R(a, b) . Then ground K (MR) = (9R.{a} ) A(a)), (9R.{b} ) A(b)) . t u If r denotes rule (5), then let body(r) = {↵1 , . . . , ↵n } and head(r) = { 1 , . . . , m }. We say r is Horn if |head(r)|  1. A set of axioms K ✓ L satisfies a ground metarule r if either body(r) * Cn(K) or head(r) \ Cn(K) , ;. In this case we write K |=m r. Example 4. Let A, B, C be concept names and R be a role name. The axiom set K = {A v 9R.B, A v C} satisfies A v 9R ) A v B | A v C but not A v 9R ) A v B. t u Moreover, if K satisfies all the metarules in ground K (MR) then we write K |=m MR. Therefore the formal definition of PKB now becomes: PKB = {K | K ✓ PAX ^ K |=m MR} . (6) In accordance with Prop. 2, we approximate PAX in a conservative way. We will analyze two possible definitions: 1. PAX 0 = KB (i.e., as a minimalistic choice we only assume that the axioms of KB are possible axioms; of course, by Prop. 2, this choice is safe also w.r.t. any larger PAX where at least the axioms of KB are regarded as possible axioms); S 2. PAX 1 = KB [ r2groundKB (MR) head(r). Remark 1. The latter definition is most natural when metarules are automatically ex- tracted from KB with rule mining techniques, that typically construct rules using mate- rial from the given KB (then rule heads occur in KB). Example 5. 
Example 5. Consider again Example 1. The user's metaknowledge about KB's completeness can be encoded with:

    Employee(X) ⇒ works_for(X, d₁) | ... | works_for(X, dₙ),    (7)

where X is a metavariable. First let PAX = PAX₁. The secure view f_M(KB) depends on the enumeration order of PAX. If the role assertions works_for(e, d_i) precede the concept assertions Employee(e), then, in a first stage, the sets KBⱼ⁺ are progressively filled with the role assertions with d_i ≠ d_n that belong to KB, while the sets KBⱼ⁻ accumulate all the role assertions that do not belong to KB. In a second stage, the sets KBⱼ⁺ are further extended with the concept assertions Employee(e) such that e does not work for d_n. The role assertions works_for(e, d_n) of KB and the corresponding concept assertions Employee(e) are neither in KB⁺ nor in KB⁻. Note that the final effect is equivalent to removing from KB all the axioms referring to the individuals that work for d_n; analogously, in [8] the individuals belonging to a specified set are removed from all answers.

Next suppose that the role assertions works_for(e, d_i) follow the concept assertions Employee(e), and that each works_for(e, d_i) follows all works_for(e, d_k) such that k < i. Now all the assertions Employee(e) of KB enter KB⁺, and all axioms works_for(e, d_i) with i < n−1 enter either KB⁺ or KB⁻, depending on whether they are members of KB or not. Finally, the assertions works_for(e, d_i) ∈ Cn(KB) with i ∈ {n−1, n} are inserted neither in KB⁺ nor in KB⁻, because the corresponding instance of (7) with X = e has its body in KB⁺ and the first n−2 alternatives of its head in KB⁻, so a negative answer to works_for(e, d_{n−1}) would entail the secret works_for(e, d_n) by (7). This triggers the censor for all assertions works_for(e, d_{n−1}). Summarizing, with this enumeration ordering it is possible to return the complete list of employees; the members of d_n are protected by hiding also which employees belong to d_{n−1}.

Finally, let PAX = PAX₀. Note that in this case all possible knowledge bases are subsets of KB, which contains exactly one assertion works_for(e, d_{i(e)}) for each employee e. To satisfy (7), every K ∈ PKB containing Employee(e) must also contain works_for(e, d_{i(e)}). It follows that f_M must remove all references to the individuals that work for d_n, as happens with the first enumeration of PAX₁. □
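The two enumeration orders of Example 5 can be replayed end to end. The following demo (ours, self-contained and propositional: entailment reduces to membership because the KB is assertional with no TBox; n = 3 and all names are invented) exhibits exactly the behaviors described above:

```python
# Our propositional demo of Example 5 (d3 plays the secret department d_n).
from itertools import combinations

emps, deps = ["alice", "bob"], ["d1", "d2", "d3"]
KB = frozenset({("E", "alice"), ("w", "alice", "d1"),
                ("E", "bob"), ("w", "bob", "d3")})
secrets = {("w", e, "d3") for e in emps}

# PAX1 = KB plus all heads of the ground instances of metarule (7).
PAX1 = KB | {("w", e, d) for e in emps for d in deps}

def sat7(K):  # K |=_m (7): every listed employee works for some department
    return all(any(("w", a[1], d) in K for d in deps)
               for a in K if a[0] == "E")

PKB = [frozenset(s) for r in range(len(PAX1) + 1)
       for s in combinations(PAX1, r) if sat7(frozenset(s))]

def cn_M(s, Xp, Xm):  # s ∈ Cn_M(X+, X-), with BK = ∅
    return all(s in K for K in PKB if Xp <= K and not (Xm & K))

def censor(Xp, Xm, a):
    return any(cn_M(s, Xp | {a}, Xm) or cn_M(s, Xp, Xm | {a}) for s in secrets)

def secure_view(order):
    Xp, Xm = frozenset(), frozenset()
    for a in order:
        if censor(Xp, Xm, a):
            continue
        Xp, Xm = (Xp | {a}, Xm) if a in KB else (Xp, Xm | {a})
    return Xp

roles_first = sorted(PAX1, key=lambda a: a[0] == "E")
emps_first = sorted(PAX1, key=lambda a: (a[0] != "E",
                                         a[2] if a[0] == "w" else ""))
print(secure_view(roles_first))  # bob (a d3 member) disappears entirely
print(secure_view(emps_first))   # all employees listed; bob's dept stays hidden
```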
Definition 3. A bk-model M is canonical if, for all users u ∈ U, PAX_u is either PAX₀ or PAX₁, and PKB_u is defined by (6) for a given MR_u. Moreover, M is in a description logic DL if, for all u ∈ U, all the axioms in KB, PKB_u, BK_u, and S_u belong to DL.

The size of PAX₀ and PAX₁ is polynomial in the size of KB ∪ MR; therefore PKB is finite and exponential in the size of KB ∪ MR. Finiteness implies the continuity hypothesis on Cn_M of Theorem 1, and hence (using Theorem 1 and Prop. 2):

Theorem 2. If M is canonical, then f_M is secure with respect to all M′ ⪯ M.

First we analyze the complexity of constructing the secure view f_M(KB) when the underlying description logic is tractable, like EL and DL-lite, for example.

Lemma 1. If the axioms occurring in MR and K are in a DL with tractable subsumption and instance checking, then checking K ⊨ₘ MR is:
1. in P if either MR is ground or there exists a fixed bound on the number of distinct variables in MR;
2. coNP-complete otherwise.

With Lemma 1, one can prove the following two lemmas.

Lemma 2. Let M range over canonical bk-models. If M, s, X⁺, and X⁻ are in a DL with tractable subsumption/instance checking, and the number of distinct variables in MR is bounded by a constant, then checking whether s ∈ Cn_M(X⁺, X⁻) is:
1. in P if MR is Horn and PAX = PAX₁;
2. coNP-complete if either MR is not Horn or PAX = PAX₀.

Lemma 3. Let M be a canonical bk-model. If M, s, X⁺, and X⁻ are in a DL with tractable entailment problems, and there is no bound on the number of variables in the metarules of MR, then checking s ∈ Cn_M(X⁺, X⁻) is:
1. in P^NP if MR is Horn and PAX = PAX₁;
2. in Π₂ᵖ if either MR is not Horn or PAX = PAX₀.

The value of censor(X⁺, X⁻, α) can be computed straightforwardly by iterating the tests s ∈ Cn_M(X⁺ ∪ {α}, X⁻) and s ∈ Cn_M(X⁺, X⁻ ∪ {α}) for all secrets s ∈ S. Since the set of secrets is part of the parameter M of the filtering function, the number of iterations is polynomial in the input, and the complexity of the censor is dominated by the complexity of Cn_M(·). The latter is determined by Lemmas 2 and 3, so we immediately get:

Corollary 1. Let M be a canonical bk-model and suppose that M, X⁺, X⁻, and α are in a DL with tractable entailment problems. If the number of distinct variables in MR is bounded by a constant, then computing censor(X⁺, X⁻, α) is:
– in P if MR is Horn and PAX = PAX₁;
– coNP-complete if either MR is not Horn or PAX = PAX₀.
If there is no bound on the number of variables in the metarules of MR, then computing censor(X⁺, X⁻, α) is:
– in P^NP if MR is Horn and PAX = PAX₁;
– in Π₂ᵖ if either MR is not Horn or PAX = PAX₀.

We are now ready to analyze the complexity of filtering functions.

Theorem 3. If M is a canonical bk-model in a DL with tractable entailment problems, then computing f_M(KB) is:
1. P-complete if the number of distinct variables in the rules of MR is bounded, MR is Horn, and PAX = PAX₁;
2. P^NP-complete if the number of distinct variables in MR is bounded, and either MR is not Horn or PAX = PAX₀;
3. in P^NP if the variables in MR are unbounded, MR is Horn, and PAX = PAX₁;
4. in Δ₃ᵖ if MR is not restricted and PAX ∈ {PAX₀, PAX₁}.

Theorem 4. Computing f_M(KB) over canonical M in a DL with ExpTime entailment (e.g., ALCQO, ALCIO, ALCQI, SHOQ, SHIO, SHIQ) is still in ExpTime.

Theorem 5. Computing f_M(KB) over canonical M in SROIQ(D) is in coNP^N2ExpTime.

8 Relationships with the SCM

Here we show that the meta-secure framework is a natural generalization of the SCM. The main result demonstrates, roughly speaking, that the SCM can essentially be regarded as a special case of our framework where PKB ⊇ ℘(KB) and BK = ∅. In this case f_M is secure even if M is not assumed to be canonical.

Theorem 6. Let M = ⟨KB, U, f_M, ⟨S_u, PKB_u, BK_u⟩_{u∈U}⟩. If PKB = ℘(KB), BK = ∅, and KB is finite, then:
1. Cn_M(KB⁺, KB⁻) = ⋃_{i<ν} Cn_M(KBᵢ⁺, KBᵢ⁻).
2. For all enumerations of PAX, the corresponding f_M(KB, u) is logically equivalent to a maximal secure view KB_u of KB according to the SCM; conversely, for every maximal secure view KB_u of KB (according to the SCM) there exists an enumeration of PAX such that the resulting f_M(KB, u) is logically equivalent to KB_u.
3. f_M is secure w.r.t. M and w.r.t. any M′ = ⟨KB, U, f_M, ⟨S_u, PKB′_u, BK′_u⟩_{u∈U}⟩ such that PKB′ ⊇ ℘(KB) and BK′ = ∅.
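Item 2 can be read operationally from the SCM side: maximal secure views are exactly the results of a greedy pass over some enumeration of KB. A sketch of that greedy pass (ours; `entails_secret(view)` abstracts the reasoner check Cn(view) ∩ S ≠ ∅):

```python
# Our reading of the SCM side of Theorem 6(2): greedily keep every axiom
# that does not, together with what was already kept, entail a secret.

def greedy_maximal_secure_view(kb_enumeration, entails_secret):
    view = []
    for a in kb_enumeration:
        if not entails_secret(view + [a]):
            view.append(a)
    return view

# Different enumerations yield the different maximal secure views that, by
# Theorem 6(2), coincide (up to logical equivalence) with f_M(KB, u) when
# PKB = powerset(KB) and BK = ∅.
```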
Theorem 6 applies to every canonical M such that MR = BK = ∅, because MR = ∅ implies that PAX₀ = PAX₁ = KB and hence PKB = ℘(KB). This shows that the SCM can be regarded as a special case of our framework where the user has no background knowledge. Moreover, by this correspondence, one immediately obtains complexity bounds for the SCM from those for PAX₁ and Horn, bounded-variable MR.

9 Framework implementation

In this section we introduce a prototypical implementation of the framework based on PAX₁ and Horn metarules. Nowadays ontologies are managed with the help of the OWL API (http://owlapi.sourceforge.net/) and DL reasoners, which allow us to take full advantage of their rich underlying semantics. Unfortunately, the publicly available OWL reasoners do not offer native support for the conjunctive query answering required to process users' metaknowledge; a partial exception to this rule is the Pellet reasoner, discussed later. A straightforward evaluation of metarules in the presence of metavariables with an OWL reasoner would need to consider all possible ways of uniformly replacing metavariables by individual constants occurring in the ontology. On the other hand, the evaluation of a ground rule r with an OWL reasoner would, in the worst case, require checking that all the axioms α₁, ..., αₙ ∈ body(r) and β₁, ..., βₘ ∈ head(r) are entailed by KB. Summing up, with this method, as the ontology ABox grows, metarule evaluation can easily become unmanageable in terms of execution time. Consequently, technologies that permit native conjunctive query evaluation are fundamental to an efficient implementation of the framework. SPARQL (http://www.w3.org/TR/sparql11-overview/), the W3C standard that provides languages and protocols to query and manipulate RDF content (and so ontologies encoded in the RDF/XML syntax), constitutes a de facto standard when it comes to conjunctive query answering. It has recently been extended with the so-called entailment regimes, which define how queries are evaluated under semantics more expressive than the SPARQL standard's simple entailment, which is based essentially on pattern matching on graphs. Unfortunately, most of the available engines do not provide support for OWL reasoning. To the best of our knowledge, only the Apache Jena Semantic Web Toolkit and Pellet support OWL inference over the queried ontologies; moreover, the Pellet query engine seems not to have been reengineered for the last few years. It was therefore an obvious option to prefer the Jena query engine for our system. The Jena inference subsystem is designed to allow usage of a number of predefined Jena OWL reasoners, as well as external reasoners (to be plugged into Jena, a reasoner must expose the Jena API); however, the usage of the internal reasoners is recommended for efficiency reasons. Note that the OWL, OWL Mini, and OWL Micro Jena reasoners are useful but not full-fledged rule-based implementations of the OWL/Lite subset of the OWL/Full language: critical unsupported constructs which go beyond OWL/Lite are complementOf and oneOf, while the support for unionOf is partial. We consider our choice of a Jena OWL reasoner a good compromise between expressivity and efficiency.

According to the theoretical framework, the system consists of two modules. The first one parses the user's metaknowledge, represented by means of a set of metarules. The second module is in charge of the secure ontology view construction.
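To illustrate the role of SPARQL here, the following sketch (ours; the URI scheme and helper names are invented, and rdflib is used only as a stand-in for the Jena-based engine of the actual prototype, which additionally needs an OWL entailment regime) translates a metarule body into a SELECT query retrieving all candidate bindings of the metavariables:

```python
# Our sketch of compiling a metarule body into a SPARQL SELECT query.
# rdflib applies only simple entailment, so this is purely illustrative.
from rdflib import Graph

PREFIX = "PREFIX : <http://example.org/company#>\n"

def body_to_sparql(body_atoms):
    """body_atoms: ('class', C, var) or ('role', R, var, const) tuples."""
    patterns = []
    for atom in body_atoms:
        if atom[0] == "class":
            patterns.append(f"?{atom[2]} a :{atom[1]} .")
        else:
            patterns.append(f"?{atom[2]} :{atom[1]} :{atom[3]} .")
    variables = sorted({atom[2] for atom in body_atoms})
    return (PREFIX + "SELECT " + " ".join(f"?{v}" for v in variables)
            + " WHERE { " + " ".join(patterns) + " }")

# Body of metarule (7), Employee(X), becomes one triple pattern:
query = body_to_sparql([("class", "Employee", "X")])

g = Graph()
g.parse("company.owl", format="xml")   # hypothetical RDF/XML ontology file
for row in g.query(query):             # each row: a candidate binding for X
    print(row[0])
```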
Algorithm 1 provides an abstract view of the implementation of our framework. It takes as input an ontology KB, a set of secrets S, a set of metarules MR, and the user's background knowledge BK. The output is the set of axioms that constitute a secure ontology view for the user. The sets M_M and M_G form a partition of MR according to rule type (containing metavariables, or ground, respectively).

Remark 2. By standard logic programming techniques, a minimal PKB ⊆ PAX₁ satisfying the set of metarules and the constraints K⁺ can be obtained with the following PTIME construction:

    PKB₀ = K⁺,
    PKBᵢ₊₁ = PKBᵢ ∪ ⋃ { head(r) | r ∈ ground_{PKBᵢ}(MR) ∧ body(r) ⊆ Cn(PKBᵢ) }.

The sequence's limit PKB_{|PAX₁|} also satisfies ⟨K⁺, K⁻⟩, provided that Cn(PKB_{|PAX₁|}) ∩ K⁻ = ∅. Then, for all s ∈ S, s ∈ Cn_M(K⁺, K⁻) holds iff s ∈ Cn(PKB_{|PAX₁|} ∪ BK). For more details refer to [7].

Algorithm 1:
    Data: KB, S, MR, BK.
 1  Kᵢ⁺, Kᵢ⁻ ← ∅;
 2  M_M ← {rᵢ | rᵢ ∈ MR and rᵢ is a metarule containing metavariables};
 3  M_G ← {rᵢ | rᵢ ∈ MR and rᵢ is a ground metarule};
 4  PAX₁ ← {α ∈ KB ∪ ⋃_{r ∈ ground_KB(MR)} head(r)};
 5  PKB ← ∅;
 6  forall α ∈ PAX₁ do
 7      PKB′ ← PKB ∪ {α};
 8      M_G′ ← M_G;
 9      repeat
10          forall m ∈ M_G′ do
11              if PKB′ ⊨ body(m) then
12                  PKB′ ← PKB′ ∪ {head(m)};
13                  M_G′ ← M_G′ \ {m};
14          forall m ∈ M_M do
15              forall (a₀, ..., aₙ) such that PKB′ ⊨ body(m, [X₀/a₀, ..., Xₙ/aₙ]) do
16                  PKB′ ← PKB′ ∪ {head(m, [X₀/a₀, ..., Xₙ/aₙ])};
17      until no element is added to PKB′;
18      if {β ∈ Kᵢ⁻ | PKB′ ⊨ β} = ∅ then
19          if {s ∈ S | PKB′ ∪ BK ⊨ s} = ∅ then
20              if KB ⊨ α then
21                  Kᵢ⁺ ← Kᵢ⁺ ∪ {α};
22                  PKB ← PKB′;
23                  M_G ← M_G′;
24          else
25              Kᵢ⁻ ← Kᵢ⁻ ∪ {α};
26  return Kᵢ⁺

By iterating over the axioms α ∈ PAX₁ (lines 6–25), PKB collects at each step all parts of PAX₁ that can be revealed to the user. The repeat-until loop (lines 9–17) computes the deductive closure PKB′ of PKB under MR (the result of Proposition 2 guarantees that considering only the minimal PKB is sound). In particular, for every ground metarule (lines 10–13) we execute a SPARQL ASK query (hidden in line 11) to verify whether its body is entailed by the current PKB′. For every metarule containing metavariables (lines 14–16) we execute a SPARQL SELECT query (encoded in line 15) in order to obtain all possible bindings of the metavariables that satisfy the metarule's body. The pair of steps described above is iterated until a fixpoint is reached, i.e., no elements are added to PKB′ (line 17). At this point the condition Cn(PKB′) ∩ Kᵢ⁻ = ∅ is checked (line 18). We are now ready to determine the value of the censor function for α: we verify that no secret is entailed by the minimal PKB (line 19), taking into consideration the background knowledge (this corresponds to checking whether s ∈ Cn_M(K⁺ ∪ {α}, K⁻) only; the condition s ∈ Cn_M(K⁺, K⁻ ∪ {α}) is in fact embedded in line 18). In case α is entailed by KB, it is safe to include it in the view (line 21); otherwise, the set Kᵢ⁻ is updated (line 25). Note that we need an OWL reasoner in order to perform the entailment checks in lines 18–20. We make use of the incremental reasoner Pellet, which, for each αᵢ in the enumeration of PAX₁, is expected to restrict reasoning to the new inferences triggered by αᵢ, without repeating the inferences that involve only Kᵢ₋₁⁺.
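Remark 2's construction is essentially the immediate-consequence fixpoint of logic programming. A compact sketch (ours, propositional for readability; `entails(K, a)` abstracts the reasoner calls of lines 11 and 15):

```python
# Our sketch of Remark 2: the minimal PKB containing K+ and closed under
# the (Horn, ground) metarules, as a least fixpoint reached in at most
# |PAX1| rounds.

def minimal_pkb(k_plus, ground_horn_rules, entails):
    """ground_horn_rules: iterable of (body, head_axiom) pairs."""
    pkb = set(k_plus)
    changed = True
    while changed:
        changed = False
        for body, head in ground_horn_rules:
            if head not in pkb and all(entails(pkb, a) for a in body):
                pkb.add(head)            # fire the rule, monotonically
                changed = True
    return pkb
```

Lines 18–19 of Algorithm 1 then amount to checking that Cn(pkb) misses K⁻ and that pkb ∪ BK entails no secret.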
A first optimization regards the evaluation of the set of ground rules M_G: during the construction of PKB, due to the monotonicity of reasoning, at each iteration we can safely remove from M_G all the ground rules already satisfied at the previous iterations (lines 13 and 23). Another optimization concerns the evaluation order of PAX₁. Checking whether Cn(PKB_{|PAX₁|}) ∩ K⁻ = ∅ (line 18) is time consuming, so we adopt an approach that keeps the set Kᵢ⁻ as small as possible. This is achieved by processing all {α ∈ PAX₁ | α ∈ KB} first in line 6: for such α the test in line 20 always succeeds, so Kᵢ⁻ remains empty throughout this phase and the condition in line 18 is vacuously satisfied.

Experimental analysis shows that the generation of secure views for medium-sized ontologies may take several minutes. We plan to investigate module extraction techniques, which are expected to improve the execution time drastically by restricting the part of the knowledge base on which metarules apply.

10 Related work

Baader et al. [2], Eldora et al. [11], and Knechtel and Stuckenschmidt [13] attach security labels to axioms and users to determine which subset of the KB can be used by each subject. These works are instances of the SCM, so they are potentially vulnerable to the attacks based on background knowledge; this holds in particular for [13], which pursues the construction of maximal secure views. Moreover, in [2, 11] axiom labels are not derived from the set of secrets: knowledge engineers are responsible for checking ex post that no confidential knowledge is entailed, and in case of leakage the view can be modified with a revision tool based on pinpointing. On the contrary, our mechanism automatically selects which axioms shall be hidden in order to produce a secure view.

Chen and Stuckenschmidt [8] adopt an instance of the SACM based on removing some individuals entirely. In general, this may be secure against metaknowledge attacks (cf. Ex. 5); however, no methodology is provided for selecting the individuals to be removed given a target set of secrets. In [3], KB is partitioned into a visible part KB_v and a hidden part KB_h; conceptually, this is analogous to axiom labelling, cf. the above approaches. Their confidentiality methodology seems to work only under the assumption that the signatures of KB_v and KB_h are disjoint, because in strong safety they do not consider the formulae that are implied by a combination of KB_v and KB_h. Surely the axioms of KB_h whose signature is included in the signature of KB_v cannot be protected, in general. A partition-based approach is taken in [10], too; it is not discussed how to select the hidden part KB_h given a set of target secrets (which includes the issue of deciding secondary protection). Similarly, in [14] only ex-post confidentiality verification methods are provided. In their model, the equivalent of PKB is the set of all knowledge bases that include a given set of publicly known axioms S ⊆ KB; consequently, in some cases their verification method is vulnerable to the attacks to complete knowledge, which are based on more complex (conditional) metaknowledge (cf. Example 2 and Example 5) that cannot be encoded in their framework.
Cuenca Grau and Horrocks [9] investigate knowledge confidentiality from a probabilistic perspective: enlarging the public view should not change the probability distribution over the possible answers to a "sensitive query" Q that represents the set of secrets. In [9] users can query the knowledge base only through a pre-defined set of views (we place no such restriction). A probability distribution P over the set of knowledge bases plays a role similar to metaknowledge. However, their confidentiality condition allows P to be replaced with a different P′ after enlarging the public view, so at a closer look P does not really model the user's a priori knowledge about the knowledge base (which should remain constant), differently from our PKB.

Our method is inspired by the literature on controlled query evaluation (CQE) based on lies and/or refusals ([4, 5, 6], etc.). Technically we use lies, because rejected queries are not explicitly marked (the cited papers use the special answer "mum"). However, our censor resembles the classical refusal censor, so the properties of f_M are not subsumed by any of the classical CQE methods. For example (unlike the CQE approaches that use lies), f_M(KB, u) encodes only correct knowledge (i.e., knowledge entailed by KB), and it is secure whenever users do not initially know any secret (while lies-based CQE further requires that no disjunction of secrets be known a priori). Unlike the refusal method, f_M can handle cover stories, because users are not told that some queries are obfuscated; as an additional advantage, our method need not adapt existing engines to handle nonstandard answers like mum. Finally, the CQE approaches do not deal specifically with DL knowledge bases, metaknowledge, and the related complexity analysis.

11 Conclusions

The confidentiality preservation methods that do not consider background knowledge are vulnerable to several attacks. We identified two vulnerabilities (attacks to complete knowledge and to the signature) and introduced a knowledge base confidentiality model that can detect these vulnerabilities, based on a fully generic formalization of object- and meta-level background knowledge. Confidentiality is enforced through a generic mechanism for constructing secure views (the filtering f_M) that is provably secure w.r.t. the meta-confidentiality model under a continuity assumption, and that generalizes a few previous approaches (cf. Thm. 6 and Ex. 5). In order to compute secure views in practice, we introduced a safe, generic method for approximating background knowledge, together with a specific rule-based language for expressing metaknowledge. Based on this instantiation of the general framework, where f_M is always secure, we analyzed the computational complexity of computing secure views. If the underlying DL is tractable, then in the simplest case f_M can be computed in polynomial time. The number of variables in metarules and the adoption of a more secure approximation (PAX₀) may increase complexity up to P^NP = Δ₂ᵖ and perhaps Δ₃ᵖ. The complexity of non-Horn metarules, however, can be avoided by replacing each non-Horn rule r with one of its Horn strengthenings body(r) ⇒ α, where α ∈ head(r). This approximation is safe (because it restricts PKB), and it opens the way to a systematic use of the low-complexity bk-models based on PAX₁ and Horn metarules.
For the many ExpTime-complete DLs, secure view computation does not increase asymptotic complexity. So far, the best upper complexity bound for computing secure views in the description logic underlying OWL DL (i.e., SROIQ(D)) is coNP^N2ExpTime.

Finally, we have provided a prototype implementation of the low-complexity frameworks based on PAX₁ and Horn metarules, using the incremental engine versions available for Pellet and ELK to avoid repeated classifications in the iterative construction of f_M. Metarule bodies are evaluated with SPARQL. Secure views are constructed off-line, so no overhead is placed on user queries, and the approach is expected to be applicable in practice.

Bibliography

[1] F. Baader, D. Calvanese, D. L. McGuinness, D. Nardi, and P. F. Patel-Schneider, editors. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.
[2] F. Baader, M. Knechtel, and R. Peñaloza. A generic approach for large-scale ontological reasoning in the presence of access restrictions to the ontology's axioms. In International Semantic Web Conference, pages 49–64, 2009.
[3] J. Bao, G. Slutzki, and V. Honavar. Privacy-preserving reasoning on the semantic web. In Web Intelligence, pages 791–797. IEEE Computer Society, 2007.
[4] J. Biskup and P. A. Bonatti. Lying versus refusal for known potential secrets. Data Knowl. Eng., 38(2):199–222, 2001.
[5] J. Biskup and P. A. Bonatti. Controlled query evaluation for enforcing confidentiality in complete information systems. Int. J. Inf. Sec., 3(1):14–27, 2004.
[6] J. Biskup and P. A. Bonatti. Controlled query evaluation for known policies by combining lying and refusal. Ann. Math. Artif. Intell., 40(1-2):37–62, 2004.
[7] P. A. Bonatti and L. Sauro. A confidentiality model for ontologies. In International Semantic Web Conference (1), pages 17–32, 2013.
[8] W. Chen and H. Stuckenschmidt. A model-driven approach to enable access control for ontologies. In H. R. Hansen et al., editors, Wirtschaftsinformatik, volume 246 of books@ocg.at, pages 663–672. Österreichische Computer Gesellschaft, 2009.
[9] B. Cuenca Grau and I. Horrocks. Privacy-preserving query answering in logic-based information systems. In M. Ghallab, C. D. Spyropoulos, N. Fakotakis, and N. M. Avouris, editors, ECAI, volume 178 of Frontiers in Artificial Intelligence and Applications, pages 40–44. IOS Press, 2008.
[10] B. Cuenca Grau and B. Motik. Importing ontologies with hidden content. In B. Cuenca Grau et al., editors, Description Logics, volume 477 of CEUR Workshop Proceedings. CEUR-WS.org, 2009.
[11] Eldora, M. Knechtel, and R. Peñaloza. Correcting access restrictions to a consequence more flexibly. In R. Rosati, S. Rudolph, and M. Zakharyaschev, editors, Description Logics, volume 745 of CEUR Workshop Proceedings. CEUR-WS.org, 2011.
[12] P. Hitzler and T. Lukasiewicz, editors. Web Reasoning and Rule Systems - 4th Int. Conference, RR 2010, volume 6333 of Lecture Notes in Computer Science. Springer, 2010.
[13] M. Knechtel and H. Stuckenschmidt. Query-based access control for ontologies. In Hitzler and Lukasiewicz [12], pages 73–87.
[14] P. Stouppa and T. Studer. Data privacy for knowledge bases. In S. N. Artëmov and A. Nerode, editors, LFCS, volume 5407 of LNCS, pages 409–421. Springer, 2009.
[15] J. Tao, G. Slutzki, and V. Honavar. Secrecy-preserving query answering for instance checking in EL. In Hitzler and Lukasiewicz [12], pages 195–203.