Privacy Preservation Using Multi-Context Systems*

Wolfgang Faber
University of Calabria, Italy
wf@wfaber.com

Abstract. Preserving the privacy of sensitive data is one of the major challenges which the information society has to face. Traditional approaches focused on infrastructure for identifying data which is to be kept private and for managing access rights to these data. However, while these efforts are useful, they do not address an important aspect: while the sensitive data itself can be protected nicely using these mechanisms, related data, which is deemed insensitive per se, may be used to infer sensitive data. This can happen by combining insensitive data or by exploiting specific background knowledge of the domain of discourse. In this note, we show that this problem can be resolved in a simple and elegant way by using multi-context systems.

1 Introduction

The privacy of individuals has become one of the most important and most discussed issues in modern society. With the advent of the Internet and easy access to large amounts of data, keeping sensitive data private has become a priority for distributed information systems. An example area in which privacy is at stake is medical information systems. Most databases have comparatively simple privacy mechanisms: by and large, they boil down to keeping certain columns of the database hidden from certain types of users. There is a huge body of literature that deals with formalisms for this kind of authorization problem, which we cannot discuss in detail in this short note; as an example, see [6] for a work that discusses aspects of the authorization problem in nonmonotonic knowledge bases. In this short paper we are interested in a somewhat different issue, namely that users can infer information that is designated private by asking queries that do not involve private information and then making "common sense" inferences from the answers.
In an earlier paper [4], we have given a formal definition of the Privacy Preservation Problem and shown how it can be addressed using default logic (we also refer to that paper for a discussion of related work). In that paper, however, there were several restrictions on the knowledge bases that can be used: effectively, they had to be first-order theories, because only then is it easily possible to build a default theory around them. In order to lift this restriction, in this note we propose using multi-context systems as defined by Brewka and Eiter in [3] instead of default logic. By switching to that formalism, it is possible to use heterogeneous knowledge bases to which users may have access or which model user knowledge. The unifying framework is then given by contexts and bridge rules that link contexts, in place of the default rules of [4]. Apart from the greater flexibility concerning the types of "participating" knowledge bases, another advantage is that efficient systems for reasoning with multi-context systems are beginning to emerge [1].

* This work was supported by M.I.U.R. within the PRIN project LoDeN.

In the following, we first provide an adapted definition of the privacy preservation problem in Section 2. This definition differs slightly from the one of [4] in order to allow for more heterogeneous knowledge bases to be involved. In Section 3 we then show how to construct a multi-context system for computing answers to a privacy preservation problem. In Section 4 we conclude and outline future work.

2 Privacy Preservation Problem

In this section, we provide a simple formulation of the privacy preservation problem (P3 for short), which is a generalization of the definition in [4]. For simplicity, we will not consider any evolution of the systems over time, as was done in [4].
We consider a logic L as in [3] to be a triple (KB_L, BS_L, ACC_L), where KB_L is the set of well-formed knowledge bases of L (each of which is a set as well), BS_L is the set of possible belief sets, and ACC_L : KB_L → 2^{BS_L} is a function that describes the semantics of each knowledge base. In the following, when mentioning knowledge bases, we will usually not specify the underlying logic, intending that it can be any logic in the sense just described. Let the finite set U contain one user ID for each user in the system under consideration. Moreover, we consider the main knowledge base MKB which the users will be querying. Furthermore, the function BK associates each user u ∈ U with a background knowledge base BK(u), while the function Priv associates each user u ∈ U with a belief set Priv(u) that should be kept private. Note that the various knowledge bases need not be of the same logic, but for practical reasons one would assume the belief sets to be homogeneous. It should be pointed out that BK(u) is not necessarily the user's own knowledge base, but rather a model of the user's knowledge, maintained by the information system.

Example 1. Consider a small medical knowledge base MedKB containing information about the symptoms and diseases of some patients. Let this knowledge base describe two predicates symptom and disease, and let the following be its only belief set S_MedKB:

symptom(john, s1)   symptom(jane, s1)   disease(jane, aids)
symptom(john, s2)   symptom(jane, s4)   disease(john, cancer)
symptom(john, s3)                       disease(ed, polio)

Note that MedKB could very well be just a database. Assume that john and jane are also users of the system and want to keep their diseases private, so Priv(john) = {disease(john, cancer)}, while Priv(jane) = {disease(jane, aids)}. Consider another user acct (an accountant). This person may have the following background knowledge base BK(acct) in the form of rules (so the underlying logic might be answer set programming).
disease(X, aids) ← symptom(X, s1), symptom(X, s4)
disease(X, cancer) ← symptom(X, s2), symptom(X, s3)

ISSN 1613-0073 © 2011 for the individual papers by the papers' authors.

Let a query be a construct to which, for every semantics of a knowledge base, a belief set is associated, which is referred to as the answer Ans(Q) to Q. A privacy preserving answer to a query Q over MKB posed by u_0 ∈ U with respect to BK and Priv is a set X ⊆ Ans(Q) such that for all u ∈ U \ {u_0} and for all p ∈ Priv(u), if p ∉ ACC(BK(u_0)) then p ∉ ACC(X ∪ BK(u_0)). A maximal privacy preserving answer is a subset-maximal privacy preserving answer. Note that here we assume that elements of belief sets can be added to knowledge bases, yielding again a knowledge base of the respective logic. A privacy preservation problem P3 is therefore a tuple (MKB, U, BK, Priv, Q, u_0), and solving it amounts to finding the (maximal) privacy preserving answers to Q posed by u_0 over MKB with respect to BK and Priv.

Example 2. Returning to our MedKB example, posing the query disease(john, X), we would get as an answer the set {disease(john, cancer)}. Likewise, the answer to the query symptom(john, X) is the set {symptom(john, s1), symptom(john, s2), symptom(john, s3)}. We assumed that John and Jane want their diseases kept private. However, the accountant can violate John's privacy by asking the query symptom(john, X). The answer that acct would get from the system is {symptom(john, s1), symptom(john, s2), symptom(john, s3)}. However, recall that the accountant has some background knowledge, including the rule

disease(X, cancer) ← symptom(X, s2), symptom(X, s3)

which, together with the answer to the query, would allow acct to infer disease(john, cancer).
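This definition can be checked mechanically once the relevant semantics is computable. The following sketch (our own illustration in Python, which the paper does not prescribe; the helper closure is a naive forward-chaining evaluator over ground rule instances) tests whether a candidate X ⊆ Ans(Q) is privacy preserving for acct in Example 2:

```python
# Naive forward chaining: rules are pairs (head, body) of ground atoms
# (encoded as tuples), with body a frozenset of atoms.
def closure(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

# Ground instances (for patient john) of the two rules in BK(acct).
BK_acct = [
    (("disease", "john", "aids"),
     frozenset({("symptom", "john", "s1"), ("symptom", "john", "s4")})),
    (("disease", "john", "cancer"),
     frozenset({("symptom", "john", "s2"), ("symptom", "john", "s3")})),
]

def privacy_preserving(X, bk_rules, private_atoms):
    """X preserves privacy iff no private atom becomes derivable from X
    together with the user's background knowledge that was not already
    derivable from the background knowledge alone."""
    before = closure(set(), bk_rules)
    after = closure(set(X), bk_rules)
    return all(p in before or p not in after for p in private_atoms)
```

Under these assumptions, the full answer {symptom(john, s1), symptom(john, s2), symptom(john, s3)} is rejected, since it makes disease(john, cancer) derivable, while for instance {symptom(john, s1), symptom(john, s2)} is accepted.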
Thus the privacy preserving answers to symptom(john, X) are

Ans1 = {symptom(john, s1), symptom(john, s2)}
Ans2 = {symptom(john, s1), symptom(john, s3)}
Ans3 = {symptom(john, s1)}
Ans4 = {symptom(john, s2)}
Ans5 = {symptom(john, s3)}
Ans6 = ∅

None of these answers allows acct to infer the private knowledge disease(john, cancer). However, except for the answers Ans1 and Ans2, which are maximal, all answers disclose less information than could be disclosed without infringing the privacy requirements. A system should also provide only one of these answers to the user, because getting, for instance, both Ans1 and Ans2 would again violate John's privacy requirements. In a practical system, upon disclosing an answer the system should update the respective user's knowledge model in order to avoid privacy infringements by repeated querying. For example, when the system returns Ans1 to user acct, it should modify BK(acct) to reflect the fact that acct now knows symptom(john, s1) and symptom(john, s2), so that when the same query is asked again, it is ensured that symptom(john, s3) will not be disclosed to acct. This, however, is part of the dynamic aspect of a privacy preserving information system, which we will not address in this paper.

3 Solving Privacy Preservation Problems Using Multi-Context Systems

The definitions in Section 2 were already slightly geared towards multi-context systems. We recall that a multi-context system in the sense of [3] is a tuple (C_1, ..., C_n) where for each i, C_i = (L_i, kb_i, br_i), where L_i is a logic, kb_i is a knowledge base of L_i, and br_i is a set of L_i bridge rules over {L_1, ..., L_n}. An L_i bridge rule over {L_1, ..., L_n} is a construct

s ← (r_1 : p_1), ..., (r_j : p_j), not (r_{j+1} : p_{j+1}), ..., not (r_m : p_m)

where 1 ≤ r_k ≤ n, p_k is an element of a belief set for L_{r_k}, and for each kb ∈ KB_i, kb ∪ {s} ∈ KB_i.
The semantics of a multi-context system is defined by means of equilibria. A belief state for a multi-context system (C_1, ..., C_n) is S = (S_1, ..., S_n), where S_i ∈ BS_i for 1 ≤ i ≤ n. An L_i bridge rule of the form above is applicable in S iff p_k ∈ S_{r_k} holds for 1 ≤ k ≤ j and p_k ∉ S_{r_k} holds for j < k ≤ m. Let app(br, S) denote the set of all bridge rules in br which are applicable in a belief state S. A belief state S = (S_1, ..., S_n) is an equilibrium of a multi-context system (C_1, ..., C_n) iff for all 1 ≤ i ≤ n, S_i ∈ ACC_i(kb_i ∪ {hd(r) | r ∈ app(br_i, S)}), where hd(r) is the head of a bridge rule r, viz. s in the bridge rule schema given earlier.

Given a P3 (MKB, U, BK, Priv, Q, u), with U = {u_1, ..., u_|U|}, in order to identify privacy preserving answers we build a multi-context system M_P3 = (C_1, C_2, C_3, C_4, ..., C_{|U|+3}), where C_1 = (L_MKB, MKB, ∅), C_2 = (L_MKB, ∅, br_2), C_3 = (L_MKB, ∅, br_3), C_4 = (L_BK(u_1), BK(u_1), br_4), ..., C_{|U|+3} = (L_BK(u_|U|), BK(u_|U|), br_{|U|+3}). Here L_kb denotes the logic of the knowledge base kb. The meaning is that C_1 provides just the belief sets for MKB (no bridge rules), C_2 and C_3 are used to identify those belief sets which are privacy preserving, while C_4, ..., C_{|U|+3} represent the user information, that is, the background knowledge base of the querying user and the privacy requirements of the other users. The important part is the bridge rules, which we describe next. In many cases, we create one rule for each symbol that can occur in some belief set of Ans(Q), so for convenience let D = {p | p ∈ B, B ∈ Ans(Q)}. The set br_2 contains one bridge rule

p ← (1 : p), not (3 : p)

for each p ∈ D. Symmetrically, br_3 contains one bridge rule

p ← (1 : p), not (2 : p)

for each p ∈ D. The intuition is that in any equilibrium the belief sets of C_2 will be subsets of the belief set of C_1, and hence possible privacy preserving answers. C_3 exists only for technical reasons.
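The notions of applicability and equilibrium are straightforward to operationalize. In the sketch below (our own encoding, not prescribed by the paper), a bridge rule is a triple of a head and two lists of (context, atom) pairs, a belief state is a tuple of sets, and each context supplies its ACC as a function:

```python
# A bridge rule is (head, pos, neg), where pos and neg are lists of
# (context_index, atom) pairs; context indices are 1-based as in the text.
def applicable(rule, S):
    head, pos, neg = rule
    return (all(p in S[r - 1] for r, p in pos) and
            all(p not in S[r - 1] for r, p in neg))

def app(br, S):
    """All bridge rules of br applicable in belief state S."""
    return [rule for rule in br if applicable(rule, S)]

def is_equilibrium(S, contexts):
    """contexts is a list of (kb, br, acc) triples; S is an equilibrium
    iff each S_i is an acceptable belief set of kb_i extended with the
    heads of its applicable bridge rules."""
    return all(
        set(S[i]) in [set(b) for b in acc(set(kb) | {r[0] for r in app(br, S)})]
        for i, (kb, br, acc) in enumerate(contexts)
    )
```

For instance, with a context C_1 holding the single fact a and a context C_2 importing it via the bridge rule a ← (1 : a), the belief state ({a}, {a}) is an equilibrium, while ({a}, ∅) is not.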
For i such that u_{i-3} = u, thus for the context C_i of the querying user, we add one bridge rule

p ← (2 : p)

for each p ∈ D. This means that in any equilibrium the belief set for C_i will contain all consequences of the privacy preserving answer with respect to u's background knowledge base. For each i with 4 ≤ i ≤ |U|+3 such that u_{i-3} ≠ u, thus for the contexts representing non-querying users, br_i contains one bridge rule

p ← (j : p), not (i : p)

for u_j = u and each p ∈ Priv(u_{i-3}). The idea is that no belief state in which the querying user derives information that u_{i-3} wants to keep private can be an equilibrium.

Proposition 1. Given a P3 (MKB, U, BK, Priv, Q, u), each equilibrium belief state (S_1, S_2, S_3, S_4, ..., S_{|U|+3}) of M_P3 is such that S_2 is a privacy preserving answer to the P3. Also, each privacy preserving answer S to the P3 is the second component of an equilibrium of M_P3.

Example 3. In the example examined above, consider the P3 (MedKB, {john, jane, acct}, BK, Priv, symptom(john, X), acct). Note that we did not define background knowledge bases for users john and jane; their nature is not important for the example, just assume that they exist. We also have not defined any privacy statement for acct, but this, too, is not important for our example, and we will assume that it is empty, that is, acct does not require anything to be kept private.
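The rule schemata of the construction are purely mechanical and can be generated from the set D. A sketch (again our own; bridge rules are encoded as (head, pos, neg) triples with (context, atom) pairs, a representational choice not made by the paper):

```python
# Generate the bridge rules of the construction from the set D of atoms
# that may occur in some belief set of Ans(Q).
def make_br2_br3(D):
    # br2: p <- (1:p), not (3:p);  br3: p <- (1:p), not (2:p)
    br2 = [(p, [(1, p)], [(3, p)]) for p in sorted(D)]
    br3 = [(p, [(1, p)], [(2, p)]) for p in sorted(D)]
    return br2, br3

def make_querier_rules(D):
    # p <- (2:p), for the context C_i of the querying user
    return [(p, [(2, p)], []) for p in sorted(D)]

D = {("symptom", "john", s) for s in ("s1", "s2", "s3")}
br2, br3 = make_br2_br3(D)
```

For Example 3, D = Ans(symptom(john, X)), and these generators reproduce br_2, br_3 and the rules of br_6.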
We construct a multi-context system (C_1, C_2, C_3, C_4, C_5, C_6) where C_1 = (L_MedKB, MedKB, ∅) and C_2 = (L_MedKB, ∅, br_2), with the bridge rules br_2 being

symptom(john, s1) ← (1 : symptom(john, s1)), not (3 : symptom(john, s1))
symptom(john, s2) ← (1 : symptom(john, s2)), not (3 : symptom(john, s2))
symptom(john, s3) ← (1 : symptom(john, s3)), not (3 : symptom(john, s3))

then C_3 = (L_MedKB, ∅, br_3), with the bridge rules br_3 being

symptom(john, s1) ← (1 : symptom(john, s1)), not (2 : symptom(john, s1))
symptom(john, s2) ← (1 : symptom(john, s2)), not (2 : symptom(john, s2))
symptom(john, s3) ← (1 : symptom(john, s3)), not (2 : symptom(john, s3))

then C_4 = (L_BK(john), BK(john), br_4), with the bridge rules br_4 being

disease(john, cancer) ← (6 : disease(john, cancer)), not (4 : disease(john, cancer))

then C_5 = (L_BK(jane), BK(jane), br_5), with the bridge rules br_5 being

disease(jane, aids) ← (6 : disease(jane, aids)), not (5 : disease(jane, aids))

and finally C_6 = (L_BK(acct), BK(acct), br_6), with the bridge rules br_6 being

symptom(john, s1) ← (2 : symptom(john, s1))
symptom(john, s2) ← (2 : symptom(john, s2))
symptom(john, s3) ← (2 : symptom(john, s3))

M_P3 has six equilibria:

E_1 = (S_MedKB, Ans1, Ans(symptom(john, X)) \ Ans1, ∅, ∅, Ans1)
E_2 = (S_MedKB, Ans2, Ans(symptom(john, X)) \ Ans2, ∅, ∅, Ans2)
E_3 = (S_MedKB, Ans3, Ans(symptom(john, X)) \ Ans3, ∅, ∅, Ans3)
E_4 = (S_MedKB, Ans4, Ans(symptom(john, X)) \ Ans4, ∅, ∅, Ans4)
E_5 = (S_MedKB, Ans5, Ans(symptom(john, X)) \ Ans5, ∅, ∅, Ans5)
E_6 = (S_MedKB, Ans6, Ans(symptom(john, X)) \ Ans6, ∅, ∅, Ans6)

where S_MedKB is as in Example 1, the second belief set of each E_i is exactly the respective Ansi of Example 2, the third belief set is the complement of Ansi with respect to Ans(symptom(john, X)) = {symptom(john, s1), symptom(john, s2), symptom(john, s3)}, and the sixth belief set, that of the querying user acct, again contains Ansi via the bridge rules br_6.
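Maximal privacy preserving answers can be obtained from the equilibria by simple postprocessing: collect the second components and keep the subset-maximal ones. A sketch (our own illustration; answers are encoded as frozensets of atoms):

```python
def maximal_answers(answers):
    """Keep only the subset-maximal sets among the given answers."""
    sets = [frozenset(a) for a in answers]
    return [a for a in sets if not any(a < b for b in sets)]

# The six answers of Example 2, abbreviating symptom(john, si) as "si".
answers = [{"s1", "s2"}, {"s1", "s3"}, {"s1"}, {"s2"}, {"s3"}, set()]
```

Applied to the six answers of Example 2, only Ans1 = {s1, s2} and Ans2 = {s1, s3} survive.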
We would like to point out that in this construction the original knowledge bases are not changed; we only create contexts and bridge rules. All of the background knowledge bases could be multi-context systems themselves; for instance, if the user model for acct foresees that acct is aware of SNOMED and PEPID, then acct's background knowledge base could be a multi-context system comprising these two medical knowledge bases. In order to obtain maximal privacy preserving answers using the described construction, the simplest way is to postprocess all privacy preserving answers. More involved solutions would have to interact with the underlying multi-context system reasoner, for instance by dynamically changing the multi-context system. It is not clear to us at the moment whether it is possible to modify the construction such that the equilibria of the obtained multi-context system correspond directly to the maximal privacy preserving answers.

4 Conclusion and Future Work

We have presented a definition of the privacy preservation problem which allows for using knowledge bases of different kinds. Finding privacy preserving answers can then be accomplished by building an appropriate multi-context system and computing one of its belief states. Since systems for solving multi-context systems are beginning to emerge, for example DMCS [1], this also implies that these privacy preserving answers can be effectively computed. However, usually one is interested in maximal privacy preserving answers. It is unclear to us whether a construction similar to the one presented in this paper can be used for finding privacy preserving answers which are maximal, by just creating appropriate contexts and bridge rules and without modifying the involved knowledge bases or adding new knowledge bases of particular logics.
One possible line of investigation would be to examine work on diagnosing inconsistent multi-context systems [5, 2], since diagnosis tasks involve an implicit minimization criterion, which could be exploited for encoding maximality.

References

1. Bairakdar, S.E., Dao-Tran, M., Eiter, T., Fink, M., Krennwallner, T.: The DMCS solver for distributed nonmonotonic multi-context systems. In: Janhunen, T., Niemelä, I. (eds.) Proceedings of the 12th European Conference on Logics in Artificial Intelligence (JELIA 2010). Lecture Notes in Computer Science, vol. 6341, pp. 352–355. Springer Verlag (2010)
2. Bögl, M., Eiter, T., Fink, M., Schüller, P.: The MCS-IE system for explaining inconsistency in multi-context systems. In: Janhunen, T., Niemelä, I. (eds.) Proceedings of the 12th European Conference on Logics in Artificial Intelligence (JELIA 2010). Lecture Notes in Computer Science, vol. 6341, pp. 356–359. Springer Verlag (2010)
3. Brewka, G., Eiter, T.: Equilibria in heterogeneous nonmonotonic multi-context systems. In: Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI 2007). pp. 385–390. AAAI Press (2007)
4. Dix, J., Faber, W., Subrahmanian, V.: The relationship between reasoning about privacy and default logics. In: Sutcliffe, G., Voronkov, A. (eds.) Logic for Programming, Artificial Intelligence, and Reasoning, 12th International Conference, LPAR 2005. Lecture Notes in Computer Science, vol. 3835, pp. 637–650. Springer Verlag (Dec 2005)
5. Eiter, T., Fink, M., Schüller, P., Weinzierl, A.: Finding explanations of inconsistency in multi-context systems. In: Lin, F., Sattler, U., Truszczyński, M. (eds.) Proceedings of the Twelfth International Conference on Knowledge Representation and Reasoning (KR 2010). AAAI Press (2010)
6.
Zhao, L., Qian, J., Chang, L., Cai, G.: Using ASP for knowledge management with user authorization. Data & Knowledge Engineering 69(8), 737–762 (2010)