Privacy Preservation Using Multi-Context Systems*

Wolfgang Faber
University of Calabria, Italy
wf@wfaber.com

Abstract. Preserving the privacy of sensitive data is one of the major challenges which the information society has to face. Traditional approaches focused on infrastructure for identifying data which is to be kept private and for managing access rights to these data. However, while these efforts are useful, they do not address an important aspect: while the sensitive data itself can be protected nicely using these mechanisms, related data, which is deemed insensitive per se, may be used to infer sensitive data. This can happen by combining insensitive data or by exploiting specific background knowledge of the domain of discourse. In this note, we show that this problem can be resolved in a simple and elegant way by using multi-context systems.

1 Introduction

The privacy of individuals has become one of the most important and most discussed issues in modern society. With the advent of the Internet and easy access to large amounts of data, keeping sensitive data private has become a priority for distributed information systems. An example area in which privacy is at stake is medical information systems. Most databases have comparatively simple privacy mechanisms: by and large, they boil down to keeping certain columns of the database hidden from certain types of users. There is a huge body of literature that deals with formalisms for this kind of authorization problem, which we cannot discuss in detail in this short note; as an example, see [6] for a work that discusses aspects of the authorization problem in nonmonotonic knowledge bases. In this short paper we are interested in a somewhat different issue, namely that users can infer information that is designated private by asking queries that do not involve private information and then making "common sense" inferences from the answers.
In an earlier paper [4], we have given a formal definition of the Privacy Preservation Problem and shown how it can be addressed using default logic (we also refer to that paper for a discussion of related work). In that paper, however, there were several restrictions on the knowledge bases that can be used: effectively, they had to be first-order theories, because only then is it easily possible to build a default theory around them. In order to lift this restriction, in this note we propose using multi-context systems as defined by Brewka and Eiter in [3] instead of default logic. By switching to that formalism, it is possible to use heterogeneous knowledge bases to which users may have access or which model user knowledge. The unifying framework is then given by contexts and bridge rules that link contexts, in place of the default rules of [4]. Apart from the greater flexibility concerning the types of "participating" knowledge bases, another advantage is that efficient systems for reasoning with multi-context systems are beginning to emerge [1].

* This work was supported by M.I.U.R. within the PRIN project LoDeN.

In the following, we first provide an adapted definition of the privacy preservation problem in Section 2. This definition differs slightly from the one of [4] in order to allow for more heterogeneous knowledge bases to be involved. In Section 3 we then show how to construct a multi-context system for computing answers to a privacy preservation problem. In Section 4 we conclude and outline future work.

2 Privacy Preservation Problem

In this section, we provide a simple formulation of the privacy preservation problem (P3 for short), which is a generalization of the definition in [4]. For simplicity, we will not consider any evolution of the systems over time, as was done in [4].
We consider a logic L as in [3] to be a triple (KB_L, BS_L, ACC_L), where KB_L is the set of well-formed knowledge bases of L (each of which is a set as well), BS_L is the set of possible belief sets, and ACC_L : KB_L → 2^{BS_L} is a function that describes the semantics of each knowledge base. In the following, when mentioning knowledge bases, we will usually not specify the underlying logic, intending that it can be any logic in the sense just described. Let the finite set U contain one user ID for each user in the system under consideration. Moreover, we consider the main knowledge base MKB which the users will be querying. Furthermore, the function BK associates each user u ∈ U with a background knowledge base BK(u), while the function Priv associates each user u ∈ U with a belief set Priv(u) that should be kept private. Note that the various knowledge bases need not be of the same logic, but for practical reasons one would assume the belief sets to be homogeneous. It should be pointed out that BK(u) is not necessarily the user's own knowledge base, but rather a model of the user's knowledge, maintained by the information system.

Example 1. Consider a small medical knowledge base MedKB containing information about the symptoms and diseases of some patients. Let this knowledge base describe two predicates symptom and disease, and let the following be its only belief set S_MedKB:

symptom(john, s1)   symptom(jane, s1)   disease(jane, aids)
symptom(john, s2)   symptom(jane, s4)   disease(john, cancer)
symptom(john, s3)                       disease(ed, polio)

Note that MedKB could very well be just a database. Assume that john and jane are also users of the system and want to keep their diseases private, so Priv(john) = {disease(john, cancer)}, while Priv(jane) = {disease(jane, aids)}. Consider another user acct (an accountant). This person may have the following background knowledge base BK(acct) in the form of rules (so the underlying logic might be answer set programming).
disease(X, aids) ← symptom(X, s1), symptom(X, s4)
disease(X, cancer) ← symptom(X, s2), symptom(X, s3)

ISSN 1613-0073 © 2011 for the individual papers by the papers' authors.

Let a query be a construct to which, for every semantics of a knowledge base, a belief set is associated, which is referred to as the answer Ans(Q) to Q. A privacy preserving answer to a query Q over MKB posed by u_0 ∈ U with respect to BK and Priv is a set X ⊆ Ans(Q) such that for all u ∈ U \ {u_0} and for all p ∈ Priv(u), if p ∉ ACC(BK(u_0)) then p ∉ ACC(X ∪ BK(u_0)). A maximal privacy preserving answer is a subset-maximal privacy preserving answer. Note that here we assume that elements of belief sets can be added to knowledge bases, yielding again a knowledge base of the respective logic. A privacy preservation problem P3 is therefore a tuple (MKB, U, BK, Priv, Q, u_0), and solving it amounts to finding the (maximal) privacy preserving answers to Q posed by u_0 over MKB with respect to BK and Priv.

Example 2. Returning to our MedKB example, posing the query disease(john, X), we would get as an answer the set {disease(john, cancer)}. Likewise, the answer to the query symptom(john, X) is the set {symptom(john, s1), symptom(john, s2), symptom(john, s3)}. We assumed that John and Jane want their diseases kept private. However, the accountant can violate John's privacy by asking the query symptom(john, X). The answer that acct would get from the system is {symptom(john, s1), symptom(john, s2), symptom(john, s3)}. However, recall that the accountant has some background knowledge, including the rule

disease(X, cancer) ← symptom(X, s2), symptom(X, s3)

which, together with the answer to the query, would allow acct to infer disease(john, cancer).
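This definition can be checked mechanically once the relevant semantics is computable. The following sketch (our own illustration in Python, which the paper does not prescribe; the helper closure is a naive forward-chaining evaluator over ground rule instances) tests whether a candidate X ⊆ Ans(Q) is privacy preserving for acct in Example 2:

```python
# Naive forward chaining: rules are pairs (head, body) of ground atoms
# (encoded as tuples), with body a frozenset of atoms.
def closure(facts, rules):
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if body <= derived and head not in derived:
                derived.add(head)
                changed = True
    return derived

# Ground instances (for patient john) of the two rules in BK(acct).
BK_acct = [
    (("disease", "john", "aids"),
     frozenset({("symptom", "john", "s1"), ("symptom", "john", "s4")})),
    (("disease", "john", "cancer"),
     frozenset({("symptom", "john", "s2"), ("symptom", "john", "s3")})),
]

def privacy_preserving(X, bk_rules, private_atoms):
    """X preserves privacy iff no private atom becomes derivable from X
    together with the user's background knowledge that was not already
    derivable from the background knowledge alone."""
    before = closure(set(), bk_rules)
    after = closure(set(X), bk_rules)
    return all(p in before or p not in after for p in private_atoms)
```

Under these assumptions, the full answer {symptom(john, s1), symptom(john, s2), symptom(john, s3)} is rejected, since it makes disease(john, cancer) derivable, while for instance {symptom(john, s1), symptom(john, s2)} is accepted.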
Thus the privacy preserving answers to symptom(john, X) are

Ans1 = {symptom(john, s1), symptom(john, s2)}
Ans2 = {symptom(john, s1), symptom(john, s3)}
Ans3 = {symptom(john, s1)}
Ans4 = {symptom(john, s2)}
Ans5 = {symptom(john, s3)}
Ans6 = ∅

None of these answers allows acct to infer the private knowledge disease(john, cancer). However, except for the answers Ans1 and Ans2, which are maximal, all answers disclose less information than could be disclosed without infringing the privacy requirements. A system should also provide only one of these answers to the user, because getting, for instance, both Ans1 and Ans2 would again violate John's privacy requirements. In a practical system, upon disclosing an answer the system should update the respective user's knowledge model in order to avoid privacy infringements by repeated querying. For example, when the system returns Ans1 to user acct, it should modify BK(acct) to reflect the fact that acct now knows symptom(john, s1) and symptom(john, s2), so that when the same query is asked again, it is ensured that symptom(john, s3) will not be disclosed to acct. This, however, is part of the dynamic aspect of a privacy preserving information system, which we will not address in this paper.

3 Solving Privacy Preservation Problems Using Multi-Context Systems

The definitions in Section 2 were already slightly geared towards multi-context systems. We recall that a multi-context system in the sense of [3] is a tuple (C_1, ..., C_n) where for each i, C_i = (L_i, kb_i, br_i), where L_i is a logic, kb_i is a knowledge base of L_i, and br_i is a set of L_i bridge rules over {L_1, ..., L_n}. An L_i bridge rule over {L_1, ..., L_n} is a construct

s ← (r_1 : p_1), ..., (r_j : p_j), not (r_{j+1} : p_{j+1}), ..., not (r_m : p_m)

where 1 ≤ r_k ≤ n, p_k is an element of a belief set for L_{r_k}, and for each kb ∈ KB_i, kb ∪ {s} ∈ KB_i.
The semantics of a multi-context system is defined by means of equilibria. A belief state for a multi-context system (C_1, ..., C_n) is S = (S_1, ..., S_n), where S_i ∈ BS_i for 1 ≤ i ≤ n. An L_i bridge rule of the form above is applicable in S iff p_k ∈ S_{r_k} holds for 1 ≤ k ≤ j and p_k ∉ S_{r_k} holds for j < k ≤ m. Let app(br, S) denote the set of all bridge rules in br which are applicable in a belief state S. A belief state S = (S_1, ..., S_n) is an equilibrium of a multi-context system (C_1, ..., C_n) iff for all 1 ≤ i ≤ n, S_i ∈ ACC_i(kb_i ∪ {hd(r) | r ∈ app(br_i, S)}), where hd(r) is the head of a bridge rule r, viz. s in the bridge rule schema given earlier.

Given a P3 (MKB, U, BK, Priv, Q, u), with U = {u_1, ..., u_|U|}, in order to identify privacy preserving answers we build a multi-context system M_P3 = (C_1, C_2, C_3, C_4, ..., C_{|U|+3}), where C_1 = (L_MKB, MKB, ∅), C_2 = (L_MKB, ∅, br_2), C_3 = (L_MKB, ∅, br_3), C_4 = (L_BK(u_1), BK(u_1), br_4), ..., C_{|U|+3} = (L_BK(u_|U|), BK(u_|U|), br_{|U|+3}). Here L_kb denotes the logic of the knowledge base kb. The meaning is that C_1 provides just the belief sets for MKB (no bridge rules), C_2 and C_3 are used to identify those belief sets which are privacy preserving, while C_4, ..., C_{|U|+3} represent the user information, that is, the background knowledge base of the querying user and the privacy requirements of the other users. The important part is the bridge rules, which we describe next. In many cases, we create one rule for each symbol that can occur in some belief set of Ans(Q), so for convenience let D = {p | p ∈ B, B ∈ Ans(Q)}. The set br_2 contains one bridge rule

p ← (1 : p), not (3 : p)

for each p ∈ D. Symmetrically, br_3 contains one bridge rule

p ← (1 : p), not (2 : p)

for each p ∈ D. The intuition is that in any equilibrium the belief sets of C_2 will be subsets of the belief set of C_1, and hence possible privacy preserving answers. C_3 exists only for technical reasons.
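The notions of applicability and equilibrium are straightforward to operationalize. In the sketch below (our own encoding, not prescribed by the paper), a bridge rule is a triple of a head and two lists of (context, atom) pairs, a belief state is a tuple of sets, and each context supplies its ACC as a function:

```python
# A bridge rule is (head, pos, neg), where pos and neg are lists of
# (context_index, atom) pairs; context indices are 1-based as in the text.
def applicable(rule, S):
    head, pos, neg = rule
    return (all(p in S[r - 1] for r, p in pos) and
            all(p not in S[r - 1] for r, p in neg))

def app(br, S):
    """All bridge rules of br applicable in belief state S."""
    return [rule for rule in br if applicable(rule, S)]

def is_equilibrium(S, contexts):
    """contexts is a list of (kb, br, acc) triples; S is an equilibrium
    iff each S_i is an acceptable belief set of kb_i extended with the
    heads of its applicable bridge rules."""
    return all(
        set(S[i]) in [set(b) for b in acc(set(kb) | {r[0] for r in app(br, S)})]
        for i, (kb, br, acc) in enumerate(contexts)
    )
```

For instance, with a context C_1 holding the single fact a and a context C_2 importing it via the bridge rule a ← (1 : a), the belief state ({a}, {a}) is an equilibrium, while ({a}, ∅) is not.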
For i such that u_{i-3} = u, thus for the context C_i of the querying user, we add one bridge rule

p ← (2 : p)

for each p ∈ D. This means that in any equilibrium the belief set for C_i will contain all consequences of the privacy preserving answer with respect to u's background knowledge base. For each i with 4 ≤ i ≤ |U|+3 such that u_{i-3} ≠ u, thus for the contexts representing non-querying users, br_i contains one bridge rule

p ← (j : p), not (i : p)

for u_j = u and each p ∈ Priv(u_{i-3}). The idea is that no belief state in which the querying user derives information that u_{i-3} wants to keep private can be an equilibrium.

Proposition 1. Given a P3 (MKB, U, BK, Priv, Q, u), each equilibrium belief state (S_1, S_2, S_3, S_4, ..., S_{|U|+3}) of M_P3 is such that S_2 is a privacy preserving answer to the P3. Also, each privacy preserving answer S to the P3 is the second component of an equilibrium of M_P3.

Example 3. In the example examined above, consider the P3 (MedKB, {john, jane, acct}, BK, Priv, symptom(john, X), acct). Note that we did not define background knowledge bases for users john and jane; their nature is not important for the example, just assume that they exist. We also have not defined any privacy statement for acct, but this, too, is not important for our example, and we will assume that it is empty, that is, acct does not require anything to be kept private.
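The rule schemata of the construction are purely mechanical and can be generated from the set D. A sketch (again our own; bridge rules are encoded as (head, pos, neg) triples with (context, atom) pairs, a representational choice not made by the paper):

```python
# Generate the bridge rules of the construction from the set D of atoms
# that may occur in some belief set of Ans(Q).
def make_br2_br3(D):
    # br2: p <- (1:p), not (3:p);  br3: p <- (1:p), not (2:p)
    br2 = [(p, [(1, p)], [(3, p)]) for p in sorted(D)]
    br3 = [(p, [(1, p)], [(2, p)]) for p in sorted(D)]
    return br2, br3

def make_querier_rules(D):
    # p <- (2:p), for the context C_i of the querying user
    return [(p, [(2, p)], []) for p in sorted(D)]

D = {("symptom", "john", s) for s in ("s1", "s2", "s3")}
br2, br3 = make_br2_br3(D)
```

For Example 3, D = Ans(symptom(john, X)), and these generators reproduce br_2, br_3 and the rules of br_6.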
We construct a multi-context system (C_1, C_2, C_3, C_4, C_5, C_6) where C_1 = (L_MedKB, MedKB, ∅) and C_2 = (L_MedKB, ∅, br_2), with the bridge rules br_2 being

symptom(john, s1) ← (1 : symptom(john, s1)), not (3 : symptom(john, s1))
symptom(john, s2) ← (1 : symptom(john, s2)), not (3 : symptom(john, s2))
symptom(john, s3) ← (1 : symptom(john, s3)), not (3 : symptom(john, s3))

then C_3 = (L_MedKB, ∅, br_3), with the bridge rules br_3 being

symptom(john, s1) ← (1 : symptom(john, s1)), not (2 : symptom(john, s1))
symptom(john, s2) ← (1 : symptom(john, s2)), not (2 : symptom(john, s2))
symptom(john, s3) ← (1 : symptom(john, s3)), not (2 : symptom(john, s3))

then C_4 = (L_BK(john), BK(john), br_4), with the bridge rules br_4 being

disease(john, cancer) ← (6 : disease(john, cancer)), not (4 : disease(john, cancer))

then C_5 = (L_BK(jane), BK(jane), br_5), with the bridge rules br_5 being

disease(jane, aids) ← (6 : disease(jane, aids)), not (5 : disease(jane, aids))

and finally C_6 = (L_BK(acct), BK(acct), br_6), with the bridge rules br_6 being

symptom(john, s1) ← (2 : symptom(john, s1))
symptom(john, s2) ← (2 : symptom(john, s2))
symptom(john, s3) ← (2 : symptom(john, s3))

M_P3 has six equilibria:

E_1 = (S_MedKB, Ans1, Ans(symptom(john, X)) \ Ans1, ∅, ∅, Ans1)
E_2 = (S_MedKB, Ans2, Ans(symptom(john, X)) \ Ans2, ∅, ∅, Ans2)
E_3 = (S_MedKB, Ans3, Ans(symptom(john, X)) \ Ans3, ∅, ∅, Ans3)
E_4 = (S_MedKB, Ans4, Ans(symptom(john, X)) \ Ans4, ∅, ∅, Ans4)
E_5 = (S_MedKB, Ans5, Ans(symptom(john, X)) \ Ans5, ∅, ∅, Ans5)
E_6 = (S_MedKB, Ans6, Ans(symptom(john, X)) \ Ans6, ∅, ∅, Ans6)

where S_MedKB is as in Example 1, the second belief set of each E_i is exactly the respective Ansi of Example 2, the third belief set is the complement of Ansi with respect to Ans(symptom(john, X)) = {symptom(john, s1), symptom(john, s2), symptom(john, s3)}, and the sixth belief set, that of the querying user acct, again contains Ansi via the bridge rules br_6.
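Maximal privacy preserving answers can be obtained from the equilibria by simple postprocessing: collect the second components and keep the subset-maximal ones. A sketch (our own illustration; answers are encoded as frozensets of atoms):

```python
def maximal_answers(answers):
    """Keep only the subset-maximal sets among the given answers."""
    sets = [frozenset(a) for a in answers]
    return [a for a in sets if not any(a < b for b in sets)]

# The six answers of Example 2, abbreviating symptom(john, si) as "si".
answers = [{"s1", "s2"}, {"s1", "s3"}, {"s1"}, {"s2"}, {"s3"}, set()]
```

Applied to the six answers of Example 2, only Ans1 = {s1, s2} and Ans2 = {s1, s3} survive.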
We would like to point out that in this construction the original knowledge bases are not changed; we only create contexts and bridge rules. All of the background knowledge bases could be multi-context systems themselves; for instance, if the user model for acct foresees that acct is aware of SNOMED and PEPID, then acct's background knowledge base could be a multi-context system comprising these two medical knowledge bases. In order to obtain maximal privacy preserving answers using the described construction, the simplest way is to postprocess all privacy preserving answers. More involved solutions would have to interact with the underlying multi-context system reasoner, for instance by dynamically changing the multi-context system. It is not clear to us at the moment whether it is possible to modify the construction such that the equilibria of the obtained multi-context system correspond directly to the maximal privacy preserving answers.

4 Conclusion and Future Work

We have presented a definition of the privacy preservation problem which allows for using knowledge bases of different kinds. Finding privacy preserving answers can then be accomplished by building an appropriate multi-context system and computing one of its belief states. Since systems for solving multi-context systems are beginning to emerge, for example DMCS [1], this also implies that these privacy preserving answers can be effectively computed. However, usually one is interested in maximal privacy preserving answers. It is unclear to us whether a construction similar to the one presented in this paper can be used for finding privacy preserving answers which are maximal, by just creating appropriate contexts and bridge rules and without modifying the involved knowledge bases or adding new knowledge bases of particular logics.
One possible line of investigation would be to examine work on diagnosing inconsistent multi-context systems [5, 2], since diagnosis tasks involve an implicit minimization criterion, which could be exploited for encoding maximality.

References

1. Bairakdar, S.E., Dao-Tran, M., Eiter, T., Fink, M., Krennwallner, T.: The DMCS solver for distributed nonmonotonic multi-context systems. In: Janhunen, T., Niemelä, I. (eds.) Proceedings of the 12th European Conference on Logics in Artificial Intelligence (JELIA 2010). Lecture Notes in Computer Science, vol. 6341, pp. 352–355. Springer Verlag (2010)
2. Bögl, M., Eiter, T., Fink, M., Schüller, P.: The MCS-IE system for explaining inconsistency in multi-context systems. In: Janhunen, T., Niemelä, I. (eds.) Proceedings of the 12th European Conference on Logics in Artificial Intelligence (JELIA 2010). Lecture Notes in Computer Science, vol. 6341, pp. 356–359. Springer Verlag (2010)
3. Brewka, G., Eiter, T.: Equilibria in heterogeneous nonmonotonic multi-context systems. In: Proceedings of the Twenty-Second National Conference on Artificial Intelligence (AAAI 2007). pp. 385–390. AAAI Press (2007)
4. Dix, J., Faber, W., Subrahmanian, V.: The relationship between reasoning about privacy and default logics. In: Sutcliffe, G., Voronkov, A. (eds.) Logic for Programming, Artificial Intelligence, and Reasoning, 12th International Conference, LPAR 2005. Lecture Notes in Computer Science, vol. 3835, pp. 637–650. Springer Verlag (Dec 2005)
5. Eiter, T., Fink, M., Schüller, P., Weinzierl, A.: Finding explanations of inconsistency in multi-context systems. In: Lin, F., Sattler, U., Truszczyński, M. (eds.) Proceedings of the Twelfth International Conference on Knowledge Representation and Reasoning (KR 2010). AAAI Press (2010)
6.
Zhao, L., Qian, J., Chang, L., Cai, G.: Using ASP for knowledge management with user authorization. Data & Knowledge Engineering 69(8), 737–762 (2010)