                                        Formal Models for Privacy

                                                 Alan Abe and Andrew Simpson
                                      Department of Computer Science, University of Oxford
                                        Wolfson Building, Parks Road, Oxford OX1 3QD
                                                       United Kingdom
                               Alan.Abe@cs.ox.ac.uk, Andrew.Simpson@cs.ox.ac.uk


ABSTRACT
The concept of privacy is becoming increasingly important in all of
our lives. Unfortunately, however, it is a rather nebulous concept;
further, many claim that they consider privacy to be important, yet
undertake behaviour that would suggest otherwise — the so-called
privacy paradox. As technology becomes more pervasive, the need for
assurances that individuals’ privacy is not compromised by that
technology necessarily increases. In this paper, we argue that formal
methods have a role to play in helping to provide assurances of
privacy in a variety of contexts. As an illustration, we give
consideration to a particular scenario: that of data sharing.

Keywords
Formal methods; privacy; Z notation

1.   INTRODUCTION
   The ubiquity of computing technologies that can gather, store and
share personal information has reinvigorated debates surrounding an
age-old societal dilemma — balancing the notions of individual
privacy and the common good, which can emerge from the knowledge
gained by analysing information based not on the individual, but on
the aggregation of information. For example, aggregating information
collected in the course of patient care can give rise to study
populations of sufficient size and heterogeneity in which
well-designed secondary uses have the potential to investigate
research questions that could not be pursued via the traditional
route of randomised controlled trials. For it is through the
understanding gained from such evidence-based knowledge of diseases
and interventions that policies and strategies to afford effective
protection to the health of communities and improve the quality of
human life can emerge. However, there is the potential for harm
should the inappropriate disclosure or use of such information
compromise an individual’s privacy.
   Unfortunately, there is a great deal of ‘fuzziness’ surrounding
privacy. First, defining the term is notoriously hard [25]. Second,
individuals will often claim that privacy is important to them, yet,
on the other hand, will exhibit behaviours that would indicate that
the opposite is true: the so-called privacy paradox [20]. Third, the
disconnect between high-level requirements, rules and guidelines and
their low-level implementations can be significant — meaning that it
is often difficult for software engineers and database administrators
to be confident that their processes and systems behave as intended.
Fourth, there are often trade-offs — between privacy and utility, for
example — to be made. It is clear, therefore, that the appropriate
use of formal methods has much to offer in this context.
   In [29], Tschantz and Wing argue that “privacy raises new
challenges, and thus new research opportunities, for the formal
methods community”. “Privacy-specific needs” — such as statistical /
quantitative reasoning and conflicting requirements in the context of
trustworthy computing — are identified, with “traditional tools of
the trade” — such as formal policy languages, abstraction and
refinement, and code-level analysis — suggested as solutions. They
further argue: “It is our responsibility as scientists and engineers
to understand what can or cannot be done from a technical point of
view on privacy . . . Otherwise, society may end up in a situation
where privacy regulations put into place are technically infeasible
to meet” [29]. We pick up that baton and give consideration to how
formal methods can be applied to the modelling and analysis of
privacy in the context of data-sharing. (For clarification, by
‘formal methods’ we mean the application of techniques such as Z [14]
and B [1], as opposed to the consideration of models for protecting
privacy such as k-Anonymity [28] and Differential Privacy [10].)
There is a long tradition of formal methods being applied to ‘real
world’ problems (see, for example, the survey of [30]); this
contribution is in that spirit.
   To support reasoning about privacy, we consider a broad range of
system–environment interactions that can cause a system to transition
to an unwanted state. We use Z [14] to describe the model and
ProZ [21] to analyse it. Z semantics, which are based on logic and
set theory, enable privacy to be modelled from the perspective of
“data in a system” [26]. ProZ mechanically validates Z
specifications: the preservation of an invariant (as related to
privacy) and the refinement of one specification (of privacy) by
another. Thus, our metric for privacy is a binary result (rather than
being probabilistic).
   We aim to address the gap between the high-level requirements
(derived from, for example, privacy laws and regulations) that are
described in natural language and their implementations. We utilise
the UK’s Data Protection Act 1998 (DPA)¹ and the approaches to
protecting data privacy that have been proposed in the literature to
inform our interpretation of privacy. It is important to note that we
do not define privacy per se; rather, we show how a formal approach
can be used to characterise it.


2.   MOTIVATION
   Advances in IT have motivated the promulgation of laws and
regulations aimed at strengthening privacy protections for the
handling of personal data. Implications for secondary uses (in
particular, in the area of medical research) due to the
aforementioned DPA in the UK and the Privacy Rule under the Health
Insurance Portability and Accountability Act of 1996 (HIPAA)² in the
USA have been debated (see, for example, [8]). Despite stricter
privacy protections, the DPA and HIPAA do allow for circumstances in
which personal data may be used or shared without the consent of
individuals, including the appropriate de-identification of data. The
DPA regulates the processing of personal data in their entirety, of
which data de-identification is just one aspect. The implication is
that the scope of a data-owner’s obligation to privacy protection is
not limited to the de-identified data themselves, but also includes
their processing.
   De-identification is the path envisioned by many to satisfy
privacy requirements, and numerous privacy-preserving methods founded
on varying conceptualisations of privacy and utility of de-identified
data have been proposed (see, for example, [13]). Empirical
validations of these methodologies are typically based on a
high-level abstraction of the underlying systems that would perform
the processing — often, centrally by a single data controller.
Further, considerations of threats have typically been underpinned by
the following characterisation: intruders (characterised as a
“(hypothetical) user who ‘misuses’ the outputs” to “disclose
information about one of the data subjects” [24]) obtain
de-identified data, obtain relevant auxiliary information, and then
leverage the auxiliary information to compromise the privacy of data
subjects. However, processing of data is often distributed across
multiple systems and organisations, and, in some cases, across
national boundaries. This complexity can introduce uncertainties as
to whether a particular privacy-preserving method can be implemented
in a manner that maintains confidentiality and privacy. Instead of
attacking the shared de-identified data, an intruder may find it
‘easier’ to attack the systems or people that produce them. Thus, we
give consideration to how formal models can help in the data release
process.

3.   RELATED WORK
   Contributions that have applied formal approaches to privacy
include: [27], which is concerned with the modelling and verification
of privacy-enhancing protocols; [2], which is concerned with
detecting and resolving ambiguities in privacy requirements; [12],
which is concerned with the verification of privacy analysis; [16],
which presents a formal framework for privacy by design [6]; and
[15], which presents a typing system for privacy. Logic-based
techniques have been applied to, in particular, the Privacy Rule of
HIPAA 1996 [3, 7, 9]. Elements of a permission (action and resource)
from role-based access control have been combined with elements of
privacy (e.g., purpose, conditions, and obligations) in privacy-aware
role-based access control (P-RBAC) [18, 19]. Our contribution uses a
formal method to characterise and reason about privacy (requirements
based on laws and regulations) in the context of its characterisation
in terms of, for example, design or code, thereby bridging the gap
between abstract notions and concrete representations of privacy.

¹ http://www.legislation.gov.uk/ukpga/1998/29/data.pdf
² http://www.hhs.gov/ocr/privacy/
Figure 1: An abstraction of disclosure-processing.

4.   DISCLOSURE-PROCESSING
   Our modelling of privacy is motivated by the disclosure of data
for secondary uses, for which several privacy-preserving methods have
been proposed. The processing of personal data begins with their
extraction from a source (or sources), and ends with their release in
an appropriate form.
   We initially abstract disclosure-processing as a unitary system of
related processes: we assume processes are instantiated singularly
and internally in relation to a data controller. It may be argued
that such a unitary system is unrealistic — in that it is unlikely
that a relatively simplistic system would be able to effectively or
efficiently process large amounts of data, dispersed data, or data
involving the collaboration of several data controllers. However, its
simplicity allows us to establish an abstraction that we can use to
model processing; further, it serves as a starting point for
subsequently thinking about more complicated situations.
   Our system comprises five processes (as per Figure 1):

   1. Parameterisation (PAR). A data controller determines the
      parameters that guide each aspect of processing: extraction
      (extparam), rendering (renparam), testing (tesparam), and
      dissemination (disparam). In certain instances (such as an
      interactive mode of disclosure), a data controller may permit,
      on a controlled basis, a data user to determine certain
      parameters. If auxiliary information is used in the evaluation
      of data, a data controller creates or obtains the relevant
      information. The primary inputs are the processing parameters
      and, where appropriate, auxiliary information, which are also
      its outputs to the other processes.

   2. Extraction (EXT). From a designated source, data that possess
      certain characteristics, or the results of an applied workload,
      are extracted. The primary inputs are the personal data and
      extparam to control the extraction process, which may include
      the location of the data source, the characteristics of the
      data to be extracted, the workload to be applied, or the method
      of extraction. The primary outputs are the personal data that
      have been extracted (extdata) or the results of a workload
      (extracted information) that has been applied to the personal
      data.

   3. Rendering (REN). The extracted data or information may then be
      transformed via a rendering method into a form determined by a
      data controller to be appropriate for release. The primary
      inputs are the extdata or extracted information and renparam to
      control the rendering process, which may include the method of
      rendering, the intensity of the method’s application, or the
      data characteristics involved. The primary outputs are the data
      (rendata) or information that have been rendered into a
      different form.

   4. Testing (TES). Data or information may be evaluated via the
      employment of certain tests (pertaining to privacy and/or data
      utility) as determined by a data controller. Such testing can
      support a data controller in making a reasoned decision about
      the release of data for dissemination. The primary inputs are
      the data (extdata or rendata) or information (extracted or
      rendered) to be evaluated, additional data (where appropriate)
      to support the evaluation (auxiliary information or extdata),
      and tesparam to control the testing process, which may include
      the testing methods, the metrics, the data characteristics
      involved, or the use of other data. The primary outputs are the
      results of the tests that have been applied to the data or
      information. Tests of privacy may involve the use of auxiliary
      information, in particular, to characterise the risk of
      re-identification based on the likelihood of establishing links
      between the auxiliary information and the rendata; tests of
      data utility may involve the use of extdata in a comparison
      with the rendata in which the same workload is performed on
      both.

   5. Dissemination (DIS). Data or information that are considered
      appropriate for release are then disseminated via a method
      determined by a data controller. The primary inputs are the
      data (extdata or rendata) or information (extracted or
      rendered) to be disseminated and disparam to control the
      dissemination, which may include the mode of dissemination or
      the destination for the transmission of the data.
   The rendering and testing processes can be repeated until the form
in which the data are rendered satisfies the criteria for their
release. This can be achieved by using different rendering methods,
the same method with different parameters, or different evaluation
criteria. If it is determined that the rendering of data into an
acceptable form is unattainable, processing can be terminated without
dissemination.
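   To make the shape of this processing concrete, the following
minimal Python sketch (our own illustration, not part of the formal
model developed later) wires the five processes together as functions
over Attribute–Data pairs. The stage and parameter names follow this
section; the function bodies, and the simple render-and-test loop,
are illustrative assumptions only.

# A sketch of the disclosure-processing pipeline of Section 4. The
# five stages and the parameter names (extparam, renparam, tesparam,
# disparam) follow the text; the bodies are illustrative only.
Record = dict[str, str]  # one row: Attribute -> Data

def extract(source: list[Record], extparam: dict) -> list[Record]:
    """EXT: keep records possessing the characteristics in extparam."""
    wanted = extparam.get("characteristics", {})
    return [r for r in source
            if all(r.get(a) == d for a, d in wanted.items())]

def render(extdata: list[Record], renparam: dict) -> list[Record]:
    """REN: transform extracted data, here by suppressing attributes."""
    drop = set(renparam.get("suppress", []))
    return [{a: d for a, d in r.items() if a not in drop}
            for r in extdata]

def test(rendata: list[Record], tesparam: dict) -> bool:
    """TES: evaluate rendered data against release criteria, here the
    absence of attributes deemed identifying."""
    banned = set(tesparam.get("must_be_absent", []))
    return all(banned.isdisjoint(r) for r in rendata)

def disseminate(rendata: list[Record], disparam: dict) -> None:
    """DIS: release the data via the determined mode/destination."""
    print(f"released {len(rendata)} records to {disparam['destination']}")

def process(source, extparam, renparams, tesparam, disparam):
    """PAR fixes the parameters; rendering and testing repeat (here,
    over candidate renparam settings) until the criteria are
    satisfied; otherwise processing terminates without dissemination."""
    extdata = extract(source, extparam)
    for renparam in renparams:
        rendata = render(extdata, renparam)
        if test(rendata, tesparam):
            disseminate(rendata, disparam)
            return rendata
    return None  # no acceptable rendering: terminate without dissemination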
5.   A SCENARIO
   Our fictional scenario is motivated by the UK’s implementation of
a smart meter programme. In assessing the pertinent issues, Brown [5]
argues that: “Because smart meters can collect and share detailed
information about energy use and hence household life, their impact
on privacy has become a high-profile matter of interest to energy and
privacy regulators, and to privacy campaigners, journalists, and
members of the public” [5]. It is an appropriate scenario for
illustrating our approach for a variety of reasons. First, certain
smart data may be considered to be personal data. Second, there is
significant potential for secondary uses of the data. Third, it is an
active area of privacy research [23].
   We consider the disclosure-processing of customers’ personal
information and utility consumption data collected via smart meters.
To this end, we make the following assumptions. First, smart meters
are capable of gathering fine-grained information about a customer’s
consumption of utilities. Second, smart meters support two-way
communications: data gathered by a smart meter can be transmitted to
a utility provider and a utility provider can send data (e.g.,
instructions) to the device. Third, it has been determined that smart
meter readings are to be handled as personal data. Finally, a
customer’s personal information and smart meter readings are held in
accordance with legal and regulatory requirements.
   Customers of Smart Meter Utilities (SMU) can subscribe to one or
more of electricity, gas and water. Customers’ consumption data are
collected via smart meters that are capable of collecting data about
consumption by type of utility, by type of device, and by time.
Periodically, SMU sends customers consumption information, charges
that are associated with their consumption, and the amount to be
deducted from their bank accounts (by contractual agreement).
Customers can phone SMU’s customer service centre to ask questions
about offered services or their accounts, or to report problems. For
a reported problem, a technician is dispatched to carry out repairs.
Marketing information about products and services is sent to
customers. There is keen interest from government, academia,
non-profit organisations and utility-related businesses in analysing
SMU’s utility consumption data.

6.   THE UK’S DATA PROTECTION ACT
   Privacy regulations differ across the globe, typically influenced
by political and cultural factors. They are often imposed in reaction
to pressures from the populace and thereby strongly influence the
practices of data controllers in the handling of personal data. The
‘appropriateness’ (or lack thereof) of the privacy measures
undertaken by a data controller will often be litigated.
   We consider the UK’s Data Protection Act (DPA) as the legal
framework to guide our work. Privacy, however, is neither defined nor
explicitly characterised as such in the DPA. As exemptions from and
contraventions to the Act are specified (primarily) in terms of the
data protection principles (DPPs)³ and provisions related to the
rights of data subjects,⁴ it may be argued that the DPPs and rights
of data subjects can inform our interpretation of the DPA (in terms
of privacy). The DPPs, as guiding principles for data controllers,
prescribe that personal data shall be:⁵
   1. processed fairly and lawfully;

   2. obtained for specified and lawful purposes and further
      processed only in a manner compatible with those purposes;

   3. adequate, relevant and not excessive for the purposes for which
      they are processed;

   4. accurate and kept up to date (where necessary);

   5. kept no longer than is necessary for the purposes for which
      they are processed;

   6. processed in accordance with the rights of data subjects that
      have been stipulated in the enactment;

   7. protected by the use of appropriate measures (technical and
      organisational); and

   8. restricted to being processed in the European Economic Area
      unless adequate protections for the rights and freedoms of data
      subjects can be ensured.

³ DPA 1998, Sch 1.
⁴ DPA 1998, Pt II.
⁵ DPA 1998, Sch 1, Pt I, para 1 to 8.

   Responsibility resides with a data controller, “who (either alone
or jointly or in common with other persons) determines the purposes
for which and the manner in which any personal data are, or are to
be, processed”.⁶ It is on purposes (which we assume are to be
specified and lawful) that compliance with most of the DPPs hinges:
‘fair’ and ‘lawful’ processing, in particular, whether data subjects
have been informed or have provided consent where appropriate
(related to the 1st DPP); obtaining or further processing of data
(the 2nd); characteristics of the data (the 3rd); accuracy (the 4th);
retention period (the 5th); and violations of the rights of data
subjects (the 6th).
   Provisions related to the rights of data subjects (in Part II)
delineate certain actions of stakeholders and prescribe conditions
under which they should or should not occur. Consequently, we can
abstract them in terms of the behaviours of (or interactions with) a
system and represent prescribed conditions as constraints on those
behaviours (or interactions). The aspects that such interactions have
in common are: a triggering event (e.g., a notification in writing or
the processing of personal data) initiated by either a data subject
or a data controller; a transfer of information (abstracted as a data
object) in which its content and form are prescribed; and prescribed
conditions under which a particular action is or is not performed,
such as the amount of time (to include a start and duration) a data
controller has to respond to a request for information from a data
subject.
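   As a flavour of how such a provision can be abstracted, the
following Python sketch (our own illustration, separate from the Z
model of Section 7) represents an interaction as a triggering event,
a transferred data object, and a temporal constraint on the response.
The 40-day figure in the example is the DPA 1998’s response period
for subject access requests (s 7); the class and field names are our
own assumptions.

# A sketch (ours) of the abstraction of rights-of-data-subjects
# provisions: a triggering event, a transferred data object, and a
# prescribed temporal condition on the response.
from dataclasses import dataclass
from datetime import date, timedelta

@dataclass(frozen=True)
class Interaction:
    trigger: str            # e.g. "subject access request received"
    initiated_by: str       # "data subject" or "data controller"
    data_object: str        # prescribed content/form of the transfer
    triggered_on: date
    response_days: int      # prescribed duration for the response

    def deadline(self) -> date:
        return self.triggered_on + timedelta(days=self.response_days)

    def satisfied(self, responded_on: date) -> bool:
        """The prescribed condition: the controller responded in time."""
        return responded_on <= self.deadline()

# Example: the DPA 1998 gives a controller 40 days to answer a
# subject access request.
sar = Interaction("subject access request received", "data subject",
                  "copy of personal data held", date(2016, 3, 1), 40)
assert sar.satisfied(date(2016, 4, 1))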
   The de-identification of personal data is not explicitly addressed
in the DPA (although a code of practice on anonymisation exists⁷).
The crux of the DPA, it may be argued, is the notion of personal
data, and, as such, the applicability of the enactment to data
depends on whether the data in question are considered to be personal
data. It follows that the inflection point at which personal data are
no longer subject to the DPA is the point at which the data are
processed into a form that is no longer considered to be ‘personal’.
We can surmise that data can be considered ‘non-personal’ when data
subjects can no longer be identified from the data themselves or in
combination with other information likely to be obtained by a data
controller.⁸
   Our shared interest with data controllers and software engineers
is in developing an approach to support reasoning about the
processing of personal data by a system in a required manner (such as
in accordance with the DPA). We next consider the application of
Z [14] to model a system associated with the release of data in
accordance with its encompassing regulatory environment.

⁶ Data Protection Act 1998, Pt I, s 1(1).
⁷ https://ico.org.uk/media/for-organisations/documents/1061/anonymisation-code.pdf
⁸ Data Protection Act 1998, Pt I, s 1(1).

Figure 2: Mapping relevant DPPs. (The diagram relates the 1st and 2nd
DPPs to Privacy, the 3rd and 8th to Parameter, and the 5th, 6th and
7th to System.)

7.   A FORMAL MODEL
7.1   A Notion of Privacy
   We consider informational privacy — whereby individuals control
information about themselves and are able to determine how it is
communicated. We interpret (on the basis of context) the
determination by a data controller of ‘purposes’ and ‘manner’ as
specifications. We consider purposes to be the aspect of privacy on
which its (privacy) specification hinges; the manner is combined with
purposes to delineate processing with sufficient granularity. In
Figure 2, the DPPs (except for the 4th, which we consider as an
assumption) are mapped to their representation in our model.
Implementations of Privacy and Parameter are represented as schemas.
The System is associated with either functional or structural aspects
of our model, in particular: access control (related to the 7th DPP);
retention of the data that are produced (related to the 5th DPP); and
instantiations of Privacy and Parameter.
   In relation to a purpose, the 1st DPP (and related aspects of the
DPA) provides us with two other types upon which to construct a
notion of Privacy: a data subject being informed and the provision of
consent by, or on behalf of, a data subject. We start by introducing
these types.

   [Purpose, Inform, Consent]

   We consider purpose to mean “the reason for which something is
done”⁹ and we interpret ‘something’ to mean the disclosure-processing
of personal data. We assume that, for individuals to have consented,
they would have had to have been adequately informed (about the
processing) prior to its provision. We do not distinguish between a
consent that has been explicitly or implicitly given, but, rather,
assume that a data controller has determined that a data subject has
consented to certain processing.

⁹ www.oxforddictionaries.com/definition/english/purpose
   The granularity of the characterisation of each variable should be
sufficiently descriptive as to provide, in particular, data subjects
with an appropriate level of understanding of: the reasons that
underpin the processing of their data; what a data subject has been
informed about in relation to processing; and what they have
consented to be done with their data. The higher the granularity with
which Purpose, Inform and Consent are described by a data controller,
the more likely it is that processing will be consistent with legal
and regulatory requirements for privacy.
   We consider a parameter to represent “a numerical or other
measurable factor forming one of a set that defines a system or sets
the conditions of its operation”¹⁰. In particular, we consider a
parameter to be a condition under which processing is carried out: a
parameter is determined in relation to a particular process. This
corresponds to the primary outputs of the PAR process of Section 4.
We abstract a parameter as a tuple formed by two elements: the type
of information (Attribute) and its value (Data).

   [Attribute, Data]

   The types of information that may be described as parameters are
as follows.

   • Data recipients are persons or entities with whom the data are
     to be shared. We do not consider a person who is able to access
     the data during processing as a recipient, but, instead, as a
     user in the context of an access control mechanism.

   • Data characteristics are considered as inputs to, and outputs
     of, a particular process. From an input, a process under certain
     conditions then produces an output. The data characteristics of
     the required output are described implicitly as a set of
     conditions on the process. As sensitive personal data pertain to
     certain types of information (e.g., racial or ethnic origin), we
     assume that the data have been recorded in a manner that
     facilitates their identification by a process.

   • Processing methods are mechanisms by which the data are
     processed. Where there are choices, a data controller determines
     the mechanism or multiple mechanisms that are applied during
     processing.

   • Data privacy and utility are abstracted on the basis of the
     processing conditions that have produced the data or the results
     of tests applied to the data. The requirements for privacy and
     utility of de-identified data are described as parameters such
     that: processing produces data in which certain attributes are
     absent (e.g., the Safe Harbor method¹¹); an appropriate (as
     determined by a data controller) privacy (and/or utility)
     parameter (e.g., k = 10) related to a rendering algorithm (e.g.,
     k-Anonymity [28]) is satisfied during processing; or results of
     data testing (privacy and/or utility) satisfy criteria (as
     determined by a data controller), as sketched below.

¹⁰ www.oxforddictionaries.com/definition/english/parameter
¹¹ Privacy Rule of HIPAA 1996 §164.514(b)(2).
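   To illustrate the last kind of parameter, the following Python
sketch (an illustration under our own assumptions, not part of the
formal model) checks whether rendered data satisfy a k-Anonymity [28]
parameter such as k = 10: every combination of values over the chosen
quasi-identifying attributes must occur at least k times. The
attribute names in the example are hypothetical.

# A sketch of a privacy test driven by a parameter such as k = 10:
# rendered data satisfy k-Anonymity [28] over the given
# quasi-identifiers if every value combination occurs at least k times.
from collections import Counter

Record = dict[str, str]  # one row: Attribute -> Data

def satisfies_k_anonymity(rendata: list[Record],
                          quasi_identifiers: list[str],
                          k: int) -> bool:
    groups = Counter(
        tuple(r.get(a) for a in quasi_identifiers) for r in rendata
    )
    return all(count >= k for count in groups.values())

# Hypothetical SMU data: postcode area and meter type as
# quasi-identifiers.
rows = [{"postcode": "OX1", "meter": "gas"}] * 10
assert satisfies_k_anonymity(rows, ["postcode", "meter"], k=10)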
7.2   A Construct of Privacy
   Disclosure-processing begins with the extraction of personal data
from a source and ends with or without their disclosure. This
sequence of events (or processes), from a beginning to an end,
constitutes an instance of processing. Then, privacy (from beginning
to end) ought to be defined in relation to each instance of
processing. Our notion of privacy is based on three factors related
to: the privacy rights that are afforded to data subjects by laws and
regulations; the manner in which personal data are processed; and the
data that are produced as the result of processing.
   We abstract privacy as a set of constraints on a system to keep it
from transitioning to an unwanted state (that violates privacy). To
formally represent these constraints, we consider: Purpose, Inform
and Consent to abstract the rights of data subjects; the notion of
Parameter to abstract the manner in which data are processed; and,
from our abstraction of disclosure-processing, the data that are
produced as extdata and rendata. To this end, we introduce the schema
Privacy:

   Privacy ≙ [ purpose : Purpose; inform : Inform;
               consent : Consent ]

   In Section 7.1, we abstracted parameters as sets of tuples formed
by two basic types: Attribute and Data. Consequently, parameters are
captured by the schema Parameter:

   Parameter ≙ [ extparam, renparam, tesparam,
                 disparam : P (Attribute × Data) ]

   Our construct (with respect to a disclosure-processing instance)
is formed by linking Privacy and Parameter instances, as well as the
data (extdata and rendata), via a process identifier (denoted PID).
The same PID links the privacy construct to an access control
mechanism. Thus, the privacy construct and its implementation have to
be considered in context.
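   Read as data structures, Privacy and Parameter translate directly
into record types. The Python sketch below is our own rendering of
the two schemas, with P (Attribute × Data) modelled as a set of
pairs; the example values are hypothetical.

# A sketch of the schemas Privacy and Parameter as record types, with
# P (Attribute × Data) modelled as a set of (attribute, data) pairs.
from dataclasses import dataclass, field

AttrData = set[tuple[str, str]]  # P (Attribute × Data)

@dataclass(frozen=True)
class Privacy:
    purpose: str   # Purpose
    inform: str    # Inform
    consent: str   # Consent

@dataclass
class Parameter:
    extparam: AttrData = field(default_factory=set)
    renparam: AttrData = field(default_factory=set)
    tesparam: AttrData = field(default_factory=set)
    disparam: AttrData = field(default_factory=set)

# A disclosure-processing instance links the two via a process
# identifier (PID).
billing_privacy = Privacy("billing", "notified at sign-up", "contractual")
construct = {1: (billing_privacy, Parameter())}  # PID 1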
7.3   A System
   Our model is in three parts: an access control mechanism to
support an implementation of privacy (ACCore); the source of personal
data that are to be processed for release (DataStore); and the
relation of a process identifier to an instantiation of the privacy
construct and the data produced during processing.
   As privacy can be characterised as a constraint on processing and
security as a constraint on the access to a process, an access
control mechanism can provide a platform for the implementation of
privacy. We extend the formal model of role-based access control [11]
of [22] by incorporating the notion of privacy as an element of a
permission.
   We consider three basic types: Action, Resource, and a set of
unique identifiers, PID:

   [ Action, Resource, PID ]

   Permissions combine elements of these types:

   Perm == Action × Resource × PID

   We introduce two further types: User and Role.

   [ User, Role ]

   The schema ACCore captures an RBAC policy:

   ACCore ≙ [ user : P User; role : P Role; perm : P Perm;
              ur : User ↔ Role; rp : Role ↔ Perm;
              up : User ↔ Perm |
              ur ∈ user ↔ role ∧ rp ∈ role ↔ perm ∧
              up = ur ⨾ rp ]

   Here, user, role and perm represent the current set of users, the
current set of roles and the current set of permissions,
respectively; ur captures associations of users with assigned roles;
rp captures associations of roles with assigned permissions; and up
captures associations of users with their permissions. For a user to
have access (via a permission) to an aspect of the system, the user
has to have been assigned a role that is associated with that
permission.
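   The derived relation up = ur ⨾ rp and the resulting access check
are straightforward to transcribe. The following Python sketch (ours;
relations modelled as sets of pairs, and anticipating the role
assignments of the example that follows) is illustrative only:

# A sketch of ACCore: relations as sets of pairs, with up derived as
# the forward composition of user-role and role-permission assignments.
Perm = tuple[str, str, int]  # Perm == Action × Resource × PID

def compose(ur: set[tuple[str, str]],
            rp: set[tuple[str, Perm]]) -> set[tuple[str, Perm]]:
    """up = ur ⨾ rp: user u holds permission p iff some role links them."""
    return {(u, p) for (u, r1) in ur for (r2, p) in rp if r1 == r2}

NULLPID = 0  # nullpid modelled as 0
ur = {("Alice", "DataController"), ("John", "CSR")}
rp = {("DataController", ("add", "par", NULLPID)),
      ("CSR", ("run", "ext", NULLPID))}
up = compose(ur, rp)

# Alice may add Privacy instantiations via the PAR process; John may not.
assert ("Alice", ("add", "par", NULLPID)) in up
assert ("John", ("add", "par", NULLPID)) not in up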
   Alice, in her role as a data controller, has been assigned the
permission (add, par, nullpid). She can access the PAR process and
add instantiations of Privacy to the system. John, in his role as a
Customer Service Representative, has been assigned the permission
(run, ext, nullpid). To perform his duties, he is able to access most
of a customer’s personal data. Let us assume that Alice has defined
an instantiation of the schema Privacy for the disclosure-processing
of billing information that she has associated with a PID of 1.
Although John is able to access most of a customer’s personal data,
he is not able to run the EXT process based on parameters where
PID = 1.
   Personal data (in the form of microdata) that have been obtained
and recorded are often held in a relational database. A particular
characteristic of an individual is abstracted as an Attribute–Data
pair. Our abstraction of a source of personal data for
disclosure-processing is twofold: a relational database as a table of
data (denoted table); and the processed data (denoted shared) as sets
of Attribute–Data pairs. This is collected together in the schema
DataStore:

   DataStore ≙ [ table : P (Attribute ⇸ Data);
                 shared : P (Attribute × Data) |
                 (∀ t1, t2 : table • dom t1 = dom t2) ∧
                 (∀ s : shared • ∃ f : table • s ∈ f) ]

   Here, the information pertaining to data subjects is abstracted as
rows of data (in which table represents the totality of data being
held by a data-owner) and the data that are to be processed (shared)
have been drawn from table.
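   The two DataStore invariants, that every row of table has the same
attributes (a uniform relational schema) and that everything in
shared is drawn from some row of table, can be checked directly. A
minimal Python sketch of that check (ours, with hypothetical data):

# A sketch of the DataStore invariants: every row of table has the
# same domain (attributes), and every shared pair appears in some row.
Row = dict[str, str]  # a partial function Attribute ⇸ Data

def datastore_invariant(table: list[Row],
                        shared: set[tuple[str, str]]) -> bool:
    same_domain = all(set(t1) == set(t2) for t1 in table for t2 in table)
    drawn_from_table = all(
        any(f.get(attr) == data for f in table) for (attr, data) in shared
    )
    return same_domain and drawn_from_table

table = [{"name": "Ann", "usage": "12kWh"},
         {"name": "Bob", "usage": "9kWh"}]
assert datastore_invariant(table, {("usage", "9kWh")})
assert not datastore_invariant(table, {("usage", "99kWh")})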
   A PID value is mapped to an instantiation of Privacy. The same PID
value is used to define a permission for access control that is
associated with an instance of processing. This same PID value is
mapped to an instantiation of Parameter that has been defined by a
data controller. Then, processes produce data (extdata and/or
rendata) based on the Privacy and Parameter that are related to the
same PID value. The data are added to the system as a binary relation
using the same PID value.
   Our system comprises the schema ACCore, the schema DataStore, and
functions mapping elements of PID to elements of Privacy, Parameter,
extdata and rendata, respectively. (For the sake of brevity, we omit
constraints.)

   System ≙ [ ACCore; DataStore;
              privacy : PID ⇸ Privacy;
              parameter : PID ⇸ Parameter;
              extdata : PID ⇸ P (Attribute × Data);
              rendata : PID ⇸ P (Attribute × Data) ]

   It is assumed that extdata and rendata have been produced from
shared, and that a particular instantiation of Parameter and the data
that are produced during processing (extdata and rendata) are tied to
a particular instantiation of Privacy (via a PID).
   Assume that Alice has added an instantiation of Privacy (where
PID = 1) to the system for billing. The EXT and DIS processes are
involved in disclosure-processing. Further assume that Alice has used
the same process identifier to add to the system, as an instantiation
of Parameter (where PID = 1), the parameters to constrain
disclosure-processing. John has been assigned the (run, ext, nullpid)
permission, which allows him access to more personal data about a
customer than would be required to process data for billing. Based on
this, Alice has decided that John can assume the added responsibility
for billing. Although John would gain access to an additional process
(DIS), Alice believes that the permission would be sufficiently
restrictive. A new role is created for billing and is assigned the
permissions (add, ext, 1) and (run, dis, 1). This new role is then
assigned to John. Now let us assume that users are granted access to
processes (or resources) only if permissions that have been assigned
to them (via roles) match criteria that have been defined for those
processes. For billing, John would be allowed to: access the EXT
process; extract, from a source, the relevant personal data based on
constraints prescribed by Privacy (where PID = 1) and Parameter
(where PID = 1); add the output of extraction to the system, in which
PID = 1 is mapped to extdata; then access the DIS process; and
disseminate the extdata, based on constraints that are prescribed by
Parameter (where PID = 1). Here, John is able to run the DIS process
only if the appropriate extracted data (extdata) have been added to
the system (where PID = 1). The system will not allow John to alter
the instantiations of Privacy and Parameter that are associated with
a process, nor the data that are produced.
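   A small Python sketch of this walkthrough (ours, not the Z model;
state kept as the PID-indexed maps of System, nullpid modelled as 0)
shows how each step is gated both by a permission in up and by the
PID-linked data the step depends on:

# A sketch (ours) of the billing walkthrough: each step is gated by a
# permission (action, resource, pid) in up and by PID-linked state.
NULLPID = 0
up = {("John", ("run", "ext", NULLPID)),
      ("John", ("add", "ext", 1)),
      ("John", ("run", "dis", 1))}

privacy = {1: "billing Privacy instance"}      # added by Alice
parameter = {1: "billing Parameter instance"}  # added by Alice
extdata: dict[int, set] = {}

def add_extraction_output(user: str, pid: int, output: set) -> None:
    # adding extdata requires the (add, ext, pid) permission and the
    # Privacy/Parameter instantiations that constrain the extraction
    assert (user, ("add", "ext", pid)) in up
    assert pid in privacy and pid in parameter
    extdata[pid] = output

def run_dis(user: str, pid: int) -> set:
    # DIS runs only if permitted and the extracted data exist for pid
    assert (user, ("run", "dis", pid)) in up
    assert pid in extdata
    return extdata[pid]

add_extraction_output("John", 1, {("usage", "12kWh")})
released = run_dis("John", 1)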
8.   ANALYSIS
   A system interacts with its environment, which may involve users
of, and threats to, it. Interactions between a system and these
potential actors (directly or indirectly via external processes) can
cause a system to transition from one state to another. As a starting
point for modelling systems of increasing complexity, we consider a
data-sharing situation in which the state-space is tractable.
   We have given consideration to our smart meter scenario (in the
context of data-sharing), the DPA (for privacy requirements), and our
Z-based model of a system — both its (privacy) specification (denoted
SystemS) and its implementation (denoted SystemI), with SystemI being
a more concrete representation of a system congruent with the
specification. In addition, we adapt the Basic Security Theorem
of [4] (while being aware of its limitations, as discussed in, for
example, [17]) so that: if a system satisfies our privacy
requirements in its initial state, and there is a guarantee that
every transition also ensures that our privacy requirements are met,
then we may conclude that all states will satisfy our privacy
requirements.
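   This inductive argument is what a model checker discharges by
exhaustive search. As a flavour, the following Python sketch (ours; a
toy breadth-first exploration, not ProZ) checks that an invariant
holds in the initial state and is preserved by every enabled
transition:

# A sketch of the adapted Basic Security Theorem as exhaustive search:
# if the invariant holds initially and after every enabled transition,
# every reachable state satisfies it.
from collections import deque
from typing import Callable, Hashable, Iterable

def check(init: Hashable,
          transitions: Callable[[Hashable], Iterable[Hashable]],
          invariant: Callable[[Hashable], bool]) -> bool:
    if not invariant(init):
        return False
    seen, frontier = {init}, deque([init])
    while frontier:
        state = frontier.popleft()
        for nxt in transitions(state):
            if nxt not in seen:
                if not invariant(nxt):
                    return False  # a reachable state violates privacy
                seen.add(nxt)
                frontier.append(nxt)
    return True  # all reachable states satisfy the requirements

# Toy example: a counter that never exceeds 3.
assert check(0, lambda s: [min(s + 1, 3)], lambda s: s <= 3)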
   We abstracted the processing of personal data for disclosure in
terms of the schema System, before and after an execution of a
process (represented as an operation). We represented 26 such
operations in terms of Z. Operations were of two types: those that
affected the variables in the schema System and those that did not.
In relation to requirements, predicates were defined for operations
such that privacy was assured. By varying the values of inputs, the
same operation was used to model multiple data-sharing situations.
   As an example, consider the operation TestRenderData:

   TestRenderData ≙
      [ ΞSystem;
        i? : PID; u? : User; b! : Boolean;
        t? : P (Attribute × Data) |
        i? ≠ nullpid ∧
        i? ∈ ⋂ {dom privacy, dom parameter, dom extdata} ∧
        (parameter (i?)).tesparam ≠ ∅ ∧
        (run, tes, i?) ∈ up (| {u?} |) ∧
        t? ≠ ∅ ∧
        t? ⊆ (parameter (i?)).tesparam ∧
        rendata (i?) ⊆ t? ⇒ b! = T ∧
        ¬ (rendata (i?) ⊆ t?) ⇒ b! = F ∧
        privacy′ = privacy ∧ parameter′ = parameter ∧
        extdata′ = extdata ∧ rendata′ = rendata ]
   This tests the rendered data, leaving the state of the system
unchanged. We assume that testing is applied to data that are to be
disseminated. The preconditions to test rendata are: the PID value is
not null; the PID value appears in the domains of privacy, parameter
and rendata; in relation to the instantiation of Parameter,
parameters for testing (tesparam) have been defined; and, in relation
to the PID value, a user is permitted to perform the action (to run
the test) and access the resource (the TES process). In relation to
the PID value associated with the operation, the rendata are tested
based on parameters constrained by tesparam. We abstract testing as a
comparison between the data being tested and the parameters. Based on
the result of testing the data, a user may decide whether or not to
proceed with dissemination. (We assume a type Boolean, with two
elements: T and F.)
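   A Python transcription of the operation’s predicate (ours; maps
for the PID-indexed components, sets of pairs for the data, and
nullpid modelled as 0) makes the precondition/test split explicit:

# A sketch of TestRenderData: preconditions guard the test; the
# result b! compares rendata(i?) against the testing parameters t?.
def test_render_data(system: dict, i: int, u: str, t: set) -> bool:
    privacy, parameter = system["privacy"], system["parameter"]
    extdata, rendata = system["extdata"], system["rendata"]
    up = system["up"]
    # preconditions (the operation is otherwise not applicable)
    assert i != 0                                  # i? is not nullpid
    assert all(i in m for m in (privacy, parameter, extdata, rendata))
    assert parameter[i]["tesparam"]                # tesparam defined
    assert (u, ("run", "tes", i)) in up            # permitted action/resource
    assert t and t <= parameter[i]["tesparam"]
    # the test itself; the state of the system is left unchanged
    return rendata[i] <= t

system = {"privacy": {1: "p"},
          "parameter": {1: {"tesparam": {("k", "10")}}},
          "extdata": {1: set()}, "rendata": {1: {("k", "10")}},
          "up": {("John", ("run", "tes", 1))}}
assert test_render_data(system, 1, "John", {("k", "10")}) is True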
   SystemS was the basis from which we derived an implementation (or
less abstract representation) of privacy. As an example, consider the
operation AddParameter. The specification is the addition of the
schema Parameter to the system. To limit the number of distinct
combinations of parameters in the implementation, we constrained the
operation further. (Again, for the sake of brevity, we provide the
constraints in SystemI, but not in SystemS):

   AddParameter ≙
      [ ((p?.extparam ≠ ∅ ∧ p?.disparam ≠ ∅) ∧
          ((p?.renparam = ∅ ∧ p?.tesparam = ∅ ∧
              p?.disparam ⊆ p?.extparam) ∨
           (p?.renparam = ∅ ∧ p?.tesparam ≠ ∅ ∧
              p?.extparam ⊆ p?.tesparam ∧
              p?.disparam ⊆ p?.extparam) ∨
           (p?.renparam ≠ ∅ ∧ p?.tesparam = ∅ ∧
              p?.renparam ⊆ p?.extparam ∧
              p?.disparam ⊆ p?.renparam) ∨
           (p?.renparam ≠ ∅ ∧ p?.tesparam ≠ ∅ ∧
              p?.tesparam ⊆ p?.extparam ∧
              p?.renparam ⊆ p?.tesparam ∧
              p?.disparam ⊆ p?.renparam))) ]
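   The constraint is a case split on which optional stages (rendering
and testing) are parameterised, with subset relations tying each
stage’s parameters to the preceding ones. A Python sketch of the same
well-formedness check (ours; the four arguments are the parameter
sets of a Parameter instantiation):

# A sketch of the AddParameter well-formedness constraint over the
# four parameter sets; extparam and disparam are always required.
def add_parameter_ok(ext: set, ren: set, tes: set, dis: set) -> bool:
    if not (ext and dis):
        return False
    if not ren and not tes:
        return dis <= ext
    if not ren and tes:
        return ext <= tes and dis <= ext
    if ren and not tes:
        return ren <= ext and dis <= ren
    return tes <= ext and ren <= tes and dis <= ren

# e.g. rendering only: disseminate a subset of what was rendered
assert add_parameter_ok(ext={("usage", "*")}, ren={("usage", "*")},
                        tes=set(), dis={("usage", "*")})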
   ProZ was applied to our model to analyse privacy (in terms of
satisfying requirements). Specifically, the ProZ animator was used to
verify (on a step-by-step basis) that each state satisfied its
privacy requirements. To analyse all reachable states, ProZ’s model
and refinement checkers were used. However, a lower cardinality for
variable values (than that used with the animator) and a higher level
of abstraction for access control had to be applied to support a
tractable state space. The numbers of states and transitions covered
by ProZ without finding deadlocks, invariant violations or errors are
presented in Table 1. Furthermore, SystemI was considered to be a
trace refinement of SystemS by ProZ. The analysis found that
(irrespective of data-sharing situation), in the execution of
operations, the system transitioned to states that satisfied the
requirements for privacy. However, this assurance of privacy is
limited to the context (in terms of the state of a system): an
individual’s privacy could still be violated even though a system,
during the processing of their data, never transitioned to an
unwanted state (e.g., an insider using their system authorisations
for nefarious purposes or an attacker assuming the identity of an
authorised system user).

   Model      States     Transitions
   SystemS    614,385    2,382,109
   SystemI    29,225     114,867

Table 1: Total states and transitions covered by the ProZ model
checker.

   The assurance of privacy is highly dependent on the values that
are determined and set by a data controller for the attributes
associated with Privacy and Parameter. In assigning a permission
(that consists of a PID value) to a role, there has to be regard for
the instantiations of Privacy and Parameter that are associated with
the PID value to satisfy privacy requirements. To minimise the need
for manual intervention, information related to privacy (e.g., a data
subject being informed, sensitive personal data, or time) has to be
expressed or annotated in a manner that facilitates processing.
Interactions (in particular, those that are recurrent) with data
subjects in which a data controller incurs a temporal obligation
(e.g., data controller notifications, data subject requests or data
retention) have to be handled primarily by the system. In our model,
we assume that a data controller’s obligation is satisfied
instantaneously following a triggering system event.

9.   CONCLUSIONS
   We have shown, in the spirit of Tschantz and Wing’s
contribution [29], how the judicious use of a formal model can help
in thinking about privacy. In doing so, we have paid attention to a
particular problem: that of data-sharing.
   There are three areas of further research that we intend to pursue
in the near future. First, we will use the Z schema calculus to model
privacy in the context of systems of increasing complexity. Second,
we intend to construct formal models of threats. Finally, we intend
to model the aforementioned Privacy Rule of HIPAA, the privacy
requirements of which have been used by a number of authors to
validate proposed approaches in the context of data-sharing.

10.   REFERENCES
[1] J.-R. Abrial. The B-Book: Assigning Meanings to Programs. Cambridge University Press, 1996.
[2] I. Agrafiotis, S. Creese, M. Goldsmith, and N. Papanikolaou. Applying formal methods to detect and resolve ambiguities in privacy requirements. In S. Fischer-Hübner, P. Duquenoy, M. Hansen, R. Leenes, and G. Zhang, editors, Privacy and Identity Management for Life, volume 352 of IFIP Advances in Information and Communication Technology, pages 271–282, 2011.
[3] A. Barth, A. Datta, J. Mitchell, and H. Nissenbaum. Privacy and contextual integrity: Framework and applications. In Proceedings of the 2006 IEEE Symposium on Security and Privacy (SP 2006), pages 184–198. IEEE, 2006.
[4] D. E. Bell and L. J. La Padula. Secure computer systems: Mathematical foundations, MTR-2547, Vol. I. Technical Report ESD-TR-73-278-I, The MITRE Corporation, Bedford, March 1973.
[5] I. Brown. Britain’s smart meter programme: A case study in privacy by design. International Review of Law, Computers & Technology, 28(2):172–184, 2014.
[6] A. Cavoukian. Privacy by design: The 7 foundational principles. Information and Privacy Commissioner of Ontario, Canada, 2009.
[7] O. Chowdhury, A. Gampe, J. Niu, J. von Ronne, J. Bennatt, A. Datta, L. Jia, and W. H. Winsborough. Privacy promises that can be kept: A policy analysis method with application to the HIPAA Privacy Rule. In Proceedings of the 18th ACM Symposium on Access Control Models and Technologies (SACMAT 2013), pages 3–14. ACM, 2013.
[8] C. Davies and R. Collins. Balancing potential risks and benefits of using confidential data. The British Medical Journal, 333(7563):349–351, 2006.
[9] H. DeYoung, D. Garg, L. Jia, D. Kaynar, and A. Datta. Experiences in the logical specification of the HIPAA and GLBA privacy laws. In Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society (WPES 2010), pages 73–82. ACM, 2010.
[10] C. Dwork. Differential privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP 2006), part II, volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006.
[11] D. Ferraiolo, D. R. Kuhn, and R. Chandramouli. Role-Based Access Control. Artech House, 2003.
[12] X. Fu. Conformance verification of privacy policies. In M. Bravetti and T. Bultan, editors, Proceedings of the 7th International Workshop on Web Services and Formal Methods (WS-FM 2010), volume 6551 of Lecture Notes in Computer Science, pages 86–100. Springer, 2011.
[13] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):14:1–14:53, 2010.
[14] ISO/IEC. ISO/IEC 13568: Information Technology — Z Formal Specification Notation — Syntax, Type System and Semantics. ISO/IEC, 2002.
[15] D. Kouzapas and A. Philippou. A typing system for privacy. In S. Counsell and M. Núñez, editors, Proceedings of Software Engineering and Formal Methods (SEFM) 2013 Collocated Workshops, Lecture Notes in Computer Science, pages 56–68. Springer, 2014.
[16] D. Le Métayer. Privacy by design: A formal framework for the analysis of architectural choices. In Proceedings of the 3rd ACM Conference on Data and Application Security and Privacy (CODASPY 2013), pages 95–104. ACM, 2013.
[17] J. McLean. A comment on the “basic security theorem” of Bell and LaPadula. Information Processing Letters, 20(2):67–70, 1985.
[18] Q. Ni, E. Bertino, and J. Lobo. An obligation model bridging access control policies and privacy policies. In Proceedings of the 13th ACM Symposium on Access Control Models and Technologies (SACMAT 2008), pages 133–142. ACM, 2008.
[19] Q. Ni, D. Lin, E. Bertino, and J. Lobo. Conditional privacy-aware role based access control. In J. Biskup and J. López, editors, Proceedings of the 12th European Symposium On Research In Computer Security (ESORICS 2007), volume 4734 of Lecture Notes in Computer Science, pages 72–89. Springer, 2007.
[20] P. A. Norberg, D. R. Horne, and D. A. Horne. The privacy paradox: Personal information disclosure intentions versus behaviors. Journal of Consumer Affairs, 41(1):100–126, 2007.
[21] D. Plagge and M. Leuschel. Validating Z specifications using the ProB animator and model checker. In J. W. M. Davies and J. Gibbons, editors, Integrated Formal Methods, volume 4591 of Lecture Notes in Computer Science, pages 480–500. Springer, 2007.
[22] D. J. Power, M. A. Slaymaker, and A. C. Simpson. On formalizing and normalizing role-based access control systems. The Computer Journal, 52(3):305–325, 2009.
[23] L. Sankar, S. Rajagopalan, S. Mohajer, and H. Poor. Smart meter privacy: A theoretical framework. IEEE Transactions on Smart Grid, 4(2):837–846, 2013.
[24] C. Skinner. Statistical disclosure risk: Separating potential and harm. International Statistical Review, 80(3):349–368, 2012.
[25] D. J. Solove. Conceptualizing privacy. California Law Review, 90(4):1087–1155, 2002.
[26] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 2nd edition, 1992.
[27] S. Suriadi, C. Ouyang, J. Smith, and E. Foo. Modeling and verification of privacy enhancing protocols. In Proceedings of the 11th International Conference on Formal Engineering Methods (ICFEM 2009), volume 5885 of Lecture Notes in Computer Science, pages 127–146. Springer, 2009.
[28] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557–570, 2002.
[29] M. C. Tschantz and J. M. Wing. Formal methods for privacy. In A. Cavalcanti and D. Dams, editors, Proceedings of the 2nd World Congress on Formal Methods (FM 2009), volume 5850 of Lecture Notes in Computer Science, pages 1–15. Springer, 2009.
[30] J. C. P. Woodcock, P. G. Larsen, J. Bicarregui, and J. Fitzgerald. Formal methods: Practice and experience. ACM Computing Surveys, 41(4):19:1–19:36, 2009.