=Paper=
{{Paper
|id=Vol-1558/paper34
|storemode=property
|title=Formal Models for Privacy
|pdfUrl=https://ceur-ws.org/Vol-1558/paper34.pdf
|volume=Vol-1558
|authors=Alan Abe,Andrew Simpson
|dblpUrl=https://dblp.org/rec/conf/edbt/AbeS16
}}
==Formal Models for Privacy==
Formal Models for Privacy

Alan Abe and Andrew Simpson
Department of Computer Science, University of Oxford
Wolfson Building, Parks Road, Oxford OX1 3QD, United Kingdom
Alan.Abe@cs.ox.ac.uk, Andrew.Simpson@cs.ox.ac.uk

ABSTRACT
The concept of privacy is becoming increasingly important in all of our lives. Unfortunately, however, it is a rather nebulous concept; further, many claim that they consider privacy to be important, yet undertake behaviour that would suggest otherwise — the so-called privacy paradox. As technology becomes more pervasive, the need for assurances that individuals' privacy is not compromised by that technology necessarily increases. In this paper, we argue that formal methods have a role to play in helping to provide assurances of privacy in a variety of contexts. As an illustration, we give consideration to a particular scenario: that of data sharing.

Keywords
Formal methods; privacy; Z notation

© 2016, Copyright is with the authors. Published in the Workshop Proceedings of the EDBT/ICDT 2016 Joint Conference (March 15, 2016, Bordeaux, France) on CEUR-WS.org (ISSN 1613-0073). Distribution of this paper is permitted under the terms of the Creative Commons license CC-by-nc-nd 4.0.

1. INTRODUCTION
The ubiquity of computing technologies that can gather, store and share personal information has reinvigorated debates surrounding an age-old societal dilemma — balancing the notions of individual privacy and the common good, which can emerge from the knowledge gained by analysing information based not on the individual, but on the aggregation of information. For example, aggregating information collected in the course of patient care can give rise to study populations of sufficient size and heterogeneity in which well-designed secondary uses have the potential to investigate research questions that could not be pursued via the traditional route of randomised controlled trials. For it is through the understanding gained from such evidence-based knowledge of diseases and interventions that policies and strategies to afford effective protection to the health of communities and improve the quality of human life can emerge. However, there is the potential for harm should the inappropriate disclosure or use of such information compromise an individual's privacy.

Unfortunately, there is a great deal of 'fuzziness' surrounding privacy. First, defining the term is notoriously hard [25]. Second, individuals will often claim that privacy is important to them, yet, on the other hand, will exhibit behaviours that would indicate that the opposite is true: the so-called privacy paradox [20]. Third, the disconnect between high-level requirements, rules and guidelines and their low-level implementations can be significant — meaning that it is often difficult for software engineers and database administrators to be confident that their processes and systems behave as intended. Fourth, there are often trade-offs — between privacy and utility, for example — to be made. It is clear, therefore, that the appropriate use of formal methods has much to offer in this context.

In [29], Tschantz and Wing argue that "privacy raises new challenges, and thus new research opportunities, for the formal methods community". "Privacy-specific needs" — such as statistical / quantitative reasoning and conflicting requirements in the context of trustworthy computing — are identified, with "traditional tools of the trade" — such as formal policy languages, abstraction and refinement, and code-level analysis — suggested as solutions. They further argue: "It is our responsibility as scientists and engineers to understand what can or cannot be done from a technical point of view on privacy ... Otherwise, society may end up in a situation where privacy regulations put into place are technically infeasible to meet" [29]. We pick up that baton and give consideration to how formal methods can be applied to the modelling and analysis of privacy in the context of data-sharing. (For clarification, by 'formal methods' we mean the application of techniques such as Z [14] and B [1], as opposed to the consideration of models for protecting privacy such as k-Anonymity [28] and Differential Privacy [10].) There is a long tradition of formal methods being applied to 'real world' problems (see, for example, the survey of [30]); this contribution is in that spirit.

To support reasoning about privacy, we consider a broad range of system–environment interactions that can cause a system to transition to an unwanted state. We use Z [14] to describe the model and ProZ [21] to analyse it. Z semantics, which are based on logic and set theory, enable privacy to be modelled from the perspective of "data in a system" [26]. ProZ mechanically validates Z specifications: preservation of an invariant (as related to privacy) and the refinement of one specification (of privacy) by another. Thus, our metric for privacy is a binary result (rather than being probabilistic). We aim to address the gap between the high-level requirements (derived from, for example, privacy laws and regulations) that are described in natural language and their implementations. We utilise the UK's Data Protection Act 1998 (DPA)¹ and the approaches to protecting data privacy that have been proposed in the literature to inform our interpretation of privacy. It is important to note that we do not define privacy per se; rather, we show how a formal approach can be used to characterise it.

¹ http://www.legislation.gov.uk/ukpga/1998/29/data.pdf
2. MOTIVATION
Advances in IT have motivated the promulgation of laws and regulations aimed at strengthening privacy protections for the handling of personal data. Implications for secondary uses (in particular, in the area of medical research) due to the aforementioned DPA in the UK and the Privacy Rule under the Health Insurance Portability and Accountability Act of 1996 (HIPAA)² in the USA have been debated (see, for example, [8]). Despite stricter privacy protections, the DPA and HIPAA do allow for circumstances in which personal data may be used or shared without the consent of individuals to include the appropriate de-identification of data. The DPA regulates the processing of personal data in their entirety, of which data de-identification is just one aspect. The implication is that the scope of a data-owner's obligation to privacy protection is not limited to the de-identified data themselves, but also includes their processing.

² http://www.hhs.gov/ocr/privacy/

De-identification is the path envisioned by many to satisfy privacy requirements, and numerous privacy-preserving methods founded on varying conceptualisations of privacy and utility of de-identified data have been proposed (see, for example, [13]). Empirical validations of these methodologies are typically based on a high-level abstraction of the underlying systems that would perform the processing — often, centrally by a single data controller. Further, considerations of threats have typically been underpinned by the following characterisation: intruders (characterised as a "(hypothetical) user who 'misuses' the outputs" to "disclose information about one of the data subjects" [24]) obtain de-identified data, obtain relevant auxiliary information, and then leverage the auxiliary information to compromise the privacy of data subjects. However, processing of data is often distributed across multiple systems and organisations, and, in some cases, across national boundaries. This complexity can introduce uncertainties as to whether a particular privacy-preserving method can be implemented in a manner that maintains confidentiality and privacy. Instead of attacking the shared de-identified data, an intruder may find it 'easier' to attack the systems or people that produce them. Thus, we give consideration to how formal models can help in the data release process.

3. RELATED WORK
Contributions that have applied formal approaches to privacy include: [27], which is concerned with the modelling and verification of privacy-enhancing protocols; [2], which is concerned with detecting and resolving ambiguities in privacy requirements; [12], which is concerned with the verification of privacy policies; [16], which presents a formal framework for privacy by design [6]; and [15], which presents a typing system for privacy. Logic-based techniques have been applied to, in particular, the Privacy Rule of HIPAA 1996 [3, 7, 9]. Elements of a permission (action and resource) for role-based access control (P-RBAC) have been combined with elements of privacy (e.g., purpose, conditions, and obligations) in [18, 19]. Our contribution uses a formal method to characterise and reason about privacy (requirements based on laws and regulations) in the context of its characterisation in terms of, for example, design or code, thereby bridging the gap between abstract notions and concrete representations of privacy.
4. DISCLOSURE-PROCESSING
Our modelling of privacy is motivated by the disclosure of data for secondary uses in which several privacy-preserving methods have been proposed. The processing of personal data begins with their extraction from a source (or sources), and ends with their release in an appropriate form.

Figure 1: An abstraction of disclosure-processing.

We initially abstract disclosure-processing as a unitary system of related processes: we assume processes are instantiated singularly and internally in relation to a data controller. It may be argued that such a unitary system is unrealistic — in that it is unlikely that a relatively simplistic system would be able to effectively or efficiently process large amounts of data, dispersed data, or data involving the collaboration of several data controllers. However, its simplicity allows us to establish an abstraction that we can use to model processing; further, it serves as a starting point for subsequently thinking about more complicated situations. Our system is comprised of five processes (as per Figure 1):

1. Parameterisation (PAR). A data controller determines the parameters that guide each aspect of processing: extraction (extparam), rendering (renparam), testing (tesparam), and dissemination (disparam). In certain instances (such as an interactive mode of disclosure), a data controller may permit, on a controlled basis, a data user to determine certain parameters. If auxiliary information is used in the evaluation of data, a data controller creates or obtains the relevant information. The primary inputs are the processing parameters and, where appropriate, auxiliary information that are also its outputs to other processes.

2. Extraction (EXT). From a designated source, data that possess certain characteristics or the results of an applied workload are extracted. The primary inputs are the personal data and extparam to control the extraction process that may include the location of the data source, the characteristics of the data to be extracted, the workload to be applied, or the method of extraction. The primary outputs are the personal data that have been extracted (extdata) or the results of a workload (extracted information) that has been applied to the personal data.
3. Rendering (REN). The extracted data or information may then be transformed via a rendering method into a form determined by a data controller to be appropriate for release. The primary inputs are the extdata or extracted information and renparam to control the rendering process that may include the method of rendering, the intensity of the method's application, or the data characteristics involved. The primary outputs are the data (rendata) or information that have been rendered into a different form.

4. Testing (TES). Data or information may be evaluated via the employment of certain tests (pertaining to privacy and/or data utility) as determined by a data controller. Such testing can support a data controller in making a reasoned decision about the release of data for dissemination. The primary inputs are the data (extdata or rendata) or information (extracted or rendered) to be evaluated, additional data (where appropriate) to support the evaluation (auxiliary information or extdata), and tesparam to control the testing process that may include the testing methods, the metrics, the data characteristics involved, or the use of other data. The primary outputs are the results of the tests that have been applied to the data or information. Tests of privacy may involve the use of auxiliary information, in particular, to characterise the risk of re-identification based on the likelihood of establishing links between the auxiliary information and the rendata; tests of data utility may involve the use of extdata in a comparison with the rendata in which the same workload is performed on both.

5. Dissemination (denoted DIS). Data or information that are considered appropriate for release are then disseminated via a method determined by a data controller. The primary inputs are the data (extdata or rendata) or information (extracted or rendered) to be disseminated and disparam to control the dissemination, which may include the mode of dissemination or the destination for the transmission of the data.

The rendering and testing processes can be repeated until the form in which the data are rendered satisfies the criteria for their release. This can be achieved by using different rendering methods, the same method with different parameters, or different evaluation criteria. If it is determined that the rendering of data into an acceptable form is unattainable, processing can be terminated without dissemination.
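To make the pipeline concrete, the following is a minimal Python sketch of the five-process abstraction described above. It is our illustration rather than anything defined in the paper: the record structures, the attribute-suppression "rendering" and the subset-based "test" are assumptions chosen only to show how PAR, EXT, REN, TES and DIS fit together, including the render/test loop and termination without dissemination.

from dataclasses import dataclass, field

@dataclass
class Params:
    extparam: set = field(default_factory=set)   # controls extraction
    renparam: set = field(default_factory=set)   # controls rendering
    tesparam: set = field(default_factory=set)   # controls testing
    disparam: set = field(default_factory=set)   # controls dissemination

def extract(source, params):
    # EXT: keep only the attributes named in extparam for each record
    return [{a: r[a] for a in r if a in params.extparam} for r in source]

def render(extdata, params):
    # REN: here, rendering simply suppresses the attributes named in renparam
    return [{a: v for a, v in r.items() if a not in params.renparam} for r in extdata]

def test(rendata, params):
    # TES: a toy privacy test -- no released attribute may appear in tesparam
    return all(a not in params.tesparam for r in rendata for a in r)

def disseminate(rendata, params):
    # DIS: release the data (here, simply return them)
    return rendata

def disclosure_processing(source, params, max_rounds=3):
    # PAR has already fixed `params`; rendering/testing may be repeated, and
    # processing terminates without dissemination if no acceptable form is found
    extdata = extract(source, params)
    rendata = extdata
    for _ in range(max_rounds):
        rendata = render(rendata, params)
        if test(rendata, params):
            return disseminate(rendata, params)
    return None  # terminated without dissemination

# Hypothetical usage: extract postcode and consumption, suppress postcode, release
source = [{"name": "Alice", "postcode": "OX1", "kwh": 42}]
params = Params(extparam={"postcode", "kwh"}, renparam={"postcode"},
                tesparam={"name", "postcode"})
print(disclosure_processing(source, params))  # [{'kwh': 42}]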
5. A SCENARIO
Our fictional scenario is motivated by the UK's implementation of a smart meter programme. In assessing the pertinent issues, Brown [5] argues that: "Because smart meters can collect and share detailed information about energy use and hence household life, their impact on privacy has become a high-profile matter of interest to energy and privacy regulators, and to privacy campaigners, journalists, and members of the public" [5]. It is an appropriate scenario for illustrating our approach for a variety of reasons. First, certain smart data may be considered to be personal data. Second, there is significant potential for secondary uses of the data. Third, it is an active area of privacy research [23].

We consider the disclosure-processing of customers' personal information and utility consumption data collected via smart meters. To this end, we make the following assumptions. First, smart meters are capable of gathering fine-grained information about a customer's consumption of utilities. Second, smart meters support two-way communications: data gathered by a smart meter can be transmitted to a utility provider and a utility provider can send data (e.g., instructions) to the device. Third, it has been determined that smart meter readings are to be handled as personal data. Finally, a customer's personal information and smart meter readings are held in accordance with legal and regulatory requirements.

Customers of Smart Meter Utilities (SMU) can subscribe to one or more of electricity, gas and water. Customers' consumption data are collected via smart meters that are capable of collecting data about consumption by type of utility, by type of device, and by time. Periodically, SMU sends customers consumption information, charges that are associated with their consumption, and the amount to be deducted from their bank accounts (by contractual agreement). Customers can phone SMU's customer service centre to ask questions about offered services, their accounts, or to report problems. For a reported problem, a technician is dispatched to carry out repairs. Marketing information about products and services is sent to customers. There is keen interest from government, academia, non-profit organisations and utility-related businesses to analyse SMU's utility consumption data.
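As a purely illustrative aside (not part of the paper), one way the fine-grained readings described above might be structured is sketched below; the field names and example values are assumptions.

from dataclasses import dataclass
from datetime import datetime

@dataclass
class MeterReading:
    customer_id: str
    utility: str        # e.g. "electricity", "gas" or "water"
    device: str         # e.g. "washing machine"
    timestamp: datetime
    consumption: float  # units depend on the utility (e.g. kWh, m^3)

reading = MeterReading("C-001", "electricity", "washing machine",
                       datetime(2016, 3, 15, 18, 30), 0.7)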
6. THE UK'S DATA PROTECTION ACT
Privacy regulations differ across the globe, typically influenced by political and cultural factors. They are often imposed in reaction to pressures from the populace and thereby strongly influence the practices of data controllers in the handling of personal data. The 'appropriateness' (or lack thereof) of privacy measures undertaken by a data controller will often be litigated.

We consider the UK's Data Protection Act (DPA) as the legal framework to guide our work. Privacy, however, is neither defined nor explicitly characterised as such in the DPA. As exemptions from and contraventions to the Act are specified (primarily) in terms of the data protection principles (DPPs)³ and provisions related to the rights of data subjects,⁴ it may be argued that the DPPs and rights of data subjects can inform our interpretation of the DPA (in terms of privacy). The DPPs, as guiding principles for data controllers, prescribe that personal data shall be:⁵

1. processed fairly and lawfully;
2. obtained for specified and lawful purposes and further processed only in a manner compatible with those purposes;
3. adequate, relevant and not excessive for the purposes for which they are processed;
4. accurate and kept up to date (where necessary);
5. kept no longer than is necessary for the purposes for which they are processed;
6. processed in accordance with the rights of data subjects that have been stipulated in the enactment;
7. protected by the use of appropriate measures (technical and organisational); and
8. restricted to being processed in the European Economic Area unless adequate protections for the rights and freedoms of data subjects can be ensured.

³ DPA 1998, Sch 1.
⁴ DPA 1998, Pt II.
⁵ DPA 1998, Sch 1, Pt I, para 1 to 8.

Figure 2: Mapping relevant DPPs. (The figure relates the 1st and 2nd DPPs to Privacy, the 3rd and 8th DPPs to Parameter, and the 5th, 6th and 7th DPPs to System.)

Responsibility resides with a data controller, "who (either alone or jointly or in common with other persons) determines the purposes for which and the manner in which any personal data are, or are to be, processed".⁶ It is with regard to purposes (which we assume are to be specified and lawful) that compliance with most of the DPPs hinges: 'fair' and 'lawful' processing, in particular, whether data subjects have been informed or provided consent where appropriate (related to the 1st DPP); obtaining or further processing of data (the 2nd); characteristics of the data (the 3rd); accuracy (the 4th); retention period (the 5th); and violations of the rights of data subjects (the 6th).

⁶ Data Protection Act 1998, pt I, s 1(1).

Provisions related to the rights of data subjects (in Part II) delineate certain actions of stakeholders and prescribe conditions under which they should or should not occur. Consequently, we can abstract them in terms of the behaviours of (or interactions with) a system and represent prescribed conditions as constraints on those behaviours (or interactions). The aspects of interactions in common are: a triggering event (e.g., a notification in writing or the processing of personal data) initiated by either a data subject or a data controller; a transfer of information (that is abstracted as a data object) in which its content and form are prescribed; and prescribed conditions under which a particular action is or is not performed, such as the amount of time (to include a start and duration) a data controller has to respond to a request for information from a data subject.

The de-identification of personal data is not explicitly addressed in the DPA (although a code of practice on anonymisation exists⁷). The crux of the DPA, it may be argued, is the notion of personal data, and, as such, the applicability of the enactment to data depends on whether the data in question are considered to be personal data. It follows that the inflection point at which personal data are no longer subject to the DPA is, then, the point at which the data are processed into a form that is no longer considered to be 'personal'. We can surmise that data can be considered 'non-personal' when data subjects can no longer be identified from the data themselves or in combination with other information likely to be obtained by a data controller.⁸

⁷ https://ico.org.uk/media/for-organisations/documents/1061/anonymisation-code.pdf
⁸ Data Protection Act 1998, pt I, s 1(1).

Our shared interest with data controllers and software engineers is in developing an approach to support reasoning about the processing of personal data by a system in a required manner (such as in accordance with the DPA). We next consider the application of Z [14] to model a system associated with the release of data in accordance with its encompassing regulatory environment.
7. A FORMAL MODEL

7.1 A Notion of Privacy
We consider informational privacy — whereby individuals control information about themselves and are able to determine how it is communicated. We interpret (on the basis of context) the determination by a data controller of 'purposes' and 'manner' as specifications. We consider purposes to be the aspect of privacy that its (privacy) specification hinges on; the manner is combined with purposes to delineate processing with sufficient granularity. In Figure 2, the DPPs (except for the 4th, which we consider as an assumption) are mapped to their representation in our model. Implementations of Privacy and Parameter are represented as schemas. The System is associated with either functional or structural aspects of our model, in particular: access control (related to the 7th DPP); retention of the data that are produced (related to the 5th DPP); and instantiations of Privacy and Parameter.

In relation to a purpose, the 1st DPP (and related aspects of the DPA) provide us with two other types upon which to construct a notion of Privacy: a data subject being informed and the provision of consent by, or on behalf of, a data subject. We start by introducing these types.

[Purpose, Inform, Consent]

We consider purpose to mean "the reason for which something is done"⁹ and we interpret 'something' to mean the disclosure-processing of personal data. We assume that, for individuals to have consented, they would have had to have been adequately informed (about the processing) prior to its provision. We do not distinguish between a consent that has been explicitly or implicitly given, but, rather, that a data controller has determined a data subject has consented to certain processing. The granularity of the characterisation of each variable should be sufficiently descriptive as to provide, in particular, data subjects with an appropriate level of understanding of: the reasons that underpin the processing of their data; what a data subject has been informed about in relation to processing; and what they have consented to be done with their data. The higher the granularity with which Purpose, Inform and Consent are described by a data controller, the more likely it is that processing will be consistent with legal and regulatory requirements for privacy.

⁹ www.oxforddictionaries.com/definition/english/purpose

We consider a parameter to represent "a numerical or other measurable factor forming one of a set that defines a system or sets the conditions of its operation"¹⁰. In particular, we consider a parameter to be the condition under which processing is carried out: a parameter is determined in relation to a particular process. This corresponds to the primary outputs of the PAR process of Section 4. We abstract a parameter as a tuple formed by elements: the type of information (Attribute) and its value (Data).

[Attribute, Data]

¹⁰ www.oxforddictionaries.com/definition/english/parameter

The types of information that may be described as parameters are as follows.

• Data recipients are persons or entities to whom the data are to be shared. We do not consider a person who is able to access the data during processing as a recipient, but, instead, as a user in the context of an access control mechanism.

• Data characteristics are considered as inputs to, and outputs of, a particular process. From an input, a process under certain conditions then produces an output. The data characteristics of the required output are described implicitly as a set of conditions on the process. As sensitive personal data pertain to certain types of information (e.g., racial or ethnic origin), we assume that the data have been recorded in a manner that facilitates their identification by a process.

• Processing methods are mechanisms by which the data are processed. Where there are choices, a data controller determines the mechanism or multiple mechanisms that are applied during processing.

• Data privacy and utility are abstracted on the basis of the processing conditions that have produced the data or the results of tests applied to the data. The requirements for privacy and utility of de-identified data are described as parameters such that: processing produces data in which certain attributes are absent (e.g., the Safe Harbor method¹¹); an appropriate (as determined by a data controller) privacy (and/or utility) parameter (e.g., k = 10) related to a rendering algorithm (e.g., k-Anonymity [28]) is satisfied during processing; or results of data testing (privacy and/or utility) satisfy criteria (as determined by a data controller).

¹¹ Privacy Rule of HIPAA 1996 §164.514(b)(2).
7.2 A Construct of Privacy
Disclosure-processing begins with the extraction of personal data from a source and ends with or without their disclosure. This sequence of events (or processes), from a beginning to an end, constitutes an instance of processing. Then, privacy (from beginning to end) ought to be defined in relation to each instance of processing. Our notion of privacy is based on three factors related to: the privacy rights that are afforded to data subjects by laws and regulations; the manner in which personal data are processed; and the data that are produced as the result of processing.

We abstract privacy as a set of constraints on a system to keep it from transitioning to an unwanted state (that violates privacy). To formally represent these constraints, we consider: Purpose, Inform and Consent to abstract the rights of data subjects; the notion of Parameter to abstract the manner in which data are processed; and, from our abstraction of disclosure-processing, the data that are produced as extdata and rendata. To this end, we introduce the schema Privacy:

Privacy ≙ [ purpose : Purpose; inform : Inform; consent : Consent ]

In Section 7.1, we abstracted parameters as sets of tuples formed by two basic types: Attribute and Data. Consequently, parameters are captured by the schema Parameter:

Parameter ≙ [ extparam, renparam, tesparam, disparam : ℙ (Attribute × Data) ]

Our construct (with respect to a disclosure-processing instance) is formed by linking Privacy and Parameter instances, as well as the data (extdata and rendata), via a process identifier (denoted PID). The same PID links the privacy construct to an access control mechanism. Thus, the privacy construct and its implementation have to be considered in context.
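The following Python sketch is our own illustration (not the authors' Z) of this construct: a Privacy instance, a Parameter instance and any produced data are tied together by a shared PID, mirroring the partial functions introduced in Section 7.3 below. The concrete types and example values are assumptions.

from dataclasses import dataclass, field
from typing import Dict, Set, Tuple

AttributeData = Tuple[str, str]  # an Attribute-Data pair

@dataclass
class Privacy:
    purpose: str
    inform: str
    consent: str

@dataclass
class Parameter:
    extparam: Set[AttributeData] = field(default_factory=set)
    renparam: Set[AttributeData] = field(default_factory=set)
    tesparam: Set[AttributeData] = field(default_factory=set)
    disparam: Set[AttributeData] = field(default_factory=set)

# Partial functions PID -> Privacy / Parameter / data, as in the schema System
privacy: Dict[int, Privacy] = {}
parameter: Dict[int, Parameter] = {}
extdata: Dict[int, Set[AttributeData]] = {}
rendata: Dict[int, Set[AttributeData]] = {}

# One disclosure-processing instance, identified by PID 1 (hypothetical values)
privacy[1] = Privacy(purpose="billing", inform="written notice", consent="contract")
parameter[1] = Parameter(extparam={("utility", "electricity"), ("period", "monthly")},
                         disparam={("utility", "electricity")})
# Data later produced for this instance would be recorded under the same key,
# e.g. extdata[1] = {("utility", "electricity"), ("kwh", "42")}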
7.3 A System
Our model is in three parts: an access control mechanism to support an implementation of privacy (ACCore); the source of personal data that are to be processed for release (DataStore); and the relation of a process identifier to an instantiation of the privacy construct and the data produced during processing.

As privacy can be characterised as a constraint on processing and security is a constraint on the access to a process, an access control mechanism can provide a platform for the implementation of privacy. We extend the formal model of role-based access control [11] of [22] by incorporating the notion of privacy as an element of a permission. We consider three basic types: Action, Resource, and a set of unique identifiers, PID:

[ Action, Resource, PID ]

Permissions combine elements of these types:

Perm == Action × Resource × PID

We introduce two further types: User and Role.

[ User, Role ]

The schema ACCore captures an RBAC policy:

ACCore ≙ [ user : ℙ User; role : ℙ Role; perm : ℙ Perm;
           ur : User ↔ Role; rp : Role ↔ Perm; up : User ↔ Perm |
           ur ∈ user ↔ role ∧ rp ∈ role ↔ perm ∧ up = ur ⨾ rp ]

Here: user, role and perm represent the current set of users, the current set of roles and the current set of permissions, respectively; ur captures associations of users with assigned roles; rp captures associations of roles with assigned permissions; and up captures associations of users with their permissions. For a user to have access (via a permission) to an aspect of the system, the user has to have been assigned a role that is associated with that permission.

Alice, in her role as a data controller, has been assigned the permission (add, par, nullpid). She can access the PAR process and add instantiations of Privacy to the system. John, in his role as a Customer Service Representative, has been assigned the permission (run, ext, nullpid). To perform his duties, he is able to access most of a customer's personal data. Let us assume that Alice has defined an instantiation of the schema Privacy for the disclosure-processing of billing information that she has associated with a PID of 1. Although John is able to access most of a customer's personal data, he is not able to run the EXT process based on parameters where PID = 1.

Personal data (in the form of microdata) that have been obtained and recorded are often held in a relational database. A particular characteristic of an individual is abstracted as an Attribute–Data pair. Our abstraction of a source of personal data for disclosure-processing is twofold: a relational database as a table of data (denoted table); and the processed data (denoted shared) as sets of Attribute–Data pairs. This is collected together in the schema DataStore:

DataStore ≙ [ table : ℙ (Attribute ⇸ Data); shared : ℙ (Attribute × Data) |
              ∀ t1, t2 : table • dom t1 = dom t2 ∧
              ∀ s : shared • ∃ f : table • s ∈ f ]

Here, the information pertaining to data subjects is abstracted as rows of data (in which table represents the totality of data being held by a data-owner) and the data that are to be processed (shared) have been drawn from table.

A PID value is mapped to an instantiation of Privacy. The same PID value is used to define a permission for access control that is associated with an instance of processing. This same PID value is mapped to an instantiation of Parameter that has been defined by a data controller. Then, processes produce data (extdata and/or rendata) based on Privacy and Parameter that are related to the same PID value. The data are added to the system as a binary relation using the same PID value.

Our system comprises the schema ACCore, the schema DataStore, and functions mapping elements of PID to elements of Privacy, Parameter, extdata and rendata, respectively. (For the sake of brevity, we omit constraints.)

System ≙ [ ACCore; DataStore;
           privacy : PID ⇸ Privacy;
           parameter : PID ⇸ Parameter;
           extdata : PID ⇸ ℙ (Attribute × Data);
           rendata : PID ⇸ ℙ (Attribute × Data) ]

It is assumed that extdata and rendata have been produced from shared, and a particular instantiation of Parameter and data that are produced during processing (extdata and rendata) are tied to a particular instantiation of Privacy (via a PID).

Assume that Alice has added an instantiation of Privacy (where PID = 1) to the system for billing. The EXT and DIS processes are involved in disclosure-processing. Further assume that Alice has used the same process identifier to add to the system, as an instantiation of Parameter (where PID = 1), the parameters to constrain disclosure-processing. John has been assigned the (run, ext, nullpid) permission, which allows him access to more personal data about a customer than would be required to process data for billing. Based on this, Alice has decided that John can assume the added responsibility for billing. Although John would gain access to an additional process (DIS), Alice believes that the permission would be sufficiently restrictive. A new role is created for billing and is assigned the permissions (add, ext, 1) and (run, dis, 1). This new role is then assigned to John. Now let us assume that users are granted access to processes (or resources) only if permissions that have been assigned to them (via roles) match criteria that have been defined for those processes. For billing, John would be allowed to: access the EXT process; from a source, extract the relevant personal data based on constraints prescribed by Privacy (where PID = 1) and Parameter (where PID = 1); add the output of extraction to the system in which PID = 1 is mapped to extdata; then, access the DIS process; and disseminate the extdata, based on constraints that are prescribed by Parameter (where PID = 1). Here, John is able to run the DIS process only if the appropriate extracted data (extdata) have been added to the system (where PID = 1). The system will not allow John to alter instantiations of Privacy and Parameter that are associated with a process, nor the data that are produced.
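A small executable sketch of the access-control reading of this example may help; it is our illustration rather than the authors' Z or ProZ model. Permissions are (action, resource, pid) triples, up is the composition of ur and rp as in ACCore, and strict matching of the PID element of a permission is an assumption of the sketch.

NULLPID = 0  # stands in for nullpid

# Role assignments (ur) and permission assignments (rp), following the example above
ur = {("Alice", "DataController"), ("John", "CustomerServiceRep"), ("John", "Billing")}
rp = {
    ("DataController", ("add", "par", NULLPID)),
    ("CustomerServiceRep", ("run", "ext", NULLPID)),
    ("Billing", ("add", "ext", 1)),
    ("Billing", ("run", "dis", 1)),
}

# up = ur ; rp (relational composition)
up = {(u, p) for (u, r1) in ur for (r2, p) in rp if r1 == r2}

def may(user, action, resource, pid):
    # A user may perform an action on a resource for a given processing instance
    # only if a matching permission has been assigned via one of their roles.
    return (user, (action, resource, pid)) in up

print(may("John", "run", "dis", 1))         # True: the billing role carries (run, dis, 1)
print(may("John", "run", "ext", 1))         # False: (run, ext, nullpid) does not match PID 1
print(may("Alice", "add", "par", NULLPID))  # True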
8. ANALYSIS
A system interacts with its environment, which may involve users of, and threats to, it. Interactions between a system and these potential actors (directly or indirectly via external processes) can cause a system to transition from one state to another. As a starting point for modelling systems of increasing complexity, we consider a data-sharing situation in which the state-space is tractable.

We have given consideration to our smart meter scenario (in the context of data-sharing), the DPA (for privacy requirements), and our Z-based model of a system — both its (privacy) specification (denoted SystemS) and its implementation (denoted SystemI), with SystemI being a more concrete representation of a system congruent with the specification. In addition, we adapt the Basic Security Theorem of [4] (while being aware of its limitations, as discussed in, for example, [17]) so that: if a system satisfies our privacy requirements in its initial state, and there is a guarantee that every transition also ensures that our privacy requirements are met, then we may conclude that all states will satisfy our privacy requirements.

We abstracted the processing of personal data for disclosure in terms of the schema System, before and after an execution of a process (represented as an operation). We represented 26 such operations in terms of Z. Operations were of two types: those that affected the variables in the schema System and those that did not. In relation to requirements, predicates were defined for operations such that privacy was assured. By varying the values of inputs, the same operation was used to model multiple data-sharing situations.

As an example, consider the operation TestRenderData.

TestRenderData ≙
  [ ΞSystem;
    i? : PID; u? : User; b! : Boolean; t? : ℙ (Attribute × Data) |
    i? ≠ nullpid ∧
    i? ∈ {dom privacy, dom parameter, dom extdata} ∧
    (parameter(i?)).tesparam ≠ ∅ ∧
    (run, tes, i?) ∈ up(| {u?} |) ∧
    t? ≠ ∅ ∧
    t? ⊆ (parameter(i?)).tesparam ∧
    rendata(i?) ⊆ t? ⇒ b! = T ∧
    ¬ (rendata(i?) ⊆ t?) ⇒ b! = F ∧
    privacy′ = privacy ∧ parameter′ = parameter ∧
    extdata′ = extdata ∧ rendata′ = rendata ]

This tests the rendered data, leaving the state of the system unchanged. We assume that testing is applied to data to be disseminated. Preconditions to test rendata are: the PID value is not null; the PID value appears in the domains of privacy, parameter and rendata; in relation to the instantiation of Parameter, parameters for testing (tesparam) have been defined; and, in relation to the PID value, a user is permitted to perform the action (to run the test) and access the resource (TES process). In relation to the PID value associated with the operation, the rendata are tested based on parameters constrained by tesparam. We abstract testing as a comparison between the data being tested and parameters. Based on the result of testing the data, a user may decide whether or not to proceed with dissemination. (We assume a type Boolean, with two elements: T and F.)
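The style of assurance described in this section (an initial state that satisfies the requirements, together with operations that preserve them, implies that all reachable states satisfy them) can be illustrated with the following toy sketch. It is our own simplification, with an invented two-field state and a consent-based requirement, and is not the ProZ analysis itself.

def requirement(state):
    # Privacy requirement (assumed): data may only be marked released once consent
    # has been recorded for the corresponding PID.
    return all(pid in state["consented"] for pid in state["released"])

def op_record_consent(state, pid):
    return {"consented": state["consented"] | {pid}, "released": set(state["released"])}

def op_release(state, pid):
    # Guarded operation: only release data for a PID with recorded consent
    if pid not in state["consented"]:
        return None  # operation not enabled in this state
    return {"consented": set(state["consented"]), "released": state["released"] | {pid}}

def check_all_reachable(initial, pids):
    # Explore every reachable state and confirm the requirement holds in each one
    assert requirement(initial)
    seen, frontier = [], [initial]
    while frontier:
        state = frontier.pop()
        if state in seen:
            continue
        seen.append(state)
        assert requirement(state), f"requirement violated in {state}"
        for pid in pids:
            for op in (op_record_consent, op_release):
                nxt = op(state, pid)
                if nxt is not None and nxt not in seen:
                    frontier.append(nxt)
    return len(seen)

print(check_all_reachable({"consented": set(), "released": set()}, pids=[1, 2]))  # 9 states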
SystemS was the basis from which we derived an implementation (or less abstract representation) of privacy. As an example, consider the operation AddParameter. The specification is the addition of the schema Parameter to the system. To limit the number of distinct combinations of parameters in the implementation, we constrained the operation further. (Again, for the sake of brevity, we provide the constraints in SystemI, but not in SystemS.)

AddParameter ≙
  [ ((p?.extparam ≠ ∅ ∧ p?.disparam ≠ ∅) ∧
     ((p?.renparam = ∅ ∧ p?.tesparam = ∅ ∧
       p?.disparam ⊆ p?.extparam) ∨
      (p?.renparam = ∅ ∧ p?.tesparam ≠ ∅ ∧
       p?.extparam ⊆ p?.tesparam ∧ p?.disparam ⊆ p?.extparam) ∨
      (p?.renparam ≠ ∅ ∧ p?.tesparam = ∅ ∧
       p?.renparam ⊆ p?.extparam ∧ p?.disparam ⊆ p?.renparam) ∨
      (p?.renparam ≠ ∅ ∧ p?.tesparam ≠ ∅ ∧
       p?.tesparam ⊆ p?.extparam ∧ p?.renparam ⊆ p?.tesparam ∧
       p?.disparam ⊆ p?.renparam))) ]

ProZ was applied to our model to analyse privacy (in terms of satisfying requirements). Specifically, the ProZ animator was used to verify (on a step-by-step basis) that each state satisfied its privacy requirements. To analyse all reachable states, ProZ's model and refinement checkers were used. However, a lower cardinality for variable values (than that used with the animator) and a higher level of abstraction for access control had to be applied to support a tractable state space. The number of states and transitions covered by ProZ without finding deadlocks, invariant violations or errors is presented in Table 1. Furthermore, SystemI was considered to be a trace refinement of SystemS by ProZ.

Table 1: Total states and transitions covered by the ProZ model checker.
Model    | States  | Transitions
SystemS  | 614,385 | 2,382,109
SystemI  | 29,225  | 114,867

The analysis found that (irrespective of data-sharing situation), in the execution of operations, the system transitioned to states that satisfied requirements for privacy. However, this assurance of privacy is limited to the context (in terms of the state of a system): an individual's privacy could still be violated even though a system, during the processing of their data, never transitioned to an unwanted state (e.g., an insider using their system authorisations for nefarious purposes or an attacker assuming the identity of an authorised system user).

The assurance of privacy is highly dependent on the values that are determined and set by a data controller for the attributes associated with Privacy and Parameter. In assigning a permission (that consists of a PID value) to a role, there has to be regard for the instantiations of Privacy and Parameter that are associated with the PID value to satisfy privacy requirements. To minimise the need for manual intervention, information related to privacy (e.g., a data subject being informed, sensitive personal data, or time) has to be expressed or annotated in such a manner that facilitates processing. Interactions (in particular, those that are recurrent) with data subjects in which a data controller incurs a temporal obligation (e.g., data controller notifications, data subject requests or data retention) have to be handled primarily by the system. In our model, we assume that a data controller's obligation is satisfied instantaneously following a triggering system event.

9. CONCLUSIONS
We have shown, in the spirit of Tschantz and Wing's contribution [29], how the judicious use of a formal model can help in thinking about privacy. In doing so, we have paid attention to a particular problem: that of data-sharing.

There are three areas of further research that we intend to pursue in the near future. First, we will use Z schema calculus to model privacy in the context of systems of increasing complexity. Second, we intend constructing formal models of threats. Finally, we intend modelling the aforementioned Privacy Rule of HIPAA, the privacy requirements of which have been used by a number of authors to validate proposed approaches in the context of data-sharing.

10. REFERENCES
[1] J.-R. Abrial. The B-Book: Assigning Meanings to Programs. Cambridge University Press, 1996.
[2] I. Agrafiotis, S. Creese, M. Goldsmith, and N. Papanikolaou. Applying formal methods to detect and resolve ambiguities in privacy requirements. In S. Fischer-Hübner, P. Duquenoy, M. Hansen, R. Leenes, and G. Zhang, editors, Privacy and Identity Management for Life, volume 352 of IFIP Advances in Information and Communication Technology, pages 271–282, 2011.
[3] A. Barth, A. Datta, J. Mitchell, and H. Nissenbaum. Privacy and contextual integrity: Framework and applications. In Proceedings of the 2006 IEEE Symposium on Security and Privacy (SP 2006), pages 184–198. IEEE, 2006.
[4] D. E. Bell and L. J. La Padula. Secure computer systems: Mathematical foundations, MTR-2547, Vol. I. Technical Report ESD-TR-73-278-I, The MITRE Corporation, Bedford, March 1973.
[5] I. Brown. Britain's smart meter programme: A case study in privacy by design. International Review of Law, Computers & Technology, 28(2):172–184, 2014.
[6] A. Cavoukian. Privacy by design: The 7 foundational principles. Information and Privacy Commissioner of Ontario, Canada, 2009.
[7] O. Chowdhury, A. Gampe, J. Niu, J. von Ronne, J. Bennatt, A. Datta, L. Jia, and W. H. Winsborough. Privacy promises that can be kept: A policy analysis method with application to the HIPAA Privacy Rule. In Proceedings of the 18th ACM Symposium on Access Control Models and Technologies (SACMAT 2013), pages 3–14. ACM, 2013.
[8] C. Davies and R. Collins. Balancing potential risks and benefits of using confidential data. The British Medical Journal, 333(7563):349–351, 2006.
[9] H. DeYoung, D. Garg, L. Jia, D. Kaynar, and A. Datta. Experiences in the logical specification of the HIPAA and GLBA privacy laws. In Proceedings of the 9th Annual ACM Workshop on Privacy in the Electronic Society (WPES 2010), pages 73–82. ACM, 2010.
[10] C. Dwork. Differential Privacy. In M. Bugliesi, B. Preneel, V. Sassone, and I. Wegener, editors, Proceedings of the 33rd International Colloquium on Automata, Languages and Programming (ICALP 2006), part II, volume 4052 of Lecture Notes in Computer Science, pages 1–12. Springer, 2006.
[11] D. Ferraiolo, D. R. Kuhn, and R. Chandramouli. Role-Based Access Control. Artech House, 2003.
[12] X. Fu. Conformance verification of privacy policies. In M. Bravetti and T. Bultan, editors, Proceedings of the 7th International Workshop on Web Services and Formal Methods (WS-FM 2010), volume 6551 of Lecture Notes in Computer Science, pages 86–100. Springer, 2011.
[13] B. C. M. Fung, K. Wang, R. Chen, and P. S. Yu. Privacy-preserving data publishing: A survey of recent developments. ACM Computing Surveys, 42(4):14:1–14:53, 2010.
[14] ISO/IEC. ISO/IEC 13568: Information Technology — Z Formal Specification Notation — Syntax, Type System and Semantics. ISO/IEC, 2002.
[15] D. Kouzapas and A. Philippou. A typing system for privacy. In S. Counsell and M. Núñez, editors, Proceedings of Software Engineering and Formal Methods (SEFM) 2013 Collocated Workshops, Lecture Notes in Computer Science, pages 56–68. Springer, 2014.
[16] D. Le Métayer. Privacy by design: A formal framework for the analysis of architectural choices. In Proceedings of the 3rd ACM Conference on Data and Application Security and Privacy (CODASPY 2013), pages 95–104. ACM, 2013.
[17] J. McLean. A comment on the "basic security theorem" of Bell and LaPadula. Information Processing Letters, 20(2):67–70, 1985.
[18] Q. Ni, E. Bertino, and J. Lobo. An obligation model bridging access control policies and privacy policies. In Proceedings of the 13th ACM Symposium on Access Control Models and Technologies (SACMAT 2008), pages 133–142. ACM, 2008.
[19] Q. Ni, D. Lin, E. Bertino, and J. Lobo. Conditional privacy-aware role based access control. In J. Biskup and J. López, editors, Proceedings of the 12th European Symposium On Research In Computer Security (ESORICS 2007), volume 4734 of Lecture Notes in Computer Science, pages 72–89. Springer, 2007.
[20] P. A. Norberg, D. R. Horne, and D. A. Horne. The privacy paradox: Personal information disclosure intentions versus behaviors. Journal of Consumer Affairs, 41(1):100–126, 2007.
[21] D. Plagge and M. Leuschel. Validating Z specifications using the ProB animator and model checker. In J. W. M. Davies and J. Gibbons, editors, Integrated Formal Methods, volume 4591 of Lecture Notes in Computer Science, pages 480–500. Springer, 2007.
[22] D. J. Power, M. A. Slaymaker, and A. C. Simpson. On formalizing and normalizing role-based access control systems. The Computer Journal, 52(3):305–325, 2009.
[23] L. Sankar, S. Rajagopalan, S. Mohajer, and H. Poor. Smart meter privacy: A theoretical framework. IEEE Transactions on Smart Grid, 4(2):837–846, 2013.
[24] C. Skinner. Statistical disclosure risk: Separating potential and harm. International Statistical Review, 80(3):349–368, 2012.
[25] D. J. Solove. Conceptualizing privacy. California Law Review, 90(4):1087–1155, 2002.
[26] J. M. Spivey. The Z Notation: A Reference Manual. Prentice Hall, 2nd edition, 1992.
[27] S. Suriadi, C. Ouyang, J. Smith, and E. Foo. Modeling and verification of privacy enhancing protocols. In Proceedings of the 11th International Conference on Formal Engineering Methods (ICFEM 2009), volume 5885 of Lecture Notes in Computer Science, pages 127–146. Springer, 2009.
[28] L. Sweeney. k-anonymity: A model for protecting privacy. International Journal on Uncertainty, Fuzziness and Knowledge-based Systems, 10(5):557–570, 2002.
[29] M. C. Tschantz and J. M. Wing. Formal methods for privacy. In A. Cavalcanti and D. Dams, editors, Proceedings of the 2nd World Congress on Formal Methods (FM 2009), volume 5850 of Lecture Notes in Computer Science, pages 1–15. Springer, 2009.
[30] J. C. P. Woodcock, P. G. Larsen, J. Bicarregui, and J. Fitzgerald. Formal methods: Practice and experience. ACM Computing Surveys, pages 19:1–19:36, 2009.