Privacy Requirement Modeling and Verification in Cloud Computing Jin, Wang College of Computer Science and Technology Nanjing University of Aeronautics and Astronautics Nanjing, China woodenwang55@hotmail.com Abstract—Cloud computing, the architecture which shares regulations such as OECD [6] and ISO29100 [7]. The issues dynamic heterogeneous characteristics in the cross-layer service appearing in this principle are manifold. composition, has affected traditional security, trust and privacy Firstly, we need an approach to precisely describe the mechanisms which are mainly based on data encryption and privacy requirement. On the one hand, due to the ambiguous access control. Approaches that can support accurate privacy requirement description and verifiable compliance between the and inconsistent essence, natural language is essentially privacy requirement and system practice need to be developed to unusable to depict a user concerned privacy requirement. On fit this new paradigm. To tackle the issues of privacy the other hand, directly using a specification language, such as requirement modeling and verification in cloud computing, a Linear Temporal Logic (LTL) and Description Logics (DL), is framework that supports model checking consistency, entailment too complicated to be implemented by the requirements and compliance with the formal definition of privacy analysts and system designers. Therefore, we must establish requirements and privacy model of cloud application is proposed. one privacy requirement description method that can make a This paper provides an overview of the scientific research balance between the expressiveness and the applicability. problem, approaches to solve the problem and ways to evaluate the solution found by the research related PhD thesis. Secondly, considering the multi-participant and outsource Keywords—Cloud computing, privacy requirement, model nature of cloud computing, we must guarantee there is no checking, formal model conflict among different participants which means the privacy requirement among different participants should be I. PROBLEM consistent with each other. Thirdly, some laws and regulations in different Scalability, on-demand access and network-based message application contexts, such as the Children Online Privacy delivery are three core characteristics of cloud computing [1]. Protection Act (COPPA), the Health Information Portability As a scalable and hierarchical distributed collaboration and Accountability Act (HIPAA) and the Gramm–Leach– paradigm, cloud computing is envisioned as an XaaS (X As a Bliley Act (GLBA), are proposed to enforce the privacy Service) architecture, combined with the advantage of reducing requirements of each participant and we must make sure the cost by sharing computing and storage resources [2]. Although privacy requirements entail certain privacy regulations and there is a large push towards cloud computing, security and laws. privacy are the major challenges which inhibit the cloud Finally, we need to verify the compliance between the computing’s wide acceptance in practice. A survey conducted cloud computing system practice and requirements. To by the US Government Accountability Office (GAO) states support the verification, a formal model of privacy that “22 of 24 major federal agencies reported that they were requirements and the specified cloud computing privacy model either concerned or very concerned about the potential are needed. Moreover, to overcome the space explosion information security and privacy risks associated with cloud dilemma in traditional model checking, there should be some computing” [3]. Different from the traditional architecture in reduction method to make our verification represent real-life which users have full control of their privacy data, the internal systems and not just toy examples. operations of cloud computing software systems are usually transparent to the users and once the privacy data of the users II. RELATED WORK are collected, they will lose control over it. The essence of privacy protection in cloud computing is the A. Privacy Requirement Description rights and obligations of individuals and service providers with Current privacy policy definition methods can be classified respect to the collection, use, disclosure, and retention of PII into four categories, access control model based on (Personally Identifiable Information) [4,5]. Providing verifiable RBAC(Role-Based Access Control), access control models mechanism in design phase to ensure that the practice of using Markup Language, Semantic-web policy frameworks software is compliant with the privacy requirement is one of using description logic and declarative language with formal the most important principles in privacy related standards and semantics. For the first category, RBAC is an access control model in a very limited scale. Barth et al. [17] propose a Linear which access rights are specified in terms of roles. As an access Temporal Logic (LTL) based framework CI (Contextual control model, RBAC lacks the notion of privacy data and Integrity) for expressing and reasoning about norms of purpose and is insufficient for directly modeling privacy policy. transmission of personal information. This method describes Privacy-Aware Role-Based Access Control (PARBAC) model requirements with the logic formula directly which are proven [8] notices the partial relations in roles, purpose and data too awkward to implement by the requirement analyzer and do subject and combined RBAC with Chandramouli’s DAFMAT not have the feasibility of reality. S4P, a declarative language framework. There is no formal semantics about the relations to express privacy preferences and policies are proposed in [18]. and interactions in the model, therefore consistency and The method provides a formal semantics and proof rules for compliance checking is not feasible with it. P-RBAC [9], is the their language. S4P is at a higher abstraction level than our first privacy data centric RBAC expansion. However, the work. The purpose of S4P is to check the consistency between authors did not present any formal model for the condition and user preference and privacy policy. obligation except the policy in natural language which makes B. Privacy Requirement Verification that policy unusable for verifying compliance and can only check the consistency in a limited context. Y. Li et al. [19] establish the mapping between business For access control models using Markup Language, process execution language (BPEL) and P3P policy and verify XACML defines a general-purpose access control system and the privacy requirements in P3P. A state machine based provides a privacy profile in [10]. XACML leaves obligation model is proposed and extended to monitor the compliance of interpretation and rule conflict detection to the application. privacy agreements and verify the time properties in privacy Therefore, the semantics of an XACML policy cannot be fully requirements at the runtime [20,21]. Barth et. al [22] use LTL specified by the policy itself which makes the formal semantics formula to depict and verify privacy properties in service rely on specified application and cannot do the consistency and composition. Our team also does research on privacy modeling compliance checking with the model itself. IBM proposed a and verification with interface automaton and hyper-graph. language called EPAL [11] to encode privacy policies. Similar These works, however, mostly focus on the SOAP web service to XACML, the language is based on XML and uses a set of and do not consider the hierarchical and the heterogeneous attributes called vocabularies. The main issue about EPAL is feature of cloud computing. that the obligation definition is natural language based and hard C. Cloud Computing Modeling and Verification to expand to support formal semantics. The Platform for Privacy Preferences (P3P) is a privacy language intended for In Cloud computing environments, different types of use by web site operators in informing their visitors of their services co-exist and collaborate with each other to offer a final data practices [12]. P3P policies were not originally intended to system. Numerous studies have already been carried out on describe the exchange of privacy data among multi participants modeling the most common cloud computing service, Restful context. In describing data collection requirements, all and SOAP services, and their combination. In Restful Service requirements in P3P are categorical with fixed options which modeling, some semi-formal and formal models, such as UML makes it impossible to expand to adapt different domains and state diagram, Petri Net and finite state machine, are proposed application contexts. P3P also lacks a formal semantics, which to depict the Restful service from different aspects. In SOAP may have led to language misuse. service modeling, some standards, such as BPEL and WS- CDL, have been proposed to reduce the complexity required to For Semantic-web policy frameworks, naturally, the Web Ontology Language (OWL) and its predecessors can depict compose web services, hence reducing time and costs, and classification hierarchies with data type constraints. The increase overall efficiency in businesses. To verify functional and non-functional properties, these standards are further Description logic behind those languages is a subset of first- extended and formalized by transition system [29], process order logic for expressing knowledge. Some frameworks such algebra [30] and Petri Net [31]. Some research realize the as KAoS [13] and Rei [14] that contains an OWL policy ontology are proposed for expressing rights, prohibitions, and cross-layer service composition nature of cloud computing and obligations. Eddy [15] further extends previous work with the propose some approaches to analyze different aspects ranging from resource management model [32] to information store formal model and consistency checking method. Bearing on the model [33]. None of the aforementioned approaches focus on limitation of Description logic, these methods cannot describe privacy analysis which makes modeling privacy properties in temporal constraints and can hardly work with the data cloud computing still an open challenge. implementation model of a specified system to check the compliance. III. PROPOSED SOLUTION For declarative language with formal semantics, May et al. [16] introduce privacy APIs, which is a logical framework that To mitigate the aforementioned challenges in Section I, an includes a language to express permissions using commands. approach considering four aspects, i.e., definition, The policies are further formalized in the Promela model- formalization, reduction and verification, respectively, is checking language and can be checked using the model checker proposed. SPIN. However, the only temporal constraints in this method On the privacy requirement side, the start point of our are “Opt-In” and “Opt-Out” that restricts the expressiveness to solution is to precisely describe privacy actions and constraints. Based on refinement of current privacy protection standards and regulations, we systematically analyze the partial order TABLE1 EVENT TEMPLATE EXAMPLES relation among privacy datum, roles and purposes and define Template Description LTL Mapping Least(n, a) a will occur at least n ◇(a∧○( Least(n-1,a) ) the meta-model of privacy action(Fig 1) as a basis for the times. privacy model checking. Response(a, b) Every a is eventually (a  ◇(b)) followed by at least one b. MultiOptionResp If a occurs, there must (a  ◇(b1  b2  ... bn)) (a, B) be at least one privacy action in B occurs after a. formal model that can map privacy action and atomic service request/response are stated. For SOAP service, this mapping is apparent and directly based on WSDL description. For Restful service, on the other hand, the HATEOAS (Hypermedia As The Engine Of Application State) constraint makes the internal transition more complex. To correctly represent those transitions caused by the iteration relation between resources and links, our approach defines a resource link mapping tree and then transforms that tree to the automaton. Furthermore, a cross-layer interaction model will be created based on the Fig 1 Privacy Action meta model. atomic service model and control flow. To express the occurrence condition and obligations of One of the most severe challenges for verifying the cross- privacy action, we further design three categories of event layer model is the state space explosion. The state space for 30 templates, EXISTENCE, BINARY RELATION and PLURAL services composition can reach 106 or more which makes RELATION, respectively. The declarative nature of our event verifying real-life cases impossible before reduction. In our templates makes them easy to use by privacy analyzers. To approach, both the privacy requirement and the privacy support the formal verification, we map each event to the operations of the cloud computing system can be formalized as corresponding LTL formula. Table 1 are some typical events a transition system where the privacy actions are used as the and their corresponding LTL mapping. Considering the transition guards. A lot of elements in privacy actions are influence of partial order relations, we also give the formal disjoint with each other, privacy datum and participant for definition and detection algorithm of privacy action inclusion example. Taking advantage of these disjoint elements, we can relationship. Different from traditional LTL formulas which achieve the partial model from the original formal model. For can monitor two or more properties simultaneously, our model example, the privacy datum set we want to verify is { name, is safely assumed to be a single event system. Using generated address, email }, we can obtain the sub-models for name , LTL formula and inclusion relationship, we can finally achieve address and email respectively and just check each sub-model the single event finite automaton to do the consistency and instead of checking the whole formal model. To generate the entailment checking. sub-model of one specified privacy datum, we firstly get all the Let us take one key constraint , “Online Service does not privacy actions containing that privacy datum. Then we collect personal information from any visitor prior to collecting analyze the transitions that will affect or be affected by these age information”, in §312. 2 of COPPA as an example. With privacy actions and finally remove those transitions which have our language, this term can be represented by two privacy no relation to the specified privacy datum from the original actions and one event constraint as follow ( we use T to formal model. represent universal set and \ for relative complement , OS and IV. PRELIMINARY WORK SP are the abbreviation of Online Service and Service Provider): We started our work by examining the minimum privacy p1 = <{(User, age) },(Collect, T), (Operator, User), disclosure in SOA architecture [23]. When analyzing the state ( OS,SP),Φ> of art privacy regulations and standards, we realized most current privacy requirement definition methods lack a p2 = <{(User, PII \ age)},(Collect, T), (Operator, User), theoretical foundation and therefore is not amenable to (OS, SP),Φ> verification or reasoning. Futhermore, developed from SOA, e1= Prior( p1, p2) cloud computing introduces multi-layer and heterogeneous On the cloud computing system side, when services in SaaS service collaboration which makes privacy protection a more usually use SOAP/WS-*, most products in IaaS and PaaS, such complex challenge. We list most relevant up-to-date work and as OCCI, OpenStack, mOSAIC, Google Map, Yahoo!Local, achieved results as below: are Restful service based. To verify the privacy requirement in - We have introduced an XML based privacy requirement cloud computing system, we need to formally define these two language which preliminarily approaches the problem of kinds of service and present a model can depict the multi-layer expressing temporal constraints [24]. To give the privacy collaboration between these heterogeneous services. Firstly, a requirement more precise and formal semantics, we further defined a declarative privacy policy language with correctness, we plan to conduct an experiment using the same its formal model [25]; data in [15] and compare our conflict detection result with - We have focused on the privacy data in SOAP and theirs. Restful service and established the formal privacy model At the feasibility side, we intend to conduct a set of real for these two types of services [26,27] which are the basis case study to model COPPA, HIPAA and the policies from the of cloud computing; related service providers to check the limitation of our privacy - We have analyzed the privacy data with the predicate requirements modeling method and determine which kind of constraints in SOAP service composition and get a policies can be or cannot be modeled with our approach. feasible path generation method that can support our Furthermore, we aim to model some open-source cloud future verification [28]. computing applications which include both Restful and SOAP - We have conducted a study aimed at reducing the state service interactions to analyze the compliance of the privacy space of privacy requirement and verifying the requirement. consistency among different privacy requirements [25]. At the performance side, we intend to conduct several experiments based on the benchmark from [34] and to V. EXPECTED CONTRIBUTIONS investigate the following indicators: We are currently working on several directions, that will − Number of specifications address the following contributions. − Number of privacy datum items; From the privacy requirement modeling aspect, a privacy − Number of constraints action meta-model and declarative requirements definition − Number of privacy action inclusion constraints; language are proposed. Constraints templates and their LTL − Number of application states in cloud system mapping are stated to support formal specification generation. A Formal semantics of that language based on an automaton VII. CURRENT STATUS are established to support further model checking. The privacy Currently, we are proposing a declarative privacy policy action inclusion relationships are introduced to reduce the state language with the formal semantics that is expected to be the space in model checking. input of our further consistency and entailment verification. From the cloud computing privacy modeling aspect, the The case study and performance evaluation will be performed privacy model for most common service –Restful and SOAP - in parallel with our theoretical research. To finalize our work, in cloud computing are proposed based on mapping between we identify the following tasks that will lead to a Ph.D. system behavior and privacy action. A formal model dissertation. The planned timeline in terms of the expected supporting multi-layer structure and heterogeneous service contributions and current status (dash line) is shown in Figure 2. collaboration in cloud computing will be represented based on Restful and SOAP privacy model. From the privacy requirement verification aspect, an automaton partition method will be presented to mediate the state space explosion issue in model checking. A series of prototype toolkits will be implemented to provide semi- automated privacy data extraction, privacy requirement definition and verification. The main contribution can be summarized as follows. - A declarative privacy requirement language with formal semantics - A formal privacy model for cloud computing Fig. 2: Planned timeline and current status - A model checking Reduction method based on transition relationship and privacy datum feature. • To model the privacy requirement with the formal semantics (T1); VI. PLAN FOR EVALUATION AND VALIDATION • To verify the consistency of privacy requirements We plan to validate and evaluate our work from three based on reduction formal specification (T2); aspects: correctness, feasibility and performance. • To provide a verification approach for the entailment At the correctness side, to measure the expressive power of of privacy regulation and laws (T3); our privacy requirements modeling approach, we will compare • To define mappings between cloud computing our method with other related works using the following behavior and privacy action (T4); criterias: Hierarchical data structure support, Role support, • To study the privacy model of cloud computing that Purpose Specification support, Temporal constraints support, can reflect the hierarchical and heterogeneous Privacy operation definition support, Compliance checking characteristics (T5); support, Consistency checking support and User readable. • To implement a semi-automated privacy framework Those criterias are retrived from the OECD, ISO 29100 and for supporting to extract privacy, data definition from other related standards. To evaluate the verification the system (T6); • To evaluate our work by case study and performance [17] A. Barth, A. Datta, J. Mitchell, and H. Nissenbaum. Privacy and experiment (T7); contextual integrity: Framework and applications, In IEEE Symposium on Security and Privacy, pages 184-198, 2006. • To finish the Ph.D. thesis and dissertation (T8). [18] M. Y. Becker, A. Malkis, L. Bussard, S4P: A generic language for specifying privacy preferences and policies, Technical Report MSR-TR-2010-32, Microsoft Research, 2010. ACKNOWLEDGMENT [19] Y. Li, S. Benbernou, H. Paik, and B. Benatallah. Formal This research has been supported by This work is consistency verification between bpel process and privacy policy. Proc of Privacy Security Trust PST’2006. New York, NY, USA : supported in part by the National Science Foundation of China ACM 2006 : 212–224. under grants (No.61272083, No.61373137, No.61373017) [20] Salima Benbernou, Hassina Meziane, Mohand-Said Hacid. Run- Time Monitoring for Privacy-Agreement Compliance. LNCS REFERENCES 4749 : Proc of ICSOC 2007, Berlin: Springer,2007: 353-364. [1] Foster, Ian, et al. "Cloud computing and grid computing 360- [21] Karima Mokhtari, Salima Benbernou, Mohand-Said Hacid, degree compared." Grid Computing Environments Workshop, Emmanuel Coquery, Frank Leymann. Verification of Privacy Timed Properties in Web Service Protocols. Proc of IEEE SCC 2008. GCE'08. Ieee, 2008. 2008, Washington: IEEE 2008: 593-594. [2] Fox, Armando, et al. Above the clouds: A Berkeley view of [22] Adam Barth, John Mitchell, Anupam Datta, and Sharada cloud computing. Dept. Electrical Eng. and Comput. Sciences, Sundaram. Privacy and utility in business processes. Proc of University of California, Berkeley, Rep. UCB/EECS 28 (2009). Computer Security Foundations Symposium, Washington: IEEE, [3] GAO, “Information Security: Additional Guidance Needed to 2007:279–294. AddressCloud Computing Concerns,” United States [23] Liu, Linyuan, Haibin Zhu, and Zhiqiu Huang. "Analysis of the Government Accountability Office (GAO), October 6, 2011. minimal privacy disclosure for web services collaborations with [4] Pearson, Siani. "Taking account of privacy when designing role mechanisms." Expert Systems with Applications 38.4 cloud computing services." Proceedings of the 2009 ICSE (2011): 4540-4549. Workshop on Software Engineering Challenges of Cloud [24] Lu Jiajun, Huang Zhiqiu, Wang Jin, et al. Behavior-oriented Computing. IEEE Computer Society, 2009. privacy policy description for Web services composition. [5] Pearson, Siani, and Andrew Charlesworth. "Accountability as a Journal of Frontiers of Computer Science and way forward for privacy protection in the cloud." Cloud Technology,2013,7(7):592-601. computing. Springer Berlin Heidelberg, 2009. 131-144. [25] Wang Jin, Huang Zhiqiu. Privacy Requirement Modeling and [6] Organisation for Economic Co-operation and Consistency Checking in Cloud Computing. Journal of Development. OECD Guidelines on the Protection of Privacy Computer Research and Development. 2015, 52(10). and Transborder Flows of Personal Data. OECD Publishing, [26] Cai Zheng-ping, Huang Zhi-Qiu, Wang Jin, et. al. Research of 2002. Web Services Composition Transaction Coordination [7] ISO/IEC 29100. Information technology – Security techniques– Framework based on BPEL and WS-TX. Computer Science. Privacy framework. ISO/IEC 29100 (1st edition), 2011. 39.6 (2012): 120-124. [8] Q. He. Privacy enforcement with an extended role-based access [27] Tie Wei, Huang Zhi-Qiu, Wang Jin. BPEL based asynchronous control model, NCSU Computer Science Technical Report TR- interaction and composition of RESTful Web Service. Computer 2003-09, 2003. Engineering & Science. 35.4 (2013): 29-36. [9] Q. Ni, E. Bertino & J. Lobo,“An Obligation Model Bridging Access Control Policies and Privacy Policies,” in Proc. of ACM [28] Wang Jin, Huang Zhi-qiu ,Tang Jiajun, et. al. Predicate SACMAT, pp.133-142, 2008. Constraint Oriented BPEL Modeling and Feasible Path Analysis. Journal of Computer Research and Development. 2015, 51(4): [10] OASIS. XACML’s Privacy profile. Available: 838-847. http://www.oasisopen.org/committees/document.php?document _id=37643&wg_abbrev=xacml [29] Foster H, Uchitel S, Magee J, et al . Model-based verification of [11] P. Ashley et al.,“Enterprise Privacy Authorization Language web service compositions. Proc of 18th IEEE Int Conf on (EPAL),” Research Report RZ 3485, IBM Research, 2003. Automated Software Engineering. Piscataway, NJ: IEEE, 2003: [12] Cranor L et al, Platform for privacy preferences (P3P) 152-163 specification. W3C working group note,2006 [30] Ferrara A. Web services: A process algebra approach. Proc of [13] Uszok A, Bradshaw JM, Lott J, Breedy M, Bunch L. New 2nd ACM Int Conf on Service Oriented Computing. Yew York : developments in ontology-based policy management: increasing ACM,2003: 242-251 the practicality and comprehensiveness of KAoS, In: IEEE [31] Ouyang Chun, Verbeek Eric, Wil M.P, et al. Formal semantics workshop on policies for distributed systems and networks,pp 145–152, 2008. and analysis of control flow in WS-BPEL . Science of Computer Programming,2007,67(2/3), 162-198. [14] Tonti G, Bradshaw JM, Jeffers R, Montanari R, Suri N, Uszok A. Semantic web languages for policy representation and easoning: [32] De Boer, Frank S., et al. "Formal modeling of resource a comparison of KAoS, Rei, and Ponder, LNCS 2870:419–437, management for cloud architectures: An industrial case 2003. study." Service-Oriented and Cloud Computing. Springer Berlin [15] T. D. Breaux, H. Hibshi, and A. Rao. Eddy, a formal language Heidelberg, 2012. 91-106. for specifying and analyzing data flow specifications for [33] Fitch, Daniel F., et. al. "A Petri Net Model for Secure and Fault- conflicting privacy requirements, Requirements Engineering, pages 1–27, 2013. Tolerant Cloud-Based Information Storage." SEKE. 2012. [16] May MJ. Privacy APIs: formal models for analyzing legal and Westergaard, M.: Better Algorithms for Analyzing and Enacting privacy requirements, Ph.D. Thesis, University of Declarative Workflow Languages Using LTL, In: Rinderle, S., Pennsylvania,2008. Toumani, F., Wolf, K. (eds.) BPM 2011. LNCS, vol. 6896, 2011.